Generating Experiments to Test Novel Biology Hypotheses

Each of the 229 orphan reactions (reactions with unknown ORF annotations) corresponds to a node in a graph (hypergraph). Each reaction consists of a set of substrate compounds and a set of product compounds, and describes chemical transformations from the substrates to the products and, for reversible reactions, vice versa. There are two kinds of reaction in the network: forward-only reactions and reversible reactions. In forward-only reactions the chemical transformations can only move from substrates to products; in reversible reactions they can move in both directions:

    substrates   direction   products
    A + B        ->          C + D       forward-only
    A + B        <->         C + D       reversible

Reactions are connected within the metabolic network when two reactions share the same compound, e.g.:

    A + B -> C + D
    C + F -> G

Here compound C is produced from A and B, and is then transformed to G in the presence of compound F. The metabolic network of yeast (or of any organism) consists of thousands of these connections, which form paths involving many linked compounds. Many well-described paths correspond to the metabolic pathways (e.g. Glycolysis/Gluconeogenesis) in online databases such as KEGG.

Many single-ORF knockout strains have one or more reactions effectively removed from the network, because the protein catalysing the reaction is no longer synthesised by the cell. A missing reaction can cause a cascade effect in which many more reactions, dependent on chemicals no longer produced by the missing reaction, also no longer take place. As a result, many compounds may no longer be present in the metabolism of the cell, and the growth characteristics of the knockout strain can differ from those of the wild type.
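The cascade effect can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions that the text does not state: a reaction is taken to fire only when all of its substrates are available, and the function name `producible` and the tuple layout are hypothetical.

```python
def producible(reactions, seeds):
    """Return every compound reachable from the seed compounds.

    reactions: iterable of (substrates, products, reversible) tuples,
               where substrates and products are sets of compound names.
    Assumption: a reaction fires only when ALL its inputs are available.
    """
    available = set(seeds)
    changed = True
    while changed:  # repeat until no reaction adds a new compound
        changed = False
        for substrates, products, reversible in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
            # Reversible reactions can also run from products to substrates.
            if reversible and products <= available and not substrates <= available:
                available |= substrates
                changed = True
    return available


# The two example reactions from the text: A + B -> C + D and C + F -> G.
r1 = ({"A", "B"}, {"C", "D"}, False)
r2 = ({"C", "F"}, {"G"}, False)
medium = {"A", "B", "F"}

wild_type = producible([r1, r2], medium)
knockout = producible([r2], medium)  # r1 removed by the ORF knockout
lost = wild_type - knockout          # compounds missing in the knockout
```

In this toy network, removing the first reaction loses not only its own products C and D but also G, because the second reaction can no longer take place.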
However, if the knockout strain is grown in a growth medium containing a compound or compounds from one of the reactions disabled by the missing ORF, many, if not all, of the implicated reactions can become functional again, and the growth characteristics of the knockout strain will become closer to those of the wild type.

The algorithm for generating experiments uses graph traversal to collect compounds in the neighbourhood of each of the orphan reactions (as well as from the orphan reaction itself). Nearby compounds have a greater chance of belonging to the missing reactions and therefore a greater chance of restoring growth to that of the wild type. The algorithm corresponds to a levelwise search of the reactions comprising the metabolic network, where each orphan reaction is level 0 and levels n and n+1 contain linked reactions that share at least one compound.

1 IF Level =< MaxLevel THEN
2   With each orf(ORF,rank) -> reaction(Substrates,Direction,Products) from HypothesisReactions Do
3     IF Direction = -> THEN
3.1       With each compound p from Products Do
3.1.1       IF p NOT in Avoid Compounds THEN
3.1.1.1       add experiment(ORF,rank,p,Level) to Experiment Collection E
3.1.1.2       Find All PSReactions reaction(PSSubstrate,PSDirection,PSProducts) such that p is IN PSSubstrate
3.1.1.3       Find All PPReactions reaction(PPSubstrate,PPDirection,PPProducts) where PPDirection = <-> AND p is IN PPProducts
3.2       NewPReactions = PSReactions + PPReactions
3.3       NLevel = Level + 1
3.4       GOTO 1 with NewPReactions AND NLevel
4     IF Direction = <-> THEN
4.1       With each compound p from Products Do
4.1.1       IF p NOT in Avoid Compounds THEN
4.1.1.1       add experiment(ORF,rank,p,Level) to Experiment Collection E
4.1.1.2       Find All PSReactions ORF -> reaction(PSSubstrate,PSDirection,PSProducts) such that p is IN PSSubstrate
4.1.1.3       Find All PPReactions ORF -> reaction(PPSubstrate,PPDirection,PPProducts) where PPDirection = <-> AND p is IN PPProducts
4.2       NewPReactions = PSReactions + PPReactions
4.3       With each compound s from Substrates Do
4.3.1       IF s NOT in Avoid Compounds THEN
4.3.1.1       add experiment(ORF,rank,s,Level) to Experiment Collection E
4.3.1.2       Find All SSReactions ORF -> reaction(SSSubstrate,SSDirection,SSProducts) such that s is IN SSSubstrate
4.3.1.3       Find All SPReactions ORF -> reaction(SPSubstrate,SPDirection,SPProducts) where SPDirection = <-> AND s is IN SPProducts
4.4       NewSReactions = SSReactions + SPReactions
4.5       NewSPReactions = NewSReactions + NewPReactions
4.6       NLevel = Level + 1
4.7       GOTO 1 with NewSPReactions AND NLevel
5 IF Level > MaxLevel THEN Return Experiment Collection E

Each experiment in the resulting experiment collection consists of one candidate ORF from the hypothesis generation, the candidate's rank from the BLAST/FASTA e-score determination, one nutrient from the levelwise search described above, and the level at which the nutrient was found (this corresponds to the distance between the orphan reaction and the reaction containing the nutrient). The experiments can be ranked by both the candidate ORF ranking and the level distance measure. The most favourable experiments are those with a low rank value for the ORF (i.e. a top-ranked candidate) and a low nutrient level (distance) score.

To generate experiments for Adam's new biology investigations, a cutoff of 2 was used for the generation of candidate nutrients (i.e. only reactions within at most 2 levels of connectivity were considered in the search for nutrients). This still produced 895 possible experiments. This number was further reduced by considering only the top candidate ORFs found in the hypothesis generation step. These experiments were then sorted by nutrient level score and also grouped by nutrient, so that experiments using the same nutrients could be run together and the maximum number of experiments performed in a single Adam run (the Sciclone deck is limited to 6 nutrients). These two goals are in conflict: e.g.
the best ordering of experiments will often not maximise the number of experiments in an Adam run. Further constraints on the choice of experiments were 1) the availability of ORFs in the yeast strain library, including the ease with which the strain could be cultivated from the freezer, and 2) the availability of the nutrient compound from the supplier (e.g. Sigma-Aldrich). Although all of the candidate experiments were generated in one step, the changing availability of ORFs and nutrients further complicated the process of generating experiments for any particular Adam run.

Relevant files can be found in:

    ../informatics/bioinformaticsData/avoid_compounds.csv
    ../informatics/bioinformaticsData/available_nutrients.csv

Relevant code can be found in:

    ../informatics/code/laboratory_control.tar.gz
    ../informatics/code/yeast_model_engine.tar.gz
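The levelwise search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code in the archives listed above. The names `Reaction`, `Experiment` and `generate_experiments` are hypothetical, and the sketch makes two choices the pseudocode does not state: it keeps only the lowest level at which each nutrient is found for a given ORF, and it skips reactions already visited for that ORF.

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    substrates: frozenset
    reversible: bool  # True for <->, False for ->
    products: frozenset


class Experiment(NamedTuple):
    orf: str
    rank: int
    nutrient: str
    level: int


def generate_experiments(hypotheses, network, avoid, max_level):
    """hypotheses: list of (orf, rank, Reaction), the level-0 orphan reactions.
    network: list of every Reaction; avoid: set of compounds to skip."""
    found = {}  # (orf, rank, nutrient) -> lowest level (assumption)
    frontier = list(hypotheses)
    seen = {(orf, rxn) for orf, _, rxn in frontier}  # assumption: no revisits
    level = 0
    while frontier and level <= max_level:
        next_frontier = []
        for orf, rank, rxn in frontier:
            # Forward-only reactions contribute products only;
            # reversible reactions contribute substrates as well.
            compounds = set(rxn.products)
            if rxn.reversible:
                compounds |= rxn.substrates
            for c in sorted(compounds - avoid):
                found.setdefault((orf, rank, c), level)
                # Linked reactions consume c, or are reversible and produce c.
                for nxt in network:
                    linked = c in nxt.substrates or (nxt.reversible and c in nxt.products)
                    if linked and (orf, nxt) not in seen:
                        seen.add((orf, nxt))
                        next_frontier.append((orf, rank, nxt))
        frontier = next_frontier
        level += 1
    experiments = [Experiment(o, r, c, lvl) for (o, r, c), lvl in found.items()]
    # Most favourable first: low ORF rank value, then low level.
    return sorted(experiments, key=lambda e: (e.rank, e.level, e.orf, e.nutrient))


# Toy network (hypothetical data): R0 is the orphan reaction for "ORF1".
R0 = Reaction(frozenset({"A", "B"}), False, frozenset({"C", "D"}))
R1 = Reaction(frozenset({"C", "F"}), False, frozenset({"G", "H2O"}))
R2 = Reaction(frozenset({"H"}), True, frozenset({"D"}))
R3 = Reaction(frozenset({"G"}), False, frozenset({"I"}))
result = generate_experiments([("ORF1", 1, R0)], [R0, R1, R2, R3], {"H2O"}, 1)
```

With a cutoff of 1 the toy example yields nutrients C and D at level 0 and G and H at level 1; H2O is excluded by the avoid set and I lies beyond the cutoff.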