Graph and Logic Based Models of Metabolism

A graph is a mathematical representation that is useful for illustrating and reasoning about complex objects composed of many smaller objects and the connections between them. For example, a road map can be thought of as a graph in which the smaller objects are the towns and the connections are the roads leading from one town to another. Other applications of graphs include computer networks, where each machine is connected to a set of other machines via network cables, and ecosystem energy flow diagrams, where the connections illustrate the flow of energy between populations of species in the ecosystem.

Graphs are increasingly common in systems biology, where they are used to model the metabolic pathways by which cells consume and construct the molecules necessary to sustain life and reproduce. Graphs allow an explicit representation of all the metabolic pathways, as well as the interactions between the pathways.

A graph G is an object composed of a set of vertices (or nodes) V and a set of edges E connecting vertices from V, i.e. G = {V,E}.

For example...

Here {a,b,c,d,e,f,g} are vertices and {1,2,3,4,5,6,7,8,9,10,11,12} are edges. This is an example of a directed graph, i.e. the edges denote "flow" in only one direction. It also contains cycles, e.g. c -> b -> d -> c.

A path in a graph is a traversal from a start node to a destination node, recording each node (and sometimes each edge) passed through on the way, for example...

Path P(a,g) = {a,b,d,c,e,f,g} (vertices) or {1,6,7,4,12,11} (edges)
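Since a path is defined by a traversal, a small sketch may make the idea concrete. The Python below (not part of the robot model, which is written in PROLOG) finds a path by breadth-first search. The adjacency list EDGES is an assumption: it holds only the edges implied by the path P(a,g) and the cycle c -> b -> d -> c mentioned above, not the full set of twelve edges in the example graph. The name find_path is illustrative.

```python
from collections import deque

# ASSUMPTION: only the directed edges implied by P(a,g) and the cycle
# c -> b -> d -> c; the remaining edges of the example graph are omitted.
EDGES = {
    "a": ["b"],
    "b": ["d"],
    "c": ["b", "e"],
    "d": ["c"],
    "e": ["f"],
    "f": ["g"],
    "g": [],
}


def find_path(graph, start, goal):
    """Breadth-first search: return the vertex list of a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            # Skipping visited vertices stops the search looping round cycles.
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(EDGES, "a", "g"))
```

Because the edges are directed, searching in the opposite direction (from g to a) finds no path in this reduced graph.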

A metabolic graph is a graph used to represent the chemical transformations that take place within the cell; many metabolic pathways are in fact graphs. Each vertex is a chemical compound (metabolite) and each edge represents a chemical transformation from one compound to another.

Chemical transformations take place by means of chemical reactions. However, transforming the set of reactions comprising a metabolic pathway into a graph is not completely straightforward: more than a single edge is required to represent complex reactions involving more than two chemicals, e.g.

The (hypothetical) reaction

A + B <=> C + D (a reversible reaction)

gives rise to the following edges or chemical transformations
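The edges for the hypothetical reaction can be enumerated mechanically. The following Python sketch assumes that every substrate is paired with every product, and that a reversible reaction contributes the same pairs in the opposite direction as well. The function name reaction_to_edges is illustrative and is not taken from the robot model.

```python
from itertools import product


def reaction_to_edges(substrates, products, reversible):
    """Return the set of directed (source, target) compound pairs for one reaction."""
    edges = set(product(substrates, products))
    if reversible:
        # A reversible reaction also yields every product -> substrate edge.
        edges |= set(product(products, substrates))
    return edges


edges = reaction_to_edges(["A", "B"], ["C", "D"], reversible=True)
for src, dst in sorted(edges):
    print(f"{src} -> {dst}")
```

For A + B <=> C + D this gives eight edges: four in the forward direction and four in reverse.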

In a similar way the following reaction from KEGG also gives rise to eight unique edges...

These edges do not always correspond to "meaningful" chemical transformations. For example, in reactions involving cofactors such as NADP+, a transformation such as Pyruvate -> NADP+ is not very meaningful because there is no transfer of electrons/atoms by which Pyruvate is converted into NADP+, so the presence of such an edge in the graph is a little misleading. The pathway diagrams in KEGG are metabolic graphs in which the less meaningful chemical transformations have been removed, so that the focus is on the various transformations from the start points of the pathway to the end products, via the intermediary reactions.

In KEGG, and in the logical model used by the Robot Scientist, all/most of the edges are annotated with the enzyme(s) that catalyse(s) the reaction(s) in which the chemical transformation takes place, and with the gene(s) or open reading frame(s) (ORFs) from which the organism synthesises the enzymes.

The logical model (the robot model) that comprises the background knowledge for the Robot Scientist is a logical representation of a metabolic graph. The PROLOG programming language is used to represent the reactions and ORFs/enzymes in the model. The robot model has been constructed directly from the "Nielsen" model, designed by Foster et al [REF maybe] for Flux Balance Analysis applications, and updated by adding new information from KEGG. The Nielsen model splits the cell into two internal compartments, cytosol and mitochondrion, reflecting the presence of both nuclear and mitochondrial DNA in eukaryote cells, and an external compartment.

The robot model has an explicit representation of the reactions and ORF/enzymes as well as a PROLOG representation of the graph representing the chemical transformations.

The chemical transformations are represented by two sets of PROLOG facts:

node(Location,KEGGID), e.g. node(cytosol,'C00002'). This states that the cytosol compartment has a node representing ATP in the graph.
reaction_edge(Location1,KEGGID1,->,Location2,KEGGID2,Reactions), e.g. reaction_edge(cytosol,'C00002',->,cytosol,'C00008',[849,896,...]). This states that there is a chemical transformation from ATP to ADP in the cytosol compartment and that it is found in reactions [849,896,...].

The reactions and ORFs/Enzymes are also represented by two sets of PROLOG facts:

orf_fact(Orf,Enzyme,ECFact,GeneName,GeneDesc,Reaction), e.g. orf_fact('YBR166C','1.3.1.13',enzyme_class('1','3','1','13'),'TYR1','prephenate dehydrogenase (nadp+)',366). This states that ORF YBR166C codes for a protein belonging to enzyme class 1.3.1.13 (also stored as a relation to allow traversal of the Enzyme Commission class hierarchy), that the ORF is the "TYR1" gene in yeast, and that EC class 1.3.1.13 catalyses reaction 366, which is responsible for prephenate dehydrogenase (nadp+) activity in the cell.
reaction(Num,Substrates,Direction,Products), e.g. reaction(366,[reactant(cytosol,1,'C00254'), reactant(cytosol,1,'C00006')],->,[reactant(cytosol,1,'C01179'), reactant(cytosol,1,'C00011'), reactant(cytosol,1,'C00005')]). Each reaction has a unique identifying number, lists of the reactants comprising the substrates and products, and a note of the direction (reversible or irreversible). The Substrates and Products are stored as lists of reactant(Location,Stoichiometry,KEGGID). Reaction 366 defines the prephenate dehydrogenase activity coded by YBR166C, where 3-(4-Hydroxyphenyl)pyruvate is formed by removing a COOH group from Prephenate, using NADP+/NADPH as cofactors (see below*).

*This reaction diagram has been taken from KEGG. In the Nielsen model the H+ produced is not included, and in KEGG all reactions are defined as reversible. KEGG terms this reaction "Prephenate NADP+ Oxidoreductase (decarboxylating)".
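To show how the reaction and ORF facts relate to one another, the sketch below mirrors them as Python data structures. It is an illustration only: the robot model stores these as PROLOG facts. The class names (Reactant, Reaction, OrfFact) and the helpers reactions_for_orf and orfs_in_ec_class are hypothetical. The stoichiometry of 1 given to each product of reaction 366 is an assumption.

```python
from typing import NamedTuple, Tuple


class Reactant(NamedTuple):
    location: str
    stoichiometry: int
    kegg_id: str


class Reaction(NamedTuple):
    num: int
    substrates: Tuple[Reactant, ...]
    direction: str  # "->" marks an irreversible reaction, as in the example fact
    products: Tuple[Reactant, ...]


class OrfFact(NamedTuple):
    orf: str
    ec_class: Tuple[str, ...]  # mirrors enzyme_class('1','3','1','13')
    gene_name: str
    gene_desc: str
    reaction: int

    @property
    def enzyme(self):
        """EC number as a dotted string, e.g. '1.3.1.13'."""
        return ".".join(self.ec_class)


# Reaction 366 from the text; product stoichiometries of 1 are an ASSUMPTION.
REACTIONS = {
    366: Reaction(
        366,
        (Reactant("cytosol", 1, "C00254"), Reactant("cytosol", 1, "C00006")),
        "->",
        (
            Reactant("cytosol", 1, "C01179"),
            Reactant("cytosol", 1, "C00011"),
            Reactant("cytosol", 1, "C00005"),
        ),
    ),
}

ORF_FACTS = [
    OrfFact("YBR166C", ("1", "3", "1", "13"), "TYR1",
            "prephenate dehydrogenase (nadp+)", 366),
]


def reactions_for_orf(orf_facts, orf):
    """Reaction numbers annotated with the given ORF."""
    return sorted({f.reaction for f in orf_facts if f.orf == orf})


def orfs_in_ec_class(orf_facts, prefix):
    """ORFs whose EC class starts with the given prefix, e.g. ('1', '3')."""
    prefix = tuple(prefix)
    return sorted({f.orf for f in orf_facts if f.ec_class[:len(prefix)] == prefix})


print(reactions_for_orf(ORF_FACTS, "YBR166C"))
print(orfs_in_ec_class(ORF_FACTS, ("1", "3")))
```

Keeping the EC class as a tuple of its four levels is what makes the prefix lookup possible, which is the same reason the PROLOG model stores it as a separate relation.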

Predicting Growth Outcomes with the Robot Model

A major task of the robot model is to predict the outcome of auxotrophic experiments, in which the change in cell growth/division etc. is measured when one or more ORFs are removed (knocked out) from the yeast genome. Traditionally these types of experiments were used only for "auxotrophic" mutants, where removing the ORF(s) resulted in no further growth of the yeast. This was the experimental regime for the rediscovery work, where the task focused on the 8 (out of 15 total) auxotrophic mutants derived from the AAA pathway.

The function of the removed gene can then be discovered by finding a chemical compound or set of compounds that restores healthy growth, since it can be inferred that the removed gene plays a role in the synthesis of the compound(s), i.e. the removed gene codes for an enzyme catalysing (a) reaction(s) that produce(s) the compound(s) directly, or (a) reaction(s) that produce(s) a crucial intermediary compound.

The task of predicting growth outcomes using the metabolite graph/robot model is related to the problem of finding a path in the metabolite graph from a set of initial compounds (usually representing a growth medium, plus any additional compounds added to test recovery of growth) to a set of compounds deemed essential for cell growth/division etc. The task is simplified by assuming that certain common compounds such as ATP, NADP etc. are always present in the cell.

The reactions themselves are used to govern cell behaviour because they give a more realistic simulation of how chemical transformations occur in the cell. For a reaction to proceed, all compounds in its substrates OR all compounds in its products need to be present in the cell:

A reaction can take place IFF:

  • All of the compounds in the Substrates list are present in the correct cell compartments
  • OR all of the compounds in the Products list are present in the correct cell compartments AND the reaction is reversible
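The rule above can be written as a small predicate. In this Python sketch (the robot model itself is PROLOG) a compound is assumed to be a (compartment, KEGG ID) pair and the cell contents a set of such pairs. The function name firing_directions and the labels "forward" and "reverse" are illustrative.

```python
def firing_directions(substrates, products, reversible, cell):
    """Return the set of directions in which a reaction may run.

    substrates, products: iterables of (compartment, kegg_id) pairs.
    cell: set of (compartment, kegg_id) pairs currently present.
    """
    directions = set()
    # Forward: every substrate is present in its compartment.
    if all(c in cell for c in substrates):
        directions.add("forward")
    # Reverse: only for reversible reactions, and every product is present.
    if reversible and all(c in cell for c in products):
        directions.add("reverse")
    return directions


substrates = [("cytosol", "C00254"), ("cytosol", "C00006")]
products = [("cytosol", "C01179"), ("cytosol", "C00011"), ("cytosol", "C00005")]

print(firing_directions(substrates, products, False, set(substrates)))
print(firing_directions(substrates, products, False, set(products)))
print(firing_directions(substrates, products, True, set(products)))
```

An empty result means the reaction cannot take place with the current cell contents.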

Simulation of a reaction is completed by adding the Substrates OR Products of the reaction to the correct compartments, depending on the direction: the Products are added if the Substrates are present (a forward reaction), and the Substrates are added if the Products are found in the cell and the reaction is run in reverse. A list of compounds corresponding to the contents of each cell compartment is built up as each reaction is processed in turn. Each reaction may be run at most once. The reactions are processed cyclically until a cycle results in no new reaction executions, or all reactions have been run. At this point the cell simulation is complete and the growth outcome is predicted by examining the cell contents lists for the presence of the essential compounds:

IF ALL essential compounds are present, THEN the cell will continue to grow

IF ANY essential compound is missing, THEN cell growth will be arrested (this is a prediction of an auxotrophic mutant)
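The simulation procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the PROLOG implementation used by the Robot Scientist: reactions are processed in ascending numeric order on each cycle, compounds are (compartment, name) pairs, and the names simulate, predict_growth and cyt are hypothetical. The three-reaction network is invented purely to exercise the loop.

```python
def simulate(reactions, initial):
    """Run each reaction at most once until a full cycle fires nothing new.

    reactions: dict mapping reaction number -> (substrates, products, reversible)
    initial:   iterable of compounds present at the start
    Returns the final cell contents and the order in which reactions ran.
    """
    cell = set(initial)
    pending = dict(reactions)  # reactions that have not yet been run
    order = []
    progress = True
    while progress and pending:
        progress = False
        for num in sorted(pending):  # ASSUMPTION: ascending numeric order
            subs, prods, reversible = pending[num]
            if all(c in cell for c in subs):
                cell.update(prods)   # forward reaction
            elif reversible and all(c in cell for c in prods):
                cell.update(subs)    # reaction run in reverse
            else:
                continue
            del pending[num]
            order.append(num)
            progress = True
    return cell, order


def predict_growth(cell, essential):
    """Growth continues only if every essential compound is present."""
    return set(essential) <= cell


def cyt(name):
    """Helper: a compound located in the cytosol."""
    return ("cytosol", name)


# Invented example network: 1: A -> B, 2: C <=> B, 3: B + C -> D
reactions = {
    1: ([cyt("A")], [cyt("B")], False),
    2: ([cyt("C")], [cyt("B")], True),
    3: ([cyt("B"), cyt("C")], [cyt("D")], False),
}

cell, order = simulate(reactions, [cyt("A")])
print(order, predict_growth(cell, [cyt("D")]))
```

Here reaction 2 runs in reverse (B is present, C is not), which supplies the C needed by reaction 3. Removing reaction 1 from the dictionary leaves nothing able to fire, so the prediction becomes arrested growth.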

The following 4 cases illustrate hypothetical disruptions in a simple graph with 8 metabolites, 6 reactions and 4 ORFs. There are two starting compounds (A and D) and two essential compounds (end points G and H). It includes a common situation where gene products from a single ORF catalyse more than one reaction. In these diagrams the reactions are listed in the order of execution by the simulator, and the changing cell contents list is also shown. The first case is equivalent to the "wild variant", where no ORFs have been removed and there is a path from starts to ends.

In the second case ORF O2 has been removed and reaction 2 no longer occurs. However because compound C can be synthesised in reaction 1 (from compound A), there is still a complete path from starts to ends, and in this case the prediction would be continued healthy growth.

In the third case ORF O3 is removed and reactions 3 and 4 cannot take place. The cell has no mechanism for synthesis of compounds E and F, therefore neither G nor H can be synthesised. Since G and H are essential compounds this case would result in an "arrested growth" prediction, i.e. this knockout mutant would be auxotrophic.

In the fourth case, ORF O3 has been removed, but the growth medium has been augmented by the addition of E and F, in an effort to identify the compounds synthesised by any reactions removed by the knocked out ORF. O3 was responsible for catalysing reactions 3 and 4, by which the cell synthesised E and F. When E and F are added to the growth medium, reactions 5 and 6 are once again functional, allowing the cell to synthesise G and H again, thereby restoring healthy growth. Since E and F are additional nutrients it can be inferred that ORF O3 plays a role in the synthesis of both compounds. Experiments and inference of this nature can allow new reactions or chemical transformations to be discovered, as well as the genes/ORFs responsible for catalysis.
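The four cases can be reproduced with a short Python sketch. The network below is a hypothetical reconstruction: the substrates and products of each reaction, and the assignment of reactions 1, 5 and 6 to ORFs O1 and O4, are assumptions chosen only to be consistent with the description above (8 metabolites, 6 reactions, 4 ORFs, O2 catalysing reaction 2 and O3 catalysing reactions 3 and 4). All reactions are treated as irreversible for simplicity, and the function name predict_growth is illustrative.

```python
# HYPOTHETICAL network: reaction number -> (ORF, substrates, products).
REACTIONS = {
    1: ("O1", {"A"}, {"B", "C"}),
    2: ("O2", {"D"}, {"C"}),
    3: ("O3", {"C"}, {"E"}),
    4: ("O3", {"C"}, {"F"}),
    5: ("O4", {"E"}, {"G"}),
    6: ("O4", {"F"}, {"H"}),
}
MEDIUM = {"A", "D"}      # starting compounds
ESSENTIAL = {"G", "H"}   # compounds required for growth


def predict_growth(knocked_out=(), supplements=()):
    """Return True if all essential compounds can be synthesised."""
    cell = MEDIUM | set(supplements)
    # Reactions whose ORF has been knocked out are no longer available.
    active = {n: r for n, r in REACTIONS.items() if r[0] not in knocked_out}
    changed = True
    while changed:
        changed = False
        for num in sorted(active):
            _, subs, prods = active[num]
            if subs <= cell:
                cell |= prods
                del active[num]  # each reaction runs at most once
                changed = True
    return ESSENTIAL <= cell


print("Case 1, no knockout:        ", predict_growth())                       # True
print("Case 2, O2 removed:         ", predict_growth({"O2"}))                 # False? see note
print("Case 3, O3 removed:         ", predict_growth({"O3"}))                 # False
print("Case 4, O3 removed, +E, +F: ", predict_growth({"O3"}, {"E", "F"}))     # True
```

Note that case 2 prints True, not False: with O2 removed, compound C is still produced by reaction 1, so growth continues, as described in the second case above. Comparing cases 3 and 4 is what supports the inference that O3 is involved in synthesising E and F.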