Modelling of Biochemical Networks

Welcome to the Computational Biology Group Systems Biology Modelling pages. These web pages document the systems biology models created by the group and describe how the models have been used in the group's research, with particular reference to logical models and the Robot Scientist(s).

Logical and Graph based (LG) Modelling

LG models use techniques from Graph theory and First Order Logic (FOL) to represent the "coarse structure" of biochemical networks, where the connections and overall topology of a network are of greater importance than the fine details of how each component behaves. For example, LG models make no attempt to model reaction kinetics, either as fluxes (FBA models) or as molecular concentrations and reaction rates (ODE, PDE models); however, this assumption may be relaxed for models built using probabilistic extensions to FOL or Qualitative Reasoning (QR) representations. This focus on topology allows LG models to simulate biological behaviour on many scales, from single pathway models to whole genome models defining the entire metabolism of a species, and it allows efficient simulation of changes in both the genetic and environmental components. Many online systems biology resources are essentially LG models, e.g. EcoCyc/MetaCyc (for E. coli), SGD and MIPS/CYGD (for S. cerevisiae) and KEGG.

The use of FOL and Graph theory is also analogous to a relational database representation, where knowledge corresponds to a set of defining entities and the relationships between these entities. In the case of the metabolic networks described here, the entities include the various open reading frames (ORFs), enzymes and chemical compounds found in the network. The relationships include the catalysis of reactions by the appropriate enzyme; the coding of the relevant enzymatic protein(s) by the corresponding ORF(s); and the reactions themselves, which describe the chemical transformations found in the metabolic pathway(s) comprising the model. Indeed, the LG models described in these pages take the form of a single (large) graph or hypergraph of interconnected reactions corresponding to the metabolic behaviour of each species or pathway (the reaction network), with each individual reaction "annotated" by its catalysing enzyme(s) and coding ORF(s). The relationship between ORF, Enzyme and Reaction knowledge is complex ("many to many"), i.e. 1 ORF codes for 1 or more Enzymes, each of which may be coded for by many other ORFs; and 1 Reaction is catalysed by 1 or more Enzymes, each of which may catalyse many other Reactions.
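
The many-to-many relationships described above can be illustrated with a small Python sketch. This is not the group's representation (the models themselves are logical models used with PROLOG); the identifiers `orf_1`, `enz_A`, `rxn_1` and the helper names are hypothetical and chosen only to show how two relation tables can be indexed in both directions.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only (not taken from the models).
ORF_CODES_ENZYME = [
    ("orf_1", "enz_A"),
    ("orf_2", "enz_A"),  # two ORFs code for the same enzyme
    ("orf_2", "enz_B"),  # one ORF codes for two enzymes
]
ENZYME_CATALYSES = [
    ("enz_A", "rxn_1"),
    ("enz_B", "rxn_1"),  # two enzymes catalyse the same reaction
    ("enz_B", "rxn_2"),  # one enzyme catalyses two reactions
]


def index(pairs):
    """Build forward and backward lookups for a many-to-many relation."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for left, right in pairs:
        forward[left].add(right)
        backward[right].add(left)
    return forward, backward


enzymes_of_orf, orfs_of_enzyme = index(ORF_CODES_ENZYME)
reactions_of_enzyme, enzymes_of_reaction = index(ENZYME_CATALYSES)


def reactions_annotated_by(orf):
    """Return the reactions linked to an ORF through any enzyme it codes for."""
    return {
        reaction
        for enzyme in enzymes_of_orf.get(orf, ())
        for reaction in reactions_of_enzyme.get(enzyme, ())
    }
```

For instance, `reactions_annotated_by("orf_2")` follows both enzymes coded by that ORF and collects every reaction they catalyse, which is the kind of join a relational database would perform over the two tables.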

This relational knowledge representation allows the simulation of possible metabolic behaviours to be performed by deductive inference, e.g. the altered reaction network produced by changes to the genetic make up of a strain can be derived from the resultant changes to the ORF/enzyme/reaction relationships of the model. A "model engine" built using the PROLOG inference engine is used to simulate metabolic behaviour, where each simulation is also a prediction of a growth phenotype given the genetic make up of a strain and the environmental conditions, i.e. the growth medium and any additional nutrients added as experimental variables. The growth phenotype is determined by reference to a set of chemical compounds deemed to be essential for healthy continued growth of the strain. The model engine determines whether there is still sufficient connectivity in the altered reaction network to allow traversal from the growth medium compounds to each essential compound. Initial model creation for sce:aber concentrated on correctly defining the wild type. To ensure the necessary connectivity, a set of "ubiquitous" compounds was compiled: these compounds are assumed to always be present in the cell, from initial budding onwards.
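
The connectivity check described above can be sketched in Python as a simple forward traversal. This is an illustration under stated assumptions, not the PROLOG model engine: it assumes that a reaction is lost only when every ORF annotating it is removed, and that a reaction can proceed once all of its substrates are reachable. The reaction data, compound names and function names are hypothetical.

```python
# Each reaction: (substrates, products, ORFs able to supply a catalysing enzyme).
# All names here are hypothetical and not taken from sce:AAA or sce:aber.
REACTIONS = {
    "rxn_1": ({"glucose"}, {"cmpd_X"}, {"orf_1"}),
    "rxn_2": ({"cmpd_X", "atp"}, {"cmpd_Y"}, {"orf_2", "orf_3"}),
    "rxn_3": ({"cmpd_Y"}, {"essential_1"}, {"orf_4"}),
}


def reachable_compounds(start, reactions, knocked_out=frozenset()):
    """Expand the set of reachable compounds until no reaction adds anything."""
    reached = set(start)
    changed = True
    while changed:
        changed = False
        for substrates, products, orfs in reactions.values():
            # Assumption: the reaction is removed only if all its ORFs are deleted.
            if not (orfs - knocked_out):
                continue
            # Assumption: every substrate must already be reachable.
            if substrates <= reached and not products <= reached:
                reached |= products
                changed = True
    return reached


def predicts_growth(medium, ubiquitous, essential, reactions, knocked_out=frozenset()):
    """Predict growth if every essential compound is reachable from the medium."""
    start = set(medium) | set(ubiquitous)
    reached = reachable_compounds(start, reactions, set(knocked_out))
    return set(essential) <= reached
```

In this toy network, deleting `orf_2` alone leaves `rxn_2` available through `orf_3`, so growth is still predicted, whereas deleting `orf_1` disconnects the growth medium from the essential compound.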

Logical Models Created and Used by the Computational Biology Group

The following models have been created by the Computational Biology Group and have been used as background knowledge for scientific discovery by the Robot Scientists developed by the group. Each page describes the components of a model, its version history and how the model has been used; however, the sce:AAA model has little development history because it was a prototype model used to initiate the logical modelling concept, as well as a test theory for the proof-of-concept Robot Scientist work. The knowledge contained within this model has been largely superseded by the sce:aber model, which expands the concept to include most of the current knowledge of yeast metabolism. The nomenclature for naming the models uses the KEGG 3-letter abbreviation for the species name (sce = Saccharomyces cerevisiae), followed by the "informal" name of the model: AAA = Aromatic Amino Acid biosynthesis; aber = Aberystwyth, representing the academic institution where the model was created.

Logical Models created by the Computational Biology Group
Species	Model Description
Saccharomyces cerevisiae	sce:AAA: Model of the Aromatic Amino Acid Biosynthesis Pathway of Saccharomyces cerevisiae (bakers' yeast)
Saccharomyces cerevisiae	sce:aber: Whole Genome Logical Model of Saccharomyces cerevisiae Metabolic Network

Versioning of the Logical Models

The above pages include the version history and evolution of each model, as well as a description of the components and usage of the model.

The versions all have the A.B.C format:

A will be incremented if a new version involves changes to the ORF annotation and reaction components, i.e. this is a major change to the biological knowledge.

B will be incremented if there is a change to either the hypothesis generation method or to the supporting information used by the hypothesis generation components (e.g. the genomes listing in KEGG or the enzyme listing of KEGG).

C will be incremented if there are any other changes, e.g. use of a different growth medium or a change to the simulation engine.
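
The A.B.C rules above can be sketched as a small Python helper. The change labels (`"knowledge"`, `"hypothesis"`, `"other"`) and the function name are hypothetical. The text does not say whether the lower fields reset when a higher field is incremented, so this sketch assumes they are left unchanged.

```python
def bump_version(version, change):
    """Increment one field of an A.B.C version string.

    change: "knowledge"  -> A (ORF annotation and reaction components)
            "hypothesis" -> B (hypothesis generation method or supporting data)
            "other"      -> C (e.g. growth medium or simulation engine)
    Assumption: the other fields are left unchanged, not reset.
    """
    a, b, c = (int(part) for part in version.split("."))
    if change == "knowledge":
        a += 1
    elif change == "hypothesis":
        b += 1
    elif change == "other":
        c += 1
    else:
        raise ValueError(f"unknown change type: {change!r}")
    return f"{a}.{b}.{c}"
```

For example, a change to the growth medium applied to version `1.2.3` would give `bump_version("1.2.3", "other")`, which increments only the final field.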