Aberystwyth Computer Science: Computational Biology: Protein Function

University of Wales, Aberystwyth
Computational Biology Group.
Department of Computer Science, Aberystwyth SY23 3DB, Wales, UK.


Glossary of terms used in rules

This is a glossary of the terms used in the rules generated for "Predicting Protein Function from Sequence using Machine Learning".

hom(A) refers to a homologous protein found by PSI-BLAST.
keyword(A, Word) refers to a SwissProt keyword found in A.
classification(A, Class) refers to the phylogenetic classification of the organism from which A comes, taken from SwissProt.
species(A, Species) refers to the species of A, taken from SwissProt.
mol_wt_rule(A, Weight) refers to the molecular weight of A, discretised into five levels: 1 very low, 2 low, 3 medium, 4 high, and 5 very high.
amino_acid_ratio_rule(Residue, Weight) refers to the percentage composition of the residue in the sequence.
e_val_rule(A, Weight) refers to the PSI-BLAST sequence similarity measure (note that a low value means a high sequence similarity).
e_val_gt, e_val_lteq refer to the PSI-BLAST sequence similarity measure being greater than, or less than or equal to, a certain value.
mol_wt_lteq(A, Weight), mol_wt_gt(A, Weight) refer to the molecular weight of A being less than or equal to, or greater than, some value.
amino_acid_pairs_wg (and similar terms) refers to the number of pairs of these two amino acids, in this case tryptophan (w) and glycine (g).
amino_acid_pair_ratio_qh (and similar terms) refers to the ratio of one amino acid to another in the ORF, in this case the ratio of glutamine (q) to histidine (h). This ratio is expressed out of a thousand rather than as a percentage, so, for example, 2.8 means 0.28%.
amino_acid_ratio_g (and similar terms) refers to the percentage composition of the residue in the sequence of the ORF, in this case the percentage of glycine.
psi_iter_gt, psi_iter_lteq refer to the number of iterations of the PSI-BLAST search being greater than, or less than or equal to, some number.
ecoli_theo_pI refers to the ORF's theoretical pI (isoelectric point).
ss(SS, X) *The ORF has a secondary structure prediction of type X (alpha helix, beta strand, or coil) at position SS.
nss(SS1, SS2, X) *The ORF has secondary structure predictions of type X (alpha helix, beta strand, or coil) at positions SS1 and SS2.
ss_alpha(SS, gt, B) *The ORF has an alpha helix secondary structure prediction at position SS with a length greater than B residues.
ss_beta(SS, gt, B) *The ORF has a beta strand secondary structure prediction at position SS with a length greater than B residues.
ss_coil(SS, gt, B) *The ORF has a coil secondary structure prediction at position SS with a length greater than B residues.
nss_alpha(SS1, SS2, gt, B) *The ORF's SS1th and SS2th alpha helix predictions (where SS2 = SS1 + 2) both have a length greater than B residues (similarly lteq instead of gt).
nss_beta(SS1, SS2, gt, B) *The ORF's SS1th and SS2th beta strand predictions (where SS2 = SS1 + 2) both have a length greater than B residues (similarly lteq instead of gt).
nss_coil(SS1, SS2, gt, B) *The ORF's SS1th and SS2th coil predictions (where SS2 = SS1 + 2) both have a length greater than B residues (similarly lteq instead of gt).
ecoli_aliphatic_index refers to the ORF's aliphatic index.
ecoli_atomic_comp_s refers to the ORF's atomic composition of sulphur (or of carbon, nitrogen, hydrogen, or oxygen if _s is replaced by _c, _n, _h, or _o respectively).
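Several of the composition attributes in the glossary can be illustrated with a short Python sketch. It rests on stated assumptions and is not the software that generated the rules. The percentage composition follows the amino_acid_ratio definition. The per-thousand conversion follows the note on amino_acid_pair_ratio (2.8 means 0.28%). The aliphatic index uses the standard Ikai (1980) formula, X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent. The page does not say how ecoli_aliphatic_index was computed, so that formula is an assumption. The function names are hypothetical. The sketch also counts every character towards the sequence length, including any x residues from low-complexity filtering.

```python
from collections import Counter


def percent_composition(sequence: str, residue: str) -> float:
    """Percentage of `residue` in `sequence`, as in amino_acid_ratio_g."""
    seq = sequence.lower()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Assumption: every character counts towards the length, including 'x'.
    return 100.0 * seq.count(residue.lower()) / len(seq)


def per_mille_to_percent(value: float) -> float:
    """Convert a ratio out of a thousand (amino_acid_pair_ratio_*) to a percentage."""
    return value / 10.0


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index via the Ikai (1980) formula (assumed, not confirmed by the page).

    AI = X(a) + 2.9 * X(v) + 3.9 * (X(i) + X(l)),
    where X(r) is the mole percent of residue r in the sequence.
    """
    seq = sequence.lower()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)  # missing residues count as 0

    def mole_percent(r: str) -> float:
        return 100.0 * counts[r] / len(seq)

    return (mole_percent("a") + 2.9 * mole_percent("v")
            + 3.9 * (mole_percent("i") + mole_percent("l")))


if __name__ == "__main__":
    orf = "MGGAWLIVKA"  # made-up sequence, for illustration only
    print(percent_composition(orf, "g"))
    print(per_mille_to_percent(2.8))
    print(aliphatic_index(orf))
```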

(*) Note about secondary structure attributes:

Positions refer to the order of the elements in the predicted secondary structure, counted separately for each element type. For example, if an ORF has the following predicted secondary structure:

aaaabbbbbbaaacccccccbbbaaaaa

it would translate into

the 1st alpha helix secondary structure prediction is of length 4.
the 1st beta strand secondary structure prediction is of length 6.
the 2nd alpha helix secondary structure prediction is of length 3.
the 1st coil secondary structure prediction is of length 7.
the 2nd beta strand secondary structure prediction is of length 3.
the 3rd alpha helix secondary structure prediction is of length 5.

(where length is the number of residues)
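The numbering in the example can be reproduced with a short Python sketch. It assumes the one-letter codes a, b and c used in the example string. Following the example, it numbers each element type separately. The helpers ss_gt and nss_gt are a hypothetical reading of the ss_alpha/ss_beta/ss_coil and nss_* attributes in which "both" elements must exceed B for nss_*. None of these names come from the original rule-generation software.

```python
from itertools import groupby

# One-letter codes used in the example string.
NAMES = {"a": "alpha helix", "b": "beta strand", "c": "coil"}


def ordinal(n: int) -> str:
    """Return 1st, 2nd, 3rd, 4th, ... for a positive integer n."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def number_elements(prediction: str) -> list[tuple[str, int, int]]:
    """Split a per-residue prediction into runs of a single type and number
    each run within its own type. Returns (type, position, length) tuples."""
    counters: dict[str, int] = {}
    elements = []
    for kind, run in groupby(prediction):
        counters[kind] = counters.get(kind, 0) + 1
        elements.append((kind, counters[kind], sum(1 for _ in run)))
    return elements


def element_length(elements, kind, position):
    """Length of the position-th element of the given type, or None if absent."""
    for k, pos, length in elements:
        if k == kind and pos == position:
            return length
    return None


def ss_gt(elements, kind, position, b):
    """Hypothetical reading of ss_alpha/ss_beta/ss_coil(SS, gt, B)."""
    length = element_length(elements, kind, position)
    return length is not None and length > b


def nss_gt(elements, kind, ss1, ss2, b):
    """Hypothetical reading of nss_alpha/nss_beta/nss_coil(SS1, SS2, gt, B):
    both elements must be longer than b residues."""
    return ss_gt(elements, kind, ss1, b) and ss_gt(elements, kind, ss2, b)


if __name__ == "__main__":
    for kind, pos, length in number_elements("aaaabbbbbbaaacccccccbbbaaaaa"):
        print(f"the {ordinal(pos)} {NAMES[kind]} secondary structure "
              f"prediction is of length {length}.")
```

Run as a script, the loop prints one line per element in the same form as the list above.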

Amino acids

Alanine: a
Arginine: r
Asparagine: n
Aspartic acid: d
Cysteine: c
Glutamine: q
Glutamic acid: e
Glycine: g
Histidine: h
Isoleucine: i
Leucine: l
Lysine: k
Methionine: m
Phenylalanine: f
Proline: p
Serine: s
Threonine: t
Tryptophan: w
Tyrosine: y
Valine: v
Aspartic acid/Asparagine: b
Glutamine/Glutamic acid: z
Residue masked by a low-complexity filter: x

For enquiries, contact Dr. Ross King.
Updated: 14 March 2000