
PROTEIN IDENTIFICATION BY PEPTIDE MASS FINGERPRINTING TUTORIAL


Dr James R. Jefferies, Parasitology Group, Institute of Biological Sciences, University of Wales at Aberystwyth, Aberystwyth, Ceredigion, SY23 3DA, Wales, UK.

 

We wish to thank the BBSRC who funded much of this work.


Contents.

  1. Introduction
    1. Why PMF.
    2. Peptide mass fingerprint.
    3. How does PMF work.
  2. Sample Preparation
    1. Before we start.
    2. Spot removal.
    3. Cysteine residue modification.
    4. Protein digestion and peptide extraction.
  3. Mass Spectroscopy
  4. Database Searching
    1. Masses.
    2. Protease used.
    3. Databases to search.
    4. Mass and pI.
    5. Cysteine modification.
    6. Other modifications.
    7. Peptide hits.
    8. Mass tolerance.
    9. Searching.
  5. Results
  6. Post-Translational Modifications


(1) Introduction

The aim of proteomics is to separate and identify individual proteins of interest to you. In the first tutorial we discussed 2-dimensional electrophoresis as the method of separation. In this tutorial we will look at how we get from the separated protein spots on the gel to their eventual identification, using peptide mass fingerprinting (PMF). For a general overview of proteomics, its use and applications, there is a review by Blackstock and Weir.

(a) Why PMF

For the identification of any protein, you need to obtain information that is unique to that particular molecule or its family. You can look at the pI or molecular weight, but even if this is done accurately it probably won’t give you a definitive answer. Besides, if you’ve looked at 2D gels before you’ll know that pI is a poor predictor for identification purposes.

The most obvious answer is to sequence the protein, which will give you an excellent chance of identification. However, obtaining sequence can be time consuming, and if you have a lot of samples to analyse you want something a lot quicker. This is where peptide mass fingerprinting (PMF) comes in: it’s quick and, used under the correct circumstances, can give excellent results.

(b) Peptide mass fingerprinting

As the name describes, the PMF of a protein can be likened to a fingerprint: like your fingerprint, it is unique to that molecule. With this fingerprint, however, the information comes from the unique collection of peptides produced when the molecule is digested. PMF is a method of identifying proteins that relies on the use of a protease to digest the protein in question into a number of smaller peptides. Not all proteases are suitable and the most commonly used one is trypsin. Trypsin, that is unmodified trypsin, cuts directly downstream of the two basic amino acids lysine (K) and arginine (R), which are fairly common residues. As different proteases cleave at different amino acid residues, the PMF of the protein will depend on the protease used, but will always be the same for a given protease. This means that as long as the digestion is complete, that is the molecule is cleaved at all the possible sites, it will produce a set of peptides, of varying masses, that are characteristic of that protein. The mass of each peptide will be the sum of the masses of the amino acids present, including any modifications that those amino acids might have undergone.
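To make this concrete, here is a minimal Python sketch of a complete tryptic digest and the peptide mass calculation. It is an illustration only, not the method of any particular search engine. The sequence is made up and the function names are hypothetical. It follows the simple rule described above (cut after every K or R); many tools also skip the cut when the next residue is proline, and that refinement is left out here. The residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da (rounded to 5 decimal places).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide accounts for the free N- and C-termini
PROTON = 1.00728   # added to give the singly charged [M+H]+ ion


def tryptic_digest(sequence):
    """Complete digestion: cut after every K or R (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


# Hypothetical sequence, for illustration only.
protein = "MKWVTFISLLLLFSSAYSRGVFRR"
for pep in tryptic_digest(protein):
    print(pep, round(peptide_mh(pep), 4))
```

The list of masses printed at the end is the theoretical fingerprint of that sequence for trypsin; a different protease rule would give a different list.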

(c) How does PMF work

The use of the fingerprint to identify proteins is not always possible, as it relies on the ability to search data already present in various databases. So it is important that the organism that you are working on should be genome verified. That is, its genome should have been sequenced. This is very important, although we have worked on material from non-verified organisms where cDNA data was available instead. If neither of these is the case then it is worth checking if there are any expressed sequence tags (ESTs) that can be used. These are fragments of sequence obtained by reverse transcription of messenger RNA and as such only represent those genes being expressed when the RNA was extracted. These cDNA fragments are generally incomplete and so are not ideal for this purpose; however, they can give you results. If there is no reference material available in the various databases then PMF is a fairly pointless exercise, and unless your protein is very highly conserved with respect to those from other species that are in the database, you will have little luck. So the take home message here is: no sequence in the database, no identification. How do you search for these PMFs in the database? Well, that is something we’ll get back to when we get to database searching.

Take a look at the comprehensive proteome analysis database from EBI, which has a huge amount of information on genome verified organisms.

For a comprehensive look at ESTs you can do no better than the NCBI expressed sequence tag database.

However, if you’re interested in parasitic nematodes the Blaxter lab has plenty of EST material available.




(2) Sample Preparation

(a) Before we start

For all of this work we are going to be using high grade chemicals etc. We most definitely will not be pillaging the old chemical cabinet for reagents. Mass specs are so sensitive nowadays that they’ll find any contaminants, but worse still, contaminants can cause all sorts of other problems that are much more profound. As a consequence we have a collection of high quality chemicals for MS work only. It is also wise to use good plastics, as chemicals can leach out of these and cause problems. One last thing: always wear gloves when you are working with this material, as you don’t want any contamination. When you open and close microfuge tubes, your thumb or fingers can easily rub the neck of the tube.

(b) Spot removal

It’s important, before you do anything else, to make sure that the gel that you are taking the spots from has been thoroughly washed and all solvents etc removed. MALDI-TOFs are generally more tolerant of salts and other contaminants, but there is no point in testing this fact.

Removing spots from a gel can be carried out in a number of ways. We have a robot to remove spots from our gels, which is very nice, although not necessary if only a small number are required. I’m only going to talk about manual extraction and preparation of spots here, so nothing about robots. The number 1 mistake during spot excision is mislabelling of spots as you cut them out of the gel; if you do that, then no matter how well you do the rest, your results will be incorrect. I use good quality microcentrifuge tubes which I mark with an indelible marker. This whole process can be a tedious job, but don’t fall asleep or before you know it you’ll be cutting out spot 13 from the gel and placing it into tube 14 and all that hard work will be for nothing. To cut the spots out we place the gel on a hard, very clean surface (I usually use an unused overhead acetate) and use a scalpel blade with a fine point. There are a lot of different methods and devices for doing this; we also tried using a cut pipette tip but found that the gel often got stuck in the tip. If any companies out there have a better method, then send me a sample and I’ll review it here.

(c) Cysteine residue modification

So now you have isolated your proteins and they are all sitting in labelled tubes. The next thing to do is to digest the proteins and extract their peptides from the gel. Because proteins are large molecules they do not diffuse readily from SDS-PAGE gels. Digesting your protein whilst it is still inside the gel makes it far easier to extract, as the small peptides diffuse much more readily through the gel matrix. You can, if you wish, extract your protein whole, although the yield will not be so high, and there is a method available for whole protein extraction.

There are a number of methods used for the digestion and extraction of peptides and they can be split roughly into two parts, depending on the way in which you prepared your proteins during the two-dimensional electrophoresis steps. The first step is only required if you wish to prepare your cysteine residues.

When we run our 2D gels we always carry out a 2 step equilibration of the IPG strips (see the 2-DE tutorial). The first step in the equilibration is to reduce all of the cysteines present in the protein; this breaks any disulphide bridges that may have formed between cysteine residues. This is a reversible process and re-oxidation may occur if reducing conditions are lost. For this reason, once we have carried out the reduction step we then treat the strips with iodoacetamide. A non-reversible reaction then occurs that places a functional group on the sulphydryl (thiol) group of each cysteine, blocking it and ensuring that no further reactions can occur. For more on this take a look at the Ionsource webpage. Their example is for iodoacetic acid, but the principle is the same for iodoacetamide, although the final blocking group will be different.

The reason that this is so important is that the cysteines must be of uniform and known molecular mass. Remember that the sample is being prepared for MS, which relies on accurate mass determination. If you don’t know the exact mass of the cysteines in your peptide, then you cannot accurately predict the mass of peptides that contain a cysteine residue.

If your proteins were modified by iodoacetamide en masse during the equilibration step of the 2D procedure your job will be much easier than having to carry out this reaction on every protein individually. This modification of cysteines will also be discussed later when we get to PMF analysis.

A note of warning here though. Having just read a number of articles (Herbert B. et al. (2001) Electrophoresis 22, 2046-2057; Galvani M. et al. (2001) Electrophoresis 22, 2058-2065.) it seems that the method of alkylation that we use may be insufficient to prepare all of the cysteine residues. We are not aware of any problems ourselves; however, this is something that we shall be looking into in the near future. If you want to know more check out the 2D tutorial.

(d) Protein digestion and peptide extraction

There are a number of protocols for digestion and extraction of peptides available from the University of Washington and also Vrije University in Amsterdam. There is also a very good site from the University of Arizona that covers extraction from both silver and Coomassie stained gels.

As the technique is covered sufficiently by the sites above I’ll just take a quick overview.

First of all the gel piece has to be dried to remove any water, buffer etc that is present. This is started by using a solvent, whose osmotic effect dehydrates the gel, and finished off by complete dehydration in a SpeedVac.

The next step is to add the protease, which in this case we’ll take as being trypsin. The reason for drying the gel is that in this step you want to ensure that the trypsin is able to diffuse into the gel so that it can digest the protein. As I mentioned earlier it’s very difficult for proteins to diffuse out of the gel, and the same applies to proteins trying to get in. By rehydrating your gel pieces in the presence of trypsin, though, you get a much better chance of it diffusing well into the gel as the protease solution is absorbed.

This rehydration should be carried out at about 4 °C to ensure that the protease is not active until it has had time to reach its protein target.

The buffer recommended for making up the tryptic solution is ammonium hydrogen carbonate. The reason for using this buffer system is that it leaves no residue; instead you get ammonia and carbon dioxide produced on drying of the buffer.

The amount of trypsin you use will depend on the amount of protein you think is present. Don’t be tempted to add too much though; I generally use 20 ng/µl for Coomassie stained gels.

The addition of too much protease can be detrimental to the results; remember that this is adding more protein to your sample. Proteases don’t just digest other proteins, they also digest themselves, producing fragments known as autolysis products. These can be useful, as we will see later; however, too many can be detrimental. When you come to look for your peptides with the mass spec you might find your sample has been swamped by the signal produced by these autolysis products.

We normally carry out digestion in the afternoon so that just before we go home we place the samples in the incubator. This means that they have plenty of time to complete their work. Leaving the protease to digest for too long can also be a mistake: once it has finished digesting your sample protein it will digest itself, and once more you may get a lot of autolysis products. Overnight is plenty of time for the digestion to run to completion.

The rest of the steps are fairly straightforward and are explained on the other webpages.




(3) Mass Spectroscopy

MALDI stands for matrix assisted laser desorption ionisation. It all sounds like a bit of a mouthful but is fairly straightforward in reality. The sample that you wish to analyse is spotted onto a target plate and allowed to dry. The plate is then placed into the machine, but to get the sample to enter the analyser it needs to be removed from this plate. With MALDI-TOF the sample needs to be ionised and vaporised in some way, and in this case a laser is used to provide the energy. Unfortunately firing a laser at the sample alone does not normally lead to adequate results, so a little help is required. The help comes in the form of the matrix. This is simply a chemical that absorbs light, and so energy, at the wavelength of the laser used. In this case we’re using a nitrogen UV laser (337 nm) and our matrix is Alpha-matrix (alpha-cyano-4-hydroxycinnamic acid). The sample is mixed with matrix and placed on a metal target plate. Now when the laser is fired the matrix absorbs the energy from it, heats and ionises rapidly. The sample, which is mixed in with the matrix, does the same, so that when the laser hits, the matrix and sample are ablated from the plate into a small ionised aerosol that can enter the mass spec. So a matrix assisted laser blast causes desorption and then ionisation of the sample, and that gives us MALDI.

The TOF part of it comes from Time-Of-Flight. Once the sample has been ionised it is accelerated through an electric field. The inside of the spec is a vacuum and the ions are able to pass along a straight length of analyser where their time of flight is measured. Ions with a smaller mass/charge ratio (m/z) move faster than those with a larger one, so the time of flight gives the m/z of each fragment. In MALDI the peptide fragments generally only have a single charge, so the m/z normally represents the mass of the fragment.
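The relationship between m/z and flight time can be sketched in a few lines of Python. This is an idealised linear TOF only: it assumes all of the energy from the accelerating field ends up as kinetic energy, and it ignores the acceleration region, initial velocity spread and any reflectron. The 20 kV voltage and 1 m flight tube are assumed values for illustration, not the settings of any real instrument.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per Da


def flight_time(mz, voltage=20000.0, tube_length=1.0):
    """Drift time in seconds for an ion of the given m/z.

    Energy balance: z*e*U = (1/2)*m*v^2, so v = sqrt(2*e*U / (m/z)),
    and the drift time is tube_length / v.
    """
    velocity = math.sqrt(2.0 * ELEMENTARY_CHARGE * voltage / (mz * DALTON))
    return tube_length / velocity


for mz in (1000.0, 2000.0, 4000.0):
    print(f"m/z {mz:.0f}: {flight_time(mz) * 1e6:.2f} microseconds")
```

Because the time scales with the square root of m/z, a fourfold increase in mass only doubles the flight time, which is one reason good calibration matters.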

If you want to know more about how a MALDI-TOF works there is excellent information at the "Association of Biomolecular Resource Facilities", ABRF.

I’m not going to delve too far into how the MALDI-TOF works or how it is set up and run, as this will obviously vary with the type of instrument that is available. I will also not cover any initial manipulation of data as this will depend on the software; instead I’ll move on to analysis of results.

The first thing to make sure of once we have run our sample is that the spectrum is calibrated properly. If there is even a small error here you may have problems when you come to the database searches. Generally you have two ways of calibrating the spectra. Samples for MALDI-TOF are spotted onto a target plate for analysis, so you can either place the calibrant in your sample or you can place it on the plate by itself and use it as a lock mass to calibrate the samples around it. We generally add calibrants to each sample to ensure that the results are as accurate as possible. There is also another possibility that we take advantage of, and that is autolysis products or contaminant peaks. We find that our Coomassie stain often gives us nice peaks and we also get autolysis products from porcine trypsin, and these can be used as calibrants.
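As a rough illustration of internal calibration, the sketch below applies a two-point linear correction to a peak list using two peaks of known mass. This is a simplification: instrument software normally calibrates on the flight time itself rather than on the reported mass, so treat it as a picture of the idea and not as a replacement for your vendor's routine. The function name is hypothetical, the observed values are invented, and the two reference masses are values commonly quoted for porcine trypsin autolysis peptides; check them against your own enzyme before relying on them.

```python
def two_point_calibration(observed, reference):
    """Return a function mapping measured masses onto the reference scale.

    observed  -- the two calibrant peaks as measured in this spectrum
    reference -- the known masses of those same two calibrants
    """
    (obs_low, obs_high), (ref_low, ref_high) = observed, reference
    slope = (ref_high - ref_low) / (obs_high - obs_low)
    intercept = ref_low - slope * obs_low
    return lambda mass: slope * mass + intercept


# Commonly quoted [M+H]+ values for porcine trypsin autolysis peptides
# (an assumption here; confirm for your own trypsin).
REFERENCE = (842.5094, 2211.1040)

# Invented measurements of those two peaks in an uncalibrated spectrum.
measured_calibrants = (842.60, 2211.30)

calibrate = two_point_calibration(measured_calibrants, REFERENCE)

raw_peaks = [842.60, 1250.70, 1801.95, 2211.30]  # invented peak list
print([round(calibrate(m), 4) for m in raw_peaks])
```

With calibrants at both ends of the mass range, every peak in between is corrected by interpolation, which is the attraction of having autolysis peaks present in each sample.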




(4) Database Searching

So now I’m assuming you or your computer has looked over the spectrum and made a list of the peaks that you think represent peptides. Now you’re ready to interrogate the various databases to identify your protein. The first thing to do is to decide which search engine you will use. My favourite is MS-Fit; however, there are also Mascot, Peptident and Profound, to name a few. There is a comprehensive list of search engines at the LITBIO. You can even use a unified interface, CombSearch, to search with a number of engines at once.

Once you have chosen the one that suits you, there are a few parameter settings that they all have in common.

(a) Masses

Obviously they all have a window where you put your list of monoisotopic masses. The format for pasting the data into these windows can vary: MS-Fit likes the masses as a vertical column whilst others will take them in any form. Don’t be scared if you have a large number of masses; it’s better to add contaminant masses than to miss out peptide masses.

(b) Protease used

The next thing they all ask you is what enzyme you used, which is fairly straightforward. You will also probably be asked to give a maximum number for missed cleavages. Missed cleavages occur when the protein is not digested to completion. I normally leave this at 1; however, if for some reason you think that your digest wasn’t complete you can increase this. Remember that if you do so you may decrease the specificity of subsequent searches and they will take longer to perform.
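The effect of the missed cleavage setting can be seen with a small sketch. Given the peptides from a complete digest, in the order they occur in the protein, each allowed missed cleavage adds the peptides formed by joining neighbours. The function name and the example peptides are hypothetical.

```python
def with_missed_cleavages(peptides, max_missed=1):
    """Expand a complete digest to include up to max_missed missed cleavages.

    peptides must be in sequence order, as a complete digest would give them.
    """
    results = []
    last = len(peptides) - 1
    for i in range(len(peptides)):
        # Join peptide i with up to max_missed of its downstream neighbours.
        for j in range(i, min(i + max_missed, last) + 1):
            results.append("".join(peptides[i:j + 1]))
    return results


complete_digest = ["MK", "WVTFISLLLLFSSAYSR", "GVFR", "R"]  # made-up example

for allowed in (0, 1, 2):
    expanded = with_missed_cleavages(complete_digest, allowed)
    print(allowed, "missed cleavages:", len(expanded), "theoretical peptides")
```

The list of theoretical peptides grows with every extra missed cleavage you allow, which is why the search becomes slower and less specific.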

(c) Databases to search

Some search engines will ask you for the species that the sample came from and often they give a list of the various organisms that have substantial database entries. Obviously if what you are working on isn’t there you will have to do a general search and hope that enough of it is there for you to identify your protein. Try to limit the searches as much as possible; even if it just means excluding the Homo sapiens database, it will save you time and computing power. You may also be asked what database you wish to interrogate. We always use the NCBInr database and find this doesn’t miss much.

(d) Mass and pI

You will be asked for the mass and pI of your protein. If you try to be too accurate here and your initial mass calculation was slightly out you might exclude your protein from the search parameters, so be careful. The same can be said for pI, only in this case there is even more room for error. Don’t forget that proteins can be modified in ways that change not only their mass but also their theoretical pI. I generally leave the pI range at 3-10 and set the mass window 10 kDa either side of the estimated value.

(e) Cysteine modification

As we mentioned earlier, any modifications to the cysteine residues will have a profound effect on the mass of cysteine bearing peptides. In this window you will be asked for any modifications. You must specify what modifications, if any, have been made.
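A small sketch shows why the choice matters. The two mass shifts below are the standard monoisotopic values for carbamidomethylation (the product of iodoacetamide) and carboxymethylation (the product of iodoacetic acid). The function name, the peptide and its unmodified mass are hypothetical.

```python
# Monoisotopic mass added to each cysteine, in Da.
CYS_MODIFICATIONS = {
    "carbamidomethyl": 57.02146,  # alkylation with iodoacetamide
    "carboxymethyl": 58.00548,    # alkylation with iodoacetic acid
}


def modified_mass(unmodified_mass, peptide, modification):
    """Add the chosen modification once for every cysteine in the peptide."""
    shift = CYS_MODIFICATIONS[modification]
    return unmodified_mass + peptide.count("C") * shift


peptide = "ACDCK"          # made-up peptide with two cysteines
unmodified = 1000.0        # illustrative mass only, not calculated from the sequence

for name in CYS_MODIFICATIONS:
    print(name, round(modified_mass(unmodified, peptide, name), 4))
```

The two modifications differ by roughly 1 Da per cysteine, which is far outside a tolerance of tens of ppm, so choosing the wrong one means cysteine bearing peptides will simply not match.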

(f) Other modifications

Some search engines also give you the chance to search for other modified residues such as oxidised methionine and phosphorylation of tyrosine etc.

(g) Peptide hits

Don’t expect that all of the masses that you put into the masses window will be fragments from your protein; some of them will be contaminants. You have to set the number of peptides that the engine has to find before it will record a hit. Remember that your protein may be cut into a large number of fragments. The default setting on most of the search engines is 4 and this is a good starting point. This means that if the search engine finds only 3 theoretical masses that match your experimental masses it will discard that hit; if, however, it finds 4 it will record it as a hit. The hits that are returned will generally be listed in order of significance, as calculated by that search engine, so you won’t have to go delving through reams of hits.

If you are searching for small proteins, say 7 kDa, then you may find that you have to lower this number to 3 to obtain hits. Generally though, the larger the number of matching peptides, the more statistically probable your hit will be. It isn’t just the number of matches that is important though; larger fragments are statistically more significant. When you use MS-Fit it will give you a MOWSE score which uses these parameters to measure probability.

(h) Mass tolerance

This setting decides how stringent your search will be: the lower the figure, the more stringent the search. The tolerance can be expressed as %, ppm, Da etc. We use ppm (parts per million) and normally set it at about 50, though this will depend on the accuracy of the mass spec that you use and how well you calibrate your samples. As an example, searching for a peptide fragment of 1000.00 Da with the tolerance set at 100 ppm means you will search between 999.90 Da and 1000.10 Da. Apparently mass specs have a mass-dependent error associated with mass measurement, so using relative values like ppm is better than using absolute values such as Daltons. Getting this setting right is important and we will come back to it after we have had a look at what the output of one of these searches gives us.
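The arithmetic behind a ppm tolerance is simple enough to sketch. The two helper functions below are hypothetical names, not part of any search engine.

```python
def ppm_window(mass, tolerance_ppm):
    """Lower and upper mass limits for a given tolerance in parts per million."""
    delta = mass * tolerance_ppm / 1e6
    return mass - delta, mass + delta


def within_tolerance(observed, theoretical, tolerance_ppm):
    """True if the observed mass lies inside the window around the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * tolerance_ppm / 1e6


for mass in (1000.0, 2000.0, 3000.0):
    low, high = ppm_window(mass, 100)
    print(f"{mass:.2f} Da at 100 ppm: {low:.2f} to {high:.2f} Da")
```

Note that the window in Da widens as the mass goes up, which is what makes a relative tolerance a sensible match for a mass-dependent error.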

(i) Searching

So now you’ve pressed start and the search engine has gone off to find a match for your PMF. What it is doing now is going through the databases you have selected, theoretically digesting every protein within the search parameters that you set, and recording those whose theoretical peptides most closely match your set of experimental masses. The more you constrain your search, the fewer proteins will be searched and the less the chance of spurious hits.
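The core of such a search can be sketched as a toy example. This is not how MS-Fit, Mascot or any other engine is actually implemented: real engines score hits statistically (MOWSE, for instance), whereas this sketch simply counts matching masses and ranks by that count. The database here is a hypothetical dictionary of protein names and already calculated theoretical peptide masses, and all of the numbers are invented.

```python
def count_matches(observed, theoretical, tolerance_ppm=50):
    """Number of observed masses that match any theoretical mass within tolerance."""
    return sum(
        any(abs(obs - theo) <= theo * tolerance_ppm / 1e6 for theo in theoretical)
        for obs in observed
    )


def search(observed, database, min_hits=4, tolerance_ppm=50):
    """Return (protein, matches) pairs with at least min_hits matches, best first."""
    hits = []
    for name, theoretical in database.items():
        matches = count_matches(observed, theoretical, tolerance_ppm)
        if matches >= min_hits:
            hits.append((name, matches))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented data for illustration only.
observed_masses = [842.51, 1045.56, 1234.60, 1500.75, 1800.90]
toy_database = {
    "protein_A": [842.50, 1045.57, 1234.61, 1500.74, 2000.00],
    "protein_B": [842.50, 999.00, 1111.00],
}

print(search(observed_masses, toy_database))
```

Here protein_B shares one mass with the sample but falls below the minimum of 4 matching peptides, so it is discarded, while the unmatched observed mass is treated as a possible contaminant and ignored.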




(5) Results

The results you get back will include the number of fragments that match yours, their sequences, and the difference between the submitted and theoretical masses. It will also include the theoretical masses and the coverage of the protein, that is how much of the protein is represented by your PMF, measured as a percentage. All of this data gives you information on how probable your hit is and it is worth taking time to analyse it properly.

The first thing that I noticed when doing these searches was the difference you can get from changing the mass tolerance. At 100 ppm you will get more fragments, but the search will be less stringent and you might see quite a few protein hits that obviously are not your protein. Tightening the search to, say, 50 ppm may be required to get a firm hit on your protein, but doing this may cause you to have to sacrifice a peptide or two.
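Coverage is easy to work out for yourself if you want to check a result. The sketch below marks every residue that falls inside a matched peptide and reports the percentage marked. The function name and sequence are hypothetical, and it assumes each matched peptide is supplied as a sequence string that can be located in the protein.

```python
def sequence_coverage(protein, matched_peptides):
    """Percentage of residues in protein covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


protein = "MKWVTFISLLLLFSSAYSRGVFRR"   # made-up 24 residue sequence
matched = ["MK", "GVFR"]               # peptides assumed to have matched

print(f"coverage: {sequence_coverage(protein, matched):.1f}%")
```

Overlapping peptides, such as one with a missed cleavage and one without, only count each residue once.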

When I first started this type of work I searched using carboxymethylated cysteines, not realising that this is the modification that occurs with iodoacetic acid and not iodoacetamide. It wasn’t until I noticed that I never obtained hits on any cysteine bearing fragments that I realised something was going wrong. I changed it to carbamidomethylation and suddenly they started to appear. At that time I also realised that taking a look at the mass spectrum of the sample can be helpful. You might find that you have good strong peaks that are not represented in the PMF of your protein. On closer inspection they might turn out to be contaminants, but there is always the possibility that a residue on a peptide has been post-translationally modified in a way that your search engine does not take into account. For instance you might have glutathionylation of a cysteine residue, which will add 305.068 Da to the monoisotopic mass. For this reason I keep a little chart of possible post-translational modifications, which I got from Micromass, next to my computer. There is also a comprehensive list of modifications available from prowl. It’s always worth a look; you never know what you might find. You can also use a program called FindMod to help look for modifications.
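The idea of checking strong unmatched peaks against a chart of modifications can be sketched as follows. This is only an illustration and not how FindMod works: it compares masses alone and does not check that the peptide actually contains a residue that could carry the modification. The glutathionylation shift is the value quoted above, the others are standard monoisotopic values, and the peak and peptide masses are invented.

```python
# Mass added by each modification, in Da.
MODIFICATION_SHIFTS = {
    "glutathionylation (Cys)": 305.068,
    "oxidation (Met)": 15.99491,
    "phosphorylation (Ser/Thr/Tyr)": 79.96633,
    "acetylation (N-terminus)": 42.01057,
}


def explain_unmatched(unmatched_peaks, theoretical_masses, tolerance_da=0.05):
    """List (peak, theoretical mass, modification) candidates for unmatched peaks."""
    candidates = []
    for peak in unmatched_peaks:
        for theoretical in theoretical_masses:
            for name, shift in MODIFICATION_SHIFTS.items():
                if abs(peak - (theoretical + shift)) <= tolerance_da:
                    candidates.append((peak, theoretical, name))
    return candidates


# Invented values for illustration only.
unmatched = [1305.07, 1750.30]
theoretical = [1000.00, 1480.55]

for peak, parent, modification in explain_unmatched(unmatched, theoretical):
    print(f"{peak} could be {parent} + {modification}")
```

Any candidate this turns up is only a suggestion, and you would want to confirm it by other means before believing it.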

At the end of the day there is no hard and fast rule for confirming whether you have correctly identified your protein, it is often a matter of common sense. But if you don’t have a clear cut answer you need to do a little more work. Obviously the mass of the predicted protein must be within experimental parameters of the observed mass. We do a lot of work on glutathione S-transferases (GST) which are generally between 20-30 kDa. They normally function as a dimer, but on SDS-PAGE dissociate into individual monomers. However, when we run affinity purified samples we sometimes find spots on our gel at about 40 kDa. If we search for these proteins at between 30-50 kDa we get no firm hits, however, if we widen the search we suddenly find that we get a good positive hit for a GST at its monomeric size of around 23 kDa. This suggests that what we are seeing is the protein in its native dimeric form. Had we not known about its dimeric nature we might never have identified it. So the size of the protein is important during searching and some knowledge of the proteins that you are looking at can be a great help.

Another thing that becomes noticeable during searching is that if you do not constrain your searches by protein mass, you will find that large proteins will predominate in your search results. This is because there is a greater statistical chance that large proteins will have matching peptides than smaller proteins.




(6) Post-Translational Modifications

The modification of proteins plays a key role in their function, so our ability to look at and identify changes in these proteins can be of key importance. Many proteins are activated by phosphorylation, or are expressed in an inactive form (proproteins, preproteins or preproproteins) and must be cleaved before they become active. There is a whole host of different modifications that can occur, and some of them, for instance oxidation of methionine and carbamylation, are generally caused during sample preparation. Here is an interesting additional site with more about post-translational modifications.

Using the techniques described by this tutorial it is possible to find some of these modifications; however, this is not a simple task and it may be necessary to obtain other data to confirm the findings. This seems to be a fairly new field and there isn’t a great deal of information available yet; however, I’m sure this will rapidly change.

Acetylation of the N-terminal amino acid of a peptide is a fairly common modification that can be found in PMFs. This modification induces an increase in the ionisation of the peptide, thus making it an abundant species in the PMF, so you see a large peak. Phosphorylation is another common modification that occurs on peptides and there are three amino acids that can be modified in this fashion: tyrosine, serine and threonine. This modification is said to suppress the ionisation of the fragment, and it has also been suggested that these groups can drop off the peptide whilst it is being analysed by the mass spec, so this is obviously a troublesome modification to look at. Another common modification that we see is cyclization of glutamic acid to pyro-glutamic acid.

Expasy proteomic tools.
