Abstract
Protein inference is the final and a critical step in identifying the proteins present in complex biological samples from tandem mass spectrometry data. Assembling the peptides identified from tandem mass spectra is one of the most widely used methods for protein inference in proteomics studies. However, these “bottom-up” approaches sever the connection between peptides and their parent proteins, which significantly complicates protein inference. First, peptides shared by multiple parent proteins lead to ambiguities in identifying proteins. Second, it is difficult to develop an effective model that can recover the connection between peptides and proteins. To address these issues, three concepts have been proposed: the parsimony principle, proteotypic peptides, and peptide detectability. In this review, we survey the advantages and disadvantages of several state-of-the-art statistical models that apply these concepts. To stay consistent with the workflow of proteomics experiments, we first give a brief introduction to peptide identification. Finally, we point out requirements for further improvement and discuss future perspectives.
Keywords: Protein inference, peptide identification, statistical model, degenerate peptide, proteotypic peptide, peptide detectability