Abstract
Proteomics is a still-evolving combination of technologies for describing and characterizing all expressed proteins in a biological system. Because mass spectrometers have an upper limit on the mass they can detect, the bottom-up approach, in which tryptic peptides from complex protein mixtures are identified and quantified, is the most widely employed. Identifying proteins from tandem mass spectra remains a challenge in proteomics. Two approaches have been developed for this task: database searching and de novo sequencing. These approaches typically achieve positive identification rates of only ∼10-20% and exhibit high false positive identification rates. This review surveys existing algorithms for database searching and de novo sequencing, with a focus on recent developments in tandem mass spectrum quality assessment, peptide identification using annotated spectral libraries, statistical approaches to assessing identification quality, and methods for constrained searches. We also review research comparing the performance of existing protein identification packages.
Keywords: Protein identification, mass spectrometry, database searching, de novo sequencing, data preprocessing, intensity-based identification