Abstract
Background: Tandem mass spectrometry (MS/MS) peptide identification is an important research topic in molecular biology; the comparison between an experimental spectrum and a theoretically predicted spectrum is a crucial step for many identification methods. Consequently, the accurate prediction of the theoretical spectrum from a peptide sequence can potentially improve the performance of peptide identification and is a significant problem for mass spectrometry-based proteomics.
Objective: We studied the mechanism of peptide fragmentation in the mass spectrometer and propose TagDict, a new model for theoretical spectrum prediction.
Method: TagDict builds a “tag dictionary” from an existing spectrum library and uses it for theoretical spectrum prediction. The dictionary collects a large number of records, each consisting of a peptide segment and the intensity of the fragment ions at the adjacent positions in the middle of that segment.
Results: A full theoretical spectrum can be derived from the adjacent-ion intensity ratios obtained by querying the tag dictionary. Compared with MassAnalyzer, the theoretical spectra simulated by TagDict are more similar to the real spectra.
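The idea of chaining adjacent-ion intensity ratios into a full intensity profile can be illustrated with a minimal sketch. It is not the TagDict implementation: the three-residue tag length, the averaging of ratios, the fallback ratio of 1.0 for unseen tags, and the function names `build_tag_dict` and `predict_relative_intensities` are all assumptions made for illustration, and only a single ion series is considered.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_tag_dict(library: List[Tuple[str, List[float]]]) -> Dict[str, float]:
    """Build a hypothetical tag dictionary from (peptide, intensities) pairs.

    intensities[k] is the observed intensity of the fragment ion produced by
    cleavage between residues k and k+1. Each three-residue tag (an assumed
    tag length) maps to the mean ratio of the two ion intensities that flank
    its middle residue.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for peptide, intensities in library:
        for i in range(1, len(intensities)):
            prev, cur = intensities[i - 1], intensities[i]
            if prev <= 0:
                continue  # ratio undefined when the preceding ion is missing
            tag = peptide[i - 1:i + 2]
            sums[tag] += cur / prev
            counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}


def predict_relative_intensities(
    peptide: str, tag_dict: Dict[str, float], default_ratio: float = 1.0
) -> List[float]:
    """Chain adjacent-ion ratios into a profile normalized to a maximum of 1."""
    n_sites = len(peptide) - 1
    if n_sites < 1:
        return []
    profile = [1.0]  # arbitrary starting intensity; only ratios matter
    for i in range(1, n_sites):
        tag = peptide[i - 1:i + 2]
        # default_ratio is an assumed fallback for tags absent from the dictionary
        profile.append(profile[-1] * tag_dict.get(tag, default_ratio))
    peak = max(profile)
    return [value / peak for value in profile]


# Toy library with one made-up entry, used only to exercise the functions.
toy_library = [("PEPTK", [10.0, 20.0, 10.0, 10.0])]
toy_dict = build_tag_dict(toy_library)
toy_profile = predict_relative_intensities("PEPTK", toy_dict)
```

Because only ratios are stored, the predicted profile is relative; an absolute scale would have to come from elsewhere, which is why the sketch normalizes to the most intense ion.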
Conclusion: Compared with the existing spectrum prediction tool MassAnalyzer, the new approach not only simplifies theoretical spectrum simulation but also improves the accuracy of spectrum library searching when it is used to extend the spectrum library.
Keywords: Mass spectrometry, theoretical spectrum prediction, database search, peptide identification, TagDict, MassAnalyzer.
Graphical Abstract