Abstract
Background: Accurately recognizing nitrated tyrosine residues from protein sequences would pave a way for understanding the mechanism of nitration and the screening of the tyrosine residues in sequences.
Results: In this study, we proposed a prediction model that used the extreme learning machine (ELM) algorithm as the prediction engine to identify nitrated tyrosine residues. To encode each tyrosine residue, a sliding window technique was adopted to extract a peptide segment for each tyrosine residue, from which a number of features were extracted. These features were analyzed by a popular feature selection method, Minimum Redundancy Maximum Relevance (mRMR) method, producing a feature list, in which all features were ranked in a rigorous way. Then, the Incremental Feature Selection (IFS) method was utilized to discover the optimal features, on which the optimal ELM-based prediction model was built. This model produced satisfactory results on the training dataset with a Matthews correlation coefficient of 0.757. The model was also evaluated by an independent test dataset that contained only positive samples, yielding a sensitivity of 0.938.
Conclusion: Compared to other prediction models that use classic machine learning algorithms as prediction engines on the same datasets with their own optimal features, the optimal ELM-based prediction model produced much better results, indicating the superiority of the proposed model for the identification of nitrated tyrosine residues from protein sequences.
Keywords: Post-translational modification, nitrated tyrosine, extreme learning machine, minimum redundancy maximum relevance, incremental feature selection.