Abstract
Background: Tyrosine nitration is an important covalent post-translational modification that is involved in many biochemical processes and is closely related to several human diseases.
Objective: Correct recognition of nitration sites is therefore useful for disease diagnosis and the design of effective treatments. However, traditional experimental techniques for identifying nitration sites are time-consuming, labor-intensive, and expensive. Effective computational methods offer an alternative way to tackle this problem.
Method: In this study, we proposed a computational workflow to identify and analyze nitrated tyrosine residues in proteins. Specifically, each nitrated tyrosine was represented by features derived from a segment of the protein's amino acid sequence containing the nitrated tyrosine site. A reliable feature selection method, minimum redundancy maximum relevance (mRMR), was adopted to analyze these features. Incremental feature selection (IFS) and a support vector machine trained with sequential minimal optimization (SMO) were then employed to extract core features and build an optimal prediction classifier.
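The incremental feature selection step can be illustrated with a minimal sketch. It assumes the features have already been ranked (for example by minimum redundancy maximum relevance) and that a scoring function is available. The names `incremental_feature_selection`, `evaluate`, and `toy_score` are hypothetical and are not taken from the study; in the study's setting the scorer would presumably train and evaluate the classifier on each candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score every top-k prefix of a ranked feature list and keep the best.

    `evaluate` is a hypothetical callback that returns a quality score
    (e.g. a cross-validated MCC) for a classifier built on the subset.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])  # top-k ranked features
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_score(subset: List[str]) -> float:
    """Toy scorer (assumption): three informative features, small size penalty."""
    informative = {"f1", "f2", "f3"}
    return sum(1.0 for f in subset if f in informative) - 0.1 * len(subset)


ranked = ["f1", "f2", "f3", "f4", "f5"]
best_subset, best_score = incremental_feature_selection(ranked, toy_score)
```

With the toy scorer, the search stops improving once uninformative features start to be added, which is the behavior that incremental selection relies on to fix the size of the optimal feature set.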
Results: A total of 223 features were extracted and used to build the optimal prediction classifier, which achieved a Matthews correlation coefficient (MCC) of 0.717 on the training set. The nitration sites in the testing set were extracted from the UniProt database based on sequence similarity and were all denoted as positive samples. The sensitivity of the optimal classifier on the testing set was 0.950. These results demonstrate the effectiveness and importance of the optimal features and the classifier for the recognition of nitration sites. In addition, three other methods, the nearest neighbor algorithm (NNA), Dagging, and random forest (RF), were also applied to the training and testing sets, and their results were compared with those of SMO.
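For reference, the two reported measures can be computed from confusion-matrix counts as sketched below. The function names and the example counts are illustrative assumptions, not data from the study. Note that when a test set contains only positive samples, the MCC denominator is zero, so sensitivity is the natural measure to report for such a set.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention),
    e.g. when the data contain no negative samples.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positive sites that are correctly predicted."""
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


# Hypothetical counts, chosen only to exercise the formulas.
example_mcc = mcc(tp=90, tn=80, fp=20, fn=10)
example_sensitivity = sensitivity(tp=19, fn=1)
```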
Conclusion: Of the 223 optimal features, 61 core features were analyzed, and this analysis revealed the essential residue types and conserved sites proximal to the central tyrosine residue.
Keywords: Post-translational modification, tyrosine nitration prediction, minimum redundancy maximum relevance, support vector machine, incremental feature selection.