Abstract
Background: Intrinsically disordered proteins lack a well-defined three-dimensional structure under physiological conditions. They perform multiple biological functions and are closely associated with many human diseases. Identifying the disordered regions of intrinsically disordered proteins is important for protein function annotation.
Objective: To accurately identify the disordered regions in intrinsically disordered proteins.
Methods: In this study, we constructed a multi-feature fusion model based on a support vector machine to predict disordered regions of intrinsically disordered proteins from the DisProt database. As features, we extracted codon usage frequencies, GC content, protein secondary structure components, hydrophilic-hydrophobic amino acid components, and chemical shifts.
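As an illustration of the sequence-derived features named above, the following is a minimal sketch of how codon usage frequencies and GC content could be computed from a coding sequence. The abstract does not specify the exact procedure, so the function names (codon_usage_frequencies, gc_content, feature_vector), the fixed 64-codon ordering, and the handling of ambiguous bases are assumptions made for this example, not the authors' implementation. The classifier itself is not shown.

```python
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
# (Assumed layout for illustration only.)
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage_frequencies(cds):
    """Return a 64-element list of relative codon frequencies for a coding sequence."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Read in frame, three bases at a time; a trailing partial codon is ignored.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def feature_vector(cds):
    """Concatenate codon frequencies (64 values) and GC content (1 value)."""
    return codon_usage_frequencies(cds) + [gc_content(cds)]
```

A vector built this way could then be concatenated with the other feature groups and passed to a support vector machine, which is one straightforward reading of "feature fusion".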
Results: In single-feature prediction, the best accuracy was 82.098%, obtained using codon usage frequencies. To improve performance, we fused the features; the best result, 83.173%, was obtained by combining codon usage frequencies with chemical shifts.
Conclusion: The results show that our model predicts the disordered regions of intrinsically disordered proteins well. Moreover, its performance is better than that of existing methods.
Keywords: Intrinsically disordered proteins, support vector machine, GC content, codon usage frequency, chemical shifts, hydrophilic-hydrophobic.
Graphical Abstract