Abstract
Background: DNA-binding proteins are vital cellular components, and identifying them is important for understanding biological processes. Traditional methods for predicting protein function are both time-consuming and expensive. With the development of bioinformatics, a large amount of protein sequence information has become available to researchers, creating a need for an efficient predictor that identifies DNA-binding proteins from protein sequence information.
Objective: To make better use of protein sequence information and further improve the accuracy of DNA-binding protein recognition, we designed a new predictor for identifying DNA-binding proteins based on a voting strategy.
Method: We employed two feature extraction methods for DNA-binding protein identification: Physicochemical Distance Transformation (PDT) and PDT-profile. Two predictors (iDNA-Prot-PDT and iDNA-Prot-PDT-Profile) were then established on the basis of these two feature extraction methods. To further improve the quality of prediction, a voting strategy (iDNA-Prot-Vote) was adopted.
Results: Experimental results on a benchmark dataset and an independent dataset showed that our methods outperformed other state-of-the-art methods.
Conclusion: These results indicate that the proposed methods are useful for DNA-binding protein identification and may promote the development of protein sequence analysis.
Keywords: DNA-binding protein identification, physicochemical distance transformation, frequency profile, ensemble learning, vector, threshold.