Abstract
Protein-DNA interactions are involved in many essential biological processes, such as transcription, splicing, replication and DNA repair. Identifying DNA-binding proteins and their binding sites is therefore of great value for studying the mechanisms of these processes. A number of experimental methods have been developed to identify DNA-binding proteins, including DNase footprinting, EMSA, X-ray crystallography, NMR spectroscopy and ChIP-on-chip. However, as the number of suspected protein-DNA interactions keeps growing, experimental identification is expensive, labor-intensive and time-consuming. Hence, over the past decades researchers have developed many computational approaches to predict protein-DNA interactions in silico. Machine learning has been widely used and has become dominant in this field. In this article, we review recent machine learning-based progress in methods for predicting DNA-binding proteins and binding residues, the most commonly used features in these predictions, the comparison and selection of machine learning classifiers, the comparison of evaluation methods, and existing problems and future directions for the field.
Keywords: DNA-binding protein, Prediction, Machine learning approach, Protein-DNA interaction, Transcription, Biological process, NMR spectroscopy, ChIP-on-chip