Abstract
Enhancers are the short functional regions (50–1500bp) in the genome, which play an effective character in activating gene-transcription in the presence of transcription-factors (TFs). Many human diseases, such as cancer and inflammatory bowel disease, are correlated with the enhancers’ genetic variations. The precise recognition of the enhancers provides useful insights for understanding the pathogenesis of human diseases and their treatments. High-throughput experiments are considered essential tools for characterizing enhancers; however, these methods are laborious, costly and time-consuming. Computational methods are considered alternative solutions for accurate and rapid identification of the enhancers. Over the past years, numerous computational predictors have been devised for predicting enhancers and their strength. A comprehensive review and thorough assessment are indispensable to systematically compare sequence-based enhancer’s bioinformatics tools on their performance. Giving the increasing interest in this domain, we conducted a large-scale analysis and assessment of the state-of-the-art enhancer predictors to evaluate their scalability and generalization power. Additionally, we classified the existing approaches into three main groups: conventional machine-learning, ensemble and deep learning-based approaches. Furthermore, the study has focused on exploring the important factors that are crucial for developing precise and reliable predictors such as designing trusted benchmark/independent datasets, feature representation schemes, feature selection methods, classification strategies, evaluation metrics and webservers. Finally, the insights from this review are expected to provide important guidelines to the research community and pharmaceutical companies in general and high-throughput tools for the detection and characterization of enhancers in particular.