Abstract
Background: Gene expression and disease control are regulated by interactions between distal enhancers and proximal promoters, and the study of enhancer-promoter interactions (EPIs) provides insight into the genetic basis of diseases.
Objective: Although the recent emergence of high-throughput sequencing methods has deepened our understanding of EPIs, accurate prediction of EPIs still has limitations.
Methods: We implemented an XGBoost-based approach and introduced two sets of features (epigenomic and sequence) to predict the interactions between enhancers and promoters in different cell lines.
Results: Extensive experimental results show that XGBoost effectively predicts EPIs across three cell lines, especially when epigenomic and sequence features are used together.
Conclusion: XGBoost outperforms other methods, such as random forest, AdaBoost, GBDT, and TargetFinder.
Keywords: Enhancer-promoter interactions, supervised learning, machine learning, gene expression, feature extraction, XGBoost.
Graphical Abstract