Abstract
Background: Recognizing enhancer-promoter interactions (EPIs) is crucial for understanding human development and transcriptional regulation, because EPIs play a significant role in regulating gene expression. In genome-wide association studies (GWAS), EPIs also improve the mechanistic understanding of disease- or trait-associated genetic variants.
Methods: Experimental methods for classifying EPIs are time-consuming and expensive. Consequently, research has increasingly focused on computational approaches that leverage deep learning and other machine learning techniques. One of the main challenges in EPI prediction is the length of enhancer and promoter sequences, which most existing computational approaches struggle to handle. This paper proposes a new deep learning model for EPI detection based on the Hierarchical Attention Network (HAN). The proposed EPI-HAN model has two distinguishing features: (i) a hybrid embedding strategy, and (ii) a hierarchical structure comprising two attention layers, one operating at the level of individual tokens and the other at the level of smaller sub-sequences.
Results: In benchmark comparisons, the EPI-HAN model outperforms state-of-the-art methods on specific cell lines, as measured by the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR). Specifically, for the cell lines HeLa-S3, HUVEC, and NHEK, the AUROC values are 0.962, 0.946, and 0.987, respectively, and the AUPR values are 0.842, 0.724, and 0.926, respectively.
Conclusion: The comparative results indicate that our model surpasses other state-of-the-art models in three out of six cell lines. We attribute this superior performance in recognizing EPIs to the hierarchical structure of the attention mechanism.
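
Illustrative sketch: The two-level attention described in the Methods can be illustrated with a minimal NumPy example. This is a sketch under stated assumptions, not the EPI-HAN implementation: it assumes additive (tanh) attention pooling, fixed-length sub-sequences, randomly initialized parameters, and no recurrent or convolutional encoders between the levels. The names `attention_pool`, `hierarchical_attention`, and `chunk_len`, and all dimensions, are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(h, W, b, u):
    """Additive attention pooling over the second-to-last axis of h.

    h: (..., n, d) item vectors; W: (d, a); b: (a,); u: (a,).
    Returns the pooled vectors (..., d) and the attention weights (..., n).
    """
    scores = np.tanh(h @ W + b) @ u          # one score per item
    alpha = softmax(scores, axis=-1)         # weights sum to 1 over the items
    pooled = (alpha[..., None] * h).sum(axis=-2)
    return pooled, alpha

def hierarchical_attention(tokens, chunk_len, token_params, chunk_params):
    """Two-level attention: tokens -> sub-sequence vectors -> sequence vector.

    tokens: (n, d) token embeddings; n must be a multiple of chunk_len
    (an assumption of this sketch; a real model would pad and mask).
    """
    n, d = tokens.shape
    if n % chunk_len != 0:
        raise ValueError("sequence length must be a multiple of chunk_len")
    chunks = tokens.reshape(n // chunk_len, chunk_len, d)
    # Level 1: attend over the tokens inside each sub-sequence.
    chunk_vecs, token_alpha = attention_pool(chunks, *token_params)
    # Level 2: attend over the sub-sequence vectors.
    seq_vec, chunk_alpha = attention_pool(chunk_vecs, *chunk_params)
    return seq_vec, token_alpha, chunk_alpha

# Toy input: 12 token embeddings of dimension 8, split into 3 sub-sequences of 4.
rng = np.random.default_rng(0)
d, a = 8, 4
tokens = rng.normal(size=(12, d))

def init_params():
    # Hypothetical random parameters (W, b, u) for one attention layer.
    return rng.normal(size=(d, a)), np.zeros(a), rng.normal(size=a)

seq_vec, token_alpha, chunk_alpha = hierarchical_attention(
    tokens, chunk_len=4, token_params=init_params(), chunk_params=init_params()
)
```

In this sketch, `token_alpha` holds one weight distribution per sub-sequence and `chunk_alpha` holds a single distribution over sub-sequences. Because each token-level softmax spans only `chunk_len` items rather than the full sequence, no single attention layer has to weigh every position of a long sequence at once.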