Abstract
Background: Chemical modifications of RNA play a crucial role in many biological processes. N7-methylguanosine (m7G), one of the most important epigenetic modifications, is involved in gene expression, processing, metabolism, and protein synthesis. Locating m7G sites precisely in the transcriptome is key to understanding the mechanisms by which they affect gene expression. On the basis of experimentally validated data, several machine learning and deep learning tools have been designed to identify internal m7G sites, and they have shown advantages over traditional experimental methods in terms of speed, cost-effectiveness and robustness.
Aims: In this study, we aim to develop a computational model that helps predict the exact location of m7G sites in humans.
Objective: To combine simple and advanced encoding methods with deep learning networks so that m7G sites can be predicted both accurately and efficiently.
Methods: Three feature extraction methods and six classification algorithms were tested for identifying m7G sites. Our final model, named Sia-m7G, adopts one-hot encoding and a carefully designed Siamese neural network with an attention mechanism. In addition, multiple 10-fold cross-validation tests were conducted to evaluate our predictor.
Results: Sia-m7G achieved the highest sensitivity, specificity and accuracy in 10-fold cross-validation tests compared with six other m7G predictors. Nucleotide preference and model visualization analyses were conducted to strengthen the interpretability of Sia-m7G and to provide a further understanding of m7G site fragments in genomic sequences.
Conclusion: Sia-m7G has significant advantages over the other classifiers and predictors tested, which demonstrates the strength of the Siamese neural network algorithm in identifying m7G sites.
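For illustration only, the one-hot encoding and the three evaluation metrics named above can be sketched as follows. This is a minimal sketch and not the Sia-m7G implementation: the function names `one_hot_encode` and `binary_metrics`, the `ACGU` column order, and the all-zero vector for unknown bases are assumptions that the abstract does not specify.

```python
from typing import List, Sequence, Tuple

# Assumed column order; the abstract does not state which order Sia-m7G uses.
NUCLEOTIDES = "ACGU"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode an RNA sequence as one 4-dimensional one-hot vector per base."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    encoded = []
    for base in seq:
        row = [0] * len(NUCLEOTIDES)
        idx = NUCLEOTIDES.find(base)
        if idx >= 0:
            row[idx] = 1  # unknown bases (e.g. N) stay all-zero by assumption
        encoded.append(row)
    return encoded


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy
```

In a 10-fold cross-validation setting, `binary_metrics` would be computed on each held-out fold and the results averaged across folds.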