Abstract
Background: 2’-O-Methylation (2’-O-Me) is a post-transcriptional RNA modification that occurs in the ribose sugar moiety of all four nucleotides and is abundant in both coding and non-coding RNAs. Accurate prediction of each subtype of 2’-O-Me (Am, Cm, Gm, Um) helps understand their role in RNA metabolism and function.
Objective: This study aims to build models that predict each subtype of 2’-O-Me from RNA sequence and nanopore signals, and to exploit model interpretability for sequence motif mining.
Methods: We first propose DeepNm, a novel deep learning model that uses a multi-scale framework to better capture the sequence features of each subtype. Building on DeepNm, we then propose HybridNm, which combines sequences and nanopore signals through a dual-path framework. The nanopore signal-derived features are first passed through a convolutional layer and then merged with sequence features extracted at different scales for final classification.
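The dual-path idea in Methods can be illustrated with a minimal pure-Python sketch. It is not the DeepNm or HybridNm implementation: the kernel widths (3, 5, 7), the filter counts, the random weights, the three per-position signal features, and all function names are assumptions made for illustration only. The sketch shows sequence features extracted at several scales and a separately convolved signal path being merged into one feature vector that a classifier could consume.

```python
import random

BASES = "ACGU"

def one_hot(seq):
    """Encode an RNA string as 4-dim one-hot rows (unknown bases become all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def conv1d_maxpool(x, kernel):
    """Valid 1-D convolution of x (L positions, C channels) with one kernel
    (k positions, C channels), then ReLU and global max pooling."""
    k = len(kernel)
    best = 0.0  # starting at 0 acts as the ReLU floor
    for start in range(len(x) - k + 1):
        s = sum(kernel[i][c] * x[start + i][c]
                for i in range(k) for c in range(len(kernel[i])))
        best = max(best, s)
    return best

def random_kernels(n, k, channels, rng):
    """Hypothetical untrained filters; a real model would learn these."""
    return [[[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(k)]
            for _ in range(n)]

def hybrid_features(seq, signal, seq_kernels_by_scale, signal_kernels):
    """Merge multi-scale sequence features with convolved signal features."""
    x = one_hot(seq)
    feats = []
    for kernels in seq_kernels_by_scale.values():      # sequence path, one pass per scale
        feats.extend(conv1d_maxpool(x, k) for k in kernels)
    feats.extend(conv1d_maxpool(signal, k) for k in signal_kernels)  # signal path
    return feats

rng = random.Random(0)
seq = "ACGUACGUAGCUAGCUAGCUA"  # toy 21-nt window around a candidate site
# Assumed per-position signal summary: [mean current, std, dwell time]
signal = [[rng.random(), rng.random(), rng.random()] for _ in seq]
seq_kernels = {w: random_kernels(4, w, 4, rng) for w in (3, 5, 7)}
sig_kernels = random_kernels(2, 3, 3, rng)
features = hybrid_features(seq, signal, seq_kernels, sig_kernels)
print(len(features))  # 3 scales x 4 filters + 2 signal filters = 14
```

In a trained model the merged vector would feed a final classification layer; here it is left as a plain list to keep the sketch dependency-free.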
Results: A 5-fold cross-validation process on Nm-seq data shows that DeepNm outperforms two state-of-the-art 2’-O-Me predictors. After incorporating nanopore signal-derived features, HybridNm achieved further significant improvements. Through model interpretation, we not only identified subtype-specific motifs but also revealed motifs shared between subtypes. In addition, Cm, Gm, and Um shared motifs with the well-studied m6A RNA methylation, suggesting a potential interplay among different RNA modifications and the complex nature of epitranscriptome regulation.
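For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of how 5-fold cross-validation partitions a dataset. The function name, the seed, and the unstratified shuffling are assumptions; the abstract does not state how the folds were actually constructed.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each sample appears in exactly one test fold and in the training
    set of the other k - 1 folds.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)           # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]      # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

for fold_no, (train, test) in enumerate(k_fold_indices(23), start=1):
    print(fold_no, len(train), len(test))
```

A model would be trained on each `train` split and scored on the matching `test` split, with the five scores averaged.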
Conclusion: The proposed frameworks can be useful tools to predict 2’-O-Me subtypes accurately and reveal specific sequence patterns.
Keywords: 2’-O-Methylation, site prediction, nanopore RNA sequencing, RNA methylation, epitranscriptome, deep learning.