Note: This article is currently at the "Article in Press" stage and is not the final "Version of record". It has been accepted, copy-edited, and formatted, but it is still undergoing proofreading and correction by the authors, so the text may change before final publication. Although an "Article in Press" may not yet have all bibliographic details, it can be cited using the DOI and the year of online publication. The citation should include the article title, DOI, publication year, and author(s). Once the final "Version of record" becomes available, it will replace the "Article in Press".
Abstract
Introduction: N6-methyldeoxyadenine (6mA) is the most prevalent DNA modification in both prokaryotes and eukaryotes. While single-molecule real-time sequencing (SMRT-seq) can detect 6mA events at the individual nucleotide level, its practical application is hindered by a high rate of false positives.
Methods: We propose a computational model for identifying DNA 6mA that incorporates comprehensive site features from SMRT-seq and employs machine learning classifiers.
Results: In C. reinhardtii, 99.54% and 96.55% of the identified DNA 6mA instances correspond with motifs and with peak regions identified by methylated DNA immunoprecipitation sequencing (MeDIP-seq), respectively. Compared with SMRT-seq, the proportion of predicted DNA 6mA instances within MeDIP-seq peak regions increases by 2% to 70% across six bacterial strains.
Conclusion: Our proposed method effectively reduces the false-positive rate in DNA 6mA prediction.
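As an illustration of the peak-overlap measure reported in the Results, the following minimal sketch computes the proportion of called 6mA sites that fall inside MeDIP-seq peak regions. It is not the authors' implementation: the function name `fraction_in_peaks`, the half-open interval convention, the assumption that peaks do not overlap, and the toy coordinates are all hypothetical and chosen only to show the calculation.

```python
from bisect import bisect_right


def fraction_in_peaks(sites, peaks):
    """Return the fraction of site positions that fall inside any peak.

    sites: integer positions on a single sequence (hypothetical input).
    peaks: (start, end) half-open intervals, assumed non-overlapping.
    """
    if not sites:
        return 0.0
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    hits = 0
    for pos in sites:
        # Index of the last peak whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            hits += 1
    return hits / len(sites)


# Toy coordinates, invented for illustration only.
peaks = [(100, 200), (500, 650)]
raw_calls = [50, 120, 180, 300, 510, 700]   # stand-in for unfiltered SMRT-seq calls
filtered_calls = [120, 180, 510, 700]       # stand-in for calls kept by a classifier

raw_fraction = fraction_in_peaks(raw_calls, peaks)
filtered_fraction = fraction_in_peaks(filtered_calls, peaks)
print(raw_fraction, filtered_fraction)
```

In this toy case the in-peak proportion is higher for the filtered calls than for the raw calls, which is the kind of change the reported comparison describes.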