Abstract
Introduction: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can help reveal the mechanisms of gene regulation and screen for diseases. Many methods have been proposed to detect DMRs from bisulfite sequencing data. These methods usually identify differentially methylated CpG sites and DMRs using statistical tests or distribution models, which neglect the joint methylation statuses carried by each read and therefore yield inaccurate DMR boundaries.
Methods: This paper proposes DMR_Kmeans, a method that detects DMRs using k-means clustering and read methylation haplotype filtering. For each CpG site, the k-means algorithm clusters the methylation levels from the two groups, and the methylation difference of the CpG site is measured from how the two groups are distributed across the clusters. The methylation haplotypes of reads are then used to extract the methylation patterns in each candidate region. Finally, DMRs are identified from the methylation differences and the methylation patterns in the candidate regions.
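The per-site clustering step can be illustrated with a minimal sketch. The abstract does not specify the number of clusters, the initialization, or the exact difference measure, so everything here is an assumption: k = 2, centers initialized at the minimum and maximum methylation level, and the difference taken as the absolute gap between the two groups' fractions of samples in the high-methylation cluster. The function names `two_means_1d` and `cluster_difference` are hypothetical and are not taken from DMR_Kmeans.

```python
import numpy as np

def two_means_1d(values, n_iter=100):
    """Plain 1-D k-means with k=2 (assumed), initialized at min and max."""
    values = np.asarray(values, dtype=float)
    centers = np.array([values.min(), values.max()])
    labels = np.zeros(len(values), dtype=int)
    for _ in range(n_iter):
        # Assign every methylation level to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for k in (0, 1):
            if np.any(labels == k):  # keep the old center if a cluster is empty
                new_centers[k] = values[labels == k].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def cluster_difference(group_a, group_b):
    """Assumed difference measure for one CpG site: the gap between the
    fractions of each group's samples that fall in the high cluster."""
    pooled = np.concatenate([group_a, group_b])
    labels, centers = two_means_1d(pooled)
    in_high = labels == int(np.argmax(centers))
    frac_a = in_high[:len(group_a)].mean()
    frac_b = in_high[len(group_a):].mean()
    return abs(frac_a - frac_b)

# Methylation levels of one CpG site in two groups of samples (made-up values).
separated = cluster_difference([0.90, 0.85, 0.95], [0.10, 0.15, 0.05])
mixed = cluster_difference([0.90, 0.10], [0.90, 0.10])
print(separated, mixed)
```

Under these assumptions, a site whose groups fall into different clusters scores near 1, while a site whose groups are spread identically across the clusters scores 0. A real pipeline would still need the read-haplotype filtering described above, which this sketch does not attempt.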
Results: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. When the methylation difference threshold is greater than 0.4, DMR_Kmeans achieves higher Qn and Ql and overlaps more promoters than the other methods. This indicates that the DMRs predicted by DMR_Kmeans have accurate boundaries and contain fewer CpGs with small methylation differences than those predicted by the other methods.
Conclusion: The results further suggest that DMR_Kmeans can provide a high-quality DMR set for downstream analysis, since the total length of the DMRs it predicts is longer, and the total number of CpG sites in those DMRs is greater, than for the other methods.