Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

DMR_Kmeans: Identifying Differentially Methylated Regions Based on k-means Clustering and Read Methylation Haplotype Filtering

Author(s): Xiaoqing Peng, Wanxin Cui, Xiangyan Kong, Yuannan Huang and Ji Li*

Volume 19, Issue 5, 2024

Published on: 06 October, 2023

Page: [490 - 501] Pages: 12

DOI: 10.2174/0115748936245495230925112419

Abstract

Introduction: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can help reveal the mechanisms of gene regulation and screen for diseases. Many methods have been proposed to detect DMRs from bisulfite sequencing data. These methods usually identify differentially methylated CpG sites and DMRs with statistical tests or distribution models, which neglect the joint methylation statuses carried by each read and therefore yield inaccurate DMR boundaries.

Methods: This paper proposes DMR_Kmeans, a method that detects DMRs based on k-means clustering and read methylation haplotype filtering. For each CpG site, DMR_Kmeans uses the k-means algorithm to cluster the methylation levels from the two groups and measures the methylation difference of the CpG from how the two groups are distributed across the clusters. Methylation haplotypes of reads are then used to extract the methylation patterns in each candidate region. Finally, DMRs are identified from the methylation differences and the methylation patterns in the candidate regions.
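The per-CpG clustering step can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring formula, so everything here is an assumption made for illustration: the function names (`kmeans_1d`, `cluster_difference`, `haplotype_frequencies`), the choice of k = 2, the deterministic initialisation, the score (absolute difference between the two groups' fractions of samples in the high-methylation cluster), and the encoding of a read's methylation haplotype as a string of 0s and 1s. It is not the DMR_Kmeans implementation.

```python
from collections import Counter


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's k-means on scalars (k >= 2). Returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centers spread evenly over the observed range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center becomes the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def cluster_difference(group_a, group_b):
    """Hypothetical score for one CpG: pool both groups' methylation levels,
    cluster them with k = 2, and compare how each group is distributed
    between the low and high clusters."""
    pooled = list(group_a) + list(group_b)
    centers, labels = kmeans_1d(pooled, k=2)
    high = max(range(2), key=lambda c: centers[c])
    labels_a = labels[:len(group_a)]
    labels_b = labels[len(group_a):]
    frac_a = sum(lab == high for lab in labels_a) / len(labels_a)
    frac_b = sum(lab == high for lab in labels_b) / len(labels_b)
    return abs(frac_a - frac_b)


def haplotype_frequencies(read_haplotypes):
    """Relative frequency of each read-level methylation pattern in a region.
    Each haplotype is assumed to be a string such as '1101'
    (1 = methylated CpG, 0 = unmethylated CpG)."""
    counts = Counter(read_haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Toy data (made up): methylation levels of one CpG in two groups of samples.
score = cluster_difference([0.90, 0.85, 0.95], [0.10, 0.05, 0.15])
patterns = haplotype_frequencies(["1111", "1111", "0000", "1101"])
```

Under this assumed score, a CpG whose two groups fall into separate clusters gets a value close to 1, while a CpG whose groups are spread identically over the clusters gets 0. The haplotype frequencies show the kind of read-level pattern summary that a filtering step could use.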

Results: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. When the methylation-difference threshold is greater than 0.4, DMR_Kmeans achieves higher Qn and Ql and overlaps more promoters than the other methods. This indicates that the DMRs predicted by DMR_Kmeans have accurate boundaries and contain fewer CpGs with small methylation differences than those predicted by the other methods.

Conclusion: DMR_Kmeans can provide a high-quality DMR set for downstream analysis, since the DMRs it predicts have a greater total length and contain more CpG sites in total than those predicted by the other methods.


© 2024 Bentham Science Publishers