Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

An Unbiased Predictive Model to Detect DNA Methylation Propensity of CpG Islands in the Human Genome

Author(s): Dicle Yalcin and Hasan H. Otu*

Volume 16, Issue 2, 2021

Published on: 24 July, 2020

Pages: 179-196 (18 pages)

DOI: 10.2174/1574893615999200724145835

Abstract

Background: Epigenetic repression mechanisms play an important role in gene regulation, particularly in cancer development. In many cases, a CpG island's (CGI) susceptibility or resistance to methylation has been shown to depend in part on local DNA sequence features.

Objective: To develop unbiased machine learning models, both individually and in combination for different biological features, that predict the methylation propensity of a CGI.

Methods: We built our model from CGI sequence features using a dataset of 75 sequences (28 methylation-prone, 47 methylation-resistant) that represents genome-wide methylation structure. We tested the model on two independent datasets, one chromosome-specific (132 sequences) and one disease-specific (70 sequences).
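As an illustration of what a "CGI sequence feature" can look like, the following Python sketch computes two classic descriptors of a CpG island: GC content and the observed/expected CpG ratio. The function name `cgi_features` and the choice of these two descriptors are assumptions made for illustration; the abstract does not list the features the authors actually used.

```python
def cgi_features(seq: str) -> dict:
    """Return simple sequence descriptors for one candidate CpG island.

    Hypothetical helper: these descriptors are illustrative and are not
    claimed to be the feature set used in the article.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("sequence must not be empty")

    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact

    gc_content = (c + g) / n
    # Observed/expected CpG ratio: (#CpG * N) / (#C * #G).
    # Defined as 0.0 here when either base is absent, to avoid dividing by zero.
    cpg_obs_exp = (cpg * n) / (c * g) if c and g else 0.0

    return {"length": n, "gc_content": gc_content, "cpg_obs_exp": cpg_obs_exp}
```

Calling `cgi_features` on each training sequence would give one row of a feature matrix. Motif-based or TFBS-based features, which the abstract also mentions, would need additional tooling not shown here.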

Results: Our models improve prediction accuracy over previous models. Combined features predict the methylation propensity of a CGI better than individual feature sets (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets, reaching an AUC of ~0.82 for the complete model and ~0.88 for the model built from select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that consistently separate prone and resistant CGIs better than others.
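For readers unfamiliar with the AUC figures quoted above, the sketch below shows one standard way to compute the area under the ROC curve from class labels and classifier scores, using the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and the label convention (1 = methylation-prone, 0 = methylation-resistant) are assumptions for illustration. This is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC is the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as one half.
    Labels are assumed to be 1 (prone) or 0 (resistant).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The quadratic loop is adequate for datasets of the size described here (tens to hundreds of sequences). A rank-based implementation would be preferable for much larger sets.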

Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation prone CGIs, which may lead to preventative treatment strategies. MATLAB® and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.

Keywords: CpG island, methylation, predictive model, unbiased learning, sequence signature, DNA motif.


