Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Author(s): Lin Zhang, Yanling He, Huaizhi Wang, Hui Liu*, Yufei Huang, Xuesong Wang and Jia Meng*

Volume 14, Issue 1, 2019

Pages: 11-23 (13 pages)

DOI: 10.2174/1574893613666180601080008

Abstract

Background: The RNA methylome has emerged as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuitry of the epitranscriptome remains uncharted, clustering effects in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data have unique features, such as low read coverage, that call for novel clustering approaches.

Objective: Clustering analysis of count-based RNA methylation sequencing data must not only cope with low read coverage but also preserve the integer nature of the measurements.

Method: We propose a nonparametric generative model, together with a Gibbs sampling solution, for clustering analysis. The approach uses a beta-binomial mixture model to capture the clustering effect in methylation level from the original count-based measurements rather than from an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to determine the number of clusters automatically, avoiding the model selection problem common in clustering analysis.
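To make the two ingredients named above concrete, the following is a minimal Python sketch, not the DPBBM package's implementation. It shows a beta-binomial log-likelihood evaluated on raw counts and the assignment weights that a Chinese-restaurant-process style Gibbs step would compute for one site. All function names, the cluster representation `(size, alpha, beta)`, and the use of a single fixed parameter pair for the "new cluster" option are assumptions made for illustration; a full sampler would also resample cluster parameters and typically draw candidate parameters for new clusters from the base measure.

```python
import math


def log_beta(a, b):
    # log of the Beta function B(a, b), computed via log-gamma for stability
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def beta_binomial_logpmf(k, n, alpha, beta):
    # Log-probability of k methylated reads out of n total reads
    # under a Beta-Binomial(n, alpha, beta) distribution.
    return log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def site_loglik(ks, ns, alpha, beta):
    # Log-likelihood of one site across samples, treating samples as independent.
    return sum(beta_binomial_logpmf(k, n, alpha, beta) for k, n in zip(ks, ns))


def crp_assignment_probs(ks, ns, clusters, concentration, new_params):
    # Normalized weights for assigning one site to each existing cluster
    # or to a new cluster (last entry).
    # clusters: list of (size, alpha, beta), with size > 0 counting the
    #           other sites currently in that cluster (hypothetical layout).
    # new_params: (alpha, beta) used for the new-cluster option (assumption:
    #           a single fixed candidate instead of draws from a base measure).
    log_w = [math.log(size) + site_loglik(ks, ns, a, b) for size, a, b in clusters]
    a0, b0 = new_params
    log_w.append(math.log(concentration) + site_loglik(ks, ns, a0, b0))
    m = max(log_w)  # subtract the maximum before exponentiating
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    # One site observed in two samples: 9/10 and 8/10 methylated reads.
    clusters = [(5, 8.0, 2.0), (5, 2.0, 8.0)]  # high- and low-methylation clusters
    print(crp_assignment_probs([9, 8], [10, 10], clusters, 1.0, (1.0, 1.0)))
```

Because the likelihood is evaluated on the integer counts themselves, a site with 1 of 2 reads methylated contributes far less evidence than a site with 50 of 100, which a ratio-based continuous methylation level would not distinguish.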

Results: When tested on simulated data, the method showed improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex.

Conclusion: The proposed DPBBM method not only properly handles count-based measurements of RNA methylation data from sites with very low read coverage, but also learns the number of clusters adaptively from the data being analyzed.

Availability: The source code and documentation of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.

Keywords: RNA methylation, m6A-seq, beta-binomial mixture, Dirichlet process, clustering, epitranscriptome.
