
Current Bioinformatics


ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

M-CAMP™: A Cloud-based Web Platform with a Novel Approach for Species-level Classification of 16S rRNA Microbiome Sequences

Author(s): Andrew E. Schriefer, Brajendra Kumar, Avihai Zolty, Adam Didier, Nirmal M.G., Greeshma G.T., Nofar Nadiv, Michael Perez, Preetam R., Santosh Kumar Mahankuda, Pankaj Kumar, Aaron Tenney, Maureen Bourner, Shira Lezer, Fei Zhong, Michal Daniely* and Yang Liu*

Volume 18, Issue 1, 2023

Published on: 29 November, 2022

Page: [21 - 39] Pages: 19

DOI: 10.2174/1574893617666220520100535


Abstract

Background: The M-CAMP™ (Microbiome Computational Analysis for Multi-omic Profiling) Cloud Platform was designed to give users an easy-to-use web interface to best-in-class microbiome analysis tools. The interface allows bench scientists to run bioinformatic analyses on their samples and then download publication-ready graphics and reports.

Objective: In this study, we aim to describe the M-CAMP™ platform and demonstrate that its taxonomic classification is more accurate than previously described methods across a wide range of microbiome samples.

Methods: The core pipeline of the platform is the 16S-seq taxonomic classification algorithm, which provides species-level classification of Illumina 16S sequencing data. The algorithm uses a novel approach that combines alignment-based and k-mer-based taxonomic classification methodologies to produce a highly accurate and comprehensive profile. In addition, a comprehensive proprietary database was curated by combining reference sequences from multiple sources; it contains 18,056 unique V3-V4 sequences covering 11,527 species.
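The abstract does not specify how the alignment and k-mer signals are combined, so the following is only a toy sketch of one possible consensus rule, not the M-CAMP algorithm. The reference sequences, labels, the Jaccard k-mer score, the use of Python's `difflib.SequenceMatcher` as a crude stand-in for a real aligner, and the fall-back-to-genus rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy reference (label -> sequence); not taken from the M-CAMP database.
REFERENCE = {
    "Genus_a species_1": "ACGTACGTTAGCCGATAGGCTTAACGGA",
    "Genus_a species_2": "ACGTACGTTAGCCGATAGGCTTTTCGGA",
    "Genus_b species_3": "TTGACCGGTAAGCTTCCGATGGCATCAA",
}


def kmers(seq, k=5):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_by_kmer(read, k=5):
    """Pick the reference with the largest Jaccard k-mer overlap with the read."""
    read_kmers = kmers(read, k)

    def jaccard(label):
        ref_kmers = kmers(REFERENCE[label], k)
        return len(read_kmers & ref_kmers) / len(read_kmers | ref_kmers)

    return max(REFERENCE, key=jaccard)


def best_by_alignment(read):
    """Pick the reference with the highest similarity ratio (assumed aligner stand-in)."""
    return max(
        REFERENCE,
        key=lambda label: SequenceMatcher(None, read, REFERENCE[label]).ratio(),
    )


def classify(read):
    """Assumed consensus rule: species if both methods agree, else genus, else unclassified."""
    by_kmer = best_by_kmer(read)
    by_align = best_by_alignment(read)
    if by_kmer == by_align:
        return by_kmer
    genus_kmer, genus_align = by_kmer.split()[0], by_align.split()[0]
    if genus_kmer == genus_align:
        return genus_kmer
    return "unclassified"
```

For example, `classify(REFERENCE["Genus_b species_3"])` returns the species label, because both scoring methods select the identical reference. A production pipeline would replace both scorers with a dedicated aligner and an indexed k-mer database.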

Results and Discussion: The M-CAMP™ 16S taxonomic classification algorithm was evaluated on 52 sequencing samples from both public and in-house standard sample mixtures with known fractions. The same evaluation was performed, on the same dataset, for five well-known 16S taxonomic classification algorithms: QIIME 2, Kraken 2, MAPseq, IDTAXA and SPINGO. Results are discussed in terms of evaluation metrics and classified taxonomic levels.
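The abstract does not name the evaluation metrics used. As a hedged illustration of how a classifier's output might be scored against a standard mixture of known composition, the sketch below computes precision, recall and F1 on species detection; the choice of these metrics, the function name `detection_metrics` and the example taxa are assumptions, not details from the study.

```python
def detection_metrics(expected, observed):
    """Score detected taxa against the known composition of a standard mixture.

    expected: iterable of taxa known to be in the mixture.
    observed: iterable of taxa reported by a classifier.
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)    # taxa correctly reported
    false_pos = len(observed - expected)   # taxa reported but not in the mixture
    false_neg = len(expected - observed)   # taxa in the mixture but missed

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical mixture of four taxa; the classifier finds three and adds one spurious call.
scores = detection_metrics(
    expected=["sp_A", "sp_B", "sp_C", "sp_D"],
    observed=["sp_A", "sp_B", "sp_C", "sp_E"],
)
```

Running the same function at each taxonomic rank (species, genus, family) for each tool would give a like-for-like comparison on a shared dataset.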

Conclusion: Compared with currently popular, publicly accessible classification algorithms, the M-CAMP™ 16S taxonomic classification algorithm provides the most accurate species-level classification of 16S rRNA sequencing data.

Keywords: Microbiology, 16S-seq, DNA sequencing, Bioinformatics, Web Application, Benchmarking

