Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Trimming and Decontamination of Metagenomic Data can Significantly Impact Assembly and Binning Metrics, Phylogenomic and Functional Analysis

Author(s): Jason M. Whitham* and Amy M. Grunden

Volume 18, Issue 5, 2023

Published on: 07 April, 2023

Page: [428 - 439] Pages: 12

DOI: 10.2174/1574893618666230227145952

Abstract

Background: Investigators using metagenomic sequencing to study microbiomes often trim and decontaminate reads without knowing their effect on downstream analyses.

Objective: This study evaluated how JGI trimming and decontamination procedures affect assembly and binning metrics, the placement of metagenome-assembled genomes (MAGs) into species trees, and the functional profiles of MAGs extracted from complex rhizosphere metagenomes. It also evaluated how more aggressive trimming affects these binning metrics.

Methods: Twenty-three Miscanthus × giganteus rhizosphere metagenomes were subjected to different combinations and thresholds of force, k-mer, and quality trimming and decontamination using BBDuk. Reads were assembled and binned in KBase. Phylogenomic and statistical analyses were applied to evaluate the effects of trimming and decontamination on downstream analyses.
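As a rough illustration of the kinds of BBDuk passes named in the Methods, the sketch below builds argument lists for force trimming, k-mer trimming, k-mer based decontamination, and quality trimming. The helper name `bbduk_pass`, the file names, and all parameter values are hypothetical; the abstract does not give the actual JGI settings, so this should not be read as the authors' pipeline. Only the Q20 quality threshold comes from the abstract, where it is described as aggressive.

```python
from typing import List, Optional


def bbduk_pass(
    reads_in: str,
    reads_out: str,
    *,
    ref: Optional[str] = None,    # FASTA of adapters or contaminants (assumed path)
    ktrim: Optional[str] = None,  # "r" or "l" trims at a k-mer match; None filters reads
    k: Optional[int] = None,      # k-mer length used for matching against ref
    trimq: Optional[int] = None,  # Phred threshold for quality trimming
    ftl: Optional[int] = None,    # force-trim bases left of this 0-based position
) -> List[str]:
    """Build an argument list for one BBDuk pass (illustrative values only)."""
    args = ["bbduk.sh", f"in={reads_in}", f"out={reads_out}"]
    if ftl is not None:
        args.append(f"ftl={ftl}")
    if ref is not None:
        args.append(f"ref={ref}")
        if k is not None:
            args.append(f"k={k}")
        if ktrim is not None:
            # With ktrim set, matching k-mers are trimmed off instead of
            # causing the whole read to be discarded.
            args.append(f"ktrim={ktrim}")
    if trimq is not None:
        # Assumption: trim both ends; the abstract does not say which ends were trimmed.
        args += ["qtrim=rl", f"trimq={trimq}"]
    return args


# Aggressive quality trimming at Q20, the threshold named in the abstract.
aggressive = bbduk_pass("raw.fq.gz", "q20.fq.gz", trimq=20)

# Decontamination pass: reads sharing k-mers with the reference are discarded.
decon = bbduk_pass("raw.fq.gz", "clean.fq.gz", ref="contaminants.fa", k=31)
```

Each list can be handed to `subprocess.run(args, check=True)` on a machine where BBTools is installed. Keeping trimming and filtering as separate passes mirrors the fact that a single BBDuk run uses its reference either to trim or to filter, not both.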

Results: JGI trimming and decontamination (QC) of reads significantly affected assembly and binning metrics relative to raw reads. Raw assemblies had significantly higher total contig counts, more contigs longer than 10 kbp, and larger total lengths than QC assemblies, while QC MAGs had 2.0% lower average contamination than raw MAGs. Differences in the placement of MAGs in species trees increased with decreasing completeness and contamination thresholds. Furthermore, aggressive trimming (Q20) significantly reduced MAG counts.
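The assembly metrics compared here (total contig count, contigs longer than 10 kbp, total assembly length) and the idea of filtering MAGs by completeness and contamination thresholds can be sketched in a few lines. This is a minimal illustration, not the authors' code: the names `assembly_metrics`, `Mag`, and `filter_mags`, the example lengths, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


def assembly_metrics(contig_lengths: Iterable[int], long_cutoff: int = 10_000) -> Dict[str, int]:
    """Summarize an assembly from its contig lengths (in bp)."""
    lengths = list(contig_lengths)
    return {
        "contigs": len(lengths),
        # Strictly greater than the cutoff, matching "greater than 10k bp".
        "long_contigs": sum(1 for n in lengths if n > long_cutoff),
        "total_length": sum(lengths),
    }


@dataclass
class Mag:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def filter_mags(mags: Iterable[Mag], min_completeness: float, max_contamination: float) -> List[Mag]:
    """Keep MAGs that meet both quality thresholds."""
    return [
        m for m in mags
        if m.completeness >= min_completeness and m.contamination <= max_contamination
    ]


# Hypothetical contig lengths and MAG quality estimates, for illustration only.
metrics = assembly_metrics([500, 12_000, 10_000, 30_000])
mags = [Mag("bin.1", 95.0, 2.0), Mag("bin.2", 60.0, 8.0), Mag("bin.3", 40.0, 1.0)]
strict = filter_mags(mags, min_completeness=90.0, max_contamination=5.0)
relaxed = filter_mags(mags, min_completeness=50.0, max_contamination=10.0)
```

Relaxing the thresholds admits more MAGs of lower quality, which is the setting in which the abstract reports larger differences in species-tree placement.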

Conclusion: Trimming and decontamination of metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” However, mild trimming and decontamination of metagenomic reads with high quality scores are recommended for removing sample processing and sequencing artifacts.

