Abstract
Background: Investigators using metagenomic sequencing to study microbiomes often trim and decontaminate reads without knowing how these steps affect downstream analyses.
Objective: This study evaluated how the Joint Genome Institute (JGI) trimming and decontamination procedures affect assembly and binning metrics, the placement of metagenome-assembled genomes (MAGs) in species trees, and the functional profiles of MAGs extracted from complex rhizosphere metagenomes. It also evaluated how more aggressive trimming affects the same binning metrics.
Methods: Twenty-three Miscanthus x giganteus rhizosphere metagenomes were subjected to different combinations and thresholds of force, k-mer, and quality trimming and decontamination using BBDuk. Reads were assembled and binned in KBase. Phylogenomic and statistical analyses were then used to evaluate the effects of trimming and decontamination on downstream analyses.
Results: Assembly and binning metrics differed significantly between raw reads and JGI-trimmed and decontaminated (QC) reads. Raw assemblies had significantly higher total contig counts, more contigs longer than 10 kbp, and larger total lengths than QC assemblies, while QC MAGs had 2.0% lower average contamination than raw MAGs. Differences in the placement of MAGs in species trees increased as the completeness and contamination thresholds decreased. Furthermore, aggressive trimming (Q20) significantly reduced MAG counts.
Conclusion: Trimming and decontaminating metagenomic reads prior to assembly can change an investigator’s answer to the questions, “Who is there and what are they doing?” Nevertheless, mild trimming and decontamination are recommended for metagenomic reads with high quality scores, to remove sample processing and sequencing artifacts.
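As an illustration of the force, k-mer, and quality trimming named in the Methods, the following is a minimal sketch of how a BBDuk command line could be assembled for different thresholds. The function name `bbduk_command`, the file names, and the specific values (`k=23`, `mink=11`, `hdist=1`, `minlen=50`) are assumptions chosen for illustration; they are not the JGI settings or the settings used in this study. The sketch only builds the argument list and does not run BBDuk. Decontamination would be a separate BBDuk pass and is not shown.

```python
import shlex


def bbduk_command(in1, in2, out1, out2, *, adapters=None, trimq=None,
                  force_trim_left=None, min_len=None):
    """Build a bbduk.sh argument list for paired reads (nothing is executed)."""
    args = ["bbduk.sh", f"in={in1}", f"in2={in2}", f"out={out1}", f"out2={out2}"]
    if force_trim_left is not None:
        # Force trimming: ftl=N removes bases to the left of 0-based position N.
        args.append(f"ftl={force_trim_left}")
    if adapters is not None:
        # K-mer trimming: trim to the right of k-mers matching the reference.
        # The k, mink, and hdist values here are illustrative assumptions.
        args += [f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1"]
    if trimq is not None:
        # Quality trimming on both read ends at the given Phred threshold.
        args += ["qtrim=rl", f"trimq={trimq}"]
    if min_len is not None:
        # Discard reads shorter than min_len after trimming.
        args.append(f"minlen={min_len}")
    return args


# Hypothetical file names; an aggressive (Q20) quality-trimming variant.
aggressive = bbduk_command("raw_R1.fq", "raw_R2.fq", "q20_R1.fq", "q20_R2.fq",
                           trimq=20, min_len=50)
print(shlex.join(aggressive))
```

Keeping each trimming type behind its own optional argument makes it straightforward to generate the different combinations and thresholds that a comparison of this kind requires.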