Abstract
Background: Bacteria are among the earliest known organisms, and their diversity and biology have long attracted interest, particularly in relation to human health. Recent advances in metagenomic next-generation sequencing (mNGS), a culture-independent sequencing technology, have accelerated progress in clinical microbiology and our understanding of pathogens.
Objective: Implementing mNGS in routine clinical practice requires a practical and scalable strategy for analyzing mNGS data. This study presents a robust, automated pipeline that analyzes clinical metagenomic data for pathogen identification and classification.
Methods: The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly framework written with the Snakemake workflow management system. It removes the need to manually install and configure the multiple command-line tools and their dependencies. The pipeline screens for pathogens directly from raw clinical reads and generates a consolidated report for each sample.
Results: The pipeline is demonstrated on publicly available data and was tested on a desktop Linux system and a high-performance computing cluster. The study compares the variability in results across different tools and tool versions, and the tool versions are user-modifiable. The pipeline outputs quality-control reports, filtered reads, host-subtracted reads, assembled contigs, assembly metrics, relative abundances of bacterial species, antimicrobial resistance genes, plasmid detection, and virulence factor identification. The results are evaluated by sensitivity and positive predictive value.
Conclusion: Clin-mNGS is an automated Snakemake pipeline validated for analyzing clinical microbial metagenomic reads to perform taxonomic classification and antimicrobial resistance prediction.
Keywords: Metagenomic analysis, clinical metagenomics, clinical diagnostics, snakemake, pathogen detection, taxonomic identification, antimicrobial drug resistance, virulence factor genes.
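The Results evaluate pipeline output by sensitivity and positive predictive value. As a minimal sketch of how those two metrics are commonly computed for a single sample, the Python function below compares a set of detected species against a set of expected species. The function name `sensitivity_and_ppv` and the example species lists are illustrative assumptions, not part of Clin-mNGS or its reported data.

```python
import math


def sensitivity_and_ppv(expected, detected):
    """Return (sensitivity, PPV) for one sample.

    expected: species known to be present (hypothetical ground truth).
    detected: species reported by a classifier (hypothetical output).
    """
    expected = set(expected)
    detected = set(detected)

    true_pos = len(expected & detected)   # present and reported
    false_neg = len(expected - detected)  # present but missed
    false_pos = len(detected - expected)  # reported but not present

    # Sensitivity = TP / (TP + FN); undefined when nothing is expected.
    sensitivity = (
        true_pos / (true_pos + false_neg) if expected else math.nan
    )
    # PPV = TP / (TP + FP); undefined when nothing is detected.
    ppv = true_pos / (true_pos + false_pos) if detected else math.nan
    return sensitivity, ppv


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    truth = {"Escherichia coli", "Klebsiella pneumoniae", "Staphylococcus aureus"}
    calls = {"Escherichia coli", "Staphylococcus aureus", "Pseudomonas aeruginosa"}
    sens, ppv = sensitivity_and_ppv(truth, calls)
    print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

Treating species calls as sets ignores abundance and counts each species once, which suits presence/absence evaluation; a read-level or abundance-weighted evaluation would need a different formulation.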