Abstract
Background: Next-generation sequencing (NGS) technologies are routinely used to generate high-throughput sequencing data, which calls for easy-to-use, GUI-based data analysis software. Such software could run in parallel with sequencing to analyze the data automatically. At present, few such software packages are available, and most of them are commercial, which creates a gap between data generation and data analysis.
Methods: GAAP is developed on the Node.js platform and uses HTML and JavaScript as the front end for communication with users. We have implemented the FastQC and Trimmomatic tools for quality checking and quality control. Velvet and Prodigal are integrated for genome assembly and gene prediction, respectively. Annotation is performed with remote NCBI BLAST and IPR-Scan. In the backend, we have used Perl and JavaScript for data processing. To evaluate the performance of GAAP, we assembled a viral (SRR11621811), a bacterial (SRR17153353) and a human (SRR16845439) genome.
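The order of the modules described above can be illustrated with a minimal sketch that only builds the command lines for each stage, without running them. It is not the GAAP implementation (which uses Node.js and Perl). The function name `build_pipeline_commands`, the single-end read layout, the k-mer size of 31, the trimming thresholds, the output file names and the availability of a `trimmomatic` wrapper executable on the PATH are all assumptions made for illustration.

```python
from pathlib import Path


def build_pipeline_commands(reads, workdir, kmer=31):
    """Return the command lines for a QC -> trim -> assemble -> predict run.

    Hypothetical helper: it assumes single-end FASTQ input and does not
    execute anything, so the commands can be inspected before use.
    """
    work = Path(workdir)
    trimmed = work / "trimmed.fastq"      # assumed output name
    assembly_dir = work / "velvet"        # Velvet writes contigs.fa here
    contigs = assembly_dir / "contigs.fa"

    return [
        # Quality check of the raw reads (output directory must exist).
        ["fastqc", str(reads), "-o", str(work)],
        # Quality control; the thresholds are illustrative assumptions.
        ["trimmomatic", "SE", "-phred33", str(reads), str(trimmed),
         "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # De novo assembly: hash the reads, then build the contigs.
        ["velveth", str(assembly_dir), str(kmer), "-fastq", "-short",
         str(trimmed)],
        ["velvetg", str(assembly_dir)],
        # Gene prediction on the assembled contigs.
        ["prodigal", "-i", str(contigs),
         "-a", str(work / "proteins.faa"),
         "-o", str(work / "genes.gbk")],
    ]


if __name__ == "__main__":
    for command in build_pipeline_commands("SRR11621811.fastq", "run1"):
        print(" ".join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` once the tools are installed; the annotation stage (remote BLAST and IPR-Scan) is left out because it depends on external services.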
Results: We used GAAP to assemble and annotate a SARS-CoV-2 (COVID-19) genome on a desktop computer, which resulted in a single contig of 27994 bp with 99.57% reference genome coverage. Gene prediction on this assembly yielded 11 genes, of which 10 were annotated using the annotation module of GAAP. We also assembled a bacterial and a human genome into 138 and 194281 contigs with N50 values of 100399 and 610, respectively.
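For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions; this is not the code GAAP uses to report the value.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")

    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy example: total = 300, half = 150; 100 + 80 = 180 >= 150.
    print(n50([100, 80, 60, 40, 20]))  # prints 80
    # A single-contig assembly has an N50 equal to its own length.
    print(n50([27994]))                # prints 27994
```

A higher N50 generally indicates a more contiguous assembly, which is consistent with the gap between the bacterial (100399) and human (610) results.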
Conclusion: In this study, we have developed a freely available, platform-independent genome assembly and annotation (GAAP) software (www.deepaklab.com/gaap). The software acts as a complete data analysis package, with modules for quality check, quality control, de novo genome assembly, gene prediction and annotation (BLAST, Pfam, GO term, pathway and enzyme mapping).
Keywords: NGS, software, GUI, genome assembly, gene prediction, annotation.