Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Review Article

Integration of Artificial Intelligence, Machine Learning and Deep Learning Techniques in Genomics: Review on Computational Perspectives for NGS Analysis of DNA and RNA Seq Data

Author(s): Chandrashekar K., Vidya Niranjan*, Adarsh Vishal and Anagha S. Setlur

Volume 19, Issue 9, 2024

Published on: 23 January, 2024

Page: [825 - 844] Pages: 20

DOI: 10.2174/0115748936284044240108074937

Abstract

In genomics and biomedical research, Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) have emerged as paradigm shifters. Traditional NGS pipelines for DNA and RNA sequencing analysis have been sound in decoding genetic information, but the volume and complexity of sequencing data have surged, creating demand for more efficient and accurate methods of analysis. This has led to a growing dependency on AI/ML and DL approaches. This paper highlights how these tools and approaches help overcome the limitations of traditional analysis and generate better results. With pipeline automation and the integration of these tools into NGS DNA and RNA-seq pipelines, the quality of research can be improved, since large data sets can be processed using Deep Learning tools. Automation reduces labor-intensive tasks and lets researchers focus on other frontiers of research. In the traditional pipeline, all tasks from quality check to variant identification (in the case of SNP detection) take a large amount of computational time, and the researcher must enter commands manually, which invites human error. With automation, the whole process runs more smoothly and in comparatively less time, because an automated pipeline can process multiple files rather than the single file handled at a time in the traditional pipeline. In conclusion, this review sheds light on the transformative impact of integrating DL into traditional pipelines and its role in optimizing computational time. It also highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.

© 2024 Bentham Science Publishers