Abstract
Deep learning is opening up directions once considered out of reach. Recently, it has made inroads into biology, where it is used to handle the diverse patterns in data derived from biomolecules. Convolutional neural networks (CNNs), among the most widely used and effective deep learning architectures, can uncover hidden patterns in these data, especially in biological sequences. CNN-based models have been reported to outperform traditional bioinformatics tools on several long-standing sequence analysis tasks.
This work provides an introduction to the basics of convolutional neural network architecture, its working principles, and how it can be applied to biological sequence analysis.
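To make the idea of applying a convolutional neural network to a biological sequence concrete, the following is a minimal sketch in plain Python, not the method of any specific tool reviewed here. It one-hot encodes a protein sequence over the 20 standard amino acids, slides a single convolutional filter along it, and applies a ReLU followed by global max pooling, yielding one feature per filter. In a real network the filter weights are learned during training; here the filter, the helper names `one_hot`, `conv1d` and `relu_global_max_pool`, and the example sequence are hypothetical and set by hand for illustration.

```python
# Minimal sketch (hypothetical helpers): one-hot encoding + one 1D conv filter
# + ReLU + global max pooling over a protein sequence.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as an L x 20 list of one-hot rows.

    Residues outside the standard alphabet (e.g. 'X') become all-zero rows.
    """
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1D convolution (no padding) of an L x C input with a K x C kernel.

    As in most deep learning libraries, this is a sliding dot product
    (cross-correlation). Returns L - K + 1 scores, one per window position.
    """
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = bias
        for offset in range(k):
            for c, w in enumerate(kernel[offset]):
                s += w * x[start + offset][c]
        out.append(s)
    return out


def relu_global_max_pool(scores):
    """Apply ReLU to each score, then keep the maximum (one feature per filter)."""
    return max(max(0.0, s) for s in scores)


# Hypothetical hand-set filter of width 3 that rewards the tripeptide "GKS"
# and slightly penalises every other residue at each position.
kernel = [[-0.1] * len(AMINO_ACIDS) for _ in range(3)]
for pos, aa in enumerate("GKS"):
    kernel[pos][AA_INDEX[aa]] = 1.0

seq = "MAGAGKSTLL"
scores = conv1d(one_hot(seq), kernel)
feature = relu_global_max_pool(scores)
print(len(scores), feature)  # prints "8 3.0"
```

Global max pooling makes the output independent of where the motif occurs in the sequence, which is one reason this pattern is common in sequence-classification CNNs; stacking many such filters gives the feature vector that later dense layers consume.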
The critical steps of a deep learning workflow are described in detail: data preprocessing, architecture design, model training, hyperparameter tuning, and evaluation. A comparative analysis of convolutional neural network architectures developed for protein family classification is also presented.
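As an illustration of the evaluation step for a multi-class task such as protein family classification, the sketch below computes accuracy and macro-averaged precision, recall and F1 from true and predicted labels in plain Python. It is a generic example under the assumption of single-label classification, not the evaluation protocol of any study compared in this review; the function name `classification_metrics` and the family labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (single-label, multi-class)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true class t
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    # Macro averaging weights every family equally, regardless of its size.
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Hypothetical predictions for five sequences from three families.
y_true = ["famA", "famA", "famB", "famB", "famC"]
y_pred = ["famA", "famB", "famB", "famB", "famC"]
m = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.8, 'macro_precision': 0.889, 'macro_recall': 0.833, 'macro_f1': 0.822}
```

Macro averaging is shown because protein family datasets are often imbalanced, and plain accuracy can hide poor performance on small families.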
This review aims to clarify the concepts behind deep learning architectures and their applications in biological sequence analysis. In particular, it seeks to lower the barrier to understanding deep learning concepts and their implementation for readers whose background is primarily in biology.