Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Review Article

Convolutional Neural Networks: A Promising Deep Learning Architecture for Biological Sequence Analysis

Author(s): Chinju John, Jayakrushna Sahoo, Manu Madhavan* and Oommen K. Mathew

Volume 18, Issue 7, 2023

Published on: 19 June, 2023

Pages: 537-558 (22 pages)

DOI: 10.2174/1574893618666230320103421


Abstract

Deep learning now tackles problems once considered beyond the reach of automated analysis, and it has recently entered the biological domain to handle the diverse patterns present in biomolecular data. Convolutional neural networks, among the most widely used and effective deep learning architectures, can uncover hidden patterns in these data, especially in biological sequences. These networks outperform traditional bioinformatics tools on many long-standing sequence-analysis tasks.

This work provides an accessible introduction to the basics of the convolutional neural network architecture and how it can be applied to biological sequence analysis.

The approach followed in this paper gives the reader a clearer view of convolutional neural networks, their basic working principles, and how they apply to biological sequences.

A detailed view of the critical steps in a deep learning workflow is presented, covering data preprocessing, architecture design, model training, hyperparameter tuning, and evaluation metrics. A comparative analysis of convolutional neural network architectures developed for protein family classification is also provided.
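The core operation underlying the CNN architectures surveyed here can be sketched in a few lines: a biological sequence is one-hot encoded, and a convolutional filter slides along it, scoring each position for similarity to a learned motif, with max-pooling retaining the strongest match. The following toy illustration (the sequence, the "TAT" motif, and the hand-set filter are invented for demonstration; a real model learns its filters from data) shows the principle:

```python
import numpy as np

def one_hot(seq):
    """One-hot encode a DNA sequence: each base becomes a 4-dim vector (A, C, G, T)."""
    mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    arr = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        arr[i, mapping[base]] = 1.0
    return arr

def conv1d(x, w):
    """Slide the filter along the sequence; each position gets a motif-match score."""
    k = w.shape[0]
    return np.array([np.sum(x[i:i + k] * w) for i in range(len(x) - k + 1)])

# A width-3 filter that (by construction) scores matches to the motif "TAT".
motif_filter = one_hot("TAT")            # shape (3, 4)

scores = conv1d(one_hot("GGTATGCC"), motif_filter)
print(scores)                            # per-position motif-match scores
print(scores.max())                      # max-pooling: strongest match (3.0, the full motif at position 2)
```

In a trained network, many such filters run in parallel, their pooled activations feed dense layers, and the filter weights are fitted by gradient descent rather than set by hand.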

This review contributes to understanding the concepts behind deep learning architectures and their applications in biological sequence analysis. It can substantially lower the knowledge barrier to deep learning concepts and their implementation, especially for readers with a purely biological background.


