Abstract
Background: Chromosomal DNA contains most of the genetic information of eukaryotes and plays an important role in the growth, development and reproduction of living organisms. Most chromosomal DNA sequences are known to wrap around histones, and distinguishing these DNA sequences from ordinary DNA sequences is important for understanding the genetic code of life. The main difficulty in this problem is feature selection: DNA sequences have no explicit features, and common representation methods, such as one-hot encoding, suffer from high dimensionality. Recently, deep learning models have been shown to extract useful features from input patterns automatically.
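To make the dimensionality concern concrete, the following is a minimal Python sketch of one-hot encoding for DNA. The function names, the A/C/G/T ordering and the example sequence are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only; names and alphabet order are assumptions,
# not the authors' code.
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Return one 4-element indicator vector per nucleotide."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        row[index[base]] = 1  # raises KeyError for symbols outside A/C/G/T
        encoded.append(row)
    return encoded


def one_hot_kmer_dimension(k):
    """Length of a one-hot vector over all possible k-mers, i.e. 4**k."""
    return len(ALPHABET) ** k


if __name__ == "__main__":
    print(one_hot_encode("ACGT"))
    for k in (1, 3, 6):
        print(k, one_hot_kmer_dimension(k))
```

At the nucleotide level a sequence of length L becomes an L x 4 sparse matrix. If longer subsequences (k-mers) are treated as the unit instead, each one-hot vector needs 4^k entries, which is already 4096 for k = 6; a dense learned embedding avoids this growth.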
Objective: We aim to investigate which deep learning networks could achieve notable improvements in the field of DNA sequence classification using only sequence information.
Methods: In this paper, we present four different deep learning architectures built from convolutional neural networks and long short-term memory networks for chromosomal DNA sequence classification. A natural language model (Word2vec) was used to generate word embeddings of the sequences, from which the deep learning models learn features.
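One common way to treat a DNA sequence as text is to split it into overlapping k-mers that play the role of words. The abstract does not state how words were formed, so the k-mer scheme, the function names and the default values of `k` and `stride` in this sketch are assumptions made purely for illustration.

```python
# Illustrative sketch only; the k-mer "word" definition is an assumption,
# not a detail confirmed by the abstract.

def kmer_tokenize(sequence, k=3, stride=1):
    """Split a DNA string into overlapping k-mer tokens."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_corpus(sequences, k=3, stride=1):
    """Turn each sequence into a 'sentence' (a list of k-mer tokens)."""
    return [kmer_tokenize(seq, k, stride) for seq in sequences]


if __name__ == "__main__":
    corpus = build_corpus(["ACGTAC", "ttgaca"], k=3)
    for sentence in corpus:
        print(sentence)
```

The resulting corpus is a list of token lists, which is the input shape that Word2vec implementations such as gensim's `Word2Vec` accept as sentences. Each k-mer would then be mapped to a dense vector, and a sequence becomes a matrix of those vectors.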
Results: The four architectures were compared on 10 chromosomal DNA datasets. The results show that the architecture combining convolutional neural networks with long short-term memory networks is superior to the other methods with regard to the accuracy of chromosomal DNA prediction.
Conclusion: In this study, four deep learning models were compared for the automatic classification of chromosomal DNA sequences without any sequence preprocessing steps. In particular, we regarded DNA sequences as natural language and extracted word embeddings with Word2vec to represent them. The results show that the CNN+LSTM model is superior in all ten classification tasks. The reason for this success is that the CNN module captures the regulatory motifs, while the subsequent LSTM layer captures the long-term dependencies between them.
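The division of labour between the two modules can be sketched as a forward pass in NumPy: convolution filters scan short windows of the embedded sequence (motif-like patterns), and an LSTM then reads the pooled filter responses in order. This is a generic illustration with random, untrained weights. The layer order (convolution, max pooling, LSTM, sigmoid output), all sizes, the binary output and every name below are assumptions, not the authors' architecture or hyperparameters.

```python
# Illustrative sketch only; layer sizes, names and the binary output
# are assumptions, and the weights are random rather than trained.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution over positions, then ReLU.
    x: (T, D), kernels: (F, W, D), bias: (F,) -> (T - W + 1, F)."""
    n_filters, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + width]  # (W, D) local window, the motif-sized view
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool1d(x, size):
    """Non-overlapping max pooling along the position axis."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, x.shape[1]).max(axis=1)


def lstm_last_state(x, W, U, b):
    """Run an LSTM over x (T, F) and return the final hidden state (H,).
    W: (F, 4H), U: (H, 4H), b: (4H,); gates ordered input, forget, output, cell."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g        # cell state carries information across positions
        h = o * np.tanh(c)
    return h


def init_params(embed_dim, n_filters=8, width=5, pool=2, hidden=16, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(0.0, 0.1, (n_filters, width, embed_dim)),
        "conv_bias": np.zeros(n_filters),
        "pool": pool,
        "W": rng.normal(0.0, 0.1, (n_filters, 4 * hidden)),
        "U": rng.normal(0.0, 0.1, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "w_out": rng.normal(0.0, 0.1, hidden),
        "b_out": 0.0,
    }


def cnn_lstm_forward(x, params):
    """Embedded sequence (T, D) -> probability of the positive class."""
    feats = conv1d_relu(x, params["kernels"], params["conv_bias"])
    feats = max_pool1d(feats, params["pool"])
    h = lstm_last_state(feats, params["W"], params["U"], params["b"])
    return sigmoid(h @ params["w_out"] + params["b_out"])


if __name__ == "__main__":
    embedded = np.random.default_rng(1).normal(size=(50, 10))  # 50 tokens, 10-dim
    print(cnn_lstm_forward(embedded, init_params(embed_dim=10)))
```

In practice such a model would be built and trained in a deep learning framework; the explicit loops here are only meant to show where local pattern detection ends and sequential modelling begins.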
Keywords: Convolutional neural network (CNN), long short-term memory network (LSTM), DNA sequence classification, eukaryotes, chromosomes, hybrid.
Current Bioinformatics
Title: Classification of Chromosomal DNA Sequences Using Hybrid Deep Learning Architectures
Volume: 15 Issue: 10
Author(s): Zhihua Du*, Xiangdong Xiao and Vladimir N. Uversky*
Affiliation:
- Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, China
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, Florida (V.N.U.), United States
Cite this article as:
Du Zhihua*, Xiao Xiangdong and Uversky N. Vladimir*, Classification of Chromosomal DNA Sequences Using Hybrid Deep Learning Architectures, Current Bioinformatics 2020; 15(10). https://dx.doi.org/10.2174/1574893615666200224095531
DOI: https://dx.doi.org/10.2174/1574893615666200224095531
Print ISSN: 1574-8936
Publisher Name: Bentham Science Publisher
Online ISSN: 2212-392X