Protein Secondary Structure Prediction Using Character Bi-gram Embedding and Bi-LSTM

Ashish Kumar Sharma; Rajeev Srivastava

doi:10.2174/1574893615999200601122840

Abstract

Background: Protein secondary structure is vital to predicting the tertiary structure, which is essential for determining protein function and for drug design. There is therefore a strong need for computational methods that predict secondary structure from the primary sequence. Protein primary sequences are represented as linear sequences of twenty amino acid characters and contain the contextual information needed for secondary structure prediction.

Objective and Methods: Protein secondary structure is predicted from primary sequences using a deep recurrent neural network. Secondary structure depends on both local and long-range residues in the primary sequence. In the proposed work, the local contextual information of amino acid residues is captured with character n-grams, and a dense embedding vector represents this local context. A bidirectional long short-term memory (Bi-LSTM) model then captures long-range context by extracting information from both past and future residues in the primary sequence.
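The local-context step described above can be illustrated with a minimal sketch of character bi-gram tokenization. It shows only how a primary sequence might be turned into integer indices that an embedding layer could consume. Everything specific here is an assumption rather than the authors' implementation: the use of overlapping bi-grams, the vocabulary ordering, the reserved padding and unknown indices, and the names `BIGRAM_INDEX`, `bigrams`, and `encode`.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical vocabulary layout: index 0 is reserved for padding and
# index 1 for bi-grams containing a non-standard character.
PAD, UNK = 0, 1
BIGRAM_INDEX = {
    a + b: i
    for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2), start=2)
}


def bigrams(sequence):
    """Return the overlapping character bi-grams of a primary sequence."""
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


def encode(sequence, max_len):
    """Map a sequence to a fixed-length list of bi-gram indices."""
    ids = [BIGRAM_INDEX.get(gram, UNK) for gram in bigrams(sequence)]
    ids = ids[:max_len]                        # truncate long sequences
    return ids + [PAD] * (max_len - len(ids))  # right-pad short ones


if __name__ == "__main__":
    print(bigrams("MKVLA"))
    print(encode("MKVLA", max_len=8))
```

In a full model, index lists like these would typically be passed to a trainable embedding layer, whose dense vectors then feed the Bi-LSTM. Note that a sequence of length n yields n - 1 overlapping bi-grams, so aligning bi-grams with per-residue labels needs an explicit choice that the abstract does not specify.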

Results: The proposed deep recurrent architecture is evaluated on three datasets: ss.txt, RS126, and CASP9. The model achieves Q3 accuracies of 88.45%, 83.48%, and 86.69% on ss.txt, RS126, and CASP9, respectively. Its performance is also compared with other state-of-the-art methods available in the literature.
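For readers unfamiliar with the metric, Q3 accuracy is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below assumes that standard definition; the function name `q3_accuracy` and the string-based label format are illustrative choices, and the abstract does not say how padding or masked residues were handled.

```python
def q3_accuracy(true_states, predicted_states):
    """Percentage of residues whose 3-state label is predicted correctly.

    Both arguments are equal-length strings (or sequences) over the
    three states H (helix), E (strand), and C (coil).
    """
    if len(true_states) != len(predicted_states):
        raise ValueError("label sequences must have the same length")
    if len(true_states) == 0:
        raise ValueError("label sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


if __name__ == "__main__":
    # Toy labels, not data from the paper: 7 of 8 residues match.
    print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```

A dataset-level score is usually computed over all residues pooled across proteins rather than by averaging per-protein values, though which variant the authors report is not stated here.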

Conclusion: The comparative analysis shows that the proposed model performs better than the state-of-the-art methods considered.

Keywords: Proteomics, protein secondary structure, amino acid sequence, character n-gram embedding, deep learning, bidirectional long short-term memory.

Journal: Current Bioinformatics
Publisher Name: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893615999200601122840
Print ISSN: 1574-8936
Online ISSN: 2212-392X
