Abstract
Background: Predicting a protein's secondary structure from its amino acid sequence is an essential step towards predicting its 3-D structure. Prediction performance improves when information from multiple sequence alignments of homologous proteins is incorporated. However, homologous information is not available for all proteins, so it is necessary to predict protein secondary structure from single sequences.
Objective and Methods: Protein secondary structure is predicted from primary sequences using n-gram word embedding and a deep recurrent neural network. Protein secondary structure depends on both local and long-range neighboring residues in the primary sequence. In the proposed work, variable-length character n-gram words capture the local contextual information of amino acid residues, and an embedding vector represents each of these n-gram words. A bidirectional long short-term memory (Bi-LSTM) model then captures long-range context by extracting information from past and future residues in the primary sequence.
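The local-context step can be illustrated with a minimal sketch of variable-length character n-gram extraction. The function name `char_ngrams` and the n-gram length range (1 to 3) are assumptions made for illustration; the abstract does not state the lengths actually used, and the embedding and Bi-LSTM stages are not shown here.

```python
def char_ngrams(sequence, n_min=1, n_max=3):
    """Return the character n-grams that start at each residue position.

    n_min and n_max are illustrative defaults, not values from the paper.
    """
    grams = []
    for i in range(len(sequence)):
        for n in range(n_min, n_max + 1):
            # Skip n-grams that would run past the end of the sequence.
            if i + n <= len(sequence):
                grams.append(sequence[i:i + n])
    return grams


def build_vocabulary(grams):
    """Map each distinct n-gram to an integer index, in order of first appearance.

    Such indices are what an embedding layer would typically look up.
    """
    vocab = {}
    for gram in grams:
        if gram not in vocab:
            vocab[gram] = len(vocab)
    return vocab


if __name__ == "__main__":
    grams = char_ngrams("MKVL", n_min=1, n_max=2)
    print(grams)
    print(build_vocabulary(grams))
```

In a full pipeline, each integer index would select a learned embedding vector, and the resulting vector sequence would be fed to the recurrent model.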
Results: The proposed model is evaluated on three public datasets: ss.txt, RS126, and CASP9. It achieves Q3 accuracies of 92.57%, 86.48%, and 89.66% on ss.txt, RS126, and CASP9, respectively.
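For readers unfamiliar with the metric, Q3 accuracy is the percentage of residues whose predicted three-state label (helix, strand, or coil) matches the observed label. The sketch below assumes labels are given as strings over the letters H, E, and C; the function name `q3_accuracy` is hypothetical, and the example strings are made up for illustration rather than taken from any of the datasets.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted 3-state label equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("sequences must not be empty")
    # Count per-residue agreements between the two label strings.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Illustrative labels only: H = helix, E = strand, C = coil.
    print(q3_accuracy("HHEC", "HHCC"))
```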
Conclusion: The performance of the proposed model is compared with state-of-the-art methods available in the literature. The comparative analysis shows that the proposed model performs better than these methods.
Keywords: Proteomics, protein secondary structure, amino acid sequence, character n-gram embedding, deep learning, bidirectional long short-term memory.