Abstract
Background: Protein secondary structure is vital for predicting tertiary structure, which in turn is essential for determining protein function and for drug design. There is therefore a strong need for computational methods that predict secondary structure from the primary sequence. A protein's primary sequence is a linear string over the twenty amino acid characters and contains the contextual information needed for secondary structure prediction.
Objective and Methods: Protein secondary structure is predicted from the primary sequence using a deep recurrent neural network. Secondary structure depends on both local and long-range residues in the primary sequence. In the proposed work, the local contextual information of amino acid residues is captured with character n-grams, and a dense embedding vector represents this local context. A bidirectional long short-term memory (Bi-LSTM) model then captures long-range context by extracting information from both past and future residues in the sequence.
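As an illustration of the character n-gram step, the following minimal sketch splits a primary sequence into overlapping n-grams and maps each one to an integer index of the kind an embedding layer would consume. The function names, the choice of n = 3, the example sequence, and the reserved indices (0 for padding, 1 for unseen n-grams) are assumptions made for illustration; they are not taken from the proposed model.

```python
PAD_INDEX = 0      # assumed index reserved for padding
UNKNOWN_INDEX = 1  # assumed index for n-grams not seen when building the vocabulary


def char_ngrams(sequence, n=3):
    """Return the overlapping character n-grams of a sequence, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_vocab(sequences, n=3):
    """Assign each distinct n-gram an integer index, starting after the reserved ones."""
    vocab = {}
    for seq in sequences:
        for gram in char_ngrams(seq, n):
            vocab.setdefault(gram, len(vocab) + 2)
    return vocab


def encode(sequence, vocab, n=3):
    """Convert a sequence into the list of indices of its n-grams."""
    return [vocab.get(gram, UNKNOWN_INDEX) for gram in char_ngrams(sequence, n)]


# Hypothetical five-residue fragment, used only to show the shape of the output.
vocab = build_vocab(["MKVLA"], n=3)
indices = encode("MKVLA", vocab, n=3)
```

Each index would then select one row of a trainable embedding matrix, giving the dense vector that represents the local context around a residue.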
Results: The proposed deep recurrent architecture is evaluated on three datasets: ss.txt, RS126, and CASP9. The model achieves Q3 accuracies of 88.45%, 83.48%, and 86.69% on ss.txt, RS126, and CASP9, respectively. Its performance is also compared with other state-of-the-art methods from the literature.
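For reference, Q3 accuracy is conventionally the percentage of residues whose three-state label (helix, strand, or coil) is predicted correctly. The sketch below computes it for a single sequence; the function name, the H/E/C letter coding, and the example strings are illustrative assumptions and are not data from the evaluation reported above.

```python
def q3_accuracy(true_states, predicted_states):
    """Percentage of positions where the predicted three-state label matches the true one."""
    if len(true_states) != len(predicted_states):
        raise ValueError("label strings must have the same length")
    if not true_states:
        raise ValueError("label strings must not be empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


# Hypothetical labels for an eight-residue fragment (H = helix, E = strand, C = coil).
example_q3 = q3_accuracy("HHHEECCC", "HHHECCCC")
```

In this made-up example, seven of the eight residues match, so the score is 87.5%.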
Conclusion: The comparative analysis shows that the proposed model performs better than the state-of-the-art methods it was compared with.
Keywords: Proteomics, protein secondary structure, amino acid sequence, character n-gram embedding, deep learning, bidirectional long short-term memory.