Abstract
Background: Protein secondary structure prediction (PSSP) is a fundamental task in bioinformatics that is helpful for understanding the three-dimensional structure and biological function of proteins. Many neural network-based prediction methods have been developed for protein secondary structures. Deep learning and multiple features are two obvious means to improve prediction accuracy.
Objective: To promote the development of PSSP, a deep convolutional neural network-based method is proposed to predict both the eight-state and three-state of protein secondary structure.
Methods: In this model, sequence and evolutionary information of proteins are combined as multiple input features after preprocessing. A deep convolutional neural network with no pooling layer and connection layer is then constructed to predict the secondary structure of proteins. L2 regularization, batch normalization, and dropout techniques are employed to avoid over-fitting and obtain better prediction performance, and an improved cross-entropy is used as the loss function.
Results: Our proposed model can obtain Q3 prediction results of 86.2%, 84.5%, 87.8%, and 84.7%, respectively, on CullPDB, CB513, CASP10 and CASP11 datasets, with corresponding Q8 prediction results of 74.1%, 70.5%, 74.9%, and 71.3%.
Conclusion: We have proposed the DCNN-SS deep convolutional-network-based PSSP method, and experimental results show that DCNN-SS performs competitively with other methods.
Keywords: Convolutional neural network, deep learning, evolutionary information, multiple features, molecular structure prediction, protein secondary structure prediction.
Graphical Abstract