Abstract
Background: Neuropeptides are a class of bioactive peptides produced from neuropeptide precursors through a series of complex processing steps, and they mediate many aspects of neuronal regulation. Accurate identification of the cleavage sites of neuropeptide precursors is therefore of great significance for neuroscience and brain science.
Objective: With the explosive growth of neuropeptide precursor data, there is a pressing need for bioinformatics methods that predict the cleavage sites of neuropeptide precursors quickly and efficiently.
Methods: We first processed the neuropeptide precursor data from SwissProt and NeuroPedia into two datasets, a training dataset and a testing dataset. Six feature extraction schemes were then applied to generate different feature sets, and feature selection methods were used to find the optimal feature subset of each. A support vector machine was then used to build a model for each feature type. Finally, the performance of the models was evaluated on the independent testing dataset.
Results: Six models were built using the support vector machine. Among them, the enhanced amino acid composition-based model achieved the highest accuracy, 91.60%, in 5-fold cross-validation. When evaluated on the independent testing dataset, it also performed well, with an accuracy of 90.37% and an area under the receiver operating characteristic curve of 0.9576.
Conclusion: The developed model performed well in both cross-validation and independent testing. For users' convenience, an online web server called NeuroCS was built, which is freely available at http://i.uestc.edu.cn/NeuroCS/dist/index.html#/. NeuroCS can be used to predict the cleavage sites of neuropeptide precursors effectively.
Keywords: Cleavage sites, enhanced amino acid composition, machine learning, neuropeptide, neuropeptide precursor, support vector machine.
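The enhanced amino acid composition encoding named in the Results can be illustrated with a minimal sketch. It assumes the common sliding-window formulation, in which the amino acid composition is computed inside each fixed-size window moved along a fixed-length sequence fragment around a candidate site. The function name `eaac`, the window size of 5, the handling of non-standard residues, and the example fragment are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(fragment, window=5):
    """Sliding-window amino acid composition of a sequence fragment.

    For every window of `window` residues, emit 20 frequencies
    (one per standard amino acid). The window size is an assumed
    default, not a value reported in the paper.
    """
    if len(fragment) < window:
        raise ValueError("fragment is shorter than the window")
    features = []
    for start in range(len(fragment) - window + 1):
        counts = Counter(fragment[start:start + window])
        # Assumption: residues outside the 20 standard letters
        # (for example padding characters) are left out of the total.
        total = sum(counts[aa] for aa in AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            features.append(counts[aa] / total if total else 0.0)
    return features


# Hypothetical 6-residue fragment: 2 windows x 20 residues = 40 features.
vector = eaac("AAKRAA", window=5)
print(len(vector))
```

Each fragment becomes a vector of length 20 × (L − window + 1) for a fragment of L residues, which could then be passed to feature selection and a support vector machine as outlined in the Methods.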