Abstract
Background: Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins, a subclass of lectins, play a key role in the interactions between tumor cells and are being employed as therapeutic agents. A full understanding of cancerlectins is significant because it can inform the future direction of cancer therapy.
Objective: To develop an accurate, practical and time-saving tool for identifying cancerlectins. A novel sequence-based method is proposed, together with a webserver that provides access to it.
Methods: First, protein features were extracted using a new feature construction scheme termed g-gap tripeptide composition. Next, a proposed cascade linear discriminant analysis (Cascade LDA) was used to alleviate the difficulties of high dimensionality, with analysis of variance (ANOVA) as the feature importance criterion. Finally, a support vector machine (SVM) was used as the classifier to identify cancerlectins.
Results: In jackknife cross-validation, the proposed method achieved an accuracy of 91.34%, a sensitivity of 89.89%, a specificity of 92.48% and a Matthews correlation coefficient of 0.8318 using only 13 fusion features. These results are superior to those of other published methods in this domain.
Conclusion: In this study, a new method based only on the primary structure of proteins is proposed, and experimental results show that it could be a promising tool for identifying cancerlectins. An open-access webserver is made available to facilitate related work.
Keywords: Cancerlectin, cascade LDA, g-gap tripeptide composition, SVM, protein, ANOVA.
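The feature extraction step named in the Methods can be illustrated with a minimal sketch. The abstract does not define g-gap tripeptide composition, so the definition used here is an assumption: a tripeptide is formed from the residues at positions i, i + g + 1 and i + 2(g + 1), and the feature vector holds the normalized frequency of each of the 20^3 = 8000 possible tripeptides. The function name `g_gap_tripeptide_composition` and the choice to skip windows containing non-standard residues are hypothetical. The ANOVA ranking, Cascade LDA and SVM steps are not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 8000 possible tripeptides, in a fixed order, and their vector positions.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}


def g_gap_tripeptide_composition(sequence, g=0):
    """Return an 8000-dimensional frequency vector for one protein sequence.

    Assumed definition (not taken from the abstract): each tripeptide is built
    from residues at positions i, i + g + 1 and i + 2 * (g + 1), so g = 0
    gives the ordinary contiguous tripeptide composition.
    """
    step = g + 1          # distance between consecutive sampled residues
    span = 2 * step       # distance from the first to the third residue
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - span):
        tri = seq[i] + seq[i + step] + seq[i + span]
        idx = INDEX.get(tri)
        if idx is not None:   # skip windows with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:            # sequence too short or no valid window
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


# Usage on a toy sequence (not real data): with g = 1 the sampled
# tripeptides of "ACDEFG" are "ADF" and "CEG".
features = g_gap_tripeptide_composition("ACDEFG", g=1)
```

Under this assumed definition each value of g yields 8000 features, which is the kind of high-dimensional input that a feature ranking and reduction stage would then have to compress.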