Abstract
Background: Cancerlectins play an important role in cancer metastasis and tumor cell differentiation. A comprehensive understanding of their functions could therefore help guide future cancer treatment. Various computational methods have been proposed as auxiliary tools for distinguishing cancerlectin protein sequences, but these methods sometimes fail because of the large sequence diversity among cancerlectins.
Objective: The objective of this study is to provide an efficient predictor for identifying cancerlectins.
Method: Herein, we build a prediction model based on a support vector machine, which improves the sensitivity and accuracy of cancerlectin protein identification. Feature extraction and selection are performed by our proposed Split Bi-Profile Bayes (SBPB) scheme and a lasso algorithm, respectively.
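As a rough illustration of the feature-extraction idea, the following minimal sketch encodes a sequence with a plain bi-profile Bayes scheme: each residue is replaced by its position-specific frequency in the positive training set and, separately, in the negative training set, and the two vectors are concatenated. The abstract does not specify the "split" step of SBPB, so it is not reproduced here; lasso selection and SVM training are also omitted. The function names, the fixed window length, and the toy sequences are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(seqs, length):
    """Per-position amino-acid frequencies over the first `length` residues."""
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        total = sum(counts.values())
        profile.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return profile


def bi_profile_features(seq, pos_profile, neg_profile):
    """Concatenate the residue frequencies taken from both class profiles."""
    length = len(pos_profile)
    pos_part = [pos_profile[i].get(seq[i], 0.0) for i in range(length)]
    neg_part = [neg_profile[i].get(seq[i], 0.0) for i in range(length)]
    return pos_part + neg_part


# Toy data (assumption): equal-length fragments standing in for real sequences.
positives = ["ACDA", "ACEA"]
negatives = ["GCDA", "GKDT"]

pos_profile = position_profile(positives, 4)
neg_profile = position_profile(negatives, 4)
features = bi_profile_features("ACDA", pos_profile, neg_profile)
print(features)
```

The resulting vector has twice the window length; in a pipeline like the one described, such vectors would be passed to feature selection and then to the classifier.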
Results: In jackknife cross-validation, our model (called iCanLec-SBPB) achieved a sensitivity of 81.36% and an accuracy of 83.25%.
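For readers unfamiliar with the evaluation protocol, the sketch below shows how jackknife (leave-one-out) cross-validation and the two reported metrics can be computed: sensitivity is TP / (TP + FN) and accuracy is the fraction of correct predictions. The nearest-neighbour classifier and the toy data are placeholders chosen only to keep the example self-contained; they are not the SVM model or the dataset used in the study.

```python
def jackknife_predict(features, labels, fit_predict):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(fit_predict(train_x, train_y, features[i]))
    return preds


def sensitivity_and_accuracy(labels, preds):
    """Return (TP / (TP + FN), correct / total) for binary labels 0/1."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return tp / (tp + fn), correct / len(labels)


def nearest_neighbour(train_x, train_y, x):
    """Placeholder classifier (assumption): label of the closest training point."""
    idx = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[idx]


# Toy one-dimensional data (assumption): 1 = cancerlectin, 0 = non-cancerlectin.
features = [1, 2, 8, 10, 11, 4]
labels = [0, 0, 0, 1, 1, 1]

preds = jackknife_predict(features, labels, nearest_neighbour)
sens, acc = sensitivity_and_accuracy(labels, preds)
print(preds, sens, acc)
```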
Conclusion: The results indicate that iCanLec-SBPB achieves higher sensitivity and accuracy than existing methods.
Keywords: Cancerlectins, comparison, cross-validation, feature extraction, prediction model, SVM.