
Combinatorial Chemistry & High Throughput Screening

ISSN (Print): 1386-2073
ISSN (Online): 1875-5402

Research Article

Investigating the Precise Identification of Citrullination Sites with High-Performance Score Metrics Using a Powerful Computation Predicting Tool

Author(s): Fee Faysal Ahmed*, Anamika Podder, Md. Farhad Bulbul, Md. Amzad Hossain, Mahedi Hasan, Md. Abdur Rauf Sarkar and Daijin Kim*

Volume 27, Issue 9, 2024

Published on: 25 September, 2023

Pages: 1381-1393 (13 pages)

DOI: 10.2174/1386207326666230912151932

Abstract

Background: Predicting protein citrullination sites (PCSs) is essential for elucidating the molecular mechanisms of citrullination and for designing drugs against major human diseases. Identifying PCSs experimentally is time-consuming and costly, yet current computational PCS predictors are limited in scope, and most of the commonly used ones achieve only modest performance scores.

Objective: This work aims to provide an improved predictor of citrullination sites, built with machine learning on a benchmark dataset.

Methods: This study presents a reliable citrullination site predictor based on a benchmark dataset containing positive and negative samples in a 1:1 ratio. We encoded the sequences with the Composition of K-Spaced Amino Acid Pairs (CKSAAP) and classified citrullination sites with a Support Vector Machine (SVM).
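
CKSAAP turns each fixed-length peptide window into a vector of normalized residue-pair frequencies. The following is a minimal Python sketch of the standard encoding, not the authors' code. The 20-letter alphabet, the skipping of non-standard residues such as a padding `X`, and the default `k_max=5` are assumptions, since the abstract gives neither the window length nor the maximum gap. For each gap k from 0 to `k_max`, the sketch counts every ordered pair of residues separated by k positions and divides by the number of valid pairs at that gap, yielding 400 × (k_max + 1) features.

```python
from itertools import product

# Assumption: the 20 standard amino acids. Pairs involving any other
# character (e.g. a padding "X" at sequence termini) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window: str, k_max: int = 5) -> list[float]:
    """Encode a peptide window as CKSAAP features (400 * (k_max + 1) values).

    For each gap k in 0..k_max, count every ordered pair (a_i, a_{i+k+1})
    and normalize by the number of valid pairs found at that gap.
    """
    window = window.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # i runs over every start position whose partner i+k+1 is in range.
        for i in range(len(window) - k - 1):
            a, b = window[i], window[i + k + 1]
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
                total += 1
        # Guard against windows with no valid pair at this gap.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    # Toy window for illustration; real windows would be centred on an arginine.
    vec = cksaap("ARNDCR", k_max=1)
    print(len(vec))  # 800
```

In a full pipeline, the vectors for the positive and negative windows would then be passed to an SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; the kernel and hyperparameters used in the study are not stated in the abstract.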

Results: We developed PCS predictors using integrated machine-learning methods that produced the highest average scores. Under 10-fold cross-validation on the test datasets, the True Positive Rate (TPR) was 98.34%, the True Negative Rate (TNR) was 99.44%, the accuracy was 98.89%, the Matthews Correlation Coefficient (MCC) was 98.21%, the Area Under the ROC Curve (AUC) was 0.999, and the partial Area Under the ROC Curve (pAUC) was 0.1968.
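
The scores reported in the Results can be computed from a classifier's outputs as in the minimal standard-library sketch below: TPR, TNR, accuracy and MCC from a binary confusion matrix, and the (partial) area under the ROC curve from a threshold sweep using the trapezoidal rule. This is an illustration, not the authors' evaluation code. The abstract does not state the false-positive-rate range used for its pAUC, so any `max_fpr` value passed below is an assumption; note that an unnormalized pAUC over FPR in [0, 0.2] can be at most 0.2. The sketch also assumes both classes are present.

```python
import math


def threshold_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """TPR, TNR, accuracy and MCC from a binary confusion matrix."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any margin is zero; 0.0 is returned by convention here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"TPR": tpr, "TNR": tnr, "ACC": acc, "MCC": mcc}


def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """ROC curve as (FPR, TPR) points, sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Tied scores share one threshold, so consume all of them at once.
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(labels: list[int], scores: list[float], max_fpr: float = 1.0) -> float:
    """Unnormalized trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    With the default max_fpr=1.0 this is the ordinary AUC.
    """
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate TPR linearly at the cut-off to clip the last trapezoid.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Toy data, not from the study.
    print(threshold_metrics(tp=90, fn=10, tn=80, fp=20))
    print(partial_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
```

On a 1:1 benchmark, accuracy equals the mean of TPR and TNR, because ACC = (TPR·P + TNR·N)/(P + N) with P = N; this is consistent with the reported figures, since (98.34% + 99.44%)/2 = 98.89%.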

Conclusion: In overall performance, our predictor significantly outperforms current tools on the same benchmark dataset, and it achieved better performance metrics on both the training and test datasets. It is promising and can serve as a complementary tool for the fast and precise identification of citrullination sites.

© 2024 Bentham Science Publishers