Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

General Research Article

SVM-Root: Identification of Root-Associated Proteins in Plants by Employing the Support Vector Machine with Sequence-Derived Features

Author(s): Prabina Kumar Meher*, Siddhartha Hati, Tanmaya Kumar Sahu, Upendra Pradhan, Ajit Gupta and Surya Narayan Rath

Volume 19, Issue 1, 2024

Published on: 22 December, 2023

Pages: 91-102 (12 pages)

DOI: 10.2174/1574893618666230417104543

Abstract

Background: Root traits are desirable targets in modern plant breeding programs because roots play a pivotal role in plant growth and development. Identifying the genes that govern root traits is therefore an essential research component. For the identification of root-associated genes/proteins, the existing wet-lab experiments are resource-intensive and gene expression studies are species-specific. We therefore propose a supervised learning-based computational method for the identification of root-associated proteins.

Methods: The problem was formulated as a binary classification in which root-associated proteins and non-root-associated proteins constituted the two classes. Four machine learning algorithms, namely support vector machine (SVM), extreme gradient boosting, random forest, and adaptive boosting, were employed to classify the proteins of the two classes. Five sequence-derived feature sets, namely amino acid composition (AAC), dipeptide composition (DPC), composition-transition-distribution (CTD), pseudo amino acid composition (PAAC), and autocorrelation function (ACF), were used as input to the learning algorithms.
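To make the two simplest feature sets concrete, the following is a minimal Python sketch of AAC (20 values per sequence) and DPC (400 values per sequence). It is an illustration only, not the authors' implementation: the function names `aac`, `dpc`, and `aac_dpc` are hypothetical, and it assumes that non-standard residues are simply left uncounted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Assumption: non-standard residues (e.g. X, B, Z) count toward the
    # length but contribute to no feature, so the values may sum to < 1.
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each ordered residue pair (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # overlapping window of width 2
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


def aac_dpc(seq):
    """Concatenate AAC and DPC into a single 420-dimensional feature vector."""
    return aac(seq) + dpc(seq)
```

A call such as `aac_dpc("MKTAYIAKQR")` returns a list of 420 floats that could be passed to any of the classifiers named above; CTD, PAAC, and ACF additionally require physico-chemical property tables and are not sketched here.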

Results: SVM with 250 features selected from the AAC+DPC+CTD combination achieved higher accuracy than the other combinations of feature sets and learning algorithms. Specifically, SVM with the selected features achieved overall accuracies of 0.74, 0.73, and 0.73 when evaluated with single 5-fold cross-validation (5F-CV), repeated 5F-CV, and an independent test set, respectively.
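The difference between single and repeated 5F-CV can be sketched in Python as follows. This is a hedged illustration, not the authors' evaluation code: the helper names, the stratified fold assignment, the pooling of correct predictions across folds, and the default of 10 repeats are all assumptions, since the abstract does not state these details. The `fit_predict` argument is a hypothetical callable that trains any classifier on the training fold and returns predicted labels for the held-out fold.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin
    return folds


def cv_accuracy(X, y, fit_predict, k=5, seed=0):
    """Single k-fold CV: correct held-out predictions divided by all samples."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def repeated_cv_accuracy(X, y, fit_predict, k=5, repeats=10, seed=0):
    """Repeated k-fold CV: mean accuracy over several random partitions."""
    scores = [cv_accuracy(X, y, fit_predict, k, seed + r)
              for r in range(repeats)]
    return sum(scores) / len(scores)
```

Repeating the partitioning reduces the dependence of the estimate on one particular random split, which is why a repeated 5F-CV figure is usually reported alongside the single-run figure.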

Conclusions: A web-enabled prediction tool, SVM-Root (https://iasri-sg.icar.gov.in/svmroot/), has been developed for the computational prediction of root-associated proteins. The proposed model, the first of its kind, is expected to supplement the existing experimental methods as well as high-throughput GWAS and transcriptome studies.


© 2024 Bentham Science Publishers