Abstract
Plant cells contain two major organelles, chloroplasts and mitochondria, which play key roles in energy metabolism and in regulating a number of other important processes. Proteins located in both chloroplasts and mitochondria presumably have distinct functions in each location, so knowledge of protein localization is vital. Hence, a web server (DualPred) was designed to predict plant proteins dual-targeted to chloroplasts and mitochondria, using a novel split protein-relatedness-measure feature and AdaBoost-J48 as the classifier. DualPred adopts a two-layer prediction scheme to distinguish plant proteins dual-targeted to chloroplasts and mitochondria from proteins localized elsewhere. DualPred was rigorously trained and tested with different benchmark datasets and a newly developed independent dataset. Its performance was assessed using K-fold cross-validation, detailed ROC analysis, the Matthews correlation coefficient, and the area under the ROC curve. In 10-fold cross-validation, DualPred achieved an overall accuracy of 85.0% on the new DT167 dataset and 91.9% on the benchmark dataset in predicting dual-targeted proteins. Validation with the independent dataset (using the DT167 dataset and the benchmark dataset, respectively, as the model) achieved overall accuracies of 89.2% and 86.8%. The Matthews correlation coefficient and the area under the ROC curve for the classifiers on the different datasets were also found to be significant. Based on the results of these validation tests, the novel feature representation was effective in distinguishing plant proteins dual-targeted to chloroplasts and mitochondria from proteins localized elsewhere. DualPred, the web server implementation of the algorithm written in PERL, can be accessed freely at http://pcmpred.bicpu.edu.in/predict.php.
Keywords: Adaptive boosting, Best-first method, Chloroplast, Dual-targeted proteins, Mitochondria, Split protein-relatedness-measure.
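
For readers less familiar with the evaluation measures named in the abstract, the following minimal sketch shows how overall accuracy and the Matthews correlation coefficient are conventionally computed from the confusion counts of a binary classifier (dual-targeted versus other). The function names and the counts in the usage lines are hypothetical and purely illustrative; they are not taken from DualPred or its datasets.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    by convention 0.0 is returned when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, for illustration only (not DualPred results).
example_acc = accuracy(tp=50, tn=40, fp=5, fn=5)
example_mcc = matthews_corrcoef(tp=50, tn=40, fp=5, fn=5)
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion counts in a balanced way, which makes it informative when the dual-targeted and non-dual-targeted classes differ in size.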