Abstract
Protein subcellular localization prediction aims to determine the location of a protein within a cell using computational methods. Knowing where viral proteins localize in a host cell or virus-infected cell is important because localization is closely related to their destructive tendencies and consequences. Predicting viral protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more subcellular location sites. Most existing subcellular localization methods specialized for viral proteins handle only single-location proteins. To better reflect the characteristics of multiplex proteins, a new predictor, called Virus-ECC-mPLoc, has been developed that can handle systems containing both singleplex and multiplex proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations, and it hybridizes gene ontology information with dipeptide composition information. It can be used to assign viral proteins to the following six locations: (1) viral capsid, (2) host cell membrane, (3) host endoplasmic reticulum, (4) host cytoplasm, (5) host nucleus, and (6) secreted. Experimental results show that the overall success rates obtained by Virus-ECC-mPLoc are 86.9% for the jackknife test and 87.2% for the independent data set test, which are significantly higher than those of any existing predictor. As a user-friendly web server, Virus-ECC-mPLoc is freely accessible to the public at http://levis.tongji.edu.cn:8080/bioinfo/Virus-ECC-mPLoc/.
Keywords: Protein subcellular localization, multi-label learning, classifier chain, multiplex proteins, computational method, viral protein, virus-infected cell, Virus-ECC-mPLoc, dipeptide, gene ontology
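As an illustration of the dipeptide composition information mentioned above, the following is a minimal sketch of how such a feature vector is commonly computed: the relative frequency of each of the 400 ordered pairs of adjacent standard amino acids in a sequence. The function name `dipeptide_composition`, the handling of non-standard residues, and the normalization are assumptions made for illustration; the abstract does not specify the exact encoding used by Virus-ECC-mPLoc.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, and a lookup from dipeptide to vector index.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Assumption (not from the paper): pairs containing a non-standard
    residue (e.g. 'X') are skipped, and frequencies are normalized by
    the number of counted pairs.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    # Slide over every pair of adjacent residues.
    for first, second in zip(seq, seq[1:]):
        idx = INDEX.get(first + second)
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or without any valid pair.
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]
```

A vector of this kind could then be concatenated with a gene ontology representation before being passed to a multi-label classifier; how the two are actually combined in the predictor is not described in this section.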