Abstract
This paper presents a new predictor, “Gpos-mPLoc”, for identifying the subcellular localization of Gram-positive bacterial proteins by fusing gene ontology information with functional domain information and sequential evolution information. Compared with its predecessor Gpos-PLoc, the new predictor is more powerful and flexible; in particular, it can handle proteins with multiple locations, as indicated by the “m” preceding “PLoc” in its name. On a newly constructed stringent benchmark dataset in which no protein has ≥ 25% pairwise sequence identity to any other in the same subset (location), the overall jackknife success rate achieved by Gpos-mPLoc was 82.2%, about 10% higher than the corresponding rate of Gpos-PLoc. As a user-friendly web server, Gpos-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/Gpos-multi/.
Keywords: Multiplex protein, homology search, representative proteins, gene ontology, functional domain, sequential evolution, ensemble classifier, fusion approach