Abstract
Knowing the subcellular location of a protein helps to elucidate its function in vivo, because a protein can perform its role properly only if it is located in the appropriate subcellular compartment. Since determining protein subcellular localization purely by conventional biotechnology experiments is both time-consuming and costly, computational methods play an important complementary role. Although a number of computational methods have been developed for predicting protein subcellular localization, it remains a challenge to deal with multiplex proteins, which may simultaneously exist at, or move between, two or more different locations. Here, a new predictor called Sort-PLoc was developed to tackle this problem. The key step was to select, by the Incremental Feature Selection method, the protein domains used to encode the protein samples. In each prediction, the candidate subcellular locations were sorted in descending order of their likelihood of being a site where the query protein resides. Based on the selected domain set, the contributions of Gene Ontology (GO) terms and domains to the prediction were analyzed, which may provide useful insights for the relevant areas. For the convenience of experimental scientists, a user-friendly web server for Sort-PLoc was established and is freely accessible to the public at http://yscl.biosino.org/.
Keywords: Multiple subcellular locations, incremental feature selection, Sort-PLoc, Gene Ontology, nearest neighbor algorithm, amino acid, pseudo amino acid composition, macromolecular complex
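
The two steps named in the abstract, incremental feature selection over protein domains and sorting of locations by likelihood, can be illustrated with a minimal Python sketch. It is not the Sort-PLoc implementation: the abstract does not state how domains are ranked or how a candidate domain set is scored, so the `evaluate` callback, the toy scoring rule, the domain names (`domA` to `domD`), and the likelihood values are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add pre-ranked features one at a time and keep the best-scoring prefix.

    `evaluate` is an assumed callback returning a quality score (higher is
    better) for a feature subset, e.g. a cross-validated accuracy.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        # Strict comparison: on ties the earlier, smaller subset is kept.
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


def sort_locations(likelihoods: Dict[str, float]) -> List[str]:
    """Return location names ordered from most to least likely."""
    ordered = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    return [location for location, _ in ordered]


# Toy data (hypothetical): two informative domains, two uninformative ones.
INFORMATIVE = {"domA", "domB"}


def toy_evaluate(subset: List[str]) -> float:
    # Reward informative domains and lightly penalize subset size.
    hits = sum(1 for feature in subset if feature in INFORMATIVE)
    return hits - 0.1 * len(subset)


selected, score = incremental_feature_selection(
    ["domA", "domB", "domC", "domD"], toy_evaluate
)
ranking = sort_locations(
    {"nucleus": 0.2, "cytoplasm": 0.7, "mitochondrion": 0.1}
)
print(selected)
print(ranking)
```

In this toy run the selection stops improving once the uninformative domains are added, so only the first two domains are retained. Returning a full ordering of locations rather than a single label is what lets a multiplex protein be associated with more than one site.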