Abstract
Background: A protein's function is closely related to its location within the cell, so determining its subcellular location helps uncover its function. However, traditional biological experiments for determining subcellular location are costly and inefficient, and cannot meet current needs. In recent years, many computational models have been built to identify the subcellular location of proteins. Most of these models use features derived from protein sequences. Recently, features extracted from protein-protein interaction (PPI) networks have become popular in studying various protein-related problems.
Objective: This study proposes a novel model that uses features derived from multiple PPI networks to predict protein subcellular location.
Methods: Protein features were obtained by a newly designed network embedding algorithm, Mnode2vec, which is a generalized version of the classic Node2vec algorithm. Two classic classification algorithms, support vector machine and random forest, were employed to build the model.
Results: The model performed well and was superior to a model with features extracted by Node2vec. It also outperformed some classic models. Furthermore, Mnode2vec was found to produce powerful features when the path length was small.
Conclusion: The proposed model can be a powerful tool to determine protein subcellular location, and Mnode2vec can efficiently extract informative features from multiple networks.
Keywords: Protein subcellular location, multiple networks, network embedding algorithm, Mnode2vec, node2vec, classification algorithm.
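The abstract does not spell out how Mnode2vec combines several networks, so the following is only a minimal sketch of one plausible way to sample random walks over multiple PPI networks, not the authors' algorithm. It assumes a uniform choice among the networks in which the current protein has neighbors, followed by a uniform choice of neighbor; the function names, the toy networks, and the parameters `num_walks` and `walk_length` are all hypothetical. In Node2vec-style methods, such walks are then passed to a skip-gram model to obtain node features.

```python
import random


def multi_network_walk(networks, start, walk_length, rng):
    """Sample one walk of at most `walk_length` nodes across several networks.

    `networks` is a list of adjacency dicts {node: [neighbors]} that share
    one set of protein identifiers. This walk rule is an assumption made
    for illustration, not the published Mnode2vec procedure.
    """
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        # Keep only the networks in which the current node has a neighbor.
        usable = [net for net in networks if net.get(current)]
        if not usable:
            break  # the node is isolated in every network
        net = rng.choice(usable)               # pick a network uniformly
        walk.append(rng.choice(net[current]))  # then a neighbor uniformly
    return walk


def sample_walks(networks, num_walks, walk_length, seed=0):
    """Start `num_walks` walks from every node that appears in any network."""
    rng = random.Random(seed)
    nodes = sorted({node for net in networks for node in net})
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(multi_network_walk(networks, node, walk_length, rng))
    return walks


# Two toy PPI networks over the same proteins (hypothetical data).
ppi_a = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"]}
ppi_b = {"P1": ["P3"], "P3": ["P1", "P4"], "P4": ["P3"]}

walks = sample_walks([ppi_a, ppi_b], num_walks=2, walk_length=4)
```

A short `walk_length` keeps each walk close to its starting protein, which is the setting the Results describe as producing the most powerful features.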