Identification of Human Protein Subcellular Location with Multiple Networks

Rui Wang; Lei Chen

doi:10.2174/1570164619666220531113704

Abstract

Background: A protein's function is closely related to its location within the cell, so determining its subcellular location helps uncover its functions. However, traditional biological experiments for determining subcellular location are costly and inefficient and cannot meet today’s needs. In recent years, many computational models have been built to identify the subcellular location of proteins. Most of these models use features derived from protein sequences. Recently, features extracted from protein-protein interaction (PPI) networks have become popular in studying various protein-related problems.

Objective: A novel model with features derived from multiple PPI networks was proposed to predict protein subcellular location.

Methods: Protein features were obtained by a newly designed network embedding algorithm, Mnode2vec, which is a generalized version of the classic Node2vec algorithm. Two classic classification algorithms, support vector machine and random forest, were employed to build the model.

Results: The model performed well and was superior to the model with features extracted by Node2vec. It also outperformed some classic models. Furthermore, Mnode2vec was found to produce powerful features when the path length was small.

Conclusion: The proposed model can be a powerful tool to determine protein subcellular location, and Mnode2vec can efficiently extract informative features from multiple networks.

Keywords: Protein subcellular location, multiple networks, network embedding algorithm, Mnode2vec, node2vec, classification algorithm.
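
The abstract does not spell out how Mnode2vec samples paths from several networks. The following is a minimal sketch of one plausible way to generate fixed-length random walks over multiple networks, not the published procedure. The switching rule (pick uniformly among the networks where the current node has neighbours, then pick a neighbour uniformly), the function names, and the toy protein IDs are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional

# A network maps a protein ID to the list of its neighbours.
# All IDs and edges below are made-up placeholders, not data from the paper.
Network = Dict[str, List[str]]


def multi_network_walk(networks: List[Network], start: str, length: int,
                       rng: Optional[random.Random] = None) -> List[str]:
    """Return one random walk with at most `length` nodes, starting at `start`.

    Assumed rule (illustrative only): at each step, choose uniformly among
    the networks in which the current node has neighbours, then move to a
    uniformly chosen neighbour in that network.
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        usable = [net for net in networks if net.get(current)]
        if not usable:  # no neighbours in any network: stop early
            break
        net = rng.choice(usable)
        walk.append(rng.choice(net[current]))
    return walk


def generate_walks(networks: List[Network], walks_per_node: int,
                   length: int, seed: int = 0) -> List[List[str]]:
    """Generate `walks_per_node` walks from every node seen in any network."""
    rng = random.Random(seed)
    nodes = sorted({node for net in networks for node in net})
    return [multi_network_walk(networks, node, length, rng)
            for node in nodes
            for _ in range(walks_per_node)]


# Two toy PPI networks over overlapping (hypothetical) proteins.
net_a: Network = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"]}
net_b: Network = {"P1": ["P3"], "P3": ["P1", "P4"], "P4": ["P3"]}

walks = generate_walks([net_a, net_b], walks_per_node=2, length=4, seed=0)
for walk in walks:
    print(" -> ".join(walk))
```

In Node2vec-style pipelines, walks like these are treated as sentences and passed to a skip-gram model to obtain one feature vector per node; those vectors would then feed a classifier such as a support vector machine or random forest. The `length` argument plays the role of the path length mentioned in the Results.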


Journal: Current Proteomics
Publisher: Bentham Science Publishers
Print ISSN: 1570-1646
Online ISSN: 1875-6247
DOI: https://dx.doi.org/10.2174/1570164619666220531113704
