Predicting Drug Side Effects with Compact Integration of Heterogeneous Networks

Xian Zhao; Lei Chen; Zi-Han Guo; Tao Liu

doi:10.2174/1574893614666190220114644

Abstract

Background: Drug side effects are not only harmful to humans but also a major reason for the withdrawal of approved drugs, which poses considerable risk to pharmaceutical companies. However, detecting the side effects of a given drug through traditional experiments is time-consuming and expensive. In recent years, several computational methods have been proposed to predict drug side effects, but most of them cannot effectively integrate the heterogeneous properties of drugs.

Methods: In this study, we adopted a network embedding method, Mashup, to extract essential and informative drug features from several heterogeneous drug networks, each representing a different property of drugs. A network was also built for side effects, from which side effect features were extracted. These features capture essential information about drugs and side effects at the network level. Drug and side effect features were combined to represent each drug-side effect pair, which was treated as a sample in this study. These samples were then fed into a random forest (RF) algorithm to construct the prediction model, called the RF network model.
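
The pairing step described in the Methods can be illustrated with a minimal sketch. The abstract does not state how drug and side effect features were combined, so simple concatenation is assumed here; the function name `build_pair_samples`, the drug and side effect names, and all feature values are hypothetical and not taken from the paper.

```python
from itertools import product


def build_pair_samples(drug_features, side_effect_features, known_pairs):
    """Build one sample per (drug, side effect) combination.

    Assumption: a pair is represented by the drug vector followed by the
    side effect vector (concatenation). The label is 1 when the pair is
    listed in known_pairs and 0 otherwise.
    """
    pairs, X, y = [], [], []
    for drug, effect in product(sorted(drug_features), sorted(side_effect_features)):
        pairs.append((drug, effect))
        X.append(list(drug_features[drug]) + list(side_effect_features[effect]))
        y.append(1 if (drug, effect) in known_pairs else 0)
    return pairs, X, y


# Toy embeddings with made-up values, standing in for network-derived features.
drug_features = {"drugA": [0.1, 0.9], "drugB": [0.7, 0.2]}
side_effect_features = {"nausea": [0.3, 0.3, 0.4], "rash": [0.8, 0.1, 0.1]}
known_pairs = {("drugA", "nausea"), ("drugB", "rash")}

pairs, X, y = build_pair_samples(drug_features, side_effect_features, known_pairs)
```

The resulting `X` and `y` have the shape that a classifier such as a random forest would expect: one fixed-length vector and one binary label per drug-side effect pair.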

Results: The RF network model was evaluated by several tests. The average Matthews correlation coefficients on the balanced and unbalanced datasets were 0.640 and 0.641, respectively.
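
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported above, the following sketch computes it for binary labels from the standard confusion-matrix formula. The function name `mcc` and the toy labels are illustrative assumptions; this is not the authors' evaluation code.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (made up): 2 true positives, 2 true negatives, 1 FP, 1 FN.
score = mcc([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

MCC ranges from -1 to 1, with 1 for perfect prediction and 0 for chance-level prediction. Because it uses all four confusion-matrix cells, it remains informative on unbalanced datasets.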

Conclusion: The RF network model outperformed models built with other machine learning algorithms as well as one previous model. We also investigated the influence of two feature dimension parameters on the RF network model and found that the model was not very sensitive to these parameters.

Keywords: Drug discovery, drug side effect, network embedding method, Mashup, heterogeneous network, random forest.




DOI: https://dx.doi.org/10.2174/1574893614666190220114644
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
