Abstract
The study of interactions between drugs and target proteins is an essential step in genomic drug discovery. Compound-protein or drug-target interactions are very hard to determine by experiment alone. As a complement, effective prediction models built with machine learning or data mining methods can provide considerable help. In this study, a prediction method is proposed that is based on the nearest neighbor algorithm and a novel metric obtained by combining compound similarity and functional domain composition. The target proteins were divided into four groups: enzymes, ion channels, G protein-coupled receptors, and nuclear receptors. One predictor with optimal parameters was established for each group, giving four predictors in total. The overall prediction accuracies, evaluated by the jackknife cross-validation test, were 90.23%, 94.74%, 97.80%, and 97.51% for the four groups, respectively, indicating that compound similarity and functional domain composition are highly effective for predicting drug-target interaction networks.
Keywords: Compound similarity, drug-target interaction network, functional domain composition, jackknife cross-validation test, Matthews correlation coefficient, nearest neighbor algorithm, SMILES, MACC, SBASE-A
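
The approach summarized above, a nearest neighbor predictor over a combined metric evaluated by the jackknife (leave-one-out) test, can be illustrated with a minimal sketch. The abstract does not give the exact form of the metric, so everything specific here is an assumption made for illustration: compounds are represented as sets of fingerprint bits compared by Tanimoto similarity, proteins as functional domain vectors compared by cosine similarity, and the two scores are combined by a simple product. All function names and the toy data are hypothetical.

```python
from math import sqrt

def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(u, v):
    """Cosine similarity between two functional domain vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pair_similarity(p, q):
    """Similarity of two (compound, protein) pairs.

    ASSUMPTION: the product of the two similarities stands in for the
    paper's combined metric, whose exact form is not stated in the abstract.
    """
    return tanimoto(p[0], q[0]) * cosine(p[1], q[1])

def predict(query, training):
    """1-nearest-neighbor: return the label of the most similar training pair."""
    nearest = max(training, key=lambda item: pair_similarity(query, item[0]))
    return nearest[1]

def jackknife_accuracy(samples):
    """Leave each sample out in turn and predict it from all the others."""
    correct = 0
    for i, (pair, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if predict(pair, rest) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical toy data: ((fingerprint bits, domain vector), interacts?)
samples = [
    ((frozenset({1, 2, 3}), (1, 0, 1)), 1),
    ((frozenset({1, 2, 4}), (1, 0, 1)), 1),
    ((frozenset({7, 8, 9}), (0, 1, 0)), 0),
    ((frozenset({7, 8, 10}), (0, 1, 0)), 0),
]

accuracy = jackknife_accuracy(samples)
```

In the jackknife test every sample serves exactly once as the query while all remaining samples form the training set, so the resulting accuracy does not depend on a random split.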