Abstract
Four classification models were built with a Self-Organizing Map (SOM) and a Support Vector Machine (SVM) to predict whether a compound is an active or a weakly active inhibitor of Aurora B kinase. A dataset of 679 Aurora B kinase inhibitors was collected and randomly split into a training set (278 active and 204 weakly active inhibitors) and a test set (109 active and 88 weakly active inhibitors). Based on 19 selected ADRIANA.Code descriptors and 135 MACCS fingerprints, all four models achieved a prediction accuracy of over 87% on the test set. This performance benefited from the advantages of the two different types of molecular descriptors in encoding the structural information of the compounds and characterizing the diversity of the inhibitors. Molecular properties such as hydrogen-bonding interactions and atomic charge-related descriptors were found to be important to the bioactivity of Aurora B kinase inhibitors.
Keywords: ADRIANA.Code descriptors, Aurora B kinase inhibitors, classification model, MACCS fingerprints, Self-Organizing Map (SOM), Support Vector Machine (SVM).
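The random split described in the abstract can be illustrated with a minimal sketch. It assumes a plain shuffle-and-cut procedure, which the abstract does not specify beyond "randomly split", and it uses placeholder class labels rather than the actual compounds. The function names and the seed are hypothetical. The set sizes (482 training, 197 test) follow from the reported counts, while the per-class counts in each set depend on the seed and will generally differ from the reported ones.

```python
import random

# Class totals implied by the abstract: 278 + 109 active, 204 + 88 weakly active.
N_ACTIVE = 278 + 109   # 387 active inhibitors
N_WEAK = 204 + 88      # 292 weakly active inhibitors
N_TRAIN = 278 + 204    # 482 training compounds

def random_split(labels, n_train, seed=0):
    """Shuffle indices and cut them into a training part and a test part."""
    rng = random.Random(seed)  # the seed is an arbitrary assumption
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]

def class_counts(labels, idx):
    """Count active (1) and weakly active (0) compounds among the given indices."""
    n_active = sum(labels[i] for i in idx)
    return n_active, len(idx) - n_active

# Placeholder labels only; the real compounds and descriptors are not reproduced here.
labels = [1] * N_ACTIVE + [0] * N_WEAK
train_idx, test_idx = random_split(labels, N_TRAIN)
print("train (active, weakly active):", class_counts(labels, train_idx))
print("test  (active, weakly active):", class_counts(labels, test_idx))
```

Fixing the seed keeps the split reproducible, so the same training and test sets can be reused when comparing different models.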