Abstract
In the present study, molecular descriptors and physicochemical properties were used to encode drug molecules. Based on this molecular representation, a random forest was applied to construct a drug-drug combination network. Feature selection yielded an optimal feature subset that captures the drug properties most relevant to the prediction. The selected features fall into three categories: elemental analysis, chemistry, and geometric features. All three feature types are essential elements of the drug-drug combination network. The final prediction model achieved a Matthews correlation coefficient (MCC) of 0.5335 and an overall prediction accuracy of 88.79% in the 10-fold cross-validation test.
Keywords: Physicochemical properties, mRMR, drug-drug combinations, random forest, feature selection.
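
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how overall accuracy and MCC are computed from binary predictions. It is an illustration only, not the authors' code: the function names are hypothetical, and the encoding of labels as 1 (combination) and 0 (non-combination) is an assumption. MCC is commonly reported alongside accuracy because accuracy alone can be misleading when the two classes are unevenly represented.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", round(mcc(*counts), 4))
```

In a 10-fold cross-validation setting, these counts would typically be accumulated over the held-out fold of each round before the two metrics are computed.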