Abstract
Decision trees are renowned in the computational chemistry and machine learning communities for their interpretability. Their applicability is somewhat limited by the fact that they normally operate on categorical data. Improvements to existing decision tree algorithms are usually achieved by adding and tuning parameters, or by post-processing the class assignment. In this work we attempted to tackle both of these issues. Firstly, conditional mutual information was used as the criterion for selecting the attribute on which to split instances. The algorithm’s performance was compared with that of C4.5 (WEKA’s J48) run with default parameters and no restrictions. Two datasets were used for this purpose: DrugBank compounds for HRH1 binding prediction, and predicted bioactivities of Traditional Chinese Medicine formulations for therapeutic class annotation. Secondly, an automated binning method for continuous data, Scott’s normal reference rule, was evaluated in order to allow any decision tree to handle continuous data easily. This approach was applied to all approved drugs in DrugBank to predict the RDKit SLogP property, using the remaining RDKit physicochemical attributes as input.
Keywords: Computational chemistry, DrugBank, Chinese Medicine, Cheminformatics, Decision Trees, Conditional Mutual Information.
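As a point of reference for the two techniques named above, the following is a minimal, illustrative Python sketch (not the authors' code) of Scott's normal reference rule for binning a continuous attribute and of an empirical conditional mutual information estimate I(X; Y | Z). The function names and the choice of conditioning variable (a previously selected attribute) are assumptions made for illustration only; the paper's exact formulation may differ.

```python
# Illustrative sketch only: Scott's rule binning and a plug-in estimate of
# conditional mutual information I(X; Y | Z) from discrete samples.
import math
from collections import Counter

def scott_bins(values):
    """Discretise continuous values using Scott's rule: h = 3.49 * sd * n^(-1/3)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    h = 3.49 * sd * n ** (-1.0 / 3.0)            # bin width
    lo = min(values)
    return [int((v - lo) // h) for v in values]  # bin index per value

def conditional_mutual_information(xs, ys, zs):
    """I(X; Y | Z) in bits, estimated from empirical joint frequencies."""
    n = len(xs)
    p_xyz = Counter(zip(xs, ys, zs))
    p_xz = Counter(zip(xs, zs))
    p_yz = Counter(zip(ys, zs))
    p_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in p_xyz.items():
        p = c / n
        cmi += p * math.log2((p * (p_z[z] / n)) /
                             ((p_xz[(x, z)] / n) * (p_yz[(y, z)] / n)))
    return cmi

# Toy usage: score a binned continuous attribute against the class labels,
# conditioned on an attribute already used higher up the tree (hypothetical data).
logp = [1.2, 0.4, 3.1, 2.8, 0.9, 3.5]   # continuous attribute
binned = scott_bins(logp)
cls = [0, 0, 1, 1, 0, 1]                # class labels
prev = [0, 1, 0, 1, 0, 1]               # previously selected attribute
print(conditional_mutual_information(binned, cls, prev))
```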