Abstract
Background: Proper validation is an important aspect of QSAR modelling. External validation is one of the most widely used validation methods in QSAR, in which the model is built on a subset of the data and validated on the remaining samples. However, its effectiveness for datasets with a small number of samples but a large number of predictors remains questionable.
Objective: Calculating hundreds or thousands of molecular descriptors using currently available software has become the norm in QSAR research, owing to computational advances of the past few decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical chemometric dataset today has a large p but a small n (i.e., n << p). Motivated by recent evidence in the literature of the inadequacy of external validation in estimating the true predictive capability of a statistical model, this paper presents an extensive comparative study of this method against several other validation techniques.
Methodology: We compared four validation methods (leave-one-out, K-fold, external, and multi-split validation) using statistical models built with LASSO regression, which performs variable selection and model fitting simultaneously. We used 300 simulated datasets and one real dataset of 95 congeneric amine mutagens for this evaluation.
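To make the comparison concrete, the following is a minimal sketch of leave-one-out validation of a LASSO model on simulated n << p data; it is not the code used in the study, and it assumes Python with scikit-learn. All names, dimensions, and settings are illustrative.

```python
# Illustrative sketch (not the study's code): leave-one-out validation of a
# LASSO model on a simulated n << p dataset, using scikit-learn.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
n, p = 95, 500                       # small sample, many descriptors (n << p)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                       # only a few descriptors are truly relevant
y = X @ beta + rng.standard_normal(n)

loo = LeaveOneOut()
preds = np.empty(n)
for train_idx, test_idx in loo.split(X):
    # LassoCV chooses the penalty (and hence the selected descriptors) within
    # each training fold, so variable selection is validated with the model.
    model = LassoCV(cv=5).fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

# Cross-validated predictive squared correlation (Q^2)
q2 = 1 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"LOO Q^2: {q2:.3f}")
```

An analogous loop with a fixed train/test split (external validation) or with K folds gives the corresponding metrics for the other methods compared in the study.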
Results: External validation metrics vary widely across different random splits of the data and are therefore not recommended for predictive QSAR models. LOO showed the best overall performance among the validation methods considered in our scenario.
Conclusion: Results from external validation were too unstable for the datasets we analyzed. Based on our findings, we recommend the LOO procedure for validating QSAR predictive models built on high-dimensional, small-sample data.
Keywords: Cross validation, Leave One Out (LOO) cross validation, K-fold cross validation, external validation, LASSO, chemical mutagens, aromatic and heteroaromatic amines.
Graphical Abstract