Abstract
A primary goal of quantitative structure-activity relationships (QSARs) and quantitative structure-property relationships (QSPRs) is to predict chemical activities from chemical structure. Chemical structure can be quantified in many ways, resulting in hundreds, if not thousands, of measurements for every chemical. Chemical activities measure how a chemical interacts with other chemicals, e.g., toxicity, biodegradability, boiling point, and vapor pressure. Typically there are more chemical structure measurements (p) than chemicals being measured (n), the so-called large-p, small-n problem. Here we review some of the statistical procedures that have commonly been used to explore these problems and provide several examples of their use. Finally, we look ahead to two areas that we believe will see dramatically increased attention in the near future: model averaging and Bayesian techniques.
Keywords: AIC, Bayesian analysis, BIC, cross-validation, elastic net, k-means clustering, LASSO, model averaging, model selection, modeling, partial least squares, prediction, principal component analysis, principal component regression, regression, ridge regression.