Abstract
Quantitative Structure Activity Relationship (QSAR) modeling has been traditionally applied as an evaluative approach, i.e., with the focus on developing retrospective and explanatory models of existing data. Model extrapolation was considered if only in hypothetical sense in terms of potential modifications of known biologically active chemicals that could improve compounds activity. This critical review re-examines the strategy and the output of the modern QSAR modeling approaches. We provide examples and arguments suggesting that current methodologies may afford robust and validated models capable of accurate prediction of compound properties for molecules not included in the training sets. We discuss a data-analytical modeling workflow developed in our laboratory that incorporates modules for combinatorial QSAR model development (i.e., using all possible binary combinations of available descriptor sets and statistical data modeling techniques), rigorous model validation, and virtual screening of available chemical databases to identify novel biologically active compounds. Our approach places particular emphasis on model validation as well as the need to define model applicability domains in the chemistry space. We present examples of studies where the application of rigorously validated QSAR models to virtual screening identified computational hits that were confirmed by subsequent experimental investigations. The emerging focus of QSAR modeling on target property forecasting brings it forward as predictive, as opposed to evaluative, modeling approach.
Keywords: QSAR-quantitative structure activity relationships, combi-QSAR-combinatorial QSAR, kNN - k nearest neighbors, SA -simulating annealing, PLS - partial least squares