Abstract
Multivariate quantitative structure-activity relationship (QSAR) modeling, in which activities towards several related endpoints are modeled simultaneously, has recently emerged as an alternative to building a separate model for each activity. The development of multivariate QSAR modeling has been driven by three factors. First, the number of compound properties considered vital at earlier stages of the drug development pipeline has increased. Second, advanced screening technology has shifted the rate-limiting step of drug discovery and development to other areas, and screening compounds for multiple properties has made multi-endpoint datasets available. Finally, the statistical and computational methods used in data analysis have evolved to handle the increased complexity associated with multi-task prediction. In this review, we outline the justifications for the use of multivariate QSAR modeling. We review the techniques used for developing such models and their applications in drug discovery. We also summarize the methods for visual analysis of multivariate datasets. We focus on neural networks and other advanced, non-linear methods that are gaining popularity in the QSAR community, while also describing established linear techniques.
Keywords: QSAR, multivariate regression, multivariate analysis, neural networks