Abstract
The first step in drug development is to identify lead compounds that show significant biological activity against a target protein. Because this process is often costly and time-consuming, practical drug design needs efficient methodologies for generating lead compounds. One promising approach to finding a potent lead compound is computational virtual screening. In this approach, the biological activities of candidate structures in virtual libraries are estimated using quantitative structure-activity relationship (QSAR) models, computational docking simulations, or both. Virtual screening studies commonly use databases of existing drugs or natural products as a source of lead candidates. However, these databases are insufficient for finding lead candidates with novel scaffolds. Efficient lead discovery therefore requires methods that generate novel molecular structures predicted to have high activity. In this paper, we review current trends in structure generation methods for drug design and discuss future directions. First, we present an overview of lead discovery and drug design. We then review structure generation methods, classifying them according to whether or not they employ QSAR models to generate structures. We conclude that using QSAR models for structure generation is an effective approach to computational lead discovery. Finally, we discuss problems related to the applicability domain of QSAR models and future directions in this field.
Keywords: Applicability domain (AD), chemical space, drug design, lead generation, molecular design, QSAR, QSPR, structure generation, rational drug design, QSAR models, structure-based drug design (SBDD), in silico screening, NMR spectrum, IR spectrum, MOLGEN, GDB (generated database), N-methyl-D-aspartic acid (NMDA), GDB-13, genetic algorithms (GAs), SMILES, EA-based structure generation, CoMFA, CoMSIA, WHIM, back-propagation neural networks (BPNN), support vector regression (SVR), kernel PLS, arbitrary vertices, Kier index, octanol-water partition coefficient, LFA-1/ICAM-1 peptide inhibitors, hydrofluoroether (HFE), canonicalized path, linear Gaussian models, acyclic hydrocarbons, chemometrics, centroid, Euclidean distance, Mahalanobis distance, Tanimoto coefficient, probability density distribution, Gaussian functions, ensemble learning, metabolic stability