Abstract
A number of methods currently exist for designing chemical libraries. General or universal libraries use a measure of chemical diversity in their design and seek to cover as much of chemical space as possible, in order to maximize the likelihood of discovering a novel lead class of active compounds. Focused chemical libraries are then synthesized to expand on this particular class and thoroughly explore the space around it. Rarely, however, is relevant biological data tightly incorporated into the design of focused libraries. Recursive partitioning is a statistical technique that can quickly build structure-activity relationship (SAR) models from high-throughput screening (HTS) data sets and their associated chemical descriptors. Using these models in a virtual screening mode significantly increases the probability of finding other active compounds. The predicted activity can also be used as the fitness function for a genetic algorithm designed to select monomer subsets that have a higher probability of being active. This dramatically reduces the number of compounds that need to be synthesized for focused libraries, saving considerable time, effort, and expense. This paper describes how recursive partitioning models are used to optimize the design of focused chemical libraries.
Keywords: focused chemical libraries, recursive partitioning, sequential screening
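The abstract's central idea, using the activity predicted by a recursive partitioning model as the fitness function of a genetic algorithm that selects monomer subsets, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The monomer descriptors (`aromatic`, `halogen`, `basic_amine`) are randomly generated and hypothetical. `predict_active` is a hand-written decision tree standing in for a model that would, in practice, be fitted to HTS data by recursive partitioning. The GA operators (tournament selection, union-based crossover, swap mutation, two-member elitism) and all parameter values are illustrative choices. Fitness here is the mean predicted probability of activity over the full combinatorial sub-library defined by the chosen monomers.

```python
import random
from itertools import product


def make_monomers(n, features, rng):
    """Generate hypothetical monomers with random binary descriptors."""
    return [{f: rng.random() < 0.4 for f in features} for _ in range(n)]


def predict_active(a, b):
    """Hypothetical stand-in for a fitted recursive partitioning tree.

    Each nested test is one split on a binary descriptor of the product
    (built from monomers a and b); each return value is a leaf's P(active).
    """
    if a["aromatic"] and b["basic_amine"]:
        return 0.6 if a["halogen"] else 0.3
    if b["basic_amine"]:
        return 0.1
    return 0.02


def fitness(chrom, A, B):
    """Mean predicted activity over every product of the chosen A x B subsets."""
    ia, ib = chrom
    scores = [predict_active(A[i], B[j]) for i, j in product(ia, ib)]
    return sum(scores) / len(scores)


def crossover(p1, p2, sizes, rng):
    """Child draws each reagent subset from the union of its parents' subsets."""
    child = []
    for s1, s2, k in zip(p1, p2, sizes):
        pool = sorted(set(s1) | set(s2))
        child.append(tuple(sorted(rng.sample(pool, k))))
    return tuple(child)


def mutate(chrom, pool_sizes, rate, rng):
    """With probability `rate` per reagent list, swap one monomer for an unused one."""
    out = []
    for subset, n in zip(chrom, pool_sizes):
        subset = list(subset)
        if rng.random() < rate:
            unused = [i for i in range(n) if i not in subset]
            if unused:
                subset[rng.randrange(len(subset))] = rng.choice(unused)
        out.append(tuple(sorted(subset)))
    return tuple(out)


def tournament(population, score, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=score)


def run_ga(A, B, sizes=(5, 5), pop=30, gens=40, rate=0.3, seed=0):
    """Select `sizes` monomers from A and B that maximize mean predicted activity."""
    rng = random.Random(seed)
    pool_sizes = (len(A), len(B))

    def score(chrom):
        return fitness(chrom, A, B)

    population = [
        tuple(tuple(sorted(rng.sample(range(n), k))) for n, k in zip(pool_sizes, sizes))
        for _ in range(pop)
    ]
    for _ in range(gens):
        population.sort(key=score, reverse=True)
        elite = population[:2]  # elitism: best fitness never decreases
        children = [
            mutate(crossover(tournament(population, score, rng),
                             tournament(population, score, rng), sizes, rng),
                   pool_sizes, rate, rng)
            for _ in range(pop - len(elite))
        ]
        population = elite + children
    best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    rng = random.Random(42)
    A = make_monomers(20, ["aromatic", "halogen"], rng)
    B = make_monomers(20, ["basic_amine"], rng)
    best, best_score = run_ga(A, B)
    print("selected A:", best[0], "selected B:", best[1])
    print(f"mean predicted P(active) of the 5 x 5 sub-library: {best_score:.3f}")
```

Only the 25 products of the selected monomers would then be synthesized, rather than all 400 in the full 20 x 20 library. In a real application the fitness could also weigh diversity or cost alongside predicted activity.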