Abstract
The rapidly growing number of compounds available for biological evaluation presents a significant challenge for database design, management, and mining. Computational approaches for screening, profiling, or filtering large compound collections are now widely used in pharmaceutical research. Among popular compound classification and database mining techniques, partitioning methods are computationally very efficient and particularly suitable for the analysis of increasingly large molecular databases, because they do not depend on pairwise comparisons of compounds to assess molecular similarity or diversity. Promising applications of partitioning algorithms include diversity selection, the search for compounds with desired biological activity, and the derivation of predictive models from screening datasets. Compound partitioning is introduced here in the context of virtual screening, and different partitioning methods that operate in low-dimensional or other chemical descriptor spaces are discussed, together with a number of practical drug discovery applications.
Keywords: molecular descriptors, chemical spaces, clustering and partitioning, compound classification, activity-based selection, database mining, virtual screening, drug discovery