Abstract
Chemical space has become a key concept in drug discovery. The continued growth in the number of available molecules raises the questions of how many compounds may exist and which of them have the potential to become drugs. Analysis and visualization of the chemical space covered by public, commercial, in-house, and virtual compound collections have found multiple applications, including diversity analysis, in silico property profiling, data mining, virtual screening, library design, prioritization in screening campaigns, and acquisition of compound collections. This review covers several techniques, computational programs, and approaches that have been developed to visualize, navigate, and study the chemical space of molecular databases. Techniques developed in our group are presented, including a quantitative assessment of multi-fusion similarity maps. Additionally, an application of 3D similarity, based on the overlay of chemical structures, to the representation of chemical space is introduced. Several comparisons of the chemical space covered by compound collections from different sources, such as combinatorial libraries, drugs, and natural products, or directed to specific therapeutic areas, are also discussed.
Keywords: Chemoinformatics, combinatorial libraries, data-driven analysis, data mining, molecular diversity, multi-fusion similarity maps, structure-activity relationships, virtual screening