Abstract
Descriptor Histogram Filtering (DHiFi) is presented as a novel molecular similarity-based method for the identification of active compounds. Initially, molecular property descriptors are examined for distinct differences in value distributions between known active molecules and other database compounds. For each selected descriptor, values of active training set molecules are recorded in histograms and a probability-dependent mapping and selection technique is applied to decide whether or not a test compound passes the descriptor filter. A few sequential filtering steps or a corresponding multi-dimensional filter successfully deselect most database compounds but consistently recover molecules for a number of different activity classes. Only three to seven property descriptors are required for highly selective DHiFi calculations. These findings set the newly introduced methodology apart from many other molecular similarity methods and demonstrate that evaluating only a few chemical features can be sufficient to successfully select active compounds from large databases.
Keywords: Compound activity classes, Descriptors, Histogram filter functions, Molecular similarity, Virtual screening
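
The sequential filtering idea summarized in the abstract can be illustrated with a minimal sketch. It is not the published DHiFi procedure: the abstract does not specify the probability-dependent mapping, so the rule used here (a compound passes if its descriptor value falls in a histogram bin whose relative frequency among the actives reaches a cutoff `min_prob`) is an assumption. The function names, descriptor names (`mw`, `logp`), bin ranges, and all compound values are likewise hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a value in [lo, hi] to a bin index; the top edge falls in the last bin."""
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)


def build_histogram(values: Sequence[float], lo: float, hi: float,
                    n_bins: int) -> List[float]:
    """Relative bin frequencies of the active training values over [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            counts[bin_index(v, lo, hi, n_bins)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def passes_filter(value: float, hist: Sequence[float], lo: float, hi: float,
                  min_prob: float) -> bool:
    """Assumed rule: pass if the value's bin is populated by enough actives."""
    if not lo <= value <= hi:
        return False
    return hist[bin_index(value, lo, hi, len(hist))] >= min_prob


def screen(database: Dict[str, Dict[str, float]],
           filters: Dict[str, Tuple[List[float], float, float]],
           min_prob: float) -> List[str]:
    """Apply the descriptor filters one after another; keep compounds passing all."""
    selected = []
    for cpd_id, descriptors in database.items():
        if all(passes_filter(descriptors[name], hist, lo, hi, min_prob)
               for name, (hist, lo, hi) in filters.items()):
            selected.append(cpd_id)
    return selected


# Hypothetical training data: descriptor values of known actives (made up).
actives = {
    "mw": [300.0, 310.0, 320.0, 330.0, 305.0],
    "logp": [2.0, 2.5, 3.0, 2.2, 2.8],
}
# Hypothetical value range and bin count per descriptor: (lo, hi, n_bins).
ranges = {"mw": (0.0, 1000.0, 20), "logp": (-5.0, 10.0, 15)}

filters = {
    name: (build_histogram(vals, *ranges[name]), ranges[name][0], ranges[name][1])
    for name, vals in actives.items()
}

# Hypothetical database compounds (made up).
database = {
    "cpd_a": {"mw": 315.0, "logp": 2.4},
    "cpd_b": {"mw": 450.0, "logp": 2.4},
    "cpd_c": {"mw": 340.0, "logp": 6.0},
    "cpd_d": {"mw": 325.0, "logp": 3.5},
}

selected = screen(database, filters, min_prob=0.1)
print(selected)  # ['cpd_a', 'cpd_d']
```

In this toy setup, `cpd_b` is rejected by the `mw` filter and `cpd_c` by the `logp` filter, which mirrors how a few sequential descriptor filters can deselect most database compounds while retaining those that resemble the actives.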