Abstract
In this paper we introduce a quantitative model that relates chemical structural similarity to biological activity, and in particular to the activity of lead series of compounds in high-throughput assays. From this model we derive the optimal composition of a screening collection of fixed size, and identify the conditions under which a diverse collection of compounds, or a collection focused on particular regions of chemical space, is the appropriate strategy. We also derive from the model a diversity function that may be used to assess compounds for acquisition, or libraries for combinatorial synthesis, by their ability to complement an existing screening collection. The diversity function is linked directly through the model to the goal of more frequent discovery of lead series from high-throughput screening. We show how the model may also be used to derive relationships between collection size and the probability of lead discovery in high-throughput screening, and to guide the judicious application of structural filters.
Keywords: high-throughput screening, design of experiments, statistical design, library design, optimization, mathematical programming.