Abstract
Sequential screening is an iterative procedure that can greatly increase hit rates over random screening or noniterative procedures. We studied the effects of three factors on enrichment rates: the method used to rank compounds, the molecular descriptor set, and the method of selecting the initial training set. The primary factor influencing recovery rates was the method of selecting the initial training set: rates for recovering active compounds were substantially lower with diverse training sets than with training sets selected by other methods. Because structure-activity information is incrementally enhanced in intermediate training sets, sequential screening significantly improves the average rate of recovery of active compounds compared with noniterative selection procedures.
Keywords: Sequential screening, training set selection, Nadaraya-Watson kernel, recursive partitioning, k-nearest neighbors, random forest