Abstract
Naive Bayesian classifiers are a relatively recent addition to the arsenal of tools available to computational chemists. These classifiers fall into a class of algorithms referred to broadly as machine learning algorithms. Bayesian classifiers may be used in conjunction with classical modeling techniques to assist in the rapid virtual screening of large compound libraries in a systematic manner with a minimum of human intervention. This approach allows computational scientists to concentrate their efforts on their core strengths of model building. Bayesian classifiers have an added advantage of being able to handle a variety of numerical or binary data such as physicochemical properties or molecular fingerprints, making the addition of new parameters to existing models a relatively straightforward process. As a result, during a drug discovery project these classifiers can better evolve with the needs of the projects from general models in the lead finding stages to increasingly precise models in the lead optimization stages that are of particular interest to a specific medicinal chemistry team. Although other machine learning algorithms abound, Bayesian classifiers have been shown to compare favorably under most working conditions and have been shown to be tolerant of noisy experimental data.
Keywords: Virtual screening, machine learning, naive bayes