Abstract
The identification of endocrine-disrupting chemicals (EDCs) is an important goal of environmental chemical hazard screening. The adverse health effects of EDCs in humans have been shown to involve the developmental, reproductive, neurological, cardiovascular, metabolic, and immune systems.
The present study reports QSAR classification studies on a large database of 8,212 compounds collected from the Estrogenic Activity Database and the National Center for Biotechnology Information database. Four classification models were used: a Bayesian categorization model with either molecular fingerprints or molecular descriptors as input, and neural classification models with and without Bayesian regularization. Evaluation of these binary classification methods indicated that the Bayesian method (Bayesian QSAR) is an excellent prediction method when fingerprints are used as input. For the multilayer perceptron with molecular descriptors as inputs, changing the training mode by introducing a Bayesian regularization algorithm significantly improved the predictive power of the ANN. Our goal was to test two popular classification methods suitable for processing large data sets. A data set of this size was required to ensure the predictive performance of the models and their applicability as a virtual screening tool for extensive databases.
Keywords: ANN, Bayesian categorization model, endocrine disruptor, endocrine disruptors knowledge base, molecular fingerprints, molecular descriptors, QSAR, virtual screening.
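
To illustrate how a Bayesian classifier can operate on binary molecular fingerprints, the following minimal sketch fits a generic Bernoulli naive Bayes model with Laplace smoothing. It is not the Bayesian categorization model used in this study. The function names, the four-bit fingerprints, and the activity labels are hypothetical and serve only as a toy example.

```python
import math


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary fingerprints.

    fingerprints: list of equal-length lists of 0/1 bits (hypothetical data).
    labels: list of class labels, e.g. 1 = active, 0 = inactive.
    alpha: Laplace smoothing constant (assumed value, not from the study).
    """
    n_bits = len(fingerprints[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        n = len(rows)
        # Smoothed probability that each bit is set within this class.
        p_on = [
            (sum(row[j] for row in rows) + alpha) / (n + 2 * alpha)
            for j in range(n_bits)
        ]
        model[cls] = {
            "log_prior": math.log(n / len(labels)),
            "log_on": [math.log(p) for p in p_on],
            "log_off": [math.log(1.0 - p) for p in p_on],
        }
    return model


def predict(model, fingerprint):
    """Return the class with the highest posterior log-score."""

    def score(params):
        return params["log_prior"] + sum(
            params["log_on"][j] if bit else params["log_off"][j]
            for j, bit in enumerate(fingerprint)
        )

    return max(model, key=lambda cls: score(model[cls]))


# Toy training set: six made-up compounds with four-bit fingerprints.
X = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]

model = train_bernoulli_nb(X, y)
print(predict(model, [1, 1, 0, 0]))
print(predict(model, [0, 0, 1, 1]))
```

In practice the fingerprints would contain hundreds or thousands of bits computed by a cheminformatics toolkit, and the model would be assessed on held-out compounds rather than on the training set.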