Abstract
Human intestinal absorption (HIA) has often been modeled with classification models, whereas regression models remain scarce. Here, artificial neural networks (ANNs) are applied to this regression problem. A dataset of structurally diverse chemicals with their experimental HIA values was used to build ANN models that are robust, truly predictive, and widely applicable. The pool of input variables consisted of structural invariants calculated with either Dragon or our software Desmol 1. The best variables were selected in three steps using the entire dataset of molecules. First, variables poorly correlated with the experimental data were eliminated. Second, input variables were selected by stepwise multilinear regression. Third, the correlation matrix of the selected variables was computed to eliminate strongly intercorrelated variables. Backpropagation ANNs were then trained with the finally selected variables as inputs and HIA as output. To find robust models, the dataset was randomly partitioned into three sets: a training set with 50% of the compounds, a test set with 25%, and a validation set with the remaining 25%. For each partition, different numbers of hidden nodes were tried to optimize predictive performance on the three sets. Models with r² greater than 0.6 for all three sets were considered robust. A randomization test following all of these steps was performed, and the poor results it produced confirm the validity of the method presented in this paper for predicting HIA in datasets of structurally diverse organic compounds.
Keywords: Artificial neural networks, human intestinal absorption, molecular topology, pattern recognition, QSAR.
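
The partitioning scheme and the robustness criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`partition_indices`, `r_squared`, `is_robust`) and the `seed` parameter are hypothetical, and r² is assumed here to be the squared Pearson correlation between observed and predicted values, which the abstract does not specify.

```python
import random


def partition_indices(n, seed=0):
    """Randomly split range(n) into training (50%), test (25%),
    and validation (remaining ~25%) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility (assumption)
    n_train = n // 2
    n_test = n // 4
    return (
        idx[:n_train],
        idx[n_train:n_train + n_test],
        idx[n_train + n_test:],
    )


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.
    Assumption: the abstract does not say which definition of r2 is used."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0  # correlation is undefined; treat as no predictive power
    return cov * cov / (var_o * var_p)


def is_robust(observed, predicted, subsets, threshold=0.6):
    """True only if r2 exceeds the threshold on every subset
    (training, test, and validation)."""
    return all(
        r_squared([observed[i] for i in s], [predicted[i] for i in s]) > threshold
        for s in subsets
    )
```

In use, one would call `partition_indices` once per random partitioning, train a network on the training indices for each candidate number of hidden nodes, and keep only those models for which `is_robust` returns true on the three index lists. Requiring the threshold on all three sets, rather than on the training set alone, is what guards against accepting an overfitted model.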