Abstract
The objective of this work is the construction of a correlation between characteristics of heterogeneous catalysts, encoded in a descriptor vector, and their experimentally measured performances in the propene oxidation reaction. In this paper the key issue in the modeling process, namely the selection of adequate input variables, is explored. Several data-driven feature selection strategies were applied in order to obtain an estimate of the differences in variance and information content of various attributes, furthermore to compare their relative importance. Quantitative property activity relationship techniques using probabilistic neural networks have been used for the creation of various semi-empirical models. Finally, a robust classification model, assigning selected attributes of solid compounds as input to an appropriate performance class in the model reaction was obtained. It has been evident that the mathematical support for the primary attributes set proposed by chemists can be highly desirable.
Keywords: heterogeneous catalysts, Descriptors, Shannon Entropy, Unsupervised Forward Selection algorithm, Collinearity