Integrating Chemists' Preferences for Shape-Similarity Clustering of Series

Laurent A. Baumes; Remi Gaudin; Pedro Serna; Nicolas Nicoloyannis; Avelino Corma

doi:10.2174/138620708784246068

Abstract

This study shows how chemical knowledge and reasoning can be taken into account when building a new methodology for automatically grouping data that have a chronological structure. We consider combinatorial catalytic experiments in which the evolution of a reaction variable (e.g., conversion) over time must be analyzed. A mathematical tool has been developed to compare and group curves according to their shape. The strategy, which consists of combining hierarchical clustering with the k-means algorithm, is described and compared with each algorithm used separately. The hybrid approach is shown to be of great interest. A second application mode of the proposed methodology is then presented: once clusters that are meaningful with respect to the chemists' preferences and goals have been obtained, the induced model can be used to automatically classify new experimental results. The grouping of new catalysts tested in the Heck coupling reaction between styrene and iodobenzene satisfied the set of criteria "defined" during the initial clustering step and enabled quick identification of catalytic behaviors according to the users' preferences.
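The abstract does not specify the shape-similarity measure, the linkage criterion, or how the two algorithms are chained, so the following is only a minimal sketch of one common way to hybridize them. It assumes that all curves are sampled at the same time points, that "shape" is captured by z-normalizing each curve (removing level and scale), that distances are Euclidean, and that average-linkage hierarchical clustering supplies the number of groups and the initial centroids that k-means then refines. The final centroids are reused as a nearest-centroid classifier for new experiments. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def shape_vector(curve: Sequence[float]) -> Vector:
    """Z-normalize a curve so only its shape matters (assumed shape descriptor)."""
    n = len(curve)
    mean = sum(curve) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in curve) / n)
    if std == 0.0:  # flat curve: carries no shape information
        return [0.0] * n
    return [(x - mean) / std for x in curve]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two shape vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def agglomerative(points: List[Vector], k: int) -> List[List[int]]:
    """Average-linkage hierarchical clustering, stopped when k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]
    return clusters


def hybrid_cluster(curves: List[Sequence[float]], k: int,
                   max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Seed k-means with the centroids of the hierarchical clusters."""
    points = [shape_vector(c) for c in curves]
    seeds = agglomerative(points, k)
    centers = [centroid([points[i] for i in cl]) for cl in seeds]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each curve goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: recompute centers; keep the old one if a cluster empties.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels, centers


def classify(curve: Sequence[float], centers: List[Vector]) -> int:
    """Assign a new curve to the cluster whose centroid is nearest in shape."""
    p = shape_vector(curve)
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))
```

Because each curve is z-normalized, a catalyst reaching 43% conversion with the same fast-rise-then-plateau profile as one reaching 86% falls into the same group. This reflects the assumption that shape, not final conversion, drives the grouping. A different descriptor (for example, slopes or time-warping distances) could be swapped into `shape_vector` and `dist` without changing the hybrid scheme.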

Keywords: Combinatorial, high throughput, heterogeneous catalysis, Heck, data mining, time series, clustering, hierarchical, k-means