Abstract
In this paper we analyze the behavior of different representational spaces for clustering compounds and building QSAR models. Representational spaces based on fingerprint similarity and on structural similarity, computed with the maximum common subgraph (MCS) and all maximum common subgraphs (AMCS) approaches, are compared against representational spaces based on structural fragments and non-isomorphic fragments (NIF), built using different molecular descriptors. Algorithms for the extraction of MCS, AMCS and NIF are described, and a support vector machine is used to classify a dataset of 74 1,4-benzoquinone derivatives. Different molecular descriptors are tested for building QSAR models that predict the antifungal activity of the compounds in the dataset. Descriptors based on graph connectivity and distances are the most appropriate for building QSAR models. Moreover, models based on approximate similarity improve the statistics of the equations, because they combine the structural similarity, non-isomorphic fragment and descriptor approaches to produce more robust and more refined prediction equations.
Keywords: QSAR, approximate similarity, molecular descriptors, MCS, AMCS, molecular fragments.