Abstract
A new mathematical approach to the definition of molecular descriptors (MDs), based on information theory concepts, is proposed. The approach stems from a new matrix representation of a molecular graph (G), derived by generalizing the incidence matrix so that its row entries correspond to connected subgraphs of G. From this matrix we calculate Shannon's entropy, the negentropy and the standardized information content and, for the first time, the mutual, conditional and joint entropy-based MDs associated with G. We also define strategies that generalize the definition of global or local invariants from atomic contributions (local vertex invariants, LOVIs), introducing related metrics (norms), means and statistical invariants. These invariants are applied to a vector whose components express the atomic information content calculated using the Shannon's, mutual, conditional and joint entropy-based atomic information indices. The novel information indices (IFIs) are implemented in the program TOMOCOMD-CARDD. A principal component analysis reveals that the novel IFIs capture structural information not codified by the IFIs implemented in the software DRAGON. A comparative study of the parameters used in the definition of these IFIs (e.g. subgraph orders and/or types, invariants and classes of MDs) reveals several interesting results. Among the classes of indices investigated in this study, the mutual entropy-based indices give the best correlation results in modeling a physicochemical property, namely the partition coefficient of 34 derivatives of 2-furylethylene. A comparison with classical MDs shows that the new IFIs give good results for various QSPR models.
Keywords: Shannon’s entropy, mutual entropy, conditional entropy, joint entropy, frequency matrix, structural descriptor, subgraph, principal component analysis, QSPR.
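As a rough illustration of the entropy quantities named above, the following minimal sketch computes marginal, joint, conditional and mutual entropies from a small subgraph-by-atom frequency matrix. It uses only the standard information-theoretic identities. The toy matrix, the function names, and the choice to normalize the whole matrix into a joint probability distribution are assumptions made for illustration; they are not necessarily the definitions used for the IFIs or the procedure implemented in TOMOCOMD-CARDD.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_measures(freq):
    """Entropy measures for a 2-D frequency table (rows x columns).

    Assumption: the table is normalized by its grand total so that it can be
    read as a joint distribution over (row, column) pairs.
    """
    freq = np.asarray(freq, dtype=float)
    joint = freq / freq.sum()
    p_rows = joint.sum(axis=1)   # marginal over rows (hypothetical subgraphs)
    p_cols = joint.sum(axis=0)   # marginal over columns (atoms)
    h_rows = shannon_entropy(p_rows)
    h_cols = shannon_entropy(p_cols)
    h_joint = shannon_entropy(joint)
    return {
        "H_rows": h_rows,
        "H_cols": h_cols,
        "H_joint": h_joint,
        "H_rows_given_cols": h_joint - h_cols,   # H(R|C) = H(R,C) - H(C)
        "mutual": h_rows + h_cols - h_joint,     # I(R;C) = H(R) + H(C) - H(R,C)
    }

# Hypothetical toy matrix: each row marks the atoms covered by one subgraph.
toy = np.array([[1, 1, 0],
                [0, 1, 1],
                [1, 1, 1]])
result = entropy_measures(toy)
```

Calling `print(result)` shows the five values for the toy matrix. Any real application would replace `toy` with the frequency matrix built from the chosen subgraph orders and types.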