Abstract
Computational chemistry relies largely on quantitative descriptors of organic molecules, which allow large molecular data sets to be analysed and models to be built that link the physicochemical and structural descriptions of molecules to their biological activity or chemical reactivity. In the case of proteins, this approach is severely hampered by the need to account, in a meaningful way, for the actual sequence of amino acid residues. From a purely mathematical perspective, a protein sequence can be viewed as a time series in which the role of time is played by the order of the amino acid residues along the sequence. In turn, each individual residue can be considered a single organic molecule that can be represented by classical molecular descriptors. Thus, in principle, order-dependent synthetic descriptors generated through time series analysis can be used to build QSAR-like models of proteins. Indeed, Recurrence Quantification Analysis (RQA) of hydrophobicity-coded protein sequences has already been shown to be useful in protein science. In this paper, we show the merits and pitfalls of RQA in different case studies, ranging from the global description of a large set of diverse proteins to the study of the effect of mutations in the human cytochrome P450 system.
Keywords: Cytochrome P450, CYP2D6, recurrence quantification analysis, chemoinformatics, drug metabolism, pharmacogenetics.
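
The idea summarised in the abstract (code each residue by a hydrophobicity value, treat the coded sequence as a series ordered by residue position, and quantify its recurrences) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration rather than the authors' actual protocol: the Kyte-Doolittle hydropathy scale as the coding, the embedding dimension, delay and radius, and the hypothetical function names `encode`, `embed`, `recurrence_matrix`, `recurrence_rate` and `determinism`. Only two of the usual RQA descriptors are computed.

```python
import math

# Kyte-Doolittle hydropathy index. ASSUMPTION: the abstract only says
# "hydrophobicity-coded"; this particular scale is chosen for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Turn a one-letter amino acid sequence into a numeric series."""
    return [KD[residue] for residue in sequence.upper()]


def embed(series, dim, delay=1):
    """Time-delay embedding: one vector of `dim` values per start position."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def recurrence_matrix(vectors, radius):
    """rm[i][j] is True when vectors i and j lie within `radius` (Euclidean)."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) <= radius for j in range(n)]
            for i in range(n)]


def recurrence_rate(rm):
    """Fraction of recurrent pairs in the upper triangle (main diagonal excluded)."""
    n = len(rm)
    pairs = n * (n - 1) // 2
    hits = sum(rm[i][j] for i in range(n) for j in range(i + 1, n))
    return hits / pairs if pairs else 0.0


def determinism(rm, min_len=2):
    """Fraction of recurrent points lying on diagonal lines of length >= min_len."""
    n = len(rm)
    total = 0
    in_lines = 0
    for offset in range(1, n):          # each diagonal above the main one
        run = 0
        for i in range(n - offset):
            if rm[i][i + offset]:
                total += 1
                run += 1
            else:
                if run >= min_len:
                    in_lines += run
                run = 0
        if run >= min_len:              # close a run that reaches the edge
            in_lines += run
    return in_lines / total if total else 0.0


if __name__ == "__main__":
    # Toy, strictly alternating sequence (not a real protein).
    series = encode("AVAVAVAVAV")
    rm = recurrence_matrix(embed(series, dim=2), radius=0.0)
    print("recurrence rate:", recurrence_rate(rm))
    print("determinism:", determinism(rm))
```

In a QSAR-like setting, descriptors of this kind would be computed once per protein and collected into a table with one row per sequence; the choice of scale, embedding dimension and radius affects the values and would need to be fixed consistently across the data set.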