Prediction of Mutagenicity of Chemicals from Their Calculated Molecular Descriptors: A Case Study with Structurally Homogeneous versus Diverse Datasets

Subhash C. Basak; Subhabrata Majumdar

doi:10.2174/1871524915666150722121322

Abstract

Variation in high-dimensional data is often driven by a few latent factors, so dimension reduction or variable selection techniques can help extract useful information from the data. In this paper we consider two such recent methods: interrelated two-way clustering and envelope models. We couple these methods with traditional statistical procedures such as ridge regression and linear discriminant analysis, and apply them to two data sets that have more predictors than samples (i.e., the n << p scenario) and several types of molecular descriptors. One of these data sets consists of a congeneric group of amines, while the other has a much more diverse collection of compounds. The difference in prediction results between these two data sets, for both methods, supports the hypothesis that for a congeneric set of compounds, descriptors of a certain type are enough to provide good QSAR models, but as the data set grows more diverse, including a variety of descriptors can improve model quality considerably.

Keywords: Envelope models, hierarchical quantitative structure-activity relationship (HiQSAR), interrelated two-way clustering, linear discriminant analysis, mutagenicity, ridge regression, topological indices, congenericity principle, diversity begets diversity principle.
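As an illustration of why ridge regression remains usable when there are more predictors than samples, the following minimal sketch fits ridge coefficients through the dual form, beta = X^T (X X^T + lam I)^{-1} y, which only requires solving an n-by-n system. It is not the authors' implementation: the data are synthetic random numbers, the names `solve`, `ridge_dual`, and `lam` are hypothetical, and no centering, intercept, or descriptor preprocessing is performed.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_dual(X, y, lam):
    """Ridge coefficients via the dual form X^T (X X^T + lam I)^{-1} y."""
    n, p = len(X), len(X[0])
    # n-by-n Gram matrix plus the ridge penalty on the diagonal
    K = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    # Map the n dual weights back to p coefficients
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]


# Synthetic stand-in for a descriptor matrix: 8 compounds, 40 descriptors
random.seed(0)
n, p, lam = 8, 40, 1.0
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
beta = ridge_dual(X, y, lam)
print(len(beta))
```

With lam > 0 the n-by-n system is always solvable, even though ordinary least squares has no unique solution when p exceeds n. In practice the penalty would be chosen by cross-validation rather than fixed in advance.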

DOI: https://dx.doi.org/10.2174/1871524915666150722121322
Print ISSN: 1573-4099
Online ISSN: 1875-6697
Publisher: Bentham Science Publisher

Current Computer-Aided Drug Design
