Abstract
Background: Computed mathematical descriptors of molecules are used to predict their properties and bioactivities. In the 1970s only a few descriptors could be calculated; currently available software can calculate a large number of descriptors for molecules and for biomolecules such as DNA/RNA and proteins.
Objective: When p molecular descriptors are calculated for n molecules, the data set can be viewed as n vectors in p dimensions, each chemical being represented as a point in $\mathbb{R}^p$. Because many of the descriptors are strongly correlated, the n points in $\mathbb{R}^p$ will lie on a subspace of dimension lower than p. Methods such as principal component analysis (PCA) can be used to characterize the intrinsic dimensionality of chemical spaces. Taking motivation from the work of Basak et al. in the 1980s, which applied PCA to descriptors calculated for various congeneric and structurally diverse sets of chemicals relevant to new drug discovery and predictive toxicology, this paper explores the intrinsic dimensionality of chemical spaces for robust QSAR model development.
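As a minimal sketch of the idea (not the authors' procedure), the intrinsic dimensionality of an n × p descriptor matrix can be estimated with PCA as the smallest number of principal components whose cumulative explained variance reaches a chosen threshold. The function name `intrinsic_dimension`, the 90% threshold, the standardization step and the synthetic data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def intrinsic_dimension(X, var_threshold=0.90):
    """Return (k, explained): k is the smallest number of principal components
    whose cumulative explained variance reaches var_threshold (an assumed
    heuristic cut-off); explained holds each component's variance fraction."""
    X = np.asarray(X, dtype=float)
    # Standardize each descriptor so that descriptors on different scales
    # contribute equally (equivalent to PCA on the correlation matrix).
    sd = X.std(axis=0, ddof=1)
    Z = (X - X.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    # Squared singular values of the centred data are proportional to PC variances.
    s = np.linalg.svd(Z, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative fraction reaches the threshold, plus one.
    k = int(np.searchsorted(cumulative, var_threshold)) + 1
    return min(k, len(s)), explained

# Synthetic example (hypothetical data): 200 "molecules" whose 6 descriptors
# are noisy linear combinations of 2 hidden factors, so the points in R^6
# lie close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 0.0, 1.0, 2.0, -1.0, 0.5],
                     [0.0, 1.0, 1.0, -1.0, 2.0, 3.0]])
X = latent @ loadings + 0.01 * rng.normal(size=(200, 6))

k, explained = intrinsic_dimension(X)
print("estimated intrinsic dimension:", k)
print("explained variance fractions:", np.round(explained, 4))
```

The threshold is a modelling choice; scree plots or cross-validation are common alternatives for deciding how many components to keep.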
Methodology: Intrinsic dimensionality of chemical spaces was studied using three new statistical approaches and two data sets, viz. a congeneric set of 95 aromatic and heteroaromatic amine mutagens and a structurally diverse set of 508 chemical mutagens.
Results: The new outlier-robust methods applied here yield favorable prediction results compared with previous studies on the same data sets.
Conclusion: We conclude that when analyzing data on a large number of chemical descriptors, it is advisable to build QSAR models that are outlier-robust and that take into account the underlying correlations among predictors.
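The abstract does not specify the three new approaches, so the following is only a generic, hypothetical illustration of how leave-one-out (LOO) cross-validation can score a QSAR model built on correlated descriptors. Principal component regression (PCR) is swapped in here purely for illustration and is evaluated by LOO q² = 1 - PRESS/SS_total. The helper names `pcr_fit_predict` and `loo_q2` and the synthetic data are assumptions.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_components):
    """Principal component regression (illustrative): regress y on the first
    n_components PC scores of the standardized training descriptors."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)
    Z = (X_train - mu) / sd
    # Rows of Vt are the PC loading vectors of the training set only.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T                       # p x k loading matrix
    T = Z @ W                                     # training PC scores
    A = np.column_stack([np.ones(len(T)), T])     # intercept + scores
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    T_test = ((X_test - mu) / sd) @ W             # project test molecules
    return np.column_stack([np.ones(len(T_test)), T_test]) @ beta

def loo_q2(X, y, n_components):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                  # hold out molecule i
        pred = pcr_fit_predict(X[keep], y[keep], X[i:i + 1], n_components)
        press += (y[i] - pred[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical data: correlated descriptors driven by 2 hidden factors,
# and a response that depends on those same factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 2))
loadings = np.array([[1.0, 0.0, 1.0, 2.0, -1.0, 0.5],
                     [0.0, 1.0, 1.0, -1.0, 2.0, 3.0]])
X = latent @ loadings + 0.01 * rng.normal(size=(60, 6))
y = latent @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=60)

for k in (1, 2):
    print(f"{k} component(s): LOO q2 = {loo_q2(X, y, k):.3f}")
```

Refitting the standardization and the PCA inside every LOO fold keeps the held-out molecule out of the model entirely; computing the components once on all n molecules would leak information into the fit and inflate q². This sketch is not outlier-robust; a robust variant would replace the least-squares step and the classical PCA with robust counterparts.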
Keywords: Aromatic and heteroaromatic amines, chemical mutagens, hierarchical QSAR, leave one out (LOO) cross validation, principal component analysis, robustness, sufficient dimension reduction.
Current Computer-Aided Drug Design
Title: Exploring Intrinsic Dimensionality of Chemical Spaces for Robust QSAR Model Development: A Comparison of Several Statistical Approaches
Volume: 12 Issue: 4
Author(s): Subhabrata Majumdar and Subhash C. Basak
About this article
Cite this article as:
Subhabrata Majumdar and Subhash C. Basak, Exploring Intrinsic Dimensionality of Chemical Spaces for Robust QSAR Model Development: A Comparison of Several Statistical Approaches, Current Computer-Aided Drug Design 2016; 12(4). https://dx.doi.org/10.2174/1573409912666160906111821
DOI: https://dx.doi.org/10.2174/1573409912666160906111821
Print ISSN: 1573-4099
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-6697