Abstract
Toxicity databases have a special role in predictive toxicology, providing ready access to historical information throughout the discovery, development, and product safety workflows of drug development, as well as in review by regulatory agencies. To provide accurate information within a hypothesis-building environment, database content needs to be rigorously modeled using standards and controlled vocabularies. The purposes that databases serve vary widely, ranging from a source of (Q)SAR datasets for modelers to a basis for "read-across" for regulators. Many tasks involved in using databases are closely tied to data mining; hence databases and data mining form an essential technology pair. To understand chemically induced toxicity, chemical structures must be integrated into the toxicity databases. Mining these "structure-integrated toxicity databases" requires techniques for handling both chemical structures and textual toxicity information. Structure data mining is similar, with some modifications, to that conventionally employed for large chemical databases, whereas data mining of toxicity endpoints is not well developed. This review presents a general strategy for mining structure-integrated toxicity databases to link chemical structures to biological endpoints. Iteratively probing the chemical domain with toxicity endpoint descriptors and the biological domain with chemical descriptors enables the two domains to be linked. Data mining steps that elucidate hidden relationships between target organs and chemical classes are presented as an example. Work is in progress in the public domain toward linking chemistry to biology by providing databases that can be mined.
Keywords: Bioinformatics, chemoinformatics, database, data mining, informatics, linking chemistry to biology, predictive toxicology, QSAR, toxicity