Abstract
The availability of high-throughput techniques, combined with a growing number of exploratory and confirmatory studies in small-molecule science (e.g., probe- and drug-discovery), creates a significant need for structured approaches to data management. The probe- and drug-discovery scientific processes start and end with lower-throughput experiments, which are often connected by high-throughput cheminformatics, screening, and small-molecule profiling experiments. A rigorous and disciplined approach to data management ensures that data can be used to ask complex questions of assay results, and allows many of those questions to be answered computationally, without significant manual effort. A structured approach to recording scientific experimental design and observations involves using a consistently maintained set of ‘master data’ or ‘metadata’. Master data include sets of tightly controlled terminology used to describe an experiment, including both materials and methods. Master data can be used at the level of an individual laboratory or with a scope as extensive as a whole community of scientists. Consistent use of master data increases experimental power by allowing data analysis to connect all parts of the discovery life cycle, across experiments performed by different researchers and in different laboratories, thus decreasing the opportunity cost of making novel connections between results. Despite the promise of this increased experimental power, challenges remain in the implementation and consistent use of master data management (MDM) techniques in the laboratory. In this paper, we discuss how specific MDM techniques can enhance the quality and utility of scientific data at the project, laboratory, and institutional levels. We present a model for the storage and exploitation of master data, practical applications of these techniques in the research context of small-molecule science, and specific benefits of MDM to small-molecule screening aimed at probe- and drug-discovery.
Keywords: Chemical biology, drug discovery, master data, metadata, probe discovery, high-throughput techniques, probe, cheminformatics, Gene Ontology, cell lines