Abstract
Background: The rapid growth of gene expression databases has created a need for content-based searches as an alternative to unstructured database queries based on keywords or metadata. Content-based searching is the ability to retrieve all experiments in a database with similar gene expression patterns, regardless of the biological annotations provided for these experiments.
Objective: While this concept is still in its infancy in a general context, this study applies it to a specific subset of gene expression datasets by querying only experiments that involve time-series expression profiles.
Method: To this end, we propose a novel experiment fingerprinting scheme, obtained by clustering expression profiles, for content-based searching of time-series microarray experiments. To determine the retrieval ability of the proposed scheme, we performed a simulated information retrieval task on a large set of microarray experiments gathered from a public repository. The relevance between any two experiments was defined by the annotated disease associations they have in common.
Results and Conclusion: The results showed that relevant experiments can be retrieved more successfully with this new method than with traditional differential expression-based methods.
Keywords: Gene expression database, time-course data, time-series profile, content-based search, information retrieval, model-based clustering.