Abstract
Serial Analysis of Gene Expression (SAGE) is a sequence-based measure of gene expression that provides quantitative information on the population of transcripts through the generation and counting of specific sequence tags. Many SAGE datasets are publicly available for analysis, constituting a valuable resource for the study of gene expression. These datasets contain tags that are not obviously derived from known transcripts and thus hint at the existence of a large number of novel transcripts; however, prioritizing candidates for further experimental verification is difficult. Here we demonstrate a method to identify non-coding antisense transcripts that may be implicated in stem cell differentiation by combining SAGE data with gene expression data derived by a complementary method. We produced SAGE libraries and paired microarray gene expression data before and after differentiation of three mouse stem cell types (embryonic, mammary and neural). We found 1,674 SAGE tags antisense to 1,351 protein-coding genes. A majority of these antisense tags overlap the 3' UTRs of the sense genes; their abundance correlates with the expression of the corresponding sense genes and appears to be tissue specific. We did not find a significant association between the expression of these tags and alternative splicing. We measured the expression of three genes expressed in the mouse embryo (Zfp42/Rex1, Ywhag/14-3-3g and Pspr1) and of their corresponding putative antisense transcripts by qPCR before and after differentiation of mouse embryonic stem cells (mESC). We conclude that it is possible to identify putative novel antisense transcripts with a potential role in ES cell differentiation by integrating data from existing SAGE libraries with expression data derived by a complementary method. All data used in this work are available from the Gene Expression Omnibus (GEO) and StemBase databases.
Keywords: Serial Analysis of Gene Expression, DNA microarray profiling of gene expression, Embryonic stem cells, Neural stem cells, Mammospheres, Stem cell differentiation, Antisense transcripts, Non-coding RNA, Alternative splicing, Expressed Sequence Tag libraries.