Abstract
Newly developed full-length cDNA library technologies have enabled us to generate cDNA resources on an unprecedented scale, in terms of both physical cDNA clones and cDNA sequence information. Detailed annotations have been attached to each cDNA, both computationally and manually, and several integrated databases of the cDNA information have been made publicly accessible. Now, taking advantage of the physical cDNA clone resources, which are thought to cover most protein-coding genes in humans and mice non-redundantly, efficient high-throughput approaches, collectively called functional genomics approaches, are underway to characterize the biological functions of the encoded proteins from various points of view. In addition, it has become clear that full-length cDNA resources are also useful for determining the precise genomic positions of transcriptional start sites (TSSs). The positional information of the TSSs has allowed us to identify and analyze the adjacent promoter regions as putative transcriptional regulatory regions. Moreover, several very recently developed methods, combining full-length cDNA technologies with serial analysis of gene expression (SAGE) technologies, have enabled further high-throughput identification of TSSs. Building on further expansion of the full-length cDNA data, attempts have begun towards a comprehensive understanding of which genomic elements, including genic regions and non-genic regions such as promoters, bring about which biological consequences in which cellular contexts. Rapid compilation of genomic sequence data, together with multifaceted use of the full-length cDNA resources, will shortly lay a firm foundation for a global understanding of the complex molecular biological systems that convert the information in genomic DNA into a living cell.
Keywords: full-length cDNA, transcriptome, functional annotation, ORFeome, promoters, regulations