Abstract
In this paper, a new protein-coding gene-finding algorithm specific to the yeast genome is proposed, with an accuracy of 96%. This accuracy is confirmed by five-fold cross-validation tests. Using the new algorithm, the number of protein-coding genes in the yeast genome is re-estimated, under the assumption that the unknown genes have statistical properties similar to those of the known genes. The number of protein-coding genes in the 16 yeast chromosomes is found to be at most 5873, in good agreement with the widely accepted range of 5800-6000.
Keywords: Recognition, gene, yeast genome
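The five-fold cross-validation step can be illustrated with a minimal Python sketch. It is not the paper's gene-finding algorithm. The single "coding score" feature per sequence, the threshold classifier, and the names `k_fold_indices`, `fit_threshold` and `cross_validate` are hypothetical stand-ins chosen only to show how a held-out accuracy is averaged over five disjoint test folds.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical example format: (coding score, label), label 1 = coding, 0 = non-coding.
Example = Tuple[float, int]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(examples: Sequence[Example], t: float) -> float:
    """Fraction of examples whose prediction (score >= t means coding) matches the label."""
    correct = sum(int((x >= t) == (y == 1)) for x, y in examples)
    return correct / len(examples)


def fit_threshold(train: Sequence[Example]) -> float:
    """Hypothetical stand-in classifier: choose the cut point with the best training accuracy."""
    xs = sorted({x for x, _ in train})
    # Candidates: one value below the minimum (predict everything coding),
    # plus midpoints between consecutive distinct training scores.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def cross_validate(data: Sequence[Example], k: int = 5) -> float:
    """Mean accuracy on held-out folds: each fold is tested once, trained on the rest."""
    folds = k_fold_indices(len(data), k)
    fold_accs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[j] for j in range(len(data)) if j not in held_out]
        test = [data[j] for j in test_idx]
        t = fit_threshold(train)
        fold_accs.append(accuracy(test, t))
    return sum(fold_accs) / k


if __name__ == "__main__":
    # Synthetic, perfectly separable toy data (not yeast data).
    toy = [(float(i), 0) for i in range(50)] + [(float(i) + 100.0, 1) for i in range(50)]
    print(cross_validate(toy, k=5))
```

Because every example is used for testing exactly once, the reported figure is a mean over five held-out folds rather than an accuracy measured on the training data itself.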