Abstract
Using operon predictions based on conservation of gene order across 330 prokaryotic genomes, I show that in most of the genomes analyzed about 70% of the genes in operons are separated by distances of -20 to 30 base pairs. Most of the deviations from this tendency might be related to the presence of signals inside the operons. However, a closer look at the extreme exceptions confirms that annotation problems are partly responsible for inter-genic distance distributions that deviate from the distribution observed for operons in Escherichia coli K12. I also argue that the inter-genic distances between adjacent genes in different transcription units (transcription unit boundaries, or TUBs) should be expected to be more variable than those between genes in the same operon. Using phylogenetic profiles, I show that adjusting predictions on a per-genome basis might help increase the accuracy of operon predictions. Improvements in automated annotation might be necessary to fully evaluate the overall tendency of genes in operons towards short inter-genic distances, and to better understand differences in regulatory complexity across prokaryotes.
Keywords: Operon predictions, inter-genic distance, conservation of gene order, phylogenetic profiles