Abstract
Most viruses have small genomes. However, these small genomes often have complex gene features involving transcriptional and translational exceptions, such as overlapping genes, alternative splicing, RNA editing, ribosomal slippage and stop codon read-through. These complex features and exceptions increase gene density and improve the coding efficiency of viral genomes, but they also pose immense challenges to gene prediction algorithms. Most gene prediction programs for eukaryotic and prokaryotic genomes cannot detect or predict these exceptions correctly. Predicting these complex features and exceptions with high precision and accuracy is critical for interpreting viral genomic data correctly.
This paper describes the most commonly used programs for viral gene prediction, focusing on ab initio and similarity-based programs, including GeneMarkS, ZCURVE_V, FgenesV, Phylo-HMM, MLOGD, GATU, VirGen, FLAN, VIGOR and others. The complex features of viral genomes and the basic algorithms of the gene prediction programs are introduced briefly, with their advantages and disadvantages identified, followed by a list of application scopes and specific features. Gene prediction programs for bacteriophages and viral metagenomic sequences are reviewed separately. The last section of this review presents future directions and challenges for the development of viral gene prediction programs.
Keywords: Mature peptide prediction, viral gene complex feature, viral gene prediction, viral genome annotation, VIGOR.