Abstract
Essential genes often play key roles in biological processes, and mutations in these genes can severely affect an organism’s survival and reproduction. Studying lethal phenotypes provides important information about the functions of gene products and can help guide gene therapy. Traditionally, essential genes have been identified through single-gene knockout experiments, transposon mutagenesis, or antisense RNA inhibition. However, these experimental methods are expensive, labor-intensive, and time-consuming. They are also not always feasible, because the vast majority of microorganisms cannot be cultured. The explosion of genome-scale data from high-throughput technologies in recent years has made computational methods for genome-scale essential gene prediction possible, and these methods offer an alternative way to study essential genes. Constraint-based modeling and machine learning techniques have been applied in this area with promising results. Features derived from protein sequences, network topology, gene expression data, and other sources have been used to predict essential genes. In this article, we review recent bioinformatics progress in predicting gene essentiality, covering databases, computational methods, the most commonly used features, comparisons of machine learning classifiers, and feature selection. Finally, we discuss the challenges and future directions of the field.
Keywords: Computational modeling, essential genes, feature selection, flux balance analysis, machine learning, microbial, prediction.