Abstract
A central focus of cancer genetics is the study of mutations that are causally implicated in tumorigenesis. Although missense variants are commonly identified in genomic sequence, only a small fraction directly contributes to oncogenesis. Distinguishing the somatic missense changes that contribute to cancer progression from those that do not is a difficult problem that is usually addressed through functional in vivo analyses. With the advent of several large-scale cancer genome projects geared toward identifying mutations that are causally implicated in cancer, it is becoming increasingly important to develop methods for distinguishing functionally relevant mutations from passenger mutations and other innocuous polymorphisms. Here we review two general strategies, based on either mutation frequency data or the nature of the amino acid substitutions. Frequency-based methods are commonly used for estimating the enrichment of causal mutations and for identifying specific mutations under positive selection pressure. The statistical power of these methods depends on the number of cancer samples surveyed. The potential functional consequences of missense mutations can also be examined by bioinformatics approaches, since multiple computational methods have been developed to estimate the deleterious effect of amino acid substitutions. Many of the existing methods can potentially be applied to large-scale cancer genome data to detect relevant causal mutations regardless of their prevalence. Future analysis of somatic missense mutations will likely benefit from continued development of integrated and automated methods that combine all available information to predict whether a particular mutation is causally implicated.
Keywords: single nucleotide polymorphism, cancer-causing mutations, protein kinases, tumor DNA, amino acid substitution prediction methods