Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation observed in the human genome. With the advent of high-throughput genotyping arrays and next-generation sequencing (NGS) platforms, tens of millions of SNPs have been uncovered in several human populations. However, the sheer number of SNPs brings new challenges for subsequent analysis. In practice, some SNPs may not be genotyped, and non-mutant bases may be falsely reported as SNPs on microarray and NGS platforms. Furthermore, the identification of disease susceptibility genes is often confounded by numerous SNPs that are correlated by chance. In this paper, we review existing approaches for calling SNPs using microarray and NGS platforms. We then discuss methods for measuring linkage disequilibrium (LD) and applications of the LD structure. Finally, we compare methods for inferring haplotypes from genotypes using microarray and NGS platforms and present the challenges of using SNPs in large-scale association studies.
Keywords: Algorithm, Haplotype, Linkage disequilibrium, Microarray, Next-generation sequencing, Single nucleotide polymorphism, Genetic markers, International HapMap Project, Human Leucocyte Antigens, Illumina 650