A Tagging SNP Set Method Based on Network Community Partition of
Linkage Disequilibrium and Node Centrality

Yulin Zhang; Qiang Wan; Xiaochun Cheng; Guangyang Lu; Shudong Wang; Sicheng He

doi:10.2174/1574893617666220324155813

Abstract

Aims: To solve the tagSNP selection problem with a network-based method and to reconstruct the genotypes of unknown individuals from tagSNPs with a prediction method.

Background: As genetic markers, single-nucleotide polymorphisms (SNPs) have been used for linkage analysis of genetic diseases in genome-wide association studies. The genetic information carried by SNPs is redundant in regions of high linkage disequilibrium in the human genome. Therefore, a subset of informative SNPs (a tagSNP set) is sufficient to represent the remaining SNPs, greatly reducing genotyping cost and computational complexity.

Methods: A novel tagSNP set selection method named NCCRT is proposed. It combines network community partition of the SNP network with node centrality ranking to select tagSNPs from genotype data.
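The abstract does not spell out the individual steps of NCCRT, so the following is only a minimal sketch of the general idea under stated assumptions: genotypes are coded 0/1/2, linkage disequilibrium is measured as squared Pearson correlation (r²), the SNP network links pairs whose r² reaches a hypothetical threshold, connected components stand in for the community partition, and weighted degree stands in for the centrality measure. The function names, the threshold, and the toy data are illustrative and are not taken from the authors' implementation.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype columns (coded 0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP carries no LD signal
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def select_tag_snps(genotypes, threshold=0.7):
    """genotypes: dict mapping SNP name -> list of genotypes, one per individual.

    Returns one tag SNP per community, sorted by name.
    """
    # Build the LD network: a weighted edge for each pair with r^2 >= threshold.
    weights = {snp: {} for snp in genotypes}
    for a, b in combinations(genotypes, 2):
        r2 = r_squared(genotypes[a], genotypes[b])
        if r2 >= threshold:
            weights[a][b] = r2
            weights[b][a] = r2

    # Assumed stand-in for community partition: connected components (DFS).
    seen, communities = set(), []
    for start in genotypes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in weights[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(component)

    # Assumed stand-in for centrality: weighted degree, ties broken by name.
    tags = []
    for community in communities:
        best = max(sorted(community), key=lambda s: sum(weights[s].values()))
        tags.append(best)
    return sorted(tags)


# Toy data: hypothetical SNP names, six individuals.
toy = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [0, 1, 2, 0, 1, 1],
    "snp4": [2, 0, 1, 1, 0, 2],
}
tags = select_tag_snps(toy, threshold=0.7)
print(tags)
```

In this toy input, snp1 to snp3 are strongly correlated and form one community, while snp4 is isolated, so one representative is picked from each group. A modularity-based partition and a different centrality measure could be swapped in without changing the overall structure of the sketch.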

Results: The method is tested on three data sets comprising 176, 169, and 56 SNPs from the genes ASAH1, HTR2A, and OLFM4, respectively. The experimental results show that our method achieves the best prediction accuracy and stability for ASAH1 and HTR2A.
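Prediction accuracy here refers to how well the remaining SNPs are reconstructed from the selected tagSNPs. The abstract gives no formula, so the sketch below is an illustration under assumptions: a single tagSNP predicts a single target SNP through ordinary least squares, the prediction is rounded to the nearest valid genotype (0, 1, or 2), and accuracy is the fraction of held-out genotypes predicted exactly. All names and the toy data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit y ~ a + b*x for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:  # constant predictor: fall back to the mean of y
        return my, 0.0
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b


def predict_genotype(a, b, tag_value):
    """Round the regression output to the nearest valid genotype 0, 1 or 2."""
    return min(2, max(0, round(a + b * tag_value)))


def prediction_accuracy(train_tag, train_target, test_tag, test_target):
    """Fraction of held-out target genotypes recovered from the tag SNP."""
    a, b = fit_line(train_tag, train_target)
    hits = sum(
        predict_genotype(a, b, t) == y for t, y in zip(test_tag, test_target)
    )
    return hits / len(test_target)


# Toy data: training individuals, then held-out individuals.
train_tag = [0, 1, 2, 0, 1, 2]
train_target = [0, 1, 2, 0, 1, 1]
test_tag = [0, 1, 2, 2]
test_target = [0, 1, 2, 1]
accuracy = prediction_accuracy(train_tag, train_target, test_tag, test_target)
print(accuracy)
```

With several tagSNPs, the same idea extends to multiple linear regression with one coefficient per tagSNP.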

Conclusion: Compared with random sampling, the greedy algorithm, and the TSMI algorithm, our method does not rely on causal SNP selection, yet it quickly identifies the tagSNP nodes and improves prediction accuracy.

Keywords: TagSNP, linkage disequilibrium, community partition, centrality, linear regression, SNP.




Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
Print ISSN: 1574-8936
Online ISSN: 2212-392X
