Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

A Tagging SNP Set Method Based on Network Community Partition of Linkage Disequilibrium and Node Centrality

Author(s): Yulin Zhang, Qiang Wan*, Xiaochun Cheng*, Guangyang Lu, Shudong Wang and Sicheng He

Volume 17, Issue 9, 2022

Published on: 12 September, 2022

Page: [825 - 834] Pages: 10

DOI: 10.2174/1574893617666220324155813

Abstract

Aims: To solve the tagSNP selection problem with a network-based method and to reconstruct unknown individuals from their tagSNPs with a prediction method.

Background: As genetic markers, single-nucleotide polymorphisms (SNPs) have been used for linkage analysis of genetic diseases in genome-wide association studies. The genetic information carried by SNPs is redundant in regions of high linkage disequilibrium in the human genome. Therefore, a subset of informative SNPs (a tagSNP set) is sufficient to represent the remaining SNPs, which greatly reduces genotyping cost and computational complexity.
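To make the notion of redundancy concrete, the following is a minimal sketch of one common way to quantify linkage disequilibrium between two SNPs: the squared Pearson correlation (r²) of their genotype vectors. The abstract does not state which LD measure the authors use, so the choice of r², the 0/1/2 minor-allele coding, and the function name `r_squared` are assumptions made for illustration only.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors.

    Assumes genotypes are coded as minor-allele counts (0, 1, or 2);
    this coding is an illustrative assumption, not taken from the paper.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A SNP with no variation in the sample carries no LD information.
        return 0.0
    return cov * cov / (var_x * var_y)


# Hypothetical genotypes for six individuals at three SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [2, 1, 0, 1, 2, 0]  # mirror image of snp_a: fully redundant
snp_c = [1, 1, 1, 1, 1, 1]  # no variation in this sample

print(r_squared(snp_a, snp_b))
print(r_squared(snp_a, snp_c))
```

A pair with r² close to 1 is redundant in the sense described above: genotyping one of the two SNPs is enough to infer the other.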

Methods: A novel tagSNP set selection method named NCCRT is proposed. It combines community partition of the SNP network with node centrality ranking to select tagSNPs from genotype data.
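The general idea of partitioning an SNP network and ranking nodes by centrality can be sketched as follows. This is not the NCCRT algorithm itself: the abstract does not name the community partition algorithm or the centrality measure, so the sketch substitutes connected components of a thresholded LD graph for the community partition and node degree for the centrality score. The function names, the LD matrix, and the 0.8 threshold are all hypothetical.

```python
def build_ld_graph(ld, threshold):
    """Connect SNPs i and j whenever their pairwise LD is at least threshold."""
    n = len(ld)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if ld[i][j] >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Stand-in for a community partition (an assumption, not the paper's method)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def select_tag_snps(ld, threshold=0.8):
    """Pick one SNP per group: the node with the highest degree.

    Ties are broken in favour of the lowest SNP index.
    """
    adj = build_ld_graph(ld, threshold)
    tags = []
    for component in connected_components(adj):
        best = max(component, key=lambda v: (len(adj[v]), -v))
        tags.append(best)
    return sorted(tags)


# Hypothetical symmetric LD matrix for five SNPs.
ld = [
    [1.00, 0.90, 0.50, 0.10, 0.10],
    [0.90, 1.00, 0.85, 0.10, 0.10],
    [0.50, 0.85, 1.00, 0.10, 0.10],
    [0.10, 0.10, 0.10, 1.00, 0.95],
    [0.10, 0.10, 0.10, 0.95, 1.00],
]

print(select_tag_snps(ld, threshold=0.8))
```

In this toy example SNP 1 is linked to both SNP 0 and SNP 2, so it is chosen to represent that group, while SNP 3 represents the second group by the tie-breaking rule. A modularity-based partition and a different centrality measure would slot into the same structure.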

Results: The method is tested on three data sets comprising 176, 169, and 56 SNPs from the genes ASAH1, HTR2A, and OLFM4, respectively. The experimental results show that our method achieves the best prediction accuracy and stability for ASAH1 and HTR2A.

Conclusion: Compared with random sampling, the greedy algorithm, and the TSMI algorithm, our method does not rely on causal SNP selection, yet it quickly identifies the tagSNP nodes and improves prediction accuracy.

Keywords: TagSNP, linkage disequilibrium, community partition, centrality, linear regression, SNP.


© 2024 Bentham Science Publishers