Abstract
Aims: To solve the tagSNP selection problem with a network-based method and to reconstruct unknown individuals from their tagSNPs with a prediction method.
Background: As genetic markers, SNPs have been used for linkage analysis of genetic diseases in genome-wide association studies. The genetic information carried by SNPs is redundant in regions of high linkage disequilibrium in the human genome. Therefore, a subset of informative SNPs (a tagSNP set) is sufficient to represent the remaining SNPs, which greatly reduces genotyping cost and computational complexity.
Methods: A novel tagSNP set selection method named NCCRT is proposed. It combines community partition of the SNP network with node centrality ranking to select tagSNPs from genotype data.
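The general idea of combining a network partition with centrality ranking can be illustrated with a minimal sketch. It is not the NCCRT implementation: the abstract does not specify the edge weights, the community partition algorithm, or the centrality measure, so this sketch assumes squared genotype correlation (r²) as the edge weight, uses connected components of the thresholded network as a simple stand-in for community partition, and ranks nodes by weighted degree. The function names, the threshold, and the toy data are all hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def select_tag_snps(genotypes, threshold=0.8):
    """Pick one tagSNP per group of mutually correlated SNPs.

    genotypes: dict mapping SNP name -> list of genotype codes (one per individual).
    threshold: assumed r^2 cutoff for drawing an edge between two SNPs.
    """
    names = sorted(genotypes)

    # Build the weighted SNP network: an edge exists when r^2 >= threshold.
    weight = {s: {} for s in names}
    for a, b in combinations(names, 2):
        r2 = r_squared(genotypes[a], genotypes[b])
        if r2 >= threshold:
            weight[a][b] = r2
            weight[b][a] = r2

    # Stand-in for community partition: connected components via depth-first search.
    seen, tags = set(), []
    for start in names:
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in weight[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

        # Centrality ranking: weighted degree; ties go to the alphabetically first SNP.
        best = max(sorted(component), key=lambda s: sum(weight[s].values()))
        tags.append(best)
    return tags


# Toy data (hypothetical): two groups of correlated SNPs across six individuals.
example = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],
    "rs3": [0, 1, 2, 0, 1, 1],
    "rs4": [2, 0, 1, 1, 2, 0],
    "rs5": [2, 0, 1, 1, 2, 0],
}
tags = select_tag_snps(example, threshold=0.7)
```

In a fuller treatment, the connected-components step would be replaced by a proper community detection algorithm, and the weighted degree by whichever centrality measure the method actually uses.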
Results: The method is tested on three data sets comprising 176, 169, and 56 SNPs from the genes ASAH1, HTR2A, and OLFM4, respectively. The experimental results show that our method achieves the best prediction accuracy and stability on ASAH1 and HTR2A.
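How prediction accuracy might be measured can be sketched as follows. The abstract does not describe the prediction method or the evaluation protocol; the keywords mention linear regression, so this sketch assumes a single-tagSNP least-squares fit, rounds the prediction to the nearest genotype code, and scores it by leave-one-out accuracy. All function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my  # constant predictor: fall back to the mean of y
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def predict_genotype(slope, intercept, tag_value):
    """Round the regression output to the nearest valid genotype code (0, 1, or 2)."""
    raw = slope * tag_value + intercept
    return min(2, max(0, round(raw)))


def leave_one_out_accuracy(tag, target):
    """Fraction of individuals whose target SNP is predicted correctly
    when the model is fitted on all other individuals."""
    hits = 0
    for i in range(len(tag)):
        train_x = tag[:i] + tag[i + 1:]
        train_y = target[:i] + target[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        hits += predict_genotype(slope, intercept, tag[i]) == target[i]
    return hits / len(tag)


# Toy data (hypothetical): a tagSNP and a perfectly correlated untyped SNP.
tag_snp = [0, 1, 2, 0, 1, 2]
untyped_snp = [2, 1, 0, 2, 1, 0]
accuracy = leave_one_out_accuracy(tag_snp, untyped_snp)
```

With several tagSNPs per target, the single-predictor fit would give way to multiple regression; the evaluation loop stays the same.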
Conclusion: Compared with random sampling, the greedy algorithm, and the TSMI algorithm, our method does not rely on causal SNP selection, yet it quickly identifies tagSNP nodes and improves prediction accuracy.
Keywords: TagSNP, linkage disequilibrium, community partition, centrality, linear regression, SNP.