Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
26 03 2020
History:
received: 12 06 2019
accepted: 19 02 2020
entrez: 29 3 2020
pubmed: 29 3 2020
medline: 16 7 2020
Status:
epublish
Abstract
The diversity in our genome is crucial to understanding the demographic history of worldwide populations. However, it remains unknown whether subtle genetic differences within a population can be disentangled, or whether they have an impact on complex traits. Here we apply dimensionality reduction methods (PCA, t-SNE, PCA-t-SNE, UMAP, and PCA-UMAP) to biobank-derived genomic data of a Japanese population (n = 169,719). Dimensionality reduction reveals fine-scale population structure, conspicuously differentiating adjacent insular subpopulations. We further elucidate the demographic landscape of these Japanese subpopulations using population genetics analyses. Finally, we perform phenome-wide polygenic risk score (PRS) analyses on 67 complex traits. Differences in PRS between the deconvoluted subpopulations are not always concordant with those in the observed phenotypes, suggesting that the PRS differences might reflect biases from the uncorrected structure, in a trait-dependent manner. This study suggests that such an uncorrected structure can be a potential pitfall in the clinical application of PRS.
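As background for the PRS comparison described in the abstract, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages, and subgroup means of that score can then be compared. The sketch below illustrates only that general idea; it is not the authors' pipeline. The function name, effect sizes, and genotypes are all hypothetical toy values chosen for illustration.

```python
from statistics import mean


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(g * w for g, w in zip(dosages, weights))


# Hypothetical per-variant effect sizes (illustrative values, not from the study).
weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes for two toy subgroups, one list of dosages per individual.
group_a = [[0, 1, 2], [1, 1, 1]]
group_b = [[2, 0, 0], [1, 2, 0]]

scores_a = [polygenic_score(g, weights) for g in group_a]
scores_b = [polygenic_score(g, weights) for g in group_b]

# Difference in mean score between the two toy subgroups.
mean_difference = mean(scores_a) - mean(scores_b)
print(scores_a, scores_b, mean_difference)
```

A difference in mean score like this one does not by itself indicate a difference in the underlying trait; the abstract's point is that such gaps may partly reflect uncorrected population structure.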
Identifiers
pubmed: 32218440
doi: 10.1038/s41467-020-15194-z
pii: 10.1038/s41467-020-15194-z
pmc: PMC7099015
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1569