Genetic analyses identify widespread sex-differential participation bias.
Journal
Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904
Publication information
Publication date: 05 2021
History:
received: 24 03 2020
accepted: 16 03 2021
pubmed: 24 4 2021
medline: 16 6 2021
entrez: 23 4 2021
Status:
ppublish
Abstract
Genetic association results are often interpreted with the assumption that study participation does not affect downstream analyses. Understanding the genetic basis of participation bias is challenging since it requires the genotypes of unseen individuals. Here we demonstrate that it is possible to estimate comparative biases by performing a genome-wide association study contrasting one subgroup versus another. For example, we showed that sex exhibits artifactual autosomal heritability in the presence of sex-differential participation bias. By performing a genome-wide association study of sex in approximately 3.3 million males and females, we identified over 158 autosomal loci spuriously associated with sex and highlighted complex traits underpinning differences in study participation between the sexes. For example, the body mass index-increasing allele at FTO was observed at higher frequency in males compared to females (odds ratio = 1.02, P = 4.4 × 10
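The mechanism described in the abstract can be illustrated with a small deterministic calculation. The sketch below is not the authors' analysis: it assumes a hypothetical logistic participation model, Hardy-Weinberg genotype proportions, and made-up effect sizes (`beta`, `intercept`), and shows only that an autosomal allele with identical population frequency in both sexes acquires an odds ratio different from 1 among participants once its effect on participation differs by sex.

```python
import math


def participant_allele_freq(p, beta, intercept=-1.0):
    """Expected allele frequency among participants.

    Hypothetical model (an assumption, not taken from the paper):
    logit P(participate | genotype g) = intercept + beta * g,
    with genotype counts g in {0, 1, 2} at Hardy-Weinberg proportions.
    """
    geno_probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    part_probs = [1 / (1 + math.exp(-(intercept + beta * g))) for g in range(3)]
    # Weight each genotype by how likely its carriers are to participate.
    weights = [gp * pr for gp, pr in zip(geno_probs, part_probs)]
    total = sum(weights)
    return sum(g * w for g, w in enumerate(weights)) / (2 * total)


def allelic_odds_ratio(freq_a, freq_b):
    """Allelic odds ratio comparing group a with group b."""
    return (freq_a / (1 - freq_a)) / (freq_b / (1 - freq_b))


# Same population allele frequency in both sexes (illustrative value).
p = 0.4

# Sex-differential participation: the allele raises participation in
# males and lowers it in females (illustrative effect sizes).
f_male = participant_allele_freq(p, beta=0.02)
f_female = participant_allele_freq(p, beta=-0.02)
or_biased = allelic_odds_ratio(f_male, f_female)

# Same participation effect in both sexes: no association with sex.
f_same = participant_allele_freq(p, beta=0.02)
or_unbiased = allelic_odds_ratio(f_same, f_same)

print(f"differential participation: OR = {or_biased:.4f}")
print(f"equal participation effect: OR = {or_unbiased:.4f}")
```

Under this toy model, any departure of the odds ratio from 1 comes entirely from selection into the sample, which is why a genome-wide association study of sex can be read as a contrast of participation between the sexes.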
Identifiers
pubmed: 33888908
doi: 10.1038/s41588-021-00846-7
pii: 10.1038/s41588-021-00846-7
pmc: PMC7611642
mid: EMS133100
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
663-671
Grants
Agency: Medical Research Council
ID: MC_PC_17228
Country: United Kingdom
Agency: Medical Research Council
ID: MC_QA137853
Country: United Kingdom
Agency: Medical Research Council
ID: MC_UU_00006/2
Country: United Kingdom
Agency: Medical Research Council
ID: MC_UU_12015/2
Country: United Kingdom
Investigators
Michelle Agee (M)
Stella Aslibekyan (S)
Robert K Bell (RK)
Katarzyna Bryc (K)
Sarah K Clark (SK)
Sarah L Elson (SL)
Kipper Fletez-Brant (K)
Pierre Fontanillas (P)
Nicholas A Furlotte (NA)
Pooja M Gandhi (PM)
Karl Heilbron (K)
Barry Hicks (B)
Karen E Huber (KE)
Ethan M Jewett (EM)
Yunxuan Jiang (Y)
Aaron Kleinman (A)
Keng-Han Lin (KH)
Nadia K Litterman (NK)
Marie K Luff (MK)
Matthew H McIntyre (MH)
Kimberly F McManus (KF)
Joanna L Mountain (JL)
Sahar V Mozaffari (SV)
Elizabeth S Noblin (ES)
Carrie A M Northover (CAM)
Jared O'Connell (J)
Aaron A Petrakovitz (AA)
Steven J Pitts (SJ)
G David Poznik (GD)
J Fah Sathirapongsasuti (JF)
Janie F Shelton (JF)
Suyash Shringarpure (S)
Chao Tian (C)
Joyce Y Tung (JY)
Robert J Tunney (RJ)
Vladimir Vacic (V)
Xin Wang (X)
Amir Zare (A)
Preben Bo Mortensen (PB)
Ole Mors (O)
Thomas Werge (T)
Merete Nordentoft (M)
David M Hougaard (DM)
Jonas Bybjerg-Grauholm (J)
Marie Bækvad-Hansen (M)