Genetic analyses identify widespread sex-differential participation bias.


Journal

Nature Genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904

Publication information

Publication date:
May 2021
History:
received: 24 March 2020
accepted: 16 March 2021
pubmed: 24 April 2021
medline: 16 June 2021
entrez: 23 April 2021
Status: ppublish

Abstract

Genetic association results are often interpreted with the assumption that study participation does not affect downstream analyses. Understanding the genetic basis of participation bias is challenging because it requires the genotypes of unseen individuals. Here we demonstrate that it is possible to estimate comparative biases by performing a genome-wide association study contrasting one subgroup with another. For example, we show that sex exhibits artifactual autosomal heritability in the presence of sex-differential participation bias. By performing a genome-wide association study of sex in approximately 3.3 million males and females, we identified over 158 autosomal loci spuriously associated with sex and highlighted complex traits underpinning differences in study participation between the sexes. For example, the body mass index-increasing allele at FTO was observed at higher frequency in males compared with females (odds ratio = 1.02, P = 4.4 × 10
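The comparative logic described in the abstract, that a sex-differential effect on who participates appears as an autosomal allele-frequency difference between male and female participants, can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters, not the study's method: a single SNP with allele frequency 0.4, a logistic participation model with hypothetical per-allele effects, and a simple allelic 2×2 test standing in for the full genome-wide association analysis used in the study. None of its numbers come from the paper.

```python
import numpy as np


def simulate_participants(n, freq, alpha, beta_male, beta_female, rng):
    """Simulate one biallelic SNP and (hypothetical) sex-specific participation.

    Returns genotypes (0/1/2 copies of the effect allele) and sex
    (True = male) for the individuals who chose to participate.
    """
    male = rng.random(n) < 0.5                      # sex assigned ~50/50
    geno = rng.binomial(2, freq, size=n)            # autosomal genotype, drawn independently of sex
    beta = np.where(male, beta_male, beta_female)   # assumed per-allele effect on participation
    p_participate = 1.0 / (1.0 + np.exp(-(alpha + beta * geno)))
    joined = rng.random(n) < p_participate
    return geno[joined], male[joined]


def allelic_sex_test(geno, male):
    """Allelic 2x2 test: effect-allele odds in male vs female participants.

    Returns (odds_ratio, z) where z = log(OR) / SE(log OR).
    """
    n_m, n_f = male.sum(), (~male).sum()
    a_m = geno[male].sum()                     # effect alleles carried by males
    a_f = geno[~male].sum()                    # effect alleles carried by females
    b_m, b_f = 2 * n_m - a_m, 2 * n_f - a_f    # other alleles
    log_or = np.log(a_m) - np.log(b_m) - np.log(a_f) + np.log(b_f)
    se = np.sqrt(1 / a_m + 1 / b_m + 1 / a_f + 1 / b_f)
    return float(np.exp(log_or)), float(log_or / se)


rng = np.random.default_rng(2021)
# Same participation effect in both sexes: no spurious association expected.
g0, m0 = simulate_participants(200_000, 0.4, -1.0, 0.1, 0.1, rng)
or_null, z_null = allelic_sex_test(g0, m0)
# The allele raises participation in males only: sex-differential bias.
g1, m1 = simulate_participants(200_000, 0.4, -1.0, 0.1, 0.0, rng)
or_bias, z_bias = allelic_sex_test(g1, m1)
print(f"equal participation effect:  OR = {or_null:.3f}, z = {z_null:.1f}")
print(f"sex-differential effect:     OR = {or_bias:.3f}, z = {z_bias:.1f}")
```

With equal participation effects the odds ratio should sit close to 1. When the allele raises participation only in males, male participants carry it at a higher frequency, so the allele appears associated with sex even though genotypes were simulated independently of sex.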

Identifiers

pubmed: 33888908
doi: 10.1038/s41588-021-00846-7
pii: 10.1038/s41588-021-00846-7
pmc: PMC7611642
mid: EMS133100

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

663-671

Grants

Agency: Medical Research Council
ID: MC_PC_17228
Country: United Kingdom
Agency: Medical Research Council
ID: MC_QA137853
Country: United Kingdom
Agency: Medical Research Council
ID: MC_UU_00006/2
Country: United Kingdom
Agency: Medical Research Council
ID: MC_UU_12015/2
Country: United Kingdom

Investigators

Michelle Agee (M)
Stella Aslibekyan (S)
Robert K Bell (RK)
Katarzyna Bryc (K)
Sarah K Clark (SK)
Sarah L Elson (SL)
Kipper Fletez-Brant (K)
Pierre Fontanillas (P)
Nicholas A Furlotte (NA)
Pooja M Gandhi (PM)
Karl Heilbron (K)
Barry Hicks (B)
Karen E Huber (KE)
Ethan M Jewett (EM)
Yunxuan Jiang (Y)
Aaron Kleinman (A)
Keng-Han Lin (KH)
Nadia K Litterman (NK)
Marie K Luff (MK)
Matthew H McIntyre (MH)
Kimberly F McManus (KF)
Joanna L Mountain (JL)
Sahar V Mozaffari (SV)
Elizabeth S Noblin (ES)
Carrie A M Northover (CAM)
Jared O'Connell (J)
Aaron A Petrakovitz (AA)
Steven J Pitts (SJ)
G David Poznik (GD)
J Fah Sathirapongsasuti (JF)
Janie F Shelton (JF)
Suyash Shringarpure (S)
Chao Tian (C)
Joyce Y Tung (JY)
Robert J Tunney (RJ)
Vladimir Vacic (V)
Xin Wang (X)
Amir Zare (A)
Preben Bo Mortensen (PB)
Ole Mors (O)
Thomas Werge (T)
Merete Nordentoft (M)
David M Hougaard (DM)
Jonas Bybjerg-Grauholm (J)
Marie Bækvad-Hansen (M)

Authors

Nicola Pirastu (N)

Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK.

Mattia Cordioli (M)

Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.

Priyanka Nandakumar (P)

23andMe, Inc., Sunnyvale, CA, USA.

Gianmarco Mignogna (G)

Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
Department of Statistics and Quantitative Methods, University of Milano Bicocca, Milan, Italy.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.

Abdel Abdellaoui (A)

Department of Psychiatry, Amsterdam Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands.

Benjamin Hollis (B)

MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK.
The Kennedy Institute of Rheumatology, University of Oxford, Oxford, UK.

Masahiro Kanai (M)

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.

Veera M Rajagopal (VM)

Department of Biomedicine, Aarhus University, Aarhus, Denmark.
The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.
Centre for Genomics and Personalized Medicine, Aarhus University, Aarhus, Denmark.
Centre for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, Denmark.

Pietro Della Briotta Parolo (PDB)

Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.

Nikolas Baya (N)

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Stanley Center for Psychiatric Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Caitlin E Carey (CE)

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Stanley Center for Psychiatric Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Juha Karjalainen (J)

Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Thomas D Als (TD)

Department of Biomedicine, Aarhus University, Aarhus, Denmark.
The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.
Centre for Genomics and Personalized Medicine, Aarhus University, Aarhus, Denmark.
Centre for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, Denmark.

Matthijs D Van der Zee (MD)

Faculty of Behavioural and Movement Sciences, Biological Psychology, Vrije Universiteit, Amsterdam, the Netherlands.

Felix R Day (FR)

MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK.

Ken K Ong (KK)

MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK.
Department of Paediatrics, University of Cambridge, Cambridge, UK.

Takayuki Morisaki (T)

Division of Molecular Pathology, Institute of Medical Sciences, University of Tokyo, Tokyo, Japan.
BioBank Japan, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
Department of Internal Medicine, Institute of Medical Science, University of Tokyo Hospital, Tokyo, Japan.

Eco de Geus (E)

Faculty of Behavioural and Movement Sciences, Biological Psychology, Vrije Universiteit, Amsterdam, the Netherlands.
Amsterdam Public Health Research Institute, Amsterdam, the Netherlands.

Rino Bellocco (R)

Department of Statistics and Quantitative Methods, University of Milano Bicocca, Milan, Italy.
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.

Yukinori Okada (Y)

Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.
Laboratory of Statistical Immunology, World Premier International Immunology Frontier Research Center, Osaka University, Suita, Japan.
Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan.

Anders D Børglum (AD)

Department of Biomedicine, Aarhus University, Aarhus, Denmark.
The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.
Centre for Genomics and Personalized Medicine, Aarhus University, Aarhus, Denmark.
Centre for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, Denmark.

Peter Joshi (P)

Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK.

Adam Auton (A)

23andMe, Inc., Sunnyvale, CA, USA.

David Hinds (D)

23andMe, Inc., Sunnyvale, CA, USA.

Benjamin M Neale (BM)

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Stanley Center for Psychiatric Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Raymond K Walters (RK)

Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
Stanley Center for Psychiatric Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Michel G Nivard (MG)

Faculty of Behavioural and Movement Sciences, Biological Psychology, Vrije Universiteit, Amsterdam, the Netherlands.
Amsterdam Public Health, Methodology Program, Amsterdam, the Netherlands.
Amsterdam Neuroscience-Mood, Anxiety, Psychosis, Stress & Sleep, Amsterdam, the Netherlands.

John R B Perry (JRB)

MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK. john.perry@mrc-epid.cam.ac.uk.

Andrea Ganna (A)

Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland. aganna@broadinstitute.org.
Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA. aganna@broadinstitute.org.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. aganna@broadinstitute.org.