A Cautionary Note on the Effects of Population Stratification Under an Extreme Phenotype Sampling Design.

Type 1 error; association study; extreme phenotype sampling; population stratification; principal component analysis

Journal

Frontiers in Genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621

Publication information

Publication date:
2019
History:
received: 31 12 2018
accepted: 12 04 2019
entrez: 28 5 2019
pubmed: 28 5 2019
medline: 28 5 2019
Status: epublish

Abstract

Extreme phenotype sampling (EPS) is a popular study design used to reduce genotyping or sequencing costs. Assuming continuous phenotype data are available on a large cohort, EPS involves genotyping or sequencing only those individuals with extreme phenotypic values. Although this design has been shown to have high power to detect genetic effects even at smaller sample sizes, little attention has been paid to the effects of confounding variables, and in particular population stratification. Using extensive simulations, we demonstrate that the false positive rate under the EPS design is greatly inflated relative to a random sample of equal size or a "case-control"-like design where the cases are from one phenotypic extreme and the controls are randomly sampled. The inflated false positive rate is observed even with allele frequency and phenotype mean differences taken from European population data. We show that the effects of confounding are not reduced by increasing the sample size. Using data simulated with a population genetics model and 1000 Genomes genotype data, we also show that including the top principal components in a logistic regression model is sufficient to control the type 1 error rate. Our results suggest that when an EPS study is conducted, it is crucial to adjust for all confounding variables. For genetic association studies this requires genotyping a sufficient number of markers to allow for ancestry estimation. Unfortunately, this could increase the costs of a study if sequencing or genotyping was only planned for candidate genes or pathways; the available genetic data would not be suitable for ancestry correction, as many of the variants could have a true association with the trait.
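
The confounding mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation code: it assumes two equally sized subpopulations with deliberately exaggerated allele frequencies (0.2 and 0.5) and phenotype means (0 and 1), a single SNP with no effect on the trait, and an EPS design that keeps the top and bottom 10% of a 2,000-person cohort. All function names and parameter values are hypothetical. As a stand-in for the principal component adjustment, it uses the known subpopulation label as the only covariate and computes the score test for the genotype term in a logistic regression, with and without that covariate.

```python
import random

CHI2_1DF_CRIT = 3.841  # 95th percentile of the chi-square distribution, 1 df


def simulate_cohort(rng, n, freqs=(0.2, 0.5), means=(0.0, 1.0)):
    """Simulate (subpopulation, genotype, phenotype) for a SNP with no effect.

    freqs and means are illustrative assumptions, not values from the paper.
    """
    cohort = []
    for i in range(n):
        pop = i % 2  # alternate membership so both groups are equally sized
        # Genotype = number of minor alleles out of two independent draws.
        g = (rng.random() < freqs[pop]) + (rng.random() < freqs[pop])
        y = rng.gauss(means[pop], 1.0)  # phenotype depends on ancestry only
        cohort.append((pop, g, y))
    return cohort


def eps_sample(cohort, frac=0.1):
    """Keep the lowest and highest `frac` of phenotypes, coded 0 and 1."""
    ranked = sorted(cohort, key=lambda row: row[2])
    k = int(len(ranked) * frac)
    low = [(pop, g, 0) for pop, g, _ in ranked[:k]]
    high = [(pop, g, 1) for pop, g, _ in ranked[-k:]]
    return low + high


def score_stat(sample, stratify):
    """Score statistic for genotype in a logistic model of extreme status.

    With stratify=True the model includes a subpopulation intercept, which
    plays the role of an ancestry covariate (an assumption replacing PCs).
    """
    strata = {}
    for pop, g, d in sample:
        strata.setdefault(pop if stratify else 0, []).append((g, d))
    u = v = 0.0
    for rows in strata.values():
        n = len(rows)
        g_bar = sum(g for g, _ in rows) / n
        p = sum(d for _, d in rows) / n  # null estimate of P(high extreme)
        u += sum((d - p) * g for g, d in rows)
        v += p * (1 - p) * sum((g - g_bar) ** 2 for g, _ in rows)
    return u * u / v if v > 0 else 0.0


def false_positive_rates(reps=200, n=2000, seed=1):
    """Fraction of null replicates rejected at the 5% level."""
    rng = random.Random(seed)
    hits = [0, 0]
    for _ in range(reps):
        sample = eps_sample(simulate_cohort(rng, n))
        hits[0] += score_stat(sample, stratify=False) > CHI2_1DF_CRIT
        hits[1] += score_stat(sample, stratify=True) > CHI2_1DF_CRIT
    return hits[0] / reps, hits[1] / reps


unadjusted, adjusted = false_positive_rates()
print(f"unadjusted: {unadjusted:.2f}, adjusted: {adjusted:.2f}")
```

Because the SNP has no effect, every rejection is a false positive. Under these exaggerated settings the unadjusted test should reject far more often than the nominal 5%, while the ancestry-adjusted test should stay close to it. In a real study the subpopulation label is unknown, which is why the abstract recommends genotyping enough markers to estimate ancestry.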

Identifiers

pubmed: 31130982
doi: 10.3389/fgene.2019.00398
pmc: PMC6509877

Publication types

Journal Article

Languages

eng

Pagination

398

Authors

Michela Panarella (M)

Department of Biology, University of Ottawa, Ottawa, ON, Canada.
Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada.

Kelly M Burkett (KM)

Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada.

MeSH classifications