Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants.
missing heritability
pathway testing
power investigation
rare variant analysis
statistical genetics
Journal
Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621
Publication information
Publication date: 2020
History:
received: 2020-08-05
accepted: 2020-10-05
entrez: 2020-11-26
pubmed: 2020-11-27
medline: 2020-11-27
Status:
epublish
Abstract
Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses that attempt to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, so does the number of variants within each set (e.g., gene) being tested. Pathway-based methods have been used to first aggregate statistical evidence at the gene level and then aggregate that evidence across the pathway. This "multi-set" approach (a gene-based test followed by a pathway-based test) has not been thoroughly explored for evaluating genotype-phenotype associations in the age of large sequenced datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach optimal compared with simply performing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to confirm this intuition. A real data application demonstrates how our insights manifest in practice. Ultimately, we find that when the initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains when causal variants are aggregated in subsets with fewer variants overall (i.e., a high proportion of causal variants in the subset). However, we find little advantage when the sets are non-informative (i.e., a similar proportion of causal variants across the subsets). Our application to real data further supports this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence about the genetic architecture of complex disease.
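The two-stage idea described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' method: the abstract does not say which gene-level test or which combination rule was used, so the simple burden test (with a large-sample normal approximation), Fisher's method for combining p-values, the pooled single-set comparator, and every function name and simulation parameter below are hypothetical choices made for illustration.

```python
import math
import random


def burden_pvalue(genotypes, phenotype):
    """Two-sided p-value for association between a burden score and a trait.

    genotypes: one list of rare-allele counts per person, for a single set.
    Assumption: large-sample normal approximation, z = r * sqrt(n).
    """
    n = len(phenotype)
    burden = [sum(row) for row in genotypes]  # collapse variants per person
    mean_b = sum(burden) / n
    mean_y = sum(phenotype) / n
    sbb = sum((b - mean_b) ** 2 for b in burden)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sbb == 0 or syy == 0:
        return 1.0  # no variation, so no evidence of association
    sby = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, phenotype))
    r = sby / math.sqrt(sbb * syy)
    z = r * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(log p) is chi-square with 2k degrees of freedom under the
    null; for even degrees of freedom the tail probability has a closed form.
    """
    k = len(pvalues)
    half = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def simulate(n_people, gene_sizes, causal_gene, effect, carrier_freq, rng):
    """Hypothetical toy data: all causal variants sit in one small gene."""
    genes = [[[1 if rng.random() < carrier_freq else 0 for _ in range(size)]
              for _ in range(n_people)] for size in gene_sizes]
    phenotype = [effect * sum(genes[causal_gene][i]) + rng.gauss(0.0, 1.0)
                 for i in range(n_people)]
    return genes, phenotype


def demo(seed=0):
    rng = random.Random(seed)
    genes, phenotype = simulate(500, [5, 40, 40], 0, 0.8, 0.02, rng)
    # Multi-set: test each gene separately, then combine across the pathway.
    multi_set_p = fisher_combine([burden_pvalue(g, phenotype) for g in genes])
    # Single-set comparator (an assumption): pool every variant into one set.
    pooled = [sum((g[i] for g in genes), []) for i in range(len(phenotype))]
    single_set_p = burden_pvalue(pooled, phenotype)
    return multi_set_p, single_set_p


if __name__ == "__main__":
    multi_p, single_p = demo()
    print("multi-set p-value:", multi_p)
    print("single-set p-value:", single_p)
```

In this toy setup the first gene is small and holds all the causal variants, which loosely mirrors the "informative subset" case in the abstract. Changing `gene_sizes` or spreading causal variants evenly across genes is one way to explore the non-informative case.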
Identifiers
pubmed: 33240333
doi: 10.3389/fgene.2020.591606
pmc: PMC7680887
Publication types
Journal Article
Languages
eng
Pagination
591606
Grants
Agency: NHGRI NIH HHS
ID: R15 HG006915
Country: United States
Copyright information
Copyright © 2020 Fore, Boehme, Li, Westra and Tintle.