On the cross-population generalizability of gene expression prediction models.
Journal
PLoS genetics
ISSN: 1553-7404
Abbreviated title: PLoS Genet
Country: United States
NLM ID: 101239074
Publication information
Publication date:
08 2020
History:
received: 2019-11-19
accepted: 2020-06-10
revised: 2020-08-26
pubmed: 2020-08-17
medline: 2020-09-24
entrez: 2020-08-16
Status:
epublish
Abstract
The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.
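The contrast between shared and non-shared eQTL architecture can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it assumes two independent SNPs, a single causal eQTL with unit effect, a hypothetical allele frequency of 0.3, and per-SNP least-squares weights in place of the penalized regression models used in practice. All function names and parameters are illustrative.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(n, causal_snp, rng, maf=0.3, noise_sd=0.5):
    """Simulate dosages (0/1/2) at two independent SNPs and expression
    driven by one causal SNP plus Gaussian noise (assumed toy model)."""
    genotypes = [
        [sum(rng.random() < maf for _ in range(2)) for _ in range(2)]
        for _ in range(n)
    ]
    expression = [g[causal_snp] + rng.gauss(0, noise_sd) for g in genotypes]
    return genotypes, expression


def fit_marginal_weights(genotypes, expression):
    """Per-SNP least-squares slopes: cov(dosage, expression) / var(dosage).
    The intercept is omitted because it does not affect the correlation."""
    n = len(expression)
    my = sum(expression) / n
    weights = []
    for j in range(len(genotypes[0])):
        col = [g[j] for g in genotypes]
        mx = sum(col) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, expression))
        var = sum((a - mx) ** 2 for a in col)
        weights.append(cov / var)
    return weights


def predict(genotypes, weights):
    """Predicted expression as a weighted sum of dosages."""
    return [sum(w * d for w, d in zip(weights, g)) for g in genotypes]


rng = random.Random(0)

# Training population: SNP 0 is the causal eQTL.
train_geno, train_expr = simulate(2000, causal_snp=0, rng=rng)
weights = fit_marginal_weights(train_geno, train_expr)

# Test population with the same causal eQTL (shared architecture).
shared_geno, shared_expr = simulate(2000, causal_snp=0, rng=rng)
r2_shared = pearson_r2(predict(shared_geno, weights), shared_expr)

# Test population where SNP 1 is causal instead (non-shared architecture).
distinct_geno, distinct_expr = simulate(2000, causal_snp=1, rng=rng)
r2_distinct = pearson_r2(predict(distinct_geno, weights), distinct_expr)

print(f"R^2, shared eQTL:     {r2_shared:.3f}")
print(f"R^2, non-shared eQTL: {r2_distinct:.3f}")
```

Under these assumptions, the weights learned in the training population carry over only when the test population has the same causal SNP. Real data involve linkage disequilibrium and allele frequency differences between populations, which this sketch deliberately leaves out.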
Identifiers
pubmed: 32797036
doi: 10.1371/journal.pgen.1008927
pii: PGENETICS-D-19-01922
pmc: PMC7449671
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e1008927
Grants
Agency: NHLBI NIH HHS
ID: R01 HL117004
Country: United States
Agency: NIGMS NIH HHS
ID: TL4 GM118986
Country: United States
Agency: NIMHD NIH HHS
ID: P60 MD006902
Country: United States
Agency: NIEHS NIH HHS
ID: R01 ES015794
Country: United States
Agency: NIGMS NIH HHS
ID: T34 GM008574
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL128439
Country: United States
Agency: NHGRI NIH HHS
ID: T32 HG000044
Country: United States
Agency: NHLBI NIH HHS
ID: K01 HL140218
Country: United States
Agency: NHGRI NIH HHS
ID: R56 HG010297
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG007419
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG009080
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL135156
Country: United States
Agency: NIEHS NIH HHS
ID: R21 ES024844
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG010297
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL104608
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL141992
Country: United States
Agency: NIGMS NIH HHS
ID: K12 GM081266
Country: United States
Agency: NHLBI NIH HHS
ID: R00 HL135403
Country: United States
Agency: NIMHD NIH HHS
ID: R01 MD010443
Country: United States
Agency: NIGMS NIH HHS
ID: UL1 GM118985
Country: United States
Agency: NIGMS NIH HHS
ID: RL5 GM118984
Country: United States
Conflict of interest statement
We have read the journal's policy and one author of this manuscript (Chris Gignoux) has the following competing interests: ownership of stock in 23andMe. The remaining authors have declared that no competing interests exist.