Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations.
Journal
Communications biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179
Publication information
Publication date:
22 08 2022
History:
received: 19 07 2021
accepted: 05 08 2022
entrez: 22 08 2022
pubmed: 23 08 2022
medline: 25 08 2022
Status:
epublish
Abstract
Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
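The two quantities the abstract relies on, a linear PRS and the relative increase in variance explained, can be illustrated with a minimal sketch. It assumes the usual definition of a linear PRS as a weighted sum of allele dosages and assumes that "relative increase" means (improved - baseline) / baseline; the record does not spell out either formula. The function names and the R² values in the usage lines are hypothetical and are not results from the study, and the sketch does not reproduce the authors' XGBoost pipeline.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Linear PRS for one individual: weighted sum of per-SNP allele dosages."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of phenotype variance explained by the predictions."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


def relative_increase(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical numbers for illustration only (not taken from the paper).
example_score = polygenic_score([0, 1, 2], [0.5, -0.2, 0.1])
example_gain = relative_increase(baseline=0.10, improved=0.122)
print(f"PRS = {example_score:.3f}, relative gain = {example_gain:.1f}%")
```

Under this reading, a linear PRS explaining 10% of the variance and a combined model explaining 12.2% would correspond to a 22% relative increase.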
Identifiers
pubmed: 35995843
doi: 10.1038/s42003-022-03812-z
pii: 10.1038/s42003-022-03812-z
pmc: PMC9395509
Databanks
figshare
10.6084/m9.figshare.20304135.v1
10.6084/m9.figshare.20301423.v1
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
856
Grants
Agency: NHLBI NIH HHS
ID: HHSN268201100037C
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL120393
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003067
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL120393
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL127564
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003273
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL146860
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201800001C
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG008898
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL117626
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG008956
Country: United States
Agency: NIEHS NIH HHS
ID: HHSN268201600032C
Country: United States
Agency: NHLBI NIH HHS
ID: R21 HL145425
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL092577
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL059367
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL142711
Country: United States
Agency: NHLBI NIH HHS
ID: R35 HL135818
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL098433
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201500015C
Country: United States
Agency: NIA NIH HHS
ID: R21 AG070644
Country: United States
Agency: NIEHS NIH HHS
ID: HHSN268201600033C
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201500014C
Country: United States
Agency: NCATS NIH HHS
ID: KL2 TR002490
Country: United States
Investigators
Paul de Vries
(P)
Copyright information
© 2022. The Author(s).