Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations.


Journal

Communications Biology
ISSN: 2399-3642
Abbreviated title: Commun Biol
Country: England
NLM ID: 101719179

Publication information

Publication date:
2022-08-22
History:
received: 2021-07-19
accepted: 2022-08-05
entrez: 2022-08-22
pubmed: 2022-08-23
medline: 2022-08-25
Status: epublish

Abstract

Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility to a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard linear PRS model developed using PRSice, LDpred2, and lassosum2. Including a PRS as a feature in an XGBoost model increases the percentage of variance explained, relative to the standard linear PRS model, by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to models trained on specific racial/ethnic groups and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
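
The relative increases quoted in the abstract compare the percentage of variance explained (PVE) by the combined model with that of the linear PRS baseline. The sketch below illustrates that arithmetic only. It assumes the conventional definition (model PVE minus baseline PVE, divided by baseline PVE), which the abstract does not spell out, and the function name and the PVE values are hypothetical, not figures from the study.

```python
def relative_increase(pve_baseline: float, pve_model: float) -> float:
    """Return the relative increase, in percent, of pve_model over pve_baseline.

    Assumed definition: 100 * (model - baseline) / baseline.
    """
    if pve_baseline <= 0:
        raise ValueError("baseline variance explained must be positive")
    return 100.0 * (pve_model - pve_baseline) / pve_baseline


# Hypothetical PVE values, chosen only to illustrate the calculation.
baseline_pve = 0.04   # linear PRS model explains 4% of the variance
combined_pve = 0.08   # combined model explains 8% of the variance

# Doubling the variance explained corresponds to a 100% relative increase.
print(f"{relative_increase(baseline_pve, combined_pve):.1f}%")
```

Under this definition a relative increase of 100% means the variance explained doubled, while the absolute gain can still be small when the baseline is small.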

Identifiers

pubmed: 35995843
doi: 10.1038/s42003-022-03812-z
pii: 10.1038/s42003-022-03812-z
pmc: PMC9395509

Databanks

figshare
10.6084/m9.figshare.20304135.v1
10.6084/m9.figshare.20301423.v1

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

856

Grants

Agency: NHLBI NIH HHS
ID: HHSN268201100037C
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL120393
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003067
Country: United States
Agency: NHLBI NIH HHS
ID: U01 HL120393
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL127564
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003273
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL146860
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201800001C
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG008898
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL117626
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG008956
Country: United States
Agency: NIEHS NIH HHS
ID: HHSN268201600032C
Country: United States
Agency: NHLBI NIH HHS
ID: R21 HL145425
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL092577
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL059367
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL142711
Country: United States
Agency: NHLBI NIH HHS
ID: R35 HL135818
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL098433
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201500015C
Country: United States
Agency: NIA NIH HHS
ID: R21 AG070644
Country: United States
Agency: NIEHS NIH HHS
ID: HHSN268201600033C
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201500014C
Country: United States
Agency: NCATS NIH HHS
ID: KL2 TR002490
Country: United States

Investigators

Paul de Vries (P)

Copyright information

© 2022. The Author(s).

Authors

Michael Elgart (M)

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA. melgart@bwh.harvard.edu.
Department of Medicine, Harvard Medical School, Boston, MA, USA. melgart@bwh.harvard.edu.

Genevieve Lyons (G)

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Santiago Romero-Brufau (S)

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Department of Medicine, Mayo Clinic, Rochester, MN, USA.

Nuzulul Kurniansyah (N)

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.

Jennifer A Brody (JA)

Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA.

Xiuqing Guo (X)

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.

Henry J Lin (HJ)

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.

Laura Raffield (L)

Department of Genetics, University of North Carolina, Chapel Hill, NC, USA.

Yan Gao (Y)

The Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA.

Han Chen (H)

Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA.
Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Paul de Vries (P)

Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Donald M Lloyd-Jones (DM)

Department of Preventive Medicine, Northwestern University, Chicago, IL, USA.

Leslie A Lange (LA)

Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA.

Gina M Peloso (GM)

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.

Myriam Fornage (M)

Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA.
Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA.

Jerome I Rotter (JI)

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.

Stephen S Rich (SS)

Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA.

Alanna C Morrison (AC)

Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Bruce M Psaty (BM)

Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA.

Daniel Levy (D)

The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA.
The Framingham Heart Study, Framingham, MA, USA.

Susan Redline (S)

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
Department of Medicine, Harvard Medical School, Boston, MA, USA.

Tamar Sofer (T)

Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA. tsofer@bwh.harvard.edu.
Department of Medicine, Harvard Medical School, Boston, MA, USA. tsofer@bwh.harvard.edu.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA. tsofer@bwh.harvard.edu.
