Machine learning models for predicting blood pressure phenotypes by combining multiple polygenic risk scores.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
30 05 2024
History:
received: 22 01 2024
accepted: 22 05 2024
medline: 31 5 2024
pubmed: 31 5 2024
entrez: 30 5 2024
Status:
epublish
Abstract
We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We develop a two-model ensemble consisting of a baseline model, where prediction is based on demographic and clinical variables only, and a genetic model, where we also include PRSs. We evaluate the use of a linear versus a non-linear model at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model's performance as percentage variance explained (PVE) on a held-out test dataset. A non-linear baseline model improved the PVE from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP) compared with a linear baseline model. Including seven PRSs, computed from the largest available GWAS of SBP/DBP, in the genetic model improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP) compared with using a single PRS. Adding 14 additional PRSs, computed from two independent GWASs, further increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with primarily all non-White groups benefitting from the inclusion of additional PRSs. In summary, non-linear ML models improve BP prediction in models incorporating diverse populations.
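The two-model ensemble and the PVE metric described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are simulated, a one-predictor least-squares fit stands in for both the linear and the non-linear learners, the genetic model is fit on the residuals of the baseline model using a single PRS, and PVE is taken to be 100 * (1 - MSE / Var(y)) on the held-out set. The names `pve`, `fit_line` and `simulate` are hypothetical helpers.

```python
import random
from statistics import fmean, pvariance


def pve(y_true, y_pred):
    """Percentage variance explained, assumed here to be 100 * (1 - MSE / Var(y))."""
    mse = fmean([(y - p) ** 2 for y, p in zip(y_true, y_pred)])
    return 100.0 * (1.0 - mse / pvariance(y_true))


def fit_line(x, y):
    """One-predictor ordinary least squares; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n, rng):
    """Toy cohort: age (a stand-in clinical variable), one PRS, and SBP."""
    age = [rng.uniform(30, 80) for _ in range(n)]
    prs = [rng.gauss(0, 1) for _ in range(n)]
    sbp = [110 + 0.6 * a + 4.0 * g + rng.gauss(0, 5) for a, g in zip(age, prs)]
    return age, prs, sbp


rng = random.Random(0)
age_tr, prs_tr, sbp_tr = simulate(400, rng)   # training set
age_te, prs_te, sbp_te = simulate(200, rng)   # held-out test set

# Baseline model: clinical/demographic variable only.
b0, b1 = fit_line(age_tr, sbp_tr)
resid_tr = [y - (b0 + b1 * a) for a, y in zip(age_tr, sbp_tr)]

# Genetic model (assumed design): fit the PRS to the baseline residuals.
g0, g1 = fit_line(prs_tr, resid_tr)

# Ensemble prediction on the test set = baseline prediction + genetic prediction.
base_pred = [b0 + b1 * a for a in age_te]
ens_pred = [bp + g0 + g1 * g for bp, g in zip(base_pred, prs_te)]

pve_baseline = pve(sbp_te, base_pred)
pve_ensemble = pve(sbp_te, ens_pred)
print(f"baseline PVE: {pve_baseline:.1f}%  ensemble PVE: {pve_ensemble:.1f}%")
```

In this toy setting the difference between the two printed values plays the role of the genetic model's contribution; with several PRSs, the second stage would take all of them as predictors instead of one.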
Identifiers
pubmed: 38816422
doi: 10.1038/s41598-024-62945-9
pii: 10.1038/s41598-024-62945-9
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
12436
Grants
Agency: NHLBI NIH HHS
ID: R01HL161012
Country: United States
Copyright information
© 2024. The Author(s).