Human genotype-to-phenotype predictions: Boosting accuracy with nonlinear models.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2022
History:
received: 2021-05-27
accepted: 2022-08-04
entrez: 2022-08-31
pubmed: 2022-09-01
medline: 2022-09-09
Status:
epublish
Abstract
Genotype-to-phenotype prediction is a central problem of human genetics. In recent years, it has become possible to construct complex predictive models for phenotypes, thanks to the availability of large genome data sets as well as efficient and scalable machine learning tools. In this paper, we make a threefold contribution to this problem. First, we ask if state-of-the-art nonlinear predictive models, such as boosted decision trees, can be more efficient for phenotype prediction than conventional linear models. We find that this is indeed the case if model features include a sufficiently rich set of covariates, but probably not otherwise. Second, we ask if the conventional selection of single nucleotide polymorphisms (SNPs) by genome-wide association studies (GWAS) can be replaced by a more efficient procedure that takes into account information in previously selected SNPs. We propose such a procedure, based on sequential feature importance estimation with decision trees, and show that this approach indeed produces informative SNP sets that are much more compact than those selected with GWAS. Finally, we show that the highest prediction accuracy can ultimately be achieved by ensembling individual linear and nonlinear models. To the best of our knowledge, for some of the phenotypes that we consider (asthma, hypothyroidism), our results set a new state of the art.
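The idea of selecting SNPs sequentially, so that each new SNP is scored given the ones already chosen, can be illustrated with a small sketch. This is not the procedure from the paper: it is a simplified stand-in that uses depth-1 decision trees (stumps) fitted greedily to residuals, on synthetic data with binary carrier/non-carrier coding. The function names `best_stump` and `select_snps`, the effect sizes, and the per-SNP ranking used as a crude proxy for GWAS-style selection are all assumptions made for illustration.

```python
import random


def best_stump(column, residual):
    """Best single split of one SNP column against the current residual.

    Returns (gain, threshold, left_mean, right_mean), where gain is the
    reduction in squared error relative to predicting the residual mean.
    """
    n = len(residual)
    total = sum(residual)
    best = (0.0, None, 0.0, 0.0)
    for threshold in (0.5, 1.5):  # covers 0/1 and 0/1/2 genotype codings
        left = [r for g, r in zip(column, residual) if g < threshold]
        if not left or len(left) == n:
            continue  # split leaves one side empty
        left_sum = sum(left)
        right_sum = total - left_sum
        n_left, n_right = len(left), n - len(left)
        gain = left_sum**2 / n_left + right_sum**2 / n_right - total**2 / n
        if gain > best[0]:
            best = (gain, threshold, left_sum / n_left, right_sum / n_right)
    return best


def select_snps(genotypes, phenotype, n_select, learning_rate=1.0):
    """Greedy selection: each SNP is scored on what earlier picks left unexplained."""
    n = len(phenotype)
    mean = sum(phenotype) / n
    residual = [y - mean for y in phenotype]
    selected = []
    for _ in range(n_select):
        candidates = [j for j in range(len(genotypes)) if j not in selected]
        scored = [(best_stump(genotypes[j], residual), j) for j in candidates]
        (gain, threshold, left_mean, right_mean), j = max(scored, key=lambda s: s[0][0])
        if threshold is None:
            break  # no candidate improves the fit
        selected.append(j)
        # Remove the part of the residual explained by the chosen SNP.
        for i, g in enumerate(genotypes[j]):
            residual[i] -= learning_rate * (left_mean if g < threshold else right_mean)
    return selected


# Synthetic data (assumed): SNP 1 duplicates SNP 0 (perfect linkage), SNP 2 has
# an independent, weaker effect, and the remaining SNPs are noise.
random.seed(0)
n_samples, n_snps = 400, 10
genotypes = [[random.randint(0, 1) for _ in range(n_samples)] for _ in range(n_snps)]
genotypes[1] = list(genotypes[0])
phenotype = [
    2.0 * genotypes[0][i] + 1.0 * genotypes[2][i] + random.gauss(0, 0.5)
    for i in range(n_samples)
]

# Per-SNP ranking that ignores previously chosen SNPs (crude GWAS-like proxy).
mean = sum(phenotype) / n_samples
centered = [y - mean for y in phenotype]
marginal = sorted(
    range(n_snps), key=lambda j: best_stump(genotypes[j], centered)[0], reverse=True
)
marginal_top = marginal[:2]

selected = select_snps(genotypes, phenotype, n_select=2)
print("marginal top 2:", marginal_top)
print("sequential top 2:", selected)
```

In this toy setting the per-SNP ranking spends both of its slots on the two redundant copies of the same signal, whereas the sequential pass skips the duplicate once its information has been absorbed. That is the sense in which a sequentially selected set can be more compact; the real procedure in the paper relies on feature importances from boosted decision trees rather than on single stumps.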
Identifiers
pubmed: 36044406
doi: 10.1371/journal.pone.0273293
pii: PONE-D-21-17509
pmc: PMC9432766
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e0273293
Grants
Agency: Medical Research Council
ID: MC_PC_17228
Country: United Kingdom
Agency: Medical Research Council
ID: MC_QA137853
Country: United Kingdom
Conflict of interest statement
The authors have declared that no competing interests exist.