MagicalRsq: Machine-learning-based genotype imputation quality calibration.
XGBoost
genotype imputation
imputation quality
machine learning
post-imputation quality control
Journal
American journal of human genetics
ISSN: 1537-6605
Abbreviated title: Am J Hum Genet
Country: United States
NLM ID: 0370475
Publication information
Publication date:
03 11 2022
History:
received: 19 06 2022
accepted: 16 09 2022
pubmed: 6 10 2022
medline: 9 11 2022
entrez: 5 10 2022
Status:
ppublish
Abstract
Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics, to provide a better calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP), and whole-exome sequence data from UK BioBank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R
Identifiers
pubmed: 36198314
pii: S0002-9297(22)00412-8
doi: 10.1016/j.ajhg.2022.09.009
pmc: PMC9674945
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
1986-1997
Grants
Agency: Medical Research Council
ID: MC_PC_17228
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: U01 HL120393
Country: United States
Agency: NHLBI NIH HHS
ID: T32 HL129982
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG011720
Country: United States
Agency: Medical Research Council
ID: MC_QA137853
Country: United Kingdom
Agency: NIGMS NIH HHS
ID: R35 GM138286
Country: United States
Agency: NIEHS NIH HHS
ID: T32 ES007018
Country: United States
Copyright information
Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of interests: The authors declare no competing interests.