MagicalRsq-X: A cross-cohort transferable genotype imputation quality metric.

cross-cohort; genome-wide association studies; genotype imputation; imputation quality; machine learning; quality control; rare variants; variant filtering; whole-genome sequencing

Journal

American journal of human genetics
ISSN: 1537-6605
Abbreviated title: Am J Hum Genet
Country: United States
NLM ID: 0370475

Publication information

Publication date:
09 Apr 2024
History:
received: 25 01 2024
revised: 29 03 2024
accepted: 01 04 2024
medline: 19 4 2024
pubmed: 19 4 2024
entrez: 18 4 2024
Status: aheadofprint

Abstract

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R
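The evaluation criterion named in the abstract, squared Pearson correlation between a quality metric and the true R², can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `squared_pearson`, the variable names, and all numeric values are hypothetical and were invented for the example, and the abstract does not state whether the reported improvement is absolute or relative, so both are printed.

```python
from math import sqrt  # not strictly needed; kept minimal and stdlib-only


def squared_pearson(xs, ys):
    """Return the squared Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_x * var_y)


# Toy per-variant values, invented for illustration (NOT results from the paper).
true_r2 = [0.10, 0.35, 0.50, 0.80, 0.95]   # hypothetical true imputation R^2
metric_a = [0.40, 0.30, 0.70, 0.75, 0.90]  # hypothetical baseline quality estimate
metric_b = [0.15, 0.30, 0.55, 0.78, 0.92]  # hypothetical calibrated quality estimate

score_a = squared_pearson(metric_a, true_r2)
score_b = squared_pearson(metric_b, true_r2)

print(f"baseline   r^2 with truth: {score_a:.3f}")
print(f"calibrated r^2 with truth: {score_b:.3f}")
print(f"absolute difference: {score_b - score_a:.3f}")
print(f"relative difference: {(score_b - score_a) / score_a:.1%}")
```

A metric that tracks the true R² more closely yields a value nearer to 1, which is the sense in which one quality metric can be said to outperform another under this criterion.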

Identifiers

pubmed: 38636510
pii: S0002-9297(24)00116-2
doi: 10.1016/j.ajhg.2024.04.001

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

Copyright © 2024. Published by Elsevier Inc.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Authors

Quan Sun (Q)

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Yingxi Yang (Y)

Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA.

Jonathan D Rosen (JD)

Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Jiawen Chen (J)

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Xihao Li (X)

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Wyliena Guan (W)

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Min-Zhi Jiang (MZ)

Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Jia Wen (J)

Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Rhonda G Pace (RG)

Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Scott M Blackman (SM)

Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.

Michael J Bamshad (MJ)

Department of Pediatrics, University of Washington, Seattle, WA 98105, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.

Ronald L Gibson (RL)

Department of Pediatrics, University of Washington, Seattle, WA 98105, USA.

Garry R Cutting (GR)

Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.

Wanda K O'Neal (WK)

Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Michael R Knowles (MR)

Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Charles Kooperberg (C)

Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA.

Alexander P Reiner (AP)

Department of Epidemiology, University of Washington, Seattle, WA 98195, USA.

Laura M Raffield (LM)

Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

April P Carson (AP)

Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL 35249, USA.

Stephen S Rich (SS)

Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia School of Medicine, Charlottesville, VA 22908, USA.

Jerome I Rotter (JI)

The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA.

Ruth J F Loos (RJF)

The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark.

Eimear Kenny (E)

The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA.

Byron C Jaeger (BC)

Wake Forest School of Medicine, Department of Biostatistics and Data Science, Wake Forest University, Winston-Salem, NC 27109, USA.

Yuan-I Min (YI)

Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA.

Christian Fuchsberger (C)

Institute for Biomedicine, Eurac Research (affiliated with the University of Lübeck), Bolzano, Italy. Electronic address: cfuchsberger@eurac.edu.

Yun Li (Y)

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. Electronic address: yunli@med.unc.edu.

MeSH classifications