Regularized sequence-context mutational trees capture variation in mutation rates across the human genome.


Journal

PLoS genetics
ISSN: 1553-7404
Abbreviated title: PLoS Genet
Country: United States
NLM ID: 101239074

Publication information

Publication date:
07 2023
History:
received: 12 12 2022
accepted: 01 06 2023
revised: 19 07 2023
medline: 21 7 2023
pubmed: 7 7 2023
entrez: 7 7 2023
Status: epublish

Abstract

Germline mutation is the mechanism by which genetic variation in a population is created. Inferences derived from mutation rate models are fundamental to many population genetics methods. Previous models have demonstrated that nucleotides flanking polymorphic sites (the local sequence context) explain variation in the probability that a site is polymorphic. However, these models have limitations that grow as the local sequence context window expands: a lack of robustness to data sparsity at typical sample sizes, a lack of regularization to generate parsimonious models, and a lack of quantified uncertainty in estimated rates to facilitate comparison between models. To address these limitations, we developed Baymer, a regularized Bayesian hierarchical tree model that captures the heterogeneous effect of sequence contexts on polymorphism probabilities. Baymer implements an adaptive Metropolis-within-Gibbs Markov Chain Monte Carlo sampling scheme to estimate the posterior distributions of sequence-context based probabilities that a site is polymorphic. We show that Baymer accurately infers polymorphism probabilities and well-calibrated posterior distributions, robustly handles data sparsity, appropriately regularizes to return parsimonious models, and scales computationally at least up to 9-mer context windows. We demonstrate application of Baymer in three ways: first, by identifying differences in polymorphism probabilities between continental populations in the 1000 Genomes Phase 3 dataset; second, in a sparse data setting, by examining the use of polymorphism models as a proxy for de novo mutation probabilities as a function of variant age, sequence context window size, and demographic history; and third, by comparing model concordance between different great ape species. We find a shared context-dependent mutation rate architecture underlying our models, enabling a transfer-learning inspired strategy for modeling germline mutations.
In summary, Baymer is an accurate polymorphism probability estimation algorithm that automatically adapts to data sparsity at different sequence context levels, thereby making efficient use of the available data.
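
To make the sampling idea in the abstract concrete, the following is a minimal toy sketch of an adaptive Metropolis-within-Gibbs sampler. It is not Baymer's implementation or model: it assumes a deliberately simplified setup in which each sequence context has a binomial count of polymorphic sites and a Normal prior on the logit scale centered on a fixed parent-level value. The function names, the parameters `mu` and `tau`, the adaptation schedule, and the example counts are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(theta):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-theta))


def log_posterior(theta, x, n, mu, tau):
    """Unnormalized log posterior for one context on the logit scale.

    Binomial likelihood (x polymorphic sites out of n) plus a
    Normal(mu, tau^2) prior that shrinks toward the parent-level logit mu.
    This prior is an illustrative assumption, not Baymer's actual prior.
    """
    log_p = -math.log1p(math.exp(-theta))  # log(sigmoid(theta))
    log_q = -math.log1p(math.exp(theta))   # log(1 - sigmoid(theta))
    return x * log_p + (n - x) * log_q - 0.5 * ((theta - mu) / tau) ** 2


def sample_polymorphism_probs(counts, mu, tau, n_iter=6000, burn=2000,
                              batch=50, seed=0):
    """Return posterior-mean polymorphism probabilities, one per context.

    counts is a list of (polymorphic_sites, total_sites) pairs. Each context
    is updated in turn with a random-walk Metropolis step (the "within-Gibbs"
    part). Proposal scales are tuned toward roughly 44% acceptance, but only
    during burn-in, so the retained draws come from a fixed kernel.
    """
    rng = random.Random(seed)
    k = len(counts)
    theta = [mu] * k       # start every context at the parent-level logit
    log_sd = [0.0] * k     # log proposal standard deviation per context
    accepted = [0] * k
    sums = [0.0] * k

    for it in range(1, n_iter + 1):
        for j, (x, n) in enumerate(counts):
            proposal = theta[j] + rng.gauss(0.0, math.exp(log_sd[j]))
            log_ratio = (log_posterior(proposal, x, n, mu, tau)
                         - log_posterior(theta[j], x, n, mu, tau))
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                theta[j] = proposal
                accepted[j] += 1

        if it <= burn and it % batch == 0:
            # Shrinking step size so the adaptation settles down.
            delta = min(0.25, (it // batch) ** -0.5)
            for j in range(k):
                rate = accepted[j] / batch
                log_sd[j] += delta if rate > 0.44 else -delta
                accepted[j] = 0

        if it > burn:
            for j in range(k):
                sums[j] += sigmoid(theta[j])

    return [s / (n_iter - burn) for s in sums]


# Hypothetical counts: a well-observed context, a sparse one, and an outlier.
example_counts = [(50, 1000), (0, 10), (300, 2000)]
parent_logit = math.log(0.05 / 0.95)  # assumed parent-level probability 0.05
print(sample_polymorphism_probs(example_counts, mu=parent_logit, tau=1.0))
```

In this toy setup, the sparse context (0 of 10 sites) is pulled toward the parent-level value rather than being estimated as exactly zero, while the well-observed contexts stay close to their observed frequencies. This is the general behavior that hierarchical shrinkage is meant to provide under data sparsity.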

Identifiers

pubmed: 37418489
doi: 10.1371/journal.pgen.1010807
pii: PGENETICS-D-22-01424
pmc: PMC10355397

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

e1010807

Grants

Agency: NIDDK NIH HHS
ID: R01 DK101478
Country: United States
Agency: NIDDK NIH HHS
ID: R56 DK101478
Country: United States
Agency: NIDDK NIH HHS
ID: UM1 DK126194
Country: United States

Copyright information

Copyright: © 2023 Adams et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Christopher J Adams (CJ)

Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Mitchell Conery (M)

Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Benjamin J Auerbach (BJ)

Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Shane T Jensen (ST)

Department of Statistics and Data Science, The Wharton School at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Iain Mathieson (I)

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.

Benjamin F Voight (BF)

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
