HaploBlocks: Efficient Detection of Positive Selection in Large Population Genomic Datasets.
big data
genome scan
natural selection
population genetics
Journal
Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455
Publication information
Publication date:
04 03 2023
History:
pubmed: 16 2 2023
medline: 8 3 2023
entrez: 15 2 2023
Status:
ppublish
Abstract
Genomic regions under positive selection harbor variation linked, for example, to adaptation. Most tools for detecting positively selected variants have computational resource requirements that render them impractical for population genomic datasets with hundreds of thousands of individuals or more. We have developed and implemented an efficient haplotype-based approach able to scan large datasets and accurately detect positive selection. We achieve this by combining a pattern matching approach based on the positional Burrows-Wheeler transform with model-based inference that requires only the evaluation of closed-form expressions. We evaluate our approach with simulations and find it to be both sensitive and specific. The computational resource requirements quantified using UK Biobank data indicate that our implementation is scalable to population genomic datasets with millions of individuals. Our approach may serve as an algorithmic blueprint for the era of "big data" genomics: a combinatorial core coupled with statistical inference in closed form.
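The pattern matching component named in the abstract can be illustrated with a minimal sketch of the standard positional Burrows-Wheeler transform construction. This is not the HaploBlocks implementation: the function names `pbwt_step` and `pbwt`, the list-of-lists input, and the restriction to biallelic 0/1 haplotypes are all assumptions made for illustration. The sketch sorts haplotypes by their reversed prefixes one site at a time and tracks, for each haplotype, where its match with the haplotype sorted just before it begins.

```python
from typing import List, Tuple


def pbwt_step(
    haps: List[List[int]], order: List[int], div: List[int], k: int
) -> Tuple[List[int], List[int]]:
    """Advance the prefix order and divergence array from site k to site k + 1.

    `order` lists haplotype indices sorted by their reversed prefix haps[i][:k].
    `div[i]` is the start of the longest match, ending at site k, between
    order[i] and order[i - 1]. (Hypothetical helper, for illustration only.)
    """
    zeros: List[int] = []
    ones: List[int] = []
    div_zeros: List[int] = []
    div_ones: List[int] = []
    p = q = k + 1  # k + 1 means "no match ending at the new site yet"
    for idx, start in zip(order, div):
        p = max(p, start)
        q = max(q, start)
        if haps[idx][k] == 0:
            zeros.append(idx)
            div_zeros.append(p)
            p = 0
        else:
            ones.append(idx)
            div_ones.append(q)
            q = 0
    # Stable partition: haplotypes carrying allele 0 at site k sort first.
    return zeros + ones, div_zeros + div_ones


def pbwt(haps: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return the prefix order and divergence array after the last site."""
    m = len(haps)
    n = len(haps[0]) if m else 0
    order = list(range(m))
    div = [0] * m
    for k in range(n):
        order, div = pbwt_step(haps, order, div, k)
    return order, div


if __name__ == "__main__":
    # Toy panel: 5 haplotypes, 4 biallelic sites (made-up data).
    panel = [
        [0, 1, 0, 1],
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 1, 1],
    ]
    order, div = pbwt(panel)
    print("order:", order)
    print("divergence:", div)
```

Each site is processed in time linear in the number of haplotypes, so the whole pass is linear in the size of the panel. With n sites, `n - div[i]` is the length of the match, ending at the last site, shared by `order[i]` and its predecessor in the sorted order; the first entry of `div` is a sentinel because that haplotype has no predecessor.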
Identifiers
pubmed: 36790822
pii: 7040366
doi: 10.1093/molbev/msad027
pmc: PMC9985328
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Copyright information
© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.