Haplotype based testing for a better understanding of the selective architecture.
Evolve and resequence
Experimental evolution
Haplotype
Hypothesis test
Post hoc test
Selection
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date: 26 Aug 2023
History:
received: 29 Nov 2022
accepted: 03 Aug 2023
medline: 28 Aug 2023
pubmed: 27 Aug 2023
entrez: 26 Aug 2023
Status: epublish
Abstract
BACKGROUND
The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies.
RESULTS
Using simulated data, we show that, compared with SNP-based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them into a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits.
CONCLUSIONS
Because it requires less multiple-testing correction and reduces noise, haplotype-based testing can outperform SNP-based tests in terms of power in most scenarios.
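The two arguments in the abstract (fewer tests to correct for, and a signal that is spread across many SNPs) can be illustrated with a toy calculation. The sketch below is not the authors' test: the haplotype matrix, the frequencies, and the function names `snp_frequencies` and `bonferroni_threshold` are all hypothetical, and a Bonferroni correction is used only as the simplest stand-in for a multiple-testing adjustment.

```python
import numpy as np

# Hypothetical toy region: 4 haplotypes (rows) typed at 6 SNPs (columns).
# A 1 marks the alternative allele. All values are made up for illustration.
haplotypes = np.array([
    [0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
])


def snp_frequencies(hap_freqs):
    """Allele frequency at each SNP implied by the haplotype frequencies."""
    return np.asarray(hap_freqs, dtype=float) @ haplotypes


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Assumed haplotype frequencies at two time points; haplotype 0 increases.
start = np.array([0.25, 0.25, 0.25, 0.25])
end = np.array([0.55, 0.15, 0.15, 0.15])

# Frequency change seen at the haplotype level and at the SNP level.
print("haplotype change:", end - start)
print("SNP change:      ", snp_frequencies(end) - snp_frequencies(start))

# Per-test thresholds when testing 4 haplotypes versus 6 SNPs.
n_haps, n_snps = haplotypes.shape
print("haplotype threshold:", bonferroni_threshold(0.05, n_haps))
print("SNP threshold:      ", bonferroni_threshold(0.05, n_snps))
```

In this toy setting the rising haplotype changes in frequency by more than any single SNP does, and the per-test threshold is less strict for the smaller number of haplotype tests. In a real region the number of SNPs typically exceeds the number of candidate haplotypes by far more than in this example.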
Identifiers
pubmed: 37633901
doi: 10.1186/s12859-023-05437-3
pii: 10.1186/s12859-023-05437-3
pmc: PMC10463365
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
322
Grants
Agency: Austrian Science Fund
ID: DK W1225-B20
Agency: National Science Foundation
ID: NSF PHY-1748958
Copyright information
© 2023. BioMed Central Ltd., part of Springer Nature.