Haplotype based testing for a better understanding of the selective architecture.

Evolve and resequence; Experimental evolution; Haplotype; Hypothesis test; Post hoc test; Selection

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
26 Aug 2023
History:
received: 29 Nov 2022
accepted: 03 Aug 2023
medline: 28 Aug 2023
pubmed: 27 Aug 2023
entrez: 26 Aug 2023
Status: epublish

Abstract

The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies. Using simulated data, we show that, compared to SNP-based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them into a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits. Because it requires less multiple-testing correction and reduces noise, haplotype-based testing is able to outperform SNP-based tests in terms of power in most scenarios.
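The relationship between haplotype and SNP frequencies, and the multiple-testing argument made in the abstract, can be illustrated with a minimal sketch. It is not the authors' test: the haplotype matrix, the frequencies, and the function names are hypothetical, and a Bonferroni correction is assumed purely for illustration, since the abstract does not say which correction is used.

```python
# Hypothetical toy data: 3 haplotypes over 5 SNP positions (1 = alternative allele).
HAPLOTYPES = [
    [0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
]


def snp_frequencies(haplotypes, hap_freqs):
    """Allele frequency at each SNP as the frequency-weighted sum over haplotypes."""
    n_snps = len(haplotypes[0])
    return [
        sum(f * hap[j] for hap, f in zip(haplotypes, hap_freqs))
        for j in range(n_snps)
    ]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    return alpha / n_tests


# Assumed haplotype frequencies at two time points (made-up values).
freqs_start = [0.5, 0.3, 0.2]
freqs_end = [0.7, 0.2, 0.1]

# SNP frequencies are fully determined by the haplotype frequencies.
snp_start = snp_frequencies(HAPLOTYPES, freqs_start)
snp_end = snp_frequencies(HAPLOTYPES, freqs_end)

# Testing 3 haplotypes needs a milder correction than testing 5 SNPs.
alpha = 0.05
threshold_haplotypes = bonferroni_threshold(alpha, len(HAPLOTYPES))
threshold_snps = bonferroni_threshold(alpha, len(HAPLOTYPES[0]))

print("SNP frequencies, start:", snp_start)
print("SNP frequencies, end:  ", snp_end)
print("Per-test level, haplotypes:", threshold_haplotypes)
print("Per-test level, SNPs:      ", threshold_snps)
```

In real data the number of SNPs in a region typically exceeds the number of candidate haplotypes by far more than in this toy case, so the gap between the two per-test levels would be correspondingly larger.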

Identifiers

pubmed: 37633901
doi: 10.1186/s12859-023-05437-3
pii: 10.1186/s12859-023-05437-3
pmc: PMC10463365

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

322

Grants

Agency: Austrian Science Fund
ID: DK W1225-B20
Agency: National Science Foundation
ID: NSF PHY-1748958

Copyright information

© 2023. BioMed Central Ltd., part of Springer Nature.

Authors

Haoyu Chen (H)

University of Veterinary Medicine Vienna, Vienna, Austria.
Vienna Graduate School of Population Genetics, Vienna, Austria.

Marta Pelizzola (M)

Aarhus University, Aarhus, Denmark.

Andreas Futschik (A)

Johannes Kepler University of Linz, Linz, Austria. andreas.futschik@jku.at.
