Genotyping of SNPs in bread wheat at reduced cost from pooled experiments and imputation.


Journal

TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik
ISSN: 1432-2242
Abbreviated title: Theor Appl Genet
Country: Germany
NLM ID: 0145600

Publication information

Publication date:
19 Jan 2024
History:
received: 15 May 2023
accepted: 19 Dec 2023
medline: 20 Jan 2024
pubmed: 20 Jan 2024
entrez: 19 Jan 2024
Status: epublish

Abstract

Pooling and imputation are computational methods that can be combined to achieve cost-effective and accurate high-density genotyping of both common and rare variants, as demonstrated in a MAGIC wheat population. The plant breeding industry has shown growing interest in using genotype data from relevant markers to select new competitive varieties. Selection usually benefits from large amounts of marker data, so it is crucial to have data collection methods that are both cost-effective and reliable. Computational methods such as genotype imputation have been proposed in several earlier plant science studies to address the cost challenge. Genotype imputation methods have, however, been used more frequently and investigated more extensively in human genetics research. The existing algorithms have shown lower accuracy when inferring the genotypes of variants that occur at low frequency, even though these rare variants can have great significance and impact in the genetic studies that underlie selection. In contrast, pooling is a technique that can efficiently identify low-frequency items in a population, and it has been used successfully to detect the samples that carry rare variants. In this study, we propose combining pooling and imputation, and we demonstrate the approach by simulating a hypothetical microarray for genotyping a population of recombinant inbred lines in a cost-effective and accurate manner, even for rare variants. We show that, with an adequate imputation model, it is feasible to predict the individual genotypes accurately, in a time-efficient way, and at lower cost than sample-wise genotyping. Moreover, we provide code resources for reproducing the results presented in this study in the form of a containerized workflow.
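
The idea that pooling can locate rare-variant carriers with fewer assays can be illustrated with a minimal sketch. It assumes a simple row-and-column design on a 4×4 grid of samples (8 pools for 16 samples) and reduces each genotype to a carrier/non-carrier flag, which is a simplification that suits largely homozygous inbred lines. The grid size, the function names `pool_grid` and `decode_grid`, and the decoding rule are assumptions made for illustration and are not taken from the study's workflow.

```python
import numpy as np

def pool_grid(genotypes):
    """Pool a grid of carrier flags by row and by column.

    genotypes: 2D array of 0/1 flags (1 = sample carries the rare allele).
    Returns (row_pools, col_pools): boolean vectors that are True when
    the pool contains at least one carrier.
    """
    g = np.asarray(genotypes, dtype=bool)
    return g.any(axis=1), g.any(axis=0)

def decode_grid(row_pools, col_pools):
    """Decode pooled results: 0 = non-carrier, 1 = carrier, -1 = ambiguous."""
    rows = np.asarray(row_pools, dtype=bool)
    cols = np.asarray(col_pools, dtype=bool)
    # A sample can only be a carrier if both of its pools are positive.
    candidate = np.outer(rows, cols)
    decoded = np.zeros(candidate.shape, dtype=int)
    if rows.sum() <= 1 or cols.sum() <= 1:
        # With a single positive row (or column), every candidate
        # must be a carrier to explain the positive pools.
        decoded[candidate] = 1
    else:
        # Several positive rows and columns: the pools alone cannot
        # tell which candidates are carriers.
        decoded[candidate] = -1
    return decoded

# Hypothetical block of 16 samples with one rare-variant carrier.
block = np.zeros((4, 4), dtype=int)
block[1, 2] = 1
rows, cols = pool_grid(block)
print(decode_grid(rows, cols))
```

In this sketch a single carrier is recovered exactly from 8 pooled assays instead of 16 individual ones. When two or more rows and columns test positive, the candidates stay ambiguous (`-1`), and those are the genotypes that a downstream imputation step would be asked to resolve.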

Identifiers

pubmed: 38243086
doi: 10.1007/s00122-023-04533-5
pii: 10.1007/s00122-023-04533-5

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

26

Grants

Agency: Svenska Forskningsrådet Formas
ID: 2017-00453

Copyright information

© 2024. The Author(s).

Authors

Camille Clouard (C)

Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, 75237, Uppsala, Sweden. camille.clouard@it.uu.se.

Carl Nettelblad (C)

Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, 75237, Uppsala, Sweden.
SciLifeLab, Science for Life Laboratory, Husargatan 3, 75237, Uppsala, Sweden.

MeSH classifications