Pangenome graphs improve the analysis of structural variants in rare genetic diseases.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
22 Jan 2024
History:
received: 06 Jun 2023
accepted: 10 Jan 2024
medline: 23 Jan 2024
pubmed: 23 Jan 2024
entrez: 22 Jan 2024
Status: epublish

Abstract

Rare DNA alterations that cause heritable diseases are only partially resolvable by clinical next-generation sequencing due to the difficulty of detecting structural variation (SV) in all genomic contexts. Long-read, high fidelity genome sequencing (HiFi-GS) detects SVs with increased sensitivity and enables assembling personal and graph genomes. We leverage standard reference genomes, public assemblies (n = 94) and a large collection of HiFi-GS data from a rare disease program (Genomic Answers for Kids, GA4K, n = 574 assemblies) to build a graph genome representing a unified SV callset in GA4K, identify common variation and prioritize SVs that are more likely to cause genetic disease (MAF < 0.01). Using graphs, we obtain a higher level of reproducibility than the standard reference approach. We observe over 200,000 SV alleles unique to GA4K, including nearly 1000 rare variants that impact coding sequence. With improved specificity for rare SVs, we isolate 30 candidate SVs in phenotypically prioritized genes, including known disease SVs. We isolate a novel diagnostic SV in KMT2E, demonstrating use of personal assemblies coupled with pangenome graphs for rare disease genomics. The community may interrogate our pangenome with additional assemblies to discover new SVs within the allele frequency spectrum relevant to genetic diseases.
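The rarity filter mentioned in the abstract (MAF < 0.01) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SVAllele` record, its field names, and the haplotype counts below are hypothetical, and only the 0.01 threshold comes from the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Threshold stated in the abstract; everything else here is an assumption.
RARE_MAF_THRESHOLD = 0.01


@dataclass(frozen=True)
class SVAllele:
    """Hypothetical record for one SV allele in a unified callset."""
    sv_id: str
    carrier_haplotypes: int  # haplotypes that carry this allele
    total_haplotypes: int    # haplotypes genotyped at this site


def minor_allele_frequency(sv: SVAllele) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if sv.total_haplotypes <= 0:
        raise ValueError("total_haplotypes must be positive")
    allele_frequency = sv.carrier_haplotypes / sv.total_haplotypes
    return min(allele_frequency, 1.0 - allele_frequency)


def rare_svs(svs: Iterable[SVAllele],
             threshold: float = RARE_MAF_THRESHOLD) -> List[SVAllele]:
    """Keep only SV alleles whose MAF is strictly below the threshold."""
    return [sv for sv in svs if minor_allele_frequency(sv) < threshold]


# Illustrative counts only; these are not values from the study.
example_callset = [
    SVAllele("sv_a", carrier_haplotypes=1, total_haplotypes=1000),
    SVAllele("sv_b", carrier_haplotypes=300, total_haplotypes=1000),
    SVAllele("sv_c", carrier_haplotypes=995, total_haplotypes=1000),
]
rare_ids = [sv.sv_id for sv in rare_svs(example_callset)]
```

In this sketch an allele carried by almost every haplotype (`sv_c`) also counts as rare, because MAF is taken as the frequency of the less common allele; a pipeline that filters on the frequency of the non-reference allele alone would treat it differently.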

Identifiers

pubmed: 38253606
doi: 10.1038/s41467-024-44980-2
pii: 10.1038/s41467-024-44980-2

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

657

Copyright information

© 2024. The Author(s).

Authors

Cristian Groza (C)

Quantitative Life Sciences, McGill University, Montréal, QC, Canada.

Carl Schwendinger-Schreck (C)

Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA.

Warren A Cheung (WA)

Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA.

Emily G Farrow (EG)

Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA.

Isabelle Thiffault (I)

Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA.

Juniper Lake (J)

Pacific Biosciences, Menlo Park, CA, USA.

William B Rizzo (WB)

Child Health Research Institute, Department of Pediatrics, Nebraska Medical Center, Omaha, NE, USA.

Gilad Evrony (G)

Center for Human Genetics and Genomics, Department of Pediatrics, Neuroscience & Physiology, New York University Grossman School of Medicine, New York, NY, USA.

Tom Curran (T)

Children's Mercy Research Institute, Kansas City, MO, USA.

Guillaume Bourque (G)

Canadian Center for Computational Genomics, McGill University, Montréal, QC, Canada. guil.bourque@mcgill.ca.
Department of Human Genetics, McGill University, Montréal, QC, Canada. guil.bourque@mcgill.ca.
Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan. guil.bourque@mcgill.ca.
Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada. guil.bourque@mcgill.ca.

Tomi Pastinen (T)

Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA. tpastinen@cmh.edu.

MeSH classifications