Pangenome graphs improve the analysis of structural variants in rare genetic diseases.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
22 Jan 2024
History:
received: 06 Jun 2023
accepted: 10 Jan 2024
medline: 23 Jan 2024
pubmed: 23 Jan 2024
entrez: 22 Jan 2024
Status:
epublish
Abstract
Rare DNA alterations that cause heritable diseases are only partially resolvable by clinical next-generation sequencing due to the difficulty of detecting structural variation (SV) in all genomic contexts. Long-read, high fidelity genome sequencing (HiFi-GS) detects SVs with increased sensitivity and enables assembling personal and graph genomes. We leverage standard reference genomes, public assemblies (n = 94) and a large collection of HiFi-GS data from a rare disease program (Genomic Answers for Kids, GA4K, n = 574 assemblies) to build a graph genome representing a unified SV callset in GA4K, identify common variation and prioritize SVs that are more likely to cause genetic disease (MAF < 0.01). Using graphs, we obtain a higher level of reproducibility than the standard reference approach. We observe over 200,000 SV alleles unique to GA4K, including nearly 1000 rare variants that impact coding sequence. With improved specificity for rare SVs, we isolate 30 candidate SVs in phenotypically prioritized genes, including known disease SVs. We isolate a novel diagnostic SV in KMT2E, demonstrating use of personal assemblies coupled with pangenome graphs for rare disease genomics. The community may interrogate our pangenome with additional assemblies to discover new SVs within the allele frequency spectrum relevant to genetic diseases.
Identifiers
pubmed: 38253606
doi: 10.1038/s41467-024-44980-2
pii: 10.1038/s41467-024-44980-2
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
657
Copyright information
© 2024. The Author(s).