Comprehensive genome analysis and variant detection at scale using DRAGEN.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
25 Oct 2024
History:
received: 24 Dec 2023
accepted: 08 Aug 2024
medline: 26 Oct 2024
pubmed: 26 Oct 2024
entrez: 25 Oct 2024
Status:
aheadofprint
Abstract
Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.
Identifiers
pubmed: 39455800
doi: 10.1038/s41587-024-02382-1
pii: 10.1038/s41587-024-02382-1
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2024. The Author(s).