Comprehensive genome analysis and variant detection at scale using DRAGEN.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
25 Oct 2024
History:
received: 24 12 2023
accepted: 08 08 2024
medline: 26 10 2024
pubmed: 26 10 2024
entrez: 25 10 2024
Status: aheadofprint

Abstract

Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.

Identifiers

pubmed: 39455800
doi: 10.1038/s41587-024-02382-1
pii: 10.1038/s41587-024-02382-1

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2024. The Author(s).

Authors

Sairam Behera (S)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Severine Catreux (S)

Illumina, Inc., San Diego, CA, USA. scatreux@illumina.com.

Massimiliano Rossi (M)

Illumina, Inc., San Diego, CA, USA.

Sean Truong (S)

Illumina, Inc., San Diego, CA, USA.

Zhuoyi Huang (Z)

Illumina, Inc., San Diego, CA, USA.

Michael Ruehle (M)

Illumina, Inc., San Diego, CA, USA.

Arun Visvanath (A)

Illumina, Inc., San Diego, CA, USA.

Gavin Parnaby (G)

Illumina, Inc., San Diego, CA, USA.

Cooper Roddey (C)

Illumina, Inc., San Diego, CA, USA.

Vitor Onuchic (V)

Illumina, Inc., San Diego, CA, USA.

Andrea Finocchio (A)

Illumina, Inc., San Diego, CA, USA.

Daniel L Cameron (DL)

Illumina, Inc., San Diego, CA, USA.

Adam English (A)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Shyamal Mehtalia (S)

Illumina, Inc., San Diego, CA, USA.

James Han (J)

Illumina, Inc., San Diego, CA, USA. jhan6@illumina.com.

Rami Mehio (R)

Illumina, Inc., San Diego, CA, USA. rmehio@illumina.com.

Fritz J Sedlazeck (FJ)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA. fritz.sedlazeck@bcm.edu.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA. fritz.sedlazeck@bcm.edu.
Department of Computer Science, Rice University, Houston, TX, USA. fritz.sedlazeck@bcm.edu.

MeSH classifications