Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
09 2021
History:
received: 31 07 2020
accepted: 05 08 2021
entrez: 10 9 2021
pubmed: 11 9 2021
medline: 23 9 2021
Status:
ppublish
Abstract
Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.
Identifiers
pubmed: 34504351
doi: 10.1038/s41587-021-01049-5
pii: 10.1038/s41587-021-01049-5
pmc: PMC8985210
mid: NIHMS1782172
Chemical substances
DNA, Bacterial
0
DNA
9007-49-2
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Validation Study
Languages
eng
Citation subsets
IM
Pagination
1129-1140
Grants
Agency: NIAID NIH HHS
ID: R01 AI125416
Country: United States
Agency: NIEHS NIH HHS
ID: R01 ES021006
Country: United States
Agency: NIAID NIH HHS
ID: R21 AI129851
Country: United States
Agency: NIDA NIH HHS
ID: U01 DA053941
Country: United States
Agency: NIMH NIH HHS
ID: R01 MH117406
Country: United States
Agency: NIBIB NIH HHS
ID: R25 EB020393
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI151059
Country: United States
Agency: NINDS NIH HHS
ID: R01 NS076465
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG008898
Country: United States
Comments and corrections
Type: ErratumIn
Copyright information
© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.