A Distributed Whole Genome Sequencing Benchmark Study.
benchmark
comparison
genome
informatics
variant
whole genome sequencing
Journal
Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621
Publication information
Publication date:
2020
History:
received: 2020-09-30
accepted: 2020-11-10
entrez: 2020-12-18
pubmed: 2020-12-19
medline: 2020-12-19
Status:
epublish
Abstract
Population sequencing often requires collaboration across a distributed network of sequencing centers for the timely processing of thousands of samples. In such massive efforts, participating scientists must be confident that the accuracy of the sequence data does not depend on which center generates it. A study was conducted across three established sequencing centers, located in Montreal, Toronto, and Vancouver, which together constitute Canada's Genomics Enterprise (www.cgen.ca). Whole genome sequencing was performed at each center on three genomic DNA replicates from each of three well-characterized cell lines. The secondary analysis pipeline employed by each site was applied to the sequence data from every site, giving three conditions for each of four variables (cell line, replicate, sequencing center, and analysis pipeline), for a total of 3 × 3 × 3 × 3 = 81 datasets. Each dataset was assessed against multiple quality metrics, including concordance with benchmark variant truth sets, to check for consistent quality across all three conditions of each variable. Three-way concordance analysis of variants across the conditions of each variable was also performed. Our results showed that the variant concordance between datasets differing only by sequencing center was similar to the concordance between datasets differing only by replicate, when the same analysis pipeline was used. We also showed that the statistically significant differences between datasets result from the analysis pipeline used, which can be unified and updated as new approaches become available. We conclude that genome sequencing projects can rely on the quality and reproducibility of aggregate data generated across a network of distributed sites.
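The study design described in the abstract can be illustrated with a minimal sketch. It enumerates the four-variable, three-condition layout and computes one possible three-way concordance score. The cell line and pipeline labels are placeholders (the abstract does not name them), and the Jaccard-style definition of concordance is an assumption made for illustration, not the metric reported by the authors.

```python
from itertools import product

# Placeholder labels: the abstract names the three cities but not the cell
# lines or the pipelines, so these identifiers are hypothetical.
CELL_LINES = ("cell_line_1", "cell_line_2", "cell_line_3")
REPLICATES = (1, 2, 3)
CENTERS = ("Montreal", "Toronto", "Vancouver")
PIPELINES = ("pipeline_Montreal", "pipeline_Toronto", "pipeline_Vancouver")


def enumerate_datasets():
    """Return every (cell line, replicate, center, pipeline) combination."""
    return list(product(CELL_LINES, REPLICATES, CENTERS, PIPELINES))


def three_way_concordance(a, b, c):
    """Share of the union of three variant sets that is present in all three.

    This Jaccard-style ratio is an assumed definition for illustration only.
    Two or three empty call sets are treated as fully concordant.
    """
    union = a | b | c
    if not union:
        return 1.0
    return len(a & b & c) / len(union)


if __name__ == "__main__":
    datasets = enumerate_datasets()
    print(len(datasets))  # 3 * 3 * 3 * 3 combinations

    # Toy variants encoded as (chromosome, position, ref, alt); made-up data.
    v1 = ("chr1", 1000, "A", "G")
    v2 = ("chr1", 2000, "C", "T")
    v3 = ("chr2", 3000, "G", "A")
    v4 = ("chr2", 4000, "T", "C")
    print(three_way_concordance({v1, v2, v3}, {v1, v2, v4}, {v1, v2, v3}))
```

To compare datasets that differ in only one variable, hold the other three fixed and pass the three resulting variant sets to `three_way_concordance`.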
Identifiers
pubmed: 33335541
doi: 10.3389/fgene.2020.612515
pmc: PMC7736078
Publication types
Journal Article
Languages
eng
Pagination
612515
Copyright information
Copyright © 2020 Corbett, Eveleigh, Whitney, Barai, Bourgey, Chuah, Johnson, Moore, Moradin, Mungall, Pereira, Reuter, Thiruvahindrapuram, Wintle, Ragoussis, Strug, Herbrick, Aziz, Jones, Lathrop, Scherer, Staffa and Mungall.
Conflict of interest statement
SS is on the Scientific Advisory Boards of Deep Genomics and Population Bio, and intellectual property arising from his research held at The Hospital for Sick Children is licensed by Athena Diagnostics and Lineagen. The Center for Applied Genomics, directed by SS and The Hospital for Sick Children has benefited from joint relationships with Illumina and other suppliers, but none of these arrangements have impacted the studies described in this paper. SS holds the Canadian Institutes of Health Research (CIHR) GlaxoSmithKline Endowed Chair for Genome Sciences at The Hospital for Sick Children and University of Toronto. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.