Validated WGS and WES protocols proved saliva-derived gDNA as an equivalent to blood-derived gDNA for clinical and population genomic analyses.
Genomics variant analysis
Saliva-derived gDNA
Validation guideline
Whole exome sequencing
Whole genome sequencing
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
17 Feb 2024
History:
received: 5 Sep 2023
accepted: 2 Feb 2024
medline: 17 Feb 2024
pubmed: 17 Feb 2024
entrez: 16 Feb 2024
Status:
epublish
Abstract
BACKGROUND
Whole exome sequencing (WES) and whole genome sequencing (WGS) have become standard methods in human clinical diagnostics as well as in population genomics (POPGEN). Blood-derived genomic DNA (gDNA) is routinely used in the clinical environment. Conversely, many POPGEN studies and commercial tests benefit from easy saliva sampling. Here, we evaluated the quality of variant call sets and the level of genotype concordance of single nucleotide variants (SNVs) and small insertions and deletions (indels) for WES and WGS using paired blood- and saliva-derived gDNA isolates employing genomic reference-based validated protocols.
METHODS
The genomic reference standard Coriell NA12878 was repeatedly analyzed using optimized WES and WGS protocols, and data calls were compared with the truth dataset published by the Genome in a Bottle Consortium. gDNA was extracted from the paired blood and saliva samples of 10 participants and processed using the same protocols. A comparison of paired blood-saliva call sets was performed in the context of WGS and WES genomic reference-based technical validation results.
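The paired comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `compare_call_sets` function, the variant key `(chrom, pos, ref, alt)`, and the example variants are all hypothetical. It assumes the blood-derived call set is treated as the baseline and that a genotype mismatch counts as both a false positive and a false negative. Real benchmarking tools additionally normalize variant representation and stratify by genomic region, which this sketch does not attempt.

```python
from typing import Dict, Tuple

Variant = Tuple[str, int, str, str]   # hypothetical key: (chrom, pos, ref, alt)
Genotype = Tuple[int, int]            # unphased allele indices, e.g. (0, 1)


def compare_call_sets(
    baseline: Dict[Variant, Genotype],
    query: Dict[Variant, Genotype],
) -> Dict[str, int]:
    """Count TP/FP/FN for a query call set against a baseline call set.

    Matching is naive: a call is a true positive only if the same variant
    key exists in both sets with the same unphased genotype.
    """
    tp = fp = fn = 0
    for variant, gt in query.items():
        if variant in baseline and sorted(baseline[variant]) == sorted(gt):
            tp += 1
        else:
            fp += 1  # absent from baseline, or genotype differs
    for variant, gt in baseline.items():
        if variant not in query or sorted(query[variant]) != sorted(gt):
            fn += 1  # missed in query, or genotype differs
    return {"tp": tp, "fp": fp, "fn": fn}


# Made-up example data (not from the study).
blood = {
    ("chr1", 100, "A", "G"): (0, 1),
    ("chr1", 200, "C", "T"): (1, 1),
    ("chr2", 300, "G", "GA"): (0, 1),
}
saliva = {
    ("chr1", 100, "A", "G"): (1, 0),   # same genotype, different allele order
    ("chr1", 200, "C", "T"): (0, 1),   # genotype mismatch
    ("chr3", 50, "T", "C"): (0, 1),    # call absent from blood
}
counts = compare_call_sets(baseline=blood, query=saliva)
print(counts)
```

Treating blood as the baseline is a convention of this sketch only; swapping the two arguments swaps the false-positive and false-negative counts.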
RESULTS
The quality pattern of called variants obtained from genomic-reference-based technical replicates correlates with the calls from paired blood- and saliva-derived samples at all levels of examination tested, despite a higher rate of non-human contamination in the saliva samples. Across the 10 blood-to-saliva comparisons, the F1 score ranged from 0.8030 to 0.9998 for SNVs and from 0.8883 to 0.9991 for small indels with the WGS protocol, and from 0.8643 to 0.999 for SNVs and from 0.7781 to 1.000 for small indels with the WES protocol.
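For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The short Python sketch below computes it from true-positive, false-positive, and false-negative counts. The function name `f1_score` and the example counts are illustrative assumptions, not values or code from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one blood-to-saliva comparison (not study data).
example_f1 = f1_score(tp=9990, fp=5, fn=5)
print(round(example_f1, 4))
```

Algebraically the same value can be written as 2·TP / (2·TP + FP + FN), so the score depends only on the three counts and ignores true negatives, which are not well defined for variant calls.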
CONCLUSIONS
Saliva may be considered a material equivalent to blood for genetic analysis by both WGS and WES under strict protocol conditions. Sequencing metrics and variant-detection accuracy are not affected by choosing saliva instead of blood as the gDNA source; they depend much more on the genomic context, the variant type, and the sequencing technology used.
Identifiers
pubmed: 38365587
doi: 10.1186/s12864-024-10080-0
pii: 10.1186/s12864-024-10080-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
187
Copyright information
© 2024. The Author(s).