Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data.


Journal

mBio
ISSN: 2150-7511
Abbreviated title: mBio
Country: United States
NLM ID: 101519231

Publication information

Publication date:
2023-08-31
History:
medline: 2023-09-04
pubmed: 2023-06-30
entrez: 2023-06-30
Status: ppublish

Abstract

High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.

IMPORTANCE: When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
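The filtering logic described in the abstract (allele frequency and coverage cutoffs, agreement across callers or technical replicates, and scoring against a known truth set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Call` record, the function names, the example calls, and the cutoff values (1% frequency, 200x depth) are hypothetical and are not the thresholds or tools used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One minority-variant call (hypothetical record layout)."""
    pos: int      # 1-based genome position
    alt: str      # alternate base
    freq: float   # allele frequency, 0 to 1
    depth: int    # read depth (coverage) at the site


def passing_sites(calls, min_freq, min_depth):
    """Return the (pos, alt) keys that meet both cutoffs."""
    return {(c.pos, c.alt) for c in calls
            if c.freq >= min_freq and c.depth >= min_depth}


def consensus(call_sets, min_freq, min_depth):
    """Keep variants that pass the cutoffs in every call set,
    where each set comes from one caller or one technical replicate."""
    keyed = [passing_sites(calls, min_freq, min_depth) for calls in call_sets]
    return set.intersection(*keyed) if keyed else set()


def error_rates(called, truth):
    """Return (false discovery rate, false-negative rate) against a truth set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    fdr = fp / (tp + fp) if called else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    return fdr, fnr


# Illustrative data only: three true variants and two made-up call sets.
truth = {(100, "A"), (200, "T"), (300, "G")}
caller_a = [Call(100, "A", 0.05, 500), Call(200, "T", 0.02, 800),
            Call(450, "C", 0.015, 300)]   # position 450 is a false positive
caller_b = [Call(100, "A", 0.06, 500), Call(200, "T", 0.025, 800),
            Call(300, "G", 0.04, 150)]    # position 300 fails the depth cutoff

single = passing_sites(caller_a, min_freq=0.01, min_depth=200)
both = consensus([caller_a, caller_b], min_freq=0.01, min_depth=200)

print(error_rates(single, truth))  # one caller alone keeps the false positive
print(error_rates(both, truth))    # intersection drops it, but still misses position 300
```

In this toy data, requiring agreement between the two call sets removes the false positive without changing the number of missed true variants, which mirrors the trade-off between false discovery and false-negative rates that the abstract describes.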

Identifiers

pubmed: 37389439
doi: 10.1128/mbio.01046-23
pmc: PMC10470513

Publication types

Journal Article
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

e0104623

Grants

Agency: NIAID NIH HHS
ID: UM1 AI148574
Country: United States
Agency: NIAID NIH HHS
ID: T32 AI007180
Country: United States
Agency: NIAID NIH HHS
ID: HHSN272201400008C
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM132037
Country: United States

Comments and corrections

Type: UpdateOf

Conflict of interest statement

The authors declare no conflict of interest.

Authors

A E Roder (AE)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.

K E E Johnson (KEE)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.
Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA.

M Knoll (M)

Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA.

M Khalfan (M)

Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA.

B Wang (B)

Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA.

S Schultz-Cherry (S)

Department of Infectious Diseases, St. Jude Children's Research Hospital, Memphis, Tennessee, USA.

S Banakis (S)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.

A Kreitman (A)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.

C Mederos (C)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.

J-H Youn (JH)

Department of Laboratory Medicine, NIH, Bethesda, Maryland, USA.

R Mercado (R)

Department of Laboratory Medicine, NIH, Bethesda, Maryland, USA.

W Wang (W)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.

M Chung (M)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.

D Ruchnewitz (D)

Institute for Biological Physics, University of Cologne, Cologne, Germany.

M I Samanovic (MI)

Department of Medicine, New York University Langone Vaccine Center, New York, New York, USA.

M J Mulligan (MJ)

Department of Medicine, New York University Langone Vaccine Center, New York, New York, USA.

M Lässig (M)

Institute for Biological Physics, University of Cologne, Cologne, Germany.

M Luksza (M)

Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA.

S Das (S)

Department of Laboratory Medicine, NIH, Bethesda, Maryland, USA.

D Gresham (D)

Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA.

E Ghedin (E)

Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA.
Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA.
