ViReMa: a virus recombination mapper of next-generation sequencing data characterizes diverse recombinant viral nucleic acids.

Keywords: copy-back RNAs; defective RNAs; defective viral genomes; next-generation sequencing; virus recombination

Journal

GigaScience
ISSN: 2047-217X
Abbreviated title: Gigascience
Country: United States
NLM ID: 101596872

Publication information

Publication date:
2023-03-20
History:
received: 2022-05-20
revised: 2022-11-30
accepted: 2023-02-03
entrez: 2023-03-20
pubmed: 2023-03-21
medline: 2023-03-22
Status: ppublish

Abstract

Genetic recombination is a tremendous source of intrahost diversity in viruses and is critical for their ability to rapidly adapt to new environments or fitness challenges. While viruses are routinely characterized using high-throughput sequencing techniques, characterizing the genetic products of recombination in next-generation sequencing data remains a challenge. Viral recombination events can be highly diverse and variable in nature, including simple duplications and deletions, or more complex events such as copy/snap-back recombination, intervirus or intersegment recombination, and insertions of host nucleic acids. Due to the variable mechanisms driving virus recombination and the different selection pressures acting on the progeny, recombination junctions rarely adhere to simple canonical sites or sequences. Furthermore, numerous different events may be present simultaneously in a viral population, yielding a complex mutational landscape. We have previously developed an algorithm called ViReMa (Virus Recombination Mapper) that bootstraps the bowtie short-read aligner to capture and annotate a wide range of recombinant species found within virus populations. Here, we have updated ViReMa to provide an "error density" function designed to accurately detect recombination events in the longer reads now routinely generated by the Illumina platforms and provide output reports for multiple types of recombinant species using standardized formats. We demonstrate the utility and flexibility of ViReMa in different settings to report deletion events in simulated data from Flock House virus, copy-back RNA species in Sendai viruses, short duplication events in HIV, and virus-to-host recombination in an archaeal DNA virus.
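The event types named in the abstract (deletions, duplications, copy-back species, and recombination between different references) can be told apart from the coordinates of a mapped junction. The following is a minimal sketch of that idea only, not ViReMa's implementation: the `Junction` record, the `classify` function, the category labels, and the reference names in the usage example are all hypothetical, and coordinates are assumed to be 1-based positions on the reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical record: a read leaves the donor site and resumes at the acceptor site."""
    donor_ref: str        # reference the read maps to before the junction
    donor_pos: int        # 1-based position of the last nucleotide before the junction
    donor_strand: str     # "+" or "-"
    acceptor_ref: str     # reference the read maps to after the junction
    acceptor_pos: int     # 1-based position of the first nucleotide after the junction
    acceptor_strand: str  # "+" or "-"


def classify(j: Junction) -> str:
    """Assign an illustrative event label to one junction (assumed rules, not ViReMa's)."""
    # Different references: intersegment, intervirus, or virus-to-host events.
    if j.donor_ref != j.acceptor_ref:
        return "inter-reference"
    # Same reference but a strand switch: copy-back / snap-back species.
    if j.donor_strand != j.acceptor_strand:
        return "copy-back"
    # Distance travelled in the direction of reading on the shared strand.
    if j.donor_strand == "+":
        step = j.acceptor_pos - j.donor_pos
    else:
        step = j.donor_pos - j.acceptor_pos
    if step == 1:
        return "contiguous"   # adjacent nucleotides, no recombination
    if step > 1:
        return "deletion"     # intervening sequence skipped
    return "duplication"      # acceptor at or upstream of donor, sequence repeated


if __name__ == "__main__":
    # Illustrative coordinates only; "RNA1" is a placeholder reference name.
    print(classify(Junction("RNA1", 300, "+", "RNA1", 1200, "+")))
    print(classify(Junction("RNA1", 1200, "+", "RNA1", 1150, "+")))
```

A real pipeline would also need to handle ambiguous junctions where the donor and acceptor sequences overlap, which this sketch ignores.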

Identifiers

pubmed: 36939008
pii: 7080818
doi: 10.1093/gigascience/giad009
pmc: PMC10025937

Chemical substances

Nucleic Acids (registry number: 0)
RNA (CAS registry number: 63231-63-0)

Publication types

Journal Article
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, P.H.S.

Languages

eng

Citation subsets

IM

Grants

Agency: NIAID NIH HHS
ID: R21 AI151725
Country: United States
Agency: NIAID NIH HHS
ID: U01 AI151801
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI042189
Country: United States
Agency: NIAID NIH HHS
ID: U54 AI150472
Country: United States

Copyright information

© The Author(s) 2023. Published by Oxford University Press GigaScience.

Authors

Stephanea Sotcheff (S)

Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX 77555, USA.

Yiyang Zhou (Y)

Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX 77555, USA.

Jason Yeung (J)

John Sealy School of Medicine, The University of Texas Medical Branch, Galveston, TX 77555, USA.

Yan Sun (Y)

Department of Microbiology and Immunology, The University of Rochester Medical Center, Rochester, NY 14642, USA.

John E Johnson (JE)

Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, CA 92037, USA.

Bruce E Torbett (BE)

Department of Pediatrics, School of Medicine, University of Washington, Seattle, WA 98105, USA.
Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98105, USA.
Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98195, USA.

Andrew L Routh (AL)

Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX 77555, USA.
Sealy Center for Structural Biology and Molecular Biophysics, The University of Texas Medical Branch, Galveston, TX 77555, USA.
Institute for Human Infections and Immunity, University of Texas Medical Branch, Galveston, TX 77555, USA.
