VirPool: model-based estimation of SARS-CoV-2 variant proportions in wastewater samples.
Probabilistic modeling
SARS-CoV-2
Variant proportion estimation
Wastewater analysis
Weighted mixture model
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
19 Dec 2022
History:
received: 21 Jun 2022
accepted: 06 Dec 2022
entrez: 19 Dec 2022
pubmed: 20 Dec 2022
medline: 22 Dec 2022
Status:
epublish
Abstract
The genomes of SARS-CoV-2 are classified into variants, some of which are monitored as variants of concern (e.g. the Delta variant B.1.617.2 or Omicron variant B.1.1.529). Proportions of these variants circulating in a human population are typically estimated by large-scale sequencing of individual patient samples. Sequencing a mixture of SARS-CoV-2 RNA molecules from wastewater provides a cost-effective alternative, but requires methods for estimating variant proportions in a mixed sample. We propose a new method based on a probabilistic model of sequencing reads, capturing sequence diversity present within individual variants, as well as sequencing errors. The algorithm is implemented in an open source Python program called VirPool. We evaluate the accuracy of VirPool on several simulated and real sequencing data sets from both Illumina and nanopore sequencing platforms, including wastewater samples from Austria and France monitoring the onset of the Alpha variant. VirPool is a versatile tool for wastewater and other mixed-sample analysis that can handle both short- and long-read sequencing data. Our approach does not require pre-selection of characteristic mutations for variant profiles, it is able to use the entire length of reads instead of just the most informative positions, and can also capture haplotype dependencies within a single read.
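The idea of a weighted mixture model over sequencing reads can be illustrated with a minimal sketch. This is not VirPool's actual code or API: the function names `read_likelihood` and `estimate_proportions`, the uniform substitution error model, and the per-position independence assumption are all hypothetical simplifications made for illustration (in particular, treating positions independently ignores the haplotype dependencies mentioned in the abstract). Each variant is represented by a table of base frequencies per genome position, a read's likelihood under a variant combines those frequencies with a sequencing error rate, and the variant proportions are then fitted by expectation-maximization.

```python
import numpy as np

BASES = "ACGT"

def read_likelihood(read, start, profile, error_rate=0.01):
    """P(read | variant), assuming independent positions (a simplification).

    profile[pos, b] is the assumed frequency of base b at genome position
    pos among genomes of this variant; `start` is where the read aligns.
    """
    lik = 1.0
    for offset, base in enumerate(read):
        obs = BASES.index(base)
        # P(observed base | true base): 1 - eps if equal, eps / 3 otherwise.
        emit = np.full(4, error_rate / 3.0)
        emit[obs] = 1.0 - error_rate
        # Sum over the possible true bases at this position.
        lik *= float(profile[start + offset] @ emit)
    return lik

def estimate_proportions(read_likelihoods, n_iter=200, tol=1e-9):
    """EM for mixture weights when component likelihoods are fixed.

    read_likelihoods[i, k] = P(read i | variant k).
    Returns the estimated proportion of each variant.
    """
    lik = np.asarray(read_likelihoods, dtype=float)
    n_variants = lik.shape[1]
    weights = np.full(n_variants, 1.0 / n_variants)
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each variant.
        joint = lik * weights
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: new weights are the average responsibilities.
        new_weights = resp.mean(axis=0)
        converged = np.abs(new_weights - weights).max() < tol
        weights = new_weights
        if converged:
            break
    return weights
```

In this sketch, every read contributes along its entire length rather than only at pre-selected marker positions, which mirrors the property highlighted in the abstract; how VirPool itself weights positions and models errors should be taken from the paper and its source code.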
Identifiers
pubmed: 36536300
doi: 10.1186/s12859-022-05100-3
pii: 10.1186/s12859-022-05100-3
pmc: PMC9761630
Chemical substances
RNA, Viral
0
Wastewater
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
551
Grants
Agency: Agentúra na Podporu Výskumu a Vývoja
ID: APVV-18-0239
Agency: Operačný program Integrovaná infraštruktúra
ID: ITMS:313011ATL7
Agency: Horizon 2020 Framework Programme
ID: 872539
Agency: Vedecká Grantová Agentúra MŠVVaŠ SR a SAV
ID: 1/0538/22
Agency: Vedecká Grantová Agentúra MŠVVaŠ SR a SAV
ID: 1/0463/20
Copyright information
© 2022. The Author(s).