A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis.
COVID-19
evolution
genetics
genome
infectious disease
pandemic
phylogenetics
SARS-CoV-2
sequence
tracing
tracking
transmission
virus
Journal
JMIR public health and surveillance
ISSN: 2369-2960
Abbreviated title: JMIR Public Health Surveill
Country: Canada
NLM ID: 101669345
Publication information
Publication date: 2020-06-01
History:
received: 2020-04-06
accepted: 2020-05-12
revised: 2020-05-09
pubmed: 2020-05-16
medline: 2020-06-06
entrez: 2020-05-16
Status: epublish
Abstract
Abstract sections
BACKGROUND
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non-peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.
OBJECTIVE
The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community.
METHODS
We used maximum likelihood-based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020.
RESULTS
Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations.
CONCLUSIONS
At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.
Identifiers
pubmed: 32412415
pii: v6i2e19170
doi: 10.2196/19170
pmc: PMC7265655
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e19170
Comments and corrections
Type: ErratumIn
Type: CommentIn
Copyright information
©Carla Mavian, Simone Marini, Mattia Prosperi, Marco Salemi. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 01.06.2020.