Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult.

SARS-CoV-2; outgroups; phylogenetic inference; phylogeny rooting; strain classification

Journal

Molecular biology and evolution
ISSN: 1537-1719
Abbreviated title: Mol Biol Evol
Country: United States
NLM ID: 8501455

Publication information

Publication date:
04 05 2021
History:
pubmed: 15 12 2020
medline: 19 5 2021
entrez: 14 12 2020
Status: ppublish

Abstract

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies, using as an example a data snapshot comprising a quality-filtered subset of 8,736 of the 16,453 virus sequences available on gisaid.org on May 5, 2020. We find that it is difficult to infer a reliable phylogeny from these data because of the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence, either via the bat and pangolin outgroups or by applying novel computational methods to the ingroup phylogeny, does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, because the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, the results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
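
The tension between many sequences and few mutations can be illustrated with a rough counting argument. The sketch below is not the authors' analysis. It assumes a fully resolved unrooted binary tree and assumes that each informative site is biallelic and free of homoplasy, so that it can support at most one internal branch. The function names and the site count are hypothetical, and only the sequence count comes from the abstract.

```python
def internal_branches(n_tips: int) -> int:
    """Number of internal branches in a fully resolved unrooted binary tree."""
    return max(n_tips - 3, 0)


def max_resolved_fraction(n_tips: int, n_informative_sites: int) -> float:
    """Upper bound on the fraction of internal branches the data can support.

    Assumption: every informative site is biallelic and free of homoplasy,
    so it can support at most one internal branch (one split of the tips).
    """
    branches = internal_branches(n_tips)
    if branches == 0:
        return 1.0  # nothing to resolve in trees with fewer than four tips
    return min(1.0, n_informative_sites / branches)


n_sequences = 8736         # quality-filtered subset reported in the abstract
hypothetical_sites = 1000  # NOT from the paper; illustrative value only

print(internal_branches(n_sequences))  # 8733
print(max_resolved_fraction(n_sequences, hypothetical_sites))
```

Under these simplifying assumptions, whenever the number of informative sites is well below the number of internal branches, most branches cannot be supported by any mutation, which is one way to read the difficulty described above.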

Identifiers

pubmed: 33316067
pii: 6030946
doi: 10.1093/molbev/msaa314
pmc: PMC7798910

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1777-1791

Copyright information

© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Authors

Benoit Morel (B)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Pierre Barbera (P)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Lucas Czech (L)

Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA.

Ben Bettisworth (B)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Lukas Hübner (L)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Sarah Lutteropp (S)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Dora Serdari (D)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Evangelia-Georgia Kostaki (EG)

Department of Hygiene Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece.

Ioannis Mamais (I)

Department of Health Sciences, European University Cyprus, Nicosia, Cyprus.

Alexey M Kozlov (AM)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Pavlos Pavlidis (P)

Institute of Computer Science, Foundation for Research and Technology-Hellas, Crete, Greece.

Dimitrios Paraskevis (D)

Department of Hygiene Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece.

Alexandros Stamatakis (A)

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.
Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.
