SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine-learning method.

Keywords: Bioinformatics; COVID-19; Phylogenetics; variants

Journal

Virus evolution
ISSN: 2057-1577
Abbreviated title: Virus Evol
Country: England
NLM ID: 101664675

Publication information

Publication date:
2024
History:
received: 05 06 2023
revised: 13 12 2023
accepted: 05 01 2024
medline: 16 2 2024
pubmed: 16 2 2024
entrez: 16 2 2024
Status: epublish

Abstract

With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.
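
The notion of consistency across versions used above can be illustrated with a minimal sketch. It assumes each software version has produced one lineage call per sample, and it reports the fraction of shared samples whose call is identical in every version. The function name `assignment_stability`, the version labels, and the sample data are hypothetical and are not taken from the study; the authors' actual metrics may differ.

```python
def assignment_stability(calls_by_version):
    """Fraction of samples assigned the same lineage by every version.

    calls_by_version maps a version label to a dict of sample -> lineage.
    Only samples present in all versions are compared.
    """
    versions = list(calls_by_version)
    if not versions:
        return 0.0
    # Restrict the comparison to samples that every version reported.
    shared = set.intersection(*(set(calls_by_version[v]) for v in versions))
    if not shared:
        return 0.0
    # A sample is stable when all versions agree on a single lineage.
    stable = sum(
        1 for s in shared
        if len({calls_by_version[v][s] for v in versions}) == 1
    )
    return stable / len(shared)


# Hypothetical, made-up calls for illustration only (not data from the paper).
pangolearn_calls = {
    "version_a": {"seq1": "B.1.1.7", "seq2": "B.1.617.2", "seq3": "B.1"},
    "version_b": {"seq1": "B.1.1.7", "seq2": "AY.4", "seq3": "B.1.1"},
}
pusher_calls = {
    "version_a": {"seq1": "B.1.1.7", "seq2": "AY.4", "seq3": "B.1"},
    "version_b": {"seq1": "B.1.1.7", "seq2": "AY.4", "seq3": "B.1"},
}

print(assignment_stability(pangolearn_calls))
print(assignment_stability(pusher_calls))
```

In practice the per-version dictionaries would be built from the lineage reports that pangolin writes for each run; that loading step is omitted here to keep the sketch self-contained.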

Identifiers

pubmed: 38361813
doi: 10.1093/ve/vead085
pii: vead085
pmc: PMC10868549

Publication types

Journal Article

Languages

eng

Pagination

vead085

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Conflict of interest statement

None declared.

Authors

Adriano de Bernardi Schneider (A)

Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Michelle Su (M)

Department of Health and Mental Hygiene, New York City Public Health Laboratory, New York, NY 10016, USA.

Angie S Hinrichs (AS)

Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Jade Wang (J)

Department of Health and Mental Hygiene, New York City Public Health Laboratory, New York, NY 10016, USA.

Helly Amin (H)

Department of Health and Mental Hygiene, New York City Public Health Laboratory, New York, NY 10016, USA.

John Bell (J)

California Department of Public Health (CDPH), VRDL/COVIDNet, Richmond, CA 94804, USA.

Debra A Wadford (DA)

California Department of Public Health (CDPH), VRDL/COVIDNet, Richmond, CA 94804, USA.

Áine O'Toole (Á)

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK.

Emily Scher (E)

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK.

Marc D Perry (MD)

Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Yatish Turakhia (Y)

Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA.

Nicola De Maio (N)

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SD, UK.

Scott Hughes (S)

Department of Health and Mental Hygiene, New York City Public Health Laboratory, New York, NY 10016, USA.

Russ Corbett-Detig (R)

Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

MeSH classifications