CurSa: scripts to curate metadata and sample genomes from GISAID for analysis and display in nextstrain and microreact.

SARS-CoV-2 coronavirus phylogenomics

Journal

Biology methods & protocols
ISSN: 2396-8923
Abbreviated title: Biol Methods Protoc
Country: England
NLM ID: 101693064

Publication information

Publication date:
2023
History:
received: 23 12 2022
revised: 26 03 2023
accepted: 11 04 2023
medline: 14 5 2023
pubmed: 14 5 2023
entrez: 14 5 2023
Status: epublish

Abstract

The coronavirus SARS-CoV-2 is the most sequenced pathogen ever, with several million genome sequences deposited in the GISAID database. This large amount of genomic information poses non-trivial bioinformatic challenges for those interested in studying the evolution of SARS-CoV-2. One common problem when studying the phylogeny of the coronavirus in its geographical context is obtaining accurate information on the location of the samples. This information is entered by hand by research groups all over the world, and typos and inconsistencies are sometimes introduced into the metadata when the sequences are submitted to GISAID. Correcting these errors is laborious and time-consuming. Here, we provide a suite of Perl scripts designed to facilitate the curation of this vital information and to perform a random sampling of genome sequences if necessary. The scripts can be used to curate geographic information in the metadata and to sample the sequences from any country of interest, easing the preparation of files for Nextstrain and Microreact and thus accelerating evolutionary studies of this important pathogen. CurSa scripts are accessible via: https://github.com/luisdelaye/CurSa/.
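
The two tasks described in the abstract, correcting hand-entered location names and drawing a random sample of sequences, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the CurSa implementation, which is in Perl. The corrections table, the `strain` and `division` field names, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical corrections table (an assumption, not taken from CurSa):
# lowercase variant -> canonical place name.
CORRECTIONS = {
    "gunajuato": "Guanajuato",
    "guanajuato": "Guanajuato",
    "cdmx": "Mexico City",
}

def curate_location(name, corrections=CORRECTIONS):
    """Collapse stray whitespace and apply known spelling corrections."""
    cleaned = " ".join(name.split())
    # Names without a known correction are returned cleaned but unchanged.
    return corrections.get(cleaned.lower(), cleaned)

def sample_by_group(records, key, n_per_group, seed=0):
    """Randomly keep at most n_per_group records for each value of `key`."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    sampled = []
    for name in sorted(groups):
        members = groups[name]
        k = min(n_per_group, len(members))
        sampled.extend(rng.sample(members, k))
    return sampled

# Toy metadata rows; the field names are assumptions.
rows = [
    {"strain": "s1", "division": "Gunajuato"},
    {"strain": "s2", "division": " guanajuato "},
    {"strain": "s3", "division": "Guanajuato"},
    {"strain": "s4", "division": "Jalisco"},
]
for row in rows:
    row["division"] = curate_location(row["division"])

subset = sample_by_group(rows, key="division", n_per_group=2)
```

Curating the names before sampling matters here: without it, the three spellings of Guanajuato would be treated as three separate groups and the region would be over-represented in the sample.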

Identifiers

pubmed: 37180471
doi: 10.1093/biomethods/bpad007
pii: bpad007
pmc: PMC10174701

Publication types

Journal Article

Languages

eng

Pagination

bpad007

Copyright information

© The Author(s) 2023. Published by Oxford University Press.

Authors

Luis Delaye (L)

Department of Genetic Engineering, CINVESTAV-Irapuato, Irapuato, Guanajuato 36824, Mexico.