High Performance Integration Pipeline for Viral and Epitope Sequences.
COVID-19
SARS-CoV-2
bioinformatics
data integration
metadata management
sequence analysis
viral datasets
Journal
Biotech (Basel (Switzerland))
ISSN: 2673-6284
Abbreviated title: BioTech (Basel)
Country: Switzerland
NLM ID: 9918383086206676
Publication information
Publication date:
21 Mar 2022
History:
received: 31 Jan 2022
revised: 08 Mar 2022
accepted: 15 Mar 2022
entrez: 13 Jul 2022
pubmed: 14 Jul 2022
medline: 14 Jul 2022
Status:
epublish
Abstract
With the spread of COVID-19, sequencing laboratories started to share hundreds of sequences daily. However, the lack of a commonly agreed standard across deposition databases hindered the exploration and study of all the viral sequences collected worldwide in a practical and homogeneous way. During the first months of the pandemic, we developed an automatic procedure to collect, transform, and integrate viral sequences of SARS-CoV-2, MERS, SARS-CoV, Ebola, and Dengue from four major database institutions (NCBI, COG-UK, GISAID, and NMDC). This data pipeline allowed the creation of the data exploration interfaces VirusViz and EpiSurf, as well as ViruSurf, one of the largest databases of integrated viral sequences. Almost two years after the first release of the repository, the original pipeline underwent a thorough refinement process and became more efficient, scalable, and general (currently, it also includes epitopes from the IEDB). Thanks to these improvements, we constantly update and expand our integrated repository, encompassing about 9.1 million SARS-CoV-2 sequences at present (March 2022). This pipeline made it possible to design and develop fundamental resources for any researcher interested in understanding the biological mechanisms behind the viral infection. In addition, it plays a crucial role in many analytic and visualization tools, such as ViruSurf, EpiSurf, VirusViz, and VirusLab.
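The collect, transform, and integrate steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names in `FIELD_MAPS`, the `SequenceRecord` schema, and the `harmonize` and `integrate` helpers are all hypothetical, chosen only to show how records from sources with different metadata layouts might be mapped onto one common schema and deduplicated.

```python
from dataclasses import dataclass

# Hypothetical mappings from a common schema to per-source column names.
# The real source schemas are not described in the abstract.
FIELD_MAPS = {
    "NCBI": {
        "accession": "Accession",
        "collection_date": "Collection_Date",
        "country": "Geo_Location",
    },
    "COG-UK": {
        "accession": "sequence_name",
        "collection_date": "sample_date",
        "country": "country",
    },
}


@dataclass(frozen=True)
class SequenceRecord:
    """Common (assumed) schema shared by all sources after harmonization."""
    source: str
    accession: str
    collection_date: str
    country: str


def harmonize(source, raw):
    """Translate one source-specific row into the common schema."""
    mapping = FIELD_MAPS[source]
    fields = {
        common: str(raw.get(original, "")).strip()
        for common, original in mapping.items()
    }
    return SequenceRecord(source=source, **fields)


def integrate(batches):
    """Harmonize every row and keep the first record per (source, accession)."""
    seen = {}
    for source, rows in batches.items():
        for raw in rows:
            record = harmonize(source, raw)
            seen.setdefault((record.source, record.accession), record)
    return list(seen.values())
```

Keeping the source name in the deduplication key is a deliberate choice in this sketch: the same accession string appearing in two databases is not assumed to denote the same sequence.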
Identifiers
pubmed: 35822815
pii: biotech11010007
doi: 10.3390/biotech11010007
pmc: PMC9245902
Publication types
Journal Article
Languages
eng
Grants
Agency: European Research Council
ID: 693174
Country: International
Agency: European Institute of Innovation and Technology
ID: 20663