Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
30 May 2023
History:
received: 25 01 2023
accepted: 25 05 2023
medline: 1 6 2023
pubmed: 31 5 2023
entrez: 30 5 2023
Status: epublish

Abstract

Viral genomics and epidemiology have become increasingly important tools for analysing the spread of key pathogens affecting the daily lives of individuals worldwide. With the rapidly expanding scale of pathogen genome sequencing efforts for epidemics and outbreaks, efficient workflows for extracting genomic information are becoming increasingly important for answering key research questions. Here we present Genofunc, a toolkit offering a range of command-line-oriented functions for processing raw virus genome sequences into aligned and annotated data ready for analysis. The tool contains functions such as genome annotation and feature extraction for processing large genomic datasets, either manually or as part of a pipeline such as Snakemake or Nextflow, ready for downstream phylogenetic analysis. Originally designed for a large-scale HIV sequencing project, Genofunc has been validated by benchmarking against annotated sequence gene coordinates from the Los Alamos HIV database, and, as a case study, downstream phylogenetic analysis gave results comparable to past literature. Genofunc is implemented fully in Python and licensed under the MIT license. Source code and documentation are available at: https://github.com/xiaoyu518/genofunc.

Identifiers

pubmed: 37254048
doi: 10.1186/s12859-023-05356-3
pii: 10.1186/s12859-023-05356-3
pmc: PMC10227794

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

218

Grants

Agency: Bill and Melinda Gates Foundation
ID: OPP1175094

Copyright information

© 2023. The Author(s).

Authors

Xiaoyu Yu (X)

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL, Scotland, UK. xiaoyu.yu@ed.ac.uk.
