Getting insight into the pan-genome structure with PangTree.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
16 Apr 2020
History:
entrez: 18 Apr 2020
pubmed: 18 Apr 2020
medline: 14 Jan 2021
Status: epublish

Abstract

The term pan-genome was proposed to denote collections of genomic sequences that are jointly analyzed or used as a reference. The constant growth of genomic data intensifies the development of data structures and algorithms for investigating pan-genomes efficiently. This work focuses on providing a tool for discovering and visualizing the relationships between the sequences that constitute a pan-genome. A new structure for representing such relationships, called an affinity tree, is proposed. Each node of this tree is assigned a subset of genomes, together with their homogeneity level and averaged consensus sequence. Moreover, the subsets assigned to sibling nodes form a partition of the genomes assigned to their parent. The functionality of the affinity tree is demonstrated on simulated data and on the Ebola virus pan-genome. Furthermore, two software packages are provided: PangTreeBuild constructs the affinity tree, and PangTreeVis visualizes the result.
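The following Python sketch makes the structural property of the affinity tree concrete. It models a node and checks the partition invariant: the children of each node split its genome subset into non-empty, pairwise disjoint parts that together cover it. This is a minimal illustration under stated assumptions, not code from PangTreeBuild. The names `AffinityNode`, `is_partition` and `check_tree` are hypothetical. The homogeneity values and consensus strings are toy placeholders, and the paper's actual homogeneity measure and consensus computation are not reproduced.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AffinityNode:
    """One affinity-tree node (hypothetical field names, not PangTreeBuild's API)."""
    genomes: frozenset[str]   # subset of pan-genome sequences assigned to this node
    homogeneity: float        # homogeneity level of the subset (placeholder; computed elsewhere)
    consensus: str            # averaged consensus sequence of the subset (toy value here)
    children: list[AffinityNode] = field(default_factory=list)


def is_partition(parent: frozenset[str], parts: list[frozenset[str]]) -> bool:
    """Return True if parts are non-empty, pairwise disjoint and together equal parent."""
    seen: set[str] = set()
    for part in parts:
        if not part or seen & part:   # empty block or overlap with an earlier block
            return False
        seen |= part
    return seen == parent             # blocks must cover the parent exactly


def check_tree(node: AffinityNode) -> bool:
    """Recursively verify that every internal node's children partition its genomes."""
    if not node.children:
        return True
    if not is_partition(node.genomes, [c.genomes for c in node.children]):
        return False
    return all(check_tree(c) for c in node.children)


# Toy usage: three genomes split into two sibling subsets.
leaf_a = AffinityNode(frozenset({"g1", "g2"}), 0.9, "ACGT")
leaf_b = AffinityNode(frozenset({"g3"}), 1.0, "ACGA")
root = AffinityNode(frozenset({"g1", "g2", "g3"}), 0.6, "ACGT", [leaf_a, leaf_b])
print(check_tree(root))  # True
```

Storing each genome subset as a `frozenset` reduces the sibling-partition condition to plain set operations: an intersection test for disjointness and a union compared against the parent for coverage.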


Identifiers

pubmed: 32299360
doi: 10.1186/s12864-020-6610-4
pii: 10.1186/s12864-020-6610-4
pmc: PMC7161101

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

274

Authors

Paulina Dziadkiewicz

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland.
Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, Warsaw, 02-097, Poland.

Norbert Dojer

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland. dojer@mimuw.edu.pl.
