Platon: identification and characterization of bacterial plasmid contigs in short-read draft assemblies exploiting protein sequence-based replicon distribution scores.


Journal

Microbial genomics
ISSN: 2057-5858
Abbreviated title: Microb Genom
Country: England
NLM ID: 101671820

Publication information

Publication date:
October 2020
History:
pubmed: 25 June 2020
medline: 17 August 2021
entrez: 25 June 2020
Status: ppublish

Abstract

Plasmids are extrachromosomal genetic elements that replicate independently of the chromosome and play a vital role in the environmental adaptation of bacteria. Due to potential mobilization or conjugation capabilities, plasmids are important genetic vehicles for antimicrobial resistance genes and virulence factors with huge and increasing clinical implications. They are therefore subject to large genomic studies within the scientific community worldwide. As a result of rapidly improving next-generation sequencing methods, the quantity of sequenced bacterial genomes is constantly increasing, in turn raising the need for specialized tools to (i) extract plasmid sequences from draft assemblies, (ii) derive their origin and distribution, and (iii) further investigate their genetic repertoire. Recently, several bioinformatic methods and tools have emerged to tackle this issue; however, a combination of high sensitivity and specificity in plasmid sequence identification is rarely achieved in a taxon-independent manner. In addition, many software tools are not appropriate for large high-throughput analyses or cannot be included in existing software pipelines due to their technical design or software implementation. In this study, we investigated differences in the replicon distributions of protein-coding genes on a large scale as a new approach to distinguish plasmid-borne from chromosome-borne contigs. We defined and computed statistical discrimination thresholds for a new metric: the replicon distribution score (RDS), which achieved an accuracy of 96.6 %. The final performance was further improved by the combination of the RDS metric with heuristics exploiting several plasmid-specific higher-level contig characterizations. We implemented this workflow in a new high-throughput taxon-independent bioinformatics software tool called Platon for the recruitment and characterization of plasmid-borne contigs from short-read draft assemblies. 
Compared to PlasFlow, Platon achieved a higher accuracy (97.5 %) and more balanced predictions (F1=82.6 %) tested on a broad range of bacterial taxa and better or equal performance against the targeted tools PlasmidFinder and PlaScope on sequenced
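The abstract names the replicon distribution score (RDS) but does not give its formula, the reference protein database, or the discrimination thresholds. The sketch below is therefore only an illustration under stated assumptions: each protein gets a log2 ratio of how often it (hypothetically) occurs on plasmids versus chromosomes, the contig's RDS is taken as the mean of those protein scores, and a contig is called plasmid-borne when the RDS clears a placeholder threshold. All names (`ProteinCounts`, `protein_score`, `replicon_distribution_score`, `is_plasmid`, `accuracy_and_f1`) are hypothetical and are not Platon's API. The last function shows how accuracy and F1 can diverge, as in the 97.5 % versus 82.6 % reported above, when plasmid contigs are the minority class.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProteinCounts:
    """Hypothetical reference counts for one protein: how often it was
    observed on plasmid vs. chromosome sequences in a reference collection."""
    plasmid: int
    chromosome: int


def protein_score(c: ProteinCounts, n_plasmid: int, n_chromosome: int,
                  pseudocount: float = 1.0) -> float:
    """Assumed per-protein score: log2 ratio of the normalized plasmid
    frequency to the normalized chromosome frequency (positive = plasmid-leaning).
    The pseudocount avoids division by zero for unseen combinations."""
    p = (c.plasmid + pseudocount) / (n_plasmid + pseudocount)
    q = (c.chromosome + pseudocount) / (n_chromosome + pseudocount)
    return math.log2(p / q)


def replicon_distribution_score(proteins: Sequence[ProteinCounts],
                                n_plasmid: int, n_chromosome: int) -> float:
    """Assumed contig-level score: arithmetic mean of the per-protein scores."""
    if not proteins:
        raise ValueError("contig has no scorable proteins")
    scores = [protein_score(c, n_plasmid, n_chromosome) for c in proteins]
    return sum(scores) / len(scores)


def is_plasmid(score: float, threshold: float) -> bool:
    """Threshold rule; the cut-off passed in is a placeholder, not Platon's value."""
    return score >= threshold


def accuracy_and_f1(predicted: Sequence[bool],
                    actual: Sequence[bool]) -> tuple[float, float]:
    """Accuracy and F1 with 'plasmid' as the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    N_PLASMID, N_CHROMOSOME = 1000, 1000  # hypothetical reference totals
    contig_a = [ProteinCounts(50, 0), ProteinCounts(30, 1)]  # plasmid-like genes
    contig_b = [ProteinCounts(0, 40), ProteinCounts(1, 60)]  # chromosome-like genes
    for name, proteins in [("contig_a", contig_a), ("contig_b", contig_b)]:
        s = replicon_distribution_score(proteins, N_PLASMID, N_CHROMOSOME)
        print(name, round(s, 2), is_plasmid(s, threshold=0.0))
    # Toy evaluation: one true plasmid among five contigs, one false positive.
    print(accuracy_and_f1([True, False, False, False, True],
                          [True, False, False, False, False]))
```

In the toy evaluation a single false positive among five contigs still leaves accuracy at 0.8, while F1, which ignores true negatives, drops to about 0.67; this is why a balanced metric is informative when chromosomal contigs vastly outnumber plasmid contigs.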

Identifiers

pubmed: 32579097
doi: 10.1099/mgen.0.000398
pmc: PMC7660248

Chemical substances

Bacterial Proteins 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Authors

Oliver Schwengers

Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany.
Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen, Germany.
German Center for Infection Research (DZIF), partner site Giessen-Marburg-Langen, Giessen, Germany.

Patrick Barth

Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany.

Linda Falgenhauer

Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen, Germany.
German Center for Infection Research (DZIF), partner site Giessen-Marburg-Langen, Giessen, Germany.
Present address: Institute of Hygiene and Environmental Health, Justus Liebig University, Giessen, Germany.

Torsten Hain

Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen, Germany.
German Center for Infection Research (DZIF), partner site Giessen-Marburg-Langen, Giessen, Germany.

Trinad Chakraborty

Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen, Germany.
German Center for Infection Research (DZIF), partner site Giessen-Marburg-Langen, Giessen, Germany.

Alexander Goesmann

Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, Germany.
German Center for Infection Research (DZIF), partner site Giessen-Marburg-Langen, Giessen, Germany.
