GenMap: ultra-fast computation of genome mappability.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 06 2020
History:
received: 14 12 2019
revised: 23 03 2020
accepted: 31 03 2020
pubmed: 5 4 2020
medline: 29 12 2020
entrez: 5 4 2020
Status: ppublish

Abstract

Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging, yet it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness, or (k, e)-mappability, of a position is the reciprocal of the number of times the k-mer starting at that position occurs approximately in the genome, i.e. with up to e mismatches. We present GenMap, a fast method to compute the (k, e)-mappability. We extend the mappability algorithm so that it can also be computed across multiple genomes, where a k-mer occurrence is counted only once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different output formats, such as binary output, wig and bed files, as well as csv files to export the locations of all approximate k-mers for each genomic position. GenMap can be installed via bioconda. Binaries and C++ source code are available at https://github.com/cpockrandt/genmap.
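
To make the definition concrete, the following is a minimal brute-force sketch of (k, e)-mappability and of the multi-genome variant in which an occurrence is counted once per genome. It is not GenMap's algorithm and does not use GenMap's interface: the function names `hamming_within`, `mappability` and `genomes_containing` are hypothetical, only the forward strand is considered, and the quadratic running time makes it suitable for toy inputs only.

```python
from typing import List, Sequence


def hamming_within(a: str, b: str, e: int) -> bool:
    """Return True if equal-length strings a and b differ in at most e positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > e:
                return False
    return True


def mappability(genome: str, k: int, e: int) -> List[float]:
    """Naive (k, e)-mappability (illustrative, forward strand only).

    For every k-mer start position, return the reciprocal of the number of
    positions whose k-mer matches it with at most e mismatches. Each k-mer
    matches itself, so the count is always at least 1.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    return [1.0 / sum(hamming_within(q, t, e) for t in kmers) for q in kmers]


def genomes_containing(genomes: Sequence[str], k: int, e: int) -> List[List[int]]:
    """Multi-genome variant: count each genome at most once per k-mer.

    For every k-mer start position in every genome, return the number of
    genomes that contain at least one approximate occurrence of that k-mer.
    """
    all_kmers = [[g[i:i + k] for i in range(len(g) - k + 1)] for g in genomes]
    return [
        [
            sum(any(hamming_within(q, t, e) for t in other) for other in all_kmers)
            for q in kmers
        ]
        for kmers in all_kmers
    ]


if __name__ == "__main__":
    # ACGT occurs twice exactly, every other 4-mer once.
    print(mappability("ACGTACGT", k=4, e=0))  # [0.5, 1.0, 1.0, 1.0, 0.5]
    # ACG is shared by both toy genomes; CGT and CGA are genome-specific.
    print(genomes_containing(["ACGT", "ACGA"], k=3, e=0))  # [[2, 1], [2, 1]]
```

In the multi-genome output, a value of 1 marks a k-mer found only in its own genome (a marker candidate), while a value equal to the number of genomes marks a k-mer present in all of them.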

Identifiers

pubmed: 32246826
pii: 5815974
doi: 10.1093/bioinformatics/btaa222
pmc: PMC7320602

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

3687-3692

Grants

Agency: NIGMS NIH HHS
ID: R35 GM130151
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press.

Authors

Christopher Pockrandt (C)

Center for Computational Biology, School of Medicine.
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Department of Computer Science and Mathematics, Freie Universität Berlin.
Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.

Mai Alzamel (M)

Department of Informatics, King's College London, London, UK.
Department of Computer Science, King Saud University, Riyadh, Saudi Arabia.

Costas S Iliopoulos (CS)

Department of Informatics, King's College London, London, UK.

Knut Reinert (K)

Department of Computer Science and Mathematics, Freie Universität Berlin.
Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
