Short Sequence Aligner Benchmarking for Chromatin Research.


Journal

International journal of molecular sciences
ISSN: 1422-0067
Abbreviated title: Int J Mol Sci
Country: Switzerland
NLM ID: 101092791

Publication information

Publication date:
14 Sep 2023
History:
received: 15 08 2023
revised: 09 09 2023
accepted: 12 09 2023
medline: 4 10 2023
pubmed: 28 9 2023
entrez: 28 9 2023
Status: epublish

Abstract

Much of today's molecular science revolves around next-generation sequencing. Frequently, the first step in analyzing such data is aligning sequencing reads to a reference genome. This step is often taken for granted, but any analysis downstream of the alignment will be affected by the aligner's ability to correctly map sequences. In most cases, for research into chromatin structure and nucleosome positioning, ATAC-seq, ChIP-seq, and MNase-seq experiments use short read lengths. How well aligners manage these reads is critical. Most aligner programs will output mapped reads and unmapped reads. However, from a biological point of view, reads will fall into one of three categories: correctly mapped, incorrectly mapped, and unmapped. While increased sequencing depth can often compensate for unmapped reads, incorrectly and correctly mapped reads appear algorithmically identical but can produce biologically significant alterations in the results. For this reason, we are benchmarking various alignment programs to determine their propensity to incorrectly map short reads. As short-read alignment is an important step in ATAC-seq, ChIP-seq, and MNase-seq experiments, caution should be taken in mapping reads to ensure that the most accurate conclusions can be made from the data generated. Our analysis is intended to help investigators new to the field pick the alignment program best suited for their experimental conditions. In general, the aligners we tested performed well. BWA, Bowtie2, and Chromap were all exceptionally accurate, and we recommend using them. Furthermore, we show that longer read lengths do in fact lead to more accurate mappings.
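
The three-way split described in the abstract (correctly mapped, incorrectly mapped, unmapped) can be illustrated with a short sketch. This is not the authors' benchmarking code. It assumes simulated reads whose true origin is known, and the names `Alignment`, `classify`, `summarize`, and the `tolerance` parameter are hypothetical, introduced only for illustration.

```python
from collections import Counter
from typing import Dict, NamedTuple, Optional


class Alignment(NamedTuple):
    """A read placement; chrom is None when the aligner left the read unmapped."""
    chrom: Optional[str]
    pos: int = 0


def classify(truth: Alignment, reported: Alignment, tolerance: int = 0) -> str:
    """Return 'correct', 'incorrect', or 'unmapped' for one read.

    Assumption: a read counts as correct when it lands on the true
    chromosome within `tolerance` bases of its true start position.
    """
    if reported.chrom is None:
        return "unmapped"
    same_place = (
        reported.chrom == truth.chrom
        and abs(reported.pos - truth.pos) <= tolerance
    )
    return "correct" if same_place else "incorrect"


def summarize(
    truths: Dict[str, Alignment],
    reports: Dict[str, Alignment],
    tolerance: int = 0,
) -> Counter:
    """Count reads per category; reads absent from `reports` count as unmapped."""
    return Counter(
        classify(truths[name], reports.get(name, Alignment(None)), tolerance)
        for name in truths
    )


# Toy data: three simulated reads with known origins (hypothetical values).
truths = {
    "r1": Alignment("chr1", 100),
    "r2": Alignment("chr1", 500),
    "r3": Alignment("chr2", 42),
}
# The aligner placed r1 correctly, r2 on the wrong chromosome, and skipped r3.
reports = {
    "r1": Alignment("chr1", 100),
    "r2": Alignment("chr3", 500),
}
counts = summarize(truths, reports)
```

An aligner's own output only separates mapped from unmapped reads, so `r1` and `r2` would look the same there. Telling them apart requires knowing where each read came from, which is why a sketch like this relies on simulated data with a known origin.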

Identifiers

pubmed: 37762379
pii: ijms241814074
doi: 10.3390/ijms241814074
pmc: PMC10531285

Chemical substances

Chromatin 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: The Kenneth E. and Becky H. Johnson Foundation
ID: Graduate Student Support

Authors

John Lawrence Carter (JL)

Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA.

Harlan Stevens (H)

Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA.

Perry G Ridge (PG)

Department of Biology, Brigham Young University, Provo, UT 84602, USA.
Neuroscience Center, College of Life Sciences, Brigham Young University, Provo, UT 84602, USA.

Steven Michael Johnson (SM)

Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA.
