Short Sequence Aligner Benchmarking for Chromatin Research.
ChIP-seq
NGS
alignment programs
Journal
International journal of molecular sciences
ISSN: 1422-0067
Abbreviated title: Int J Mol Sci
Country: Switzerland
NLM ID: 101092791
Publication information
Publication date:
14 Sep 2023
History:
received: 15 Aug 2023
revised: 9 Sep 2023
accepted: 12 Sep 2023
medline: 4 Oct 2023
pubmed: 28 Sep 2023
entrez: 28 Sep 2023
Status: epublish
Abstract
Much of today's molecular science revolves around next-generation sequencing. Frequently, the first step in analyzing such data is aligning sequencing reads to a reference genome. This step is often taken for granted, but any analysis downstream of the alignment will be affected by the aligner's ability to correctly map sequences. In most cases, the ATAC-seq, ChIP-seq, and MNase-seq experiments used to study chromatin structure and nucleosome positioning produce short reads, so how well aligners handle these reads is critical. Most aligner programs output mapped and unmapped reads. From a biological point of view, however, reads fall into one of three categories: correctly mapped, incorrectly mapped, and unmapped. While increased sequencing depth can often compensate for unmapped reads, incorrectly mapped reads are algorithmically indistinguishable from correctly mapped ones yet can introduce biologically significant alterations in the results. For this reason, we benchmarked various alignment programs to determine their propensity to incorrectly map short reads. As short-read alignment is an important step in ATAC-seq, ChIP-seq, and MNase-seq experiments, caution should be taken when mapping reads to ensure that the most accurate conclusions can be drawn from the data. Our analysis is intended to help investigators new to the field pick the alignment program best suited to their experimental conditions. In general, the aligners we tested performed well. BWA, Bowtie2, and Chromap were all exceptionally accurate, and we recommend using them. Furthermore, we show that longer read lengths do in fact lead to more accurate mappings.
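The three-way split into correctly mapped, incorrectly mapped, and unmapped reads can only be measured when each read's true origin is known, for example with reads simulated from the reference. The sketch below is a hypothetical toy, not the authors' benchmarking pipeline: `naive_align` is an illustrative exact-match "aligner" that reports only the leftmost hit, and the genome, the `tolerance` parameter, and all function names are assumptions. It illustrates one plausible mechanism behind the read-length effect: short reads drawn from a repeat are silently placed at the wrong copy, and reads longer than the repeat are not.

```python
from collections import Counter
from typing import Optional


def naive_align(genome: str, read: str) -> Optional[int]:
    """Hypothetical toy aligner: exact string match, reporting only the leftmost hit.

    A read that occurs more than once is silently placed at its first copy, and a
    read with no exact match is returned as unmapped (None).
    """
    pos = genome.find(read)
    return None if pos == -1 else pos


def classify(true_pos: int, mapped_pos: Optional[int], tolerance: int = 0) -> str:
    """Sort one read into the three biological categories: correct, incorrect, unmapped."""
    if mapped_pos is None:
        return "unmapped"
    # 'tolerance' is an assumed parameter: a small positional offset still counts as correct.
    if abs(mapped_pos - true_pos) <= tolerance:
        return "correct"
    return "incorrect"


def benchmark(genome: str, read_len: int) -> Counter:
    """Tile error-free reads of length read_len across the genome and tally categories.

    Because every tiled read is an exact substring, none of them come back unmapped here.
    """
    counts: Counter = Counter()
    for start in range(len(genome) - read_len + 1):
        read = genome[start:start + read_len]
        counts[classify(start, naive_align(genome, read))] += 1
    return counts


# Hypothetical toy genome in which the 7-bp sequence "GATTACA" occurs twice.
GENOME = "GATTACA" + "CCGG" + "GATTACA" + "TTAA"

if __name__ == "__main__":
    for read_len in (4, 8):
        print(read_len, dict(benchmark(GENOME, read_len)))
```

Running it prints `4 {'correct': 14, 'incorrect': 5}` and `8 {'correct': 15}`: every 4-bp read taken from the second repeat copy is reported at the first copy, which looks like a successful alignment, whereas 8-bp reads span the repeat boundary and are placed uniquely. A read that matches nowhere, such as `classify(0, naive_align(GENOME, "CCCC"))`, lands in the unmapped category. Real aligners handle multi-mapping reads in their own ways (typically signalling ambiguity through the SAM mapping quality field), which this toy deliberately ignores.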
Identifiers
pubmed: 37762379
pii: ijms241814074
doi: 10.3390/ijms241814074
pmc: PMC10531285
Chemical substances
Chromatin (registry number: 0)
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: The Kenneth E. and Becky H. Johnson Foundation
ID: Graduate Student Support