LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads.
Long reads
Microsatellites
Tandem repeats
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
04 Jul 2024
History:
received: 30 Jan 2024
accepted: 21 Jun 2024
medline: 5 Jul 2024
pubmed: 5 Jul 2024
entrez: 4 Jul 2024
Status:
epublish
Abstract
Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long-read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .
Identifiers
pubmed: 38965568
doi: 10.1186/s13059-024-03319-2
pii: 10.1186/s13059-024-03319-2
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
176
Grants
Agency: NHGRI NIH HHS
ID: 1R01HG010149
Country: United States
Agency: Intramural Research Program, National Institute on Drug Abuse
ID: U01DA051234
Copyright information
© 2024. The Author(s).
References
Ziaei Jam H, Li Y, DeVito R, Mousavi N, Ma N, Lujumba I, et al. A deep population reference panel of tandem repeat variation. Nat Commun. 2023;14:6711.
doi: 10.1038/s41467-023-42278-3
pubmed: 37872149
pmcid: 10593948
Hannan AJ. Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet. 2018;19:286–98.
doi: 10.1038/nrg.2017.115
pubmed: 29398703
Willems T, Zielinski D, Yuan J, Gordon A, Gymrek M, Erlich Y. Genome-wide profiling of heritable and de novo STR variations. Nat Methods. 2017. https://doi.org/10.1038/nmeth.4267 .
doi: 10.1038/nmeth.4267
pubmed: 28436466
pmcid: 5482724
Kristmundsdottir S, Eggertsson HP, Arnadottir GA, Halldorsson BV. popSTR2 enables clinical and population-scale genotyping of microsatellites. Bioinformatics. 2020;36:2269–71.
doi: 10.1093/bioinformatics/btz913
pubmed: 31804671
Mousavi N, Shleizer-Burko S, Yanicky R, Gymrek M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 2019;47:e90.
doi: 10.1093/nar/gkz501
pubmed: 31194863
pmcid: 6735967
Dolzhenko E, van Vugt JJFA, Shaw RJ, Bekritsky MA, van Blitterswijk M, Narzisi G, et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 2017;27:1895–903.
doi: 10.1101/gr.225672.117
pubmed: 28887402
pmcid: 5668946
Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37:1155–62.
doi: 10.1038/s41587-019-0217-9
pubmed: 31406327
pmcid: 6776680
Jain M, Olsen HE, Paten B, Akeson M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 2016;17:239.
doi: 10.1186/s13059-016-1103-0
pubmed: 27887629
English AC, Dolzhenko E, Ziaei Jam H, McKenzie SK, Olson ND, De Coster W, et al. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol. 2024. https://doi.org/10.1038/s41587-024-02225-z .
doi: 10.1038/s41587-024-02225-z
pubmed: 38671154
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, et al. The complete sequence of a human Y chromosome. Nature. 2023;621:344–54.
doi: 10.1038/s41586-023-06457-y
pubmed: 37612512
pmcid: 10752217
Ren J, Gu B, Chaisson MJP. Vamos: Variable-number tandem repeats annotation using efficient motif sets. Genome Biol. 2023;24:175.
doi: 10.1186/s13059-023-03010-y
pubmed: 37501141
pmcid: 10373352
Bolognini D, Magi A, Benes V, Korbel JO, Rausch T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 2020;9. https://doi.org/10.1093/gigascience/giaa101 .
Mitsuhashi S, Frith MC, Mizuguchi T, Miyatake S, Toyota T, Adachi H, et al. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol. 2019;20:58.
doi: 10.1186/s13059-019-1667-6
pubmed: 30890163
pmcid: 6425644
Readman C, Indhu-Shree R-B, Jan MF, Inanc B. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol. 2021;22:224.
doi: 10.1186/s13059-021-02447-3
Dolzhenko E, English A, Dashnow H, De Sena BG, Mokveld T, Rowell WJ, et al. Characterization and visualization of tandem repeats at genome scale. Nat Biotechnol. 2024. https://doi.org/10.1038/s41587-023-02057-3 .
doi: 10.1038/s41587-023-02057-3
pubmed: 38671154
Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020;30:1291–305.
doi: 10.1101/gr.263566.120
pubmed: 32801147
pmcid: 7545148
Bakhtiari M, Park J, Ding Y-C, Shleizer-Burko S, Neuhausen SL, Halldórsson BV, et al. Variable number tandem repeats mediate the expression of proximal genes. Nat Commun. 2021;12:2075.
doi: 10.1038/s41467-021-22206-z
pubmed: 33824302
pmcid: 8024321
Park J, Kaufman E, Valdmanis PN, Bafna V. TRviz: a Python library for decomposing and visualizing tandem repeat sequences. Bioinform Adv. 2023;3:vbad058.
doi: 10.1093/bioadv/vbad058
pubmed: 37168281
pmcid: 10166586
Koren S, Bao Z, Guarracino A, Ou S, Goodwin S, Jenike KM, et al. Gapless assembly of complete human and plant chromosomes using only nanopore sequencing. BioRxivorg. 2024. https://doi.org/10.1101/2024.03.15.585294 .
doi: 10.1101/2024.03.15.585294
IGV: Integrative genomics viewer n.d. https://www.igv.org/ (Accessed 2 Jan 2024).
Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, et al. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genom 2022;2. https://doi.org/10.1016/j.xgen.2022.100129 .
Oxford Nanopore technologies. Oxford Nanopore Technologies n.d. https://nanoporetech.com/platform/accuracy (Accessed 7 Jan 2024).
PacBio revio. PacBio 2022. https://www.pacb.com/revio/ (Accessed 7 Jan 2024).
Lee C. Generating consensus sequences from partial order multiple sequence alignment graphs. Bioinformatics. 2003;19:999–1008.
doi: 10.1093/bioinformatics/btg109
pubmed: 12761063
Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH, Durbin R. Dindel: accurate indel calls from short-read data. Genome Res. 2011;21:961–73.
doi: 10.1101/gr.112326.110
pubmed: 20980555
pmcid: 3106329
Bonfield JK, Marshall J, Danecek P, Li H, Ohan V, Whitwham A, et al. HTSlib: C library for reading/writing high-throughput sequencing data. Gigascience 2021;10. https://doi.org/10.1093/gigascience/giab007 .
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.
doi: 10.1093/bioinformatics/bty191
pubmed: 29750242
pmcid: 6137996
Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21:487–93.
doi: 10.1101/gr.113985.110
pubmed: 21209072
pmcid: 3044862
Martin M, Ebert P, Marschall T. Read-based phasing and analysis of phased variants with WhatsHap. Methods Mol Biol. 2023;2590:127–38.
doi: 10.1007/978-1-0716-2819-5_8
pubmed: 36335496
Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol. 2018;36:983–7.
doi: 10.1038/nbt.4235
pubmed: 30247488
Ziaei Jam H, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. LongTR. GitHub 2023. https://github.com/gymrek-lab/LongTR (Accessed 2024).
Ziaei Jam H, Zook JM, Javadzadeh S, Park J, Sehgal A, Gymrek M. LongTR. Zenodo 2024. https://zenodo.org/doi/10.5281/zenodo.11403979 (Accessed 2024).
English A. Project adotto tandem-repeat regions and annotations 2022. https://doi.org/10.5281/ZENODO.6930201 .
Datasets - PacBio - Highly accurate long-read sequencing. PacBio 2020. https://www.pacb.com/connect/datasets/ (Accessed 4 June 2024).
Oxford Nanopore Technologies. Sequencing Genome in a Bottle samples 2023. https://doi.org/10.5281/ZENODO.8363974 .
Zook JM, Catoe D, McDaniel J, Vang L, Spies N, Sidow A, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data. 2016;3:160025.
doi: 10.1038/sdata.2016.25
pubmed: 27271295
pmcid: 4896128