LongTR: genome-wide profiling of genetic variation at tandem repeats from long reads.


Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
04 Jul 2024
History:
received: 30 Jan 2024
accepted: 21 Jun 2024
medline: 5 Jul 2024
pubmed: 5 Jul 2024
entrez: 4 Jul 2024
Status: epublish

Abstract

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .

Identifiers

pubmed: 38965568
doi: 10.1186/s13059-024-03319-2
pii: 10.1186/s13059-024-03319-2

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

176

Grants

Agency: NHGRI NIH HHS
ID: 1R01HG010149
Country: United States
Agency: Intramural Research Program, National Institute on Drug Abuse
ID: U01DA051234

Copyright information

© 2024. The Author(s).

Authors

Helyaneh Ziaei Jam (H)

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.

Justin M Zook (JM)

Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, USA.

Sara Javadzadeh (S)

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.

Jonghun Park (J)

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.

Aarushi Sehgal (A)

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.

Melissa Gymrek (M)

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA. mgymrek@ucsd.edu.
Department of Medicine, University of California San Diego, La Jolla, CA, USA. mgymrek@ucsd.edu.
