STRavinsky STR database and PGTailor PGT tool demonstrate superiority of CHM13-T2T over hg38 and hg19 for STR-based applications.
Journal
European journal of human genetics : EJHG
ISSN: 1476-5438
Abbreviated title: Eur J Hum Genet
Country: England
NLM ID: 9302235
Publication information
Publication date:
Jul 2023
History:
received: 2023-01-08
accepted: 2023-03-23
revised: 2023-03-18
pmc-release: 2024-07-01
medline: 2023-07-10
pubmed: 2023-04-14
entrez: 2023-04-13
Status:
ppublish
Abstract
Short tandem repeats (STRs) have long been studied for possible roles in biological phenomena and are used in multiple applications such as forensics, evolutionary studies and pre-implantation genetic testing (PGT). The two reference genomes most used by clinicians and researchers are GRCh37/hg19 and GRCh38/hg38, both constructed mainly using short-read sequencing (SRS), in which not all STR-containing reads can be assembled to the reference genome. With the introduction of long-read sequencing (LRS) methods and the generation of the CHM13 reference genome, also known as T2T, many previously unmapped STRs were finally localized within the human genome. We generated STRavinsky, a compact STR database for three reference genomes, including T2T. We proceeded to demonstrate the advantages of T2T over hg19 and hg38, identifying nearly double the number of STRs throughout all chromosomes. Through STRavinsky, which provides resolution down to a specific genomic coordinate, we demonstrated an extreme propensity of TGGAA repeats in the p arms of acrocentric chromosomes, substantially corroborating early molecular studies suggesting a possible role in the formation of Robertsonian translocations. Moreover, we delineated a unique propensity of TGGAA repeats specifically in chromosome 16q11.2 and in 9q12. Finally, we harnessed the superior capabilities of T2T and STRavinsky to generate PGTailor, a novel web application that dramatically facilitates the design of STR-based PGT tests in mere minutes.
Identifiers
pubmed: 37055538
doi: 10.1038/s41431-023-01352-6
pii: 10.1038/s41431-023-01352-6
pmc: PMC10325972
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
738-743
Copyright information
© 2023. The Author(s), under exclusive licence to European Society of Human Genetics.