Large scale in silico characterization of repeat expansion variation in human genomes.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
08 09 2020
History:
received: 22 01 2020
accepted: 13 08 2020
entrez: 9 9 2020
pubmed: 10 9 2020
medline: 5 11 2020
Status: epublish

Abstract

Significant progress has been made in elucidating single nucleotide polymorphism diversity in the human population. However, the majority of the variation space in the genome is structural and remains partially elusive. One form of structural variation is tandem repeats (TRs). Expansions of TRs are responsible for over 40 diseases, but we hypothesize these represent only a fraction of the pathogenic repeat expansions that exist. Here we characterize long or expanded TR variation in 1,115 human genomes as well as a replication cohort of 2,504 genomes, identified using ExpansionHunter Denovo. We found that individual genomes typically harbor several rare, large TRs, generally in non-coding regions of the genome. We noticed that these large TRs are enriched in their proximity to Alu elements. The vast majority of these large TRs seem to be expansions of smaller TRs that are already present in the reference genome. We are providing this TR profile as a resource for comparison to undiagnosed rare disease genomes in order to detect novel disease-causing repeat expansions.

Identifiers

pubmed: 32901039
doi: 10.1038/s41597-020-00633-9
pii: 10.1038/s41597-020-00633-9
pmc: PMC7479135

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

294

Grants

Agency: U.S. Department of Health & Human Services | NIH | National Institute of Neurological Disorders and Stroke (NINDS)
ID: R01NS072248
Country: International

References

Haghighi, A. et al. An integrated clinical program and crowdsourcing strategy for genomic sequencing and Mendelian disease gene discovery. Genomic Medicine 3, 21 (2018).
doi: 10.1038/s41525-018-0060-9
Gloss, B. S. & Dinger, M. E. Realizing the significance of noncoding functionality in clinical genomics. Experimental & Molecular Medicine 50, 97 (2018).
doi: 10.1038/s12276-018-0087-0
Maroilley, T. & Tarailo-Graovac, M. Uncovering Missing Heritability in Rare Diseases. Genes 10, 275 (2019).
doi: 10.3390/genes10040275
Chiang, C. et al. The impact of structural variation on human gene expression. Nature Genetics 49, 692–699 (2017).
doi: 10.1038/ng.3834
Paulson, H. Handbook of Clinical Neurology. Vol. 147, 105–123 (Elsevier B.V, 2018).
Campuzano, V. et al. Friedreich’s Ataxia: Autosomal Recessive Disease Caused by an Intronic GAA Triplet Repeat Expansion. Science 271, 1423–1427 (1996).
doi: 10.1126/science.271.5254.1423
DeJesus-Hernandez, M. et al. Expanded GGGGCC hexanucleotide repeat in noncoding region of C9ORF72 causes chromosome 9p-linked FTD and ALS. Neuron 72, 245–256 (2011).
doi: 10.1016/j.neuron.2011.09.011
Liquori, C. L., Ricker, K., Moseley, M. L. & Jacobsen, J. F. Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of ZNF9. Science 293, 864–867 (2001).
doi: 10.1126/science.1062125
Tang, H. & Nzabarushimana, E. STRScan: targeted profiling of short tandem repeats in whole-genome sequencing data. BMC Bioinformatics 18, 31–36 (2017).
doi: 10.1186/s12859-016-1429-3
Legendre, M., Pochet, N., Pak, T. & Verstrepen, K. J. Sequence-based estimation of minisatellite and microsatellite repeat variability. Genome Research 17, 1787–1796 (2007).
doi: 10.1101/gr.6554007
Gemayel, R., Cho, J., Boeynaems, S. & Verstrepen, K. J. Beyond Junk-Variable Tandem Repeats as Facilitators of Rapid Evolution of Regulatory and Coding Sequences. Genes 3, 461–480 (2012).
doi: 10.3390/genes3030461
Read, L. R., Raynard, S. J., Rukść, A. & Baker, M. D. Gene repeat expansion and contraction by spontaneous intrachromosomal homologous recombination in mammalian cells. Nucleic Acids Research 32 (2004).
Dolzhenko, E. et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Research 27, 1895–1903 (2017).
doi: 10.1101/gr.225672.117
Kraft, F. & Kurth, I. Long-read sequencing in human genetics. medizinische genetik 31, 198–204 (2019).
doi: 10.1007/s11825-019-0249-z
Dolzhenko, E. et al. ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data. Genome Biology 21, 102 (2020).
doi: 10.1186/s13059-020-02017-z
Mousavi, N., Shleizer-Burko, S., Yanicky, R. & Gymrek, M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Research 47, e90 (2019).
doi: 10.1093/nar/gkz501
Gymrek, M., Golan, D., Rosset, S. & Erlich, Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research 22, 1154–1162 (2012).
doi: 10.1101/gr.135780.111
Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nature Methods 14, 590–592 (2017).
doi: 10.1038/nmeth.4267
Cortese, A. et al. Biallelic expansion of an intronic repeat in RFC1 is a common cause of late-onset ataxia. Nature Genetics 51, 649–658 (2019).
doi: 10.1038/s41588-019-0372-4
Dashnow, H. et al. STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biology 19, 121 (2018).
doi: 10.1186/s13059-018-1505-2
Tang, H. et al. Profiling of Short-Tandem-Repeat Disease Alleles in 12,632 Human Whole Genomes. American Journal of Human Genetics 101, 700–715 (2017).
doi: 10.1016/j.ajhg.2017.09.013
Fazal, S. et al. In silico characterization of repeat expansion variation in 1,115 genomes. figshare https://doi.org/10.6084/m9.figshare.c.4819050 (2020).
Fan, H. & Chu, J.-Y. A Brief Review of Short Tandem Repeat Mutation. Genomics, Proteomics & Bioinformatics 5, 7–14 (2007).
doi: 10.1016/S1672-0229(07)60009-6
Bolton, K. A. et al. STaRRRT: a table of short tandem repeats in regulatory regions of the human genome. BMC Genomics 14, 795 (2013).
doi: 10.1186/1471-2164-14-795
Madsen, B. E., Villesen, P. & Wiuf, C. Short Tandem Repeats in Human Exons: A Target for Disease Mutations. BMC Genomics 9, 410 (2008).
doi: 10.1186/1471-2164-9-410
Pray, L. A. Functions and Utility of Alu Jumping Genes. Nature Education 1, 93 (2008).
Bahlo, M. et al. Recent advances in the detection of repeat expansions with short-read next-generation sequencing. F1000Research 7, 736 (2018).
doi: 10.12688/f1000research.13980.1
Wallace, S. E. & Bean, L. J. Resources for Genetics Professionals — Genetic Disorders Caused by Nucleotide Repeat Expansions and Contractions. GeneReviews (2017).
Deininger, P. Alu elements: know the SINEs. Genome Biology 12, 236–248 (2011).
doi: 10.1186/gb-2011-12-12-236
Mularoni, L., Ledda, A., Toll-Riera, M. & Albà, M. M. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Research 20, 745–754 (2010).
doi: 10.1101/gr.101261.109
Sato, N. et al. Spinocerebellar Ataxia Type 31 Is Associated with “Inserted” Penta-Nucleotide Repeats Containing (TGGAA)n. The American Journal of Human Genetics 85, 544–557 (2009).
doi: 10.1016/j.ajhg.2009.09.019
Bejerano, G. et al. Ultraconserved Elements in the Human Genome. Science 304, 1321–1325 (2004).
doi: 10.1126/science.1098119
E pluribus unum. Nature Methods 7, 331 (2010).
Kuilenburg, A. B. P. V. et al. Glutaminase Deficiency Caused by Short Tandem Repeat Expansion in GLS. The New England Journal of Medicine 380, 1433–1441 (2019).
doi: 10.1056/NEJMoa1806627
Wieben, E. D. et al. A Common Trinucleotide Repeat Expansion within the Transcription Factor 4 (TCF4, E2-2) Gene Predicts Fuchs Corneal Dystrophy. Plos One 7, e49083 (2012).
doi: 10.1371/journal.pone.0049083
Al-Mahdawi, S. et al. Large Interruptions of GAA Repeat Expansion Mutations in Friedreich Ataxia Are Very Rare. Frontiers in Cellular Neuroscience 12 (2018).
Long, A. et al. Somatic instability of the expanded GAA repeats in Friedreich’s ataxia. Plos One 12, e0189990 (2017).
doi: 10.1371/journal.pone.0189990
Gijselinck, I. et al. The C9orf72 repeat size correlates with onset age of disease, DNA methylation and transcriptional downregulation of the promoter. Molecular Psychiatry 21, 1112–1124 (2016).
doi: 10.1038/mp.2015.159
Seltzer, M. M. et al. Prevalence of CGG expansions of the FMR1 gene in a US population‐based sample. American Journal of Medical Genetics 159B, 589–597 (2012).
pubmed: 22619118
Beck, J. et al. Large C9orf72 Hexanucleotide Repeat Expansions Are Seen in Multiple Neurodegenerative Syndromes and Are More Frequent Than Expected in the UK Population. American Journal of Human Genetics 92, 345–353 (2013).
doi: 10.1016/j.ajhg.2013.01.011
Renton, A. E. et al. A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72, 257–268 (2011).
doi: 10.1016/j.neuron.2011.09.010
Ishikawa, K. et al. Pentanucleotide repeats at the spinocerebellar ataxia type 31 (SCA31) locus in Caucasians. Neurology 77, 1853–1855 (2011).
doi: 10.1212/WNL.0b013e3182377e3a
Arcot, S. S., Wang, Z., Weber, J. L., Deininger, P. L. & Batzer, M. A. Alu Repeats: A Source for the Genesis of Primate Microsatellites. Genomics 29, 136–144 (1995).
doi: 10.1006/geno.1995.1224
Rodriguez, C. M. et al. A native function for RAN translation and CGG repeats in regulating fragile X protein synthesis. Nature Neuroscience 23, 386–397 (2020).
doi: 10.1038/s41593-020-0590-1
Fotsing, S. F. et al. The impact of short tandem repeat variation on gene expression. Nature Genetics 51, 1652–1659 (2019).
doi: 10.1038/s41588-019-0521-9
Molla, M., Delcher, A., Sunyaev, S., Cantor, C. & Kasif, S. Triplet repeat length bias and variation in the human transcriptome. PNAS 106, 17095–17100 (2009).
doi: 10.1073/pnas.0907112106

Authors

Sarah Fazal (S)

Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.

Matt C Danzi (MC)

Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.

Vivian P Cintra (VP)

Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.

Dana M Bis-Brewer (DM)

Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA.

Egor Dolzhenko (E)

Illumina Inc., San Diego, CA, USA.

Michael A Eberle (MA)

Illumina Inc., San Diego, CA, USA.

Stephan Zuchner (S)

Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA. szuchner@med.miami.edu.
