Functional filter for whole-genome sequencing data identifies HHT and stress-associated non-coding SMAD4 polyadenylation site variants >5 kb from coding DNA.
3′ untranslated region
CADD score
CPA site
PBMCs
alternate exon use
cleavage and polyadenylation site
combined annotation-dependent depletion score
cycloheximide
hereditary hemorrhagic telangiectasia
peripheral blood mononuclear cells
rare variant
Journal
American journal of human genetics
ISSN: 1537-6605
Abbreviated title: Am J Hum Genet
Country: United States
NLM ID: 0370475
Publication information
Publication date: 02 11 2023
History:
received: 28 05 2023
revised: 01 09 2023
accepted: 08 09 2023
medline: 6 11 2023
pubmed: 11 10 2023
entrez: 10 10 2023
Status:
ppublish
Abstract
Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY, an analytic tool that integrates coordinates for regions with experimental evidence of functionality. Applied to WGS data from solved and unsolved hereditary hemorrhagic telangiectasia (HHT) recruits to the 100,000 Genomes Project, GROFFFY-based filtration reduced the mean number of variants/DNA from 4,867,167 to 21,486, without deleting disease-causal variants. In three unsolved cases (two related), GROFFFY identified ultra-rare deletions within the 3' untranslated region (UTR) of the tumor suppressor SMAD4, where germline loss-of-function alleles cause combined HHT and colonic polyposis (MIM: 175050). Sited >5.4 kb distal to coding DNA, the deletions did not modify or generate microRNA binding sites, but instead disrupted the sequence context of the final cleavage and polyadenylation site necessary for protein production: By iFoldRNA, an AAUAAA-adjacent 16-nucleotide deletion brought the cleavage site into inaccessible neighboring secondary structures, while a 4-nucleotide deletion unfolded the downstream RNA polymerase II roadblock. SMAD4 RNA expression differed to control-derived RNA from resting and cycloheximide-stressed peripheral blood mononuclear cells. Patterns predicted the mutational site for an unrelated HHT/polyposis-affected individual, where a complex insertion was subsequently identified. In conclusion, we describe a functional rare variant type that impacts regulatory systems based on RNA polyadenylation. Extension of coding sequence-focused gene panels is required to capture these variants.
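The abstract does not describe how GROFFFY is implemented, so the following is only a minimal sketch of the general idea it names: keeping variants that overlap regions with evidence of functionality. The function names, the half-open 0-based coordinate convention, and all coordinates are hypothetical and chosen for illustration; they are not taken from the tool or from the study data.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Merge overlapping half-open (chrom, start, end) regions per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the current merged region.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_regions(index, chrom, start, end):
    """Return True if the half-open span [start, end) overlaps any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged region that begins before the span ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def filter_variants(variants, index):
    """Keep (chrom, pos, ref, alt) variants whose reference span overlaps a region."""
    return [
        v for v in variants
        if overlaps_regions(index, v[0], v[1], v[1] + len(v[2]))
    ]


# Hypothetical regions and variants, for illustration only.
regions = [("chr18", 100, 200), ("chr18", 150, 300), ("chr1", 10, 20)]
variants = [
    ("chr18", 120, "A", "G"),         # inside a region: kept
    ("chr18", 295, "ACGTACGT", "A"),  # deletion spanning a region edge: kept
    ("chr18", 300, "C", "T"),         # just past the region end: dropped
    ("chr2", 15, "G", "A"),           # chromosome with no regions: dropped
]
index = build_index(regions)
kept = filter_variants(variants, index)
```

Testing the whole reference span rather than a single position matters for deletions, such as the 4- and 16-nucleotide deletions reported in the abstract, because a deletion can begin outside a region and still remove bases inside it.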
Identifiers
pubmed: 37816352
pii: S0002-9297(23)00318-X
doi: 10.1016/j.ajhg.2023.09.005
pmc: PMC10645545
Chemical substances
DNA
9007-49-2
Nucleotides
0
RNA
63231-63-0
Smad4 Protein
0
SMAD4 protein, human
0
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
1903-1918
Grants
Agency: Medical Research Council
ID: MC_EX_MR/M009203/1
Country: United Kingdom
Agency: Medical Research Council
ID: MC_PC_14089
Country: United Kingdom
Agency: Medical Research Council
ID: MR/M009203/1
Country: United Kingdom
Agency: NHLBI NIH HHS
ID: R35 HL140019
Country: United States
Copyright information
Copyright © 2023 The Authors. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of interests: The authors declare no competing interests.