Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders.
Keywords: Conservation; Gene family; Missense variants; Neurodevelopmental disorders; Paralogs
Journal: Genome medicine
ISSN: 1756-994X
Abbreviated title: Genome Med
Country: England
NLM ID: 101475844
Publication information
Publication date: 17 March 2020
History:
received: 15 June 2019
accepted: 21 February 2020
entrez: 19 March 2020
pubmed: 19 March 2020
medline: 5 January 2021
Status: epublish
Abstract
BACKGROUND
Classifying the pathogenicity of missense variants is a major challenge in clinical practice when diagnosing rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly used in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs.
METHODS
Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments. De novo variant burden in gene families was statistically evaluated in 10,068 NDD patients and 2078 controls.
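The abstract states only that paralog-conserved sites were defined from paralog sequence alignments. As a minimal sketch, assuming a strict criterion in which an alignment column counts as paralog-conserved when every paralog carries the same amino acid there (gaps count as non-conserved), the hypothetical helper `paralog_conserved_positions` below maps such columns onto protein positions of one gene. The input format (a dict from gene name to gapped protein sequence) and the gene names are illustrative assumptions, not taken from the study.

```python
from typing import Dict, List


def paralog_conserved_positions(alignment: Dict[str, str], gene: str) -> List[int]:
    """Return 1-based protein positions of `gene` whose aligned residue is
    identical in every paralog of the family.

    Hypothetical helper: `alignment` maps gene names to gapped protein
    sequences ('-' = gap) taken from one multiple sequence alignment.
    Strict-identity criterion assumed; any gap makes a column non-conserved.
    """
    if len(alignment) < 2:
        raise ValueError("a gene family alignment needs at least two paralogs")
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    conserved: List[int] = []
    position = 0  # running protein position in the query gene
    for col, residue in enumerate(alignment[gene]):
        if residue == "-":
            continue  # gap in the query: no protein position to report
        position += 1
        column = {seq[col] for seq in alignment.values()}
        if column == {residue}:  # every paralog has the same amino acid
            conserved.append(position)
    return conserved


# Toy alignment of three hypothetical paralogs.
alignment = {
    "GENE_A": "MKT-LV",
    "GENE_B": "MKSALV",
    "GENE_C": "MRT-LI",
}
print(paralog_conserved_positions(alignment, "GENE_A"))  # [1, 4]
```

Under this assumed criterion, a missense variant in `GENE_A` would be annotated as paralog-conserved if its protein position is in the returned list. A real pipeline would also have to handle alternative transcripts and alignment quality, which this sketch ignores.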
RESULTS
We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families, including 98 genes carrying de novo variants in NDD patients, of which 28 represent novel NDD candidate genes that are brain-expressed and under evolutionary constraint.
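The gene family de novo enrichment framework described above can be illustrated, under stated assumptions, as a one-sided Poisson test: pool the de novo variants observed in all genes of a family and compare that count with the number expected from per-gene mutation rates across the cohort. The sketch assumes per-gene, per-haploid-genome mutation rates for the variant class of interest (for example missense) from some sequence-context mutation model, uses made-up rates and counts, and applies a Bonferroni cut-off across the families tested as one plausible reading of "exome-wide"; the study's actual statistics and thresholds may differ. All function names are hypothetical.

```python
import math
from typing import Dict, List


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), with terms in log space."""
    if observed <= 0:
        return 1.0
    if expected <= 0.0:
        return 0.0

    def pmf(k: int) -> float:
        return math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))

    if observed > expected:
        # Small tail: sum it directly; terms shrink because k > expected.
        tail, k = 0.0, observed
        while True:
            term = pmf(k)
            tail += term
            if term <= tail * 1e-17:
                break
            k += 1
        return min(tail, 1.0)
    # Large tail: the complement of the lower CDF is accurate enough.
    return max(0.0, 1.0 - sum(pmf(k) for k in range(observed)))


def family_de_novo_pvalue(
    family_genes: List[str],
    de_novo_counts: Dict[str, int],
    mutation_rates: Dict[str, float],
    n_patients: int,
) -> float:
    """One-sided Poisson p-value for excess de novo variants in a gene family."""
    observed = sum(de_novo_counts.get(g, 0) for g in family_genes)
    # Two haploid genomes per patient, so expected = 2 * N * sum of gene rates.
    expected = 2 * n_patients * sum(mutation_rates[g] for g in family_genes)
    return poisson_upper_tail(observed, expected)


def exome_wide_threshold(n_families_tested: int, alpha: float = 0.05) -> float:
    """Bonferroni-corrected significance threshold (assumed correction)."""
    return alpha / n_families_tested


# Hypothetical inputs; only the cohort size comes from the abstract.
rates = {"GENE_A": 1e-5, "GENE_B": 2e-5}
counts = {"GENE_A": 2, "GENE_B": 3}
p = family_de_novo_pvalue(["GENE_A", "GENE_B"], counts, rates, n_patients=10068)
print(p, p < exome_wide_threshold(n_families_tested=1000))
```

Pooling across paralogs is what allows a family to reach significance when each member gene carries too few de novo variants to stand out on its own. With real data, the observed counts would be restricted to the same variant class whose mutation rates were supplied.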
CONCLUSION
This study presents the first method to incorporate gene family information into a statistical framework for interpreting variant data in NDDs and discovering new NDD-associated genes.
Identifiers
pubmed: 32183904
doi: 10.1186/s13073-020-00725-6
pii: 10.1186/s13073-020-00725-6
pmc: PMC7079346
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination: 28
Grants
Agency: Medical Research Council; ID: MC_UP_1102/20; Country: United Kingdom
Agency: Medical Research Council; ID: MR/N026063/1; Country: United Kingdom
Agency: NHGRI NIH HHS; ID: T32 HG002295; Country: United States
Agency: NICHD NIH HHS; ID: U54 HD090255; Country: United States
Investigators
Rudi Balling
Nina Barisic
Stéphanie Baulac
Hande Caglayan
Dana C Craiu
Peter De Jonghe
Christel Depienne
Renzo Guerrini
Ingo Helbig
Helle Hjalgrim
Dorota Hoffman-Zacharska
Johanna Jähn
Karl M Klein
Bobby P C Koeleman
Vladimir Komarek
Roland Krause
Eric Leguern
Anna-Elina Lehesjoki
Johannes R Lemke
Holger Lerche
Taria Linnankivi
Carla Marini
Patrick May
Hiltrud Muhle
Deb K Pal
Aarno Palotie
Felix Rosenow
Susanne Schubert-Bast
Kaia Selmer
Jose M Serratosa
Ulrich Stephani
Katalin Štěrbová
Pasquale Striano
Arvid Suls
Tina Talvik
Sarah von Spiczak
Yvonne G Weber
Sarah Weckhuysen
Federico Zara