Predicting congenital renal tract malformation genes using machine learning.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
14 08 2023
History:
received: 03 03 2023
accepted: 03 07 2023
medline: 16 8 2023
pubmed: 15 8 2023
entrez: 14 8 2023
Status:
epublish
Abstract
Congenital renal tract malformations (RTMs) are the major cause of severe kidney failure in children. Studies to date have identified defined genetic causes for only a minority of human RTMs. While some RTMs may be caused by poorly defined environmental perturbations affecting organogenesis, it is likely that numerous causative genetic variants have yet to be identified. Unfortunately, the speed of discovering further genetic causes for RTMs is limited by challenges in prioritising candidate genes harbouring sequence variants. Here, we exploited the computer-based artificial intelligence methodology of supervised machine learning to identify genes with a high probability of being involved in renal development. These genes, when mutated, are promising candidates for causing RTMs. With this methodology, the machine learning classifier determines which attributes are common to renal development genes and identifies genes possessing these attributes. Here we report the validation of an RTM gene classifier and provide predictions of the RTM association status for all protein-coding genes in the mouse genome. Overall, our predictions, whilst not definitive, can inform the prioritisation of genes when evaluating patient sequence data for genetic diagnosis. This knowledge of renal developmental genes will accelerate the processes of reaching a genetic diagnosis for patients born with RTMs.
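The supervised-learning idea described in the abstract (learn which attributes known renal development genes share, then score other genes by those attributes) can be illustrated with a minimal sketch. This is not the authors' classifier: it substitutes a simple nearest-centroid rule, and every gene name, feature value, and function name below is a hypothetical placeholder invented for illustration.

```python
from statistics import mean

# Hypothetical toy data: each gene has two invented numeric attributes.
known_renal = {"geneA": (0.9, 0.8), "geneB": (0.8, 0.7)}   # labelled positives
known_other = {"geneC": (0.1, 0.2), "geneD": (0.2, 0.1)}   # labelled negatives
unlabelled = {"geneX": (0.85, 0.9), "geneY": (0.15, 0.05)}  # genes to prioritise


def centroid(rows):
    """Average each attribute across a group of genes."""
    return tuple(mean(col) for col in zip(*rows))


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(pos, neg):
    """'Training' here just stores the centroid of each labelled class."""
    return centroid(pos.values()), centroid(neg.values())


def score(model, features):
    """Score in [0, 1]; higher means closer to the positive-class centroid.

    This is a relative-distance score, not a calibrated probability.
    """
    pos_c, neg_c = model
    d_pos = sq_dist(features, pos_c)
    d_neg = sq_dist(features, neg_c)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


model = train(known_renal, known_other)
# Rank unlabelled genes from most to least similar to the positive class.
ranked = sorted(unlabelled, key=lambda g: score(model, unlabelled[g]), reverse=True)
```

In this toy setting, `ranked` lists the unlabelled genes in order of priority for follow-up. A real application would need many more attributes, a held-out validation step, and handling of class imbalance.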
Identifiers
pubmed: 37580336
doi: 10.1038/s41598-023-38110-z
pii: 10.1038/s41598-023-38110-z
pmc: PMC10425350
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
13204
Grants
Agency: British Heart Foundation
ID: RG/F/21/110050
Country: United Kingdom
Agency: Medical Research Council
ID: MR/T016809/1
Country: United Kingdom
Agency: British Heart Foundation
ID: FS/15/67/32038
Country: United Kingdom
Agency: British Heart Foundation
ID: RG/15/12/31616
Country: United Kingdom
Agency: British Heart Foundation
ID: CH/13/2/30154
Country: United Kingdom
Copyright information
© 2023. Springer Nature Limited.