Predicting congenital renal tract malformation genes using machine learning.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
14 Aug 2023
History:
received: 3 Mar 2023
accepted: 3 Jul 2023
medline: 16 Aug 2023
pubmed: 15 Aug 2023
entrez: 14 Aug 2023
Status: epublish

Abstract

Congenital renal tract malformations (RTMs) are the major cause of severe kidney failure in children. Studies to date have identified defined genetic causes for only a minority of human RTMs. While some RTMs may be caused by poorly defined environmental perturbations affecting organogenesis, it is likely that numerous causative genetic variants have yet to be identified. Unfortunately, the speed of discovering further genetic causes for RTMs is limited by challenges in prioritising candidate genes harbouring sequence variants. Here, we exploited the computer-based artificial intelligence methodology of supervised machine learning to identify genes with a high probability of being involved in renal development. These genes, when mutated, are promising candidates for causing RTMs. With this methodology, the machine learning classifier determines which attributes are common to renal development genes and identifies genes possessing these attributes. Here we report the validation of an RTM gene classifier and provide predictions of the RTM association status for all protein-coding genes in the mouse genome. Overall, our predictions, whilst not definitive, can inform the prioritisation of genes when evaluating patient sequence data for genetic diagnosis. This knowledge of renal developmental genes will accelerate the processes of reaching a genetic diagnosis for patients born with RTMs.
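
The supervised approach described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene names are placeholders, the two numeric attributes are invented, and a small logistic regression stands in for the classifier, which the abstract does not specify. It only shows the general idea of learning from labelled genes and then ranking unlabelled candidates by predicted probability.

```python
import math

def predict_proba(w, b, x):
    """Probability that a gene with attribute vector x is a positive."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Hypothetical training set: (attribute vector, label).
# Label 1 = known renal development gene, 0 = gene with no such annotation.
# Both attributes are invented stand-ins scaled to the range 0..1.
training = {
    "known_1": ([1.0, 0.9], 1),
    "known_2": ([1.0, 0.7], 1),
    "known_3": ([0.8, 0.8], 1),
    "other_1": ([0.0, 0.2], 0),
    "other_2": ([0.1, 0.1], 0),
    "other_3": ([0.2, 0.3], 0),
}
X = [features for features, _ in training.values()]
y = [label for _, label in training.values()]
w, b = train_logistic(X, y)

# Hypothetical unlabelled candidates to prioritise.
candidates = {"cand_X": [0.9, 0.8], "cand_Y": [0.1, 0.2]}
scores = {gene: predict_proba(w, b, x) for gene, x in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for gene in ranked:
    print(f"{gene}: {scores[gene]:.3f}")
```

The ranked scores are a prioritisation aid rather than a verdict, in line with the abstract's statement that the predictions are not definitive.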

Identifiers

pubmed: 37580336
doi: 10.1038/s41598-023-38110-z
pii: 10.1038/s41598-023-38110-z
pmc: PMC10425350

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

13204

Grants

Agency: British Heart Foundation
ID: RG/F/21/110050
Country: United Kingdom
Agency: Medical Research Council
ID: MR/T016809/1
Country: United Kingdom
Agency: British Heart Foundation
ID: FS/15/67/32038
Country: United Kingdom
Agency: British Heart Foundation
ID: RG/15/12/31616
Country: United Kingdom
Agency: British Heart Foundation
ID: CH/13/2/30154
Country: United Kingdom

Copyright information

© 2023. Springer Nature Limited.

Authors

Mitra Kabir (M)

Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester, M13 9PT, UK.

Helen M Stuart (HM)

Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester, M13 9PT, UK.
Manchester Centre for Genomic Medicine, St. Mary's Hospital, Health Innovation Manchester, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK.

Filipa M Lopes (FM)

Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PL, UK.

Elisavet Fotiou (E)

Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, Manchester, M13 9PL, UK.
C.B.B Lifeline Biotech Ltd, 5 Propontidos Street, Strovolos, 2033, Nicosia, Cyprus.

Bernard Keavney (B)

Division of Cardiovascular Sciences, School of Medical Sciences, Faculty of Biology, Medicine, and Health, The University of Manchester, Manchester, M13 9PL, UK.
Manchester Heart Institute, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK.

Andrew J Doig (AJ)

Division of Neuroscience, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Stopford Building, Manchester, M13 9BL, UK.

Adrian S Woolf (AS)

Division of Cell Matrix Biology and Regenerative Medicine, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, M13 9PL, UK.
Department of Nephrology, Royal Manchester Children's Hospital, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK.

Kathryn E Hentges (KE)

Division of Evolution, Infection and Genomics, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester, M13 9PT, UK. Kathryn.hentges@manchester.ac.uk.
