Forecasting risk gene discovery in autism with machine learning and genome-scale data.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
2020-03-12
History:
received: 2019-06-07
accepted: 2020-02-10
entrez: 2020-03-14
pubmed: 2020-03-14
medline: 2020-12-15
Status: epublish

Abstract

Genetics has been one of the most powerful windows into the biology of autism spectrum disorder (ASD). It is estimated that a thousand or more genes may confer risk for ASD when functionally perturbed; however, only around 100 genes currently have sufficient evidence to be considered true "autism risk genes". Massive genetic studies are currently underway, producing data to implicate additional genes. This approach, although necessary, is costly and slow-moving, making identification of putative ASD risk genes with existing data vital. Here, we approach autism risk gene discovery as a machine learning problem, rather than a genetic association problem, by using genome-scale data as predictors to identify new genes with similar properties to established autism risk genes. This ensemble method, forecASD, integrates brain gene expression, heterogeneous network data, and previous gene-level predictors of autism association into an ensemble classifier that yields a single score indexing evidence of each gene's involvement in the etiology of autism. We demonstrate that forecASD has substantially better performance than previous predictors of autism association in three independent trio-based sequencing studies. Studying forecASD-prioritized genes, we show that forecASD is a robust indicator of a gene's involvement in ASD etiology, with diverse applications to gene discovery, differential expression analysis, eQTL prioritization, and pathway enrichment analysis.
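
The idea of combining several gene-level evidence sources into one score can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the model details, so the block below assumes a plain logistic-regression combiner as a stand-in, and the gene names, feature values, and labels are all hypothetical. It shows only the general pattern: learn weights from genes with known status, then score every gene.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    """Return a score in [0, 1] for one gene's feature tuple."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Hypothetical per-gene inputs (all values invented for illustration):
# (brain-expression score, network score, score from a previous predictor)
genes = {
    "GENE_A": (0.9, 0.8, 0.7),
    "GENE_B": (0.8, 0.9, 0.6),
    "GENE_C": (0.2, 0.1, 0.3),
    "GENE_D": (0.1, 0.3, 0.2),
    "GENE_E": (0.7, 0.6, 0.8),  # status unknown
    "GENE_F": (0.2, 0.2, 0.1),  # status unknown
}
# Hypothetical training labels: 1 = established risk gene, 0 = background gene
labels = {"GENE_A": 1, "GENE_B": 1, "GENE_C": 0, "GENE_D": 0}

train_names = sorted(labels)
weights = fit_logistic(
    [genes[g] for g in train_names],
    [labels[g] for g in train_names],
)

# One combined score per gene, then a ranking from most to least supported
scores = {name: predict(weights, feats) for name, feats in genes.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}\t{scores[name]:.3f}")
```

In this toy setup the unlabeled genes receive a score from the same fitted weights, so they can be ranked alongside the labeled ones. A real analysis would need held-out evaluation, since scoring the training genes themselves overstates performance.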

Identifiers

pubmed: 32165711
doi: 10.1038/s41598-020-61288-5
pii: 10.1038/s41598-020-61288-5
pmc: PMC7067874

Chemical substances

Genetic Markers 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

4569

Grants

Agency: NIDCD NIH HHS
ID: R01 DC014489
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM007337
Country: United States
Agency: NIMH NIH HHS
ID: R01 MH105527
Country: United States

Comments and corrections

Type: ErratumIn

Authors

Leo Brueggeman (L)

University of Iowa, Department of Psychiatry, Iowa City, IA, USA.
University of Iowa, Interdisciplinary Genetics Program, Iowa City, IA, USA.
University of Iowa, Medical Scientist Training Program, Iowa City, IA, USA.

Tanner Koomar (T)

University of Iowa, Department of Psychiatry, Iowa City, IA, USA.
University of Iowa, Interdisciplinary Genetics Program, Iowa City, IA, USA.

Jacob J Michaelson (JJ)

University of Iowa, Department of Psychiatry, Iowa City, IA, USA. jacob-michaelson@uiowa.edu.
University of Iowa, Interdisciplinary Genetics Program, Iowa City, IA, USA. jacob-michaelson@uiowa.edu.
