Forecasting risk gene discovery in autism with machine learning and genome-scale data.
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
12 March 2020
History:
received: 7 June 2019
accepted: 10 February 2020
entrez: 14 March 2020
pubmed: 14 March 2020
medline: 15 December 2020
Status:
epublish
Abstract
Genetics has been one of the most powerful windows into the biology of autism spectrum disorder (ASD). It is estimated that a thousand or more genes may confer risk for ASD when functionally perturbed; however, only around 100 genes currently have sufficient evidence to be considered true "autism risk genes". Massive genetic studies currently underway are producing data to implicate additional genes. This approach, although necessary, is costly and slow-moving, which makes it vital to identify putative ASD risk genes with existing data. Here, we approach autism risk gene discovery as a machine learning problem, rather than a genetic association problem, by using genome-scale data as predictors to identify new genes with properties similar to those of established autism risk genes. This method, forecASD, integrates brain gene expression, heterogeneous network data, and previous gene-level predictors of autism association into an ensemble classifier that yields a single score indexing the evidence for each gene's involvement in the etiology of autism. We demonstrate that forecASD performs substantially better than previous predictors of autism association in three independent trio-based sequencing studies. Studying forecASD-prioritized genes, we show that forecASD is a robust indicator of a gene's involvement in ASD etiology, with diverse applications to gene discovery, differential expression analysis, eQTL prioritization, and pathway enrichment analysis.
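The abstract describes combining several gene-level evidence sources into one ensemble score and comparing predictors by how well they rank risk genes. The sketch below illustrates that general idea (stacking plus ROC AUC evaluation) under loudly stated assumptions: the data are synthetic, each gene has three hypothetical per-source scores, and the combiner is a logistic-regression meta-learner fit by gradient descent in pure Python. It is not the forecASD implementation, and names such as `make_toy_genes` and `ensemble_score` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression meta-learner by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ensemble_score(x, w, b):
    """Collapse one gene's per-source scores into a single score in (0, 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def auc(scores, labels):
    """ROC AUC: chance a positive gene outscores a negative one (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_genes(n, seed):
    """Hypothetical synthetic genes: ~30% labelled as known risk genes (1).

    Each gene has three noisy source scores (standing in for, e.g., expression,
    network and prior-predictor evidence); positives are shifted up by 1.0.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        X.append([rng.gauss(float(label), 1.0) for _ in range(3)])
        y.append(label)
    return X, y


# Train the combiner on one set of genes, then evaluate on held-out genes.
X_train, y_train = make_toy_genes(300, seed=1)
X_test, y_test = make_toy_genes(200, seed=2)
w, b = fit_logistic(X_train, y_train)

test_scores = [ensemble_score(x, w, b) for x in X_test]
ensemble_auc = auc(test_scores, y_test)
single_auc = auc([x[0] for x in X_test], y_test)  # one source on its own
print(f"single-source AUC {single_auc:.3f}, ensemble AUC {ensemble_auc:.3f}")
```

Because each synthetic source carries independent noise, the combined score should rank held-out positives better than any single source. In a real stacked model, the per-source scores passed to the combiner would themselves need to be out-of-fold predictions, so that the meta-learner does not learn from leaked training labels.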
Identifiers
pubmed: 32165711
doi: 10.1038/s41598-020-61288-5
pii: 10.1038/s41598-020-61288-5
pmc: PMC7067874
Chemical substances
Genetic Markers
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
4569
Grants
Agency: NIDCD NIH HHS
ID: R01 DC014489
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM007337
Country: United States
Agency: NIMH NIH HHS
ID: R01 MH105527
Country: United States
Comments and corrections
Type: ErratumIn