Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings.


Journal

Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904

Publication information

Publication date:
Dec 2023
History:
received: 13 03 2023
accepted: 08 09 2023
pubmed: 1 12 2023
medline: 1 12 2023
entrez: 30 11 2023
Status: ppublish

Abstract

Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks

Identifiers

pubmed: 38036778
doi: 10.1038/s41588-023-01524-6
pii: 10.1038/s41588-023-01524-6

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

2060-2064

Comments and corrections

Type: UpdateOf

Copyright information

© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.

References

Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
doi: 10.1038/s41592-021-01252-x pubmed: 34608324 pmcid: 8490152
Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
doi: 10.1038/s41588-021-00782-6 pubmed: 33603233 pmcid: 8812996
Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019).
doi: 10.1038/s41576-019-0122-6 pubmed: 30971806
Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
doi: 10.1038/s41588-019-0420-0 pubmed: 31133750 pmcid: 6758908
Zhou, J. Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat. Genet. 54, 725–734 (2022).
doi: 10.1038/s41588-022-01065-4 pubmed: 35551308 pmcid: 9186125
Park, C. Y. et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat. Genet. 53, 166–173 (2021).
doi: 10.1038/s41588-020-00761-3 pubmed: 33462483 pmcid: 7886016
De Jager, P. L. et al. A multi-omic atlas of the human frontal cortex for aging and Alzheimer’s disease research. Sci. Data 5, 180142 (2018).
doi: 10.1038/sdata.2018.142 pubmed: 30084846 pmcid: 6080491
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
doi: 10.1101/gr.200535.115 pubmed: 27197224 pmcid: 4937568
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
doi: 10.1038/nmeth.3547 pubmed: 26301843 pmcid: 4768299
Yuan, H. & Kelley, D. R. scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks. Nat. Methods https://doi.org/10.1038/s41592-022-01562-8 (2022).
doi: 10.1038/s41592-022-01562-8 pubmed: 35941239
Maslova, A. et al. Deep learning of immune cell differentiation. Proc. Natl Acad. Sci. USA 117, 25655–25666 (2020).
doi: 10.1073/pnas.2011795117 pubmed: 32978299 pmcid: 7568267
Chen, K. M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022).
doi: 10.1038/s41588-022-01102-2 pubmed: 35817977 pmcid: 9279145
Kim, D. S. et al. The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. Nat. Genet. https://doi.org/10.1038/s41588-021-00947-3 (2021).
doi: 10.1038/s41588-021-00947-3 pubmed: 34725478 pmcid: 8763320
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).
doi: 10.1038/s41588-018-0160-6 pubmed: 30013180 pmcid: 6094955
Novakovsky, G., Dexter, N., Libbrecht, M. W., Wasserman, W. W. & Mostafavi, S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. https://doi.org/10.1038/s41576-022-00532-2 (2022).
doi: 10.1038/s41576-022-00532-2 pubmed: 36192604
Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021).
doi: 10.1038/s41467-021-23134-8 pubmed: 34099641 pmcid: 8184741
Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023).
doi: 10.1186/s13059-023-02899-9 pubmed: 36973806 pmcid: 10045630
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
doi: 10.1038/ng.3367 pubmed: 26258848 pmcid: 4552594
Reshef, Y. A. et al. Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk. Nat. Genet. 50, 1483–1493 (2018).
doi: 10.1038/s41588-018-0196-7 pubmed: 30177862 pmcid: 6202062
Bennett, D. A. et al. Religious Orders Study and Rush Memory and Aging Project. J. Alzheimers Dis. 64, S161–S189 (2018).
doi: 10.3233/JAD-179939 pubmed: 29865057 pmcid: 6380522
Mostafavi, S. et al. A molecular network of the aging human brain provides insights into the pathology and cognitive decline of Alzheimer’s disease. Nat. Neurosci. 21, 811–819 (2018).
doi: 10.1038/s41593-018-0154-9 pubmed: 29802388 pmcid: 6599633
Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24 (2014).
doi: 10.1101/gr.155192.113 pubmed: 24092820 pmcid: 3875855
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
doi: 10.1038/nature24277 pmcid: 5776756
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning (eds. Precup, D. & Teh, Y. W.) Vol. 70 3319–3328 (PMLR, 2017).
Sasse, A., Ng, B. & Spiro, E. A. mostafavilabuw/EnformerAssessment: EnformerEvaluationV1. Zenodo https://doi.org/10.5281/zenodo.8274879 (2023).

Authors

Alexander Sasse (A)

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.

Bernard Ng (B)

Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.

Anna E Spiro (AE)

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.

Shinya Tasaki (S)

Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.

David A Bennett (DA)

Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.

Christopher Gaiteri (C)

Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA.

Philip L De Jager (PL)

Center for Translational & Computational Neuroimmunology, Department of Neurology, and the Taub Institute for the Study of Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, USA.

Maria Chikina (M)

Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA. mchikina@gmail.com.

Sara Mostafavi (S)

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. saramos@cs.washington.edu.
Canadian Institute for Advanced Research, Toronto, Ontario, Canada. saramos@cs.washington.edu.

MeSH classifications