A sequence-based global map of regulatory activity for deciphering human genetics.


Journal

Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904

Publication information

Publication date:
07 2022
History:
received: 10 06 2021
accepted: 13 05 2022
pubmed: 12 7 2022
medline: 16 7 2022
entrez: 11 7 2022
Status: ppublish

Abstract

Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.

Identifiers

pubmed: 35817977
doi: 10.1038/s41588-022-01102-2
pii: 10.1038/s41588-022-01102-2
pmc: PMC9279145

Chemical substances

Chromatin 0

Publication types

Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

940-949

Grants

Agency: NIAID NIH HHS
ID: HHSN272201000054C
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG005998
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM071966
Country: United States
Agency: NHLBI NIH HHS
ID: U54 HL117798
Country: United States
Agency: NIGMS NIH HHS
ID: DP2 GM146336
Country: United States

Comments and corrections

Type: CommentIn

Copyright information

© 2022. The Author(s).

Authors

Kathleen M Chen (KM)

Department of Computer Science, Princeton University, Princeton, NJ, USA.
Flatiron Institute, Simons Foundation, New York, NY, USA.

Aaron K Wong (AK)

Flatiron Institute, Simons Foundation, New York, NY, USA.

Olga G Troyanskaya (OG)

Department of Computer Science, Princeton University, Princeton, NJ, USA. ogt@cs.princeton.edu.
Flatiron Institute, Simons Foundation, New York, NY, USA. ogt@cs.princeton.edu.
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA. ogt@cs.princeton.edu.

Jian Zhou (J)

Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA. jian.zhou@utsouthwestern.edu.
