A sequence-based global map of regulatory activity for deciphering human genetics.
Journal
Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904
Publication information
Publication date:
07 2022
History:
received: 10 06 2021
accepted: 13 05 2022
pubmed: 12 7 2022
medline: 16 7 2022
entrez: 11 7 2022
Status:
ppublish
Abstract
Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.
Identifiers
pubmed: 35817977
doi: 10.1038/s41588-022-01102-2
pii: 10.1038/s41588-022-01102-2
pmc: PMC9279145
Chemical substances
Chromatin
0
Publication types
Journal Article
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
940-949
Grants
Agency: NIAID NIH HHS
ID: HHSN272201000054C
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG005998
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM071966
Country: United States
Agency: NHLBI NIH HHS
ID: U54 HL117798
Country: United States
Agency: NIGMS NIH HHS
ID: DP2 GM146336
Country: United States
Comments and corrections
Type: CommentIn
Copyright information
© 2022. The Author(s).