Sequence basis of transcription initiation in human genome.


Journal

bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
29 Jun 2023
History:
pubmed: 10 Jul 2023
medline: 10 Jul 2023
entrez: 10 Jul 2023
Status: epublish

Abstract

Transcription initiation is an essential process for ensuring the proper function of any gene; however, a unified understanding of the sequence patterns and rules that determine transcription initiation sites in the human genome remains elusive. By explaining transcription initiation at basepair resolution from sequence with a deep learning-inspired explainable modeling approach, here we show that simple rules can explain the vast majority of human promoters. We identified key sequence patterns that contribute to human promoter function, each activating transcription with a distinct position-specific effect curve that likely reflects its mechanism of promoting transcription initiation. Most of these position-specific effects have not been previously characterized, and we verified them using experimental perturbations of transcription factors and sequences. We revealed the sequence basis of bidirectional transcription at promoters and links between promoter selectivity and gene expression variation across cell types. Additionally, by analyzing 241 mammalian genomes and mouse transcription initiation site data, we showed that the sequence determinants are conserved across mammalian species. Taken together, we provide a unified model of the sequence basis of transcription initiation at the basepair level that is broadly applicable across mammalian species, and shed new light on basic questions related to promoter sequence and function.

Identifiers

pubmed: 37425823
doi: 10.1101/2023.06.27.546584
pmc: PMC10327147

Publication types

Preprint

Languages

eng

Grants

Agency: NIGMS NIH HHS
ID: DP2 GM146336
Country: United States

Authors

Kseniia Dudnyk (K)

Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America.

Chenlai Shi (C)

Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America.

Jian Zhou (J)

Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center; Dallas, Texas, United States of America.

MeSH classifications