Simultaneous estimation of gene regulatory network structure and RNA kinetics from single cell gene expression.

Journal

bioRxiv : the preprint server for biology

Abbreviated title: bioRxiv

Country: United States

NLM ID: 101680187

Publication information

Publication date:
23 Sep 2023

History:

medline: 4 Oct 2023

pubmed: 4 Oct 2023

entrez: 4 Oct 2023

Status: epublish

Abstract

Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is to identify the global set of regulatory relationships between the factors that control mRNA production and degradation and their target transcripts, and to construct a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally using run-on or metabolic labeling (e.g., 4-thiouracil) methods, which allow transcription and decay rates to be measured separately. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting
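The abstract refers to transcription and decay rates without stating a kinetic model. As an illustrative assumption only (the preprint's own biophysical transcription/decay model is not described in this record and may differ), a common minimal description is first-order kinetics, dX/dt = α − λX, where X is transcript abundance, α the transcription rate, and λ the decay rate constant. The sketch below computes the resulting RNA velocity, the closed-form trajectory, and the half-life ln 2/λ; all function names are hypothetical.

```python
import math


def rna_velocity(x, alpha, lam):
    """dX/dt under first-order kinetics: transcription (alpha) minus decay (lam * x)."""
    return alpha - lam * x


def abundance(t, x0, alpha, lam):
    """Closed-form solution of dX/dt = alpha - lam * X with X(0) = x0.

    X relaxes exponentially toward the steady state alpha / lam.
    """
    steady_state = alpha / lam
    return steady_state + (x0 - steady_state) * math.exp(-lam * t)


def half_life(lam):
    """Time for a transcript pool to halve once transcription stops (alpha = 0)."""
    return math.log(2) / lam
```

For example, with α = 10 transcripts/min and λ = 0.1 /min, the steady state is α/λ = 100 transcripts, the velocity at that abundance is zero, and the half-life is ln 2 / 0.1, about 6.93 min. Separating α and λ in this way is what lets a model distinguish a transcript that rises because production increased from one that rises because decay slowed.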

Identifiers

DOI: 10.1101/2023.09.21.558277 PMID: 37790443 PMC: PMC10542544

pubmed: 37790443

doi: 10.1101/2023.09.21.558277

pmc: PMC10542544

Publication types

Preprint

Languages

eng

Conflict of interest statement

Supplemental Data

Supplemental Data 1 contains single-cell count data for the response to rapamycin, first sequenced in this work. It is a 173348 rows × 5847 columns TSV.GZ file in which the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Gene', 'Replicate', 'Pool', and 'Experiment') are cell-specific metadata.

Supplemental Data 2 contains bulk count data for the response to rapamycin, first sequenced in this work. It is a 33 rows × 5847 columns TSV.GZ file in which the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Oligo', 'Time', 'Replicate', and 'Sample_barcode') are sample-specific metadata.

Supplemental Data 3 contains the single-cell count data published as GSE125162, re-analyzed with the single-cell quantification pipeline used in this work. It is a 65068 rows × 5850 columns TSV.GZ file in which the first row is a header, the first 5843 columns are integer gene counts, and the final 7 columns ('Condition', 'Sample', 'Genotype_Group', 'Genotype_Individual', 'Genotype', 'Replicate', and 'Cell_Barcode') are cell-specific metadata.

Supplemental Data 4 contains the four deep learning models trained in this work. It is a TAR.GZ file containing the final biophysical transcription/decay model, the pre-trained decay model, the velocity prediction model, and the count prediction model. Each model file is an h5 file containing a PyTorch model that can be loaded with supirfactor_dynamical.read().

Supplemental Data 5 is the prior knowledge network used to constrain the models for TF interpretability. It is a 1574 rows × 204 columns [Genes × TFs] TSV.GZ file in which the first row is a header with TF names, the first column is an index of gene names, and TF-gene interactions are indicated by non-zero values in the matrix. There are 2799 TF-gene interactions.

Supplemental Table 6 lists the oligonucleotide sequences used in this work. It is a TSV file with a header row.

Supplemental Table 7 lists the yeast strains used in this work. It is a TSV file with a header row.

Supplemental Table 8 contains gene metadata used in this work (e.g., ribosomal protein gene labels). It is a TSV file with a header row.

Supplemental Table 9 contains FY4/5 growth curve data generated in this work. It is a 20 rows × 7 columns TSV file in which the first row is a header with replicate IDs, the first column is an index of times in minutes, and the values are cell densities in YPD culture, in units of 10^6 cells/mL.

Supplemental Data 10 is a TAR.GZ file containing the yeast SacCer3 genome, modified to add UTR sequences, that was used to generate transcripts for kallisto pseudoalignment in this work.

Competing Interests

RB is currently VP of Machine Learning for Drug Discovery at Genentech. The remaining authors declare no competing interests.
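The Supplemental Data file layouts described above can be read with the Python standard library alone. The sketch below is a minimal illustration, assuming the files are gzip-compressed, tab-separated, and laid out exactly as stated (counts first and metadata last for Supplemental Data 1 to 3; a Genes × TFs matrix with a gene-name index column for Supplemental Data 5). The function names are hypothetical helpers, not part of the authors' supirfactor_dynamical package.

```python
import csv
import gzip


def split_counts_and_metadata(path, n_metadata):
    """Split a count TSV.GZ into gene names, integer counts, and metadata.

    Assumes a header row, gene-count columns first, and `n_metadata`
    metadata columns at the end of every row (no index column).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        genes = header[:-n_metadata]
        metadata_names = header[-n_metadata:]
        counts, metadata = [], []
        for row in reader:
            counts.append([int(value) for value in row[:-n_metadata]])
            metadata.append(dict(zip(metadata_names, row[-n_metadata:])))
    return genes, counts, metadata


def count_tf_gene_interactions(path):
    """Count non-zero entries in a Genes x TFs prior-knowledge matrix.

    Assumes the header row holds TF names and the first column of each
    data row holds a gene name. Returns (total, {tf: [target genes]}).
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = list(reader)
    if not rows:
        return 0, {}
    n_tfs = len(rows[0]) - 1  # first column is the gene name
    tfs = header[-n_tfs:]  # works whether or not the header has an index label
    targets = {tf: [] for tf in tfs}
    for row in rows:
        gene, values = row[0], row[1:]
        for tf, value in zip(tfs, values):
            if float(value) != 0.0:  # any non-zero value marks an interaction
                targets[tf].append(gene)
    total = sum(len(genes) for genes in targets.values())
    return total, targets
```

Under these assumptions, `split_counts_and_metadata(path, 4)` would fit Supplemental Data 1 and 2 and `split_counts_and_metadata(path, 7)` Supplemental Data 3; if Supplemental Data 5 matches its description, `count_tf_gene_interactions` should report 2799 interactions. Holding 173348 × 5843 counts in Python lists is memory-intensive, so for the full single-cell files a columnar reader such as `pandas.read_csv(path, sep="\t")` is likely more practical; the standard-library version is meant to make the layout explicit.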

Authors

Christopher A Jackson (CA)

Maggie Beheler-Amass (M)

Andreas Tjärnberg (A)

Ina Suresh (I)

Angela Shang-Mei Hickey (AS)

Richard Bonneau (R)

David Gresham (D)

MeSH classifications