Simultaneous estimation of gene regulatory network structure and RNA kinetics from single cell gene expression.
Journal
bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187
Publication information
Publication date:
23 Sep 2023
History:
medline: 4 Oct 2023
pubmed: 4 Oct 2023
entrez: 4 Oct 2023
Status:
epublish
Abstract
Cells respond to environmental and developmental stimuli by remodeling their transcriptomes through regulation of both mRNA transcription and mRNA decay. A central goal of biology is to identify the global set of regulatory relationships between the factors that control mRNA production and degradation and their target transcripts, and to construct a predictive model of gene expression. Regulatory relationships are typically identified using transcriptome measurements and causal inference algorithms. RNA kinetic parameters are determined experimentally by employing run-on or metabolic labeling (e.g., 4-thiouracil) methods that allow transcription and decay rates to be measured separately. Here, we develop a deep learning model, trained with single-cell RNA-seq data, that both infers causal regulatory relationships and estimates RNA kinetic parameters. The resulting
Identifiers
pubmed: 37790443
doi: 10.1101/2023.09.21.558277
pmc: PMC10542544
Publication types
Preprint
Languages
eng
Conflict of interest statement
4.9 Supplemental Data
Supplemental Data 1 is single-cell response to rapamycin count data first sequenced in this work. It is a 173348 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Gene', 'Replicate', 'Pool', and 'Experiment') are cell-specific metadata.
Supplemental Data 2 is bulk response to rapamycin count data first sequenced in this work. It is a 33 rows × 5847 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 4 columns ('Oligo', 'Time', 'Replicate', and 'Sample_barcode') are sample-specific metadata.
Supplemental Data 3 is single-cell count data published as GSE125162 and re-analyzed with the pipeline used for single-cell quantification in this work. It is a 65068 rows × 5850 columns TSV.GZ file where the first row is a header, the first 5843 columns are integer gene counts, and the final 7 columns ('Condition', 'Sample', 'Genotype_Group', 'Genotype_Individual', 'Genotype', 'Replicate', 'Cell_Barcode') are cell-specific metadata.
Supplemental Data 4 is the four deep learning models trained in this work. It is a TAR.GZ file containing the final biophysical transcription/decay model, the pre-trained decay model, the velocity prediction model, and the count prediction model. Each model file is an h5 file containing a PyTorch model that can be loaded with supirfactor_dynamical.read().
Supplemental Data 5 is the prior knowledge network used to constrain the models for TF interpretability. It is a 1574 rows × 204 columns [Genes × TFs] TSV.GZ file where the first row is a header with TF names, the first column is an index of gene names, and TF-gene interactions are indicated by non-zero values in the matrix. There are 2799 TF-gene interactions.
Supplemental Table 6 is the oligonucleotide sequences used in this work. It is a TSV file with a header row.
Supplemental Table 7 is the yeast strains used in this work. It is a TSV file with a header row.
Supplemental Table 8 is gene metadata used in this work (e.g., Ribosomal Protein gene labels). It is a TSV file with a header row.
Supplemental Table 9 is FY4/5 growth curve data generated in this work. It is a 20 rows × 7 columns TSV file where the first row is a header with replicate IDs, the first column is an index of times in minutes, and values are cell densities in YPD culture, in units of 10^6 cells/mL.
Supplemental Data 10 is a TAR.GZ file containing the yeast SacCer3 genome, modified to add UTR sequences, that was used to generate transcripts for kallisto pseudoalignment in this work.
5.3 Competing Interests
RB is currently VP of Machine Learning for Drug Discovery at Genentech. The remaining authors declare no competing interests.