MousiPLIER: A Mouse Pathway-Level Information Extractor Model.


Journal

bioRxiv : the preprint server for biology
Titre abrégé: bioRxiv
Pays: United States
ID NLM: 101680187

Informations de publication

Date de publication:
15 Aug 2023
Historique:
pubmed: 14 8 2023
medline: 14 8 2023
entrez: 14 8 2023
Statut: epublish

Résumé

High throughput gene expression profiling is a powerful approach to generate hypotheses on the underlying causes of biological function and disease. Yet this approach is limited by its ability to infer underlying biological pathways and burden of testing tens of thousands of individual genes. Machine learning models that incorporate prior biological knowledge are necessary to extract meaningful pathways and generate rational hypothesis from the vast amount of gene expression data generated to date. We adopted an unsupervised machine learning method, Pathway-level information extractor (PLIER), to train the first mouse PLIER model on 190,111 mouse brain RNA-sequencing samples, the greatest amount of training data ever used by PLIER. mousiPLER converted gene expression data into a latent variables that align to known pathway or cell maker gene sets, substantially reducing data dimensionality and improving interpretability. To determine the utility of mousiPLIER, we applied it to a mouse brain aging study of microglia and astrocyte transcriptomic profiling. We found a specific set of latent variables that are significantly associated with aging, including one latent variable (LV41) corresponding to striatal signal. We next performed k-means clustering on the training data to identify studies that respond strongly to LV41, finding that the variable is relevant to striatum and aging across the scientific literature. Finally, we built a web server (http://mousiplier.greenelab.com/) for users to easily explore the learned latent variables. Taken together this study provides proof of concept that mousiPLIER can uncover meaningful biological processes in mouse transcriptomic studies.

Identifiants

pubmed: 37577575
doi: 10.1101/2023.07.31.551386
pmc: PMC10418102
pii:
doi:

Types de publication

Preprint

Langues

eng

Subventions

Organisme : NIDA NIH HHS
ID : R01 DA052465
Pays : United States
Organisme : NHGRI NIH HHS
ID : R01 HG010067
Pays : United States

Déclaration de conflit d'intérêts

The authors declare no competing financial interests.

Auteurs

Shuo Zhang (S)

Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Benjamin J Heil (BJ)

Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Weiguang Mao (W)

Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.

Maria Chikina (M)

Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.

Casey S Greene (CS)

Department of Pharmacology, University of Colorado School of Medicine, Denver, CO 80045, USA.
Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO 80045, USA.

Elizabeth A Heller (EA)

Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA.
Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.

Classifications MeSH