DataRemix: a universal data transformation for optimal inference from gene expression datasets.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
17 05 2021
History:
received: 30 04 2020
revised: 01 08 2020
accepted: 17 08 2020
pubmed: 22 8 2020
medline: 9 6 2021
entrez: 22 8 2020
Status: ppublish

Abstract

RNA-seq technology provides unprecedented power for assessing transcript abundance and can be used to perform a variety of downstream tasks, such as inference of gene-correlation networks and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study. We describe a generalization of singular value decomposition-based reconstruction for which the common techniques of whitening, rank-k approximation and removing the top k principal components are special cases. Our simple three-parameter transformation, DataRemix, can be tuned to reweigh the contribution of hidden factors and reveal otherwise hidden biological signals. In particular, we demonstrate that the method can effectively prioritize biological signals over noise without leveraging external dataset-specific knowledge, and can outperform normalization methods that make explicit use of known technical factors. We also show that DataRemix can be efficiently optimized via a Thompson sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we apply our method to the Religious Orders Study and Memory and Aging Project dataset, and we report what is, to our knowledge, the first replicable trans-eQTL effect in human brain. DataRemix is an R package that is freely available on GitHub (https://github.com/wgmao/DataRemix). Supplementary data are available at Bioinformatics online.
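The idea of reweighting an SVD reconstruction can be illustrated with a minimal NumPy sketch. The abstract does not state the formula, so the form used here is an assumption made for illustration: the top `k` singular values are raised to a power `p`, and the remaining residual is scaled by `mu`. The function name `remix` and the parameter names `k`, `p` and `mu` are hypothetical and are not taken from the DataRemix R package.

```python
import numpy as np


def remix(X, k, p, mu):
    """Reweight the SVD spectrum of X.

    Assumed form (not confirmed by the abstract):
        U_k diag(s_k ** p) V_k^T + mu * (X - U_k diag(s_k) V_k^T)
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, s.size)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Top-k components with their singular values raised to the power p.
    top_reweighted = (Uk * sk**p) @ Vtk
    # What is left of X after subtracting its rank-k approximation.
    residual = X - (Uk * sk) @ Vtk
    return top_reweighted + mu * residual


# Toy matrix standing in for a gene-by-sample expression matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

same = remix(X, k=2, p=1.0, mu=1.0)   # p = 1, mu = 1 leaves X unchanged
rank2 = remix(X, k=2, p=1.0, mu=0.0)  # p = 1, mu = 0 gives the rank-2 approximation
white = remix(X, k=4, p=0.0, mu=1.0)  # full k, p = 0 equalizes all singular values
```

Under this assumed form, whitening and rank-k approximation fall out as particular parameter settings, and intermediate values of `p` and `mu` interpolate between them. Removing the top k components corresponds to keeping only the residual term; the sketch does not attempt to reproduce how the package parameterizes that case.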

Identifiers

pubmed: 32821903
pii: 5895302
doi: 10.1093/bioinformatics/btaa745
pmc: PMC8128479

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

984-991

Grants

Agency: NHGRI NIH HHS
ID: R01 HG009299
Country: United States
Agency: NIDDK NIH HHS
ID: U24 DK112331
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG008540
Country: United States

Copyright information

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Weiguang Mao (W)

Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15260, USA.

Javad Rahimikollu (J)

Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15260, USA.

Ryan Hausler (R)

Department of Medicine, Division of Hematology/Oncology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.

Maria Chikina (M)

Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15260, USA.
