DataRemix: a universal data transformation for optimal inference from gene expression datasets.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
17 05 2021
History:
received: 30 04 2020
revised: 01 08 2020
accepted: 17 08 2020
pubmed: 22 8 2020
medline: 9 6 2021
entrez: 22 8 2020
Status:
ppublish
Abstract
RNA-seq technology provides unprecedented power in the assessment of transcription abundance and can be used to perform a variety of downstream tasks, such as inference of gene-correlation networks and eQTL discovery. However, raw gene expression values have to be normalized for nuisance biological variation and technical covariates, and different normalization strategies can lead to dramatically different results in the downstream study. We describe a generalization of singular value decomposition-based reconstruction for which the common techniques of whitening, rank-k approximation and removing the top k principal components are special cases. Our simple three-parameter transformation, DataRemix, can be tuned to reweigh the contribution of hidden factors and reveal otherwise hidden biological signals. In particular, we demonstrate that the method can effectively prioritize biological signals over noise without leveraging external dataset-specific knowledge, and can outperform normalization methods that make explicit use of known technical factors. We also show that DataRemix can be efficiently optimized via a Thompson sampling approach, which makes it feasible for computationally expensive objectives such as eQTL analysis. Finally, we apply our method to the Religious Orders Study and Memory and Aging Project dataset, and we report what is, to our knowledge, the first replicable trans-eQTL effect in human brain. DataRemix is an R package that is freely available on GitHub (https://github.com/wgmao/DataRemix). Supplementary data are available at Bioinformatics online.
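To make the idea of a three-parameter SVD reweighting concrete, here is a minimal NumPy sketch. The abstract does not state the formula, so the parameterization below is an assumption: the top k singular values are raised to a power p, and the remaining residual is scaled by a weight mu. The function name `remix` and its arguments are hypothetical and are not the API of the DataRemix R package.

```python
import numpy as np

def remix(X, k, p, mu):
    """Hypothetical SVD reweighting (assumed form, not the DataRemix package API).

    Returns U_k diag(s_k ** p) V_k^T + mu * (X - U_k diag(s_k) V_k^T).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    # Top-k components with singular values raised to the power p.
    reweighted = (U[:, :k] * s[:k] ** p) @ Vt[:k]
    # Ordinary rank-k reconstruction, used to form the residual.
    rank_k = (U[:, :k] * s[:k]) @ Vt[:k]
    return reweighted + mu * (X - rank_k)

# Toy matrix standing in for a gene-by-sample expression matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

identity_case = remix(X, k=2, p=1.0, mu=1.0)  # p = 1, mu = 1 leaves X unchanged
rank2_case = remix(X, k=2, p=1.0, mu=0.0)     # p = 1, mu = 0 is a rank-2 approximation
whitened_case = remix(X, k=4, p=0.0, mu=0.0)  # full rank, p = 0 flattens all singular values
```

Under this assumed form, two of the special cases named in the abstract fall out directly: setting p = 1 and mu = 0 gives the rank-k approximation, and taking k equal to the full rank with p = 0 gives a whitened matrix whose singular values are all one. Intermediate values of p and mu interpolate between these extremes.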
Identifiers
pubmed: 32821903
pii: 5895302
doi: 10.1093/bioinformatics/btaa745
pmc: PMC8128479
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
984-991
Grants
Agency: NHGRI NIH HHS
ID: R01 HG009299
Country: United States
Agency: NIDDK NIH HHS
ID: U24 DK112331
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG008540
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.