PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs.
Cell type
Context
Hidden variable inference
Interaction eQTLs
eQTLs
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date:
22 Jan 2024
History:
received: 22 Dec 2022
accepted: 20 Dec 2023
medline: 23 Jan 2024
pubmed: 23 Jan 2024
entrez: 23 Jan 2024
Status:
epublish
Abstract
Expression quantitative trait loci (eQTL) offer insights into the regulatory mechanisms of trait-associated variants, but their effects often rely on contexts that are unknown or unmeasured. We introduce PICALO, a method for hidden variable inference of eQTL contexts. PICALO identifies and disentangles technical from biological context in heterogeneous blood and brain bulk eQTL datasets. These contexts are biologically informative and reproducible, outperforming cell counts or expression-based principal components. Furthermore, we show that RNA quality and cell type proportions interact with thousands of eQTLs. Knowledge of hidden eQTL contexts may aid in the inference of functional mechanisms underlying disease variants.
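To make the notion of an eQTL that depends on context concrete, the following is a minimal sketch of a single interaction eQTL test. It is not PICALO's procedure, which the abstract does not detail; it only fits the standard linear model expression = b0 + b1*genotype + b2*context + b3*(genotype*context) by ordinary least squares and reports the t-statistic of the interaction term b3. The function name `interaction_eqtl_fit` and the simulated effect sizes are assumptions made for illustration.

```python
import numpy as np


def interaction_eqtl_fit(expression, genotype, context):
    """Fit y = b0 + b1*g + b2*c + b3*(g*c) + e by ordinary least squares.

    Hypothetical helper for illustration only. Returns the coefficient
    vector and the t-statistic of the interaction coefficient b3.
    """
    y = np.asarray(expression, dtype=float)
    g = np.asarray(genotype, dtype=float)
    c = np.asarray(context, dtype=float)

    # Design matrix: intercept, genotype, context, genotype-by-context.
    X = np.column_stack([np.ones_like(g), g, c, g * c])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

    # Residual variance and coefficient covariance for the t-statistic.
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_interaction = beta[3] / np.sqrt(cov[3, 3])
    return beta, t_interaction


# Simulated data (assumed values): genotype dosage 0/1/2 and a continuous
# context, e.g. a cell type proportion or an RNA quality score.
rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, size=n)
context = rng.normal(size=n)
expression = (
    0.5 * genotype
    + 0.3 * context
    + 0.8 * genotype * context  # the context-dependent genetic effect
    + rng.normal(scale=0.5, size=n)
)

beta, t_int = interaction_eqtl_fit(expression, genotype, context)
print("interaction coefficient:", round(float(beta[3]), 3))
print("interaction t-statistic:", round(float(t_int), 1))
```

In this sketch the context is observed. The setting described in the abstract is harder, because the context is hidden and must itself be inferred from the data.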
Identifiers
pubmed: 38254182
doi: 10.1186/s13059-023-03151-0
pii: 10.1186/s13059-023-03151-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
29
Copyright information
© 2024. The Author(s).