PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs.

Cell type; Context; Hidden variable inference; Interaction eQTLs; eQTLs

Journal

Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660

Publication information

Publication date:
22 Jan 2024
History:
received: 22 Dec 2022
accepted: 20 Dec 2023
medline: 23 Jan 2024
pubmed: 23 Jan 2024
entrez: 23 Jan 2024
Status: epublish

Abstract

Expression quantitative trait loci (eQTL) offer insights into the regulatory mechanisms of trait-associated variants, but their effects often rely on contexts that are unknown or unmeasured. We introduce PICALO, a method for hidden variable inference of eQTL contexts. PICALO identifies and disentangles technical from biological context in heterogeneous blood and brain bulk eQTL datasets. These contexts are biologically informative and reproducible, outperforming cell counts or expression-based principal components. Furthermore, we show that RNA quality and cell type proportions interact with thousands of eQTLs. Knowledge of hidden eQTL contexts may aid in the inference of functional mechanisms underlying disease variants.
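To make the notion of a context "interacting" with an eQTL concrete, the following is a minimal sketch of a generic interaction-eQTL regression, in which expression is modeled as a function of genotype, a context variable, and their product. It is not PICALO's implementation: the function name `fit_interaction_eqtl`, the simulated data, and all effect sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_interaction_eqtl(genotype, context, expression):
    """Least-squares fit of expression ~ 1 + genotype + context + genotype*context.

    Hypothetical helper for illustration only; not part of PICALO.
    """
    g = np.asarray(genotype, dtype=float)
    c = np.asarray(context, dtype=float)
    y = np.asarray(expression, dtype=float)
    # Design matrix columns: intercept, genotype, context, interaction term.
    design = np.column_stack([np.ones_like(g), g, c, g * c])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["intercept", "genotype", "context", "interaction"]
    return dict(zip(names, coef))


# Simulated toy data (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
n = 2000
genotype = rng.integers(0, 3, size=n)   # allele dosage 0, 1 or 2
context = rng.normal(size=n)            # e.g. a standardized cell type proportion
expression = (
    0.5 * genotype
    + 0.2 * context
    + 0.8 * genotype * context          # context-dependent genetic effect
    + rng.normal(scale=0.1, size=n)
)

fit = fit_interaction_eqtl(genotype, context, expression)
print(fit)
```

A non-zero interaction coefficient means that the effect of the genotype on expression changes with the context. In this sketch the context is observed; in the hidden-variable setting the abstract describes, it would have to be inferred from the data.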

Identifiers

pubmed: 38254182
doi: 10.1186/s13059-023-03151-0
pii: 10.1186/s13059-023-03151-0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

29

Copyright information

© 2024. The Author(s).

Authors

Martijn Vochteloo (M)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Oncode Institute, Utrecht, The Netherlands.

Patrick Deelen (P)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Oncode Institute, Utrecht, The Netherlands.

Britt Vink (B)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Institute for Life Science & Technology, Hanze University of Applied Sciences, Groningen, The Netherlands.

Ellen A Tsai (EA)

Translational Sciences, Research and Development, Biogen, Cambridge, MA, USA.

Heiko Runz (H)

Translational Sciences, Research and Development, Biogen, Cambridge, MA, USA.

Sergio Andreu-Sánchez (S)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

Jingyuan Fu (J)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

Alexandra Zhernakova (A)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.

Harm-Jan Westra (HJ)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands. h.j.westra@umcg.nl.
Oncode Institute, Utrecht, The Netherlands. h.j.westra@umcg.nl.

Lude Franke (L)

Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands. l.h.franke@umcg.nl.
Oncode Institute, Utrecht, The Netherlands. l.h.franke@umcg.nl.

MeSH classifications