A benchmark of RNA-seq data normalization methods for transcriptome mapping on human genome-scale metabolic networks.


Journal

NPJ systems biology and applications
ISSN: 2056-7189
Abbreviated title: NPJ Syst Biol Appl
Country: England
NLM ID: 101677786

Publication information

Publication date:
24 Oct 2024
History:
received: 04 01 2024
accepted: 27 09 2024
medline: 25 10 2024
pubmed: 25 10 2024
entrez: 25 10 2024
Status: epublish

Abstract

Genome-scale metabolic models (GEMs) cover the entire list of metabolic genes in an organism and their associated reactions in a manner that is not specific to any tissue or condition. RNA-seq provides crucial information for making GEMs condition-specific. The Integrative Metabolic Analysis Tool (iMAT) and Integrative Network Inference for Tissues (INIT) are the two most popular algorithms for creating condition-specific GEMs from human transcriptome data. The normalization method chosen for raw RNA-seq count data affects both the content of the models produced by these algorithms and their predictive accuracy. However, a benchmark of RNA-seq normalization methods in terms of the performance of the iMAT and INIT algorithms is missing from the literature. In addition, covariates in a dataset, such as age and gender, can affect the predictive power of the analysis. In this study, we compared five RNA-seq data normalization methods (TPM, FPKM, TMM, GeTMM, and RLE), along with covariate-adjusted versions of the normalized data, by mapping them onto a human GEM using the iMAT and INIT algorithms to generate personalized metabolic models. We used RNA-seq data from Alzheimer's disease (AD) and lung adenocarcinoma (LUAD) patients. The results demonstrated that RNA-seq data normalized by the RLE, TMM, or GeTMM methods produced condition-specific metabolic models with considerably lower variability in the number of active reactions than data normalized by the within-sample methods (FPKM, TPM). Models built with the RLE, TMM, and GeTMM normalization methods also captured disease-associated genes more accurately (average accuracy of ~0.80 for AD and ~0.67 for LUAD). Accuracy increased for all methods when covariate adjustment was applied. We found a similar accuracy trend when we compared the metabolites of perturbed reactions to metabolome data for AD.
Together, our benchmark study shows that the between-sample RNA-seq normalization methods reduce false positive predictions at the expense of missing some true positive genes when mapped on GEMs.
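
To illustrate the distinction the abstract draws between within-sample and between-sample normalization, here is a minimal Python sketch. It is not the authors' pipeline: the function names and the toy count matrix are hypothetical, the RLE function follows the general median-of-ratios idea rather than any specific package's implementation, and TMM and GeTMM are omitted for brevity. Counts are assumed to be arranged as genes (rows) by samples (columns).

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """Within-sample: scale by library size (per million), then by gene length (per kb)."""
    counts = np.asarray(counts, dtype=float)
    per_million = counts.sum(axis=0) / 1e6                      # one value per sample
    length_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return counts / per_million / length_kb


def tpm(counts, lengths_bp):
    """Within-sample: scale by gene length first, then make each sample sum to 1e6."""
    counts = np.asarray(counts, dtype=float)
    length_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / length_kb
    return rate / rate.sum(axis=0) * 1e6


def rle_size_factors(counts):
    """Between-sample: median ratio of each sample to a per-gene geometric-mean reference."""
    counts = np.asarray(counts, dtype=float)
    expressed = (counts > 0).all(axis=1)        # keep genes with nonzero counts in every sample
    log_counts = np.log(counts[expressed])
    log_reference = log_counts.mean(axis=1, keepdims=True)   # log geometric mean per gene
    return np.exp(np.median(log_counts - log_reference, axis=0))


def rle(counts):
    """Divide each sample's counts by its median-of-ratios size factor."""
    return np.asarray(counts, dtype=float) / rle_size_factors(counts)


# Hypothetical toy data: 3 genes x 2 samples; sample 2 was sequenced twice as deeply.
counts = np.array([[10, 20],
                   [20, 40],
                   [30, 60]])
lengths_bp = np.array([1000, 2000, 3000])

tpm_values = tpm(counts, lengths_bp)
rle_values = rle(counts)
```

In this toy case the two samples differ only in sequencing depth, so both approaches bring the columns into agreement. The approaches diverge on real data, where a few highly expressed genes can dominate the library size used by FPKM and TPM, whereas the median-based size factor is less sensitive to such genes. The sketch assumes at least one gene has nonzero counts in every sample.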

Identifiers

pubmed: 39448682
doi: 10.1038/s41540-024-00448-z
pii: 10.1038/s41540-024-00448-z

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

124

Copyright information

© 2024. The Author(s).

Authors

Hatice Büşra Lüleci (HB)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Dilara Uzuner (D)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Müberra Fatma Cesur (MF)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Atılay İlgün (A)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Elif Düz (E)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Ecehan Abdik (E)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Regan Odongo (R)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.

Tunahan Çakır (T)

Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey. tcakir@gtu.edu.tr.
