Comparison of gene set scoring methods for reproducible evaluation of tuberculosis gene signatures.


Journal

BMC Infectious Diseases
ISSN: 1471-2334
Abbreviated title: BMC Infect Dis
Country: England
NLM ID: 100968551

Publication information

Publication date:
20 Jun 2024
History:
received: 29 07 2023
accepted: 31 05 2024
medline: 21 6 2024
pubmed: 21 6 2024
entrez: 20 6 2024
Status: epublish

Abstract

BACKGROUND
Blood-based transcriptional gene signatures for tuberculosis (TB) have been developed with potential use in diagnosing disease. However, it remains unresolved whether gene set enrichment analysis of the signature transcripts alone is sufficient for prediction and differentiation, or whether the original model created when the signature was derived must be used. Intra-method comparison is complicated by the unavailability of original training data and by missing details about the original trained models. To facilitate the use of these signatures in TB research, comparisons between gene set scoring methods and cross-data validation of original model implementations are needed.

METHODS
We compared the performance of 19 TB gene signatures across 24 transcriptomic datasets using both rebuilt original models and gene set scoring methods. Existing gene set scoring methods, including ssGSEA, GSVA, PLAGE, Singscore, and Zscore, were used as alternative approaches to obtain the profile scores. The area under the ROC curve (AUC) was computed to measure performance. Correlation analysis and Wilcoxon paired tests were used to compare the performance of enrichment methods with that of the original models.

RESULTS
For many signatures, the predictions from gene set scoring methods were highly correlated with, and statistically equivalent to, the results given by the original models. In some cases, PLAGE outperformed the original models when considering the signatures' weighted mean AUC values and the AUC results within individual studies.

CONCLUSIONS
Gene set enrichment scoring of existing gene sets can distinguish patients with active TB disease from those with other clinical conditions, with equivalent or improved accuracy compared to the original methods and models. These data justify using gene set scoring methods with published TB gene signatures for predicting TB risk and treatment outcomes, especially when original models are difficult to apply or implement.
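The scoring-and-evaluation workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a combined z-score as the gene set score (in the spirit of the Zscore method), computes AUC as the fraction of correctly ordered case/control pairs, and assumes that the weighted mean AUC is weighted by dataset sample size, which the abstract does not specify. The gene names, expression values, and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_signature_scores(expr, signature):
    """Combined z-score per sample for one gene signature.

    expr: dict mapping gene symbol -> list of expression values (one per sample).
    signature: iterable of gene symbols; genes missing from expr are skipped.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    totals = [0.0] * len(expr[genes[0]])
    used = 0
    for g in genes:
        values = expr[g]
        mu, sd = mean(values), pstdev(values)  # population SD (an assumption)
        if sd == 0:  # a constant gene carries no information; skip it
            continue
        used += 1
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
    if used == 0:
        raise ValueError("all signature genes are constant across samples")
    # Sum of per-gene z-scores divided by sqrt(number of genes used).
    return [t / sqrt(used) for t in totals]


def auc(scores, labels):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_mean_auc(aucs, sizes):
    """Mean AUC across datasets, weighted by sample size (assumed weighting)."""
    return sum(a * n for a, n in zip(aucs, sizes)) / sum(sizes)


if __name__ == "__main__":
    # Toy data: two hypothetical genes, four samples (1 = active TB, 0 = control).
    toy_expr = {"GENE_A": [5.0, 6.0, 1.0, 2.0], "GENE_B": [7.0, 8.0, 3.0, 2.0]}
    toy_labels = [1, 1, 0, 0]
    toy_scores = zscore_signature_scores(toy_expr, ["GENE_A", "GENE_B"])
    print(auc(toy_scores, toy_labels))
    print(weighted_mean_auc([0.8, 0.9], [100, 300]))
```

In practice, established implementations of ssGSEA, GSVA, PLAGE, Singscore, and Zscore would be used instead of hand-written scoring, and the per-dataset AUCs of two approaches would then be compared with a paired test such as the Wilcoxon signed-rank test.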

Identifiers

pubmed: 38902649
doi: 10.1186/s12879-024-09457-z
pii: 10.1186/s12879-024-09457-z

Publication types

Journal Article; Comparative Study

Languages

eng

Citation subsets

IM

Pagination

610

Grants

Agency: NIH HHS
ID: R01GM127430
Country: United States

Copyright information

© 2024. The Author(s).

Authors

Xutao Wang (X)

Department of Biostatistics, Boston University, Boston, MA, USA.
Division of Computational Biomedicine and Bioinformatics Program, Boston University, Boston, MA, USA.

Arthur VanValkenberg (A)

Division of Infectious Disease, Center for Data Science, Rutgers New Jersey Medical School, Newark, NJ, USA.

Aubrey R Odom (AR)

Division of Computational Biomedicine and Bioinformatics Program, Boston University, Boston, MA, USA.

Jerrold J Ellner (JJ)

Department of Medicine, Center for Emerging Pathogens, Rutgers New Jersey Medical School, Newark, NJ, USA.

Natasha S Hochberg (NS)

Boston Medical Center, Boston, MA, USA.
Section of Infectious Diseases, Boston University School of Medicine, Boston, MA, USA.

Padmini Salgame (P)

Department of Medicine, Center for Emerging Pathogens, Rutgers New Jersey Medical School, Newark, NJ, USA.

Prasad Patil (P)

Department of Biostatistics, Boston University, Boston, MA, USA.

W Evan Johnson (WE)

Division of Infectious Disease, Center for Data Science, Rutgers New Jersey Medical School, Newark, NJ, USA. w.evan.johnson@rutgers.edu.
Department of Medicine, Center for Emerging Pathogens, Rutgers New Jersey Medical School, Newark, NJ, USA. w.evan.johnson@rutgers.edu.
