Overcoming data scarcity in radiomics/radiogenomics using synthetic radiomic features.

Data augmentation; Data scarcity; Medical imaging; Radiogenomics; Radiomics; Synthetic data generation

Journal

Computers in biology and medicine
ISSN: 1879-0534
Abbreviated title: Comput Biol Med
Country: United States
NLM ID: 1250250

Publication information

Publication date:
27 Mar 2024
History:
received: 07 12 2023
revised: 11 03 2024
accepted: 25 03 2024
medline: 10 4 2024
pubmed: 10 4 2024
entrez: 9 4 2024
Status: aheadofprint

Abstract

This study evaluates the potential of synthetic radiomic data generation to address data scarcity in radiomics/radiogenomics models. It was conducted on a retrospectively collected cohort of 386 colorectal cancer patients (n = 2570 lesions) for whom matched contrast-enhanced CT images and TP53 gene mutational status were available. The full cohort was divided into a training cohort (n = 2055 lesions) and an independent, fixed test set (n = 515 lesions). Training sets of different sizes were subsampled from the training cohort to measure the impact of sample size on model performance and to assess the added value of synthetic radiomic augmentation at each size. Five different tabular synthetic data generation models were used to generate synthetic radiomic data based on "real-world" radiomics data extracted from this cohort. The quality and reproducibility of the generated synthetic radiomic data were assessed. Synthetic radiomics were then combined with "real-world" radiomic training data to evaluate their impact on the predictive model's performance. A prediction model built using only "real-world" radiomic data revealed the impact of data scarcity in this data set through a lack of predictive performance at low training sample numbers (n = 200, 400, and 1000 lesions, with average AUC = 0.52, 0.53, and 0.56, respectively, compared to 0.64 when using all 2055 training lesions). Synthetic tabular data generation models created reproducible synthetic radiomic data with properties highly similar to "real-world" data (for n = 1000 lesions, average Chi-square = 0.932, average basic statistical correlation = 0.844). The integration of synthetic radiomic data consistently enhanced the performance of predictive models trained on small sample sets (AUC enhanced by 9.6%, 11.3%, and 16.7% for models trained on n = 200, 400, and 1000 lesions, respectively). In contrast, synthetic data generated from randomised/noisy radiomic data failed to enhance predictive performance, underlining that true signal in the source data is required to do so. Synthetic radiomic data, when combined with real radiomics, could enhance the performance of predictive models. Tabular synthetic data generation might help to overcome limitations in medical AI stemming from data scarcity.
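
The workflow described in the abstract (subsample a small training set, fit a tabular generator on it, add synthetic rows to the real rows, and score both variants on a fixed test set by AUC) can be sketched as follows. This is only a minimal illustration under loud assumptions: the features are random toy numbers rather than radiomic data, the per-class Gaussian sampler is a simple stand-in and not one of the five generation models used in the study, and the nearest-centroid scorer and all function names are hypothetical. The printed AUC values therefore say nothing about the study's results.

```python
import random
import statistics

def make_toy(n, rng, n_features=5, shift=0.3):
    """Hypothetical stand-in for radiomic features: class 1 is slightly shifted."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append([rng.gauss(shift * y, 1.0) for _ in range(n_features)])
        labels.append(y)
    return rows, labels

def fit_generator(rows, labels):
    """Assumed generator: per class, the mean and std of every feature."""
    params = {}
    for cls in sorted(set(labels)):
        cols = list(zip(*[r for r, y in zip(rows, labels) if y == cls]))
        params[cls] = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return params

def sample_synthetic(params, n_per_class, rng):
    """Draw synthetic rows feature by feature from the fitted Gaussians."""
    rows, labels = [], []
    for cls, stats in params.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(mu, sd) for mu, sd in stats])
            labels.append(cls)
    return rows, labels

def centroid_scores(train_rows, train_labels, test_rows):
    """Higher score means the row lies closer to the class-1 centroid."""
    cents = {}
    for cls in (0, 1):
        cols = zip(*[r for r, y in zip(train_rows, train_labels) if y == cls])
        cents[cls] = [statistics.mean(c) for c in cols]
    def d2(r, c):
        return sum((a - b) ** 2 for a, b in zip(r, c))
    return [d2(r, cents[0]) - d2(r, cents[1]) for r in test_rows]

def auc(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
pool_rows, pool_labels = make_toy(2055, rng)   # training cohort size from the abstract
test_rows, test_labels = make_toy(515, rng)    # fixed test set size from the abstract

# Subsample a small "real" training set (n = 200, one of the sizes reported).
idx = rng.sample(range(len(pool_rows)), 200)
sub_rows = [pool_rows[i] for i in idx]
sub_labels = [pool_labels[i] for i in idx]

# Fit the stand-in generator on the subsample only, then augment it.
params = fit_generator(sub_rows, sub_labels)
syn_rows, syn_labels = sample_synthetic(params, 100, rng)  # 100 per class: arbitrary choice
aug_rows, aug_labels = sub_rows + syn_rows, sub_labels + syn_labels

auc_real = auc(centroid_scores(sub_rows, sub_labels, test_rows), test_labels)
auc_aug = auc(centroid_scores(aug_rows, aug_labels, test_rows), test_labels)
print(f"real only: {auc_real:.3f}  real + synthetic: {auc_aug:.3f}")
```

The generator is fitted on the subsample alone and the test set is never touched during fitting, which mirrors the independent, fixed test set described above. A sampler this simple adds no information beyond the subsample's own means and variances, so it should not be expected to reproduce the reported gains.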

Identifiers

pubmed: 38593640
pii: S0010-4825(24)00473-6
doi: 10.1016/j.compbiomed.2024.108389

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

108389

Copyright information

Copyright © 2024 Elsevier Ltd. All rights reserved.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Milad Ahmadian (M)

Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands. Electronic address: m.ahmadian@nki.nl.

Zuhir Bodalal (Z)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands.

Hedda J van der Hulst (HJ)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands.

Conchita Vens (C)

Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; School of Cancer Science, University of Glasgow, Glasgow, Scotland, UK.

Luc H E Karssemakers (LHE)

Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands.

Nino Bogveradze (N)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Department of Radiology, American Hospital Tbilisi, Tbilisi, Georgia.

Francesca Castagnoli (F)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, Royal Marsden Hospital, London, UK; Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, UK.

Federica Landolfi (F)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Radiology Unit, Sant'Andrea Hospital, Sapienza University of Rome, Rome, Italy.

Eun Kyoung Hong (EK)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Seoul National University Hospital, Seoul, South Korea.

Nicolo Gennaro (N)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Radiology, Northwestern University, Chicago, USA.

Andrea Delli Pizzi (AD)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; ITAB - Institute for Advanced Biomedical Technologies, G. d'Annunzio University, Chieti, Italy; Department of Innovative Technologies in Medicine and Dentistry, G. d'Annunzio University, Chieti, Italy.

Regina G H Beets-Tan (RGH)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; GROW School for Oncology and Developmental Biology, Maastricht University, Maastricht, the Netherlands; Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark.

Michiel W M van den Brekel (MWM)

Department of Head and Neck Oncology and Surgery, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Amsterdam Center for Language and Communication, University of Amsterdam, Amsterdam, the Netherlands. Electronic address: m.vd.brekel@nki.nl.

Jonas A Castelijns (JA)

Department of Radiology, The Netherlands Cancer Institute/Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands.

MeSH classifications