From code sharing to sharing of implementations: Advancing reproducible AI development for medical imaging through federated testing.

Keywords: Artificial intelligence; Deployment environment; Federated testing; PET segmentation; Postprocessing; Preprocessing; Reproducibility

Journal

Journal of medical imaging and radiation sciences
ISSN: 1876-7982
Abbreviated title: J Med Imaging Radiat Sci
Country: United States
NLM ID: 101469694

Publication information

Publication date:
28 Aug 2024
History:
received: 12 May 2024
revised: 22 Jul 2024
accepted: 1 Aug 2024
medline: 31 Aug 2024
pubmed: 31 Aug 2024
entrez: 29 Aug 2024
Status: ahead of print

Abstract

BACKGROUND
The reproducibility crisis in AI research remains a significant concern. While code sharing has been acknowledged as a step toward addressing this issue, our focus extends beyond this paradigm. In this work, we explore "federated testing" as an avenue for advancing reproducible AI research and development, especially in medical imaging. Unlike federated learning, where a model is developed and refined on data from different centers, federated testing involves models developed by one team being deployed and evaluated by others, addressing reproducibility across different implementations.
METHODS
Our study follows an exploratory design aimed at systematically evaluating the sources of discrepancy that arise when a shared medical imaging model is executed on the same input data at different sites, independent of any generalizability analysis. We distributed the same model code to multiple independent centers, monitoring execution in different runtime environments while considering various real-world scenarios for pre- and post-processing steps. We analyzed deployment infrastructure by comparing the impact of different computational resources (GPU vs. CPU) on model performance. To assess federated testing of AI models for medical imaging, we performed a comparative evaluation across centers, each with distinct pre- and post-processing steps and deployment environments, specifically targeting AI-driven positron emission tomography (PET) image segmentation. More specifically, we studied federated testing of an AI-based model for surrogate total metabolic tumor volume (sTMTV) segmentation in PET imaging: the algorithm, trained on maximum intensity projection (MIP) data, segments lymphoma regions and estimates sTMTV.
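By way of illustration, the minimal sketch below (not the authors' code) computes a coronal maximum intensity projection from a 3D PET SUV volume and measures the segmented area on that MIP. The threshold stands in for the shared model, and the calibration from MIP area to sTMTV is deliberately omitted because it is not specified here.

    # Minimal sketch (illustrative, synthetic data): coronal MIP from a
    # (z, y, x) SUV volume, then the area of a segmented MIP region.
    import numpy as np

    def coronal_mip(suv_volume: np.ndarray) -> np.ndarray:
        """Project a (z, y, x) SUV volume along the anterior-posterior axis."""
        return suv_volume.max(axis=1)

    def segmented_mip_area_cm2(mip_mask: np.ndarray, pixel_spacing_mm) -> float:
        """Area of the segmented MIP region in cm^2; the mapping from this
        area to sTMTV used by the authors is not reproduced here."""
        area_mm2 = float(mip_mask.sum()) * pixel_spacing_mm[0] * pixel_spacing_mm[1]
        return area_mm2 / 100.0

    suv = np.zeros((64, 64, 64))
    suv[20:30, 25:35, 30:40] = 8.0                   # synthetic "lesion"
    mip = coronal_mip(suv)
    mask = mip > 2.5                                 # placeholder for model inference
    print(segmented_mip_area_cm2(mask, (4.0, 4.0)))  # segmented MIP area in cm^2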
RESULTS
Our study reveals that relying solely on open-source code sharing does not guarantee reproducible results, due to variations in code execution, runtime environments, and incomplete input specifications. Deploying the segmentation model on local or virtual GPUs versus in Docker containers had no effect on reproducibility. However, significant sources of variability were found in data preparation and pre-/post-processing techniques for PET imaging. These findings underscore the limitations of code sharing alone in achieving consistent and accurate results in federated testing.
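To make the reported pre-/post-processing variability concrete, the sketch below (illustrative only, with synthetic data) shows how two centers resampling the same PET volume with different interpolation orders hand different inputs to an otherwise identical model.

    # Minimal sketch of one pre-processing variability source: the same scan,
    # resampled with different interpolation orders at two centers.
    import numpy as np
    from scipy.ndimage import zoom

    def resample_isotropic(volume, spacing_mm, target_mm=4.0, order=1):
        """Resample an isotropically spaced volume to target_mm voxels."""
        return zoom(volume, spacing_mm / target_mm, order=order)

    rng = np.random.default_rng(0)
    pet = rng.gamma(2.0, 1.5, size=(48, 48, 48))      # synthetic SUV-like volume

    center_a = resample_isotropic(pet, 3.0, order=1)  # linear interpolation
    center_b = resample_isotropic(pet, 3.0, order=3)  # cubic interpolation

    # Nonzero: the same scan yields different model inputs at the two centers.
    print(float(np.abs(center_a - center_b).max()))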
CONCLUSION
Achieving consistently precise results in federated testing requires more than just sharing models through open-source code. Comprehensive pipeline sharing, including pre- and post-processing steps, is essential. Cloud-based platforms that automate these processes can streamline AI model testing across diverse locations. Standardizing protocols and sharing complete pipelines can significantly enhance the robustness and reproducibility of AI models.
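One way to read "comprehensive pipeline sharing" is sketched below: a single entry point, supplied and versioned by the developing team, that fixes the order of pre-processing, projection, inference, and post-processing. All names are hypothetical and the code is a sketch of the idea, not the authors' implementation.

    # Minimal sketch of pipeline sharing (all names hypothetical): the
    # developers ship one entry point so every testing center runs the
    # pre-processing, inference, and post-processing steps identically.
    from typing import Callable
    import numpy as np

    def run_shared_pipeline(pet_volume: np.ndarray,
                            spacing_mm: float,
                            preprocess: Callable[[np.ndarray, float], np.ndarray],
                            project: Callable[[np.ndarray], np.ndarray],
                            predict: Callable[[np.ndarray], np.ndarray],
                            postprocess: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
        """Apply the shared steps in a fixed order; the callables are provided
        (and versioned) by the model developers, not re-implemented per center."""
        suv = preprocess(pet_volume, spacing_mm)
        mip = project(suv)
        mask = predict(mip)
        return postprocess(mask)

In practice such an entry point would be frozen inside a container image or exposed through a cloud platform, which is the automation the conclusion points to.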

Identifiers

pubmed: 39208523
pii: S1939-8654(24)00476-4
doi: 10.1016/j.jmir.2024.101745

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

101745

Copyright information

Copyright © 2024. Published by Elsevier Inc.

Authors

Fereshteh Yousefirizi (F)

Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, Canada. Electronic address: frizi@bccrc.ca.

Annudesh Liyanage (A)

Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, Canada; Department of Physics and Astronomy, University of British Columbia, Vancouver, Canada.

Ivan S Klyuzhin (IS)

Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, Canada.

Arman Rahmim (A)

Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, Canada; Department of Physics and Astronomy, University of British Columbia, Vancouver, Canada; Department of Radiology, University of British Columbia, Vancouver, BC, Canada; Department of Biomedical Engineering, University of British Columbia, Vancouver, Canada.

MeSH classifications