Artificial intelligence in commercial fracture detection products: a systematic review and meta-analysis of diagnostic test accuracy.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
04 Oct 2024
History:
Received: 12 Jan 2024
Accepted: 12 Sep 2024
Medline: 05 Oct 2024
PubMed: 05 Oct 2024
Entrez: 04 Oct 2024
Status: epublish

Abstract

Conventional radiography (CR) is primarily utilized for fracture diagnosis. Artificial intelligence (AI) for CR is a rapidly growing field aimed at enhancing efficiency and increasing diagnostic accuracy. However, the diagnostic performance of commercially available AI fracture detection solutions (CAAI-FDS) for CR in various anatomical regions, their synergy with human assessment, and the influence of industry funding on reported accuracy are unknown. Peer-reviewed diagnostic test accuracy (DTA) studies were identified through a systematic review of PubMed and Embase. Diagnostic performance measures were extracted, in particular for subgroups such as product, type of rater (stand-alone AI, human unaided, human AI-aided), funding, and anatomical region. Pooled measures were obtained with a bivariate random-effects model. The impact of rater type was evaluated with comparative meta-analysis. Seventeen DTA studies of seven CAAI-FDS analyzing 38,978 x-rays with 8,150 fractures were included. Stand-alone AI studies (n = 15) evaluated five CAAI-FDS: four with good sensitivity (> 90%) and moderate specificity (80-90%), and one with very poor sensitivity (< 60%) and excellent specificity (> 95%). Pooled sensitivities were good to excellent and specificities moderate to good in all anatomical regions (n = 7) apart from ribs (n = 4; poor sensitivity / moderate specificity) and spine (n = 4; excellent sensitivity / poor specificity). Funded studies (n = 4) had higher sensitivity (+5%) and lower specificity (-4%) than non-funded studies (n = 11). Sensitivity did not differ significantly between stand-alone AI and human AI-aided ratings (p = 0.316), but specificity was significantly higher in the latter group (p < 0.001). Sensitivity was significantly lower in human unaided ratings than in both human AI-aided and stand-alone AI ratings (both p ≤ 0.001); specificity was higher in human unaided ratings than in stand-alone AI ratings (p < 0.001) and did not differ significantly from human AI-aided ratings (p = 0.316). The study demonstrates good diagnostic accuracy across most CAAI-FDS and anatomical regions, with the highest performance achieved when AI is used in conjunction with human assessment. Diagnostic accuracy appears lower for spine and rib fractures. The impact of industry funding on reported performance is small.
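The pooled accuracy measures above are derived from per-study 2×2 tables. As a minimal illustrative sketch — not the authors' actual analysis, which used a bivariate random-effects model — per-study sensitivity and specificity and a simple inverse-variance pooling on the logit scale can be computed as follows; the study counts are hypothetical:

```python
import math

def sens_spec(tp, fn, tn, fp):
    """Per-study sensitivity and specificity from a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)

def pooled_logit(props, ns):
    """Fixed-effect inverse-variance pooling on the logit scale.

    A deliberate simplification of the bivariate random-effects model
    used in the review; `props` are per-study proportions and `ns`
    their denominators. Assumes 0 < p < 1 for every study.
    """
    logits, weights = [], []
    for p, n in zip(props, ns):
        logits.append(math.log(p / (1 - p)))
        # Delta-method variance of the logit is 1 / (n * p * (1 - p)),
        # so the inverse-variance weight is n * p * (1 - p).
        weights.append(n * p * (1 - p))
    pooled = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
    return 1 / (1 + math.exp(-pooled))  # back-transform to a proportion

# Hypothetical studies: (TP, FN, TN, FP)
studies = [(90, 10, 85, 15), (180, 20, 160, 40), (45, 5, 47, 3)]
sens = [sens_spec(*s)[0] for s in studies]
spec = [sens_spec(*s)[1] for s in studies]
n_pos = [tp + fn for tp, fn, _, _ in studies]
n_neg = [tn + fp for _, _, tn, fp in studies]
print(round(pooled_logit(sens, n_pos), 3))  # 0.9
print(round(pooled_logit(spec, n_neg), 3))  # 0.826
```

In practice a bivariate model is preferred because it accounts for the correlation between sensitivity and specificity across studies and for between-study heterogeneity, neither of which this univariate sketch captures.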

Identifiers

pubmed: 39367147
doi: 10.1038/s41598-024-73058-8
pii: 10.1038/s41598-024-73058-8

Publication types

Journal Article; Systematic Review; Meta-Analysis

Languages

eng

Citation subsets

IM

Pagination

23053

Copyright information

© 2024. The Author(s).


Authors

Julius Husarek (J)

Department of Orthopaedic Surgery and Traumatology, Bern University Hospital, Inselspital, University of Bern, Bern, Switzerland.
University of Bern, Bern, Switzerland.
Faculty of Medicine, Medical University of Sofia, Sofia, Bulgaria.

Silvan Hess (S)

Department of Orthopaedic Surgery and Traumatology, Bern University Hospital, Inselspital, University of Bern, Bern, Switzerland.

Sam Razaeian (S)

Department for Trauma, Hand and Reconstructive Surgery, Saarland University, Kirrberger Str. 100, 66421, Homburg, Germany.

Thomas D Ruder (TD)

University Institute of Diagnostic, Interventional and Pediatric Radiology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland.

Stephan Sehmisch (S)

Department of Trauma Surgery, Hannover Medical School, Carl-Neuberg-Straße 1, 30625, Hannover, Germany.

Martin Müller (M)

Department of Emergency Medicine, Bern University Hospital, Inselspital, University of Bern, Bern, Switzerland.

Emmanouil Liodakis (E)

Department for Trauma, Hand and Reconstructive Surgery, Saarland University, Kirrberger Str. 100, 66421, Homburg, Germany. Emmanouil.Liodakis@uks.eu.
