External validation of an artificial intelligence multi-label deep learning model capable of ankle fracture classification.
AO/OTA classification
Ankle
External validation
Machine learning
Neural networks
Trauma
Journal
BMC musculoskeletal disorders
ISSN: 1471-2474
Abbreviated title: BMC Musculoskelet Disord
Country: England
NLM ID: 100968565
Publication information
Publication date: 04 Oct 2024
History:
received: 29 Feb 2024
accepted: 19 Sep 2024
medline: 05 Oct 2024
pubmed: 05 Oct 2024
entrez: 04 Oct 2024
Status: epublish
Abstract
BACKGROUND
Advances in medical imaging have made it possible to classify ankle fractures using Artificial Intelligence (AI). Recent studies have demonstrated good internal validity for machine learning algorithms using the AO/OTA 2018 classification. This study aimed to externally validate one such model for ankle fracture classification and to explore ways to improve its external validity.
METHODS
In this retrospective observational study, we trained a deep-learning neural network on 7,500 ankle studies to classify traumatic malleolar fractures according to the AO/OTA classification. Our internal validation dataset (IVD) contained 409 studies collected from Danderyd Hospital in Stockholm, Sweden, between 2002 and 2016. The external validation dataset (EVD) contained 399 studies collected from Flinders Medical Centre, Adelaide, Australia, between 2016 and 2020. Our primary outcome measures were the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) for classification of AO/OTA malleolar (44) fractures. Secondary outcomes were performance on other fractures visible on ankle radiographs and inter-observer reliability among reviewers.
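The two primary metrics can be illustrated with a minimal sketch for a single output label (for example, one AO/OTA group) of a multi-label network. This is not the authors' evaluation code. It assumes binary ground-truth labels and one predicted probability per study, computes AUC through the equivalent Mann-Whitney formulation, and estimates AUPR as average precision, which is one of several common AUPR estimators. The function names are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive study scores higher than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative study")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPR estimated as average precision: scan studies from highest to lowest
    score and average the precision reached at each positive study.
    Assumes distinct scores; tied scores would need to be grouped."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPR is undefined without positive studies")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this cut-off
    return precision_sum / n_pos


if __name__ == "__main__":
    # Toy example (invented values): 3 positive and 2 negative studies.
    y = [1, 1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.2]
    print(round(roc_auc(y, s), 3))            # 0.833
    print(round(average_precision(y, s), 3))  # 0.917
```

In a multi-label setting, the same two functions would simply be applied once per output label, using that label's column of ground truth and predicted probabilities.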
RESULTS
For fracture detection, the network attained a weighted mean AUC (wAUC) of 0.86 (95% CI 0.82-0.89) on the EVD, compared with 0.95 (95% CI 0.94-0.97) on the IVD. The corresponding AUPR was 0.93 for the EVD and 0.96 for the IVD. For the individual outcomes (type 44A-C, group 44A1-C3, and subgroup 44A1.1-C3.3), the wAUC was 0.82 for the EVD and 0.93 for the IVD, and the weighted mean AUPR (wAUPR) was 0.59 and 0.63, respectively. On the EVD, performance exceeded that of a random classifier across all outcomes.
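The weighted summaries and the comparison with a random classifier can be sketched as follows. The abstract does not state how the per-label metrics were weighted; this sketch assumes each label's AUC or AUPR is weighted by its number of positive studies. The `LabelResult` structure, the function names, and all numbers in the example are hypothetical toy values, not results from the study. The chance levels used are the standard ones: 0.5 for AUC, and approximately the label's prevalence for AUPR. This is why AUPR for rare subgroups can be low even when the model is clearly better than chance.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LabelResult:
    """Per-label evaluation summary (hypothetical structure)."""
    name: str
    n_positive: int  # studies carrying this label in the ground truth
    n_total: int     # all studies in the dataset
    auc: float
    aupr: float

    @property
    def prevalence(self) -> float:
        # A random classifier's expected precision at every threshold equals
        # the prevalence, so its AUPR is approximately this value.
        return self.n_positive / self.n_total


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean of per-label metrics."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def summarize(results: Sequence[LabelResult]) -> dict:
    # Assumption: weight each label by its number of positive studies.
    weights = [r.n_positive for r in results]
    return {
        "wAUC": weighted_mean([r.auc for r in results], weights),
        "wAUPR": weighted_mean([r.aupr for r in results], weights),
        # Labels where the model fails to beat a random classifier on either metric.
        "not_better_than_random": [
            r.name for r in results if r.auc <= 0.5 or r.aupr <= r.prevalence
        ],
    }


if __name__ == "__main__":
    toy = [
        LabelResult("A", 30, 100, 0.90, 0.70),
        LabelResult("B", 10, 100, 0.80, 0.40),
        LabelResult("C", 5, 100, 0.60, 0.04),
    ]
    summary = summarize(toy)
    print(round(summary["wAUC"], 3))         # 0.844
    print(round(summary["wAUPR"], 3))        # 0.56
    print(summary["not_better_than_random"])  # ['C']
```

Weighting by positive count lets common fracture patterns dominate the summary. An unweighted (macro) mean would instead give rare subgroups equal influence, so the choice of weights should be reported alongside the summary values.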
CONCLUSION
Although the two datasets differed considerably, the model transferred well to the EVD and to the alternative clinical scenario it represents. The direct clinical implications of this study are that algorithms developed elsewhere need local validation and that discrepancies can be rectified through targeted training. More broadly, we believe this opens up possibilities for building advanced treatment recommendations based on exact fracture types. Such recommendations could be more objective than current clinical decisions, which are often influenced by who is present during rounds.
Identifiers
pubmed: 39367349
doi: 10.1186/s12891-024-07884-2
pii: 10.1186/s12891-024-07884-2
Publication types
Journal Article
Validation Study
Observational Study
Languages
eng
Citation subsets
IM
Pagination
788
Copyright information
© 2024. The Author(s).