Artificial intelligence and human expertise in hand trauma diagnosis: A collaborative approach

Written on 25/07/2025
by Céline Klein

Orthop Traumatol Surg Res. 2025 Jul 23:104338. doi: 10.1016/j.otsr.2025.104338. Online ahead of print.

ABSTRACT

BACKGROUND: Hand injuries are a frequent reason for an emergency department visit and require a radiographic analysis. Misdiagnosed or undiagnosed injuries may lead to poor functional outcomes. Artificial intelligence (AI) is providing new tools for the diagnosis of injuries in routine clinical practice. The primary objective of the present study was to assess the diagnostic performance of AI in the diagnosis of hand fractures and dislocations, when compared with reviews by two experienced hand surgeons. The secondary objective was to assess the diagnostic performance of a resident vs. the AI.

HYPOTHESIS: On the basis of standard radiographs, the AI system would diagnose metacarpal and phalangeal fractures and dislocations with the same level of diagnostic accuracy (i.e. sensitivity and specificity) as senior hand surgeons.

PATIENTS AND METHODS: This single-centre, retrospective study was conducted on hand radiography datasets collected from consecutive patients over the age of 16 consulting in an emergency department. The radiographic data were reviewed by two senior hand surgeons (constituting the gold standard) and a resident. Sensitivity and specificity were derived from contingency tables to compare the AI's and the resident's respective abilities to detect a fracture or dislocation against the gold standard. The resident and the AI were also compared with each other.
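
The comparison against a gold standard described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the names `ContingencyTable` and `tabulate` are hypothetical, and the example calls are made-up toy values rather than study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table of a reader's calls against the gold standard (hypothetical helper)."""
    tp: int  # injury present, reader says present
    fp: int  # injury absent, reader says present
    fn: int  # injury present, reader says absent
    tn: int  # injury absent, reader says absent

    def sensitivity(self) -> float:
        # Proportion of true injuries that the reader detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of injury-free datasets that the reader correctly cleared.
        return self.tn / (self.tn + self.fp)


def tabulate(reader: Sequence[bool], reference: Sequence[bool]) -> ContingencyTable:
    """Count agreement cells from paired per-dataset calls (True = injury)."""
    if len(reader) != len(reference):
        raise ValueError("reader and reference must have the same length")
    tp = sum(r and g for r, g in zip(reader, reference))
    fp = sum(r and not g for r, g in zip(reader, reference))
    fn = sum((not r) and g for r, g in zip(reader, reference))
    tn = sum((not r) and (not g) for r, g in zip(reader, reference))
    return ContingencyTable(tp=tp, fp=fp, fn=fn, tn=tn)


# Toy example only: five datasets, reader calls vs. gold-standard calls.
table = tabulate(
    reader=[True, True, False, False, True],
    reference=[True, False, False, True, True],
)
```

The same table can be built once for the AI and once for the resident, each against the senior surgeons' reading, so that the two sets of sensitivity and specificity values are directly comparable.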

RESULTS: 1915 radiographic datasets (4738 X-rays for 1892 patients) were included in the analysis. A Cohen's kappa of 0.865 indicated almost perfect agreement between the two senior surgeons. The AI's analysis yielded a sensitivity [95% confidence interval] of 97.6% [96-98] and a specificity of 88.9% [87.2-90.4]. False positives were noted in 162 cases. The AI failed to diagnose 11 injuries (0.6%): two dislocations of the proximal interphalangeal joint, seven fractures of the phalanx (including one third phalanx amputation), and two metacarpal fractures. Relative to the AI, the resident's analysis yielded a significantly lower sensitivity (p < 0.0001) and a significantly higher specificity (p = 0.007).
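
For readers unfamiliar with the agreement statistic reported above, Cohen's kappa for two raters making a binary call can be computed as follows. This is a sketch under stated assumptions: the function name `cohen_kappa` and the cell counts in the example are illustrative and are not taken from the study, which reports only the resulting kappa value.

```python
def cohen_kappa(both_pos: int, only_a_pos: int, only_b_pos: int, both_neg: int) -> float:
    """Cohen's kappa for two raters and a binary outcome (hypothetical helper).

    both_pos   : datasets both raters called positive
    only_a_pos : rater A positive, rater B negative
    only_b_pos : rater A negative, rater B positive
    both_neg   : datasets both raters called negative
    """
    n = both_pos + only_a_pos + only_b_pos + both_neg
    if n == 0:
        raise ValueError("at least one rated dataset is required")

    # Observed agreement: proportion of datasets on which the raters agree.
    p_observed = (both_pos + both_neg) / n

    # Expected chance agreement from each rater's marginal rates.
    a_pos = both_pos + only_a_pos
    b_pos = both_pos + only_b_pos
    a_neg = n - a_pos
    b_neg = n - b_pos
    p_expected = (a_pos * b_pos + a_neg * b_neg) / (n * n)

    if p_expected == 1.0:
        # Both raters gave the same single label everywhere; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy counts only (not study data): 100 datasets rated by two readers.
kappa = cohen_kappa(both_pos=40, only_a_pos=10, only_b_pos=5, both_neg=45)
```

On the commonly used Landis and Koch scale, values above 0.80 are labelled "almost perfect" agreement, which is the reading applied to the 0.865 reported here.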

CONCLUSION: An AI may be a valuable tool in emergency settings - especially for less experienced practitioners - but does not surpass the diagnostic performance of senior surgeons. The AI's ability to detect dislocations and amputations must be improved. An AI can complement (but not replace) a thorough clinical examination.

LEVEL OF EVIDENCE: III.

PMID:40712955 | DOI:10.1016/j.otsr.2025.104338