Accuracy of artificial intelligence in carpal tunnel syndrome management: A comparative analysis of ChatGPT-4o and Gemini 1.5 Pro

Written on 12/12/2025
by Prabhjot Singh

Hand Surg Rehabil. 2025 Dec 10:102560. doi: 10.1016/j.hansur.2025.102560. Online ahead of print.

ABSTRACT

PURPOSE: This study evaluated the accuracy of two leading AI models, ChatGPT-4o and Gemini 1.5 Pro, in providing management recommendations for carpal tunnel syndrome (CTS) across patient scenarios, using the American Academy of Orthopaedic Surgeons (AAOS) guidelines as the reference standard.

METHODS: Treatment ratings for the CTS patient scenarios in the AAOS Appropriate Use Criteria for Management of Carpal Tunnel Syndrome Pathology were compared with ratings generated by ChatGPT-4o and Gemini 1.5 Pro on the same 1-to-9 scale. Discrepancies were calculated by comparing each model's rating with the corresponding AAOS rating. Spearman correlations and paired t-tests (α = .05) were used to assess agreement, and heatmaps were used to display the findings.
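The abstract does not include the analysis code. As a minimal sketch, assuming each discrepancy is the model rating minus the AAOS rating (consistent with negative mean errors denoting under-recommendation) and that Spearman's rho is computed between model and AAOS ratings, the two quantities could be obtained as follows. The function names and the ratings in the usage note are hypothetical, not study data; `scipy.stats.spearmanr` is an equivalent library routine, while this version uses only the standard library.

```python
from statistics import mean


def rating_errors(model, reference):
    """Signed error per scenario: model rating minus AAOS rating (1-9 scale)."""
    if len(model) != len(reference):
        raise ValueError("ratings must be paired")
    return [m - r for m, r in zip(model, reference)]


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With the hypothetical ratings `aaos = [9, 7, 3, 1, 5]` and `model = [7, 6, 5, 2, 5]`, `rating_errors(model, aaos)` gives `[-2, -1, 2, 1, 0]` and `spearman_rho(model, aaos)` is about 0.975. Averaging tied ranks matters here because ratings on a 1-to-9 scale produce many ties.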

RESULTS: A total of 810 paired scores were generated across 135 patient scenarios. Relative to AAOS guidelines, ChatGPT-4o under-recommended steroid injection (mean error -2.7 ± 1.2; P < .001) and carpal tunnel release (mean error -1.8 ± 1.7; P < .001) while over-recommending electrodiagnostic studies (mean error 4.5 ± 3.8; P < .001). Gemini 1.5 Pro showed a similar pattern, under-recommending steroid injection (mean error -2.4 ± 1.4; P < .001) and carpal tunnel release (mean error -1.8 ± 1.3; P < .001), but over-recommended electrodiagnostic studies less markedly (mean error 3.7 ± 3.2; P < .001). Gemini 1.5 Pro aligned more closely with AAOS guidelines than ChatGPT-4o, with a stronger Spearman correlation (Rho = 0.782 vs. 0.53; P < .001).
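Each "mean error ± SD" above summarizes per-scenario rating differences for one treatment, and a paired t-test asks whether their mean differs from zero. A stdlib-only sketch of that computation follows, assuming the same model-minus-AAOS sign convention; `summarize_errors` and the example ratings are hypothetical. It returns the t statistic and its degrees of freedom; turning these into a P value needs the t distribution (for example via `scipy.stats.ttest_rel`), which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_errors(model, reference):
    """Summarize signed rating errors for one treatment (hypothetical helper).

    Errors are model rating minus AAOS rating, so a negative mean means the
    model under-recommended the treatment and a positive mean means it
    over-recommended it.
    """
    if len(model) != len(reference):
        raise ValueError("ratings must be paired")
    diffs = [m - r for m, r in zip(model, reference)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired ratings")
    mean_err = mean(diffs)
    sd_err = stdev(diffs)  # sample SD, n - 1 denominator
    if sd_err == 0:
        raise ValueError("all differences are identical; t is undefined")
    # Paired t statistic: mean difference over its standard error, df = n - 1
    t_stat = mean_err / (sd_err / sqrt(n))
    if mean_err < 0:
        direction = "under"
    elif mean_err > 0:
        direction = "over"
    else:
        direction = "none"
    return mean_err, sd_err, t_stat, n - 1, direction
```

With hypothetical ratings `model = [6, 6, 5, 7, 4]` and `reference = [9, 8, 7, 9, 8]`, the function returns a mean error of -2.6, an SD of about 0.894, t ≈ -6.5 with 4 degrees of freedom, and the label `"under"`.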

CONCLUSIONS: Gemini 1.5 Pro showed stronger overall alignment with AAOS guidelines, indicating a more refined diagnostic approach. Nevertheless, both platforms remain subject to algorithmic biases that pose a risk of misdiagnosis.

LEVEL OF EVIDENCE: Level III - Non-Experimental Study.

PMID:41386318 | DOI:10.1016/j.hansur.2025.102560