J Clin Med. 2025 Nov 13;14(22):8045. doi: 10.3390/jcm14228045.
ABSTRACT
Background: The application of large language models (LLMs) in surgical decision-making is expanding rapidly, yet their potential in hand and peripheral nerve surgery remains largely unexplored. This study assessed the diagnostic and therapeutic performance of an LLM (ChatGPT-4o) in scenarios characterized by multiple valid management strategies and an absence of expert consensus.
Methods: Three representative cases were developed to reflect common conditions in hand surgery with competing but accepted treatment options: thumb carpometacarpal (CMC I) arthritis, scaphoid nonunion, and carpal tunnel syndrome (CTS). Each case was submitted to ChatGPT-4o using a standardized prompt. The LLM-generated responses were evaluated by 52 participants (34 board-certified hand surgeons and 18 residents) for diagnostic accuracy, clinical relevance, and completeness. Readability indices, including the Flesch-Kincaid Grade Level, were analyzed to assess whether the responses were appropriate for a medical audience.
Results: ChatGPT-4o demonstrated coherent but limited diagnostic accuracy (mean 2.9 ± 1.2 SD), with moderate clinical relevance (3.5 ± 1.0 SD) and completeness (3.4 ± 1.1 SD). Performance was strongest in the standardized scenario (CTS) and weakest in the scenario requiring individualized reasoning (CMC I arthritis). No significant differences were observed between experts and residents (p > 0.05). In higher-level reasoning, ChatGPT-4o likewise performed best in CTS and weakest in CMC I arthritis. Readability analysis confirmed professional-level language (mean Flesch-Kincaid Grade Level: 16.4).
Conclusions: ChatGPT-4o shows promise as a supportive tool for diagnostic reasoning and surgical education, particularly where standardized frameworks exist. Its limitations in ambiguous scenarios highlight the ongoing need for expert oversight. Future LLM development should emphasize specialty-specific training and context-aware reasoning to strengthen the role of these models in surgical decision support.
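The abstract reports a mean Flesch-Kincaid Grade Level of 16.4 but does not say which tool computed it. As a rough illustration only, the sketch below applies the standard formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The function names `count_syllables` and `flesch_kincaid_grade` are hypothetical. The syllable counter is a crude vowel-group heuristic (an assumption, not the study's method), so its scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic ("make"), but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


print(round(flesch_kincaid_grade("The cat sat on the mat."), 2))
```

Long sentences and polysyllabic terminology both push the score upward. This is why clinical prose of the kind described above tends to land at a grade level in the mid-teens, whereas a short monosyllabic sentence like the one in the example scores below zero.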
PMID:41303080 | PMC:PMC12653297 | DOI:10.3390/jcm14228045