Cureus. 2025 Apr 8;17(4):e81906. doi: 10.7759/cureus.81906. eCollection 2025 Apr.
ABSTRACT
Background
Midshaft clavicle fractures are common orthopaedic injuries with no consensus on optimal management. Large language models (LLMs) such as ChatGPT (OpenAI, San Francisco, USA) represent a novel tool for patient education and clinical decision-making. This study aimed to evaluate the accuracy and consistency of ChatGPT's responses to patient-focused and clinical decision-making questions regarding this injury.
Methods
ChatGPT-4o mini was prompted three times with each of 14 patient-focused and orthopaedic clinical decision-making questions, and references were requested for each response. Response accuracy was graded as (I) comprehensive; (II) correct but inadequate; (III) a mix of correct and incorrect information; or (IV) completely incorrect. Two consultant and two trainee orthopaedic surgeons evaluated the accuracy and consistency of the responses, and the references provided by ChatGPT were assessed for accuracy.
Results
All 42 responses were graded as (III), indicating a mix of correct and incorrect information, with 78.6% consistency across responses. Of the 128 references provided, 0.8% were correct, 10.9% were incorrect, and 88.3% were fabricated. Only 3.1% of references accurately reflected the cited conclusions.
Conclusion
ChatGPT demonstrates limitations in accuracy and consistency when answering patient-focused queries or aiding orthopaedic clinical decision-making for midshaft clavicle fractures. Caution is advised before integrating ChatGPT into clinical workflows for patients or orthopaedic clinicians.
PMID:40342470 | PMC:PMC12059606 | DOI:10.7759/cureus.81906