Artificial Intelligence Can Answer Postoperative Questions About Distal Radius Fractures - But Can Patients Understand the Answers?

Written on 08/12/2025
by Rae Tarapore

J Hand Surg Glob Online. 2025 Sep 12;7(6):100822. doi: 10.1016/j.jhsg.2025.100822. eCollection 2025 Nov.

ABSTRACT

PURPOSE: The purpose of this study was to assess the validity, reliability, and readability of responses from ChatGPT, Microsoft Copilot, and Google Gemini to common postoperative patient questions about distal radius fractures.

METHODS: Twenty-seven thoroughly vetted questions regarding distal radius fracture repair surgery were compiled and entered into ChatGPT 4, Gemini, and Copilot. The responses were analyzed for quality, accuracy, and readability using the DISCERN scale, the Journal of the American Medical Association benchmark criteria, Flesch-Kincaid Reading Ease Score, and Flesch-Kincaid Grade Level. Citations provided by Google Gemini and Microsoft Copilot were further categorized by source of reference. Five questions were resubmitted with a request to simplify the responses, and the simplified responses were re-evaluated using the same metrics.
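For context, the two readability indices are computed from standard published formulas (these definitions are general knowledge about the metrics, not details reported in this study):

\[ \text{FKRE} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \]

\[ \text{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

Lower FKRE values indicate harder text, and FKGL maps approximately onto US school grade levels, which is how the scores in the results translate into a required reading level.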

RESULTS: All three artificial intelligence platforms produced answers of "good" quality (DISCERN scores >50). Copilot had the highest quality of information (68.3), followed by Gemini (62.9) and ChatGPT (52.9). The information provided by Copilot also demonstrated the highest reliability, with a score of 3 (of 4) on the Journal of the American Medical Association benchmark criteria, compared with Gemini (1) and ChatGPT (0). All three platforms generated complex texts, with Flesch-Kincaid Reading Ease Scores ranging between 35.8 and 41.4 and Flesch-Kincaid Grade Level scores between 10.5 and 12.1, indicating that at least a high-school graduate reading level is required. After simplification, Gemini's reading level remained unchanged, whereas ChatGPT improved to a seventh-grade reading level and Copilot improved to an eighth-grade reading level. Copilot provided a higher number of references (74) than Gemini (36).

CONCLUSIONS: All three platforms provided safe and reliable answers to postoperative questions about distal radius fractures. The high reading level of AI-generated responses remains the biggest barrier to patient accessibility.

CLINICAL RELEVANCE: In their current state, mainstream AI platforms are best suited as adjunct tools that support, rather than replace, clinical communication from health care workers.

PMID:41356624 | PMC:PMC12675809 | DOI:10.1016/j.jhsg.2025.100822