J Hand Surg Glob Online. 2025 Sep 20;7(6):100831. doi: 10.1016/j.jhsg.2025.100831. eCollection 2025 Nov.
ABSTRACT
PURPOSE: The rise of artificial intelligence (AI) in health care comes with increasing concerns about the use and integrity of the information it generates. Chat Generative Pre-Trained Transformer (ChatGPT) 3.5, Google Gemini, and Bing Copilot are free AI chatbot platforms that may be used for answering medical questions and disseminating medical information. Given that carpal tunnel syndrome accounts for 90% of all neuropathies, it is important to understand the accuracy of the information patients may be receiving. The purpose of this study is to determine the use and accuracy of responses generated by ChatGPT, Google Gemini, and Bing Copilot in answering frequently asked questions about carpal tunnel syndrome.
METHODS: Two independent authors scored responses using the DISCERN tool. DISCERN consists of 15 questions assessing health information on a five-point scale, with total scores ranging from 15 to 75 points. Then, a two-factor analysis of variance was conducted, with scorer and chatbot type as the factors.
RESULTS: One-way analysis of variance revealed no significant difference in DISCERN scores among the three chatbots. The chatbots each scored in the "fair" range, with means of 45 for ChatGPT, 48 for Bing Copilot, and 46 for Google Gemini. The average Journal of the American Medical Association score for ChatGPT and Google Gemini surpassed that of Bing Copilot, with averages of 2.3, 2.3, and 1.8, respectively.
CONCLUSIONS: ChatGPT, Google Gemini, and Bing Copilot platforms generated relatively reliable answers for potential patient questions about carpal tunnel syndrome. However, users should continue to be aware of the shortcomings of the information provided, given the lack of citations, potential for misconstrued information, and perpetuated biases that inherently come with using such platforms. Future studies should explore the response quality for less common orthopedic pathologies and assess patient perceptions of response readability to determine the value of AI as a patient resource across the medical field.
TYPE OF STUDY/LEVEL OF EVIDENCE: Cross-sectional study V.
PMID:41049275 | PMC:PMC12489814 | DOI:10.1016/j.jhsg.2025.100831