Int J Rheum Dis. 2026 May;29(5):e70676. doi: 10.1111/1756-185x.70676.
ABSTRACT
BACKGROUND: Although artificial intelligence (AI) is increasingly recognized for enhancing efficiency in healthcare services, its role in exercise and rehabilitation strategies remains unclear.
OBJECTIVES: To assess the quality, reliability, accuracy, and readability of three large language models (LLMs), ChatGPT-5, DeepSeek-R1, and Gemini 2.5, in response to questions commonly asked by patients with rheumatoid arthritis (RA) regarding exercise and rehabilitation strategies.
METHODS: Using a cross-sectional comparative design, a structured assessment framework of exercise- and rehabilitation-related questions was developed and administered between 22 and 29 September 2025. Questions were grouped into five thematic domains: exercise and physical activity (S1), hand function (S2), joint protection techniques (S3), breathing and pulmonary health (S4), and general topics (S5). Information quality was evaluated with the modified DISCERN tool, content reliability with the Reliability Score, and accuracy with a five-point Likert Accuracy Scale. Readability was determined using the Flesch Reading Ease scale.
RESULTS: DeepSeek-R1 and ChatGPT-5 achieved significantly higher scores for quality, reliability, accuracy, and readability than Gemini 2.5. In the S1 and S2 subgroups, both models consistently outperformed Gemini 2.5 across all evaluation metrics. Mean Flesch Reading Ease scores were 50.20 for DeepSeek-R1, 46.66 for ChatGPT-5, and 37.33 for Gemini 2.5, indicating that responses from all three models were classified as difficult to read.
CONCLUSIONS: This study showed that DeepSeek-R1 and ChatGPT-5 generated more accurate and reliable RA-related responses than Gemini 2.5. However, the complex language used by all three models may limit accessibility for patients with low health literacy, underscoring the need for professional supervision in RA exercise planning.
PMID:42084335 | DOI:10.1111/1756-185x.70676

