
Evaluation of Google and ChatGPT responses to common patient questions about scoliosis

Spine Deform. 2025 Aug 17. doi: 10.1007/s43390-025-01169-x. Online ahead of print.

ABSTRACT

OBJECTIVE: Scoliosis is primarily seen during adolescence and often causes significant concern among patients and their families when the deformity becomes noticeable. With technological advancements, patients frequently search the Internet for information regarding their disease’s diagnosis, treatment, prognosis, and potential complications. This study aims to assess the quality of Google and ChatGPT responses to questions about scoliosis.

METHODS: A Google search was conducted with the keyword "scoliosis," and the first ten questions listed under the "People Also Ask" (FAQ) section were recorded. Responses to these questions from ChatGPT and Google were rated on a four-level scale: "excellent response not requiring clarification," "satisfactory response requiring minimal clarification," "satisfactory response requiring moderate clarification," and "unsatisfactory response requiring substantial clarification." The source of each response was also categorized as academic, commercial, medical practice, governmental, or social media.
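
To make the scoring protocol concrete, here is a minimal sketch of how such ratings could be recorded. This is an illustration built from the abstract's rating scale and source categories only; the function name, question text, and example data are hypothetical and are not the study's actual instrument.

```python
# Illustrative sketch of the evaluation scheme described above.
# The four-level scale and source categories come from the abstract;
# everything else (names, example data) is hypothetical.

RATING_SCALE = {
    1: "Excellent response not requiring clarification",
    2: "Satisfactory response requiring minimal clarification",
    3: "Satisfactory response requiring moderate clarification",
    4: "Unsatisfactory response requiring substantial clarification",
}

SOURCE_CATEGORIES = {
    "academic", "commercial", "medical practice", "governmental", "social media",
}

def record_rating(question: str, platform: str, rating: int, source: str) -> dict:
    """Record a single rater's judgment of one platform's answer."""
    if rating not in RATING_SCALE:
        raise ValueError(f"rating must be 1-4, got {rating}")
    if source not in SOURCE_CATEGORIES:
        raise ValueError(f"unknown source category: {source}")
    return {"question": question, "platform": platform,
            "rating": rating, "source": source}

# Example: one rater scores a ChatGPT answer sourced from an academic site.
entry = record_rating("What causes scoliosis?", "ChatGPT", 1, "academic")
print(entry)
```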

RESULTS: ChatGPT provided excellent responses requiring no clarification for 9 of the 10 questions (90%). In contrast, none of Google's responses was rated excellent: 50% were unsatisfactory, requiring substantial clarification; 40% were satisfactory, requiring moderate clarification; and 10% were satisfactory, requiring minimal clarification. ChatGPT drew 60% of its responses from academic resources and 40% from medical practice websites. Google, conversely, used no scholarly sources: 50% of its responses came from commercial websites, 30% from medical practice sources, and 20% from social media. Interrater agreement among the four raters, pooled across both platforms and assessed with Fleiss' multirater kappa, was moderate (κ = 0.48) and statistically significant (p < 0.001).
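
As a concrete illustration of the reliability analysis, Fleiss' kappa for multiple raters and categorical ratings can be computed with statsmodels. The ratings matrix below is fabricated for demonstration and does not reproduce the study's data:

```python
# Sketch: Fleiss' kappa for 4 raters assigning one of 4 quality levels
# (0 = excellent ... 3 = unsatisfactory) to 10 rated responses.
# The ratings below are invented for illustration only.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = rated responses, columns = the 4 raters.
ratings = np.array([
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [3, 2, 3, 3],
    [1, 1, 2, 1],
    [2, 2, 3, 2],
    [0, 1, 0, 0],
    [3, 3, 3, 2],
    [0, 0, 1, 0],
    [2, 2, 2, 2],
    [3, 3, 2, 3],
])

# Convert the subjects-by-raters matrix into subjects-by-categories
# counts, the input format fleiss_kappa expects.
table, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa = {kappa:.2f}")  # ≈ 0.45 here; 0.41-0.60 is conventionally "moderate"
```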

CONCLUSION: ChatGPT outperformed Google, providing more accurate, better-referenced responses drawn from more credible academic sources. This suggests its potential as a more reliable tool for obtaining health-related information about scoliosis.

PMID:40819320 | DOI:10.1007/s43390-025-01169-x

By Nevin Manimala