Introduction
Puerperal mastitis is a common inflammatory condition affecting breastfeeding women, characterized by breast pain, swelling, and sometimes infection. Effective outpatient management is critical to ensure prompt treatment, patient comfort, and prevention of complications such as abscess formation. The integration of artificial intelligence (AI) into clinical workflows has the potential to enhance outpatient decision-making and patient education. ChatGPT, an advanced language model developed by OpenAI, has attracted attention for its ability to provide information and support to both clinicians and patients. However, its performance in specific outpatient settings, such as general surgery cases involving puerperal mastitis, remains underexplored.
Objectives
This study aimed to evaluate the utility of ChatGPT-4 as a virtual outpatient assistant in managing puerperal mastitis. Specifically, the research assessed the accuracy, clarity, comprehensiveness, adherence to clinical guidelines, and safety of ChatGPT-4’s responses to common patient questions posed in Turkish.
Methods
Fifteen frequently asked questions about puerperal mastitis were collected from public health websites and online forums that serve Turkish-speaking patients. These questions were grouped into four categories: general information (2 questions), symptoms and diagnosis (6 questions), treatment (2 questions), and prognosis (5 questions).
Each question was submitted to ChatGPT-4 on September 3, 2024, and a single Turkish-language answer was recorded. A panel of five evaluators (three board-certified general surgeons and two general surgery residents) assessed each response against five criteria:
1. Sufficient length to cover the topic adequately
2. Use of language understandable to patients
3. Accuracy of medical information provided
4. Compliance with current clinical guidelines
5. Patient safety of the recommendations
Quantitative measures included the DISCERN instrument for assessing the quality of written health information, the JAMA benchmark criteria, and the Flesch-Kincaid readability score adapted for Turkish. Inter-rater reliability was computed with the intraclass correlation coefficient (ICC).
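The summary does not say which Turkish adaptation of the readability score was used. As a rough illustration only, the sketch below assumes Ateşman's formula, a commonly cited Turkish adaptation of the Flesch reading-ease score, and counts syllables as vowels, which holds for standard Turkish orthography. The function names and the sample sentence are hypothetical and are not taken from the study.

```python
import re

# Turkish vowels in both cases, plus circumflexed forms seen in loanwords.
TURKISH_VOWELS = set("aeıioöuüâîûAEIİOÖUÜÂÎÛ")


def count_syllables(word: str) -> int:
    """Each Turkish syllable contains exactly one vowel, so count the vowels."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Readability score under the ASSUMED Atesman formula.

    score = 198.825 - 40.175 * (syllables per word) - 2.610 * (words per sentence)
    Higher values indicate easier text.
    """
    # Sentences are split on runs of ., ! or ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters (Unicode-aware), excluding digits and underscores.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


if __name__ == "__main__":
    # Hypothetical sample text, not one of the study's recorded answers.
    sample = "Mastit emzirme döneminde görülebilir. Ağrı ve şişlik olabilir."
    print(round(atesman_score(sample), 2))
```

The regular-expression tokenizer is deliberately simple: abbreviations and decimal numbers would be miscounted as sentence breaks, so a real analysis would need a proper Turkish sentence splitter.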
Results
All 15 questions were evaluated. Overall, ChatGPT-4’s answers were rated “excellent” by the panel, especially for treatment and prognosis-related queries. Statistical analysis revealed a significant difference in DISCERN scores across question categories (P = .01), with treatment and prognosis questions scoring higher than general information and symptoms-and-diagnosis questions. However, no significant differences between question categories were found in the evaluator ratings for length, understandability, accuracy, guideline adherence, or patient safety, nor in JAMA benchmark scores or readability levels (P > .05).
Overall inter-rater agreement was good (ICC = 0.772), although agreement varied across individual criteria. Correlation analysis indicated no significant overall association between subjective evaluator ratings and objective quality measures. Notably, a strong positive correlation was identified between guideline compliance and patient safety ratings for one particular question (r = 0.968, P < .001).
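For readers unfamiliar with the ICC, the sketch below shows how such an agreement coefficient can be computed from a table of ratings (one row per rated response, one column per evaluator). The summary does not state which ICC form was used; this example assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name and the demonstration ratings are illustrative and are not data from the study.

```python
from typing import Sequence


def icc2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per rated item and one column per rater.
    """
    n = len(ratings)                      # number of rated items
    k = len(ratings[0]) if n else 0       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 items and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when the ratings have no variance")
    return (ms_rows - ms_error) / denom


if __name__ == "__main__":
    # Illustrative ratings only: 6 items scored by 4 raters.
    demo = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc2_1(demo), 3))
```

In the study's setting the table would have 15 rows (responses) and 5 columns (evaluators). Other ICC forms, such as consistency or average-rater variants, use the same mean squares with a different denominator and can give noticeably different values.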
Discussion
The findings suggest that ChatGPT-4 can provide reliable and clear information on puerperal mastitis, particularly regarding treatment options and prognosis, which are critical areas for patient decision-making. The model’s responses were generally well-aligned with current clinical guidelines and emphasized patient safety.
However, variability among evaluators and the subjective nature of some assessments underscore the necessity for continued refinement of AI tools in clinical contexts. The absence of strong correlations between subjective and objective quality metrics highlights challenges in evaluating AI-generated health information.
Future research should focus on enhancing the reliability of AI assistants through iterative questioning techniques and regular updates to their medical knowledge bases. This dynamic approach can improve the accuracy, clarity, and clinical safety of AI responses, making them more effective and accessible for outpatient care.
Conclusion
ChatGPT-4 demonstrates promising capability as a virtual outpatient assistant in puerperal mastitis management, especially in delivering treatment and prognosis information. Nonetheless, further optimization and rigorous validation are required before widespread clinical adoption. Integrating AI tools with ongoing clinician oversight will be key to maximizing benefits while safeguarding patient safety.
References
Dolu F, Ay OF, Kupeli AH, Karademir E, Büyükavcı MH. Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study. JMIR Med Inform. 2025 Jul 24;13:e68980. doi: 10.2196/68980. PMID: 40705609; PMCID: PMC12288767.