Enhancing Surgical History-Taking Skills in Medical Students Using AI-Driven Simulated Patients: A Randomized Controlled Trial

Introduction

Effective communication is an indispensable skill in surgical practice, beginning fundamentally with the ability to take a comprehensive and accurate patient history. Proficiency in history-taking enables surgeons to gather essential clinical information, build rapport with patients, and tailor subsequent diagnostic and therapeutic interventions. Traditionally, medical curricula integrate simulation-based training methods to enhance these interpersonal and clinical communication skills, utilizing standardized or simulated patients (SPs) to provide interactive, experiential learning environments. However, the availability and standardization of human SPs can be limited by logistical, financial, and scheduling constraints.

Recent advancements in artificial intelligence (AI), particularly deep language learning models (DLMs) such as ChatGPT (OpenAI), offer innovative opportunities to augment medical education. These models can generate contextually appropriate, detailed, and nuanced simulated patient interactions in real time, potentially addressing the limitations of traditional simulation methods.

This article critically examines a recently published randomized controlled trial evaluating the integration of a DLM-based simulation tool as a virtual SP to enhance surgical history-taking skills among senior undergraduate medical students during their clinical rotations. The study’s implications for clinical education and future AI applications are discussed.

Study Background and Educational Need

Effective surgical communication begins with history-taking that is structured, patient-centered, and clinically relevant. Despite its importance, medical students often report insufficient confidence and variability in exposure to diverse clinical scenarios during surgical rotations. Simulation training has been employed to bridge this gap, yet dependence on human SPs remains resource-intensive.

Deep language learning models represent a frontier in educational technology, exhibiting abilities to maintain coherent, realistic dialogues and mimic human-like responses. Utilizing such AI-driven SPs could provide scalable, accessible, and consistent simulation experiences. However, robust evidence from randomized controlled trials assessing their educational efficacy in surgical training settings is limited.

Study Design and Methods

McCarrick et al. conducted a randomized controlled trial involving ninety senior medical students enrolled in a surgical module. Participants were allocated via randomized cluster sampling into two equal groups: a control arm receiving standard experiential learning during clinical rotations, and an intervention arm receiving additional training comprising three structured sessions engaging with a DLM (specifically ChatGPT, OpenAI) acting as the simulated patient.

The DLM-based interactions were driven by student-written prompts, and the conversation transcripts were subsequently submitted for tutor review to confirm clinical appropriateness and educational validity. All students undertook standardized Objective Structured Clinical Examinations (OSCEs) involving history-taking from a human SP. Assessors were blinded to group allocation to mitigate bias. Baseline OSCEs established initial competency, followed by a repeat assessment after either the intervention period or an equivalent duration of traditional learning.
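
The published protocol describes students prompting ChatGPT directly; for readers interested in running a comparable exercise programmatically, the sketch below shows how a simulated surgical patient could be set up with the OpenAI Python client. The model name, persona, and prompt wording are illustrative assumptions, not the study's actual materials.

```python
# Minimal sketch of an AI simulated patient for history-taking practice.
# Assumptions: the openai Python package is installed, OPENAI_API_KEY is set,
# and "gpt-4o" is available; none of this reflects the trial's actual setup.
from openai import OpenAI

client = OpenAI()

# Cast the model as a simulated patient with a surgical presenting complaint.
SYSTEM_PROMPT = (
    "You are a simulated patient attending a surgical assessment unit with "
    "right iliac fossa pain. Stay in character, answer only what is asked, "
    "and reveal relevant history (onset, character, associated symptoms, past "
    "surgical history) only in response to appropriate questions."
)

transcript = [{"role": "system", "content": SYSTEM_PROMPT}]

def ask_patient(student_question: str) -> str:
    """Send one student question and return the simulated patient's reply."""
    transcript.append({"role": "user", "content": student_question})
    response = client.chat.completions.create(model="gpt-4o", messages=transcript)
    reply = response.choices[0].message.content
    transcript.append({"role": "assistant", "content": reply})
    return reply

if __name__ == "__main__":
    print(ask_patient("Can you tell me what brought you in today?"))
    # The accumulated transcript can later be exported for tutor review,
    # mirroring the tutor-evaluation step described in the trial.
```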

Additionally, students in the intervention group completed an anonymous survey to capture subjective metrics including communication confidence, perceived realism and detail of the AI histories, and willingness to utilize the tool again.

Key Findings

After successful pilot testing, the formal trial engaged ninety participants evenly split between groups. Baseline OSCE scores reflecting history-taking competency were statistically comparable (p-value not specified) between groups.

Post-intervention results demonstrated a statistically significant improvement in OSCE scores only within the intervention group (p < 0.001). The effect size (Cohen's d) was 0.37 in the intervention group versus 0.19 in controls, indicating a small-to-moderate educational benefit attributable to the DLM simulation sessions.
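
For readers less familiar with the metric, Cohen's d is the mean score change divided by a pooled standard deviation; values around 0.2 are conventionally read as small and around 0.5 as moderate. The snippet below illustrates the calculation with made-up scores, not the trial data.

```python
# Worked illustration of Cohen's d for pre/post OSCE scores.
# The numbers are hypothetical and chosen only to demonstrate the arithmetic.
import statistics

def cohens_d(pre: list[float], post: list[float]) -> float:
    """Standardized mean difference (post minus pre) using a pooled SD."""
    mean_diff = statistics.mean(post) - statistics.mean(pre)
    pooled_sd = ((statistics.variance(pre) + statistics.variance(post)) / 2) ** 0.5
    return mean_diff / pooled_sd

pre_scores = [14.0, 15.5, 13.0, 16.0, 14.5]   # hypothetical baseline OSCE marks
post_scores = [15.0, 16.5, 14.0, 17.0, 15.5]  # hypothetical repeat OSCE marks

print(f"Cohen's d = {cohens_d(pre_scores, post_scores):.2f}")
```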

Tutors rated the DLM-generated content as uniformly appropriate and clinically relevant, affirming the model's capacity to produce contextually correct patient scenarios.

The survey response rate among intervention students was 62%. Of respondents, 57% reported increased confidence in their communication skills, 72% described the DLM-generated histories as rich and detailed, and 95% expressed willingness to use the simulation tool again in future training.

No adverse effects or detriments to learning were identified. The data collectively support enhanced pedagogical outcomes attributable to DLM-augmented simulation.

Expert Commentary and Considerations

This pioneering study provides rigorous evidence supporting the integration of advanced AI language models into surgical education, marking a potential paradigm shift in simulation training.

The randomized controlled design, blinded assessors, and objective competency measures strengthen the validity of the findings. The small-to-moderate effect size (Cohen's d = 0.37) reflects a tangible gain in skill acquisition, which is educationally significant when weighed against the efficiency benefits of AI-driven tools.

However, some limitations merit attention. The study was confined to a single institution and a specific cohort, which may affect generalizability. Longer-term retention of skills post-intervention was not assessed, nor was the tool’s efficacy in more complex or diverse surgical scenarios.

Further research should explore longitudinal outcomes, integration with other educational modalities, and scalability across institutions. Additionally, ethical considerations regarding AI use in education, including data privacy, student dependence on AI, and potential biases in model outputs, must be rigorously addressed.

Notwithstanding these considerations, this study aligns with an emerging body of literature advocating for thoughtful incorporation of AI to complement, not replace, human clinical teaching.

Conclusions

The adoption of a deep language learning model as a simulated patient significantly improved surgical history-taking competence and student confidence in a randomized controlled trial of senior undergraduate medical students. As AI technology evolves, such tools have the potential to enhance surgical education by providing standardized, accessible, and engaging simulation experiences.

This innovation addresses critical gaps in experiential learning and resource constraints inherent in traditional simulation programs. Future efforts should aim to validate these findings in broader settings and develop guidelines to optimize AI integration while safeguarding educational quality and ethical standards.

References

McCarrick CA, McEntee PD, Boland PA, Donnelly S, O’Meara Y, Heneghan H, Cahill RA. A Randomized Controlled Trial of a Deep Language Learning Model-Based Simulation Tool for Undergraduate Medical Students in Surgery. J Surg Educ. 2025 Sep;82(9):103629. doi: 10.1016/j.jsurg.2025.103629. Epub 2025 Jul 28. PMID: 40729832.

Kneebone R. Simulation in surgical training: Educational issues and practical implications. Med Educ. 2003;37(3):267-77.

Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019 Jan;25(1):44-56.

Muller AM, et al. Artificial intelligence-enabled virtual patients for medical education: A scoping review. BMC Med Educ. 2023;23(1):123.
