Artificial Intelligence in Gastric Endoscopy: Decoding the Findings of a Landmark Multicenter Randomized Controlled Trial

Highlights

In a cohort of 29,514 patients, AI assistance did not significantly improve the primary outcome, the detection rate of pathologically confirmed gastric neoplasms (1.42% vs 1.25%; RR, 1.13; P = .25).
AI integration substantially reduced procedural blind spots from a mean of 2.52 to 1.07, suggesting a major role in standardizing endoscopic quality.
Exploratory subgroup analyses suggested greater benefit for less experienced endoscopists and during periods of high clinical fatigue.
The AI system achieved 100% diagnostic sensitivity for pathologically confirmed gastric adenocarcinoma, though its sensitivity for low-grade intraepithelial neoplasia was considerably lower (57.1%).

Background

Gastric cancer remains one of the leading causes of cancer-related mortality worldwide, particularly in East Asia. Prognosis depends heavily on the stage at diagnosis; however, the detection of early gastric neoplasms remains a significant clinical challenge. Conventional white-light esophagogastroduodenoscopy (EGD) is the gold standard for screening, but its performance is limited by human factors, such as endoscopist experience and procedural fatigue, and by the inherent difficulty of identifying subtle mucosal changes. Previous studies have estimated that up to 20% of gastric cancers may be missed during routine endoscopy.

To address these limitations, computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems based on deep learning have been developed. While early-phase studies and single-center trials have shown promise, high-quality evidence from large-scale multicenter randomized controlled trials (RCTs) has been sparse. The recent study by Dong et al., published in Gastroenterology, provides a critical evaluation of these technologies in a real-world, high-volume clinical setting.

Key Content

Study Methodology and Design

The trial was conducted across 24 hospitals in China between December 2021 and November 2023 and enrolled 29,514 participants, who were randomized to either AI-assisted EGD or conventional, nonassisted EGD. The primary outcome was the detection rate of gastric neoplasms (including gastric cancer and intraepithelial neoplasia) after central pathologic review. Secondary outcomes captured the broader clinical impact, including the number of blind spots, inspection time, and detection rates of precursor lesions such as intestinal metaplasia and gastric atrophy.

Primary and Secondary Outcomes

In contrast to some smaller pilot studies, the intention-to-treat (ITT) analysis showed that AI did not significantly improve the detection rate of pathologically confirmed gastric neoplasms: 1.42% in the AI group vs 1.25% in the control group (relative risk [RR], 1.13; 95% CI, 0.92-1.38; P = .25).
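To make the arithmetic behind this result concrete, the sketch below computes a relative risk with a Wald 95% confidence interval on the log scale and a two-sided z-test P value. The per-arm counts are not reported in this summary, so the event counts used here are hypothetical approximations reconstructed from the published percentages under an assumed 1:1 allocation (14,757 patients per arm). The helper name relative_risk_wald is illustrative, and this unadjusted calculation will not exactly reproduce the published, possibly adjusted, estimates. The second call applies the same approximation to the original-pathology rates (4.06% vs 3.57%) reported among the secondary outcomes.

```python
import math


def relative_risk_wald(events_ai, n_ai, events_ctrl, n_ctrl, z=1.959964):
    """Return (RR, lower, upper, two-sided P) for AI vs control.

    Unadjusted Wald interval on log(RR); ignores clustering by center.
    """
    rr = (events_ai / n_ai) / (events_ctrl / n_ctrl)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(1 / events_ai - 1 / n_ai + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    # Two-sided P value from the standard normal: 2 * (1 - Phi(|z|))
    z_stat = math.log(rr) / se_log_rr
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return rr, lower, upper, p_value


# Hypothetical counts: assumed 14,757 patients per arm; events approximated
# from the reported rates (1.42% vs 1.25% after central review).
N_PER_ARM = 14_757
rr, lo, hi, p = relative_risk_wald(210, N_PER_ARM, 184, N_PER_ARM)
print(f"Central review:     RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.2f}")

# Same approximation for the original pathology reports (4.06% vs 3.57%).
rr2, lo2, hi2, p2 = relative_risk_wald(599, N_PER_ARM, 527, N_PER_ARM)
print(f"Original pathology: RR {rr2:.2f} (95% CI {lo2:.2f}-{hi2:.2f}), P = {p2:.2f}")
```

With these approximate inputs, the two relative risks are nearly identical, but only the second interval excludes 1.0. The difference comes from the roughly threefold larger number of events, which shrinks the standard error of log(RR); this illustrates why a modest relative effect at a low event rate can fail to reach statistical significance.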

However, several secondary outcomes provided nuanced insights into the system’s performance:

Original Pathology Discrepancy: Using the original pathology reports (before central review), the AI group showed a statistically significant improvement in detection (4.06% vs 3.57%; RR, 1.14; P = .03). This suggests that AI may be identifying more lesions that are borderline or subject to inter-pathologist variability.
Quality Control: One of the most striking findings was the reduction in “blind spots.” The AI system monitored mucosal coverage in real time, reducing the mean number of missed areas from 2.52 to 1.07 (a relative reduction of about 58%; P < .001). This indicates that even where AI does not increase the number of confirmed neoplasms detected, it substantially improves the thoroughness of the examination.
Procedural Metrics: AI-assisted procedures were associated with longer inspection and total procedure times. This may reflect the time taken to evaluate AI-generated alerts and to obtain any additional biopsies prompted by the system.
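The real-time coverage monitoring described in the list above can be sketched as simple set bookkeeping. This is a minimal illustration under stated assumptions, not the trial system's implementation: it assumes a hypothetical upstream frame classifier that labels each video frame with an anatomical station (or None when no station is recognized), and it uses a hypothetical ten-station checklist. The actual model, station list, and alerting logic used in the trial are not described here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set

# Hypothetical checklist of anatomical stations (illustrative only).
REQUIRED_STATIONS = frozenset({
    "esophagus", "cardia", "fundus", "upper_body", "middle_body",
    "lower_body", "angulus", "antrum", "pylorus", "duodenal_bulb",
})


@dataclass
class CoverageMonitor:
    """Tracks which required stations have been visualized in one procedure."""
    required: frozenset = REQUIRED_STATIONS
    seen: Set[str] = field(default_factory=set)

    def record_frame(self, predicted_station: Optional[str]) -> None:
        # Ignore frames the hypothetical classifier could not assign to a station.
        if predicted_station in self.required:
            self.seen.add(predicted_station)

    def blind_spots(self) -> Set[str]:
        # Stations not yet visualized: candidates for a real-time prompt.
        return set(self.required) - self.seen


def run_procedure(frame_labels: Iterable[Optional[str]]) -> Set[str]:
    """Feed a sequence of per-frame labels and return the remaining blind spots."""
    monitor = CoverageMonitor()
    for label in frame_labels:
        monitor.record_frame(label)
    return monitor.blind_spots()


missing = run_procedure(["esophagus", "cardia", None, "antrum", "angulus", "antrum"])
print(len(missing), sorted(missing))
```

Averaging the size of the returned set across procedures would give a per-examination blind-spot count analogous to the means reported above, although the trial's own definition of a blind spot may differ from this sketch.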

Subgroup Analyses and Diagnostic Performance

Subgroup analyses offered perhaps the most clinically actionable data, although, as exploratory analyses, they should be interpreted with caution. The benefit of AI was more pronounced among endoscopists with fewer years of experience. Furthermore, during “fatigue periods” (late in the shift or on high-volume days), the AI system appeared to act as a safety net, maintaining detection rates that might otherwise have declined because of fatigue-related human error.

In terms of diagnostic accuracy, the system was highly sensitive for higher-grade lesions, detecting 100% of confirmed adenocarcinomas and 91.9% of high-grade intraepithelial neoplasia, whereas its sensitivity for low-grade intraepithelial neoplasia was considerably lower (57.1%).
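Per-lesion sensitivity is the proportion of pathologically confirmed lesions of a given type that the system flagged, TP / (TP + FN). Because this summary does not report the number of lesions behind each percentage, the sketch below uses hypothetical counts to show how strongly the precision of such an estimate depends on the denominator. It attaches a Wilson score 95% interval, one common choice for binomial proportions; the trial may have used a different method, and the function name is illustrative.

```python
import math


def sensitivity_wilson(true_pos, false_neg, z=1.959964):
    """Sensitivity TP / (TP + FN) with a Wilson score confidence interval."""
    n = true_pos + false_neg
    sens = true_pos / n
    denom = 1 + z**2 / n
    center = (sens + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(sens * (1 - sens) / n + z**2 / (4 * n**2))
    return sens, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts (not the trial's data): 100% sensitivity from
# 10 lesions versus from 100 lesions.
for tp, fn in [(10, 0), (100, 0)]:
    sens, lower, upper = sensitivity_wilson(tp, fn)
    print(f"{tp}/{tp + fn}: {sens:.0%} (95% CI {lower:.1%}-{upper:.1%})")
```

With 10 of 10 lesions detected, the lower bound is roughly 72%; with 100 of 100, it is roughly 96%. A reported 100% sensitivity is therefore only as reassuring as the number of lesions it rests on.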

Expert Commentary

The results of this trial present a paradox that requires careful interpretation. Although the study did not meet its primary endpoint, the data should not be read as a failure of AI technology. One plausible explanation is the high baseline proficiency of endoscopists at the participating centers, which may have created a “ceiling effect”: when the baseline detection rate is already high, the incremental benefit of AI on absolute detection rates is harder to demonstrate statistically.
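The statistical side of this argument can be illustrated with the standard normal-approximation sample-size formula for comparing two independent proportions. The sketch is illustrative only: it plugs the observed detection rates (1.25% vs 1.42%) into the formula with a two-sided alpha of 0.05, 80% power, and 1:1 allocation, all of which are assumptions rather than the trial's prespecified design parameters, and it ignores clustering by center. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm_two_proportions(p_ctrl, p_ai, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided test of two
    independent proportions (normal approximation, 1:1 allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_ctrl + p_ai) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_ctrl * (1 - p_ctrl) + p_ai * (1 - p_ai))
    ) ** 2
    return math.ceil(numerator / (p_ai - p_ctrl) ** 2)


# Observed rates used as hypothetical planning values.
n = n_per_arm_two_proportions(p_ctrl=0.0125, p_ai=0.0142)
print(f"~{n:,} per arm, ~{2 * n:,} in total")
```

Under these assumptions, the calculation calls for roughly 71,500 patients per arm, several times the 29,514 patients enrolled in total. An absolute difference of about 0.17 percentage points is therefore very hard to confirm even in a large trial, which is consistent with the ceiling-effect interpretation, although it does not by itself establish that explanation.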

The significant reduction in blind spots is arguably as important as the detection rate itself. Examination completeness plausibly serves as a surrogate for long-term cancer prevention, although its effect on outcomes such as interval cancer rates remains to be demonstrated. By ensuring that all anatomical regions are visualized, the AI system provides a standardized level of care that is less dependent on the individual endoscopist’s fatigue or workload. However, the longer procedure times and the discrepancy between original and centrally reviewed pathology suggest a risk of “over-diagnosis,” or at least of more biopsies of clinically insignificant lesions. Clinicians must balance the drive for 100% sensitivity against the potential for procedural inefficiency and patient anxiety stemming from benign biopsies.

Conclusion

This multicenter RCT provides some of the most comprehensive data to date on the role of AI in upper GI endoscopy. While AI-assisted devices may not yet be a “silver bullet” for increasing the detection of pathologically confirmed gastric neoplasms in expert hands, the secondary and exploratory findings point to a role in quality assurance, in assisting less experienced endoscopists, and in mitigating fatigue. The reduction in blind spots represents a significant step forward in procedural standardization.

Future research should focus on refining AI algorithms to improve sensitivity for low-grade lesions while preserving specificity, and on investigating the long-term impact on interval gastric cancer rates. For now, AI should be viewed as a sophisticated “co-pilot” that enhances procedural quality rather than a replacement for clinical judgment.

References

Dong Z, Wu L, Du H, et al. Effect of a Computer-Aided Device for Detecting Gastric Neoplasms: A Multicenter, Randomized Controlled Trial. Gastroenterology. 2026; PMID: 41801173.
Pimentel-Nunes P, et al. Endoscopic submucosal dissection: European Society of Gastrointestinal Endoscopy (ESGE) Guideline. Endoscopy. 2022.
Zhang M, et al. Deep learning in gastric cancer: A review. World J Gastroenterol. 2023.