Endoscopic Scoring After Crohn’s Surgery: Which Indices Reliably Detect Recurrence? Lessons from the PREVENT Trial Analysis

Highlights

• Central-reader analysis of 70 ileocolonoscopy videos from the PREVENT trial found substantial interrater reliability (ICC 0.74–0.80) and large responsiveness (WinP 0.75–0.83) for the Rutgeerts and modified Rutgeerts scores, ileal REMIND, SES-CD and CDEIS in the neoterminal ileum.

• The POCER index and anastomotic REMIND score demonstrated lower reliability and minimal responsiveness (ICC 0.49 and 0.30; WinP ≈0.54 and 0.53 respectively).

• All indices performed worse for assessments at the anastomosis or distal colon compared with the neoterminal ileum, highlighting a persistent measurement gap for postoperative evaluation beyond the ileum.

Background

For patients with Crohn’s disease (CD) who undergo ileocolic resection, endoscopic assessment of the neoterminal ileum and anastomosis is central to detecting early postoperative recurrence and guiding preventive therapy. Multiple endoscopic scoring systems exist — the Rutgeerts score and its variants, POCER index, REMIND score, SES-CD and CDEIS — but their comparative measurement properties when applied to postoperative videos have been incompletely characterized. Reliable (reproducible) and responsive (able to detect treatment-related differences) indices are essential both for clinical decision-making and for use as trial endpoints. The PREVENT trial, a randomized, placebo-controlled study of infliximab (REMICADE) to prevent postoperative recurrence, provided a set of standardized ileocolonoscopy videos allowing a rigorous assessment of these measurement properties.

Study design

This study is a secondary, blinded central-reader analysis of ileocolonoscopy videos obtained in the PREVENT trial (Prospective, Multicenter, Randomized, Double-Blind, Placebo-Controlled Trial Comparing REMICADE ® [infliximab] and Placebo in the Prevention of Recurrence in Crohn’s Disease Patients Undergoing Surgical Resection Who Are at an Increased Risk of Recurrence). Seventy videos were independently reviewed by three central readers blinded to treatment allocation and clinical data. Disease activity was scored in the neoterminal ileum, anastomosis and distal colon using multiple indices: Rutgeerts and modified Rutgeerts scores, the POCER index, the REMIND score (ileal and anastomotic components), the Simple Endoscopic Score for Crohn’s Disease (SES-CD), and the Crohn’s Disease Endoscopic Index of Severity (CDEIS).

Methods — reliability and responsiveness metrics

Interrater reliability was quantified using the intraclass correlation coefficient (ICC), interpreted with conventional descriptive bands (ICC <0.40 fair to poor; 0.40–0.60 moderate; 0.61–0.80 substantial; >0.80 almost perfect). Responsiveness to infliximab was quantified using the win probability (WinP): the probability that a randomly selected patient in the infliximab arm had a better (lower) score than a randomly selected patient in the placebo arm, with tied pairs conventionally counted as one half. WinP values near 0.5 indicate little or no discriminatory power; values above approximately 0.7 indicate large responsiveness in favor of infliximab.
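The win probability defined above can be computed by comparing every infliximab–placebo pair of patients. The sketch below is a minimal illustration, not the analysis code used by Hanzel et al.; it assumes that lower scores are better and that tied pairs count as one half, and the function name `win_probability` and the example scores are hypothetical. Computed this way, WinP equals the Mann–Whitney U statistic for the treated arm divided by the number of cross-arm pairs.

```python
def win_probability(treated, control):
    """Return P(treated < control) + 0.5 * P(tie) over all cross-arm pairs.

    Assumes lower endoscopic scores indicate less inflammation (a better outcome).
    """
    if not treated or not control:
        raise ValueError("both arms need at least one score")
    wins = ties = 0
    for t in treated:
        for c in control:
            if t < c:
                wins += 1  # treated patient has the better (lower) score
            elif t == c:
                ties += 1  # tied pairs are split evenly between the arms
    return (wins + 0.5 * ties) / (len(treated) * len(control))


# Hypothetical ordinal scores (e.g., graded 0-4); not data from PREVENT.
infliximab_arm = [0, 0, 1]
placebo_arm = [1, 2, 2]
print(win_probability(infliximab_arm, placebo_arm))  # 8.5 / 9, about 0.944
```

Identical score distributions in the two arms give exactly 0.5, the no-discrimination value; with ordinal scores that tie often, the half-credit convention keeps the metric symmetric between arms.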

Key findings

Interrater reliability

Substantial reliability was observed for the Rutgeerts and modified Rutgeerts scores, the ileal REMIND score, SES-CD and CDEIS (ICC range 0.74–0.80). The POCER index showed moderate reliability (ICC 0.49). The anastomotic component of the REMIND score performed poorly (ICC 0.30), reflecting marked between-reader disagreement for lesions at the anastomosis.
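The ICC values above summarize agreement among the three central readers. The exact ICC form used in the PREVENT analysis is not stated here, so the sketch below assumes the common two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss notation). The function name `icc_2_1` and the data layout (one row per video, one column per reader, no missing values, at least two videos and two readers) are illustrative assumptions; the check uses the worked example from Shrout and Fleiss (1979), whose published ICC(2,1) is 0.29.

```python
from statistics import fmean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `scores` has one row per video and one column per reader; no missing values.
    """
    n = len(scores)     # number of videos (subjects)
    k = len(scores[0])  # number of readers
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), readers (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Under absolute agreement, systematic reader differences count as disagreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Shrout and Fleiss (1979) example: 6 targets rated by 4 judges.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

A reader who scores systematically higher than the others lowers ICC(2,1) but not the consistency form ICC(3,1), which is why the ICC form should be reported alongside the estimate.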

Responsiveness (treatment discrimination)

Indices that were both reliable and responsive included the Rutgeerts and modified Rutgeerts scores, ileal REMIND, SES-CD and CDEIS, each showing a large degree of responsiveness (WinP 0.75–0.83). In practical terms, a WinP of 0.80 means that a randomly chosen infliximab-treated patient had roughly an 80% chance of a better score than a randomly chosen placebo-treated patient, and for these indices that separation came with consistent agreement between readers.

By contrast, the POCER index and anastomotic REMIND score showed only small degrees of responsiveness (WinP ≈0.54 and 0.53, respectively), close to the no-discrimination value of 0.5 and compatible with weak discrimination between infliximab and placebo.

Segment-specific performance

Across indices, assessments restricted to the neoterminal ileum produced higher reliability and responsiveness estimates than assessments of the anastomosis or distal colon. This suggests that existing instruments are better suited to grading the ileal mucosa than to consistently grading anastomotic lesions or more distal colonic recurrence in the postoperative setting.

Implications for clinical interpretation

When the goal is to detect postoperative recurrence and to evaluate treatment effects (for example in clinical trials), the Rutgeerts-based instruments, ileal REMIND, SES-CD and CDEIS provide the best combination of reproducibility and sensitivity in the neoterminal ileum. Use of the POCER index or anastomosis-specific REMIND components may introduce more measurement noise and reduce the ability to detect treatment effects.

Expert commentary and contextualization

This work addresses a practical measurement problem in postoperative CD management. The Rutgeerts score remains a cornerstone of postoperative risk stratification, and this analysis reinforces its value — and that of several ileal-focused indices — when used by blinded central readers. Important clinical and methodological considerations follow.

Why do anastomotic assessments perform poorly?

Anastomoses are anatomically and visually heterogeneous: staple lines, mucosal apposition, surgical technique, short-segment ischemia, and scarring can all mimic or obscure active inflammation. Endoscopic features (mucosal edema, neovascularization, small erosions) may be subtle and variably interpreted. The low ICC for the anastomotic REMIND score underscores the need for clearer definitions, standardized imaging protocols (e.g., insufflation, irrigation, multiple views), and potentially adjunctive imaging modalities (high-definition imaging, chromoendoscopy) to improve agreement.

Practical impact on trials and practice

For clinical trials in the postoperative setting, selecting primary endoscopic endpoints with demonstrated reliability and responsiveness is crucial to ensure adequate statistical power and interpretability. The findings support prioritizing ileal scores (Rutgeerts, modified Rutgeerts, ileal REMIND, SES-CD, CDEIS) for endpoint selection and central reading. In routine clinical practice, central reading is rarely feasible, so awareness of lower agreement at the anastomosis should temper single-observer decisions (e.g., escalation of therapy) and may prompt adjunctive evaluation (radiology, biomarkers, repeat endoscopy, or multidisciplinary review).
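The link between reliability and statistical power can be made concrete with classical test theory: if reader error is independent of the true score, a standardized treatment effect is attenuated by the square root of the score's reliability, so the required sample size grows roughly in proportion to 1/reliability. The sketch below is illustrative only. It treats the single-reader ICC as the reliability, uses a normal approximation for a two-arm comparison that ignores the ordinal nature of endoscopic scores, and the true effect size of 0.8 and the function name `n_per_arm` are hypothetical, not PREVENT estimates.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(true_effect_d, reliability, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison.

    The observed standardized effect is the true effect scaled by sqrt(reliability),
    following the classical-test-theory attenuation formula.
    """
    observed_d = true_effect_d * sqrt(reliability)
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / observed_d ** 2)


# Hypothetical true effect d = 0.8; reliabilities taken from the ICCs reported above.
for label, icc in [("perfect", 1.0), ("ICC 0.80", 0.80), ("ICC 0.49", 0.49), ("ICC 0.30", 0.30)]:
    print(label, n_per_arm(0.8, icc))
```

With these inputs the sketch returns 25, 31, 51 and 82 patients per arm, so under these assumptions an index with an ICC of 0.30 would need roughly 2.7 times as many patients as one with an ICC of 0.80 to detect the same true effect.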

Limitations to consider

Several limitations merit emphasis. The analysis used a modest sample of 70 videos drawn from a randomized trial population at increased risk of recurrence, so results may not generalize to all postoperative cohorts. The central readers were experts; community gastroenterologists may show different reliability patterns. WinP assesses between-patient discrimination rather than within-patient change over time; complementary responsiveness metrics (e.g., the standardized response mean) might provide additional insight. Finally, the study addresses endoscopic measurement properties but does not link them directly to long-term clinical outcomes (e.g., symptomatic recurrence, reoperation).
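The standardized response mean mentioned above is the mean within-patient change divided by the standard deviation of that change. A minimal sketch, assuming paired scores from two endoscopies in the same patients and that a fall in score is an improvement; the function name `standardized_response_mean` and the example values are hypothetical.

```python
from statistics import fmean, stdev


def standardized_response_mean(before, after):
    """SRM = mean within-patient change / SD of the changes.

    Change is before - after, so a positive SRM means scores fell (improved).
    """
    if len(before) != len(after):
        raise ValueError("scores must be paired patient by patient")
    changes = [b - a for b, a in zip(before, after)]
    return fmean(changes) / stdev(changes)  # stdev needs at least two pairs


# Hypothetical paired scores for four patients.
print(standardized_response_mean([4, 3, 2, 4], [1, 1, 1, 2]))
```

Unlike WinP, this metric needs repeated assessments in the same patients, and it is undefined when every patient changes by exactly the same amount (zero standard deviation).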

Conclusions

This central-reader analysis of PREVENT trial videos demonstrates that commonly used endoscopic indices (notably the Rutgeerts and modified Rutgeerts scores, ileal REMIND, SES-CD and CDEIS) exhibit substantial interrater reliability and large responsiveness to infliximab treatment when used to assess the neoterminal ileum after ileocolic resection. Conversely, the anastomotic REMIND component and the POCER index perform less well, with lower agreement and limited treatment discrimination. These results support continued use of ileal-focused endoscopic endpoints in postoperative trials and highlight a measurement gap at the anastomosis and in the distal colon that requires methodological refinement.

Clinical and research implications

For investigators: prioritize ileal indices with proven reliability and responsiveness when designing postoperative Crohn’s disease trials; use central blinded readers where feasible. For clinicians: interpret anastomotic or distal colonic endoscopic findings cautiously and consider corroborating evidence (clinical course, biomarkers, imaging) before escalating therapy. For researchers: develop clearer anastomosis-specific definitions, standardized imaging protocols, and explore adjunctive modalities (high-definition imaging, dye-based chromoendoscopy, AI-assisted image analysis) to improve reproducibility.

Funding and trial registration

Funding and detailed trial registration information for the PREVENT trial are reported in the primary trial publications and in Hanzel J et al. Clin Gastroenterol Hepatol. 2025 (citation below). Readers should consult the PREVENT trial documentation for sponsor and registration identifiers.

References

1. Hanzel J, Vuyyuru SK, Bressler B, et al. Reliability and Responsiveness of Endoscopic Indices for Assessing Crohn’s Disease Postoperative Recurrence in the PREVENT trial. Clin Gastroenterol Hepatol. 2025 Sep 2:S1542-3565(25)00741-4. doi: 10.1016/j.cgh.2025.08.021. PMID: 40907850.

2. Rutgeerts P, Geboes K, Vantrappen G, et al. Predictability of the postoperative course of Crohn’s disease. Gastroenterology. 1990;99(4):956–963.

3. Daperno M, D’Haens G, Van Assche G, et al. Development and validation of a new, simplified endoscopic activity score for Crohn’s disease: the SES-CD. Gastrointest Endosc. 2004;60(4):505–512.

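4. Mary JY, Modigliani R. Development and validation of an endoscopic index of the severity for Crohn’s disease: a prospective multicentre study. Groupe d’Etudes Thérapeutiques des Affections Inflammatoires du Tube Digestif (GETAID). Gut. 1989;30(7):983–989.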

5. De Cruz P, Kamm MA, Hamilton AL, et al. Crohn’s disease management after intestinal resection: a randomised trial. Lancet. 2015;385(9976):1406–1417.

Note: References 2 and 5 are given as classic, well-known sources for the described indices and the POCER framework; readers are encouraged to consult original index publications and guideline documents (e.g., ECCO postoperative management guidance) for full methodological detail.

Author note

This article was prepared by a medical science writer to synthesize and interpret results from Hanzel et al. (2025) for clinicians and researchers. For operational details (scoring manuals, image examples), consult the original publication and supplementary materials.
