Background: The Challenge of Cardiovascular Event Adjudication in Clinical Trials
Clinical endpoint classification (CEC) represents the gold standard for cardiovascular endpoint measurement in contemporary clinical trials. This meticulous process ensures that endpoint events are classified consistently and reproducibly, thereby minimizing bias and enhancing the validity of trial outcomes. However, the traditional CEC approach carries substantial practical burdens: it requires significant time, financial resources, and specialized expertise. As cardiovascular trials grow increasingly complex, with multiple endpoints and sophisticated composite definitions, the demand for efficient yet accurate endpoint adjudication has become more pressing than ever.
The emergence of artificial intelligence (AI) in healthcare has opened new possibilities for automating complex clinical assessments. Large language models and transformer-based architectures have demonstrated remarkable capabilities in understanding and processing medical text, raising the question of whether these technologies could be leveraged for endpoint adjudication. Yet concerns remain about the generalizability of AI systems across different trial populations, endpoint definitions, and data collection methodologies.
A groundbreaking study published in Circulation in March 2026 addressed these challenges directly. The research team, led by investigators from Duke University and collaborating institutions, developed and validated an adaptive AI algorithm specifically designed for cardiovascular event adjudication, with the ambitious goal of creating a system capable of adapting to new endpoint definitions without requiring complete retraining.
Study Design and Methodology
The investigators employed a multi-phase approach to develop and validate their adaptive AI system, which they named ADAPT-CEC. The algorithm was initially derived using data from the ODYSSEY OUTCOMES trial, a large phase 3 cardiovascular outcomes trial that enrolled patients with recent acute coronary syndrome. The derivation cohort focused on three critical cardiovascular endpoints: myocardial infarction (MI), stroke, and heart failure.
For external validation, the researchers turned to the EUCLID trial, which enrolled patients with stable atherosclerotic cardiovascular disease. This external validation was particularly important because the EUCLID trial included different endpoint definitions than ODYSSEY OUTCOMES, providing an opportunity to test the algorithm’s adaptability. Importantly, the EUCLID validation incorporated an adaptation phase in which the algorithm received information from just 20 suspected EUCLID events per endpoint type. This brief adaptation was designed to help the system learn trial-specific nuances without extensive retraining.
The primary endpoints examined in EUCLID validation included myocardial infarction, stroke, cardiovascular death, and bleeding events—the latter representing a fundamentally different category of endpoint that was not part of the original derivation set.
To establish performance benchmarks, the investigators compared ADAPT-CEC against two alternative approaches. The first was direct adjudication using GPT 4.0, a state-of-the-art large language model, without any trial-specific fine-tuning. The second was a hybrid approach in which ADAPT-CEC handled suspected events with higher prediction certainty, while events falling in the lowest 30% of certainty scores were referred to human adjudicators.
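The certainty-based routing in the hybrid approach can be sketched as follows. The 30% referral fraction comes from the study description; the function name, variable names, and example scores are illustrative assumptions, not details of the ADAPT-CEC implementation.

```python
# Sketch of the hybrid adjudication routing: events in the lowest 30%
# of model certainty are referred to human adjudicators, the rest are
# classified automatically. Names and scores are illustrative only.
import numpy as np

def route_events(certainty_scores, human_fraction=0.30):
    """Return a boolean mask: True = refer the event to human adjudicators.

    certainty_scores: per-event model certainty values.
    human_fraction: fraction of lowest-certainty events sent to humans.
    """
    scores = np.asarray(certainty_scores, dtype=float)
    # Certainty value below which an event falls in the lowest 30%.
    cutoff = np.quantile(scores, human_fraction)
    return scores <= cutoff

scores = [0.99, 0.97, 0.55, 0.91, 0.62, 0.88, 0.40, 0.95, 0.70, 0.93]
to_human = route_events(scores)
print(f"referred to humans: {to_human.sum()} of {len(scores)}")  # → 3 of 10
```

In practice a fixed certainty threshold calibrated on held-out data would likely replace the per-batch quantile shown here, but the routing principle is the same.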
Performance was assessed primarily using F1 scores, which balance precision and recall, providing a comprehensive measure of classification accuracy. Secondary analyses examined the percentage of correctly classified endpoints and non-endpoints, as well as the impact of different adjudication strategies on estimated treatment effects.
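For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall, which can be computed directly from classification counts. The counts below are hypothetical, chosen only to illustrate the arithmetic; they are not study data.

```python
# Minimal illustration of the F1 score used as the study's primary
# performance metric: the harmonic mean of precision and recall.
def f1_score(tp, fp, fn):
    """Compute F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)  # of events the model flagged, fraction correct
    recall = tp / (tp + fn)     # of true events, fraction the model found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one endpoint type (not study data):
# 80 events correctly classified, 20 false alarms, 20 missed events.
print(round(f1_score(tp=80, fp=20, fn=20), 2))  # → 0.8
```

Because the harmonic mean penalizes imbalance, a model cannot achieve a high F1 by being strong on precision alone or recall alone, which is why it suits adjudication tasks where both missed events and false alarms are costly.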
Key Findings: Performance Comparison Across Strategies
The study evaluated 13,885 suspected EUCLID primary endpoint events, providing a robust dataset for performance comparison. The results demonstrated meaningful differences in classification accuracy across the three adjudication strategies.
For endpoint events specifically, ADAPT-CEC correctly classified 86.4% of events, while the hybrid approach achieved 95.6% accuracy and GPT 4.0 alone classified 76.3% correctly. Notably, all three approaches demonstrated exceptional performance in identifying non-endpoint events, with classification rates of 99.4% for ADAPT-CEC, 99.6% for hybrid, and 99.8% for GPT 4.0. This near-perfect specificity suggests that AI systems may be particularly valuable for efficiently ruling out endpoint events, potentially reducing unnecessary human review of clear non-cases.
Detailed F1 metrics across individual endpoints revealed nuanced performance patterns. The hybrid approach consistently achieved the highest F1 scores across all endpoint types: cardiovascular death achieved 0.94 (95% CI 0.92-0.96), myocardial infarction reached 0.80 (95% CI 0.77-0.82), stroke attained 0.82 (95% CI 0.78-0.86), and bleeding events showed 0.83 (95% CI 0.82-0.85).
ADAPT-CEC demonstrated lower but clinically relevant F1 metrics for cardiovascular death, myocardial infarction, and stroke compared with the hybrid approach. Notably, however, ADAPT-CEC outperformed GPT 4.0 alone on bleeding events (F1 0.78, 95% CI 0.77-0.79), even though bleeding was entirely absent from its derivation set and it had received only the brief 20-event adaptation. This finding suggests that the adaptation process in ADAPT-CEC confers meaningful advantages even for endpoint categories not included in the original derivation set.
Perhaps most clinically relevant were the findings regarding treatment effect estimation. The EUCLID trial’s primary endpoint was the composite of cardiovascular death, myocardial infarction, or stroke. The hazard ratio estimates proved remarkably consistent across all adjudication strategies: human adjudication yielded HR 1.02 (95% CI 0.93-1.13), hybrid adjudication produced HR 1.04 (95% CI 0.94-1.15), ADAPT-CEC gave HR 0.98 (95% CI 0.88-1.09), and GPT 4.0 alone estimated HR 1.06 (95% CI 0.95-1.19). The overlapping confidence intervals across all strategies indicate that any of these approaches would have led to the same clinical conclusion regarding the study treatment’s lack of efficacy.
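The consistency argument above can be checked numerically: under the usual assumption that the log hazard ratio is approximately normal, its standard error can be recovered from the reported 95% CI, and each interval can be tested for inclusion of HR = 1. The HR values below are those reported above; the helper function is a standard back-of-envelope calculation, not part of the study.

```python
# Back-of-envelope check on the reported treatment-effect consistency.
# SE of log(HR) is recovered from the 95% CI assuming log-normality:
# SE = (ln(upper) - ln(lower)) / (2 * 1.96).
import math

def log_hr_se(lower, upper, z=1.96):
    """Recover the SE of log(HR) from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Hazard ratios and 95% CIs for the EUCLID composite endpoint under
# each adjudication strategy, as reported in the study.
strategies = {
    "human":     (1.02, 0.93, 1.13),
    "hybrid":    (1.04, 0.94, 1.15),
    "ADAPT-CEC": (0.98, 0.88, 1.09),
    "GPT 4.0":   (1.06, 0.95, 1.19),
}

for name, (hr, lo, hi) in strategies.items():
    se = log_hr_se(lo, hi)
    # Every interval contains HR = 1, so each strategy supports the
    # same conclusion of no significant treatment effect.
    print(f"{name:>9}: HR {hr:.2f}, SE(log HR) ~ {se:.3f}, includes 1: {lo < 1 < hi}")
```

The nearly identical standard errors (all roughly 0.05 on the log scale) also show that the AI strategies did not inflate the uncertainty of the treatment-effect estimate.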
Implications for Clinical Trial Methodology
The validation of ADAPT-CEC represents a significant step forward in the application of artificial intelligence to cardiovascular clinical trials. Several aspects of the findings merit careful consideration by trialists, regulators, and methodological researchers.
First, the successful adaptation of a single-trial-derived algorithm to a second trial with partially different endpoint definitions addresses a fundamental concern about AI generalizability. The fact that 20 suspected events per endpoint provided sufficient information for meaningful adaptation suggests that AI systems could potentially be deployed across multiple trials within a therapeutic area, reducing the resources required for algorithm development and validation.
Second, the demonstration that AI can handle novel endpoint categories—in this case, bleeding events—opens possibilities for more flexible trial designs. If AI systems can be rapidly adapted to include new endpoints of interest, sponsors might be able to add endpoint assessments to ongoing trials or implement exploratory endpoints with less overhead than traditional CEC processes require.
Third, the hybrid adjudication model emerged as the clear winner in terms of raw performance, achieving F1 scores ranging from 0.80 for myocardial infarction to 0.94 for cardiovascular death. This approach offers a pragmatic compromise between full automation and traditional CEC: AI handles the majority of straightforward cases, while human expertise is reserved for the most challenging and consequential determinations. This selective human involvement could substantially reduce CEC costs and timelines while maintaining quality.
Fourth, the consistent treatment effect estimates across adjudication strategies provide reassuring evidence that AI-assisted adjudication does not systematically bias outcome assessment. This finding addresses a critical regulatory concern: whether AI systems might introduce differential misclassification that could obscure true treatment effects or create spurious signals.
Expert Commentary and Future Directions
While these findings are promising, several important limitations and knowledge gaps warrant acknowledgment. The study was conducted retrospectively using adjudicated clinical trial data, meaning that prospective implementation of AI adjudication has not yet been demonstrated in a live trial setting. Real-world prospective application may reveal practical challenges not apparent in retrospective analysis, including issues related to data quality, workflow integration, and the handling of edge cases.
The EUCLID trial’s patient population and endpoint definitions represent specific clinical contexts; generalizability to trials with markedly different characteristics—such as acute heart failure trials, device studies, or trials in pediatric populations—remains unestablished. Each new therapeutic area and endpoint category will likely require careful validation before confident deployment.
The AI system’s performance for myocardial infarction adjudication, while clinically acceptable, lagged behind performance on other endpoints. Myocardial infarction classification involves nuanced assessment of biomarker dynamics, ECG changes, and clinical symptoms, and the F1 score of 0.80 suggests room for improvement. Future algorithm iterations might incorporate additional data types or employ more sophisticated modeling approaches to enhance MI classification accuracy.
Regulatory acceptance of AI-assisted adjudication will require thoughtful framework development. Current regulatory guidance on endpoint adjudication was developed with human-only processes in mind. Clear standards for validation requirements, quality assurance procedures, and documentation expectations will be needed before AI adjudication can become routine in pivotal trials supporting regulatory submissions.
Conclusion: A Paradigm Shift in Clinical Trial Endpoint Assessment
The validation of ADAPT-CEC marks an important milestone in the evolution of AI applications in cardiovascular medicine. This adaptive AI algorithm demonstrated the capacity to adjudicate multiple cardiovascular endpoints across different trial populations and definitions, achieving accuracy levels that approached human performance when combined with selective human review. Critically, all adjudication strategies—human, AI-assisted, and AI-only—yielded consistent treatment effect estimates, suggesting that AI incorporation need not compromise the integrity of cardiovascular outcome assessments.
The hybrid model, with AI handling high-certainty cases and humans reviewing the lowest-confidence 30% of suspected events, emerged as the optimal approach, achieving 95.6% correct classification of endpoint events. This strategy could potentially reduce CEC costs and timelines substantially while maintaining quality standards expected for regulatory-grade endpoint assessment.
Looking ahead, prospective studies will be essential to validate these retrospective findings and establish practical implementation frameworks. As AI capabilities continue to advance and regulatory pathways become clearer, adaptive AI adjudication may become a standard tool in the cardiovascular trialist’s armamentarium—enabling more efficient trials, more comprehensive endpoint assessment, and ultimately, faster delivery of answers that inform clinical practice.
The journey from traditional CEC to AI-assisted adjudication represents more than incremental efficiency gains; it reflects a broader transformation in how we approach the measurement of clinical outcomes. The ADAPT-CEC study demonstrates that this transformation can proceed while preserving the rigor that patients, clinicians, and regulators rightly demand.
Funding and Clinical Trials
This research was conducted using data from the ODYSSEY OUTCOMES trial (NCT01663402) and the EUCLID trial (NCT01732822). Full funding information is available in the original publication in Circulation.
References
1. Vemulapalli S, Pena Guerra K, Wojdyla D, Jones WS, Mahaffey KW, Harrington RA, Steg PG, Schwartz GG, Patel MR, Lopes RD, Henao R. Adaptive AI for Cardiovascular Event Adjudication: Cardiovascular Event Adjudication Across Different Definitions in the ODYSSEY OUTCOMES and EUCLID Trials. Circulation. 2026 Mar 30. PMID: 41911340.

