AI-ECG Models for Heart Failure Screening Show High Accuracy in First-of-its-Kind Independent Comparison Study

Highlights

  • Independent validation of four international AI-ECG models demonstrated strong performance in detecting left ventricular systolic dysfunction (LVSD), with AUROCs ranging from 0.83 to 0.93.
  • The models remained effective even in lower-complexity subgroups (AUROC 0.87–0.96), suggesting utility in general screening populations.
  • Despite high performance, most published AI-ECG models carry a high risk of bias due to poor reporting and lack of external validation.
  • Limited model availability remains a significant bottleneck for the independent verification and clinical translation of digital health tools.

The Clinical Imperative for Improved LVSD Screening

Left ventricular systolic dysfunction (LVSD) is a primary precursor to symptomatic heart failure, a condition associated with significant morbidity, mortality, and healthcare costs. Early detection of LVSD, typically defined as a left ventricular ejection fraction (LVEF) ≤40% or ≤50%, is crucial because evidence-based pharmacological interventions, such as SGLT2 inhibitors and ACE inhibitors, can significantly improve outcomes. However, current screening methods, including physical examination and N-terminal pro-B-type natriuretic peptide (NT-proBNP) testing, often lack the sensitivity or specificity required for cost-effective population screening. While echocardiography is the gold standard, its use for mass screening is limited by cost and the requirement for specialized personnel.

Artificial Intelligence-enhanced electrocardiograms (AI-ECG) have emerged as a potentially transformative solution. By applying deep learning to standard 12-lead ECG data, these models can identify subtle patterns of structural heart disease that are invisible to the human eye. While numerous models have been published, they are often developed and validated within the same healthcare system, raising questions about their generalizability across different patient demographics and clinical settings.

Study Design: A Rigorous Approach to External Validation

In a landmark study published in JACC Advances, Croon et al. sought to address these gaps by conducting a systematic review and the first head-to-head independent validation of AI-ECG models for LVSD. The researchers identified 51 models from 35 studies but encountered significant hurdles in transparency: only four groups (from Korea, the United States, Taiwan, and the Netherlands) agreed to share their models for independent testing.

The external validation was conducted using a well-phenotyped registry of 1,203 consecutive patients undergoing routine clinical cardiac magnetic resonance imaging (MRI) at a single center. Cardiac MRI served as the reference standard for LVEF assessment. The cohort’s mean age was 59 years, with 35% female representation. The researchers evaluated model performance in two groups: the total consecutive cohort and a lower-complexity subgroup designed to mimic a screening population with a 15% LVSD prevalence. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST).
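Why disease prevalence matters for a screening subgroup can be illustrated with simple Bayes-rule arithmetic. The sketch below computes the positive and negative predictive values a screening test would achieve at the study's 15% LVSD prevalence; the sensitivity and specificity values are hypothetical operating points chosen for illustration, not figures reported in the study.

```python
def screening_ppv_npv(sensitivity, specificity, prevalence):
    """Bayes-rule PPV and NPV for a binary screening test.

    Expected fractions of the screened population:
    true/false positives and true/false negatives.
    """
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical operating point (NOT from the study): 85% sens, 85% spec
ppv, npv = screening_ppv_npv(0.85, 0.85, 0.15)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")  # PPV=0.50, NPV=0.97
```

Even with a strong test, only about half of positive screens would be true LVSD at 15% prevalence, while a negative result is highly reassuring; this is why a prevalence-matched subgroup is the right setting to judge screening utility.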

Key Findings: Performance Metrics and Model Agreement

The results of the head-to-head comparison were remarkably consistent. In the total patient cohort, the area under the receiver-operating characteristic curve (AUROC) for the four models ranged from 0.83 to 0.93. When applied to the lower-complexity subset—those more representative of a primary care or screening environment—the performance improved, with AUROCs ranging from 0.87 to 0.96.
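For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen patient with LVSD receives a higher model score than a randomly chosen patient without it. A minimal pure-Python sketch of that rank-based (Mann-Whitney) formulation, using toy data rather than anything from the study:

```python
def auroc(labels, scores):
    """AUROC as the probability a random positive case outranks a
    random negative case (Mann-Whitney formulation); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = LVSD) and model scores, illustrative only
y = [0, 0, 1, 1]
s = [0.10, 0.40, 0.35, 0.80]
print(auroc(y, s))  # 0.75
```

An AUROC of 0.93, as the best models achieved here, means the model ranks a diseased patient above a non-diseased one 93% of the time, independent of any particular decision threshold.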

Consistency Across Subgroups

One of the most significant findings was the robustness of these models across various patient characteristics. Performance remained high across different age groups and sexes. However, the study did identify specific clinical scenarios where performance dipped slightly. Models were less accurate in patients with wide QRS complexes (≥120 ms) or those in atrial fibrillation. This is biologically plausible, as major conduction abnormalities can mask the subtle repolarization changes that AI models often use to detect LVSD.

Model Agreement

Interestingly, despite being trained on geographically and ethnically diverse populations—ranging from East Asian to North American and European cohorts—there was substantial agreement between the models. This suggests that the features learned by these neural networks are likely representative of fundamental pathophysiological changes in the heart rather than population-specific artifacts.
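Pairwise agreement between classifiers is conventionally quantified with chance-corrected statistics such as Cohen's kappa (where "substantial agreement" corresponds to kappa of roughly 0.61 to 0.80 on the Landis-Koch scale). The study does not publish its agreement computation, so the following is only a generic sketch on hypothetical binary LVSD calls from two models:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two binary raters/models:
    observed agreement corrected for chance agreement."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    # chance agreement from each rater's marginal positive/negative rates
    pe = (sum(a) / n) * (sum(b) / n) + ((n - sum(a)) / n) * ((n - sum(b)) / n)
    return (po - pe) / (1 - pe)


# Hypothetical LVSD calls (1 = positive) from two models on 10 ECGs
model_a = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
model_b = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(model_a, model_b), 2))  # 0.8
```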

Navigating the Challenges of Bias and Reproducibility

While the performance data is encouraging, the systematic review portion of the study highlighted significant concerns regarding the state of AI research in cardiology. The researchers found that the majority of published models had a high risk of bias. Common issues included:

  • Inadequate description of development cohorts and exclusion criteria.
  • Lack of clarity regarding how the models were calibrated.
  • Failure to perform independent external validation in the original publications.

Furthermore, the low rate of model sharing (only 4 of 35 studies) underscores a major barrier to progress. For AI-ECG to become a standard clinical tool, the medical community must move toward a culture of open science, where models are made available for independent auditing and validation across diverse clinical environments.

Expert Commentary: Moving from Bench to Bedside

The findings by Croon et al. provide a strong evidence base for the clinical utility of AI-ECG. The high AUROCs in the lower-complexity subgroup are particularly promising for heart failure screening in primary care. If integrated into standard ECG machines, these algorithms could provide an immediate, low-cost risk assessment, identifying patients who require further evaluation with echocardiography.

However, clinical implementation requires more than just high AUROCs. We must consider the “black box” nature of these models. Clinicians are often hesitant to rely on an algorithm if they cannot understand the underlying physiological rationale. Future research should focus on explainable AI (XAI) techniques to highlight which parts of the ECG waveform are driving the prediction. Additionally, prospective randomized trials are needed to determine if AI-ECG-led screening actually improves clinical outcomes, such as reduced hospitalizations or mortality, compared to current standard-of-care practices.

Conclusion: A Call for Open Science in Digital Health

This first-of-its-kind independent validation study confirms that AI-ECG is a powerful tool for detecting LVSD, demonstrating high accuracy even when models are trained on disparate populations. The consistency of the results across the four shared models suggests that the technology is maturing and ready for more rigorous clinical testing.

However, the study also serves as a critical reminder of the need for transparency. The high risk of bias in the broader literature and the difficulty in obtaining models for validation are significant hurdles. For AI to truly revolutionize cardiology, researchers must prioritize reproducibility and open access. Only through independent verification can we build the trust necessary to integrate these digital tools into routine clinical practice and ultimately improve the care of patients at risk for heart failure.

References

Croon PM, Boonstra MJ, Allaart CP, et al. Artificial Intelligence-Enhanced Electrocardiogram Models for Detection of Left Ventricular Dysfunction: A Comparison Study. JACC Adv. 2026;5(2):102572. doi:10.1016/j.jacadv.2025.102572.

Heidenreich PA, Bozkurt B, Aguilar D, et al. 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J Am Coll Cardiol. 2022;79(17):e263-e421.

Attia ZI, Kapa S, Lopez-Jimenez F, et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med. 2019;25(1):70-74.
