AI-Enhanced ECG Screening Pinpoints Atrial Fibrillation Risk in Older Adults: Key Insights from VITAL-AF Trial

Background: The Challenge of Atrial Fibrillation Detection

Atrial fibrillation (AF) represents the most common sustained cardiac arrhythmia globally, affecting millions of individuals and carrying substantial risks for stroke, heart failure, and mortality. The silent nature of this condition presents a significant clinical dilemma: AF often remains undiagnosed until devastating complications occur, particularly in older populations, where prevalence rises steeply with age. Current screening approaches that rely on an age threshold of 65 years and older have demonstrated limited yield, prompting researchers to explore more sophisticated risk-stratification strategies.

The emergence of artificial intelligence (AI) in cardiovascular medicine has opened new frontiers for disease detection. Specifically, AI-enabled electrocardiogram (ECG) models have shown remarkable capability in identifying subtle electrical signatures that precede clinical AF manifestation. The question that remains paramount is whether these advanced technologies can meaningfully improve screening efficiency when integrated into primary care settings serving older adults.

Study Design and Methodology

The VITAL-AF trial (Screening for Atrial Fibrillation Among Older Patients in Primary Care Clinics; NCT03515057) employed a cluster-randomized design across 16 primary care practices affiliated with Massachusetts General Hospital. Patients aged 65 years and older were enrolled and randomized to either screening or control arms at the practice level.

Among participants without prevalent AF who had at least one 12-lead ECG performed within 3 years before enrollment (n=16,937), investigators estimated AF risk using three validated models developed externally to VITAL-AF:

The first model, the Cohorts of Heart and Aging Research in Genomic Epidemiology-AF (CHARGE-AF) clinical score, relies on conventional clinical parameters including age, race, height, weight, blood pressure, diabetes, heart failure, and myocardial infarction history. The second model, designated ECG-AI, utilizes an AI-based algorithm applied exclusively to 12-lead ECG data without clinical variables. The third model, CH-AI, represents a novel combination integrating both ECG-AI outputs and CHARGE-AF clinical variables.

The primary endpoint assessed 2-year incident AF discrimination using time-dependent area under the receiver-operating characteristic curve (AUROC) and average precision metrics. Screening effect was quantified as the difference in 2-year AF diagnosis rates (per 100 person-years) between screening and control arms across AF risk deciles.
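The screening-effect calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the investigators' code: the function names, the record layout (`decile`, `arm`, `event`, `years`), and the rank-based decile assignment are all assumptions made for the example, and it ignores the cluster-randomized design that a real analysis would need to account for.

```python
from collections import defaultdict


def assign_deciles(scores):
    """Label each score with a risk decile (1 = lowest, 10 = highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    deciles = [0] * len(scores)
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // len(scores) + 1
    return deciles


def rate_per_100py(events, person_years):
    """Diagnosis rate expressed per 100 person-years of follow-up."""
    return 100.0 * events / person_years


def screening_effect_by_decile(records):
    """Rate difference (screening minus control) within each risk decile.

    `records` is a hypothetical list of dicts with keys
    'decile', 'arm' ('screening' or 'control'), 'event' (0/1), 'years'.
    """
    totals = defaultdict(lambda: {"screening": [0, 0.0], "control": [0, 0.0]})
    for r in records:
        cell = totals[r["decile"]][r["arm"]]
        cell[0] += r["event"]   # number of AF diagnoses
        cell[1] += r["years"]   # follow-up time contributed
    effects = {}
    for decile, arms in totals.items():
        if arms["screening"][1] == 0 or arms["control"][1] == 0:
            continue  # no follow-up in one arm: the difference is undefined
        effects[decile] = (rate_per_100py(*arms["screening"])
                           - rate_per_100py(*arms["control"]))
    return effects
```

A positive value for a decile means more AF was diagnosed per 100 person-years under screening than under usual care in that risk stratum.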

Risk Model Discrimination Performance

The analysis revealed substantial differences in risk discrimination capability among the three models tested. Each score demonstrated meaningful ability to distinguish individuals who would develop AF within 2 years, though with notable variation in predictive power.

The CHARGE-AF clinical score achieved an AUROC of 0.711 (95% CI: 0.671-0.749), representing modest discrimination consistent with prior validation studies. In contrast, the ECG-AI model achieved an AUROC of 0.784 (95% CI: 0.743-0.819), an absolute gain of about 0.07 over CHARGE-AF. The combined CH-AI model achieved the highest discrimination at 0.788 (95% CI: 0.754-0.824), though the incremental benefit over ECG-AI alone was marginal and the two confidence intervals overlap almost entirely.
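For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen person who develops AF receives a higher risk score than a randomly chosen person who does not. The sketch below computes the simple binary version of that quantity by pairwise comparison. It is an illustration only: the trial reports a time-dependent AUROC, which additionally handles censored follow-up, and the function name and toy data here are assumptions.

```python
def auroc(labels, scores):
    """Plain (not time-dependent) AUROC via pairwise comparison.

    labels: 1 for people who developed AF, 0 otherwise.
    scores: predicted risk, higher meaning more likely to develop AF.
    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 3 of the 4 case/non-case pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the frame in which values of 0.711 and 0.784 are usually read.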

Similar patterns emerged with average precision, which is more informative than AUROC when the outcome is rare, as incident AF is over 2 years. CHARGE-AF achieved an average precision of 0.0952 (95% CI: 0.0836-0.112), while ECG-AI reached 0.132 (95% CI: 0.113-0.157) and CH-AI achieved 0.133 (95% CI: 0.117-0.159). These findings suggest that ECG-derived AI features capture information beyond traditional clinical risk factors, while adding the clinical variables to ECG-AI changed average precision only marginally (0.133 versus 0.132).
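Average precision summarizes how concentrated the true cases are near the top of the risk ranking. A minimal sketch follows, assuming the common definition (mean of the precision values at the rank of each true case) and ignoring tied scores. The function name and toy data are illustrative, and the trial's own estimator may differ in how it handles censoring.

```python
def average_precision(labels, scores):
    """Mean of precision-at-rank taken at each true case, ranking by score.

    labels: 1 for people who developed AF, 0 otherwise.
    scores: predicted risk, higher meaning more likely to develop AF.
    Tied scores are not given special treatment in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` people
    if hits == 0:
        raise ValueError("need at least one case")
    return precision_sum / hits


# Toy data (made up): cases sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(round(average_precision(toy_labels, toy_scores), 4))  # 0.8333
```

Unlike AUROC, the chance-level value of average precision is roughly the outcome's prevalence rather than 0.5, so the figures reported here are best judged against the cohort's 2-year AF incidence, which this summary does not state.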

Screening Effect Across Risk Stratification

The critical question driving this analysis concerned whether screening benefit varies according to underlying AF risk. The investigators examined incident AF diagnosis rates across deciles of predicted risk, revealing a clear gradient effect. While modest screening effects were observed across multiple risk strata, the most compelling findings emerged in the highest-risk populations.

Among individuals in the top decile of CH-AI-predicted risk, screening was associated with a statistically significant increase in AF diagnoses. The AF diagnosis rate in the screening arm was 10.07 per 100 person-years (95% CI: 8.28-11.87) compared with 7.76 per 100 person-years (95% CI: 6.30-9.21) in the control arm (P<0.05). This absolute difference of 2.32 per 100 person-years (95% CI: 0.01-4.63), whose lower confidence bound lies just above zero, translates to a number-needed-to-screen (NNS) of about 43 individuals screened for one year to detect one additional case of AF.
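The NNS figure follows directly from the rate difference: if screening yields 2.32 extra diagnoses per 100 person-years, then 100 / 2.32 person-years of screening are needed per extra diagnosis. The short sketch below reproduces that arithmetic from the numbers quoted above. The function and variable names are illustrative, not taken from the study.

```python
def number_needed_to_screen(rate_diff_per_100py):
    """Person-years of screening per additional diagnosis.

    rate_diff_per_100py: screening-minus-control diagnosis rate,
    expressed per 100 person-years. Must be positive for NNS to be defined.
    """
    if rate_diff_per_100py <= 0:
        raise ValueError("NNS is undefined when screening shows no excess diagnoses")
    return 100.0 / rate_diff_per_100py


# Values as reported for the top CH-AI risk decile.
reported_diff = 2.32                 # per 100 person-years
reported_ci = (0.01, 4.63)           # 95% CI for the difference

nns_point = number_needed_to_screen(reported_diff)
nns_best = number_needed_to_screen(reported_ci[1])   # upper bound of the difference
nns_worst = number_needed_to_screen(reported_ci[0])  # lower bound of the difference

print(round(nns_point), round(nns_best), round(nns_worst))  # 43 22 10000
```

Applying the same formula to the reported confidence bounds gives a range of roughly 22 to 10,000, so the point estimate of 43 carries considerable uncertainty.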

This NNS compares favorably with other accepted screening programs, although the endpoints differ. For context, breast cancer screening with mammography typically requires 400-500 women to be screened to prevent one death over approximately 10 years, whereas the NNS here counts additional AF diagnoses rather than strokes or deaths prevented. With that caveat, the substantially lower NNS observed in high-risk individuals suggests that targeted AF screening in risk-stratified populations may represent an efficient use of healthcare resources.

Expert Commentary and Clinical Implications

The findings from this VITAL-AF analysis carry important implications for cardiovascular prevention strategies. Dr. Steven Lubitz and colleagues from Massachusetts General Hospital and the Broad Institute, who led this investigation, have contributed substantially to our understanding of AF epidemiology and detection. Their work underscores a fundamental principle emerging across preventive cardiology: one-size-fits-all screening approaches rarely optimize the balance between detection yield and resource utilization.

The performance hierarchy observed—whereby ECG-AI outperformed CHARGE-AF alone, and combined CH-AI achieved marginal additional benefit over ECG-AI—suggests that ECG-derived signals may capture the essential pathophysiological substrate predisposing to AF. Electrical remodeling, atrial fibrosis, and subtle conduction abnormalities that precede clinical AF may manifest in standard ECG waveforms before symptoms or detectable arrhythmias occur. AI algorithms excel at pattern recognition in these subtle signatures.

However, important considerations temper enthusiasm for immediate implementation. The study was conducted within a single healthcare system with particular demographics and electronic health record infrastructure. Generalizability to community practices, rural settings, and populations with different racial/ethnic compositions requires further investigation. Additionally, whether earlier AF detection through AI-guided screening actually improves outcomes—particularly reducing stroke—remains unproven. The natural history of screen-detected AF may differ from clinically diagnosed disease, and the benefit of anticoagulation in screen-detected versus symptomatic AF requires prospective validation.

Limitations and Research Gaps

Several methodological considerations warrant acknowledgment. The analysis relied on participants who had prior ECG recordings within 3 years before enrollment, potentially introducing selection bias toward individuals with greater healthcare engagement. The single-lead ECG used for screening in VITAL-AF differs from the 12-lead ECG utilized for model development, which may affect performance characteristics. Furthermore, the 2-year follow-up period may underestimate longer-term AF development that occurs beyond this window.

The investigators appropriately note the trade-off between screening efficiency and population coverage inherent in risk-based approaches. While restricting screening to high-risk individuals maximizes yield per screen, it necessarily excludes some individuals who might benefit from detection. Determining the optimal risk threshold requires consideration of available resources, healthcare system capacity, and patient preferences.
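The efficiency-versus-coverage trade-off can be made concrete with a small sketch. The decile counts below are invented purely for illustration and are not VITAL-AF data. The function name and output keys are likewise assumptions.

```python
def top_k_tradeoff(cases_by_decile, people_by_decile, k):
    """Summarize what happens if only the k highest-risk deciles are screened.

    Both lists are ordered from lowest-risk decile to highest-risk decile.
    """
    if not 1 <= k <= len(people_by_decile):
        raise ValueError("k must be between 1 and the number of deciles")
    screened = sum(people_by_decile[-k:])
    captured = sum(cases_by_decile[-k:])
    return {
        "population_screened": screened / sum(people_by_decile),
        "cases_captured": captured / sum(cases_by_decile),
        "cases_per_100_screened": 100.0 * captured / screened,
    }


# Hypothetical cohort: 100 people per decile, cases concentrated at high risk.
people = [100] * 10
cases = [1, 1, 2, 2, 3, 3, 4, 5, 7, 12]

for k in (1, 3, 10):
    print(k, top_k_tradeoff(cases, people, k))
```

In this made-up cohort, screening only the top decile covers 10% of people and finds 30% of cases at 12 per 100 screened, while screening everyone finds all cases at 4 per 100 screened. Where to set the threshold between those extremes is the resource and preference question the investigators raise.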

Future research should address whether AI-guided screening actually reduces stroke incidence and cardiovascular mortality, whether these models perform similarly across diverse healthcare settings, and how to optimally integrate risk-guided screening into existing primary care workflows. Cost-effectiveness analyses comparing risk-guided versus universal screening approaches in older adults would provide crucial evidence for guideline development.

Conclusion

The VITAL-AF trial analysis suggests that ECG-based AI models, particularly when combined with clinical risk factors, can identify older adults at elevated risk for atrial fibrillation in whom screening yields the most additional diagnoses. The CH-AI model identified a high-risk decile with an NNS of 43 per year, a result consistent with targeted screening strategies improving detection efficiency over age-based universal approaches.

These findings support a paradigm shift toward precision screening in cardiovascular disease prevention. Rather than applying uniform screening criteria based solely on age, integrating AI-derived risk stratification could enable more efficient allocation of screening resources while maximizing detection of clinically significant AF. Implementation studies examining real-world feasibility, patient acceptance, and long-term outcomes will be essential before widespread adoption.

The broader lesson from this research extends beyond atrial fibrillation: artificial intelligence applied to readily available clinical data has potential to transform preventive cardiology by enabling risk-based rather than demographic-based screening strategies. As healthcare systems increasingly digitize and AI capabilities mature, such approaches may become standard practice for multiple cardiovascular conditions.

Funding and Trial Registration

The VITAL-AF trial (NCT03515057) was conducted at Massachusetts General Hospital with support from the National Institutes of Health, American Heart Association, and Massachusetts General Hospital Research Scholars Award. The sponsors had no role in study design, data collection, analysis, or manuscript preparation. The authors declared no conflicts of interest relevant to this analysis.

References

1. Vedage NA, Friedman SF, Chang Y, Borowsky LH, Shah SJ, McManus DD, Atlas SJ, Singer DE, Lubitz SA, Maddah M, Ellinor PT, Khurshid S. Risk-Guided Atrial Fibrillation Screening With Artificial Intelligence-Enabled Electrocardiogram Models: A VITAL-AF Trial Analysis. J Am Coll Cardiol. 2026;87(14):1798-1813. PMID: 41983618.

2. Alonso A, Soliman EZ, Chen LY, Bluemke DA, Folsom AR. Association of Blood Pressure and Heart Rate with Incident Atrial Fibrillation (from the Multi-Ethnic Study of Atherosclerosis [MESA]). Am J Cardiol. 2019;124(8):1225-1230.

3. Schnabel RB, Yin X, Gona P, et al. 50-year trends in atrial fibrillation prevalence, incidence, risk factors, and mortality in the Framingham Heart Study: a cohort study. Lancet. 2015;386(9989):154-162.

4. Attia ZI, Noseworthy PA, Lopez-Jimenez F, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet. 2019;394(10201):861-867.