Background
Epidermal growth factor receptor (EGFR) mutations represent one of the most clinically significant molecular alterations in non-small cell lung cancer, particularly in lung adenocarcinoma (LUAD). These mutations serve as predictive biomarkers for targeted therapy with EGFR tyrosine kinase inhibitors, which have transformed treatment outcomes for affected patients. The identification of EGFR mutations has traditionally relied on molecular testing methods such as next-generation sequencing (NGS), polymerase chain reaction-based assays, or Sanger sequencing—all requiring tissue sampling and laboratory processing time that may delay treatment initiation.
In recent years, artificial intelligence (AI) models have emerged as promising tools for extracting genomic information directly from routine hematoxylin-eosin (H&E) stained pathology slides. These computational pathology approaches aim to democratize access to molecular profiling by leveraging the morphological patterns that correlate with underlying genetic alterations. However, the generalizability of these models across diverse patient populations and clinical contexts remains inadequately characterized. Given the known interpopulation differences in EGFR mutation frequencies and potential confounding factors such as tissue composition and staining variability, rigorous evaluation of AI model performance across ancestry groups is essential before clinical implementation.
Study Design
This retrospective cohort study evaluated two open-source AI pathology models for predicting EGFR mutation status in lung adenocarcinoma using whole-slide imaging and molecular profiling data. The investigation included patients from two independent cohorts: the Dana-Farber Cancer Institute (DFCI) cohort comprising 1,759 patients treated between June 2013 and November 2023, and the European-based TNM-I trial cohort with 339 patients enrolled from August 2016 to February 2022.
All included patients had paired next-generation sequencing data confirming EGFR mutation status alongside digitized H&E-stained whole-slide images. In the DFCI cohort, genetic ancestry was inferred from germline genotype data, enabling stratification into predefined ancestral groups: African (n=54), American (n=101), Asian (n=95), and European (n=1,465). The combined study population (n=2,098) had a mean age of 66.6 years (SD 10.3), with 1,315 female patients (63%) and 783 male patients (37%). EGFR mutations were detected in 432 patients (25%) in the DFCI cohort and 50 patients (15%) in the TNM-I cohort.
The primary outcome was model performance for EGFR mutation prediction, measured by the area under the receiver operating characteristic curve (AUC), evaluated overall and across ancestry subgroups and sample types including lung resection specimens and pleural biopsies.
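For readers less familiar with the metric, the following is a minimal sketch of how an AUC and a 95% confidence interval can be computed from per-patient model scores. The summary does not state how the study derived its intervals, so the percentile bootstrap used here is an assumption, and the function names and toy data are purely illustrative.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a randomly chosen mutation-positive case
    receives a higher score than a randomly chosen negative case (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap interval (assumed method)."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper

# Toy data (not from the study): 1 = EGFR-mutant, 0 = wild type.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
print(bootstrap_auc_ci(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of mutant from wild-type cases.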
Key Findings
The two evaluated AI pathology models differed substantially in performance. In the DFCI cohort, the higher-performing model achieved an AUC of 0.83 (95% CI, 0.81-0.85), while the lower-performing model yielded an AUC of 0.68 (95% CI, 0.65-0.70). The same ranking held in the independent TNM-I cohort, with AUCs of 0.81 (95% CI, 0.74-0.88) and 0.75 (95% CI, 0.68-0.83) for the respective models, although the gap was narrower there and the confidence intervals overlapped.
Ancestry-stratified analyses of the DFCI cohort revealed marked performance heterogeneity across ancestral groups for the higher-performing model. Patients of European ancestry had an AUC of 0.84 (95% CI, 0.81-0.86), and patients of African ancestry showed comparable performance with an AUC of 0.85 (95% CI, 0.72-0.94), albeit with a wider interval in this smaller subgroup (n=54). Patients of Asian ancestry had substantially lower discrimination, with an AUC of 0.68 (95% CI, 0.55-0.78), an absolute difference of 0.16 relative to patients of European ancestry. Stratified estimates for the American ancestry subgroup (n=101) are not reported here.
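A stratified analysis of this kind amounts to computing the same metric separately within each subgroup. The sketch below illustrates the idea under stated assumptions: the record layout (group, label, score), the function names, and the groups "A" and "B" are hypothetical and do not reflect the study's data or code.

```python
from collections import defaultdict

def auc(labels, scores):
    """Pairwise AUC; returns None when a subgroup lacks one of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_group(records):
    """records: iterable of (group, label, score) tuples (assumed layout)."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: {"n": len(ys), "auc": auc(ys, ss)} for g, (ys, ss) in grouped.items()}

# Hypothetical toy records, not study data.
toy_records = [
    ("A", 1, 0.9), ("A", 0, 0.2), ("A", 1, 0.7), ("A", 0, 0.4),
    ("B", 1, 0.6), ("B", 0, 0.7), ("B", 1, 0.8), ("B", 0, 0.3),
]
for group, result in auc_by_group(toy_records).items():
    print(group, result)
```

Reporting the subgroup size next to each AUC matters because small subgroups produce wide confidence intervals, as the African ancestry estimate above illustrates.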
Sample type analyses further demonstrated performance degradation in certain clinical contexts. The higher-performing model achieved an AUC of 0.86 (95% CI, 0.83-0.88) in lung resection specimens but only 0.66 (95% CI, 0.56-0.76) in pleural specimens. This differential performance highlights the importance of tissue context in AI-based genomic prediction.
From a clinical workflow perspective, an AI-guided triage analysis suggested that the higher-performing model could reduce rapid EGFR testing requirements by 57% at a sensitivity of 0.84 and a specificity of 0.99. These estimates indicate that AI pre-screening could substantially decrease laboratory workload, although at a sensitivity of 0.84 not every mutation-positive case is identified.
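The quantities in a triage analysis can be made concrete with a short calculation. This summary does not describe the actual triage rule, so the single score threshold below, under which cases scoring below the cutoff skip rapid testing, is an assumption made only for illustration. The function name, threshold, and toy data are hypothetical.

```python
def triage_metrics(labels, scores, threshold):
    """Sensitivity, specificity, and share of cases below the cutoff.

    Assumed rule (not taken from the study): a case is called model-positive
    when its score is >= threshold; cases below the threshold skip rapid testing.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),   # mutant cases correctly flagged
        "specificity": tn / (tn + fp),   # wild-type cases correctly cleared
        "rapid_tests_avoided": (tn + fn) / len(labels),
    }

# Toy data (not from the study): 1 = EGFR-mutant, 0 = wild type.
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.2, 0.1, 0.7, 0.3, 0.4, 0.05]
print(triage_metrics(toy_labels, toy_scores, threshold=0.5))
```

Raising the threshold avoids more tests but lowers sensitivity, which is the trade-off any triage cutoff has to balance.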
Expert Commentary
The findings from this investigation carry significant implications for the development and deployment of computational pathology tools in precision oncology. The observed performance differential between ancestral groups—particularly the diminished accuracy in Asian patients—warrants careful consideration of underlying mechanisms and potential confounding factors.
Several factors may contribute to the ancestry-associated performance variability observed in this study. First, differences in EGFR mutation subtypes across populations may influence the morphological features that AI models learn to recognize. Asian patients have higher rates of sensitizing mutations such as exon 19 deletions and L858R point mutations than other populations, yet model performance was lower in this subgroup. This counterintuitive finding suggests that factors beyond mutation frequency may be operative, potentially including differences in tumor morphology, microenvironment composition, or technical factors related to slide preparation and digitization.
The substantially reduced performance in pleural specimens (AUC 0.66) compared to lung resection specimens (AUC 0.86) highlights a critical limitation of current AI pathology approaches. Pleural biopsies often represent the only available tissue in advanced-stage patients, making accurate prediction particularly clinically relevant in this context. The performance decline may reflect differences in tissue architecture, necrosis, or inflammatory infiltration patterns in metastatic or invasive samples compared to primary resection specimens.
From an implementation standpoint, the potential 57% reduction in rapid EGFR testing volume with maintained sensitivity represents a compelling argument for AI-assisted triage in resource-constrained settings. However, the differential performance across ancestry groups necessitates careful consideration before broad clinical deployment. Quality assurance protocols, regular performance monitoring across patient demographics, and transparent communication of model limitations to clinicians would be essential components of any implementation strategy.
Conclusion
This cohort study provides important evidence regarding the performance characteristics and limitations of open-source AI pathology models for EGFR mutation prediction in lung adenocarcinoma. While the higher-performing model showed promising overall discrimination, with AUC values above 0.80 in both cohorts, marked performance variability across ancestral groups and sample types raises critical considerations for equitable clinical implementation.
The notably lower predictive accuracy in Asian ancestry patients (AUC 0.68) compared to European (AUC 0.84) and African (AUC 0.85) patients represents a substantial disparity that must be addressed through ongoing model refinement, diverse training data incorporation, and robust validation across populations. Similarly, the performance degradation in pleural specimens highlights the need for tissue-context-specific validation before widespread adoption.
The potential for AI-guided triage to reduce laboratory testing volume while maintaining high sensitivity offers tangible benefits for clinical workflow optimization. However, realizing these benefits while ensuring equitable care across all patient populations will require continued research, validation, and thoughtful implementation strategies that prioritize performance monitoring across demographic subgroups.
Funding
This study was supported by grants from the National Cancer Institute and institutional research funds from Dana-Farber Cancer Institute. The TNM-I trial was supported by European research consortium funding.
Reference
Rakaee M, Nassar AH, Tafavvoghi M, et al. Ancestry-Associated Performance Variability of Open-Source AI Models for EGFR Prediction in Lung Cancer. JAMA Oncol. 2026;12(4):402-406. PMID: 41678173.
