Neural Networks Predict Survival for Older Adults With Head and Neck Cancer — Useful, but Not Yet Practice-Changing

Highlights

– An international retrospective cohort (SENIOR registry) used artificial neural networks (ANNs) to predict overall survival (OS) and progression-free survival (PFS) in patients ≥65 years with locoregionally advanced HNSCC treated with definitive chemoradiation.

– Models achieved moderate discrimination (OS ROC-AUC 0.68, PFS ROC-AUC 0.64) and identified human papillomavirus (HPV) status, estimated glomerular filtration rate (eGFR), Eastern Cooperative Oncology Group (ECOG) performance status, and nodal classification as the most influential features.

Background

Head and neck squamous cell carcinoma (HNSCC) is a disease predominantly affecting older adults. This population is heterogeneous with respect to comorbidity, functional reserve, and treatment tolerance, yet older patients are underrepresented in randomized trials. Consequently, clinicians frequently must individualize definitive chemoradiation decisions without robust, age-specific evidence.

Predictive models that integrate routinely available clinical variables could assist shared decision-making, tailoring of intensity, and allocation of supportive resources. Machine learning (ML), including artificial neural networks (ANNs), promises to detect nonlinear relationships and interactions that conventional regression may miss. However, model performance, interpretability, external validation, and clinical utility remain critical hurdles before deployment in practice.

Study design

Marschner et al. report an international retrospective cohort study (SENIOR registry) that developed and externally validated two ANN models to predict OS and PFS among older adults (age ≥65) with locoregionally advanced HNSCC treated with definitive radiotherapy and concurrent systemic therapy between 2005 and 2019.

Key inclusion criteria were age ≥65 years, locoregionally advanced HNSCC, and definitive chemoradiation. Exclusion criteria included induction or adjuvant chemotherapy, prior head and neck cancer, and metastatic disease at treatment initiation. Data were pooled from 19 academic centers across Germany, Switzerland, the Czech Republic, Cyprus, and the US. Patients were treated between 2005 and 2019; data were curated in 2021–2023 and analyzed from December 2023 to April 2025.

Training and testing splits: for OS, 738 patients in training and 160 in testing (total n=898). For PFS, 770 training and 175 testing (total n=945). Models were evaluated with ROC area under the curve (AUC) and precision-recall AUC; feature importance and explainability were assessed with Shapley additive explanations (SHAP) values. Patients were classified as high- or low-risk using median model output thresholds.
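
For readers less familiar with the headline metric, the following is a minimal sketch of how ROC-AUC can be computed for a binary outcome. It is not the authors' code: the function name `roc_auc`, the toy labels, and the toy scores are all invented for illustration, and it assumes the survival outcome has been reduced to an event/no-event label at some fixed horizon, which the abstract does not specify.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the share of (event, non-event) pairs ranked correctly.

    A pair counts as 1 if the patient with the event has the higher score,
    and as 0.5 if the two scores are tied.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event within the horizon, 0 = no event.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
auc = roc_auc(y_true, scores)
print(round(auc, 2))  # 0.8
```

Read this way, a ROC-AUC of 0.68 means that in roughly two of three randomly chosen event/non-event pairs the model assigns the higher risk to the patient who had the event.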

Key findings

Population characteristics: median age 71 years (IQR 68–76); approximately three-quarters male (74%). The cohort comprised patients treated with definitive radiotherapy plus concurrent systemic therapy; HPV status was included where available.

Model performance

– Overall survival ANN: ROC-AUC 0.68 (95% CI, 0.60–0.76) in the external testing cohort. The model stratified patients into high- and low-risk groups with statistically significant differences in survival.

– Progression-free survival ANN: ROC-AUC 0.64 (95% CI, 0.56–0.72) in testing.

– Precision-recall AUCs were also reported to account for class imbalance; the abstract does not highlight their numerical values.
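
As a rough illustration of why a precision-recall summary is informative when events are uncommon, the sketch below computes average precision, one common way to summarize the area under the precision-recall curve. Whether the study used this estimator is not stated in the abstract; the function name and the toy data are assumptions made for the example.

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision at each rank where an event occurs.

    Patients are ranked by descending score; ties keep their input order.
    """
    n_events = sum(y_true)
    if n_events == 0:
        raise ValueError("need at least one event")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` patients
    return total / n_events


# Hypothetical toy cohort with 2 events among 10 patients.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)
print(round(ap, 3), prevalence)  # 0.833 0.2
```

Unlike ROC-AUC, whose chance level is 0.5 regardless of event rate, the reference value for a precision-recall summary is the event prevalence (0.2 in this toy cohort), so the two metrics answer slightly different questions.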

Top predictive features

SHAP analysis identified the most influential predictors across models: human papillomavirus (HPV) status (strongly prognostic in oropharyngeal disease), kidney function measured by eGFR, ECOG performance status, and nodal classification (N stage). These features align with clinical knowledge: HPV-positive oropharyngeal cancers have better prognosis, baseline functional status and organ function influence both treatment tolerance and competing mortality, and nodal burden correlates with recurrence risk.
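
To make the idea behind SHAP attributions concrete, here is a small sketch that computes exact Shapley values for a single patient by enumerating all feature coalitions. It is a simplification and not the SHAP library or the authors' pipeline: features outside a coalition are simply set to one reference patient's values, and the `toy_risk` model, its coefficients, and both patients are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one patient by enumerating feature coalitions.

    Features outside a coalition take their value from `baseline`.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {name}) - value(set(subset))
                contribution += weight * gain
        phi[name] = contribution
    return phi


def toy_risk(f):
    """Hypothetical risk score; every coefficient here is made up."""
    score = 0.30
    score -= 0.15 * f["hpv_positive"]
    score += 0.10 * f["ecog"]
    score += 0.05 * f["n_stage"]
    # Invented interaction: poor performance status matters more with low eGFR.
    score += 0.10 * f["ecog"] * (1 if f["egfr"] < 60 else 0)
    return score


patient = {"hpv_positive": 0, "egfr": 45, "ecog": 2, "n_stage": 2}
reference = {"hpv_positive": 1, "egfr": 90, "ecog": 0, "n_stage": 0}
phi = shapley_values(toy_risk, patient, reference)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

In this toy case the attributions sum to the difference between the patient's score and the reference score (0.80 versus 0.15), and the invented interaction term is shared equally between ECOG and eGFR. Real SHAP analyses typically average over a background dataset rather than a single reference patient.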

Risk stratification and potential clinical uses

Using the median model output as a dichotomous cutoff, the ANNs divided patients into groups with different survival trajectories. The authors suggest these models could support treatment personalization — for example, identifying patients at high competing-mortality risk where treatment de-intensification or enhanced supportive care might be appropriate, or pinpointing low-risk older adults who could reasonably tolerate standard-of-care chemoradiation.
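
The median split described above can be sketched in a few lines. The model outputs below are hypothetical, and assigning values exactly at the median to the low-risk group is an assumption, since the abstract does not say how such cases were handled.

```python
from statistics import median


def split_at_median(outputs):
    """Label each model output 'high' if above the cohort median, else 'low'."""
    cutoff = median(outputs)
    groups = ["high" if o > cutoff else "low" for o in outputs]
    return cutoff, groups


# Hypothetical model outputs for six patients.
outputs = [0.12, 0.35, 0.48, 0.51, 0.77, 0.90]
cutoff, groups = split_at_median(outputs)
print(round(cutoff, 3), groups)
```

Here the patients scored 0.48 and 0.51 land in different groups despite nearly identical outputs, which is one reason a dichotomized result is better read as a coarse summary than as an individual-level verdict.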

Expert commentary: interpretation, strengths, and limitations

The study addresses a clinically meaningful gap by focusing on older adults with HNSCC and using multisite data for development and external testing. Several aspects strengthen its contribution:

– International, multicenter data increase heterogeneity and enhance potential generalizability compared with single-center models.

– External testing sets were used rather than only internal cross-validation, a critical step toward unbiased performance assessment.

– Model explainability via SHAP provides interpretable feature contributions, easing clinician appraisal of model behavior.

However, key limitations temper enthusiasm for immediate clinical deployment:

– Moderate discrimination: ROC-AUCs of 0.68 (OS) and 0.64 (PFS) indicate only modest ability to separate outcomes at the individual level. For clinical decision-making, particularly when decisions carry substantial morbidity, higher discrimination and proven net benefit are generally required.

– Calibration and clinical utility: The abstract reports discrimination metrics but does not present calibration plots or decision-curve analyses. A model with acceptable discrimination can still misestimate absolute risk, producing suboptimal clinical recommendations.

– Retrospective design and potential information bias: Data spanning 2005–2019 include temporal changes in staging, HPV testing practices, radiotherapy techniques (IMRT adoption), and systemic therapy (e.g., cetuximab versus platinum, evolving immunotherapy). Heterogeneity in treatment regimens and incomplete capture of nuanced variables (dose intensity, interruptions, social determinants) could affect model performance and transportability.

– Missing geriatric-specific measures: Key determinants of outcomes in older adults, namely comprehensive geriatric assessment domains such as cognition, mobility, nutrition, social support, and polypharmacy, were not emphasized. Existing geriatric oncology tools (the CARG toxicity score, geriatric assessment) add information beyond performance status and laboratory values and may improve individualized prediction (Hurria et al., 2011; Mohile et al., 2018).

– Competing risks and cause-specific outcomes: Older adults have substantial competing mortality from noncancer causes. Models predicting OS may need competing-risks methods to distinguish cancer-related mortality from other causes, which has direct bearing on treatment decisions focused on disease control versus life expectancy.

– Effect on decisions and outcomes unknown: The central question is whether ANN-informed decisions would change management and improve patient-centered outcomes (survival, quality of life, treatment toxicity). Prospective impact studies or randomized controlled trials are required to demonstrate clinical benefit and to detect unintended harms.
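
The calibration point above can be illustrated with a small sketch. The data are hypothetical and constructed so that the predictions rank patients perfectly yet overstate absolute risk; the function names and the two-group binning are assumptions chosen for brevity, not a description of any analysis in the study.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_table(y_true, probs, n_bins=2):
    """Mean predicted risk vs. observed event rate in equal-sized risk groups."""
    if n_bins < 1 or n_bins > len(y_true):
        raise ValueError("n_bins must be between 1 and the number of patients")
    ranked = sorted(zip(probs, y_true))
    size = len(ranked) // n_bins
    rows = []
    for b in range(n_bins):
        start = b * size
        # The last group absorbs any remainder.
        end = (b + 1) * size if b < n_bins - 1 else len(ranked)
        chunk = ranked[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows


# Hypothetical predictions: perfect ranking, but risks are too high overall.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.5, 0.5, 0.6, 0.6, 0.9, 0.9, 0.95, 0.95]
rows = calibration_table(y_true, probs, n_bins=2)
brier = brier_score(y_true, probs)
for mean_pred, observed in rows:
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}")
print(f"Brier score {brier:.4f}")
```

In the lower-risk group the mean predicted risk is about 0.55 while no events occurred, even though every patient with an event is ranked above every patient without one. Discrimination metrics alone would not reveal this.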

How these models fit into current practice

Prediction tools for older adults should complement—not replace—comprehensive clinical assessment. Current geriatric oncology guidelines recommend incorporation of geriatric assessment to identify vulnerabilities and guide management (e.g., ASCO guideline). An ANN trained on routine clinical variables could be an accessible first-step triage tool to flag patients for full geriatric evaluation, trial enrollment, or early palliative care referral.

Clinicians and institutions considering integration should demand transparency on model inputs, preprocessing, handling of missing data, calibration on local populations, and user-centered interfaces that explain uncertainty. Workflow integration must also respect patient values and ensure shared decision-making, especially when models suggest de-intensification.

Next steps and research priorities

To translate these ANNs into practice, the following steps are recommended:

– Prospective validation across diverse clinical settings with up-to-date treatment protocols and complete geriatric variables.

– Calibration assessment and, if needed, local recalibration to preserve accurate absolute risk estimates.

– Decision-curve analysis and clinical impact studies to evaluate net benefit and patient-centered outcomes.

– Integration with geriatric assessment instruments and biomarkers (e.g., inflammatory markers, frailty indices) to improve discrimination and clinical relevance.

– Ethical and implementation research addressing transparency, clinician acceptance, and avoidance of algorithmic bias (e.g., differential performance across demographic groups).
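
For the decision-curve step, the quantity usually plotted is net benefit at a chosen risk threshold. The sketch below uses the standard definition (true positives minus false positives weighted by the threshold odds, per patient) on hypothetical data; the function names, the toy cohort, and the 0.5 threshold are assumptions for illustration only.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of acting on every patient regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical cohort: 3 events among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.8, 0.7, 0.2, 0.6, 0.3, 0.3, 0.2, 0.1, 0.1, 0.1]
nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))  # 0.1 -0.4
```

A full decision curve repeats this across a range of clinically plausible thresholds and compares the model with the "act on everyone" and "act on no one" (net benefit 0) strategies. A model is useful at a given threshold only if it beats both.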

Conclusion

Marschner and colleagues have developed and externally tested ANNs that moderately discriminate OS and PFS in older adults receiving definitive chemoradiation for locoregionally advanced HNSCC. Important predictors identified (HPV, eGFR, ECOG, nodal stage) corroborate clinical intuition and support model face validity. Yet moderate AUCs, lack of reported calibration and decision-curve metrics, retrospective design, and limited geriatric granularity mean these models are hypothesis-generating rather than ready for routine clinical deployment.

Well-conducted prospective validation, integration with geriatric assessment, and impact evaluation are required before ANNs can be recommended as decision-support tools in this vulnerable population. In the meantime, these models offer a promising framework to sharpen risk stratification and to prioritize patients for comprehensive geriatric assessment and tailored treatment planning.

Funding and ClinicalTrials.gov

Funding: Not reported in the provided abstract. See the original JAMA Otolaryngology—Head & Neck Surgery article for complete funding disclosures.

ClinicalTrials.gov: Not applicable; retrospective registry study.

References

1. Marschner SN, Lombardo E, Haehl E, et al. Outcome Prediction in Older Adults With Head and Neck Cancer Undergoing Chemoradiation. JAMA Otolaryngol Head Neck Surg. 2025 Nov 6:e253840. doi:10.1001/jamaoto.2025.3840.

2. Hurria A, Togawa K, Mohile SG, et al. Predicting Chemotherapy Toxicity in Older Adults With Cancer: A Prospective Multicenter Study. J Clin Oncol. 2011;29(25):3457–3465.

3. Mohile SG, Dale W, Somerfield MR, et al. Practical Assessment and Management of Vulnerabilities in Older Patients Receiving Chemotherapy: ASCO Guideline for Geriatric Oncology. J Clin Oncol. 2018;36(22):2326–2347.

4. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): The TRIPOD Statement. Ann Intern Med. 2015;162(1):55–63.

5. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. arXiv:1705.07874. 2017.

6. Pignon JP, le Maître A, Maillard E, Bourhis J; MACH-NC Collaborative Group. Meta-analysis of chemotherapy in head and neck cancer (MACH-NC): an update on 93 randomised trials and 17,346 patients. Radiother Oncol. 2009;92(1):4–14.

7. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Springer; 2019.

8. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
