Highlights
REDMOD, a fully automated radiomics-based AI framework, detected visually occult pancreatic ductal adenocarcinoma (PDA) on standard-of-care CT with an AUC of 0.82 and 73.0% sensitivity in an independent low-prevalence test cohort.
The model substantially outperformed radiologists for pre-diagnostic detection, with sensitivity of 73.0% versus 38.9% overall, and 68.0% versus 23.0% at lead times greater than 24 months.
Performance appeared biologically and technically credible: predictive capacity was driven mainly by multi-scale wavelet textural features, and longitudinal concordance reached 90%–92% on repeat imaging.
Specificity generalized across external cohorts from different institutions and a public dataset, supporting potential use in prospective surveillance of high-risk populations.
Background and Clinical Need
Pancreatic ductal adenocarcinoma remains one of the most lethal solid malignancies. The poor prognosis is driven largely by late presentation, when most tumors are locally advanced or metastatic and no longer amenable to curative resection. Even with improvements in surgery, systemic therapy, perioperative care, and supportive oncology, population-level survival gains have been limited compared with other major cancers.
The central clinical challenge is that pancreatic cancer often evolves silently. Conventional imaging can miss disease at a stage when the pancreas still appears visually normal or only subtly altered. By the time a discrete mass, duct cutoff, vascular encasement, or overt parenchymal atrophy becomes evident, a critical window for interception may already have closed. This creates a major unmet need for tools that can extract preclinical signals from routine imaging before disease becomes obvious to the human eye.
Radiomics offers one plausible solution. Rather than relying on gross morphologic findings, radiomics quantifies imaging texture, shape, and intensity patterns that may reflect tissue-level architectural change. Machine learning can then integrate these features to classify scans in ways that are not feasible through routine visual assessment alone. For pancreatic cancer, the question is especially important because screening the general population is not currently recommended, given low disease prevalence and concerns about false positives, downstream procedures, and cost. Therefore, any proposed AI tool must function in a low-prevalence environment, maintain stability over time, and generalize across institutions and scanners.
The study by Mukherjee and colleagues addresses this translational gap by developing the Radiomics-based Early Detection MODel, or REDMOD, for pre-diagnostic detection of visually occult PDA on standard CT.
Study Design and Methods
Overall design
This was a multi-institutional model development and validation study focused on identifying pre-diagnostic pancreatic cancer on CT scans obtained before clinical diagnosis. The investigators trained REDMOD on 969 individuals, including 156 pre-diagnostic PDA cases and 813 controls, and then evaluated it in an independent test set of 493 individuals, including 63 pre-diagnostic cases and 430 controls. Importantly, the testing framework simulated a low-prevalence early-detection scenario of roughly 1 case per 6 controls rather than an artificially balanced case-control experiment.
AI framework
REDMOD was built as a fully automated pipeline. It combined AI-driven pancreatic segmentation with an ensemble classification architecture trained on a 40-feature radiomic signature. Because pre-diagnostic PDA cases are relatively uncommon, the training process used Synthetic Minority Over-sampling Technique, or SMOTE, to balance the data. The final classifier incorporated a tunable threshold optimized through the Youden Index, allowing calibration of sensitivity and specificity without retraining the entire model.
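The tunable threshold deserves a brief illustration. The sketch below is a hypothetical, simplified version of Youden-Index cutoff selection, not the authors' code, and the scores and labels are invented: the cutoff maximizing J = sensitivity + specificity - 1 is chosen on held-out scores, and the same scan of candidate cutoffs can later be re-run with a different objective (for example, a specificity floor) without retraining the classifier.

```python
# Illustrative Youden-Index threshold selection (hypothetical data,
# not the REDMOD implementation).

def youden_threshold(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1."""
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

# Toy data: cases (label 1) tend to score higher than controls (label 0).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   0,   1,   0,   0,   0]
t, j = youden_threshold(scores, labels)
```

Because the classifier's continuous scores are retained, shifting the operating point toward higher specificity (for opportunistic case finding) or higher sensitivity (for high-risk surveillance) is a post hoc calibration step rather than a new model.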
Validation strategy
The authors went beyond standard internal validation. They compared model performance directly with radiologist interpretation, assessed longitudinal stability through test-retest analyses on serial imaging, and evaluated specificity in two external cohorts totaling 619 additional participants: a multi-institutional cohort of 539 and a public cohort of 80. They also explored mechanism by examining which feature classes contributed most strongly to prediction, specifically comparing wavelet-filtered textural features with unfiltered radiomics.
Clinical endpoint
The primary goal was detection of occult, pre-diagnostic PDA on CT before overt clinical diagnosis. An especially important translational metric was lead time, that is, how long before diagnosis the model could identify risk. This matters because a model that detects only near-diagnostic disease offers less clinical value than one that identifies cases far enough in advance to enable surveillance, intervention, or potentially resection at a curable stage.
Key Results
Discrimination in the independent test cohort
In the independent test set of 493 patients, REDMOD achieved an area under the receiver operating characteristic curve of 0.82. Sensitivity for occult PDA was 73.0%. According to the abstract, the median lead time was 475 days, or roughly 15.5 months before diagnosis. That is a clinically meaningful interval in a disease where the natural history between localized and unresectable cancer can be relatively short.
An AUC of 0.82 in this context is notable because the task is unusually difficult: the scans were pre-diagnostic, the cancers were visually occult, and the test prevalence was kept low enough to approximate real-world early-detection conditions. Many AI studies report impressive metrics under enriched or highly selected experimental settings; this study attempted to move closer to the conditions under which such a system might actually be deployed.
Comparison with radiologists
The most attention-grabbing result is the direct comparison with radiologists. REDMOD detected 73.0% of occult PDA cases versus 38.9% for radiologists, a nearly twofold increase in sensitivity, with p<0.001. The advantage widened at longer lead times: for scans acquired more than 24 months before diagnosis, REDMOD had sensitivity of 68.0% compared with 23.0% for radiologists, nearly a threefold difference.
This is clinically important for two reasons. First, it suggests the model is not merely recognizing subtle near-diagnostic masses that radiologists happened to miss. Second, it supports the authors’ hypothesis that the model is identifying subvisual imaging signatures of altered pancreatic architecture before conventional abnormalities become evident. In practical terms, this is the difference between AI as a second reader for overlooked lesions and AI as a genuinely earlier biomarker of disease biology.
Longitudinal stability
Longitudinal performance is a major but often underreported issue in imaging AI. A model intended for serial surveillance must produce reasonably stable outputs across repeated scans, rather than fluctuating unpredictably because of scanner noise, contrast timing, or preprocessing artifacts. REDMOD reportedly showed 90%–92% concordance on longitudinal test-retest analysis, supporting technical robustness. This finding matters because a surveillance tool that triggers different risk calls on similar sequential studies would be difficult to integrate into practice.
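As a point of definition, test-retest concordance here can be read as the fraction of individuals who receive the same binary risk call on serial scans. A minimal sketch, with invented risk calls:

```python
# Hypothetical test-retest concordance: fraction of individuals whose
# binary risk call agrees across two serial scans. Values are invented.

def concordance(calls_t1, calls_t2):
    """Proportion of paired calls that agree."""
    agree = sum(a == b for a, b in zip(calls_t1, calls_t2))
    return agree / len(calls_t1)

scan1 = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
scan2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
rate = concordance(scan1, scan2)  # one discordant call out of ten
```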
External specificity and generalizability
The model’s specificity remained acceptable in external validation datasets: 81.3% in a multi-institutional cohort of 539 and 87.5% in a public cohort of 80. These are encouraging results, especially given the known tendency of radiomics models to degrade outside the development environment because of differences in acquisition parameters, reconstruction kernels, contrast phase, and patient mix.
That said, specificity in the 81%–88% range also signals an important implementation challenge. In a truly low-prevalence screening context, even a modest false-positive rate can generate substantial downstream imaging, patient anxiety, endoscopic ultrasound referrals, or invasive workups. For that reason, the authors’ tunable threshold is more than a technical footnote; it is likely to be central to clinical deployment. A threshold appropriate for surveillance in hereditary high-risk individuals may be very different from one used for opportunistic case finding among broader CT populations.
Mechanistic feature analysis
The model’s predictive signature was composed predominantly of multi-scale wavelet-filtered textural features, accounting for 90% of selected features. These wavelet-derived features outperformed unfiltered features, with AUC 0.82 versus 0.74 and p=0.007. This helps address a common critique of radiomics studies, namely that they function as opaque pattern recognizers without biological grounding.
Wavelet transformations decompose an image into different spatial frequencies and scales, making subtle textural heterogeneity easier to quantify. In pancreatic carcinogenesis, such heterogeneity could plausibly reflect early stromal remodeling, acinar loss, fibrosis, duct-centric changes, or altered glandular architecture that does not yet manifest as a visible mass. While this does not prove mechanism in a histopathologic sense, it does strengthen the argument that the model is capturing meaningful tissue disruption rather than random scanner artifacts.
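To make the multi-scale idea concrete, the toy example below implements a one-level 2D Haar transform (the simplest wavelet) in pure Python and shows that detail-subband energy separates a flat patch from a textured one. This illustrates the principle only: the abstract does not specify the study's wavelet family or pipeline, and production radiomics work typically relies on a library such as PyRadiomics.

```python
# One-level 2D Haar decomposition into an approximation subband and three
# detail subbands. Subband naming conventions vary; this is a didactic
# sketch, not the study's feature-extraction code.

def haar2d(img):
    """One-level 2D Haar transform of an even-sized 2D list of floats."""
    rows, cols = len(img), len(img[0])
    # Row pass: pairwise average (low-pass) and difference (high-pass).
    low = [[(r[2*j] + r[2*j+1]) / 2 for j in range(cols // 2)] for r in img]
    high = [[(r[2*j] - r[2*j+1]) / 2 for j in range(cols // 2)] for r in img]
    # Column pass on each half yields the four subbands.
    def col_pass(m):
        lo = [[(m[2*i][j] + m[2*i+1][j]) / 2 for j in range(len(m[0]))]
              for i in range(rows // 2)]
        hi = [[(m[2*i][j] - m[2*i+1][j]) / 2 for j in range(len(m[0]))]
              for i in range(rows // 2)]
        return lo, hi
    LL, LH = col_pass(low)    # approximation; one detail band
    HL, HH = col_pass(high)   # two further detail bands
    return LL, LH, HL, HH

def detail_energy(img):
    """Sum of squared coefficients across the three detail subbands."""
    _, LH, HL, HH = haar2d(img)
    return sum(v * v for band in (LH, HL, HH) for row in band for v in row)

# A flat patch carries no detail energy; a checkerboard-textured patch does.
flat = [[5.0] * 4 for _ in range(4)]
textured = [[5.0, 1.0, 5.0, 1.0],
            [1.0, 5.0, 1.0, 5.0],
            [5.0, 1.0, 5.0, 1.0],
            [1.0, 5.0, 1.0, 5.0]]
flat_e = detail_energy(flat)       # 0.0
tex_e = detail_energy(textured)    # positive
```

In a radiomics pipeline, texture features (entropy, gray-level co-occurrence statistics, and so on) would be recomputed on each subband, which is how a 40-feature signature can end up dominated by wavelet-filtered variants.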
Clinical Interpretation
This study is important because it tackles one of the hardest problems in oncologic imaging: identifying a lethal cancer before it becomes morphologically obvious. The findings suggest that routine CT, an already widely available modality, may contain latent prognostic and diagnostic information that human readers cannot consistently extract. If validated prospectively, that would shift AI from workflow optimization into true preclinical interception.
The most credible near-term application is not population screening but surveillance enrichment in groups already considered at elevated risk. These include individuals with familial pancreatic cancer syndromes, certain germline mutation carriers, some patients with pancreatic cystic neoplasia, and perhaps selected people with new-onset diabetes or unexplained pancreatic abnormalities. In those settings, disease prevalence is higher, the tolerance for additional testing may be greater, and the balance between false positives and missed early cancers may be more favorable.
It is also conceivable that REDMOD-like systems could be embedded opportunistically into routine abdominal CT interpretation. Many patients undergo CT for unrelated symptoms or surveillance of other conditions. An automated background analysis of the pancreas could flag a subset of examinations for closer expert review. However, this pathway raises practical questions around alert fatigue, medicolegal responsibility, incidental findings management, and reimbursement.
Strengths of the Study
Several aspects strengthen confidence in the findings. First, the investigators used an independent test cohort rather than relying solely on cross-validation. Second, they studied the problem in a low-prevalence configuration, which is more clinically honest than balanced datasets that can exaggerate performance. Third, the model was fully automated, including segmentation, which improves reproducibility and scalability. Fourth, they incorporated longitudinal and external validation rather than stopping at single-time-point performance reporting. Finally, the comparison with radiologists provides a clinically interpretable benchmark that many AI studies lack.
Limitations and Unanswered Questions
Despite the strong design, this remains a retrospective study, and retrospective success does not guarantee prospective benefit. Case ascertainment, scan selection, and temporal linkage to eventual PDA diagnosis may all introduce bias that is hard to eliminate completely. The abstract does not provide a full breakdown of CT acquisition parameters, contrast phases, scanner types, or calibration methods, all of which can influence radiomics performance.
The reported specificity, although respectable, may still be insufficient for broad deployment in average-risk populations. Because pancreatic cancer prevalence is very low in the general population, positive predictive value could remain limited even with apparently good discrimination. This is a recurring issue in cancer early detection: strong AUC does not automatically translate into clinically efficient screening.
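The prevalence effect can be made explicit with Bayes' rule. Using the abstract's sensitivity (73.0%) and the multi-institutional external specificity (81.3%), and two illustrative prevalence values chosen for this sketch (the study's roughly 1-in-7 enrichment versus a screening-like 1-in-1000), positive predictive value drops by two orders of magnitude:

```python
# Back-of-envelope PPV under Bayes' rule. Sensitivity and specificity are
# taken from the abstract; the prevalence values are illustrative.

def ppv(sens, spec, prev):
    """Positive predictive value for a binary test at a given prevalence."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    return tp / (tp + fp)

ppv_enriched = ppv(0.73, 0.813, 1 / 7)    # study-like enrichment
ppv_screening = ppv(0.73, 0.813, 0.001)   # screening-like prevalence
```

At 1-in-1000 prevalence, fewer than 1 in 200 positive calls would be a true cancer under these assumptions, which is why enriched high-risk cohorts, rather than general screening, are the credible near-term target.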
Another open question is actionability. What should clinicians do with a positive REDMOD result on a scan that otherwise appears normal? Options might include short-interval repeat pancreas-protocol CT, MRI, endoscopic ultrasound, biomarker testing, or referral to a specialized pancreas clinic. Yet each pathway carries cost and potential harm. Prospective studies will need predefined management algorithms, not just diagnostic accuracy endpoints.
The abstract also does not report calibration metrics, confidence intervals, subgroup analyses, or stage distribution at eventual diagnosis. These details will be important in the full article. For example, clinicians will want to know whether the model performs consistently across age groups, body habitus, diabetic status, or varying lead times, and whether detected cases were more likely to be surgically resectable.
Finally, generalizability beyond the participating institutions remains to be proven. External validation is encouraging, but widespread adoption will require testing across community hospitals, diverse scanner ecosystems, and internationally heterogeneous imaging practices.
How This Fits With Current Practice and Guidelines
Current guidelines do not recommend general population screening for pancreatic cancer. Instead, surveillance is generally reserved for selected high-risk individuals, often using MRI and endoscopic ultrasound in expert centers. This reflects both the low prevalence of disease and the limitations of available screening tools. A CT-based AI model that can extract pre-diagnostic signal from already acquired scans could complement, rather than replace, existing surveillance strategies.
Importantly, CT is not usually the preferred repeated surveillance modality in high-risk cohorts because of radiation exposure and sensitivity considerations compared with MRI and endoscopic ultrasound. REDMOD’s best role may therefore be opportunistic detection on standard-of-care CT already obtained for other reasons, or as an adjunctive risk stratification tool helping determine who needs intensified pancreas-focused evaluation.
Research and Implementation Priorities
The next step should be prospective validation in clearly defined high-risk cohorts, as the authors appropriately note. Such studies should assess not only diagnostic accuracy but also clinical utility: earlier stage at diagnosis, increased resection rates, reduced interval to specialist evaluation, and ideally improved survival. Decision-curve analysis and cost-effectiveness modeling will be essential, especially if the tool triggers downstream advanced imaging or endoscopy.
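For reference, the decision-curve analysis suggested above scores a model by its net benefit at a chosen threshold probability pt, defined by Vickers and Elkin as TP/N - (FP/N) * pt/(1 - pt). The sketch below uses invented confusion-matrix counts; the abstract does not report results in this form.

```python
# Net-benefit calculation for decision-curve analysis (Vickers & Elkin).
# All counts are invented for illustration.

def net_benefit(tp, fp, n, pt):
    """Net benefit of acting on model positives at threshold probability pt."""
    return tp / n - (fp / n) * (pt / (1 - pt))

# Hypothetical test-set-sized example: 46 true positives and 86 false
# positives among 493 scans, evaluated at a 5% threshold probability.
nb_model = net_benefit(tp=46, fp=86, n=493, pt=0.05)
nb_none = 0.0  # the "act on no one" baseline is zero by definition
```

Plotting net benefit across a range of pt values, against the "treat all" and "treat none" baselines, is what turns diagnostic accuracy into a statement about clinical utility.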
Future work should also address calibration across scanners and institutions, fairness across demographic groups, and integration into radiology workflow. Transparent reporting of false-positive patterns will be particularly valuable. Knowing which benign pancreatic or peripancreatic conditions most commonly trigger the model could inform safeguards and triage algorithms.
Combining radiomics with clinical and biologic data may further improve performance. Candidate additions include age, family history, smoking status, diabetes trajectory, circulating biomarkers such as CA 19-9, and perhaps molecular signals from blood-based assays. Multimodal risk models are likely to outperform image-only systems when prevalence is low and clinical consequences of false positives are substantial.
Conclusion
Mukherjee and colleagues present one of the more clinically compelling AI studies in pancreatic imaging to date. REDMOD detected visually occult, pre-diagnostic pancreatic ductal adenocarcinoma on routine CT with meaningful lead time, outperformed radiologists, remained stable longitudinally, and retained specificity across external cohorts. Just as importantly, its performance appears linked to interpretable radiomic texture signatures rather than entirely inscrutable model behavior.
The findings do not yet justify routine screening deployment, particularly in average-risk populations. But they do provide a credible foundation for prospective studies in high-risk surveillance and opportunistic CT-based detection. If those studies confirm clinical utility, this approach could help move pancreatic cancer care upstream, from diagnosis after symptom onset to identification during a still-silent but potentially more treatable phase.
Funding and Trial Registration
No ClinicalTrials.gov registration number is provided in the abstract. Funding information is not reported in the abstract and should be verified in the full-text article.
References
1. Mukherjee S, Antony A, Patnam NG, Trivedi KH, Karbhari A, Bhinder KK, Zarrintan A, Fletcher JG, Truty M, Johnson MP, Chari ST, Goenka AH. Next-generation AI for visually occult pancreatic cancer detection in a low-prevalence setting with longitudinal stability and multi-institutional generalisability. Gut. 2026 Apr 28. PMID: 42049489.
2. Corral JE, Das A, Bruno MJ, Wallace MB. Cost-effectiveness of pancreatic cancer surveillance in high-risk individuals: an economic analysis. Pancreas. 2019;48(4):526-536.
3. Goggins M, Overbeek KA, Brand R, Syngal S, Del Chiaro M, Bartsch DK, Bassi C, Carrato A, Farrell J, Fishman EK, et al. Management of patients with increased risk for familial pancreatic cancer: updated recommendations from the International Cancer of the Pancreas Screening Consortium. Gut. 2020;69(1):7-17.
4. Owens DK, Davidson KW, Krist AH, Barry MJ, Cabana M, Caughey AB, Curry SJ, Doubeni CA, Epling JW Jr, Kubik M, et al. Screening for pancreatic cancer: US Preventive Services Task Force reaffirmation recommendation statement. JAMA. 2019;322(5):438-444.
5. Lennon AM, Wolfgang CL, Canto MI, Klein AP, Herman JM, Goggins M, Fishman EK, Kamel I, Weiss MJ, Diaz LA Jr, Papadopoulos N, et al. The early detection of pancreatic cancer: what will it take to diagnose and treat curable pancreatic neoplasia? Cancer Res. 2014;74(13):3381-3389.

