Deep learning classifies focus score and Sjögren’s disease from minor salivary gland biopsies and highlights a CD8+ acinar pattern

Highlights

– A deep‑learning model trained on digitised minor salivary gland H&E slides achieved strong external performance for focus score (AUROC 0.88) and Sjögren’s disease classification (AUROC 0.89).

– Model performance was particularly high in anti‑SSA (Ro) antibody–negative patients (AUROC 0.92), a clinically challenging subgroup.

– Explainable ML (Shapley values) identified a histological pattern—CD8+ T cells clustering around acinar epithelial cells—associated with Sjögren’s disease.

Background: clinical context and unmet need

Primary Sjögren’s syndrome (SjS) is an autoimmune disease characterised by lymphocytic infiltration of salivary and lacrimal glands, leading to sicca symptoms and systemic manifestations. Minor salivary gland biopsy with focus score measurement (number of lymphocytic foci per 4 mm² of glandular tissue) is integrated into current diagnostic and classification frameworks and is a major objective criterion in the 2016 American College of Rheumatology–European League Against Rheumatism (ACR‑EULAR) classification criteria for primary Sjögren’s syndrome.
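
The focus score arithmetic described above can be illustrated with a short sketch. It assumes the number of foci and the glandular surface area have already been measured by a pathologist; the function names and the example values are hypothetical and are not taken from the study.

```python
def focus_score(n_foci: int, glandular_area_mm2: float) -> float:
    """Return the number of lymphocytic foci per 4 mm^2 of glandular tissue."""
    if n_foci < 0:
        raise ValueError("number of foci cannot be negative")
    if glandular_area_mm2 <= 0:
        raise ValueError("glandular area must be positive")
    return 4.0 * n_foci / glandular_area_mm2


def is_focus_score_positive(n_foci: int, glandular_area_mm2: float,
                            threshold: float = 1.0) -> bool:
    """Dichotomise at the >=1 threshold used for the classification task."""
    return focus_score(n_foci, glandular_area_mm2) >= threshold


# Hypothetical biopsies: 3 foci in 10 mm^2 versus 2 foci in 10 mm^2.
positive_case = is_focus_score_positive(3, 10.0)   # score 1.2, at or above 1
negative_case = is_focus_score_positive(2, 10.0)   # score 0.8, below 1
```

Because the score is a ratio, a small error in either the focus count or the measured area can move a borderline biopsy across the threshold, which is one reason regrading can change classification.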

However, histopathological assessment suffers from inter‑observer variability: expert regrading can change the focus score and thus classification in a substantial proportion of cases. This variability complicates diagnosis, enrolment for clinical trials, and attempts to identify histopathological subtypes that might predict prognosis or treatment response. There is therefore a need for reproducible, scalable methods to read and interpret salivary gland biopsies and to mine them for new biologically and clinically meaningful patterns.

Study design and methods

Duquesne and colleagues conducted a retrospective, multicentre cohort study within the European H2020 NECESSITY consortium to develop and externally validate a deep‑learning classifier for both focus score (dichotomised at ≥1 versus <1) and ACR‑EULAR–defined Sjögren’s disease, using digitised haematoxylin and eosin (H&E) minor salivary gland biopsy slides.

Key design elements:

  • Population: 545 participants from six expert centres across Europe (three UK centres; centres in Greece, Portugal, and France). Participants included people with sicca symptoms without Sjögren’s disease and patients meeting the ACR‑EULAR 2016 classification for Sjögren’s disease.
  • Index test: a deep convolutional neural network trained on digitised H&E slides from five centres; held‑out external validation was performed on slides from the sixth centre.
  • Primary endpoints: area under the receiver operating characteristic curve (AUROC) for (a) focus score classification (≥1 vs <1) and (b) Sjögren's disease classification (ACR‑EULAR positive vs negative).
  • Interpretability: Shapley values were computed to highlight image regions driving predictions, enabling identification of histological patterns contributing to model decisions.
  • Timeframe: study period Oct 13, 2021 to Sept 5, 2024.
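
The held‑out‑centre design listed above can be sketched as a simple split by site. This is only an illustration under stated assumptions: the record fields (`participant_id`, `centre`, `focus_score_positive`), the centre codes A to F, and the toy data are all hypothetical, and the authors' actual data pipeline is not described in this summary.

```python
def split_by_centre(records, held_out_centre):
    """Split slide records into development and external-validation sets by centre."""
    development = [r for r in records if r["centre"] != held_out_centre]
    external = [r for r in records if r["centre"] == held_out_centre]
    if not development or not external:
        raise ValueError("both splits must be non-empty")
    # Guard against leakage: no participant may appear on both sides of the split.
    dev_ids = {r["participant_id"] for r in development}
    ext_ids = {r["participant_id"] for r in external}
    if dev_ids & ext_ids:
        raise ValueError("participant appears in both splits")
    return development, external


# Toy records with hypothetical centre codes A-F (not the study's sites or data).
records = [
    {"participant_id": i, "centre": "ABCDEF"[i % 6], "focus_score_positive": i % 2 == 0}
    for i in range(12)
]
development, external = split_by_centre(records, held_out_centre="F")
print(len(development), len(external))  # 10 2
```

Splitting by centre rather than at random means the validation slides come from a laboratory whose staining and scanning the model has never seen, which is what makes the estimate "external".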

Key results

Population: mean age 54.2 years (SD 13.5); 490/545 (90%) female and 55/545 (10%) male.

Primary performance metrics (external validation):

  • Focus score classification (≥1 versus <1): AUROC 0.88 (95% CI 0.82–0.94).
  • Sjögren’s disease classification (ACR‑EULAR criteria): AUROC 0.89 (95% CI 0.82–0.94).
  • Subgroup of anti‑Sjögren’s syndrome–related antigen A (anti‑SSA/Ro) antibody–negative patients: AUROC 0.92 (95% CI 0.87–1.00).
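
For readers less familiar with the metric, the sketch below shows one way an AUROC and a 95% confidence interval could be computed from labels and model scores. It uses the pairwise (Mann‑Whitney) definition of AUROC and a percentile bootstrap; the bootstrap is an assumption for illustration, since this summary does not state how the authors derived their intervals, and the labels and scores are invented.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive case outscores a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (illustrative choice of method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUROC
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Invented labels (1 = disease) and model scores, purely for demonstration.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.30, 0.20, 0.90]
point_estimate = auroc(labels, scores)
print(point_estimate)  # 0.875
lower, upper = bootstrap_ci(labels, scores)
```

With small groups the interval is wide and can reach the upper bound of 1.00, which is worth keeping in mind when reading subgroup estimates.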

Explainability and histological discovery:

Using Shapley value–based attribution, the model highlighted regions and features that drove its predictions. Among these, the authors report identification of a previously undescribed or underappreciated pattern: dense aggregates of CD8+ T cells in close apposition to acinar epithelial cells (peri‑acinar CD8+ infiltration). This pattern was associated with Sjögren’s disease diagnosis in the dataset.
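
To make the idea of Shapley attribution concrete, the sketch below computes exact Shapley values for a toy "slide" made of three tiles. Everything here is hypothetical: the scoring function, the tile indices, and the weights are invented, and exact enumeration is only feasible for a handful of features. How the authors approximated Shapley values on whole‑slide images is not described in this summary.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_slide_score(present_tiles):
    """Hypothetical model output given which tiles are visible to the model."""
    score = 0.1                      # baseline with every tile masked
    if 0 in present_tiles:
        score += 0.5                 # tile 0 carries most of the signal
    if 1 in present_tiles:
        score += 0.2
    if 0 in present_tiles and 2 in present_tiles:
        score += 0.1                 # tile 2 only matters alongside tile 0
    return score


phi = shapley_values(toy_slide_score, 3)
print([round(p, 3) for p in phi])  # [0.55, 0.2, 0.05]
```

The attributions sum to the difference between the full‑slide score and the baseline (0.9 − 0.1 = 0.8), and the interaction between tiles 0 and 2 is shared equally between them. Tiles with large attributions are the ones a pathologist would then inspect for a recurring pattern.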

Operational and study findings of practical relevance:

  • The model relied solely on routine H&E slides; no advanced or special stains were required for the primary classification tasks.
  • Performance held up in external centre validation, suggesting reasonable robustness to inter‑centre variation in tissue processing and scanning, although full generalisability remains to be proven.

Interpretation and biological plausibility

The reported diagnostic performance suggests deep learning can provide a reproducible, objective adjunct to pathologist assessment of minor salivary gland biopsies. An AUROC in the high‑0.80s for both focus score and disease classification indicates good discrimination and points to potential clinical utility, particularly as an aid for centres with limited specialist pathology expertise or as a way to reduce inter‑observer variability in multicentre clinical trials.

The identification of a CD8+ peri‑acinar infiltration pattern is biologically plausible. While Sjögren’s disease has historically been described as a B‑cell–rich, CD4+ T helper cell–driven process with characteristic focal aggregates, there is increasing evidence that cytotoxic CD8+ T cells and epithelial–immune interactions contribute to glandular dysfunction. CD8+ T‑cell–mediated epithelial injury could be an important pathogenic mechanism leading to acinar atrophy and reduced secretory function. If confirmed, this histological subtype could stratify patients by pathobiology and identify groups who might respond to therapies targeting T‑cell cytotoxicity or epithelial preservation strategies.

Strengths of the study

  • Multicentre dataset spanning multiple European pathology units enhances generalisability compared with single‑centre studies.
  • External validation using a held‑out centre provides a realistic estimate of performance across sites and helps guard against overfitting to local staining/scanner characteristics.
  • Use of explainable ML (Shapley values) moves beyond a black‑box model to biologically interpretable patterns, enabling hypothesis generation and histological discovery.
  • Particular strength in the anti‑SSA/Ro–negative subgroup addresses an important clinical gap: seronegative patients can be diagnostically challenging and are often under‑represented.

Limitations and cautions

Despite promising results, several limitations warrant emphasis before clinical adoption:

  • Retrospective design: case selection and spectrum bias may influence performance estimates. Prospective validation in unselected diagnostic cohorts is needed.
  • Ground truth labels: the model was trained against existing histological labels and ACR‑EULAR classification; inter‑observer variability in those labels remains a potential source of noise. How the model compares with a consensus panel or longitudinal clinical outcomes was not fully addressed.
  • Staining, scanning, and preprocessing heterogeneity: while external validation used a different centre, broader validation across more diverse laboratories, scanners, and populations (including non‑European cohorts) is required to confirm robustness.
  • CD8+ pattern verification: the report states association of a CD8+ peri‑acinar pattern; confirmation requires systematic immunohistochemical (IHC) validation in independent cohorts and correlation with clinical phenotypes and functional assays to determine pathogenic significance.
  • Regulatory, practical, and workflow integration hurdles: implementing AI in routine pathology requires technical integration, validation, pathologist acceptance, and regulatory approval for diagnostic use.

Clinical and research implications

If its performance is confirmed in prospective cohorts, the model could serve several roles:

  • Standardise focus score reporting and reduce inter‑observer variability in clinical practice and trials, thereby improving diagnostic consistency and trial eligibility adjudication.
  • Flag biopsies with the CD8+ peri‑acinar pattern for further targeted evaluation (IHC, molecular profiling), enabling histology‑driven subtyping for precision medicine approaches.
  • Provide adjunctive decision support in centres without specialist salivary gland pathology expertise, improving access to accurate diagnosis.

Research priorities include prospective validation, replication of the CD8+ finding with IHC and spatial transcriptomics, correlation of histological subtypes with clinical course and treatment response, and development of integrated models combining histology with serology and imaging.

Expert commentary

From a translational perspective, this study exemplifies how explainable AI can both automate routine classification tasks and generate biologically meaningful hypotheses. The high AUROC in anti‑SSA–negative patients is particularly encouraging: these are patients for whom biopsy often remains pivotal. Nonetheless, experts will likely urge caution: AI models must be prospectively tested in real‑world diagnostic workflows and mapped to clinical outcomes before they are allowed to replace human judgement.

Conclusion

Duquesne et al. demonstrate that a deep‑learning approach using digitised minor salivary gland H&E slides can classify focus score and ACR‑EULAR–defined Sjögren’s disease with good discrimination and can uncover histological patterns—specifically peri‑acinar CD8+ T‑cell aggregates—associated with diagnosis. These findings offer a promising path toward more reproducible biopsy interpretation and toward histology‑based subtyping in Sjögren’s disease, but they require prospective, multi‑platform validation and biological confirmation before routine clinical implementation.

Funding and trial registration

Funding: Société Française de Rhumatologie, European Alliance of Associations for Rheumatology.

No ClinicalTrials.gov identifier was reported for this retrospective diagnostic study.

Selected references

1. Duquesne J, Basseto L, Claye C, et al. Machine learning to classify the focus score and Sjögren’s disease using digitalised salivary gland biopsies: a retrospective cohort study. Lancet Rheumatol. 2025 Dec;7(12):e864‑e872. PMID: 41038216.

2. Shiboski CH, Shiboski SC, Seror R, et al. 2016 ACR‑EULAR classification criteria for primary Sjögren’s syndrome: A consensus and validation study. Ann Rheum Dis. 2017;76(1):9‑16.

Author note

This article was written to summarise and critically appraise the findings of Duquesne et al. for clinicians and researchers. It emphasises translational implications and necessary next steps for validation before clinical adoption.
