Standardizing Liver Biopsy Interpretation in MASH Trials: Key Consensus Statements from the International MASLD Pathology Group

Introduction and Context

Metabolic dysfunction–associated steatohepatitis (MASH), part of the recently renamed spectrum of metabolic dysfunction–associated steatotic liver disease (MASLD), is a leading cause of liver-related morbidity in adults with obesity and type 2 diabetes. Histologic assessment of liver biopsies, which grades necroinflammatory activity and stages fibrosis, remains the gold standard for patient selection and for primary endpoints in most therapeutic clinical trials. Yet variability in how pathologists define and score core histologic features (notably hepatocyte ballooning, lobular inflammation, and distribution of fibrosis) creates major problems: high inter- and intra-observer variability, inconsistent trial enrollment (screening failures), and variable placebo response rates that can obscure treatment effects.

To address these problems, the International MASLD Pathology Group (IMPG), a panel of 25 expert liver pathologists plus a statistician, developed consensus position statements to standardize histologic definitions, specimen handling, scoring use, and reader workflows specifically for MASH clinical trials. Their work, published in J Hepatol (Lackner et al., 2025), used a multi-stage Delphi process to arrive at a robust set of recommendations intended to improve reproducibility, facilitate multicenter trials, and support generation of high-quality annotated datasets for machine learning.

New Guideline Highlights

Key takeaways from the IMPG consensus:

– A rigorous, standardized approach to biopsy handling, staining, and reporting reduces variability and improves trial integrity.
– Clear, operational definitions for core histologic features (steatosis, ballooning, lobular and portal inflammation, Mallory-Denk bodies, and fibrosis patterns) were endorsed to reduce interpretive drift among pathologists.
– Guidance on the consistent application of currently used scoring systems (NASH Clinical Research Network [NASH-CRN] and SAF/FLIP) and on how to align them with trial endpoints was provided.
– Recommendations emphasize centralized reading paradigms (two independent blinded readers with adjudication), predefined training/calibration, and ongoing quality control.
– Statements were designed to be AI-ready: standardized labels, digitization standards, annotation conventions, and metadata are specified to enable reproducible supervised machine learning.

Important numerical outcomes of the Delphi process: the IMPG working groups generated 278 primary statements. In the first Delphi round, 162 statements achieved ≥80% agreement. After revision and discussion, a second round yielded 192 final statements with ≥80% agreement.
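As a rough illustration of the ≥80% agreement rule, the sketch below tallies the votes on a single statement. How the panel actually counted votes (for example, how abstentions were handled) is not described here, so the helper names, the exclusion of abstentions, and the vote tallies are all assumptions.

```python
def agreement_fraction(agree: int, disagree: int) -> float:
    """Fraction of voting panelists who agreed (abstentions excluded by assumption)."""
    total = agree + disagree
    if total == 0:
        raise ValueError("no votes cast")
    return agree / total


def reaches_consensus(agree: int, disagree: int, threshold: float = 0.80) -> bool:
    """True when agreement meets the >=80% threshold quoted in the text."""
    return agreement_fraction(agree, disagree) >= threshold


# Hypothetical tallies for 25 voting panelists.
print(reaches_consensus(20, 5))  # 20/25 = 0.80, meets the threshold
print(reaches_consensus(19, 6))  # 19/25 = 0.76, falls short
```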

Updated Recommendations and Key Changes

The IMPG statements do not replace clinical practice guidelines for the management of MASLD; they provide standardized pathology procedures and definitions for clinical trials. Highlights of what is new or clarified compared with prior, less formal approaches include:

– Explicit minimum biopsy quality metrics for trial inclusion and reporting (specimen handling, fixation, length, and portal tract recommendations).
– Operationalized definitions for histologic ballooning and lobular inflammation to reduce subjective variation.
– Clear recommendations on which stains to use (H&E plus connective tissue stain) and when immunohistochemistry or special stains are appropriate.
– Standard guidance for the use of established scoring systems (NASH-CRN NAS, SAF) and how to interpret combined endpoints (e.g., resolution of steatohepatitis vs. fibrosis regression).
– Detailed recommendations for central reader workflows, calibration exercises, and concordance thresholds for adjudication.
– A framework for producing AI-compatible datasets: slide digitization standards, annotation granularity, and required metadata fields.

These consensus recommendations build on earlier foundational scoring systems (Kleiner et al. 2005 NASH-CRN; Bedossa et al. 2012 SAF) and on clinical practice guidance from major societies (AASLD 2018; EASL 2016), but focus specifically on trial-related pathology standardization and reproducibility.

Topic-by-Topic Recommendations

Below are the practical, panel-endorsed recommendations organized by topic. Because the IMPG used a Delphi consensus approach, each item notes the panel's agreement threshold (≥80%) rather than a traditional evidence grade.

Biopsy acquisition and processing
– Minimum quality metrics (consensus): recommend a core length target of at least 15–20 mm when possible, with documentation of number of portal tracts. Note: trials should predefine acceptable specimen lengths and a plan for resampling or exclusion if below threshold. (≥80% agreement)
– Fixation and processing: immediate placement in 10% neutral buffered formalin, routine paraffin embedding, and full-thickness sections. Record cold ischemia time and fixation time in metadata. (≥80%)
– Staining: mandatory H&E and a connective tissue stain (Masson trichrome or Sirius Red). Additional stains (e.g., CK8/18, ubiquitin) only when needed for specific endpoint adjudication. (≥80%)
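The acquisition and processing items above lend themselves to an automated adequacy check at specimen intake. The following is a minimal sketch, not a prescribed implementation: the `BiopsySpecimen` fields, the 15 mm cutoff (taken from the lower end of the 15–20 mm target), and the exact fixative string are assumptions that a trial protocol would need to define. The portal tract count is recorded but not thresholded, because no minimum is stated here.

```python
from dataclasses import dataclass, field

# Assumed protocol values for illustration; a real trial would predefine its own.
MIN_CORE_LENGTH_MM = 15.0
REQUIRED_FIXATIVE = "10% neutral buffered formalin"
CONNECTIVE_TISSUE_STAINS = {"Masson trichrome", "Sirius Red"}


@dataclass
class BiopsySpecimen:
    core_length_mm: float
    portal_tract_count: int  # documented, but no minimum is assumed here
    fixative: str
    stains: set = field(default_factory=set)


def adequacy_issues(specimen: BiopsySpecimen) -> list:
    """Return a list of human-readable problems; an empty list means adequate."""
    issues = []
    if specimen.core_length_mm < MIN_CORE_LENGTH_MM:
        issues.append(
            f"core length {specimen.core_length_mm} mm is below {MIN_CORE_LENGTH_MM} mm"
        )
    if specimen.fixative != REQUIRED_FIXATIVE:
        issues.append(f"unexpected fixative: {specimen.fixative}")
    if "H&E" not in specimen.stains:
        issues.append("missing H&E stain")
    if not specimen.stains & CONNECTIVE_TISSUE_STAINS:
        issues.append("missing connective tissue stain")
    return issues


specimen = BiopsySpecimen(18.0, 11, REQUIRED_FIXATIVE, {"H&E", "Sirius Red"})
print(adequacy_issues(specimen))  # prints []
```

Returning a list of issues rather than a single pass/fail flag keeps the reason for any exclusion available for the trial record.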

Definitions of histologic features
– Steatosis: scored as % hepatocytes with macrovesicular steatosis (standard bins: <5%, 5–33%, 34–66%, >66%), consistent with prior scoring systems. (≥80%)
– Hepatocellular ballooning: panel endorsed operational descriptors (size increase, cytoplasmic rarefaction, rounded cell contour) and provided photomicrographic examples for training sets. The group emphasized differentiating ballooning from glycogenated nuclei or artifact. (≥80%)
– Lobular inflammation: defined as number and type of inflammatory foci per 200x field, with guidance on counting method and distinction from portal-based inflammation. (≥80%)
– Mallory-Denk bodies and apoptotic bodies: recommend explicit notation when present; their presence contributes to the activity score per trial protocol. (≥80%)
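To make the link between these feature definitions and a numeric activity score concrete, here is a small sketch of the NASH-CRN NAFLD Activity Score (NAS) as the sum of steatosis, lobular inflammation, and ballooning grades. The cutoffs follow the commonly cited Kleiner et al. 2005 bins as understood here; the function names are hypothetical, and a trial's own protocol definitions should take precedence.

```python
def steatosis_grade(percent: float) -> int:
    """Grade 0-3 from the percentage of hepatocytes with macrovesicular steatosis."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 5:
        return 0
    if percent <= 33:
        return 1
    if percent <= 66:
        return 2
    return 3


def lobular_inflammation_grade(foci_per_200x: float) -> int:
    """Grade 0-3 from the number of inflammatory foci per 200x field."""
    if foci_per_200x < 0:
        raise ValueError("foci count cannot be negative")
    if foci_per_200x == 0:
        return 0
    if foci_per_200x < 2:
        return 1
    if foci_per_200x <= 4:
        return 2
    return 3


def nas(steatosis_percent: float, foci_per_200x: float, ballooning_grade: int) -> int:
    """Sum of the three component grades (range 0-8)."""
    if ballooning_grade not in (0, 1, 2):
        raise ValueError("ballooning grade must be 0, 1, or 2")
    return (
        steatosis_grade(steatosis_percent)
        + lobular_inflammation_grade(foci_per_200x)
        + ballooning_grade
    )


print(nas(50, 1, 1))  # prints 4 (steatosis 2 + inflammation 1 + ballooning 1)
```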

Fibrosis staging and reporting
– Separate activity (grading) and fibrosis (staging) — fibrosis must be staged independently of the activity score. (≥80%)
– Use of established staging anchors: the panel recommended continued use of validated ordinal fibrosis scales (e.g., NASH-CRN stages 0–4 or SAF fibrosis staging), with clear definitions for perisinusoidal (zone 3), portal, bridging, and cirrhotic changes. (≥80%)
– Document both pattern and stage (e.g., “stage 2 perisinusoidal and portal fibrosis”) to support mechanistic interpretation. (≥80%)

Scoring systems and trial endpoints
– Recommended that trial protocols pre-specify which scoring system will be used (NASH-CRN or SAF) and how composite endpoints are operationalized (e.g., “resolution of steatohepatitis without worsening fibrosis” or “≥1 stage fibrosis improvement”). (≥80%)
– Avoid mixing scoring systems across readers without pre-specified crosswalks. (≥80%)
– For fibrosis endpoints, require central re-staging of paired biopsies by the same readers and standardized washout periods to minimize drift. (≥80%)
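Pre-specifying how composite endpoints are operationalized can be as literal as writing them down as predicates over paired reads. The sketch below assumes one common operational definition of steatohepatitis resolution (ballooning grade 0 with lobular inflammation grade 0–1); that definition, the `Read` structure, and the function names are illustrative assumptions, and the trial protocol remains the authority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    ballooning: int            # 0-2
    lobular_inflammation: int  # 0-3
    fibrosis_stage: int        # 0-4


def steatohepatitis_resolved(read: Read) -> bool:
    """Assumed definition: no ballooning and at most mild lobular inflammation."""
    return read.ballooning == 0 and read.lobular_inflammation <= 1


def resolution_without_worsening(baseline: Read, follow_up: Read) -> bool:
    """Resolution of steatohepatitis with no increase in fibrosis stage."""
    worsened = follow_up.fibrosis_stage > baseline.fibrosis_stage
    return steatohepatitis_resolved(follow_up) and not worsened


def fibrosis_improved(baseline: Read, follow_up: Read, min_stages: int = 1) -> bool:
    """Improvement of at least `min_stages` fibrosis stages."""
    return baseline.fibrosis_stage - follow_up.fibrosis_stage >= min_stages


baseline = Read(ballooning=1, lobular_inflammation=2, fibrosis_stage=2)
follow_up = Read(ballooning=0, lobular_inflammation=1, fibrosis_stage=2)
print(resolution_without_worsening(baseline, follow_up))  # prints True
print(fibrosis_improved(baseline, follow_up))             # prints False
```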

Central reading, reader training, and adjudication
– Centralized reading recommended for phase II/III trials. The preferred model: two independent blinded expert readers with a predetermined adjudication path (third reader or consensus conference) for discordant cases beyond defined thresholds. (≥80%)
– Mandatory pre-trial calibration sessions using annotated training slides and ongoing quality control with intra-trial concordance monitoring. (≥80%)
– Define acceptable concordance metrics a priori (e.g., kappa thresholds or percent agreement) and an automated trigger for retraining or adjudication if concordance falls below thresholds. (≥80%)
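One way to picture the a priori concordance metric and retraining trigger is an unweighted Cohen's kappa computed over the two readers' ordinal scores. This is a sketch under stated assumptions: the 0.60 threshold is purely illustrative and not an IMPG value, the helper names are hypothetical, and a weighted kappa may be preferred for ordinal scales.

```python
from collections import Counter

KAPPA_RETRAIN_THRESHOLD = 0.60  # assumed value for illustration only


def cohen_kappa(reader_a: list, reader_b: list) -> float:
    """Unweighted Cohen's kappa for two readers scoring the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must score the same non-empty set of cases")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    freq_a, freq_b = Counter(reader_a), Counter(reader_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1 - expected)


def discordant_cases(reader_a: list, reader_b: list, max_gap: int = 0) -> list:
    """Indices of cases whose scores differ by more than `max_gap`."""
    return [
        i for i, (a, b) in enumerate(zip(reader_a, reader_b)) if abs(a - b) > max_gap
    ]


# Hypothetical fibrosis stages assigned by two blinded readers.
stages_a = [0, 1, 2, 2, 3, 4, 1, 2]
stages_b = [0, 1, 2, 3, 3, 4, 2, 2]

kappa = cohen_kappa(stages_a, stages_b)
print(f"kappa = {kappa:.2f}")
print("retraining needed:", kappa < KAPPA_RETRAIN_THRESHOLD)
print("cases for adjudication:", discordant_cases(stages_a, stages_b))
```

Keeping the adjudication list separate from the kappa check reflects the two distinct actions described above: discordant cases go to a third reader or consensus conference, while a low overall kappa triggers recalibration.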

Digitization, annotation, and AI readiness
– Whole-slide imaging at diagnostic resolution recommended; metadata must include scanner type, objective magnification, and file format. (≥80%)
– Standardized annotation conventions: mark regions of steatosis, ballooning, inflammation, and fibrosis; include both pixel-level and region-level labels where possible. (≥80%)
– Encourage sharing of de-identified, annotated datasets with clear licensing for model development and independent validation. (≥80%)
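The metadata and annotation items above could be captured in a simple, validated record per slide. The schema below is hypothetical: the field names, the label vocabulary, the example scanner name, and the polygon representation are assumptions for illustration, not a format specified by the IMPG.

```python
import json
from dataclasses import dataclass, asdict, field

ALLOWED_LABELS = {"steatosis", "ballooning", "inflammation", "fibrosis"}


@dataclass
class RegionAnnotation:
    label: str
    polygon: list  # list of (x, y) vertices in pixel coordinates


@dataclass
class SlideRecord:
    slide_id: str
    scanner_type: str
    objective_magnification: float
    file_format: str
    annotations: list = field(default_factory=list)


def validation_errors(record: SlideRecord) -> list:
    """Return a list of problems; an empty list means the record is complete."""
    errors = []
    for name in ("scanner_type", "file_format"):
        if not getattr(record, name):
            errors.append(f"missing {name}")
    if record.objective_magnification <= 0:
        errors.append("objective_magnification must be positive")
    for i, ann in enumerate(record.annotations):
        if ann.label not in ALLOWED_LABELS:
            errors.append(f"annotation {i}: unknown label {ann.label!r}")
        if len(ann.polygon) < 3:
            errors.append(f"annotation {i}: polygon needs at least 3 vertices")
    return errors


record = SlideRecord(
    slide_id="DEMO-0001",
    scanner_type="ExampleScanner X1",  # made-up scanner name
    objective_magnification=40.0,
    file_format="tiff",
    annotations=[RegionAnnotation("ballooning", [(10, 10), (40, 10), (25, 30)])],
)
print(validation_errors(record))  # prints []
print(json.dumps(asdict(record), indent=2))
```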

Special situations and caveats
– Biopsies with substantial fragmentation, poor fixation, or overt artifact should be annotated as such and—if below pre-specified quality thresholds—excluded from primary efficacy analyses per protocol. (≥80%)
– For rare histologic patterns (e.g., autoimmune features, cholestatic patterns, overlap syndromes), recommend additional clinical correlation and, where appropriate, exclusion from standard MASH endpoints unless explicitly allowed by the trial design. (≥80%)

Expert Commentary and Insights

The IMPG consensus was shaped by experienced liver pathologists and reflects both broad agreement and acknowledgment of key lingering controversies.

Areas of strong consensus
– That harmonized, operational definitions materially reduce inter-observer variability and improve trial efficiency.
– That central reading with pre-specified training/calibration is essential for multicenter trials.
– That AI will play an increasingly important role but requires standardized, high-quality labeled data.

Areas of ongoing debate
– Precise morphologic thresholds for what constitutes hepatocellular ballooning remain challenging in some cases; photographic atlases and training slides were emphasized as mitigation.
– Continuous (quantitative) versus ordinal (semiquantitative) approaches for fibrosis assessment: while quantitative digital measures are promising, the panel recognized regulatory and historical reliance on ordinal stages and therefore recommended parallel approaches when feasible.
– Minimum biopsy length: although longer cores reduce sampling error, practical constraints in multicenter trials (patient tolerance, operator skill) may make rigid cutoffs difficult. Trials should prespecify acceptable ranges and handling of suboptimal cores.

Future trends identified by the panel include wider adoption of digital pathology, development of validated image analysis algorithms that quantify both activity and fibrosis, and international efforts to pool annotated datasets to accelerate AI validation.

Practical Implications for Clinicians, Sponsors, and Pathologists

For trial sponsors and investigators
– Build the IMPG recommendations into protocol appendices: biopsy acquisition SOPs, central reader workflows, calibration plans, and biopsy quality thresholds.
– Expect fewer screening failures caused by histology-related discrepancies when standardized specimen handling and central reading are used.

For pathologists and central readers
– Participate in calibration exercises and use the IMPG photographic examples and annotation conventions to align interpretations.
– Document and report specimen adequacy and technical artifacts explicitly; these items matter for regulatory endpoints.

For clinics enrolling patients
– Coordinate with interventionalists to prioritize obtaining an adequate core (length, fixation protocol) and to ensure rapid processing and metadata capture for trial inclusion.

Patient vignette (illustrative)
John, a 52-year-old man with obesity and type 2 diabetes, consents to screening for a phase 3 MASH trial. A percutaneous liver biopsy is obtained and measured at 18 mm; it is fixed promptly and submitted with operator metadata. Central reading by two blinded pathologists (calibrated on the trial training set) reports: steatosis 34–66% (score 2), definite ballooning (score 1), lobular inflammation <2 foci per 200x field (score 1), NAS = 4, fibrosis stage 2 perisinusoidal and portal. The trial's predefined inclusion requires NAS ≥4 and fibrosis stage 2–3; John is enrolled. Paired biopsy at 52 weeks is handled per the same SOPs and read by the same readers to maximize comparability.
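The vignette's inclusion decision can be written as a short check. This sketch hard-codes the illustrative criteria from the vignette (NAS ≥4 and fibrosis stage 2–3); the function name and parameters are hypothetical, and real protocols often add further conditions.

```python
def meets_histologic_inclusion(
    steatosis: int,
    ballooning: int,
    lobular_inflammation: int,
    fibrosis_stage: int,
    min_nas: int = 4,
    allowed_stages: tuple = (2, 3),
) -> bool:
    """Apply the vignette's criteria: NAS at or above min_nas and an allowed fibrosis stage."""
    nas = steatosis + ballooning + lobular_inflammation
    return nas >= min_nas and fibrosis_stage in allowed_stages


# Component scores reported for John: 2 + 1 + 1 = NAS 4, fibrosis stage 2.
print(meets_histologic_inclusion(2, 1, 1, 2))  # prints True
```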

References

1. Lackner C, Gouw ASH, Alves V, et al. Consensus position statements for the standardized application of histological grading and staging systems in MASH clinical trials. J Hepatol. 2025 Oct 8. doi:10.1016/j.jhep.2025.09.019. (Epub ahead of print)
2. Kleiner DE, Brunt EM, Van Natta M, et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology. 2005;41(6):1313–1321.
3. Bedossa P, Poitou C, Veyrie N, et al. Histopathological algorithm and scoring system for evaluation of liver lesions in patients with NAFLD. Hepatology. 2012;56(5):1751–1760.
4. Chalasani N, Younossi Z, Lavine JE, et al. The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases. Hepatology. 2018;67(1):328–357.
5. European Association for the Study of the Liver (EASL), European Association for the Study of Diabetes (EASD), European Association for the Study of Obesity (EASO). EASL–EASD–EASO Clinical Practice Guidelines for the management of non-alcoholic fatty liver disease. J Hepatol. 2016;64(6):1388–1402.

Closing

The IMPG consensus statements constitute a major step toward rigorous, reproducible histologic endpoints in MASH clinical trials. By standardizing specimen handling, harmonizing definitions, optimizing central reading workflows, and providing AI-ready annotation frameworks, these recommendations should reduce variability, improve trial signal detection, and accelerate reliable drug development for MASLD/MASH.