MRD Does Not Consistently Predict PFS Across Modern CLL Trials: Why Surrogacy Depends on Therapy, Duration, and Sample Source

Highlights

Measurable residual disease (MRD) negativity remained prognostic for individual patients with chronic lymphocytic leukemia (CLL), but this did not translate into reliable trial-level surrogacy for progression-free survival (PFS) across all contemporary therapies.

The association between MRD and PFS was particularly weak in trials of Bruton tyrosine kinase inhibitor-based therapy and venetoclax-based regimens.

Stronger correlations were observed in regimens that included chemoimmunotherapy and/or monoclonal antibodies, and bone marrow MRD appeared more predictive of PFS than peripheral blood MRD.

These findings argue against using MRD as a universal primary endpoint for drug approval or comparative efficacy claims in CLL without accounting for therapy class, treatment duration, and sampling source.

Background

Chronic lymphocytic leukemia is increasingly managed with highly active targeted therapies, especially covalent and non-covalent Bruton tyrosine kinase (BTK) inhibitors, BCL2 inhibition with venetoclax, and anti-CD20 monoclonal antibodies. These treatments have transformed outcomes and also changed how response is measured. In the chemoimmunotherapy era, achieving deep remission, particularly undetectable measurable residual disease (uMRD), often aligned with prolonged progression-free survival and sometimes with longer overall survival. That experience helped establish MRD as an attractive biomarker for treatment depth.

MRD refers to residual leukemia cells below the threshold of standard morphologic assessment, typically detected by multiparameter flow cytometry or molecular assays in peripheral blood or bone marrow. Because MRD can be measured earlier than clinical progression, it has become appealing as a trial endpoint, especially in fixed-duration regimens where treatment aims to induce a deep remission and then stop therapy. However, a prognostic biomarker is not automatically a valid surrogate endpoint. To function as a surrogate for PFS in drug development, changes in MRD caused by a treatment should reliably predict changes in PFS across trials and treatment classes.

This distinction has become more important in modern CLL. BTK inhibitors may produce durable disease control despite persistent low-level disease, whereas time-limited venetoclax-based regimens can achieve high uMRD rates but still show variable long-term disease kinetics depending on prior therapy, disease biology, and sampling method. Regulators, sponsors, and clinicians therefore need to know whether MRD can be generalized as a substitute for PFS, or whether its meaning depends on therapeutic context.

Study design and methods

Wang and colleagues addressed this question in a systematic review and meta-analysis of clinical trials evaluating new CLL therapies. The investigators searched PubMed, Web of Science, Embase, and the Cochrane Library through February 8, 2025. They identified 43 trials comprising 9628 subjects.

The central aim was not simply to ask whether MRD has prognostic value, but whether MRD is an accurate surrogate for PFS. That required assessment at two levels. First, at the individual-patient level, the authors compared outcomes in patients with detectable MRD versus undetectable MRD. Second, at the trial level, they examined whether differences in MRD outcomes between treatment groups correlated with differences in PFS across studies. Trial-level surrogacy is the more demanding standard and the one most relevant for endpoint substitution in phase 2 or phase 3 trials.
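The trial-level question can be made concrete with a minimal Python sketch. The five trial summaries below are invented for illustration and are not data from Wang and colleagues. The calculation is also unweighted, whereas formal surrogacy analyses usually weight trials by size or precision, so this shows only the shape of the analysis, not the authors' method.

```python
import math

# Hypothetical trial-level summaries (illustrative only, not from the source):
# (odds ratio for achieving uMRD, hazard ratio for PFS), experimental vs control.
trials = [(4.0, 0.40), (2.5, 0.55), (6.0, 0.35), (1.5, 0.80), (3.0, 0.70)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks from 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Treatment effects are compared on the log scale.
log_or_mrd = [math.log(odds_ratio) for odds_ratio, _ in trials]
log_hr_pfs = [math.log(hazard_ratio) for _, hazard_ratio in trials]

rho = spearman(log_or_mrd, log_hr_pfs)
# For an unweighted simple linear regression, R2 equals the squared Pearson r.
r2 = pearson(log_or_mrd, log_hr_pfs) ** 2

print(f"Spearman rho = {rho:.2f}, R2 = {r2:.2f}")
```

A negative correlation is the expected direction here: a larger MRD advantage (higher odds ratio for uMRD) should accompany a lower hazard ratio for progression or death. Surrogacy is supported only when that relationship is both strong and consistent across trials.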

The study also explored important sources of heterogeneity, including treatment class, treatment strategy such as fixed-duration versus continuous therapy, and whether MRD was measured in blood or bone marrow. This design is clinically important because modern CLL regimens differ substantially in mechanism and depth-versus-duration tradeoffs.

Key findings

Individual-level prognostic value was strong

At the patient level, detectable MRD was clearly associated with worse outcomes. Subjects with detectable MRD had a significantly higher risk of progression or death than those with undetectable MRD, with a hazard ratio of 3.67 and a 95% credibility interval of 3.34 to 4.03 (P < 0.01). This is a robust effect size and confirms a point most CLL clinicians already recognize: within a given therapeutic setting, achieving uMRD usually identifies patients at lower near- to intermediate-term risk of progression.
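As a rough consistency check on the reported estimate, the interval bounds can be used to recover the point estimate and an implied standard error. This assumes the interval is approximately symmetric on the log-hazard scale, a normal approximation that the source does not state, so the result is a sanity check rather than a reanalysis.

```python
import math

# Values reported in the text above.
hr, lower, upper = 3.67, 3.34, 4.03

# Assumption: the 95% interval is symmetric on the log scale, so its midpoint
# estimates log(HR) and its width is about 2 * 1.96 standard errors.
log_mid = (math.log(lower) + math.log(upper)) / 2
implied_hr = math.exp(log_mid)
implied_se = (math.log(upper) - math.log(lower)) / (2 * 1.96)

print(f"HR implied by interval midpoint: {implied_hr:.2f}")
print(f"Implied standard error of log(HR): {implied_se:.3f}")
```

Under that assumption the midpoint reproduces the reported hazard ratio to two decimal places, which suggests the estimate and interval are internally consistent.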

That finding supports the continued use of MRD for risk stratification, post-treatment counseling, and scientific characterization of response depth. It does not, by itself, validate MRD as a trial surrogate.

Trial-level correlation with PFS was weak overall

The more consequential finding was the weak relationship between MRD and PFS treatment effects at the trial level. Across studies, the Spearman correlation was -0.33 and the coefficient of determination (R2) was only 0.06. In practical terms, only a small proportion of the variation in PFS treatment effect was explained by variation in MRD treatment effect.

This matters because drug development decisions often hinge on whether an early biomarker can stand in for a later clinical endpoint. A marker can be prognostic for patients yet still fail as a surrogate for comparing therapies. That is what this analysis suggests for MRD in aggregate across modern CLL trials.

Performance differed substantially by therapy class

The weakness of surrogacy was especially striking for BTK inhibitor-based therapy. In this subgroup, the reported correlation was -0.05 with R2 of 0.03, indicating essentially no meaningful trial-level association between MRD effect and PFS effect. This aligns with the known biology of BTK inhibition. Patients often remain in clinical remission for prolonged periods while continuing therapy despite detectable residual disease. In other words, disease control with BTK inhibition may depend less on eradication and more on sustained pathway suppression.

Venetoclax-based therapies also showed weak trial-level association, with correlation -0.28 and R2 0.02. That observation may seem counterintuitive because venetoclax can generate deep remissions and high uMRD rates. Yet the analysis suggests that between-trial differences in MRD do not reliably map onto between-trial differences in PFS. Several mechanisms may contribute, including differences in exposure duration, partner antibodies, prior lines of therapy, TP53 aberrations, IGHV status, and timing of MRD assessment.

Stronger correlation in older-style regimens and with marrow testing

Correlations were notably stronger for therapies involving chemoimmunotherapy and/or monoclonal antibodies. In those settings, bone marrow MRD showed a very strong correlation with PFS, with R of -0.94 and R2 of 0.86, suggesting that marrow-based MRD in these regimens may capture a much larger share of the treatment effect on disease control.

This result is biologically plausible. In cytotoxic or antibody-containing regimens designed to induce deep remission, residual disease burden may be more directly linked to subsequent relapse kinetics. Bone marrow assessment may also better reflect residual leukemic involvement in a disease that occupies multiple compartments. By contrast, peripheral blood MRD can underestimate residual disease in nodal or marrow compartments, especially when therapies redistribute lymphocytes or have differential tissue effects.

Fixed-duration therapy came closer to acceptable surrogacy

In sensitivity analyses stratified by treatment approach, fixed-duration therapy showed a stronger correlation of treatment effect, approaching the authors’ pre-specified validity threshold, with R of -0.80 (95% interval -0.93 to -0.53), R2 0.43, and P < 0.01. This is an important nuance. It does not prove universal validity, but it suggests that MRD may be more informative when therapy is intentionally stopped after a defined period and when remission depth is central to the therapeutic strategy.

Even here, however, the explained variance remained incomplete. An R2 of 0.43 means more than half of the variability in PFS treatment effect remained unexplained by MRD. For a biomarker intended to replace a hard clinical endpoint in registration-quality trials, that degree of uncertainty warrants caution.
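The same arithmetic can be applied to every R2 value reported in this summary. The short Python sketch below takes the R2 figures quoted in the text and computes 1 - R2, the share of variance in the PFS treatment effect that MRD leaves unexplained. The subgroup labels are shorthand chosen here for readability, not the authors' terminology.

```python
# R2 values as quoted in the text of this summary.
reported_r2 = {
    "all trials": 0.06,
    "BTK inhibitor-based": 0.03,
    "venetoclax-based": 0.02,
    "chemoimmunotherapy/antibody, bone marrow MRD": 0.86,
    "fixed-duration therapy": 0.43,
}


def unexplained_fraction(r2):
    """Share of variance in the PFS treatment effect not explained by MRD."""
    return 1.0 - r2


unexplained = {name: unexplained_fraction(r2) for name, r2 in reported_r2.items()}

for name, share in unexplained.items():
    print(f"{name}: {share:.0%} of PFS-effect variance unexplained")
```

Read this way, the subgroups fall into three tiers: almost all variance unexplained for the overall, BTK inhibitor, and venetoclax analyses; a majority unexplained for fixed-duration therapy; and a small minority unexplained only for marrow MRD in chemoimmunotherapy or antibody-containing regimens.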

Clinical interpretation

Why prognostic does not equal surrogate

The central lesson of this study is methodological as much as clinical. A biomarker can distinguish higher-risk from lower-risk patients without being a reliable substitute for a treatment effect on clinical outcomes. MRD appears to do the former well in CLL. It does the latter inconsistently, and the inconsistency is not random; it tracks with therapeutic mechanism, treatment duration, and sampling compartment.

For clinicians, this means that a patient achieving uMRD generally has a favorable prognosis, but one should be cautious about inferring that a regimen producing more uMRD than another will necessarily deliver proportionally better PFS across all settings. For investigators and regulators, the findings argue against broad claims that MRD can serve as a class-agnostic primary endpoint for approval of new CLL therapies.

Implications for BTK inhibitor-based treatment

The failure of MRD to predict PFS well in BTK inhibitor-based trials is especially relevant because these regimens have become central to front-line and relapsed CLL management. Continuous BTK inhibition can sustain disease control despite persistent MRD positivity. This reduces the conceptual value of eradication-based biomarkers. A trial of one BTK inhibitor combination versus another might show differences in uMRD rates without meaningful differences in PFS, particularly if both regimens maintain long-term target suppression.

Thus, in BTK inhibitor development, endpoints such as PFS, time to next treatment, treatment discontinuation, toxicity burden, cardiac safety, bleeding risk, and patient-reported outcomes may be more informative than MRD alone.

Implications for venetoclax-based fixed-duration strategies

The picture is more favorable, though still incomplete, for fixed-duration venetoclax-based strategies. In these regimens, depth of remission is closely tied to the rationale for stopping therapy. MRD remains clinically useful for understanding remission quality and for designing adaptive or risk-stratified discontinuation studies. Still, the present analysis indicates that even in this context MRD should not be treated as a fully validated surrogate for PFS without supporting long-term follow-up.

One practical implication is that MRD may be best viewed as an intermediate endpoint or enrichment tool rather than a stand-alone substitute. It can help identify patients suitable for treatment cessation, intensification, or post-remission monitoring, but confirmation with PFS remains important in pivotal trials.

Bone marrow versus peripheral blood

The stronger performance of bone marrow MRD in some settings raises a familiar tradeoff. Marrow is more invasive but may better reflect total disease burden and residual sanctuary disease. Peripheral blood is easier to serially measure and is highly practical in routine care and multicenter trials. These data suggest that when MRD is used to support major efficacy claims, assay compartment matters. Blood-based assays may be convenient, but convenience should not be mistaken for interchangeable validity.

Strengths and limitations

The major strength of this study is conceptual rigor. Rather than simply pooling prognostic data, the investigators explicitly examined surrogacy at both the individual and trial levels, which is the correct framework for endpoint validation. The sample size was substantial, with 43 trials and 9628 subjects, and the subgroup analyses addressed clinically meaningful treatment distinctions.

There are also limitations, some inherent to the available literature. First, trial-level surrogacy analyses can be affected by heterogeneity in line of therapy, patient selection, assay platform, MRD threshold, timing of sampling, and follow-up duration. Second, published trial reports do not always provide uniform or granular data, especially for landmark MRD analyses and paired marrow-blood comparisons. Third, newer therapies and combinations continue to evolve, and the relationship between MRD and PFS may differ in emerging regimens such as non-covalent BTK inhibitors, BTK degraders, triplet combinations, or MRD-guided adaptive treatment strategies. Fourth, PFS itself is a surrogate for overall survival in many CLL settings, particularly when effective salvage therapy is available. Therefore, the endpoint hierarchy remains complex: an imperfect surrogate for a surrogate should face a particularly high evidentiary bar.

Practice and policy implications

This study has direct implications for trial design and regulatory review. For broad CLL drug development, MRD should probably not be accepted as a universal replacement for PFS. Instead, its role should be contextual and pre-specified. In fixed-duration studies, especially those centered on venetoclax-based approaches, MRD may function as an important co-primary, hierarchical, or supportive endpoint, provided long-term PFS validation is built into the protocol. In continuous therapy studies, especially BTK inhibitor-based regimens, MRD may be better suited for translational analyses than for primary efficacy determination.

Guidelines and consensus groups have long supported MRD as a valuable prognostic and research tool, but this paper reinforces the need to separate clinical usefulness from formal surrogacy. For practicing hematologists, the message is not to abandon MRD. Rather, it is to use MRD in the right way: as one piece of evidence alongside disease genetics, line of therapy, treatment goals, toxicity profile, and patient preference.

Conclusion

The meta-analysis by Wang and colleagues provides a timely corrective in the era of endpoint acceleration. In CLL, MRD negativity remains a strong prognostic marker for individual patients, but it is not a universally reliable trial-level surrogate for progression-free survival. Its validity varies by treatment mechanism, treatment duration, and sampling source. The weakest surrogacy was seen with BTK inhibitor-based and venetoclax-based regimens overall, while stronger relationships appeared in chemoimmunotherapy or antibody-containing approaches and in marrow-based assessments. Fixed-duration therapy came closest to acceptable surrogacy but still did not eliminate uncertainty.

The practical conclusion is straightforward: MRD is clinically meaningful, scientifically valuable, and potentially useful in selected development contexts, but it should be interpreted cautiously when informing major clinical or regulatory decisions. PFS remains necessary in many pivotal CLL trials, and claims based on MRD should be tailored to the biological and therapeutic setting rather than generalized across the disease.

Funding and ClinicalTrials.gov

Funding details were not reported in the abstract of the source publication. No ClinicalTrials.gov registration number applies, because this was a meta-analysis of previously conducted trials rather than a single interventional study.

References

Wang Y, Li C, Gale RP, Liang Q, Kay NE, Huang Q, Song Y, Wang W, Liang Y. Measurable residual disease is not a universally reliable surrogate for progression-free survival in clinical trials of new chronic lymphocytic leukemia therapies. Leukemia. 2026 Apr 30. PMID: 42062556.

Hallek M, Cheson BD, Catovsky D, et al. iwCLL guidelines for diagnosis, indications for treatment, response assessment, and supportive management of CLL. Blood. 2018;131(25):2745-2760. PMID: 29540348.

Rawstron AC, Fazi C, Agathangelidis A, et al. A harmonised approach for flow cytometric residual disease monitoring in chronic lymphocytic leukaemia. Leukemia. 2016;30(4):929-936. PMID: 26702070.

Thompson PA, Tam CS, O’Brien SM, et al. Fludarabine, cyclophosphamide, and rituximab treatment achieves long-term disease-free survival in IGHV-mutated chronic lymphocytic leukemia. Blood. 2016;127(3):303-309. PMID: 26567338.

Al-Sawaf O, Zhang C, Tandon M, et al. Venetoclax plus obinutuzumab versus chlorambucil plus obinutuzumab for previously untreated chronic lymphocytic leukaemia (CLL14): follow-up results from a multicentre, open-label, randomised, phase 3 trial. Lancet Oncol. 2020;21(9):1188-1200. PMID: 32758452.

Sharman JP, Egyed M, Jurczak W, et al. Efficacy and safety in a 4-year follow-up of the ELEVATE-TN study comparing acalabrutinib with or without obinutuzumab versus chlorambucil-obinutuzumab in treatment-naive chronic lymphocytic leukemia. Leukemia. 2022;36(4):1171-1175. PMID: 35017583.

Munir T, Brown JR, O’Brien S, et al. Final analysis from RESONATE: up to six years of follow-up on ibrutinib in patients with previously treated chronic lymphocytic leukemia or small lymphocytic lymphoma. Am J Hematol. 2019;94(12):1353-1363. PMID: 31512252.