Preprint
Essay

This version is not peer-reviewed.

Between Rigor and Relevance: Why the EU HTA Guidelines on Indirect Comparisons Miss the Mark

Submitted: 02 June 2025
Posted: 03 June 2025

Abstract
Indirect treatment comparisons (ITCs) are essential in the context of joint clinical assessments (JCAs) under Regulation (European Union [EU]) 2021/2282, bridging evidence gaps where head-to-head data are lacking and enabling assessment across diverse national population, intervention, comparator, and outcome (PICO) requirements. This paper critically reviews the EU Health Technology Assessment Coordination Group’s (HTACG) guidelines on direct and indirect comparisons, with particular focus on ITCs. While the guidelines promote transparency and rigorous evaluation of assumptions, they adopt a restrictive stance on assumption violations, the use of unanchored comparisons, and population-adjusted methods such as matching-adjusted indirect comparisons (MAIC) and simulated treatment comparisons (STC). The guidance shows limited support for Bayesian methods and undervalues meta-regression in favor of subgroup analyses. Operational implications for health technology developers (HTDs) are substantial, including new requirements for dual systematic reviews, multiple network structures, and shifted null hypothesis testing. Moreover, the exclusion of non-randomized comparisons in rare or rapidly evolving indications may inadvertently hinder access to effective treatments. Emerging practices such as external control arms (ECAs) and target trial emulation remain underdeveloped in the guidance. Notably, there is no indication that the guidelines are grounded in systematic methodological validation studies. As JCAs evolve, greater methodological flexibility, empirical grounding, and clear operational guidance will be essential. Refining the guidelines along these principles would enhance their practical utility, mitigate intrinsic assessment variability, support consistent assessments across Member States (MS), and ultimately improve patient access to innovative therapies.

1. Introduction

The implementation of Joint Clinical Assessments (JCAs) under Regulation (European Union [EU]) 2021/2282 marks a significant transformation in the landscape of health technology assessment (HTA) within the EU [1]. Central to this framework is the need to evaluate the certainty of the relative clinical effectiveness of new health technologies across a diversity of national healthcare contexts [1,2]. A distinctive feature of JCAs is the multiplicity of PICO (Population, Intervention, Comparator, Outcome) framework questions, reflecting the varied clinical practices, treatment comparators, outcomes of interest, populations and subgroups relevant across Member States (MS) [3,4,5,6]. This complexity presents considerable methodological challenges for comparative effectiveness research [7]. A recent exercise by the JCA subgroup identified 13 PICOs for two oncology indications [8].
In this setting, indirect treatment comparisons (ITCs) are an indispensable analytical tool. In many instances, head-to-head randomized controlled trials (RCTs) comparing the intervention with all relevant national comparators are not available at the time of assessment. As a result, robust and transparent methodologies for indirect evidence synthesis are essential to fill the evidence gaps and support HTA assessment and reimbursement decision-making across jurisdictions.
The methodological importance of ITCs has been acknowledged in previous HTA frameworks, which provide rigorous guidance for performing network meta-analyses (NMAs) and Population-Adjusted Indirect Comparisons (PAICs) in the absence of direct evidence. The most well-known and influential guidelines on this topic are probably those by the National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) and the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force [9,10,11]. The recent guidelines issued by the HTA Coordination Group (HTACG) extend this tradition by specifying harmonized methodological standards for the conduct and assessment of quantitative evidence synthesis within JCAs [11,12].
However, the HTACG guidelines also raise significant concerns related to the validity of ITCs. These guidelines explicitly require the verification of key assumptions such as similarity, homogeneity, and consistency to justify the validity of synthesis of evidence from multiple trials [11,12]. These assumptions are often difficult to meet in practice, particularly in the case of disconnected networks or when effect modifiers are unevenly distributed across studies. The increasing use of advanced methods, such as Matching-Adjusted Indirect Comparisons (MAIC), Simulated Treatment Comparisons (STC), and Multilevel Network Meta-Regression (ML-NMR), has further complicated the evidentiary landscape, calling for careful scrutiny of methodological rigor, transparency, and robustness [13]. To date, there is no clear definition of methods that can be universally and reliably applied to assess the value of products.
This paper provides a critical review of the HTACG’s methodological and practical guidelines on direct and indirect comparisons, with particular emphasis on indirect evidence synthesis. It highlights areas of ambiguity or methodological contention and evaluates their applicability in the JCA context. Special attention is given to the implications of these guidelines for health technology developers (HTDs).

2. Summary of the Guidelines

Overall, the HTACG promotes a conservative, hypothesis-driven approach to evidence synthesis. Direct comparisons using robust RCT data are preferred, and anchored ITCs are acceptable if all methodological requirements are met. When PAICs or unanchored comparisons are necessary, their use should be carefully justified, transparently reported, and subjected to extensive sensitivity analysis. The guidelines emphasize the importance of pre-specified analytical plans, expert statistical input, and critical evaluation of residual uncertainties throughout the synthesis process [11,12].

Key assumptions

All forms of evidence synthesis, whether direct or indirect, are based on the fundamental assumption of exchangeability, which is achieved through three essential criteria [11,13,14,15,16]:
i. Similarity of studies with respect to effect modifiers;
ii. Homogeneity of relative treatment effects across trials comparing the same interventions; and
iii. Consistency between direct and indirect comparisons within an evidence network.
Violation of any of these conditions undermines the validity of the analysis [17]. Similarity, in particular, is highlighted as difficult to verify in practice, given the potential presence of unobserved or even unknown effect modifiers [11].

Direct Comparisons

Direct comparisons are typically synthesized using standard pairwise meta-analysis methods [18]. Fixed-effect models (e.g., inverse variance or Mantel-Haenszel methods) are acceptable when the assumption of a common effect is tenable, although this requires strong justification [11]. In most cases, random-effects models are preferred because of the expected heterogeneity between studies. The Knapp-Hartung method with the Paule-Mandel estimator is recommended when at least five studies are available; with fewer studies, variance correction, the DerSimonian-Laird (DSL) method, qualitative summaries, or, alternatively, Bayesian methods with weakly informative priors are recommended [11].
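For readers less familiar with these estimators, the following minimal sketch (in Python, using purely hypothetical study results on the log hazard ratio scale) illustrates a random-effects pairwise meta-analysis with a Paule-Mandel estimate of between-study variance and a Knapp-Hartung interval. It is a simplified illustration of the recommended approach, not an implementation of the guideline itself.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import t


def paule_mandel_tau2(y, se):
    """Paule-Mandel estimate of the between-study variance tau^2.

    y  : study-level effect estimates (e.g., log hazard ratios)
    se : their standard errors
    """
    v = se ** 2
    k = len(y)

    def q_minus_df(tau2):
        w = 1.0 / (v + tau2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2) - (k - 1)

    # If the generalized Q statistic at tau^2 = 0 does not exceed its
    # degrees of freedom, the Paule-Mandel solution is tau^2 = 0.
    if q_minus_df(0.0) <= 0:
        return 0.0
    return brentq(q_minus_df, 0.0, 100.0)


def random_effects_knapp_hartung(y, se, alpha=0.05):
    """Random-effects pooled estimate with a Knapp-Hartung interval."""
    y, se = np.asarray(y, float), np.asarray(se, float)
    k = len(y)
    tau2 = paule_mandel_tau2(y, se)
    w = 1.0 / (se ** 2 + tau2)
    mu = np.sum(w * y) / np.sum(w)
    # Knapp-Hartung variance estimator and t-based interval (k - 1 df)
    var_kh = np.sum(w * (y - mu) ** 2) / ((k - 1) * np.sum(w))
    half = t.ppf(1 - alpha / 2, df=k - 1) * np.sqrt(var_kh)
    return {"tau2": tau2, "mu": mu, "ci": (mu - half, mu + half)}


# Hypothetical log hazard ratios and standard errors from five trials
log_hr = [-0.40, -0.05, -0.35, 0.10, -0.25]
se = [0.10, 0.12, 0.15, 0.09, 0.11]
print(random_effects_knapp_hartung(log_hr, se))
```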

Indirect treatment comparisons

In the absence of head-to-head trials, or when multiple interventions need to be evaluated simultaneously, indirect comparisons are necessary. The guidelines categorize ITC methods into three primary types [11]:
1. Anchored indirect comparisons
These preserve randomization by estimating relative treatment effects using a common comparator. The Bucher method is appropriate for comparisons with a common comparator and can be applied to simple star networks [19]. The NMA is required for more complex evidence networks [16]. Both frequentist and Bayesian implementations of NMA are acceptable, with the latter offering advantages in handling sparse data and incorporating prior knowledge [11]. Validity depends on the assumptions of similarity, homogeneity, and consistency, all of which must be rigorously evaluated. Tools such as node splitting [20] and inconsistency models [21] can be used to formally evaluate inconsistency within closed loops.
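The Bucher calculation itself is straightforward. The sketch below (Python, with hypothetical inputs on the log hazard ratio scale) illustrates the anchored contrast of A versus B via a common comparator C and the propagation of its uncertainty.

```python
import numpy as np
from scipy.stats import norm


def bucher_itc(d_ac, se_ac, d_bc, se_bc, alpha=0.05):
    """Anchored indirect comparison of A vs B via common comparator C.

    Inputs are relative effects on a linear scale, e.g. log hazard ratios:
    d_ac = log HR of A vs C, d_bc = log HR of B vs C.
    """
    d_ab = d_ac - d_bc                    # indirect estimate A vs B
    se_ab = np.sqrt(se_ac**2 + se_bc**2)  # variances of the two contrasts add
    z = norm.ppf(1 - alpha / 2)
    ci = (d_ab - z * se_ab, d_ab + z * se_ab)
    p = 2 * (1 - norm.cdf(abs(d_ab) / se_ab))
    return {"hr_ab": np.exp(d_ab), "ci_hr": tuple(np.exp(ci)), "p_value": p}


# Hypothetical inputs: A vs C (HR 0.75) and B vs C (HR 0.90) from two RCTs
print(bucher_itc(np.log(0.75), 0.12, np.log(0.90), 0.10))
```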
2. Population-Adjusted Indirect Comparisons
When the similarity assumption is not met, due to observed imbalances in effect modifiers, PAICs may be considered [11]. These require access to individual patient-level data (IPD) for at least one study and include methods such as MAIC, STC, and ML-NMR [13]. These approaches rely on the assumption of conditional constancy of relative effects, i.e., that all relevant effect modifiers have been correctly identified and included. Because of their complexity and sensitivity to modeling choices, guidelines recommend that PAICs be prespecified and accompanied by sensitivity analyses, including formal tests against shifted null hypotheses to account for residual uncertainty [11,12].
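As an illustration of the matching step that underlies MAIC, the following sketch estimates weights so that the IPD trial's effect-modifier means match aggregate values reported for a comparator trial (a Signorovitch-style method-of-moments fit). The data and variable names are hypothetical, and the sketch covers only the weighting step, not the full anchored comparison.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax


def maic_weights(X_ipd, target_means):
    """Estimate MAIC weights so that the weighted means of the IPD effect
    modifiers match the aggregate means reported for the comparator trial."""
    Xc = X_ipd - target_means  # centre covariates on the target population
    # Method-of-moments fit: minimizing log(sum(exp(Xc @ a))) has the same
    # solution as the usual sum-of-exponentials objective, and its gradient
    # is zero exactly when the weighted mean of Xc is zero.
    res = minimize(lambda a: logsumexp(Xc @ a),
                   x0=np.zeros(Xc.shape[1]),
                   jac=lambda a: Xc.T @ softmax(Xc @ a),
                   method="BFGS")
    w = np.exp(Xc @ res.x)
    ess = w.sum() ** 2 / np.sum(w ** 2)  # effective sample size after weighting
    return w, ess


# Hypothetical IPD (age, male indicator) and the comparator trial's reported means
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(62, 8, 300), rng.binomial(1, 0.55, 300)])
weights, ess = maic_weights(X, target_means=np.array([65.0, 0.60]))
print(f"effective sample size after weighting: {ess:.1f}")
# The weights are applied to the IPD trial's within-trial contrast before
# forming the anchored comparison against the comparator's aggregate result.
```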
3. Unanchored comparisons and non-randomized evidence
In disconnected networks without a common comparator, analyses are equivalent to comparisons based on non-randomized evidence. Such comparisons require full access to IPD and rigorous adjustment for confounding [22]. The guideline briefly acknowledges advanced IPD-based methods for confounding adjustment, including multiple regression, instrumental variables, and g-computation [11], but does not elaborate on their implementation or endorse them as standard practice. Instead, it emphasizes propensity score-based methods, such as matching or inverse probability weighting. These require that all known relevant confounders be measured and that assumptions of positivity, overlap, and covariate balance be satisfied [13]. Even when these conditions are met, results from non-randomized data are considered to have higher uncertainty and risk of bias, and should be interpreted with caution [11]. Unanchored MAICs and STCs that use aggregate comparator data rather than IPD are considered insufficient.
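By way of illustration, the sketch below applies one of the propensity-score approaches mentioned above (inverse probability weighting targeting the treated population) to a simulated single-arm cohort pooled with an external control. It assumes IPD with measured confounders on both sides; the dataset, column names, and outcome are hypothetical, and the sketch is not an endorsement of any particular implementation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def att_ipw_weights(df, treat_col, confounders):
    """Inverse-probability-of-treatment weights targeting the ATT:
    treated patients get weight 1, external controls get ps / (1 - ps)."""
    ps_model = LogisticRegression(max_iter=1000).fit(df[confounders], df[treat_col])
    ps = ps_model.predict_proba(df[confounders])[:, 1]
    w = np.where(df[treat_col] == 1, 1.0, ps / (1.0 - ps))
    return w, ps


# Hypothetical pooled dataset: single-arm trial (treated=1) plus external control
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "treated": rng.binomial(1, 0.4, n),
    "age": rng.normal(64, 9, n),
    "ecog": rng.integers(0, 3, n),
    "prior_lines": rng.integers(1, 4, n),
})
df["response"] = rng.binomial(1, 0.3 + 0.1 * df["treated"], n)

w, ps = att_ipw_weights(df, "treated", ["age", "ecog", "prior_lines"])
# Weighted comparison of response rates; positivity, overlap, and covariate
# balance should be checked before trusting this estimate
treated = df["treated"] == 1
rd = (np.average(df.loc[treated, "response"], weights=w[treated])
      - np.average(df.loc[~treated, "response"], weights=w[~treated]))
print(f"IPW-adjusted risk difference: {rd:.3f}")
```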

3. Critical Review

The HTACG guidelines on direct and indirect comparisons set an ambitious standard for methodological rigor in the JCA framework, going beyond those previously set by the NICE DSU, the ISPOR Task Force on ITC, or the Cochrane Handbook [9,10,23,24].
However, a critical analysis reveals several problematic areas where the guidelines are either overly conservative, inconsistently reasoned, or insufficiently aligned with accepted best practice in evidence synthesis.
1. Ambiguity in the role of indirect evidence alongside direct comparisons
The guidelines are unclear about whether indirect evidence can complement direct comparisons. While established methods such as NMA allow the integration of both evidence types, subject to a consistency check, the HTACG does not explicitly support this approach. The methodological guideline states for example: “when treatments have not been directly compared in RCTs, indirect comparisons can be used”[11]. This formulation is ambiguous and might be interpreted as “when treatments have been directly compared, indirect comparisons cannot be used.” The latter interpretation would be aligned with another statement from the HTACG: “the inclusion of additional comparators and studies beyond those required to connect the network should generally be avoided.” The omission of clear support for combining direct and indirect data creates uncertainty, which contradicts the international consensus favoring mixed treatment comparisons when properly justified [25,26].
Discouraging the use of larger networks stands in opposition to the principle that broader networks, incorporating (more) indirect evidence, yield more precise estimates and support more generalizable inferences, when used appropriately [27].
2. Overly restrictive interpretation of assumption violations
The HTACG’s strict stance on assumption violations is a central concern. The methodological guideline asserts that “if any of these properties [similarity, homogeneity, consistency] do not hold, the results of an anchored indirect comparison are unlikely to provide a meaningful estimate” [11]. This fails to recognize the role of methods such as random-effects models, meta-regression, and sensitivity analyses in mitigating these limitations rather than discarding analyses altogether. NICE DSU guidance and other international sources promote such techniques as a standard part of evidence synthesis [26,28].
Furthermore, the guidelines suggest that comparisons should not proceed if data on a relevant effect modifier are missing in one or more studies: “If data on a relevant effect modifier is unavailable from one or more studies, then such a comparison cannot be made, and this should be clearly reported as a limitation” [12]. The guidelines then allow for the possible use of proxies, but it appears that the preferred approach of the HTACG is to do nothing or perhaps to exclude the studies with missing data for the effect modifier. Excluding studies for a single missing covariate risks unnecessary loss of information. Best practice, as recommended in NICE DSU Technical Support Document (TSD) 18 and ISPOR guidelines, favors using proxy variables, modeling missingness, or applying bias-adjustment techniques over outright exclusion [24,29]. Evidence-based medicine (EBM) best practices emphasize the importance of incorporating all available evidence, even studies with limitations, to ensure comprehensive consideration in decision making. A grading of the strength of evidence is then provided to support the recommendation [30,31].
3. Population-Adjusted ITCs: overcautious appraisal
The guidelines acknowledge the relevance of methods such as MAIC and STC in the context of anchored comparisons, yet caution that they are “often more suitable as an exploratory analysis rather than as the primary analysis”[12]. The HTACG’s concern that “the number of methods and potential covariate combinations available to the modeler raises the possibility of selecting the method that produces the most favorable results for the intervention under assessment” is understandable [12]. However, adjusted analyses generally provide the most plausible estimate of treatment effect, and a more appropriate recommendation would be to report the adjusted analysis as the primary analysis and the unadjusted one as secondary.
Moreover, the guideline uniquely recommends applying shifted null hypothesis testing to PAICs to account for possible unmeasured effect modifiers, without requiring the same for standard NMA or Bucher methods [11]. All indirect comparisons, adjusted or unadjusted, are subject to unverifiable assumptions about effect modifiers [24]. If this approach of shifted null hypothesis testing is considered informative, it is unclear why it would apply only in the case of PAICs. Furthermore, it should be noted that this approach is not standard and therefore does not align with the stated objective of describing the “most commonly used methods” for ITC.
In a sense, a shifted null is a way to make it harder to claim an effect is statistically significant. But we already account for uncertainty via confidence intervals (CI), credible intervals, or prediction intervals. If bias is suspected, one would typically interpret the results qualitatively with caution or adjust the level of certainty (e.g., GRADE “very low certainty” if there is a high risk of bias). Imposing a formal test threshold Δ could be seen as an arbitrary second hurdle. For instance, if a hazard ratio is 0.85 with 95% CI 0.72–0.99 (p < 0.05 for H0: HR = 1), a shifted null approach might say “we will only consider it significant if p < 0.05 for H0: HR = 0.90” – effectively requiring the CI to exclude 0.90. Why 0.90? The guideline offers no specific literature, leaving it to the HTD to justify. This could cause debates between companies and assessors over the choice of threshold rather than focusing on the actual data [11]. Furthermore, the guideline explicitly defers threshold setting to MS, which somewhat contradicts the idea of a unified assessment. France’s Haute Autorité de Santé (HAS) and Germany’s Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG) remain highly resistant to ITC evidence [32,33,34,35], while Sweden, the Netherlands, Norway, and Belgium show greater openness to such data [33,36]. Allowing MS to set their own thresholds enables them to accept or reject ITC results based on their preferences. This does not align with JCA’s goal to standardize clinical assessment between countries. The approach of E-values could be considered as an alternative [37].
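To make the arithmetic of this example concrete, the sketch below reproduces the shifted-null calculation for the hazard ratio quoted above and contrasts it with an E-value. Treating the hazard ratio as an approximate risk ratio for the E-value is a simplifying assumption noted in the comments.

```python
import numpy as np
from scipy.stats import norm

# Worked example from the text: HR 0.85 with 95% CI 0.72-0.99
hr, lo, hi = 0.85, 0.72, 0.99
log_hr = np.log(hr)
se = (np.log(hi) - np.log(lo)) / (2 * 1.96)  # back-calculated SE on the log scale


def one_sided_p(log_hr, se, null_hr):
    """One-sided p-value for H0: HR >= null_hr against HA: HR < null_hr."""
    z = (log_hr - np.log(null_hr)) / se
    return norm.cdf(z)


print(f"p vs conventional null (HR = 1.00): {one_sided_p(log_hr, se, 1.00):.3f}")
print(f"p vs shifted null      (HR = 0.90): {one_sided_p(log_hr, se, 0.90):.3f}")


# E-value (VanderWeele & Ding): strength of unmeasured confounding, on the
# risk-ratio scale, needed to fully explain away the observed association.
# For a protective effect, invert first; treating the HR as an approximate
# risk ratio (reasonable for rare outcomes) is a simplifying assumption here.
def e_value(rr):
    rr = 1 / rr if rr < 1 else rr
    return rr + np.sqrt(rr * (rr - 1))


print(f"E-value for the point estimate: {e_value(hr):.2f}")
print(f"E-value for the CI limit closest to 1: {e_value(hi):.2f}")
```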
4. Meta-regression vs. subgroup analyses: misplaced prioritization
Another controversial recommendation is the endorsement of subgroup analyses over meta-regression. The methodological guideline states that “in the context of JCA, subgroup analyses are often more useful than meta-regression”[11]. This is a debatable position. Subgroup analyses, especially those based on aggregate trial-level data, are underpowered, prone to spurious findings, often require dichotomization of continuous variables, and sometimes violate randomization. In contrast, meta-regression, when applied appropriately, can exploit continuous covariates across trials and detect trends that would be missed by binary subgroup splits [38].
This contradicts the view presented in the NICE DSU TSD 3, which emphasizes the use of meta-regression where feasible and only warns against ecological bias, without suggesting that one method is inherently superior [26]. A more balanced approach would advocate for a case-by-case evaluation, with sensitivity analyses using both methods where appropriate.
5. Underappreciation of Bayesian methods
Bayesian methods receive only cursory mention in the guidelines, despite being especially advantageous in the small-sample and rare-event scenarios explicitly acknowledged as problematic for frequentist methods [39]. For example, the practical guideline admits that Knapp-Hartung methods may yield “misleadingly narrow” CIs in homogeneous settings and “non-informative results” when data are sparse [12]. Nonetheless, Bayesian approaches are described merely as “alternative” and are conditioned on the availability of priors [12]. By contrast, widely accepted literature endorses Bayesian methods as preferred in sparse or uncertain settings because of their ability to incorporate prior knowledge and produce interpretable posterior distributions [40,41,42].
Furthermore, the HTACG guidelines may result in assessment dossiers employing a mix of frequentist and Bayesian methods across different comparators or outcomes, depending on data availability and the specific context. While acknowledging data limitations, this approach can introduce complexity. Specifically, combining frequentist and Bayesian methods within a single assessment can pose challenges to maintaining a consistent inferential framework. This could lead to difficulties in the interpretation of results, conflicting with the Neyman-Pearson principles that underlie much of frequentist hypothesis testing. Prioritizing a single, unified statistical framework within an assessment could enhance consistency in interpreting results and reduce potential ambiguity in the overall synthesis. Moreover, given the complexities associated with applying frequentist multiplicity controls in the diverse landscape of HTA, a consistent application of a Bayesian framework may offer a more coherent and interpretable approach to conducting ITCs.
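For completeness, the sketch below shows the kind of Bayesian random-effects synthesis with weakly informative priors that the guidelines relegate to an "alternative". It uses a simple grid approximation over hypothetical log hazard ratios so that the example stays self-contained; a real analysis would typically use dedicated MCMC software, and the prior choices shown are illustrative, not prescribed by any guideline.

```python
import numpy as np
from scipy.stats import norm, halfnorm

# Hypothetical study-level log hazard ratios and standard errors (sparse data)
y = np.array([-0.30, -0.05, -0.45])
se = np.array([0.20, 0.25, 0.30])

# Weakly informative priors on the log-HR scale (illustrative choices):
# mu ~ Normal(0, 1), tau ~ Half-Normal(0.5)
mu_grid = np.linspace(-1.5, 1.5, 301)
tau_grid = np.linspace(0.0, 1.5, 151)
M, T = np.meshgrid(mu_grid, tau_grid, indexing="ij")

# Marginal likelihood with study effects integrated out: y_i ~ N(mu, se_i^2 + tau^2)
log_lik = np.zeros_like(M)
for yi, si in zip(y, se):
    log_lik += norm.logpdf(yi, loc=M, scale=np.sqrt(si**2 + T**2))

log_post = log_lik + norm.logpdf(M, 0, 1) + halfnorm.logpdf(T, scale=0.5)
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Posterior summaries for the pooled effect mu (marginalized over tau)
post_mu = post.sum(axis=1)
mean_mu = np.sum(mu_grid * post_mu)
cdf = np.cumsum(post_mu)
ci = (mu_grid[np.searchsorted(cdf, 0.025)], mu_grid[np.searchsorted(cdf, 0.975)])
prob_benefit = post_mu[mu_grid < 0].sum()  # Pr(HR < 1 | data)

print(f"posterior mean HR: {np.exp(mean_mu):.2f}")
print(f"95% credible interval (HR): {np.exp(ci[0]):.2f}-{np.exp(ci[1]):.2f}")
print(f"Pr(HR < 1): {prob_benefit:.2f}")
```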
6. Unanchored comparisons and non-randomized evidence: an overly restrictive stance
The HTACG takes a strong negative stance on unanchored indirect comparisons, stating that “only anchored indirect comparisons are appropriate, as these respect within-study randomization” [11]. While preserving randomization is methodologically ideal, this categorical rejection of unanchored methods is both impractical and inconsistent with modern HTA practice [43]. Single-arm trials (SATs) are increasingly used, especially in oncology and rare diseases. Between 2002 and 2021, approximately 31% of United States Food and Drug Administration (FDA) oncology approvals were based solely on SATs [44]. The European Medicines Agency (EMA) also recognizes that SATs may serve as pivotal evidence in specific regulatory contexts [45].
The guidelines risk eliminating informative analyses in situations where no other comparative evidence is available. This is particularly problematic when high-quality real-world data or external control arms (ECAs) can be used in conjunction with advanced methodologies such as propensity score adjustment, doubly robust estimators, or target trial emulation frameworks [46,47]. Such approaches are endorsed in both regulatory and academic literature as credible strategies when randomized data are unavailable [43,48]. Despite their limitations, unanchored MAIC and STC also have a role to play in informing decision-making [43]. The rigid exclusion of unanchored comparisons thus contrasts with a growing international consensus that, while such methods carry higher uncertainty, they may still support decision-making when carefully designed and transparently reported. This stance also conflicts with EBM principles, which advocate using all available evidence.
Although the HTACG considers unanchored comparisons to be generally “inappropriate”, the guidelines do provide a succinct description of relevant methods [11,12]. When the pivotal evidence for the assessed product is based on an SAT, the guidelines stipulate that “adjustment methods require access to the full IPD”, for both the assessed product and the comparator. This effectively positions MAICs or STCs, which utilize aggregate comparator data, as less desirable than ECAs. This is a contentious point. In practice, IPD are often unavailable for comparator trials, necessitating the use of ECAs derived from retrospective real-world data [49]. Consequently, the choice between an ECA based on potentially lower-quality real-world data and a MAIC or STC using higher-quality aggregate data is often complex, and the guidelines’ preference for ECAs in this context is debatable.
7. Overlooked considerations in living evidence synthesis
While the guideline references the need for evidence synthesis to be current, it omits any discussion of living NMA or updating mechanisms [11]. This is a missed opportunity, particularly in fast-evolving fields like oncology, where evidence landscapes shift rapidly. Cochrane and other methodological leaders now emphasize living NMAs as essential for maintaining relevance and responsiveness to emerging data [50].
8. Support for advanced time-to-event (TTE) methods
The guideline’s handling of TTE data is more balanced. It rightly calls for testing the proportional hazards (PH) assumption and recommends alternatives such as restricted mean survival time (RMST) and flexible survival models when PH does not hold. This is consistent with emerging best practices, particularly in oncology where PH violations are common [51].
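The RMST alternative is also easy to operationalize. The sketch below computes it by integrating a Kaplan-Meier curve up to a clinically motivated horizon, using simulated arms with a delayed treatment effect to mimic a PH violation; all data are simulated for illustration only.

```python
import numpy as np


def km_estimate(time, event):
    """Kaplan-Meier survival estimate; returns event times and S(t)."""
    order = np.argsort(time)
    time, event = np.asarray(time)[order], np.asarray(event)[order]
    uniq = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)


def rmst(time, event, horizon):
    """Restricted mean survival time: area under S(t) from 0 to `horizon`."""
    t, s = km_estimate(time, event)
    # Step-function integration of the KM curve up to the horizon
    grid = np.concatenate(([0.0], t[t < horizon], [horizon]))
    steps = np.concatenate(([1.0], s[t < horizon]))
    return np.sum(steps * np.diff(grid))


# Simulated arms with non-proportional hazards (delayed effect in arm B)
rng = np.random.default_rng(3)
t_a = rng.exponential(12, 200)
t_b = np.where(rng.random(200) < 0.4, rng.exponential(30, 200), rng.exponential(10, 200))
cens = rng.uniform(5, 36, 200)
for label, t_true in [("A", t_a), ("B", t_b)]:
    obs, ev = np.minimum(t_true, cens), (t_true <= cens).astype(int)
    print(f"arm {label}: RMST at 24 months = {rmst(obs, ev, 24):.1f} months")
```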

4. Implications for JCA Submissions

The HTACG’s guidelines on direct and indirect comparisons impose significant methodological and procedural expectations that have substantial implications for HTDs preparing JCA submissions. The reporting requirements for ITCs go well beyond traditional HTA standards.
First, the guidelines emphasize the need for prespecified strategies and transparent criteria to identify covariates that may impact relative treatment effects. More specifically, HTDs must now conduct not only a systematic literature review (SLR) to identify clinical evidence on treatment effects, but also a second comprehensive review to identify potential effect modifiers, as well as consultations with health care professionals (HCPs) with experience in the disease area [11,12]. The implication is that reliance on subgroup analyses identified in the clinical study report and the SLR, together with consultations with clinicians employed by the HTD, will no longer suffice. The need to formally involve HCPs with disease-specific expertise in identifying effect modifiers adds further operational complexity, as this consultation must presumably be documented and justified within the submission.
Second, the guidelines mandate that JCA dossiers include both population-level and comparator-level evidence networks when conducting NMAs [12]. While this may enhance transparency for national decision-makers, it introduces an analytical burden that could be significant for submissions with multiple comparators. HTDs will need to construct and validate several network structures, increasing the risk of type I error and complicating consistency evaluations. No empirical justification is provided for this dual approach, which may lead to duplication of effort without corresponding benefit.
Third, the guideline’s preference for shifted null hypothesis testing in adjusted ITCs (e.g. MAIC/STC) imposes an additional evidentiary hurdle. HTDs are expected to defend the choice of a clinically meaningful threshold (e.g., a relative risk reduction of 10%) without a standard benchmark, leaving room for arbitrary or inconsistent assessments across MS. This creates uncertainty about how population-adjusted estimates will be received and whether they will be discounted based on subjective criteria.
Fourth, the guidelines recommend that prediction intervals, not just confidence or credible intervals, be routinely reported in random-effects models. This requirement is methodologically sound, but rarely fulfilled in current reports of ITCs [52]. HTDs must now plan to generate and interpret prediction intervals, particularly for sparse data scenarios where between-study heterogeneity may be high.
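For reference, the prediction interval mentioned here can be computed directly from the random-effects quantities. The sketch below uses the Higgins/Riley formula with a DerSimonian-Laird between-study variance (chosen for brevity; the guideline's preferred Paule-Mandel estimator could be substituted) on hypothetical trial results.

```python
import numpy as np
from scipy.stats import t


def prediction_interval(y, se, alpha=0.05):
    """Approximate prediction interval for the effect in a new study
    (Higgins/Riley formula), using a DerSimonian-Laird tau^2 for brevity."""
    y, v = np.asarray(y, float), np.asarray(se, float) ** 2
    k = len(y)
    w = 1.0 / v
    mu_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu_fixed) ** 2)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
    w_re = 1.0 / (v + tau2)
    mu = np.sum(w_re * y) / np.sum(w_re)
    se_mu = np.sqrt(1.0 / np.sum(w_re))
    half = t.ppf(1 - alpha / 2, df=k - 2) * np.sqrt(tau2 + se_mu**2)
    return mu, (mu - half, mu + half)


# Hypothetical log hazard ratios and standard errors from five trials
mu, pi = prediction_interval([-0.40, -0.05, -0.35, 0.10, -0.25],
                             [0.10, 0.12, 0.15, 0.09, 0.11])
print(f"pooled HR {np.exp(mu):.2f}, 95% prediction interval "
      f"{np.exp(pi[0]):.2f}-{np.exp(pi[1]):.2f}")
```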
Fifth, while the use of unanchored comparisons is generally discouraged, there is a clear preference for ECAs over PAIC using aggregate data for the comparator, such as MAICs and STCs. This stance diverges from recent HTA trends: many relative effectiveness assessments based on SATs, especially in oncology, have relied on MAICs and STCs, while the use of formal ECAs remains relatively rare in European regulatory and HTA contexts [49,53]. Recent empirical reviews have found that MAIC and STC methods are commonly used and accepted in health technology submissions when SATs serve as pivotal evidence, largely due to the IPD requirement being limited to sponsor-held data [46]. In contrast, ECAs, requiring either high-quality observational data or access to relevant patient registries, pose significant feasibility challenges in terms of data access, endpoint alignment, and methodological transparency [54]. Nonetheless, as real-world data infrastructures mature and the methodological landscape evolves, HTDs should increasingly consider the feasibility of developing ECAs in such contexts. Approaches such as target trial emulation, leveraging observational datasets with robust confounding adjustment (e.g., propensity scores or doubly robust estimators), may offer a viable pathway toward more acceptable unanchored evidence [48].
Given the multiplicity of PICO questions [3,4,5,8] inherent in the JCA process and the need for sensitivity analyses and subgroup analyses, these expectations cumulatively lead to a marked increase in analytical and submission workload. In sum, HTDs will need to allocate additional resources—statistical, clinical, and procedural—to ensure compliance with the HTACG’s expectations. These guidelines not only raise the methodological bar, but also introduce substantial practical complexity, which, if not addressed proactively, could jeopardize the acceptability of the indirect evidence submitted in support of new interventions.

5. Conclusions

The HTACG’s methodological and practical guidelines on ITCs represent an important step towards harmonizing evidence synthesis practices across Europe within the JCA framework. The guidelines promote transparency, detailed reporting, and rigorous assessment of key assumptions - principles that are essential for robust comparative effectiveness assessments in an increasingly complex healthcare landscape [11,12].
However, this review has identified several important limitations that warrant careful consideration. First, the guidelines take an overly conservative stance toward assumption violations in ITCs, with a tendency to encourage rejection of analyses rather than consideration of available adjustment methods. Techniques such as meta-regression, population-adjusted indirect comparisons, and Bayesian modeling are acknowledged, but often relegated to secondary or exploratory roles, despite their widespread acceptance in international HTA and regulatory contexts [32,33].
Second, the guidelines do not adequately address the integration of direct and indirect evidence, nor do they clearly support the use of mixed treatment comparisons when both types of evidence are available. In areas where assumptions may not fully hold, such as the presence of unknown effect modifiers or missing covariates, the recommendations favor study exclusion over analytical adjustment, potentially leading to loss of evidence and biased conclusions.
Third, while the guidelines express methodological caution about unanchored comparisons, they do not adequately recognize the practical realities that necessitate such approaches, particularly in rare diseases and oncology, where SATs remain prevalent [55]. The lack of detailed guidance on acceptable methods for ECAs and on strategies such as target trial emulation represents a missed opportunity to promote more sophisticated use of real-world evidence in HTA [47].
Fourth, the guidance introduces several operational requirements for HTDs, including dual systematic reviews, comparator-level network analyses, and shifted hypothesis testing, that significantly raise the complexity of JCA submissions. While these requirements may enhance methodological thoroughness, they risk reducing the feasibility and efficiency of submissions, considering the short timelines [56], particularly for smaller companies or in therapeutic areas with fragmented evidence bases.
Finally, it is worth noting that the developers of the HTACG guidelines do not appear to have adhered to the same evidence-based standards that they recommend for submissions. The guidelines lack transparent documentation of the evidence base underpinning their methodological choices. Specifically, there is no mention of results from empirical evaluations or simulation studies that assess the performance, validity, or limitations of different evidence synthesis and ITC approaches. This omission raises concerns about the robustness and transparency of the recommendations. Best practice in guideline development, such as that advocated by GRADE [57], emphasizes the importance of grounding recommendations in a systematic review of methodological evidence.
Despite the EU HTA guidelines’ aim to create a unified methodological framework, it is doubtful that they will have a practical impact on harmonizing the perception and utilization of ITCs across Europe. Historically, HTA agencies have exhibited markedly divergent attitudes toward indirect evidence, with agencies like IQWiG in Germany rejecting 94% of ITCs submitted in benefit assessments between 2011 and 2017, largely due to concerns about study suitability, similarity, and statistical methods [58]. Similarly, the French Haute Autorité de Santé (HAS) has maintained a cautious stance on accepting adjusted indirect evidence. The current EU HTA guidelines risk reinforcing these conservative positions by providing formal arguments for rejecting analyses without offering mechanisms to promote alignment or mutual recognition of evidence standards across Member States. The development of a framework to grade the certainty of indirect evidence could be part of such harmonization mechanisms, whereas the current guidelines tend to dismiss evidence in a binary manner.
Guidelines alone are unlikely to address the concerns underlying some HTA agencies' resistance to indirect evidence. RCTs remain the gold standard for clinical evidence, in part because all analytical methods are prespecified in study protocols and statistical analysis plans, limiting the scope for post hoc methodological choices. In contrast, ITCs, especially those requiring population or outcome adjustments, are often conducted after trial completion, making full pre-specification difficult, as the methodology needs to be adapted according to available data. Also, the suspicion surrounding ITCs may stem less from flaws in the methodologies themselves than from the flexibility they afford HTDs in selecting analytic strategies that may yield more favorable outcomes. Nonetheless, ITCs are indispensable in the HTA process: it is neither feasible nor ethical to conduct head-to-head RCTs for every comparator in every population of interest. If the EU HTA regulation is to facilitate access to innovative therapies across MS, it must allow for PICO questions to be answered through rigorously conducted and transparently reported ITCs, and HTA agencies must accept adjusted analyses. If concerns about selective reporting persist, greater involvement of independent agencies, either through independent conduct of ITCs or supervisory roles during their design and analysis, may be necessary to ensure both credibility and acceptability of adjusted indirect evidence. As those agencies may not be able to access IPD and therefore may not be able to conduct ITCs directly, their role could be exercised through active supervision of the planning and implementation of analyses.
In conclusion, while the HTACG guidance documents establish a rigorous foundation for evidence synthesis in the EU, their current formulation may inadvertently constrain methodological flexibility and limit the practical use of valid ITCs. To ensure both scientific credibility and operational feasibility, future updates should aim to:
  • Provide clearer support for the integration of direct and indirect evidence (i.e. “mixed” treatment comparisons);
  • Allow for appropriately adjusted analyses in the presence of assumption violations;
  • Guide, rather than restrict, the cautious use of unanchored comparisons when no alternatives exist, including more guidance on the use of real-world evidence and ECAs;
  • Enhance support for Bayesian methods: the HTACG may have given more attention to frequentist methods on the assumption that assessors are more familiar with them, although these guidelines in fact provide an opportunity to help assessors understand the value of Bayesian methods;
  • Incorporate operational guidance that reflects the complexity and diversity of real-world submissions; and
  • Propose a framework for grading the strength of indirect evidence.
Addressing these issues could enhance the guidelines’ utility, support more consistent and equitable HTA outcomes across MS, and ultimately improve timely patient access to effective therapies in the EU.
  • Text box 1: Key limitations identified in the EU HTA Guidelines on quantitative evidence synthesis
Overly conservative stance on assumption violations, risking blanket rejection of indirect treatment comparisons (ITCs)
Ambiguous guidance on combining direct and indirect evidence, with no clear support for mixed treatment comparisons
Restrictive preference for subgroup analyses over meta-regression, despite lower statistical power and greater risk of false positives
Limited endorsement of population-adjusted methods (e.g., MAIC, STC), despite their relevance when effect modifiers are imbalanced
Minimal consideration of Bayesian approaches, despite their strengths, particularly in sparse or rare-event settings
Rigid dismissal of unanchored comparisons, overlooking recent advances in causal inference frameworks and evolving methods to quantify biases
No indication that recommendations are grounded in empirical validation or simulation studies
  • Text box 2: Suggested improvements for the EU HTA guidelines on indirect treatment comparisons
Reconsider the guideline’s advice against using indirect evidence when direct comparisons exist, as this restriction lacks empirical or methodological justification
Consider the wider use of Bayesian methods, which are well suited to quantitative evidence synthesis; provide readers with the knowledge required to engage with Bayesian approaches; reassess the appropriateness of combining Bayesian and frequentist analyses within the same dossier (currently allowed by the guidelines)
Offer balanced recommendations on population-adjusted methods (e.g., MAIC, STC), acknowledging their relevance as primary analyses in many situations
Explicitly acknowledge that all evidence has limitations, and propose a framework for grading the certainty of indirect evidence (e.g., in the manner of GRADE) to support decisions under imperfect evidence, potentially incorporating quantitative bias analysis
Expand guidance on leveraging real-world evidence and external control arms, including the adoption of emerging methods such as target trial emulation
Consider the practical feasibility of the guidelines, recognizing the limited time and resources available to both health technology developers and national assessors
Ground methodological recommendations in a systematic review of empirical validation studies and simulation research, ensuring that guidance reflects best available evidence

Author Contributions

S.A. and M.T. conceptualized the content. S.A. wrote the first draft of the manuscript. P.W. cross-checked and adjusted the references. The co-authors P.W., E.C., B.F., S.S., P.A., S.C., R.B., J.R., F.-U.F., O.S.M., and L.B. challenged the concept, edited the manuscript, and refined arguments for clarity and coherence. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

None.

Conflicts of Interest

S.A. and M.T. are current employees of Inovintell. P.W. and E.C. are current employees of Clever Access. O.S.M. is a current employee of Health Innovation Technology Transfer. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CI Confidence Interval
DSU Decision Support Unit (of NICE)
Δ (Delta) Symbol used for a threshold in hypothesis testing
EBM Evidence-Based Medicine
ECA External Control Arm
EMA European Medicines Agency
EU European Union
FDA United States Food and Drug Administration
GRADE Grading of Recommendations, Assessment, Development and Evaluation
H0 Null Hypothesis
HAS Haute Autorité de Santé (France)
HCP Health Care Professional
HTA Health Technology Assessment
HTACG Health Technology Assessment Coordination Group (of the EU)
HTD Health Technology Developer
IPD Individual Patient Data
IQWiG Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (Germany)
ISPOR International Society for Pharmacoeconomics and Outcomes Research
ITC Indirect Treatment Comparison
JCA Joint Clinical Assessment
MAIC Matching-Adjusted Indirect Comparison
ML-NMR Multilevel Network Meta-Regression
MS Member States (of the EU)
NICE National Institute for Health and Care Excellence
NMA Network Meta-Analysis
PAIC Population-Adjusted Indirect Comparison
PH Proportional Hazards
PICO Population, Intervention, Comparator, Outcome
RCT Randomized Controlled Trial
RMST Restricted Mean Survival Time
SAT Single-Arm Trial
SLR Systematic Literature Review
STC Simulated Treatment Comparison
TSD Technical Support Document (from NICE DSU)
TTE Time-To-Event

References

  1. European Commission. Regulation (EU) 2021/2282 of the European Parliament and of the Council of 15 December 2021 on health technology assessment and amending Directive 2011/24/EU. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32021R2282 (accessed on 18 December 2024).
  2. The Member State Coordination Group on Health Technology Assessment (HTACG). Guidance on the validity of clinical studies for joint clinical assessments. V1.0. Available online: https://health.ec.europa.eu/document/download/9f9dbfe4-078b-4959-9a07-df9167258772_en?filename=hta_clinical-studies-validity_guidance_en.pdf (accessed on 30 December 2024).
  3. Hollard, D., G. Roberts, I. Taylor, J. Gibson, and O. Darlington. HTA77 PICO Consolidation in European HTA Scoping: Examining PICO Variations in Oncology Drugs in the Context of the European Joint Clinical Assessment. Value Health 2024, 27, S258.
  4. Young, K. and I. Staatz. HTA111 Population, Intervention, Comparator, Outcomes (PICO) of ATMPs and Potential Impact on the Upcoming EU Regulation on HTA. Value Health 2023, 26, S340.
  5. van Engen, A., R. Kruger, J. Ryan, and P. Wagner. HTA97 impact of additive PICOs in a European joint health technology assessment. A hypothetical case study in lung cancer. Value Health 2022, 25, S315.
  6. The Member State Coordination Group on Health Technology Assessment (HTACG). Guidance on the scoping process. Available online: https://health.ec.europa.eu/document/download/7be11d76-9a78-426c-8e32-79d30a115a64_en?filename=hta_jca_scoping-process_en.pdf (accessed on 2 May 2025).
  7. van Engen, A., R. Krüger, A. Parnaby, M. Rotaru, J. Ryan, D. Samaha, and D. Tzelis. The impact of additive population (s), intervention, comparator (s), and outcomes in a European joint clinical health technology assessment. Value Health 2024, 27, 1722-1731.
  8. The Member State Coordination Group on Health Technology Assessment (HTACG). PICO exercises. Available online: https://health.ec.europa.eu/publications/pico-exercises_en (accessed on 26 February 2025).
  9. Dias, S., A.J. Sutton, A.E. Ades, and N.J. Welton. Evidence Synthesis for Decision Making 2. Med. Decis. Making 2013, 33, 607-617.
  10. Hoaglin, D.C., N. Hawkins, J.P. Jansen, D.A. Scott, R. Itzler, J.C. Cappelleri, C. Boersma, D. Thompson, K.M. Larholt, M. Diaz, and A. Barrett. Conducting Indirect-Treatment-Comparison and Network-Meta-Analysis Studies: Report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: Part 2. Value Health 2011, 14, 429-437.
  11. The Member State Coordination Group on Health Technology Assessment (HTACG). Methodological Guideline for Quantitative Evidence Synthesis: Direct and Indirect Comparisons. Available online: https://health.ec.europa.eu/document/download/4ec8288e-6d15-49c5-a490-d8ad7748578f_en?filename=hta_methodological-guideline_direct-indirect-comparisons_en.pdf (accessed on 8 March 2025).
  12. The Member State Coordination Group on Health Technology Assessment (HTACG). Practical Guideline for Quantitative Evidence Synthesis: Direct and Indirect Comparisons. Available online: https://health.ec.europa.eu/document/download/1f6b8a70-5ce0-404e-9066-120dc9a8df75_en?filename=hta_practical-guideline_direct-and-indirect-comparisons_en.pdf (accessed on 8 March 2025).
  13. Macabeo, B., A. Quenéchdu, S. Aballéa, C. François, L. Boyer, and P. Laramée. Methods for indirect treatment comparison: results from a systematic literature review. Journal of market access & health policy 2024, 12, 58-80.
  14. Ahn, E. and H. Kang. Concepts and emerging issues of network meta-analysis. Korean J. Anesthesiol. 2021, 74, 371-382.
  15. Institute of Medicine (US) Committee on Standards for Systematic Reviews of Comparative Effectiveness Research. Finding What Works in Health Care: Standards for Systematic Reviews. Washington (DC): National Academies Press (US); 2011. 4, Standards for Synthesizing the Body of Evidence. Available online: https://www.ncbi.nlm.nih.gov/books/NBK209522/ (accessed on 8 May 2025).
  16. Guo, J.D., A. Gehchan, and A. Hartzema. Selection of indirect treatment comparisons for health technology assessments: a practical guide for health economics and outcomes research scientists and clinicians. BMJ open 2025, 15, e091961.
  17. Song, F., Y.K. Loke, T. Walsh, A.-M. Glenny, A.J. Eastwood, and D.G. Altman. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ 2009, 338.
  18. Cochrane Training. Chapter 10: Analysing data and undertaking meta-analyses. Available online: https://training.cochrane.org/handbook/current/chapter-10 (accessed on 12 May 2025).
  19. Bucher, H.C., G.H. Guyatt, L.E. Griffith, and S.D. Walter. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J. Clin. Epidemiol. 1997, 50, 683-691.
  20. Yu-Kang, T. Node-Splitting Generalized Linear Mixed Models for Evaluation of Inconsistency in Network Meta-Analysis. Value Health 2016, 19, 957-963.
  21. Veroniki, A.A., S. Tsokani, I.R. White, G. Schwarzer, G. Rücker, D. Mavridis, J.P. Higgins, and G. Salanti. Prevalence of evidence of inconsistency and its association with network structural characteristics in 201 published networks of interventions. BMC Med. Res. Methodol. 2021, 21, 1-10.
  22. European Federation of Statisticians in the Pharmaceutical Industry (EFSPI). Unanchored indirect treatment comparison methods and unmeasured confounding. Available online: https://psiweb.org/docs/default-source/default-document-library/psi-hta-sig-itc-kr_final.pdf?sfvrsn=481bacdb_0 (accessed on 13 May 2025).
  23. Chaimani, A., D.M. Caldwell, T. Li, J.P. Higgins, and G. Salanti. Undertaking network meta-analyses. Cochrane handbook for systematic reviews of interventions 2019, 285-320.
  24. Phillippo, D., T. Ades, S. Dias, S. Palmer, K.R. Abrams, and N. Welton. NICE DSU technical support document 18: methods for population-adjusted indirect comparisons in submissions to NICE. 2016.
  25. Sutton, A.J., K.R. Abrams, D.R. Jones, T.A. Sheldon, and F. Song, Methods for meta-analysis in medical research. Vol. 348. 2000: Wiley Chichester.
  26. Dias, S., A.J. Sutton, N.J. Welton, and A. Ades. Heterogeneity: subgroups, meta-regression, bias and bias-adjustment. 2016.
  27. Mbuagbaw, L., B. Rochwerg, R. Jaeschke, D. Heels-Andsell, W. Alhazzani, L. Thabane, and G.H. Guyatt. Approaches to interpreting and choosing the best treatments in network meta-analyses. Systematic reviews 2017, 6, 1-5.
  28. Higgins, J.P., S.G. Thompson, and D.J. Spiegelhalter. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society Series A: Statistics in Society 2009, 172, 137-159.
  29. Signorovitch, J.E., V. Sikirica, M.H. Erder, J. Xie, M. Lu, P.S. Hodgkins, K.A. Betts, and E.Q. Wu. Matching-adjusted indirect comparisons: a new tool for timely comparative effectiveness research. Value Health 2012, 15, 940-947.
  30. Sackett, D.L., W.M. Rosenberg, J.M. Gray, R.B. Haynes, and W.S. Richardson, Evidence based medicine: what it is and what it isn’t. 1996, British Medical Journal Publishing Group. p. 71-72.
  31. Guyatt, G.H., A.D. Oxman, G.E. Vist, R. Kunz, Y. Falck-Ytter, P. Alonso-Coello, and H.J. Schünemann. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008, 336, 924-926.
  32. Macabeo, B., T. Rotrou, A. Millier, C. François, and P. Laramée. The acceptance of indirect treatment comparison methods in oncology by health technology assessment agencies in England, France, Germany, Italy, and Spain. PharmacoEconomics-open 2024, 8, 5-18.
  33. Es-Skali, I.J. and J. Spoors. Analysis of indirect treatment comparisons in national health technology assessments and requirements for industry submissions. Journal of Comparative Effectiveness Research 2018, 7, 397-409.
  34. The independent Institute for Quality and Efficiency in Health Care (IQWiG). General Methods version 8.0. Available online: https://www.iqwig.de/methoden/allgemeine-methoden_entwurf-fuer-version-8-0.pdf (accessed on 26 February 2025).
  35. Haute Autorité de Santé (HAS). Indirect comparisons: Methods and validity Available online: https://www.has-sante.fr/upload/docs/application/pdf/2011-02/summary_report__indirect_comparisons_methods_and_validity_january_2011_2.pdf (accessed on 12 May 2025).
  36. The Belgian Health Care Knowledge Centre (KCE). Guidelines for pharmacoeconomic evaluations in Belgium Available online: https://kce.fgov.be/sites/default/files/2021-12/d20081027327.pdf (accessed on 12 May 2025).
  37. Chung, W.T. and K.C. Chung. The use of the E-value for sensitivity analysis. J. Clin. Epidemiol. 2023, 163, 92-94.
  38. Thompson, S.G. and J.P. Higgins. How should meta-regression analyses be undertaken and interpreted? Stat. Med. 2002, 21, 1559-1573.
  39. Jansen, K. and H. Holling. Rare events meta-analysis using the Bayesian beta-binomial model. Research Synthesis Methods 2023, 14, 853-873.
  40. The Member State Coordination Group on Health Technology Assessment (HTACG). Guidance on reporting requirements for multiplicity issues and subgroup, sensitivity and post hoc analyses in joint clinical assessments. Available online: https://health.ec.europa.eu/document/download/f2f00444-2427-4db9-8370-d984b7148653_en?filename=hta_multiplicity_jca_guidance_en.pdf (accessed on 8 January 2025).
  41. Turner, R.M., D. Jackson, Y. Wei, S.G. Thompson, and J.P. Higgins. Predictive distributions for between-study heterogeneity and simple methods for their application in Bayesian meta-analysis. Stat. Med. 2015, 34, 984-998.
  42. Sutton, A.J. and K.R. Abrams. Bayesian methods in meta-analysis and evidence synthesis. Stat. Methods Med. Res. 2001, 10, 277-303.
  43. Ren, S., S. Ren, N.J. Welton, and M. Strong. Advancing unanchored simulated treatment comparisons: A novel implementation and simulation study. Res Synth Methods 2024, 15, 657-670.
  44. Nierengarten, M.B. Single-arm trials for US Food and Drug Administration cancer drug approvals: Although there are some challenges in using single-arm studies for accelerated drug approvals, it can make a difference in getting drugs previously approved for other uses to patients. Cancer 2023, 129, 1626.
  45. European Medicines Agency (EMA). Establishing efficacy based on single-arm trials submitted as pivotal evidence in a marketing authorisation. Published online on 21 April 2023. Available online: https://www.ema.europa.eu/en/establishing-efficacy-based-single-arm-trials-submitted-pivotal-evidence-marketing-authorisation (accessed on 7 May 2025).
  46. Hernán, M.A. and J.M. Robins. Using big data to emulate a target trial when a randomized trial is not available. Am. J. Epidemiol. 2016, 183, 758-764.
  47. Bucher, H.C. and F. Chammartin. Strengthening health technology assessment for cancer treatments in Europe by integrating causal inference and target trial emulation. The Lancet Regional Health–Europe 2025, 52.
  48. Franklin, J.M., R.J. Glynn, D. Martin, and S. Schneeweiss. Evaluating the use of nonrandomized real-world data analyses for regulatory decision making. Clin. Pharmacol. Ther. 2019, 105, 867-877.
  49. Krüger, R., C. Cantoni, and A. Van Engen. OP49 Are Propensity-Score-Based Adjusted Indirect Comparisons Feasible For All European Joint Clinical Assessments Based On Non-Randomized Data? Int. J. Technol. Assess. Health Care 2024, 40, S23-S23.
  50. Elliott, J.H., T. Turner, O. Clavisi, J. Thomas, J.P. Higgins, C. Mavergames, and R.L. Gruen. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014, 11, e1001603.
  51. Stensrud, M.J. and M.A. Hernán. Why test for proportional hazards? JAMA 2020, 323, 1401-1402.
  52. Lin, L. Use of Prediction Intervals in Network Meta-analysis. JAMA Netw Open 2019, 2, e199735.
  53. European Medicines Agency (EMA). Reflection paper on establishing efficacy based on single-arm trials submitted as pivotal evidence in a marketing authorisation application. Considerations on evidence from single-arm trials. Published online 9 September 2024. Available online: https://www.ema.europa.eu/en/documents/scientific-guideline/reflection-paper-establishing-efficacy-based-single-arm-trials-submitted-pivotal-evidence-marketing-authorisation-application_en.pdf (accessed on 7 May 2025).
  54. Khachatryan, A., S.H. Read, and T. Madison. External control arms for rare diseases: building a body of supporting evidence. J. Pharmacokinet. Pharmacodyn. 2023, 50, 501-506.
  55. Wang, M., H. Ma, Y. Shi, H. Ni, C. Qin, and C. Ji. Single-arm clinical trials: design, ethics, principles. BMJ Support Palliat Care 2024, 15, 46-54.
  56. European Commission. Joint Clinical Assessment for Medicinal Products. Available online: https://health.ec.europa.eu/document/download/ced91156-ffe1-472d-85eb-aa6a91dd707e_en?filename=hta_htar_factsheet-jca_en.pdf (accessed on 12 May 2025).
  57. Guyatt, G.H., A.D. Oxman, R. Kunz, D. Atkins, J. Brozek, G. Vist, P. Alderson, P. Glasziou, Y. Falck-Ytter, and H.J. Schünemann. GRADE guidelines: 2. Framing the question and deciding on important outcomes. J. Clin. Epidemiol. 2011, 64, 395-400.
  58. Werner, S., L. Lechterbeck, A. Rasch, S. Merkesdal, and J. Ruof. Untersuchung der Akzeptanz und der Ablehnungsgründe indirekter Vergleiche in IQWiG-Nutzenbewertungen. Gesundheitsökonomie & Qualitätsmanagement 2020, 25, 24-36.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.