1. Introduction
Surgical robots have been adopted at scale across the major surgical specialties. The first robotic-assisted procedure using the da Vinci system in mainland China dates back to the mid-2000s, and the platform has since become a mainstream option in urological, gynaecological, thoracic, hepatobiliary, gastrointestinal and orthopaedic procedures [1, 2]. In gynaecologic oncology, the landmark minimally invasive radical hysterectomy trial [3] shifted the reference comparator, and large observational series have since reported favourable perioperative outcomes for robotic radical hysterectomy compared with conventional laparoscopy [4, 5]. In hepatobiliary surgery, the Miami guidelines on minimally invasive pancreas resection summarise the rapidly growing evidence base [6], and several meta-analyses and high-volume series have shown that robotic distal pancreatectomy is associated with lower conversion rates and higher spleen-preservation rates than laparoscopic distal pancreatectomy [7, 8, 9], complementing the randomised LEOPARD evidence on minimally invasive versus open distal pancreatectomy [10]. In orthopaedics, robotic assistance is increasingly used for pedicle-screw placement, total knee arthroplasty and trauma fixation, and learning-curve analyses have become a routine component of the evidence base [11, 12]. Despite this breadth, health technology assessments (HTA) of surgical robots have produced inconsistent conclusions: systematic reviews and meta-analyses present mixed evidence on clinical benefits relative to higher costs, while real-world adoption rates continue to rise. This evidence–adoption mismatch suggests that traditional evaluation frameworks, which focus solely on mean comparisons, may overlook a critical dimension of robotic-surgery value: risk mitigation through process standardisation and operational stability.
The disease-burden context sharpens the policy stakes. Cervical cancer alone is responsible for an estimated 660,000 new cases and 350,000 deaths worldwide each year [13], and cervical, colorectal, pancreatic and musculoskeletal conditions together account for a substantial fraction of the surgical demand that robotic platforms are positioned to serve [6, 13]. The clinical signals within each domain are, however, not uniform. In gynaecologic oncology, studies comparing robotic with laparoscopic radical hysterectomy report broadly comparable oncologic outcomes but differences in intra-operative blood loss, operative time and length of stay. The landmark NEJM minimally invasive radical hysterectomy trial [3] and a subsequent large real-world cohort study in England [14] provide the most direct evidence, while Chinese cohorts [4, 5] document similar patterns in the domestic setting. In pancreatic surgery, the LEOPARD trial [10] and follow-up meta-analyses [7, 9] indicate that robotic and laparoscopic distal pancreatectomy have similar postoperative morbidity, but robotic assistance may reduce blood loss, conversion to open surgery and the time required to achieve splenic preservation [8, 9]. In orthopaedics, robotic-assisted pedicle-screw insertion is reported to improve accuracy and reduce radiation exposure [2, 15], but the operative-time advantage emerges only after the surgeon has passed the learning curve [11, 12].
Three observations motivate a distributional view. First, mean comparisons cannot reflect changes in tail events such as severe complications and medical errors; technologies that compress the upper tail of an outcome distribution may appear “neutral” on the mean. Second, even when means are similar, a reduction in variance implies enhanced predictability and fewer outliers, which is material for healthcare payment and hospital management. Third, the closed-loop mechanism linking human–robot collaboration, process standardisation and quality control may reshape outcomes at the distribution level rather than merely shifting the mean. Incorporating variation into the core of HTA evaluation is therefore not supplementary but essential for capturing the full technological value of robotic surgery.
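The first two observations can be illustrated with a small simulation. The sketch below is purely hypothetical: the outcome (operative time in minutes), the normal distributions, the parameter values and the 240-minute cut-off are all invented for illustration and are not taken from the study data. It draws two arms with the same mean but different spread and compares the mean, the variance and the share of cases above the cut-off.

```python
import random
import statistics


def summarise(outcomes, threshold):
    """Return the mean, population variance and share of cases above a threshold."""
    mean = statistics.fmean(outcomes)
    variance = statistics.pvariance(outcomes)
    tail_share = sum(y > threshold for y in outcomes) / len(outcomes)
    return mean, variance, tail_share


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20_000

# Hypothetical operative times in minutes; every parameter here is invented.
# Both arms share the same mean (180) but differ in standard deviation.
conventional = [rng.gauss(180, 40) for _ in range(n)]
robotic = [rng.gauss(180, 20) for _ in range(n)]

threshold = 240  # illustrative cut-off for an "outlier" case

conv_stats = summarise(conventional, threshold)
robo_stats = summarise(robotic, threshold)

for label, stats in (("conventional", conv_stats), ("robotic", robo_stats)):
    mean, variance, tail_share = stats
    print(f"{label}: mean={mean:.1f} variance={variance:.0f} tail share={tail_share:.4f}")
```

Under these assumed parameters a comparison of means would find almost no difference between the arms, while the variance and the share of cases above the cut-off differ markedly, which is the pattern a mean-only evaluation would report as "neutral".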
The economic evidence is similarly heterogeneous. Recent cost-effectiveness studies of robotic-assisted gynaecologic surgery in China report favourable incremental cost-utility ratios under standard willingness-to-pay (WTP) thresholds [16], but the result is sensitive to the disease substage, the WTP threshold and the cost of robotic consumables [16]. A recent methodological review of economic evaluations of robotic-assisted surgery identifies persistent gaps: short time horizons, heterogeneity in costing approaches and limited evidence on the cost-utility of robotic assistance beyond gynaecologic oncology [17]. Cross-sectional hospital-level evidence suggests that the financial impact of robotic systems is far from uniform across institutions. These inconsistencies point to a long-underestimated, beyond-the-mean distributional dimension. Only a joint mean–variance evaluation framework can explain and quantify the risk-mitigation capability of robotic surgery, thereby improving the robustness of HTA conclusions and of the policy implications drawn from them.
Our contribution is threefold. First, we construct a novel variance-decomposition term and embed it within a regression and a time-varying difference-in-differences (DID) design, forming a replicable blueprint for integrated mean–variance HTA. Second, we show, both algebraically and through a worked derivation, that this term isolates the within-group variance change induced by robotic introduction from the mixed-variance term driven by the inter-group mean difference, so that the two effects can be estimated separately. Third, we validate the framework empirically on nationwide hospital-quarter aggregated data and provide policy evidence across the dual dimensions of risk and cost.
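For orientation, the textbook two-group form of the law of total variance shows the kind of separation described above. The notation is an assumption introduced here for illustration and is not necessarily the authors' variance-decomposition term: $p$ is the share of robotic cases, and $\mu_R, \sigma_R^2$ and $\mu_C, \sigma_C^2$ are the means and variances of the robotic and conventional groups.

```latex
\operatorname{Var}(Y)
  = \underbrace{p\,\sigma_R^2 + (1-p)\,\sigma_C^2}_{\text{within-group variance}}
  + \underbrace{p\,(1-p)\,(\mu_R - \mu_C)^2}_{\text{between-group term (mean difference)}}
```

In this form, a technology that lowers $\sigma_R^2$ changes the first component even when $\mu_R = \mu_C$, whereas the second component is driven entirely by the difference in group means.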
The remainder of the paper is organised as follows. Section 2 introduces the data, the identification strategy and the variance-decomposition framework. Section 3 reports the empirical results. Section 4 discusses the methodological and policy implications, sets out the limitations and points to future research. Section 5 concludes.