A Multi-level System of Performance Evaluation Using Diagnosis-Related Groups for Cost Containment

This study aims to develop a performance evaluation system that supports assessment at regional, hospital, and department levels to enable better cost management for sustainable development. A multi-level system of performance evaluation informs a hierarchical assessment of cost management from regions to hospitals to departments using diagnosis-related groups (DRGs). Various metrics are developed from the variances between targets and actuals, where targets are determined from two perspectives: benchmarking using external regional prices and change management using internal data. Targets for the latter are statistically based and specifically incorporate variability. The model is applied to two hospitals, twenty departments, nine DRGs and 1071 inpatients. The analyses indicate that the approach can provide a practical evaluation tool that allows for particular characteristics at multiple levels. The system provides macro-micro and external-internal perspectives on performance, enabling high-level variances to be decomposed, thereby identifying sources of performance variability and financial impact.


Introduction
Traditionally, countries have used two basic mechanisms to pay for hospital care: fee-for-service payments and global budgets. Both are problematic in terms of ensuring high-quality care due to the inherent incentive to over-provide (fee-for-service) or under-provide (global budgets) hospital services [1]. In recent years, in response to rapid growth in the cost of healthcare, many countries have introduced fixed-price payments for each patient (prospective reimbursement systems), which are considered to be more cost-efficient than cost-based retrospective reimbursement. Diagnosis-related groups (DRGs) are a key component of these systems, intended to control costs and motivate healthcare providers to improve efficiency [1]. DRGs provide multi-product views in acute care hospitals, classifying patients with similar resource consumption patterns into clinically meaningful categories [2]. DRGs can be used for hospital performance comparisons and to promote hospital management, as in DRG-based hospital payment systems [3]. These payment systems have the objective of incentivising hospitals to improve their performance [4].
normalised variances. These are illustrated using two hospitals, nine DRGs, twenty departments and 1071 inpatients to demonstrate the principles, advantages and disadvantages of the system. Fourth, the trimmed lognormal mean and standard deviation, chosen in light of the probability distribution of health costs, are used to set the cost targets for internal purposes as part of efforts to change behaviour. For some departments such as the intensive care unit (ICU), low volumes for some DRGs may require a different approach and we suggest some ways to do this. The results reveal how costs can vary depending on the departmental structure and patient complexity, with high costs mostly caused by ICUs, including the Paediatric Intensive Care Unit (PICU).
The paper is structured as follows: Section 2 reviews the development of DRG-based performance evaluation, including case-mix, DRG costing and cost management; Section 3 introduces the methodology, a multi-level system and the metrics for measuring performance; Section 4 illustrates the system with performance evaluations of two hospitals; Section 5 discusses the utility of performance evaluation based on DRGs and a research agenda to address some crucial problems in hospital performance evaluation; Section 6 concludes this study and points out limitations and future research.

Literature review

2.1 Performance evaluation with DRGs
DRGs have been used for many purposes including prospective payment systems, risk adjustment and performance evaluation of work efficiency and medical service quality [10]. Park and Shin [11] developed a model that measures and evaluates the case-mix complexity of tertiary care hospitals. DRGs have been used to evaluate the inpatient service performance of the clinical subspecialty "major operation of the digestive system" of a cancer specialist hospital [12]. The relationships among the average number of procedures per surgeon, across the entire hospital, and multiple dimensions of overall hospital operational performance have been evaluated using All Patient Refined Diagnosis Related Groups (APR-DRGs) to adjust for case-mix and severity [13]. DRGs are still predominantly used in most studies on hospital performance and case-mix complexity. Actual improvements in cost per discharge, average length of stay, and conformance quality are used to assess changes in a hospital's level of process excellence [14].

DRG costs
High-quality cost accounting systems were conducive to the development of DRG payment mechanisms [15]. The accuracy of the calculation is only one of the factors that determine the choice of cost accounting methodology; other important aspects include the selection of reference hospitals, individual country regulations and the complexity of hospital information systems [16]. Cross-country comparisons show that cost accounting systems differ internationally and can be used to identify and improve areas of weakness [17]. Activity-based costing (ABC) was implemented at a Heart Centre in Sweden to support administrative cost information, strategic decision-making, quality improvement, and cost reduction [18].
There is also a tension between bottom-up and top-down costing, where the former may provide different costs than the latter. Bottom-up microcosting, top-down microcosting and gross costing were compared in estimating the total cost of hospital services for appendectomy, normal delivery, stroke and acute myocardial infarction (AMI); the comparison indicated that restricting the use of bottom-up microcosting to cost components with a large impact on total costs (i.e., labour and inpatient stay) would likely result in reliable cost estimates [19].
DRG costs are used to derive case weights as a basis for hospital funding and comparative performance evaluations. The consequences of DRG cost weight volatility for hospital performance have been estimated using the hospital base rate as a quantitative measure of hospital performance [20]. The average case weight increases by 14% for every one-day increase in the average length of stay (ALOS) [21]. Analysis of the German DRG system revealed that a "compression" of DRG cost weights occurs and that the data sample used to calculate case weights lacks representativeness [22]. A better classification system and funding formulas incorporating reliable case weights derived from patient costing should overcome many of the deficiencies in the current case-mix payment systems [23].

Managing DRG-based costs
If a hospital could control the length of stay (LOS) at the projected level, on average, the cost per admission and the cost per patient day would decrease [24]. A significant reduction of LOS was observed in hospitals that changed from per diem to DRG reimbursement in Switzerland [25]. Mean hospital cost as a function of patient stay has been estimated and adjusted for the influence of patient characteristics and treatment procedures on LOS and cost [26]. The median cost for liver transplantation did not differ significantly from the corresponding hospital refund, the average DRG-based cost of liver transplantation was significantly higher than the actual cost, and the hospital's reimbursement for liver transplantation did not differ significantly from the actual registered cost [27]. From the hospital administration's perspective, it is not the additional costs of healthcare, but rather the cost-reimbursement relationship, which guides decisions [28].
In summary, DRGs are commonly used for prospective payment systems and performance evaluation. While cost systems may differ in how they determine DRG costs and consequently case weights, there is likely to be reasonable uniformity within a specific country's health system. Hence they form reasonable reference points for benchmarking purposes. So far as behaviour change is concerned, there appear to be established relationships between costs, patient stay and characteristics, and conformance quality. The suggested use of an inter-quartile range to identify outliers is particularly relevant when considering cost containment, especially for DRGs with high cost variability.

Methodology

3.1 Architecture of the multi-level system
In benchmarking, a target is set based on other organisations in a relevant sector, which is then compared with actual costs and possibly used to value engineer the product to match the target. In this study, the regional target sets a reference which is cascaded down from hospitals to departments through a top-down process, such that the competitive pressure faced by hospitals can be traced to departments. Thus senior hospital management can identify how the hospital is performing at all levels against the regional benchmark (RB), as shown on the left-hand side of the multi-level performance evaluation system depicted in Figure 1.
Target costing also considers prices, but its focus is not so much to minimize cost as to achieve a desired level of cost management determined by the target costing process. In healthcare this process must also maintain or improve quality. For change management (CM), the target is determined based on internal measures, shown on the right-hand side of Figure 1. This is a bottom-up process where targets or prices are set for DRGs taking into consideration each department's and hospital's context. For example, ICU departments may have a different target setting method than general departments, and tertiary hospitals differ from secondary hospitals. These department targets are then rolled upwards into hospital targets and thence further upwards to the regional total targets. Variances can be obtained for both regional benchmark and change management, with actual costs being the common component for both. At the hospital level three variances are significant. First, the hospital benchmark variance shows whether the hospital is making or losing money against regional prices. Second, the change management variance shows how internal target costs compare with actual costs in terms of cost management efforts. The third variance closes the loop by showing how far internal target costs are from external reference prices. If a hospital is serious about change management, one would expect the change management target cost to be less than actual costs. The size and sign of the third variance show the level of difficulty for a hospital relative to external pricing. In an ideal situation, this would be a favourable variance, indicating that the hospital is performing well in its change management. An unfavourable variance indicates serious issues for future funding.
The regional benchmark discerns whether there is a profit, a loss or break-even, while change management evaluates internal performance, taking the different structures of hospitals and departments into account, and indicates directions for optimizing the health system.
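The three hospital-level variances described above can be illustrated with a minimal sketch. The class and field names, the figures, and the sign convention for the third variance (regional target minus internal target) are assumptions made for illustration; the first two variances follow the target-minus-actual convention used for the metrics in this study, so a positive value is favourable.

```python
from dataclasses import dataclass


@dataclass
class HospitalTotals:
    """Hypothetical hospital-level totals over all cases (illustrative only)."""
    regional_target: float  # sum of regional benchmark (RB) prices
    internal_target: float  # sum of internal change management (CM) targets
    actual_cost: float      # sum of actual costs


def hospital_variances(t: HospitalTotals) -> dict:
    """Return the three variances; positive values are read as favourable."""
    return {
        # RB target vs actual: is the hospital making or losing money?
        "benchmark_variance": t.regional_target - t.actual_cost,
        # CM target vs actual: how do cost management efforts compare?
        "change_management_variance": t.internal_target - t.actual_cost,
        # RB target vs CM target (assumed sign convention): closes the loop
        "target_gap": t.regional_target - t.internal_target,
    }


# Made-up figures, not taken from the study's tables.
example = HospitalTotals(regional_target=1_000_000.0,
                         internal_target=940_000.0,
                         actual_cost=960_000.0)
print(hospital_variances(example))
```

Under these assumed conventions the benchmark variance equals the sum of the other two, which is one way to read the statement that the third variance closes the loop.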

Evaluation for benchmarking
The revenues of hospitals under DRG payment schemes primarily derive from reimbursement by healthcare insurers, which is determined by the DRG prices (or reimbursement rates) and volumes. DRG prices are often based on the regional/national average DRG cost [22]. In the Netherlands, Poland and New Zealand, they are determined nationally, while in China they are determined by region.
Thus, if the regional average cost is an appropriate target, it should cascade down to hospitals and departments. The cost target $T_H^j$ and standard deviation target $\sigma_{T_H}^j$ at the hospital level, and the cost target $T_D^j$ and standard deviation target $\sigma_{T_D}^j$ at the department level, should be:
$$T_H^j = T_D^j = \mu_R^j \qquad (1)$$
$$\sigma_{T_H}^j = \sigma_{T_D}^j = \sigma_R^j \qquad (2)$$
where $\mu_R^j$ denotes the average cost of DRG $j$ in Region / Country $R$ and $\sigma_R^j$ is the standard deviation of the Region / Country. These targets can be compared against internal costs to reveal areas of strength and weakness across all levels.

Evaluation for change management
For benchmarking, we take the regional mean as the target, but this does not consider the circumstances of hospitals (such as tertiary or secondary status) or of departments that are more complex than others, e.g. ICU. If the costs of some DRGs in a hospital are inherently lower than the regional mean (target) due to its different positioning, speciality, history or other causes, then the target may be inappropriate or may even be a disincentive to improve performance. Alternatively, the costs of some DRGs may be higher than the target if the hospital or department receives more complex patients.
Emmanuel et al. [29] make the following observation based on Hofstede's (1968) study: "the budget level which motivates the best performance is one that is somewhat more demanding than the level of performance that will actually be achieved. However, a budget that is likely to be achieved will motivate only a lower level of performance". One of the major problems in hospital management is variability, manifested in LOS, theatre times and delays. While some of this is due to the diversity of patient needs, considerable efforts are made to standardise processes where possible to decrease heterogeneity. Examples include case-mix systems, care bundles, specialist departments and standard operating procedures. Based on this, we argue that an internal performance evaluation system should incorporate variability as one of its components. The objective is to motivate departmental managers to explore better ways of managing patient flow and clinical activities.
Therefore one possibility is to set the target using some type of average cost (e.g. mean, median) for a DRG within the hospital or department, adjusted by a proportion (θ) of the standard deviation (SD): Cost Target = Average Cost − θ × SD. For target setting it is important that the target is perceived as fair and appropriate, so we propose that the DRG costs are trimmed for both high and low outliers.
This has several advantages. First, the lower the standard deviation, the closer the target becomes to actual cost, thereby motivating departmental managers to find ways to reduce variability. Second, trimming by removing high and low outliers should be perceived as a fairer target than using the actual mean cost, which includes these. Third, many researchers have noted that health care costs are skewed and not normally distributed [30][31][32]. Patient cost distributions can be heavily skewed to the right, so that a target based on the mean value could be distorted. As several authors have pointed out, the cost distribution is lognormal, suggesting that the target should be based on a lognormal distribution [30,32,33]. Finally, it should be noted that the comparator for both benchmarking and change management is actual cost. Actual costs are familiar to managers, and using them as the comparator keeps managers aware that these costs need to be managed and controlled.
The next issue is the determination of θ and of the percentage for trimming. A characteristic of a lognormal distribution is that when the logarithms of values form a normal distribution, the original values have a lognormal distribution. If the mean of the logarithms is set as the average and the adjustment factor θ is set at zero, then departments will achieve the target 50% of the time; if θ is set at 0.1, then they will achieve it less than 50% of the time, and so on. The setting of θ depends mainly on management objectives, as demonstrated in Section 4 and further discussed in Section 5.
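The link between θ and how often a target is met can be sketched numerically. The sketch assumes that the target sits θ log-standard-deviations below the mean of the logarithms, i.e. log(target) = μ̃ − θσ̃; this form is an assumption chosen to match the 50% statement above and may differ from the exact target formula used in the study. The function name is hypothetical.

```python
from statistics import NormalDist


def prob_target_met(theta: float) -> float:
    """P(cost <= target) when log-cost is normal and the target is placed
    theta log-standard-deviations below the log-mean (assumed form).
    The result does not depend on the log-mean or log-SD themselves."""
    return NormalDist().cdf(-theta)


for theta in (0.0, 0.03, 0.05, 0.1):
    print(f"theta={theta:.2f}  P(target met)={prob_target_met(theta):.3f}")
```

Under this assumption θ = 0 gives exactly one half, and each increase in θ lowers the share of cases that meet the target, which is the sense in which a larger θ is more demanding.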
Spanish and Belgian studies have shown that high deficit cost outliers account for roughly five percent of cases but produce 11-20 percent of inpatient costs [32,34]. Other studies suggest that nearly five percent of cases fall in a high outlier category except for very low-volume groups, with a recommendation that the percentage should not exceed ten percent [30,35]. Therefore in our example we trim the top five percent and the lowest five percent of cases. A final issue concerns the volume of cases at departmental level. For example, our empirical study covered only one month, so the number of cases for some DRGs was low even though on an annual basis it would be much higher. Nevertheless, there are some complex DRGs that are low volume even on an annual basis; these tend to be cases in ICUs. Figure 2 sets out our thoughts on managing targets for low volumes and for specialized departments, bearing in mind that the target setting process is likely to be done annually, so that the incidence of low volumes for some DRGs would likely be low.
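A minimal sketch of the trimming step follows, assuming symmetric trimming by case count and summary statistics taken on the logarithms of cost. The function names and the cost figures are hypothetical and are not drawn from the study's data.

```python
import math
from statistics import mean, stdev


def trim_cases(costs, percent=5):
    """Drop the lowest and the highest `percent` of cases (by count)."""
    ordered = sorted(costs)
    k = len(ordered) * percent // 100  # cases removed from each tail
    return ordered[k:len(ordered) - k]


def log_mean_sd(costs):
    """Mean and sample standard deviation of the logarithms of the costs."""
    logs = [math.log(c) for c in costs]
    return mean(logs), stdev(logs)


# Made-up costs for one DRG in one department, with one extreme case.
costs = [4200, 4500, 4700, 4800, 5000, 5100, 5200, 5300, 5400, 5500,
         5600, 5700, 5900, 6000, 6200, 6400, 6800, 7200, 8000, 45000]
trimmed = trim_cases(costs)
print(len(costs), len(trimmed))
print("untrimmed log mean/SD:", log_mean_sd(costs))
print("trimmed   log mean/SD:", log_mean_sd(trimmed))
```

With very small samples the integer rule removes no cases at all, which is consistent with treating low-volume DRGs separately.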

Fig. 2. Target setting at the DRG level for general and specialized departments
For most departments, if the DRG volume is greater than 20, the target is set using the department's lognormal mean adjusted by the standard deviation. Even if the department is a specialised department such as ICU (including PICU, NICU and CCU), provided its volume for DRG j is greater than 20, the target is set in the same way as for other departments, using the adjusted lognormal mean of the department (3) and standard deviation (4).

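The mean of a lognormal distribution is $\exp(\tilde{\mu} + \tilde{\sigma}^2/2)$, where $\tilde{\mu}$ and $\tilde{\sigma}$ are the mean and standard deviation of the logarithms of the values.
In Eqs. (3) and (4), $T^j$ is the cost target for DRG $j$; $\tilde{\mu}_D^j$, the mean of the logarithm of the cost of DRG $j$ in Department $D$; $\tilde{\sigma}_D^j$, the standard deviation of the logarithm of the cost of DRG $j$ in Department $D$; $\sigma_T^j$, the target of the standard deviation of DRG $j$; $\theta$, the adjustment factor.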
If the DRG volume is less than 20 for a specialized department such as ICU, then the test is whether the hospital volume for the DRG is greater or less than 20. Depending on this, the target is set using an appropriate percentile of the hospital lognormal distribution, e.g. the 97.5th or 85th percentile:
$$P^j(\varphi) = \exp\left(\tilde{\mu}_H^j + \Phi(\varphi)\,\tilde{\sigma}_H^j\right) \qquad (5)$$
where $P^j(\varphi)$ is the $\varphi$-th percentile for DRG $j$ in the lognormal distribution; $\Phi(\varphi)$, the $\varphi$-th percentile of the standard normal distribution; $\tilde{\mu}_H^j$, the mean of the logarithm of the cost of DRG $j$ in Hospital $H$; $\tilde{\sigma}_H^j$, the standard deviation of the logarithm of the cost of DRG $j$ in Hospital $H$.
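The percentile-based target can be computed directly from the log-scale mean and standard deviation. The sketch below uses the standard lognormal percentile formula; the function name and the numerical inputs are hypothetical and are not values from the study.

```python
import math
from statistics import NormalDist


def lognormal_percentile(mu_log: float, sigma_log: float, phi: float) -> float:
    """phi-th percentile (phi in (0, 1)) of a lognormal distribution whose
    logarithm has mean mu_log and standard deviation sigma_log."""
    z = NormalDist().inv_cdf(phi)  # percentile of the standard normal
    return math.exp(mu_log + z * sigma_log)


# Made-up hospital-level log statistics for one DRG.
mu_log, sigma_log = math.log(20_000), 0.6
for phi in (0.5, 0.85, 0.975):
    print(f"{phi:.3f} -> {lognormal_percentile(mu_log, sigma_log, phi):,.0f}")
```

The 50th percentile returns exp(μ̃), the median, so higher percentiles place the target for a low-volume specialised department above the typical hospital-wide cost.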
For non-specialised departments, the test is whether the hospital volume for the DRG is greater or less than 20 excluding specialised department volumes for that DRG. If less than 20 then the target is the hospital lognormal mean adjusted by SD without trim; otherwise the hospital lognormal mean adjusted by SD with trim.
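The decision logic for choosing a target-setting rule can be summarised as a small function. This is a sketch of one reading of the text: the pairing of the 97.5th percentile with hospital volumes above the threshold (trimmed cases) and the 85th percentile with lower volumes is an assumption inferred from the worked examples, and a volume of exactly 20 is treated as low because the text does not say how ties are handled. All names are hypothetical.

```python
def target_rule(specialised: bool, dept_volume: int, hospital_volume: int,
                threshold: int = 20) -> str:
    """Pick a target-setting rule for one DRG in one department.

    hospital_volume: hospital-wide cases for the DRG; for non-specialised
    departments it should exclude cases from specialised departments.
    """
    if dept_volume > threshold:
        # Same rule for general and specialised departments with enough cases.
        return "department adjusted lognormal mean and SD, Eqs. (3)-(4)"
    if specialised:
        if hospital_volume > threshold:
            return "97.5th percentile of trimmed hospital cases, Eq. (5)"
        return "85th percentile of hospital cases, Eq. (5)"
    if hospital_volume > threshold:
        return "hospital adjusted lognormal mean and SD, with trim"
    return "hospital adjusted lognormal mean and SD, without trim"


print(target_rule(specialised=True, dept_volume=6, hospital_volume=45))
print(target_rule(specialised=False, dept_volume=12, hospital_volume=15))
```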

Evaluation metrics
Cost in this study is an input variable and for the same output, less input is better assuming the same quality. Thus the following metrics are based on cost variances noting that more detailed investigation would involve close scrutiny of inputs making up those variances.
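(1) Target Cost Variance
Target Cost Variance (TCV) for every DRG equals the variance between its cost target and actual cost. Formulae for the total variances and mean variances are listed in Eqs. (6)-(11). If the actual cost is less than or equal to the cost target, the TCV is positive or 0; otherwise it is negative.
Hospital:
$$TCV_H = \sum_{j_H=1}^{m_H} \sum_{i=1}^{n_{j_H}} \left(T^{j_H} - C_i^{j_H}\right) \qquad (6)$$
$$\overline{TCV}_H = \frac{TCV_H}{\sum_{j_H=1}^{m_H} n_{j_H}} \qquad (7)$$
Department:
$$TCV_D = \sum_{j_D=1}^{m_D} \sum_{i=1}^{n_{j_D}} \left(T^{j_D} - C_i^{j_D}\right) \qquad (8)$$
$$\overline{TCV}_D = \frac{TCV_D}{\sum_{j_D=1}^{m_D} n_{j_D}} \qquad (9)$$
DRG:
$$TCV_j = \sum_{i=1}^{n_j} \left(T^{j} - C_i^{j}\right) \qquad (10)$$
$$\overline{TCV}_j = \frac{TCV_j}{n_j} \qquad (11)$$
where $\overline{TCV}_H$, $\overline{TCV}_D$ and $\overline{TCV}_j$ denote the average TCV by case for hospitals, departments and DRGs respectively. For hospitals, $m_H$ is the number of DRG groups; $n_{j_H}$, the number of cases of DRG $j_H$; $C_i^{j_H}$, the cost of case $i$ of DRG $j_H$; $T^{j_H}$, the cost target of DRG $j_H$. For departments, $m_D$ is the number of DRG groups; $n_{j_D}$, the number of cases of DRG $j_D$; $C_i^{j_D}$, the cost of case $i$ of DRG $j_D$; $T^{j_D}$, the cost target of DRG $j_D$. For DRGs, $n_j$ is the number of cases of DRG $j$; $C_i^{j}$, the cost of case $i$ of DRG $j$; $T^{j}$, the cost target of DRG $j$.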
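The total and mean TCV at each level can be sketched by grouping case records, assuming each case carries the cost target that applies to it. The record layout, function name and figures are hypothetical; only the target-minus-actual sign convention comes from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Case(NamedTuple):
    hospital: str
    department: str
    drg: str
    cost: float    # actual cost of the case
    target: float  # cost target applied to the case


def tcv(cases, level):
    """Return {group: (total TCV, mean TCV per case)} for the chosen level:
    'hospital', 'department' or 'drg'. Positive means cost below target."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c in cases:
        group = getattr(c, level)
        totals[group] += c.target - c.cost
        counts[group] += 1
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}


# Made-up cases; costs and targets are illustrative only.
cases = [
    Case("A", "A13", "BB23", 90_000.0, 96_000.0),
    Case("A", "A13", "BB23", 99_000.0, 96_000.0),
    Case("A", "A02", "ES10", 7_000.0, 5_000.0),
    Case("B", "B13", "BB23", 100_000.0, 96_000.0),
]
for level in ("hospital", "department", "drg"):
    print(level, tcv(cases, level))
```

The same case list yields all three views, which mirrors how a hospital-level variance can be decomposed into department and DRG contributions.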

(2) Target Cost Variance Percent
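Since the cost targets for each DRG are different, the mean Target Cost Variance Percent (TCVP) evaluates the rate of cost reduction as in Eqs. (12)-(14). TCVP is equal to the ratio of cost reduction to cost target, with similar notation as above.
Hospital:
$$\overline{TCVP}_H = \frac{1}{\sum_{j_H=1}^{m_H} n_{j_H}} \sum_{j_H=1}^{m_H} \sum_{i=1}^{n_{j_H}} \frac{T^{j_H} - C_i^{j_H}}{T^{j_H}} \qquad (12)$$
Department:
$$\overline{TCVP}_D = \frac{1}{\sum_{j_D=1}^{m_D} n_{j_D}} \sum_{j_D=1}^{m_D} \sum_{i=1}^{n_{j_D}} \frac{T^{j_D} - C_i^{j_D}}{T^{j_D}} \qquad (13)$$
DRG:
$$\overline{TCVP}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} \frac{T^{j} - C_i^{j}}{T^{j}} \qquad (14)$$

(3) Normalised Target Cost Variance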
The Normalised Target Cost Variance (NTCV) is based on the number of standard deviations an observed value is above or below the mean. Z-scores are commonly used in the healthcare sector, e.g., [36], [37], [38].
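NTCV is calculated by subtracting the actual cost from the cost target and then dividing the difference by the standard deviation, as in Eqs. (15)-(17).
Hospital:
$$\overline{NTCV}_H = \frac{1}{\sum_{j_H=1}^{m_H} n_{j_H}} \sum_{j_H=1}^{m_H} \sum_{i=1}^{n_{j_H}} \frac{T^{j_H} - C_i^{j_H}}{\sigma_T^{j_H}} \qquad (15)$$
Department:
$$\overline{NTCV}_D = \frac{1}{\sum_{j_D=1}^{m_D} n_{j_D}} \sum_{j_D=1}^{m_D} \sum_{i=1}^{n_{j_D}} \frac{T^{j_D} - C_i^{j_D}}{\sigma_T^{j_D}} \qquad (16)$$
DRG:
$$\overline{NTCV}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} \frac{T^{j} - C_i^{j}}{\sigma_T^{j}} \qquad (17)$$
where $\overline{NTCV}_H$, $\overline{NTCV}_D$ and $\overline{NTCV}_j$ are the average Normalised Target Cost Variance (NTCV) by case for hospitals, departments and DRGs respectively; $\sigma_T^{j_H}$, the target of the standard deviation of DRG $j_H$ for hospitals; $\sigma_T^{j_D}$, the target of the standard deviation of DRG $j_D$ for departments; $\sigma_T^{j}$, the target of the standard deviation of DRG $j$ for DRGs.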
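For a single DRG, the percentage and normalised metrics can be sketched as per-case averages. The per-case averaging, the function names and the figures are assumptions for illustration.

```python
def mean_tcvp(costs, target):
    """Average of (target - cost) / target over the cases of one DRG."""
    return sum((target - c) / target for c in costs) / len(costs)


def mean_ntcv(costs, target, target_sd):
    """Average of (target - cost) / target SD over the cases of one DRG."""
    return sum((target - c) / target_sd for c in costs) / len(costs)


# Made-up cases for one DRG with a target of 10,000 and a target SD of 4,000.
costs = [9_000.0, 11_000.0, 10_500.0]
print(f"mean TCVP = {mean_tcvp(costs, 10_000.0):.4f}")
print(f"mean NTCV = {mean_ntcv(costs, 10_000.0, 4_000.0):.4f}")
```

Both metrics share the sign of the underlying variance; they differ only in the scale used, the target itself or the target standard deviation.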

Empirical application

4.1 Data
Two comprehensive hospitals in the same region are used to illustrate the multi-level system. Hospital A has 1,000 beds and 34 clinical departments, and Hospital B has 950 beds and 32 clinical departments. Hospital A was a new hospital, positioned as an emergency centre, and equipped with advanced medical apparatus and instruments to undertake more emergency functions and responsibilities. Hospital B was an old hospital, positioned as a general hospital to treat common diseases. Both hospitals had implemented Chinese diagnosis-related groups (CN-DRG) developed by the Chinese government with a total of 787 DRGs. All financial data are given in Chinese currency (USD 1=6.5 CNY).
The cost accounting methods employed in both hospitals are based on internationally accepted principles of 'activity-based' cost accounting and cost modelling. Patient clinical and financial data were collected from hospital documentation and finance departments respectively, to identify the costs associated with every cost centre/department, which were then allocated to patients.
There were 2983 and 2585 inpatients reported in Hospitals A and B respectively during August 2017. To illustrate the model for the emergency-centred hospital and the conventional hospital, six DRGs were randomly selected from the emergency department in each hospital together with the three DRGs with the most cases (OC13, OB23, ES10). This resulted in nine DRGs and 1071 cases from the two hospitals, with volumes ranging from 6 to 166, to enable the framework to be tested against varying circumstances. Similar departments are assigned the same number for hospitals A and B, e.g. A12 stands for Department 12 of Hospital A.
The number of cases for Hospital A (605) and Hospital B (466) for their respective departments, and the explanations of these DRGs, are presented in Table 1. Nine departments are the same across both hospitals except for A12 (PICU) and B30 (Infections). Note that these volumes are for one month only and display considerable variability. The regional, hospital and departmental cost means and standard deviations for these DRGs are reported in Table 2.

Top down evaluation using regional benchmarks
To test the differences between the region and the two hospitals, the following tests for each DRG are reported in Table 3:
• Z-test for comparing Regional mean cost with Hospital A mean cost and Hospital B mean cost;
• χ²-test for comparing Regional standard deviation with Hospital A standard deviation and Hospital B standard deviation;
• T-test for comparing Hospital A mean cost with Hospital B mean cost;
• F-test for comparing Hospital A standard deviation with Hospital B standard deviation.
The Z-test and χ²-test identify any significant differences between each hospital and the region, highlighting shortfalls or surpluses in funding.
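The four test statistics listed above can be sketched from summary statistics alone. The exact variants used for Table 3 are not stated, so the forms below are assumptions: a one-sample Z statistic against the regional mean with known regional standard deviation, a chi-square statistic for a single variance, an unequal-variance (Welch) t statistic, and a variance-ratio F statistic. All function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def z_stat(sample_mean, n, regional_mean, regional_sd):
    """Hospital mean vs regional mean, regional SD treated as known."""
    return (sample_mean - regional_mean) / (regional_sd / math.sqrt(n))


def chi2_stat(sample_sd, n, regional_sd):
    """Hospital SD vs regional SD; compare with chi-square, n - 1 d.f."""
    return (n - 1) * sample_sd ** 2 / regional_sd ** 2


def welch_t_stat(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Hospital A mean vs Hospital B mean without assuming equal variances."""
    return (mean_a - mean_b) / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def f_stat(sd_a, sd_b):
    """Hospital A variance over Hospital B variance."""
    return sd_a ** 2 / sd_b ** 2


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up summary statistics for one DRG.
z = z_stat(sample_mean=10_500.0, n=25, regional_mean=10_000.0, regional_sd=1_000.0)
print(z, two_sided_p_from_z(z))
print(chi2_stat(sample_sd=1_200.0, n=25, regional_sd=1_000.0))
print(welch_t_stat(10_500.0, 1_200.0, 25, 10_200.0, 900.0, 30))
print(f_stat(1_200.0, 900.0))
```

Critical values for the chi-square, t and F statistics are not computed here because they require distribution functions outside the Python standard library.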
At the hospital level, the regional mean (1) is set as the target for the evaluation and Table 4 reports favourable benchmark variances of 95,962 and 136,694 for the two hospitals, suggesting satisfactory performance relative to external reference prices. However, the summary metric scores of $TCV_H$, $\overline{TCV}_H$, $\overline{TCVP}_H$ and $\overline{NTCV}_H$ suggest that Hospital B performed at a higher level than Hospital A, even though the average cost of Hospital B (10,656) was slightly higher than that of Hospital A (10,572). As an emergency hospital, Hospital A has a different structure to Hospital B, which aligns with other studies [39,40].
Since the incomes of hospitals are primarily from DRG reimbursement under DRG-PPS, Columns 3 and 7 of Table 4 show which DRGs are profitable and which DRGs are in deficit. For example, BB23, BY13, FB29, OB23 and OC13 were profitable while BK19, BR23, ES10 and ES13 were in deficit in Hospital A; BK19, BR23, BY13, ES10 and ES13 were profitable while BB23, FB29, OB23 and OC13 were in deficit in Hospital B.
The individual DRG metrics show considerable differences. BB23 has positive metrics for hospital A but not for hospital B. The TCV for A (B) is 75,523 (-145,374) overall, equating to 5,395 (-12,114) per case, 5.6% (-12.6%) of target cost and a standardised score of 0.116 (-0.259) in terms of variability. Each metric tells a slightly different story depending on whether the focus is on case profitability, the proportion of the reimbursement price or variability from the reimbursement price. For example, management of hospital A might be concerned that BR23 is losing them over 2,000 yuan per case, which is 24% of the price and almost one third of a standard deviation above target.
Table 4 reports the performance metrics using the regional benchmark. In both hospitals ICU Department 2 has the most unfavourable results and Department 13 the most favourable. Some insight is provided for this in the last 10 columns (department level), where Departments 2 and 13 have opposite performance metrics, reflecting differences in the complexity of the patients they serve. Using the regional mean and standard deviation as the benchmarks can reflect the performance of hospitals from a regional perspective, but it becomes more difficult to evaluate the internal performance of departments. As noted, large deficits are reported for Department 2 (ICU), with opposite profitable measures for Department 13 (Neurosurgery). It is noteworthy that the mean for ICU lies more than two standard deviations from the regional mean in both hospitals, suggesting that the regional target is unrealistic for this department. While this might provide some insight for hospital management, it is likely that departmental managers will either be irritated (ICU) or complacent (Neurosurgery). From a change management perspective, different targets are needed that consider the characteristics of the hospital and its internal structures.

Bottom up evaluation using change management targets
Following the bottom-up approach depicted in Figure 1, targets are set for DRGs contextualised by the department providing them. Depending on volumes, either the adjusted lognormal mean (3) and standard deviation (4) or percentiles (5) are used to set the targets, as set out in Figure 2. The adjustment factor θ will be further discussed in Section 5, but for explanatory purposes a moderate θ of 0.03 is used in this section. Table 5 displays the targets for each DRG and each department. Table 6 shows the results for departments in Hospitals A and B using the change management targets. Note that the targets for Departments 2 (ICU) and 12 (PICU in hospital A) are closer to actual cost, with commensurately lower unfavourable metrics. Following the logic of Figure 2, the targets of ICUs with a volume of less than twenty are calculated using the percentiles from (5), while those of ICUs with a volume greater than twenty use the adjusted lognormal mean from (3).
In contrast, the benchmarking analysis reported in Table 4 showed the ICUs (Departments 2 and 12) having the biggest loss and worst performance in both hospitals A and B. This is likely to be unfair since ICU is a special department that receives patients with severe or life-threatening illnesses and injuries, requiring constant care, close supervision of life support equipment and medication to support the delivery of both acute care and complex elective surgery. ICUs are staffed by highly trained physicians, nurses and respiratory therapists who specialize in caring for critically ill patients. They are also distinguished from general hospital wards by a higher staff-to-patient ratio and access to advanced medical resources and equipment that are not routinely available elsewhere. Typically, the costs of ICUs are much higher than those of general hospital wards.
In our view, Table 5 provides a more balanced target that takes into consideration the special characteristics of these specialised departments. Figure 2 was used to determine the targets for the departmental DRGs, so the targets for DRGs BB23 and BY13 for ICU in both hospitals were calculated using the 85th percentile. In contrast, for ES13 the 97.5th percentile of trimmed hospital cases is used to set the target for Department 2 in both hospitals. Comparisons between the targets for benchmarking and change management are reported in the appendix. The change management targets are generally more demanding than the regional targets used for benchmarking, and both hospitals now show total unfavourable variances in contrast to the favourable variances under benchmarking. While some individual DRGs have improved variances, such as BK19 and BR23 for hospital A, others such as FB29, OB23 and OC13 show worse performance compared with regional benchmarking. At the hospital level, change management targets are lower than regional prices, which may or may not be acceptable depending on management strategy and incentive schemes.

Discussion and implications

5.1 Target setting
Cost target setting is critical to cost containment because it allows all medical staff to know the upper limits on cost, with appropriate accountability for investigating deviations from target. A cost target that is easy to achieve may result in inefficiencies. If it is too difficult to achieve, this can have a negative impact on motivation and performance.
For benchmarking, the regional DRG mean and standard deviation are used as the target benchmark, which helps management identify areas of strength and weakness and inform the process of target setting for change management or target costing.
At Department and DRG levels, targets need to be attainable as well as reasonably tight, i.e. a department manager can achieve the target a certain percentage of the time. For general departments we use the adjusted lognormal mean with trimming, but for ICUs with low volumes we use the lognormal percentile.
Our example covered only one month, so on an annual basis, which would be the normal target setting period, volumes are likely to be much higher. Thus, even if a department had only 5 cases for a DRG monthly, this would be 60 for a whole year. Notwithstanding, there may be some DRGs where the annual volume for a specialised department is less than, say, 20 but the annual volume at the hospital level is well above that. The question is where the mean of the distribution for that department lies in terms of the hospital distribution. That is an empirical question which we could answer if we had more monthly data, but for this illustration we have assumed the 97.5th percentile, rather arbitrarily, based on a 5% outlier decision.
The relationship between regional prices and internal targets provides a good example of linking an open with a closed system. Table 7 reports the different variances depending on which approach and target is selected. Benchmarking targets using regional prices have different variances from those using change management internally based targets. However, the more interesting variance is between the regional benchmarking target and the change management target of 395,010 and 325,178 for hospitals A and B respectively. These are 6.1% and 6.4% of the regional benchmarking target. These variances and percentages may be considered too high and although there are several ways to address this, an obvious avenue would be to examine the value of the adjustment factor θ and percentile φ. Table 8 sets out 4 scenarios depending on the position of the hospital mean and standard deviation relative to the region. Some examples of possible values for θ are provided illustrating a lenient or aggressive approach.

Table 8. Suggested adjustment factor θ considering the relationship between regional prices and internal targets
Scenario   Hosp. Mean         Hosp. SD         θ (Lenient)   θ (Aggressive)
1          < Regional Mean    < Regional SD    0~0.01        0.01~0.02
2          < Regional Mean    > Regional SD    0.01~0.03     0.03~0.05
3          > Regional Mean    < Regional SD    0.02~0.03     0.04~0.07
4          > Regional Mean    > Regional SD    0.03~0.05     0.05~0.1

An appropriate percentile φ is an empirical question that requires examination of a low volume DRG for ICU relative to the hospital wide costs. We argue that ICU cases are more complex and therefore these costs will lie around the higher end of the distribution at the hospital level. In summary, target setting for internal performance evaluation should consider the characteristics of the hospital, departments and DRGs, to prevent under-treatment or disincentives due to one size fits all, and enable cost containment.
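The scenarios of Table 8 can be expressed as a simple lookup, shown here as a sketch. The function and variable names are hypothetical, and treating a hospital value exactly equal to the regional value as "not above" is an assumption, since the table only covers strict inequalities.

```python
# (hospital mean above regional mean?, hospital SD above regional SD?)
THETA_RANGES = {
    (False, False): {"lenient": (0.0, 0.01), "aggressive": (0.01, 0.02)},   # scenario 1
    (False, True):  {"lenient": (0.01, 0.03), "aggressive": (0.03, 0.05)},  # scenario 2
    (True, False):  {"lenient": (0.02, 0.03), "aggressive": (0.04, 0.07)},  # scenario 3
    (True, True):   {"lenient": (0.03, 0.05), "aggressive": (0.05, 0.1)},   # scenario 4
}


def suggest_theta_range(hosp_mean, hosp_sd, regional_mean, regional_sd,
                        stance="lenient"):
    """Return the (low, high) range of theta suggested in Table 8."""
    key = (hosp_mean > regional_mean, hosp_sd > regional_sd)
    return THETA_RANGES[key][stance]


# Made-up hospital and regional statistics for one DRG.
print(suggest_theta_range(9_500.0, 4_200.0, 10_000.0, 4_000.0))
print(suggest_theta_range(10_800.0, 4_200.0, 10_000.0, 4_000.0, "aggressive"))
```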
The TCV and mean TCV are intuitive approaches to gauging cost containment performance using the variances between the target and the actual cost, but they may sometimes obscure performance. For example, if DRG α has a cost target of $5000 and actual costs for a department are $4800, then the department is rewarded with 200 points. DRG β's cost target may be $500 and its actual costs for another department $480, resulting in 20 points for that department. Reducing DRG α costs by $20 may be relatively easy compared with doing the same for DRG β, which motivates the TCVP metric. In this example, the TCVPs of both DRGs are 0.04, implying the same level of effort in cost containment by both departments. Variability is also important, since a high (low) variance may imply greater (less) ability to manage costs. The NCTV metric addresses this issue and, for benchmarking using regional targets, can clearly indicate how appropriate the target is for specific departments such as ICU.
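The arithmetic of this example can be reproduced with a short sketch. The Python below assumes that the TCV is the target minus the actual cost and that the TCVP is that variance divided by the target; both definitions are inferred from the figures in the example, and the function names are illustrative.

```python
def tcv(target, actual):
    """Cost variance as used in the example: target minus actual."""
    return target - actual

def tcvp(target, actual):
    """Cost variance as a proportion of the target (assumed definition)."""
    return (target - actual) / target

# Targets and actual costs from the example in the text
drgs = {"alpha": (5000, 4800), "beta": (500, 480)}
for name, (target, actual) in drgs.items():
    print(name, tcv(target, actual), round(tcvp(target, actual), 4))
```

The absolute variances differ by a factor of ten (200 against 20), whereas the proportional variance is 0.04 for both DRGs, which is why the proportional metric is the fairer basis for comparing effort.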

Performance management
Two sets of targets might be considered overly complicated. However, using regional targets for both benchmarking and internal change management can be problematic. Schreyögg et al. [22] describe cost compression in Germany, where the prices of high cost DRGs were set low relative to their actual costs whereas the prices of low cost DRGs were set high relative to their costs. Theoretically, DRG-based payments provide strong incentives to increase the number of cases treated and to reduce the number of services per case. In contrast to fee-for-service systems, DRGs encourage hospitals to limit their activities to necessary services; in contrast to external budgets, DRGs encourage hospitals to treat more patients. In terms of expenditure control, the effect of DRG-based payments depends on which effect dominates: increasing the number of cases or decreasing the number of services per case. If the DRG does not adequately control for differences between patient populations or differences in the services provided within the DRG, payments for highly complex cases will be too low and payments for less complex cases will be too high [1].
Some researchers argue that a sole emphasis on costs may reduce the quality of medical care by shifting attention away from quality [41,42]. Many countries are experimenting with new methods to realign payment incentives in health care to encourage higher quality and more efficient care through pay-for-performance (P4P) [43,44]. These P4P schemes are testing whether new ways of paying providers (i.e. hospitals, primary care, integrated systems), which include a synthetic measure of quality, show improvements in the quality of care and value for money in health [45]. The model proposed in this study can be applied in P4P to stimulate external and internal competition among healthcare providers and to motivate the search for lower costs while maintaining volumes and quality. It is particularly suited to healthcare organisations that face external funding processes while also having to provide quality services at varying levels of complexity for some DRGs.

Conclusion, limitations, and future research
There is a danger that hospitals engage in 'cream-skimming' [46]; that is, they attempt to admit only those patients within each DRG that can be expected to have costs below the payment rate (for example, by selecting patients without co-morbidities, if these are not adequately accounted for in the DRG system) or that they 'dump' unprofitable patients by transferring them to other providers or avoiding them altogether [47]. The promise of DRG systems has to be weighed against their weaknesses and risks, including that DRG systems create an incentive for hospitals to skimp on services provided per admission (that is, to undertreat), discharge patients prematurely, and cherry-pick low-cost patients. The costs of developing new systems to prevent gaming of DRG payments (for example, through upcoding and splitting of admissions) also need to be considered [48]. If DRG systems adequately account for differences between patients (by considering all relevant procedures), the incentives for certain unintended consequences, such as cream-skimming and skimping/undertreatment, could be greatly reduced.
The multi-level system assesses cost containment performance from various perspectives, enabling high-level variances to be decomposed into departmental and DRG variances, thereby identifying sources of performance variability and financial impact. The perspectives range from macro to micro and from external to internal, providing accountability throughout the organisation. As such, it can provide a valuable system under prospective payment systems (PPS) and any regime where external DRG prices are available. To avoid undertreatment, high cost targets are allowed for special departments (e.g., ICU) in light of their probability distributions; to avoid cream-skimming, reasonable cost targets are set for non-special departments, again in light of their probability distributions. Note that for both special and non-special departments, θ and φ are adjusted according to their probability distributions to ensure that they operate within a reasonable range.
There are several limitations to this research. First, the research was conducted using two Chinese hospitals belonging to one region. Future research in different regions or countries can explore the generalisability of the multi-level model. Second, excluding outliers was argued to provide a better approximation of the regional mean in setting targets for DRGs within departments. A testable hypothesis would be that the mean cost or LOS of a DRG (e.g. hip replacement) should be the same across similar departments (e.g. orthopaedic) in different hospitals. Third, low volumes for some DRGs in specialised departments are challenging when setting appropriate targets. This study used percentiles at the high side of the hospital distribution of costs on the argument that complex cases populate this area. Research across similar departments in different hospitals could confirm this and identify what the appropriate percentiles should be. Fourth, value assessments have become a critical foundation for policymaking, helping to create better-informed pricing. Some rely on the conventional cost-effectiveness analysis (CEA) metric of the quality-adjusted life-year (QALY). Better-informed pricing can lead to higher-value spending, which in turn can improve health outcomes, strengthen health system sustainability, and promote high-value innovation. Finally, the model has a strong cybernetic flavour, combining open and closed systems with the intervening variables θ and percentile φ. More formal cybernetic modelling could provide new directions in performance evaluation systems in both healthcare and other organisations.