4. Discussion
The results indicate a clear practical performance ordering in this dataset, with the nonlinear models (GBR and RF) consistently outperforming the linear baselines across in-sample, cross-validation, and test evaluations. This hierarchy shows that nonlinear models hold an essential advantage over linear approaches when the data structure is complex, asymmetric, and marked by large differences between organizational units. Importantly, the gap is not a matter of "slightly better metrics"; it reflects the models' differing ability to represent the actual waste generation process in clinical practice.
The key finding is that the linear models failed systematically and almost identically, regardless of regularization. The fact that OLS, Ridge, and Lasso remain at about the same level of explained variance (R² ≈ 0.2368) indicates that the underlying problem is not overfitting but a misspecified functional form. In other words, penalizing the coefficients cannot compensate when the linearity assumption is itself inadequate.
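This mechanism can be illustrated with a minimal sketch on synthetic data (not the study's dataset): when the true generative process is nonlinear, OLS, Ridge, and Lasso converge to nearly identical R² values, while a tree-based model captures the curvature. The data-generating function and hyperparameters below are purely illustrative assumptions.

```python
# Illustrative sketch (synthetic data, NOT the study's dataset): regularization
# cannot repair a misspecified functional form, so OLS, Ridge, and Lasso land
# at nearly the same R², while a Random Forest fits the nonlinearity.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(300, 1))
# strongly convex (exponential) relationship with mild noise
y = np.exp(2 * X[:, 0]) + rng.normal(0, 1.0, size=300)

scores = {}
for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.01)),
                    ("RF", RandomForestRegressor(random_state=0))]:
    model.fit(X, y)
    scores[name] = r2_score(y, model.predict(X))

print(scores)  # the three linear fits are virtually indistinguishable
```

The point of the sketch is qualitative: shrinking or zeroing coefficients only re-weights a straight line, so all three linear variants share the same ceiling on explained variance.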
This interpretation is methodologically consistent with the preceding descriptive and correlational analysis: the large difference between Pearson's and Spearman's correlation coefficients, especially for the relationship between the number of beds and waste, points to a nonlinear but monotonic relationship that linear models cannot approximate well enough.
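The diagnostic signature referred to here can be reproduced in a short sketch with hypothetical values (the bed counts and the convex relationship below are assumptions for illustration, not the study's data): a monotone but strongly convex relationship yields a Spearman's ρ near 1 while Pearson's r stays visibly lower.

```python
# Hedged sketch (hypothetical data): the Pearson-vs-Spearman gap that flags a
# monotone, nonlinear relationship between bed count and waste output.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
beds = rng.uniform(10, 400, size=60)   # hypothetical bed counts per unit
waste = np.exp(beds / 60.0)            # assumed convex, strictly monotone link

r, _ = pearsonr(beds, waste)           # linear association
rho, _ = spearmanr(beds, waste)        # rank (monotone) association
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because Spearman's ρ depends only on ranks, any strictly monotone transformation leaves it at 1, whereas Pearson's r is penalized by the departure from linearity.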
Nonlinear models, in contrast, were able to capture patterns involving thresholds, interactions, and the cluster structure of the data by clinical profile. Gradient Boosting Regression achieved the best in-sample result (R² = 0.9999), which confirms the high capacity of the model. However, caution is necessary when interpreting this result because a very high training fit may be accompanied by increased generalization sensitivity.
In this sense, cross-validation and test evaluation play a central role: they show that Random Forest, although minimally weaker in-sample (R² = 0.9949), gave a more robust compromise between accuracy and stability, with a very strong test result (R² = 0.9596). For an applied context, especially in health resource management, that balance is often more valuable than the absolute best training result.
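The evaluation logic described here can be sketched in a few lines. The dataset (`make_friedman1`) and default hyperparameters below are stand-ins, not the study's data or settings; the sketch only shows how contrasting in-sample fit with cross-validated R² exposes the accuracy/stability trade-off.

```python
# Sketch of the evaluation protocol on a synthetic stand-in dataset:
# in-sample R² versus 5-fold cross-validated R² for RF and GBR.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=200, noise=0.5, random_state=0)

results = {}
for name, model in [("RF", RandomForestRegressor(random_state=0)),
                    ("GBR", GradientBoostingRegressor(random_state=0))]:
    in_sample = model.fit(X, y).score(X, y)          # fit on all data
    cv = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    results[name] = (in_sample, cv)
    print(f"{name}: in-sample R² = {in_sample:.3f}, 5-fold CV R² = {cv:.3f}")
```

The gap between the two numbers is exactly the quantity the text argues should drive model selection for operational planning: a smaller in-sample/CV gap signals a more stable model.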
The results of the MLP model additionally confirm that a nonlinear representation is necessary, but also that the neural approach on such a sample carries a higher risk of instability. Although MLP achieved high in-sample performance, a negative CV score indicates limited reliability when generalizing to unseen data. This is understandable given the relationship between sample size, number of parameters and the need for careful regularization and validation. In practical terms, MLP can have a supplementary role for sensitivity analysis or as a secondary model for comparing scenarios, but not as a primary tool for operational planning in this data set.
Additional weight is lent to this interpretation by the statistical confirmation of the differences between the models. The Friedman test (χ² = 21.571, p = 6.3149 × 10⁻⁴) shows that the performance differences are not random but systematic. This elevates the argument for nonlinear methods from a descriptive comparison to an inferentially supported conclusion.
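For readers unfamiliar with the procedure, the following sketch shows how a Friedman test over per-fold model scores yields a χ²/p pair of this kind. The score matrix below is invented for illustration (it is not the study's data), but its shape mirrors the setup: one row per evaluation fold, one column per model.

```python
# Hedged sketch (fabricated scores, for illustration only): a Friedman test
# over per-fold R² values detects systematic rank differences between models.
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(2)
folds = 10
# hypothetical per-fold R² for six models (columns): a linear trio clustered
# low, two strong ensembles, one mid-range model, plus small fold noise
base = np.array([0.23, 0.24, 0.23, 0.96, 0.95, 0.60])
scores = base + rng.normal(0, 0.01, size=(folds, base.size))

stat, p = friedmanchisquare(*scores.T)  # one argument per model's score column
print(f"chi2 = {stat:.3f}, p = {p:.2e}")
```

Because the test operates on within-fold ranks rather than raw scores, it confirms that the nonlinear models outrank the linear ones consistently across folds, not merely on average.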
Infectious waste in a hospital is not determined by one “global” relationship between predictor and target, but by a series of local regimes that depend on the type of clinic, procedural complexity, work organization and waste segregation protocol.
When the differences between clinics are extreme (in our case, the gap between the highest and lowest producer is approximately 83-fold), models that implicitly partition the data space naturally have an advantage. In this framework, the dominance of the variable "number of beds" in tree-based importance measures should not be interpreted narrowly as "capacity explains everything," but more broadly, as a structural marker of the clinical profile that indirectly carries information about the intensity and type of services.
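How such a dominance pattern surfaces in practice can be sketched as follows. The feature names and the generative process are assumptions invented for the example; the point is only that when one variable acts as a nonlinear structural proxy, impurity-based importances concentrate on it.

```python
# Illustrative sketch (synthetic data, assumed feature names): a structural
# proxy variable ("beds") absorbs most of the impurity-based importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 300
beds = rng.uniform(10, 400, n)
staff = rng.uniform(5, 150, n)
admissions = rng.uniform(100, 5000, n)   # irrelevant in this toy process
# beds drives waste nonlinearly; staff contributes only marginally
waste = 0.5 * beds ** 1.8 + 0.1 * staff + rng.normal(0, 50, n)

X = np.column_stack([beds, staff, admissions])
names = ["beds", "staff", "admissions"]
rf = RandomForestRegressor(random_state=0).fit(X, waste)
importances = dict(zip(names, rf.feature_importances_))
print(importances)
```

As the text argues, a near-total importance share for one variable should be read as a marker that it proxies the underlying clinical profile, not as proof that the other predictors are meaningless.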
On the practical side, the results have direct operational value. If infectious waste treatment planning is based on linear or aggregate average assumptions, there is a real risk of underestimating needs in high-intensity clinics and overestimating needs in low-intensity segments.
Models like RF enable more precise clinically differentiated planning: sterilization capacity, delivery dynamics, treatment contracting, as well as financial planning by year. In this sense, ten-year projections should not be seen as “fixed truth”, but as a validated tool for decision-making in conditions of uncertainty and limited resources.
The table in Appendix B presents a per-department forecast summary: the 10-year average predicted waste (kg/year).
The findings are also theoretically consistent with a broader trend in the waste prediction literature, where ensemble methods often outperform linear models in heterogeneous systems. However, the specificity of this study is in the real-world panel structure with pronounced internal unevenness within one tertiary institution. It is this combination that makes the contribution relevant: the paper not only shows “which model is better”, but also explains under what conditions and why this advantage arises. This is important for the transferability of the methodology to similar health systems with multiple clinical profiles.
Despite the strong results, the limitations of the study must be clearly stated. First, the data originate from a single healthcare institution, so external generalizability should be verified at other hospitals and levels of care. Second, the set of predictors is relatively concise; including additional clinical and operational variables (e.g., procedural mix, intensity of invasive interventions, specific protocols) could further increase predictive power and interpretability. Third, long-term projections assume a degree of structural stability in the system, which may be undermined by regulatory changes, epidemiological shocks, or organizational restructuring.
Finally, although the results were tested through multiple evaluation modes, periodic retraining of the models remains necessary to maintain performance in routine operation.
Guidelines for future work naturally follow from the above. Multicenter external validation, the introduction of temporal components that explicitly model the trend and possible structural breaks, as well as a deeper analysis of the interpretability of predictions (e.g., local explanations by clinic and year) are recommended. In the application sense, the next step is the construction of a periodically updated decision-support framework, where RF serves as the primary model, and GBR as a control model for checking the robustness of the projections.
The magnitude of the performance gap observed here (ΔR² ≈ 0.76) is notably larger than typically reported, which we attribute to the exceptionally high between-clinic heterogeneity in our single-institution panel dataset.
Overall, the discussion confirms the central message of the paper: in the prediction of infectious hospital waste, the methodological choice must follow the structure of the data. When nonlinearity and interclinical heterogeneity dominate, nonlinear ensemble models are not only a “better option,” but practically a necessary condition for reliable planning. In this framework, Random Forest represents the most rational choice for operational application, while Gradient Boosting remains very valuable as a high-precision complementary model for validation and analytical triangulation.