4. Experimental Results and Discussions
The model achieved strong overall performance, with an Area Under the Curve (AUC) of 0.965, a classification accuracy (CA) of 85.4%, an F1 score of 0.852, a precision (PREC) of 85.1%, a recall of 85.4%, and a Matthews Correlation Coefficient (MCC) of 0.717. For a more in-depth analysis of the model's performance, we also examined the confusion matrix and the Receiver Operating Characteristic (ROC) curves. The confusion matrix (Figure 4) showed close agreement between the model's predictions and the actual classes, with 91.4% of low mobility cases correctly identified (Low-Low), 83.3% of high mobility cases (High-High), and 68.4% of medium mobility cases (Medium-Medium).
Classification errors were concentrated in the medium mobility class, with 21.1% of its cases misclassified as low mobility and 10.5% as high mobility; in addition, 8.6% of low mobility cases were classified as medium, and 16.7% of high mobility cases as medium. The ROC curves for the three classes (low, medium, and high mobility) visualize the model's ability to discriminate between these categories, showing good separation with high AUC values, which indicates that the model classifies hospitals well according to their active mobility (Figure 5).
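The per-class percentages quoted above are row-normalized entries of a confusion matrix: each row (actual class) is divided by its own total. The sketch below illustrates that calculation. The counts are hypothetical, chosen only so that the row percentages resemble those reported for Figure 4; they are not the study's data, and the overall accuracy they yield need not match the reported CA.

```python
# Hypothetical counts (rows = actual class, columns = predicted class).
# Illustrative only: chosen so the row percentages resemble those
# reported for Figure 4. They are NOT the study's data.
CLASSES = ["Low", "Medium", "High"]
counts = [
    [32, 3, 0],   # actual Low
    [4, 13, 2],   # actual Medium
    [0, 1, 5],    # actual High
]


def row_percentages(matrix):
    """Convert each row of counts into percentages of that row's total."""
    return [[100.0 * c / sum(row) for c in row] for row in matrix]


def overall_accuracy(matrix):
    """Share of all cases that lie on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


percentages = row_percentages(counts)
for name, row in zip(CLASSES, percentages):
    print(name, [f"{p:.1f}%" for p in row])
print(f"accuracy on the toy counts = {overall_accuracy(counts):.3f}")
```

Reading the matrix by rows makes the asymmetry visible: the diagonal entry of each row is that class's recall, and the off-diagonal entries show where its misclassified cases end up.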
The analysis of feature importance, depicted in Figure 6 and detailed quantitatively in Table 5, reveals a complex interaction between structural, operational, and systemic factors that influence the model's classification accuracy. Each feature's MEAN and STD scores quantify its influence: the mean is the feature's average importance, while a higher standard deviation indicates greater variability in its influence across model iterations, pointing to a less stable effect.
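The text does not state how the MEAN and STD importance scores were computed. One common procedure that produces exactly this kind of output is permutation importance: a feature's column is shuffled several times, and the mean and standard deviation of the resulting drop in accuracy are recorded. The following is a minimal sketch under that assumption; the toy data, the `toy_model` function, and the number of repetitions are all hypothetical.

```python
import random
import statistics


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean and std of the accuracy drop when each column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(predict, rows, labels)
    result = {}
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with the shuffled value in this column only.
            permuted = [r[:col] + [v] + r[col + 1:]
                        for r, v in zip(rows, shuffled)]
            drops.append(base - accuracy(predict, permuted, labels))
        result[col] = (statistics.mean(drops), statistics.stdev(drops))
    return result


# Hypothetical toy data: column 0 determines the label, column 1 is noise.
data_rng = random.Random(1)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = ["high" if r[0] > 0.5 else "low" for r in rows]


def toy_model(row):
    """Stand-in classifier that only looks at column 0."""
    return "high" if row[0] > 0.5 else "low"


scores = permutation_importance(toy_model, rows, labels)
for col, (mean, std) in scores.items():
    print(f"feature {col}: mean={mean:.3f} std={std:.3f}")
```

In this toy setting the informative column receives a large mean importance, while the ignored column scores zero with zero spread, which mirrors how the MEAN and STD columns can be read.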
The DEATHS variable (0.10063) emerges as the most influential factor, highlighting the importance of mortality rates within hospitals as indicators of the quality of care.
Next in importance, the NETWORK variable (0.0390019) illustrates how affiliation with a public or private network affects hospital performance, emphasizing systemic and structural differences. The PHYSICIANS variable (0.0342539) underscores the critical role of medical staff in health outcomes.
INTERVENTIONS (0.0198643) demonstrate the relevance of the volume of medical procedures to hospital mobility classification, indicating operational efficiency and the capacity to provide care. READMISSIONS (0.0164729) reflect the impact of patient management policies and the quality of post-discharge care.
Other variables, namely BEDS (0.0108043), LEVEL (0.00547481), NURSES (0.00523256), HOSPITAL STAFF (0.00203488), and DEPARTMENTS (0.00135659), have lower importance scores but still reflect the relevance of structural and operational aspects to hospital mobility. These data indicate that hospital capacity, specialization level, staff composition, and available resources also contribute to the performance of health services.
Taken together, these results show that hospital mobility is determined by a combination of structural, operational, and clinical factors, each with its own degree of influence.
We adopted SHAP (SHapley Additive exPlanations) for a fair attribution of each variable's contribution in our logistic regression model, which is crucial for informed decisions in healthcare. This technique, which assigns an impact value to each feature, has proven valuable in studies analyzing complex models [33].
For the instance examined, the model assigned the "low mobility" class a predicted probability of 0.97, well above the baseline value of 0.64, which represents the average probability of low mobility derived from the training data. In this context, the SHAP values illustrated in Figure 7 play a crucial role in clarifying the contribution of each variable:
DEATHS: With a SHAP value of 0.142692 and a feature value of -0.63, this indicates that a lower mortality rate contributes positively to the low mobility classification.
BEDS: A SHAP value of 0.0493684 with a feature value of -0.8 suggests that a smaller number of beds is associated with a higher probability of low mobility.
LEVEL=BASE LEVEL and NETWORK=PRIVATE: Both have positive SHAP values (0.0398403 and 0.0382124, respectively) and feature values of 1, showing that base-level hospitals and private ones are more likely to be classified as having low mobility.
NETWORK=PUBLIC: With a SHAP value of 0.0249563 and a feature value of 0 (the hospital does not belong to a public network), this feature makes only a small positive contribution to the probability of low mobility in this case.
NURSES and DEPARTMENTS: These features have positive SHAP values (0.0189351 and 0.0182643) but negative feature values, indicating that a smaller number of nurses and departments could contribute to a higher probability of low mobility.
PHYSICIANS and READMISSIONS: With negative SHAP values (-0.0204742 and -0.00863305) and feature values of -0.34 and 0.19, they indicate that a smaller number of doctors and a higher number of readmissions are associated with a reduction in the probability of being classified as having low mobility.
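SHAP values are additive: the baseline plus the sum of all feature contributions equals the model output for the explained instance. The sketch below applies this check to the SHAP values quoted above for the low mobility class. It assumes that the explained output is the class probability, as the text presents it; any remaining gap would then be attributable to features whose SHAP values are not quoted here (such as INTERVENTIONS and HOSPITAL STAFF) and to rounding of the reported baseline and prediction.

```python
# SHAP values quoted in the text for the "low mobility" class.
shap_low = {
    "DEATHS": 0.142692,
    "BEDS": 0.0493684,
    "LEVEL=BASE LEVEL": 0.0398403,
    "NETWORK=PRIVATE": 0.0382124,
    "NETWORK=PUBLIC": 0.0249563,
    "NURSES": 0.0189351,
    "DEPARTMENTS": 0.0182643,
    "PHYSICIANS": -0.0204742,
    "READMISSIONS": -0.00863305,
}

BASELINE = 0.64    # average probability of low mobility, as reported
PREDICTION = 0.97  # model output for this instance, as reported

# Additivity: prediction = baseline + sum of all SHAP values.
listed_total = sum(shap_low.values())
reconstructed = BASELINE + listed_total
gap = PREDICTION - reconstructed

print(f"sum of listed SHAP values: {listed_total:.4f}")
print(f"baseline + listed values:  {reconstructed:.4f}")
print(f"unexplained remainder:     {gap:.4f}")
```

The listed contributions account for most of the shift from the baseline to the final prediction, with DEATHS alone supplying nearly half of it.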
For the same instance, the predicted probability of "medium mobility" is only 0.03, well below the baseline value of 0.23. This suggests that the observed features tend to shift the prediction towards other mobility classes. According to the SHAP values, the features contribute as follows:
BEDS and DEPARTMENTS: Present marginally negative SHAP values, suggesting that a higher number of beds and departments does not favor the classification of a hospital in the medium mobility category.
HOSPITAL STAFF: With a positive SHAP, this feature shows a slightly favorable effect on the likelihood of medium classification, although the associated value indicates less staff than commonly expected.
DEATHS: The mortality rate seems to have the most significant impact in reducing the probability of medium mobility, as highlighted by a considerably negative SHAP value.
INTERVENTIONS and READMISSIONS: Both with positive SHAP values, suggest that a higher number of interventions and readmissions might push the classification towards medium mobility.
NETWORK=PRIVATE and NETWORK=PUBLIC: Substantial negative SHAP values indicate that belonging to these networks contributes to reducing the likelihood of a medium classification, potentially in favor of a high or low mobility classification.
LEVEL=BASE LEVEL: A negative SHAP value shows that hospitals with basic services are less likely to be considered of medium mobility.
PHYSICIANS: This factor has the highest positive SHAP value, implying that a smaller number of doctors is correlated with an increased probability of a medium mobility classification.
For the same instance, the predicted probability of "high mobility" is 0.0, well below the baseline value of 0.14. This indicates that, according to the model, the considered features are generally not indicative of high mobility for this instance. Specific values:
BEDS, DEPARTMENTS, HOSPITAL STAFF, DEATHS: All have negative SHAP values, meaning that a smaller number of beds, departments, and staff, as well as a lower mortality rate, are associated with a reduced likelihood of being classified as high mobility.
INTERVENTIONS, READMISSIONS: These also have negative SHAP values, indicating that a higher number of interventions and readmissions is not correlated with a high mobility classification.
NETWORK=PRIVATE, NETWORK=PUBLIC: Surprisingly, these features have positive SHAP values despite the model's prediction being 0.0. This might indicate that while belonging to a private or public network has a positive influence, it is not enough on its own to tip the classification towards high mobility.
LEVEL=BASE LEVEL: Presents a small positive SHAP value, suggesting a slightly favorable influence towards a high mobility classification, which is interesting given the model's prediction value of 0.0.
NURSES, PHYSICIANS: Both present the highest negative SHAP values, indicating that a smaller number of nurses and doctors is strongly associated with a lower probability of high mobility.
The combination of these SHAP values, which are predominantly negative, clarifies why the model's prediction for high mobility is 0.0, indicating that, according to the model, the current conditions are largely not indicative of high mobility in the hospitals under examination.
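Because the three class probabilities of a single instance must sum to one, the gain of the low mobility class has to be offset by losses in the other two classes. The short check below uses only the predictions and baselines reported in the text; the observation that the baselines sum to slightly more than one is presumably a rounding effect, which is an assumption on our part.

```python
# Values as reported in the text for the single explained instance.
predictions = {"low": 0.97, "medium": 0.03, "high": 0.0}
baselines = {"low": 0.64, "medium": 0.23, "high": 0.14}

pred_total = sum(predictions.values())
base_total = sum(baselines.values())

# Net shift each class receives from its SHAP values
# (prediction minus baseline).
shifts = {c: predictions[c] - baselines[c] for c in predictions}

print(f"predictions sum to {pred_total:.2f}")
print(f"baselines sum to   {base_total:.2f}")
for cls, shift in shifts.items():
    print(f"{cls:>6}: net SHAP shift = {shift:+.2f}")
```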
We examined the differences in predicted values between Apulia and Emilia-Romagna, analyzing the impact of each feature on the predictions for the three mobility classes (low, medium, high) in the two regions. The effects of the variables on the model outcome for each target class (LOW, MEDIUM, HIGH) are illustrated in the violin plots in Figure 8, Figure 9 and Figure 10. The features are listed on the left, ordered by their importance for predicting the specific class. Positive SHAP values (to the right of the center) indicate that a feature pushes the prediction towards the selected class, while negative SHAP values (to the left of the center) indicate the opposite effect. Red signifies higher feature values, whereas blue indicates lower values.
The numerical impact scores for each target class were compared between the two regions using Student's t-test. Statistically significant differences between Apulia and Emilia-Romagna, with a p-value less than 0.005, are reported in Table 6, Table 7 and Table 8, where the impact values are sorted by relevance within each target class.
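The regional comparison described above can be sketched as a two-sample Student's t-test with pooled variance, applied to the SHAP values of one feature for one target class. The per-hospital values below are hypothetical and serve only to illustrate the calculation. The sketch returns the t statistic and the degrees of freedom; the p-value would then be read from the t distribution, for example with a statistics library such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and its df."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, df


# Hypothetical per-hospital SHAP values of one feature for one class.
apulia = [-0.04, -0.02, -0.05, -0.03, -0.01]
emilia_romagna = [0.03, 0.05, 0.02, 0.04, 0.06]

t_stat, dof = student_t(apulia, emilia_romagna)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A negative t statistic here means the feature's average impact is lower in the first sample than in the second, which corresponds to the sign reversals between regions discussed in the text.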
For low mobility:
Level = Base Level: In Emilia-Romagna, the presence of base-level hospitals positively contributes to the classification of low mobility, suggesting that these hospitals are adequate to meet local health needs. Conversely, in Apulia, the impact is reversed, indicating there might be less confidence in the services provided by base hospitals or a need for more specialized services.
Network = Private: Private hospitals in Emilia-Romagna positively impact low mobility, which might reflect a higher perceived quality or greater accessibility compared to Apulia, where the impact is negative. This could indicate a different perception of quality among private healthcare networks in the two regions.
Readmissions: A strong negative impact of readmissions on low mobility in Apulia suggests that high readmission rates might be seen as an indicator of a lack in care quality, prompting patients to seek hospitals with a better reputation or more specialized services.
For medium mobility:
Hospital Staff: The positive impact of hospital staff allocation on medium mobility in Apulia suggests that an adequate number of staff correlates with the choice of hospitals closer to home or with an intermediate level of specialization. Conversely, a negative impact in Emilia-Romagna might reflect different expectations or a distinct distribution of human resources within healthcare facilities.
Nurses: Similar to hospital staff, a higher number of nurses in Apulia positively influences the choice of hospitals for medium mobility, highlighting the importance of nursing staff in the perception of care quality. In Emilia-Romagna, the negative effect could indicate that other factors more significantly influence hospital choice.
For high mobility:
Mortality: In Apulia, lower hospital mortality rates do not seem to be a decisive factor for high mobility, suggesting that other aspects of care quality or service accessibility are more relevant in patient decisions. In Emilia-Romagna, a negative impact of mortality on high mobility might indicate a greater sensitivity to this indicator when assessing hospital quality.
Hospital Staff: The difference in the impact of hospital staff on high mobility between the two regions may reflect a varying evaluation of hospitals' ability to provide specialized or emergency care, with a negative impact in Apulia suggesting patients are seeking better-equipped hospitals.
The investigation into hospital mobility between Apulia and Emilia-Romagna unveils a complex landscape of how quality perceptions and accessibility to healthcare services shape patient decisions in these two regions. On one side, in Emilia-Romagna, the trust in basic hospital services and private facilities for less complex care suggests a healthcare network perceived as effective and reliable. On the other, Apulia shows a tendency to favor hospitals based on the availability of qualified staff, indicating the significance of human capital in hospital choice for care of medium complexity. Regarding more specialized care, both regions exhibit a preference for well-equipped facilities with positive outcomes, revealing a common expectation of excellence in high-complexity treatments. This contrast in hospital choice dynamics between Apulia and Emilia-Romagna not only reflects regional peculiarities in quality perceptions but also underscores the need for targeted healthcare strategies capable of strengthening trust in basic and intermediate care and ensuring accessibility to highly specialized services.
4.1. Experiments
In our analysis of hospital mobility prediction, we employed advanced methodologies to identify the optimal predictive model. Figure 8 summarizes the comparison across various models (Logistic Regression, Random Forest, Gradient Boosting, SVM, kNN, Naive Bayes, and AdaBoost) using metrics such as AUC, Accuracy, F1 Score, Precision, Recall, and MCC. This selection of metrics allows for a comprehensive assessment of performance, guiding the choice of the most suitable model for hospital mobility prediction. One study examined the effectiveness of predictive models for the early diagnosis of diabetes, emphasizing the critical role of model selection in healthcare outcomes [13]. Another work discussed the development and deployment of predictive models in the healthcare sector, providing practical insights into predictive modeling in healthcare [10]. Furthermore, the comparison of predictive models for hospital readmission of heart failure patients was analyzed, highlighting the importance of cost considerations in model evaluation [17].
Figure 8.
Performance Parameters of Prediction Models.
Logistic Regression, chosen for predicting hospital mobility levels (low, medium, high), is distinguished by an AUC of 0.965. This metric reflects the model's high ability to differentiate between the predicted classes, a critical aspect for ensuring precision in clinical and operational decisions. Because the AUC measures the model's quality across the entire spectrum of classification thresholds, it provides an assessment independent of the specific distribution of classes in the dataset, a fundamental aspect when considering multiple outcome categories. Logistic Regression, with its probabilistic nature, offers a robust interpretative framework and adapts readily to multi-class dependent variables, making it particularly suitable for our three-class target variable. The same validation protocol was applied to all models, using 10-fold cross-validation and a split of the dataset into 70% for training and 30% for testing, to support the robustness and generalizability of the predictive performance.
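The validation protocol can be sketched as index bookkeeping. The text does not specify how the 70/30 split and the 10-fold cross-validation were combined; the sketch below assumes that the folds are drawn from the 70% training portion and that the 30% test portion is held out. The dataset size is hypothetical, and the split is not stratified by class.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle the sample indices and split them into train and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, validation


# Hypothetical dataset size, used only for illustration.
N_HOSPITALS = 200

train_idx, test_idx = train_test_split_indices(N_HOSPITALS)
splits = list(k_fold(train_idx))

print(f"train: {len(train_idx)}, test: {len(test_idx)}")
print(f"folds: {len(splits)}, validation size: {len(splits[0][1])}")
```

Each model would be fitted on the training part of every fold and scored on its validation part, so that all candidates are compared on the same partitions.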
4.2. Impact of Machine Learning on Hospital Mobility: Perspectives and Challenges
The adoption of logistic regression and SHAP values in analyzing variables that influence hospital mobility opens new perspectives for understanding patients' perception of healthcare service quality. This approach, enriched using advanced machine learning techniques, allows for the interpretation of complex relationships between variables, significantly improving the transparency and interpretability of predictive models. Recent studies on seismic vulnerability assessment and the interpretation of behaviors in strategy games demonstrate the effectiveness of SHAP values in providing detailed insights and enhancing predictive analyses across various fields [11; 14]. This enables highlighting how specific factors influence patients' decisions regarding hospital mobility, offering valuable insights for the optimization of healthcare services.
The decision to compare two regions with distinct healthcare contexts enriches the analysis, highlighting how regional peculiarities can influence the perception of service quality. This approach is supported by studies that have examined both perceived and technical healthcare quality in primary care facilities, with significant implications for the sustainability of national health insurance schemes, as demonstrated in Ghana [2]. Additionally, an analysis between the Lombardy Region and national data from Italy revealed substantial differences in hospital care quality and clinical outcomes, underscoring the importance of regional context in healthcare quality assessment [29]. These examples illustrate the critical role of regional comparisons in understanding and improving healthcare quality, offering valuable insights for optimizing healthcare services based on regional characteristics and patient perceptions.
Despite its contributions, the study has some limitations, including a geographical scope limited to Apulia and Emilia-Romagna. Expanding the analysis to other regions, or comparing Italian data with those of other countries, could provide a more comprehensive view. This is supported by research on interregional patient mobility within a decentralized healthcare system, which highlights how factors such as regional income, hospital capacity, organizational structure, performance, and technology drive patients' decisions to seek care outside their region of residence [5]. Recent research likewise emphasizes the importance of these factors in guiding such decisions, offering valuable insights for more effective health policies [20]. These insights underline the need for a deeper understanding and targeted strategies to address the challenges posed by interregional healthcare mobility, to ensure equity and efficiency in access to care across the national territory.
Furthermore, data access and data quality are critical aspects that can influence the generalizability of the results. Future research should therefore expand both the geographical scope and the available database, collecting broader and more diversified data and integrating interdisciplinary perspectives for a more holistic understanding of hospital mobility dynamics. This study marks an important step towards using machine learning to better analyze and understand hospital mobility and the perception of healthcare service quality.