This section presents the outcomes of our analysis using the different machine learning methods discussed: Support Vector Machines (SVM), Linear Regression, and K-Medoids clustering. The results are organized by each method, including performance metrics, comparisons, and interpretations of the findings.
4.1. Classification Results Using Support Vector Machines (SVM)
The SVM models were evaluated with two kernels, Radial Basis Function (RBF) and Sigmoid, using 10-fold cross-validation. For the RBF kernel, the cross-validation results are given in Table 4 and illustrated in Figure 6, the confusion matrices are displayed in Figure 7, and the performance metrics (accuracy, precision, recall, and F1-score) are detailed in Table 5 and displayed in Figure 8.
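The 10-fold procedure referred to above can be sketched as follows. This is a minimal illustration only: the function name `k_fold_indices`, the sample count, and the random seed are assumptions introduced here, and the section does not state whether the folds were shuffled or stratified.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffled, unstratified folds
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, test_idx in enumerate(folds):
        # Training set is every fold except the held-out one.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, test_idx))
    return splits

# Hypothetical sample count, chosen only for illustration.
splits = k_fold_indices(n_samples=100, k=10)
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

The per-fold accuracies reported in the cross-validation tables would then be obtained by training on `train_idx` and scoring on `test_idx` for each pair, and averaging over the ten folds.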
The SVM model with the RBF kernel outperformed the model with the Sigmoid kernel. The RBF kernel achieved higher accuracy and F1-score, indicating that it classifies instances of stunting prevalence more reliably and suggesting that it better captures the complex, non-linear relationships in the data. The parameter combinations tested are as follows:
C = 0.1, gamma = 0.01,
C = 1.0, gamma = 0.1,
C = 10.0, gamma = 1.0.
Based on Table 4, the RBF kernel's accuracy rises as the parameters C and gamma (γ) increase. Of the three parameter combinations tested in 10-fold cross-validation, C = 10.0 with γ = 1.0 achieved the highest accuracy, making it the most effective setting for the dataset used. This suggests that the RBF kernel, known for its ability to handle non-linear patterns, benefits from careful parameter tuning. Precision, recall, and F1-score follow the same pattern as accuracy: the best results, an accuracy of 91.00%, precision of 0.86, recall of 0.89, and F1-score of 0.87, are obtained with C = 10.0 and γ = 1.0. The best performance therefore occurs at the highest parameter values tested.
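To make the role of gamma concrete, the sketch below writes out the standard textbook forms of the two kernels compared in this section. The feature vectors are hypothetical, and the offset term `coef0` of the Sigmoid kernel is assumed to be zero because the section does not report it. The parameter C is not part of either kernel; it controls the penalty on misclassified training points.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma, coef0=0.0):
    """Sigmoid kernel: K(x, y) = tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)

# Hypothetical feature vectors, not taken from the study's data.
x, y = [1.0, 2.0], [2.0, 0.0]
for gamma in (0.01, 0.1, 1.0):  # the gamma values tested in this section
    print(gamma, rbf_kernel(x, y, gamma), sigmoid_kernel(x, y, gamma))
```

With the RBF kernel, a larger gamma makes similarity fall off faster with distance, so the decision boundary can follow more local structure in the data. With the Sigmoid kernel, a larger gamma pushes the tanh toward saturation, which is one possible reason why the two kernels respond differently to the same parameter values.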
Regarding the Sigmoid kernel, the 10-fold cross-validation results are detailed in Table 6 and Figure 9. The confusion matrix is shown in Figure 10. Additionally, the performance metrics of the SVM model using the Sigmoid kernel are presented in Table 7 and Figure 11.
Based on Table 6, the Sigmoid kernel was evaluated with the same three parameter combinations in 10-fold cross-validation. The combination C = 1.0 and γ = 0.1 achieved the highest average accuracy of 78.99%. In comparison, C = 0.1 and γ = 0.01 had an average accuracy of 70.74%, while C = 10.0 and γ = 1.0 yielded 65.76%. This suggests that moderate values of C and γ are more effective for the Sigmoid kernel, highlighting the importance of parameter tuning based on the specific dataset.
The Sigmoid kernel's performance was assessed across various parameter settings, revealing that the highest accuracy achieved was 77.67% with C = 10.0 and γ = 1.0. Compared to the RBF kernel, the Sigmoid kernel generally produced lower accuracy and F1-score values. Precision and recall improved with higher C and γ values but remained below the levels seen with the RBF kernel. Specifically, the precision peaked at 0.73 and recall at 0.76, with the F1-score reaching a maximum of 0.74. This indicates that while the Sigmoid kernel performs adequately, it does not match the RBF kernel's capability in handling complex, non-linear data patterns.
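The four metrics reported for both kernels can be derived from confusion-matrix counts, as sketched below. The sketch assumes a binary setting, and the counts passed to `classification_metrics` are hypothetical; the section does not state the number of classes or the averaging scheme. The helper `f1_from` is also introduced here only for illustration, and it is used to check that the reported F1-scores are consistent with the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, for illustration only.
print(classification_metrics(tp=40, fp=10, fn=10, tn=40))

# Consistency check on the values reported in this section.
print(round(f1_from(0.86, 0.89), 2))  # RBF kernel: 0.87
print(round(f1_from(0.73, 0.76), 2))  # Sigmoid kernel: 0.74
```

Both reported F1-scores match the harmonic mean of the corresponding precision and recall to two decimal places.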
4.2. Predictive Modeling Results Using Linear Regression
The linear regression predictions of stunting prevalence in Aceh, Indonesia, are shown in Figure 12, which compares the predicted stunting rates with the actual observed rates. The model's predictions for the regencies and cities of Aceh from 2025 to 2030 reveal a general downward trend, with most regions showing a consistent decrease in stunting rates toward 2030.
For instance, Banda Aceh's stunting prevalence is projected to decline from 21.44% in 2025 to 16.09% by 2030. Similarly, Aceh Besar is expected to see a reduction from 26.56% in 2025 to 21.66% by 2030. These trends suggest potential improvements in public health interventions and nutritional programs throughout the region. However, some areas, such as Aceh Tenggara and Subulussalam, despite showing a downward trend, are still projected to have relatively high stunting rates by 2030 (28.47% and 31.66%, respectively). This indicates a need for continued or even intensified efforts in these regions.
The linear regression model's performance in predicting stunting prevalence was assessed using the Mean Squared Error (MSE). The results are detailed in Table 8 and presented in Figure 13.
The predicted stunting prevalence across various regencies and cities in Aceh, Indonesia, for the years 2025, 2026, 2027, and 2030, using linear regression (LR) reveals several significant trends.
The linear regression model forecasts a general decline in stunting prevalence across most regions over the observed period. For instance, Banda Aceh is projected to decrease from 21.44% in 2025 to 16.09% by 2030, illustrating a positive trend towards reducing stunting. Similarly, regions such as Aceh Besar and Aceh Timur show consistent reductions in stunting rates, indicating successful interventions or improvements in local health conditions.
There is noticeable variability in the predicted stunting rates among different regencies and cities. Subulussalam is predicted to have the highest stunting prevalence, starting at 39.06% in 2025 and decreasing to 31.66% by 2030. Gayo Lues and Aceh Utara also show declines but start from lower rates, with Gayo Lues dropping from 28.00% to 23.00% and Aceh Utara from 33.66% to 29.81%. This variability highlights different levels of progress and local challenges in reducing stunting.
Certain regions, such as Aceh Tenggara and Subulussalam, continue to exhibit relatively high predicted stunting rates throughout the forecast period. Aceh Tenggara’s rates decrease from 32.22% in 2025 to 28.47% by 2030, while Subulussalam maintains the highest prevalence, even at the end of the forecast period. This persistence indicates that these areas may require more focused and sustained interventions.
Regions with initially lower stunting rates, like Banda Aceh and Aceh Selatan, show marked improvement over time. For example, Aceh Selatan’s prevalence is projected to drop from 25.88% in 2025 to 20.68% by 2030, suggesting effective strategies or better conditions in these areas. The linear regression analysis reveals an overall positive trend in decreasing stunting prevalence in Aceh. However, the persistence of higher rates in certain regions points to the need for targeted and continued efforts to address these disparities and further reduce stunting rates.
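The forecasts above come from a fitted linear trend. As a minimal sketch of the technique, the code below implements ordinary least squares for a single predictor (the year) and applies it to the two forecast values reported for Banda Aceh and Aceh Besar to recover the annual change they imply. The function names are introduced here for illustration, and the study's actual inputs (the historical series used for fitting) are not shown in this section.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x

# Forecast values reported in this section (percent).
reported = {
    "Banda Aceh": {2025: 21.44, 2030: 16.09},
    "Aceh Besar": {2025: 26.56, 2030: 21.66},
}
implied_slopes = {}
for region, series in reported.items():
    years = sorted(series)
    slope, intercept = fit_line(years, [series[y] for y in years])
    implied_slopes[region] = slope
    print(f"{region}: {slope:.2f} percentage points per year")
# Banda Aceh: -1.07 percentage points per year
# Aceh Besar: -0.98 percentage points per year
```

Because a linear model extrapolates a constant annual change, the projected decline for each region is the same every year, which is worth keeping in mind when reading forecasts as far out as 2030.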
The Mean Squared Error (MSE) values for the predicted stunting prevalence across various regencies and cities in Aceh, Indonesia, are detailed in Table 8. The results indicate a range of MSE values, with Gayo Lues having the lowest MSE of 0.0000, suggesting highly accurate predictions for this region. Conversely, Banda Aceh shows the highest MSE at 0.0438, indicating less accurate predictions compared to other areas. Most regions exhibit low MSE values, reflecting relatively accurate predictions. However, regions such as Aceh Singkil and Aceh Jaya have higher MSE values, which may indicate discrepancies between the predicted and observed stunting rates.
In addition to highlighting prediction accuracy, the MSE values reveal important insights into stunting prevalence trends across Aceh. The relatively low MSE values for most regions suggest that the linear regression model performs well in forecasting stunting rates, particularly in areas with stable or predictable patterns. Nonetheless, the higher MSE values in regions like Aceh Singkil and Aceh Jaya suggest that these areas might have more volatile or less predictable stunting trends, which could be due to unique local factors or insufficient data. This analysis underscores the model's overall effectiveness while also identifying regions where additional data or more complex modeling approaches may be needed to improve prediction accuracy. Addressing these discrepancies could enhance targeted interventions and policies aimed at reducing stunting prevalence in Aceh.
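For reference, the MSE is the average of the squared differences between observed and predicted values. The sketch below uses hypothetical numbers, not the study's data, and also shows that the magnitude of the MSE depends on whether prevalence is expressed in percent or as a fraction. The section does not state which scale underlies the reported values, so this is noted only as a caution when comparing MSE figures.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical observed and predicted prevalence, in percent.
actual_pct = [25.0, 24.0, 23.5]
predicted_pct = [25.2, 23.9, 23.4]

mse_pct = mean_squared_error(actual_pct, predicted_pct)
mse_fraction = mean_squared_error(
    [a / 100 for a in actual_pct],
    [p / 100 for p in predicted_pct],
)
# The same errors give an MSE 10,000 times smaller on the fraction scale.
print(mse_pct, mse_fraction)
```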
4.3. Comparison of Clustering Results Using K-Medoids and WP + K-Medoids
We compare the clustering results obtained from the conventional K-Medoids algorithm and the WP (Weight Product) optimized K-Medoids algorithm. Both methods were applied to the same stunting prevalence data across various regencies and cities in Aceh, Indonesia. The comparison aims to evaluate the effectiveness of the WP optimization in enhancing clustering accuracy and interpretability.
Table 9 presents a comparison of clustering results between the WP (Weight Product) optimized K-Medoids and the conventional K-Medoids algorithm.
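As background for the WP component, the sketch below shows the standard Weighted Product scoring rule from multi-criteria decision making, in which each alternative's score is the product of its criterion values raised to normalized weights (negative exponents for cost criteria). This is an assumption about what WP refers to here. The criteria, weights, and values are hypothetical, and this section does not describe how the WP scores enter the K-Medoids procedure.

```python
def weighted_product_scores(rows, weights, is_benefit):
    """Weighted Product scores, normalized to sum to 1.

    rows:       one list of strictly positive criterion values per alternative
    weights:    raw criterion weights (normalized internally to sum to 1)
    is_benefit: True where larger is better, False for cost criteria
    """
    total_weight = sum(weights)
    norm_weights = [w / total_weight for w in weights]
    raw_scores = []
    for row in rows:
        score = 1.0
        for value, weight, benefit in zip(row, norm_weights, is_benefit):
            # Cost criteria use a negative exponent so that smaller is better.
            score *= value ** (weight if benefit else -weight)
        raw_scores.append(score)
    total = sum(raw_scores)
    return [score / total for score in raw_scores]

# Hypothetical regions scored on two hypothetical criteria.
rows = [[30.0, 0.6], [20.0, 0.8], [25.0, 0.7]]
scores = weighted_product_scores(rows, weights=[3, 2], is_benefit=[False, True])
print(scores)
```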
The comparison in Table 9 underscores the advantages of the WP optimization. The WP+K-Medoids approach converged in 3 iterations, whereas the conventional K-Medoids algorithm needed 7. Fewer iterations mean faster convergence, which matters in large-scale data analyses where computational resources and time are limited. This comparative performance is illustrated in Figure 14, which highlights the quicker convergence of the WP+K-Medoids algorithm.
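To show what an iteration count means for K-Medoids, the sketch below implements a simple alternating variant (assign each point to its nearest medoid, then re-pick each cluster's medoid) and counts the passes until the medoids stop changing. It is not necessarily the variant used in this study. The data are hypothetical, and the idea that a better starting point shortens convergence is illustrated here as a general property, not as a description of how WP is applied.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: euclidean(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, initial_medoids, max_iter=100):
    """Alternating K-Medoids; returns (medoids, clusters, iterations used)."""
    medoids = sorted(initial_medoids)
    for iteration in range(1, max_iter + 1):
        clusters = assign(points, medoids)
        # New medoid of a cluster: the member with the smallest total distance
        # to all other members (keep the old medoid if the cluster is empty).
        new_medoids = sorted(
            min(members, key=lambda c: sum(euclidean(points[c], points[o]) for o in members))
            if members else m
            for m, members in clusters.items()
        )
        if new_medoids == medoids:
            return medoids, clusters, iteration
        medoids = new_medoids
    return medoids, assign(points, medoids), max_iter

# Hypothetical one-dimensional data with two obvious groups.
points = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
medoids_poor, _, iterations_poor = k_medoids(points, initial_medoids=[0, 1])
medoids_good, _, iterations_good = k_medoids(points, initial_medoids=[1, 4])
print(iterations_poor, iterations_good)
```

In this toy case both runs reach the same medoids, but the run that starts from well-placed medoids needs fewer passes.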
The Calinski-Harabasz Index, a widely used measure of clustering validity, reinforces the advantage of WP+K-Medoids over conventional K-Medoids. WP+K-Medoids achieved an index value of 49.75, compared with 25.30 for conventional K-Medoids. The index is the ratio of between-cluster dispersion to within-cluster dispersion, so a higher value signals clusters that are more cohesive and more clearly separated, which improves the interpretability of the clustering results. This enhanced clustering quality is illustrated in Figure 15. The Calinski-Harabasz scores compared in Table 10 provide further insight into the clustering quality of both methods.
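The index can be computed directly from its standard definition, CH = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster dispersion, W the within-cluster dispersion, k the number of clusters, and n the number of points. The sketch below follows that definition using cluster centroids (means). The data and labelings are hypothetical and serve only to show that a well-separated partition scores higher than a mixed one.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n, dim = len(points), len(points[0])
    cluster_ids = sorted(set(labels))
    k = len(cluster_ids)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for cid in cluster_ids:
        members = [p for p, lab in zip(points, labels) if lab == cid]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster dispersion: size-weighted squared distance to the overall mean.
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        # Within-cluster dispersion: squared distances of members to their centroid.
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    if within == 0:
        return float("inf")  # all clusters are single points or exact duplicates
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical data with two obvious groups.
points = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
ch_separated = calinski_harabasz(points, [0, 0, 0, 1, 1, 1])
ch_mixed = calinski_harabasz(points, [0, 1, 0, 1, 0, 1])
print(ch_separated, ch_mixed)
```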
The average Calinski-Harabasz score for the K-Medoids method is 0.0274, which suggests that the clusters formed are reasonably distinct but leave room for improvement. The average score for the WP+K-Medoids method is 0.0307, about 12% higher, implying better cluster separation and cohesion. The scores for K-Medoids range from 0.0256 to 0.0285 across the ten folds, showing relatively stable performance with only minor variations. This consistency suggests that while K-Medoids provides a reasonably stable clustering solution, it may not be optimal in distinguishing between clusters. The WP+K-Medoids method exhibits scores between 0.0284 and 0.0325, with slightly larger variation but consistently better performance compared to K-Medoids. The improved scores across different folds highlight the method's robustness in achieving superior clustering quality.
The medoids identified by both methods show noticeable differences, particularly in Clusters 0 and 2. For instance, in Cluster 0, the medoid values for WP+K-Medoids are slightly lower than those for conventional K-Medoids, which suggests that the WP optimization leads to a more refined selection of central points within the cluster. Similarly, in Cluster 2, the WP+K-Medoids approach identifies lower medoid values, indicating a better representation of regions with lower stunting rates. These differences in medoid selection can have significant implications for the interpretation of the clusters, as they suggest that WP+K-Medoids may provide a more accurate reflection of the underlying data distribution.
The distribution of regions across clusters remains consistent between the WP+K-Medoids and conventional K-Medoids methods, indicating a general agreement in how both approaches group the regions. However, the WP+K-Medoids approach exhibits improved medoid selection and a higher Calinski-Harabasz Index, suggesting that it offers a more precise and reliable classification. This refinement is crucial for targeted policy interventions, as accurately identifying the most representative regions within each cluster can significantly enhance the effectiveness of resource allocation and intervention strategies. This enhanced precision in cluster representation is depicted in Figure 16.
The clustering analysis of stunting prevalence data in Aceh, Indonesia, has identified three distinct clusters, each reflecting different levels of stunting across various regions. These clusters are categorized as Cluster 0, Cluster 1, and Cluster 2.
Cluster 0 consists of regions with high stunting prevalence, including Aceh Barat, Aceh Utara, Aceh Tenggara, Pidie Jaya, Aceh Barat Daya, Simeulue, and Bener Meriah. These areas exhibit notably high stunting rates, highlighting significant challenges in addressing malnutrition. The common factors influencing these high rates may include socio-economic conditions, limited access to healthcare, and educational disparities. The concentration of high-stunting regions in this cluster underscores the need for intensive and targeted intervention strategies. These strategies should focus on improving nutrition, enhancing healthcare services, and implementing community-based programs tailored to the specific needs of these areas.
Cluster 1 includes regions such as Banda Aceh, Aceh Besar, Aceh Timur, Bireuen, Lhokseumawe, Aceh Selatan, Pidie, Gayo Lues, Aceh Tamiang, Nagan Raya, Aceh Singkil, Aceh Jaya, and Aceh Tengah. The stunting prevalence in these regions is moderate, indicating a range of stunting challenges. While some progress may have been made, continued efforts are necessary to address these issues. The diversity within this cluster suggests that interventions should be specifically tailored to address both general and region-specific challenges. Ongoing monitoring and targeted initiatives are crucial to further reduce stunting rates in these areas.
Cluster 2 is represented by Subulussalam, which has the lowest stunting prevalence among all clusters. This low prevalence indicates that Subulussalam has effectively managed and reduced stunting compared to other regions. Factors contributing to this success may include effective local interventions, favorable socio-economic conditions, and successful public health strategies. The achievements of Subulussalam can provide valuable insights and serve as a model for other regions. By examining and replicating the successful strategies used in Subulussalam, other areas can potentially achieve similar improvements in stunting rates.
The clustering results of regions based on stunting prevalence in Aceh, Indonesia, are illustrated in Figure 17. This figure visually represents the categorization of various regencies and cities into distinct clusters based on their stunting rates.