3.1. Build Direction
Once the images corresponding to the force–displacement and stress–strain graphs in the selected training and testing models had been processed and loaded, and the hyperparameters had been adjusted using Bayesian optimization with the Optuna library, the performance evaluation metrics were applied. The accuracy results obtained are presented in Table 2.
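For context, the sketch below illustrates how such an Optuna study could be configured with scikit-learn; the search space, trial budget, and data variables are hypothetical stand-ins, not the exact settings used in this work.

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the study itself uses flattened curve images.
X, y = make_classification(n_samples=300, n_features=50, random_state=0)

def objective(trial):
    # Hypothetical search space; the ranges used in this work may differ.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
    }
    model = GradientBoostingClassifier(**params)
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Optuna's default TPE sampler performs the Bayesian-style search.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```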
In the classification by build direction, the Gradient Boosting Classifier (GBC) achieves the highest accuracy with the ESM filter (without moving-average smoothing), reaching values close to 0.74, suggesting that retaining the original signal without smoothing may benefit classification by preserving critical information that would otherwise be lost during filtering. Likewise, the AdaBoost Classifier (ABC) and the Random Forest Classifier (RFC) achieve competitive performance in the EM5 and ESM scenarios, respectively, owing to their ensemble-based approaches, which combine multiple base estimators and are less prone to systematic errors than individual models. By contrast, the Multilayer Perceptron (MLP) model produced results with a lower standard deviation, suggesting greater consistency; however, its overall accuracy was lower. Meanwhile, although the GBC and RFC models achieved acceptable accuracy values, they exhibited greater variability, potentially indicating less stability when faced with changes to the applied filters.
In general, the results suggest that both the graph type (with better performance on stress–strain curves) and the treatment applied to the images significantly affect the models' accuracy. Avoiding excessive filtering can improve classification by preserving the spectral richness of the original signal.
Table 3 shows the results for the F1 score, which balances precision and recall by evaluating the models' ability to detect positive cases while avoiding false positives. In this case, the AdaBoost Classifier (ABC) model achieved the highest average F1 score, exceeding 0.75, in the FM5 configuration (a moving average with five standard deviations). This was accompanied by a lower standard deviation, indicating high performance stability.
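For reference, the metrics reported in Tables 2–5 follow their standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad
\mathrm{Precision} = \frac{TP}{TP + FP}, \quad
\mathrm{Recall} = \frac{TP}{TP + FN}, \quad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```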
Models such as the Gradient Boosting Classifier (GBC) and the Random Forest Classifier (RFC) also showed favorable results in configurations such as ESM and FM3, though with higher variability. For its part, the Multilayer Perceptron (MLP) model recorded the lowest F1 scores, in some cases below 0.40 (as in FSM and FM3), together with considerable standard deviations, suggesting a limited ability to extract relevant patterns from the images and inconsistent predictions.
The Decision Tree Classifier (DTC) model showed intermediate performance, with an F1 score of 0.699 in the EM3 configuration. However, its performance dropped significantly under other conditions, such as ESM, suggesting that the model is sensitive to the preprocessing method.
Table 4 presents the recall results, reflecting each model's ability to correctly detect positive cases. In this context, the AdaBoost Classifier (ABC) model stands out with the highest average values, close to 0.91 in the EM3 and FM5 configurations, accompanied by a low standard deviation, indicating high consistency in its performance.
The Gradient Boosting Classifier (GBC) and Random Forest Classifier (RFC) models also produced favorable results, averaging between 0.73 and 0.76, though with moderate variation depending on the filter applied. Models such as the Support Vector Machine (SVM), the Extra Trees Classifier (ETC), and Logistic Regression (LR) yielded acceptable results across different configurations; however, their high standard deviations limit the reliability of their predictions.
Once again, the Multilayer Perceptron (MLP) model shows the lowest performance, with values close to 0.26 in the EM5 condition and high dispersion across almost all filters, demonstrating a limited ability to detect positive cases. Finally, the Decision Tree Classifier (DTC) model shows intermediate performance, with good results on EM3 and FM5, but it is highly sensitive to the preprocessing type, which affects its consistency.
Table 5 presents the mean ROC AUC values, which evaluate the models' overall performance across different decision thresholds and provide a comprehensive view of their discriminatory capacity. In this context, the highest values were obtained by the AdaBoost (ABC) and Gradient Boosting (GBC) models, both averaging 0.7417 under the ESM filter, demonstrating high sensitivity and robustness after preprocessing. Similarly, the Decision Tree (DTC) model with EM5 and the Gradient Boosting (GBC) model with FSM achieved values of 0.7333, confirming their effectiveness in complex, challenging settings.
In contrast, the MLP and SVM models showed more modest performance, with averages ranging from 0.55 to 0.63 across most filters, reflecting difficulties in capturing discriminative patterns in this task. However, MLP with EM5 exhibited the lowest standard deviation of the set (0.196), indicating relatively stable behavior despite its lower effectiveness.
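As a brief illustration of how ROC AUC is computed from model scores, the sketch below uses scikit-learn on synthetic stand-in data; the classifier choice and variable names are hypothetical, not the exact pipeline of this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the study uses flattened curve images instead.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(random_state=0).fit(X_train, y_train)

# ROC AUC integrates performance over all decision thresholds, using the
# predicted probability of the positive class rather than hard labels.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"ROC AUC: {auc:.3f}")
```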
To complement the values presented in Table 2, Table 3, Table 4 and Table 5, box plots are included for each metric and model (see Figure 5). These aim to facilitate the identification of patterns of dispersion and variability between models. This representation provides additional insight into values that differ from the mean, enriching the interpretation of the results. It is also possible to observe outliers in specific configurations, which could reflect differences in model reliability, likely associated with data processing.
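A box plot of this kind can be produced as sketched below; the per-fold values and model subset are illustrative placeholders, not the numbers behind Figure 5.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative per-fold accuracy values for three models (hypothetical
# numbers); in practice, one such table exists per metric and filter.
results = pd.DataFrame({
    "GBC": [0.74, 0.70, 0.68, 0.72, 0.65],
    "ABC": [0.73, 0.71, 0.70, 0.74, 0.69],
    "MLP": [0.55, 0.54, 0.56, 0.53, 0.55],
})

fig, ax = plt.subplots()
ax.boxplot(results.values, labels=results.columns)  # one box per model
ax.set_ylabel("Accuracy")
ax.set_title("Per-model dispersion for a given filter")
plt.show()
```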
In addition to the statistical summaries, heatmaps were generated to provide a consolidated visual overview of model performance across all metrics and filter configurations (see Figure 6). By encoding values through a continuous colour scale, this representation enables swift identification of areas of higher or lower performance, revealing consistent trends and highlighting contrasts that may be less evident in numerical tables. The side-by-side arrangement of heatmaps for each configuration is an effective method to facilitate cross-comparison. This approach provides an immediate, intuitive understanding of how models behave under different experimental conditions.
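A minimal sketch of such a model-by-filter heatmap follows; the score matrix holds random placeholder values, not those of Figure 6.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical mean-score matrix: rows = models, columns = filters
# (random placeholder values, not the results of Tables 2-5).
models = ["GBC", "ABC", "RFC", "DTC", "MLP"]
filters = ["ESM", "EM3", "EM5", "FSM", "FM3", "FM5"]
scores = np.random.default_rng(0).uniform(0.4, 0.8, size=(len(models), len(filters)))

fig, ax = plt.subplots()
im = ax.imshow(scores, cmap="coolwarm")  # continuous colour scale
ax.set_xticks(range(len(filters)), labels=filters)
ax.set_yticks(range(len(models)), labels=models)
fig.colorbar(im, ax=ax, label="Mean accuracy")
plt.show()
```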
These results were compared with those of Barrios et al. [33], who used decision tree-based models to predict the roughness of parts manufactured by fused deposition modeling (FDM) in two build directions: Ra,0° and Ra,90°. In their study, the Decision Tree model (J48/C4.5) achieved accuracies of 0.709 and 0.733, respectively, in each direction, while Random Forest reported accuracies of 0.807 and 0.743. In this study, Random Forest with the ESM filter achieved an accuracy of 0.70, an intermediate value between the two angles reported by Barrios et al. Meanwhile, DTC achieved an accuracy of 0.720 with EM5 preprocessing but decreased to 0.610 with ESM, demonstrating its significant sensitivity to image processing methods.
In the current study, the FM5 filter yielded an F1 score of 0.680 for Ra,0°, compared with the 0.716 reported by Barrios et al. for the same orientation. Under EM3 preprocessing, the Decision Tree model achieved an F1 score of 0.690, surpassing the 0.650 documented by Barrios et al. With respect to ROC AUC, Barrios et al. [33] obtained 0.692 and 0.481 for Random Forest in Ra,0° and Ra,90°, respectively; in contrast, the present work's Random Forest ROC AUC ranged from 0.630 to 0.730 depending on the chosen filter. For the Decision Tree classifier, Barrios et al. [33] reported ROC AUC values of 0.154 and 0.385, whereas this investigation observed values ranging from 0.610 to 0.760 across different preprocessing methods.
The key difference between the two studies lies in the origin and volume of the data used: Barrios et al. based their work on five clean numerical variables, whereas this study analyzed force–displacement and stress–strain curves extracted from images subjected to different moving-average filters, with hyperparameters optimized using Optuna. These factors were shown to have a decisive impact on the final metrics.
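To make the smoothing step concrete, the sketch below applies a simple centered moving average to a synthetic curve; the window sizes and the mapping to the filter labels (e.g., EM3 or FM5) are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def moving_average(signal: np.ndarray, window: int) -> np.ndarray:
    """Centered moving average; window=1 leaves the signal unchanged."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Hypothetical noisy force-displacement curve sampled at 500 points.
x = np.linspace(0.0, 10.0, 500)
force = np.sin(x) + np.random.default_rng(0).normal(0.0, 0.05, x.size)

smoothed_3 = moving_average(force, 3)  # e.g., a 3-point window
smoothed_5 = moving_average(force, 5)  # e.g., a 5-point window
```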
On the other hand, Patil et al. [34] classified the need for support structures in FDM parts, a task implicitly related to the printing direction. In their study, Random Forest achieved an accuracy of 0.88, an F1 score of 0.87, a recall of 0.87, and an ROC AUC of 0.87. In contrast, the same model in this study achieved an accuracy of 0.73 (EM5), an F1 score of 0.68 (FM5), a recall of 0.76 (FM5), and an AUC of 0.71 (ESM). For the Decision Tree model, Patil et al. reported an accuracy of 0.90, an F1 score of 0.90, a recall of 0.90, and an AUC of 0.90, whereas in this study it yielded an accuracy of 0.69 (FM5), an F1 score of 0.69 (EM3), a recall of 0.76 (EM3), and an AUC of 0.73 (EM5). For SVM, Patil et al. obtained an accuracy of 0.69, an F1 score of 0.65, a recall of 0.69, and an AUC of 0.66; in our study, SVM achieved an accuracy of 0.62 (EM3), an F1 score of 0.64 (ESM), a recall of 0.71 (EM3 and ESM), and an AUC of 0.675 (ESM). For Gradient Boosting, Patil et al. achieved an accuracy of 0.89, an F1 score of 0.89, a recall of 0.89, and an AUC of 0.88, while in this study the model achieved an accuracy of 0.73 (ESM), an F1 score of 0.70 (EM3), a recall of 0.73 (EM5), and an AUC of 0.74 (ESM).
These discrepancies can be explained by the fact that Patil et al. used thirteen clean, wall-based numerical variables whose characteristics directly affect support prediction. In the present study, by contrast, the force–displacement and stress–strain curves were processed through filtering and conversion to image and matrix formats, stages during which critical information necessary for classification can be lost.
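A minimal sketch of the image-to-feature conversion described above, assuming Pillow and NumPy; the file name and target resolution are hypothetical.

```python
import numpy as np
from PIL import Image

# Load a curve image, convert to grayscale, and resize to a fixed resolution.
img = Image.open("stress_strain_curve.png").convert("L").resize((64, 64))

# Convert to a matrix, then flatten into a 1-D feature vector for the classifiers.
matrix = np.asarray(img, dtype=np.float32) / 255.0  # shape (64, 64)
features = matrix.flatten()                         # shape (4096,)
```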
3.2. Layer Thickness
Table 6 presents the classification accuracy per model and filter for the layer thickness parameter. The AdaBoost Classifier (ABC) model achieves the highest average value (0.672) in the ESM configuration, with a low standard deviation. This behavior suggests that leaving the image untouched by moving-average smoothing could benefit classification by preserving critical information that could otherwise be lost during filtering. Likewise, the ABC model shows acceptable performance in the FM3 and FM5 configurations, demonstrating good stability across different transformations.
The Decision Tree Classifier (DTC) model also shows favorable results, with average scores close to 0.59 across most scenarios. However, it exhibits greater dispersion, which could indicate an adequate ability to extract relevant patterns, although with high sensitivity to the type of processing applied.
The Gradient Boosting Classifier (GBC) and Random Forest Classifier (RFC) models report intermediate accuracy values, but with higher standard deviations than other models, suggesting a clear dependence on the filter used. On the other hand, the Multilayer Perceptron (MLP) model yields relatively low values, albeit with low deviation, suggesting less accurate but more consistent behavior.
Finally, the Support Vector Machine (SVM) and Extra Trees Classifier (ETC) models show less favorable results, with no configuration exceeding an accuracy of 0.43. In general, ensemble models, particularly those based on boosting, perform better on this classification task.
On the other hand, Table 7 shows the F1 scores, which reflect the balance between precision and recall. In this case, the AdaBoost (ABC) model again yielded the highest average values, with a mean of 0.7089 across both the FM3 and FM5 filters and a low standard deviation. It should be noted that the Multilayer Perceptron (MLP) also offered outstanding results in FM3 and FM5, with values above 0.6. In contrast, the Extra Trees Classifier (ETC) and Random Forest Classifier (RFC) models had the lowest values, with average F1 scores below 0.45 and high dispersion. The Logistic Regression (LR) and Support Vector Machine (SVM) models yielded intermediate values, with EM5 and FSM standing out, respectively. The Decision Tree (DTC) model achieved its highest value (0.576) with the ESM filter, albeit with a high standard deviation.
Table 8 shows the recall values for each model under each applied filter. The MLP and AdaBoost (ABC) models achieved the highest average values, particularly when using the FM3, FM5, and FSM filters. In these configurations, the average values recorded by ABC ranged from 0.88 to 0.97, whereas the MLP model consistently exceeded 0.86, demonstrating its high sensitivity in detecting positive cases.
In contrast, the Random Forest (RFC) and Extra Trees (ETC) models showed the lowest performance, with average recall values below 0.5 across most filters and high standard deviations. This combination suggests lower stability and reliability in their classification ability.
A particular case was observed in Logistic Regression (LR) under the FM3 filter, where perfect recall (1.0) was achieved. However, the absence of variability (standard deviation = 0) could indicate overfitting or an imbalanced class distribution, so this result should be interpreted with caution.
Finally, it should be noted that the FM3, FM5, and FSM filters tend to improve recall in most of the models evaluated, suggesting that these configurations favor greater recovery of positive cases in the context analyzed.
Finally, Table 9 presents the values for the ROC AUC metric. Once again, the AdaBoost (ABC) model stands out, particularly under the ESM filter, where it achieves an average of 0.692, demonstrating good discrimination capacity and adequate stability.
The Gradient Boosting Classifier (GBC) model also showed favorable results in ESM (0.625), although with a higher standard deviation, indicating greater variability in its performance. On the other hand, the Logistic Regression (LR), Support Vector Machine (SVM), and Multilayer Perceptron (MLP) models showed moderate performance, with values ranging from 0.50 to 0.58, without standing out in any particular filter.
The Decision Tree Classifier (DTC) model reached values of 0.60 in some cases, but with significant variability, which compromises its reliability. Overall, the ABC model is the most robust option in terms of discrimination and stability for the layer thickness variable.
To complement the results in Table 6, Table 7, Table 8 and Table 9, box plots for all metrics and models were included (see Figure 7). This visualization supports the identification of performance patterns, highlighting both consistency and variability across filters. Outliers and wide dispersions, especially in models like DTC, suggest potential reliability issues, while compact distributions in ABC reinforce its robustness.
To provide a comprehensive visual comparison of the models' performance across all metrics and filter configurations presented in Table 6, Table 7, Table 8 and Table 9, heatmaps were generated (see Figure 8). By encoding average values using a continuous colour scale, this representation enables rapid identification of high- and low-performing combinations and consistent trends across filters. The visual gradients facilitate the detection of clusters of strong performance, highlight subtle differences between models, and reveal patterns that may not be immediately evident in numerical tables, thus enriching the overall interpretation of the results.
For comparison, the work of Hien et al. [35] was examined, in which the researchers non-destructively classified eight thicknesses of dielectric materials using support vector machines (SVMs) and a deep neural network (DNN) with six hidden layers. Using an RBF kernel, the SVM attained an accuracy of 0.997 (polynomial: 0.991; sigmoid: 0.986), while the DNN attained 0.999 with a moderate amount of data, demonstrating a strong correlation between well-based electromagnetic variables and thickness.
In this study, the same SVM achieved its lowest accuracy (0.41) with the FM3 and FM5 filters and its highest (0.57) with the EM3 filter. These findings suggest that classifying thicknesses from force–displacement and stress–strain curve images poses a greater challenge. The MLP classifier achieved accuracies of 0.45–0.51 across all configurations; its low standard deviation is noteworthy, although it came at the cost of lower overall accuracy.
These discrepancies can be attributed to the inherent characteristics of the data. Hien et al. employed five numerical variables that directly captured layer thickness, whereas in this study the mechanical curve images underwent a series of processing stages, including filtering, matrix conversion, and flattening, which can result in the loss of critical information necessary for effective classification.
3.3. Infill Density
Table 10 shows the accuracy values for the infill density variable by model and filter. The AdaBoost Classifier (ABC) model achieves the highest value in the FSM configuration (0.739), demonstrating its ability to classify effectively even when the images are intensively processed. The Gradient Boosting Classifier (GBC) and Decision Tree Classifier (DTC) models also demonstrate outstanding performance in the FSM configuration, with values of 0.678 and 0.683, respectively, indicating that tree-based and boosting approaches can extract relevant infill-related patterns regardless of the processing type applied.
The Support Vector Machine (SVM) model maintains values above 0.65 in FM3, FM5, and FSM, with moderate standard deviations, reflecting good stability across different image treatments. In contrast, the Multilayer Perceptron (MLP) model has the lowest values, below 0.49, with low deviations, suggesting consistent but inaccurate performance.
The Logistic Regression (LR) and Random Forest Classifier (RFC) models show intermediate performance, with values between 0.55 and 0.64 and standard deviations between 0.27 and 0.32, indicating a moderate but somewhat variable response.
Conversely, Table 11 shows the F1 scores for each model. Once again, the AdaBoost Classifier (ABC) model achieved the best results, with values of 0.710, 0.703, and 0.698 in FSM, FM3, and FM5, respectively. These values demonstrate moderate variability and a good balance between accuracy and consistency.
The Support Vector Machine (SVM) model also performed well, achieving values close to 0.63 across all three filters with low standard deviations, indicating high stability in the face of configuration changes. By contrast, the Decision Tree Classifier (DTC) model achieved scores of 0.644 on FSM and 0.610 on both FM3 and FM5, demonstrating moderate variability and less consistency in more complex scenarios.
The Gradient Boosting Classifier (GBC) model produced intermediate results, achieving 0.617 on FSM with a low standard deviation, suggesting reliability in tasks involving multiple variables. By contrast, the Multilayer Perceptron (MLP) model produced the lowest values, from 0.367 in FM3 to 0.594 in FM5, and its high standard deviations indicate strong sensitivity to the processing method.
Finally, the Logistic Regression (LR) and Random Forest Classifier (RFC) models demonstrated intermediate performance, ranging from 0.54 to 0.61, indicating acceptable, albeit unstable, classification performance depending on the filter used.
Table 12 below shows the recall values for each model under each filter configuration. The AdaBoost Classifier (ABC) and Support Vector Machine (SVM) models achieved the best results in the FM3, FM5, and FSM configurations. Their averages ranged from 0.78 to 0.92, with low standard deviations, demonstrating their ability to detect positive cases and their stability across processing variations.
The Logistic Regression (LR) model also performed well, achieving values of 0.71 and 0.75 in FM3 and FM5, respectively, with low variability in both configurations, indicating a reliable response. By contrast, the Multilayer Perceptron (MLP) model exhibited inconsistent behavior: although it reached 0.85 in FM5, it did so with high variability, reflecting low reliability and sensitivity to the type of filter used.
Conversely, the Extra Trees Classifier (ETC), Gradient Boosting Classifier (GBC), and Decision Tree Classifier (DTC) models exhibited intermediate values ranging from 0.63 to 0.73 with moderate variability, suggesting consistent yet less robust performance in challenging configurations. The Random Forest Classifier (RFC) model achieved FM3 and FM5 values of 0.70 and 0.72, respectively, but exhibited high variability in response to filter changes, limiting its comparative stability.
Finally, Table 13 shows the ROC AUC values for each model, grouped by applied filter. The SVM and Logistic Regression (LR) models achieved the highest values in the FM3, FM5, and FSM configurations, averaging 0.66–0.69 with moderate variability, indicating good discriminatory capacity and consistency in the face of processing changes.
The AdaBoost (ABC) model achieved an average FSM value of 0.75, but its standard deviation was higher than that of the other models, indicating instability and sensitivity to filter type. The Gradient Boosting (GBC) model demonstrated consistent performance with an FSM value of 0.70, though it also exhibited considerable variability.
In contrast, the Multilayer Perceptron (MLP) model produced the lowest results, ranging from 0.55 to 0.61; however, its standard deviations were low, suggesting limited yet stable capacity. The Extra Trees (ETC), Random Forest (RFC), and Decision Tree (DTC) models showed intermediate results between 0.63 and 0.68, although their comparative reliability is limited by high variability.
To complement the results in Table 10, Table 11, Table 12 and Table 13, box plots for all metrics and models were included (see Figure 9). This visualization facilitates the comparative assessment of model performance across experimental conditions, emphasizing both stability and dispersion. Models such as RFC and SVM exhibit compact distributions, indicating consistent behavior across filters. In contrast, broader spreads and frequent outliers in models like DTC point to potential sensitivity to data variability, raising concerns about generalizability.
To provide a more comprehensive overview of the results presented in Table 10, Table 11, Table 12 and Table 13, heatmaps were generated to offer a visual representation of the models' performance across all metrics and filter configurations (see Figure 10). By encoding average values through a continuous red–blue colour scale, this representation enables the rapid identification of high- and low-performing combinations, as well as consistent trends across filters. The visual gradients highlight clusters of strong performance, reveal subtle contrasts between models, and expose patterns that may remain hidden in numerical tables, thereby enriching the interpretation of the results.