Preprint
Article

This version is not peer-reviewed.

Sustainability-Focused Evaluation of Self-Compacting Concrete: Integrating Explainable Machine Learning and Mix Design Optimization

A peer-reviewed article of this preprint also exists.

Submitted: 29 December 2025

Posted: 31 December 2025


Abstract
Self-compacting concrete (SCC) offers significant advantages in construction due to its superior workability; however, optimizing SCC mixture design remains challenging because of complex nonlinear material interactions and increasing sustainability requirements. This study proposes an integrated, sustainability-oriented framework that combines machine learning (ML), SHapley Additive exPlanations (SHAP), and multi-objective optimization to improve SCC mixture design. A large and heterogeneous global dataset, compiled from 156 peer-reviewed studies and enhanced through a structured three-stage data augmentation strategy, was used to develop robust predictive models for key fresh-state properties. An optimized XGBoost model demonstrated high predictive performance, achieving coefficients of determination of R² = 0.835 for slump flow and R² = 0.828 for T50 time, with strong generalization to industrial SCC datasets. SHAP-based interpretability analysis identified the water-to-binder ratio and superplasticizer dosage as the dominant factors governing fresh-state behavior, providing physically meaningful insights into mixture performance. A cradle-to-gate life cycle assessment was integrated within a multi-objective genetic algorithm to simultaneously minimize embodied CO2 emissions and material costs while satisfying workability constraints. The resulting Pareto-optimal mixtures achieved up to 3.9% reduction in embodied CO2 emissions compared to conventional SCC designs without compromising performance. External validation using industrial data confirms the practical reliability and transferability of the proposed framework. Overall, this study presents an interpretable and scalable AI-driven approach for the sustainable optimization of SCC mixture design.
Subject: Engineering - Other

1. Introduction

Self-compacting concrete (SCC) has emerged as one of the most groundbreaking advancements in contemporary construction technology. Its unique capability to self-consolidate under its own weight, while exhibiting exceptional flowability, passing ability, and resistance to segregation, has transformed the approach to concrete placement, particularly in complex structural applications [1,2]. The evaluation of SCC’s fresh-state performance is typically conducted through standardized tests, including slump flow, T50 flow time, V-funnel time, and L-box ratio, which collectively reveal the rheological properties governing flowability and passing ability [3,4,5].
However, SCC mix proportioning remains challenging due to the intricate interplay of powder content, aggregate gradation, water demand, and superplasticizer chemistry, which introduces highly nonlinear behavior in fresh-state performance [2,6,7]. At the same time, the construction industry faces growing pressure to reduce CO2 emissions associated with cement production, a major contributor to global greenhouse gas emissions. SCC mixtures generally require higher binder content and increased use of chemical admixtures, which can further intensify their environmental burden unless carefully optimized [8,9,10]. Recent sustainability-oriented studies have underscored the need for next-generation SCC and related high-performance concretes to prioritize resource efficiency, lower carbon footprints, and enhanced durability to align with global environmental objectives [11,12,13]. Empirical investigations have demonstrated that well-optimized sustainable concrete designs can significantly reduce environmental impacts while maintaining acceptable workability and mechanical performance [9,12,14]. These findings position SCC as a feasible material for environmentally responsible construction when supported by intelligent optimization strategies.
Parallel to innovations in mix design, machine learning (ML) has introduced powerful capabilities for modeling SCC behavior. ML algorithms can effectively capture nonlinear interactions among mixture variables that traditional empirical models often struggle to represent [1,3,5,6,15]. Studies have shown that ensemble algorithms—such as Random Forest, Gradient Boosting, and XGBoost—achieve superior predictive accuracy for both fresh and hardened properties of SCC and other cementitious composites [3,15,16,17]. Nevertheless, a key limitation of many ML-based studies is their limited interpretability. The emergence of explainable artificial intelligence (XAI), particularly through SHapley Additive exPlanations (SHAP), has enabled more physics-informed understanding of feature contributions, identifying critical parameters such as water-to-binder ratio, powder content, and fine aggregate proportion as dominant drivers of concrete workability and strength [2,4,18,19,20].
Furthermore, multi-objective optimization methods have proven effective for designing sustainable concrete mixtures that balance workability or strength performance with environmental metrics like CO2 emissions and binder consumption [9,10,11,12,13,14]. Prior research integrating ML predictions with evolutionary or genetic optimization algorithms has demonstrated the ability to produce Pareto-optimal mixes that satisfy performance constraints while minimizing cement and admixture content [9,13,14]. Such advancements highlight the emerging paradigm of data-driven, sustainability-focused concrete design.
Despite these improvements, important research gaps remain. Most existing studies treat ML prediction, interpretability, and optimization as independent stages rather than components of a unified pipeline [3,7,21]. Additionally, limited use of large or augmented SCC datasets reduces the generalizability of current models [3,16]. Even fewer studies combine ML, XAI, and sustainability-driven optimization to simultaneously generate accurate predictions, interpretable insights, and actionable mix design recommendations. Recent literature has emphasized the need for integrated frameworks that synergize prediction, explanation, and optimization for next-generation sustainable concrete development [9,11,13,14,16].
This study addresses these gaps by developing an integrated pipeline that combines ML, XAI, and multi-objective optimization to support sustainable SCC mix design. Using an augmented global dataset, the study makes the following contributions: (1) highly accurate predictions of fresh SCC properties using XGBoost and complementary ML models; (2) transparent SHAP-based explanations revealing the influence of mixture parameters; and (3) multi-objective SCC mix optimization using a multi-objective genetic algorithm (NSGA-II) to balance workability with CO2 emissions. This unified framework provides a transferable methodology for data-driven and sustainability-oriented SCC design while advancing both scientific understanding and practical application in sustainable concrete technology.

2. Materials and Methods

This section presents the comprehensive methodology used to develop an interpretable machine learning framework for sustainable SCC mix design. The workflow includes data collection and preprocessing, model development, interpretability analysis, sustainability assessment, multi-objective optimization, and external validation.

2.1. Data Collection and Preprocessing

2.1.1. Dataset Assembly and Cleaning

The foundation of this study is a large and heterogeneous dataset of SCC mix designs compiled from 156 independent sources published between 2001 and 2024, yielding a total of 2,506 unique mix designs. This dataset is significantly larger and more diverse than those typically used in previous research, enhancing the generalizability of the machine learning models.
Each mix design includes 20 numerical input features (e.g., water-to-binder ratio, total powder content, total aggregate content, and admixture dosages) and four target workability properties: Slump Flow (mm), V-funnel (s), T50 (s), and L-box ratio (H1/H2). All originally “Unnamed” features in the raw data were identified and renamed to physically meaningful engineering parameters.
The data cleaning process involved removing duplicate entries, detecting and handling outliers using the Interquartile Range (IQR) method, and applying K-Nearest Neighbors (KNN) imputation to address missing values.
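The IQR screening step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the fence multiplier of 1.5 is the conventional Tukey default and is not stated in the paper, the column name and sample values are hypothetical, and rows with missing values are simply passed through so that a later imputation step (KNN in the paper, not shown here) can handle them.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(rows, column, k=1.5):
    """Keep rows whose value in `column` lies inside the fences.

    Rows with a missing value (None) are kept for later imputation.
    """
    present = [r[column] for r in rows if r[column] is not None]
    low, high = iqr_bounds(present, k)
    return [r for r in rows if r[column] is None or low <= r[column] <= high]

# Hypothetical slump-flow readings (mm); 1500 stands in for a data-entry error.
mixes = [{"slump_flow": v} for v in (650, 680, 700, 710, 720, 740, 1500, None)]
cleaned = drop_outliers(mixes, "slump_flow")
print(len(mixes), "->", len(cleaned))
```

Whether flagged rows are dropped, clipped, or reviewed by hand is a design choice the paper does not specify; dropping is used here only for brevity.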

2.1.2. Novel Data Augmentation Protocol

To improve model robustness and reduce overfitting, a novel three-stage data augmentation pipeline was applied:
  • Gaussian Noise Injection
  • Mixup Interpolation
  • SMOTE Oversampling
This expanded the training data roughly fourfold (from 2,005 to 8,688 samples, a factor of about 4.3). The effect of this augmentation on the performance of different models is summarized in Figure 1.
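The first two augmentation stages can be illustrated with a short sketch. The noise scale (`rel_sigma`), the Beta parameter (`alpha`), and the sample rows are assumptions chosen for illustration, since the paper does not report its settings. The SMOTE stage is omitted because its adaptation to regression targets is not described in the text.

```python
import random

def gaussian_noise(row, rel_sigma=0.01, rng=random):
    """Perturb each value with zero-mean Gaussian noise scaled to its magnitude."""
    return [x + rng.gauss(0.0, rel_sigma * abs(x)) for x in row]

def mixup(row_a, row_b, alpha=0.4, rng=random):
    """Blend two samples with a mixing weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    return [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical rows: [w/b ratio, powder content (kg/m3), slump flow (mm)]
mix_a = [0.38, 520.0, 690.0]
mix_b = [0.45, 480.0, 740.0]

noisy = gaussian_noise(mix_a, rng=rng)
blended = mixup(mix_a, mix_b, rng=rng)
print(noisy)
print(blended)

# Expansion factor reported in the text: 8,688 / 2,005
print(round(8688 / 2005, 2))
```

Because mixup interpolates features and target together, each blended row lies between its two parent mixes; whether such interpolated mixes are physically realistic is a separate question that the sketch does not address.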

2.2. Machine Learning Model Development

2.2.1. Model Selection and Training

Six machine learning algorithms were evaluated: XGBoost, Random Forest (RF), Gradient Boosting (GBM), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), and Linear Regression (LR). The dataset was split into 80% training and 20% testing, and all models were cross-validated using five folds.
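The splitting scheme can be sketched in plain Python as below. The seed, the shuffling strategy, and the helper names are assumptions; the paper states only the 80/20 ratio and the use of five folds.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle the indices 0..n-1 and return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = n - round(n * (1.0 - test_fraction))
    return idx[n_test:], idx[:n_test]

def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

train_idx, test_idx = train_test_split_indices(2506)
print(len(train_idx), len(test_idx))  # 2005 501

for fold_train, fold_val in k_fold(train_idx):
    print(len(fold_train), len(fold_val))
```

With 2,506 mixes this reproduces the 2,005 training and 501 test samples quoted in the text. Cross-validation is applied to the training portion only, so the test set stays untouched until the final evaluation.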

2.2.2. Hyperparameter Optimization

Grid Search with 5-fold cross-validation was applied to XGBoost, producing the best overall performance:
$$R^2_{\text{Slump}} = 0.835, \qquad R^2_{T50} = 0.828.$$

2.3. Model Interpretability and Explainability (SHAP)

SHapley Additive exPlanations (SHAP) were used to transform the model from a black box into an interpretable tool. Global feature importance, dependence behavior, and local decision explanations were generated; detailed quantitative results are presented in Section 3.2.
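To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a small toy function. The toy model, its coefficients, and the baseline mix are hypothetical and are not the fitted XGBoost model; for tree ensembles the SHAP library estimates the same quantities far more efficiently than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, with absent features set to `baseline`."""
    n = len(x)

    def value(present):
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                contribution += weight * (value(present | {i}) - value(present))
        phi.append(contribution)
    return phi

def toy_slump_model(z):
    """Hypothetical slump-flow response (mm) with one interaction term."""
    wb, sp, powder = z
    return 400.0 + 500.0 * wb + 60.0 * sp + 0.1 * powder + 40.0 * wb * sp

x = [0.45, 1.5, 550.0]         # mix being explained: w/b, SP dosage (%), powder (kg/m3)
baseline = [0.38, 1.0, 500.0]  # reference mix
phi = shapley_values(toy_slump_model, x, baseline)
print(phi)
print(sum(phi), toy_slump_model(x) - toy_slump_model(baseline))
```

The two numbers on the last line agree: the attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that gives SHAP its name.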

2.4. Sustainability Assessment (LCA)

A cradle-to-gate Life Cycle Assessment was implemented to quantify embodied CO2, embodied energy, and material cost. The relationship between cement content and embodied CO2 for the assembled SCC dataset is presented in Figure 9.
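A cradle-to-gate indicator of this kind is typically a mass-weighted sum of per-material factors. The sketch below shows that calculation for embodied CO2. The emission factors and the mix are placeholder values chosen for illustration only; they are not the factors used in the study, which the text does not list.

```python
# Placeholder emission factors in kg CO2 per kg of material.
# Illustrative assumptions only, NOT the values used in the paper.
EMISSION_FACTORS = {
    "cement": 0.9,
    "fly_ash": 0.03,
    "water": 0.001,
    "superplasticizer": 1.0,
    "aggregate": 0.005,
}

def embodied_co2(mix, factors=EMISSION_FACTORS):
    """Embodied CO2 (kg per m3 of concrete) as the sum of mass_i * factor_i."""
    return sum(mass * factors[name] for name, mass in mix.items())

# Hypothetical SCC mix, masses in kg per m3.
mix = {"cement": 400.0, "fly_ash": 120.0, "water": 180.0,
       "superplasticizer": 6.0, "aggregate": 1600.0}

baseline = embodied_co2(mix)
leaner = embodied_co2({**mix, "cement": 380.0})
print(baseline, leaner)
print(f"reduction: {100.0 * (baseline - leaner) / baseline:.1f}%")
```

Embodied energy and material cost follow the same pattern with a different factor table, which is why all three indicators can share one helper function.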

2.5. Multi-Objective Optimization (NSGA-II)

Optimization objectives included:
  • Maximize slump flow
  • Minimize CO2 emissions
  • Minimize material cost
NSGA-II was executed for 200 generations and produced 50 Pareto-optimal SCC mix designs. The resulting trade-off surface is discussed and visualized in Figure 10 in the Results section.
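The notion of a non-dominated (Pareto-optimal) design, which NSGA-II uses to rank candidates, can be sketched as below. This is only the dominance filter, not the full algorithm (no crossover, mutation, or crowding distance), and the candidate values are hypothetical. Slump flow is negated so that all three objectives are minimized.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical candidates: (-slump flow in mm, CO2 in kg/m3, cost per m3)
candidates = [
    (-700, 380, 95),
    (-720, 400, 99),
    (-690, 390, 97),
    (-720, 395, 98),
]
front = pareto_front(candidates)
print(front)
```

In this toy set, two candidates survive: one with lower emissions and cost, and one with higher slump flow. Neither is better on every objective, which is the trade-off a Pareto front is meant to expose.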

2.6. External Validation

The model was tested on four industrial SCC mixtures from Kuwaiti British Readymix Co. W.L.L., confirming strong predictive reliability and real-world applicability (Section 3.4).

2.7. Software and Code Availability

All analyses were performed in Python 3.11 using scikit-learn, XGBoost, SHAP, pandas, numpy, matplotlib, and pymoo. All scripts and trained models are provided in the supplementary package.

3. Results

This section is organized into thematic subsections covering the predictive performance of the machine learning framework, interpretability analysis, multi-objective optimization, and external validation using industrial SCC data. Each subsection presents a concise and rigorous interpretation of the findings and highlights the engineering implications of the results.

3.1. Predictive Performance of the Machine Learning Framework

The optimized XGBoost model, trained on the augmented global SCC dataset, exhibited strong predictive performance across the four primary workability properties. Table 1 summarizes the evaluation metrics obtained from the independent 20% test set.
Figure 2 compares the R² scores of the competing algorithms, while Figure 3 presents the corresponding RMSE values. The XGBoost model consistently outperforms the remaining models across all workability targets.
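For reference, the two metrics compared in these figures can be computed as in the sketch below, which uses the standard definitions of the coefficient of determination and the root mean squared error. The measured and predicted values are hypothetical and are not taken from the paper's test set.

```python
import math

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical slump-flow values (mm)
measured = [650.0, 690.0, 710.0, 740.0, 760.0]
predicted = [662.0, 681.0, 720.0, 731.0, 749.0]
print(r2_score(measured, predicted), rmse(measured, predicted))
```

R² is dimensionless and depends on the spread of the target in the test set, while RMSE is reported in millimetres or seconds, so the two are best read together.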
To illustrate prediction accuracy at the sample level, Figure 4 presents the predicted versus actual values for Slump Flow and T50 obtained from the independent test set. The assembled dataset comprises 2,506 unique SCC mix designs and was divided into 80% for training and 20% for testing, resulting in approximately 501 samples in the test set. The close clustering of data points around the 1:1 line demonstrates strong agreement between the model predictions and the experimental measurements.
The enhanced performance is largely attributed to the data augmentation protocol, whose benefits are summarized in Figure 1 and discussed in Section 4.

3.2. Model Interpretability via SHAP Analysis

3.2.1. Global Feature Importance

Figure 5 summarizes the global SHAP feature importance across all SCC workability models. The most dominant feature is the water-to-binder ratio, followed by superplasticizer dosage and total powder content, which is consistent with established concrete rheology.
A more detailed view for Slump Flow is given by the SHAP beeswarm plot in Figure 6, which highlights the distribution of SHAP values for the most influential features.

3.2.2. Feature Dependence and Physical Interpretation

The nonlinear feature-response relationships are examined in Figure 7 and Figure 8, which show SHAP dependence plots for key predictors.
These plots reveal, for example, that Slump Flow SHAP values increase sharply up to a w/b of approximately 0.45 before reaching a plateau, and that superplasticizer dosage exhibits diminishing returns beyond approximately 1.5% by weight of binder (bwob), both behaviors aligning with physical expectations.

3.3. Multi-Objective Optimization for Sustainable SCC Design

3.3.1. Sustainability Benefits of Pareto-Optimal Mixes

The relationship between cement content and embodied CO2 across the global SCC dataset is shown in Figure 9, highlighting the strong environmental motivation for cement-efficient mix designs.
Figure 9. Relationship between cement content and embodied CO2 in SCC mixtures.
The NSGA-II algorithm generated a Pareto front of 50 non-dominated SCC mix designs. The three-dimensional trade-off surface between Slump Flow, cement content, and CO2 emissions is illustrated in Figure 10.
Figure 10. Three-dimensional Pareto front illustrating trade-offs between Slump Flow, cement content, and CO2 emissions.
Compared to the average mix in the global dataset, the Pareto-optimal solutions achieved approximately 3.9% reduction in embodied CO2, 2.2% reduction in embodied energy, and 1.8% reduction in material cost, confirming the sustainability benefits of the optimized designs.

3.3.2. Constrained Single-Objective Optimization

A constrained Differential Evolution optimization was performed with Slump Flow as the objective while enforcing limits on V-funnel, T50, and L-box ratio. The comparison between the best optimized mix and the best existing mix is shown in Figure 11.
The optimized mix achieved a maximum Slump Flow of 776.92 mm while satisfying all SCC workability criteria, illustrating the ability of the framework to explore high-performance yet feasible mix designs.

3.4. External Validation Using Industrial SCC Mixes

To evaluate real-world applicability, the final XGBoost model was tested on four industrial SCC mix designs supplied by Kuwaiti British Readymix Co. W.L.L. Table 2 summarizes the external validation results.
Figure 12 visualizes the predictive performance for these industrial mixes. All four predictions fall within the ±100 mm tolerance band.
The mean absolute error (MAE = 79.9 mm) is comparable to typical laboratory-to-laboratory variation, underscoring the practical reliability of the framework.
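The two validation figures quoted here, the share of predictions inside a tolerance band and the mean absolute error, can be computed as in the sketch below. The four measured and predicted slump flows are hypothetical stand-ins, since the per-mix industrial values are reported in Table 2 and are not reproduced here.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def fraction_within(actual, predicted, tolerance):
    """Share of predictions whose absolute error does not exceed `tolerance`."""
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)

# Hypothetical slump flows (mm) for four mixes; NOT the values in Table 2.
measured = [700.0, 680.0, 720.0, 650.0]
predicted = [640.0, 750.0, 655.0, 735.0]

print(mean_absolute_error(measured, predicted))
print(fraction_within(measured, predicted, tolerance=100.0))
```

The two numbers answer different questions: a hit rate of 100% only bounds every error by the tolerance, so the MAE is still needed to see how close to that bound the errors sit.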

4. Discussion

4.1. Context, Implications, and Future Work

The results of this study confirm the central working hypothesis: a robust and interpretable machine learning framework—built upon a large, heterogeneous global dataset and enhanced through novel data augmentation—can accurately predict SCC workability and support sustainable mix design optimization. This section contextualizes the findings within prior research, discusses their broader industrial implications, and outlines limitations and avenues for future development.

4.1.1. Contextualization with Previous Studies

The predictive performance of the optimized XGBoost model (R² = 0.835 for Slump Flow and R² = 0.828 for T50; Table 1) falls slightly below the R² range of 0.85 to 0.95 typically reported for state-of-the-art models in recent literature. Direct comparisons can be misleading, however, because most previous studies rely on small and homogeneous datasets that naturally inflate performance metrics.
In contrast, this work used a significantly larger dataset (2,506 SCC mixes from 156 sources), approximately an order of magnitude larger than typical datasets. Despite this increased heterogeneity, the model maintained strong predictive accuracy, indicating good generalization capacity. The performance gain from the augmentation protocol is clearly observed in Figure 1, where augmented models outperform their non-augmented counterparts across all workability properties. This improvement is particularly meaningful when viewed alongside earlier analyses on the same global dataset, where conventional Random Forest models trained without targeted augmentation yielded only moderate R² values on heterogeneous data. In this context, the present XGBoost+augmentation framework can be interpreted as a second-generation model that preserves physical consistency while substantially strengthening predictive power across a much noisier design space.
This study also addresses several critical gaps in previous SCC machine learning efforts:
  • Generalization Proof: The external validation results in Figure 12 demonstrate the model’s successful transfer to industrial SCC mixes from Kuwait. Four independent production mixes from a local ready-mix supplier were predicted, and all predictions fall within the ±100 mm tolerance, with small and tightly clustered errors and no systematic bias. This confirms that a model trained exclusively on global academic data can generalize to real industrial conditions, providing a rare and robust demonstration of real-world applicability that goes beyond cross-validation statistics alone (see Note 1).
  • Transparency and Interpretability: The global SHAP feature importance in Figure 5 shows that the water-to-binder ratio, superplasticizer dosage, and powder content are consistently dominant, fully aligning with expected rheological behavior and reinforcing confidence in the learned relationships. These findings echo previous explainable-AI analyses on the same dataset, which independently identified water-to-binder ratio, aggregate content, and powder volume as the principal drivers of SCC workability. The close agreement between current SHAP patterns and earlier studies suggests that the improved model is not simply overfitting but is reinforcing physically meaningful trends.
  • Integrated Sustainability Assessment: The strong dependence of embodied CO2 on cement content (Figure 9) and the Pareto front of sustainable SCC designs (Figure 10) illustrate the value of coupling LCA with ML and evolutionary optimization in a unified framework. Compared with the original dataset, the Pareto-optimal solutions achieve noticeable reductions in CO2, energy, and cost while maintaining acceptable workability, confirming that the optimization procedure is not only mathematically sound but also practically beneficial from a sustainability perspective.

4.1.2. Broader Implications of the Findings

The validated ML–LCA–optimization framework carries several important implications:
  • Accelerated Sustainable Design: Engineers can rapidly explore environmentally optimized mixes guided by the Pareto front in Figure 10. These mixes achieve up to 3.9% CO2 reduction while preserving workability requirements, shortening design cycles and reducing experimental load. In combination with the optimization-validation results, which show that the vast majority of Pareto-optimal solutions satisfy standard SCC acceptance criteria, the framework effectively delivers a ready-to-use design map of feasible, greener alternatives rather than isolated “point” recommendations.
  • Enhanced Quality Control: With accurate predictions of SCC workability from mix proportions (Figure 2 and Figure 4), the model can be integrated into batching systems to provide real-time guidance and reduce the risk of non-compliant deliveries. The external validation on Kuwaiti industrial mixes indicates that the predictive errors remain small and consistent even when materials and production conditions differ from those represented in the training data. This stability suggests that the model can function as a soft sensor for quality control, flagging potentially problematic batches before casting and supporting proactive adjustments in plant operations.
  • Advancement of Data-Driven Materials Science: SHAP interaction patterns in Figure 6, Figure 7 and Figure 8 expose complex nonlinear effects and thresholds that traditional mixture design methods cannot capture, providing new mechanistic insights and hypothesis-generation opportunities. For example, the observed interaction between powder content and superplasticizer dosage, or between aggregate grading and water-to-binder ratio, may motivate targeted experimental campaigns aimed at refining existing design guidelines and updating empirical limits used in codes and company specifications.

4.1.3. Limitations of the Work

Despite strong performance, several constraints should be acknowledged:
  • Focus on Fresh Properties Only: The present framework targets workability-related fresh properties. Hardened properties such as compressive strength or durability indicators were not included but are essential for full structural optimization. In particular, the current optimization searches within a feasible fresh-state envelope but does not explicitly enforce long-term mechanical or durability constraints, which must still be checked separately.
  • LCA Data Uncertainty: The sustainability assessment is based on regional average emission factors and cost data. Real impacts may vary with supplier-specific processes, transportation distances, and energy mixes. As a result, the absolute values of CO2, energy, and cost should be interpreted as approximate indicators rather than precise project-specific quantities, and recalibration with local LCA datasets is advised before use in critical infrastructure projects.
  • Literature-Derived Dataset: Although large, the dataset is derived from published studies and may therefore carry publication biases or over-representation of certain mix types. Industrial data from under-represented regions and applications (e.g., precast elements, high-powder or low-cement SCC) remain limited. While the Kuwaiti validation partially offsets this limitation by confirming performance on unseen industrial mixes, broader multi-regional validations would further strengthen confidence in global deployment.

4.1.4. Future Research Directions

Building on the present findings, the following research directions are recommended:
  • Integration of Hardened Properties: Extend the framework to predict compressive strength, modulus of elasticity, and durability metrics, enabling fully performance-based optimization of SCC. A natural next step is to embed multi-objective optimization in a joint fresh–hardened property space, balancing workability, mechanical performance, and durability with environmental and economic indicators.
  • Advanced Decision Support: Incorporate multi-criteria decision-making (MCDM) methods to help practitioners rank or select solutions from the Pareto front based on project-specific priorities (e.g., carbon-to-cost ratio, robustness to material variability, or construction speed). This would convert the current set of Pareto-optimal mixes into an interactive decision-support tool aligned with stakeholders’ preferences.
  • Real-Time Intelligent Batching: Couple the predictive models with sensor-driven feedback from batching plants to automatically adjust mix proportions under material variability. In such a closed-loop system, the ML model would serve as a digital twin of workability, continuously updated with plant measurements and enabling adaptive control strategies that maintain SCC performance despite fluctuations in moisture content, grading, or admixture effectiveness.
  • Transfer Learning and Regional Adaptation: Develop transfer learning pipelines to adapt the globally trained model to regional datasets with minimal local data, increasing accessibility for small- and medium-sized concrete producers. The Kuwaiti industrial validation suggests that only modest local calibration may be needed for good performance; formalizing this process through transfer learning, domain adaptation, or active learning would make the framework more scalable and easier to adopt in new regions and for new material systems (e.g., LC3 binders, recycled aggregates, or novel admixtures).
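As an illustration of the multi-criteria decision-making step suggested in the list above, the sketch below ranks Pareto solutions with a simple weighted sum over min-max normalized objectives. Weighted-sum scoring is only one of many MCDM methods and is chosen here for brevity; the objective names, weights, and candidate values are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def rank_by_weighted_sum(solutions, weights):
    """Return solution indices ordered best-first.

    `solutions` is a list of dicts of objectives to be minimized;
    `weights` maps each objective name to its relative importance.
    """
    keys = list(weights)
    norm = {k: min_max([s[k] for s in solutions]) for k in keys}
    scores = [sum(weights[k] * norm[k][i] for k in keys)
              for i in range(len(solutions))]
    return sorted(range(len(solutions)), key=scores.__getitem__)

# Hypothetical Pareto-optimal mixes (CO2 in kg/m3, cost per m3)
pareto = [
    {"co2": 380.0, "cost": 95.0},
    {"co2": 395.0, "cost": 90.0},
    {"co2": 410.0, "cost": 88.0},
]
print(rank_by_weighted_sum(pareto, {"co2": 0.7, "cost": 0.3}))
print(rank_by_weighted_sum(pareto, {"co2": 0.2, "cost": 0.8}))
```

Changing the weights reverses the ranking in this toy case, which shows how project priorities, rather than the optimizer, would select the final mix from the front.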

5. Conclusions

This study successfully developed and validated a comprehensive, interpretable machine learning framework for the sustainable mix design of self-compacting concrete (SCC). By systematically addressing critical gaps in the existing literature, the work represents a significant advancement toward the industrial deployment of AI-driven materials design.
The main conclusions are:
  • Superior Generalization Capability: The framework is built on the largest and most diverse SCC dataset reported to date, comprising 2,506 mixes from 156 global sources. A three-part data augmentation protocol (Gaussian Noise, Mixup, SMOTE) significantly improved robustness and mitigated the effects of dataset heterogeneity. As a result, the XGBoost model achieved strong predictive performance with R² = 0.835 for Slump Flow and R² = 0.828 for T50.
  • Proven Real-World Applicability: In external validation on four SCC mixes from Kuwait, all four predictions fell within the industry-standard tolerance of ±100 mm (Figure 12), with a Mean Absolute Error of 79.9 mm. This provides evidence of the model’s practicality for field adoption.
  • Transparent and Physically Grounded Insights: Through comprehensive SHAP analysis (Figure 5, Figure 6, Figure 7 and Figure 8), the framework transitions from a black-box predictor to a transparent engineering tool. The model’s learned relationships are physically meaningful, identifying the water-to-binder ratio and superplasticizer dosage as the dominant parameters influencing SCC workability.
  • Holistic Sustainable Optimization: By integrating cradle-to-gate LCA (Figure 9) with NSGA-II multi-objective optimization (Figure 10), the framework generates a Pareto front of 50 non-dominated, sustainability-enhanced mix designs. These optimized mixes achieve average reductions of 3.9% in embodied CO2 and 2.2% in embodied energy compared to baseline designs.
Overall, the proposed framework offers a highly accurate, interpretable, and externally validated solution for SCC workability prediction and sustainable mix design, providing a practical and scalable pathway toward high-performance, low-carbon concrete mixtures.

Author Contributions

Conceptualization, A.A.; methodology, A.A.; software, A.A.; validation, A.A.; formal analysis, A.A.; investigation, A.A.; resources, A.A.; data curation, A.A.; writing—original draft preparation, A.A.; writing—review and editing, A.A. and S.K.; visualization, A.A.; supervision, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data and code are available from the corresponding author on reasonable request and will be provided as supplementary material.

Acknowledgments

The authors gratefully acknowledge the support of Kuwaiti British Readymix Co. W.L.L. for providing industrial SCC mix data.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

SCC Self-Compacting Concrete
ML Machine Learning
XAI Explainable Artificial Intelligence
SHAP SHapley Additive exPlanations
LCA Life Cycle Assessment
NSGA-II Non-Dominated Sorting Genetic Algorithm II

References

  1. El Asri, Y.; Benaicha, M.; Zaher, M.; Hafidi Alaoui, A. Prediction of the compressive strength of self-compacting concrete using artificial neural networks based on rheological parameters. Structural Concrete 2022, 23, 3864–3876.
  2. Cheng, B.; Mei, L.; Long, W.J.; Kou, S.; Li, L.; Geng, S. AI-guided proportioning and evaluating of self-compacting concrete based on rheological approach. Construction and Building Materials 2023, 399, 132522.
  3. Safhi, A.E.M.; Dabiri, H.; Soliman, A.; Khayat, K.H. Prediction of self-consolidating concrete properties using XGBoost machine learning algorithm: Part 1–Workability. Construction and Building Materials 2023, 408, 133560.
  4. Cakiroglu, C.; Bekdaş, G.; Kim, S.; Geem, Z.W. Explainable Ensemble Learning Models for the Rheological Properties of Self-Compacting Concrete. Sustainability 2022, 14, 14640.
  5. Safhi, A.E.M.; Dabiri, H.; Soliman, A.; Khayat, K.H. Prediction of self-consolidating concrete properties using XGBoost machine learning algorithm: Rheological properties. Powder Technology 2024, 438, 119623.
  6. Chakravarthy H G, N.; Seenappa, K.M.; Naganna, S.R.; Pruthviraja, D. Machine Learning Models for the Prediction of the Compressive Strength of Self-Compacting Concrete Incorporating Incinerated Bio-Medical Waste Ash. Sustainability 2023, 15, 13621.
  7. Cui, T.; Kulasegaram, S.; Li, H. Design automation of sustainable self-compacting concrete containing fly ash via data driven performance prediction. Journal of Building Engineering 2024, 87, 108960.
  8. Cheng, B.; Mei, L.; Long, W.J.; Kou, S.; Luo, Q.; Feng, Y. AI-guided design of low-carbon high-packing-density self-compacting concrete. Journal of Cleaner Production 2023, 428, 139318.
  9. Jiang, P.; Zhao, D.; Jin, C.; Ye, S.; Luan, C.; Tufail, R.F. Compressive strength prediction and low-carbon optimization of fly ash geopolymer concrete based on big data and ensemble learning. PLOS ONE 2024, 19, e0310422.
  10. Wang, M.; Du, M.; Jia, Y.; Chang, C.; Zhou, S. Carbon Emission Optimization of Ultra-High-Performance Concrete Using Machine Learning Methods. Materials 2024, 17, 1670.
  11. Wakjira, T.G.; Kutty, A.A.; Alam, M.S. A novel framework for developing environmentally sustainable and cost-effective ultra-high-performance concrete (UHPC) using advanced machine learning and multi-objective optimization techniques. Construction and Building Materials 2024, 416, 135114.
  12. Huang, G.; Abou-Chakra, A.; Geoffroy, S.; Absi, J. Improving the mechanical and thermal performance of bio-based concrete through multi-objective optimization. Construction and Building Materials 2024, 421, 135673.
  13. Helali, S.; Albalawi, S.; Alanazi, M.; Alanazi, B.; Bel Hadj Ali, N. Optimizing Carbon Footprint and Strength in High-Performance Concrete Through Data-Driven Modeling. Sustainability 2025, 17, 7808.
  14. Wang, S.; Xia, P.; Gong, F.; Zeng, Q.; Chen, K.; Zhao, Y. Multi objective optimization of recycled aggregate concrete based on explainable machine learning. Journal of Cleaner Production 2024, 445, 141045.
  15. Huang, P.; Dai, K.; Yu, X. Machine learning approach for investigating compressive strength of self-compacting concrete containing supplementary cementitious materials and recycled aggregate. Journal of Building Engineering 2023, 79, 107904.
  16. Fang, G.H.; Lin, Z.M.; Xie, C.Z.; Han, Q.Z.; Hong, M.Y.; Zhao, X.Y. Optimized Machine Learning Model for Predicting Compressive Strength of Alkali-Activated Concrete Through Multi-Faceted Comparative Analysis. Materials 2024, 17, 5086.
  17. Pan, B.; Liu, W.; Zhou, P.; Wu, D.O. Predicting the Compressive Strength of Recycled Concrete Using Ensemble Learning Model. IEEE Access 2025, 13, 2958–2969.
  18. Sun, B.; Cui, W.; Liu, G.; Zhou, B.; Zhao, W. A hybrid strategy of AutoML and SHAP for automated and explainable concrete strength prediction. Case Studies in Construction Materials 2023, 19, e02405.
  19. Wang, J.; Deng, J.; Li, S.; Du, W.; Zhang, Z.; Liu, X. Explainable Machine Learning for Multicomponent Concrete: Predictive Modeling and Feature Interaction Insights. Materials 2025, 18, 4456.
  20. Shanthi Vengadeshwari, R.; Ujwal, M.S.; Shiva Kumar, G.; Mahesh, R.; Sanjay, N.; Rajiv, K.N.; Pandit, P. SHAP-based prediction and optimization of compressive strength in M30 concrete with dry sewage sludge as fine aggregate replacement. Discover Materials 2025, 5, 183.
  21. Hariri-Ardebili, M.A.; Mahdavi, P.; Pourkamali-Anaraki, F. Benchmarking AutoML solutions for concrete strength prediction: Reliability, uncertainty, and dilemma. Construction and Building Materials 2024, 423, 135782.
Footnote 1. See the detailed industrial validation summary for Kuwaiti mixes for full numerical metrics and per-mix errors.
Figure 1. Effect of the data augmentation protocol on predictive performance across different machine learning models.
Figure 2. Comparison of R² scores across baseline and optimized models.
Figure 3. Comparison of RMSE values across baseline and optimized models.
Figure 4. Predicted vs. actual values for Slump Flow and T50 using the augmented XGBoost model.
Figure 5. Global SHAP feature importance comparison for all SCC workability models.
Figure 6. SHAP summary (beeswarm) plot for Slump Flow prediction.
Figure 7. SHAP dependence plots for key features affecting Slump Flow.
Figure 8. SHAP dependence plots for key features affecting T50.
Figure 11. Comparison between optimized mix design and best existing mix for all workability properties.
Figure 12. External validation results for industrial SCC mixes from Kuwait.
Table 1. Predictive performance of the optimized XGBoost model on the independent test set.
| Target Property | Metric | Value | Interpretation |
|---|---|---|---|
| Slump Flow (mm) | R² | 0.835 | Excellent correlation with observed values |
| | MAE (mm) | 38.2 | Low average absolute error |
| | RMSE (mm) | 51.9 | Acceptable prediction dispersion |
| T50 (s) | R² | 0.828 | Highly reliable correlation |
| | MAE (s) | 0.21 | Very low absolute error |
| | RMSE (s) | 0.30 | High precision in time prediction |
| V-Funnel (s) | R² | 0.751 | Good correlation for flow time |
| | MAE (s) | 0.35 | Acceptable error range |
| | RMSE (s) | | |
| L-box (H1/H2) | R² | 0.724 | Acceptable predictive correlation |
| | MAE (ratio) | 0.04 | High precision for ratio prediction |
| | RMSE | | |
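The three metrics reported in Table 1 can be computed as in the minimal sketch below. It assumes the standard definitions of R², MAE, and RMSE; the preprint does not state which implementation was used. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)      # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Hypothetical slump-flow values in mm (illustrative only, not study data).
observed = [650.0, 700.0, 720.0, 680.0]
predicted = [660.0, 690.0, 730.0, 670.0]

r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2 = {r2:.3f}, MAE = {mae:.1f} mm, RMSE = {rmse:.1f} mm")
```

Because RMSE squares each residual before averaging, it is never smaller than MAE, which is consistent with every complete MAE/RMSE pair in Table 1.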
Table 2. External validation results on industrial SCC mix designs.
| Mix ID | Target Slump Flow (mm) | Predicted (mm) | Abs. Error (mm) | Within ±100 mm? |
|---|---|---|---|---|
| Kuwait_K700_1 | 600 ± 100 | 678.9 | 78.9 | Yes |
| Kuwait_SRC_Micro | 600 ± 100 | 673.8 | 73.8 | Yes |
| Kuwait_65Nmm2 | 600 ± 100 | 684.6 | 84.6 | Yes |
| Kuwait_SRC_OPC | 600 ± 100 | 682.2 | 82.2 | Yes |

MAE = 79.9 mm; MRE = 13.3%
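The summary statistics of Table 2 can be reproduced from its four rows, as in the short sketch below. It assumes that each absolute error is measured against the nominal 600 mm target and that MRE is the mean of the absolute error divided by that target; the preprint does not spell out either definition, and the variable names are illustrative.

```python
# Values transcribed from Table 2 (external validation on industrial SCC mixes).
TARGET_MM = 600.0      # nominal target slump flow
TOLERANCE_MM = 100.0   # acceptance band around the target

predicted_mm = {
    "Kuwait_K700_1": 678.9,
    "Kuwait_SRC_Micro": 673.8,
    "Kuwait_65Nmm2": 684.6,
    "Kuwait_SRC_OPC": 682.2,
}

# Absolute error of each prediction relative to the nominal target.
abs_errors = {mix: abs(p - TARGET_MM) for mix, p in predicted_mm.items()}

# Mean absolute error across the four mixes.
mae_mm = sum(abs_errors.values()) / len(abs_errors)

# Assumption: MRE is the mean of |error| / target, expressed in percent.
mre_pct = 100.0 * sum(e / TARGET_MM for e in abs_errors.values()) / len(abs_errors)

# A prediction passes if it falls inside the +/- 100 mm band.
within_band = {mix: e <= TOLERANCE_MM for mix, e in abs_errors.items()}

print(f"MAE = {mae_mm:.1f} mm, MRE = {mre_pct:.1f}%")
print("All mixes within tolerance:", all(within_band.values()))
```

Under these assumptions the computed values round to the reported MAE of 79.9 mm and MRE of 13.3%. All four predictions sit above the 600 mm target, so the errors are one-sided rather than scattered around it.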
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.