Submitted:
10 October 2025
Posted:
10 October 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Direct and Indirect Costs Analysis in Construction Accidents
2.3. Data Mining in Construction Management Research
| Author (Year) | Model | Technique | Application | Advantage of the strategy | Disadvantages of the strategy |
| Lee et al. [23] | DT+DT | Classification & Regression | Productivity loss classification and quantification | Reduction of error compared to conventional regression models | Relies solely on tree-based classification and regression models, leaving other algorithms unexplored |
| Son and Kim [24] | ANN | Regression | Construction cost estimation | The proposed ANN model captures diverse cost-influencing factors | The small dataset and the high complexity of ANN increase the risk of overfitting |
| Chou and Tsai [25] | SVM+L, SVM+MLP, and SVM+SVR | Classification & Regression | Compressive strength prediction for high performance concrete | Superior performance over single MLP, LR, and SVR regressors, highlighting the need for combined strategies | SVM is natively a binary classification technique and may not be well suited to multiclass classification tasks where class boundaries are not clearly separable, such as the total accident cost categories in the present study |
| Tixier et al. [26] | RF and stochastic GB | Classification | Categorical safety outcomes | Justifies the use of algorithmic modeling over parametric modeling | Fails to predict one of the target variables correctly due to the noisy dataset |
| Ayhan & Tokdemir [27] | ANN | Regression | Prediction of construction incident outcome | Removes the attributes with less significance | Integration of Fuzzy Set theory to improve the vagueness of the ANN prediction |
| Pham et al. [28] | Multiple linear, Lasso, Ridge, Elastic-net, etc. | Regression | Optimization of the building cost | Allows a quick estimate of building costs and improves operational efficiency | Out of 13 algorithms, ANN, GB, and XGBoost were found to be satisfactory; however, the optimization results of the ANN model are constrained by the imposed feature limitation |
| Shahani et al. [29] | GB, Catboost, LightGBM, and XGBoost | Regression | Uniaxial compressive strength prediction | Demonstrates the efficiency of boosting models in predictive analysis | XGBoost's exceptionally high performance may suggest overfitting, particularly due to the small dataset size |
| Choi and Kim [30] | DT | Classification | Fatality prediction | Demonstrates the efficiency of DT-based classification in construction engineering | Performance was not benchmarked against other algorithms |
| Wang et al. [31] | LC+LR | Classification & Regression | ERLS classification, prediction of the displacement ductility factor and strength hardening factor | Efficient prediction using a classification–regression hybrid model | Reliance on linear functions, which may restrict its ability to capture more complex nonlinear interactions |
3. Methodology

3.1. Accident Cost Questionnaire Design
3.2. Data Collection Process and Survey Overview
4. Two-Tiered Indirect Cost Prediction Model Development
4.1. Applied Training Strategies
| No. | Questionnaire | Variable | Code | Type | No. of Elements |
| 1 | Part 1 | Construction type* | CT | Categorical | 6 (such as buildings, infrastructure, industrial construction, etc.) |
| 2 | Part 1 | Company name | CN | Nominal | Major 20 contractors and 4 public institutions |
| 3 | Part 1 | Project completion date | - | Numeric | DD/MM/YY to DD/MM/YY |
| 4 | Part 1 | Project cost | - | Numeric | Total construction project cost |
| 5 | Part 2 | Project completion rate | - | Numeric | Construction progress rate at the time of the accident |
| 6 | Part 2 | Accident date | AD | Numeric | n/a |
| 7 | Part 2 | Year | - | Numeric | 11 (2011~2022) |
| 8 | Part 2 | Day of the week* | DW | Categorical | 7 (Mon-Sun) |
| 9 | Part 2 | Work process type* | WP | Categorical | 39 (excavation, blasting, burying, etc.) |
| 10 | Part 3 | Specific occupation type of worker* | IJ | Categorical | 62 (worker's specific occupation category: earthworks, boring, bricks, masonry works, etc.) |
| 11 | Part 3 | Worker's affiliation* | WP | Categorical | 2 (worker's affiliation by operator type, i.e., contractor or subcontractor) |
| 12 | Part 3 | Length of service* | SP | Categorical | 8 (from less than 3 months to above 10 years) |
| 13 | Part 3 | Average wage | - | Numeric | Daily wage in US$ |
| 14 | Part 4 | Accident type* | AT | Categorical | 18 (falling, bumping, slipping, etc.) |
| 15 | Part 4 | Injured area* | IA | Categorical | 19 (head, eyes, hands, etc.) |
| 16 | Part 4 | Number of deaths* | - | Numeric | Number of victims who died as a result of the accident |
| 17 | Part 4 | Number of injuries* | - | Numeric | Number of victims injured as a result of the accident |
| 18 | Part 5 | Direct cost type* | DC | Categorical | 7 categories (less than 1,000 US$ to above 1 million US$) |
| 19 | Part 5 | Direct cost* | - | Numeric | 0~550,923 (US$) |
| 20 | Part 5 | Indirect cost type | IDC | Categorical | 7 categories (less than 1,000 US$ to above 1 million US$) |
| 21 | Part 5 | Indirect cost* | - | Numeric | 0~2,668,907 (US$) |
| 22 | Part 5 | Total cost type* | TAC | Categorical | 7 categories (less than 1,000 US$ to above 1 million US$) |
| 23 | Part 5 | Total accident cost | - | Numeric | 179~3,020,080 (US$) |
4.2. Selection of Accident Cost Variables Using Statistical Analysis

| No. | Questionnaire | Variable name | p-value | Significance | Inclusion |
| 1 | Part 1 | Construction type | 0.00165 | <0.05 | √ |
| 2 | Part 2 | Day of the week | 0.05835 | >0.05 | × |
| 3 | Part 2 | Work process type | 0.00025 | <0.05 | √ |
| 4 | Part 3 | Specific occupation type of worker | 2.53E-15 | <0.05 | √ |
| 5 | Part 3 | Worker's affiliation | 2.72E-13 | <0.05 | √ |
| 6 | Part 3 | Length of service | 0.06786 | >0.05 | × |
| 7 | Part 4 | Accident type | 1.01E-14 | <0.05 | √ |
| 8 | Part 4 | Injured area | 2.35E-22 | <0.05 | √ |
| 9 | Part 5 | Direct cost type | 1.12E-156 | <0.05 | √ |
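The inclusion rule in the table above can be illustrated with a minimal sketch. It only reproduces the thresholding step at a 0.05 significance level using the tabulated p-values; the statistical test that produced those p-values is not shown in this section, so none is assumed here. The names `P_VALUES`, `ALPHA`, and `select_variables` are hypothetical and chosen for illustration.

```python
# p-values copied from the variable-selection table; the underlying
# statistical test is not specified here and is NOT reproduced.
P_VALUES = {
    "Construction type": 0.00165,
    "Day of the week": 0.05835,
    "Work process type": 0.00025,
    "Specific occupation type of worker": 2.53e-15,
    "Worker's affiliation": 2.72e-13,
    "Length of service": 0.06786,
    "Accident type": 1.01e-14,
    "Injured area": 2.35e-22,
    "Direct cost type": 1.12e-156,
}

ALPHA = 0.05  # significance level used in the table's "Significance" column


def select_variables(p_values, alpha=ALPHA):
    """Split variables into (included, excluded) by the rule p < alpha."""
    included = [name for name, p in p_values.items() if p < alpha]
    excluded = [name for name, p in p_values.items() if p >= alpha]
    return included, excluded


included, excluded = select_variables(P_VALUES)
print("Included:", included)
print("Excluded:", excluded)
```

Under this rule, seven of the nine candidate variables are retained, and "Day of the week" and "Length of service" are dropped, which matches the Inclusion column.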
4.3. Two-Tiered Prediction Model Development Process

5. Results and Discussion
5.1. 1st-Tier Classification Model for The Total Accident Cost Prediction
| ML model | Training accuracy (Regular) | Training accuracy (ROS) | Training accuracy (RUS) | Testing accuracy (Regular) | Testing accuracy (ROS) | Testing accuracy (RUS) | Cross-validation score (Regular) | Cross-validation score (ROS) | Cross-validation score (RUS) |
| DT | 0.83 | 0.81 | 0.81 | 0.81 | 0.78 | 0.73 | 0.80 | 0.81 | 0.77 |
| RF | 0.82 | 0.82 | 0.82 | 0.80 | 0.80 | 0.75 | 0.80 | 0.81 | 0.76 |
| K-NN | 0.80 | 0.90 | 0.75 | 0.80 | 0.87 | 0.70 | 0.81 | 0.88 | 0.73 |
| XGBoost | 0.85 | 0.86 | 0.82 | 0.80 | 0.83 | 0.74 | 0.80 | 0.84 | 0.75 |
| ML model | Precision (Reg) | Precision (ROS) | Precision (RUS) | Recall (Reg) | Recall (ROS) | Recall (RUS) | F1-score (Reg) | F1-score (ROS) | F1-score (RUS) | Test data number (Reg) | Test data number (ROS) | Test data number (RUS) |
| DT | 0.82 | 0.85 | 0.72 | 0.8 | 0.81 | 0.74 | 0.8 | 0.81 | 0.71 | 182 | 1129 | 168 |
| RF | 0.83 | 0.88 | 0.71 | 0.82 | 0.86 | 0.75 | 0.8 | 0.86 | 0.71 | 182 | 1129 | 168 |
| K-NN | 0.81 | 0.93 | 0.64 | 0.8 | 0.91 | 0.71 | 0.8 | 0.91 | 0.67 | 182 | 1129 | 168 |
| XGBoost | 0.8 | 0.87 | 0.65 | 0.81 | 0.84 | 0.74 | 0.8 | 0.84 | 0.67 | 182 | 1129 | 168 |
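The ROS and RUS columns above are not expanded in this section; the sketch below assumes they denote random oversampling and random undersampling, the usual resampling strategies for imbalanced class labels. It is a toy, standard-library illustration of the two ideas, not the preprocessing actually used in the study. The function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampling with replacement
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):  # sampling without replacement
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy imbalanced data: 6 "low", 3 "mid", 1 "high" cost-category labels (hypothetical).
labels = ["low"] * 6 + ["mid"] * 3 + ["high"] * 1
rows = list(range(len(labels)))

ros_rows, ros_labels = random_oversample(rows, labels)
rus_rows, rus_labels = random_undersample(rows, labels)
print("ROS class counts:", Counter(ros_labels))
print("RUS class counts:", Counter(rus_labels))
```

Oversampling enlarges the dataset and undersampling shrinks it, which is consistent with the test data numbers reported in the table (1129 for ROS and 168 for RUS, against 182 for the regular model).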
5.2. 2nd-Tier Regression Model for Indirect Cost Prediction

| Serial | Construction Project | Number of recorded accidents | Direct cost: Indirect cost | Construction code (from Appendix A) |
| 1 | Apartment buildings | 432 | 1:1.05 | CT0 |
| 2 | Business facilities | 76 | 1:0.93 | CT0 |
| 3 | Medical facilities | 27 | 1:1.5 | CT0 |
| 4 | Cultural and community spaces | 23 | 1:6.76 | CT2 |
| 5 | Sewer systems | 18 | 1:3.6 | CT1 |
| No | Original indirect cost ($) | Predicted indirect cost ($) | Difference with prediction ($) | Average deviation, prediction ($) | Conventional ratio-based indirect cost (1:4 with direct cost) ($) | Difference with estimation ($) | Average deviation, estimation ($) |
| 1 | 1347.79 | 1196.70 | 151.09 | 362.23 | 24196.85 | 22849.06 | 89,852.61 |
| 2 | 50.48 | 0 | 50.48 | | 60613.58 | 60563.09 | |
| 3 | 0 | 0 | 0 | | 33390.45 | 33390.45 | |
| 4 | 812921.46 | 811667.91 | 1253.55 | | 685760.98 | 127160.47 | |
| 5 | 2872.10 | 2516.08 | 356.02 | | 208532.08 | 205659.98 | |
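The comparison in the table above can be re-derived with a short sketch that computes the mean absolute difference between the original indirect costs and each of the two estimates. The figures are copied from the five tabulated rows; `ROWS` and `mean_abs_dev` are hypothetical names. The recomputation is intended only as a cross-check of the listed rows, and the ratio-based average it yields need not coincide exactly with the tabulated value.

```python
# (original, model-predicted, ratio-based 1:4) indirect costs in US$,
# copied from the five rows of the comparison table.
ROWS = [
    (1347.79, 1196.70, 24196.85),
    (50.48, 0.0, 60613.58),
    (0.0, 0.0, 33390.45),
    (812921.46, 811667.91, 685760.98),
    (2872.10, 2516.08, 208532.08),
]


def mean_abs_dev(actual, estimate):
    """Mean of |actual - estimate| over paired observations."""
    return sum(abs(a - e) for a, e in zip(actual, estimate)) / len(actual)


original = [r[0] for r in ROWS]
model_dev = mean_abs_dev(original, [r[1] for r in ROWS])
ratio_dev = mean_abs_dev(original, [r[2] for r in ROWS])

print(f"Model prediction, mean absolute difference: {model_dev:.2f}")
print(f"1:4 ratio estimate, mean absolute difference: {ratio_dev:.2f}")
print(f"Ratio-based error is about {ratio_dev / model_dev:.0f}x the model error")
```

Because the conventional estimate is four times the direct cost, dividing the ratio-based column by 4 recovers the direct cost of each accident (for example, 24196.85 / 4 ≈ 6049.21 US$ for the first row).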
5.3. Research Findings, Significance and Practical Implementation Strategies
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Bang, S.; Jeong, J.; Lee, J.; Jeong, J.; Soh, J. Evaluation of Accident Risk Level Based on Construction Cost, Size and Facility Type. Sustainability 2023, 15.2, 1565. [CrossRef]
- [2] Ministry of Employment and Labor. 2022 Industrial Accident Status Analysis Report. Republic of Korea. Available online: https://www.moel.go.kr/policy/policydata/view.do?bbs_seq=20231201612 (accessed on 8 May 2024).
- [3] Safe Work Australia. Australian WHS Strategy 2023-2033: Baseline Report. Available online: https://www.moel.go.kr/policy/policydata/view.do?bbs_seq=20231201612 (accessed on 28 September 2025).
- [4] Selleck, R.; Cattani, M.; Hassall, M. Proposal for and validation of novel risk-based process to reduce the risk of construction site fatalities (Major Accident Prevention (MAP) program). Safety Sci. 2023. 158, 105986. [CrossRef]
- [5] Jaselskis, E.J.; Anderson, S.D.; Russell, J.S. Strategies for achieving excellence in construction safety performance. J. Constr. Eng. Manage. 1996. 122.1, 61-70. [CrossRef]
- [6] Hinze, J. Construction Safety, 2nd ed.; Prentice-Hall, Hoboken, New Jersey, U.S., 2006, pp. 63.
- [7] Pellicer, E.; Carvajal, G.I.; Rubio, M.C.; Catalá, J. A method to estimate occupational health and safety costs in construction projects. KSCE J. Civ. Eng. 2014, 18, 1955-1965. [CrossRef]
- [8] Heinrich, H.W. Industrial Accident Prevention: A Scientific Approach, 4th ed. (1st ed. 1931; 2nd ed. 1941); McGraw Hill, New York, U.S., 1959, pp. 2.
- [9] LaBelle, J. E. What do accidents truly cost? Determining Total Incident Costs. Prof. Saf. 2000. 45.4, 38–42.
- [10] Jallon, R.; Imbeau, D.; de Marcellis-Warin, N. Development of an indirect-cost calculation model suitable for workplace use. J. Safety Res. 2011. 42.3, 149-164. [CrossRef]
- [11] Brody, B.; Létourneau, Y.; Poirier, A. An indirect cost theory of work accident prevention. J. Occup. Accidents 1990. 13.4, 255-270. [CrossRef]
- [12] Hinze, J.; Appelgate, L.L. Costs of construction injuries. J. Constr. Eng. Manage. 1991. 117.3, 537-550.
- [13] Sun, L.; Paez, O.; Lee, D.; Salem, S.; Daraiseh, N. Estimating the uninsured costs of work-related accidents, part I: a systematic review. Theor. Issues Ergon. Sci. 2006. 7.3, 227–245. [CrossRef]
- [14] Manuele, F.A. Accident Costs. Prof. Saf. 2011. 56.1, 39-47.
- [15] Haupt, T.C.; Pillay, K. Investigating the true costs of construction accidents. J. Eng. Des. Technol. 2016. 14.2, 373-419. [CrossRef]
- [16] Azman, N.N.K.N.M.; Ahmad, A.C.; Derus, M.M.; Kamar, I.F.M. Determination of direct to indirect accident cost ratio for railway construction project. In Proc., MATEC Web of Conferences 2019, 266, 03009. EDP Sciences. [CrossRef]
- [17] Gavious, A.; Mizrahi, S.; Shani, Y.; Minchuk, Y. The costs of industrial accidents for the organization: developing methods and tools for evaluation and cost–benefit analysis of investment in safety. J. Loss Prev. Process Ind. 2009. 22.4, 434-438. [CrossRef]
- [18] Allison, R.W.; Hon, C.K.; Xia, B. Construction accidents in Australia: Evaluating the true costs. Safety Sci. 2019. 120, 886-896. [CrossRef]
- [19] Leopold, E.; Leonard, S. Costs of construction accidents to employers. J. Occup. Accid. 1987. 8(4), 273-294. [CrossRef]
- [20] Choi, S.D. A survey of the safety roles and costs of injuries in the roofing contracting industry. J. Saf. Health Environ. Res. 2006. 3(1), 1-20.
- [21] Teo, E.A.L.; Feng, Y. Costs of construction accidents to Singapore contractors. Int. J. Constr. Manage. 2011. 11.3, 79-92. [CrossRef]
- [22] Chen, F.; Deng, P.; Wan, J.; Zhang, D.; Vasilakos, A.V.; Rong, X. Data mining for the internet of things: literature review and challenges. Int. J. Distrib. Sens. Netw. 2015. 11(8), 431047. [CrossRef]
- [23] Lee, M.J.; Hanna, A.S.; Loh, W.Y. Decision tree approach to classify and quantify cumulative impact of change orders on productivity. J. Comput. Civ. Eng. 2004. 18.2, 132-144. [CrossRef]
- [24] Son, J.H.; Kim, C.Y. A study on the model of artificial neural network for construction cost estimation of educational facilities at conceptual stage. Korean J. Constr. Eng. Manag. 2006. 7(4), 91-99.
- [25] Chou, J.S.; Tsai, C.F. Concrete compressive strength analysis using a combined classification and regression technique. Autom. Constr. 2012. 24, 52-60. [CrossRef]
- [26] Tixier, A.J.P.; Hallowell, M.R.; Rajagopalan, B.; Bowman, D. Application of machine learning to construction injury prediction. Autom. Constr. 2016. 69, 102-114. [CrossRef]
- [27] Ayhan, B.U.; Tokdemir, O.B. Predicting the outcome of construction incidents. Safety Sci. 2019. 113, 91-104. [CrossRef]
- [28] Pham, T.Q.D.; Le-Hong, T.; Tran, X.V. Efficient estimation and optimization of building costs using machine learning. Int. J. Constr. Manage. 2021. 23.5, 909-921. [CrossRef]
- [29] Shahani, N.M.; Kamran, M.; Zheng, X.; Liu, C.; Guo, X. Application of gradient boosting machine learning algorithms to predict uniaxial compressive strength of soft sedimentary rocks at Thar Coalfield. Adv. Civ. Eng. 2021. 1-19. [CrossRef]
- [30] Choi, J.W.; Kim, H.S. Predictive Analytics Model for Death Accidents in Building Projects by Trade - Based on Decision Tree-. Korean J. Constr. Eng. Manag. 2021. 22.5, 55–65. [CrossRef]
- [31] Wang, J.; Ye, A.; Wang, X. Quantifying Easy-to-Repair Displacement Ductility and Lateral Strength of Scoured Bridge Pile Group Foundations in Cohesionless Soils: A Classification–Regression Combination Surrogate Model. J. Bridge Eng. 2023. 28.11, 04023080. [CrossRef]
- [32] Devos, O.; Ruckebusch, C.; Durand, A.; Duponchel, L.; Huvenne, J.P. Support vector machines (SVM) in near infrared (NIR) spectroscopy: Focus on parameters optimization and model interpretation. Chemometr. Intell. Lab. Syst. 2009. 96.1, 27-33. [CrossRef]
- [33] Chiang, Y.H.; Wong, F.K.W.; Liang, S. Fatal construction accidents in Hong Kong. J. Constr. Eng. Manage. 2018. 144.3, 04017121. [CrossRef]
- [34] Wong, L.; Wang, Y.; Law, T.; Lo, C. T. Association of Root Causes in Fatal Fall-from-Height Construction Accidents in Hong Kong. J. Constr. Eng. Manage. 2016. 142.7, 04016018. [CrossRef]
- [35] Hatami, S.E.; Ravandi, M.R.G.; Hatami, S.T.; Khanjani, N. Epidemiology of work-related injuries among insured construction workers in Iran. Elec. Phys. 2017. 9.11, 5841–5847. [CrossRef]
- [36] Koc, K.; Ekmekcioğlu, Ö.; Gurgun, A.P. Integrating feature engineering, genetic algorithm and tree-based machine learning methods to predict the post-accident disability status of construction workers. Autom. Constr. 2021. 131, 103896. [CrossRef]
- [37] Korea Workers’ Compensation & Welfare Service, 2018. Industrial accident insurance collection & payment status, Korea Workers’ Compensation & Welfare Service.
- [38] Vargas, W.; Aranda, S.; Costa, S. Imbalanced data preprocessing techniques for machine learning: a systematic mapping study. Knowl. Inf. Syst. 2023. 65, 31-57. [CrossRef]
- [39] Koc, K.; Ekmekcioğlu, Ö.; Gurgun, A.P. Prediction of construction accident outcomes based on an imbalanced dataset through integrated resampling techniques and machine learning methods. Eng. Constr. Archit. Manage. 2023. 30(9), 4486-4517. 10.1108/ECAM-04-2022-0305.
- [40] Choi, J.; Gu, B.; Chin, S.; Lee, J.S. Machine learning predictive model based on national data for fatal accidents of construction workers. Autom. Constr. 2020. 110, 102974. [CrossRef]
- [41] Choudhry, R.M.; Hinze, J.W.; Arshad, M.; Gabriel, H.F. Subcontracting practices in the construction industry of Pakistan. J. Constr. Eng. Manage. 2012. 138(12), 1353-1359. [CrossRef]
- [42] Nguyen, H.; Vu, T.; Vo, T.P.; Thai, H.T. Efficient machine learning models for prediction of concrete strengths. Constr. Build. Mater. 2021. 266, 120950. [CrossRef]
- [43] Takyi-Annan, G.E.; Zhang, H. A Multivariate Analysis of the Variables Impacting the Level of BIM Expertise of Professionals in the Architecture, Engineering and Construction (AEC) Industries of the Developing World Using Nonparametric Tests. Buildings 2023. 13.7, 1606. [CrossRef]
- [44] Vakharia, V.; Gujar, R. Prediction of compressive strength and portland cement composition using cross-validation and feature ranking techniques. Constr. Build. Mater. 2019. 225, 292-301. [CrossRef]
- [45] Buckland, M.; Gey, F. The relationship between recall and precision. J. Am. Soc. Inf. Sci. 1994. 45(1), 12-19.
- [46] Prabowo, R.; Thelwall, M. Sentiment analysis: A combined approach. J. Informetrics 2009. 3.2, 143-157. [CrossRef]
- [47] Attal, F.; Mohammed, S.; Dedabrishvili, M.; Chamroukhi, F.; Oukhellou, L.; Amirat, Y. Physical human activity recognition using wearable sensors. Sensors 2015. 15(12), 31314-31338. [CrossRef]
- [48] Rico-Juan, J.R.; Paz, P.T.D.L. Machine learning with explainability or spatial hedonics tools? An analysis of the asking prices in the housing market in Alicante, Spain. Expert Syst. Appl. 2021. 171, 114590. [CrossRef]
- [49] Moon, S.; Chowdhury, A. M. Utilization of prior information in neural network training for improving 28-day concrete strength prediction. J. Constr. Eng. Manage. 2021. 147(5), 04021028. [CrossRef]
- [50] Peng, H.; Wu, H.; Wang, J.; Dede, T. Research on the prediction of the water demand of construction engineering based on the BP neural network. Adv. Civ. Eng. 2020. 1.11, 8868817. [CrossRef]
- [51] Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021. 54, 1937-1967. [CrossRef]
- [52] Patel, R.S.; Akolekar, H.D. Machine-learning based optimization of a biomimiced herringbone microstructure for superior aerodynamic performance. Eng. Res. Express. 2023. 5.4, 045065. [CrossRef]


| No. | ML model | R2-score | MAE | MSE |
| 1 | DT | 0.78 | 0.13 | 0.49 |
| 2 | RF | 0.91 | 0.11 | 0.29 |
| 3 | GB | 0.95 | 0.1 | 0.21 |
| 4 | LGBM | 0.94 | 0.1 | 0.23 |
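For readers less familiar with the three columns above, the sketch below gives the standard definitions of R2-score, MAE, and MSE in plain Python. The toy vectors `y_true` and `y_pred` are hypothetical and unrelated to the study data; the scaling of the cost values behind the tabulated metrics is not stated in this section and is not assumed here.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy example (hypothetical values, for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]

print("MAE:", mae(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
print("R2: ", round(r2_score(y_true, y_pred), 4))
```

A higher R2-score and lower MAE and MSE indicate a better fit, which is the sense in which the GB and LGBM rows outperform the DT row in the table.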
| No. | Analysis Approach | Regressor | R2-score (two-tiered model) | R2-score (single regression model) |
| 1 | ML | DT regressor | 0.78 | 0.55 |
| 2 | ML | RF regressor | 0.91 | 0.79 |
| 3 | ML | GB regressor | 0.95 | 0.83 |
| 4 | ML | LGBM regressor | 0.94 | 0.43 |
| 5 | Statistical | Linear regression | 0.64 | |
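The gap between the two modelling approaches in the table above can be summarized with a small sketch that computes, for each ML regressor, the difference between the two-tiered and single-model R2-scores. The values are copied from the table; the linear regression row is omitted because it reports a single score. The names `R2`, `gains`, and `best` are hypothetical.

```python
# (two-tiered R2, single-model R2) per regressor, copied from the table.
R2 = {
    "DT": (0.78, 0.55),
    "RF": (0.91, 0.79),
    "GB": (0.95, 0.83),
    "LGBM": (0.94, 0.43),
}

# Absolute gain in R2 from using the two-tiered model, rounded to 2 decimals.
gains = {model: round(two_tier - single, 2) for model, (two_tier, single) in R2.items()}

# Regressor with the highest two-tiered R2.
best = max(R2, key=lambda model: R2[model][0])

for model, gain in gains.items():
    print(f"{model}: +{gain:.2f}")
print("Highest two-tiered R2:", best)
```

Every regressor scores higher under the two-tiered setup, with the largest gain for LGBM and the highest overall score for GB.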
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).