Submitted: 04 June 2024
Posted: 05 June 2024
Abstract
Keywords:
Introduction
Literature Review
Gradient Boosting Machine
Random Forest
Hyperparameter Tuning
Methodology
Data Definitions and Sources
Exploratory Data Analysis
Results and Discussions
Conclusions
Funding
Declaration of Conflicting Interests
References

| Hyperparameter | GBM | RF |
|---|---|---|
| bootstrap | | True |
| criterion | friedman_mse | friedman_mse |
| learning_rate | 0.1 | |
| max_depth | 6, 7, 8, 9, 10 | 6, 7, 8, 9, 10 |
| max_features | sqrt | sqrt |
| min_samples_leaf | 2, 3, 4, 5, 6 | 2, 3, 4, 5, 6 |
| min_samples_split | 2, 3, 4, 5, 6 | 2, 3, 4, 5, 6 |
| n_estimators | 560, 570, 580, 590, 600 | 40, 50, 60, 70, 80 |
| subsample | 0.6 | |
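
The search space in the table above can be enumerated directly, which gives a feel for how many candidates an exhaustive grid search has to evaluate. The sketch below is illustrative only: it uses plain Python, the dictionary and function names (`GBM_SPACE`, `RF_SPACE`, `iter_grid`, `grid_size`) are hypothetical, and it assumes that `bootstrap` applies to RF while `learning_rate` and `subsample` apply to GBM. The source does not state which library or cross-validation scheme was used.

```python
from itertools import product

# Candidate values copied from the search-space table.
# Assumption: bootstrap is an RF setting; learning_rate and subsample are GBM settings.
# Single-valued entries are held fixed and do not enlarge the grid.
GBM_SPACE = {
    "criterion": ["friedman_mse"],
    "learning_rate": [0.1],
    "max_depth": [6, 7, 8, 9, 10],
    "max_features": ["sqrt"],
    "min_samples_leaf": [2, 3, 4, 5, 6],
    "min_samples_split": [2, 3, 4, 5, 6],
    "n_estimators": [560, 570, 580, 590, 600],
    "subsample": [0.6],
}

RF_SPACE = {
    "bootstrap": [True],
    "criterion": ["friedman_mse"],
    "max_depth": [6, 7, 8, 9, 10],
    "max_features": ["sqrt"],
    "min_samples_leaf": [2, 3, 4, 5, 6],
    "min_samples_split": [2, 3, 4, 5, 6],
    "n_estimators": [40, 50, 60, 70, 80],
}


def iter_grid(space):
    """Yield every combination of candidate values as a parameter dict."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def grid_size(space):
    """Count the combinations an exhaustive grid search would evaluate."""
    return sum(1 for _ in iter_grid(space))


print("GBM grid size:", grid_size(GBM_SPACE))
print("RF grid size:", grid_size(RF_SPACE))
```

Four hyperparameters with five candidates each give 5^4 = 625 combinations per model, before multiplying by the number of cross-validation folds. Random search and Optuna sample from the same space rather than visiting every combination.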


| Statistic | RP | GFA | AGE | FL | E | S | W | N | NE | SE | SW | NW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 |
| Mean | 3.65468 | 560.21445 | 6.45791 | 28.64310 | 0.07462 | 0.07475 | 0.08900 | 0.06404 | 0.26411 | 0.06730 | 0.24804 | 0.11814 |
| Std | 0.99348 | 94.34913 | 5.18221 | 15.94853 | 0.26280 | 0.26301 | 0.28476 | 0.24483 | 0.44089 | 0.25056 | 0.43190 | 0.32279 |
| Min | 0.51868 | 402 | 0.00274 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25% | 2.91835 | 506 | 1.86575 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 50% | 3.31126 | 552 | 5.89041 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 75% | 4.01398 | 559 | 9.50137 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Max | 8.21933 | 851 | 20.09589 | 53 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Skew | 1.33763 | 1.35301 | 0.66926 | -0.17527 | 3.238178 | 3.23458 | 2.88745 | 3.56226 | 1.07033 | 3.45472 | 1.16705 | 2.36659 |

| Hyperparameter | GBM (Optuna) | GBM (Random Search) | GBM (Grid Search) | RF (Optuna) | RF (Random Search) | RF (Grid Search) |
|---|---|---|---|---|---|---|
| bootstrap | | | | True | True | True |
| criterion | friedman_mse | friedman_mse | friedman_mse | friedman_mse | friedman_mse | friedman_mse |
| learning_rate | 0.1 | 0.1 | 0.1 | | | |
| max_depth | 6 | 6 | 6 | 10 | 10 | 10 |
| max_features | sqrt | sqrt | sqrt | sqrt | sqrt | sqrt |
| min_samples_leaf | 6 | 4 | 4 | 2 | 2 | 2 |
| min_samples_split | 6 | 2 | 2 | 5 | 6 | 6 |
| n_estimators | 560 | 560 | 560 | 80 | 80 | 80 |
| subsample | 0.6 | 0.6 | 0.6 | | | |

| Metric | GBM (Optuna) | GBM (Random Search) | GBM (Grid Search) | RF (Optuna) | RF (Random Search) | RF (Grid Search) |
|---|---|---|---|---|---|---|
| R² | 0.97312 (0.91728) | 0.97768 (0.91564) | 0.97768 (0.91564) | 0.93429 (0.90745) | 0.93328 (0.90773) | 0.93328 (0.90773) |
| MAE | 0.10407 (0.16529) | 0.09774 (0.16936) | 0.09774 (0.16936) | 0.15878 (0.18427) | 0.15957 (0.18297) | 0.15957 (0.18297) |
| MSE | 0.02652 (0.08168) | 0.02202 (0.08329) | 0.02202 (0.08329) | 0.02652 (0.08168) | 0.06583 (0.09111) | 0.06583 (0.09111) |
| MAPE (%) | 3.431394 (5.63212) | 3.17283 (5.77724) | 3.17283 (5.77724) | 5.38893 (6.13946) | 5.41679 (6.10164) | 5.41679 (6.10164) |
| RMSE | 0.16285 (0.28579) | 0.14840 (0.28860) | 0.14840 (0.28860) | 0.25463 (0.30230) | 0.25657 (0.30184) | 0.25657 (0.30184) |
| Computation time (seconds) | 372.83 | 2,256.82 | 2,200.86 | 65.3 | 282.07 | 273.26 |

Notes: Figures are values for the training set; figures in brackets are values for the test set.
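
The error metrics reported in the table above have standard definitions, sketched below in plain Python. This is a minimal illustration, not the authors' implementation: the function names and the toy values are hypothetical, and the coefficient of determination R² is included as an assumption because it commonly accompanies these metrics.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; targets must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from the study's data.
y_true = [3.0, 4.0, 5.0]
y_pred = [2.5, 4.0, 5.5]

for name, metric in [("MAE", mae), ("MSE", mse), ("RMSE", rmse), ("MAPE (%)", mape), ("R2", r2)]:
    print(name, round(metric(y_true, y_pred), 5))
```

Because RMSE is the square root of MSE, the two rows of a results table can be cross-checked against each other; a gap between training and test values on any of these metrics is the usual indicator of overfitting.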

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).