Submitted: 26 July 2024
Posted: 29 July 2024
Abstract
Keywords:
Introduction
Literature Review
Gradient Boosting Machine
Random Forest
Hyperparameter Tuning
Methodology
| | GBM | RF |
|---|---|---|
| bootstrap | | True |
| criterion | friedman_mse | friedman_mse |
| learning_rate | 0.1 | |
| loss | squared_error | |
| max_depth | 2, 3, …, 10 | 10, 11, …, 20 |
| max_features | sqrt | sqrt |
| min_samples_leaf | 2, 3, …, 10 | 2, 3, …, 10 |
| min_samples_split | 2, 3, …, 10 | 2, 3, …, 10 |
| n_estimators | 500, 510, …, 600 | 50, 60, …, 150 |
| subsample | 0.6 | |
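As a rough illustration of how large this search space is, the following sketch counts the configurations an exhaustive grid search would have to visit. It uses only the ranges listed in the table; the names `GBM_SPACE`, `RF_SPACE` and `grid_size` are hypothetical and chosen for this example, and the count ignores any cross-validation folds, which the table does not specify.

```python
from math import prod

# Tuned ranges as listed in the table. Settings fixed at a single value
# (criterion, max_features, etc.) are left out because they do not
# change the number of grid points.
GBM_SPACE = {
    "max_depth": range(2, 11),            # 2, 3, ..., 10
    "min_samples_leaf": range(2, 11),     # 2, 3, ..., 10
    "min_samples_split": range(2, 11),    # 2, 3, ..., 10
    "n_estimators": range(500, 601, 10),  # 500, 510, ..., 600
}
RF_SPACE = {
    "max_depth": range(10, 21),           # 10, 11, ..., 20
    "min_samples_leaf": range(2, 11),
    "min_samples_split": range(2, 11),
    "n_estimators": range(50, 151, 10),   # 50, 60, ..., 150
}


def grid_size(space):
    """Number of distinct configurations in an exhaustive grid."""
    return prod(len(values) for values in space.values())


gbm_grid = grid_size(GBM_SPACE)
rf_grid = grid_size(RF_SPACE)
print(gbm_grid, rf_grid)
```

Under these ranges the grid holds 8,019 GBM configurations and 9,801 RF configurations, which is why an exhaustive search is expected to be much slower than a sampler that evaluates only a subset of them.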
Data Definitions and Sources
- RP represents the total transaction price of residential property i during time period t, measured in HK dollars.
- GFA represents the net floor area of residential property i.
- AGE represents the age of residential property i in years, computed as the difference between the date of sale and the date of issue of the occupation permit.
- FL represents the floor level on which residential property i is located.
- E, S, W, N, NE, SE, SW and NW represent the eight orientations that residential property i may face. Each is set to 1 if the property faces that orientation and 0 otherwise. The omitted category is Northwest, so coefficients may be interpreted relative to this category.
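The construction of the age and orientation variables can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the 365-day year is an assumption (it is consistent with the minimum AGE of 0.00274 in the descriptive statistics, which is roughly one day expressed in years).

```python
from datetime import date

# Orientation labels as they appear in the descriptive statistics table.
ORIENTATIONS = ["E", "S", "W", "N", "NE", "SE", "SW", "NW"]
REFERENCE = "NW"  # omitted category


def age_in_years(permit_date, sale_date):
    """Property age: days between occupation permit and sale, in years.

    Assumes a 365-day year (an assumption made for this sketch).
    """
    return (sale_date - permit_date).days / 365


def orientation_dummies(facing):
    """One 0/1 indicator per orientation, leaving out the reference category."""
    if facing not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {facing!r}")
    return {name: int(name == facing) for name in ORIENTATIONS if name != REFERENCE}


# Example: a flat sold one day after its occupation permit, facing Southeast.
example_age = age_in_years(date(2020, 1, 1), date(2020, 1, 2))
example_dummies = orientation_dummies("SE")
```

A property facing Northwest gets zeros in all seven retained indicators, which is what makes Northwest the baseline for comparison.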
Exploratory Data Analysis
Results and Discussions
Conclusions
Funding
Declaration of Conflicting Interests



| | RP | GFA | AGE | FL | E | S | W | N | NE | SE | SW | NW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 | 7652 |
| Mean | 3.65468 | 560.21445 | 6.45791 | 28.64310 | 0.07462 | 0.07475 | 0.08900 | 0.06404 | 0.26411 | 0.06730 | 0.24804 | 0.11814 |
| Std | 0.99348 | 94.34913 | 5.18221 | 15.94853 | 0.26280 | 0.26301 | 0.28476 | 0.24483 | 0.44089 | 0.25056 | 0.43190 | 0.32279 |
| Min | 0.51868 | 402 | 0.00274 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 25% | 2.918350 | 506 | 1.86575 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 50% | 3.31126 | 552 | 5.89041 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 75% | 4.01398 | 559 | 9.50137 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Max | 8.21933 | 851 | 20.09589 | 53 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Skew | 1.33763 | 1.35301 | 0.66926 | -0.17527 | 3.238178 | 3.23458 | 2.88745 | 3.56226 | 1.07033 | 3.45472 | 1.16705 | 2.36659 |
| | GBM (Optuna) | GBM (Random Search) | GBM (Grid Search) | RF (Optuna) | RF (Random Search) | RF (Grid Search) |
|---|---|---|---|---|---|---|
| bootstrap | | | | True | True | True |
| criterion | friedman_mse | friedman_mse | friedman_mse | friedman_mse | friedman_mse | friedman_mse |
| learning_rate | 0.1 | 0.1 | 0.1 | | | |
| loss | squared_error | squared_error | squared_error | | | |
| max_depth | 6 | 5 | 5 | 18 | 19 | 19 |
| max_features | sqrt | sqrt | sqrt | sqrt | sqrt | sqrt |
| min_samples_leaf | 9 | 10 | 10 | 2 | 2 | 2 |
| min_samples_split | 9 | 3 | 2 | 3 | 4 | 2 |
| n_estimators | 520 | 500 | 500 | 130 | 130 | 110 |
| subsample | 0.6 | 0.6 | 0.6 | | | |
| | GBM (Optuna) | GBM (Random Search) | GBM (Grid Search) | RF (Optuna) | RF (Random Search) | RF (Grid Search) |
|---|---|---|---|---|---|---|
| R² | 0.96677 (0.91837) | 0.95480 (0.91774) | 0.95480 (0.91774) | 0.96256 (0.91879) | 0.96318 (0.91837) | 0.96306 (0.91851) |
| MAE | 0.11442 (0.16595) | 0.13285 (0.17022) | 0.13285 (0.17022) | 0.10740 (0.16349) | 0.10638 (0.16383) | 0.10686 (0.16363) |
| MSE | 0.03279 (0.08060) | 0.04460 (0.08122) | 0.04460 (0.08122) | 0.03694 (0.08018) | 0.03633 (0.08060) | 0.03645 (0.08046) |
| MAPE (%) | 3.80679 (5.66011) | 4.45201 (5.74195) | 4.45201 (5.74195) | 3.71744 (5.54538) | 3.68398 (5.66610) | 3.69441 (5.56038) |
| RMSE | 0.18107 (0.28391) | 0.21119 (0.28499) | 0.21119 (0.28499) | 0.19220 (0.28317) | 0.19062 (0.28390) | 0.19092 (0.28366) |
| Computation time (seconds) | 290.60 | 2,551.43 | 20,488.57 | 174.62 | 974.02 | 9,361.18 |
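For readers who want to reproduce error measures of this kind, the sketch below implements the standard textbook definitions of the coefficient of determination, MAE, MSE, MAPE and RMSE in plain Python. It is an assumption that the reported figures follow these definitions; the function name `regression_metrics` and the toy inputs are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Standard regression error measures for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        # MAPE is undefined when an actual value is zero.
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "RMSE": sqrt(mse),
    }


# Toy data, not taken from the study.
metrics = regression_metrics([3.0, 4.0, 5.0], [2.5, 4.0, 5.5])
```

Because RMSE is the square root of MSE, the two rows of the table can be cross-checked against each other: for example, the square root of 0.03279 is about 0.1811, in line with the reported 0.18107.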
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).