Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Predicting UK Housing Price using Machine Learning Algorithms

Version 1 : Received: 3 April 2024 / Approved: 4 April 2024 / Online: 4 April 2024 (12:58:30 CEST)

How to cite: Ogundeji, G.A.; Pitts, D.D.A.; Sun, Y.; Ghafoor, M. Predicting UK Housing Price using Machine Learning Algorithms. Preprints 2024, 2024040343. https://doi.org/10.20944/preprints202404.0343.v1

Abstract

The housing market is among the most difficult markets in which to value prices, and prices continue to fluctuate, so reliable predictive algorithms for house prices are a constant need for socio-economic advancement and the welfare of citizens. In this paper, we develop machine learning algorithms for forecasting UK housing prices and identify the algorithm that forecasts prices most accurately in the presence of many features (covariates). After applying correlation analysis to remove correlated variables and so avoid multicollinearity, thereby increasing statistical power, we adopt a novel method: regression analysis is first used to understand and select the statistically significant features for the various regions of England, grouped according to the North-South divide. These features are then used in the machine learning algorithms, with the aim of further increasing their statistical power, their accuracy, and ultimately their predictive value. Model construction involves three stages: (1) correlation analysis to identify and remove correlated variables, thereby avoiding multicollinearity and increasing the statistical power of the linear regression; (2) linear regression to determine which variables are statistically significant; and (3) building the machine learning algorithms on the variables found to be statistically significant by the linear regression. A comprehensive dataset of UK paid housing prices from 2010 to 2019 was linked to a number of other datasets to generate a total of 21 variables (features) used for the models. Catboost, Gradient Boosting, Bagging, Random Forest, and Extra Tree all achieved excellent performance in every region considered. A comparison of the seven models showed that the Extra Tree algorithm consistently achieved the highest accuracy in all regions. K-Nearest Neighbours (KNN) was the only algorithm with an accuracy below 50%. Notably, the set of statistically insignificant variables differed between regions, implying that although many variables are statistically significant in all regions, there are regional differences that matter when modelling or predicting housing prices. This study validates the practicability of developing a machine learning methodology for the prediction of housing prices and offers a reference for future house price prediction based on machine learning.
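
The three-stage construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the correlation threshold (0.8), the significance level (0.05), the helper names, the Extra Trees hyperparameters, and the synthetic toy data with its column names are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split


def drop_correlated(X: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Stage 1: drop the later feature of every pair with |r| > threshold (assumed cut-off)."""
    corr = X.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)


def significant_features(X: pd.DataFrame, y: pd.Series, alpha: float = 0.05) -> list:
    """Stage 2: fit ordinary least squares and keep features with t-test p-value < alpha."""
    A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])  # add intercept
    target = y.to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ beta
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof                     # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)            # covariance of the coefficients
    t_stat = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stat), dof)   # two-sided p-values
    # Skip the intercept (index 0) when mapping p-values back to column names.
    return [c for c, p in zip(X.columns, p_values[1:]) if p < alpha]


def fit_region_model(X: pd.DataFrame, y: pd.Series, seed: int = 0):
    """Stage 3: train an Extra Trees regressor and return it with its held-out R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = ExtraTreesRegressor(n_estimators=100, random_state=seed)
    model.fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)


def make_toy_data(n: int = 400, seed: int = 0):
    """Synthetic stand-in for one region's data; columns are hypothetical."""
    rng = np.random.default_rng(seed)
    floor_area = rng.normal(90, 20, n)
    X = pd.DataFrame({
        "floor_area": floor_area,
        "rooms": floor_area / 25 + rng.normal(0, 0.1, n),  # nearly collinear with floor_area
        "distance_km": rng.uniform(0, 30, n),
        "noise_feature": rng.normal(0, 1, n),              # unrelated to price
    })
    price = 2000 * X["floor_area"] - 1500 * X["distance_km"] + rng.normal(0, 5000, n)
    return X, price


X, y = make_toy_data()
X_filtered = drop_correlated(X)                   # stage 1
selected = significant_features(X_filtered, y)    # stage 2
model, r2 = fit_region_model(X_filtered[selected], y)  # stage 3
print("kept after stage 1:", list(X_filtered.columns))
print("selected in stage 2:", selected)
print("held-out R^2:", round(r2, 3))
```

In a regional analysis of the kind the abstract describes, the same three calls would be repeated on each region's subset of the data, so that stage 2 can return a different feature list per region.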

Keywords

House pricing; Catboost; Gradient Boosting; Bagging; Random Forest; Extra Tree; KNN; ANN; UK paid housing price

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
