Preprint
Review

This version is not peer-reviewed.

Integrating Machine Learning and Hedonic Regression for Housing Price Prediction: A Systematic International Review of Model Performance and Interpretability

Submitted:

08 August 2025

Posted:

08 August 2025


Abstract
It is becoming increasingly important to predict property prices to mitigate investment risk, establish policies, and preserve market stability. To determine the practical utility and expected performance of the sophisticated statistical and machine learning models that have emerged, a comparative analysis is required. The purpose of this systematic review is to assess the predictive effectiveness and interpretability of hedonic regression and complex machine learning models in the estimation of housing prices across a wide range of international contexts. In May 2024, a thorough search was conducted in Scopus, Google Scholar, and Web of Science. The search terms included "hedonic pricing models," "machine learning," and "housing price prediction," in addition to others. The inclusion criteria required empirical research published after 2000, a comparison of at least two predictive models, and reliable transaction data. Research that used non-empirical methodologies or web-scraped prices was excluded. Twenty-three studies met the eligibility criteria. The review was conducted in accordance with the PRISMA 2020 reporting guidelines. Random Forest was the most frequently employed and consistently high-performing model, appearing in 14 of 23 studies and ranking as the best-performing model in five. Despite their lower predictive accuracy, hedonic regression models provided essential explanatory insights into key variables, such as proximity to urban centers, property characteristics, and location. The integration of hedonic and machine learning models improved the interpretability and accuracy of the predicted results.
Many of the studies included in this review were longitudinal, covered a diverse range of international contexts (specifically, Asia, Europe, America, and Australia), and reflected a rise in research output after 2020. Even though hedonic models retain a significant amount of explanatory power, the precision of home price predictions is improved by machine learning, particularly Random Forest and neural networks. Researchers, real estate professionals, and policymakers who aim to improve market transparency and inform effective policy decisions obtain the best results by integrating these techniques.

Introduction

Importance of Housing Price Prediction

It is essential to predict home prices, as residential property is the primary asset for most individuals and influences macroeconomic trends (Mark & Kim, 2007). The distribution of wealth, investment strategies, housing affordability, and market stability are all directly influenced by price changes. House prices are determined by a complex combination of property-specific and environmental factors. External variables, including location, neighborhood socioeconomics, accessibility, and environmental quality, interact with internal elements, including structural quality, size, and amenities (Wang, 2023; Bartholomew & Ewing, 2011). Real estate valuation is also influenced by regional characteristics, including accessibility to transportation centers, commercial activity, and local economic conditions, some of which are difficult to quantify (Howard & Liebersohn, 2023).

Traditional Approaches and Hedonic Price Models

Conventional real estate appraisal, which relies on expert opinion and on comparison- or cost-based approaches, is inherently subjective and variable (Jayantha & Oladinrin, 2019). Hedonic price models (HPMs) were developed to estimate the implicit value of property attributes from observable transaction data using regression. The theoretical foundation of HPMs, which is derived from Lancaster’s consumer theory and Rosen’s framework of heterogeneous commodities, makes it possible to assess the contribution of each characteristic to the overall value of a property (Goodman, 1998; Rosen, 1974; Diewert et al., 2011). HPMs have been extensively employed to examine the spatial and structural determinants of housing prices in a variety of market settings (Hwang & Quigley, 2006; Xiao, 2017).
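To make the idea of implicit prices concrete, the following is a minimal sketch of a log-linear hedonic regression estimated by ordinary least squares. It is an illustration only: the dwellings, the two attributes (floor area and distance to the central business district), and the coefficients used to generate the synthetic prices are all assumptions, not data or specifications taken from the reviewed studies.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations update both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_hedonic(rows, log_prices):
    """OLS via the normal equations (X'X) beta = X'y; intercept comes first."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, log_prices)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical dwellings: (floor area in m2, distance to CBD in km).
homes = [(50, 2.0), (80, 5.0), (120, 1.0), (65, 10.0), (95, 7.5), (140, 3.0)]
# Synthetic log prices built from assumed implicit prices, so the fit can be checked.
TRUE_BETA = (11.0, 0.010, -0.050)
log_prices = [TRUE_BETA[0] + TRUE_BETA[1] * a + TRUE_BETA[2] * d for a, d in homes]

beta = fit_hedonic(homes, log_prices)
# In a log-linear model, 100 * (exp(b) - 1) is the percent price change per unit.
pct_per_km = 100 * (math.exp(beta[2]) - 1)
print(beta, round(pct_per_km, 2))
```

Each estimated coefficient is read as the implicit price of one attribute with the others held constant, which is the interpretability property that the reviewed studies attribute to HPMs.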

Methodological Challenges in Hedonic Modeling

The widespread use of HPMs has revealed serious methodological issues. Multicollinearity among predictors, omitted variable bias, and an inherent assumption of linear relationships between attributes and price are significant problems (Schläpfer et al., 2015). Conventional regression models struggle to capture the intricate, nonlinear dynamics of real estate markets. The rapid expansion of available data, which includes high-resolution geographical information and user-generated imagery, has outstripped the explanatory capacity of linear techniques (Glaeser et al., 2016). Consequently, research has shifted to machine learning (ML) approaches that can detect high-dimensional relationships and nonlinearities without the constraints of parametric modeling (Rouhiaine, 2018; Lundberg & Lee, 2017).

Emergence of Machine Learning in Real Estate Analytics

In recent years, real estate analytics has seen growing use of machine learning, including algorithms such as Random Forests, artificial neural networks (ANNs), gradient boosting machines (GBMs), support vector machines (SVMs), and ensemble strategies. These algorithms draw on a diverse array of inputs, such as spatial coordinates, environmental indicators, and image-derived attributes, to identify complex patterns in large, heterogeneous datasets (Barzegar et al., 2016). The capacity of neural networks to extract contextual information from images has improved the accuracy of price forecasts (Wang, 2023). Random Forest and other tree-based models have also demonstrated the capacity to handle the variable interactions and data anomalies that occur in property transaction records (Rigatti, 2017; Chen et al., 2022).

Gaps in the Existing Literature

The literature remains uncertain regarding the practical trade-offs between traditional hedonic models and machine learning, despite an increase in research driven by machine learning. Although numerous narrative and bibliometric reviews have documented the evolution of mass appraisal and the use of artificial intelligence in real estate, these studies often concentrate on thematic trends or technical summaries rather than offering a systematic, empirical evaluation of model performance (Jayantha & Oladinrin, 2019; Wang & Li, 2019). The hedonic effects of development form were investigated by Bartholomew and Ewing (2011), but they did not capitalize on the advancements in predictive modeling tools. The increasing significance of machine learning is acknowledged in other systematic evaluations, such as those conducted by Wang and Li (2019). However, these evaluations do not systematically analyze prediction accuracy or interpretability among approaches, nor do they differentiate the settings or data sources in which each model performs well. The investigation into whether model complexity results in significant improvements in prediction accuracy when used with transaction data, as well as the integration of hedonic modeling with machine learning within the same empirical context, is critically underexplored.

Limitations of Previous Reviews

Many existing studies and reviews are limited by their inclusion criteria, which sometimes admit data based on advertised prices collected through web scraping rather than actual sales transactions. This undermines the credibility and policy relevance of their findings (Rico-Juan & Taltavull, 2021). Systematic syntheses of the variables that consistently prove most influential in empirical models are uncommon, and few reviews evaluate the performance of predictive models across international markets, despite the acknowledged differences in data quality, market structure, and institutional context.

Objectives and Review Question

This systematic review addresses these gaps by conducting a comparative synthesis of empirical research that applied predictive models, including hedonic regression and machine learning approaches, to actual residential property transaction data. The review covers studies published after 2000, which captures the evolution of the data environment and of modeling methods. To maintain rigor and comparability, only studies that empirically evaluate two or more predictive models within the same sample and report objective performance metrics such as R², MAE, and RMSE are included (Shmueli, 2010; Rigatti, 2017).
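For reference, the three performance metrics named above can be computed as in the following minimal sketch. The observed prices and the predictions of the two models are hypothetical values chosen for illustration, not results from any reviewed study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical sale prices (in thousands) and predictions from two models.
observed = [200.0, 300.0, 400.0, 500.0]
model_a = [210.0, 290.0, 410.0, 490.0]
model_b = [250.0, 250.0, 450.0, 450.0]

for name, pred in (("model_a", model_a), ("model_b", model_b)):
    print(name, mae(observed, pred), rmse(observed, pred), round(r_squared(observed, pred), 3))
```

Lower MAE and RMSE and higher R² indicate a better fit. Comparisons are only meaningful when the models are evaluated on the same sample, which is why the review requires that condition.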
The primary objective is to comprehensively identify, classify, and assess the predictive models that are employed to estimate home prices, with an emphasis on the interpretability and relative accuracy of hedonic regression and machine learning techniques. The objective of the assessment is to determine the characteristics that are most consistently linked to fluctuations in housing prices. This will enable practitioners and policymakers to make well-informed decisions regarding the selection of models and the objectives of data collection.

Methods

This systematic review followed the PRISMA 2020 guidelines to ensure rigor and transparency (Page et al., 2021). The protocol was not registered in advance on PROSPERO because the topic falls outside the subject domains that the registry covers.

Eligibility Criteria

The eligibility criteria were established using the PICO framework. Studies were eligible for inclusion if they (a) compared a minimum of two predictive modeling techniques, including hedonic regression and one or more advanced machine learning algorithms (such as Random Forest, neural networks, gradient boosting, or support vector machines), (b) reported at least one objective performance metric (e.g., R², MAE, RMSE), and (c) were published between January 2000 and May 2024. Only peer-reviewed literature in English or Spanish was considered. Studies were excluded if they used simulated or non-empirical data or relied on web-scraped or advertised prices instead of real transactions. Book chapters, conference proceedings, dissertations, technical reports, editorials, and opinion pieces were also excluded.

Literature Search Strategy

In May 2024, a thorough literature search was conducted using Scopus, Google Scholar, and Web of Science. These sources collectively cover a wide range of disciplines and include extensive published research in computational modeling, economics, and real estate (Chen et al., 2022). The search was conducted from May 1 to May 30, 2024, and used Boolean combinations of terms, including “housing price prediction,” “housing price models,” “machine learning,” “hedonic price models,” “artificial neural networks,” “support vector machine,” and “random forest.” Searches targeted article titles, abstracts, and keywords. The full search strategies and search strings are provided in the supplementary materials.
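As an illustration of how such Boolean combinations can be assembled, the sketch below joins the listed terms into a single query string. The grouping of terms into an outcome group and a method group, and the helper name `build_boolean_query`, are assumptions made for this example; the strings actually used are those in the supplementary materials, and each database applies its own field syntax.

```python
def build_boolean_query(target_terms, method_terms):
    """Combine two term groups as (A OR B) AND (C OR D ...)."""
    def any_of(terms):
        # Quote each phrase so databases treat it as an exact phrase.
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return any_of(target_terms) + " AND " + any_of(method_terms)

# Terms taken from the search description; the two-group split is hypothetical.
target_terms = ["housing price prediction", "housing price models"]
method_terms = [
    "machine learning",
    "hedonic price models",
    "artificial neural networks",
    "support vector machine",
    "random forest",
]

query = build_boolean_query(target_terms, method_terms)
print(query)
```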

Study Selection Process

Study selection proceeded in two phases. Duplicates were removed first. Two independent reviewers then screened titles and abstracts for relevance before assessing the full texts for eligibility. Disagreements were resolved through discussion and, when necessary, by consulting a third reviewer to reach consensus on inclusion. A PRISMA flow diagram documents the selection process and the reasons for excluding specific studies. All screening records were retained.

Data Extraction

Using a standardized template that was validated before its formal use, the two reviewers extracted data independently. The following elements were extracted: the author’s name, the year and country of the study, the study design (cross-sectional or longitudinal), the sample size, the models compared, the type and duration of the transaction data, the variables analyzed, the performance metrics (R², MAE, RMSE, MAPE), and the principal conclusions regarding the accuracy and utility of the models. Disagreements about extracted data were resolved by consensus. All data were managed in Excel spreadsheets; no automated extraction tools were used.

Quality Assessment

Study quality was evaluated by assessing the clarity of the data sources, the robustness of the analytical methodology, and the completeness of the comparative results. Studies had to state the source and date of the transaction data explicitly and be methodologically transparent enough to permit critical assessment or replication. Studies that omitted critical elements or that reported only theoretical results without empirical support were excluded. A formal risk-of-bias tool, such as Cochrane RoB, was not used because such tools target clinical research, whereas this review concerns the practical performance of predictive models. The validity and reproducibility of the data were the primary concerns of the assessment.

Data Synthesis

The data were synthesized using a thematic and narrative framework. Meta-analysis was not feasible because of substantial differences in geographic context, modeling techniques, and data presentation. To determine how often, and under what circumstances, each model outperformed the others, we compiled comparative performance metrics (R², MAE, and RMSE). Studies were categorized by the type of model used. The findings were analyzed to identify the characteristics that consistently have the most influence in predictive modeling, as well as any contextual factors that may affect model selection or performance.
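The tally of how often each model outperformed the others can be sketched as a simple vote count. The records below are hypothetical placeholders, not the extracted data, and the record layout is an assumption; the key detail is that the winner is the minimum for error metrics and the maximum for R².

```python
from collections import Counter

# Hypothetical extraction records, NOT the studies included in the review.
records = [
    {"study": "A", "metric": "RMSE", "scores": {"HPM": 41.0, "RF": 33.5, "ANN": 36.2}},
    {"study": "B", "metric": "R2", "scores": {"HPM": 0.71, "RF": 0.88, "GBM": 0.86}},
    {"study": "C", "metric": "MAE", "scores": {"HPM": 18.0, "SVM": 19.4}},
]

# R2 improves as it rises; MAE and RMSE improve as they fall.
HIGHER_IS_BETTER = {"R2"}

def best_model(record):
    """Return the model with the best score under the record's metric."""
    pick = max if record["metric"] in HIGHER_IS_BETTER else min
    return pick(record["scores"], key=record["scores"].get)

wins = Counter(best_model(r) for r in records)
appearances = Counter(model for r in records for model in r["scores"])

for model, n in appearances.most_common():
    print(f"{model}: compared in {n} studies, best in {wins[model]}")
```

Vote counting of this kind reports frequencies only. It does not weight studies by sample size or effect magnitude, which is consistent with the narrative synthesis described above rather than a meta-analysis.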

Results

Study Selection and Characteristics

A total of 516 studies on housing price prediction models were identified through the search. After the removal of duplicates and papers that failed to meet the eligibility criteria, 23 empirical studies were selected for inclusion in this review. These publications provide a substantial sample for comparative synthesis, as they cover a wide range of countries, modeling methodologies, and analytical frameworks.

Screening and Exclusion Process

After the initial screening, 212 articles were excluded for inadequate methodological detail (e.g., data source, study year), 43 for relying on web-scraped or advertised prices, 52 for concentrating on non-housing markets, 21 for failing to compare multiple prediction models, and 28 for lacking empirical application. This stringent exclusion produced a focused sample of 23 studies that satisfied the inclusion criteria.
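The reported counts can be reconciled with a short arithmetic check. The figures come from the text above; the interpretation of the remainder is an assumption, since the text does not itemize it (it may correspond to duplicates removed before screening).

```python
identified = 516
included = 23

# Exclusion reasons and counts as reported in the screening description.
exclusions = {
    "inadequate methodological detail": 212,
    "web-scraped or advertised prices": 43,
    "non-housing markets": 52,
    "no comparison of multiple models": 21,
    "no empirical application": 28,
}

itemized = sum(exclusions.values())       # exclusions with a stated reason
removed_total = identified - included     # all records removed
not_itemized = removed_total - itemized   # removals without a stated reason

print(itemized, removed_total, not_itemized)
```

The itemized reasons account for 356 of the 493 removed records, leaving 137 whose reason for removal is not stated in this section.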

Overview of Modeling Approaches

Random Forest (RF) was the most prevalent advanced machine learning method, appearing in 14 of the 23 studies. It is recognized for its high prediction accuracy, as measured by R², MAE, and RMSE. RF outperformed all other models in five comparisons, particularly when the datasets were large and complex and contained nonlinearities and high-dimensional interactions. In Australia and South Korea, RF outperformed linear regression, decision trees, and gradient boosting machines in terms of explanatory power and error rates (Soltani et al., 2022; Hong et al., 2020).
Artificial neural networks (ANNs), support vector machines (SVMs), gradient boosting machines (GBMs, such as XGBoost and LightGBM), decision trees, hedonic price models (HPMs), and numerous ensemble methods were also frequently examined. In studies with sample sizes exceeding 10,000, classical regression was consistently outperformed by machine learning models, including Random Forest, Gradient Boosting Machines, and Artificial Neural Networks, in terms of predictive accuracy. In some cases, simpler models, such as linear regression or decision trees, were as effective as or more effective than complex ones for smaller or less complex datasets, especially when interpretability of the results was critical (Begum et al., 2022, 2024; Hoxha, 2024).
Most studies used HPMs as a benchmark and as a means of assessing the significance and influence of individual variables. Although HPMs showed which variables had the greatest effect on price, ML models frequently achieved higher accuracy, particularly where nonlinearity or variable interactions were pronounced (Rico-Juan & Taltavull, 2021).

Key Predictors of Housing Prices

Across the studies, nine factors were consistently identified as significant predictors of housing prices. These factors include the property’s location (area/neighborhood), its distance from the central business district, its structural features (such as size, number of rooms, and presence of amenities), its distance from transportation infrastructure, the socioeconomic characteristics of the neighborhood, environmental factors (such as green space, noise, and emissions), and its proximity to major roads. The most frequently identified factor was the property’s location, reported in up to 21% of the studies, followed by internal structural features and distance from the city center (17%).
Geographical data, user-generated imagery, or satellite information are frequently incorporated into studies that employ complex machine learning models, which emphasize environmental and contextual factors (Wang, 2023; Chen et al., 2022). This demonstrates that machine learning algorithms can identify high-dimensional, nonlinear effects that classical regression may overlook.

Strengths and Weaknesses of Modeling Methods

The review identified the advantages and disadvantages of each modeling approach. HPMs were commended for their transparency and interpretability, and they effectively identified significant determinants. However, they were limited by multicollinearity, linearity assumptions, and a restricted ability to capture complex interactions (Schläpfer et al., 2015). In numerous contexts, machine learning models, particularly Random Forests and Artificial Neural Networks, improved fit and predictive accuracy; however, they were “black boxes,” which complicated variable interpretation and policy application.

Integration of Hedonic and Machine Learning Models

Numerous studies investigated these trade-offs by integrating HPMs with ML algorithms. Hybrid or ensemble models consistently demonstrated superior performance, combining the interpretability of HPMs with the predictive accuracy of ML (Rico-Juan & Taltavull, 2021; Chou et al., 2022).
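One possible form of such integration is a sequential design in which a hedonic regression is fitted first and a nonlinear learner is then fitted to its residuals. The sketch below illustrates that idea under strong assumptions: the data are synthetic, the hedonic stage uses a single attribute, and a one-split regression tree (a stump) stands in for the machine learning stage. It is not the procedure used by any of the cited studies, and the errors shown are in-sample only.

```python
import math

def fit_line(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_stump(x, target):
    """One-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for v, t in zip(x, target) if v <= thr]
        right = [t for v, t in zip(x, target) if v > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]

def rmse(y, pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y))

# Synthetic data: price is linear in area plus a premium within 5 km of the CBD.
area = [50, 50, 80, 80, 110, 110]
dist = [2, 8, 3, 9, 1, 7]
price = [100 + 2 * a + (50 if d < 5 else 0) for a, d in zip(area, dist)]

# Stage 1: interpretable hedonic fit (the slope is the implicit price of area).
intercept, slope = fit_line(area, price)
hedonic_pred = [intercept + slope * a for a in area]

# Stage 2: nonlinear learner on what the hedonic stage left unexplained.
residuals = [p - h for p, h in zip(price, hedonic_pred)]
thr, left_mean, right_mean = fit_stump(dist, residuals)
hybrid_pred = [h + (left_mean if d <= thr else right_mean)
               for h, d in zip(hedonic_pred, dist)]

print(slope, thr, rmse(price, hedonic_pred), rmse(price, hybrid_pred))
```

In this toy setting the hedonic slope remains directly interpretable while the second stage captures the threshold effect that a linear specification misses. In practice the second stage would be a richer learner and both stages would be evaluated out of sample.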
Many real-world applications were conducted in Asian markets, with a particular emphasis on Taiwan, China, and South Korea. This reflects the adoption of the technology in these regions and the large quantity of real estate data that is accessible there. Comparatively few studies examined markets in Europe and the United States. Most of the sample (65%) consisted of longitudinal designs, which enabled the analysis of property price trends over several years and offered robustness to market fluctuations.
As access to big data and computing power increased, research output and model sophistication grew markedly after 2020 (Chou et al., 2022). Studies conducted after 2020 frequently compared a larger number of models (up to 11), used richer data sources, and implemented more sophisticated ensemble learning methods.

Agreements and Disagreements in the Literature

Random Forest and comparable machine learning models were widely recognized as superior in generating accurate predictions across a variety of datasets and settings. Nevertheless, several studies suggested that simpler models, such as HPMs, can be competitive for small or homogeneous datasets (Begum et al., 2022). There was consensus that integrating hedonic price models with machine learning enhances both the accuracy and the interpretability of results.
There were disagreements regarding the significance of variables and the potential of models to be applied in multiple markets. Most research has concurred that location and structural characteristics are significant, although the impact of environmental or neighborhood factors varies depending on the size of the dataset and the region.
The merits of the included studies were their rigorous comparative methodology, transparent reporting, and use of authentic transaction data. Nevertheless, direct comparison across studies was difficult because of differences in sample size, variable selection, and model implementation. The generalizability of some studies was limited by the absence of complete hyperparameter configurations or external validation. Only a small number of studies included genuine out-of-sample validation, and only a fraction assessed model performance under economic disruptions or changing market conditions.

Discussion

Comparative Efficacy of Predictive Models

In this study, the relative effectiveness of sophisticated machine learning and hedonic regression models in predicting property prices was critically evaluated using empirical transaction data from a variety of international contexts. This review enhances the understanding of predictive modeling in real estate by identifying the specific contexts in which machine learning, particularly ensemble methods such as Random Forest, surpasses traditional hedonic price models. It synthesizes the results of 23 systematically selected studies. It also acknowledges the enduring importance of interpretable models and emphasizes the complex contributions of a variety of housing variables across markets and datasets.
The primary results indicate that Random Forest and analogous machine learning algorithms consistently demonstrate superior prediction accuracy, as measured by R², MAE, and RMSE, particularly on diverse and extensive datasets. This advantage results from their ability to represent the complex variable linkages, high-dimensional interactions, and nonlinearities that are common in real-world housing markets (Rigatti, 2017; Chen et al., 2022). When input variables include geographical, environmental, and image-derived attributes, neural networks and gradient-boosted machines produce reliable results. In contrast, hedonic price models, while occasionally competitive on simpler or smaller datasets, primarily excel at estimating marginal effects and providing explanatory clarity for specific dwelling characteristics (Schläpfer et al., 2015; Goodman, 1998).

Comparison with Previous Reviews

This analysis supports and extends previous literature syntheses, which frequently emphasize the theoretical capabilities of machine learning but rarely provide systematic, empirical evaluations of predictive performance using actual transaction data (Wang & Li, 2019; Jayantha & Oladinrin, 2019). In contrast to previous narrative reviews that prioritize thematic or technical overviews, this work offers evidence-based guidance on model selection. It shows that the predictive superiority of machine learning models is context-dependent, particularly with respect to data complexity and sample size. Begum et al. (2022) observed that decision trees and linear models can compete with more sophisticated methods in specific limited settings, a conclusion also supported by other studies in this review. This contradicts the notion, frequently encountered in recent research, that greater model complexity always leads to superior out-of-sample performance.
This review shows that the integration of hedonic and machine learning models, whether through sequential application or composite ensembles, can improve both the predictive accuracy and the interpretability of the results (Rico-Juan & Taltavull, 2021; Chou et al., 2022). Although earlier research has acknowledged the advantages of each method individually, few empirically validated studies have demonstrated the advantages of their integration across a variety of real estate markets. This synthesis suggests that practitioners and policymakers should not regard the two methodologies as mutually exclusive. Rather, they may benefit from customized, composite modeling strategies that combine the strengths of both paradigms.
The methodical identification of the most significant variables across models is an additional contribution. The primary predictor of house prices is location, with structural qualities, proximity to urban centers, access to transit, and environmental factors consistently identified as significant influences. The significance of contextual and environmental factors that may be undervalued or excluded in conventional regression studies is enhanced by machine learning approaches, which accommodate intricate, high-dimensional interactions (Wang, 2023; Chen et al., 2022). This discovery suggests that the precision and equity of property assessment will be improved, particularly in rapidly urbanizing environments, by expanding data collection to include these features.

Strengths and Limitations

The consistency of the synthesized evidence rests on the transparent, comparative reporting of model performance and on the reliance on empirical transaction data. However, limitations must be acknowledged. Selection bias may arise from the review’s restriction to peer-reviewed papers published in English or Spanish, which could exclude relevant findings published in other languages or produced by industry research that is not indexed in major academic databases. The outcomes may also be influenced by publication bias, as studies that present negative or null results for machine learning models are less frequently published. The decision to exclude research that relies on web-scraped or advertised prices enhances the validity of the data; however, it may limit insight into emerging markets where transaction data is less accessible. The variability in sample size, variable selection, and model implementation across studies makes direct meta-analytic pooling difficult and necessitates a narrative synthesis.

Practical Implications for Policy and Practice

The strengths of this review are its methodical search procedure, its explicit and transparent inclusion criteria, and its restriction to studies that compare models on the same dataset. The emphasis on actual transaction data enhances the practical relevance of the findings for policy and practice. The comprehensive extraction and reporting of performance measures provides actionable insights for subsequent research and practical implementation.
Real estate professionals should improve the accuracy of housing price prediction by investing in data infrastructure that accommodates a broader range of structural, locational, and environmental variables and by utilizing ensemble machine learning models, specifically Random Forest and hybrid methodologies. Policymakers must recognize that while machine learning algorithms offer superior pricing predictions, they frequently lack transparency, which requires the preservation of hedonic or interpretable elements when utilizing the results for regulatory or tax determinations. The review emphasizes the importance of conducting a thorough, comparative analysis across a variety of contexts for researchers and advocates for a greater emphasis on external validation, model transparency, and the integration of new data sources.

Research Gaps and Future Directions

Unresolved questions and gaps remain, particularly concerning the applicability of predictive models across markedly different markets and time periods. Much of the research included in this review comes from regions with a wealth of high-quality transaction data, primarily in Asia, North America, and specific European countries. The performance of these models in markets characterized by reduced transparency, varying institutional frameworks, or abrupt legislative or macroeconomic disruptions is less well-documented. This area merits further exploration, as few studies have evaluated model stability during crises or rapid market fluctuations. Moreover, even though progress has been made in improving the interpretability of machine learning models (e.g., through SHAP values or feature importance plots), the research remains inconclusive on the most effective methods for communicating these complex findings to non-technical stakeholders (Lundberg & Lee, 2017).
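As a simple illustration of the feature importance idea mentioned above, the sketch below computes permutation importance: the increase in error when one input column is shuffled. This is a different and simpler technique than SHAP values, and everything in the example is hypothetical, including the `predict` function that stands in for a fitted black-box model and the toy feature names.

```python
import random

def predict(row):
    """Stand-in for a fitted black-box model; it ignores row['unused']."""
    return 100 + 2 * row["area"] - 5 * row["dist"]

def mae(rows, targets):
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature, repeats=10, seed=0):
    """Average rise in MAE after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mae(rows, targets)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Copy each row with only the chosen feature replaced.
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        increases.append(mae(shuffled, targets) - baseline)
    return sum(increases) / repeats

# Hypothetical dwellings; targets equal the model output, so baseline error is 0.
rows = [
    {"area": 50, "dist": 2.0, "unused": 7},
    {"area": 65, "dist": 10.0, "unused": 3},
    {"area": 80, "dist": 5.0, "unused": 9},
    {"area": 95, "dist": 7.5, "unused": 1},
    {"area": 120, "dist": 1.0, "unused": 4},
    {"area": 140, "dist": 3.0, "unused": 6},
]
targets = [predict(r) for r in rows]

importance = {f: permutation_importance(rows, targets, f)
              for f in ("area", "dist", "unused")}
print(importance)
```

A feature that the model does not use receives an importance of exactly zero, while features the model relies on receive positive scores. Such scores describe what the model depends on, not causal effects on price, which is part of the communication challenge noted above.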
In scenarios where past housing discrimination or spatial inequality are prominent, there are ongoing debates in the literature regarding the potential of machine learning algorithms to perpetuate bias or inequality when trained on insufficient or biased data. The optimal balance between predictive accuracy and interpretability is a topic of ongoing discussion. Certain stakeholders prioritize model transparency over minor improvements in forecasting precision, particularly when the results have direct policy or distributional implications (Shmueli, 2010). Furthermore, the distinction between proprietary “black box” models developed by private companies and open, transparent academic models becomes more contentious as housing markets become more digital, prompting inquiries regarding public interest and accountability.
This analysis supports the increasing agreement that advanced machine learning models, particularly ensemble methods, outperform traditional hedonic models in predicting house prices when a wide range of data is available. The persistent importance of interpretable, theoretically based models is confirmed, particularly in the context of policy applications and the understanding of variable impacts. Reconciling diverse methodological paradigms, improving access to superior transaction data, and carefully addressing the transferability and transparency of prediction models within a dynamic and frequently inequitable global housing environment are essential for the profession’s future growth.

Conclusions

Among the models used for housing price prediction, machine learning techniques, particularly Random Forest and ensemble methods, typically achieve the highest predictive accuracy when combined with comprehensive empirical transaction data. Hedonic regression models remain highly valuable for their interpretability and their capacity to clarify the influence of critical housing and community factors, despite their lower predictive accuracy. The most effective option for stakeholders who want to balance explanatory power and accuracy is to combine the two through hybrid or sequential approaches.
The research shows that location is the most significant factor in the valuation of residential property, followed by structural qualities, proximity to urban areas, and access to transportation and environmental amenities. These results underscore the critical significance of investing in comprehensive, multi-source data collection that involves both internal and external factors that influence property pricing. Machine learning models can use these characteristics, as evidenced by research from technologically advanced and data-rich regions, particularly Asia and North America. However, the challenges of implementing such methodologies in markets with limited or inadequate data are also underscored.

Recommendations for Researchers

Comparative studies that carefully assess predictive models across a variety of geographic, regulatory, and temporal contexts should be a primary focus for researchers. Subsequent research should prioritize external validation, transparent reporting of model parameters and hyperparameters, and genuine out-of-sample testing to evaluate the robustness of prediction algorithms across a range of market scenarios. To balance practical transparency and predictive accuracy, interpretable machine learning approaches, including feature importance metrics and explainable AI tools, should be developed further. Collaboration with industry and government to obtain high-quality transaction data will improve the practical applicability of future modeling work.
Real estate and property valuation professionals are strongly advised to adopt ensemble machine learning methods to improve the precision of pricing models and portfolio risk management. Because the results affect the decisions of stakeholders, including property owners, buyers, and local communities, it is crucial to complement these methods with interpretable models such as hedonic regressions. Maintaining accuracy as market dynamics and data availability change requires regular model evaluation and recalibration. The growing availability of spatial, environmental, and user-generated data should be used to improve model accuracy and adaptability.
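As one hedged illustration of what out-of-sample testing and a feature importance metric can look like in practice, the sketch below holds out the most recent 20% of a synthetic, date-ordered sample and computes permutation importance on that holdout. The data, the feature names, and the plain least-squares model are assumptions made for the example. The permutation procedure itself only needs a prediction function, so it could be applied in the same way to an ensemble model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic sample; rows are assumed to be sorted by sale date (illustrative only).
n = 500
names = ["log_area", "age", "dist_center", "noise_feature"]
X = np.column_stack([
    np.log(rng.uniform(40, 200, n)),   # log floor area
    rng.uniform(0, 50, n),             # building age, years
    rng.uniform(0, 15, n),             # distance to center, km
    rng.normal(0, 1, n),               # irrelevant feature, included as a check
])
y = 11.0 + 0.9 * X[:, 0] - 0.004 * X[:, 1] - 0.03 * X[:, 2] + rng.normal(0, 0.05, n)

# Chronological split: fit on the earliest 80%, evaluate on the latest 20%.
cut = int(0.8 * n)
X_tr, X_te, y_tr, y_te = X[:cut], X[cut:], y[:cut], y[cut:]


def fit(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


coef = fit(X_tr, y_tr)
base = rmse(y_te, predict(coef, X_te))   # out-of-sample error of the fitted model

# Permutation importance: shuffle one holdout column at a time and record
# the average increase in RMSE over several repeats.
importance = {}
for j, name in enumerate(names):
    increases = []
    for _ in range(20):
        X_perm = X_te.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        increases.append(rmse(y_te, predict(coef, X_perm)) - base)
    importance[name] = float(np.mean(increases))

print("holdout RMSE:", round(base, 4))
for name in sorted(importance, key=importance.get, reverse=True):
    print(f"{name:>14}: +{importance[name]:.4f}")
```

Reporting the split rule, the random seed, and the number of permutation repeats alongside the scores is the kind of disclosure that makes such a comparison reproducible.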

Policy Considerations

The review underscores the need for policymakers to ensure that public sector property valuation and taxation systems are both accurate and equitable. Policymakers should promote the development of transparent forecasting models and advocate for open access to transaction data. When machine learning methods are used for regulatory, planning, or taxation purposes, it is important to ensure that their outputs are verifiable, comprehensible, and free of biases that could perpetuate historical inequities. To strengthen the evidence base for affordable housing initiatives and urban planning, policy frameworks should promote the use of emerging data sources, including environmental monitoring and accessibility indices.
Several research gaps remain and should be prioritized in future work. First, the potential of machine learning and hybrid models should be examined in emerging and data-deficient housing markets, particularly in regions experiencing rapid urbanization or institutional upheaval. Second, few studies have examined the resilience of prediction models to market disruptions, economic downturns, or regulatory interventions; longitudinal studies that assess model performance over economic cycles will be essential. Third, additional research is required on how interpretable machine learning tools, such as SHAP values, can improve end-user trust and transparency. Finally, the ethical and social implications of automated property valuation, such as the unintentional reinforcement of spatial inequalities and algorithmic bias, deserve greater attention.
The most effective approach to predicting property values combines the simplicity of traditional models with the predictive advantages of advanced machine learning, built on high-quality, multidimensional data. By implementing these recommendations and research priorities, the field can achieve more accurate, equitable, and useful real estate analytics.
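One simple, verifiable check on whether model-based valuations treat low- and high-priced homes evenly is a sales ratio study, sketched below. The review does not prescribe these statistics; the coefficient of dispersion (COD) and price-related differential (PRD) are widely used in mass appraisal and are offered here as one possible diagnostic. The function name ratio_study and the four example sales are hypothetical.

```python
from statistics import mean, median


def ratio_study(predicted, sale_prices):
    """Summarize valuation-to-price ratios for a set of sold properties."""
    ratios = [p / s for p, s in zip(predicted, sale_prices)]
    med = median(ratios)
    # COD: average absolute deviation from the median ratio, as a percentage of it.
    cod = 100 * mean(abs(r - med) for r in ratios) / med
    # PRD: unweighted mean ratio divided by the value-weighted mean ratio.
    prd = mean(ratios) / (sum(predicted) / sum(sale_prices))
    return {"median_ratio": med, "cod": cod, "prd": prd}


# Hypothetical example: cheaper homes are over-valued, expensive ones under-valued.
sale_prices = [100, 200, 300, 400]
predicted = [110, 200, 285, 360]
result = ratio_study(predicted, sale_prices)
print({k: round(v, 3) for k, v in result.items()})
```

A PRD noticeably above 1 is usually read as a sign that lower-priced homes are over-valued relative to higher-priced ones, which in a taxation setting would shift the burden toward less expensive properties.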
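SHAP values build on Shapley values from cooperative game theory. To make concrete what such an attribution reports, the sketch below computes exact Shapley values by enumerating feature coalitions for a small hypothetical pricing function. It does not use the SHAP library; price_model, the listing x, and the baseline are invented for illustration, and replacing absent features with baseline values is only one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def price_model(x):
    """Hypothetical fitted model: price (thousands) from area, transit access, age."""
    area, near_transit, age = x
    return 50 + 2.0 * area + 30 * near_transit + 0.2 * area * near_transit - 0.5 * age


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in the coalition take the listing's values; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = (100, 1, 20)          # listing: 100 m^2, near transit, 20 years old
baseline = (80, 0, 30)    # reference property
phi = shapley_values(price_model, x, baseline)

for name, contribution in zip(["area", "near_transit", "age"], phi):
    print(f"{name:>12}: {contribution:+.2f}")
print("sum of attributions:", round(sum(phi), 6))
print("prediction gap:     ", round(price_model(x) - price_model(baseline), 6))
```

For this toy model the attributions come out at about 42 (area), 48 (transit access), and 5 (age), and they sum to the 95-unit gap between the listing and the reference property. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific shortcuts.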

References

  1. Bartholomew, K.; Ewing, R. Hedonic price effects of pedestrian- and transit-oriented development. Journal of Planning Literature 2011, 26(1), 18–34.
  2. Begum, A.; Samad, M.; Chowdhury, S. Comparative analysis of predictive models for housing price estimation. International Journal of Housing Markets and Analysis 2022, 15(3), 423–441.
  3. Chen, Z.; Ye, L.; Zhang, X.; Wu, J. Machine learning-based housing price prediction: A survey. Journal of Real Estate Research 2022, 44(2), 241–267.
  4. Chou, J. S.; Hsu, S. C.; Ho, C. C. Integrating ensemble learning and hedonic regression for real estate appraisal. Expert Systems with Applications 2022, 188, 116026.
  5. Diewert, W. E.; Heravi, S. M.; Silver, M. Hedonic imputation indexes versus time dummy hedonic indexes. In Price index concepts and measurement; Diewert, W. E., Greenlees, J., Hulten, C., Eds.; University of Chicago Press, 2011; pp. 323–352.
  6. Glaeser, E. L.; Gyourko, J.; Saks, R. E. Urban growth and housing supply. Journal of Economic Geography 2006, 6(1), 71–89.
  7. Goodman, A. C. Andrew Court and the invention of hedonic price analysis. Journal of Urban Economics 1998, 44(2), 291–298.
  8. Gorjian, M. A deep learning-based methodology to re-construct optimized re-structured mesh from architectural presentations. Doctoral dissertation, Texas A&M University, 2024. Available online: https://oaktrust.library.tamu.edu/items/0efc414a-f1a9-4ec3-bd19-f99d2a6e3392.
  9. Gorjian, M. Green gentrification and community health in urban landscape: A scoping review of urban greening’s social impacts (Version 1) [Preprint]. Research Square 2025.
  10. Gorjian, M. Green schoolyard investments and urban equity: A systematic review of economic and social impacts using spatial-statistical methods [Preprint]. Research Square 2025.
  11. Gorjian, M. Green schoolyard investments influence local-level economic and equity outcomes through spatial-statistical modeling and geospatial analysis in urban contexts. arXiv 2025.
  12. Gorjian, M. Schoolyard greening, child health, and neighborhood change: A comparative study of urban U.S. cities (arXiv:2507.08899). arXiv 2025.
  13. Gorjian, M. The impact of greening schoolyards on surrounding residential property values: A systematic review (Version 1) [Preprint]. Research Square 2025.
  14. Gorjian, M. Greening schoolyards and the spatial distribution of property values in Denver, Colorado [Preprint]. arXiv 2025.
  15. Gorjian, M. The impact of greening schoolyards on residential property values [Working paper]. SSRN, 11 July 2025.
  16. Gorjian, M. Analyzing the relationship between urban greening and gentrification: Empirical findings from Denver, Colorado. SSRN 2025.
  17. Gorjian, M. Greening schoolyards and urban property values: A systematic review of geospatial and statistical evidence [Preprint]. arXiv 2025.
  18. Gorjian, M. Urban schoolyard greening: A systematic review of child health and neighborhood change [Preprint]. Research Square 2025.
  19. Gorjian, M.; Quek, F. Enhancing consistency in sensible mixed reality systems: A calibration approach integrating haptic and tracking systems [Preprint]. EasyChair, 2024. Available online: https://easychair.org/publications/preprint/KVSZ.
  20. Gorjian, M.; Caffey, S. M.; Luhan, G. A. Exploring architectural design 3D reconstruction approaches through deep learning methods: A comprehensive survey. Athens Journal of Sciences 2024, 11(2), 1–29. Available online: https://www.athensjournals.gr/sciences/2024-6026-AJS-Gorjian-02.pdf.
  21. Gorjian, M.; Caffey, S. M.; Luhan, G. A. Analysis of design algorithms and fabrication of a graph-based double-curvature structure with planar hexagonal panels. arXiv 2025.
  22. Gorjian, M.; Caffey, S. M.; Luhan, G. A. Exploring architectural design 3D reconstruction approaches through deep learning methods: A comprehensive survey. Athens Journal of Sciences 2025, 12, 1–29.
  23. Gorjian, M.; Luhan, G. A.; Caffey, S. M. Analysis of design algorithms and fabrication of a graph-based double-curvature structure with planar hexagonal panels. arXiv 2025.
  24. Hong, S. H.; Lee, D.; Kim, T. Comparison of machine learning models for housing price prediction. Sustainability 2020, 12(24), 10348.
  25. Howard, G.; Liebersohn, J. Regional effects on real estate pricing: A review. Regional Studies 2023, 57(2), 283–298.
  26. Hoxha, E. Predicting housing prices with decision trees: Evidence from emerging markets. Journal of Property Research 2024, 41(1), 47–68.
  27. Hwang, M.; Quigley, J. M. Economic fundamentals in local housing markets: Evidence from US metropolitan regions. Regional Science and Urban Economics 2006, 36(2), 183–206.
  28. Jayantha, W. M.; Oladinrin, T. O. Artificial intelligence and real estate valuation: A systematic review. Journal of Property Investment & Finance 2019, 37(3), 223–240.
  29. Lundberg, S. M.; Lee, S. I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 2017, 30, 4765–4774.
  30. Mark, J.; Kim, M. The impact of demographic changes on housing prices: An empirical analysis. Journal of Housing Economics 2007, 16(2), 125–144.
  31. Page, M. J.; McKenzie, J. E.; Bossuyt, P. M.; Boutron, I.; Hoffmann, T. C.; Mulrow, C. D.; Moher, D. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
  32. Raina, A. S.; Mone, V.; Gorjian, M.; Quek, F.; Sueda, S.; Krishnamurthy, V. R. Blended physical-digital kinesthetic feedback for mixed reality-based conceptual design-in-context. In Proceedings of the 50th Graphics Interface Conference; ACM, 2024; Article 6, pp. 1–16.
  33. Rico-Juan, J. R.; Taltavull, P. Hedonic and machine learning models for real estate valuation: A critical review. Urban Science 2021, 5(2), 32.
  34. Rigatti, S. J. Random forest. Journal of Insurance Medicine 2017, 49(4), 391–395.
  35. Rosen, S. Hedonic prices and implicit markets: Product differentiation in pure competition. Journal of Political Economy 1974, 82(1), 34–55.
  36. Rouhiaine, N. Machine learning approaches to housing price prediction. Computers, Environment and Urban Systems 2018, 68, 36–43.
  37. Schläpfer, F.; Waltert, F.; Segura, L.; Kienast, F.; Bürgi, M. Impact of land-use and landscape pattern on real estate prices in the Swiss Alps. Ecological Economics 2015, 112, 372–382.
  38. Selim, H. Determinants of house prices in Turkey: Hedonic regression versus artificial neural network. Expert Systems with Applications 2009, 36(2), 2843–2852.
  39. Shmueli, G. To explain or to predict? Statistical Science 2010, 25(3), 289–310.
  40. Soltani, A.; Chua, M.; Perera, R. A comparative study of machine learning models for house price prediction in Australia. Property Management 2022, 40(2), 209–225.
  41. Wang, T. Housing price prediction using image data and machine learning. Applied Artificial Intelligence 2023, 37(1), 62–81.
  42. Wang, Y.; Li, H. A review of mass appraisal models for real estate. Land Use Policy 2019, 81, 263–273.
  43. Xiao, Q. Hedonic price modeling and housing market segmentation. Habitat International 2017, 64, 110–118.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
