Submitted: 06 October 2025
Posted: 06 October 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Overview
2.2. Determinants of Airbnb Pricing
2.3. Machine-Learning Approaches to Price Prediction
2.4. Methodological Limitations in Prior Studies
2.5. Comparative Summary of Key Studies
2.6. Identified Research Gaps
- Limited robustness testing — Few studies employ multi-seed evaluation to confirm model stability across random splits.
- Narrow feature scope — Calendar availability, host activity, and behavioral variables are rarely included, despite their potential explanatory power.
- Lack of transparency — Many high-performing models are not easily interpretable, limiting their usefulness for non-technical stakeholders.
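The first gap can be made concrete with a minimal sketch of multi-seed evaluation: the same model is refit on several random train/test splits, and the spread of the test error is reported alongside its mean. Everything here is illustrative. The synthetic data, the one-feature least-squares model, and the helper names (`train_test_split`, `fit_simple_ols`, `multi_seed_mae`) are assumptions and are not taken from any of the reviewed studies.

```python
import random
import statistics


def train_test_split(rows, test_fraction, seed):
    """Shuffle rows with a seeded RNG and split off a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_ols(train):
    """Closed-form one-feature least squares: y = intercept + slope * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mae(model, test):
    """Mean absolute error of the fitted line on held-out rows."""
    intercept, slope = model
    return statistics.fmean(abs(y - (intercept + slope * x)) for x, y in test)


def multi_seed_mae(rows, seeds, test_fraction=0.2):
    """Refit and score the model once per seed; return one MAE per seed."""
    scores = []
    for seed in seeds:
        train, test = train_test_split(rows, test_fraction, seed)
        scores.append(mae(fit_simple_ols(train), test))
    return scores


# Synthetic stand-in data (assumption): x = guest capacity, y = log price.
data_rng = random.Random(0)
rows = [
    (k % 8 + 1, 4.0 + 0.2 * (k % 8 + 1) + data_rng.gauss(0, 0.3))
    for k in range(200)
]

scores = multi_seed_mae(rows, seeds=range(5))
mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
print(f"MAE over {len(scores)} seeds: {mean_score:.3f} +/- {std_score:.3f}")
```

A small standard deviation relative to the mean suggests that the reported error does not hinge on one favorable split.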
3. Methodology
3.1. Research Approach
3.2. Data Source and Integration
- Listings.csv – Property-level metadata, amenities, host information, and nightly prices.
- Reviews.csv – Guest reviews and timestamps, used to generate host-activity features.
- Calendar files – Daily availability and booking rules for each listing.
- Neighbourhoods.xlsx – Names and coordinates of Seattle neighborhoods.
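As a rough illustration of how the files listed above might be brought to a common listing-level grain, the sketch below aggregates calendar and review rows per listing and joins them on the listing identifier. The field names (`listing_id`, `date`, `available`), the snapshot date, and the toy rows are hypothetical; the actual column names and parsing steps used in the study are not given here.

```python
from collections import defaultdict
from datetime import date, timedelta

SNAPSHOT = date(2025, 1, 1)  # hypothetical scrape date (assumption)


def availability_rate(calendar_rows, horizon_days, snapshot=SNAPSHOT):
    """Share of available nights per listing within the next horizon_days."""
    end = snapshot + timedelta(days=horizon_days)
    open_nights = defaultdict(int)
    total_nights = defaultdict(int)
    for row in calendar_rows:
        if snapshot <= row["date"] < end:
            total_nights[row["listing_id"]] += 1
            open_nights[row["listing_id"]] += int(row["available"])
    return {lid: open_nights[lid] / total_nights[lid] for lid in total_nights}


def reviews_in_window(review_rows, window_days, snapshot=SNAPSHOT):
    """Number of reviews per listing in the window_days before the snapshot."""
    start = snapshot - timedelta(days=window_days)
    counts = defaultdict(int)
    for row in review_rows:
        if start <= row["date"] < snapshot:
            counts[row["listing_id"]] += 1
    return dict(counts)


# Toy rows (assumption): listing 1 is open every other night, listing 2 always.
calendar_rows = [
    {"listing_id": 1, "date": SNAPSHOT + timedelta(days=d), "available": d % 2 == 0}
    for d in range(30)
] + [
    {"listing_id": 2, "date": SNAPSHOT + timedelta(days=d), "available": True}
    for d in range(30)
]
review_rows = [
    {"listing_id": 1, "date": SNAPSHOT - timedelta(days=10)},
    {"listing_id": 1, "date": SNAPSHOT - timedelta(days=100)},
]

rates_30 = availability_rate(calendar_rows, horizon_days=30)
recent_reviews = reviews_in_window(review_rows, window_days=90)

# Left join onto the listing identifiers; listings without reviews get 0.
merged = [
    {
        "listing_id": lid,
        "avail_30": rates_30.get(lid),
        "reviews_90d": recent_reviews.get(lid, 0),
    }
    for lid in (1, 2)
]
print(merged)
```

Keeping the join as a left join on the listing table means that a listing with no reviews or calendar rows is retained rather than silently dropped.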
3.3. Data Cleaning and Preparation
3.4. Feature Engineering
- Structural / Capacity Features: accommodates, bedrooms, and bathrooms_num, plus a composite size_index combining these indicators.
- Amenities: Binary flags for common amenities (workspace, gym, hot tub, pool, air conditioning) and a total amenity_count.
- Neighborhood Tier: Each neighborhood’s median price was used to classify it as High, Mid, or Low tier, then one-hot encoded.
- Host and Review Metrics: review_count, reviews_90d, and a multi_listing_host flag identifying hosts managing multiple properties.
- Calendar Variables: Aggregated metrics derived from locally parsed calendar files, including 30-, 90-, and 180-day availability rates, median minimum nights, and booking-rule statistics.
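The amenity and neighborhood-tier features described above can be sketched as follows. This is a minimal illustration under stated assumptions: the tier cut points are taken as terciles of the neighborhood median prices (the thresholds are not specified in this section), the amenity strings are matched exactly, and the neighborhood names, prices, and the helper `build_features` are invented for the example.

```python
import statistics
from collections import defaultdict

# Amenities named in the text; the exact string labels are an assumption.
FLAGGED_AMENITIES = ["workspace", "gym", "hot tub", "pool", "air conditioning"]
TIERS = ["High", "Mid", "Low"]


def neighborhood_tiers(listings):
    """Map each neighborhood to High/Mid/Low using its median price."""
    prices_by_hood = defaultdict(list)
    for row in listings:
        prices_by_hood[row["neighbourhood"]].append(row["price"])
    medians = {h: statistics.median(p) for h, p in prices_by_hood.items()}
    # Assumed rule: tercile cut points over the neighborhood medians.
    low_cut, high_cut = statistics.quantiles(medians.values(), n=3)
    tiers = {}
    for hood, med in medians.items():
        if med <= low_cut:
            tiers[hood] = "Low"
        elif med > high_cut:
            tiers[hood] = "High"
        else:
            tiers[hood] = "Mid"
    return tiers


def build_features(listings):
    """Add amenity flags, amenity_count, and one-hot tier columns."""
    tiers = neighborhood_tiers(listings)
    features = []
    for row in listings:
        amenities = {a.lower() for a in row["amenities"]}
        out = {"id": row["id"], "amenity_count": len(amenities)}
        for name in FLAGGED_AMENITIES:
            out["has_" + name.replace(" ", "_")] = int(name in amenities)
        for tier in TIERS:
            out["tier_" + tier] = int(tiers[row["neighbourhood"]] == tier)
        features.append(out)
    return features


# Synthetic listings (assumption); names and prices are placeholders.
listings = [
    {"id": 1, "neighbourhood": "hood_a", "price": 260.0, "amenities": ["Pool", "Gym", "Wifi"]},
    {"id": 2, "neighbourhood": "hood_a", "price": 240.0, "amenities": ["Wifi"]},
    {"id": 3, "neighbourhood": "hood_b", "price": 150.0, "amenities": ["Workspace"]},
    {"id": 4, "neighbourhood": "hood_b", "price": 150.0, "amenities": []},
    {"id": 5, "neighbourhood": "hood_c", "price": 100.0, "amenities": ["Hot tub"]},
    {"id": 6, "neighbourhood": "hood_c", "price": 100.0, "amenities": ["Air conditioning"]},
]

features = build_features(listings)
for row in features:
    print(row)
```

Because the tiers are derived from prices, in a predictive setting the medians would normally be computed on the training split only, so that test-set prices do not leak into the feature.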
3.5. Exploratory Data Analysis
3.6. Model Specification
3.7. Training, Validation, and Robustness
3.8. Implementation and Reproducibility
4. Results
4.1. Overview
4.2. Model Performance Summary
4.3. Cross-Validation Results
4.4. Multi-Seed Robustness Analysis
4.5. Feature Importance Analysis
4.6. Model Interpretation and Error Patterns
4.7. Comparative Performance in Context
4.8. Summary of Findings
5. Discussion and Implications
5.1. Overview
5.2. Interpretation of Findings
5.3. Methodological Contributions
5.4. Theoretical Implications
5.5. Practical Implications
5.6. Broader Analytical Significance
5.7. Summary
6. Limitations and Future Research
6.1. Data Limitations
6.2. Methodological Limitations
6.3. Future Research Directions
7. Conclusions
References
- Alharbi, Z. H. (2023). A Sustainable Price Prediction Model for Airbnb Listings Using Machine Learning and Sentiment Analysis. Sustainability, 15(17), 13159.
- Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
- Camatti, N., di Tollo, G., Filograsso, G., et al. (2024). Predicting Airbnb pricing: A comparative analysis of artificial intelligence and traditional approaches. Computational Management Science, 21, 30.
- Gunter, U., & Önder, İ. (2018). Determinants of Airbnb demand in Vienna and their implications for the traditional accommodation industry. Tourism Economics, 24(3), 270–293.
- Oskam, J., & Boswijk, A. (2016). Airbnb: The future of networked hospitality businesses. Journal of Tourism Futures, 2(1), 22–42.
- Pedregosa, F., Varoquaux, G., Gramfort, A., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. https://www.jmlr.org/papers/v12/pedregosa11a.html
- Teubner, T., Hawlitschek, F., & Dann, D. (2017). Price determinants on Airbnb: How reputation pays off in the sharing economy. Journal of Self-Governance and Management Economics, 5(4), 53.
- Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7(1).
- Wang, D., & Nicolau, J. (2017). Price determinants of sharing economy based accommodation rental: A study of listings from 33 cities on Airbnb.com. International Journal of Hospitality Management, 62, 120–131.
- Zervas, G., Proserpio, D., & Byers, J. W. (2017). The rise of the sharing economy: Estimating the impact of Airbnb on the hotel industry. Journal of Marketing Research, 54(5), 687–705.



| Study | City / Dataset | Model(s) | Distinctive Features |
| --- | --- | --- | --- |
| Oskam & Boswijk (2016) | Amsterdam | OLS | Structural + locational |
| Gunter & Önder (2018) | Vienna | Spatial econometric | Neighborhood density, demand effects |
| Teubner et al. (2017) | Multi-city (Germany) | Hedonic / Regression | Reputation and trust factors |
| Wang & Nicolau (2017) | 33 cities | Linear regression | Amenity + capacity dominant |
| Alharbi (2023) | Barcelona | GBM + sentiment | Textual sentiment features |
| Camatti et al. (2024) | Netherlands | AI vs traditional | Explainability focus |
| Present Study (2025) | Seattle | Linear, Ridge, RF | Multi-seed validation |

| Model | log MAE | log RMSE | R² | MAE ($) | RMSE ($) |
| --- | --- | --- | --- | --- | --- |
| Linear Regression | 0.283 | 0.370 | 0.637 | 63 | 152 |
| Ridge Regression | 0.283 | 0.370 | 0.637 | 62 | 147 |
| Random Forest Regressor | 0.235 | 0.321 | 0.726 | 51 | 91 |
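
The performance table reports errors on both a log scale and a dollar scale. A minimal sketch of how such paired metrics can be computed is given below, assuming the models predict the natural logarithm of the nightly price and that dollar errors are obtained by exponentiating predictions. This transformation is an assumption, since the exact target transform is not stated here, and the prices and prediction offsets are invented for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Invented nightly prices and log-scale prediction errors (assumption).
prices = [80.0, 120.0, 200.0, 400.0]
log_offsets = [0.1, -0.1, 0.1, -0.1]

log_true = [math.log(p) for p in prices]
log_pred = [lt + d for lt, d in zip(log_true, log_offsets)]

# Metrics on the scale the model is assumed to be trained on.
log_scores = regression_metrics(log_true, log_pred)

# Back-transform predictions to dollars before computing dollar errors.
dollar_pred = [math.exp(v) for v in log_pred]
dollar_scores = regression_metrics(prices, dollar_pred)

print("log scale:   ", {k: round(v, 3) for k, v in log_scores.items()})
print("dollar scale:", {k: round(v, 2) for k, v in dollar_scores.items()})
```

The same log-scale error corresponds to a larger dollar error for an expensive listing than for a cheap one, which is one reason dollar RMSE can sit well above dollar MAE even when the log-scale metrics are close.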
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).