Submitted: 10 April 2026
Posted: 14 April 2026
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Sentiment Analysis in Hospitality Reviews
2.2. Rating Prediction with Deep Learning
2.3. Unreliable and Fake Review Detection
2.4. Transformer and LLM Approaches in Tourism NLP
3. Dataset and Problem Formulation
3.1. Task A: Review-Level Rating Prediction
3.2. Task B: Temporal Hotel Rating Forecasting
4. Preprocessing and Feature Engineering
4.1. Text Cleaning and Normalization
4.2. Sentiment Analysis
4.3. Similarity Detection
4.4. Correlation Analysis
4.5. User Behavior Analysis
4.6. Anomaly Score Construction and Filtering
5. Model Architectures and Training
5.1. Embedding Strategy
5.2. LSTM Baseline
5.3. Bidirectional LSTM
5.4. DistilBERT Baseline
5.5. Training Protocol
6. Experimental Setup
7. Results and Analysis
7.1. Performance Metrics
7.2. Ablation Study
7.3. Error Analysis
7.4. Consistency of Results Across Seeds
8. Discussion
9. Limitations and Future Work
10. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| Adam | Adaptive Moment Estimation |
| AI | Artificial Intelligence |
| BERT | Bidirectional Encoder Representations from Transformers |
| BiLSTM | Bidirectional Long Short-Term Memory |
| CNN | Convolutional Neural Network |
| CPU | Central Processing Unit |
| DL | Deep Learning |
| DistilBERT | Distilled Bidirectional Encoder Representations from Transformers |
| ELECTRA | Efficiently Learning an Encoder that Classifies Token Replacements Accurately |
| ERNIE | Enhanced Representation through kNowledge IntEgration |
| GPT | Generative Pre-trained Transformer |
| LLM | Large Language Model |
| LSTM | Long Short-Term Memory |
| MAE | Mean Absolute Error |
| ML | Machine Learning |
| MSE | Mean Squared Error |
| NLP | Natural Language Processing |
| ReLU | Rectified Linear Unit |
| RMSE | Root Mean Square Error |
| RoBERTa | Robustly Optimized BERT Pretraining Approach |
| TF-IDF | Term Frequency–Inverse Document Frequency |
| VADER | Valence Aware Dictionary and sEntiment Reasoner |
| XGBoost | Extreme Gradient Boosting |












| Attribute | Type | Description |
|---|---|---|
| review_title | Text | Short summary written by the reviewer |
| review_text | Text | Full narrative description of the stay |
| tags | Text | Labels provided by reviewers (trip type, room type, etc.) |
| rating | Numerical | Review score assigned by the reviewer on a 1–10 scale |
| avg_rating | Numerical | Average hotel rating across all reviews for that property |
| reviewed_at | Datetime | Review posting date and time |
| reviewed_by | Text | Reviewer identifier |
| hotel_name | Text | Hotel name used as grouping key |
| nationality | Text | Reviewer country of origin |

| Property | Value |
|---|---|
| Corpus overview | |
| Total records (raw) | 26,675 |
| Records after removing missing rating / review text | 26,386 |
| Records after removing empty content | 26,384 |
| Records with valid timestamps for Task B | 26,279 |
| Hotel coverage | |
| Unique hotels | 819 |
| Reviews per hotel (min / mean / max) | 1 / 32.2 / 846 |
| Hotels eligible for Task B (≥20 reviews with valid timestamps) | 273 |
| Review-level rating (rating) | |
| Scale | 1.0–10.0 |
| Mean ± standard deviation | 8.56 ± 1.57 |
| Proportion of ratings ≥ 7.0 | 88.7% |
| Hotel-level average rating (avg_rating) | |
| Scale | 3.8–10.0 |
| Mean ± standard deviation | 8.45 ± 0.72 |

| Component pair | Interpretation | Pearson r | Sig. |
|---|---|---|---|
| Text sentiment vs. Rating | Strongest association with reviewer satisfaction | 0.265 | *** |
| Title sentiment vs. Rating | Positive alignment between title and score | 0.164 | *** |
| Tags sentiment vs. Rating | Very weak negative association with rating | −0.036 | *** |
| Title sentiment vs. Text sentiment | Weak positive coherence across text components | 0.192 | *** |
| Tags sentiment vs. Text sentiment | Near-zero association; tags reflect category, not sentiment | 0.002 | ns |

| Stage | Component | Detection criterion | Records |
|---|---|---|---|
| 0 | Raw dataset | — | 26,675 |
| 1 | Missing value removal | Rating or review text absent | 26,386 |
| 2 | Empty content removal | Both title and text contain no alphanumeric characters | 26,384 |
| 3 | Sentiment analysis | VADER compound score computed per component | 26,384 |
| | | Mismatch flagged (high rating with negative text, or the inverse) | 724 flagged |
| 4 | Similarity detection | TF-IDF cosine similarity | 10,089 flagged |
| 5 | Correlation analysis | High deviation between normalized rating and text sentiment | 26,384 scored |
| 6 | User behavior | Reviewer frequency > 3 | 17,392 flagged |
| 7 | Anomaly scoring | Weighted combination of components, scaled 0–1 | 26,384 |
| | Filtering (threshold 0.4) | Reviews with anomaly score beyond the threshold removed | 7,390 removed |
| | Final dataset | Retained for model training | 18,994 |
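
The stages in the pipeline table can be sketched as a single scoring function. This is a minimal illustration under stated assumptions, not the authors' implementation: the component weights, the mismatch cut-offs (rating at least 7 or at most 4, combined with the sign of the sentiment score), and the strict comparison used when filtering are all illustrative choices. Only the reviewer-frequency criterion (> 3), the 0–1 scaling, and the 0.4 threshold come from the table. The names `Review`, `anomaly_score`, and `filter_reviews` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical component weights: the table only says "weighted combination",
# so these values are illustrative assumptions that sum to 1.
WEIGHTS = {"mismatch": 0.25, "similarity": 0.25, "deviation": 0.25, "behavior": 0.25}
THRESHOLD = 0.4  # filtering threshold reported in the table


@dataclass
class Review:
    rating: float        # reviewer score on the 1-10 scale
    sentiment: float     # VADER-style compound score in [-1, 1]
    is_similar: bool     # flagged by the similarity stage
    reviewer_count: int  # number of reviews written by the same reviewer


def anomaly_score(r: Review) -> float:
    """Combine the four component signals into a score in [0, 1]."""
    rating_norm = (r.rating - 1.0) / 9.0           # map 1-10 to [0, 1]
    sentiment_norm = (r.sentiment + 1.0) / 2.0     # map [-1, 1] to [0, 1]
    deviation = abs(rating_norm - sentiment_norm)  # stays within [0, 1]
    # Assumed mismatch rule: high rating with negative text, or the inverse.
    mismatch = (r.rating >= 7 and r.sentiment < 0) or (r.rating <= 4 and r.sentiment > 0)
    components = {
        "mismatch": float(mismatch),
        "similarity": float(r.is_similar),
        "deviation": deviation,
        "behavior": float(r.reviewer_count > 3),   # criterion from the table
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


def filter_reviews(reviews):
    """Keep reviews scoring below the threshold (the comparison is an assumption)."""
    return [r for r in reviews if anomaly_score(r) < THRESHOLD]
```

Because every component lies in [0, 1] and the weights sum to 1, the combined score is already on the 0–1 scale and needs no further normalization.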

| Split | Time period | Reviews | Hotels | Mean rating |
|---|---|---|---|---|
| Training | July 2018 – November 2019 | 13,295 | 721 | 8.62 |
| Validation | December 2019 – February 2020 | 1,900 | 389 | 8.37 |
| Test | March 2020 – July 2021 | 3,799 | 689 | 8.43 |
| Total | July 2018 – July 2021 | 18,994 | 819 | 8.56 |
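
The split table reports three non-overlapping time periods. A chronological split of that kind can be sketched as follows. The cut-off dates are read off the table at month granularity. Whether the original split was made on calendar dates or on sorted position is not stated, so the date-based rule here is an assumption. `assign_split` and `chronological_split` are hypothetical helper names.

```python
from datetime import datetime

# Period boundaries read off the split table (month granularity).
VAL_START = datetime(2019, 12, 1)  # validation: December 2019 to February 2020
TEST_START = datetime(2020, 3, 1)  # test: March 2020 to July 2021


def assign_split(reviewed_at: datetime) -> str:
    """Return the split name for a single review timestamp."""
    if reviewed_at < VAL_START:
        return "train"
    if reviewed_at < TEST_START:
        return "validation"
    return "test"


def chronological_split(reviews):
    """Group (timestamp, text) pairs by split without shuffling across time."""
    splits = {"train": [], "validation": [], "test": []}
    for reviewed_at, text in reviews:
        splits[assign_split(reviewed_at)].append((reviewed_at, text))
    return splits


# Synthetic example: three reviews, one per period.
example = [
    (datetime(2019, 6, 15), "early review"),
    (datetime(2020, 1, 10), "winter review"),
    (datetime(2021, 5, 2), "late review"),
]
print({name: len(items) for name, items in chronological_split(example).items()})
```

Splitting on time rather than at random keeps every test review later than every training review, which avoids leaking future information into training.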

| Family | Model | Configuration note | MAE | RMSE |
|---|---|---|---|---|
| Traditional ML | | | | |
| | TF-IDF + Ridge | 5,000 TF-IDF features; | 0.6996 | 1.0245 |
| | TF-IDF + XGBoost | 100 estimators; max depth 6 | 0.6598 | 1.0172 |
| Recurrent DL | | | | |
| | LSTM | Vocabulary 5k; embedding 100; learning rate 0.001; patience 5 | | |
| | LSTM v2 | Vocabulary 10k; embedding 128; attention; learning rate 0.0005 | | |
| | BiLSTM | Bidirectional sequence encoder with sentiment features | | |
| | BiLSTM v2 | Learning rate 0.0005; patience 7; 30 epochs | | |
| | BiLSTM + Attention | Self-attention; learning rate 0.0005; patience 7 | | |
| Transformer | | | | |
| | DistilBERT | distilbert-base-uncased; fine-tuned end-to-end | | |

| Model | Input features | MAE | RMSE |
|---|---|---|---|
| Naive persistence | Previous-window hotel average rating only | 0.244 | 0.318 |
| Linear Regression | Aggregates from the most recent window: mean sentiment, review count, anomaly proportion | 0.214 | 0.287 |
| Random Forest | Aggregates from the most recent window: mean sentiment, review count, anomaly proportion | 0.198 | 0.265 |
| XGBoost | Aggregates from the most recent window: mean sentiment, review count, anomaly proportion | 0.186 | 0.249 |
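
The naive persistence baseline in the forecasting table predicts a hotel's average rating for a window from the previous window's average. The sketch below shows that baseline together with the two reported metrics, MAE and RMSE. How windows are defined is not recoverable from the table, so the per-window averages are synthetic and `persistence_forecast` is a hypothetical helper name.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def persistence_forecast(window_means):
    """Pair each window's mean rating with the previous window's mean as its forecast."""
    targets = window_means[1:]
    predictions = window_means[:-1]
    return targets, predictions


# Synthetic per-window average ratings for one hypothetical hotel.
hotel_windows = [8.4, 8.6, 8.5, 8.9]
y_true, y_pred = persistence_forecast(hotel_windows)
print(round(mae(y_true, y_pred), 3), round(rmse(y_true, y_pred), 3))
```

RMSE squares the errors before averaging, so it penalizes large misses more heavily than MAE. This is why every model in the table has an RMSE above its MAE.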

| Condition | Training reviews | MAE | RMSE | ΔMAE |
|---|---|---|---|---|
| Full pipeline (all components) | 18,994 | | | — |
| w/o Sentiment mismatch () | 19,438 | | | |
| w/o Similarity detection () | 21,146 | | | |
| w/o Correlation deviation () | 22,087 | | | |
| w/o Behavioral flag () | 21,482 | | | |
| No filtering (all 26,384 reviews) | 26,384 | | | |

| Rating bucket | BiLSTM+Attn MAE | DistilBERT MAE | Δ (DB−BA) | % of test set |
|---|---|---|---|---|
| 1–3 (very negative) | 0.980 | 0.840 | | 1.5% |
| 4–6 (negative/mixed) | 0.770 | 0.650 | | 9.8% |
| 7–8 (positive) | 0.570 | 0.490 | | 28.6% |
| 9–10 (very positive) | 0.536 | 0.459 | | 60.1% |
| Overall | 0.5753 | 0.4925 | | 100% |
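
The per-bucket figures in the rating-bucket table can be cross-checked with a few lines of arithmetic. The sketch below computes the DistilBERT minus BiLSTM+Attn difference (DB−BA) for each bucket and the share-weighted overall MAE, using only values copied from the table. The dictionary layout and function names are illustrative.

```python
# Per-bucket MAE values and test-set shares copied from the rating-bucket table.
buckets = {
    "1-3": {"bilstm_attn": 0.980, "distilbert": 0.840, "share": 0.015},
    "4-6": {"bilstm_attn": 0.770, "distilbert": 0.650, "share": 0.098},
    "7-8": {"bilstm_attn": 0.570, "distilbert": 0.490, "share": 0.286},
    "9-10": {"bilstm_attn": 0.536, "distilbert": 0.459, "share": 0.601},
}


def differences(table):
    """DistilBERT MAE minus BiLSTM+Attn MAE for each bucket (DB - BA)."""
    return {name: round(row["distilbert"] - row["bilstm_attn"], 3)
            for name, row in table.items()}


def weighted_mae(table, model):
    """Overall MAE as the share-weighted average of the per-bucket MAEs."""
    return sum(row[model] * row["share"] for row in table.values())


print(differences(buckets))
print(round(weighted_mae(buckets, "bilstm_attn"), 4))
print(round(weighted_mae(buckets, "distilbert"), 4))
```

The weighted averages agree with the Overall row to within rounding of the per-bucket values. The differences are negative in every bucket, meaning DistilBERT has the lower error throughout, and they are largest in absolute terms for the rare very negative reviews.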

| Length quartile | Token range | BiLSTM+Attn MAE | DistilBERT MAE | % of test |
|---|---|---|---|---|
| Q1 (shortest) | 1–7 | 0.668 | 0.562 | 25% |
| Q2 | 8–13 | 0.594 | 0.503 | 25% |
| Q3 | 14–32 | 0.531 | 0.456 | 25% |
| Q4 (longest) | 33–614 | 0.508 | 0.449 | 25% |
| Overall | 1–614 | 0.5753 | 0.4925 | 100% |

| Model | S1 | S2 | S3 | S4 | S5 | Mean | Std | Span |
|---|---|---|---|---|---|---|---|---|
| LSTM | 1.2168 | 1.2104 | 1.2109 | 1.2077 | 1.1999 | 1.2091 | 0.0055 | 0.0169 |
| LSTM v2 | 0.6485 | 0.6584 | 0.6441 | 0.6467 | 0.6396 | 0.6475 | 0.0062 | 0.0188 |
| BiLSTM | 0.6243 | 0.6096 | 0.6262 | 0.6559 | 0.5819 | 0.6196 | 0.0241 | 0.0740 |
| BiLSTM v2 | 0.5955 | 0.6378 | 0.5908 | 0.5537 | 0.5995 | 0.5955 | 0.0267 | 0.0841 |
| BiLSTM + Attn | 0.5757 | 0.5756 | 0.5787 | 0.5695 | 0.5769 | 0.5753 | 0.0031 | 0.0092 |
| DistilBERT | 0.4929 | 0.4892 | 0.4954 | — | — | 0.4925 | 0.0025 | 0.0062 |
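
The Mean, Std, and Span columns of the seed table can be recomputed from the per-seed values S1 to S5. The sketch below does this for two rows copied from the table. For these rows the reported Std is reproduced by the population formula (dividing by n rather than n − 1), and Span is the maximum minus the minimum. The function name `summarize` is illustrative.

```python
import statistics

# Per-seed values (S1 to S5) copied from two rows of the seed table.
seed_values = {
    "LSTM": [1.2168, 1.2104, 1.2109, 1.2077, 1.1999],
    "BiLSTM + Attn": [0.5757, 0.5756, 0.5787, 0.5695, 0.5769],
}


def summarize(values):
    """Return (mean, population std, span), each rounded to four decimals."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # population standard deviation (divides by n)
    span = max(values) - min(values)  # range between best and worst seed
    return round(mean, 4), round(std, 4), round(span, 4)


for model, values in seed_values.items():
    print(model, summarize(values))
```

Using the sample formula (`statistics.stdev`) instead would give slightly larger values for the same rows, which matters with so few seeds.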

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).