Submitted: 14 May 2025
Posted: 15 May 2025
Abstract
Keywords:
1. Introduction
2. Theoretical Background
2.1. Sentiment Analysis
2.2. Aspect-Based Sentiment Analysis
2.3. PhoBERT – Pre-Trained Language Models for Vietnamese
3. Data Preparation and Text Preprocessing
3.1. Dataset
3.2. Data Preprocessing
- Converting all text to lowercase: This eliminates unnecessary distinctions between uppercase and lowercase letters.
- Removing null values and invalid comments: Empty comments or those containing inappropriate content are removed from the dataset.
- Standardizing vocabulary and language: The text is normalized by replacing abbreviations, slang, and spelling errors with standard Vietnamese expressions. We use a customized dictionary designed specifically for e-commerce data related to footwear. For example, "giày đẹp vl" (slang) is standardized to "giày rất đẹp" ("the shoes are very beautiful").
- Converting emoticons and punctuation-based symbols: Symbols such as ":)", "<3", and "^^" are converted into explicit emotional expressions: "vui vẻ" (cheerful), "yêu thích" (love), and "hài lòng" (satisfied).
- Word segmentation: Texts are segmented using the VNCoreNLP tool so that Vietnamese compound words such as "giày thể thao" (sports shoes) are recognized as single units rather than split into the separate words "giày", "thể", and "thao".
- Removing short comments: Comments with fewer than five words are considered insufficiently informative and are removed to improve training accuracy. Additionally, generic or vague comments containing keywords such as "ok", "ổn", "rồi", "bình thường", "tạm", "được", and "giao" are filtered out to ensure specificity in sentiment analysis.
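The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the dictionaries `EMOTICON_MAP`, `PHRASE_MAP`, and `GENERIC_PHRASES` contain only the examples quoted in the list (the customized footwear dictionary itself is not reproduced), the `segment` hook is a hypothetical placeholder for a VNCoreNLP call, and two points are assumptions: words are counted on whitespace before segmentation, and a comment is treated as generic only when it consists entirely of the listed keywords.

```python
import re
from typing import Callable, Optional

# Illustrative entries only, taken from the examples in the text.
EMOTICON_MAP = {":)": "vui vẻ", "<3": "yêu thích", "^^": "hài lòng"}
PHRASE_MAP = {"đẹp vl": "rất đẹp"}  # stand-in for the customized dictionary
GENERIC_PHRASES = {"ok", "ổn", "rồi", "bình thường", "tạm", "được", "giao"}
MIN_WORDS = 5


def _replace_phrase(text: str, phrase: str, replacement: str) -> str:
    """Replace a phrase only where it is not part of a longer word."""
    pattern = rf"(?<!\w){re.escape(phrase)}(?!\w)"
    return re.sub(pattern, replacement, text)


def is_generic(text: str) -> bool:
    """Assumption: generic means nothing is left once the keywords are removed."""
    remainder = text
    for phrase in sorted(GENERIC_PHRASES, key=len, reverse=True):
        remainder = _replace_phrase(remainder, phrase, " ")
    return re.search(r"\w", remainder) is None


def preprocess(
    comment: Optional[str],
    segment: Callable[[str], str] = lambda s: s,  # hypothetical segmenter hook
) -> Optional[str]:
    """Return the cleaned comment, or None if the comment should be dropped."""
    if comment is None:
        return None
    text = comment.strip().lower()  # lowercase
    if not text:  # null or empty comment
        return None
    for emoticon, phrase in EMOTICON_MAP.items():  # emoticons -> words
        text = text.replace(emoticon, f" {phrase} ")
    for slang, standard in PHRASE_MAP.items():  # slang -> standard form
        text = _replace_phrase(text, slang, standard)
    text = re.sub(r"\s+", " ", text).strip()
    if len(text.split()) < MIN_WORDS:  # short comment
        return None
    if is_generic(text):  # generic or vague comment
        return None
    return segment(text)  # word segmentation would happen here


print(preprocess("Giày đẹp vl, giao hàng nhanh :)"))
# giày rất đẹp, giao hàng nhanh vui vẻ
```

Passing a real segmenter as `segment` would place word segmentation last, after filtering; whether the original pipeline counts words before or after segmentation is not stated in the text.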


4. Experimental Results
4.1. Experimental Setup
- Accuracy: The ratio of the total number of correct predictions to the total number of instances in the test set. This metric reflects the overall correctness of the model.
- Precision: The macro average, that is, the unweighted mean of the precision scores computed separately for each class. Precision is the proportion of correctly predicted positive instances among all instances predicted as positive, and it indicates how well the model avoids false positives.
- Recall: The macro average of the recall scores computed separately for each class. Recall is the proportion of actual positive instances that the model correctly identifies.
- F1-score: The macro average of the F1-scores computed separately for each class. The F1-score is the harmonic mean of precision and recall and balances the two, which is especially important for imbalanced datasets.
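As a minimal sketch of how these four metrics relate, the following pure-Python function computes accuracy and the per-class averaged (macro) precision, recall, and F1-score from two label lists. The function name `macro_metrics`, the label strings, and the convention of scoring a class as 0.0 when its denominator is zero are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def _safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[str, float]:
    """Accuracy plus unweighted per-class averages of precision, recall, F1."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions = [_safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [_safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1_scores = [
        _safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)
    ]
    n_classes = len(labels)
    return {
        "accuracy": _safe_div(sum(tp.values()), len(y_true)),
        "precision": _safe_div(sum(precisions), n_classes),
        "recall": _safe_div(sum(recalls), n_classes),
        "f1": _safe_div(sum(f1_scores), n_classes),
    }


scores = macro_metrics(
    ["pos", "pos", "neg", "neu"],
    ["pos", "neg", "neg", "neu"],
)
print(scores["accuracy"])  # 0.75
```

Because every class contributes equally to the macro averages, a rare class with poor recall lowers the score as much as a frequent one would, which is why these averages are informative on imbalanced data.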
4.2. Experimental Results





5. Conclusions
References
- Chen, X.; Wu, S. Z.; Hong, M. Understanding Gradient Clipping in Private SGD: A Geometric Perspective. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., Lin, H., Eds.; Curran Associates, Inc., 2020; Vol. 33, pp. 13773–13782. https://proceedings.neurips.cc/paper_files/paper/2020/file/9ecff5455677b38d19f49ce658ef0608-Paper.pdf.
- Dang, C.; Moreno-García, M. N.; De la Prieta, F.; Nguyen, K. V.; Ngo, V. M. Sentiment Analysis for Vietnamese–Based Hybrid Deep Learning Models. Preprints 2023.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. https://arxiv.org/abs/1810.04805.
- Do, H. H.; Prasad, P. W.; Maag, A.; Alsadoon, A. Deep learning for aspect-based sentiment analysis: A comparative review. Expert Systems with Applications 2019, 118, 272–299.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. 2018. https://arxiv.org/abs/1708.02002.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. 2019. https://arxiv.org/abs/1907.11692.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. 2019. https://arxiv.org/abs/1711.05101.
- Mejova, Y. Sentiment analysis: An overview. University of Iowa, Computer Science Department 2009, 5, 1–34.
- Nguyen, D. Q.; Tuan Nguyen, A. PhoBERT: Pre-trained language models for Vietnamese. In Findings of the Association for Computational Linguistics: EMNLP 2020; Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics, 2020; pp. 1037–1042.
- Tran, Q.-L.; Le, P. T. D.; Do, T.-H. Aspect-based sentiment analysis for Vietnamese reviews about beauty product on E-commerce websites. In Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation; 2022; pp. 767–776.
- Van Thin, D.; Hao, D. N.; Nguyen, N. L.-T. Vietnamese Sentiment Analysis: An Overview and Comparative Study of Fine-tuning Pretrained Language Models. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22(6).
- Webb, G. I.; Keogh, E.; Miikkulainen, R. Naïve Bayes. Encyclopedia of Machine Learning 2010, 15(1), 713–714.
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; Davison, J.; Shleifer, S.; Platen, P. von; Ma, C.; Jernite, Y.; Plu, J.; Xu, C.; Scao, T. L.; Gugger, S.; Rush, A. M. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. 2020. https://arxiv.org/abs/1910.03771.
- Xue, H.; Yang, Q.; Chen, S. SVM: Support vector machines. In The top ten algorithms in data mining; Chapman and Hall/CRC, 2009; pp. 51–74.

| Aspect | Definition |
|---|---|
| Price | User feedback related to price (high/low/…) |
| Shipping | User feedback related to the shipping service (fast/slow/…) |
| Outlook | User feedback related to the shoe's appearance (beautiful/ugly/dirty/…) |
| Quality | User feedback related to the quality of the product (good/bad/damaged/…) |
| Size | User feedback related to shoe size (tight/medium/small/wide/…) |
| Shop_Service | User feedback related to the quality of the seller's customer care (satisfied/unsatisfied/not good/…) |
| General | User feedback related to the overall condition of the shoe (ok/no problem/…) |
| Others | User feedback not related to the product or the aspects above |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).