Submitted: 18 June 2023
Posted: 19 June 2023
Abstract
Keywords:
1. Introduction
- We present a hybrid deep learning model that effectively captures Vietnamese text features by combining CNN, LSTM, and SVM.
- We comprehensively compare our proposed model with several baseline models, including traditional machine learning techniques and standalone deep learning models, demonstrating the superiority of our approach on benchmark Vietnamese sentiment analysis datasets.
- Our work contributes to the growing body of research on Vietnamese NLP, providing insights and directions for future studies in this area.
2. Related Work
3. Proposed Model
3.1. Word Embedding Layer
3.2. CNN Layer
3.3. LSTM Layer with Attention Mechanism
3.4. Fully Connected Layer and Output
4. Experimental Results
4.1. Dataset and Preprocessing
- USAL-UTH dataset (collected by us): The USAL-UTH dataset consists of 10,000 Vietnamese customer comments collected from the Shopee¹ e-commerce platform. Reviews are labeled positive or negative based on their star rating: negative for 1-2 stars and positive for five stars.
- UIT-VSFC dataset [27]: The UIT-VSFC dataset contains 16,000 Vietnamese students’ feedback, with three classes: positive, neutral, and negative.
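
The rating-based labeling rule described for the USAL-UTH dataset can be sketched as below. This is a minimal illustration, not the authors' preprocessing code: the function name `label_from_rating`, the sample reviews, and the choice to drop 3-4 star reviews (the text does not say how they are handled) are all assumptions.

```python
from typing import Optional


def label_from_rating(stars: int) -> Optional[str]:
    """Map a star rating to a sentiment label, or None if the rating is unlabeled."""
    if stars in (1, 2):
        return "negative"
    if stars == 5:
        return "positive"
    # Assumption: ratings of 3 or 4 stars are not covered by the stated rule,
    # so this sketch leaves them out of the labeled set.
    return None


# Hypothetical reviews (text, stars) used only to exercise the rule.
reviews = [
    ("Hàng đẹp, giao nhanh", 5),
    ("Sản phẩm kém chất lượng", 1),
    ("Tạm được", 3),
]

labeled = [
    (text, label_from_rating(stars))
    for text, stars in reviews
    if label_from_rating(stars) is not None
]
```

Here `labeled` keeps only the five-star and one-star reviews; the three-star review is discarded under the stated assumption.
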
4.2. Results and Discussion
5. Conclusions and Future Work
¹ https://shopee.vn/
References
- Chen, Y.: Convolutional neural network for sentence classification. University of Waterloo (2015).
- Paredes-Valverde, M.A., Colomo-Palacios, R., Salas-Zárate, M.d.P., Valencia-García, R.: Sentiment analysis in Spanish for improvement of products and services: A deep learning approach. Scientific Programming 2017, (2017).
- Vateekul, P., Koomsubha, T.: A study of sentiment analysis using deep learning techniques on Thai Twitter data. In: 2016 13th International joint conference on computer science and software engineering (JCSSE), pp. 1-6. IEEE, (2016).
- Roshanfekr, B., Khadivi, S., Rahmati, M.: Sentiment analysis using deep learning on Persian texts. In: 2017 Iranian conference on electrical engineering (ICEE), pp. 1503-1508. IEEE, (2017).
- Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. Advances in neural information processing systems 28, (2015).
- Alomari, K.M., ElSherif, H.M., Shaalan, K.: Arabic tweets sentimental analysis using machine learning. In: Advances in Artificial Intelligence: From Theory to Practice: 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2017, Arras, France, June 27-30, 2017, Proceedings, Part I 30, pp. 602-610. Springer, (2017).
- Yadav, V., Verma, P., Katiyar, V.: Long short term memory (LSTM) model for sentiment analysis in social data for e-commerce products reviews in Hindi languages. International Journal of Information Technology 15, 759-772 (2023).
- Le, C.-C., Prasad, P., Alsadoon, A., Pham, L., Elchouemi, A.: Text classification: Naive Bayes classifier with sentiment lexicon. IAENG International journal of computer science 46, 141-148 (2019).
- Le, H.P., Nguyen, T.M.H., Nguyen, P.T., Vu, X.L.: Building a Large Syntactically-Annotated Corpus of Vietnamese. In: The Third Linguistic Annotation Workshop (The LAW III), pp. 6p., (2009).
- Bang, T.S., Haruechaiyasak, C., Sornlertlamvanich, V.: Vietnamese sentiment analysis based on term feature selection approach. In: Proc. 10th International Conference on Knowledge Information and Creativity Support Systems (KICSS 2015), pp. 196-204. (2015).
- Kieu, B.T., Pham, S.B.: Sentiment analysis for Vietnamese. In: 2010 Second international conference on knowledge and systems engineering, pp. 152-157. IEEE, (2010).
- Trinh, S., Nguyen, L., Vo, M., Do, P.: Lexicon-based sentiment analysis of Facebook comments in Vietnamese language. Recent developments in intelligent information and database systems 263-276 (2016).
- Nguyen, P., Le, L., Ngo, V., Nguyen, P.: Using Entity Relations for Opinion Mining of Vietnamese Comments. arXiv preprint arXiv:1905.06647 (2019).
- Nguyen, L., Pham, N., Ngo, V.M.: Opinion Spam Recognition Method for Online Reviews using Ontological Features. arXiv preprint arXiv:1807.11024 (2018).
- Nguyen, V.D., Van Nguyen, K., Nguyen, N.L.-T.: Variants of long short-term memory for sentiment analysis on Vietnamese students’ feedback corpus. In: 2018 10th International conference on knowledge and systems engineering (KSE), pp. 306-311. IEEE, (2018).
- Nguyen, K.T.-T., Huynh, S.K., Phan, L.L., Pham, P.H., Nguyen, D.-V., Van Nguyen, K.: Span detection for aspect-based sentiment analysis in Vietnamese. arXiv preprint arXiv:2110.07833 (2021).
- Vo, Q.-H., Nguyen, H.-T., Le, B., Nguyen, M.-L.: Multi-channel LSTM-CNN model for Vietnamese sentiment analysis. In: 2017 9th international conference on knowledge and systems engineering (KSE), pp. 24-29. IEEE, (2017).
- Nguyen, Q.T., Nguyen, T.L., Luong, N.H., Ngo, Q.H.: Fine-tuning BERT for sentiment analysis of Vietnamese reviews. In: 2020 7th NAFOSTED conference on information and computer science (NICS), pp. 302-307. IEEE, (2020).
- Truong, T.-L., Le, H.-L., Le-Dang, T.-P.: Sentiment Analysis Implementing BERT-based Pre-trained Language Model for Vietnamese. In: 2020 7th NAFOSTED Conference on Information and Computer Science (NICS), pp. 362-367. IEEE, (2020).
- Dang, C.N., Moreno-García, M.N., De la Prieta, F.: Hybrid deep learning models for sentiment analysis. Complexity 2021, 1-16 (2021).
- Dang, C.N., Moreno-García, M.N., De la Prieta, F.: Using hybrid deep learning models of sentiment analysis and item genres in recommender systems for streaming services. Electronics 10, 2459 (2021).
- Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K.: Convolutional neural networks: an overview and application in radiology. Insights into imaging 9, 611-629 (2018).
- Hochreiter, S., Schmidhuber, J.: LSTM can solve hard long time lag problems. Advances in neural information processing systems 9, (1996).
- https://pytorch.org/.
- https://colab.research.google.com/notebooks/pro.ipynb.
- 26. Aggarwal, C.C.: Neural networks and deep learning. Springer 10, 3 (2018).
- 27. Van Nguyen, K., Nguyen, V.D., Nguyen, P.X., Truong, T.T., Nguyen, N.L.-T.: UIT-VSFC: Vietnamese students’ feedback corpus for sentiment analysis. In: 2018 10th international conference on knowledge and systems engineering (KSE), pp. 19-24. IEEE, (2018).
- Nguyen, D.Q., Nguyen, A.T.: PhoBERT: Pre-trained language models for Vietnamese. arXiv preprint arXiv:2003.00744 (2020).





| Layer (type) | Output Shape | Parameter # |
|---|---|---|
| BERT (base-uncased) | (None, 768, 1) | 110,000,000 |
| conv1d (Conv1D) | (None, 768, 512) | 2,048 |
| conv1d_1 (Conv1D) | (None, 768, 256) | 393,472 |
| conv1d_2 (Conv1D) | (None, 768, 128) | 98,432 |
| lstm_1 (LSTM) | (None, 500) | 1,258,000 |
| dense_1 (Dense) | (None, 50) | 25,050 |
| dense_2 (Dense) | (None, 2) | 102 |
| Total parameters: 111,777,104 | | |
| Trainable parameters: 111,777,104 | | |
| Non-trainable parameters: 0 | | |
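
The per-layer parameter counts in the CNN-LSTM summary above can be reproduced with the usual Keras-style counting formulas. The sketch below is a sanity check under stated assumptions, not the authors' code: the helper names are hypothetical, the convolution kernel size of 3 is inferred because it is the value that matches the listed counts, and the BERT figure is the rounded 110,000,000 given in the table.

```python
def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    """Weights (kernel_size * in_channels * filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and one bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


KERNEL_SIZE = 3            # assumption: inferred from the listed counts
BERT_PARAMS = 110_000_000  # rounded value as listed in the table

cnn_lstm = {
    "conv1d": conv1d_params(1, 512, KERNEL_SIZE),
    "conv1d_1": conv1d_params(512, 256, KERNEL_SIZE),
    "conv1d_2": conv1d_params(256, 128, KERNEL_SIZE),
    "lstm_1": lstm_params(128, 500),
    "dense_1": dense_params(500, 50),
    "dense_2": dense_params(50, 2),
}
cnn_lstm_total = BERT_PARAMS + sum(cnn_lstm.values())

for name, count in cnn_lstm.items():
    print(f"{name:10s} {count:>12,}")
print(f"{'total':10s} {cnn_lstm_total:>12,}")
```

Under these assumptions every row and the total match the table. The LSTM formula uses a single bias vector per gate, as Keras does; a framework that stores two bias vectors per gate would report a slightly larger count.
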
| Layer (type) | Output Shape | Parameter # |
|---|---|---|
| BERT (base-uncased) | (None, 768, 1) | 110,000,000 |
| lstm_1 (LSTM) | (None, 768, 500) | 1,004,000 |
| conv1d_1 (Conv1D) | (None, 768, 512) | 768,512 |
| conv1d_2 (Conv1D) | (None, 768, 256) | 393,472 |
| conv1d_3 (Conv1D) | (None, 768, 128) | 98,432 |
| flatten (Flatten) | (None, 98304) | 0 |
| dense_2 (Dense) | (None, 50) | 4,915,250 |
| dense_3 (Dense) | (None, 2) | 102 |
| Total parameters: 117,179,768 | | |
| Trainable parameters: 117,179,768 | | |
| Non-trainable parameters: 0 | | |
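
For the LSTM-CNN summary above, the same kind of arithmetic shows where the extra parameters come from: the LSTM returns the full sequence, so the convolution output is flattened to 768 × 128 = 98,304 features before the first dense layer. The sketch below is illustrative only; the variable names are hypothetical, the kernel size of 3 is again inferred from the listed counts, and the BERT figure is the rounded table value.

```python
SEQ_LEN = 768              # sequence dimension shown in the output shapes
KERNEL_SIZE = 3            # assumption: inferred from the listed counts
BERT_PARAMS = 110_000_000  # rounded value as listed in the table

# LSTM with 500 units on a 1-dimensional input, returning the whole sequence.
lstm_1 = 4 * (500 * (1 + 500) + 500)

# Three stacked Conv1D layers: kernel * in_channels * filters + filters.
conv1d_1 = KERNEL_SIZE * 500 * 512 + 512
conv1d_2 = KERNEL_SIZE * 512 * 256 + 256
conv1d_3 = KERNEL_SIZE * 256 * 128 + 128

# Flatten has no parameters but fixes the input width of the next dense layer.
flatten_features = SEQ_LEN * 128
dense_2 = flatten_features * 50 + 50
dense_3 = 50 * 2 + 2

head_params = lstm_1 + conv1d_1 + conv1d_2 + conv1d_3 + dense_2 + dense_3
lstm_cnn_total = BERT_PARAMS + head_params

# Share of the non-BERT parameters held by the dense layer after Flatten.
dense_share = dense_2 / head_params
print(f"total: {lstm_cnn_total:,}, dense_2 share of head: {dense_share:.1%}")
```

With these assumptions the dense layer after `flatten` accounts for roughly two thirds of the non-BERT parameters, which explains most of the difference in total size between the two architectures.
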
| Models | Accuracy | Recall | Precision | F-Score | AUC |
|---|---|---|---|---|---|
| SVM | 92.24 | 81.62 | 78.85 | 77.99 | 87.22 |
| DNN | 93.56 | 84.74 | 79.86 | 81.80 | 88.03 |
| CNN | 93.49 | 84.56 | 80.00 | 81.85 | 88.09 |
| LSTM | 93.42 | 82.93 | 80.74 | 81.69 | 88.49 |
| LSTM-CNN | 93.53 | 83.47 | 80.72 | 81.92 | 88.50 |
| CNN-LSTM | 93.52 | 83.82 | 80.61 | 82.01 | 88.43 |

| Models | Accuracy | Recall | Precision | F-Score | AUC |
|---|---|---|---|---|---|
| SVM | 93.33 | 94.54 | 92.57 | 93.52 | 93.35 |
| DNN | 93.35 | 93.38 | 93.90 | 93.63 | 93.36 |
| CNN | 93.56 | 94.06 | 93.58 | 93.81 | 93.56 |
| LSTM | 93.38 | 93.89 | 93.37 | 93.62 | 93.38 |
| LSTM-CNN | 93.21 | 94.17 | 92.73 | 93.42 | 93.21 |
| CNN-LSTM | 93.94 | 95.12 | 93.17 | 94.12 | 93.95 |
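
As a reminder of how the Accuracy, Recall, Precision, and F-Score columns relate to each other, the sketch below computes them from binary confusion-matrix counts using the standard definitions. It is an assumption that the reported numbers follow these definitions; the text available here does not state how the metrics were averaged across classes or runs, and the counts in the example are invented for illustration only.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-score is the harmonic mean of precision and recall.
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, not taken from the experiments.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}")
```

AUC is omitted because it needs the classifier's scores for each example rather than the confusion counts alone.
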
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).