Submitted:
21 August 2024
Posted:
22 August 2024
Abstract
Keywords:
1. Introduction
- Adapting sentiment analysis models to diverse linguistic contexts: By training models on low-resource languages and fine-tuning them on larger, more commonly used languages, we aim to bridge the gap in emotional lexicons and improve the performance of sentiment analysis across different languages.
- Addressing data imbalance through innovative data augmentation: We develop language-specific data augmentation strategies to tackle issues of data scarcity and imbalance, ensuring that our models are trained on diverse and representative datasets for each language.
- Introducing multi-language sentiment normalization: To ensure consistent sentiment interpretation across languages, we propose a technique that normalizes sentiment scores across multiple languages, enabling more accurate cross-lingual comparisons and analysis.
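As an illustration of the third contribution, one simple way to make sentiment scores comparable across languages is per-language z-score normalization. The scheme and function below are our illustrative sketch under that assumption, not necessarily the paper's exact normalization method:

```python
import statistics

def normalize_scores(scores_by_lang):
    """Per-language z-score normalization: map each language's raw
    sentiment scores to zero mean and unit variance so that scores
    become comparable across languages (illustrative sketch only)."""
    normalized = {}
    for lang, scores in scores_by_lang.items():
        mean = statistics.mean(scores)
        std = statistics.pstdev(scores) or 1.0  # guard against zero variance
        normalized[lang] = [(s - mean) / std for s in scores]
    return normalized
```

After normalization, a score of +1.2 in one language occupies the same relative position in its distribution as +1.2 in another, which is what enables cross-lingual comparison.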
2. Related Work
3. Methods
4. Experiments
4.1. Experiment Setup
4.1.1. Dataset
- The 'Medicine' domain has the longest average text length, at 782 units.
- The 'School' domain has the shortest average text length, at 427 units.
- The 'Hotels' and 'Products' domains have similar average text lengths, at 773 and 756 units respectively.

- The 'School' and 'Medicine' domains have the longest average sentence lengths, at 110 and 111 units respectively.
- The 'Hotels' domain has the shortest average sentence length, at 92 units.
- The 'Products' domain has a mid-range average sentence length of 98 units.
- Across all domains, the average text length is significantly longer than the average sentence length, as expected, since a text usually comprises multiple sentences.
- The ratio of average text length to average sentence length is smallest in the 'School' domain (427 vs. 110 units), suggesting that texts in this domain contain fewer sentences on average than those in the other domains.
- The 'Medicine' domain combines the longest average texts (782 units) with the longest average sentences (111 units), suggesting that its length stems not only from the number of sentences but also from the sentences themselves being long.
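Statistics like those above can be reproduced with a short script. The character-level lengths and the naive period-based sentence splitting below are simplifying assumptions (the paper does not specify its segmenter), and `corpus` is a hypothetical mapping from domain names to lists of texts:

```python
def corpus_stats(corpus):
    """Average text length and average sentence length (in characters)
    per domain, given a dict of domain -> list of texts.
    Sentence splitting on '.' is a deliberate simplification."""
    stats = {}
    for domain, texts in corpus.items():
        sentences = [s for t in texts for s in t.split(".") if s.strip()]
        stats[domain] = {
            "avg_text_len": sum(len(t) for t in texts) / len(texts),
            "avg_sent_len": sum(len(s) for s in sentences) / len(sentences),
        }
    return stats
```

A production pipeline would substitute a language-aware sentence segmenter, since period-splitting fails for abbreviations and for languages such as Chinese and Japanese that use different sentence-final punctuation.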
4.1.2. Baseline
- LASER + BiLSTM [32] presents a system for learning sentence embeddings across 93 languages using a shared encoder, enabling zero-shot transfer of NLP models to new languages and introducing a new multilingual test set.
- MultiFit [24] proposes an efficient method for fine-tuning language models in multiple languages, especially useful for low-resource languages, and includes a zero-shot approach using existing cross-lingual models.
- XLM-RoBERTa [25] is a Transformer-based model pre-trained on 100 languages, showing significant performance gains on cross-lingual tasks, accompanied by a detailed analysis of how scaling affects model performance.
- LLaMa3 zero-shot and LLaMa3 few-shot [31] are instruction-following models developed by Meta AI, capable of generating text in response to instructions and demonstrating strong understanding and generation capabilities on complex language tasks.
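The zero-shot and few-shot LLaMa3 baselines are queried with classification prompts. Below is a minimal sketch of how such a few-shot prompt might be assembled; the label set (SP, 0, SN, AMB) follows this paper's classes, but the instruction wording and function name are our assumptions, not the paper's exact prompt:

```python
def build_fewshot_prompt(examples, text):
    """Assemble a few-shot sentiment-classification prompt of the kind
    used to query instruction-tuned LLMs such as LLaMa3.
    `examples` is a list of (review, label) demonstration pairs;
    an empty list yields the zero-shot variant."""
    lines = ["Classify the sentiment of each review as SP, 0, SN or AMB."]
    for review, label in examples:
        lines.append(f"Review: {review}\nSentiment: {label}")
    # The trailing "Sentiment:" cues the model to emit only the label.
    lines.append(f"Review: {text}\nSentiment:")
    return "\n\n".join(lines)
```

The model's completion is then mapped back to one of the four classes; outputs that match no label are typically counted as errors.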
4.1.3. Experiment Result for Hyperparameter Tuning
4.1.4. Evaluation
- TP (True Positive) represents the number of true positive instances,
- TN (True Negative) represents the number of true negative instances,
- FP (False Positive) represents the number of false positive instances, and
- FN (False Negative) represents the number of false negative instances.
- P = TP / (TP + FP) is the precision of the model,
- R = TP / (TP + FN) is the recall of the model.
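The reported per-class F1 and the micro- and macro-averaged scores can all be computed from these counts using the standard definitions P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R). A self-contained sketch (function names are ours):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def micro_macro_f1(y_true, y_pred, labels):
    """Micro-F1 pools TP/FP/FN over all classes before computing F1;
    macro-F1 averages the per-class F1 scores, weighting classes equally."""
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in zip(y_true, y_pred))
        fp = sum(t != lbl and p == lbl for t, p in zip(y_true, y_pred))
        fn = sum(t == lbl and p != lbl for t, p in zip(y_true, y_pred))
        per_class_f1.append(precision_recall_f1(tp, fp, fn)[2])
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    micro = precision_recall_f1(tp_sum, fp_sum, fn_sum)[2]
    macro = sum(per_class_f1) / len(per_class_f1)
    return micro, macro
```

Because macro-F1 weights all classes equally, it is dragged down by the rare AMB class, which explains why the macro scores in the results tables sit below the micro scores for every method.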
4.2. Main Results
5. Discussion
5.1. Comparison with Other Methods
5.2. Analysis
5.3. Future Research Directions
6. Conclusions
Author Contributions
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| CLSA | Cross-lingual Sentiment Analysis |
| LLM | Large Language Model |
| RoBERTa | Robustly Optimized Bidirectional Encoder Representations from Transformers |
| LASER | Language-Agnostic SEntence Representations |
| BiLSTM | Bidirectional Long Short-Term Memory |
| SP | Strong Positive |
| SN | Strong Negative |
| AMB | Ambivalent |
| TP | True Positive |
| TN | True Negative |
| FP | False Positive |
| FN | False Negative |
References
- Marvin M. Agüero-Torales, José I. Abreu Salas, Antonio G. López-Herrera. Deep learning and multilingual sentiment analysis on social media data: An overview. Applied Soft Computing 2021, 107, 142–149.
- Filip, T., Pavlíček, M., Sosík, P. (2024). Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages. arXiv preprint arXiv:2408.02044.
- Wong, E. P., M’hiri, F. (2024). Analyzing Language Bias Between French and English in Conventional Multilingual Sentiment Analysis Models. arXiv preprint arXiv:2405.06692.
- Thakkar, G., Hakimov, S., Tadić, M. (2024). M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets. arXiv preprint arXiv:2404.01753.
- Hasan, M. A. (2024). Ensemble Language Models for Multilingual Sentiment Analysis. arXiv preprint arXiv:2403.06060.
- Koto, F., Beck, T., Talat, Z., Gurevych, I., Baldwin, T. (2024). Zero-shot sentiment analysis in low-resource languages using a multilingual sentiment lexicon. arXiv preprint arXiv:2402.02113.
- Buscemi, A., Proverbio, D. (2024). Chatgpt vs gemini vs llama on multilingual sentiment analysis. arXiv preprint arXiv:2402.01715.
- Hu, D., Wei, L., Liu, Y., Zhou, W., Hu, S. (2023). UCAS-IIE-NLP at SemEval-2023 Task 12: Enhancing Generalization of Multilingual BERT for Low-resource Sentiment Analysis. arXiv preprint arXiv:2306.01093.
- Wang, M., Adel, H., Lange, L., Strötgen, J., Schütze, H. (2023). Nlnde at semeval-2023 task 12: Adaptive pretraining and source language selection for low-resource multilingual sentiment analysis. arXiv preprint arXiv:2305.00090.
- Manuvie, R., Chatterjee, S. (2023). Automated Sentiment and Hate Speech Analysis of Facebook Data by Employing Multilingual Transformer Models. arXiv preprint arXiv:2301.13668.
- Câmara, A., Taneja, N., Azad, T., Allaway, E., Zemel, R. (2022). Mapping the multilingual margins: Intersectional biases of sentiment analysis systems in English, Spanish, and Arabic. arXiv preprint arXiv:2204.03558.
- Antypas, D., Preece, A., Camacho-Collados, J. (2023). Negativity spreads faster: A large-scale multilingual twitter analysis on the role of sentiment in political communication. Online Social Networks and Media 2023, 33.
- Muhammad, S. H., Adelani, D. I., Ruder, S., Ahmad, I. S., Abdulmumin, I., Bello, B. S., ... Brazdil, P. (2022). Naijasenti: A nigerian twitter sentiment corpus for multilingual sentiment analysis. arXiv preprint arXiv:2201.08277.
- Rodrigues, R. C., Inuzuka, M. A., Gomes, J. R. S. A., Rocha, A. S., Calixto, I., & Nascimento, H. A. D. D. (2021). Zero-shot hashtag segmentation for multilingual sentiment analysis. arXiv preprint arXiv:2112.03213.
- Barbieri, F., Anke, L. E., Camacho-Collados, J. (2021). XLM-T: Multilingual language models in Twitter for sentiment analysis and beyond. arXiv preprint arXiv:2104.12250.
- Azhar, A. N., Khodra, M. L. (2020, September). Fine-tuning pretrained multilingual BERT model for Indonesian aspect-based sentiment analysis. In 2020 7th International Conference on Advance Informatics: Concepts, Theory and Applications (ICAICTA). IEEE 2020, 1–6.
- Öhman, E., Pàmies, M., Kajava, K., Tiedemann, J. (2020). XED: A multilingual dataset for sentiment analysis and emotion detection. arXiv preprint arXiv:2011.01612.
- Barriere, V., Balahur, A. (2020). Improving sentiment analysis over non-English tweets using multilingual transformers and automatic translation for data-augmentation. arXiv preprint arXiv:2010.03486.
- Shah, S. R., Kaushik, A. (2019). Sentiment analysis on indian indigenous languages: a review on multilingual opinion mining. arXiv preprint arXiv:1911.12848.
- Graff, M., Miranda-Jimenez, S., Tellez, E. S., Moctezuma, D. (2020). Evomsa: A multilingual evolutionary approach for sentiment analysis [application notes]. IEEE Computational Intelligence Magazine 2020, 15(1), 76-88.
- Can, E. F., Ezen-Can, A., Can, F. (2018). Multilingual sentiment analysis: An RNN-based framework for limited data. arXiv preprint arXiv:1806.04511.
- Lu, Y., Mori, T. (2017). Deep learning paradigm with transformed monolingual word embeddings for multilingual sentiment analysis. arXiv preprint arXiv:1710.03203.
- Ruder, S., Ghaffari, P., Breslin, J. G. (2016). Insight-1 at semeval-2016 task 5: Deep learning for multilingual aspect-based sentiment analysis. arXiv preprint arXiv:1609.02748.
- Eisenschlos, J., Ruder, S., Czapla, P., Kadras, M., Gugger, S., Howard, J.: MultiFit: Efficient multi-lingual language model fine-tuning. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 5706-5711 (2019).
- Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
- Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... McGrew, B. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
- Liu Zhuang, Lin Wayne, Shi Ya, and Zhao Jun. 2021. A Robustly Optimized BERT Pre-training Approach with Post-training. In Proceedings of the 20th Chinese National Conference on Computational Linguistics, pages 1218–1227, Huhhot, China. Chinese Information Processing Society of China.
- Floridi, L., Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30, 681-694.
- Vaswani, A. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
- Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., ... Scialom, T. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., ... Ganapathy, R. (2024). The llama 3 herd of models. arXiv preprint arXiv:2407.21783.
- Mikel Artetxe and Holger Schwenk. 2019. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond. Transactions of the Association for Computational Linguistics, 7:597–610.
- Kocoń, J., Miłkowski, P., Kanclerz, K. (2021, June). Multiemo: Multilingual, multilevel, multidomain sentiment analysis corpus of consumer reviews. In International Conference on Computational Science (pp. 297-312). Cham: Springer International Publishing.




| Type | Domain | Train | Val | Test | Sum |
|---|---|---|---|---|---|
| Text | Hotels | 3165 | 396 | 395 | 3956 |
| | Medicine | 2618 | 327 | 327 | 3272 |
| | Products | 387 | 49 | 48 | 484 |
| | School | 403 | 50 | 51 | 504 |
| | All | 6573 | 823 | 820 | 8216 |
| Sentence | Hotels | 19881 | 2485 | 2485 | 24851 |
| | Medicine | 18126 | 2265 | 2266 | 22657 |
| | Products | 5942 | 743 | 742 | 7427 |
| | School | 2025 | 253 | 253 | 2531 |
| | All | 45974 | 5745 | 5747 | 57466 |
| Language | Method | SP | 0 | SN | AMB | F1 | Micro | Macro |
|---|---|---|---|---|---|---|---|---|
| Chinese | LASER+BiLSTM | 16.45 | 0.72 | 18.70 | 0.66 | 12.64 | 62.19 | 52.45 |
| | MultiFit | 85.81 | 95.02 | 86.78 | 59.91 | 83.19 | 88.79 | 87.64 |
| | RoBERTa | 86.34 | 95.69 | 87.99 | 57.13 | 84.05 | 89.37 | 87.92 |
| | LLaMa3 zero-shot | 83.67 | 51.90 | 84.32 | 14.81 | 67.64 | 70.98 | 58.68 |
| | LLaMa3 few-shot | 85.98 | 75.00 | 86.67 | 8.70 | 72.73 | 77.78 | 64.09 |
| | OURS | 94.88 | 96.80 | 93.25 | 80.00 | 91.79 | 91.73 | 91.23 |
| Dutch | LASER+BiLSTM | 67.62 | 73.71 | 78.66 | 39.59 | 70.48 | 80.32 | 76.94 |
| | RoBERTa | 84.00 | 96.39 | 86.31 | 53.20 | 82.45 | 88.30 | 86.78 |
| | LLaMa3 zero-shot | 82.05 | 60.94 | 86.67 | 17.98 | 69.17 | 73.66 | 61.91 |
| | LLaMa3 few-shot | 82.50 | 55.56 | 84.95 | 38.71 | 72.82 | 74.46 | 65.43 |
| | OURS | 92.78 | 98.65 | 94.10 | 79.14 | 91.33 | 91.36 | 91.17 |
| English | LASER+BiLSTM | 69.89 | 71.21 | 77.45 | 35.53 | 70.07 | 80.04 | 76.08 |
| | RoBERTa | 85.96 | 93.76 | 88.67 | 60.47 | 84.87 | 89.91 | 88.48 |
| | LLaMa3 zero-shot | 82.91 | 73.24 | 88.02 | 12.77 | 68.37 | 75.12 | 64.23 |
| | LLaMa3 few-shot | 86.67 | 80.77 | 83.33 | 44.44 | 74.55 | 77.22 | 73.80 |
| | OURS | 95.61 | 98.68 | 94.03 | 83.33 | 92.80 | 92.82 | 92.91 |
| French | LASER+BiLSTM | 62.47 | 59.48 | 76.78 | 30.81 | 66.92 | 77.99 | 72.52 |
| | MultiFit | 86.48 | 96.04 | 87.49 | 57.42 | 83.63 | 89.09 | 87.76 |
| | RoBERTa | 83.88 | 95.60 | 86.18 | 51.81 | 81.93 | 87.96 | 86.43 |
| | LLaMa3 zero-shot | 78.76 | 70.34 | 89.47 | 15.38 | 68.52 | 74.63 | 63.49 |
| | LLaMa3 few-shot | 82.99 | 73.68 | 86.11 | 38.96 | 73.72 | 76.36 | 70.44 |
| | OURS | 93.86 | 98.65 | 94.83 | 84.83 | 92.95 | 92.94 | 93.04 |
| German | LASER+BiLSTM | 70.37 | 65.07 | 78.76 | 34.81 | 70.43 | 80.29 | 75.48 |
| | MultiFit | 85.85 | 96.52 | 88.21 | 60.35 | 84.22 | 89.48 | 88.28 |
| | RoBERTa | 82.16 | 89.83 | 86.86 | 59.06 | 82.74 | 88.49 | 86.85 |
| | LLaMa3 zero-shot | 85.12 | 60.29 | 87.77 | 21.78 | 69.18 | 73.93 | 63.74 |
| | LLaMa3 few-shot | 86.03 | 67.42 | 90.30 | 53.01 | 79.02 | 80.27 | 74.19 |
| | OURS | 95.20 | 98.46 | 95.83 | 85.63 | 94.03 | 94.04 | 93.78 |
| Italian | LASER+BiLSTM | 70.00 | 69.77 | 80.07 | 35.30 | 71.86 | 81.24 | 76.73 |
| | MultiFit | 86.18 | 96.04 | 87.87 | 57.91 | 83.70 | 89.13 | 87.82 |
| | RoBERTa | 85.36 | 93.75 | 87.65 | 59.06 | 84.06 | 89.37 | 87.87 |
| | LLaMa3 zero-shot | 82.11 | 59.88 | 89.77 | 5.26 | 66.88 | 72.98 | 59.26 |
| | LLaMa3 few-shot | 85.56 | 75.97 | 86.81 | 43.30 | 75.96 | 77.78 | 72.91 |
| | OURS | 94.31 | 99.39 | 95.85 | 82.22 | 93.81 | 93.92 | 92.94 |
| Japanese | LASER+BiLSTM | 3.05 | 0.75 | 21.35 | 0.00 | 12.10 | 60.99 | 50.57 |
| | MultiFit | 83.39 | 95.77 | 87.63 | 58.09 | 82.61 | 88.41 | 87.35 |
| | RoBERTa | 84.54 | 93.60 | 87.41 | 58.80 | 83.67 | 89.11 | 87.54 |
| | LLaMa3 zero-shot | 82.84 | 62.03 | 88.81 | 21.05 | 69.77 | 73.77 | 63.68 |
| | LLaMa3 few-shot | 82.46 | 75.00 | 84.40 | 32.56 | 73.35 | 75.16 | 68.60 |
| | OURS | 96.69 | 99.17 | 93.27 | 73.50 | 89.87 | 90.15 | 89.28 |
| Portuguese | LASER+BiLSTM | 67.42 | 66.57 | 77.29 | 32.61 | 69.00 | 79.33 | 74.61 |
| | RoBERTa | 85.85 | 96.87 | 86.88 | 55.69 | 83.40 | 88.93 | 87.62 |
| | LLaMa3 zero-shot | 81.08 | 53.52 | 87.54 | 16.33 | 65.73 | 70.94 | 59.62 |
| | LLaMa3 few-shot | 84.27 | 72.16 | 84.11 | 40.51 | 73.10 | 76.06 | 70.26 |
| | OURS | 92.72 | 98.70 | 95.21 | 79.08 | 91.85 | 91.97 | 91.43 |
| Russian | LASER+BiLSTM | 65.46 | 43.54 | 75.43 | 31.19 | 65.43 | 76.95 | 70.56 |
| | MultiFit | 85.54 | 96.40 | 86.95 | 59.72 | 83.43 | 88.96 | 87.87 |
| | RoBERTa | 82.95 | 90.93 | 86.96 | 58.94 | 83.22 | 88.81 | 87.10 |
| | LLaMa3 zero-shot | 80.00 | 53.79 | 87.97 | 22.22 | 67.19 | 71.03 | 61.00 |
| | LLaMa3 few-shot | 76.40 | 75.00 | 82.76 | 42.42 | 69.46 | 72.58 | 69.15 |
| | OURS | 95.75 | 98.43 | 94.29 | 85.12 | 93.43 | 93.43 | 93.40 |
| Spanish | LASER+BiLSTM | 65.02 | 56.33 | 75.41 | 38.23 | 66.68 | 77.79 | 73.77 |
| | MultiFit | 86.67 | 95.98 | 87.36 | 59.45 | 83.81 | 89.21 | 88.05 |
| | RoBERTa | 86.28 | 96.64 | 87.05 | 56.59 | 83.56 | 89.04 | 87.83 |
| | LLaMa3 zero-shot | 82.01 | 55.74 | 86.34 | 23.42 | 66.13 | 71.54 | 61.88 |
| | LLaMa3 few-shot | 80.47 | 68.24 | 86.60 | 48.98 | 72.90 | 75.09 | 71.07 |
| | OURS | 92.63 | 99.53 | 96.11 | 84.18 | 93.07 | 93.07 | 93.11 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).