Submitted: 19 August 2024
Posted: 20 August 2024
Abstract
Keywords:
1. Introduction
- Introduction of a sophisticated, Twitter-specific lexicon set to avoid sentiment ambiguity at the sentence level.
- Analysis of target-class determination and domain adaptability using classifiers.
- Implementation of the Majority Voting Technique (MVT) with a Random Forest classifier to improve accuracy and other performance measures in the sentiment analysis task.
2. Related Work
2.1. Research Questions and Inferences in TSA
3. Proposed Framework
3.1. Data Preprocessing
3.2. Text Cleaning and Tokenization
3.3. Lemmatization
3.4. N-gram Construction
3.5. Vector Representation
4. Vectorization Approach for Machine Learning Techniques
4.1. TF-IDF
4.2. Co-Occurrence Matrix
4.3. Word2Vec
4.4. CBOW
4.5. Skip-Gram
5. Machine Learning Models
5.1. Naïve Bayes (NB)
5.2. Decision Tree (DT)
5.3. K-Nearest Neighbor (K-NN)
5.4. Logistic Regression (LR)
- σ(x) represents the sigmoid function.
- e is the base of the natural logarithm, close to 2.71828.
- x is the function’s input value.
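The symbols listed above belong to the standard logistic sigmoid, σ(x) = 1 / (1 + e^(−x)). The following is a minimal sketch of that function and of how its output could be thresholded into the 0/1 polarity labels used in this study. The function names and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


def predict_label(score: float, threshold: float = 0.5) -> int:
    """Map a linear score to a polarity label (0 = negative, 1 = positive).

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    return 1 if sigmoid(score) >= threshold else 0


if __name__ == "__main__":
    for score in (-2.0, 0.0, 2.0):
        print(score, round(sigmoid(score), 4), predict_label(score))
```

Because σ(0) = 0.5, a threshold of 0.5 on the probability is the same as checking the sign of the linear score.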
5.5. Random Forest (RF)
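The contributions listed in the Introduction pair a Majority Voting Technique (MVT) with the Random Forest classifier. As a rough illustration of the voting step only, the sketch below combines per-tree predictions into one label per tweet. The vote values, the helper name `majority_vote`, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; ties go to the smallest label (an assumed rule)."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# Hypothetical predictions from five trees for three tweets
# (0 = negative, 1 = positive). Each inner list is one tree's output.
tree_votes = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
]

# zip(*tree_votes) regroups the votes so that each tuple holds all votes for one tweet.
forest_prediction = [majority_vote(votes) for votes in zip(*tree_votes)]
print(forest_prediction)
```

With an odd number of trees and two classes, ties cannot occur, so the tie-breaking rule only matters for even-sized ensembles.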
6. Dataset Description
- The Twitter_hatespeech dataset used in this study comprises 48,813 tweets, each already labeled with its sentiment polarity (0 = negative, 1 = positive).
- The Twitter_parsed dataset, containing 21,907 tweets with the same polarity labels (0 = negative, 1 = positive), was used as a second dataset.
- The two datasets were merged on their shared features, giving a combined dataset of 70,720 tweets. These tweets were categorized into positive and negative classes for further analysis.
- Figure 7 shows the distribution of positive and negative tweets in our dataset.
- “index” is the sample number.
- “id” is the unique id of each tweet.
- “label” is the polarity of the tweet.
- “tweet” is the tweet’s exact wording.
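The merge and class-distribution steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library. The sample rows are invented placeholders, the helper `merge_datasets` is hypothetical, and the paper does not say which tooling was actually used.

```python
from collections import Counter


def merge_datasets(*datasets):
    """Concatenate datasets that share the id/label/tweet fields and renumber "index"."""
    merged = []
    for rows in datasets:
        for row in rows:
            merged.append(
                {
                    "index": len(merged),  # running sample number in the merged set
                    "id": row["id"],
                    "label": row["label"],  # 0 = negative, 1 = positive
                    "tweet": row["tweet"],
                }
            )
    return merged


# Tiny placeholder rows standing in for the two real datasets.
hatespeech_rows = [
    {"id": 1, "label": 0, "tweet": "placeholder negative tweet"},
    {"id": 2, "label": 1, "tweet": "placeholder positive tweet"},
    {"id": 3, "label": 0, "tweet": "another negative tweet"},
]
parsed_rows = [
    {"id": 101, "label": 1, "tweet": "second-source positive tweet"},
    {"id": 102, "label": 1, "tweet": "another positive tweet"},
]

merged = merge_datasets(hatespeech_rows, parsed_rows)
distribution = Counter(row["label"] for row in merged)
print(len(merged), dict(distribution))

# Sanity check on the sizes reported in the text: 48,813 + 21,907 = 70,720.
assert 48813 + 21907 == 70720
```

If the two sources could reuse the same "id" values, a source tag or a fresh id would be needed to keep ids unique after merging.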
6.1. Wordcloud
7. Results and Discussion
| Performance Measures | NB | DT | KNN | LR | RF |
| Accuracy | 81.77 | 94.55 | 89.43 | 87.44 | 96.10 |
| Precision | 87.75 | 97.94 | 91.80 | 86.98 | 98.91 |
| Recall | 73.82 | 90.06 | 86.57 | 88.02 | 93.34 |
| F1-Score | 80.19 | 94.29 | 89.11 | 87.50 | 96.01 |
| Specificity | 73.81 | 90.04 | 86.56 | 88.01 | 92.91 |
| ROC_AUC | 81.77 | 94.55 | 89.43 | 87.44 | 96.15 |
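For readers who want to see how the measures in the table relate to one another, the sketch below computes accuracy, precision, recall, F1-score, and specificity from the four cells of a binary confusion matrix, using the standard definitions. The counts are hypothetical and are not the confusion matrices from this study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or true-positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}")
```

The sketch assumes every denominator is non-zero. A production version would need to handle a classifier that never predicts one of the classes.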
7.1. Comparative Analysis through Evaluation Metrics
7.2. Analysis through Confusion Matrix
7.3. Comparison Study
8. Conclusions and Future Work
References
- Poomka, Pumrapee, Nittaya Kerdprasop, and Kittisak Kerdprasop. "Machine learning versus deep learning performances on the sentiment analysis of product reviews." International Journal of Machine Learning and Computing 11, no. 2 (2021): 103-109.
- Umarani, V., Anitha Julian, and J. Deepa. "Sentiment analysis using various machine learning and deep learning techniques." Journal of the Nigerian Society of Physical Sciences (2021): 385-394.
- Shamrat, F. M. J. M., Sovon Chakraborty, M. M. Imran, Jannatun Naeem Muna, Md Masum Billah, Protiva Das, and O. M. Rahman. "Sentiment analysis on Twitter tweets about COVID-19 vaccines using NLP and supervised KNN classification algorithm." Indonesian Journal of Electrical Engineering and Computer Science 23, no. 1 (2021): 463-470.
- Gaye, Babacar, Dezheng Zhang, and Aziguli Wulamu. "A tweet sentiment classification approach using a hybrid stacked ensemble technique." Information 12, no. 9 (2021): 374.
- Dagar, Mohit, Abhishek Kajal, and Pardeep Bhatia. "Twitter sentiment analysis using supervised machine learning techniques." In 2021 5th International Conference on Information Systems and Computer Networks (ISCON), pp. 1-7. IEEE, 2021.
- Kokatnoor, Sujatha Arun, and Balachandran Krishnan. "Twitter hate speech detection using stacked weighted ensemble (SWE) model." In 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), pp. 87-92. IEEE, 2020.
- Tang, Duyu, Furu Wei, Nan Yang, Ming Zhou, Ting Liu, and Bing Qin. "Learning sentiment-specific word embedding for Twitter sentiment classification." In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1555-1565. 2014.
- Wang, Hao, Doğan Can, Abe Kazemzadeh, François Bar, and Shrikanth Narayanan. "A system for real-time Twitter sentiment analysis of 2012 US presidential election cycle." In Proceedings of the ACL 2012 System Demonstrations, pp. 115-120. 2012.
- Al-Zoubi, Ala' M., Ja'far Alqatawna, and Hossam Faris. "Spam profile detection in social networks based on public features." In 2017 8th International Conference on Information and Communication Systems (ICICS), pp. 130-135. IEEE, 2017.
- Patel, Ravikumar, and Kalpdrum Passi. "Sentiment analysis on Twitter data of world cup soccer tournament using machine learning." IoT 1, no. 2 (2020): 14.
- Saranya, S., and G. Usha. "A Machine Learning-Based Technique with Intelligent WordNet Lemmatize for Twitter Sentiment Analysis." Intelligent Automation & Soft Computing 36, no. 1 (2023).
- Jayakody, J. P. U. S. D., and B. T. G. S. Kumara. "Sentiment analysis on product reviews on Twitter using Machine Learning Approaches." In 2021 International Conference on Decision Aid Sciences and Application (DASA), pp. 1056-1061. IEEE, 2021.
- Rodrigues, Anisha P., Roshan Fernandes, Adarsh Shetty, Atul K, Kuruva Lakshmanna, and R. Mahammad Shafi. "[Retracted] Real-Time Twitter Spam Detection and Sentiment Analysis using Machine Learning and Deep Learning Techniques." Computational Intelligence and Neuroscience 2022, no. 1 (2022): 5211949.
- Al-Zoubi, Ala' M., Ja'far Alqatawna, and Hossam Faris. "Spam profile detection in social networks based on public features." In 2017 8th International Conference on Information and Communication Systems (ICICS), pp. 130-135. IEEE, 2017.
- Patel, Ravikumar, and Kalpdrum Passi. "Sentiment analysis on Twitter data of world cup soccer tournament using machine learning." IoT 1, no. 2 (2020): 14.
- Shafin, Minhajul Abedin, Md Mehedi Hasan, Md Rejaul Alam, Mosaddek Ali Mithu, Arafat Ulllah Nur, and Md Omar Faruk. "Product review sentiment analysis by using NLP and machine learning in Bangla language." In 2020 23rd International Conference on Computer and Information Technology (ICCIT), pp. 1-5. IEEE, 2020.
- Zhang, Lei, Riddhiman Ghosh, Mohamed Dekhil, Meichun Hsu, and Bing Liu. "Combining lexicon-based and learning-based methods for Twitter sentiment analysis." HP Laboratories, Technical Report HPL-2011 89 (2011): 1-8.
- Basari, Abd Samad Hasan, Burairah Hussin, I. Gede Pramudya Ananta, and Junta Zeniarja. "Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization." Procedia Engineering 53 (2013): 453-462.
- Bohra, Aditya, Deepanshu Vijay, Vinay Singh, Syed Sarfaraz Akhtar, and Manish Shrivastava. "A dataset of Hindi-English code-mixed social media text for hate speech detection." In Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pp. 36-41. 2018.
- Dang, Nhan Cach, María N. Moreno-García, and Fernando De la Prieta. "Sentiment analysis based on deep learning: A comparative study." Electronics 9, no. 3 (2020): 483.
- Musleh, Dhiaa A., Ibrahim Alkhwaja, Ali Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Faisal Alfawaz, Nasro Min-Allah, and Mamoun Masoud Abdulqader. "Arabic sentiment analysis of YouTube comments: NLP-based machine learning approaches for content evaluation." Big Data and Cognitive Computing 7, no. 3 (2023): 127.
- Kastrati, Zenun, Fisnik Dalipi, Ali Shariq Imran, Krenare Pireva Nuci, and Mudasir Ahmad Wani. "Sentiment analysis of students’ feedback with NLP and deep learning: A systematic mapping study." Applied Sciences 11, no. 9 (2021): 3986.
- Mitra, Ayushi, and Sanjukta Mohanty. "Sentiment analysis using machine learning approaches." Emerging Technologies in Data Mining and Information Security: Proceedings of IEMIS 2 (2020): 63-68.














| Sl. No. | Method [Reference] | Vectorization Technique | Accuracy (A) | Precision (P) | Recall (R) | F1-Score (F1) | Specificity (S) | ROC_AUC |
| 1 | Logistic Regression [10] | Hashing vectorizer | 55.0 | 56.1 | 66.0 | 60.0 | - | 59.0 |
| 2 | Decision Tree (DT) [13] | Bag of Words (BoW) | 94.3 | 91.9 | 88.1 | 89.9 | 91.1 | - |
| 3 | Naive Bayes (NB) [14] | Count Vectorizer | 95.7 | 94.0 | 93.0 | - | - | 50.0 |
| 4 | Naive Bayes (NB) [15] | TF-IDF Transformer | 87.5 | 88.0 | 87.6 | 87.3 | - | 95.8 |
| 5 | K-Nearest Neighbor (KNN) [16] | Count Vectorizer | 92.4 | 92.3 | 92.5 | 92.8 | 91.8 | - |
| 6 | Support Vector Machine [17] | TF-IDF | - | 68.7 | 82.7 | 74.9 | - | - |
| 7 | Support Vector Machine with Particle Swarm Optimization (SVM-PSO) [18] | TF, TF-IDF | 77.0 | 77.5 | 76.1 | - | - | - |
| 8 | Support Vector Machine (SVM) [19] | Word2vec | 82.6 | - | - | 62.0 | - | - |
| 9 | Naïve Bayes [21] | TF-IDF | - | 94.6 | 94.64 | 94.62 | - | - |
| 10 | Proposed model (Random Forest) | CBOW, Skip-gram | 96.1 | 98.9 | 93.3 | 96.0 | 92.9 | 96.1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).