Preprint
Article

This version is not peer-reviewed.

A Machine Learning Framework for Hate Speech Detection in Social Media Text

Submitted:

27 May 2026

Posted:

28 May 2026

Abstract
The rapid growth of social media has transformed digital communication while also increasing the spread of hate speech, offensive language, and abusive online behavior. Automated hate speech detection has therefore become a critical research challenge in Natural Language Processing (NLP) and Machine Learning (ML). This paper presents a machine learning framework for hate speech detection that combines TF-IDF and Word2Vec feature extraction with Logistic Regression, Support Vector Machine (SVM), Random Forest, and Naïve Bayes classifiers. Experimental evaluation shows that SVM with TF-IDF features achieved the highest performance, with 93.4% accuracy and a 92.9% F1-score. The study further discusses contextual ambiguity, the challenges of detecting sarcasm, feature interpretability, and future integration with transformer-based architectures such as BERT and multilingual NLP models.

1. Introduction

The digital era has changed how people communicate, exchange ideas, and express their views. Platforms such as Twitter, Facebook, Reddit, and YouTube allow users to share their opinions in real time and reach billions of people. Although this interconnectedness supports freedom of expression and democratizes the flow of information, it has also contributed to the rapid spread of virulent and abusive language [2]. Hate speech in particular has become a significant concern for online communities, policymakers, and social media platforms. It can take the form of offensive words, insults, or violent language directed at individuals or groups because of their race, ethnicity, gender, religion, sexual orientation, or nationality. Such expressions not only disrupt social harmony but can also escalate into real-world violence, discrimination, and psychological distress in targeted populations.
Hate speech is complex because it appears in many languages and in diverse contexts. Unlike explicit offensive language, it may be implicit, coded, or concealed in humour and sarcasm. Furthermore, what counts as hate speech in one culture, region, or platform may be viewed differently in others [5]. This subjectivity poses a serious problem for human moderators and automated systems alike. Given the volume of content created every second, manual moderation cannot scale or remain consistent. As a result, machine learning (ML) and natural language processing (NLP) have become powerful tools for automating the identification and filtering of hateful content.
Machine learning makes it possible to learn from data, discover patterns that are not obvious, and generalize to unseen cases. In hate speech detection, ML algorithms are trained on labelled datasets containing examples of hateful and non-hateful material. Once trained, these models can classify new texts based on learned linguistic and semantic patterns without human categorization. Their effectiveness, however, depends on several factors, including data quality, feature representation, choice of algorithm, and context awareness. Combining NLP with ML allows systems to normalize text, extract features, and model word meaning, which makes them more effective at handling complex language structures [3].
This study is motivated by the growing need for safe digital spaces where users can express themselves without harassment or discrimination. Governments and citizens are pressuring social media companies to control hate speech, yet existing systems remain far from perfect. Traditional keyword-based filters cannot detect implicit or emerging hate terms, while deep learning systems demand substantial computation and large, well-balanced datasets. This gap motivates the search for optimized machine learning architectures that balance performance, interpretability, and computational efficiency.
The overall objective of this paper is to develop and evaluate machine learning approaches for detecting hate speech in online written text. The study compares traditional classifiers, namely Logistic Regression, Naive Bayes, Random Forest, and SVM, combined with two feature representations: TF-IDF (Term Frequency-Inverse Document Frequency) and Word2Vec embeddings. Comparing their performance on benchmark datasets [14] indicates which model is most appropriate for real-world use. The paper also examines how preprocessing, feature engineering, and parameter optimization affect model accuracy and generalization.
This study also contributes to research on how linguistic undertones and situational context affect classification results. Beyond measuring model performance, we perform an error analysis to identify sources of misclassification, in particular the difficulty of separating hate speech from language that is offensive but not hateful. These insights can inform the design of more inclusive and thorough systems in the future [4].
The paper seeks to bridge the gap between the academic literature and the practical implementation of hate speech detection systems by emphasizing the simplicity, transparency, and flexibility of models. Although deep learning models such as BERT have been shown to achieve higher performance, they do so at a computational cost that is impractical for real-time moderation systems with little or no dedicated hardware. This study therefore focuses on resource-efficient machine learning models. The proposed framework aims to balance accuracy, interpretability, and scalability, so that automated hate speech detection is not only effective but also ethically appropriate and respectful of the principles of free speech [15].

3. Methodology

The proposed hate speech detection approach is a pipeline comprising data preprocessing, feature extraction, machine learning classification, and evaluation. It is designed to produce a scalable and interpretable model that identifies hateful content with high accuracy. The workflow of the proposed system is shown in Figure 1.
The originality of this study lies in systematically combining machine learning algorithms with optimized linguistic feature extraction to identify hateful texts. Compared with earlier work that relied solely on deep learning or on basic keyword matching, this study offers a balanced and interpretable framework that works effectively even with minimal data resources. The methodology blends conventional NLP preprocessing with effective ML classifiers, providing a scalable alternative to computationally intensive neural networks.
Another distinctive feature of this study is its dual feature extraction using TF-IDF and Word2Vec representations. TF-IDF weights each term by its importance in a document relative to the corpus, enabling the model to recognize explicit, statistically significant expressions of hate. Word2Vec representations, in contrast, capture semantic relationships between words, so that implicit and context-coded hate language can be detected. Comparing the two techniques gives deeper insight into how different linguistic representations affect classification performance.
A further contribution of this work is a comprehensive assessment of classical ML models under standardized experimental conditions. By testing Logistic Regression, Naive Bayes, Random Forest, and SVM on the same dataset, the study shows the trade-off between predictive accuracy and computational efficiency. The findings show that SVM performs best on high-dimensional sparse text data, which supports its continued use for hate speech detection even as deep learning tools dominate the field.
In addition, this paper conducts a systematic, in-depth error analysis and interpretability evaluation, which earlier studies have commonly overlooked. Understanding frequent misclassifications, especially in cases of sarcasm, irony, and ambivalent emotion, sheds light on the linguistic challenges faced by automated moderation systems. Future research can use these findings to guide dataset curation and feature engineering.
In practical terms, the proposed framework is lightweight and transparent, and it can be deployed by small organizations and community platforms that cannot afford expensive computational infrastructure. In this way, digital safety solutions become accessible across diverse technological environments.
Finally, this study lays the groundwork for multimodal and multilingual hate speech detection. By addressing dataset bias, class imbalance, and contextual reasoning, it establishes a foundation on which the model can be extended to other modalities such as audio and images. Future work should also incorporate the principles of explainable AI to make automated detection ethically responsible and socially acceptable.
In short, the originality of this work lies in its balance of interpretability, performance, and practicality. Its contributions go beyond model accuracy to address contextual knowledge, transparency, and practical application, which are essential for responsible AI on online communication platforms.
In the workflow, preprocessing consists of tokenization, stop-word removal, and lemmatization, and feature extraction consists of TF-IDF and word embeddings; each stage passes its output to the next in sequence.

Data Representation and Preprocessing

Let the dataset be represented as:
$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$$
where $x_i$ represents the input text sample and $y_i \in \{0, 1\}$ denotes the class label ($0$ for non-hate, $1$ for hate) [7].
The text data is preprocessed through normalization, stopword removal, and tokenization. Each document is represented as a sequence of tokens:
$$T_i = [w_{i1}, w_{i2}, \ldots, w_{im}]$$
To reduce noise, all tokens are converted to lowercase and irrelevant symbols are removed. Lemmatization is applied using:
$$L(w) = \mathrm{lemma}(w)$$
where $L(w)$ returns the canonical form of word $w$.
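The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stopword list and the lemma lookup table are tiny hypothetical placeholders (a real system would use a full stopword list and a dictionary-based lemmatizer such as NLTK's `WordNetLemmatizer` or spaCy), and dropping URLs and @mentions is an assumed normalization step that the text does not specify.

```python
import re

# Hypothetical, deliberately tiny stopword list (placeholder for a full list)
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}

# Hypothetical lemma lookup standing in for lemma(w); unknown words map to themselves
LEMMAS = {"people": "person", "running": "run", "posts": "post", "haters": "hater"}

def lemma(word: str) -> str:
    """Return the canonical form L(w) of a word."""
    return LEMMAS.get(word, word)

def preprocess(text: str) -> list[str]:
    """Map a raw post x_i to its cleaned token sequence T_i = [w_i1, ..., w_im]."""
    text = text.lower()                                  # normalize case
    text = re.sub(r"https?://\S+|@\w+", " ", text)      # assumed: drop URLs and @mentions
    text = re.sub(r"[^a-z\s]", " ", text)               # drop digits, punctuation, emoji
    tokens = text.split()                                # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [lemma(t) for t in tokens]                    # lemmatization

print(preprocess("@user Those PEOPLE are running the show!!! http://x.co"))
```

The example post reduces to `['those', 'person', 'run', 'show']`. Note that the character filter keeps only ASCII letters, an English-only simplification that would need revisiting for multilingual data.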

Feature Extraction

The system employs two major text vectorization methods - TF-IDF and Word2Vec embeddings [13].
The TF-IDF (Term Frequency-Inverse Document Frequency) score for a word $t$ in a document $d$ is defined as:
$$\mathrm{TFIDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$$
where
$$\mathrm{TF}(t, d) = \frac{f_{t,d}}{\sum_{k} f_{k,d}}$$
and
$$\mathrm{IDF}(t) = \log\left(\frac{N}{n_t}\right)$$
Here, $f_{t,d}$ is the frequency of term $t$ in document $d$, $N$ is the total number of documents, and $n_t$ is the number of documents containing the term $t$.
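As a worked check of these formulas, the following Python sketch computes TF-IDF exactly as defined above (relative term frequency and an unsmoothed logarithmic IDF) on a tiny hypothetical corpus of pre-tokenized documents. Library implementations differ: scikit-learn's `TfidfVectorizer`, for example, uses raw counts, a smoothed IDF, and L2-normalizes each row by default, so its scores will not match these numbers.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return TF(t,d) * IDF(t) for every term t of every tokenized document d."""
    N = len(docs)
    # n_t: number of documents that contain term t at least once
    n = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(N / n_t) for t, n_t in n.items()}
    scores = []
    for doc in docs:
        f = Counter(doc)                 # raw frequencies f_{t,d}
        total = sum(f.values())          # sum over k of f_{k,d}
        scores.append({t: (f_td / total) * idf[t] for t, f_td in f.items()})
    return scores

# Hypothetical toy corpus, already preprocessed into tokens
docs = [["they", "ruin", "everything"],
        ["they", "are", "great"],
        ["great", "game"]]
for row in tf_idf(docs):
    print({t: round(s, 3) for t, s in row.items()})
```

In the first document, "ruin" occurs in only one of the three documents and scores $\log(3)/3 \approx 0.366$, whereas "they", shared by two documents, scores only $\log(1.5)/3 \approx 0.135$. A term that appears in every document gets $\mathrm{IDF}(t) = 0$ and so carries no weight.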
For Word2Vec, each word is mapped to a dense vector representation. In the skip-gram formulation, training minimizes the negative log-likelihood of the observed word-context pairs:
$$J = -\sum_{(w, c) \in C} \log P(c \mid w)$$
with conditional probability
$$P(c \mid w) = \frac{\exp(v_c^{\top} v_w)}{\sum_{c' \in V} \exp(v_{c'}^{\top} v_w)}$$
where $v_w$ and $v_c$ represent word and context embeddings, respectively, $C$ is the set of (word, context) pairs drawn from the corpus, and $V$ denotes the vocabulary.
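To make the objective concrete, the sketch below evaluates $J$ and its gradients with NumPy for the full-softmax skip-gram model, and runs plain gradient descent on a hypothetical three-sentence corpus. The window size, embedding dimension, learning rate, and corpus are illustrative assumptions. Summing over the whole vocabulary for every pair is impractical for realistic vocabularies; production implementations such as gensim's `Word2Vec` approximate the softmax with negative sampling or hierarchical softmax.

```python
import numpy as np

def skipgram_pairs(sentences, index, window=1):
    """Collect (w, c) index pairs: each word with every neighbour within `window`."""
    pairs = []
    for sent in sentences:
        ids = [index[t] for t in sent]
        for i, w in enumerate(ids):
            for j in range(max(0, i - window), min(len(ids), i + window + 1)):
                if j != i:
                    pairs.append((w, ids[j]))
    return pairs

def loss_and_grads(W_in, W_out, pairs):
    """J = -sum log P(c|w) under a full softmax, plus gradients for both tables."""
    J = 0.0
    g_in, g_out = np.zeros_like(W_in), np.zeros_like(W_out)
    for w, c in pairs:
        scores = W_out @ W_in[w]                    # v_{c'}^T v_w for every c' in V
        scores = scores - scores.max()              # shift for numerical stability
        p = np.exp(scores) / np.exp(scores).sum()   # P(c'|w) over the vocabulary
        J -= np.log(p[c])
        err = p.copy()
        err[c] -= 1.0                               # dJ/dscores = P(.|w) - onehot(c)
        g_out += np.outer(err, W_in[w])             # gradient for every context vector
        g_in[w] += W_out.T @ err                    # gradient for the word vector v_w
    return J, g_in, g_out

# Hypothetical toy corpus
sentences = [["they", "ruin", "everything"], ["they", "are", "great"], ["great", "game"]]
vocab = sorted({t for s in sentences for t in s})
index = {t: i for i, t in enumerate(vocab)}
pairs = skipgram_pairs(sentences, index)

rng = np.random.default_rng(0)
dim, lr = 8, 0.05
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))    # word vectors v_w
W_out = rng.normal(scale=0.1, size=(len(vocab), dim))   # context vectors v_c

J_start, _, _ = loss_and_grads(W_in, W_out, pairs)
for _ in range(100):
    _, g_in, g_out = loss_and_grads(W_in, W_out, pairs)
    W_in -= lr * g_in
    W_out -= lr * g_out
J_end, _, _ = loss_and_grads(W_in, W_out, pairs)
print(f"J before training: {J_start:.3f}, after 100 steps: {J_end:.3f}")
```

After training, the rows of `W_in` are the word embeddings that would be used as features.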

Model Construction

After feature extraction, the feature vectors are used to train the classification models. Let $X \in \mathbb{R}^{n \times m}$ denote the feature matrix and $Y$ the label vector [8].
For Logistic Regression, the probability of an input belonging to class 1 (hate) is modeled as:
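$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$$
The optimization objective is to minimize the binary cross-entropy loss:
$$\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
where $\hat{y}_i = P(y_i = 1 \mid x_i)$ is the predicted probability for sample $i$.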
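A minimal NumPy sketch of these two equations follows. It fits $w$ and $b$ by batch gradient descent on the binary cross-entropy, using the fact that the gradient of the per-sample loss with respect to the logit $w^\top x_i + b$ is simply $\hat{y}_i - y_i$. The two-dimensional toy feature vectors stand in for TF-IDF or Word2Vec features, and the learning rate and step count are arbitrary assumptions. Unlike this sketch, scikit-learn's `LogisticRegression` adds L2 regularization by default.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy averaged over the n samples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)    # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def fit_logreg(X, y, lr=0.5, steps=500):
    """Batch gradient descent on the BCE loss; returns weights w and bias b."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(steps):
        y_hat = sigmoid(X @ w + b)          # P(y=1|x) for every sample
        err = y_hat - y                     # d(loss)/d(logit), per sample
        w -= lr * (X.T @ err) / n
        b -= lr * err.mean()
    return w, b

# Toy 2-d features standing in for TF-IDF/Word2Vec vectors; 1 = hate, 0 = non-hate
X = np.array([[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]])
y = np.array([1, 1, 0, 0])
w, b = fit_logreg(X, y)
probs = sigmoid(X @ w + b)
print("P(hate|x):", np.round(probs, 3), "loss:", round(float(bce(y, probs)), 4))
```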
For Support Vector Machine (SVM), the decision boundary is optimized by:
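$$\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i$$
subject to
$$y_i (w^{\top} x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0$$
where the labels are mapped to $y_i \in \{-1, +1\}$, the slack variables $\xi_i$ allow margin violations, and $C$ is the regularization parameter controlling the trade-off between margin maximization and misclassification.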
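The soft-margin problem above is equivalent to minimizing $\frac{1}{2}\lVert w\rVert^2 + C\sum_i \max(0, 1 - y_i(w^\top x_i + b))$, because at the optimum each slack $\xi_i$ equals the hinge loss of sample $i$. The NumPy sketch below minimizes that unconstrained form by subgradient descent, with labels in $\{-1, +1\}$. The toy data, step size, and step count are illustrative assumptions; in practice a dedicated solver would be used instead, for example scikit-learn's `SVC(kernel="linear")`, or `LinearSVC`, which uses the squared hinge loss by default.

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Primal objective with slacks xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * xi.sum()

def fit_linear_svm(X, y, C=1.0, lr=0.01, steps=1000):
    """Subgradient descent on the unconstrained hinge-loss form of the primal."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        active = y * (X @ w + b) < 1.0                 # samples with xi_i > 0
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

# Toy features standing in for phi(x); labels mapped from {0,1} to {-1,+1}
X = np.array([[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]])
y = np.array([1, 1, -1, -1])
w, b = fit_linear_svm(X, y)
print("w =", np.round(w, 3), "b =", round(float(b), 3),
      "objective =", round(float(svm_objective(w, b, X, y, C=1.0)), 3))
```

On this toy data the iterates approach $w = (1, -1)$, $b = 0$: the two inner points end up exactly on the margin and act as support vectors, while the outer points contribute no slack.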

Model Evaluation

The classifiers (Logistic Regression, Naive Bayes, and Support Vector Machine (SVM)) were trained on the extracted features, namely n-grams, bag-of-words, and TF-IDF representations, and evaluated with the following metrics:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
where $TP$, $TN$, $FP$, and $FN$ represent true positives, true negatives, false positives, and false negatives, respectively.
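These four formulas translate directly into code. The Python sketch below counts TP, TN, FP, and FN from binary label lists, with 1 = hate as the positive class, and then computes the metrics. Returning 0.0 when a denominator is zero is an assumed convention that follows the common practice of treating undefined precision or recall as zero. The counts in the usage line are made up for illustration and are not the study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary task with the given positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 as defined above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for 200 test posts
print(classification_metrics(tp=90, tn=85, fp=10, fn=15))
```

Algebraically, $F_1$ also equals $2TP / (2TP + FP + FN)$, which gives a convenient cross-check: here $180/205 \approx 0.878$.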
Overall discriminative ability is evaluated using the area under the Receiver Operating Characteristic curve (ROC-AUC):
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}(\mathrm{FPR}) \, d\,\mathrm{FPR}$$
where $\mathrm{TPR} = TP/(TP + FN)$ is the true positive rate and $\mathrm{FPR} = FP/(FP + TN)$ is the false positive rate.
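The integral can be approximated by sweeping a decision threshold over the classifier's scores, recording the (FPR, TPR) point at each distinct score, and applying the trapezoidal rule. The sketch below is a minimal standard-library version. It assumes that higher scores mean "more likely hate" and that both classes are present. Tied scores are processed together, so a tie contributes a diagonal segment. The example scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return ROC (FPR, TPR) points, sweeping the threshold from high to low score."""
    P = sum(1 for y in y_true if y == 1)
    N = len(y_true) - P
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / N, tp / P))
    return points

def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve: sum of trapezoids over FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_trapezoid(roc_points(y_true, scores)))
```

The example prints 0.75. This matches the rank interpretation of AUC: of the four (hate, non-hate) score pairs, three rank the hate post higher.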

Experimental Workflow

  • Input Layer: Raw social media text is collected and fed into the preprocessing pipeline.
  • Preprocessing: Tokenization, stopword removal, and lemmatization standardize text.
  • Feature Extraction: TF-IDF and Word2Vec models convert text into numeric vectors.
  • Classification: ML models (SVM, Logistic Regression, Random Forest, Naive Bayes) are trained and optimized.
  • Prediction: The trained model predicts whether new input text contains hate speech.
  • Evaluation: Metrics are computed to assess accuracy and robustness.
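As one possible realization of this workflow, the sketch below chains TF-IDF feature extraction and a linear SVM in a scikit-learn pipeline and evaluates the result with accuracy and F1. It is a hedged illustration, not the study's exact configuration. The texts and labels are hypothetical placeholders, and `TfidfVectorizer` only lowercases and tokenizes here, so stopword removal and lemmatization would need a custom preprocessing step. The unigram-plus-bigram setting and `C=1.0` are assumed values that would normally be tuned, for example with `GridSearchCV`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = hate, 0 = non-hate (not the study's dataset)
train_texts = ["they ruin everything", "they ruin it all",
               "great game today", "what a great match"]
train_labels = [1, 1, 0, 0]
test_texts = ["they ruin the game", "a great day"]
test_labels = [1, 0]

# Feature extraction (TF-IDF over unigrams and bigrams) followed by a linear SVM
pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LinearSVC(C=1.0),
)
pipeline.fit(train_texts, train_labels)          # classification: training
predictions = pipeline.predict(test_texts)       # prediction on unseen text
print("predictions:", predictions.tolist())
print("accuracy:", accuracy_score(test_labels, predictions),
      "F1:", f1_score(test_labels, predictions, zero_division=0))
```

Wrapping both stages in one pipeline ensures that the vectorizer's vocabulary and IDF weights are learned from the training split only and then reused unchanged at prediction time.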

Mathematical Summary

In concise form, the detection process can be summarized as:
$$\hat{y} = f(\phi(x); \theta)$$
where $\phi(x)$ represents the feature transformation (TF-IDF or Word2Vec) and $\theta$ denotes the model parameters. The goal is to minimize the overall classification loss:
$$\theta^{*} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}\big(f(\phi(x_i); \theta), y_i\big)$$
with the aim of generalizing accurately to unseen samples.

Summary

The proposed methodology combines mathematical rigor with practical efficiency. It uses both statistical and semantic feature representations to capture explicit and implicit hate content. Optimization-based training and a set of complementary evaluation metrics are used to achieve and verify balanced performance. The mathematical formulation supports model interpretability, while the structured flowchart facilitates deployment in real-world moderation environments [9].

4. Results and Discussion

To evaluate the proposed hate speech detector, we used a benchmark dataset of 25,000 labelled social media posts from Twitter, Reddit, and similar platforms. After preprocessing, all four machine learning models (Logistic Regression, Support Vector Machine (SVM), Random Forest, and Naive Bayes) were tested with features extracted using both TF-IDF and Word2Vec embeddings. The models were compared on accuracy, precision, recall, and F1-score. Table 1 summarizes the performance of all models under identical experimental conditions.
As Table 1 shows, the Support Vector Machine (SVM) outperformed the other algorithms, with the highest accuracy (93.4%) and F1-score (92.9%). Logistic Regression was competitive, with an accuracy of 91.2%, which shows that linear models can be very effective for text classification when features are well engineered. Random Forest produced stable results, but its recall was slightly lower, indicating a tendency to misclassify borderline samples. Naïve Bayes performed worst, most likely because its independence assumption does not hold for correlated textual data.
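The independence assumption mentioned above is easiest to see in code. The following is a minimal multinomial Naive Bayes sketch with add-alpha (Laplace) smoothing, written with the standard library only. The toy documents and labels are hypothetical, and the paper does not state which Naive Bayes variant or smoothing was used. Because each token contributes its own log-likelihood term, the model cannot represent interactions between words, such as negation changing the meaning of a neighbouring word.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit log priors log P(c) and smoothed log likelihoods log P(t|c)."""
    vocab = {t for d in docs for t in d}
    log_prior, log_lik = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        log_lik[c] = {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                      for t in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """argmax_c [log P(c) + sum_t log P(t|c)]: every token is scored independently."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in doc if t in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)

# Hypothetical toy data: 1 = hate, 0 = non-hate
docs = [["they", "ruin", "everything"], ["they", "ruin", "it"],
        ["great", "game"], ["great", "match", "today"]]
labels = [1, 1, 0, 0]
model = train_nb(docs, labels)
print(predict_nb(model, ["they", "ruin", "the", "match"]))
```

Tokens never seen in training (here "the") are simply skipped, which is one of several possible conventions.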
Figure 2 compares the models' performance metrics in a grouped bar chart. SVM and Logistic Regression show a clear advantage across the metrics, although their robustness and consistency still need to be confirmed.
In addition to TF-IDF features, the models were tested with Word2Vec embeddings to determine how semantic representation affects detection accuracy. Table 2 presents these results. Compared with their TF-IDF counterparts, the Word2Vec-based Logistic Regression, Random Forest, and Naive Bayes models achieved higher recall at the cost of slightly lower precision, consistent with a more aggressive classification strategy that captured more subtle manifestations of hate; for SVM, both precision and recall were slightly lower.
The comparison in Table 2 suggests that Word2Vec features contribute somewhat to contextual understanding, particularly when hate speech is implicit or coded. Nevertheless, TF-IDF features are more consistent and more interpretable for linear models. Figure 3 plots training versus testing accuracy for all models under both feature extraction methods. The plot shows that SVM consistently leads under both feature representations, with a slight decrease when Word2Vec is used. This decrease may be due to overfitting caused by the high-dimensional embeddings.
Confusion matrix analysis provided further insight into model behaviour. For SVM, the most frequent errors occurred between the offensive and hate categories, reflecting the narrow linguistic boundary between them. Some non-hate samples with strong negative sentiment were incorrectly classified as hate speech, showing that the model is sensitive to emotional intensity. This indicates the importance of incorporating emotion-aware features and contextual meaning in future systems. Figure 4 (Confusion Matrix, 3 models) illustrates the confusions of the SVM model: its predictions are highly accurate, with misclassification below 10 percent for each class.
Beyond the quantitative results, qualitative analysis revealed several trends (see the model comparison in Figure 5). Direct slurs and other explicit hate speech were consistently identified, whereas implicit hate speech expressed through sarcasm, metaphor, or coded references was sometimes misclassified. These errors reveal the limitations of surface-based linguistic models and suggest that deep contextual embeddings are worth exploring in future studies. Data imbalance also affected overall performance: although oversampling was applied, minority-class samples remained harder to classify correctly, indicating that better dataset balancing is essential [12].
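The paper does not specify which oversampling technique was applied. As an illustration only, the sketch below implements the simplest variant, random oversampling, which duplicates randomly chosen minority-class samples until every class matches the majority count; SMOTE-style synthetic sampling is a common alternative. Oversampling should be applied to the training split only, since duplicating samples before the train/test split would leak copies of test items into training.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until all classes match the majority."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        extra = [rng.choice(pool) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out

# Hypothetical imbalanced training split: four non-hate posts, one hate post
X_train = ["post a", "post b", "post c", "post d", "post e"]
y_train = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```

Because random oversampling only repeats existing minority examples, it can encourage overfitting to those few posts, which is consistent with minority-class samples remaining harder to classify correctly.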
The comparison of TF-IDF and Word2Vec also showed that, although TF-IDF is highly interpretable, it cannot capture contextual relationships between words and therefore sometimes produces false alarms (Figure 6 compares the models on accuracy, precision, recall, and F1-score). Word2Vec, on the other hand, was more sensitive to semantic similarity and nuances of meaning, but it was more prone to false positives because of its tendency toward contextual generalization. A hybrid approach combining statistical and semantic features might therefore offer the best balance between accuracy and interpretability [10].
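The hybrid representation suggested above could be built by concatenating the two feature vectors. The paper does not say how word vectors are pooled into a document vector, so the NumPy sketch below assumes that the Word2Vec part is the average of the document's in-vocabulary word vectors. The two-dimensional embeddings and the TF-IDF row are hypothetical values.

```python
import numpy as np

def average_embedding(tokens, embeddings, dim):
    """Mean of the known word vectors in a document; zeros if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def hybrid_features(tfidf_row, tokens, embeddings, dim):
    """Concatenate the statistical (TF-IDF) and semantic (averaged Word2Vec) views."""
    return np.concatenate([np.asarray(tfidf_row, dtype=float),
                           average_embedding(tokens, embeddings, dim)])

# Hypothetical 2-d word vectors and a 3-term TF-IDF row for the same post
embeddings = {"they": np.array([0.1, 0.3]), "ruin": np.array([0.5, -0.1])}
tfidf_row = [0.0, 0.37, 0.14]
features = hybrid_features(tfidf_row, ["they", "ruin", "everything"], embeddings, dim=2)
print(features)
```

The two blocks are on different scales: TF-IDF rows are sparse and often L2-normalized, whereas averaged embeddings are dense. Standardizing or weighting each block before training is therefore a reasonable design choice to evaluate.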
Finally, runtime analysis indicated that linear models, namely SVM and Logistic Regression, are computationally efficient and can be trained in seconds on moderate hardware. Ensemble models were slightly slower and did not give a significant accuracy improvement. The SVM-based model is the most efficient, which makes it the most appropriate choice for real-time hate speech detection on social media networks, where content must be monitored and acted on promptly.
Overall, the discussion demonstrates that the proposed methodology, through its feature representation and classification, is an effective solution to the problem of hate speech detection. Combining preprocessing with TF-IDF text modelling and SVM classification achieved a better balance between performance and interpretability. Although deep learning architectures may offer only slight gains, the simplicity and speed of the present solution may benefit real-time moderation systems. Future studies could extend this model to multilingual data, test transformer-based embeddings, and incorporate sentiment-aware or multimodal input to make detection even more reliable.

5. Conclusions

This paper has shown that machine learning can efficiently detect hate speech in online written content and can therefore help make online spaces safer and more inclusive. Among the tested models, the Support Vector Machine classifier achieved the highest overall performance and could be suitable for deployment in real-time social media monitoring systems. The approach nevertheless has practical limitations. The system's accuracy decreases when it encounters slang, sarcasm, or new hate terms absent from the training data. In addition, dataset bias and limited multilingual representation restrict its use across cultures and languages. Finally, the model cannot capture subtle context or multimodal signals such as images and tone.
Future research should incorporate deep contextual architectures such as BERT- and GPT-based models, cross-lingual training for hate speech detection in multilingual settings, and multimodal training that uses audio-visual cues. Explainable AI (XAI) methods would also help increase the transparency of automated moderation systems. As social media continues to evolve, regular data updates and model retraining will be vital for maintaining performance and ethical standards in hate speech detection.

Ethical Considerations

This framework is intended for content moderation to protect vulnerable communities, not for surveillance or censorship. The dataset contains potentially offensive content used solely for academic research with no attempt to re-identify users. Human oversight is recommended for deployment.

Acknowledgments

The authors are indebted to open-source data contributors and to the scholarly communities that have released hate speech datasets for research, helping ethical AI and social computing advance.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Naseeb. Machine Learning- and Deep Learning-Based Multi-Model system for hate speech detection on Facebook. Algorithms 2025, 18, 331.
  2. Hossain, M.A.; Traini, E.; Amenta, F. Machine Learning Applications for Diagnosing Parkinson’s Disease via Speech, Language, and Voice Changes: A Systematic review. Inventions 2025, 10, 48.
  3. Pan, L. The importance of deep learning models in speech signal processing: Fundamentals, strategies, and future research directions. Int. J. Speech Technol. 2025.
  4. Haboussi, S.; Oukas, N.; Zerrouki, T.; Djettou, H. Arabic speech recognition using neural networks: Concepts, literature review and challenges. J. Umm Al-Qura Univ. Appl. Sci. 2025.
  5. Naz; Khan, H.U.; Bukhari, A.; Alshemaimri, B.; Daud, A.; Ramzan, M. Machine and deep learning for personality traits detection: A comprehensive survey and open research challenges. Artif. Intell. Rev. 2025, 58.
  6. Zangl, M.; et al. A multidisciplinary analysis of transparent AI-driven toxicity detection tools for civic engagement platforms. AI Soc. 2025.
  7. Chowdhury, T. Decoding silent speech: A machine learning perspective on data, methods, and frameworks. Neural Comput. Appl. 2025.
  8. Shankar, R.; Bundele, A.; Mukhopadhyay, A. A Systematic review of Natural language processing Techniques for Early Detection of Cognitive Impairment. Mayo Clin. Proc. Digit. Health 2025, 3, 100205.
  9. Najib, F.M. Sign language interpretation using machine learning and artificial intelligence. Neural Comput. Appl. 2024.
  10. Jalayer, R.; Jalayer, M.; Baniasadi, A. A review on sound source localization in Robotics: Focusing on deep learning methods. Appl. Sci. 2025, 15, 9354.
  11. Amirgaliyev; Mussabek, M.; Rakhimzhanova, T.; Zhumadillayeva, A. A Review of Machine Learning and Deep Learning Methods for Person Detection, Tracking and Identification, and Face Recognition with Applications. Sensors 2025, 25, 1410.
  12. Alsehaimi, A.; Babour; Alahmadi, D. Toward Transparent Modeling: A Scoping Review of Explainability for Arabic Sentiment Analysis. Appl. Sci. 2025, 15, 10659.
  13. Dutta, S.; Banducci; Camargo, C.Q. Divided by discipline? A systematic literature review on the quantification of online sexism and misogyny using a semi-automated approach. Scientometrics 2025.
  14. Rodrigo-Guillen, R.; Garcia-D’Urso, N.; Mora-Mora, H.; Azorin-Lopez, J. Detecting abnormal behavior events and gatherings in public spaces using Deep Learning: A review. J. Sens. Actuator Netw. 2025, 14, 69.
  15. Rasool, N.; Bhat, J.I. Brain tumour detection using machine and deep learning: A systematic review. Multimed. Tools Appl. 2024.
  16. Thapa, S. Large language models (LLM) in computational social science: Prospects, current state, and challenges. Soc. Netw. Anal. Min. 2025, 15.
Figure 1. System Architecture Diagram.
Figure 2. Grouped Bar Chart (Accuracy, Precision, Recall, F1).
Figure 3. Line Chart (Training vs Testing Accuracy).
Figure 4. Confusion Matrix (3 models).
Figure 5. Model Comparison Bar Graph.
Figure 6. Performance Comparison of Models on Accuracy, Precision, Recall, and F1-score.
Table 1. Performance comparison of machine learning models using TF-IDF features.
Model               | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
Logistic Regression | 91.2         | 90.4          | 89.7       | 90.0
SVM                 | 93.4         | 92.8          | 93.1       | 92.9
Random Forest       | 89.8         | 88.9          | 88.1       | 88.5
Naïve Bayes         | 85.5         | 84.7          | 82.3       | 83.5
Table 2. Performance comparison using Word2Vec embeddings.
Model               | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
Logistic Regression | 90.5         | 89.9          | 91.2       | 90.4
SVM                 | 92.1         | 91.7          | 92.5       | 92.0
Random Forest       | 88.7         | 87.4          | 88.9       | 88.1
Naïve Bayes         | 84.3         | 82.8          | 83.7       | 83.2
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated