Submitted: 21 February 2025
Posted: 24 February 2025
Abstract
Keywords:
1. Introduction
- Two encoder-only models (DEENT-Generic and DEENT-Bert) that effectively estimate depression during the COVID-19 pandemic from a Twitter dataset, as measured by accuracy, balanced accuracy, precision, and F1-score.
- A labeled dataset, built using BERT and K-means clustering, containing non-depressive and depressive tweets.
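The metrics named in the first contribution follow their standard definitions over the confusion-matrix counts (TP, TN, FP, FN in the Abbreviations). The sketch below only illustrates those textbook formulas, treating the depressive class as positive; the `Confusion` type and the `binary_metrics` function are hypothetical names, not the evaluation code used for DEENT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for binary confusion-matrix counts
    (positive class = depressive)."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError on an empty denominator.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> dict:
    """Accuracy, balanced accuracy, precision, recall and F1 from raw counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)        # sensitivity, true-positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)   # true-negative rate
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        # Mean of the per-class recalls, so the majority class cannot dominate it.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }
```

For example, `binary_metrics(Confusion(tp=30, tn=50, fp=10, fn=10))` gives an accuracy of 0.80, a precision, recall, and F1 of 0.75, and a balanced accuracy of about 0.79. Balanced accuracy is the more informative of the two accuracy figures when the classes are imbalanced, as in the dataset described in Section 4.4.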
2. Background
3. Related Work
4. DEENT
4.1. Pipeline
4.2. Business and Data Understanding
4.3. Data Engineering
4.4. Depression Twitter Dataset
- A pre-trained BERT-based model, covid-twitter-bert-v2-mnli [35] (available on Hugging Face [36]), was used to compute, for each tweet, the probability of belonging to each of two candidate labels (depressive and non-depressive); only the text column of the sentiment-oriented dataset was used. We chose this model because it had already been fine-tuned for classification problems related to the COVID-19 pandemic; note, however, that it had not previously been used for mental illness classification, as we do in this paper. In this process, version 4.40.2 of the Hugging Face Transformers library was used to download covid-twitter-bert-v2-mnli and compute the depression-related probability scores.
- The K-means clustering algorithm was applied to the dataset containing the probability scores obtained in the previous step. In particular, we varied the number of clusters k from 2 to 9 and evaluated each K-means outcome with the Silhouette coefficient (values close to 1 are desirable) to assess clustering quality. The coefficient was highest (0.613) when K-means operated with k=2. Notably, with k=2, K-means grouped the tweets into depressive (label=1) and non-depressive (label=0) tweets, yielding a dataset for binary depression classification. The resulting dataset (see Table 3) was imbalanced, with 70,509 (56.87%) non-depressive tweets and 53,475 (43.13%) depressive ones. In this process, Scikit-learn 1.4.1 was used to build the K-means clustering model and Matplotlib 3.7.1 to plot the figures and analyze the results.
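The scoring step in the first item above can be sketched with the Transformers `pipeline` API in zero-shot-classification mode. This is a minimal illustration under stated assumptions: the exact candidate-label strings, the use of the pipeline's default hypothesis template, and the helper names `load_classifier`, `to_scores`, and `score_tweets` are hypothetical, and the authors' actual script may differ. The classifier is passed in as a parameter so that the post-processing can be exercised without downloading the model.

```python
from typing import Callable, Sequence

MODEL_ID = "digitalepidemiologylab/covid-twitter-bert-v2-mnli"
# Assumed wording of the two candidate labels described in the text.
CANDIDATE_LABELS = ["depressive", "non-depressive"]


def load_classifier() -> Callable:
    """Build a zero-shot-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: helpers below do not need it
    return pipeline("zero-shot-classification", model=MODEL_ID)


def to_scores(result: dict) -> dict:
    """Re-key one pipeline result by label name.

    The pipeline returns {'sequence': ..., 'labels': [...], 'scores': [...]} with
    labels sorted by descending score, so positions are not stable across tweets.
    """
    return dict(zip(result["labels"], result["scores"]))


def score_tweets(texts: Sequence[str], classifier: Callable) -> list:
    """Return one [p_depressive, p_non_depressive] row per tweet, in a fixed order."""
    rows = []
    for text in texts:
        scores = to_scores(classifier(text, candidate_labels=CANDIDATE_LABELS))
        rows.append([scores[label] for label in CANDIDATE_LABELS])
    return rows
```

With the real model, `score_tweets(texts, load_classifier())` returns one `[p_depressive, p_non_depressive]` row per tweet. In the pipeline's default single-label mode the scores are normalized across the candidate labels, so each row sums to 1. For a dataset of this size, passing lists of texts to the pipeline rather than one text at a time would likely be faster.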
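The clustering step in the second item above can be sketched with Scikit-learn's `KMeans` and `silhouette_score`. This minimal sketch assumes the feature matrix holds the two probability columns from the scoring step, with the depressive probability in column 0. The rule that names the cluster whose centroid has the higher depressive probability as label 1 is our assumption about how the two clusters were interpreted, and `select_k`, `clusters_to_labels`, and `class_distribution` are hypothetical helpers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def select_k(X: np.ndarray, k_values=range(2, 10), seed: int = 0):
    """Fit K-means for each k (2..9 by default; the range end is exclusive) and
    keep the k with the highest Silhouette coefficient.
    Returns (best_k, {k: silhouette}, best_model)."""
    scores, models = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        scores[k] = float(silhouette_score(X, km.labels_))
        models[k] = km
    best_k = max(scores, key=scores.get)
    return best_k, scores, models[best_k]


def clusters_to_labels(km: KMeans, depressive_col: int = 0) -> np.ndarray:
    """Assumed naming rule for k=2: the cluster whose centroid has the higher
    depressive probability becomes label 1, the other becomes label 0."""
    depressive_cluster = int(np.argmax(km.cluster_centers_[:, depressive_col]))
    return (km.labels_ == depressive_cluster).astype(int)


def class_distribution(labels: np.ndarray) -> dict:
    """Count and percentage (rounded to 2 decimals) for each label."""
    values, counts = np.unique(labels, return_counts=True)
    total = int(counts.sum())
    return {
        int(v): (int(c), round(100 * int(c) / total, 2))
        for v, c in zip(values, counts)
    }
```

For a label vector with 70,509 zeros and 53,475 ones, `class_distribution` returns `{0: (70509, 56.87), 1: (53475, 43.13)}`, matching the percentages reported above. Fixing `random_state` and `n_init` keeps the Silhouette comparison across k reproducible.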
4.5. DEENT-Generic
4.6. DEENT-Bert
5. Evaluation
5.1. Performance Metrics
5.2. DEENT Training
5.3. Baseline
5.4. Results and Analysis
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| AUROC | Area Under the Receiver Operating Characteristic Curve |
| BERT | Bidirectional Encoder Representations from Transformers |
| BiGRU | Bidirectional Gated Recurrent Unit |
| CNN | Convolutional Neural Network |
| DL | Deep Learning |
| FN | False Negatives |
| FP | False Positives |
| GAI | Generative Artificial Intelligence |
| HAN | Hierarchical Attention Network |
| KNN | k-Nearest Neighbors |
| LSTM | Long Short-Term Memory |
| MDHAN | Multi-Aspect Depression Detection with Hierarchical Attention Network |
| ML | Machine Learning |
| NB | Naive Bayes |
| NLP | Natural Language Processing |
| RF | Random Forest |
| RNN | Recurrent Neural Network |
| SVM | Support Vector Machine |
| TN | True Negatives |
| TP | True Positives |
| WHO | World Health Organization |
| XGBoost | eXtreme Gradient Boosting |
References
- Mental disorders. Available online: https://www.who.int/news-room/fact-sheets/detail/mental-disorders (accessed on 10 December 2024).
- Sher, L. Post-COVID syndrome and suicide risk. QJM. 2021, 114, 95–98. [CrossRef]
- Latoo, J.; Haddad, P.M.; Mistry, M.; Wadoo, O.; Islam, S.M.S.; Jan, F.; Iqbal, Y.; Howseman, T.; Riley, D.; Alabdulla, M. The COVID-19 pandemic: An opportunity to make mental health a higher public health priority. BJPsych Open. 2021, 7. [CrossRef]
- Li, S.; Wang, Y.; Xue, J.; Zhao, N.; Zhu, T. The impact of COVID-19 epidemic declaration on psychological consequences: A study on active Weibo users. Int. J. Environ. Res. Public Health. 2020, 17. [CrossRef]
- Adikari, A.; Nawaratne, R.; de Silva, D.; Ranasinghe, S.; Alahakoon, O.; Alahakoon, D. Emotions of COVID-19: Content analysis of self-reported information using artificial intelligence. J. Med. Internet Res. 2021, 23. [CrossRef]
- Simjanoski, M.; Ballester, P.L.; da Mota, J.C.; De Boni, R.B.; Balanzá-Martínez, V.; Atienza-Carbonell, B.; Bastos, F.I.; Frey, B.N.; Minuzzi, L.; Cardoso, T.d.A.; et al. Lifestyle predictors of depression and anxiety during COVID-19: a machine learning approach. Trends Psychiatry Psychother. 2022, 44. [CrossRef]
- Jha, I.P.; Awasthi, R.; Kumar, A.; Kumar, V.; Sethi, T. Learning the mental health impact of COVID-19 in the United States with explainable artificial intelligence: Observational study. JMIR Mental Health. 2021, 8. [CrossRef]
- Huma.; Sohail, M.K.; Akhtar, N.; Muhammad, D.; Afzal, H.; Mufti, M.R.; Hussain, S.; Ahmed, M. Analyzing COVID-2019 Impact on Mental Health through Social Media Forum. Comput. Mater. Continua. 2021, 67, 3737–3748. [CrossRef]
- Adarsh, V.; Arun Kumar, P.; Lavanya, V.; Gangadharan, G. Fair and Explainable Depression Detection in Social Media. Inf. Process. Manag. 2023, 60. [CrossRef]
- Zogan, H.; Razzak, I.; Wang, X.; Jameel, S.; Xu, G. Explainable depression detection with multi-aspect features using a hybrid deep learning model on social media. World Wide Web. 2022, 25, 281–304. [CrossRef]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE. 1998, 86, 2278–2324. [CrossRef]
- Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, June 2016; pp. 1480–1489. [CrossRef]
- Thushari, P.D.; Aggarwal, N.; Vajrobol, V.; Saxena, G.J.; Singh, S.; Pundir, A. Identifying discernible indications of psychological well-being using ML: explainable AI in reddit social media interactions. Soc. Netw. Anal. Min. 2023, 13. [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, June 2019; pp. 4171–4186. [CrossRef]
- Depression. Available online: https://www.who.int/es/news-room/fact-sheets/detail/depression (accessed on 10 December 2024).
- Depression. Available online: https://www.nimh.nih.gov/health/topics/depression (accessed on 10 December 2024).
- Deshpande, M.; Rao, V. Depression detection using emotion artificial intelligence. In Proceedings of the International Conference on Intelligent Sustainable Systems, ICISS 2017, Palladam, India, 7–8 December 2017; pp. 858–862. [CrossRef]
- Cacheda, F.; Fernandez, D.; Novoa, F.J.; Carneiro, V. Early detection of depression: Social network analysis and random forest techniques. J. Med. Internet Res. 2019, 21. [CrossRef]
- Bombieri, M.; Rospocher, M.; Dall’Alba, D.; Fiorini, P. Automatic detection of procedural knowledge in robotic-assisted surgical texts. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 1287–1295. [CrossRef]
- Mehta, D.; Dwivedi, A.; Patra, A.; Anand Kumar, M. A transformer-based architecture for fake news classification. Soc. Netw. Anal. Min. 2021, 11. [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, California, USA, 4–9 December 2017; pp. 6000–6010. [CrossRef]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In Proceedings of the International Conference on Learning Representations, April 2020. [CrossRef]
- Liu, M.; Xue, J.; Zhao, N.; Wang, X.; Jiao, D.; Zhu, T. Using Social Media to Explore the Consequences of Domestic Violence on Mental Health. J. Interpers. Violence. 2021, 36, NP1965–NP1985. [CrossRef]
- Kalt, T. A New Probabilistic Model of Text Classification and Retrieval. Technical report, University of Massachusetts, USA, 1998.
- Krasker, W.S. Estimation in linear regression models with disparate data points. Econometrica 1980, 48, 1333. [CrossRef]
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Meng, Q.M.; Wu, W.G. Artificial emotional model based on finite state machine. J. Cent. South Univ. Technol. 2008, 15, 694–699. [CrossRef]
- Ji, S.; Zhang, T.; Ansari, L.; Fu, J.; Tiwari, P.; Cambria, E. MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, June 2022; pp. 7184–7190. [CrossRef]
- Studer, S.; Bui, T.B.; Drescher, C.; Hanuschkin, A.; Winkler, L.; Peters, S.; Müller, K.R. Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology. Mach. Learn. Knowl. Extr. 2021, 3, 392–413. [CrossRef]
- University of Melbourne, M. Depressive/non-Depressive Tweets between Dec’19 to Dec’20. Available online: https://ieee-dataport.org/open-access/depressivenon-depressive-tweets-between-dec19-dec20 (accessed on 10 December 2024).
- Müller, M.; Salathé, M.; Kummervold, P.E. COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter. CoRR 2020.
- Huggingface. Available online: https://huggingface.co/digitalepidemiologylab/covid-twitter-bert-v2-mnli (accessed on 10 December 2024).
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 2002, 16, 321–357. [CrossRef]
- Huggingface BERT Model. Available online: https://huggingface.co/tiya1012/swmh4_bert (accessed on 10 December 2024).
- Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In Proceedings of the AI 2006: Advances in Artificial Intelligence, Hobart, Australia, 4–8 December 2006; pp. 1015–1021. [CrossRef]
- García, V.; Mollineda, R.; Sánchez, J. Index of balanced accuracy: A performance measure for skewed class distributions. Pattern Recognit. Image Anal. 2009, 5524 LNCS, 441–448. [CrossRef]
- Boutaba, R.; Salahuddin, M.A.; Limam, N.; Ayoubi, S.; Shahriar, N.; Solano, F.E.; Rendón, O.M.C. A comprehensive survey on machine learning for networking: evolution, applications and research opportunities. J. Internet Serv. Appl. 2018, 9, 16:1–16:99. [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
- Ruby, U.; Yendapalli, V. Binary cross entropy with deep learning technique for Image classification. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9. [CrossRef]







| Reference | Data Source | Samples | Collection Time | Domain | Algorithm | Metric | Transformer Architecture |
|---|---|---|---|---|---|---|---|
| [4] |  | 17,864 | During COVID-19 | Sentiment Analysis | Naive Bayes, Linear Regression |  |  |
| [5] |  | 73,000 | During COVID-19 | Sentiment Analysis | Latent Dirichlet Allocation |  |  |
| [6] | Google Form Survey | 22,562 | During COVID-19 | Depression and Anxiety Classification | RF, XGBoost | Balanced Accuracy, Sensitivity, Specificity |  |
| [7] | Google Form Survey | 17,764 | During COVID-19 | Mental Illnesses Classification | RF, SVM | Accuracy, Sensitivity, Specificity, AUROC |  |
| [8] |  | 5,877 msgs; 1,000 related to depression | During COVID-19 | Mental Illnesses Classification | SVM, RF, NB | Precision, Recall, F1-Score |  |
| [9] |  | 12,911 msgs; 4,996 related to depression | During COVID-19 | Depression and Suicide Classification | SVM, RF, XGBoost, CNN, SVM+KNN | Accuracy, Precision, Recall, F1-Score, AUROC |  |
| [10] |  | 447,856 | Before COVID-19 (2009 to 2016) | Depression Classification | MDHAN, SVM, BiGRU, MBiGRU, CNN, MCNN, HAN | Accuracy, Precision, Recall, F1-Score |  |
| [17] |  | 46,103 msgs; 16,205 related to depression | Before COVID-19 | Mental Illnesses Classification | XGBoost, RF, CNN, LSTM, BERT, MentalBERT | Accuracy, Precision, Recall, F1-Score | Encoder-only |
| DEENT | Twitter | 123,984 tweets; 53,475 related to depression | During COVID-19 (Dec 2019 to Dec 2020) | Depression Classification | DEENT-Generic, DEENT-Bert | Accuracy, Balanced Accuracy, Precision, Recall, F1-Score | Encoder-only |


| index | text | sentiment* |
|---|---|---|
| 0 | rising cases of covid does not alarm me rising death rate does more testing capacity means more cases are detected earlier and asymtomatics and mild cases are identified india is in scary place go check out their graphs | 1 |
| 1 | please vote for chicagoindiaresolution marking india independence shared values of democracy human rights secularism | 0 |
| 2 | wishing all of you eidaladha hazrat ibrahim as ki sunnah aap sab ko mubarak in most parts of india | 1 |
| 3 | daily coronavirus cases in india top for first time covid | 1 |
| 4 | sitting here india style watching the raindrops hit this big ass pond listening to amy winehouse finallay understand what zahree was talking about | 0 |

\* 0: positive; 1: negative


| index | text | target* |
|---|---|---|
| 0 | rising cases covid alarm rising death rate testing capacity means cases detected earlier asymtomatics mild cases identified india scary place go check graphs | 0 |
| 1 | please vote chicagoindiaresolution marking india independence shared values democracy human rights secularism | 1 |
| 2 | wishing eidaladha hazrat ibrahim ki sunnah aap sab ko mubarak parts india | 0 |
| 3 | daily coronavirus cases india top first time covid | 1 |
| 4 | sitting india style watching raindrops hit big ass pond listening amy winehouse finallay understand zahree talking | 0 |

\* 0: non-depressive; 1: depressive

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).