Submitted:
13 December 2024
Posted:
16 December 2024
Abstract
As global suicide rates continue to rise, demand is growing for data-driven approaches to mental health surveillance. This study applies machine learning, natural language processing (NLP), and sentiment analysis to detect suicidal ideation in Twitter data. A predictive model was developed to process large volumes of social media posts and identify textual and emotional cues indicative of distress, with the aim of detecting potential suicide risk while limiting false positives. Preprocessing included tokenization, stemming, and feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF) and Count Vectorization. A Random Forest Classifier, chosen for its robustness in handling high-dimensional data, was used to capture linguistic and emotional patterns associated with suicidal ideation. The model achieved an overall accuracy of 85%, with a precision of 88% and a recall of 83% for "Potential Suicide Posts," and a Precision-Recall AUC of 0.93. By uncovering behavioral patterns and context-specific triggers such as social isolation and bullying, the study illustrates how AI can be applied in sensitive, real-time mental health contexts. The proposed framework offers a scalable tool for timely, data-driven responses and contributes to global suicide prevention strategies.
Keywords:
1. Introduction
1.1. Suicide as a Global Public Health Crisis
1.2. The Role of Technology in Suicide Prevention
1.3. Objectives and Contributions of This Study
2. Literature Review
2.1. Suicidal Ideation
2.2. Prevalence and Demographics
- Suicidal ideation is notably prevalent among adolescents. For instance, in a study conducted in Macapá, 46.7% of adolescents reported experiencing suicidal thoughts, with a higher prevalence among private school students than among public school students (Abreu & Martins, 2022).
- The global suicide rate among 15-19-year-olds is significantly higher than in younger age groups, highlighting adolescence as a critical period for intervention (Berger et al., 2015)
2.3. Associated Factors
- Psychological factors such as depression, hopelessness, and feelings of worthlessness are strongly linked to suicidal ideation. These are often exacerbated by mental health disorders like depression and anxiety (Berger et al., 2015; Quiroga & Walton, 2014).
- Substance abuse is another significant factor, with a high percentage of individuals in treatment for substance-related disorders reporting suicidal thoughts (Vale et al., 2023)
- Social factors, including poor family relationships and exposure to violence or discrimination, particularly affect vulnerable groups such as transgender individuals (Vaz et al., 2022)
2.4. Intervention and Prevention
- Cognitive Behavioral Therapy (CBT) has been identified as an effective intervention for reducing suicidal ideation by promoting cognitive restructuring and problem-solving skills (Abreu & Martins, 2022)
- Early detection and treatment of mental health disorders, along with maintaining supportive relationships, are crucial protective factors (Berger et al., 2015)
- For military personnel, addressing the unique stressors of military life and providing mental health support can mitigate the risk of suicidal ideation (Mostardeiro et al., 2022)
2.5. Broader Perspectives
2.6. Risk Factors for Suicide
2.7. Individual Risk Factors
- Mental Health Disorders: Conditions such as depression and previous suicide attempts are strong predictors of suicide, with risk ratios ranging from 4 to 13 (Favril et al., 2023)
- Demographic Variables: Age and gender also play roles, with males and younger individuals showing higher risk (Grover et al., 2023)
2.8. Environmental and Social Factors
- School Environment: For adolescents, factors like school maladjustment, victimization, and negative peer relationships are critical (“Environmental Systematic Analysis of Factors Associated with Adolescent Suicide Risk,” 2024)
- Socioeconomic Stressors: Unemployment and poverty correlate positively with suicide risk, although not significantly in all studies (Muniyapillai et al., 2024)
2.9. Cultural and Contextual Influences
- Racism and Trauma: Experiences of racism and intergenerational trauma can exacerbate mental health challenges, particularly among marginalized groups (Wong, 2023)
2.10. Social Media as a Mental Health Indicator
2.11. Linguistic Indicators
- Studies have shown that specific language use on social media correlates with mental health issues, such as increased expressions of sadness or anxiety (Kansal et al., 2024)
- Machine learning models can analyze these linguistic patterns to detect early signs of mental disorders (Abdurrahim & Fudholi, 2024)
2.12. Social Connections
- The relationships and interactions users have on social media platforms can also serve as predictors of mental health, with network-based models outperforming traditional text-based approaches (Oliveira et al., 2024)
- Social media usage patterns, including frequency and type of interactions, have been linked to symptoms of anxiety and depression among adolescents (Mohamed, 2024)
2.13. Social Media Insights on Suicidal Ideation
2.14. Network Connections and Mental Health
- Social media connections, such as Twitter friends and followers, can serve as strong predictors of mental health conditions. Network-based models have been found to outperform text-based models in predicting depression and anxiety, suggesting the importance of considering social connections in mental health assessments (Oliveira et al., 2024)
- The use of machine learning models to analyze social media interactions has shown promise in automating the diagnosis of mental health disorders, particularly during the COVID-19 pandemic (Kansal et al., 2024)
2.15. Emotional Expression and Sentiment Analysis
- Sentiment analysis of social media content, such as tweets, can reveal patterns of emotional expression that correlate with mental health outcomes. Studies have linked the variability and instability of emotional content on social media to depressive and anxiety symptoms (Joinson et al., 2024)
- The use of advanced models like CNN-BiLSTM for classifying mental health-related texts has demonstrated high accuracy, indicating the potential of these techniques for early diagnosis (Abdurrahim & Fudholi, 2024)
2.16. Impact on Adolescents and Young Adults
- Social media use has been associated with negative mental health outcomes among adolescents, affecting well-being, self-esteem, and social relationships. The impact varies across different platforms, with some like TikTok and Instagram having more negative effects (Wal et al., 2024)
- The relationship between social media use and mental health in young adults is complex, with studies highlighting the need for further research to understand the underlying mechanisms (Chugh et al., 2024; Mohamed, 2024).
3. Role of Deep Learning in Sentiment Analysis
3.1. Methodologies in Deep Learning for Sentiment Analysis
- Aspect-Based Sentiment Analysis (ABSA): Deep learning models, such as those used in ABSA, focus on predicting sentiment polarities related to specific features or entities within text, offering more precise insights than general sentiment analysis (Umamaheswari & Ranjana, 2024)
- Deep Learning Architectures: Various architectures, including Deep Convolutional Neural Networks (DCNN), Long Short-Term Memory (LSTM), and BERT models, have been employed to enhance sentiment classification accuracy across different datasets (Adagale & Gupta, 2024; Wu et al., 2024)
- Transfer Learning: Techniques like BERT and its variants, such as AraBERT, leverage models pre-trained on large corpora to improve sentiment analysis in specific languages or domains, such as Arabic text (Elouli et al., 2024).
3.2. Applications of Deep Learning in Sentiment Analysis
- E-commerce: In e-commerce, deep learning models analyze customer reviews to improve consumer experience by accurately classifying sentiments expressed in multimodal formats, including text, images, and emojis (N & Kothandaraman, 2024)
- Social Media and Public Opinion: Sentiment analysis on social media platforms helps businesses and political entities understand public opinion, aiding in strategic decision-making (Hase et al., 2024; Bhor et al., 2024)
- Financial Markets: Deep learning models are used to analyze sentiment in financial texts, providing insights into market trends and aiding in risk management and decision-making for investors (Botta et al., 2024)
3.3. Challenges and Future Directions
- Data Complexity: Handling diverse data modalities, such as visual and multimodal data, remains a challenge, requiring adaptation of deep learning techniques (Suryawanshi, 2024)
- Language and Context Variability: Variations in language, context, and semantics pose challenges in accurately capturing sentiments, necessitating ongoing research and model refinement (Adagale & Gupta, 2024)
3.4. Existing Studies on Suicide Detection
3.5. NLP and Machine Learning Approaches
- Several studies have utilized NLP techniques to analyze linguistic patterns in social media posts. For instance, Cai et al. employed models like Logistic Regression and BERT to detect suicidal tendencies in tweets, highlighting the potential of machine learning in digital mental health monitoring (Cai et al., 2024)
- Balasubramanian et al. proposed a Cat Swarm-Intelligent Adaptive Recurrent Network (CSI-ARN) model, achieving high accuracy and F1-scores in detecting suicidal thoughts from social media comments (Balasubramanian et al., 2024)
- Lin et al. introduced a RoBERTa-CNN model, which demonstrated robust performance in identifying suicidal intentions on Reddit posts, achieving a mean accuracy of 98% (Lin et al., 2024)
3.6. Deep Learning and Psychiatric Integration
- Wang et al. suggested integrating psychiatric scales with neural networks to provide theoretical support for suicide risk detection models, enhancing the interpretability and accuracy of predictions (Wang et al., 2024)
- Raja and Nagarajan developed an LSTM-Attention-RNN model, which effectively captured emotional nuances in social media posts, achieving notable improvements over baseline models (R & Nagarajan, 2024)
3.7. Challenges and Ethical Considerations
- Squires et al. addressed the challenge of uncertainty in mental health data classification by introducing a semi-supervised deep label smoothing method, which improved classification accuracy on Reddit datasets (Squires et al., 2024)
- Cai et al. emphasized the ethical considerations in applying NLP models for sensitive topics like suicide detection, advocating for the integration of human judgment in decision-making processes (Cai et al., 2024)
3.8. Dataset Development and Augmentation
4. Methodology
4.1. Data Collection
4.2. Data Collection Methodology
- #wanttodie
- #suicideprevention
- #waysout
- #depressionhelp
- #feelinghopeless
- #mentalhealthstruggles
- #overwhelmed
- Anonymized User ID: Ensures user privacy while maintaining the ability to analyze post history.
- Timestamp: Specifies the time and date of the post.
- Content: The main body of the tweet, including any hashtags.
- Associated Keywords/Hashtags: A list of tags or terms that triggered the inclusion of the tweet.
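The record layout and hashtag filter listed above can be sketched as follows. This is a minimal illustration, not the study's collection code: the class name `TweetRecord`, the helper names, the salted SHA-256 anonymization, and the example salt are all assumptions introduced here.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hashtags listed in Section 4.2 as collection triggers.
TRACKED_HASHTAGS = frozenset({
    "#wanttodie", "#suicideprevention", "#waysout", "#depressionhelp",
    "#feelinghopeless", "#mentalhealthstruggles", "#overwhelmed",
})


@dataclass(frozen=True)
class TweetRecord:
    """One collected tweet with the four fields described in the text."""
    anonymized_user_id: str
    timestamp: datetime
    content: str
    matched_hashtags: tuple


def anonymize(user_id: str, salt: str) -> str:
    """Assumption: a salted SHA-256 digest stands in for the anonymized ID."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def build_record(user_id, timestamp, content, salt="example-salt"):
    """Return a TweetRecord if the tweet has a tracked hashtag, else None."""
    found = set(re.findall(r"#\w+", content.lower()))
    tags = tuple(sorted(found & TRACKED_HASHTAGS))
    if not tags:
        return None
    return TweetRecord(anonymize(user_id, salt), timestamp, content, tags)


ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = build_record("user42", ts, "Feeling lost #Overwhelmed #monday")
skipped = build_record("user43", ts, "Nice day outside #sunny")
```

Hashing the same user ID with the same salt always yields the same pseudonym, so a user's post history can still be grouped without storing the original ID.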
4.3. Risk Categorization Framework
- Potential Suicide Post: Posts that touch on distressing thoughts but do not exhibit immediate suicidal intent (Class 1).
- Not Suicide Post: Tweets that show no signs of suicidal ideation (Class 0).
4.3. Data Preprocessing
4.3.1. Loading and Cleaning Data
- The dataset was imported using Pandas, and missing values were removed.
- Tweets were cleaned by:
  - Converting text to lowercase.
  - Removing mentions (@usernames), URLs, special characters, and numbers using regular expressions.
  - Reducing consecutive repeating characters to single instances (e.g., “soooo” → “so”).
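The cleaning steps above can be sketched with Python's standard `re` module. The function name `clean_tweet` and the exact patterns are assumptions, since the original expressions are not given. One deliberate deviation is labeled in the code: only runs of three or more identical letters are collapsed, so that ordinary double letters (as in "feel") survive. Removing missing values (for example with `DataFrame.dropna()` in Pandas) is not shown.

```python
import re

MENTION_RE = re.compile(r"@\w+")                 # @usernames
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # links
NON_LETTER_RE = re.compile(r"[^a-z\s]")          # digits, punctuation, symbols
# Assumption: collapse only runs of 3+ identical letters ("soooo" -> "so"),
# leaving legitimate double letters such as the "ee" in "feel" untouched.
REPEAT_RE = re.compile(r"([a-z])\1{2,}")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip mentions, URLs, symbols, and digits."""
    text = text.lower()
    text = MENTION_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)       # must run before symbols are stripped
    text = NON_LETTER_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)
    return " ".join(text.split())      # normalize whitespace


print(clean_tweet("@bob I feel soooo tired!!! http://t.co/x 123"))
# prints: i feel so tired
```

URLs are removed before special characters because stripping punctuation first would leave fragments such as "http t co x" in the text.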

4.3.2. Tokenization and Stopword Removal
- The text was tokenized into individual words.
- Common stopwords (e.g., “the,” “and”) were removed, and words were reduced to their root forms using the Porter Stemmer.
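A minimal sketch of this step is shown below. It assumes NLTK's `PorterStemmer` as the stemmer implementation (the paper names the Porter Stemmer but not the library) and falls back to leaving tokens unchanged if NLTK is not installed. The short `STOPWORDS` set is illustrative only; the stopword list actually used is not specified.

```python
import re

try:
    # Assumption: NLTK's implementation; it needs no corpus download.
    from nltk.stem import PorterStemmer
    _stem = PorterStemmer().stem
except ImportError:
    def _stem(word):
        """Fallback when NLTK is unavailable: return the token unchanged."""
        return word

# Illustrative stopword subset, not the full list used in the study.
STOPWORDS = frozenset({
    "a", "an", "and", "the", "is", "it", "of", "to", "in", "i", "my", "this",
})


def tokenize_and_stem(text: str) -> list:
    """Split cleaned text into word tokens, drop stopwords, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [_stem(tok) for tok in tokens if tok not in STOPWORDS]


tokens = tokenize_and_stem("The pain and the dark")
```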

4.3.3. Feature Extraction
- Text data was converted into numerical format using TF-IDF (Term Frequency-Inverse Document Frequency) and Count Vectorization for machine learning readiness.
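To make the two representations concrete, the sketch below computes count vectors and TF-IDF vectors with the standard library only. The weighting shown, a smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, is one common variant and is an assumption here; the paper does not state which formula or library (for example scikit-learn's `CountVectorizer` and `TfidfVectorizer`) was used. All function names are illustrative.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct token across the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def count_vectors(docs, vocab):
    """Count Vectorization: one row of raw term counts per document."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_vectors(docs, vocab):
    """TF-IDF with smoothed IDF and L2-normalized rows (assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for counts in count_vectors(docs, vocab):
        weights = [c * idf[t] for c, t in zip(counts, vocab)]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return rows


docs = [["pain", "dark"], ["pain", "hope"]]
vocab = build_vocabulary(docs)
counts = count_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)
```

In this toy corpus "pain" appears in both documents, so it receives a lower TF-IDF weight than "dark" or "hope", which each appear in only one.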
4.3.4. Train-Test Split
- The dataset was divided into training (80%) and testing (20%) subsets using train_test_split.

4.4. Model Development
- Algorithm Selection:
  - A Random Forest Classifier was chosen for its robustness and ability to handle high-dimensional data. It was trained with 100 estimators (decision trees).
- Training Process:
  - The classifier was trained on the pre-processed training set (X_train, y_train) and evaluated on the held-out testing set (X_test, y_test).
- Evaluation Metrics:
  - The model’s performance was evaluated using standard metrics:
    - Precision: Proportion of predicted positives that are truly positive.
    - Recall: Proportion of actual positives correctly identified.
    - F1-Score: Harmonic mean of precision and recall.
    - Accuracy: Proportion of all predictions that are correct.
  - A confusion matrix was used to visualize true positives, true negatives, false positives, and false negatives, which provided a detailed view of model performance. The confusion matrix was particularly useful for identifying specific areas where the model underperformed, such as false negatives (critical in suicide ideation detection).
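The split, training, and evaluation steps described above might look like the following scikit-learn sketch. It is not the authors' code: the ten example tweets are taken from the data sample in Section 4.5 rather than the full dataset, and `random_state=42`, `stratify=labels`, the use of `TfidfVectorizer` alone, and fitting the vectorizer on the training split only are assumptions made for reproducibility and to avoid leaking test vocabulary.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Ten rows from the data sample table (1 = Potential Suicide Post).
texts = [
    "I love my new phone it's super fast",
    "Excited to start a new journey in life",
    "It hurts to even wake up every morning",
    "Cherishing every moment with my loved ones",
    "Sometimes I wonder if life is worth it",
    "Pushing through challenges feeling stronger every day",
    "I can't seem to find a way out of this darkness",
    "Planning to clean my house this weekend",
    "Feeling grateful for another beautiful day",
    "Went for a walk in the park; it was relaxing",
]
labels = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]

# 80/20 split as stated in the text; seed and stratification are assumptions.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Fit the vectorizer on training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

# 100 trees, matching the number of estimators reported in the text.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
print(classification_report(
    y_test, y_pred, labels=[0, 1],
    target_names=["Not Suicide Post", "Potential Suicide Post"],
    zero_division=0,
))
```

With only ten rows the printed scores are not meaningful; the sketch shows the order of operations, not the reported results.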
4.5. Data Features
Data Sample
| Index | Tweet | Suicide |
|---|---|---|
| 0 | I love my new phone it’s super fast | Not Suicide Post |
| 1 | Excited to start a new journey in life | Not Suicide Post |
| 2 | It hurts to even wake up every morning | Potential Suicide Post |
| 3 | Cherishing every moment with my loved ones | Not Suicide Post |
| 4 | Sometimes I wonder if life is worth it | Potential Suicide Post |
| 5 | Cherishing every moment with my loved ones | Not Suicide Post |
| 6 | Pushing through challenges feeling stronger every day | Not Suicide Post |
| 7 | I can’t seem to find a way out of this darkness | Potential Suicide Post |
| 8 | Planning to clean my house this weekend | Not Suicide Post |
| 9 | It hurts to even wake up every morning | Potential Suicide Post |
| 10 | Planning to clean my house this weekend | Not Suicide Post |
| 11 | Feeling grateful for another beautiful day | Not Suicide Post |
| 12 | Went for a walk in the park; it was relaxing | Not Suicide Post |
| 13 | Thankful for the little joys in life | Not Suicide Post |
| 14 | Thankful for the little joys in life | Not Suicide Post |

5. Results and Findings
5.1. Performance Results
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Not Suicide Post (0) | 0.82 | 0.88 | 0.85 | 145 |
| Potential Suicide Post (1) | 0.88 | 0.83 | 0.85 | 155 |
| Accuracy | | | 0.85 | 300 |
| Macro Avg | 0.85 | 0.85 | 0.85 | 300 |
| Weighted Avg | 0.85 | 0.85 | 0.85 | 300 |
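
As a check on the last two rows, the macro and weighted averages follow directly from the per-class values and supports. The worked arithmetic below uses the precision column and assumes the usual definitions (unweighted mean for the macro average, support-weighted mean for the weighted average).

```latex
\text{Macro precision} = \frac{0.82 + 0.88}{2} = 0.85
\qquad
\text{Weighted precision} = \frac{145 \times 0.82 + 155 \times 0.88}{300}
  = \frac{255.3}{300} \approx 0.85
```

Because the two classes have nearly equal support (145 and 155), the macro and weighted averages coincide to two decimal places.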

5.2. Precision-Recall Curve
5.3. Data Set Reduction
5.4. Suicide Ideation Confusion Matrix
5.5. Training Confusion Matrix (Left)
- True Positives (TP): 302
- True Negatives (TN): 315
- False Positives (FP): 40
- False Negatives (FN): 43
Test Confusion Matrix (Right)
- True Positives (TP): 128
- True Negatives (TN): 127
- False Positives (FP): 18
- False Negatives (FN): 27
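
The counts above can be turned back into the reported metrics with a few lines of Python. The sketch below uses the standard definitions, treating "Potential Suicide Post" as the positive class; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts as reported for the training and test confusion matrices.
train = classification_metrics(tp=302, tn=315, fp=40, fn=43)
test = classification_metrics(tp=128, tn=127, fp=18, fn=27)

for name, metrics in (("train", train), ("test", test)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

The test values round to a precision of 0.88, recall of 0.83, F1-score of 0.85, and accuracy of 0.85, matching the "Potential Suicide Post (1)" row in Section 5.1.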
- Training vs. Test:
  - The training matrix shows a slightly higher share of correct classifications (617 of 700, about 88%) than the test matrix (255 of 300, 85%). The gap is small, but it may indicate minor overfitting or room to improve the model’s generalizability.
- False Negatives:
  - The 27 false negatives in the test set are the most critical errors for suicide ideation detection, as missing “Potential Suicide Posts” could have severe real-world implications. Strategies to improve recall, such as tuning the decision threshold or enhancing feature representation, are necessary.
- False Positives:
  - The relatively low false-positive counts in both matrices (40 in training, 18 in test) indicate that the model maintains high precision, minimizing unnecessary alerts, which is valuable for efficient resource allocation.
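
The threshold tuning mentioned above can be illustrated with a small standard-library sketch. It assumes the classifier exposes a probability for the positive class (for example via `predict_proba` in scikit-learn); the scores and labels below are made up for illustration and are not model outputs from the study.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are flagged positive."""
    preds = [1 if score >= threshold else 0 for score in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical positive-class probabilities and true labels.
scores = [0.9, 0.8, 0.6, 0.45, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 1, 0, 0]

default = precision_recall_at(scores, labels, threshold=0.5)
lowered = precision_recall_at(scores, labels, threshold=0.35)
print("threshold 0.50:", default)
print("threshold 0.35:", lowered)
```

Lowering the threshold from 0.5 to 0.35 in this toy example raises recall from 0.6 to 0.8 while precision falls from 1.0 to 0.8, which is the trade-off to weigh when false negatives are the costlier error.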
5.6. Performance Analysis
- Class 0 (“Not Suicide Post”):
  - Recall (0.88) exceeded precision (0.82): the model correctly identified most non-suicidal posts, but about 18% of the posts it labeled as non-suicidal (27 of 154) were actually “Potential Suicide Posts.”
- Class 1 (“Potential Suicide Post”):
  - The precision (0.88) was higher than the recall (0.83), meaning the model effectively limited false positives but missed some critical cases (false negatives).
6. Discussion
6.1. Alignment with Objectives
6.2. Future Directions
7. Conclusions
References
- Abdulsalam, A., & Alhothali, A. (2024). Suicidal ideation detection on social media: A review of machine learning methods. Social Network Analysis and Mining, 14(1), 1-16.
- Abraham, Z.K., & Sher, L. (2019). Adolescent suicide as a global public health issue. International Journal of Adolescent Medicine and Health.
- Abdurrahim, A., & Fudholi, D.H. (2024). Mental health prediction model on social media data using CNN-BiLSTM. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control.
- Abreu, T. de, & Martins, M. das G. T. (2022). A presença de ideação suicida em adolescentes e terapia cognitivo-comportamental na intervenção: Um estudo de campo. Revista Ibero-Americana de Humanidades, Ciências e Educação.
- Adagale, S., & Gupta, P. (2024). Comprehensive analysis of text-based sentiment analysis using deep learning. IEEE ICITEICS.
- Balasubramanian, J., Koppa, K.B., Solanki, V., & Saxena, A.K. (2024). Suicide thoughts screening with social media using cat swarm-intelligent adaptive recurrent network. Multidisciplinary Science Journal.
- Berger, G., Casa, A.D., & Pauli, D. (2015). Suizidalität bei Adoleszenten – Prävention und Behandlung. Therapeutische Umschau. Revue Thérapeutique.
- Bhor, S., Bhor, S., Rani, B., Aniket, F., & Dube, D. (2024). Implementation of sentiment analysis using deep learning. International Journal of Advanced Research in Science, Communication and Technology.
- Botta, A., Mohini, Pandey, Raju, A., & Balaji, Ch. (2024). Deep learning for sentiment analysis in financial markets. IEEE ICRTCST.
- Cai, S., Jung, H., Liu, J., & Liu, W. (2024). Research on the applicability of suicide tweet detection algorithms. Applied and Computational Engineering.
- Chugh, S., Bansal, Y., Nagpal, R., Sakshi, Kaur, S., Saluja, S., Ahluwalia, B.K., & Sharma, S. (2024). The impact of social media on mental health. Mental Health Insights.
- Elsayed, N., Elsayed, Z., & Ozer, M. (2024). CautionSuicide: A deep learning-based approach for detecting suicidal ideation in chatbot conversation. arXiv.
- Favril, L., Yu, R., Geddes, J.R., & Fazel, S. (2023). Individual-level risk factors for suicide mortality in the general population: An umbrella review. The Lancet Public Health.
- Grover, C., Huber, J., Brewer, M., Basu, A., & Large, M. (2023). Meta-analysis of clinical risk factors for suicide among people presenting to emergency departments. Acta Psychiatrica Scandinavica.
- Hase, Y.P., Karwar, P.B., Hingmire, S.N., & Gopale, B.V. (2024). Sentiment analysis using deep learning. International Journal of Advanced Research in Science, Communication and Technology.
- Joinson, D., Davis, O., & Simpson, E. (2024). The dynamics of emotion expression on Twitter and mental health in a UK longitudinal study. International Journal for Population Data Science.
- Kansal, M., Singh, P., Srivastava, P., Singhal, R., Deep, N., & Singh, A. (2024). Mental health monitoring in the digital age. Mental Health Monitoring.
- Liu, H.Y., & Qayyum, Z. (2023). Suicidal behaviors in children and adolescents: Synthesis of issues and solutions from global perspectives. JAAC.
- Mahmud, S.A. (2023). Suicidal tweet dataset. Kaggle. Retrieved November 29, 2024, from https://www.kaggle.com/datasets/aunanya875/suicidal-tweet-detection-dataset/code.
- Meier, A., & Reinecke, L. (2023). Social media and mental health. Oxford University Press.
- Mohamed, N. (2024). Investigating the impact of social media use on adolescent mental health. OSF Preprints.
- Mostardeiro, V.M.P., Somavilla, V.E., & Mocelin, G. (2022). Ideação suicida no contexto militar. Conjeturas.
- Moulahi, B., Azé, J., & Bringay, S. (2017). Suivi et détection des idéations suicidaires dans les médias sociaux. Social Media Insights.
- Muniyapillai, T., Kulothungan, K., Jai, N., CM, S.S.K., Godwyn, R., Shivashankari, S., Raje, S., Krishnakumar, S.P., Devi, S., & Suresh, S. (2024). Suicide and its risk factors – An ecological study. Journal of Education and Health Promotion.
- Environmental Systematic Analysis of Factors Associated with Adolescent Suicide Risk. (2024). Korean Association for Learner-Centered Curriculum and Instruction.




Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).