DR-Transformer: A Dual-Regularized Transformer Combining Sparse Attention and Supervised Contrastive Learning for Interpretable Stress Detection in Social Media

Mehdi Chrifi Alaoui; Nour-eddine Joudar; Mohamed Ettaouil

doi:10.20944/preprints202605.1549.v1

Submitted: 21 May 2026

Posted: 22 May 2026


Abstract

Automatic detection of stress from social-media text holds promise for digital mental health, but most existing Transformer-based approaches are opaque and computationally demanding. This work presents DR-Transformer, a Dual-Regularized Transformer that combines two complementary mechanisms: (i) a group sparsity penalty (L₂,₁/L₂ elastic net) applied to the query and key projection matrices of every attention head, which encourages whole-row sparsity for token-level interpretability; and (ii) a supervised contrastive loss on the [CLS] projection, which organizes the latent space according to the stress label. The architecture is intentionally lightweight (6 layers, 8 heads, 256-dim embeddings; ∼9.5M parameters) and runs entirely on consumer-grade hardware (NVIDIA GTX 1660, 6 GB). Experiments on the publicly available Dreaddit dataset (binary stress classification, 2,838 train / 715 test segments) compare DR-Transformer against logistic regression, BiLSTM, a standard Transformer of identical architecture, and MentalBERT. Across five seeded runs, DR-Transformer (Full) reaches F₁ = 0.876 (bootstrap 95% CI 0.852–0.898), outperforming the Standard Transformer (F₁ = 0.842; McNemar p < 0.001 with Bonferroni correction) and performing comparably to the much larger MentalBERT (F₁ = 0.879; p = 0.421). Sparse regularization increases the fraction of near-zero attention weights (below 0.01) from 0.215 to 0.682, yielding visibly focused attention maps, while the supervised contrastive loss improves the silhouette score of [CLS] embeddings from 0.312 to 0.483. Dual regularization thus combines accuracy, interpretability, and efficiency in a single model trainable without specialized infrastructure.
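The two regularizers and the sparsity measure named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact form of the elastic-net penalty, so the weighted sum below, the coefficients `lam_group` and `lam_ridge`, the `temperature` default, and all function names are assumptions made for illustration. The contrastive term follows the commonly used supervised contrastive formulation, in which embeddings sharing a label are treated as positives.

```python
import math


def l21_norm(matrix):
    """L2,1 group norm: sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def squared_l2_norm(matrix):
    """Squared L2 (Frobenius) norm: sum of all squared entries."""
    return sum(x * x for row in matrix for x in row)


def group_elastic_net(matrix, lam_group=1e-3, lam_ridge=1e-4):
    """Assumed penalty form: lam_group * ||W||_{2,1} + lam_ridge * ||W||_2^2.

    The row-wise term pushes entire rows of a projection matrix
    toward zero; the coefficients are hypothetical placeholders.
    """
    return lam_group * l21_norm(matrix) + lam_ridge * squared_l2_norm(matrix)


def near_zero_fraction(attention, threshold=0.01):
    """Fraction of attention weights strictly below `threshold`."""
    weights = [a for row in attention for a in row]
    return sum(1 for a in weights if a < threshold) / len(weights)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings.

    Assumes every embedding is non-zero. Anchors with no same-label
    partner are skipped; the result is averaged over the rest.
    """
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    total, anchors = 0.0, 0
    n = len(normed)
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In a training setup of the kind the abstract describes, the penalty would presumably be summed over the query and key projection matrices of each head and added, together with the contrastive term, to the classification loss; the relative weighting of the three terms is not stated in the abstract and is left open here.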

Keywords: stress detection; mental health; social media; Transformer; sparse attention; supervised contrastive learning; interpretability; computational efficiency; natural language processing; Dreaddit

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
