Preprint
Article

This version is not peer-reviewed.

Lightweight and Effective Coded-Slang Detection for Cyber-Drug Intelligence

Submitted: 13 May 2026

Posted: 13 May 2026

Abstract
Drug-related criminal activity on social media increasingly relies on dynamic coded language, such as fruit substitutions, numeric homophones, and dialectal metaphors, to evade detection. This linguistic obfuscation poses significant challenges to conventional keyword-based monitoring systems, and the scarcity of open-source datasets capturing such evasive expressions severely impedes research on automated detection. To address these limitations, we construct a dedicated dataset of 10,000 drug-related coded-text samples collected from mainstream Chinese social media platforms. We also propose an optimized TextCNN-based deep learning framework for the automated identification of such illicit content. Through multi-scale convolutional feature extraction, the model captures the local semantic patterns and morphological variations characteristic of short, highly noisy social media texts. Experimental results show that the proposed method achieves an F1-score of 99.3%, significantly outperforming established baseline approaches at representing the semantics of coded language. These findings indicate that the framework offers an efficient, robust, and scalable solution for intelligent monitoring of drug-related content in complex online environments.
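
To make the idea of multi-scale convolutional feature extraction concrete, the following is a minimal sketch of the feature-extraction step of a generic TextCNN, written in plain Python. It is not the authors' implementation: the character-level embedding, the window widths (2, 3, 4), the number of filters, the embedding size, and the helper names `embed`, `make_filters`, and `multi_scale_features` are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import random


def multi_scale_features(embedded, filters_by_width):
    """Convolve each filter over the text, apply ReLU, then max-pool over time.

    embedded: list of token vectors (seq_len x dim).
    filters_by_width: {window width: [filter, ...]}, where each filter is a
    width x dim weight matrix. Returns one pooled value per filter,
    concatenated in order of increasing width.
    """
    features = []
    for width, filters in sorted(filters_by_width.items()):
        for filt in filters:
            best = 0.0  # ReLU floor; also the result when the text is shorter than the window
            for start in range(len(embedded) - width + 1):
                window = embedded[start:start + width]
                activation = sum(
                    w * x
                    for filt_row, token in zip(filt, window)
                    for w, x in zip(filt_row, token)
                )
                best = max(best, activation)
            features.append(best)
    return features


def embed(text, table, dim, rng):
    """Hypothetical character-level lookup; unseen characters get a random vector."""
    vectors = []
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-1, 1) for _ in range(dim)]
        vectors.append(table[ch])
    return vectors


def make_filters(widths, per_width, dim, rng):
    """Random (untrained) filters: per_width filters for each window width."""
    return {
        w: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)]
            for _ in range(per_width)]
        for w in widths
    }


rng = random.Random(0)
DIM = 8          # assumed embedding size
table = {}
filters = make_filters((2, 3, 4), per_width=4, dim=DIM, rng=rng)  # assumed widths
features = multi_scale_features(embed("example short post", table, DIM, rng), filters)
print(len(features))  # 12: 3 window widths x 4 filters each
```

In a full classifier this fixed-length feature vector would typically feed a dense layer that outputs the class probability, with embeddings and filters learned jointly; training is omitted here. Using several window widths lets the model respond to coded expressions of different lengths within the same short text.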
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated