Preprint
Article

This version is not peer-reviewed.

Designing CAPTCHA Systems with Reinforcement Learning for Adaptive Defense

Submitted: 15 April 2026

Posted: 16 April 2026

Abstract
CAPTCHA systems remain a widely deployed defense against automated abuse, but advances in machine learning have reduced the effectiveness of traditional challenge-based designs and exposed limitations in proprietary risk-scoring systems. This paper presents an adaptive, reinforcement learning-based CAPTCHA defense framework for high-security web applications. The proposed system formulates bot detection as a partially observable Markov decision process (POMDP) and uses a Proximal Policy Optimization (PPO) agent with a Long Short-Term Memory (LSTM) network to analyze streamed behavioral telemetry, including mouse movements, clicks, keystrokes, and scrolling, over sequential interaction windows. Based on the accumulated evidence, the agent can continue observing, deploy a honeypot, issue graded CAPTCHA challenges, allow the session, or block it. To complement the sequential agent, the framework also includes an XGBoost classifier that produces a session-level human-likelihood score and serves as a supervised benchmark. Experiments on a simulated ticket-purchasing web application, using human-generated sessions and multiple bot tiers (scripted, replay-based, and LLM-powered agents), show strong preliminary performance. Among the evaluated reinforcement learning variants, Soft PPO achieved the best test performance under two reward structures: under the first it reached 98.8% accuracy, 100% precision, and an F1 score of 0.987, and under the revised reward structure it reached 96.4% accuracy, 100% precision, and an F1 score of 0.963. The XGBoost classifier achieved 99.48% accuracy, a ROC-AUC of 1.000, and an F1 score of 0.9919. The results indicate that sequential reinforcement learning can support accurate, low-friction bot detection, while the accompanying classifier provides an interpretable and efficient benchmark. Compared with proprietary systems such as Google reCAPTCHA v3, the proposed framework emphasizes transparency, auditability, and explicit sequential decision-making rather than black-box risk scoring.
Overall, this work introduces an open and adaptive CAPTCHA-defense framework that offers a promising alternative for studying and deploying behavior-based bot mitigation.
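The sequential decision process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Action` enum only mirrors the actions named above, the split of "graded" challenges into two levels is an assumption, and `toy_policy` is a hypothetical threshold rule standing in for the PPO-LSTM agent. Each window is reduced to a single made-up suspicion feature in [0, 1].

```python
from enum import Enum, auto
from typing import Callable, Iterable, List, Sequence


class Action(Enum):
    """Actions named in the abstract; two CAPTCHA grades are an assumption."""
    OBSERVE = auto()
    HONEYPOT = auto()
    CAPTCHA_EASY = auto()
    CAPTCHA_HARD = auto()
    ALLOW = auto()
    BLOCK = auto()


# Allowing or blocking ends the session; every other action keeps observing.
TERMINAL = {Action.ALLOW, Action.BLOCK}

# Hypothetical per-window feature vector; index 0 is a made-up suspicion score.
Window = Sequence[float]
Policy = Callable[[List[Window]], Action]


def toy_policy(history: List[Window]) -> Action:
    """Hypothetical stand-in for the learned agent.

    The real agent would carry evidence in a recurrent hidden state; here the
    full window history is averaged instead.
    """
    if len(history) < 2:
        return Action.OBSERVE  # too little evidence after one window
    score = sum(w[0] for w in history) / len(history)
    if score > 0.8:
        return Action.BLOCK
    if score < 0.2:
        return Action.ALLOW
    # Ambiguous evidence: escalate challenge difficulty with suspicion.
    return Action.CAPTCHA_EASY if score < 0.5 else Action.CAPTCHA_HARD


def run_session(windows: Iterable[Window], policy: Policy) -> List[Action]:
    """Feed telemetry windows to the policy until it allows or blocks."""
    history: List[Window] = []
    trace: List[Action] = []
    for window in windows:
        history.append(window)
        action = policy(history)
        trace.append(action)
        if action in TERMINAL:
            break
    return trace


if __name__ == "__main__":
    print(run_session([[0.1], [0.1], [0.1]], toy_policy))
    print(run_session([[0.9], [0.95]], toy_policy))
```

In this sketch a session whose windows look consistently human is allowed after the second window without any challenge, which is the low-friction behavior the abstract refers to; a session that stays ambiguous keeps receiving challenges until the stream ends, and how to resolve that case is left open here.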
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

