Submitted: 18 February 2026
Posted: 27 February 2026
Abstract
Keywords:
1. Introduction
- Unsafe identity binding: keying strict limits on attacker-controlled identifiers (e.g., unauthenticated usernames) enables targeted lockouts and cross-user interference.
- Proxy ambiguity: incorrect client IP extraction behind reverse proxies causes false positives; trusting forwarding headers from untrusted peers enables spoofing.
- Endpoint canonicalization gaps: multiple equivalent URL representations can bypass endpoint-specific limits.
- State explosion: per-key bucket creation without bounds enables memory/CPU denial-of-service.
- Scaling semantics drift: local-only rate limiters enforce per-replica budgets, so traffic sprayed across N replicas can obtain up to N times the intended budget.
Contributions.
- An endpoint-aware, DB-configured token-bucket mechanism with deterministic policy precedence and low-latency caching.
- A security analysis organized by concrete bypass/DoS vectors: identity binding, proxy/IP trust, canonicalization and endpoint matching, bucket explosion, misconfiguration, and scaling semantics.
- A reproducible evaluation methodology (microbenchmarks + adversarial tests) that measures overhead and attacker-induced degradation, including fake-endpoint flooding.
2. Background
2.1. Rate Limiting and HTTP Signaling
2.2. Algorithm Families
3. Threat Model and Design Goals
3.1. Threat Model
- paths and encodings, query strings, and method selection,
- headers (including spoofed forwarding headers unless constrained),
- request bodies (size, nesting depth, values),
- source IPs (single host, distributed botnet, or proxy rotation),
- credential state (no creds / stolen API key / stolen bearer token).
3.2. Security and Correctness Goals
- Bypass resistance: trivially transformed requests (e.g., re-encoded paths, extra slashes, or spoofed headers) should not let a client exceed intended limits.
- Fairness/isolation: one principal should not drain capacity intended for others.
- No secret leakage: rate-limit keys and telemetry must not expose sensitive identifiers.
- DoS robustness: attacker traffic should not turn limiter state into memory/CPU exhaustion.
- Explicit scaling semantics: per-replica vs global budgets must be well-defined and testable.
4. Mechanism Overview
5. Endpoint Model and Normalization
5.1. Endpoint Validity Registry (Early Rejection)
- the service routing table (router metadata),
- an OpenAPI specification,
- a curated configuration list for critical endpoints.
5.2. Normalization to Templates with UNKNOWN Collapse
- collapse dynamic path segments to * (or equivalent),
- bind method into the template,
- exclude query strings from the route identity unless explicitly required,
- apply a strict canonicalizer to avoid bypasses (dot segments, repeated slashes, percent-decoding policy).
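The template-matching and UNKNOWN-collapse steps above can be sketched as follows. This is a minimal illustration, not the authors' Listing 1. It assumes the path has already been canonicalized, that templates are strings of the form `"METHOD /path"` with `*` for dynamic segments, and that literal templates take precedence over wildcard ones. `compile_templates` and `normalize_endpoint` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

UNKNOWN = "UNKNOWN"

# (METHOD, segment count) -> [(template, template segments)], most literal first
TemplateIndex = Dict[Tuple[str, int], List[Tuple[str, List[str]]]]


def compile_templates(templates: Iterable[str]) -> TemplateIndex:
    """Index templates such as 'GET /api/users/*' by method and segment count."""
    index: TemplateIndex = {}
    for template in templates:
        method, path = template.split(" ", 1)
        segments = path.strip("/").split("/")
        index.setdefault((method.upper(), len(segments)), []).append((template, segments))
    for candidates in index.values():
        # Deterministic tie-break (assumed): templates with fewer wildcards win.
        candidates.sort(key=lambda item: (item[1].count("*"), item[0]))
    return index


def normalize_endpoint(method: str, canonical_path: str, index: TemplateIndex) -> str:
    """Map an already-canonicalized path to its template, or collapse it to UNKNOWN."""
    path = canonical_path.split("?", 1)[0]  # query string is not part of route identity
    segments = path.strip("/").split("/")
    for template, template_segments in index.get((method.upper(), len(segments)), []):
        if all(t == "*" or t == s for t, s in zip(template_segments, segments)):
            return template
    # Every unmatched path shares ONE key, so random paths cannot mint new buckets.
    return UNKNOWN
```

Because all unmatched requests map to the single `UNKNOWN` template, an attacker who generates random paths gains no new rate-limit keys.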
Listing 1. Endpoint normalization with UNKNOWN collapse (pseudocode).
6. Policy Model and Precedence
6.1. Database Schema
| Column | Type | Meaning |
|---|---|---|
| endpoint | string | normalized endpoint template (method+path) |
| project_id | nullable string/int | tenant override; NULL = global |
| rps_limit | int | allowed requests per second for this scope |
- default: fallback for recognized endpoints without a specific entry.
- UNKNOWN: fallback for requests that do not match any recognized endpoint template.
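One way to realize this schema is sketched below, using SQLite. The sketch assumes the table name `rate_policy` and the choice of engine; the paper does not name either. The uniqueness and positivity constraints encode at the storage layer two of the misconfiguration checks discussed in Section 11.5. SQLite treats NULLs as distinct inside a plain UNIQUE constraint, so the index maps NULL (global) to an empty string. Empty-string project ids are then disallowed so that the mapping stays unambiguous.

```python
import sqlite3

SCHEMA = """
CREATE TABLE rate_policy (
    endpoint   TEXT    NOT NULL,  -- 'METHOD /template', 'default', or 'UNKNOWN'
    project_id TEXT CHECK (project_id IS NULL OR project_id <> ''),  -- NULL = global
    rps_limit  INTEGER NOT NULL CHECK (rps_limit > 0)
);
-- At most one row per scope. Map NULL (global) to '' so that duplicate
-- global rows collide in the unique index.
CREATE UNIQUE INDEX rate_policy_scope
    ON rate_policy (endpoint, COALESCE(project_id, ''));
"""


def open_policy_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a policy database with the assumed schema applied."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```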
6.2. Deterministic Policy Precedence
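A most-specific-wins resolution order can be sketched as follows. The exact chain below is an assumption and is not stated by the paper. Under this chain, a tenant override for an endpoint beats that endpoint's global policy, which beats the tenant default, which beats the global default. UNKNOWN traffic deliberately never inherits `default`. `resolve_rps` is a hypothetical helper operating on an in-memory snapshot of the table.

```python
from typing import Dict, Optional, Tuple

DEFAULT = "default"
UNKNOWN = "UNKNOWN"

# (endpoint, project_id or None for global) -> rps_limit
Policies = Dict[Tuple[str, Optional[str]], int]


def resolve_rps(policies: Policies, endpoint: str, project_id: Optional[str]) -> Optional[int]:
    """Return the most specific rps_limit, or None if no policy applies."""
    if endpoint == UNKNOWN:
        # Unmatched traffic never inherits the (usually looser) default policy.
        chain = [(UNKNOWN, project_id), (UNKNOWN, None)]
    else:
        chain = [
            (endpoint, project_id),  # tenant override for this endpoint
            (endpoint, None),        # global policy for this endpoint
            (DEFAULT, project_id),   # tenant default
            (DEFAULT, None),         # global default
        ]
    for scope in chain:
        if scope in policies:
            return policies[scope]
    return None  # caller applies its explicit fail-open / fail-closed rule
```

Because the chain is a fixed list, the same (endpoint, project) pair always resolves to the same row. This is the determinism property the section title calls for.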
7. Identity Resolution and Key Construction
7.1. Identity Sources and Validation
- Verified bearer-token subject (signature verified, not expired) and derived tenant/project id.
- Validated API key id (key exists and is active; never trust an arbitrary header string).
- Fallback to client network identity (IP or prefix), using trusted-proxy extraction rules.
- Avoid attacker-controlled identifiers.
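The identity order above can be sketched as follows. The sketch assumes that the token verifier and API-key lookup are supplied by the service. They appear here as hypothetical callables that return `(principal, project_id)` or `None`. It also assumes that headers arrive with lowercase names and that the API key travels in a hypothetical `x-api-key` header. The key property is that only *verified* values become identities. Anything else falls back to the network identity.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical verifier signatures: return (principal, project_id) or None.
TokenVerifier = Callable[[str], Optional[Tuple[str, Optional[str]]]]
ApiKeyLookup = Callable[[str], Optional[Tuple[str, Optional[str]]]]


@dataclass(frozen=True)
class Identity:
    kind: str                  # "sub" (token subject), "key" (API key id), or "ip"
    value: str
    project_id: Optional[str]


def resolve_identity(headers: Mapping[str, str], client_ip: str,
                     verify_token: TokenVerifier, lookup_api_key: ApiKeyLookup) -> Identity:
    """Pick the strongest verified identity; never key on raw client-supplied strings."""
    auth = headers.get("authorization", "")
    if auth.lower().startswith("bearer "):
        claims = verify_token(auth[len("bearer "):].strip())  # signature + expiry checked inside
        if claims is not None:
            return Identity("sub", claims[0], claims[1])
    raw_key = headers.get("x-api-key")
    if raw_key:
        record = lookup_api_key(raw_key)  # must confirm the key exists and is active
        if record is not None:
            return Identity("key", record[0], record[1])
    # Invalid or missing credentials fall back to the network identity, so rotating
    # fake tokens or keys does not create fresh buckets.
    return Identity("ip", client_ip, None)
```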
7.2. Client IP Extraction Behind Proxies
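A common trusted-proxy extraction rule is sketched below using the standard `ipaddress` module. It assumes the `X-Forwarded-For` convention, in which each proxy appends the address it received the request from. The same right-to-left walk applies to the `Forwarded` header of RFC 7239. The header is consulted only while the current hop is inside a configured trusted CIDR list. As a result, a client that connects directly cannot spoof its address, and entries that a client prepends are never reached. `client_ip` is a hypothetical name.

```python
import ipaddress
from typing import Iterable, Optional


def client_ip(peer: str, x_forwarded_for: Optional[str], trusted_proxies: Iterable[str]) -> str:
    """Walk X-Forwarded-For right to left, trusting only hops added by trusted proxies."""
    nets = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]

    def is_trusted(addr) -> bool:
        # Mixed IPv4/IPv6 membership tests simply return False.
        return any(addr in net for net in nets)

    current = ipaddress.ip_address(peer)  # the TCP peer is the only unforgeable address
    hops = [h.strip() for h in x_forwarded_for.split(",")] if x_forwarded_for else []
    for hop in reversed(hops):
        if not is_trusted(current):
            break              # first untrusted hop from the right: treat it as the client
        try:
            current = ipaddress.ip_address(hop)
        except ValueError:
            break              # malformed hop: stop at the last address we could validate
    return str(current)
```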
7.3. Bucket Key Format and Secrecy
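A key format that keeps raw identifiers out of limiter state and telemetry can be sketched with an HMAC. The `rl:<endpoint>:<kind>:<tag>` layout and the 128-bit truncation are assumptions. A keyed HMAC with a server-side secret is used rather than a plain hash because identifiers such as IP addresses or API key ids are low-entropy. An unkeyed digest of such values could be reversed by enumeration.

```python
import hashlib
import hmac


def bucket_key(secret: bytes, endpoint: str, kind: str, value: str) -> str:
    """Build a limiter key that never contains the raw identifier."""
    # Length-prefix each field so different (kind, value) pairs cannot encode identically.
    material = f"{len(kind)}:{kind}|{len(value)}:{value}".encode("utf-8")
    digest = hmac.new(secret, material, hashlib.sha256).hexdigest()[:32]  # 128-bit tag
    return f"rl:{endpoint}:{kind}:{digest}"
```

Keys stay stable for a given secret. Rotating the secret, however, effectively resets all buckets, so rotation should be scheduled accordingly.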
8. Token Bucket Parameters from RPS
8.1. RPS-to-Bucket Mapping
8.2. Per-Endpoint Consume Weights
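A lazily refilled token bucket derived from `rps_limit`, with per-endpoint consume weights, can be sketched as follows. The mapping is an assumption: refill at `rps_limit` tokens per second, with capacity `rps_limit * burst_seconds` (defaulting to one second of burst). The weight table `CONSUME_WEIGHTS` and its endpoint are hypothetical. On rejection, the bucket reports how long until the request could succeed.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Lazy-refill token bucket: `rate` tokens/s, at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = self.capacity          # start full
        self.updated = clock()

    def try_consume(self, cost: float = 1.0) -> Tuple[bool, float]:
        """Return (allowed, seconds until `cost` tokens would be available)."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if cost > self.capacity:
            return False, math.inf           # can never succeed; likely a config error
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        return False, (cost - self.tokens) / self.rate


def bucket_from_rps(rps_limit: int, burst_seconds: float = 1.0,
                    clock: Callable[[], float] = time.monotonic) -> TokenBucket:
    # Assumed mapping: refill at rps_limit tokens/s, allow burst_seconds worth of burst.
    return TokenBucket(rate=rps_limit, capacity=max(1.0, rps_limit * burst_seconds), clock=clock)


# Hypothetical per-endpoint weights; unlisted endpoints cost 1 token.
CONSUME_WEIGHTS: Dict[str, float] = {"POST /api/reports/export": 5.0}


def cost_for(endpoint: str) -> float:
    return CONSUME_WEIGHTS.get(endpoint, 1.0)
```

For example, with `rps_limit = 2` a third request in the same instant is rejected with a wait of 0.5 s. The service can round the wait up to a whole-second `Retry-After` value in its 429 response, which RFC 6585 permits. A weight larger than the capacity can never be satisfied, so a configuration linter should flag it.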
9. State Management and Operational Controls
9.1. Policy Cache
- key: (endpoint|project_id)
- value: rps_limit
- periodic refresh with bounded staleness, and/or
- admin-triggered reload for rapid changes.
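The refresh options above can be combined in one small cache, sketched below. It assumes a `loader` callable that returns the whole policy table as a dict and a single-process service. Snapshots are replaced wholesale, so readers never see a half-loaded table. A failed refresh keeps the last-known-good snapshot and backs off for `retry_backoff` seconds, so an outage does not cause every request to query the database. `PolicyCache` is a hypothetical name.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

Policies = Dict[Tuple[str, Optional[str]], int]


class PolicyCache:
    """In-memory policy snapshot with bounded staleness and an explicit reload hook."""

    def __init__(self, loader: Callable[[], Policies], max_staleness: float,
                 retry_backoff: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._max_staleness = max_staleness
        self._retry_backoff = retry_backoff
        self._clock = clock
        self._lock = threading.Lock()
        self._policies = loader()          # fail fast at startup if the DB is unreachable
        self._loaded_at = clock()
        self._retry_at = float("-inf")

    def reload(self) -> None:
        """Admin-triggered or periodic reload; swaps in a complete new snapshot."""
        fresh = self._loader()             # query outside the lock
        with self._lock:
            self._policies = fresh
            self._loaded_at = self._clock()

    def snapshot(self) -> Policies:
        now = self._clock()
        if now - self._loaded_at > self._max_staleness and now >= self._retry_at:
            try:
                self.reload()
            except Exception:
                # Keep the last-known-good snapshot and back off; fail-open vs
                # fail-closed after a long outage is a separate, explicit decision.
                self._retry_at = now + self._retry_backoff
        return self._policies
```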
9.2. Bucket Cache and Boundedness
- idle expiration (TTL after last access),
- maximum size (LRU/LFU eviction),
- optional admission control (refuse new keys under extreme churn).
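The three boundedness controls can be sketched on top of an `OrderedDict`. The factory callable, the size and TTL values, and the `admit_new` hook are all assumptions. The dict is kept in last-access order, so both idle expiry and LRU eviction remove entries from the front. When admission control refuses a new key, the sketch returns `None` and leaves the fallback (for example, a shared overflow bucket) to the caller.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class BoundedBucketCache:
    """Per-key bucket store with idle TTL, LRU size cap, and optional admission control."""

    def __init__(self, factory: Callable[[Hashable], Any], max_size: int, idle_ttl: float,
                 clock: Callable[[], float] = time.monotonic,
                 admit_new: Callable[[], bool] = lambda: True) -> None:
        self._factory = factory
        self._max_size = max_size
        self._idle_ttl = idle_ttl
        self._clock = clock
        self._admit_new = admit_new
        self._entries = OrderedDict()  # key -> (bucket, last_access), oldest access first
        self.created = 0               # metrics for bucket creation and eviction
        self.evicted = 0

    def __len__(self) -> int:
        return len(self._entries)

    def get(self, key: Hashable) -> Optional[Any]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] <= self._idle_ttl:
            self._entries[key] = (entry[0], now)
            self._entries.move_to_end(key)          # mark as most recently used
            return entry[0]
        if entry is not None:
            del self._entries[key]                  # idle-expired: recreate below
            self.evicted += 1
        if not self._admit_new():
            return None                             # caller falls back, e.g. to a shared bucket
        self._expire_idle(now)
        while len(self._entries) >= self._max_size:
            self._entries.popitem(last=False)       # evict least recently used
            self.evicted += 1
        bucket = self._factory(key)
        self._entries[key] = (bucket, now)
        self.created += 1
        return bucket

    def _expire_idle(self, now: float) -> None:
        # Access order equals last-access order, so idle entries sit at the front.
        while self._entries:
            _, (_, last_access) = next(iter(self._entries.items()))
            if now - last_access <= self._idle_ttl:
                break
            self._entries.popitem(last=False)
            self.evicted += 1
```

One design note: if `idle_ttl` is at least `capacity / rate`, an expired bucket would have refilled completely anyway, so idle expiry never grants more than natural refill. LRU eviction under churn, by contrast, can reset a still-active bucket. This is why admission control and a coarser per-IP bucket matter under attack.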
9.3. Body Parsing Safety
- parse only on an explicit allowlist of endpoints,
- enforce maximum body size and maximum JSON depth,
- fail safely: parsing errors must not silently skip strict checks for sensitive endpoints.
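The size and depth limits can be enforced before the JSON parser runs, as sketched below. The 16 KiB and depth-8 defaults are illustrative assumptions. Python's `json` module has no depth limit of its own, so the sketch pre-scans brackets outside string literals. Deeply nested input is therefore rejected without recursing into it. Every failure raises a hypothetical `BodyRejected` exception rather than returning `None`, so a caller cannot silently skip the strict check.

```python
import json


class BodyRejected(Exception):
    """Raised when a body must not be parsed; callers should apply strict handling."""


def json_depth_exceeds(raw: str, max_depth: int) -> bool:
    """Count [ and { nesting outside string literals; stop early once over the limit."""
    depth, in_string, escaped = 0, False, False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > max_depth:
                return True
        elif ch in "]}":
            depth -= 1
    return False


def parse_limited_json(body: bytes, max_bytes: int = 16_384, max_depth: int = 8):
    if len(body) > max_bytes:
        raise BodyRejected("body too large")
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise BodyRejected("invalid encoding") from exc
    if json_depth_exceeds(text, max_depth):
        raise BodyRejected("nesting too deep")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise BodyRejected("malformed JSON") from exc
```

On a sensitive endpoint, a `BodyRejected` should lead to a 4xx response or to the strictest bucket. It should not lead to skipping the per-identifier check.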
9.4. Dual-Bucket Controls for Sensitive Unauthenticated Flows
- per-IP (or IP-prefix) bucket to bound volumetric traffic,
- per-identifier bucket to slow targeted guessing,
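The dual-bucket check can be sketched as follows. A request is admitted only if *both* buckets have enough tokens, and tokens are deducted from both only in that case. A request refused by one bucket therefore does not drain the other. The bucket class, rates, and function name are illustrative assumptions, and the sketch is not thread-safe. To avoid the targeted-lockout problem noted in the introduction, one reasonable choice is to tune the per-identifier bucket to slow guessing rather than block outright, and to key it through the hashed format of Section 7.3.

```python
import time
from typing import Callable


class Bucket:
    """Minimal lazily refilled token bucket (illustrative only)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens, self.updated = float(capacity), clock()

    def available(self) -> float:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return self.tokens


def allow_sensitive_attempt(ip_bucket: Bucket, identifier_bucket: Bucket,
                            cost: float = 1.0) -> bool:
    # Check both first; consume from both only if both have capacity.
    if ip_bucket.available() >= cost and identifier_bucket.available() >= cost:
        ip_bucket.tokens -= cost
        identifier_bucket.tokens -= cost
        return True
    return False
```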
10. Scaling Semantics and Distributed Enforcement
11. Security Analysis
11.1. Identity Binding Failures
- Attacker-controlled identifiers.
- Fake API key rotation.
- Secret leakage via telemetry.
11.2. Proxy Ambiguity and Header Spoofing
11.3. Endpoint Canonicalization Bypasses
- repeated slashes (//), dot segments (/./, /../),
- percent encoding and double-encoding,
- trailing slash variation, case variation (depending on routing rules),
- query-string abuse when it is included in keys.
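A strict canonicalizer that addresses these vectors is sketched below. The decoding policy is an assumption, since the paper only requires that it be explicit. Under this policy, encoded separators (`%2f`, `%5c`), double encoding (`%25`), encoded NUL, malformed escapes, backslashes, and traversal above the root are all rejected. Other escapes are decoded exactly once, *before* dot segments are resolved, so `%2e%2e` cannot slip past resolution. Repeated slashes and a trailing slash are collapsed. Case is preserved because case handling depends on routing rules. `canonicalize_path` and `NonCanonicalPath` are hypothetical names.

```python
import re
from urllib.parse import unquote


class NonCanonicalPath(ValueError):
    """Raised for paths the strict policy refuses to canonicalize."""


_FORBIDDEN_ESCAPES = re.compile(r"%(2f|5c|25|00)", re.IGNORECASE)
_MALFORMED_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def canonicalize_path(raw: str) -> str:
    path = raw.split("?", 1)[0]                   # query string excluded from identity
    if not path.startswith("/"):
        raise NonCanonicalPath("path must be absolute")
    if _FORBIDDEN_ESCAPES.search(path) or _MALFORMED_ESCAPE.search(path):
        raise NonCanonicalPath("encoded separator, double encoding, or malformed escape")
    try:
        path = unquote(path, errors="strict")     # decode exactly once
    except UnicodeDecodeError as exc:
        raise NonCanonicalPath("invalid UTF-8 in escapes") from exc
    if "\\" in path:
        raise NonCanonicalPath("backslash in path")
    out = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue                              # repeated slashes and '.' segments
        if segment == "..":
            if not out:
                raise NonCanonicalPath("traversal above root")
            out.pop()
        else:
            out.append(segment)
    return "/" + "/".join(out)
```

Running the limiter's canonicalizer on the same string the router sees, rather than on a separately decoded copy, avoids the two components disagreeing about which endpoint a request targets.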
11.4. Bucket Explosion and Fake-Endpoint Flooding
- early invalid-endpoint rejection reduces wasted work,
- UNKNOWN template collapse prevents bypass by random paths,
- strict UNKNOWN RPS limits constrain residual work on unmatched-but-not-rejected requests.
11.5. Misconfiguration Hazards
- linting/validation on load (reject non-positive rps_limit, detect overlap),
- explicit precedence rules,
- explicit fail-open vs fail-closed behavior for DB/cache outages.
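Validation on load can be sketched as a pure function over the loaded rows. It returns a list of problems, and an empty list means the configuration is acceptable. The specific checks are assumptions drawn from the hazards above: non-positive limits, duplicate scopes (the simplest form of overlap), templates absent from the endpoint registry, and missing global `default` and `UNKNOWN` rows. Template overlap such as `/users/*` versus `/users/me` is left to the precedence rules. `lint_policies` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Set, Tuple

Row = Tuple[str, Optional[str], int]  # (endpoint, project_id, rps_limit)


def lint_policies(rows: Iterable[Row], known_endpoints: Set[str]) -> List[str]:
    errors: List[str] = []
    seen: Set[Tuple[str, Optional[str]]] = set()
    for endpoint, project_id, rps_limit in rows:
        scope = (endpoint, project_id)
        if scope in seen:
            errors.append(f"duplicate scope {scope}")
        seen.add(scope)
        if not isinstance(rps_limit, int) or rps_limit <= 0:
            errors.append(f"non-positive rps_limit for {scope}")
        if endpoint not in known_endpoints and endpoint not in ("default", "UNKNOWN"):
            errors.append(f"unknown endpoint template {endpoint!r}")
    for required in ("default", "UNKNOWN"):
        if (required, None) not in seen:
            errors.append(f"missing global {required} policy")
    return errors
```

If the list is non-empty, a reload would typically be refused and the last-known-good snapshot kept. This makes the fail-open versus fail-closed choice apply only to genuine outages, not to bad edits.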
12. Evaluation Methodology
12.1. Microbenchmarks
- endpoint normalization cost (match success, worst-case no match),
- identity resolution cost (verified token, API key validation, IP fallback),
- policy cache lookup and reload overhead,
- bucket consume under contention and under churn,
- end-to-end middleware overhead vs baseline.
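Per-operation percentiles for the table in Appendix C can be collected with a simple harness like the one below. The iteration counts are assumptions. Per-call timing also includes timer overhead, which matters for sub-microsecond operations. The harness returns seconds to match the appendix columns. It measures Python callables only, so a limiter written in another runtime would use that runtime's equivalent.

```python
import statistics
import time
from typing import Callable, Dict


def measure(op: Callable[[], object], iterations: int = 10_000,
            warmup: int = 1_000) -> Dict[str, float]:
    """Time `op` per call and return P50/P95/P99 latency in seconds."""
    for _ in range(warmup):
        op()                                   # warm caches before sampling
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        op()
        samples.append(time.perf_counter_ns() - start)
    cuts = statistics.quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49] / 1e9, "p95": cuts[94] / 1e9, "p99": cuts[98] / 1e9}
```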
12.2. Adversarial Test Suite
- A1: Targeted lockout test (auth endpoint).
- A2: Forwarding header spoofing.
- A3: Canonicalization corpus.
- A4: Bucket explosion under churn.
- A5: Fake API key rotation.
- A6: Fake endpoint flooding.
- A7: Scaling semantics.
13. Related Work
14. Comparison with Production Systems
15. Discussion and Limitations
- Rate limiting is not a complete security solution.
- Operational clarity matters as much as algorithms.
- Scaling semantics must be explicit.
16. Conclusion
A. Appendix A: Canonicalization Test Corpus
- /api/x/../login→/api/login
- /api//login→/api/login
- /api/login%2f (decoding policy must be explicit)
B. Appendix B: Bucket Cache Requirements
- idle expiration (inactive key eviction),
- size-based eviction (LRU/LFU),
- metrics on bucket creation and eviction,
- optional admission control when churn is high.
C. Appendix C: Evaluation Metrics Checklist
| Operation | P50 (s) | P95 (s) | P99 (s) |
|---|---|---|---|
| Normalize endpoint (match) | | | |
| Normalize endpoint (no match) | | | |
| Resolve identity (token) | | | |
| Resolve identity (API key) | | | |
| Resolve identity (IP fallback) | | | |
| Policy cache lookup | | | |
| Bucket consume (single-thread) | | | |
| Bucket consume (contended) | | | |
| End-to-end limiter overhead | | | |
References
- OWASP Foundation. API Security Top 10 (2023): API2 – Broken Authentication. https://owasp.org/API-Security/editions/2023/en/0xa2-broken-authentication/, 2023. Accessed 2026-02-16.
- OWASP Foundation. API Security Top 10 (2019): API4 – Lack of Resources & Rate Limiting. https://owasp.org/API-Security/editions/2019/en/0xa4-lack-of-resources-and-rate-limiting/, 2019. Accessed 2026-02-16.
- Temoshok, D.; et al. Digital Identity Guidelines: Authentication and Authenticator Management. Special Publication NIST SP 800-63B, National Institute of Standards and Technology (NIST), 2025. Revision 4. Accessed 2026-02-16.
- OWASP Cheat Sheet Series. Denial of Service Cheat Sheet. https://cheatsheetseries.owasp.org/cheatsheets/Denial_of_Service_Cheat_Sheet.html, 2026. Accessed 2026-02-16.
- Serbout, S.; El Malki, A.; Pautasso, C.; Zdun, U. API Rate Limit Adoption – A Pattern Collection. In Proceedings of the 28th European Conference on Pattern Languages of Programs (EuroPLoP ’23). Association for Computing Machinery, 2023, pp. 5:1–5:20. [CrossRef]
- Park, J.; Park, J.; Jung, Y.; Lim, H.; Yeo, H.; Han, D. TopFull: An Adaptive Top-Down Overload Control for SLO-Oriented Microservices. In Proceedings of the ACM SIGCOMM 2024 Conference. Association for Computing Machinery, 2024, pp. 876–890. [CrossRef]
- Chen, Z.; Fan, Y.; Qian, K.; Meng, Q.; Shu, R.; Li, X.; Zhang, Y.; Wang, B.; Li, W.; Ren, F. ScalaTap: Scalable Outbound Rate Limiting in Public Cloud. In Proceedings of the IEEE INFOCOM 2025, 2025, pp. 1–10. [CrossRef]
- Chen, L.; et al. CMDRL: A Markovian Distributed Rate Limiting Algorithm in Cloud Networks. In Proceedings of APNet 2024. Association for Computing Machinery, 2024. [CrossRef]
- Lyu, N.; Wang, Y.; Cheng, Z.; Zhang, Q.; Chen, F. Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning. https://arxiv.org/abs/2511.03279, 2025. arXiv:2511.03279. Accessed 2026-02-16.
- Farkiani, B.; Liu, F.; Crowley, P. Rethinking HTTP API Rate Limiting: A Client-Side Approach. https://arxiv.org/abs/2510.04516, 2025. arXiv:2510.04516. Accessed 2026-02-16.
- Raghavan, B.; Vishwanath, K.; Ramabhadran, S.; Yocum, K.; Snoeren, A.C. Cloud Control with Distributed Rate Limiting. In Proceedings of the ACM SIGCOMM 2007 Conference. Association for Computing Machinery, 2007, pp. 337–348. [CrossRef]
- Guan, B. Designing Scalable Rate Limiting Systems: Algorithms, Architecture, and Distributed Solutions. https://arxiv.org/abs/2602.11741, 2026. arXiv:2602.11741. Accessed 2026-02-16.
- Nottingham, M.; Fielding, R.T. Additional HTTP Status Codes. RFC 6585, 2012.
- MDN Web Docs. HTTP 429 Too Many Requests. https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/429, 2026. Accessed 2026-02-16.
- IETF HTTPAPI Working Group. RateLimit Header Fields for HTTP. https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-ratelimit-headers-10, 2025. Internet-Draft draft-ietf-httpapi-ratelimit-headers-10 (Sep 27, 2025). Accessed 2026-02-16.
- Heinanen, J.; Guerin, R. A Single Rate Three Color Marker. RFC 2697, 1999.
- Heinanen, J.; Guerin, R. A Two Rate Three Color Marker. RFC 2698, 1999.
- Leach, B. redis-cell. https://github.com/brandur/redis-cell, 2026. Accessed 2026-02-16.
- Leach, B. Rate Limiting, Cells, and GCRA. https://brandur.org/rate-limiting, 2015. Accessed 2026-02-16.
- DragonflyDB. CL.THROTTLE Command Reference. https://www.dragonflydb.io/docs/command-reference/strings/cl.throttle, 2026. Accessed 2026-02-16.
- Fielding, R.T.; Reschke, J. Forwarded HTTP Extension. RFC 7239, 2014.
- Envoy Project. Global Rate Limiting Architecture Overview. https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_features/global_rate_limiting, 2026. Accessed 2026-02-16.
- Envoy Proxy Community. ratelimit: A Generic gRPC Rate Limit Service (Envoy-Compatible). https://github.com/envoyproxy/ratelimit, 2026. Accessed 2026-02-16.
- Envoy Project. HTTP Rate Limit Filter Documentation. https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/rate_limit_filter, 2026. Accessed 2026-02-16.
- NGINX. Module ngx_http_limit_req_module. https://nginx.org/en/docs/http/ngx_http_limit_req_module.html, 2026. Accessed 2026-02-16.
- HAProxy Technologies. Stick Tables Configuration Tutorial. https://www.haproxy.com/documentation/haproxy-configuration-tutorials/proxying-essentials/custom-rules/stick-tables/, 2026. Accessed 2026-02-16.
- Amazon Web Services. Amazon API Gateway Request Throttling. https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html, 2026. Accessed 2026-02-16.
| System | Layer | State | Typical keying | Notes |
|---|---|---|---|---|
| Endpoint-aware DB policy + token bucket | App/service boundary | Local (opt. shared) | method+template + principal + IP | Rich identity + endpoint isolation; bounded caches; UNKNOWN collapse. |
| Envoy local/global rate limiting | Gateway/mesh | Local / external service | route descriptors | Local token bucket and global enforcement via an external rate-limit service [22,24]. |
| Descriptor-based rate limit services (Lyft/Envoy) | Shared service | Shared store | domain + descriptors | Configuration-driven, shared-state decisions returned to callers [23]. |
| NGINX limit_req | Edge proxy | Shared-memory zone | typically IP-based key | Very fast edge throttling; limited application identity semantics [25]. |
| HAProxy stick tables | Edge/LB | Local (+ peers) | IP / arbitrary string keys | Flexible counters; optional synchronization [26]. |
| AWS API Gateway throttling | Managed gateway | Provider-managed | account/stage/route limits | Burst + steady-state model; 429 responses [27]. |
| redis-cell / Dragonfly CL.THROTTLE (GCRA) | Datastore primitive | Central atomic op | any key | Rolling-window-like behavior, O(1) command [18,20]. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
