
Rate limit /chat/message: per-user + per-IP defence-in-depth #48

Open
vahid-ahmadi wants to merge 1 commit into main from feat/rate-limiting

Conversation

@vahid-ahmadi
Contributor

Summary

Adds two-layer rate limiting on `POST /chat/message` — the only expensive endpoint. Other routes are cheap reads and stay unlimited.

  • Per user (or IP if anonymous) — `5/min` and `60/hour`. Authenticated clients send `X-User-Id`; the limiter keys by `user:<id>` so signed-in users aren't blocked by anon flooders on shared NAT/proxy IPs (key function sketched below).
  • Per IP defence-in-depth — `30/min`, regardless of who is sending. Catches anon abuse and runaway scripts.
  • Limits configurable via `RATE_LIMIT_CHAT_PER_MIN`, `RATE_LIMIT_CHAT_PER_HOUR`, `RATE_LIMIT_CHAT_IP_PER_MIN`.
  • 429 handler returns JSON with `retry_after_seconds` and a matching `Retry-After` header.
  • Frontend handles 429 with an inline assistant message ("you're sending messages a bit fast — please wait ~Ns") rather than the 402 paywall flow.

Closes #46.
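
A minimal sketch of what the key function and limit wiring could look like, assuming slowapi's `Limiter` and `get_remote_address`; `chat_key_func` and the env-var names come from this PR, while the constant names and exact key format are illustrative:

```python
# backend/rate_limit.py -- illustrative sketch, not the exact module from this PR.
import os

from slowapi import Limiter
from slowapi.util import get_remote_address

# slowapi-style limit strings, overridable via environment variables.
CHAT_PER_MIN = os.getenv("RATE_LIMIT_CHAT_PER_MIN", "5/minute")
CHAT_PER_HOUR = os.getenv("RATE_LIMIT_CHAT_PER_HOUR", "60/hour")
CHAT_IP_PER_MIN = os.getenv("RATE_LIMIT_CHAT_IP_PER_MIN", "30/minute")


def chat_key_func(request):
    """Key by user id when X-User-Id is present, otherwise fall back to the client IP."""
    user_id = request.headers.get("X-User-Id", "").strip()
    if user_id:
        return f"user:{user_id}"
    return f"ip:{get_remote_address(request)}"


# Default in-memory storage; see the implementation notes on the Redis trade-off.
limiter = Limiter(key_func=chat_key_func)
```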

Implementation notes

  • New module `backend/rate_limit.py` owns the `Limiter`, the `chat_key_func`, and the env-tunable limit strings. Imported by both `main.py` (registration + handler) and `chatbot.py` (decorator); that wiring is sketched after this list.
  • `backend/tests/conftest.py` bumps test limits to 10000/min — without it the existing `TestChatMessage` tests would trip the production cap on the 6th call. (A sketch of the bump follows the test plan.)
  • Storage is slowapi's default in-memory backend. Modal runs up to 10 containers with concurrency 100, so a determined attacker can spread requests across containers and bypass any single counter. The IP layer is approximate but adequate for the threat model — swap to Redis via `storage_uri` when we need cross-container accuracy.
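
A hedged sketch of the registration and decorator wiring described above; the handler body, the fixed 60-second hint, and the route signature are assumptions, not the exact code in this PR:

```python
# Illustrative wiring sketch; the exact handler and route code may differ.

# --- backend/main.py: register the limiter and the JSON 429 handler ---
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from slowapi.errors import RateLimitExceeded

from backend.rate_limit import limiter

app = FastAPI()
app.state.limiter = limiter  # slowapi resolves the limiter via app.state


@app.exception_handler(RateLimitExceeded)
async def rate_limited(request: Request, exc: RateLimitExceeded) -> JSONResponse:
    retry_after = 60  # assumption: flat hint; the real handler may derive it from the window
    return JSONResponse(
        status_code=429,
        content={"detail": "Rate limit exceeded", "retry_after_seconds": retry_after},
        headers={"Retry-After": str(retry_after)},
    )


# --- backend/chatbot.py: stack both layers on the one expensive route ---
from fastapi import APIRouter
from slowapi.util import get_remote_address

from backend.rate_limit import CHAT_IP_PER_MIN, CHAT_PER_HOUR, CHAT_PER_MIN, limiter

router = APIRouter()


@router.post("/chat/message")
@limiter.limit(CHAT_PER_MIN)                                  # per user (or anon IP), minute cap
@limiter.limit(CHAT_PER_HOUR)                                 # per user (or anon IP), hourly cap
@limiter.limit(CHAT_IP_PER_MIN, key_func=get_remote_address)  # per IP, defence-in-depth
async def chat_message(request: Request, payload: dict):
    ...  # existing chat handling
```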

Test plan

  • `docker-compose up`, then hammer `/chat/message` 6 times in 30s with the same logged-in user → 6th request returns 429 with `Retry-After: 60`.
  • Hammer 6 anonymous requests from one IP → same behaviour.
  • In the UI, hit the limit → an inline assistant bubble appears with "you're sending messages a bit fast — please wait ~60s". Streaming state clears, paywall is not shown.
  • `GET /health`, `GET /version`, `GET /conversations`, and the title generator continue to work under load.
  • `pytest backend/tests/test_api.py -q` passes — the new `TestRateLimitConfig` is fast and offline.
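
The conftest bump could be as small as the following sketch, assuming the limit strings are read from the environment when `backend/rate_limit.py` is imported:

```python
# backend/tests/conftest.py -- illustrative sketch of the test-limit bump.
import os

# Set before backend.rate_limit is imported so the generous limits win.
os.environ["RATE_LIMIT_CHAT_PER_MIN"] = "10000/minute"
os.environ["RATE_LIMIT_CHAT_PER_HOUR"] = "10000/hour"
os.environ["RATE_LIMIT_CHAT_IP_PER_MIN"] = "10000/minute"
```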

Out of scope

  • Cross-container accurate rate limiting (would need Redis).
  • Per-org/team limits (no orgs yet).
  • Rate-limiting cheap GET endpoints (deemed unnecessary for current threat model).

Two layers protect the only expensive endpoint:

- Per user (or per IP if anonymous): 5/min and 60/hour. Authenticated
  clients send their user id in `X-User-Id`; the limiter keys by
  `user:<id>` so signed-in users aren't blocked by anon flooders on
  shared IPs.
- Per IP defence-in-depth: 30/min, regardless of who is sending.

Both limits are env-tunable (`RATE_LIMIT_CHAT_PER_MIN`,
`RATE_LIMIT_CHAT_PER_HOUR`, `RATE_LIMIT_CHAT_IP_PER_MIN`).

The 429 handler returns JSON with `retry_after_seconds` and a matching
`Retry-After` header. Frontend handles 429 with an inline assistant
message ("you're sending messages a bit fast — please wait ~Ns") rather
than the paywall flow used for 402.

Storage is slowapi's default in-memory backend. With Modal's
max_containers=10 and concurrent=100, an attacker could spread requests
across containers to bypass any single counter — the IP layer is
approximate but adequate. Swap to Redis via `storage_uri` if we need
cross-container accuracy later.
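
If cross-container accuracy is needed later, the swap could look roughly like this; `REDIS_URL` is a hypothetical env var, and `storage_uri` is the slowapi `Limiter` parameter mentioned above:

```python
# Hypothetical future change in backend/rate_limit.py: shared Redis counters
# instead of slowapi's default in-memory storage.
import os

from slowapi import Limiter

limiter = Limiter(
    key_func=chat_key_func,  # unchanged key function defined earlier in the module
    storage_uri=os.getenv("REDIS_URL", "redis://localhost:6379/0"),
)
```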

Tests: TestRateLimitConfig covers the key function (user/IP precedence,
empty header fallback, env-var overrides). End-to-end limit triggering
isn't tested — it would require precise timing and per-test storage
resets. conftest.py raises test limits well above pytest workload so
the existing chat tests don't trip the production 5/min cap.
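
A sketch of the kind of checks TestRateLimitConfig makes; the class layout, helper, and assertions here are illustrative rather than the exact tests:

```python
# backend/tests/test_api.py -- illustrative sketch of the key-function tests.
import importlib
from unittest.mock import MagicMock

import backend.rate_limit as rate_limit


def _fake_request(headers=None, ip="203.0.113.7"):
    """Minimal stand-in for a Starlette Request: a headers mapping plus client.host."""
    request = MagicMock()
    request.headers = headers or {}
    request.client.host = ip
    return request


class TestRateLimitConfig:
    def test_user_header_takes_precedence_over_ip(self):
        key = rate_limit.chat_key_func(_fake_request(headers={"X-User-Id": "u123"}))
        assert key == "user:u123"

    def test_empty_header_falls_back_to_ip(self):
        key = rate_limit.chat_key_func(_fake_request(headers={"X-User-Id": ""}))
        assert key == "ip:203.0.113.7"

    def test_env_override_changes_limit_string(self, monkeypatch):
        monkeypatch.setenv("RATE_LIMIT_CHAT_PER_MIN", "2/minute")
        importlib.reload(rate_limit)  # limit strings are read at import time in this sketch
        assert rate_limit.CHAT_PER_MIN == "2/minute"
```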

Closes #46

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| policyengine-uk-chat | Ready | Preview, Comment | May 6, 2026 11:37am |


@github-actions

github-actions Bot commented May 6, 2026

Beta preview is ready.

@vahid-ahmadi vahid-ahmadi self-assigned this May 6, 2026
@vahid-ahmadi vahid-ahmadi requested a review from SakshiKekre May 6, 2026 12:00