
feat(server): Track request rejections with a unified server.rejected metric#350

Draft

jan-auer wants to merge 4 commits into main from feat/rejection-reason-metrics

Conversation

@jan-auer jan-auer (Member) commented Mar 3, 2026

Add a RejectionReason enum in a new rejection module that classifies why a request was rejected and emits a server.rejected counter tagged with a reason value on every rejection.

Previously, killswitch and rate limit rejections were only logged at debug level; no metric was emitted. The web concurrency limit had its own standalone web.concurrency.rejected counter, with no equivalent for other rejection types.

Refs FS-275
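A minimal sketch of what such an enum could look like. The variant names are taken from the commit message in this PR; the metric call itself is a stand-in, since the real code presumably goes through the project's statsd client:

```rust
// Hypothetical sketch of the RejectionReason enum described in the PR.
// Only the variant names are from the commit message; everything else
// (method names, the metric call) is illustrative.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RejectionReason {
    Killswitch,
    RateLimit,
    Auth,
    BadRequest,
    Internal,
    TaskConcurrency,
    WebConcurrency,
}

impl RejectionReason {
    /// The tag value attached to the server.rejected counter.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Killswitch => "killswitch",
            Self::RateLimit => "rate_limit",
            Self::Auth => "auth",
            Self::BadRequest => "bad_request",
            Self::Internal => "internal",
            Self::TaskConcurrency => "task_concurrency",
            Self::WebConcurrency => "web_concurrency",
        }
    }

    /// Emits the unified counter, tagged with the reason.
    pub fn emit(self) {
        // Placeholder for a statsd increment of "server.rejected"
        // with a `reason` tag; here we just print the event.
        println!("server.rejected +1 reason={}", self.as_str());
    }
}

fn main() {
    RejectionReason::RateLimit.emit();
}
```

Because every rejection path funnels through one emit call, dashboards can sum the counter for a total rejection rate or split it by the reason tag.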

… metric

Add a RejectionReason enum that classifies why a request was rejected
(killswitch, rate_limit, auth, bad_request, internal, task_concurrency,
web_concurrency) and emits a server.rejected counter tagged with the
reason on every rejection.

Previously, killswitch and rate limit rejections were only logged at
debug level with no metric. The web concurrency limit had its own
standalone web.concurrency.rejected counter. Now all rejection paths
emit a single server.rejected metric that can be summed or split by
reason.

Refs FS-275
Co-Authored-By: Claude <noreply@anthropic.com>
@linear linear bot commented Mar 3, 2026

@jan-auer jan-auer marked this pull request as ready for review March 3, 2026 15:11
@jan-auer jan-auer requested a review from a team as a code owner March 3, 2026 15:11
…id multipart

Two gaps in the batch flow were missed in the initial implementation:

- Per-item errors in the batch endpoint passed through create_error_part,
  which called error.status() and serialized the body but never called
  into_response(), so rejection_reason().emit() was never reached. Fixed
  by calling error.rejection_reason().emit() directly in create_error_part.

- Invalid multipart requests (missing Content-Type boundary) were rejected
  by Axum via MultipartRejection's own IntoResponse, bypassing ApiError
  entirely. Fixed by introducing BatchStreamRejection, a thin newtype
  wrapper that emits RejectionReason::BadRequest before delegating to
  the original rejection response.

Refs FS-275
Co-Authored-By: Claude <noreply@anthropic.com>
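The newtype pattern from the second bullet can be sketched as follows. The Axum types are replaced by stand-ins so the example is self-contained; the real BatchStreamRejection would implement Axum's IntoResponse trait and call the project's metric emitter:

```rust
// Sketch of the BatchStreamRejection pattern: emit the rejection
// metric, then delegate to the wrapped rejection's original response.
// MultipartRejection is a stand-in for the Axum type of the same name.

struct MultipartRejection;

impl MultipartRejection {
    fn into_response(self) -> String {
        "400 Bad Request: missing multipart boundary".to_string()
    }
}

/// Thin newtype wrapper: count the rejection before responding.
struct BatchStreamRejection(MultipartRejection);

impl BatchStreamRejection {
    fn into_response(self) -> String {
        emit_rejection("bad_request");
        self.0.into_response()
    }
}

fn emit_rejection(reason: &str) {
    // Placeholder for incrementing "server.rejected" with a reason tag.
    println!("server.rejected +1 reason={reason}");
}

fn main() {
    let response = BatchStreamRejection(MultipartRejection).into_response();
    println!("{response}");
}
```

The wrapper changes nothing about the response the client sees; it only guarantees the metric is emitted on a path that would otherwise bypass ApiError entirely.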

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


jan-auer and others added 2 commits March 3, 2026 17:21
Co-Authored-By: Claude <noreply@anthropic.com>
Aligns with the server.requests.* metric family. Reverts per-operation
emit from create_error_part to keep the metric strictly request-level.

Refs FS-275
Co-Authored-By: Claude <noreply@anthropic.com>
@matt-codecov matt-codecov (Contributor) left a comment


i think this is a bit messy. this metric and logic are essentially duplicated from our response code metric and logic, except they:

  • omit <400
  • conflate 401 and 403
  • conflate 400 and 413
  • (your goal) distinguish between 429s caused by rate limits, killswitches, task concurrency limits, and web concurrency limits

what i'd prefer (and what i think we need anyway) is:

  • a pair of gauges (limit, current value) for each rate limit using tags to distinguish between global, usecase, and scope limits
  • a pair of gauges (limit, current value) for task concurrency and web concurrency
  • a counter for killswitches using tags to distinguish between killswitch rules

it's more involved to implement (changes to TokenBucket, maybe other things), but we can see which things are trending toward their limits and intervene before the 429s happen; or, if it's a surge, we can see with more granularity which limits are being tripped. also, sentry product teams can track their rate limit usage on their own dashboards.
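The proposal above can be sketched in a few lines. Metric names, tag keys, and the string-based recorder are hypothetical, purely for illustration; the point is the shape of the data, a (limit, current) gauge pair per limiter:

```rust
// Hedged sketch of the reviewer's proposal: one (limit, current) gauge
// pair per limiter, tagged by kind and scope. Names are assumptions.

fn emit_limit_gauges(kind: &str, scope: &str, limit: u64, current: u64) -> Vec<String> {
    vec![
        format!("server.limit.max kind={kind} scope={scope} value={limit}"),
        format!("server.limit.current kind={kind} scope={scope} value={current}"),
    ]
}

/// With both gauges, utilization can be derived on a dashboard and
/// alerted on before any 429s are actually returned.
fn utilization(current: u64, limit: u64) -> f64 {
    if limit == 0 {
        0.0
    } else {
        current as f64 / limit as f64
    }
}

fn main() {
    for line in emit_limit_gauges("rate_limit", "usecase", 100, 75) {
        println!("{line}");
    }
}
```

The trade-off versus a single rejection counter is visibility before the limit trips, at the cost of instrumenting each limiter individually.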

@jan-auer jan-auer marked this pull request as draft March 4, 2026 07:24
@jan-auer jan-auer (Member, Author) commented Mar 4, 2026

Thanks, this raises good points. Let me reconsider the approach.

conflate 401 and 403
conflate 400 and 413

These were very intentional. Since we have status codes tracked already, I'm now looking to create a high-level classification of silent, "expected" errors. If any of these occur too often, we are facing a systemic issue that needs to be fixed. This can be related to auth, the client violating the spec, or one of our limits.

Some of these may be valuable to report to the usecase, hence the separation of the limits. Use cases will be able to define their own limits soon and need to be able to act on those.

a pair of gauges [...]

Agreed. These exist and are exposed to KEDA currently; some also via statsd. My goal right now, however, is to focus on rejection reasons rather than capacity.

a counter for killswitches

I'm struggling to find a central place where we can count all rejections and classify them. Instrumenting them individually like before means we can easily miss some.
