feat(server): Track request rejections with a unified server.rejected metric #350
… metric

Add a RejectionReason enum that classifies why a request was rejected (killswitch, rate_limit, auth, bad_request, internal, task_concurrency, web_concurrency) and emits a server.rejected counter tagged with the reason on every rejection.

Previously, killswitch and rate limit rejections were only logged at debug level with no metric, and the web concurrency limit had its own standalone web.concurrency.rejected counter. Now all rejection paths emit a single server.rejected metric that can be summed or split by reason.

Refs FS-275

Co-Authored-By: Claude <noreply@anthropic.com>
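The enum this commit describes can be sketched roughly as follows. The variant names and tag strings come from the commit message; the body of `emit` is a stand-in for whatever statsd/metrics client the server actually wires in:

```rust
/// Classifies why a request was rejected (per the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RejectionReason {
    Killswitch,
    RateLimit,
    Auth,
    BadRequest,
    Internal,
    TaskConcurrency,
    WebConcurrency,
}

impl RejectionReason {
    /// Tag value attached to the `server.rejected` counter.
    fn as_str(self) -> &'static str {
        match self {
            RejectionReason::Killswitch => "killswitch",
            RejectionReason::RateLimit => "rate_limit",
            RejectionReason::Auth => "auth",
            RejectionReason::BadRequest => "bad_request",
            RejectionReason::Internal => "internal",
            RejectionReason::TaskConcurrency => "task_concurrency",
            RejectionReason::WebConcurrency => "web_concurrency",
        }
    }

    /// Increment `server.rejected` tagged with this reason.
    /// (Placeholder: the real code calls the project's metrics client.)
    fn emit(self) {
        println!("server.rejected +1 reason:{}", self.as_str());
    }
}
```

With this shape, dashboards can sum `server.rejected` across all reasons or split it by the `reason` tag.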
…id multipart

Two gaps in the batch flow were missed in the initial implementation:

- Per-item errors in the batch endpoint passed through create_error_part, which called error.status() and serialized the body but never called into_response(), so rejection_reason().emit() was never reached. Fixed by calling error.rejection_reason().emit() directly in create_error_part.
- Invalid multipart requests (missing Content-Type boundary) were rejected by Axum via MultipartRejection's own IntoResponse, bypassing ApiError entirely. Fixed by introducing BatchStreamRejection, a thin newtype wrapper that emits RejectionReason::BadRequest before delegating to the original rejection response.

Refs FS-275

Co-Authored-By: Claude <noreply@anthropic.com>
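The newtype fix in the second bullet might look roughly like this. `Response`, `IntoResponse`, and `MultipartRejection` below are local stand-ins for the axum types (so the sketch is self-contained), and the metric call is a placeholder for the real `RejectionReason::BadRequest.emit()`:

```rust
/// Stand-in for axum's response type.
struct Response(u16);

/// Stand-in for axum's IntoResponse trait.
trait IntoResponse {
    fn into_response(self) -> Response;
}

/// Stand-in for axum's multipart extractor rejection
/// (e.g. a request missing the Content-Type boundary).
struct MultipartRejection;

impl IntoResponse for MultipartRejection {
    fn into_response(self) -> Response {
        Response(400)
    }
}

/// Thin newtype: emit the rejection metric, then delegate to the
/// wrapped rejection's own response so the body/status are unchanged.
struct BatchStreamRejection(MultipartRejection);

impl IntoResponse for BatchStreamRejection {
    fn into_response(self) -> Response {
        // Placeholder for RejectionReason::BadRequest.emit().
        println!("server.rejected +1 reason:bad_request");
        self.0.into_response()
    }
}
```

The point of the wrapper is that the rejection path which previously bypassed ApiError now passes through one place where the metric is guaranteed to be emitted, without altering the client-visible response.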
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Aligns with the server.requests.* metric family. Reverts per-operation emit from create_error_part to keep the metric strictly request-level. Refs FS-275 Co-Authored-By: Claude <noreply@anthropic.com>
matt-codecov
left a comment
i think this is a bit messy. this metric and logic are essentially duplicated from our response code metric and logic, except they:
- omit <400
- conflate 401 and 403
- conflate 400 and 413
- (your goal) distinguish between 429s caused by rate limits, killswitches, task concurrency limits, and web concurrency limits
what i'd prefer (and what i think we need anyway) is:
- a pair of gauges (limit, current value) for each rate limit using tags to distinguish between global, usecase, and scope limits
- a pair of gauges (limit, current value) for task concurrency and web concurrency
- a counter for killswitches using tags to distinguish between killswitch rules
it's more involved to implement (changes to TokenBucket, maybe other things), but we can see which things are trending toward their limits and intervene before the 429s happen, or if it's a surge we can see with more granularity which limits are being tripped. also sentry product teams can track their rate limit usage on their own dashboards
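The (limit, current) gauge pairs the reviewer proposes could be rendered as DogStatsD-style lines along these lines; the metric names and tag keys here are illustrative, not taken from the PR:

```rust
/// Emit a (limit, current) gauge pair for one limiter, tagged so that
/// global, usecase, and scope limits can be distinguished on a dashboard.
/// Returns the formatted datagram lines instead of sending them, so the
/// sketch stays self-contained.
fn gauge_pair(kind: &str, scope: &str, limit: u64, current: u64) -> Vec<String> {
    vec![
        format!("rate_limit.limit:{limit}|g|#kind:{kind},scope:{scope}"),
        format!("rate_limit.current:{current}|g|#kind:{kind},scope:{scope}"),
    ]
}
```

Graphing `rate_limit.current` against `rate_limit.limit` per tag set is what lets operators see a limiter trending toward saturation before any 429s are returned.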
Thanks, this is raising good points. Let me reconsider the approach.
These were very intentional. Since status codes are already tracked, I'm now looking to create a high-level classification of silent, "expected" errors. If any of these occur too often, we are facing a systemic issue that needs to be fixed, whether related to auth, a client violating the spec, or one of our limits. Some of these may be valuable to report back to the usecase, hence the separation between the different limit types: usecases will be able to define their own limits soon and need to be able to act on them.
Agreed. These exist and are exposed to KEDA currently; some also via statsd. My goal right now, however, is to focus on rejection reasons rather than capacity.
I'm struggling to find a central place where we can count all rejections and classify them. Instrumenting them individually like before means we can easily miss some.
Add a `RejectionReason` enum in a new `rejection` module that classifies why a request was rejected and emits a `server.rejected` counter tagged with a `reason` value on every rejection.

Previously, killswitch and rate limit rejections were only logged at debug level; no metric was emitted. The web concurrency limit had its own standalone `web.concurrency.rejected` counter with no equivalent for other rejection types.

Refs FS-275