feat: add response source differentiation for gateway vs upstream errors #13224
Open
nic-6443 wants to merge 10 commits into apache:master
Add core.response.get_response_source(ctx) API that returns one of three values to identify where a response originated:

- "apisix": response generated by APISIX Lua code (route not found, plugin rejection, upstream not configured, etc.)
- "nginx": error generated by NGINX proxy module (connection refused, read timeout, DNS resolution failure, etc.)
- "upstream": real HTTP response returned by the upstream service

Implementation:

- core.response.exit() now marks ctx._resp_source = "apisix" for 4xx/5xx
- init.lua sets ctx._apisix_proxied = true after before_proxy phase, before proxy_pass dispatch
- get_response_source() uses the last token of $upstream_header_time to distinguish NGINX errors (header_time = "-") from upstream responses (header_time has numeric value)

Plugin updates:

- opentelemetry: adds apisix.response_source span attribute, uses ngx.status for http.status_code (fixes nil status for gateway errors)
- prometheus: adds response_source label to http_status counter
- zipkin: adds apisix.response_source span tag, uses ngx.status
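For readers skimming the thread, the decision order described in this commit can be modelled roughly as follows. This is a Python sketch of the logic, not the Lua code in the PR: the dict-based `ctx`, the `last_header_time_token` parameter, and the handling of a missing token are assumptions made for illustration.

```python
from typing import Optional


def _is_numeric(token: Optional[str]) -> bool:
    """Return True when the token parses as a number (e.g. "0.002")."""
    if token is None:
        return False
    try:
        float(token)
    except ValueError:
        return False
    return True


def get_response_source(ctx: dict, last_header_time_token: Optional[str]) -> str:
    """Model of the three-way classification described above.

    ctx is a plain dict standing in for the APISIX request context;
    last_header_time_token stands in for the last token of
    $upstream_header_time (hypothetical parameter, for illustration only).
    """
    # 1. An explicit marker (set by core.response.exit() for 4xx/5xx) wins.
    marked = ctx.get("_resp_source")
    if marked is not None:
        return marked

    # 2. The request never reached proxy dispatch, so APISIX produced the response.
    if not ctx.get("_apisix_proxied"):
        return "apisix"

    # 3. Proxied: a numeric header time means the upstream sent response headers.
    if _is_numeric(last_header_time_token):
        return "upstream"

    # "-" means NGINX failed before receiving headers. Treating a missing
    # token the same way is an assumption of this sketch.
    return "nginx"
```

Under these assumptions, a connection-refused 502 (`_apisix_proxied` set, token `"-"`) is classified as `"nginx"`, while a 502 actually returned by the upstream (numeric token) is classified as `"upstream"`.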
Pull request overview
Adds a new core.response.get_response_source(ctx) API and propagates its classification into tracing (OpenTelemetry/Zipkin) and Prometheus metrics, enabling users to distinguish gateway-generated errors from upstream responses.
Changes:
- Introduces response-source classification (`apisix`/`nginx`/`upstream`) in `core.response`, with `_resp_source` and `_apisix_proxied` markers.
- Updates OpenTelemetry and Zipkin spans to tag `apisix.response_source` and to use `ngx.status` for status reporting.
- Extends Prometheus `http_status` counter with a `response_source` label; adds a new core test file for the API.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `apisix/core/response.lua` | Implements `get_response_source()` and marks `_resp_source` on `core.response.exit()` for error codes. |
| `apisix/init.lua` | Marks requests as proxied (`_apisix_proxied`) after `before_proxy` and before NGINX proxy dispatch. |
| `apisix/plugins/opentelemetry.lua` | Adds `apisix.response_source` span attribute and switches status reporting to `ngx.status`. |
| `apisix/plugins/zipkin.lua` | Adds `apisix.response_source` span tag and switches status reporting to `ngx.status`. |
| `apisix/plugins/prometheus/exporter.lua` | Adds `response_source` label to the `http_status` counter and emits it in log phase. |
| `t/core/response-source.t` | Adds tests for `get_response_source()` and `_resp_source` behavior. |
- Use gmatch(%S+) loop instead of str_match pattern to correctly handle spaces in comma-separated upstream_header_time (e.g. '- , -')
- Guard ctx.var access with nil check for defensive safety
- Improve integration tests to verify response_source via serverless-pre-function log phase plugin
- Add tests for spaced separators and nil ctx.var edge cases
ctx.var may return upstream_header_time as a number (not string) for single numeric values. Apply tostring() before gmatch to avoid 'attempt to index local (a number value)' error.
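The token handling from the two commits above can be illustrated with a small Python analogue. It assumes that splitting on whitespace and keeping the last piece is equivalent to looping over Lua's `gmatch("%S+")`, and that `str()` plays the role of `tostring()`; the helper name `last_token` is hypothetical.

```python
def last_token(upstream_header_time):
    """Return the last whitespace-delimited token, or None if there is none.

    Accepts a str, a number, or None, mirroring the cases named in the
    commit messages (spaced separators, numeric values, nil ctx.var).
    """
    if upstream_header_time is None:
        return None
    # Coerce first: a single numeric value may arrive as a number, not a string.
    tokens = str(upstream_header_time).split()
    return tokens[-1] if tokens else None


# Multi-attempt retries produce several entries; only the last one matters.
examples = {
    "- , -": last_token("- , -"),          # both attempts failed in NGINX
    "-, 0.004": last_token("-, 0.004"),    # retry reached the upstream
    "0.003 (number)": last_token(0.003),   # numeric input is stringified
}
```

Because only the final attempt is inspected, a request that fails once and then succeeds on retry ends with a numeric token, while a request whose every attempt fails ends with `"-"`.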
…EST 17
- opentelemetry4-bugfix-pb-state.t TEST 3: add expected apisix.response_source attribute
- response-source.t TEST 17: add diagnostic logging to debug connection refused case
The new response_source label in apisix_http_status metric needs to be accounted for in existing test assertions.
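To see why existing assertions are affected, here is a toy Python counter keyed by label values. It is only a sketch: the real exporter is Lua and the real `apisix_http_status` metric carries more labels than the three shown, and the label order in the rendered lines is illustrative.

```python
from collections import Counter

# Toy stand-in for the http_status counter, keyed by (code, route, response_source).
http_status = Counter()


def record(code: int, route: str, response_source: str) -> None:
    """Increment the series identified by this exact label combination."""
    http_status[(code, route, response_source)] += 1


def render(counter: Counter) -> list:
    """Render the counter in a Prometheus-like text form, sorted by labels."""
    return [
        f'apisix_http_status{{code="{code}",route="{route}",'
        f'response_source="{source}"}} {count}'
        for (code, route, source), count in sorted(counter.items())
    ]


record(502, "1", "nginx")     # connection refused
record(502, "1", "upstream")  # upstream itself answered 502
record(502, "1", "nginx")

lines = render(http_status)
```

The same status code on the same route now yields two series, so any test that matched the full label set of a metric line has to include `response_source`.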
Add core.response.set_response_source(ctx, source) API for plugins that bypass NGINX proxy (e.g. ai-proxy) to explicitly mark whether the response came from upstream. The ai-proxy base now sets source to 'upstream' when the LLM provider responds, ensuring correct classification even for 429/5xx responses from upstream.
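A rough Python model of the bypass case follows. It assumes that the setter simply stores the marker on the context and that the marker is consulted before any other signal; the value validation and the `classify_bypassed` helper are hypothetical additions for illustration, not part of the PR.

```python
VALID_SOURCES = {"apisix", "nginx", "upstream"}  # assumption: the three documented values


def set_response_source(ctx: dict, source: str) -> None:
    """Explicitly mark where the response came from (model of the new setter)."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown response source: {source!r}")
    ctx["_resp_source"] = source


def classify_bypassed(ctx: dict) -> str:
    """Classification for a request that never went through NGINX proxying.

    Without an explicit marker such a request falls back to "apisix",
    because _apisix_proxied is never set on the bypass path.
    """
    return ctx.get("_resp_source") or "apisix"


ctx = {}
default_label = classify_bypassed(ctx)  # bypass plugin did not mark anything
set_response_source(ctx, "upstream")    # e.g. the LLM provider answered with 429
marked_label = classify_bypassed(ctx)
```

In this model, a 429 relayed from the provider is labeled `"upstream"` only because the plugin marked it; without the explicit call it would be reported as a gateway-generated response.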
What
Add `core.response.get_response_source(ctx)` API to distinguish where HTTP responses originate from:

- `"apisix"`: `core.response.exit()` sets `ctx._resp_source`
- `"nginx"`: `$upstream_header_time` last token is `"-"`
- `"upstream"`: `$upstream_header_time` last token is numeric

Why
Users need to differentiate 5XX error sources via OTel tracing and Prometheus metrics. Previously, gateway-generated errors (connection refused → 502, timeout → 504) were indistinguishable from upstream-returned 5XX, and in OTel traces, gateway errors were completely invisible (`upstream_status` was nil).

Changes
- `core/response.lua`: `resp_exit()` marks `ctx._resp_source = "apisix"` for error codes; new `get_response_source()` API
- `init.lua`: Sets `ctx._apisix_proxied = true` after `before_proxy` phase, before `proxy_pass`
- `opentelemetry.lua`: Adds `apisix.response_source` span attribute; uses `ngx.status` for `http.status_code` (fixes nil for gateway errors)
- `prometheus/exporter.lua`: Adds `response_source` label to `http_status` counter
- `zipkin.lua`: Adds `apisix.response_source` span tag; uses `ngx.status`
- `t/core/response-source.t`: Tests for `get_response_source()` covering all scenarios including multi-attempt retries

Known Limitations
- `bypass_nginx_upstream` path (e.g. ai-proxy): responses are labeled `"apisix"` since they don't go through NGINX proxy. Bypass plugins can explicitly set `ctx._resp_source = "upstream"` if needed.
- `$upstream_header_time` is HTTP-only; stream needs separate handling.