feat: add response source differentiation for gateway vs upstream errors #13224

Open
nic-6443 wants to merge 10 commits into apache:master from nic-6443:feat/upstream-error-differentiation

@nic-6443
Member

What

Add core.response.get_response_source(ctx) API to distinguish where HTTP responses originate from:

| Value | Meaning | Detection |
| --- | --- | --- |
| "apisix" | APISIX Lua code generated the response | core.response.exit() sets ctx._resp_source |
| "nginx" | NGINX proxy module generated the error | Proxied but $upstream_header_time last token is "-" |
| "upstream" | Upstream service returned the response | Proxied and $upstream_header_time last token is numeric |

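The detection rules in the table can be sketched in plain Lua. This is an illustration under stated assumptions, not the code from this PR: `ctx` is an ordinary table standing in for the APISIX request context, `last_token` is a hypothetical helper, and the order of the checks (explicit marker first, then the proxied flag, then the header time) as well as the handling of a missing `upstream_header_time` are guesses.

```lua
-- Sketch only: mirrors the detection table, not the PR's actual code.

-- Hypothetical helper: return the last whitespace-separated token of
-- $upstream_header_time, i.e. the entry for the final upstream attempt.
local function last_token(value)
    local last
    for token in tostring(value):gmatch("%S+") do
        last = token
    end
    return last
end

local function get_response_source(ctx)
    -- 1. Lua code already claimed the response (e.g. core.response.exit()).
    if ctx._resp_source then
        return ctx._resp_source
    end
    -- 2. Assumption: a request never handed to proxy_pass was answered by APISIX.
    if not ctx._apisix_proxied then
        return "apisix"
    end
    -- 3. Proxied: a numeric header time means the upstream sent response headers.
    local header_time = ctx.var and ctx.var.upstream_header_time
    if header_time == nil then
        return "nginx"  -- assumption: no header time means no upstream response
    end
    local token = last_token(header_time)
    if token and tonumber(token) then
        return "upstream"
    end
    return "nginx"
end
```

With a retry that fails first and then succeeds (`"-, 0.002"`), only the last token is inspected, so the response counts as coming from the upstream.
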
Why

Users need to tell 5xx error sources apart in OTel traces and Prometheus metrics. Previously, gateway-generated errors (connection refused → 502, timeout → 504) were indistinguishable from upstream-returned 5xx responses, and in OTel traces gateway errors were completely invisible because upstream_status was nil.

Changes

  • core/response.lua: resp_exit() marks ctx._resp_source = "apisix" for error codes (4xx/5xx); new get_response_source() API
  • init.lua: Sets ctx._apisix_proxied = true after before_proxy phase, before proxy_pass
  • opentelemetry.lua: Adds apisix.response_source span attribute; uses ngx.status for http.status_code (fixes nil for gateway errors)
  • prometheus/exporter.lua: Adds response_source label to http_status counter
  • zipkin.lua: Adds apisix.response_source span tag; uses ngx.status
  • Tests: Unit tests for get_response_source() covering all scenarios including multi-attempt retries
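
To illustrate what the extra Prometheus label buys, here is a toy in-memory counter. It is not the exporter's code: `counts` and `inc_http_status` are hypothetical, and only the `code` and `response_source` labels are modelled, so the metric's other labels are left out.

```lua
-- Hypothetical stand-in for a labelled counter; the real exporter uses a
-- Prometheus client library rather than a plain table.
local counts = {}

local function inc_http_status(code, response_source)
    local key = string.format(
        'apisix_http_status{code="%d",response_source="%s"}',
        code, response_source)
    counts[key] = (counts[key] or 0) + 1
    return key
end

-- Two 502s produced by NGINX (e.g. connection refused) and one 502
-- returned by the upstream itself end up in separate series.
inc_http_status(502, "nginx")
inc_http_status(502, "nginx")
inc_http_status(502, "upstream")
```

Without the label, all three requests would land in a single `code="502"` series.
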

Known Limitations

  • bypass_nginx_upstream path (e.g. ai-proxy): responses are labeled "apisix" since they don't go through NGINX proxy. Bypass plugins can explicitly set ctx._resp_source = "upstream" if needed.
  • Stream (L4) proxy: $upstream_header_time is HTTP-only; stream needs separate handling.

Add core.response.get_response_source(ctx) API that returns one of three
values to identify where a response originated:

- "apisix": response generated by APISIX Lua code (route not found,
  plugin rejection, upstream not configured, etc.)
- "nginx": error generated by NGINX proxy module (connection refused,
  read timeout, DNS resolution failure, etc.)
- "upstream": real HTTP response returned by the upstream service

Implementation:
- core.response.exit() now marks ctx._resp_source = "apisix" for 4xx/5xx
- init.lua sets ctx._apisix_proxied = true after before_proxy phase,
  before proxy_pass dispatch
- get_response_source() uses the last token of $upstream_header_time to
  distinguish NGINX errors (header_time = "-") from upstream responses
  (header_time has numeric value)

Plugin updates:
- opentelemetry: adds apisix.response_source span attribute, uses
  ngx.status for http.status_code (fixes nil status for gateway errors)
- prometheus: adds response_source label to http_status counter
- zipkin: adds apisix.response_source span tag, uses ngx.status
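
The two markers described under "Implementation" can be pictured with the following sketch. Both functions are hypothetical stand-ins that take `ctx` explicitly; how the real core.response.exit() and init.lua obtain the context is not shown in this description.

```lua
-- Stand-in for core.response.exit(): only the marking step is modelled.
local function exit_with_mark(ctx, code)
    if type(code) == "number" and code >= 400 then
        ctx._resp_source = "apisix"  -- 4xx/5xx generated by Lua code
    end
    return code
end

-- Stand-in for the point in init.lua after before_proxy and just
-- before proxy_pass dispatch.
local function mark_proxied(ctx)
    ctx._apisix_proxied = true
end

local rejected = {}
exit_with_mark(rejected, 403)    -- plugin rejection: marked "apisix"

local redirected = {}
exit_with_mark(redirected, 302)  -- not an error code: left unmarked

local proxied = {}
mark_proxied(proxied)            -- handed to NGINX; source is decided later
```
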

Copilot AI review requested due to automatic review settings April 14, 2026 07:42
@dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files) and enhancement (new feature or request) labels on Apr 14, 2026
Copilot AI left a comment

Pull request overview

Adds a new core.response.get_response_source(ctx) API and propagates its classification into tracing (OpenTelemetry/Zipkin) and Prometheus metrics, enabling users to distinguish gateway-generated errors from upstream responses.

Changes:

  • Introduces response-source classification (apisix / nginx / upstream) in core.response, with _resp_source and _apisix_proxied markers.
  • Updates OpenTelemetry and Zipkin spans to tag apisix.response_source and to use ngx.status for status reporting.
  • Extends Prometheus http_status counter with a response_source label; adds a new core test file for the API.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| apisix/core/response.lua | Implements get_response_source() and marks _resp_source on core.response.exit() for error codes. |
| apisix/init.lua | Marks requests as proxied (_apisix_proxied) after before_proxy and before NGINX proxy dispatch. |
| apisix/plugins/opentelemetry.lua | Adds apisix.response_source span attribute and switches status reporting to ngx.status. |
| apisix/plugins/zipkin.lua | Adds apisix.response_source span tag and switches status reporting to ngx.status. |
| apisix/plugins/prometheus/exporter.lua | Adds response_source label to the http_status counter and emits it in log phase. |
| t/core/response-source.t | Adds tests for get_response_source() and _resp_source behavior. |

- Use gmatch(%S+) loop instead of str_match pattern to correctly handle
  spaces in comma-separated upstream_header_time (e.g. '- , -')
- Guard ctx.var access with nil check for defensive safety
- Improve integration tests to verify response_source via
  serverless-pre-function log phase plugin
- Add tests for spaced separators and nil ctx.var edge cases

ctx.var may return upstream_header_time as a number (not a string) for
single numeric values. Apply tostring() before gmatch to avoid the
'attempt to index local (a number value)' error.
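
The failure mode and the guard can be reproduced outside APISIX. This sketch assumes the value arrives as a Lua number; `last_token` is again a hypothetical helper name.

```lua
-- Method-call syntax on a number fails: numbers have no string metatable.
local ok, err = pcall(function()
    local v = 0.005
    return v:gmatch("%S+")
end)
-- ok is false here; err carries an "attempt to index" message whose exact
-- wording depends on the Lua version.

-- Guarded version: coerce to a string before iterating.
local function last_token(value)
    local last
    for token in tostring(value):gmatch("%S+") do
        last = token
    end
    return last
end

local from_number = last_token(0.005)     -- "0.005"
local from_string = last_token("-, 0.01") -- "0.01"
```
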
…EST 17

- opentelemetry4-bugfix-pb-state.t TEST 3: add expected apisix.response_source attribute
- response-source.t TEST 17: add diagnostic logging to debug connection refused case

The new response_source label in apisix_http_status metric needs to be
accounted for in existing test assertions.

Add core.response.set_response_source(ctx, source) API for plugins that
bypass NGINX proxy (e.g. ai-proxy) to explicitly mark whether the response
came from upstream. The ai-proxy base now sets source to 'upstream' when
the LLM provider responds, ensuring correct classification even for 429/5xx
responses from upstream.
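
How a bypass plugin might use the setter can be sketched as follows. The function bodies are assumptions: the commit message gives only the signature, so storing the value in `ctx._resp_source` is inferred from the PR description, and `on_provider_response` is a hypothetical hook, not an ai-proxy function.

```lua
-- Assumed behaviour: record the source on the request context.
local function set_response_source(ctx, source)
    ctx._resp_source = source
end

-- Hypothetical hook called once the LLM provider has answered,
-- whatever the status code was.
local function on_provider_response(ctx, status)
    set_response_source(ctx, "upstream")
    return status
end

local ctx = {}
local status = on_provider_response(ctx, 429)
-- ctx._resp_source is now "upstream", so a provider-side 429 is not
-- attributed to APISIX even though NGINX never proxied the request.
```
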