Merged
1 change: 1 addition & 0 deletions .github/workflows/weekly-valgrind.yml
@@ -71,6 +71,7 @@ jobs:
cli \
route \
upstream \
h2_upstream \
proxy \
rate_limit \
circuit_breaker \
4 changes: 4 additions & 0 deletions README.md
@@ -9,9 +9,11 @@ A C++17 network server and gateway built on the Reactor pattern. It uses non-blo
- **WebSocket** — RFC 6455 support: handshake, binary/text frames, fragmentation, close handshake, ping/pong
- **TLS/SSL** — OpenSSL 3.x integration for downstream server TLS and upstream client TLS
- **Upstream Proxy** — per-service connection pools with TLS, streaming response relay, retry policy, trailer handling, and header rewriting
- **HTTP/2 Upstream** — per-upstream opt-in multiplexed H2 client (donated-lease pattern), ALPN-negotiated `auto / always / never` dispatch, two-deadline send-stall + response-timeout model, transport-drain-driven sink dispatch, GOAWAY/PING liveness, live-reloadable session settings
- **Rate Limiting** — per-client / per-route token-bucket middleware with LRU eviction, `RateLimit-*` headers, dry-run mode, hot reload
- **Circuit Breaking** — per-upstream state machines, retry budgets, dry-run mode, wait-queue drain, hot-reloadable breaker tuning
- **OAuth 2.0 Token Validation** — JWT validation with JWKS/OIDC discovery, multi-issuer policies, outbound identity headers, and RFC 7662 introspection mode
- **Observability (OpenTelemetry)** — W3C + Jaeger trace propagation, OTLP/JSON traces + metrics push, Prometheus pull `/metrics`, route-aware sampling, per-request span tree across inbound + auth + proxy + WS, idempotent finalize-CAS bookkeeping, four-phase graceful shutdown
- **DNS and IPv6** — bind-host and upstream hostname resolution, IPv4/IPv6 family selection, stale-on-error reload handling
- **Reactor Core** — edge-triggered epoll (Linux) / kqueue (macOS), non-blocking I/O, multi-threaded dispatcher pool
- **Thread Pool** — configurable worker threads for event loop dispatchers
@@ -496,6 +498,8 @@ make -C thread_pool
| [docs/configuration.md](docs/configuration.md) | JSON config, environment variables, DNS, upstreams, rate limiting, structured logging |
| [docs/oauth2.md](docs/oauth2.md) | OAuth 2.0 JWT and introspection validation |
| [docs/circuit_breaker.md](docs/circuit_breaker.md) | Circuit breaker configuration, retry budgets, hot reload, observability |
| [docs/http2_upstream.md](docs/http2_upstream.md) | HTTP/2 upstream client — `prefer` modes, reload semantics, failure modes, tuning |
| [docs/observability.md](docs/observability.md) | OpenTelemetry — traces, metrics, propagators, sampling, OTLP / Prometheus configuration |

## Platform Support

11 changes: 8 additions & 3 deletions docs/architecture.md
@@ -28,10 +28,15 @@ The server uses the [Reactor pattern](https://en.wikipedia.org/wiki/Reactor_patt
```
Layer 7: AuthManager, AuthMiddleware, (inbound middleware stack)
RateLimitManager, RateLimitZone,
TokenBucket, CircuitBreakerManager
Layer 6: UpstreamManager, UpstreamHostPool, (upstream connection pooling)
TokenBucket, CircuitBreakerManager,
ObservabilityManager, (cross-cutting: trace + metrics emission)
TracerProvider, MeterProvider
Layer 6: UpstreamManager, UpstreamHostPool, (upstream connection pooling + proxy engine)
PoolPartition, UpstreamConnection,
UpstreamLease, TlsClientContext,
ProxyHandler, ProxyTransaction,
UpstreamCodec + UpstreamHttpCodec (H1) + UpstreamH2Codec (H2),
UpstreamH2Connection, H2ConnectionTable (multiplexed H2 sessions),
DnsResolver (hostname resolution, reload-time re-resolve)
Layer 5: HttpServer (application entry point)
Layer 4: HttpRouter, WebSocketConnection (routing, WS message API)
@@ -45,7 +50,7 @@ Layer 1: ConnectionHandler, Channel, (reactor core)
Dispatcher, EventHandler
```

Layers 1–2 are the transport. Layers 3–5 are the protocol. Layer 6 is the gateway (upstream connectivity + DNS resolution). Layer 7 is the inbound traffic-management middleware (auth, rate limiting, circuit breaking). HTTP/1.x and HTTP/2 are parallel handlers at Layer 3, selected by `ProtocolDetector` at connection time. Both converge on the same `HttpRouter` at Layer 4. ConnectionHandler supports both inbound (server) and outbound (client) connections.
Layers 1–2 are the transport. Layers 3–5 are the protocol. Layer 6 is the gateway (upstream connectivity + proxy engine + DNS resolution). Layer 7 is the inbound traffic-management middleware (auth, rate limiting, circuit breaking, observability emission). HTTP/1.x and HTTP/2 are parallel handlers at Layer 3, selected by `ProtocolDetector` at connection time. Both converge on the same `HttpRouter` at Layer 4. ConnectionHandler supports both inbound (server) and outbound (client) connections. Upstream traffic mirrors the layering: the proxy engine dispatches per request through an H1 or H2 codec based on per-upstream `http2.enabled` + ALPN negotiation. H2 upstream sessions follow a donated-lease pattern — one real `UpstreamLease` is held for the multiplexed session lifetime; per-request transactions route as sentinel reuses through the existing session.
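
A minimal sketch of the donated-lease idea follows. The member and method names here (`AcquireSession`, `donated_lease`, the `checkout` callback) are illustrative assumptions, not the real `H2ConnectionTable` / `UpstreamH2Connection` interfaces; only the ownership shape is taken from the description above.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct UpstreamLease {};                  // a real checked-out pool connection

struct UpstreamH2Connection {
  // One real lease, held for the whole multiplexed session lifetime.
  std::unique_ptr<UpstreamLease> donated_lease;
  uint32_t next_stream_id = 1;            // client-initiated H2 stream ids are odd
};

class H2ConnectionTable {
 public:
  // The first transaction to reach an H2 upstream checks out a lease and
  // donates it; every later transaction rides the existing session as a
  // sentinel reuse and never holds a pool lease of its own.
  UpstreamH2Connection& AcquireSession(
      const std::string& upstream_key,
      const std::function<std::unique_ptr<UpstreamLease>()>& checkout) {
    auto& session = sessions_[upstream_key];
    if (!session) {
      session = std::make_unique<UpstreamH2Connection>();
      session->donated_lease = checkout();  // donate: the session owns the lease now
    }
    return *session;
  }

 private:
  std::map<std::string, std::unique_ptr<UpstreamH2Connection>> sessions_;
};
```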

`DnsResolver` is owned by `HttpServer` and is used at two points: (1) bind-host resolution during `Start()`, and (2) upstream hostname re-resolution during each `Reload()`. IP-literal upstreams bypass `DnsResolver` entirely.

3 changes: 3 additions & 0 deletions docs/http2_upstream.md
@@ -138,3 +138,6 @@ The defaults are conservative and work for most deployments. Tune only if you ob
- **No mid-stream SETTINGS update** — reloads apply to NEW connections only. Existing sessions keep their construction-time settings.
- **One H2 connection per upstream per dispatcher** — until saturation routing lands (`saturation_open_pct`), each partition holds one multiplexed connection per upstream. For very-high-fanout workloads this can be a bottleneck; mitigate by increasing the dispatcher count.
- **Per-stream backpressure is not strictly bounded by `initial_window_size`** — the proxy lets nghttp2 manage stream-level flow control with auto-`WINDOW_UPDATE` enabled, so the upstream's effective window is continuously refreshed as bytes are delivered to the on-data-chunk callback. In practice per-stream upstream buffering tracks the auto-update cadence (~`initial_window_size` worth of bytes outstanding under steady traffic) plus the `StreamingResponseSender` high-water mark on the downstream side, but it is not a hard cap: a slow downstream client paired with a fast H2 upstream can buffer somewhat more depending on `MAX_FRAME_SIZE` and how quickly chunks are read. For workloads with bursty downstream stalls and a high `initial_window_size`, watch RSS and consider lowering the window size. A future refinement will disable auto-update and pause per-stream consumption via `nghttp2_session_consume_stream` to enforce a strict cap.
- **CONNECT method is rejected** with 502 + `X-H2-Limitation: connect-not-supported`. The H2 codec emits `:scheme` and `:path` on every request, and RFC 9113 §8.5 forbids those pseudo-headers on CONNECT requests; rather than emit a malformed request, the gateway rejects deterministically. Use an H1 upstream for CONNECT tunnelling.
- **Truncation observability** — when a backend declares `Content-Length` and closes early, when it returns body bytes on a `204 No Content` / `304 Not Modified` / `HEAD` response that should have no body, or when it sends more bytes than `Content-Length` declared, nghttp2's HTTP messaging enforcement detects the violation and the gateway surfaces it as `RESULT_UPSTREAM_DISCONNECT` (the same bucket as a torn TCP connection). A dedicated `RESULT_TRUNCATED_RESPONSE` code exists in the binary for defense-in-depth but is not normally observable in production. If you need to distinguish "peer reset / TCP drop" from "framing violation", correlate by upstream-side response logs. Truncated responses count toward circuit-breaker upstream-failure totals via the `RESULT_TRUNCATED_RESPONSE` → `UPSTREAM_DISCONNECT` `FailureKind` mapping.
- **H2 send-stall is a timeout** — the per-stream send-stall budget refreshes on each DATA frame that actually drains off the transport buffer (not when nghttp2 serializes a frame into its internal buffer). The gateway tracks every outbound HEADERS/DATA frame in a per-session drain queue inside `UpstreamH2Connection`; the transport's `write_progress_callback` / `complete_callback` pop the queue as bytes hit the socket / TLS layer and dispatch the per-stream sink virtuals. This means a backend that has stopped reading (TCP RWIN at zero, TLS WANT_WRITE, OS socket EAGAIN) holds the request frames in the gateway's transport buffer — the stall budget runs against real wire progress, not nghttp2 bookkeeping. A truly stalled upload — peer's flow-control window drained and no transport drain for `response_timeout_ms` (or `30s` if disabled) — surfaces as `RESULT_RESPONSE_TIMEOUT` (504), not `RESULT_UPSTREAM_DISCONNECT` (502). This matches H1's transport-callback-driven send-stall semantics and routes through the retryable-timeout path so `retry_on_timeout` policies apply.
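
The self-rescheduling stall check described in the last bullet can be sketched as follows. Only `ComputeH2StallBudgetMs` and the 30 s fallback mirror the real header (`proxy_transaction.h`, shown further down in this change); `Scheduler`, `H2SendStall`, and their members are invented stand-ins, and the real closure runs on a single dispatcher thread rather than a detached timer thread.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

using Clock = std::chrono::steady_clock;

// Toy timer standing in for the dispatcher's min-heap of scheduled closures.
struct Scheduler {
  void RunAfter(int delay_ms, std::function<void()> fn) {
    std::thread([delay_ms, fn = std::move(fn)] {
      std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
      fn();
    }).detach();
  }
};

constexpr int SEND_STALL_FALLBACK_MS = 30000;

int ComputeH2StallBudgetMs(int response_timeout_ms) {
  return (response_timeout_ms > 0) ? response_timeout_ms : SEND_STALL_FALLBACK_MS;
}

struct H2SendStall {
  Scheduler* scheduler;
  std::function<void()> on_timeout;  // would surface RESULT_RESPONSE_TIMEOUT (504)
  uint64_t generation = 0;           // bumped on arm/cleanup to cancel stale closures
  Clock::time_point last_progress;   // refreshed when a DATA frame drains off the wire
  int budget_ms = 0;

  void Arm(int response_timeout_ms) {
    ++generation;                    // invalidate any closure queued by a prior attempt
    budget_ms = ComputeH2StallBudgetMs(response_timeout_ms);
    last_progress = Clock::now();
    Queue(generation, budget_ms);
  }

  // Called on transport drain (real wire progress), not on nghttp2 serialization.
  void OnTransportDrain() { last_progress = Clock::now(); }

  void Queue(uint64_t gen, int delay_ms) {
    scheduler->RunAfter(delay_ms, [this, gen] {
      if (gen != generation) return;                      // superseded or cleaned up
      auto idle_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                         Clock::now() - last_progress).count();
      if (idle_ms >= budget_ms) { on_timeout(); return; } // no wire progress: stall fires
      Queue(gen, budget_ms - static_cast<int>(idle_ms));  // progress seen: wait out the rest
    });
  }
};
```
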
4 changes: 2 additions & 2 deletions docs/testing.md
@@ -3,7 +3,7 @@
## Running Tests

```bash
make test # Build and run all suites (1021 tests across 35+ suites at HEAD)
make test # Build and run all suites (1379 tests across 41+ suites at HEAD)
./test_runner # Run all tests directly (after building)
./test_runner help # Print every supported flag

@@ -87,7 +87,7 @@ make test_auth_race
make test_auth_observability
```

At current head, `./test_runner` reports **1021 / 1021 passing** (100 %).
At current head, `./test_runner` reports **1379 / 1379 passing** (100 %).

## Test Suites

119 changes: 119 additions & 0 deletions include/upstream/proxy_transaction.h
@@ -40,6 +40,10 @@ class ProxyTransaction
: public std::enable_shared_from_this<ProxyTransaction>,
public UPSTREAM_CALLBACKS_NAMESPACE::UpstreamResponseSink,
public OBSERVABILITY_NAMESPACE::UpstreamTransactionLink {
// Test-only friend that pokes the private H2 dispatch state to
// exercise OnRequestSubmitted's response_timeout branch without
// spinning up the full UpstreamManager / dispatcher / pool stack.
friend struct H2ResponseTimeoutTestFixture;
public:
// Result codes for internal state tracking
static constexpr int RESULT_SUCCESS = 0;
@@ -60,6 +64,19 @@ class ProxyTransaction
// X-Retry-Budget-Exhausted so operators can tell the two 503s apart
// from circuit-open rejects.
static constexpr int RESULT_RETRY_BUDGET_EXHAUSTED = -8;
// Upstream response did not match its declared length. Two cases:
// - Content-Length declared, peer delivered fewer bytes before clean
// close, or more bytes than declared.
// - Response classified NO_BODY (status 204/304 or HEAD method) but
// peer sent body bytes anyway.
// Terminal — partial body has already been streamed downstream so retry
// would double-deliver bytes. Maps to 502 BadGateway in MakeErrorResponse.
static constexpr int RESULT_TRUNCATED_RESPONSE = -10;
// CONNECT method on an H2 upstream. RFC 9113 §8.5 forbids the :scheme and
// :path pseudo-headers on CONNECT requests, but our H2 codec always emits both;
// serving CONNECT here would emit a malformed request. Terminal —
// deterministic policy reject (502 BadGateway + X-H2-Limitation header).
static constexpr int RESULT_H2_METHOD_NOT_SUPPORTED = -11;

// Constructor copies all needed fields from client_request (method, path,
// query, headers, body, params, dispatcher_index, client_ip, client_tls,
@@ -137,15 +154,66 @@ class ProxyTransaction
return kill_for_shutdown_.load(std::memory_order_acquire);
}

// Returns true iff the comma-separated TE header value contains the
// `trailers` token. Handles RFC 9110 §10.1.4 syntax: each entry MAY
// carry `;q=...` weight parameters (e.g. `te: trailers;q=1.0`); the
// matcher splits on the bare token name (substring before the first
// ';' in each comma-segment), trimmed of OWS. Locale-safe ASCII
// lowercase via explicit `c | 0x20` branch (NOT std::tolower).
// Public + static so test code can verify the contract directly.
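// Example (illustrative values, not drawn from the test suite):
//   ContainsTeTrailersToken("gzip, Trailers;q=0.5")  -> true
//   ContainsTeTrailersToken("trailersx")             -> false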
static bool ContainsTeTrailersToken(const std::string& value);

// Computes the H2 send-stall budget. Mirrors H1's zero-disable
// semantic: response_timeout_ms == 0 opts out of the response-wait
// timer but the stall-phase hang protection stays on, falling back
// to SEND_STALL_FALLBACK_MS. Negative values are treated the same
// as zero (defensive — config validation enforces non-negative,
// but even a value that slips through must never yield a zero or
// negative budget, which would fire instantly or never fire at all).
// Public + static so tests verify the contract directly.
static int ComputeH2StallBudgetMs(int response_timeout_ms) {
return (response_timeout_ms > 0) ? response_timeout_ms
: SEND_STALL_FALLBACK_MS;
}

bool OnHeaders(
const UPSTREAM_CALLBACKS_NAMESPACE::UpstreamResponseHead& head) override;
bool OnBodyChunk(const char* data, size_t len) override;
void OnTrailers(
const std::vector<std::pair<std::string, std::string>>& trailers) override;
void OnComplete() override;
void OnError(int error_code, const std::string& message) override;
void OnRequestSubmitted() override;
void OnRequestBodyProgress() override;

// Send-phase stall fallback budget when config_.response_timeout_ms == 0.
// The response-wait timeout is operator-disable-able (set to 0), but the
// stall-phase hang protection is always on — without it a wedged upstream
// that stops reading our request body would pin both the client and the
// pooled connection indefinitely. Used by both the H1 send loop and the
// H2 send-stall closure (via ComputeH2StallBudgetMs).
//
// Public so test code can verify the contract directly. Leaking a
// static-constexpr int is harmless — no ABI surface, no mutable state.
static constexpr int SEND_STALL_FALLBACK_MS = 30000; // 30s

private:
// Bump h2_send_stall_generation_ and queue a fresh send-stall
// closure for the full budget. Called from DispatchH2 at attempt
// start. OnRequestBodyProgress does NOT call this directly —
// refreshes flow through h2_last_progress_at_ + the closure's
// self-rescheduling check.
void ArmH2SendStallDeadline(int budget_ms);

// Queue (or re-queue) the send-stall closure with the given
// generation and delay. Called by ArmH2SendStallDeadline (initial
// arm with a fresh generation) and by the closure itself on
// observed progress (re-queue with the current generation for the
// remaining budget). Same-generation re-queue is correct because
// Cleanup / OnRequestSubmitted bump the generation, invalidating
// any in-flight closure regardless of who queued it.
void QueueH2SendStallClosure(uint64_t generation, int delay_ms);

// State machine states
enum class State {
INIT, // Created, not yet started
@@ -158,6 +226,10 @@
};

State state_ = State::INIT;
// After H2 SubmitRequest failure rollback, state_ briefly reads
// SENDING_REQUEST while h2_path_ is already false. AttemptCheckout
// (retry path) and DeliverTerminalError (no-retry path) reset it;
// no live reader observes the gap.
int attempt_ = 0; // Current attempt number (0 = first try)
// Set by Cancel() — short-circuits checkout / retry / response
// delivery paths so the transaction is torn down even if an
@@ -305,6 +377,53 @@
int32_t h2_stream_id_ = -1;
bool h2_path_ = false;

// True iff the inbound request carried `te: trailers` (RFC 7230 §4.3
// / RFC 9113 §8.2.2 — required by gRPC clients to negotiate trailer
// support). Captured at construction BEFORE HeaderRewriter strips
// all te values per RFC 7230 hop-by-hop rules. The H2 outbound nv
// build re-emits a synthetic `te: trailers` based on this flag; H1
// path is unchanged (rewriter strips, no re-emit).
bool client_te_trailers_ = false;

// H2 send-stall generation counter. Bounds the time spent in
// SENDING_REQUEST waiting for END_STREAM to flush — without this, a
// wedged peer that stops reading our DATA frames would pin the H2
// stream until the peer's PING timeout (or forever, if PING is
// disabled). Armed BEFORE SubmitRequest so the synchronous
// on_frame_send_callback path (bodyless requests where nghttp2
// inline-flushes HEADERS+END_STREAM) can kill it via generation
// bump. Cleanup also bumps to invalidate any in-flight closure.
uint64_t h2_send_stall_generation_ = 0;

// H2 response-timeout arm-once flag. Coordinates the response timer
// arming between OnHeaders and OnRequestSubmitted: whichever fires
// first arms ArmResponseTimeout and sets this flag, and the other
// skips re-arming. Required because the existing H1 OnHeaders path
// calls ClearResponseTimeout (semantic doesn't apply to H2's
// two-deadline model) and DispatchH2 cannot arm response-timeout
// upfront without leaking the budget into the body-write phase.
// Reset by Cleanup so retry attempts arm fresh.
bool h2_response_timeout_armed_ = false;

// True once OnRequestSubmitted has fired. OnRequestBodyProgress
// gates refresh on this rather than state_ — request-side and
// response-side phases diverge on the early-final-headers path.
// Reset by DispatchH2 init + Cleanup.
bool h2_request_fully_sent_ = false;

// Last time the H2 codec emitted a request-side DATA frame.
// Updated by OnRequestBodyProgress; inspected by the single
// in-flight send-stall closure on fire. The closure re-queues
// itself for the remaining budget if progress was observed,
// otherwise it fires the timeout. This keeps the dispatcher's
// min-heap bounded to one closure per request regardless of
// upload size, while preserving refresh-on-every-DATA semantics.
std::chrono::steady_clock::time_point h2_last_progress_at_{};

// Cached send-stall budget for this attempt. Computed once in
// DispatchH2 so the closure's progress check doesn't recompute.
int h2_stall_budget_ms_ = 0;

// H2 response timeout uses a dispatcher-scheduled task instead of a
// transport-level deadline: the transport is shared across every
// stream on the multiplexed session, so SetDeadline would tear down