fix(observability): classify WS protocol HTTP-version handshake error as expected (Sentry CORE-RUST-DP)#2792
Open
CodeGhost21 wants to merge 1 commit into
Conversation
… as expected (Sentry CORE-RUST-DP) Add the tungstenite `ProtocolError::WrongHttpVersion` Display string (`"HTTP version must be 1.1 or higher"`) to the `is_network_unreachable_message` anchor list. Fires from `src/openhuman/socket/ws_loop.rs:188` via the supervisor's sustained- outage escalation when a server (or intermediary proxy / HTTP/2-only edge) responds to the WebSocket upgrade with HTTP/2+, which the WS spec forbids — handshake requires HTTP/1.1. Self-hosted Sentry CORE-RUST-DP (~2 events / 24h on `openhuman@0.56.0+e8968077aeb5`, `domain=socket operation=ws_connect`). Same handshake-stage user-environment shape as the existing `"tls handshake"` / `"certificate verify failed"` / `"http error: 200 ok"` captive-portal entries. The socket supervisor's exponential-backoff retry already handles the condition; Sentry has no actionable signal to add — no status, no trace, no payload beyond what the supervisor already logs as a breadcrumb at every retry tier. Two new tests pin the contract: classification of both the supervisor- wrapped wire shape (verbatim CORE-RUST-DP body) and the bare tungstenite render, plus a rejection contract over four unrelated HTTP-version log lines (`"HTTP/1.0 to HTTP/2"`, `"h2 alpn"`, `"HTTP/1.1 only"`, `"requires HTTP/1.2 or higher"`) so a future refactor that loosens the substring into a generic `"http version"` matcher fails loudly. `cargo test --lib core::observability::tests` → 90/90 pass. Sentry-Issue: CORE-RUST-DP
Contributor
📝 WalkthroughWalkthroughThis PR extends the ChangesWebSocket handshake HTTP version classifier
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ProtocolError::WrongHttpVersionDisplay string (\"HTTP version must be 1.1 or higher\") to theis_network_unreachable_messageanchor list insrc/core/observability.rsso the socket supervisor's sustained-outage escalation routes it toExpectedErrorKind::NetworkUnreachable(breadcrumb, no Sentry event) instead of firing an error event.CORE-RUST-DP(~2 events onopenhuman@0.56.0+e8968077aeb5,domain=socket operation=ws_connect, first seen 2026-05-27).\"tls handshake\"/\"certificate verify failed\"/\"http error: 200 ok\"(captive-portal) entries that already inhabit this matcher.Problem
src/openhuman/socket/ws_loop.rs:178routes the one-shot sustained-outage escalation (consecutive == FAIL_ESCALATE_THRESHOLD, default 5 attempts) throughcrate::core::observability::report_error_or_expectedwith the message:The inner string is tungstenite's
ProtocolError::WrongHttpVersionrendering. It fires when the server (or an intermediary proxy / HTTP/2-only edge) responds to the WS upgrade with HTTP/2+, which RFC 6455 forbids — the WS upgrade handshake requires HTTP/1.1.This is a user-environment / upstream-infra misconfiguration:
tokio-tungstenite) cannot negotiate downward — it requires HTTP/1.1 for the Upgrade handshake by spec.The supervisor already retries with exponential backoff and limits the Sentry-visible event to one per affected client per outage. There is no actionable Sentry signal beyond what's already logged as a breadcrumb at every retry tier — no code-side remediation available.
Solution
Add one anchor to the existing
is_network_unreachable_messageOR-chain. The new substring is the literal tungstenite Display string — stable acrosstungsteniteversions and distinctive enough not to false-positive on adjacent transport logs (\"HTTP/1.0 to HTTP/2\",\"h2 alpn\",\"client supports HTTP/1.1 only\",\"requires HTTP/1.2 or higher\").The dispatcher precedence in
expected_error_kindis unchanged —is_loopback_unavailablestill runs first (so a hypothetical loopback HTTP/2 server keeps its own bucket), then the network-unreachable matcher catches this shape and demotes totracing::warn!(breadcrumb, no Sentry event).The matching doc-comment is extended with a fifth bullet covering the new anchor and pointing at CORE-RUST-DP.
Submission Checklist
src/core/observability.rstests module:\"[socket] Connection failed (sustained outage after 5 attempts): WebSocket connect: WebSocket protocol error: HTTP version must be 1.1 or higher\") AND the bare tungstenite render (in case the protocol error escapes through a non-supervisor call site — the classifier runs on the full anyhow chain).core::observability::testsmodule runs 90/90.core::observabilitymodule; not a tracked feature row in `docs/TEST-COVERAGE-MATRIX.md`.Impact
socket::ws_loop:188now classifies asExpectedErrorKind::NetworkUnreachablefor HTTP/2+ handshake responses, demoting totracing::warn!(no Sentry event). No behavioural change to the supervisor's restart loop, the WS reconnect logic, orhealth.busescalation.core-rustproject for users behind HTTP/2-only edges. ~2 events / 24h today, but the count is bound to grow as more proxies default to HTTP/2.tracing::warn!breadcrumb in the local log, never as a Sentry event. The supervisor's per-attempt logs (below threshold and after threshold) are unchanged and remain available for local diagnosis.Notes for reviewers
fix/observability-operation-timed-out): both PRs add a new substring to the sameis_network_unreachable_messageOR-chain. Either merge order produces a trivial 1-line concurrent edit; git will auto-merge.Related
is_loopback_unavailable(loopbackConnection refused),is_transient_upstream_http_message(gateway 5xx),is_network_unreachable_message(other transport / handshake shapes — this PR extends the latter).log_connection_failurefor the routing site.Summary by CodeRabbit
Bug Fixes
Tests