Skip to content

fix(envoy): require runner config for ws conn#4612

Merged
MasterPtato merged 1 commit intomainfrom
04-12-fix_envoy_require_runner_config_for_ws_conn
Apr 13, 2026
Merged

fix(envoy): require runner config for ws conn#4612
MasterPtato merged 1 commit intomainfrom
04-12-fix_envoy_require_runner_config_for_ws_conn

Conversation

@MasterPtato
Copy link
Copy Markdown
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Contributor Author

MasterPtato commented Apr 12, 2026

@MasterPtato MasterPtato mentioned this pull request Apr 12, 2026
11 tasks
@MasterPtato MasterPtato force-pushed the 04-12-fix_envoy_require_runner_config_for_ws_conn branch from e946974 to 288d5f1 Compare April 12, 2026 21:21
@MasterPtato MasterPtato force-pushed the 04-12-fix_pb_backport_serverless_fix branch from f4342a7 to b45c190 Compare April 12, 2026 21:22
@claude
Copy link
Copy Markdown

claude bot commented Apr 12, 2026

PR Review: fix(envoy): require runner config for ws conn

Summary

This PR tightens the envoy WebSocket connection contract by requiring a runner config to already exist before an envoy can connect, rather than auto-creating a default "normal" config on demand. The change is clean and improves explicitness, but there are a few items worth discussing.


Issues

Cache race window (bypass_cache: false) — The runner config is now fetched with bypass_cache: false, which uses the 5-second TTL cache. If a runner config is provisioned and then an envoy connects within that TTL window, it will receive a spurious NoRunnerConfig error because the cache may still hold a None result. The old ensure_normal_if_missing code was an upsert and was not sensitive to this. Consider whether bypass_cache: true is more appropriate here since this is a once-per-connection gating check on the critical path.

Missing trailing newlineengine/artifacts/errors/ws.no_runner_config.json has no trailing newline. POSIX-compliant files should end with a newline, and this may trigger lint warnings or artifact tooling issues.


Observations

ensure_normal_if_missing op may now be dead code — This operation was removed from conn.rs but is still present in the codebase. If it has no remaining callers, it should be cleaned up to avoid confusion.

Breaking change for self-hosted / test environments — Any deployment that relied on auto-creation of a normal runner config on envoy connect will now break with a NoRunnerConfig error. It may be worth calling this out explicitly in the PR description or any migration notes for self-hosters.

Error message format is consistent — Using {pool_name:?} (debug format, quoted output) matches the pattern used by other variants in WsError.


Code Quality

The refactor is clean. Moving the runner config fetch before tokio::try_join! and computing serverless_drain_grace_period directly from the result is clearer than the prior approach of using .first().and_then(...) with an implicit silent-None fallback. The new NoRunnerConfig error variant is well-structured, uses the RivetError derive macro correctly, and includes actionable metadata (pool_name). The artifact JSON is consistent with other error definitions in the registry.

@claude
Copy link
Copy Markdown

claude bot commented Apr 12, 2026

PR Review: fix(envoy): require runner config for ws conn

Summary

This PR changes init_conn in the envoy WebSocket connection path to fail explicitly with NoRunnerConfig when no runner config exists for the requested pool, instead of silently auto-creating a default "normal" config via ensure_normal_if_missing. It also fixes a pre-existing bug where is_serverless was always hardcoded to false on the Conn struct.

Correctness: is_serverless Bug Fix

The fix to is_serverless is correct. Previously the field was hardcoded to false; it is now correctly derived from whether serverless_drain_grace_period is Some. This is a real bug fix.

Concerns

1. Cache race window (bypass_cache: false)

The new gate check uses bypass_cache: false, meaning the 5-second TTL cache applies. If a runner config is created and an envoy connects within that window, the cache may return a stale miss, causing a spurious NoRunnerConfig error. The old ensure_normal_if_missing was an upsert and was cache-insensitive. The strict validation path should arguably use bypass_cache: true, or at minimum include a comment acknowledging this tradeoff.

2. Missing trailing newline in artifact file

engine/artifacts/errors/ws.no_runner_config.json appears to be missing a trailing newline, unlike sibling artifact files (e.g., ws.eviction.json). Minor POSIX hygiene issue.

3. {pool_name:?} debug format in error message

The error message uses {pool_name:?} (debug/quoted format), which emits the pool name surrounded by quotes in output (e.g., "my-pool"). Other parameterized WsError variants use display format ({0}). Confirm this is intentional.

Non-issues

  • ensure_normal_if_missing is not dead code after this change. It still has a caller in pegboard/src/workflows/runner2.rs via the EnsureRunnerConfig activity.
  • The behavioral change (envoy connections now fail without a pre-existing runner config) is the intended design.

Overall the change is clean and focused. The is_serverless fix is a genuine correctness improvement. The cache race window is the most substantive concern worth addressing before merging.

@MasterPtato MasterPtato force-pushed the 04-12-fix_envoy_require_runner_config_for_ws_conn branch from 288d5f1 to 1671d04 Compare April 12, 2026 21:45
@MasterPtato MasterPtato force-pushed the 04-12-fix_pb_backport_serverless_fix branch from b45c190 to d9003fa Compare April 12, 2026 21:45
@MasterPtato MasterPtato force-pushed the 04-12-fix_pb_backport_serverless_fix branch from d9003fa to fad7632 Compare April 12, 2026 23:38
@MasterPtato MasterPtato force-pushed the 04-12-fix_envoy_require_runner_config_for_ws_conn branch from 1671d04 to 5671fec Compare April 12, 2026 23:38
@MasterPtato MasterPtato force-pushed the 04-12-fix_envoy_require_runner_config_for_ws_conn branch from 5671fec to 6b3850b Compare April 13, 2026 00:20
@MasterPtato MasterPtato force-pushed the 04-12-fix_pb_backport_serverless_fix branch from fad7632 to 744ddb3 Compare April 13, 2026 00:20
Copy link
Copy Markdown
Contributor Author

MasterPtato commented Apr 13, 2026

Merge activity

  • Apr 13, 12:21 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 13, 12:40 AM UTC: Graphite rebased this pull request as part of a merge.
  • Apr 13, 12:40 AM UTC: @MasterPtato merged this pull request with Graphite.

@MasterPtato MasterPtato changed the base branch from 04-12-fix_pb_backport_serverless_fix to graphite-base/4612 April 13, 2026 00:37
@MasterPtato MasterPtato changed the base branch from graphite-base/4612 to main April 13, 2026 00:38
@MasterPtato MasterPtato force-pushed the 04-12-fix_envoy_require_runner_config_for_ws_conn branch from 6b3850b to a308ebc Compare April 13, 2026 00:39
@MasterPtato MasterPtato merged commit a91a6e5 into main Apr 13, 2026
5 of 7 checks passed
@MasterPtato MasterPtato deleted the 04-12-fix_envoy_require_runner_config_for_ws_conn branch April 13, 2026 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants