fix(client): await_ready retries on transient errors instead of bailing #147
Merged
githubrobbi merged 3 commits into main on May 8, 2026
The sync `UffsClientSync::await_ready` matched `Err(other) => return Err(other)`, so a single transient error from any `status` poll aborted readiness probing immediately. The async sibling at `crates/uffs-client/src/connect.rs::await_ready` has always classified non-Ready, non-success outcomes as `PollOutcome::OtherError` and continued the loop. This patch aligns the sync path with that behaviour: every non-Ready outcome (Loading/Refreshing status, I/O error, connection closed, RPC timeout, or transient `Protocol` error from a partial-response read) keeps polling until the caller-supplied `timeout` deadline elapses.

## Why now

The 2026-05-07 Phase 7 24-h soak attempt failed with `Daemon did not become ready in time / request timed out` even though the captured `daemon.log` showed the daemon up and IPC-listening 1.3 s after spawn. Root cause: the very first `status` RPC during the Windows `AF_UNIX` socket-bind window can hit its per-RPC deadline and surface as `ClientError::Timeout`, which the pre-fix code returned immediately. Post-fix, the loop tolerates that transient and waits for the daemon to finish coming up.

## Regression test

`await_ready_retries_on_protocol_error_until_deadline` feeds a JSON-RPC error response to the very first poll, which surfaces as `ClientError::Protocol`. Pre-fix, the loop would have returned that error in <1 ms; post-fix, the test takes ~120 ms (the deadline) and returns `ClientError::Timeout`, confirming that the loop kept polling rather than failing fast.

Mac gates green: `cargo nextest -p uffs-client` (173/173), `just lint-fast`, `just check-windows`.
## Summary

Aligns the sync `UffsClientSync::await_ready` with its async sibling: any non-Ready, non-success outcome from a `status` poll (Loading status, I/O error, connection closed, RPC timeout, transient protocol error) now keeps polling until the caller-supplied `timeout` deadline elapses.

## Root cause
`crates/uffs-client/src/connect_sync.rs::await_ready` matched `Err(other) => return Err(other)`, so a single transient error from any in-flight `status` RPC aborted readiness probing immediately. The async sibling at `crates/uffs-client/src/connect.rs::await_ready` has always classified non-Ready, non-success outcomes as `PollOutcome::OtherError` and continued the loop; the sync path was the outlier.

## Why now: 2026-05-07 Phase 7 soak finding
The Phase 7 24-h soak attempt failed with `Daemon did not become ready in time / request timed out`, even though the captured `daemon.log` showed the daemon up and IPC-listening 1.3 s after spawn. The very first `status` RPC during the Windows `AF_UNIX` socket-bind window can hit its per-RPC deadline and surface as `ClientError::Timeout`, which the pre-fix code returned immediately, aborting probing while the daemon was healthy and one poll away from Ready.

PR #146 already added a soak-harness-level workaround (idempotent attach + race-tolerant spawn). This PR fixes the underlying CLI bug so direct `uffs daemon start` invocations no longer race.

## Regression test
`await_ready_retries_on_protocol_error_until_deadline` (`crates/uffs-client/src/connect_sync_tests.rs`) feeds a JSON-RPC error response to the very first poll, surfaced as `ClientError::Protocol`.

- Pre-fix result: `Err(Protocol("transient mid-handshake error"))`
- Post-fix result: `Err(ClientError::Timeout)`

The 0.332 s observed test wall-clock confirms the loop actually polled.
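The retry-until-deadline behaviour and the shape of the regression check can be illustrated with a minimal, self-contained sketch. It is not the crate's code: the `Status` enum, the closure-based poller, the `interval` parameter, and the reduced `ClientError` are hypothetical stand-ins, since the real `UffsClientSync::await_ready` signature is not shown in this PR.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical, reduced stand-ins for the crate's types (assumptions).
#[derive(Debug, PartialEq)]
enum ClientError {
    Timeout,
    Protocol(String),
}

#[derive(Debug, PartialEq)]
enum Status {
    Loading,
    Ready,
}

/// Polls `status` until it reports `Ready` or `timeout` elapses.
/// Every non-Ready outcome (Loading or any error) is treated as
/// transient and simply triggers another poll.
fn await_ready<F>(
    mut status: F,
    timeout: Duration,
    interval: Duration,
) -> Result<(), ClientError>
where
    F: FnMut() -> Result<Status, ClientError>,
{
    let deadline = Instant::now() + timeout;
    loop {
        match status() {
            Ok(Status::Ready) => return Ok(()),
            // The pre-fix sync path returned early on `Err(other)` here.
            Ok(_) | Err(_) => {}
        }
        let now = Instant::now();
        if now >= deadline {
            return Err(ClientError::Timeout);
        }
        // Never sleep past the deadline.
        thread::sleep(interval.min(deadline - now));
    }
}

fn main() {
    // Scripted poller: a protocol error first, then Loading forever.
    let mut polls = 0u32;
    let started = Instant::now();
    let result = await_ready(
        || {
            polls += 1;
            if polls == 1 {
                Err(ClientError::Protocol("transient mid-handshake error".into()))
            } else {
                Ok(Status::Loading)
            }
        },
        Duration::from_millis(120),
        Duration::from_millis(10),
    );
    assert_eq!(result, Err(ClientError::Timeout));
    assert!(polls > 1, "the loop must keep polling after the first error");
    assert!(started.elapsed() >= Duration::from_millis(120));

    // A poller that recovers after one error reaches Ready.
    let mut calls = 0u32;
    let recovered = await_ready(
        || {
            calls += 1;
            if calls == 1 {
                Err(ClientError::Timeout)
            } else {
                Ok(Status::Ready)
            }
        },
        Duration::from_secs(1),
        Duration::from_millis(5),
    );
    assert_eq!(recovered, Ok(()));
    println!("polls before deadline: {polls}, calls before Ready: {calls}");
}
```

In this sketch the elapsed-time assertion plays the same role as the wall-clock observation above: a loop that bailed on the first error would return well before the deadline.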
## Local validation

- `cargo nextest run -p uffs-client`: 173/173 passed, including the new test and all four existing `await_ready_*` regression pins
- `just lint-fast`: fmt-check, file-size, typos, reuse, lint-ci, lint-prod, lint-tests
- `just check-windows`: `cargo xwin check --workspace --all-targets --all-features`

## Compliance audit (mandatory rules)
- `connect_sync.rs` was avoided by trimming an inline comment (the regression test's docstring carries the rationale), not by adding a `file_size_exceptions.txt` entry.
- `PollOutcome::OtherError` classification.
- `await_ready` still returns `Err(ClientError::Timeout)` on deadline expiry; only the fast-fail-on-transient-error edge case changes (and that path was the bug).
- `await_ready_*` tests still pass.