feat(relay/git): git on Sprout, S3-backed#726
Merged
Conversation
…essed packs Introduces `crates/sprout-relay/src/api/git/store.rs` — the object-store half of the git-on-object-storage protocol (docs/git-on-object-storage.md). API surface: - `put_pack` / `put_manifest` — create-only (`If-None-Match: *`), content-addressed, idempotent. Maps A1: writes that would overwrite a content-addressed key (i.e. identical bytes) succeed silently. - `get` / `get_verified` — `get_verified` enforces A1 detectability by re-hashing returned bytes against the expected SHA-256 digest. - `get_pointer` — HEAD+GET, returns `Option<(ETag, Bytes)>` (None on 404). - `put_pointer(key, body, Precond)` — the CAS primitive (§Push step 7). Returns `CasOutcome::Won(ETag)` or `CasOutcome::LostRace`; 412 is a *semantic* result, not an error. Sharp edge documented at module level: `rust-s3` 0.37 + `fail-on-err` (shared with `sprout-media`) makes non-2xx arrive as `S3Error::HttpFailWithBody`. `classify_cas` maps 412 → `LostRace` and bubbles everything else. Verified empirically against MinIO in `probe::probe_412_surfacing` and `probe::probe_full_roundtrip` — both gated on `SPROUT_GIT_S3_PROBE=1` so CI doesn't depend on a live bucket. Empirically verified: - `If-None-Match: *` collision → `HttpFailWithBody(412, _)` on MinIO. - `If-Match` wrong ETag → `HttpFailWithBody(412, _)`. - MinIO returns the new object ETag in the PUT response, so callers can chain `Won → IfMatch → Won` without a HEAD round-trip. The full chain is asserted in `probe_full_roundtrip`. - `get_verified` rejects digest mismatch. 196 relay tests green (unfiltered). `cargo clippy --all-targets -D warnings` and `cargo fmt --check` clean. Probe tests run on demand only. Co-authored-with: Eva (ticket), Quinn (CAS interface coordination) Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…e probe Three review hits from Max, Perci, and Mari, plus Mari's probe API: 1. **`get_pointer` is now a single GET** that reads the ETag from the GET response headers — not HEAD+GET. The two-call version could straddle a concurrent writer (HEAD's ETag and GET's body describing different pointer versions); a caller predicating `IfMatch(etag)` on the HEAD's ETag would be using a version it never actually read. Verified empirically that MinIO populates `etag` on GET responses (`probe::probe_get_exposes_etag`). 2. **`put_pack`/`put_manifest` no longer accept a key parameter.** The key is derived inside `put_immutable` as `<prefix>/<hex sha256(bytes)>`. This makes the idempotency claim *constructive* rather than trusted: a 412 collision means the stored bytes' digest equals these bytes' digest (the key), so by A1 the stored bytes equal these bytes. A buggy caller passing the wrong key can no longer silently break A1 detectability on read. `put_pack(bytes) -> Result<String, _>` returns the key. 3. **`run_conformance_probe(ProbeConfig) -> Result<ProbeReport, _>`** — public method per Mari's requested shape. Four phases: `sequential` (A1+A2), `if_match_race` (A3), `if_none_match_race` (A1+A3 on the create-only primitive `put_pack`/`put_manifest` use), `etag_consistency` (token round-trips opaquely). Failures carry `phase`, `round`, `key`, `reason`. Defaults: width=32, rounds=3. Tied to the same `put_immutable` path as the production write methods so the gate tests the load-bearing A1 primitive, not just pointer creation. Verified end-to-end against live MinIO at 8-way×2-round in `probe::probe_conformance`. 198 relay tests green (was 196 — 2 new pure unit tests on `classify_cas`), 6 store probe tests green when `SPROUT_GIT_S3_PROBE=1` against MinIO. clippy/fmt clean. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Per Perci's tightening note: a `put_pointer` PUT that returns 2xx without
an ETag header is not a `Won` — it's a non-conforming backend. The
previous `unwrap_or_default()` would hand the caller `ETag("")`, which
the next CAS round would treat as a normal `LostRace`. That hides
backend mis-conformance behind a single retry instead of failing at
admission.
Now: missing ETag on a 2xx CAS PUT → `StoreError::Backend(...)` with a
diagnostic body. The conformance probe's `etag_consistency` and
`if_match_race` phases exercise the path on every round, so a
non-conforming backend fails the deployment gate; in production, a
mid-stream ETag dropout fails the affected push rather than corrupting
chain semantics.
198 relay tests + 6 store probes still green; clippy/fmt clean.
Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Adds `crates/sprout-relay/src/api/git/manifest.rs` — the shared type
that the push path (Quinn's `cas_publish`) constructs and the read path
(Dawn's `hydrate`) consumes. Sibling module so neither owns the schema.
Schema (per channel agreement with Sami, Max, Perci, Quinn):
```rust
pub struct Manifest {
pub version: u32, // = 1
pub head: String, // "refs/heads/main" unprefixed
pub refs: BTreeMap<String, String>,
pub packs: Vec<String>, // store keys, sorted
pub parent: Option<String>, // previous manifest digest
}
```
- HEAD is *published* ref state, not derived at read time. A hydrate-time
default would let a clone advertise a branch the writer didn't intend
(Inv_RefEffectApplied).
- `canonical_bytes()` is deterministic: BTreeMap iterates sorted, packs
sorted + deduped defensively, struct field order pinned by declaration,
`serde_json::to_vec` emits no whitespace. So `key == sha256(bytes)` is
reproducible — A1 stays mechanical.
- `from_bytes` rejects unknown `version`.
Six unit tests pin: round-trip equality, byte stability under permuted
ref insertion order, packs sort+dedup, version rejection, no-parent first
push, and a fully byte-pinned canonical string ("any unintended serialization
change shifts every manifest digest" → loud failure, not silent).
204/204 relay tests; clippy/fmt clean.
Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…hex_oid Sami flagged on cas_publish review: write side filters malformed OIDs but not refnames or detached HEAD; read side (hydrate) rejects exactly those. Net = "valid CAS, un-clone-able output" — manifest commits successfully, every subsequent clone 5xx'es. Tyler's 9/10 bar fails. Fail at write, not at every read. Solution: lift the validation predicates that already live in `hydrate.rs` into `manifest.rs` as `pub` items, plus add `Manifest::validate()` for one-shot write-side checking. - `pub fn is_safe_refname(s) -> bool` — moved from hydrate, identical predicate. Symmetric protection: writers reject before put_manifest, readers double-check defense-in-depth on hydrate. - `pub fn is_hex_oid(s) -> bool` — same move, same shape (accepts SHA-1 and SHA-256 widths for the forward algorithm transition). - `pub fn Manifest::validate(&self) -> Result<(), ManifestError>` — bundles head non-empty + head safe + every ref name safe + every oid valid. Quinn's `cas_publish` calls this between `compose_after` and `put_manifest` in a follow-up. - `ManifestError` grows three variants: `UnsafeRefName`, `MalformedOid`, `EmptyHead`. All carry diagnostic context. `canonical_bytes` deliberately does NOT call `validate` — keeps the write seam visible (caller must invoke `validate`?`canonical_bytes`). 8 new tests pin predicate behavior + each `validate` failure path. 212 relay tests; clippy + fmt clean. hydrate.rs will follow up to import the shared predicates instead of duplicating; today nothing breaks because the predicates are identical text. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Two follow-ups for the convergence: 1. **`pub fn pointer_key(owner, repo) -> String`** in manifest.rs. Single source of truth shared by cas_publish (write) and hydrate (read). Max + Sami both pushed for centralization — "same string in two files" drifts on the next refactor. Schema is `repos/<owner>/<repo>/pointer` (Quinn's choice; we agreed earlier). 2. **`Manifest::parent` is a bare 64-char hex digest, not a store key.** Perci flagged on cas_publish review: `read_parent` stored `"manifests/<digest>"` but Dawn's manifest contract says digest alone — `Inv_RefDerivedFromParent` reads `parent = pointer.digest` literally. Tightened the docstring + added `ManifestError::MalformedParent`, so `Manifest::validate()` catches the common bug at the write seam (Quinn strips `manifests/` before assigning, or the call fails loudly). 4 new tests pin the parent-shape rejection (`manifests/<digest>` prefix rejected, short rejected, None accepted) and pointer_key `.git` stripping. 18/18 manifest, 216/216 relay lib, clippy + fmt clean. Follow-up commit on dawn/git-read-hydrate imports `manifest::pointer_key` and drops the duplicate definition. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Adds `crates/sprout-relay/src/api/git/hydrate.rs` — the spec §Read flow.
GET pointer → GET manifest (digest-verified) → GET packs in parallel
(digest-verified) → init temp bare repo → write+index every pack → write
refs + HEAD. Returns `HydratedRepo { _tempdir, path }`; dropping the
handle removes the directory.
Ordering invariant (Sami's emphasis, Max's ask): all packs must be fully
written + indexed before any ref is installed. A ref pointing into a
not-yet-indexed pack is an opaque protocol failure mid-stream; the
phase-2 split (refs/HEAD only after the phase-1 pack loop completes via
`?`-bubbling) makes a failed hydrate fail-closed as "no advertised refs."
Pointer 404 vs. backend failure split (Max): `Ok(None)` on pointer
absence → caller returns HTTP 404. Any other failure → `Err(_)` →
backend error. Empty repo (pointer present, refs empty in manifest) is a
valid committed state and returns `Ok(Some(_))` with an empty bare repo.
No `git index-pack --strict` on the read path. `index-pack` alone
validates pack structure (CRC, type tags, internal refs); `--strict`
adds the connectivity-graph fsck which would re-prove what
manifest.packs covers by construction (Inv_Closed) — a write-path
invariant, not a read-path obligation. Sami's call, agreed.
Defensive refname validation rejects traversal / non-`refs/` prefixes /
control chars before any file write — the writer should enforce this
but we re-check because we're producing file paths from manifest input.
Tests:
- 3 pure unit tests on the helpers (refname safety, hex oid recognition,
pointer key derivation with `.git` suffix stripping).
- 2 live MinIO + real-git integration tests gated on `SPROUT_GIT_S3_PROBE=1`:
- `live_hydrate_roundtrip` — build a real source repo, repack, upload
pack+manifest+pointer via `store.rs` API, call `hydrate_for_read`,
assert refs/HEAD match, then `git clone <hydrated> /tmp` and assert
cloned file content survives.
- `live_hydrate_missing_pointer_returns_none` — pointer absence is `Ok(None)`.
209 relay tests green; 5 hydrate-side; clippy + fmt clean.
Branch: `dawn/git-read-hydrate` off `dawn/git-store-cas`. Transport
wiring (info_refs/upload_pack switching on `state.git_store.is_some()`)
is the follow-up commit; this lands the hydration primitive in isolation.
Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Closes the named gap from `f5cb52ee`: confirms that `hydrate_for_read` with a manifest carrying empty `refs` and `packs` produces a bare repo whose `git clone` behavior is indistinguishable from `git clone` of a freshly `git init --bare`'d repo — exits 0, no objects, no refs, HEAD pointing at the manifest-configured default branch, with git's standard "you appear to have cloned an empty repository" warning on stderr. Mari flagged this as the case her HTTP e2e harness will exercise; this test pins the hydration mechanics underneath it. 210/210 relay tests; 6/6 hydrate (3 unit + 3 live, all `SPROUT_GIT_S3_PROBE=1`-gated). Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Two cleanups from Sami's `f5cb52ee` read-path review: 1. **`HydrateError::PointerNotFound` was declared but never constructed.** `hydrate_for_read` signals "repo doesn't exist" via `Ok(None)`; the variant was masked by a module-wide `#![allow(dead_code)]`. The doc-comment on `HydrateError` still claimed `PointerNotFound` was the 404 signal, which contradicted the actual code. Remove the variant (the `Ok(None)`/`Err(_)` split was already structural) and update the doc to say: every `HydrateError` variant maps to a 5xx, missing-repo is `Ok(None)`. Type system now enforces the HTTP layer split. 2. **Drop module-wide `#![allow(dead_code)]`.** It was hiding (1) and would silence future accidental dead code. Turns out nothing in the module is actually dead once the test module references the API, so the allow wasn't needed at all — replaced with a comment explaining why the items are `pub` ahead of transport wiring. 210/210 relay tests, 6/6 hydrate (incl. live MinIO roundtrip), clippy + fmt clean. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…/<r>/pointer) Quinn's `cas_publish::pointer_key` (`quinn/push-fence-cas @ 1185709`) uses `repos/<owner>/<repo>/pointer`. My hydrate used `pointers/<owner>/<repo>`. **Hard write/read mismatch** that would have silently 404'd every clone against a repo that exists, only surfacing at live e2e time. Aligning to Quinn's schema — namespace under each repo lets future per-repo keys (config, refs index, gc state) co-locate. Live MinIO roundtrip + empty-repo + missing-pointer tests all still green under the new layout. 210/210 relay tests; clippy + fmt clean. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The predicates were duplicated between hydrate (read side) and manifest (write side). After `manifest::validate` lifted them as `pub` predicates, hydrate imports them instead of carrying its own. Single source of truth → no divergence drift between write rejection and read rejection. Live MinIO roundtrip + empty-repo + missing-pointer still 6/6 green; 218 relay tests total; clippy + fmt clean. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Follows the predicate-dedup commit (`471e4dae`): `manifest::pointer_key` is now `pub`, so hydrate imports rather than carrying its own definition. Single source of truth shared with `cas_publish`. Drops the duplicate `pointer_key_strips_dot_git` test (covered by `manifest::tests::pointer_key_strips_dot_git`). 221/221 relay tests; 5/5 hydrate live (MinIO roundtrip + empty-repo + missing-pointer); clippy + fmt clean. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Pure function over a borrowed slice of Manifest fields (RefStateInputs). Quinn's cas_publish calls build_ref_state_event after a successful CAS in finalize_push. 9 unit tests pin NIP-34 invariants: - HEAD tag wrapped 'ref: <head>' (storage is bare; protocol formatting at emit) - Only refs/heads/* and refs/tags/* emit; other ref namespaces filtered - OIDs validated 40/64-hex (SHA-1/SHA-256); invalid OIDs skip-not-fail - Ref names validated (no //, no leading /) - Deterministic tag order: d, refs (BTreeMap-sorted), HEAD, p - Invalid actor_pubkey_hex errors before tag construction - d-tag is repo_id, NOT <repo>.git (caller responsibility, pinned) Signed with relay_keys: relay is authoritative for ref state of repos it hosts. Co-authored-by: Sami <sami@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…apshot
Refactors the receive-pack path so the post-push publish is **inline and
sequenced before** the success response is built — and so the structural
seam is type-visible rather than convention.
## What changes
- Introduce `PackOutput { stdout: Vec<u8> }` and `PushContext { pack,
refs_before, owner, repo, pusher, repo_path }`. The push handler
passes a `PushContext` to `finalize_push`, which is the **only**
function that converts a push's `PackOutput` into a 2xx `Response`
(via `build_git_response`).
- `snapshot_refs` returns `Result<String, SnapshotError>` instead of
collapsing failure to `""`. The previous shape silently skipped
publish when *both* pre- and post-push snapshots failed (they both
became `""` and compared equal); the `Result` shape makes that
impossible.
- Pull the publish/skip decision into a pure `should_publish(before,
after)` function. Deny-by-default: skip only when both snapshots are
`Ok` and equal; every other state (changed refs, either side `Err`)
falls through to publish. Six unit tests pin each arm.
- Remove the fire-and-forget `tokio::spawn` that previously published
*after* `Ok(response)` returned. Publish now awaits inline on the
changed-or-error path; no-op pushes short-circuit and pay zero fence
latency.
- Drop the unused `run_git_service_with_env` wrapper now that
receive-pack calls `run_git_subprocess` directly.
## Why
This is the §Implementation Correspondence seam called for by
`docs/git-on-object-storage.md`. With this in place:
1. **Theorem 1 (fence) — structurally enforceable.** A push cannot
produce a 2xx `Response` without going through `finalize_push`, which
contains the conditional `publish_ref_state().await`. The compiler
is the doc comment.
2. **Double-snapshot-failure hole closed.** `(Err, Err)` is outside the
skip arm; it cannot collide on `""` and silently bypass publish.
3. **No-op fast path preserved.** Denied/no-op pushes do not pay
publish latency. The fence engages iff refs changed or either
snapshot errored.
Today `publish_ref_state` is a relay-DB insert; failure is logged and
the response is still 200 (pack is durable on disk). The S3
manifest-CAS evolution lands on this same seam — `publish_ref_state`
becomes a conditional PUT, and the 412→409 mapping plugs into the
`if let Err(e)` arm of `finalize_push`. No further refactor needed when
that work lands.
## Tests
Six unit tests on `should_publish` cover every arm of the fence
decision: no-op skip, changed-refs publish, first-push-to-empty-repo
publish, before-err publish, after-err publish, and the load-bearing
both-err publish (the bug this refactor exists to close).
Runtime ordering — publish completes before `build_git_response` is
called — is enforced by `finalize_push` being a single sequential async
function. A future integration test once a mockable publish seam exists
would be belt-and-suspenders; left as a follow-up.
Refs: spec/git-on-object-storage @ 0896aff (or successor).
Co-authored-by: Quinn <quinn@users.noreply.sprout>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Adds crates/sprout-relay/src/api/git/cas_publish.rs — the pure async function that turns a post-receive-pack workspace into a durable manifest CAS. Composes Dawn's GitStore primitives (put_pack, put_manifest, get_pointer, put_pointer) and Dawn's Manifest schema (canonical_bytes, validate, Inv_Closed at compose time) into the spec's step 2-7 sequence: read pointer (e, d_before) §step 3 fetch + verify m_before §step 3 + A1 detectability snapshot refs + symref-HEAD from disk (HEAD inherits parent on detach) pack new objects via pack-objects --revs §step 1-2 put_pack(bytes) -> packs/<sha256> §step 2 (A1) compose m_after (parent packs + new pack, parent = digest only) §step 5 m_after.validate() (Sami/Max/Perci #2-#4) put_manifest(canonical_bytes) -> manifests/<sha256> §step 6 put_pointer(IfMatch(e) | IfNoneMatchStar) §step 7 (CAS) Won -> CasSuccess { manifest, manifest_key } Lost -> CasError::Conflict { winner_manifest, winner_manifest_key } (→ HTTP 409, with winner for disk reconcile) The function returns *before* a Response is constructed — it is called from finalize_push, which is the unique site that builds a push 2xx, so the structural seam still enforces Theorem 1 (success-after-CAS). ## Review fixes folded in Sami's review (#1–#6) + Perci's #1 + Max's pre-CAS-validation blocker are all addressed in this commit: - **parent = bare 64-hex digest, not full key** (Perci #1, Max). Pointer body is `<digest>`; `Manifest.parent` stores the same digest, matching `Inv_RefDerivedFromParent` literally. `read_parent` strips the `manifests/` prefix before assigning. Dawn's new `MalformedParent` validator catches any drift at the write seam. - **Pre-CAS validation** (Sami #2, Max). `m_after.validate()?` runs between `compose_after` and `put_manifest`. Unsafe refnames, malformed oids, empty HEAD — all surface as `CasError::ManifestInvalid(...)` (4xx-class) before any S3 write, *not* as "valid CAS, un-clone-able output." Typed variant (not reused `ManifestReadFailed`) so logs / status mapping distinguish "client input rejected" from "stored parent failed A1" (Max + Dawn). - **Detached-HEAD fallback** (Sami #3). `snapshot_workspace_state` returns empty `head` on detached HEAD; `cas_publish` falls back to `parent.head` if non-empty. `validate()` rejects the first-push-+- detached case (Sami #4 — no parent to inherit from, manifest is un-clone-able). - **Conflict carries the winner** (Sami #5 + Dawn). `Conflict { winner_manifest, winner_manifest_key }` lets `finalize_push` invoke Eva's `reconcile_to_manifest` mechanically from the error arm, without a second pointer GET in the caller. `warn!()` at the `LostRace` site logs (pointer, expected etag, attempted manifest) for debugging concurrent-push patterns. Boxed for `clippy::result_large_err`. - **Empty-pack comment** (Sami #6). Clarified `capture_pack` returns `None` in both the delete-all (`refs_after.is_empty()`) and refs-only (`pack-objects` empty stdout) cases. - **`pointer_key` consolidated** in `manifest.rs` (Sami #1, Dawn, Max — Sami's "single source of truth" argument). `cas_publish` imports it; the duplicate definition is gone. - **`validate-invocation` test added** in `cas_publish.rs` (Sami's recommendation). Pins that a future refactor dropping the `validate?` call between `compose_after` and `put_manifest` is caught by unit test, not by every subsequent un-clone-able read. ## What this deliberately does NOT do (each with citation) - No retry on LostRace. Per Sami's TLA-action guidance: the receive-pack output is derived against a now-superseded parent; reusing it would violate Inv_RefDerivedFromParent. Client re-pushes, which re-hydrates + re-runs receive-pack against the advanced pointer — the only safe retry, which git already performs. Spec §Push step 7: 'GOTO 3 (retry) or respond non-ff' — both arms safe; we take non-ff. - No kind:30618 emission. That is derived after CAS — finalize_push calls Sami's build_ref_state_event over m_after.refs / m_after.head on Ok. Spec §Implementation Correspondence: 'kind:30618 is derived after CAS, never the commit.' - No advisory lock. Spec §Push 'no advisory lock in v1' — writer serialization is the CAS. A mutex would hide the contention Inv_NoFork proves safe. ## Tests 10 unit tests pin digest_from_key (manifest/<...> prefix invariant), compose_after (Inv_Closed coverage, sort, dedupe, refs-only-no-new-pack, first-push, parent-is-digest-not-key), validate invocation (unsafe refname + first-push-empty-HEAD both rejected pre-CAS). 244 relay tests green; clippy --tests -D warnings clean. The integration into finalize_push lands separately — Eva owns the AppState::git_store wiring + main.rs startup probe gate. This module is callable today: cas_publish(&store, repo_path, owner, repo, &refs_before) -> Result<CasSuccess, CasError>. Refs: - docs/git-on-object-storage.md §Push step 2-7, §Implementation Correspondence, §Mechanized Verification (Inv_NoFork, Inv_RefEffectApplied, Inv_RefDerivedFromParent, Inv_Closed). Also makes transport::harden_git_env pub(crate) for reuse by cas_publish's two subprocess sites (for-each-ref, pack-objects). Co-authored-by: Tyler Longwell <tyler@block.xyz> Co-authored-by: Quinn <quinn@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Hoists the pointer-load out of cas_publish so the workspace and the CAS predicate come from the same observed pointer state, closing Perci's "build on superseded state" hazard at the type system rather than at runtime. Concretely: - cas_publish::ParentState becomes pub, with a thin from_loaded(etag, digest, manifest) constructor. The pointer-reading helper read_parent is deleted; cas_publish no longer touches the pointer except for the CAS write at step 7. - cas_publish takes &ParentState instead of refs_before. The capture_pack "not" set is now parent_state.parent.refs (the refs the workspace was hydrated from), so the delta covers exactly the new objects, not whatever happened to be on disk. - hydrate.rs factors out two private helpers: load_pointer (pointer → (etag, digest, verified manifest) with fail-closed below-pointer semantics) and materialize_manifest (init bare → fetch+index packs → install refs+HEAD, phase-ordered). hydrate_for_read becomes a three-line wrapper. - New hydrate_for_write(store, owner, repo) -> (HydratedRepo, ParentState): the write-path entry point. Loads pointer + materializes workspace in one round-trip; first-push case returns (git init --bare + ParentState::fresh()), no manifest needed. Caller contract is now structural: cas_publish(.., &parent_state) CAS-es on parent_state.if_match, so a concurrent writer that advances the pointer between hydrate and CAS reliably surfaces as Conflict/HTTP 409. m_after.parent is *literally* the digest of the manifest the workspace was hydrated from — Inv_RefDerivedFromParent holds by construction, not by code review. 65 git tests still pass on relay lib. No transport changes yet — those land next. Co-authored-by: Tyler Longwell <tyler@block.xyz> Co-authored-by: Quinn <quinn@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Rewrites transport.rs to make the manifest-pointer protocol the only path that exists. The legacy persistent-bare-repo path is gone; every request hydrates an ephemeral tempdir from the published state and drops it on scope exit. Per Eva's "transport is yours, end to end" brief. ## What lands **Read path (info_refs, upload_pack):** validate_repo_id → hydrate_for_read → spawn git --stateless-rpc against repo.path() → build response. `Ok(None)` from hydrate (pointer absent) → 404; `Err` (any below-pointer failure: manifest 404, digest mismatch, pack 404, index-pack fail) → 5xx. Closes Max's read-fail-closed blocker — A1 detectability now holds end-to-end on the read side. **Push path (receive_pack):** validate_repo_id → hydrate_for_write → install_hook → run receive-pack against tempdir → finalize_push. PushContext binds the *same* ParentState the workspace was hydrated from, so the CAS predicate in cas_publish is structurally tied to the observed pointer — Inv_RefDerivedFromParent holds by construction, no re-reading of the pointer between hydrate and CAS. **Fence (finalize_push):** the unique constructor of a push 2xx. Calls cas_publish; on Won, derives kind:30618 from the *committed* manifest's refs/head via manifest_event::build_ref_state_event, fans out, *then* builds the response. On Conflict → 409 (tempdir drops on scope exit; no on-disk reconcile needed because there's no persistent disk to drift). On ManifestInvalid → 400 (client/input rejected pre- CAS). On other CasError → 500. **30618 only on real change:** compares parent_digest to the committed manifest digest; a true no-op push (refs/head identical to parent, no new objects) skips 30618 emission entirely. Pointer re-installation on no-op is cheap (one IfMatch PUT of the same value) and was deemed not worth a special-case bypass. ## What's gone - `git_repo_locks` on AppState + the per-repo tokio mutex. **Writer serialization is the CAS** (spec §Push, no advisory lock in v1). The mutex hid the contention Inv_NoFork proves safe and was wrong at multi-instance scale anyway. - `should_publish` predicate + `SnapshotError` + `snapshot_refs` + 6 unit tests on the old fence shape. The new fence is "the structural seam: only finalize_push builds a push 2xx, and only after cas_publish returns Ok." Skip semantics moved into the manifest-changed comparison in finalize_push. - `publish_ref_state` (90 LOC of inline kind:30618 building) — replaced by a 5-line call to Sami's `manifest_event::build_ref_state_event` over the committed manifest. Unit-pinned NIP-34 invariants beat reading refs from disk a second time. - `validate_repo_path` (path canonicalization, repo-root containment). No longer relevant — workspaces are tempdirs, not paths under a repo root. Owner/repo *name* validation stays as `validate_repo_id`; it's still load-bearing because owner/repo become object-store key components via `manifest::pointer_key`. - `run_git_service` + `run_git_subprocess` (`(owner, repo)`-coupled) replaced by single `run_git_at(path, service, body, extra_env)` taking a path directly. One shared subprocess runner for all three handlers; the read paths use it for `--advertise-refs`, push uses it for `--stateless-rpc receive-pack`. ## Caller contract preserved - Pre-receive hook contract is unchanged: install_hook writes the same script; hook env (SPROUT_HOOK_URL/SECRET/REPO_*/PUSHER_PUBKEY + GIT_CONFIG_COUNT/KEY/VALUE override of core.hooksPath) is the same per-push env vec, just keyed to the tempdir's hooks/ instead of a persistent path. Authz + FF detection is byte-identical to before. - Global git_semaphore stays — bounds in-flight subprocesses, distinct from the per-repo serialization the dropped mutex was conflating itself with. ## Tests 238 relay lib tests green (-6 from the dropped should_publish tests). clippy --tests -D warnings clean. fmt clean. Relay binary builds. sprout-test-client e2e_git compiles (Mari's live tests, ported via Mari's cherry-picked commit). The full live e2e roundtrip needs a running relay + MinIO; that's Eva's e2e run on the assembled branch per the convergence plan. Refs: - docs/git-on-object-storage.md §Push step 2–8, §Read, §Implementation Correspondence, §Mechanized Verification (Inv_NoFork, Inv_RefEffectApplied, Inv_RefDerivedFromParent, Inv_Closed). Co-authored-by: Tyler Longwell <tyler@block.xyz> Co-authored-by: Quinn <quinn@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The architectural split Eva's live e2e caught: 30617 created the on-disk bare repo but no S3 pointer, while the read path (info_refs/upload_pack) now hydrates from the pointer. Pointer-absent ⇒ Ok(None) ⇒ 404 by Max's intentional fail-closed, so announce-then-clone returned 404 for every freshly-announced repo. Fix: establish the invariant "repo announced ⟺ pointer exists" by seeding an empty-manifest pointer at the end of `handle_git_repo_ announcement`. After seed: - `info_refs` stays strictly fail-closed: pointer-absent means never-announced, not "announced but no pushes." Max's blocker preserved exactly. - `live_hydrate_empty_repo` proves clone of an empty hydrated repo works. - First push CASes the seeded pointer via `IfMatch(etag)`, no special-case branch. The `IfNoneMatchStar` arm of `cas_publish` becomes dead code for announced repos. Two new private helpers in `side_effects.rs`: - `seed_manifest_pointer(state, owner, repo)` — `put_manifest(empty)` + `put_pointer(IfNoneMatchStar)`. **Idempotency is constructive, not trusted**: a `CasOutcome::LostRace` is treated as success ONLY if the existing pointer body matches the empty manifest's digest. Any other value (stale from a prior lifecycle, real misconfiguration) surfaces as `anyhow!` — Max's guardrail #1. - `emit_initial_ref_state(state, owner, repo)` — kind:30618 over the seeded empty manifest using Sami's `build_ref_state_event`. Pointer is the commit; this event is the derived "repo exists, empty" signal — Max's guardrail #2. Non-fatal on failure: manifest is truth, 30618 is just notification. Empty manifests across all repos share canonical bytes (deterministic serialization by construction) ⟹ same digest ⟹ `put_manifest` is idempotent at the store level. One blob, many pointers. Pinned by `empty_manifest_validates` test in manifest.rs with byte-stable canonical bytes — locks the digest so future serde version bumps don't silently shift it. Rollback on seed failure: remove the on-disk repo + the name reservation, same pattern as hook-install failure. A successful seed but failed emit leaves the pointer in place (correctly — it's the source of truth). 239/239 relay tests (was 238 + 1 new), clippy + fmt clean. Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
de17a77 to
37dad04
Compare
…spondence refresh §Scope-and-Non-Goals gains a 'v1 deployment architecture' subsection: - Stateless per-request hydration as the v1 read+write path (hydrate_for_read / hydrate_for_write). Multi-instance-ready by construction; nothing on local disk needs cross-instance coordination. - The accepted v1 tradeoff: under concurrent same-repo pushes, every contender hydrates and runs receive-pack, and CAS losers' subprocess work is discarded. Wasted CPU/IO under contention, not a correctness bug — Inv_NoFork holds because the CAS is the only writer serialization. - Bounded retry on classified-terminal-vs-transport errors is parked, not closed. The checked-in e2e runs an 8-way live CAS race against MinIO with no retry layer (the regression fence); a local 16-way run confirms the property holds at that width too (calibration evidence, not the CI default). v1 ships without retry. The open question — 'is the no-retry default safe past MinIO and ≤16-way?' — re-opens on a different backend or sustained load. Non-negotiable rule: retry, if added, retries only pre-classification network errors; never Ok(2xx), LostRace(412), or NotFound(404). §Implementation Correspondence refreshed to match PR #726's tip. The 'fallible snapshot' / 'conditional fence' / 'should_publish' bullets and the pre-S3 line-number table predated the S3-CAS implementation and were actively stale; replaced with four current obligations: - Unique constructor seam (unchanged: finalize_push is still the sole push 2xx constructor). - Parent observed once: hydrate_for_write returns (HydratedRepo, ParentState); cas_publish predicates the CAS on the same ETag the workspace was hydrated against, never re-reads. Rust analogue of Inv_RefDerivedFromParent. - 412 → 409 terminal: CasError::Conflict is typed and distinct from Backend(StoreError); ?-bubbling cannot turn 412 into 500; no in-handler retry. Closes the 'future work' gap the previous version named. - kind:30618 derived after CAS: emission gated on manifest digest change; event built from CasSuccess.manifest (the values that physically landed); relay-signed; non-fatal on insert error. - No advisory lock: writer serialization is the CAS; the legacy per-repo mutex is incompatible with the multi-instance architecture and is gone. Spec-element table updated to current file locations across manifest.rs / store.rs / hydrate.rs / cas_publish.rs / manifest_event.rs / transport.rs (line numbers pinned at landing time; reviewers checking after later refactors should use symbol search, not line counts). 239 relay lib tests + live MinIO e2e (roundtrip + N-way concurrent-push no-fork) green on PR #726's tip. Sole remaining named follow-up: behavioral integration test for runtime ordering (publish-before-response), pending a mockable-CAS seam. The no-fork claim is empirically gated by the live N-way e2e. Co-authored-by: Sami <sami@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The startup A3 conformance gate races race_width parallel CAS updates against the object store (default 32). The Desktop E2E job's MinIO couldn't service 32 simultaneous fresh connections at boot — one racer hit a reqwest send error (transport, not a conformance violation) and the fail-closed probe aborted relay startup. 8-way still exercises all four probe phases (sequential / if_match_race / if_none_match_race / etag_consistency) and proves A1/A2/A3; it just doesn't burst the connection pool. Matches the dev default. Follow-up (separate commit): the probe should classify transport errors (reqwest/http/io send failures) distinctly from conformance signals and bounded-retry them, so a saturated backend never gates startup at any width. Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The PR's thesis is "no persistent per-repo disk state — runtime reads and writes hydrate an ephemeral bare repo from object storage per request." But handle_git_repo_announcement still wrote one: git init --bare + four git config calls + pre-receive hook install at git_repo_path/<owner>/<repo>.git/. Nothing in the runtime ever read it — info_refs / upload_pack / receive_pack all go through hydrate_for_read / hydrate_for_write into a tempfile::TempDir and do their own init + hook per request. The on-disk repo was dead state that contradicted the architecture it shipped under. It wasn't a clean delete: repo_dir.exists() was the "already announced?" check and the owner subdir's entry-count was the per-pubkey quota, so removing the bare repo naively would have broken both. Collapse both onto the .names/ registry, which already existed for name uniqueness: .names/<repo_id>/owner records the announcer. One mechanism now serves three jobs — atomic-create uniqueness, same-owner idempotent re-announce, and quota-by-owner. The path-canonicalization block (only there to guard the now-gone repo_dir) and the tokio::process::Command import go with it. git_repo_path now holds only the name index; doc comments on the handler, the dispatch arm, and the config field updated to say so. The one local-disk simplification (separate instances each grant a name; the CAS pointer, not this registry, prevents ref-state corruption) is called out in-code as the multi-instance follow-up. Net -114 LOC. Verified: full cargo test -p sprout-relay (239 passed), clippy -D warnings clean, fmt clean, and the live MinIO e2e (git_clone_push_fetch_force_roundtrip + concurrent-push no-fork race) green against a relay running this binary — with zero bare repos created on disk, only .names/<repo>/owner reservations. Co-authored-by: Eva <eva@users.noreply.sprout> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
tlongwell-block
pushed a commit
that referenced
this pull request
May 22, 2026
…robe races
The probe was conflating transport failures with conformance failures.
`if_match_race` / `if_none_match_race` did `Err(e) → ProbeFailure` for
any per-racer error, but `S3Error::Reqwest/Http/Io` means the racer
never got a classified response from the backend — its outcome is
*unknown*, not negative. A3 is a claim about **observers** ("exactly
one CAS wins per (key, etag)"); an unknown racer isn't an observer.
Treating one as a probe failure made the relay un-bootable on any
constrained host that couldn't service the burst (Eva caught this on
Desktop E2E at 32-way against a CI-box MinIO).
Fix is drop-and-floor:
- Pre-classification transport errors (`S3Error::{Reqwest, Http, Io}`)
drop the racer from the observer set and increment `transport_drops`.
- Post-receipt parse/decode errors (`Utf8`, `ReqwestHeaderToStr`,
`SerdeXml`, `FromUtf8`, `InvalidHeaderValue`) and any
`HttpFailWithBody` stay in the catch-all — the backend *did* answer,
but not in the contract shape, which is a real conformance signal.
- Per-round floor: `classified >= 2` (A3 needs ≥2 observers to *see* a
race; with 0 classified the probe didn't run at all — fail closed).
- `winners == 1` among classified observers (was: among all racers).
- Create-only race contract is now `twos == 1 && twelves == classified
- 1` (was: fixed `race_width - 1`, which false-positived on any
transport drop).
`ProbeReport` gains `transport_drops: usize` (total across both race
phases), surfaced in the startup admission log line — a slowly-
degrading backend shows up as "admitted with degraded observation
count" before it's a probe failure.
The shape is strictly smaller than the alternatives the team
considered: no per-racer or per-round retry, no retry policy
parameters, no startup-latency cost under flake. Crucially it doesn't
smuggle a network-stack test into a correctness probe — retrying a
classified outcome would change the TLA action and violate the same
"retry only pre-classification errors" rule Sami's `cas_publish`
guidance enforces.
Eva's runtime workaround (`SPROUT_GIT_PROBE_WRITERS=8` on Desktop E2E,
already shipped on 61cc38f) stays as a deployment-level knob, but the
probe is no longer fragile to it.
Diff scope: `store.rs` (the rewrite) + one log-line touch in `main.rs`
(folding `transport_drops` into the admission log). 239 relay lib
tests still green, clippy `--tests -D warnings` clean, fmt clean.
Refs:
- Channel discussion: PR #726 thread, shape (c) drop-and-floor
agreed by Quinn + Max + Eva (the latter accepting the framing flip
from her original (a)/(b) retry proposals).
- docs/git-on-object-storage.md §Conformance, §Implementation
Correspondence (retry rule: "if added, lives in the store layer and
retries *only* pre-classification network errors").
Co-authored-by: Tyler Longwell <tyler@block.xyz>
(cherry picked from commit b55f9c2)
…robe races
The probe was conflating transport failures with conformance failures.
`if_match_race` / `if_none_match_race` did `Err(e) → ProbeFailure` for
any per-racer error, but `S3Error::Reqwest/Http/Io` means the racer
never got a classified response from the backend — its outcome is
*unknown*, not negative. A3 is a claim about **observers** ("exactly
one CAS wins per (key, etag)"); an unknown racer isn't an observer.
Treating one as a probe failure made the relay un-bootable on any
constrained host that couldn't service the burst (Eva caught this on
Desktop E2E at 32-way against a CI-box MinIO).
Fix is drop-and-floor:
- Pre-classification transport errors (`S3Error::{Reqwest, Http, Io}`)
drop the racer from the observer set and increment `transport_drops`.
- Post-receipt parse/decode errors (`Utf8`, `ReqwestHeaderToStr`,
`SerdeXml`, `FromUtf8`, `InvalidHeaderValue`) and any
`HttpFailWithBody` stay in the catch-all — the backend *did* answer,
but not in the contract shape, which is a real conformance signal.
- Per-round floor: `classified >= 2` (A3 needs ≥2 observers to *see* a
race; with 0 classified the probe didn't run at all — fail closed).
- `winners == 1` among classified observers (was: among all racers).
- Create-only race contract is now `twos == 1 && twelves == classified
- 1` (was: fixed `race_width - 1`, which false-positived on any
transport drop).
`ProbeReport` gains `transport_drops: usize` (total across both race
phases), surfaced in the startup admission log line — a slowly-
degrading backend shows up as "admitted with degraded observation
count" before it's a probe failure.
The shape is strictly smaller than the alternatives the team
considered: no per-racer or per-round retry, no retry policy
parameters, no startup-latency cost under flake. Crucially it doesn't
smuggle a network-stack test into a correctness probe — retrying a
classified outcome would change the TLA action and violate the same
"retry only pre-classification errors" rule Sami's `cas_publish`
guidance enforces.
Eva's runtime workaround (`SPROUT_GIT_PROBE_WRITERS=8` on Desktop E2E,
already shipped on 61cc38f) stays as a deployment-level knob, but the
probe is no longer fragile to it.
Diff scope: `store.rs` (the rewrite) + one log-line touch in `main.rs`
(folding `transport_drops` into the admission log). 239 relay lib
tests still green, clippy `--tests -D warnings` clean, fmt clean.
Refs:
- Channel discussion: PR #726 thread, shape (c) drop-and-floor
agreed by Quinn + Max + Eva (the latter accepting the framing flip
from her original (a)/(b) retry proposals).
- docs/git-on-object-storage.md §Conformance, §Implementation
Correspondence (retry rule: "if added, lives in the store layer and
retries *only* pre-classification network errors").
Co-authored-by: Quinn <quinn@users.noreply.sprout>
Co-authored-by: Tyler Longwell <tyler@block.xyz>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
b2ff4c0 to
e6a285d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Git on Sprout, S3-backed
Git push / fetch / clone over Sprout's relay with S3 (object storage) as the source of truth — no per-repo persistent disk state, multi-instance ready by construction.
Architecture
Manifest { head, refs, packs, parent }. A single S3 pointer per repo names the live manifest digest.ParentState), runs receive-pack against a workspace hydrated from that exact manifest, composesm_after, and commits with a conditional pointer write (If-Match: etag). Concurrent writers lose the CAS →409, never corruption. The "build ond_old, publish againstd_new" hazard is closed by type —ParentStateis the only way intocas_publish.Responseis constructible without aPushContextthroughfinalize_push; every error arm returns non-2xx before any 2xx is built.Verification (tip
de17a77a)cargo test -p sprout-relay→ 239 passed, 0 failed--ignored):git_clone_push_fetch_force_roundtrip✓ andgit_concurrent_push_one_wins_and_repo_recovers✓ — the N-way race: one writer wins, N−1 clean 409s, repo recovers, no corruption (verified in relay logs: all losers resolve the same winner manifest)--tests+ fmt clean; pre-push CI gate (rust-tests / rust-clippy / desktop-tauri-check / mobile-test) greenFiles
cas_publish.rs— §Push step 2–7 commit point + CAS ·hydrate.rs— stateless read/write hydration ·manifest.rs/manifest_event.rs— canonical manifest + 30618 builder ·store.rs— S3 object-store + pointer CAS ·transport.rs— info-refs / upload-pack / receive-pack on hydrate+CAS ·side_effects.rs— seed-pointer-on-announce ·e2e_git.rs— live git e2eFollow-up before merge: design-doc patch (stateless-hydration framing) from Sami; final team 9/10 sign-off.
🤖 Eva (Sprout) · co-authored by Quinn, Dawn, Sami, Max, Perci, Mari