sandbox-image: bridge CLI to sandbox-proxy sign_blob #698
Open
sgirones wants to merge 10 commits into
Adds a CLI bridge so `build_sandbox_image` works against both the legacy platform-api response (embedded pre-signed `upload` block) and the new versioned-response shape (`snapshotRelPath` only). On the new path the CLI calls the sandbox-proxy `POST /api/v1/blob/sign` endpoint and splices the returned upload spec into the raw prepared spec before handing spec.json to the in-sandbox rootfs builder. The branch key (`snapshot_rel_path`) is the only field added to the typed `PreparedSandboxTemplateBuild`. Everything else — including the `upload` block from either path — stays opaque inside the raw passthrough `Value`, preserving the property that future fields added to the platform-api ↔ in-sandbox-builder contract don't require an SDK release. Always multipart on the new path with 100 MB parts, clamped to ≥ 1 and saturated at u32::MAX; size hint reuses the existing `rootfs_disk_bytes` precedence (explicit --disk_mb → parent's rootfsDiskBytes for diff builds → default). Bindings (Python, Node) are unchanged — they only see the final registered-template JSON. Co-authored-by: Cursor <cursoragent@cursor.com>
Platform-api is moving the snapshot location off `snapshotUri` and onto `snapshotRelPath` (the rel-path then gets resolved client-side via `SandboxProxyClient::sign_blob`). Stop requiring `snapshotUri` on the prepared-spec response so the CLI keeps deserializing once platform-api drops the field. The completion path now prefers the in-sandbox builder's metadata.json for the final URI (it always knows where it landed the upload), falls back to the prepared value for the legacy path, and errors clearly if neither source provides one — instead of POSTing an empty string to platform-api's complete endpoint. Co-authored-by: Cursor <cursoragent@cursor.com>
`pick_upload_op` always returned `MultipartPut` — it "picked" nothing. The whole helper, plus `disk_mb_for_upload`, plus the four boundary tests, were just wrapping a one-line part-count computation around the sole call site in `build_sandbox_image`. Inline it. The splice now reuses the `rootfs_disk_bytes` value already computed just upstream for builder sizing, so we don't recompute the same precedence (explicit --disk_mb → parent rootfsDiskBytes for diff → default). `MULTIPART_PART_SIZE_MB` stays as the one tunable, and the clamp / saturation rationale moves into the comment at the call site. Net -42 lines. Co-authored-by: Cursor <cursoragent@cursor.com>
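The `rootfs_disk_bytes` precedence mentioned in this commit (explicit `--disk_mb`, then the parent's `rootfsDiskBytes` for diff builds, then a default) can be sketched as below. This is an illustration only: the function signature, the `DEFAULT_ROOTFS_DISK_BYTES` value, and the MB-to-bytes conversion factor are assumptions, not taken from the PR.

```rust
/// Hypothetical default budget; the real default is not stated in this PR.
const DEFAULT_ROOTFS_DISK_BYTES: u64 = 10 * 1024 * 1024 * 1024;

/// Resolve the rootfs disk budget in bytes.
/// Precedence: explicit --disk_mb, then the parent's rootfsDiskBytes
/// (diff builds only), then the default.
fn rootfs_disk_bytes(
    explicit_disk_mb: Option<u64>,
    parent_rootfs_disk_bytes: Option<u64>,
    is_diff_build: bool,
) -> u64 {
    if let Some(mb) = explicit_disk_mb {
        // Assumption: 1 MB is treated as 1024 * 1024 bytes here.
        return mb.saturating_mul(1024 * 1024);
    }
    if is_diff_build {
        if let Some(bytes) = parent_rootfs_disk_bytes {
            return bytes;
        }
    }
    DEFAULT_ROOTFS_DISK_BYTES
}
```

Computing this value once and passing it to both the builder sizing and the upload splice is what lets the helper functions be removed.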
Drop `#[serde(rename_all = "camelCase")]` so `rel_path` goes on the wire as `rel_path` to match the sandbox-proxy's expected payload shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the part size from 100 MiB to 64 MiB and cap the requested part count at S3's 10,000-part limit so absurd disk budgets don't ask the proxy to mint an invalid multipart op. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend SignBlobRequest to accept either a rel_path or a full uri and add a SingleGet BlobOp so the proxy can presign downloads. When a prepared spec includes a parent, fetch a signed download for the parent manifest URI and inject it into the prepared spec.
Cross-reference MAX_MULTIPART_PARTS in the dataplane's sign_blob endpoint so a future change to either side flags the other.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLI side of the versioned-response rollout for sandbox-image builds.
Platform-api is moving away from returning a pre-signed `upload` block from `prepare-rootfs-build` in favor of `snapshotRelPath`. When the CLI sees the new shape it calls sandbox-proxy's new `POST /api/v1/blob/sign` and splices the result back in as `upload` before handing the spec to the in-sandbox builder. The legacy shape still works unchanged so we can roll this out without lockstep deploys.
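A rough sketch of the shape branch described above, using only the standard library. The enum, the function name, and the error case for a response that carries neither field are assumptions for illustration; the real CLI keeps `upload` inside an opaque JSON value rather than a boolean flag.

```rust
/// Where the `upload` block handed to the in-sandbox builder comes from.
#[derive(Debug, PartialEq)]
enum UploadSource {
    /// Legacy shape: platform-api already embedded a pre-signed `upload` block.
    Embedded,
    /// New shape: only `snapshotRelPath` is present, so the CLI asks
    /// sandbox-proxy (`POST /api/v1/blob/sign`) to mint the upload spec.
    SignViaProxy { rel_path: String },
}

/// `snapshot_rel_path` is the branch key: when present, the new path wins.
fn pick_upload_source(
    has_embedded_upload: bool,
    snapshot_rel_path: Option<&str>,
) -> Result<UploadSource, String> {
    match snapshot_rel_path {
        Some(rel) if !rel.is_empty() => Ok(UploadSource::SignViaProxy {
            rel_path: rel.to_string(),
        }),
        _ if has_embedded_upload => Ok(UploadSource::Embedded),
        // Assumption: a response with neither field is treated as an error.
        _ => Err("prepared spec has neither `upload` nor `snapshotRelPath`".to_string()),
    }
}
```

Because both shapes are accepted, platform-api and the CLI can be deployed in either order.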
`snapshotUri` is also relaxed to optional on the prepared spec for the same forward-compat reason; the completion path now resolves it from the builder's `metadata.json` first, the prepared spec second, and errors if neither has it.
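The resolution order for the final snapshot URI could look roughly like this. The function name and the string error type are hypothetical, and treating an empty string the same as a missing value is an assumption.

```rust
/// Resolve the final snapshot URI for the completion call.
/// Order: the builder's metadata.json, then the prepared spec, else an error.
fn resolve_snapshot_uri(
    from_builder_metadata: Option<&str>,
    from_prepared_spec: Option<&str>,
) -> Result<String, String> {
    from_builder_metadata
        // Assumption: an empty string counts as "not provided".
        .filter(|uri| !uri.is_empty())
        .or(from_prepared_spec.filter(|uri| !uri.is_empty()))
        .map(str::to_owned)
        .ok_or_else(|| {
            "no snapshot URI in builder metadata.json or prepared spec".to_string()
        })
}
```

Failing here keeps an empty `snapshotUri` from ever being sent to the complete endpoint.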
Notes for review
- `upload` block stays opaque JSON in the passthrough `Value`, deliberately not typed, so future fields in the platform-api ↔ in-sandbox-builder contract don't force an SDK release.
- […] not something this PR decides.
- `SinglePut` is still on the wire as a variant.
- Bindings (Python, Node) are unchanged: they only see the final registered-template JSON, and `#[serde(default)]` on the new/relaxed fields keeps them compiling.
Test plan
- […] `upload`).
- […] `sign_blob`.

Feature: dataplane presigned URLs for image builder
This PR is part of a three-repo feature. The same explanation is appended to all three so reviewers can pick up cold from any of them.
Related PRs
- `platform-api` (https://github.com/tensorlakeai/platform-api/pull/530): `/prepare` exposes `snapshotRelPath`; `/complete` accepts dataplane-signed URIs.
- `compute-engine-internal` (https://github.com/tensorlakeai/compute-engine-internal/pull/986): `POST /api/v1/blob/sign` on the dataplane HTTP proxy.
- `tensorlake` (CLI / SDK) (sandbox-image: bridge CLI to sandbox-proxy sign_blob #698): CLI bridges `/prepare` → `sign_blob` → in-sandbox builder.

Why
Move S3 URL signing for sandbox-template rootfs builds out of `platform-api` and into the regional dataplane. The dataplane already owns blob-store credentials for its region; this removes the last piece of S3 from `platform-api` and lets each region sign against its own bucket.

Components
- `platform-api`: orchestrates the build, owns the rel-path namespace (`projects/{project}/sandbox-template-builds/{build}/{snapshot}.tlsnap`).
- `compute-engine-internal` (dataplane): exposes `POST /api/v1/blob/sign` on the HTTP proxy; composes the bucket URI from its own config + the caller's authenticated namespace, then signs.
- `tensorlake` (CLI / SDK): bridges the two by calling `/prepare`, asking the sandbox proxy to mint URLs, and running the in-sandbox builder.

Old flow (pre-signed at `/prepare`)
`platform-api` over-provisioned ~3× (≈ 528 parts default, up to 10 000) because it didn't see the CLI's `--disk` value.

New flow (signed at the dataplane)
The CLI now sizes the part count from the actual rootfs disk budget (typically a few hundred 64 MiB parts), so the dataplane cap matches S3's 10 000-part ceiling without over-allocating.
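A minimal sketch of that part-count sizing, assuming the count is the disk budget divided by the part size, rounded up. `MULTIPART_PART_SIZE_MB` and `MAX_MULTIPART_PARTS` are names used in this PR; the function name and the round-up choice are assumptions.

```rust
/// Part size in MiB (the one tunable named in the PR).
const MULTIPART_PART_SIZE_MB: u64 = 64;
/// S3's per-upload part limit, mirrored on the dataplane side.
const MAX_MULTIPART_PARTS: u64 = 10_000;

/// Number of multipart parts to request for a given rootfs disk budget.
/// Clamped to at least 1 part and at most MAX_MULTIPART_PARTS.
fn multipart_part_count(rootfs_disk_bytes: u64) -> u32 {
    let part_size_bytes = MULTIPART_PART_SIZE_MB * 1024 * 1024;
    // Ceiling division without overflow: add one part for any remainder.
    let parts = rootfs_disk_bytes / part_size_bytes
        + u64::from(rootfs_disk_bytes % part_size_bytes != 0);
    // The clamp guarantees the value fits in u32, so the cast is lossless.
    parts.clamp(1, MAX_MULTIPART_PARTS) as u32
}
```

Under these assumptions a 10 GiB budget asks for 160 parts, and any budget above 10 000 × 64 MiB (625 GiB) is capped at 10 000.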
Wire contract
- The CLI treats the `/api/v1/blob/sign` response as opaque `serde_json::Value` and splices it verbatim into `spec.upload` / `spec.parent.download`. The `platform-api` ↔ in-sandbox-builder contract stays unchanged.
- `MAX_MULTIPART_PARTS = 10_000` is enforced on both sides (`indexify/crates/dataplane/src/sign_blob.rs` and `tensorlake/crates/cloud-sdk/src/sandbox_images.rs`); keep these in sync.

Security posture
- For `rel_path` signing, the dataplane substitutes the caller's authenticated namespace into the prefix; clients cannot sign URLs for another project's prefix.
- `uri` signing is `SingleGet`-only, used for parent snapshots which may live outside the caller's prefix. The dataplane will sign any URI its IAM identity can read; treat parent URIs as effectively public.
- `X-Tensorlake-Sandbox-Id` is set (overwriting, not appending) by `sandbox-proxy` after authn. The dataplane HTTP proxy must not be directly reachable from inside sandboxes, or the header can be spoofed.
- `/complete` still trusts a CLI-declared `snapshotUri` as long as its suffix matches the reconstructed rel-path. Tracked as phase-b follow-up: allowlist bucket origins against the dataplane fleet or have the dataplane attest completion.
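To make the last point concrete, here is a sketch of a suffix-only check of the kind described for `/complete`. The rel-path layout comes from the Components section; the function names, the bucket names, and the use of a plain `ends_with` comparison are assumptions, not the actual platform-api code.

```rust
/// Rebuild the rel-path platform-api expects for a build, following the
/// `projects/{project}/sandbox-template-builds/{build}/{snapshot}.tlsnap` layout.
fn snapshot_rel_path(project: &str, build: &str, snapshot: &str) -> String {
    format!("projects/{project}/sandbox-template-builds/{build}/{snapshot}.tlsnap")
}

/// Suffix-only acceptance check (assumed shape of the current behavior).
/// It validates the path but says nothing about which bucket the URI is in.
fn suffix_matches(declared_snapshot_uri: &str, expected_rel_path: &str) -> bool {
    declared_snapshot_uri.ends_with(expected_rel_path)
}

/// Demonstrates the gap: a URI in an unrelated bucket passes the same check
/// as a URI in the regional bucket, as long as the path suffix lines up.
fn foreign_bucket_is_accepted() -> bool {
    let rel = snapshot_rel_path("p1", "b1", "s1");
    let foreign = format!("s3://some-other-bucket/{rel}");
    suffix_matches(&foreign, &rel)
}
```

This is why the follow-up proposes allowlisting bucket origins or having the dataplane attest completion: the suffix alone does not tie the URI to a dataplane-owned bucket.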