filer: switch workspace upload from import-file to /workspace/import #5165
Draft
shreyas-goenka wants to merge 6 commits into main from
Conversation
Replace `POST /api/2.0/workspace-files/import-file/{path}?overwrite=…`
with the multipart variant of `POST /api/2.0/workspace/import` (via the
SDK's `Workspace.Upload` + `format=AUTO`). The legacy endpoint is being
deprecated and the new endpoint is the strategic replacement.
The multipart variant is required because the JSON body of /workspace/import
is server-capped at 10 MiB; multipart accepts the same sizes import-file did
(verified up to 250 MiB against a real workspace), so DAB users shipping
wheels/jars/large files keep working.
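For orientation, a minimal sketch of the new call path, assuming the Go SDK's `Workspace.Upload` helper with its `UploadFormat` and `UploadOverwrite` options; the local wheel and workspace path are illustrative, not taken from this PR:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/databricks/databricks-sdk-go"
	"github.com/databricks/databricks-sdk-go/service/workspace"
)

func main() {
	ctx := context.Background()
	w, err := databricks.NewWorkspaceClient() // auth from env / profile
	if err != nil {
		log.Fatal(err)
	}

	// Illustrative local artifact; any io.Reader works.
	f, err := os.Open("dist/my_lib-0.1-py3-none-any.whl")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Workspace.Upload drives the multipart variant of
	// POST /api/2.0/workspace/import, so the 10 MiB JSON-body cap
	// does not apply to the uploaded content.
	err = w.Workspace.Upload(ctx,
		"/Users/someone@example.com/artifacts/my_lib-0.1-py3-none-any.whl",
		f,
		workspace.UploadFormat(workspace.ImportFormatAuto),
		workspace.UploadOverwrite(),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```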
Error mapping uses SDK sentinels via errors.Is rather than raw status/error
code comparisons:
- ErrNotFound → noSuchDirectoryError (or mkdir+retry under
CreateParentDirectories mode); also catches RESOURCE_DOES_NOT_EXIST.
- ErrResourceAlreadyExists → fileAlreadyExistsError (the new endpoint
reliably sets error_code RESOURCE_ALREADY_EXISTS).
- ErrInvalidParameterValue + "Requested node type" message →
fileAlreadyExistsError (existing object's node type doesn't match the
upload — file vs notebook collision).
- ErrPermissionDenied → permissionError.
Apply the same sentinel-based pattern to Delete, ReadDir, and Stat for
404 detection, matching the existing usage in bundle/direct/util.go and
following AGENTS.md's rule against branching on err.Error() string content.
DIRECTORY_NOT_EMPTY in Delete keeps an explicit ErrorCode check since the
SDK has no sentinel for it.
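A condensed sketch of that mapping, assuming the SDK's `apierr` sentinels; the error types here are stand-ins for the real ones in libs/filer:

```go
package sketch

import (
	"errors"
	"fmt"
	"strings"

	"github.com/databricks/databricks-sdk-go/apierr"
)

// Stand-ins for the filer's real error types in libs/filer.
type noSuchDirectoryError struct{ path string }

func (e noSuchDirectoryError) Error() string { return fmt.Sprintf("no such directory: %s", e.path) }

type fileAlreadyExistsError struct{ path string }

func (e fileAlreadyExistsError) Error() string { return fmt.Sprintf("file already exists: %s", e.path) }

type permissionError struct{ path string }

func (e permissionError) Error() string { return fmt.Sprintf("access denied: %s", e.path) }

func mapImportError(absPath string, err error) error {
	var aerr *apierr.APIError
	switch {
	case errors.Is(err, apierr.ErrNotFound):
		// Missing parent directory; under CreateParentDirectories mode
		// the caller does mkdir and retries instead of returning this.
		return noSuchDirectoryError{absPath}
	case errors.Is(err, apierr.ErrResourceAlreadyExists):
		return fileAlreadyExistsError{absPath}
	case errors.Is(err, apierr.ErrInvalidParameterValue) &&
		errors.As(err, &aerr) && strings.Contains(aerr.Message, "Requested node type"):
		// Existing object's node type doesn't match the upload
		// (file vs notebook collision).
		return fileAlreadyExistsError{absPath}
	case errors.Is(err, apierr.ErrPermissionDenied):
		return permissionError{absPath}
	default:
		return err
	}
}
```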
Test plan:
- libs/filer/workspace_files_client_test.go covers the new error mapping.
- libs/testserver/handlers.go extended with a multipart handler at
/workspace/import that surfaces 409s from the shared fake as 400 +
RESOURCE_ALREADY_EXISTS to match the real-workspace shape (first sketch
after this list).
- acceptance/internal/prepare_server.go normalizes multipart bodies
(sorted form fields, file parts recorded as {filename, size}) so
request fixtures stay deterministic (second sketch after this list).
- ~70 acceptance fixtures regenerated for the new request shape.
- End-to-end verified against a real workspace for files, all notebook
types (.py / .sql / .ipynb / .lvdash.json / .scala / .r), dashboards,
and a 60 MB binary upload.
- Integration test TestFilerWorkspaceNotebook assertion updated to assert
the path with extension (tc.name) — matches the absPath returned in
fileAlreadyExistsError. Same change as #5106.
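For the handlers.go item, a hypothetical sketch of how a 409 from the shared fake could be rewritten into the 400 + RESOURCE_ALREADY_EXISTS shape a real workspace returns; the function name is illustrative:

```go
package sketch

import (
	"encoding/json"
	"net/http"
)

// writeAlreadyExists is illustrative, not the actual handler: the shared
// fake reports an existing path as 409, but the real multipart import
// answers 400 with error_code RESOURCE_ALREADY_EXISTS, so the test
// handler rewrites the response into that shape.
func writeAlreadyExists(w http.ResponseWriter, path string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusBadRequest)
	json.NewEncoder(w).Encode(map[string]string{
		"error_code": "RESOURCE_ALREADY_EXISTS",
		"message":    path + " already exists. Please pass overwrite=true to overwrite it.",
	})
}
```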
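And for the prepare_server.go item, a sketch of multipart normalization under the assumption that recorded bodies are re-encoded as JSON (map keys then serialize sorted); the function name is hypothetical:

```go
package sketch

import (
	"bytes"
	"io"
	"mime"
	"mime/multipart"
)

// normalizeMultipart parses a recorded multipart body into a map: plain
// form fields keep their value, file parts collapse to {filename, size}.
// JSON-encoding the map later sorts the field names, so fixtures stay
// deterministic across runs.
func normalizeMultipart(contentType string, body []byte) (map[string]any, error) {
	_, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	form := map[string]any{}
	mr := multipart.NewReader(bytes.NewReader(body), params["boundary"])
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(part)
		if err != nil {
			return nil, err
		}
		if name := part.FileName(); name != "" {
			form[part.FormName()] = map[string]any{"filename": name, "size": len(data)}
		} else {
			form[part.FormName()] = string(data)
		}
	}
	return form, nil
}
```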
Co-authored-by: Isaac
Force-pushed 0a9923f to e86aeba
…ted goldens

Add a focused acceptance test (acceptance/bundle/sync/upload-edge-cases) that exercises the multipart upload pipeline with the inputs that differ in shape between the legacy import-file and the new /workspace/import endpoints:
- A 12 MiB binary file (above the JSON-body 10 MiB cap that the multipart variant lifts).
- An empty file (multipart encodes an empty content part, distinct from JSON's empty string).
- A Python notebook (auto-detected as NOTEBOOK; testserver mirrors the extension stripping).
- A .lvdash.json dashboard descriptor (a real workspace assigns DASHBOARD; testserver records the upload-side request shape).
- A non-ASCII filename (héllo.txt — multipart encodes filenames with quoted-string rules).
- A filename with a space.

Assertions inspect out.requests.txt and pin the set of multipart_form.path values and that every upload sets format=AUTO.

Also restore 11 golden out.requests.txt files that the previous regenerate sweep accidentally deleted (templates/telemetry/*, run/inline-script/**, run/scripts/**, resources/volumes/change-schema-name) — they were present on main and silently disappeared during the rebase that brought their goldens together with the removal patch.

Co-authored-by: Isaac
Force-pushed 4db3fb6 to 6907378
Force-pushed f08d69a to 37a1400
… writes)
The /workspace/import endpoint reports an existing path with two different
error codes depending on contention:
- 400 RESOURCE_ALREADY_EXISTS: sequential conflict (overwrite=false on existing path)
Message: "<path> already exists. Please pass overwrite=true to overwrite it."
- 409 ALREADY_EXISTS: concurrent contention (e.g. multiple deploy locks racing)
Message: "Node with name <path> already exists. Please pass overwrite=true to update it."
Verified by direct probe against aws-prod and aws-prod-ucws — sequential
conflicts produce the first shape, but a 5-goroutine race on the same path
produces the second shape with status 409 and error_code ALREADY_EXISTS.
The original PR's errors.Is(err, ErrResourceAlreadyExists) only catches the
first shape, leaving locker.Lock() returning the raw API error instead of
falling through to assertLockHeld() — which is what TestLock asserts. Add
errors.Is(err, ErrAlreadyExists) for the second shape; both inherit from
apierr.ErrResourceConflict but we use the specific sentinels rather than
the broader parent (ErrResourceConflict also covers ErrAborted, which is
a different concept).
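The widened check, sketched against the SDK's `apierr` sentinels:

```go
package sketch

import (
	"errors"

	"github.com/databricks/databricks-sdk-go/apierr"
)

// isAlreadyExists matches both shapes of the already-exists response:
// 400 RESOURCE_ALREADY_EXISTS (sequential overwrite=false conflict) and
// 409 ALREADY_EXISTS (concurrent contention). Matching the parent
// apierr.ErrResourceConflict instead would also catch ErrAborted,
// which is a different concept.
func isAlreadyExists(err error) bool {
	return errors.Is(err, apierr.ErrResourceAlreadyExists) ||
		errors.Is(err, apierr.ErrAlreadyExists)
}
```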
Co-authored-by: Isaac
Force-pushed 37a1400 to 1e2fe83
The os.Getenv check was added in this PR's first commit, but the variable is never set in any CI config or test environment. The pre-existing !WorkspaceTmpDir guard already covers the actual problem (the yamlfmt build fails on DBR workspace filesystems). The env var was an unused escape hatch.

Co-authored-by: Isaac
…ontent]" placeholder)
Multipart parts in recorded fixtures used to be summarized as the literal string
"[content]" regardless of what was uploaded. Reviewers couldn't see the actual
deploy.lock / deployment.json / app.yml payloads, only that *something* was sent.
Show the literal content for valid UTF-8 parts; existing global UNIX_TIME / UUID /
USERNAME / DEV_VERSION / TIMESTAMP replacements normalize the dynamic fields so
deploy.lock and similar payloads still produce stable diffs. Binary content and
parts above 4 KiB still collapse to "[binary content N bytes]" so a 12 MiB wheel
upload doesn't bloat the recording.
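A minimal sketch of that rule; the function name is hypothetical and the real logic lives in the acceptance recording path:

```go
package sketch

import (
	"fmt"
	"unicode/utf8"
)

// summarizePart renders one recorded multipart part: small valid-UTF-8
// content is shown literally (dynamic fields are normalized elsewhere by
// the global UNIX_TIME / UUID / USERNAME / DEV_VERSION / TIMESTAMP
// replacements); binary content or anything above 4 KiB collapses to a
// size placeholder so large uploads don't bloat the recording.
func summarizePart(data []byte) string {
	const maxLiteral = 4 << 10 // 4 KiB
	if len(data) <= maxLiteral && utf8.Valid(data) {
		return string(data)
	}
	return fmt.Sprintf("[binary content %d bytes]", len(data))
}
```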
Update the upload-edge-cases acceptance test to assert on the recorded content for
each small text input (notebook source, lakeview dashboard JSON, "hello, world",
"hello, naïve world", empty file) and confirm the 12 MiB binary collapses.
Regenerated affected goldens under acceptance/bundle/{user_agent,validate,
artifacts,libraries,apps}/.
Co-authored-by: Isaac
Both the JSON-body and multipart variants of /api/2.0/workspace/import are capped — they share the same /workspace/import path but enforce different caps. Per Confluence "Workspace Import Export Endpoints":
- JSON body, AUTO format: 10 MiB (databricks.webapp.autoExportFormatLimitBytes)
- Multipart: 200 MiB (databricks.workspaceFilesystem.maxImportSizeBytes)
- Legacy import-file: 200 MiB (same Armeria server limit)

So multipart is the right replacement for the legacy endpoint not because it "handles arbitrary size", but because its 200 MiB cap matches the legacy endpoint we are migrating away from. The previous comment was wrong on both counts (no arbitrary size, and the 10 MiB cap is the JSON-body variant of /workspace/import, not the legacy /workspace-files/import-file endpoint).

Co-authored-by: Isaac
Summary
Replace `POST /api/2.0/workspace-files/import-file/{path}?overwrite=…` with the multipart variant of `POST /api/2.0/workspace/import` (via the SDK's `Workspace.Upload` + `format=AUTO`). The legacy endpoint is being deprecated and the new endpoint is the strategic replacement.
Why multipart and not the JSON body of `/workspace/import`: the JSON body is server-capped at 10 MiB. Multipart accepts the same sizes `import-file` did (verified up to 250 MiB against a real workspace), so DAB users shipping wheels/jars/large files keep working.
Error mapping
Switched from raw `aerr.StatusCode`/`ErrorCode` comparisons to `errors.Is` against the SDK's `apierr` sentinels:
- `ErrNotFound` → `noSuchDirectoryError` (or mkdir+retry under `CreateParentDirectories` mode); also catches `RESOURCE_DOES_NOT_EXIST`.
- `ErrResourceAlreadyExists` → `fileAlreadyExistsError` (the new endpoint reliably sets `error_code` `RESOURCE_ALREADY_EXISTS`).
- `ErrInvalidParameterValue` + "Requested node type" message → `fileAlreadyExistsError` (existing object's node type doesn't match the upload — file vs notebook collision).
- `ErrPermissionDenied` → `permissionError`.

The same sentinel pattern is applied to `Delete`, `ReadDir`, and `Stat` for 404 detection. `DIRECTORY_NOT_EMPTY` keeps an explicit `ErrorCode` check since the SDK has no sentinel for it.
End-to-end verification
`format=AUTO` is verified for every workspace-filesystem object type DABs care about, against a real workspace:

| input | object_type |
| --- | --- |
| `.py` with `# Databricks notebook source` | NOTEBOOK (PYTHON), extension stripped |
| `.sql` with `-- Databricks notebook source` | NOTEBOOK (SQL), extension stripped |
| `.ipynb` | NOTEBOOK (PYTHON), extension stripped |
| `.py` without header | FILE |
| `.lvdash.json` | DASHBOARD, extension preserved |
| binary file | FILE |
| 60 MB binary | FILE (uploaded successfully — would have failed with JSON body) |

Alerts / jobs / pipelines / schemas / etc. are not files in the workspace; they're created via dedicated REST APIs and don't go through the filer.
Test plan
- `libs/filer/workspace_files_client_test.go` covers the new error mapping.
- `libs/testserver/handlers.go` extended with a multipart handler at `/workspace/import`.
- `acceptance/internal/prepare_server.go` normalizes multipart bodies (sorted form fields, file parts recorded as `{filename, size}`) so request fixtures stay deterministic.
- `TestFilerWorkspaceNotebook` assertion updated to assert the path with extension (`tc.name`); same change as "filer: detect notebook already-exists across both error formats" #5106.

This pull request and its description were written by Isaac.