Description
The databricks sync command's --include flag does not work as expected for selective file inclusion. Instead of uploading only the specified files, it either uploads all files or attempts to delete existing workspace files.
Expected Behavior
When using --include flags, only the explicitly included files should be synced to the workspace:
databricks sync . "$PROJECT_PATH" \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py"
Expected: Only these 3 files should be uploaded, preserving the directory structure.
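To make the expected semantics concrete, the allowlist behavior can be sketched in bash. `filter_allowlist` is a hypothetical helper, not part of the Databricks CLI: it reads candidate relative paths on stdin and prints only those that match at least one include pattern. It relies on bash `[[ ... == pattern ]]` glob matching, in which `*` also matches `/`; the CLI's own pattern syntax may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "positive allowlist" semantics for --include.
# Reads relative paths (one per line) on stdin and prints each path that
# matches at least one of the patterns given as arguments.
filter_allowlist() {
  local path pattern
  # "|| [[ -n $path ]]" also handles a final line without a trailing newline.
  while IFS= read -r path || [[ -n $path ]]; do
    for pattern in "$@"; do
      # An unquoted right-hand side of == inside [[ ]] is a glob pattern.
      if [[ $path == $pattern ]]; then
        printf '%s\n' "$path"
        break  # print each path once, even if several patterns match
      fi
    done
  done
}
```

For example, running `find . -type f | sed 's|^\./||' | filter_allowlist "libs/__init__.py" "libs/source_loader.py" "sources/__init__.py"` from the project root should print only those three paths (if they exist), which is the set this report expects `databricks sync` to upload.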
Actual Behavior
Three scenarios tested with --dry-run:
Scenario 1: --include alone
databricks sync . "$PROJECT_PATH" --dry-run \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py"
Result: Initial Sync Complete. All files in the directory are uploaded (the include patterns are ignored).
Scenario 2: --exclude "*" before --include
databricks sync . "$PROJECT_PATH" --dry-run \
--exclude "*" \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py"
Result: Action: DELETE. The sync attempts to delete all existing files from the workspace.
Scenario 3: --include before --exclude "**/*"
databricks sync . "$PROJECT_PATH" --dry-run \
--include "libs/__init__.py" \
--include "libs/source_loader.py" \
--include "sources/__init__.py" \
--exclude "**/*"
Result: Action: DELETE. The sync attempts to delete all files from the workspace (same as Scenario 2).
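Because Scenarios 2 and 3 plan deletions, a dry-run plan can be checked before any real sync. The sketch below is a hypothetical guard, `assert_no_deletes`, which reads dry-run output on stdin and fails if any line contains `DELETE`. It assumes deletions appear with the literal text `DELETE`, as in the "Action: DELETE" result quoted above; the real output format may differ.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail if a sync dry-run plan contains any DELETE action.
# Assumes deletions are printed with the literal text "DELETE" (as quoted in
# this report); adjust the grep pattern if the real output differs.
assert_no_deletes() {
  local plan deletes
  plan=$(cat)                                              # capture the whole dry-run output
  deletes=$(printf '%s\n' "$plan" | grep 'DELETE' || true) # grep exits 1 when nothing matches
  if [[ -n $deletes ]]; then
    echo "Refusing to sync: dry-run plan contains DELETE actions:" >&2
    printf '%s\n' "$deletes" >&2
    return 1
  fi
  printf '%s\n' "$plan"                                    # pass the plan through for review
}
```

Possible usage: `databricks sync . "$PROJECT_PATH" --dry-run --include "libs/__init__.py" 2>&1 | assert_no_deletes && databricks sync . "$PROJECT_PATH" --include "libs/__init__.py"`. The `2>&1` is there because it is not established here whether the dry-run plan is written to stdout or stderr.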
Steps to Reproduce
- Create a directory with multiple files
- Run:
databricks sync . /Workspace/Users/<user>/test --dry-run --include "specific/file.py"
- Observe that all files are marked for upload, not just the included one
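Step 1 can be scripted so the reproduction is repeatable. `make_repro_dir` is a hypothetical fixture builder: it creates a directory containing several files, of which only `specific/file.py` matches the include pattern used in step 2. The other file names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical fixture for the reproduction: several files, only one of which
# (specific/file.py) matches the --include pattern used in step 2.
make_repro_dir() {
  local dir=$1
  mkdir -p "$dir/specific" "$dir/unrelated" || return 1
  printf 'print("included")\n'        > "$dir/specific/file.py"
  printf 'print("should not sync")\n' > "$dir/specific/other.py"
  printf 'print("should not sync")\n' > "$dir/unrelated/module.py"
  printf '# test project\n'           > "$dir/README.md"
}
```

With `make_repro_dir "$(mktemp -d)"` as step 1, step 2 is run from inside that directory; the expected dry-run plan would list only `specific/file.py`, while the reported behavior lists all four files.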
Environment
- Databricks CLI version: (tested on latest)
- OS: macOS (darwin 24.6.0)
- Shell: bash
Workaround
Currently using a temporary directory approach:
TEMP_DIR=$(mktemp -d)
mkdir -p "$TEMP_DIR/libs"
cp libs/__init__.py libs/source_loader.py "$TEMP_DIR/libs/"
databricks sync "$TEMP_DIR" "$PROJECT_PATH" --full
rm -rf "$TEMP_DIR"
Related
This might be related to #4137 (Feature suggestion - rsync equivalent), which suggests that databricks sync could benefit from rsync-style selective file inclusion.
Suggested Fix
Either:
- Fix --include to work as a positive allowlist (only sync these files)
- Document the current behavior more clearly in databricks sync --help
- Add a new flag like --only-include that syncs only the specified files
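In the meantime, the temporary-directory workaround can be generalized to any list of files. `stage_files` is a hypothetical helper: it copies each given relative path into a staging directory and recreates its parent directories, so the structure is preserved. It uses `mkdir -p` plus `cp` rather than the GNU-only `cp --parents`, so it should also work with the macOS tools listed under Environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper generalizing the workaround: copy an explicit list of
# relative file paths into a staging directory, preserving directory structure.
# Paths are resolved relative to the current working directory.
stage_files() {
  local dest=$1 f
  shift
  for f in "$@"; do
    # Recreate the file's parent directory (e.g. libs/) under the staging root.
    mkdir -p "$dest/$(dirname "$f")" || return 1
    cp "$f" "$dest/$f" || return 1
  done
}
```

Run from the project root, `STAGE=$(mktemp -d)`, then `stage_files "$STAGE" libs/__init__.py libs/source_loader.py sources/__init__.py`, then `databricks sync "$STAGE" "$PROJECT_PATH" --full` and `rm -rf "$STAGE"` would stage all three files from the Expected Behavior example, including `sources/__init__.py`, which the hand-written workaround does not copy.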