databricks sync --include flag does not selectively include files as expected #4141

@rsleedbx

Description

The databricks sync command's --include flag does not work as expected for selective file inclusion. Instead of uploading only the specified files, it uploads every file when used on its own, and plans to delete existing workspace files when combined with a catch-all --exclude pattern.

Expected Behavior

When using --include flags, only the explicitly included files should be synced to the workspace:

databricks sync . "$PROJECT_PATH" \
  --include "libs/__init__.py" \
  --include "libs/source_loader.py" \
  --include "sources/__init__.py"

Expected: Only these 3 files should be uploaded, preserving the directory structure.

Actual Behavior

Three scenarios tested with --dry-run:

Scenario 1: --include alone

databricks sync . "$PROJECT_PATH" --dry-run \
  --include "libs/__init__.py" \
  --include "libs/source_loader.py" \
  --include "sources/__init__.py"

Result: Initial Sync Complete - All files in the directory are uploaded (include patterns ignored)

Scenario 2: --exclude "*" before --include

databricks sync . "$PROJECT_PATH" --dry-run \
  --exclude "*" \
  --include "libs/__init__.py" \
  --include "libs/source_loader.py" \
  --include "sources/__init__.py"

Result: Action: DELETE - Attempts to DELETE all existing files from workspace

Scenario 3: --include before --exclude "**/*"

databricks sync . "$PROJECT_PATH" --dry-run \
  --include "libs/__init__.py" \
  --include "libs/source_loader.py" \
  --include "sources/__init__.py" \
  --exclude "**/*"

Result: Action: DELETE - Attempts to DELETE all files from workspace (same as Scenario 2)

Steps to Reproduce

  1. Create a directory with multiple files
  2. Run: databricks sync . /Workspace/Users/<user>/test --dry-run --include "specific/file.py"
  3. Observe that all files are marked for upload, not just the included one
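
The steps above can be scripted roughly as follows. This is only a sketch: the file names under the scratch directory and the TARGET variable are made up for illustration, and the sync command runs only if the databricks CLI is on PATH and TARGET is set to a workspace path you can write to.

```shell
# Build a throwaway project tree (file names are hypothetical).
REPRO_DIR=$(mktemp -d)
mkdir -p "$REPRO_DIR/specific" "$REPRO_DIR/other"
echo "print('included')" > "$REPRO_DIR/specific/file.py"
echo "print('not included')" > "$REPRO_DIR/other/extra.py"
echo "notes" > "$REPRO_DIR/README.txt"

# TARGET is a placeholder, e.g. /Workspace/Users/<user>/test.
# The dry run is skipped when the CLI is missing or TARGET is unset.
if command -v databricks >/dev/null 2>&1 && [ -n "${TARGET:-}" ]; then
  (cd "$REPRO_DIR" && databricks sync . "$TARGET" --dry-run --include "specific/file.py")
fi
```

With the behavior described in this issue, the dry run would be expected to list all three files rather than only specific/file.py.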

Environment

  • Databricks CLI version: (tested on latest)
  • OS: macOS (darwin 24.6.0)
  • Shell: bash

Workaround

Currently using a temporary directory approach:

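# Stage only the wanted files in a temporary directory, preserving layout
TEMP_DIR=$(mktemp -d)
mkdir -p "$TEMP_DIR/libs" "$TEMP_DIR/sources"
cp libs/__init__.py libs/source_loader.py "$TEMP_DIR/libs/"
cp sources/__init__.py "$TEMP_DIR/sources/"
# Sync the staged copy, then clean up
databricks sync "$TEMP_DIR" "$PROJECT_PATH" --full
rm -rf "$TEMP_DIR"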
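
The same workaround could be generalized to an arbitrary file list. The helper name stage_files below is hypothetical (it is not part of the Databricks CLI); it copies each relative path into a fresh temporary directory, recreates the parent directories, and prints the staging path. The sync call is guarded so it runs only when the CLI is installed and PROJECT_PATH is set.

```shell
# Hypothetical helper: copy the listed relative paths into a new staging
# directory, recreating their parent directories, and print the staging path.
stage_files() {
  local stage_dir rel
  stage_dir=$(mktemp -d) || return 1
  for rel in "$@"; do
    mkdir -p "$stage_dir/$(dirname "$rel")" || return 1
    cp "$rel" "$stage_dir/$rel" || return 1
  done
  printf '%s\n' "$stage_dir"
}

# Usage sketch: stage the three files from this issue, sync them, clean up.
if command -v databricks >/dev/null 2>&1 && [ -n "${PROJECT_PATH:-}" ]; then
  STAGE=$(stage_files libs/__init__.py libs/source_loader.py sources/__init__.py) \
    && databricks sync "$STAGE" "$PROJECT_PATH" --full
  [ -n "${STAGE:-}" ] && rm -rf "$STAGE"
fi
```

Because the staging directory contains nothing but the listed files, this sidesteps the --include behavior entirely, at the cost of an extra copy on every sync.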

Related

This might be related to #4137 (Feature suggestion - rsync equivalent), which suggests that databricks sync could benefit from rsync-style selective file inclusion.

Suggested Fix

One of the following:

  1. Fix --include to work as a positive allowlist (only sync these files)
  2. Document the current behavior more clearly in databricks sync --help
  3. Add a new flag like --only-include that syncs only the specified files
