
feat: SPOG (Single Point of Gateway) host support #1479

Open: sd-db wants to merge 2 commits into main from sd-db/spog-impl

sd-db (Collaborator) commented May 25, 2026

Summary

Adds SPOG (Single Point of Gateway) support — account-level vanity hosts like peco.azuredatabricks.net where workspaces are disambiguated by ?o=<workspace-id> on http_path. Convention matches databricks-sql-python (#767), databricks-sql-go, databricks-jdbc, and the ADBC Rust driver.

Builds on the dep-ceiling bumps already in main (#1474). The feature is opt-in via those bumps: it activates only when databricks-sql-connector ≥ 4.2.6 and databricks-sdk ≥ 0.104.0 are installed. Pre-SPOG dep versions continue to work unchanged on legacy hosts, so non-SPOG users see no behavior change.

What changes

  • New package dbt/adapters/databricks/spog/:
    • extract — parse ?o= (or fall back to /o/<id>/ in cluster paths) from http_path.
    • capabilities — runtime detect: connector_supports_spog (PEP-440 version compare), sdk_supports_workspace_id (feature-detect via inspect.signature(Config)).
    • probe — one-shot GET /.well-known/databricks-config per host with 3-attempt backoff; probe failure is non-fatal.
    • decision — applies the §8 decision matrix at connection.open(); raises DbtConfigError with a pointed upgrade/fix message on every misconfig row.
  • credentials.py:
    • Cluster-ID regex tightened: (.*) -> ([^?&]+), so the capture stops at any query string (independently useful even on legacy hosts).
    • DatabricksCredentialManager gains a workspace_id field populated by extract_workspace_id(credentials.http_path).
    • All five authenticate_with_* methods plumb workspace_id into Config(...) via a single _config_kwargs helper, gated on sdk_supports_workspace_id() so old SDKs are unaffected.
  • connections.py: DatabricksConnectionManager.open() collects every http_path in play (default + per-compute) and invokes check_spog_preconditions(...) before constructing conn_args. No-op on legacy hosts; pointed DbtConfigError on misconfig.
  • impl.py: DatabricksAdapter.debug_query override emits a SPOG status block (host_type, workspace_id, dep-version suitability) before the standard select 1 as id — makes "is SPOG working here?" a one-command answer via dbt debug.
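As a rough illustration of the extract step described above, here is a minimal sketch that prefers `?o=` and falls back to an `/o/<id>/` path segment. It is not the PR's implementation: the fallback regex and the example paths are assumptions made for illustration, and only the function name `extract_workspace_id` comes from the description.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumed fallback pattern for cluster-style paths such as
# "sql/protocolv1/o/<workspace-id>/<cluster-id>".
_ORG_SEGMENT = re.compile(r"/o/(\d+)/")


def extract_workspace_id(http_path: Optional[str]) -> Optional[str]:
    """Return the workspace ID encoded in http_path, or None if absent."""
    if not http_path:
        return None
    parts = urlsplit(http_path)
    # Prefer an explicit ?o=<workspace-id> query parameter.
    values = parse_qs(parts.query).get("o")
    if values and values[0]:
        return values[0]
    # Otherwise fall back to an /o/<id>/ segment in the path itself.
    match = _ORG_SEGMENT.search("/" + parts.path.lstrip("/"))
    return match.group(1) if match else None
```

A warehouse path with `?o=12345` yields `"12345"`, while a path with neither form yields `None`, which is what lets legacy hosts pass through untouched.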

Misconfiguration handling

Each row in §8 of the design fails fast at connection.open() with a DbtConfigError naming the file/field to fix:

| Host type | ?o= present? | Connector / SDK suitable? | Behavior |
|---|---|---|---|
| SPOG | yes | yes | proceed |
| SPOG | no | | error: http_path is missing ?o=<workspace-id> |
| SPOG | yes | no | error: upgrade databricks-sql-connector / databricks-sdk |
| non-SPOG | yes | | error: remove ?o= from http_path (or fix host) |
| non-SPOG | no | | proceed (probe failure is non-fatal) |
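The five rows above can be read as a small decision function. The sketch below is one possible encoding, not the code in `decision.py`: `apply_matrix`, its parameters, and `SpogConfigError` (a stand-in for `DbtConfigError` so the example has no dbt dependency) are hypothetical names, and the probe-failure path is not modeled.

```python
from typing import Optional


class SpogConfigError(Exception):
    """Stand-in for DbtConfigError (hypothetical, keeps the sketch dbt-free)."""


def apply_matrix(
    is_spog_host: bool,
    workspace_id: Optional[str],
    deps_suitable: bool,
) -> Optional[str]:
    """Return the workspace ID to use, or raise on a misconfiguration row."""
    if is_spog_host:
        if workspace_id is None:
            raise SpogConfigError("http_path is missing ?o=<workspace-id>")
        if not deps_suitable:
            raise SpogConfigError(
                "upgrade databricks-sql-connector / databricks-sdk"
            )
        return workspace_id  # SPOG host, ?o= present, deps suitable
    if workspace_id is not None:
        raise SpogConfigError("remove ?o= from http_path (or fix host)")
    return None  # legacy host without ?o=: nothing to do
```

Checking the missing `?o=` case before the dependency case is a choice made for this sketch; the ordering used by the real matrix is defined in the design doc.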

Design doc

docs/superpowers/specs/2026-05-19-dbt-databricks-spog-design.md (committed in this PR) holds the full spec — background, the §8 decision matrix, all the upstream PRs referenced, and rationale for opt-in via ceiling bumps.

Test plan

  • Unit tests pass locally (hatch run unit tests/unit -q) — 1174 passed, 6 skipped
  • pre-commit run --all-files passes
  • Functional tests against legacy host (existing CI: databricks_uc_sql_endpoint, databricks_uc_cluster, databricks_cluster)
  • Functional tests against SPOG host (new .github/workflows/spog-integration.yml, manual / scheduled — points at peco.azuredatabricks.net)
  • dbt debug exercises the new SPOG status block on both SPOG and legacy hosts

sd-db added 2 commits May 25, 2026 11:08
Implements support for Databricks SPOG hosts — account-level vanity URLs
(e.g. peco.azuredatabricks.net) where workspaces are disambiguated by a
`?o=<workspace-id>` query parameter on http_path. Approach matches the
convention adopted by databricks-sql-python, databricks-sql-go,
databricks-jdbc, and the ADBC Rust driver: parse ?o= from http_path and
use it to set the X-Databricks-Org-Id header on non-OAuth endpoints.

Opt-in via the dependency ceiling bumps already landed: requires
`databricks-sql-connector >= 4.2.6` and `databricks-sdk >= 0.104.0` for
the SPOG code path to activate. Pre-SPOG dep versions continue to work
unchanged on legacy hosts.

- `extract.py` — pure parser; pulls ?o=<workspace-id> from http_path.
- `capabilities.py` — runtime detect SPOG support: `connector_supports_spog`
  (version-detect with packaging.version.Version), `sdk_supports_workspace_id`
  (feature-detect via inspect.signature(Config) so forks/wrappers report
  correctly).
- `probe.py` — one-shot per-host probe of /.well-known/databricks-config.
  3-attempt exponential backoff; on exhaust returns HostMetadata(host_type=None)
  + WARN. Probe failure is never fatal.
- `decision.py` — applies the spec §8 decision matrix at connection.open():
  raises DbtConfigError on every misconfiguration row with a pointed
  upgrade/fix message; returns the extracted workspace_id on the happy path.
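To make the probe behaviour concrete, here is a minimal sketch of a 3-attempt exponential backoff that degrades to `HostMetadata(host_type=None)` plus a warning. Everything except the `HostMetadata(host_type=None)` shape is an assumption: `probe_host`, the injectable `fetch_host_type` callable (standing in for the GET to `/.well-known/databricks-config`), and the 0.5 s base delay are illustrative only.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class HostMetadata:
    host_type: Optional[str]


def probe_host(
    fetch_host_type: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> HostMetadata:
    """Call fetch_host_type with exponential backoff; never raise."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return HostMetadata(host_type=fetch_host_type())
        except Exception as exc:  # any failure counts as a failed probe
            last_error = exc
            if attempt < attempts - 1:
                # Delays double each time: base_delay, 2 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    logger.warning("SPOG probe failed after %d attempts: %s", attempts, last_error)
    return HostMetadata(host_type=None)
```

Injecting `sleep` keeps the retry math testable without real waiting, which is one way the "retry/backoff math" unit tests could be written.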

- `credentials.py`:
  - Cluster-ID regex tightened: `(.*)` -> `([^?&]+)` so the capture stops
    at any query string (independently useful even on legacy hosts).
  - DatabricksCredentialManager gains a `workspace_id` field populated by
    `extract_workspace_id(credentials.http_path)` in create_from.
  - All five `authenticate_with_*` methods now plumb workspace_id into
    `Config(...)` via a single `_config_kwargs` helper — gated on
    `sdk_supports_workspace_id()` so old SDKs are unaffected.
- `connections.py`: `DatabricksConnectionManager.open()` collects every
  http_path in play (default + per-compute) and invokes
  `check_spog_preconditions(host=..., http_paths=...)` before constructing
  conn_args. On legacy hosts the call is a no-op; on misconfiguration it
  raises a pointed DbtConfigError.
- `impl.py`: `DatabricksAdapter.debug_query` override emits a SPOG status
  block (host_type, workspace_id, dep-version suitability) before the
  standard `select 1 as id`. Makes 'is SPOG working here?' a one-command
  answer for support escalations.
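The gating described for `_config_kwargs` might look roughly like the sketch below. It is an assumption-laden illustration: `OldConfig` and `NewConfig` are stand-in classes rather than the real `databricks.sdk` `Config`, and `supports_workspace_id` / `config_kwargs` are hypothetical names that only mirror the idea of feature-detecting with `inspect.signature`.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def supports_workspace_id(config_cls: Callable[..., Any]) -> bool:
    """True if config_cls declares an explicit workspace_id parameter."""
    return "workspace_id" in inspect.signature(config_cls).parameters


def config_kwargs(
    config_cls: Callable[..., Any],
    workspace_id: Optional[str],
    **base: Any,
) -> Dict[str, Any]:
    """Build constructor kwargs, adding workspace_id only when accepted."""
    kwargs = dict(base)
    if workspace_id is not None and supports_workspace_id(config_cls):
        kwargs["workspace_id"] = workspace_id
    return kwargs


class OldConfig:
    """Stand-in for an SDK Config that predates workspace_id."""

    def __init__(self, host: str, token: str) -> None:
        self.host, self.token = host, token


class NewConfig:
    """Stand-in for an SDK Config that accepts workspace_id."""

    def __init__(self, host: str, token: str, workspace_id: Optional[str] = None) -> None:
        self.host, self.token, self.workspace_id = host, token, workspace_id
```

With this shape, `OldConfig(**config_kwargs(OldConfig, "123", host="h", token="t"))` constructs without a `TypeError`, because the unsupported keyword is never passed.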

- 35 unit tests under `tests/unit/spog/` covering every §8 matrix row,
  retry/backoff math, capability detection branches, both-deps-old
  ordering, HTTP-error retry fallback, and probe caching.
- 17 cross-module unit tests (workspace_id plumbing, connection.open
  wiring, dbt debug block, cluster-id regex).
- 3 functional tests under `tests/functional/adapter/spog/`:
  - test_spog_debug — assert dbt debug emits the SPOG block.
  - test_spog_missing_o_raises — strip ?o=, expect the §8 row-4 error.
  - test_spog_probe_failure_fallback — simulate probe failure; expect
    WARN + run still succeeds.
  All three skip when DBT_DATABRICKS_SPOG_* env vars are absent.

`.github/workflows/spog-integration.yml` runs the SPOG functional tests
against `peco.azuredatabricks.net?o=6436897454825492` using the existing
Azure secrets (same workspace; only host + ?o= suffix differ). Forces
SPOG-capable connector and SDK pins. Triggered weekly + workflow_dispatch.

Design spec at `docs/superpowers/specs/2026-05-19-dbt-databricks-spog-design.md`;
implementation plan at `docs/superpowers/plans/2026-05-19-dbt-databricks-spog.md`;
follow-up items tracked in `.claude/ideas/spog-future-tasks.md` (gitignored,
local-only). CHANGELOG entry added under `dbt-databricks next`.

- test_python_helpers: stub Mock() credentials.http_path with a real
  string so extract_workspace_id() (now called in
  DatabricksCredentialManager.create_from) doesn't trip on
  "argument of type 'Mock' is not iterable".
- test_auth (TestEnsureConfigTriggersTheRightAuth): autouse-patch
  sdk_supports_workspace_id() to False so the auth-routing
  assertions stay focused on auth_type. SPOG workspace_id plumbing
  has its own coverage in tests/unit/spog/.
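The `Mock` failure mentioned above is easy to reproduce in isolation. The sketch below uses a hypothetical `has_org_query` helper in place of the real `extract_workspace_id`; it only shows why a plain `Mock` attribute breaks a substring check and how stubbing a real string avoids it.

```python
from unittest.mock import Mock


def has_org_query(http_path) -> bool:
    """Hypothetical stand-in for any code that does a substring check."""
    return "?o=" in http_path


creds = Mock()

# A plain Mock attribute is itself a Mock, which defines neither
# __contains__ nor __iter__, so the `in` operator raises TypeError.
try:
    has_org_query(creds.http_path)
    raised = False
except TypeError:
    raised = True

# Stubbing a real string restores normal behaviour.
creds.http_path = "/sql/1.0/warehouses/abc123?o=12345"
stubbed_result = has_org_query(creds.http_path)
```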
@sd-db sd-db requested a review from jprakash-db as a code owner May 25, 2026 05:44
@github-actions

Coverage report

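| File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing |
|---|---|---|---|---|---|
| dbt/adapters/databricks | | | | | |
| connections.py | | | | | |
| credentials.py | | | | | |
| impl.py | | | | | 1115-1116, 1120-1125, 1140-1141 |
| dbt/adapters/databricks/spog | | | | | |
| capabilities.py | | | | | |
| decision.py | | | | | |
| extract.py | | | | | |
| probe.py | | | | | |
| Project Total | | | | | |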

This report was generated by python-coverage-comment-action
