Datasets: clarify database scoping and migrate existing datasets #96

@eddietejeda

Background

Datasets are "derived views": virtual SQL tables built from SQL queries over workspace data. With the introduction of the databases API (PR #94), queries are now scoped via the X-Database-Id header, so that the catalog name default resolves to a specific database's catalog.

Current behavior

X-Database-Id is sent on every API request when a current database is set, including all /datasets endpoints (list, create, refresh, etc.). This is because ApiClient appends the header to every request without checking which endpoint is being called.
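
To make the current behavior concrete, here is a minimal sketch of a client that attaches the header this way. The class body, the build_headers method, and the Authorization header are assumptions for illustration only; this is not the actual ApiClient code.

```python
from typing import Optional


class ApiClient:
    """Hypothetical stand-in for the CLI's ApiClient (names are assumptions)."""

    def __init__(self, api_key: str, database_id: Optional[str] = None) -> None:
        self.api_key = api_key
        self.database_id = database_id

    def build_headers(self, path: str) -> dict[str, str]:
        # The Authorization scheme is assumed; only X-Database-Id comes from the issue.
        headers = {"Authorization": f"Bearer {self.api_key}"}
        # The header is added whenever a current database is set.
        # Note that `path` is never consulted, so /datasets gets it too.
        if self.database_id is not None:
            headers["X-Database-Id"] = self.database_id
        return headers
```

Under this sketch, a call such as ApiClient("key", "db_1").build_headers("/datasets") would carry X-Database-Id even though the endpoint may not use it.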

However, datasets appear to be workspace-scoped, not database-scoped:

  • They live in the datasets catalog, addressed as datasets.main.<table_name> — separate from default.public.* used by database tables
  • GET /datasets returned all 46 workspace datasets regardless of which database was set
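
One way to turn the second observation into a repeatable check is to list datasets under several database IDs and compare the results. The sketch below assumes a caller-supplied function (hypothetical name list_dataset_ids) that performs GET /datasets with a given X-Database-Id, or without the header when passed None, and returns the dataset identifiers.

```python
from typing import Callable, Iterable, Optional


def listing_ignores_database(
    list_dataset_ids: Callable[[Optional[str]], Iterable[str]],
    database_ids: Iterable[Optional[str]],
) -> bool:
    """Return True if every database ID yields the same set of datasets."""
    # Compare as sets so that ordering differences do not count as scoping.
    results = {frozenset(list_dataset_ids(db)) for db in database_ids}
    return len(results) <= 1
```

A True result is consistent with workspace-scoped listing, but it does not by itself say whether that behavior is intentional.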

The tension

There are two separate concerns that need to be resolved:

  1. Dataset listing/creation — Should /datasets be scoped to a database? Currently the server appears to ignore X-Database-Id here and returns all workspace datasets. This may be intentional (datasets are workspace-level) or unintentional.

  2. SQL query execution inside a dataset — If a dataset's source SQL references database tables (e.g. SELECT * FROM default.public.trips), the X-Database-Id header is needed so the query engine resolves default to the right catalog. Without it, the query would fail or hit the wrong catalog.
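
The second concern can be illustrated with a toy model of catalog resolution. This is not the query engine's actual logic: the resolve_table function, its return format, and the choice to raise when the header is missing are all assumptions (the issue only says the query "would fail or hit the wrong catalog").

```python
from typing import Optional


def resolve_table(ref: str, database_id: Optional[str]) -> str:
    """Map a three-part table reference to a (hypothetical) physical location."""
    catalog, schema, table = ref.split(".")
    if catalog == "datasets":
        # Workspace-level catalog: no database context needed in this model.
        return f"workspace/{schema}.{table}"
    if catalog == "default":
        # In this model, `default` is only meaningful relative to a database.
        if database_id is None:
            raise LookupError("default catalog requires X-Database-Id")
        return f"{database_id}/{schema}.{table}"
    raise LookupError(f"unknown catalog: {catalog}")
```

In this model, datasets.main.<table_name> resolves without any database context, while default.public.trips resolves differently for each X-Database-Id, which is why the header still matters when a dataset's source SQL runs.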

What needs to happen

  • Clarify with backend: should /datasets list/create/refresh respect X-Database-Id?
  • Decide whether datasets should be explicitly associated with a database (i.e. a database_id field on the dataset record)
  • If datasets are database-scoped: migrate existing workspace datasets to an appropriate database, and add a --database flag to datasets create (similar to databases tables load)
  • If datasets remain workspace-scoped: ensure the CLI does not send X-Database-Id on dataset endpoints to avoid unintended side effects, but still passes it through when executing the source SQL query
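
If datasets remain workspace-scoped, the header decision could be made per request rather than unconditionally. The following is a sketch under stated assumptions: the scoped_headers function and the executes_source_sql flag are hypothetical, and which dataset calls actually run source SQL is left to the caller because the issue does not settle it.

```python
from typing import Optional

DATASET_PREFIX = "/datasets"


def scoped_headers(
    path: str,
    database_id: Optional[str],
    executes_source_sql: bool = False,
) -> dict[str, str]:
    """Return the database-scoping headers for a request to `path`."""
    headers: dict[str, str] = {}
    if database_id is None:
        return headers
    is_dataset_endpoint = path == DATASET_PREFIX or path.startswith(
        DATASET_PREFIX + "/"
    )
    # Omit the header on dataset endpoints unless the call runs source SQL,
    # in which case `default` must still resolve to the right catalog.
    if is_dataset_endpoint and not executes_source_sql:
        return headers
    headers["X-Database-Id"] = database_id
    return headers
```

With this policy, listing datasets would send no X-Database-Id, while a call flagged as executing source SQL would still carry it. The alternative, database-scoped datasets, would not need this filter at all.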
