Merged
2 changes: 1 addition & 1 deletion .github/workflows/validate.yml
@@ -18,7 +18,7 @@ jobs:
- run: bun install

- name: Extract metadata
-        run: bun run bin/cli.ts extract-table-metadata examples/v1/metadata.json /tmp/databases
+        run: bun run bin/cli.ts extract-table-metadata examples/v1/table_metadata.json /tmp/databases

- name: Diff examples
run: diff -r examples/v1/databases /tmp/databases
34 changes: 23 additions & 11 deletions README.md
@@ -2,13 +2,13 @@

Metabase represents database metadata — synced databases, their tables, and their fields — as a tree of YAML files. Files are diff-friendly: numeric IDs are omitted entirely, and foreign keys use natural-key tuples like `["Sample Database", "PUBLIC", "ORDERS"]` instead of database identifiers.
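
To illustrate the natural-key tuples mentioned above, here is a minimal TypeScript sketch. The `TableKey` alias and the `sameTable` helper are hypothetical names introduced for this example, and the sketch assumes all three parts are plain strings; the spec remains the authority on the exact shape.

```typescript
// Hypothetical alias for a [database, schema, table] natural key.
// Assumption: every part is a string; see the spec for the real rules.
type TableKey = [database: string, schema: string, table: string];

// Two keys refer to the same table when every part matches.
function sameTable(a: TableKey, b: TableKey): boolean {
  return a.every((part, index) => part === b[index]);
}

const ordersKey: TableKey = ["Sample Database", "PUBLIC", "ORDERS"];
const peopleKey: TableKey = ["Sample Database", "PUBLIC", "PEOPLE"];

console.log(sameTable(ordersKey, ["Sample Database", "PUBLIC", "ORDERS"]));
console.log(sameTable(ordersKey, peopleKey));
```

Because the key contains no instance-specific numeric ID, the same tuple should identify the same table in exports taken from different Metabase instances.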

-This repository contains the specification, examples, and a CLI that converts the `metadata.json` downloaded from a Metabase instance into YAML.
+This repository contains the specification, examples, and a CLI that converts the `table_metadata.json` downloaded from a Metabase instance into YAML.

## Specification

The format is defined in **[core-spec/v1/spec.md](core-spec/v1/spec.md)** (v1.0.4). It covers entity keys, field types, folder structure, and the shape of each entity.

-Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `metadata.json` and the extracted YAML tree.
+Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)** — both the raw `table_metadata.json` and the extracted YAML tree.

### Entities

@@ -20,7 +20,19 @@ Reference output for the Sample Database lives in **[examples/v1/](examples/v1/)

## Obtaining metadata

-Metadata is fetched from Metabase's `GET /api/ee/serialization/metadata/export` endpoint as a `metadata.json` file — a flat JSON document with three arrays (`databases`, `tables`, and `fields`) streamed so even warehouses with very large schemas can be exported without exhausting server memory.
+Metadata is fetched from Metabase's `GET /api/ee/serialization/metadata/export` endpoint as a `table_metadata.json` file — a flat JSON document with three arrays (`databases`, `tables`, and `fields`) streamed so even warehouses with very large schemas can be exported without exhausting server memory.

The endpoint accepts three boolean query parameters that opt sections in or out — they all default to `false`, so requests must explicitly set the sections they want:

- `with-databases` — include the `databases` array.
- `with-tables` — include the `tables` array.
- `with-fields` — include the `fields` array.

A typical full export sets all three to `true`:

```
GET /api/ee/serialization/metadata/export?with-databases=true&with-tables=true&with-fields=true
```
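
As a rough illustration of the three query parameters, the following TypeScript sketch assembles the export URL. The `buildExportUrl` helper, the `ExportSections` type, and the `https://metabase.example.com` base URL are hypothetical; authentication is left out because this README excerpt does not describe it.

```typescript
// Hypothetical helper: not part of the CLI or of Metabase itself.
type ExportSections = {
  databases?: boolean;
  tables?: boolean;
  fields?: boolean;
};

function buildExportUrl(baseUrl: string, sections: ExportSections): string {
  const url = new URL("/api/ee/serialization/metadata/export", baseUrl);
  // Each parameter defaults to false on the server, so set all three explicitly.
  url.searchParams.set("with-databases", String(sections.databases ?? false));
  url.searchParams.set("with-tables", String(sections.tables ?? false));
  url.searchParams.set("with-fields", String(sections.fields ?? false));
  return url.toString();
}

// A full export opts in to every section.
const fullExport = buildExportUrl("https://metabase.example.com", {
  databases: true,
  tables: true,
  fields: true,
});
console.log(fullExport);
```

Writing out all three parameters, even the `false` ones, keeps the request self-describing and independent of the server-side defaults.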

### Extracting metadata to YAML

@@ -30,7 +42,7 @@ The CLI turns that JSON into the human- and agent-friendly YAML tree described i
bunx @metabase/database-metadata extract-table-metadata <input-file> <output-folder>
```

-- `<input-file>` — path to the `metadata.json` downloaded from Metabase.
+- `<input-file>` — path to the `table_metadata.json` downloaded from Metabase.
- `<output-folder>` — destination directory. Database folders are created directly under it.

### Extracting the spec
@@ -49,11 +61,11 @@ The following is the **default** workflow for a project that wants to use Metaba

### 1. A `.metadata/` directory at the repo root

-Create a top-level `.metadata/` directory and **add it to `.gitignore`**. This is where the raw `metadata.json` and the extracted `databases/` YAML tree live:
+Create a top-level `.metadata/` directory and **add it to `.gitignore`**. This is where the raw `table_metadata.json` and the extracted `databases/` YAML tree live:

```
.metadata/
-├── metadata.json
+├── table_metadata.json
└── databases/
└── …
```
@@ -70,17 +82,17 @@ Each developer (or a CI job) fetches metadata on demand from their own Metabase

### 3. Download from Metabase and extract

-Each developer downloads `metadata.json` from their Metabase instance and drops it into `.metadata/`. Then run the extractor:
+Each developer downloads `table_metadata.json` from their Metabase instance and drops it into `.metadata/`. Then run the extractor:

```sh
mkdir -p .metadata
-# Drop metadata.json from Metabase into .metadata/
+# Drop table_metadata.json from Metabase into .metadata/

rm -rf .metadata/databases
-bunx @metabase/database-metadata extract-table-metadata .metadata/metadata.json .metadata/databases
+bunx @metabase/database-metadata extract-table-metadata .metadata/table_metadata.json .metadata/databases
```

-After this, tools and agents should read the YAML tree under `.metadata/databases/` — not `metadata.json`, which exists only as input to the extractor.
+After this, tools and agents should read the YAML tree under `.metadata/databases/` — not `table_metadata.json`, which exists only as input to the extractor.

## Publishing to NPM

@@ -94,7 +106,7 @@ The workflow requires an `NPM_RELEASE_TOKEN` secret with publish access to the `

```sh
bun install
-bun bin/cli.ts extract-table-metadata examples/v1/metadata.json /tmp/.metadata/databases
+bun bin/cli.ts extract-table-metadata examples/v1/table_metadata.json /tmp/.metadata/databases
```

### Scripts
2 changes: 1 addition & 1 deletion bin/cli.test.ts
@@ -5,7 +5,7 @@ import { join, resolve } from "path";

const REPO_ROOT = resolve(import.meta.dirname, "..");
const CLI = "bin/cli.ts";
-const EXAMPLE_INPUT = "examples/v1/metadata.json";
+const EXAMPLE_INPUT = "examples/v1/table_metadata.json";

type RunResult = {
stdout: string;
8 changes: 4 additions & 4 deletions bin/cli.ts
@@ -33,7 +33,7 @@ function parseArguments() {
});
}

-async function handleExtractMetadata(positionals: string[]): Promise<void> {
+function handleExtractMetadata(positionals: string[]): void {
const inputFile = positionals[1];
const outputFolder = positionals[2];

@@ -44,7 +44,7 @@ async function handleExtractMetadata(positionals: string[]): Promise<void> {
process.exit(1);
}

-  const stats = await extractTableMetadata({ inputFile, outputFolder });
+  const stats = extractTableMetadata({ inputFile, outputFolder });
console.log(
`Extracted ${stats.databases} databases, ${stats.tables} tables, ${stats.fields} fields`,
);
@@ -57,7 +57,7 @@ function handleExtractSpec(values: ParsedValues): void {
process.exit(0);
}

-async function main(): Promise<void> {
+function main(): void {
const { values, positionals } = parseArguments();
const command = positionals[0];

@@ -77,4 +77,4 @@ async function main(): Promise<void> {
}
}

-await main();
+main();
5 changes: 0 additions & 5 deletions bun.lock

Some generated files are not rendered by default.

3 changes: 1 addition & 2 deletions core-spec/v1/spec.md
@@ -8,7 +8,7 @@ Metabase database metadata is a read-only snapshot of databases, tables, and fie

The format is designed to be **portable** and **reviewable**: numeric IDs are omitted or replaced with human-readable natural keys (database name, `[database, schema, table]` tuples, etc.). Files can be diffed, grepped, and edited by hand.

-The raw `metadata.json` is a single flat JSON document with `databases`, `tables`, and `fields` arrays, optimized for transport rather than reading. It can be arbitrarily large — tens or hundreds of megabytes on warehouses with many tables — and is not intended for direct consumption. Tools and humans should read the extracted YAML tree under `databases/` instead, where each entity lives in its own small file.
+The raw `table_metadata.json` is a single flat JSON document with `databases`, `tables`, and `fields` arrays, optimized for transport rather than reading. It can be arbitrarily large — tens or hundreds of megabytes on warehouses with many tables — and is not intended for direct consumption. Tools and humans should read the extracted YAML tree under `databases/` instead, where each entity lives in its own small file.
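
To make the flat shape concrete, here is a minimal TypeScript sketch that models the document and counts its entries. Only the three top-level array names come from the spec; the `FlatMetadata` type, the `countEntities` function, and the sample values are assumptions made for illustration. Entries are kept as opaque records, and a missing section is treated as empty.

```typescript
// Hypothetical model of the flat document; entry shapes are left opaque here.
type FlatMetadata = {
  databases?: Record<string, unknown>[];
  tables?: Record<string, unknown>[];
  fields?: Record<string, unknown>[];
};

// Count the entries in each section, treating an absent section as empty.
function countEntities(doc: FlatMetadata) {
  return {
    databases: doc.databases?.length ?? 0,
    tables: doc.tables?.length ?? 0,
    fields: doc.fields?.length ?? 0,
  };
}

// Tiny made-up sample; a real export can be far larger.
const sample: FlatMetadata = JSON.parse(
  '{"databases":[{"name":"Sample Database"}],"tables":[{"name":"ORDERS"},{"name":"PEOPLE"}]}',
);
const counts = countEntities(sample);
console.log(`${counts.databases} databases, ${counts.tables} tables, ${counts.fields} fields`);
```

Parsing the whole file with `JSON.parse` is fine for a small sample like this one, but it loads everything into memory, which is one more reason to read the extracted YAML tree rather than the raw file.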

## Table of Contents

@@ -252,4 +252,3 @@ parent_id:
- DATA
- user
```

@@ -1,7 +1,7 @@
db_id: Sample Database
name: ACCOUNTS
schema: PUBLIC
description: Information on customer accounts registered with Piespace. Each account represents a new organization signing up for on-demand pies.
db_id: Sample Database
fields:
- name: LONGITUDE
base_type: type/Float
@@ -1,7 +1,7 @@
db_id: Sample Database
name: ANALYTIC_EVENTS
schema: PUBLIC
description: Piespace does some anonymous analytics tracking on how users interact with their platform. They’ve only had time to implement a few events, but you know how it is. Pies come first.
db_id: Sample Database
fields:
- name: BUTTON_LABEL
base_type: type/Text
@@ -1,7 +1,7 @@
db_id: Sample Database
name: FEEDBACK
schema: PUBLIC
description: With each order of pies sent out, Piespace includes a place for customers to submit feedback and review their order.
db_id: Sample Database
fields:
- name: ID
base_type: type/BigInteger
@@ -1,7 +1,7 @@
db_id: Sample Database
name: INVOICES
schema: PUBLIC
description: Confirmed payments from Piespace’s customers. Most accounts pay for their pie subscription on a monthly basis.
db_id: Sample Database
fields:
- name: PLAN
base_type: type/Text
@@ -1,7 +1,7 @@
db_id: Sample Database
name: ORDERS
schema: PUBLIC
description: Confirmed Sample Company orders for a product, from a user.
db_id: Sample Database
fields:
- name: QUANTITY
description: Number of products bought.
@@ -1,7 +1,7 @@
db_id: Sample Database
name: PEOPLE
schema: PUBLIC
description: Information on the user accounts registered with Sample Company.
db_id: Sample Database
fields:
- name: STATE
description: The state or province of the account’s billing address
@@ -1,7 +1,7 @@
db_id: Sample Database
name: PRODUCTS
schema: PUBLIC
description: Includes a catalog of all the products ever sold by the famed Sample Company.
db_id: Sample Database
fields:
- name: ID
description: The numerical product number. Only used internally. All external communication should use the title or EAN.
@@ -1,7 +1,7 @@
db_id: Sample Database
name: REVIEWS
schema: PUBLIC
description: Reviews that Sample Company customers have left on our products.
db_id: Sample Database
fields:
- name: RATING
description: The rating (on a scale of 1-5) the user left.