diff --git a/skills/kaggle-benchmarks/SKILL.md b/skills/kaggle-benchmarks/SKILL.md
new file mode 100644
index 00000000..7c61c16e
--- /dev/null
+++ b/skills/kaggle-benchmarks/SKILL.md
@@ -0,0 +1,217 @@
+---
+name: kaggle-benchmarks
+description: >
+  How to write, push, run, and manage Kaggle Benchmark tasks using the kaggle
+  CLI and the kaggle-benchmarks Python SDK. Activate this skill when the user
+  wants to create a benchmark task, push a task file, run benchmarks against
+  LLM models, check run status, download results, or troubleshoot benchmark
+  workflows. Keywords: kaggle benchmarks, benchmark task, kbench, model proxy,
+  push task, run task, benchmark status, benchmark download.
+metadata:
+  author: kaggle
+  version: "0.1"
+---
+
+# Kaggle Benchmarks CLI Reference
+
+This reference covers how to use the `kaggle` CLI to manage Kaggle Benchmark tasks — pushing task files, running them against LLM models, checking status, and downloading results.
+
+## Official resources
+
+- **kaggle-benchmarks SDK repo:** https://github.com/Kaggle/kaggle-benchmarks — full source, API reference, and examples for the `kaggle-benchmarks` Python library used to write task files
+- **DeepWiki documentation:** https://deepwiki.com/Kaggle/kaggle-benchmarks — auto-generated documentation for the SDK
+
+## Prerequisites
+
+- `kaggle` CLI installed (`pip install kaggle` or `pip install -e .` from source)
+- `kaggle-benchmarks` SDK installed (`pip install kaggle-benchmarks`)
+- Valid Kaggle credentials: `KAGGLE_API_TOKEN` env var, `~/.kaggle/access_token` file, or OAuth via `kaggle auth login`
+
+## Command Hierarchy
+
+```
+kaggle benchmarks (alias: kaggle b)
+├── auth              — Fetch Model Proxy credentials
+├── init              — Fetch credentials + setup local dev environment
+└── tasks (alias: t)  — Manage benchmark tasks
+    ├── push          — Upload a task from a .py file
+    ├── run           — Run a task against model(s)
+    ├── list          — List your benchmark tasks
+    ├── status        — Show task details and per-model run status
+    ├── download      — Download completed run outputs
+    ├── models        — List available benchmark models
+    └── delete        — Delete a task (not yet supported by server)
+```
+
+## Setup & Authentication
+
+### Initialize a Benchmark Project
+
+The `init` command fetches Model Proxy credentials, writes default environment variables, generates a starter example task file, and a syntax reference document.
+
+```bash
+# Initialize with defaults (always writes .env, example_task.py, kaggle_benchmarks_reference.md)
+kaggle b init -y
+
+# Use custom paths for env file and/or example file:
+# kaggle b init -y --env-file my_project/.env --example-file my_project/my_task.py
+```
+
+**Options:**
+- `-y, --yes`: Skip confirmation prompt
+- `--env-file <FILE>`: Path to write env vars (default: `.env`)
+- `--example-file <FILE>`: Path to write example task (default: `example_task.py`)
+
+**Environment variables written (appended to the env file):**
+- `MODEL_PROXY_URL` — Model Proxy endpoint
+- `MODEL_PROXY_API_KEY` — Short-lived API key
+- `MODEL_PROXY_EXPIRY_TIME` — Token expiry
+- `LLM_DEFAULT` — Default model slug (e.g. `google/gemini-3-flash-preview`)
+- `LLM_DEFAULT_EVAL` — Default eval model slug
+- `LLMS_AVAILABLE` — Comma-separated list of available model slugs
+
+**⚠ Note:** Environment variables are **appended** to the env file. When loaded via `dotenv`, the last value wins, so re-running `init` or `auth` is safe. The file may accumulate duplicate entries over time; clean up manually if desired.
+
+**Files generated in the same directory as the example file:**
+- `example_task.py` — Starter benchmark task using `@task` decorator
+- `kaggle_benchmarks_reference.md` — Syntax reference for the `kaggle-benchmarks` Python library
+
+If either file already exists, it is skipped without overwriting.
+
+### Fetch Only Auth Credentials
+
+If you just need the Model Proxy token (without the extra env vars and example files):
+
+```bash
+# Refresh only the 3 credential variables (MODEL_PROXY_URL, MODEL_PROXY_API_KEY, MODEL_PROXY_EXPIRY_TIME)
+kaggle b auth -y
+
+# Or write to a custom env file:
+# kaggle b auth -y --env-file custom.env
+```
+
+## Core Workflow: Push → Run → Status → Download
+
+### Step 1: Write a Task File
+
+Task files are Python scripts using the `kaggle-benchmarks` library. They must:
+- Import `kaggle_benchmarks as kbench`
+- Define at least one function decorated with `@kbench.task(...)`
+- Call `.run(kbench.llm)` on the task function
+- Use `# %%` cell markers to separate notebook cells (percent format)
+
+**⚠ Important:** The `.run()` call is what triggers execution and produces a `.run.json` output file. Without invoking `.run()` (or `.evaluate()`), no run file is produced and nothing is recorded. The push will still succeed (since push validation only checks for `@task` decorators), but the task will silently produce no results when executed on the server.
+
+**Minimal example:**
+```python
+# %%
+import kaggle_benchmarks as kbench
+
+# %%
+@kbench.task(name="my-test-task")
+def my_test_task(llm):
+    response = llm.prompt("What is 2 + 2?")
+    kbench.assertions.assert_in("4", response, expectation="Should contain 4")
+
+my_test_task.run(kbench.llm)
+```
+
+**Task name defaults:** If you omit the `name=` argument from `@kbench.task()`, the task name defaults to the function name, title-cased with underscores replaced by spaces. For example, `@kbench.task()` on a function named `my_eval` produces the task name `"My Eval"`, which is slugified to `my-eval`.
+
+**Task file format rules:**
+- Must be a `.py` file
+- Uses "percent format" — `# %%` cell markers separate notebook cells. Each `# %%` starts a new cell. The CLI converts the file to `.ipynb` using `jupytext` with this format.
+- IPython magics (`%`, `!`, `%%`) are stripped during AST validation but kept in the final notebook for server execution
+- The task name is normalized to a URL-safe slug (e.g. `"My Test Task"` → `my-test-task`)
+- The slug used in the CLI must match a `@task` decorator in the file
+
+### Step 2: Validate Locally
+
+Before pushing, run the `.py` file locally to confirm it executes end-to-end and produces a `.run.json` output. This catches missing `.run()` calls, broken prompts, and assertion failures before they show up as silent no-ops on the server.
+
+```bash
+# Make sure your env is initialized (writes MODEL_PROXY_* + LLM_* vars to .env)
+kaggle b init -y
+
+# Run the task file directly
+python task.py
+
+# Confirm a .run.json was produced next to the task file
+ls -1 *.run.json
+```
+
+If `python task.py` exits cleanly and a `.run.json` appears, the task is safe to push.
+
+**How the LLM is chosen** (in order of precedence):
+
+1. **Explicit model in the task code** — pick a specific model from `kbench.llms`:
+   ```python
+   task.run(llm=kbench.llms["google/gemini-3.5-flash"])
+   ```
+2. **Default in the task code** — use `kbench.llm`, which resolves to `LLM_DEFAULT`:
+   ```python
+   task.run(llm=kbench.llm)
+   ```
+3. **Environment variables** (`.env`, written by `kaggle b init`) — control what `kbench.llm` resolves to and which models are available:
+   ```dotenv
+   LLM_DEFAULT=google/gemini-3.5-flash
+   LLM_DEFAULT_EVAL=google/gemini-3.1-pro-preview
+   LLMS_AVAILABLE=google/gemini-2.5-flash,google/gemini-2.5-pro,...
+   MODEL_PROXY_URL=...
+   MODEL_PROXY_API_KEY=...
+   ```
+If `python task.py` fails with an auth error, re-run `kaggle b auth -y` to refresh `MODEL_PROXY_API_KEY` (it's short-lived).
+
+### Step 3: Push the Task
+
+```bash
+# Push and wait for server-side creation to complete (recommended)
+kaggle b t push my-task -f task.py --wait
+
+# Push with timeout (60s) and custom poll interval (5s)
+kaggle b t push my-task -f task.py --wait 60 --poll-interval 5
+
+# Push without waiting (fire-and-forget; check status with `kaggle b t status`)
+# kaggle b t push my-task -f task.py
+```
+
+**Arguments:**
+- `<TASK>` (positional, required): Task slug (must match a `@task` decorator name in the file)
+- `-f, --file <FILE>` (required): Path to the `.py` task file
+
+### Step 4: Run the Task
+
+```bash
+# Run against the default model
+kaggle b t run my-task
+
+# Run against specific models
+kaggle b t run my-task -m google/gemini-3-flash-preview -m openai/gpt-4o
+
+# List available models
+kaggle b t models
+```
+
+### Step 5: Check Status
+
+```bash
+# Show task details and per-model run status
+kaggle b t status my-task
+```
+
+### Step 6: Download Results
+
+```bash
+# Download completed run outputs
+kaggle b t download my-task
+
+# Download to a specific directory
+kaggle b t download my-task -o ./results/
+```
+
+## Common Issues & Troubleshooting
+
+- **"No run file produced"**: Ensure your task calls `.run(kbench.llm)` — without it, push succeeds but no results are recorded
+- **Token expired**: Re-run `kaggle b auth -y` or `kaggle b init -y` to refresh Model Proxy credentials
+- **Task slug mismatch**: The slug in `kaggle b t push <SLUG>` must exactly match a `@task(name="<SLUG>")` decorator in your file
+- **Cell format errors**: Ensure `# %%` markers are present — the CLI converts percent-format `.py` to `.ipynb`
diff --git a/skills/references/benchmarks.md b/skills/references/benchmarks.md
deleted file mode 100644
index 54c6d595..00000000
--- a/skills/references/benchmarks.md
+++ /dev/null
@@ -1,400 +0,0 @@
-# Kaggle Benchmarks CLI Reference
-
-This reference covers how to use the `kaggle` CLI to manage Kaggle Benchmark tasks — pushing task files, running them against LLM models, checking status, and downloading results.
-
-## Prerequisites
-
-- Python 3.11+
-- `kaggle` CLI installed (`pip install kaggle` or `pip install -e .` from source)
-- Valid Kaggle credentials: `KAGGLE_API_TOKEN` env var, `~/.kaggle/access_token` file, or OAuth via `kaggle auth login`
-
-## Command Hierarchy
-
-```
-kaggle benchmarks (alias: kaggle b)
-├── auth              — Fetch Model Proxy credentials
-├── init              — Fetch credentials + setup local dev environment
-└── tasks (alias: t)  — Manage benchmark tasks
-    ├── push          — Upload a task from a .py file
-    ├── run           — Run a task against model(s)
-    ├── list          — List your benchmark tasks
-    ├── status        — Show task details and per-model run status
-    ├── download      — Download completed run outputs
-    ├── models        — List available benchmark models
-    └── delete        — Delete a task (not yet supported by server)
-```
-
-## Setup & Authentication
-
-### Initialize a Benchmark Project
-
-The `init` command fetches Model Proxy credentials, writes default environment variables, generates a starter example task file, and a syntax reference document.
-
-```bash
-# Initialize with defaults (always writes .env, example_task.py, kaggle_benchmarks_reference.md)
-kaggle b init -y
-
-# Use custom paths for env file and/or example file:
-# kaggle b init -y --env-file my_project/.env --example-file my_project/my_task.py
-```
-
-**Options:**
-- `-y, --yes`: Skip confirmation prompt
-- `--env-file <FILE>`: Path to write env vars (default: `.env`)
-- `--example-file <FILE>`: Path to write example task (default: `example_task.py`)
-
-**Environment variables written (appended to the env file):**
-- `MODEL_PROXY_URL` — Model Proxy endpoint
-- `MODEL_PROXY_API_KEY` — Short-lived API key
-- `MODEL_PROXY_EXPIRY_TIME` — Token expiry
-- `LLM_DEFAULT` — Default model slug (e.g. `google/gemini-3-flash-preview`)
-- `LLM_DEFAULT_EVAL` — Default eval model slug
-- `LLMS_AVAILABLE` — Comma-separated list of available model slugs
-
-**⚠ Note:** Environment variables are **appended** to the env file. When loaded via `dotenv`, the last value wins, so re-running `init` or `auth` is safe. The file may accumulate duplicate entries over time; clean up manually if desired.
-
-**Files generated in the same directory as the example file:**
-- `example_task.py` — Starter benchmark task using `@task` decorator
-- `kaggle_benchmarks_reference.md` — Syntax reference for the `kaggle-benchmarks` Python library
-
-If either file already exists, it is skipped without overwriting.
-
-### Fetch Only Auth Credentials
-
-If you just need the Model Proxy token (without the extra env vars and example files):
-
-```bash
-# Refresh only the 3 credential variables (MODEL_PROXY_URL, MODEL_PROXY_API_KEY, MODEL_PROXY_EXPIRY_TIME)
-kaggle b auth -y
-
-# Or write to a custom env file:
-# kaggle b auth -y --env-file custom.env
-```
-
-## Core Workflow: Push → Run → Status → Download
-
-### Step 1: Write a Task File
-
-Task files are Python scripts using the `kaggle-benchmarks` library. They must:
-- Import `kaggle_benchmarks as kbench`
-- Define at least one function decorated with `@kbench.task(...)`
-- Call `.run(kbench.llm)` on the task function
-- Use `# %%` cell markers to separate notebook cells (percent format)
-
-**⚠ Important:** The `.run()` call is what triggers execution and produces a `.run.json` output file. Without invoking `.run()` (or `.evaluate()`), no run file is produced and nothing is recorded. The push will still succeed (since push validation only checks for `@task` decorators), but the task will silently produce no results when executed on the server.
-
-**Minimal example:**
-```python
-# %%
-import kaggle_benchmarks as kbench
-
-# %%
-@kbench.task(name="my-test-task")
-def my_test_task(llm):
-    response = llm.prompt("What is 2 + 2?")
-    kbench.assertions.assert_in("4", response, expectation="Should contain 4")
-
-my_test_task.run(kbench.llm)
-```
-
-**Task name defaults:** If you omit the `name=` argument from `@kbench.task()`, the task name defaults to the function name, title-cased with underscores replaced by spaces. For example, `@kbench.task()` on a function named `my_eval` produces the task name `"My Eval"`, which is slugified to `my-eval`.
-
-**Task file format rules:**
-- Must be a `.py` file
-- Uses "percent format" — `# %%` cell markers separate notebook cells. Each `# %%` starts a new cell. The CLI converts the file to `.ipynb` using `jupytext` with this format.
-- IPython magics (`%`, `!`, `%%`) are stripped during AST validation but kept in the final notebook for server execution
-- The task name is normalized to a URL-safe slug (e.g. `"My Test Task"` → `my-test-task`)
-- The slug used in the CLI must match a `@task` decorator in the file
-
-### Step 2: Push the Task
-
-```bash
-# Push and wait for server-side creation to complete (recommended)
-kaggle b t push my-task -f task.py --wait
-
-# Push with timeout (60s) and custom poll interval (5s)
-kaggle b t push my-task -f task.py --wait 60 --poll-interval 5
-
-# Push without waiting (fire-and-forget; check status with `kaggle b t status`)
-# kaggle b t push my-task -f task.py
-```
-
-**Arguments:**
-- `<TASK>` (positional, required): Task name/slug
-- `-f, --file <FILE>` (required): Path to the `.py` source file
-
-**Options:**
-- `--wait [TIMEOUT]`: Wait for creation to complete. `--wait` alone = wait indefinitely. `--wait 60` = timeout after 60s.
-- `--poll-interval <SECONDS>`: Seconds between status polls (default: `10`)
-
-**What happens:**
-1. Validates the file is a `.py` file and exists
-2. Parses the file AST to verify it contains a `@task` decorator matching the task name
-3. If the task name differs from its slug form, prints a warning (e.g. `"My Task"` → `"my-task"`)
-4. Converts the `.py` file to `.ipynb` notebook format via `jupytext`
-5. Uploads to Kaggle as a benchmark task (creates new or new version if exists)
-6. Prints the Task URL and a hint to run
-
-**Error scenarios:**
-- File not found: `ValueError: File task.py does not exist`
-- Non-`.py` file: `ValueError: File task.txt must be a .py file`
-- Missing `@task` decorator: `ValueError: No @task decorators found in file task.py. The file must define at least one task.`
-- Task name mismatch: `ValueError: Task 'wrong-name' not found in file task.py. Found tasks: real-task`
-- Re-push while previous is still processing (without `--wait`): `ValueError: Task 'my-task' is currently being created (pending). Cannot push now. Use --wait to monitor the existing creation.`
-- Re-push with `--wait`: Waits for existing creation to complete, then pushes new version automatically
-
-### Step 3: Run the Task Against Models
-
-```bash
-# Run with interactive model selection (paginated picker)
-kaggle b t run my-task
-
-# Run against specific models
-kaggle b t run my-task -m google/gemini-2.5-pro anthropic/claude-sonnet-4
-
-# Run against a model and wait for completion
-kaggle b t run my-task -m google/gemini-2.5-pro --wait
-
-# Run with timeout and custom poll interval
-kaggle b t run my-task -m google/gemini-2.5-pro --wait 30 --poll-interval 5
-```
-
-**Arguments:**
-- `<TASK>` (positional, required): Task name/slug
-
-**Options:**
-- `-m, --model <MODEL> [MODEL ...]`: One or more model slugs. If omitted, shows interactive picker.
-- `--wait [TIMEOUT]`: Wait for runs to complete. `0` or omit value = indefinite.
-- `--poll-interval <SECONDS>`: Seconds between polls (default: `10`)
-
-**Interactive model selection:**
-- Shows numbered list of available models
-- Enter comma-separated numbers (e.g. `1,3,5`) to select specific models
-- Enter `all` to select every available model
-- Pagination: `n` = next page, `p` = previous page (when > 20 models)
-
-**Error scenarios:**
-- Non-existent task: `ValueError: Task 'no-such-task' not found. Check the task name and try again. Use 'kaggle b t list' to see your tasks.`
-- Invalid model: `ValueError: Failed to schedule runs. One or more model names may be invalid: ['nonexistent-model']. Use 'kaggle b t run my-task' (without -m) to select from available models.`
-- Task not ready: `ValueError: Task 'my-task' is not ready to run (status: QUEUED). Only completed tasks can be run.`
-- Timeout: `Timed out waiting for runs after 30 seconds.`
-
-### Step 4: Check Status
-
-```bash
-# Full status for a task
-kaggle b t status my-task
-
-# Filter to specific models
-kaggle b t status my-task -m google/gemini-2.5-pro
-kaggle b t status my-task -m google/gemini-2.5-pro anthropic/claude-sonnet-4
-```
-
-**Output format:**
-```
-Task:     my-task
-Status:   COMPLETED
-Created:  2026-04-28 18:13:04
-Task URL: https://www.kaggle.com/...
-
-Model                     Status      Started               Ended
---------------------------------------------------------------------------
-gemini-2.5-pro            COMPLETED   2026-04-28 18:13:04   2026-04-28 18:14:00
-claude-sonnet-4           ERRORED     2026-04-28 18:13:04   2026-04-28 18:13:04
-
-Errors:
-  [claude-sonnet-4]
-    Traceback (most recent call last):
-      ...
-    ValueError: some error
-```
-
-If no runs exist: `No runs yet. Use 'kaggle b t run my-task' to start one.`
-
-### Step 5: Download Results
-
-```bash
-# Download all terminal run outputs (completed and errored)
-kaggle b t download my-task
-
-# Download for specific model(s)
-kaggle b t download my-task -m google/gemini-2.5-pro
-
-# Download to a custom directory
-kaggle b t download my-task -o ./results
-```
-
-**Options:**
-- `-m, --model <MODEL> [MODEL ...]`: Download only for specific models
-- `-o, --output <DIRECTORY>`: Output directory (default: current directory)
-
-**Output directory structure:**
-```
-<output>/<task>/<version>/<model>/<run_id>/    (version is "unset" if unavailable)
-   ├── output files...
-```
-
-**Behavior details:**
-- Downloads outputs for all runs in a **terminal state** — this includes both `COMPLETED` and `ERRORED` runs (errored runs may still have partial output)
-- Downloads zip archives and extracts them automatically
-- Already-downloaded runs are skipped: `Skipping gemini-2.5-pro (run 123) — already downloaded to ./my-task/1/gemini-2.5-pro/123`
-- Corrupt zips: Warning printed, raw `.zip` file kept, continues with other models
-- No downloadable runs (all still in progress): `No downloadable runs yet — N run(s) still in progress. Use 'kaggle b t status my-task' to check progress.`
-- No runs at all: `No runs found for task 'my-task'. Use 'kaggle b t run my-task' to start one.`
-
-## Additional Commands
-
-### List Tasks
-
-```bash
-# List all your tasks
-kaggle b t list
-
-# Filter by name (regex)
-kaggle b t list --name-regex "^math"
-
-# Filter by status
-kaggle b t list --status completed
-
-# Combine filters
-kaggle b t list --name-regex "^math" --status errored
-```
-
-**Status filter values:** `queued`, `running`, `completed`, `errored`
-
-**Output:** Aligned table with columns: Task, Version (or `unset`), Status, Created
-
-### List Available Models
-
-```bash
-kaggle b t models
-```
-
-**Output:** Table with columns: Slug, Display Name
-
-### Delete a Task
-
-```bash
-kaggle b t delete my-task
-kaggle b t delete my-task -y   # skip confirmation
-```
-
-**Note:** Delete is not yet supported by the server. Currently prints: `Delete is not supported by the server yet.`
-
-## Task Name Normalization
-
-Task names are automatically normalized to URL-safe slugs:
-- `my_task` → `my-task`
-- `My Test Task` → `my-test-task`
-- `My Task` → `my-task`
-
-When the CLI normalizes a name, it prints a yellow warning:
-```
-⚠ Warning: task name 'My Test Task' was normalized to slug 'my-test-task'.
-  Use 'my-test-task' in future commands.
-```
-
-The slug must match between the `@task(name=...)` decorator in the file and the CLI command. Comparison is done on slugified names, so `@task(name="My Task")` matches `kaggle b t push my-task -f file.py`.
-
-## Model Slug Format
-
-Models use an `owner/model-name` format:
-- `google/gemini-2.5-pro`
-- `anthropic/claude-sonnet-4`
-- `openai/gpt-oss-120b`
-
-When displaying model names, the owner prefix is stripped for readability (e.g. `gemini-2.5-pro`).
-
-The server may sometimes return model slugs in different formats (e.g. `anthropic/claude-sonnet-4-6@default`). The CLI handles this with client-side normalization, replacing `@` with `-` for matching.
-
-## Common Workflows
-
-### Full End-to-End Workflow
-
-```bash
-# 1. Setup
-kaggle b init -y
-
-# 2. Write your task in task.py (see task file format above)
-
-# 3. Push
-kaggle b t push my-task -f task.py --wait
-
-# 4. Run against models
-kaggle b t run my-task -m google/gemini-2.5-pro anthropic/claude-sonnet-4 --wait
-
-# 5. Check status
-kaggle b t status my-task
-
-# 6. Download results
-kaggle b t download my-task -o ./results
-```
-
-### Local Iteration Loop
-
-Before pushing to the server, you can test your task locally against the Model Proxy to catch errors early. This avoids the push → run → wait → download round-trip for every change.
-
-**1. Get credentials:**
-```bash
-kaggle b init -y
-# or just: kaggle b auth -y
-```
-
-**2. Load env vars and run locally:**
-```bash
-# Source the .env file to set MODEL_PROXY_URL, MODEL_PROXY_API_KEY, etc.
-set -a && source .env && set +a
-
-# Run your task file directly with Python
-python task.py
-```
-
-**3. Check the output:**
-- A successful run produces a `.run.json` file in the current directory
-- Assertions print pass/fail inline so you can iterate on prompts and thresholds
-- Errors surface immediately in your terminal — no need to wait for server execution
-
-**4. Once satisfied, push to the server:**
-```bash
-kaggle b t push my-task -f task.py --wait && \
-kaggle b t run my-task -m google/gemini-2.5-pro --wait && \
-kaggle b t download my-task -o ./results
-```
-
-**⚠ Note:** Local runs use the `LLM_DEFAULT` model from your `.env` file. Server runs use whatever model(s) you specify with `-m`. Behavior may differ between models, so always validate against your target model(s) on the server after local iteration.
-
-### Quick Push-Run-Download
-
-```bash
-# Push and wait, then run and wait, all in sequence
-kaggle b t push my-task -f task.py --wait && \
-kaggle b t run my-task -m google/gemini-2.5-pro --wait && \
-kaggle b t download my-task -o ./results
-```
-
-### Testing a Task That Intentionally Errors
-
-```python
-# t.py
-# %%
-import kaggle_benchmarks as kbench
-
-# %%
-@kbench.task()
-def d(llm):
-    raise ValueError("intentional error")
-
-# %%
-d.run(kbench.llm)
-```
-
-```bash
-# Push succeeds (error only triggers at run time)
-kaggle b t push d -f t.py --wait
-
-# Run — will complete with ERRORED status
-kaggle b t run d -m google/gemini-3-flash-preview --wait
-
-# Status shows clean table + separate Errors section
-kaggle b t status d
-```