feat(benchmarks): Add log and download source files #1019

Draft: dolaameng wants to merge 7 commits into main from dolaameng/btcli

dolaameng commented May 21, 2026

feat(benchmarks): Add publish and log commands, support dataset push, and improve download

TODO: release kaggle-sdk-python new version and use it here.

Publish command (new)

Adds kaggle b t publish <task> [--publish-backing-notebook] to make a benchmark task public.

  • Handles partial states: if a task is already public but the backing notebook is private (and the flag is set), it will publish only the notebook.
  • Returns clear warnings if no backing notebook is associated with the task.
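
The partial-state handling above could be sketched as a small planning function. This is only an illustration, not the code in this PR: `TaskState`, `plan_publish`, and the action names are hypothetical, and it assumes the missing-notebook warning is only raised when the flag is set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaskState:
    """Hypothetical snapshot of a task's visibility (not the real SDK type)."""
    task_public: bool
    # None means no backing notebook is associated with the task.
    notebook_public: Optional[bool]


def plan_publish(state: TaskState, publish_backing_notebook: bool) -> Tuple[List[str], List[str]]:
    """Return (actions, warnings) for a publish request."""
    actions: List[str] = []
    warnings: List[str] = []
    # Only publish the task if it is not already public.
    if not state.task_public:
        actions.append("publish_task")
    if publish_backing_notebook:
        if state.notebook_public is None:
            # Assumption: warn instead of failing when there is no notebook.
            warnings.append("no backing notebook is associated with this task")
        elif not state.notebook_public:
            actions.append("publish_notebook")
    return actions, warnings
```

With this shape, a task that is already public but has a private notebook yields only the `publish_notebook` action when the flag is set.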

Log command (new)

Adds kaggle b t log <task> [-m <model> ...] to view execution logs.

  • Streaming: Active (RUNNING) runs stream live output via SSE; completed/errored runs print the persisted log.
  • Robustness: Gracefully handles QUEUED runs (which have no logs yet) by catching the server's 404/400 per-run, printing a friendly indicator, and continuing to show logs for other runs instead of crashing.
  • Multi-model: Logs for each model are printed sequentially (not interleaved), with a header showing run ID and state, a line-count footer, and a final summary.
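
A rough sketch of the per-run behavior described above, under stated assumptions: `LogsNotReady` stands in for the server's 404/400 response, `fetch_lines` stands in for whatever call returns or streams log lines, and the header, footer, and summary wording is invented for illustration. None of these names come from the PR.

```python
from typing import Callable, Iterable, List, Tuple


class LogsNotReady(Exception):
    """Hypothetical stand-in for the server's 404/400 on a run without logs."""


def print_run_logs(
    runs: List[Tuple[str, str]],
    fetch_lines: Callable[[str], Iterable[str]],
    out: Callable[[str], None] = print,
) -> Tuple[int, int]:
    """Print logs run by run (never interleaved); return (shown, pending)."""
    shown = 0
    pending = 0
    for run_id, state in runs:
        out(f"=== run {run_id} [{state}] ===")
        count = 0
        try:
            # Lines are printed as they arrive, so a live stream also works here.
            for line in fetch_lines(run_id):
                out(line)
                count += 1
        except LogsNotReady:
            # A run without logs must not abort the remaining runs.
            out("(no logs yet)")
            pending += 1
            continue
        out(f"--- {count} lines ---")
        shown += 1
    out(f"Summary: {shown} shown, {pending} without logs")
    return shown, pending
```

Catching the error inside the loop, rather than around it, is what lets the command keep printing logs for the other runs.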

Push with dataset (new)

Adds -d / --kaggle-dataset to kaggle b t push to attach Kaggle datasets as data sources.

  • Prints a visible warning when re-pushing a task without -d would detach previously associated datasets.
  • Validates the requested dataset slugs and warns about any that the server could not resolve.
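
Both checks reduce to comparing slug lists. The sketch below assumes the client can see the previously attached slugs and the slugs the server resolved; the function names and the warning text are hypothetical, not taken from the PR.

```python
from typing import Iterable, List, Optional


def detach_warning(previous: Iterable[str], requested: Iterable[str]) -> Optional[str]:
    """Return a warning if this push would drop previously attached datasets."""
    requested_set = set(requested)
    detached = [slug for slug in previous if slug not in requested_set]
    if not detached:
        return None
    return "Warning: this push would detach: " + ", ".join(detached)


def unresolved_slugs(requested: Iterable[str], resolved: Iterable[str]) -> List[str]:
    """Return requested slugs that are missing from the resolved list."""
    resolved_set = set(resolved)
    return [slug for slug in requested if slug not in resolved_set]
```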

Download improvements

  • --include-source / -s: Also download the kernel session's source notebooks alongside output files.
  • --force / -f: Re-download and overwrite previously downloaded runs instead of skipping them. Downloads to a staging directory first so the previous output is preserved if the download fails.
  • Summary line: Prints a final "Done: X downloaded, Y skipped." count, and prints "Done: 0 downloaded." consistently on early returns.
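
The staging behavior of --force could look roughly like the following. This is a minimal sketch, not the PR's implementation: `download_with_staging` and the `fetch` callback are hypothetical, and it assumes the staging directory lives next to the target so the final rename stays on one filesystem.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def download_with_staging(target: Path, fetch: Callable[[Path], None]) -> None:
    """Download into a sibling staging directory, then swap it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    staging = Path(
        tempfile.mkdtemp(prefix=target.name + ".staging-", dir=target.parent)
    )
    try:
        fetch(staging)
    except BaseException:
        # On failure, drop the partial download; the old output is untouched.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    # Only replace the previous output once the download has fully succeeded.
    if target.exists():
        shutil.rmtree(target)
    staging.rename(target)
```

The swap is not fully atomic, since there is a short window between removing the old directory and renaming the new one, but a failed download can no longer destroy the previous output.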

Refactors & Code Quality

  • TTY-Aware Colors: Replaced all raw ANSI escape codes (\033[) with static utility helpers (_bold(), _warn(), _error()) that automatically strip color codes when the output stream is redirected (non-TTY, pipes, logs, agents).
  • Format Improvements:
    • Unified _format_model_hint() to fix model name quoting inconsistencies.
    • _print_log_entry() now prints fallback dictionaries with formatting via json.dumps() instead of Python's raw repr().
