@JannikSt JannikSt commented Dec 30, 2025

Adds missing eval_config support to the RL API client.

Changes:

  • Add eval_config field to RLRun model
  • Add eval_config parameter to create_run()
  • Wire up payload["eval"] in request body
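
The three changes above can be sketched roughly as follows. This is an illustrative sketch, not the actual client code: the trimmed `RLRun` fields, the `build_create_run_payload` helper, and the `"model"` / `"environments"` payload keys are assumptions made for the example. Only the `eval_config` name and the `payload["eval"]` key come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RLRun:
    """Hypothetical, trimmed-down run model; the real RLRun has more fields."""

    id: str
    model: str
    # New in this PR: evaluation settings attached to the run (None if unset).
    eval_config: Optional[dict[str, Any]] = None


def build_create_run_payload(
    model: str,
    environments: list[str],
    eval_config: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build the request body that a create_run() call might send."""
    # "model" and "environments" are assumed key names for illustration.
    payload: dict[str, Any] = {"model": model, "environments": environments}
    # Only add the "eval" key when an eval config was supplied, so runs
    # without evaluation keep the same request body as before.
    if eval_config:
        payload["eval"] = eval_config
    return payload
```

Leaving `eval` out of the body when no config is passed keeps existing callers of `create_run()` unaffected.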

Relates to #256


Note

Adds evaluation support to RL runs and a new logs experience.

  • Adds eval_config to RLRun, plumbs eval_config through RLClient.create_run() as payload["eval"]
  • CLI: introduces [eval] config (envs, interval, num_examples, rollouts_per_example, base_model) and corresponding --eval-* flags; validates env/eval slugs; surfaces eval settings in run creation output
  • New prime rl logs command with --tail and --follow; cleans output by stripping ANSI and collapsing progress bars; handles rate limiting
  • API client: new get_logs(run_id, tail_lines) method used by CLI
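
A minimal sketch of the log clean-up described above (strip ANSI escapes, drop intermediate progress-bar updates, keep the 100% completion line). The `clean_logs` name and both regular expressions are assumptions for illustration; the progress pattern assumes tqdm-style bars such as ` 50%|#####     | 5/10` and is not taken from the CLI source.

```python
import re

# CSI-style ANSI escape sequences, e.g. "\x1b[32m" or "\x1b[0m".
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Assumed tqdm-style progress marker: a percentage directly followed by "|".
PROGRESS_RE = re.compile(r"\b(\d{1,3})%\|")


def clean_logs(raw: str) -> list[str]:
    """Return log lines with ANSI codes, empty lines and partial bars removed."""
    cleaned: list[str] = []
    # str.splitlines() also splits on bare "\r", so each in-place progress
    # update becomes its own line and can be filtered individually.
    for line in raw.splitlines():
        line = ANSI_RE.sub("", line).rstrip()
        if not line:
            continue  # drop lines left empty after stripping
        match = PROGRESS_RE.search(line)
        if match and match.group(1) != "100":
            continue  # intermediate progress update; keep only completion
        cleaned.append(line)
    return cleaned
```

Keeping the 100% line preserves a record that a step finished without flooding the output with every intermediate update.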

Written by Cursor Bugbot for commit aa1b26d.

@JannikSt
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Use BaseConfig.from_sources for eval config precedence instead of manual if-statements
- Require owner/name format for --eval-envs (same as training environments)
- Rename EvalConfig.eval_base_model to base_model for proper underscore mapping
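
The precedence and slug rules in these three points can be sketched as below. This is a stand-in, not `BaseConfig.from_sources` itself, whose signature is not shown in this PR. The `merge_eval_config` and `validate_env_slugs` helpers, the slug pattern, and the assumption that CLI flags win over TOML values are all hypothetical; only the field names come from the `[eval]` config described earlier.

```python
import re
from typing import Any

# Field names taken from the [eval] config described in this PR.
EVAL_FIELDS = ("envs", "interval", "num_examples", "rollouts_per_example", "base_model")
# Assumed slug shape: "owner/name", each part made of letters, digits, "_", "." or "-".
SLUG_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def merge_eval_config(cli: dict[str, Any], toml: dict[str, Any]) -> dict[str, Any]:
    """Merge eval settings, taking the first non-None value per field."""
    merged: dict[str, Any] = {}
    for field in EVAL_FIELDS:
        # Assumed precedence: CLI flags first, then the TOML [eval] section.
        for source in (cli, toml):
            value = source.get(field)
            if value is not None:
                merged[field] = value
                break
    return merged


def validate_env_slugs(envs: list[str]) -> None:
    """Raise ValueError if any environment is not in owner/name format."""
    bad = [env for env in envs if not SLUG_RE.match(env)]
    if bad:
        raise ValueError(f"expected owner/name format, got: {', '.join(bad)}")
```

Because the field is named `base_model` rather than `eval_base_model`, the same key can be used unchanged in the `[eval]` table and in the merged result.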

JannikSt commented Jan 3, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa1b26d886

@JannikSt JannikSt merged commit f553271 into feature/rft Jan 3, 2026
1 check passed
@JannikSt JannikSt deleted the feature/rft-eval-config branch January 3, 2026 12:00
JannikSt added a commit that referenced this pull request Jan 3, 2026
* Implement commands for hosted RL

* Hosted RL

* Allow users to start a run with just 'prime rl <environments>'. Help output:

 Usage: prime rl [OPTIONS] ENVIRONMENTS... | COMMAND [ARGS]...

 Manage RL training runs.

 By default, 'prime rl <environments>' runs 'prime rl run <environments>'.

 Options:
   --help  -h   Show this message and exit.

 Commands:
   run      Create an RL training run with specified environments and model.
   models   List available models for RL training.
   runs     List your RL training runs.
   stop     Stop an RL training run.
   delete   Delete an RL training run.

* Support tomls on prime rl cmd

* Minor fix

* Cleanup references to RFT

* Minor improvements

* Fix ruff

* Match post rft run schema to new backend

* Refactor delete_run method to remove return value and simplify success handling in RLClient and related command.

* Fix/prime rl list (#267)

* quick fix for prime rl list when no name set

* remove truncation of id in prime rl list

* Add support for run_config

* feat: add eval_config support to RL API client (#271)

* feat: add eval_config support to RL API client

* Remove accidentally committed test files

* feat: add logs command for RL runs

* fix: move time import to top, add rl_config example

* feat: add --watch flag and improve log streaming

* fix: allow built-in envs like reverse-text, update example

* feat: add --eval-* options to rl run command

* fix: strip ANSI escape codes from logs output

* fix: increase poll interval to 5s, add rate limit handling

* fix: filter progress bars from logs output, remove redundant --watch flag

* fix: keep 100% progress bar completion lines in logs

* fix: address review comments - simplify log follow, warn on unused eval options

* fix: handle log rotation in follow mode when tail window is full

* fix: always use overlap detection for log follow to handle fast growth with rotation

* feat: add [eval] section support in TOML config files

* fix: improve progress bar filtering to remove empty lines

* fix: require owner/name format for environments, remove example config

* fix: use from_sources for eval config merging, require owner/name format

- Use BaseConfig.from_sources for eval config precedence instead of manual if-statements
- Require owner/name format for --eval-envs (same as training environments)
- Rename EvalConfig.eval_base_model to base_model for proper underscore mapping

* prime registry support (#215)

* custom image registry for sandboxes

* prime images

* --image typo

* linux/amd64

* updated to not build locally

* full image path

* rm emojis

* remove inline

* image status

* full image path

* add cleanup

* adjust scope output

* bug bot stuff

* validate_output_format

* bug bot comment

* update prime images list

* limit platform

* bump timeout

* add closed beta info

* Chore/bump version 0.5.8 (#270)

* bump version to 0.5.8

* bump versions

* Fix: Update eval sample field (#265)

* Update eval sample field.

* Update docs.

* Fix: Remove trailing comma from API token URL (#273)

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: sami <sami@primeintellect.ai>

---------

Co-authored-by: Johannes Hagemann <johannes@primeintellect.ai>
Co-authored-by: JannikSt <JannikSt@users.noreply.github.com>
Co-authored-by: Jannik Straube <info@jannik-straube.de>