ENG-3243 Add hosted eval export command by d42me · Pull Request #647 · PrimeIntellect-ai/prime

d42me · 2026-05-13T22:39:17Z

Adds a prime eval export command that writes hosted eval samples as verifiers JSONL or Inspect .eval files, with reward filtering and docs.

Note

Medium Risk
Adds a new prime eval export command that fetches and serializes hosted eval samples to disk, including pagination and filtering; mistakes could lead to incomplete/incorrect exports or large-memory runs when exporting big evaluations.

Overview
Adds prime eval export <run-id> to download all hosted evaluation samples, filter them (failed rollouts and reward thresholds), and write them out as either verifiers JSONL or a zipped Inspect .eval (with log.json).
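The filter-then-write step described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the sample fields `reward` and `error`, and the helper names `keep_sample` and `write_jsonl`, are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Iterable, Optional


def keep_sample(
    sample: Dict[str, Any],
    min_reward: Optional[float] = None,
    max_reward: Optional[float] = None,
    drop_failed: bool = True,
) -> bool:
    """Return True if a sample passes the (hypothetical) export filters."""
    # Assumption: a failed rollout carries a non-empty "error" field.
    if drop_failed and sample.get("error"):
        return False
    reward = sample.get("reward")
    # A sample without a reward cannot satisfy a reward threshold.
    if min_reward is not None and (reward is None or reward < min_reward):
        return False
    if max_reward is not None and (reward is None or reward > max_reward):
        return False
    return True


def write_jsonl(samples: Iterable[Dict[str, Any]], path: str, **filters: Any) -> int:
    """Write the samples that pass the filters, one JSON object per line."""
    written = 0
    with open(path, "w", encoding="utf-8") as fh:
        for sample in samples:
            if keep_sample(sample, **filters):
                fh.write(json.dumps(sample) + "\n")
                written += 1
    return written
```

Writing one row at a time keeps the output step from holding the whole export in memory, which matters for the large-evaluation risk noted above.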

Includes new eval_export utilities for normalizing sample/message shapes and deriving metadata, plus CLI logic to resolve a run ID to an evaluation ID, prevent exporting active runs, and page through samples. Updates the README with usage examples and adds tests covering verifiers row shape, Inspect output creation, and reward filtering.
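Paging through samples can be sketched as a generator. This is an illustrative sketch under stated assumptions, not the PR's implementation: `fetch_page` is a hypothetical callable standing in for whatever API call the CLI makes, and the 1-based page numbering and "short page means last page" rule are assumptions.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: fetch_page(page_number, page_size) -> list of samples.
FetchPage = Callable[[int, int], List[Dict[str, Any]]]


def iter_samples(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield samples page by page until a short or empty page is returned."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        # Assumption: a page with fewer rows than requested is the last one.
        if len(batch) < page_size:
            break
        page += 1
```

Because the generator yields samples as each page arrives, a caller can filter and write rows incrementally instead of collecting every sample first.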

Reviewed by Cursor Bugbot for commit dac3210. Bugbot is set up for automated code reviews on this repo.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dac3210220

chatgpt-codex-connector · 2026-05-13T22:44:06Z

+    if split is not None and split != 1:
+        console.print("[red]Error:[/red] split exports are not available for this run")


Reject unsupported split selection consistently

The --split flag is documented as selecting a specific env config set, but this guard only errors for values other than 1, so --split 1 is silently accepted and then ignored during export. In runs that actually contain multiple config sets, users will think they exported a subset while the command writes all rollouts, which can pollute downstream training/eval datasets. Until split-aware filtering is implemented, any non-None split value should fail explicitly.
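The stricter guard suggested here might look like the sketch below. It is a simplified stand-in, not the PR's code: the function name `validate_split` and the exception `UnsupportedSplitError` are hypothetical, and the real command prints an error through its console instead of raising.

```python
from typing import Optional


class UnsupportedSplitError(ValueError):
    """Raised when a split is requested but split-aware export is unavailable."""


def validate_split(split: Optional[int]) -> None:
    """Reject every explicit split value, including 1, until filtering exists."""
    if split is not None:
        raise UnsupportedSplitError("split exports are not available for this run")
```

With this shape, `--split 1` fails the same way as any other value, so a user cannot mistake a full export for a subset.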

ENG-3243 Add hosted eval export command

dac3210

d42me wants to merge 1 commit into main from feature/add-export-for-hosted