feat: add teacher config/sft loss for hosted training SFT #514
Conversation
Add `TeacherRolloutModelConfig` to the RL config schema so users can specify an external teacher model for SFT hard distillation via TOML:

```toml
[teacher_rollout_model]
base_url = ["https://..."]
api_key_var = "PRIME_API_KEY"
name = "model-name"
```

The field flows through the API client to the platform, which merges it into the orchestrator's `run_config` as CLI overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d2f3992. Configure here.
```python
if (
    teacher_sampling.max_tokens is not None
    or teacher_sampling.enable_thinking is not None
    or teacher_sampling.reasoning_effort
```
Inconsistent truthiness check for teacher reasoning_effort display
Low Severity
The teacher display section checks teacher_sampling.reasoning_effort using bare truthiness (lines 936 and 944), while the equivalent main sampling display section uses is not None checks (lines 964 and 986). Since reasoning_effort is str | None, the truthiness check would suppress display for any falsy string value (e.g. ""), creating inconsistent behavior between the two sections.
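The difference between the two guard styles can be seen in a tiny sketch (variable name borrowed from the snippet above, loop added for illustration):

```python
# reasoning_effort is typed str | None; compare the two display guards.
for reasoning_effort in ("medium", "", None):
    truthy = bool(reasoning_effort)          # bare truthiness check
    explicit = reasoning_effort is not None  # `is not None` check
    print(repr(reasoning_effort), truthy, explicit)

# 'medium' -> True  True
# ''       -> False True   <- "" is set but the truthy check suppresses it
# None     -> False False
```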
Additional Locations (1)


Summary
Implements APR-157 SFT distillation support through the existing `prime train` config path.

- `loss = "rl" | "sft"` with `[teacher]` and `[teacher.sampling]`.
- `teacher.save = true` is not supported.
- Defaults `rollouts_per_example` to `1` while preserving explicit overrides.
- Maps `loss` and the teacher `enable_thinking` and `reasoning_effort` sampling aliases into `extra_body.chat_template_kwargs`.
- `Training`, `Teacher`, and `Run Config` render as separate sections.
- … and `checkpoint_id` are uncommented together.

Config Shape
API Payload
```json
{
  "loss": "sft",
  "teacher": {
    "model": "openai/gpt-oss-120b",
    "save": false,
    "sampling": {
      "max_tokens": 2048,
      "extra_body": {
        "chat_template_kwargs": {
          "enable_thinking": false,
          "reasoning_effort": "medium"
        }
      }
    }
  }
}
```

Tests
```sh
uv run pytest packages/prime/tests/test_rl_config.py packages/prime/tests/test_train_cli.py -q
```
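The `extra_body.chat_template_kwargs` mapping described in the summary can be sketched as follows. The helper name and dict-based payload are assumptions for illustration, not the PR's actual code:

```python
def map_sampling_aliases(sampling: dict) -> dict:
    """Fold enable_thinking/reasoning_effort into extra_body.chat_template_kwargs.

    Hypothetical helper illustrating the alias mapping from the summary.
    """
    out = {k: v for k, v in sampling.items()
           if k not in ("enable_thinking", "reasoning_effort")}
    kwargs = {}
    if sampling.get("enable_thinking") is not None:
        kwargs["enable_thinking"] = sampling["enable_thinking"]
    if sampling.get("reasoning_effort") is not None:
        kwargs["reasoning_effort"] = sampling["reasoning_effort"]
    if kwargs:
        extra = dict(out.get("extra_body", {}))
        extra["chat_template_kwargs"] = {
            **extra.get("chat_template_kwargs", {}), **kwargs}
        out["extra_body"] = extra
    return out


payload = map_sampling_aliases(
    {"max_tokens": 2048, "enable_thinking": False, "reasoning_effort": "medium"}
)
print(payload)
# {'max_tokens': 2048, 'extra_body': {'chat_template_kwargs':
#   {'enable_thinking': False, 'reasoning_effort': 'medium'}}}
```

This reproduces the `sampling` section of the API payload above from the TOML-level aliases.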