Clarify whether slow update should be validation-gated or force-injected #22

@luigi-sqwish

Hi SkillOpt team, thanks for releasing the paper and code. I wanted to clarify the intended semantics of the slow-update path.

In the paper, Section 3.6 appears to describe slow update as still going through the validation/selection gate after the longitudinal guidance is injected. My reading is that the slow-update candidate should be accepted only if it passes the same held-out selection validation used for normal edits.
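To make my reading concrete, here is a minimal sketch of what I understand by "validation-gated". It is not taken from the SkillOpt code: `Skill`, `gated_slow_update`, and `validate` are hypothetical names, and the strict "greater than" acceptance rule is my assumption about how the held-out gate compares scores.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical stand-in for a skill document (not the SkillOpt type)."""
    body: str             # step-level skill text
    slow_field: str = ""  # epoch-level longitudinal guidance


def gated_slow_update(
    current: Skill,
    current_score: float,
    guidance: str,
    validate: Callable[[Skill], float],
) -> Tuple[Skill, float, str]:
    """Inject the guidance into a candidate, keep it only if it passes the gate.

    `validate` is assumed to return the held-out selection score
    (higher is better). The strict comparison is an assumption.
    """
    candidate = replace(current, slow_field=guidance)
    candidate_score = validate(candidate)
    if candidate_score > current_score:
        return candidate, candidate_score, "accept"
    # Gate failed: the slow-update guidance is discarded.
    return current, current_score, "reject"
```

Under this reading, a slow update that lowers the held-out score never reaches current_skill or best_skill.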

In current main, however, the implementation appears to force-inject slow-update guidance into both current_skill and best_skill without a selection gate:

# Slow update field is force-updated into both
# current_skill and best_skill unconditionally.
# The epoch-level longitudinal guidance should always
# persist — it must not be gated by step-level
# selection scores.

The surrounding code records the action as force_accept:

slow_result["action"] = "force_accept"
current_origin = f"slow_update_epoch_{epoch:02d}"

Could you clarify which behavior is intended for reproducing/reporting SkillOpt results?

  1. Should slow-update guidance be validation-gated, as described in the paper?
  2. Or is the force-injection behavior in the released code the intended implementation?
  3. If force-injection is intended, should exported best_skill.md be interpreted as the best validation-gated skill plus the latest slow-update field, rather than the exact best step candidate selected by the held-out gate?
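For question 3, this is a small sketch of the situation I have in mind. The names (`Skill`, `force_inject`, `gate_selected`, `exported_best`) and the example strings are hypothetical and only illustrate my understanding of the code comment quoted above; they do not reproduce the released implementation.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Skill:
    """Hypothetical stand-in for a skill document (not the SkillOpt type)."""
    body: str             # step-level skill text
    slow_field: str = ""  # epoch-level longitudinal guidance


def force_inject(current: Skill, best: Skill, guidance: str) -> Tuple[Skill, Skill]:
    """Overwrite the slow-update field in both skills, with no validation call."""
    return replace(current, slow_field=guidance), replace(best, slow_field=guidance)


# The candidate that the held-out gate selected at some earlier step.
gate_selected = Skill(body="best step candidate", slow_field="guidance from epoch 1")
current = Skill(body="latest step candidate", slow_field="guidance from epoch 1")

# End of the next epoch: new guidance is injected unconditionally.
current, exported_best = force_inject(current, gate_selected, "guidance from epoch 2")

# Same body as the gate-selected skill, but a slow field the gate never scored.
print(exported_best == gate_selected)            # False
print(exported_best.body == gate_selected.body)  # True
```

If this matches the released behavior, the exported best_skill.md would combine a validated body with a slow-update field that was never scored together with it.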

Thanks!
