Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions openspec/changes/checkpoint-discipline/.openspec.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-02-22
96 changes: 96 additions & 0 deletions openspec/changes/checkpoint-discipline/design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
## Context

The apply-change skill instructions originate from a single TypeScript source file:

**`src/core/templates/workflows/apply-change.ts`** — contains two functions:
- `getApplyChangeSkillTemplate()` — generates SKILL.md files for all tools (Claude, Cursor, Windsurf, etc.)
- `getOpsxApplyCommandTemplate()` — generates tool-specific command files (e.g., `.claude/commands/opsx/apply.md`)

Both functions contain identical instruction text for steps 1–7, output formats, and guardrails. The command template has minor wording differences (e.g., `/opsx:continue` instead of `openspec-continue-change`) but the same step 6 and guardrails.

The current step 6 is a flat bullet list: show task → make changes → mark complete → continue. The checkpoint instruction exists only as guardrail #6: "Update task checkbox immediately after completing each task." In practice, agents ignore this because stronger competing signals (system-level parallelism prompts, the "keep going" guardrail) override it.

## Goals / Non-Goals

**Goals:**

- Make checkpoint-after-task the dominant behavioural signal in step 6, not a buried guardrail
- Include the recovery rationale inline so agents understand *why*, not just *what*
- Add git commit as part of the checkpoint so recovery survives process crashes (not just context window loss)
- Define tightly-coupled grouping clearly enough that agents can apply it without ambiguity
- Promote the checkpoint guardrail to first position with anti-batching language
- Address the merge request noise problem by noting that per-task commits should be bundled for review

**Non-Goals:**

- Changing the schema, artifact graph, or CLI commands
- Adding new CLI flags or configuration options for checkpoint behaviour
- Modifying other skills (archive, verify, sync, etc.)
- Enforcing checkpoint discipline programmatically (e.g., via hooks or validation) — this change relies on instruction strength only
- Changing commit message conventions beyond the task-referencing format

## Decisions

### 1. Replace step 6 rather than adding a new step

**Decision:** Rewrite step 6 in-place with the checkpoint loop structure (announce → implement → mark complete → commit → confirm).

**Rationale:** Adding a separate "checkpoint" step would break the existing step numbering and create ambiguity about whether the old loop or new checkpoint step takes precedence. Replacing step 6 keeps the structure clean and makes the checkpoint integral to the loop, not an afterthought.

**Alternative considered:** Adding a step 6b or inserting between steps 6 and 7. Rejected because it fragments the loop logic across multiple steps.

### 2. Use sub-steps (a–e) within step 6

**Decision:** Structure the checkpoint loop as lettered sub-steps: a. Announce, b. Implement, c. Mark complete, d. Commit, e. Confirm.

**Rationale:** The sequential lettering makes the order unambiguous. Agents can reference specific sub-steps. The hard gate ("steps c and d must happen before starting the next task") is clear when the steps are named.

**Alternative considered:** Keeping the bullet-list format. Rejected because bullets imply optional/unordered items, which is exactly the misinterpretation we're fixing.

### 3. Define grouping by build coherence, not by file proximity

**Decision:** Group the minimum number of tasks needed to avoid introducing a commit that would break the build on its own. A class change and its corresponding test are the canonical example — committing one without the other would fail CI.

**Rationale:** File-based rules (e.g., "tasks touching the same file") are too brittle — a refactoring task and a feature task might touch the same file but be logically independent. Build coherence is the right criterion because it's objective (does CI pass?) and maps to what a reviewer would consider a single logical unit.

**Alternative considered:** Strict one-task-per-checkpoint with no grouping. Rejected as impractical — splitting a function change from its test would create commits that fail tests, which is worse for recovery.

### 4. Hard ceiling of 3 tasks per checkpoint

**Decision:** No checkpoint may exceed 3 tasks, regardless of coupling. If more than 3 are needed to avoid a broken build, the tasks should be restructured.

**Rationale:** The build-coherence rule (decision 3) provides the primary grouping criterion, but without a ceiling agents could rationalise large batches. The cap of 3 is a safety net — the typical case is 1 task per checkpoint, and needing more than 3 for a non-breaking commit signals the tasks are too granular and should be rewritten.

### 5. Include git commit in the checkpoint, not just file update

**Decision:** The checkpoint includes `git add -A && git commit -m "task N: <short description>"`.

**Rationale:** Marking `[x]` in the tasks file without committing only protects against context window loss (the agent can re-read the file). It does not protect against process crashes, machine failures, or session timeouts — the primary failure modes we're addressing. A git commit makes progress durable on disk.

**Alternative considered:** Making the commit optional or configurable. Rejected because the whole point is crash recovery — without the commit, the checkpoint is incomplete.

### 6. Update both template functions in apply-change.ts

**Decision:** Edit `getApplyChangeSkillTemplate()` and `getOpsxApplyCommandTemplate()` in `apply-change.ts` with matching changes.

**Rationale:** Both functions contain the same step 6 and guardrails text. They must stay in sync — the skill template drives SKILL.md generation and the command template drives slash command generation.

**Follow-up consideration:** A snapshot or diff-based test asserting that the step-6 and guardrail sections of both rendered outputs are identical (modulo the expected `/opsx:continue` vs `openspec-continue-change` substitution) would prevent silent drift. This is out of scope for this change but worth tracking separately.

### 7. Add merge request bundling guidance to the guardrails

**Decision:** Add a guardrail note that per-task commits should be bundled into a single merge request by default, with user confirmation.

**Rationale:** Per-task commits are a recovery mechanism, not a review unit. Without this guidance, the granular commit history could produce noisy merge requests. The user confirmation step ensures the agent doesn't automatically submit without the user's awareness.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same terminology inconsistency as in proposal.md: "Squashing commits automatically" should be "Bundling commits into a single PR" to match the accepted decision and spec terminology.

**Alternative considered:** Automatically bundling commits into a single squashed commit. Rejected because rewriting history destroys the recovery trail before the user has confirmed the work is complete, and some users may prefer the granular history.

## Risks / Trade-offs

**[Agents may still batch despite stronger instructions]** → The instructions are the strongest lever available without programmatic enforcement. The combination of rationale, sub-step structure, hard gate language, and first-position guardrail is significantly stronger than the current single bullet. If agents still batch, the next step would be hooks or validation — but that's a separate change.

**[Per-task commits add overhead]** → Each checkpoint adds a git commit, which takes time and creates history. This is intentional — the overhead is the cost of recoverability. The tightly-coupled grouping rule mitigates this by allowing natural units to be committed together.

**[`git add -A` may stage unintended files]** → This matches the existing convention in the upstream proposal. Projects with sensitive files should have `.gitignore` configured appropriately. Changing to explicit file staging would require the agent to track which files each task modified, adding complexity disproportionate to the risk.

**[Commit message format is prescriptive]** → The `task N: <short description>` format is simple and traceable. Projects with strict commit conventions can override this via project config rules.
32 changes: 32 additions & 0 deletions openspec/changes/checkpoint-discipline/proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
## Why

The apply-change skill's task loop is the only point where implementation progress becomes durable. If a session crashes mid-implementation, only tasks marked `[x]` and committed to git can be recovered. The current instructions tell agents to "update task checkbox immediately after completing each task," but in practice agents routinely batch multiple tasks and mark them all complete at the end — meaning a crash loses all progress.

This happens because the instruction competes with stronger signals: coding agents are prompted to maximise parallelism and throughput, no rationale is given for *why* checkpointing matters, no anti-pattern is stated, and the "keep going through tasks until done" guardrail reinforces batching over discipline. The fix is to make checkpoint semantics explicit, explain the recovery purpose, and structure the loop as a hard gate rather than a soft preference.

## What Changes

- **Restructure step 6** of the apply-change skill from a simple loop into a checkpoint-disciplined loop with announce → implement → mark complete → commit → confirm stages
- **Explain the "why"** — frame the tasks file as a recovery log so agents understand the purpose of per-task updates
- **Add git commit to the checkpoint** — marking `[x]` without committing still loses progress on crash; committing makes recovery durable
- **Allow pragmatic grouping** — group the minimum number of tasks needed to avoid a commit that would break the build (e.g., a class change and its test), with a hard ceiling of 3 tasks per checkpoint
- **Default to bundling per-task commits in a single merge request** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single merge request by default, with the user prompted to confirm before submission
- **Reorder and strengthen guardrails** — move the checkpoint rule to the top, add an explicit anti-batching statement, remove the weaker "update task checkbox immediately" bullet it replaces

## Capabilities

### New Capabilities

- `apply-checkpoint-discipline`: Defines the checkpoint loop contract for the apply-change skill — when to checkpoint, what a checkpoint includes (mark complete + commit), how tightly-coupled tasks may be grouped, and the maximum batch size

### Modified Capabilities

*(none — no existing spec requirements are changing)*

## Impact

- **Files changed:** `src/core/templates/workflows/apply-change.ts` — both `getApplyChangeSkillTemplate()` and `getOpsxApplyCommandTemplate()`. These are the source templates from which all tool-specific SKILL.md and command files are generated during `openspec init`.
- **No breaking changes:** The checkpoint loop is a refinement of the existing loop, not a new workflow step
- **Trade-off: resilience vs parallel efficiency.** Committing after each task adds overhead and prevents agents from parallelising unrelated tasks. This is intentional — the tasks file exists for recoverability, and that value is lost if progress isn't persisted. The build-coherence grouping rule (minimum tasks to avoid a broken build, capped at 3) provides a pragmatic escape valve so agents aren't forced to split changes that would break CI independently.
- **Trade-off: granular commits vs reviewable merge requests.** Per-task commits create a clean recovery trail but would produce noisy merge requests if submitted as-is. The default behaviour should be to bundle all per-task commits for a change into a single merge request, with user confirmation before submission.
- **No schema changes** — this modifies skill instructions only, not the artifact graph or schema definitions
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
# apply-checkpoint-discipline Specification

## Purpose
Define the checkpoint contract for the apply-change implementation loop, ensuring task progress is durable and recoverable after session crashes while allowing pragmatic grouping of tightly-coupled tasks.

## ADDED Requirements

### Requirement: Checkpoint after each task
The apply-change skill SHALL checkpoint after completing each task (or small group of tightly-coupled tasks) before starting the next. A checkpoint consists of marking the task complete in the tasks file and committing the changes to git.

#### Scenario: Single task completion
- **WHEN** an agent completes a single pending task
- **THEN** it SHALL mark the task `[x]` in the tasks file
- **AND** run `git add -A && git commit -m "task N: <short description>"` where N is the task number
- **AND** output a confirmation (e.g., "Task N complete") before starting the next task

#### Scenario: Session crash after checkpoint
- **WHEN** a session crashes after a task has been checkpointed (marked `[x]` and committed)
- **THEN** the completed task's changes SHALL be recoverable from the git history
- **AND** the tasks file SHALL accurately reflect which tasks were completed

#### Scenario: Session crash before checkpoint
- **WHEN** a session crashes while a task is in progress but before the checkpoint
- **THEN** only the current in-progress task's changes are lost
- **AND** all previously checkpointed tasks remain recoverable

#### Scenario: Session crash in mid-checkpoint (marked complete, uncommitted)
- **WHEN** a session crashes after marking a task `[x]` but before the `git commit` succeeds
- **THEN** on recovery the agent SHALL verify that each `[x]` task has a corresponding commit in git history
- **AND** for any `[x]` task without a corresponding commit, SHALL check the working directory for uncommitted changes
- **AND** if uncommitted changes are present, SHALL commit them to complete the checkpoint
- **AND** if no uncommitted changes are present, SHALL treat the task as not started and re-implement before committing

### Requirement: Recovery rationale in instructions
The apply-change skill instructions SHALL explain that the tasks file serves as a recovery log and that progress is only durable once marked complete and committed. This rationale SHALL appear in the implementation step, not only in guardrails.

#### Scenario: Agent reads implementation step
- **WHEN** an agent processes the apply-change skill's implementation step
- **THEN** the step SHALL state that the tasks file is a recovery log
- **AND** SHALL state that progress is only durable once tasks are marked complete and committed

### Requirement: Tightly-coupled task grouping
The apply-change skill SHALL group tasks into the minimum number needed to avoid introducing a commit that would break the build on its own. Unrelated tasks that can each produce a non-breaking commit SHALL be checkpointed separately. No checkpoint SHALL exceed 3 tasks regardless of coupling.

#### Scenario: Grouping a class change and its test
- **WHEN** committing a class modification without its corresponding unit test would break the build
- **THEN** the agent SHALL group both tasks into a single checkpoint
- **AND** SHALL commit them together

#### Scenario: Unrelated tasks
- **WHEN** two tasks can each produce a non-breaking commit independently
- **THEN** the agent SHALL checkpoint each task separately
- **AND** SHALL NOT batch them into a single commit

#### Scenario: Maximum batch size
- **WHEN** an agent groups tightly-coupled tasks
- **THEN** no more than 3 tasks SHALL accumulate without a checkpoint (the typical case is 1; grouping is the exception)
- **AND** if more than 3 tasks are needed to avoid a broken build, the tasks SHOULD be restructured to allow smaller checkpoints

### Requirement: Checkpoint guardrail prominence
The checkpoint rule SHALL appear as the first guardrail in the apply-change skill and SHALL include an explicit anti-batching statement.

#### Scenario: Guardrail ordering
- **WHEN** the apply-change skill's guardrails are rendered
- **THEN** the checkpoint guardrail SHALL be the first item in the list

#### Scenario: Anti-batching statement
- **WHEN** the checkpoint guardrail is rendered
- **THEN** it SHALL explicitly state that large numbers of task completions MUST NOT be batched together

### Requirement: Per-task commits bundled in merge requests
Per-task commits SHALL be bundled into a single merge request per change by default. The agent SHALL prompt the user for confirmation before creating the merge request.

#### Scenario: Creating a merge request after implementation
- **WHEN** all tasks for a change are complete and the user requests a merge request
- **THEN** the agent SHALL include all per-task commits for that change in a single merge request by default
- **AND** SHALL prompt the user to confirm before submitting

#### Scenario: User opts for separate merge requests
- **WHEN** the user declines the default bundling and requests separate merge requests
- **THEN** the agent SHALL create separate merge requests as directed by the user
- **AND** SHALL NOT bundle per-task commits without the user's explicit confirmation
23 changes: 23 additions & 0 deletions openspec/changes/checkpoint-discipline/tasks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
## 1. Update step 6 in getApplyChangeSkillTemplate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what do you think about adding this in as an optional experimental skill/slash command to begin with?

We recently added optional skills that can be installed but are not created/added by default. I'd be open to trying this out as a separate skill to begin with, and if it works well enough, we can replace the main skill with it or keep it as a skill for more advanced users.

I also wonder if per-task commit is too granular in general, maybe per-phase commit? <- people will definitely start asking to control this behaviour, haha, but it's not a pressing concern.


- [ ] 1.1 Replace step 6 heading from "Implement tasks (loop until done or blocked)" to "Implement tasks (checkpoint after each)"
- [ ] 1.2 Add recovery rationale paragraph ("The tasks file is a recovery log...")
- [ ] 1.3 Replace flat bullet list with lettered sub-steps: a. Announce, b. Implement, c. Mark complete, d. Commit, e. Confirm
- [ ] 1.4 Add hard gate statement ("Steps c and d must happen before starting the next task")
- [ ] 1.5 Add tightly-coupled definition and maximum batch size (2–3 tasks)

## 2. Update guardrails in getApplyChangeSkillTemplate()

- [ ] 2.1 Replace "Update task checkbox immediately after completing each task" with checkpoint guardrail as first item, including anti-batching statement
- [ ] 2.2 Add MR bundling guardrail: per-task commits bundled into a single PR by default, with user confirmation
- [ ] 2.3 Reorder remaining guardrails (keep going, read context files, etc.) after the checkpoint guardrail

## 3. Apply matching changes to getOpsxApplyCommandTemplate()

- [ ] 3.1 Replace step 6 with identical checkpoint loop (matching task group 1)
- [ ] 3.2 Replace guardrails with identical updated guardrails (matching task group 2)

## 4. Verify

- [ ] 4.1 Run `pnpm run build` to confirm TypeScript compiles without errors
- [ ] 4.2 Run `pnpm test` to confirm no test regressions