From 18ee59b62eacd6ef6592d6ce730f51052b6bc9f7 Mon Sep 17 00:00:00 2001 From: Marcus Don Date: Sun, 22 Feb 2026 20:00:47 +0000 Subject: [PATCH 1/4] feat: add checkpoint discipline proposal for apply-change skill Adds an OpenSpec change proposal to strengthen the apply-change skill's implementation loop with checkpoint discipline for crash recovery. Co-Authored-By: Claude Opus 4.6 --- .../checkpoint-discipline/.openspec.yaml | 2 + .../changes/checkpoint-discipline/design.md | 94 +++++++++++++++++++ .../changes/checkpoint-discipline/proposal.md | 32 +++++++ .../specs/apply-checkpoint-discipline/spec.md | 73 ++++++++++++++ .../changes/checkpoint-discipline/tasks.md | 23 +++++ 5 files changed, 224 insertions(+) create mode 100644 openspec/changes/checkpoint-discipline/.openspec.yaml create mode 100644 openspec/changes/checkpoint-discipline/design.md create mode 100644 openspec/changes/checkpoint-discipline/proposal.md create mode 100644 openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md create mode 100644 openspec/changes/checkpoint-discipline/tasks.md diff --git a/openspec/changes/checkpoint-discipline/.openspec.yaml b/openspec/changes/checkpoint-discipline/.openspec.yaml new file mode 100644 index 000000000..cbbb57832 --- /dev/null +++ b/openspec/changes/checkpoint-discipline/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-02-22 diff --git a/openspec/changes/checkpoint-discipline/design.md b/openspec/changes/checkpoint-discipline/design.md new file mode 100644 index 000000000..360e1c305 --- /dev/null +++ b/openspec/changes/checkpoint-discipline/design.md @@ -0,0 +1,94 @@ +## Context + +The apply-change skill instructions originate from a single TypeScript source file: + +**`src/core/templates/workflows/apply-change.ts`** — contains two functions: +- `getApplyChangeSkillTemplate()` — generates SKILL.md files for all tools (Claude, Cursor, Windsurf, etc.) +- `getOpsxApplyCommandTemplate()` — generates tool-specific command files (e.g., `.claude/commands/opsx/apply.md`) + +Both functions contain identical instruction text for steps 1–7, output formats, and guardrails. The command template has minor wording differences (e.g., `/opsx:continue` instead of `openspec-continue-change`) but the same step 6 and guardrails. + +The current step 6 is a flat bullet list: show task → make changes → mark complete → continue. The checkpoint instruction exists only as guardrail #6: "Update task checkbox immediately after completing each task." In practice, agents ignore this because stronger competing signals (system-level parallelism prompts, the "keep going" guardrail) override it. + +## Goals / Non-Goals + +**Goals:** + +- Make checkpoint-after-task the dominant behavioural signal in step 6, not a buried guardrail +- Include the recovery rationale inline so agents understand *why*, not just *what* +- Add git commit as part of the checkpoint so recovery survives process crashes (not just context window loss) +- Define tightly-coupled grouping clearly enough that agents can apply it without ambiguity +- Promote the checkpoint guardrail to first position with anti-batching language +- Address the PR noise problem by noting that per-task commits should be bundled for review + +**Non-Goals:** + +- Changing the schema, artifact graph, or CLI commands +- Adding new CLI flags or configuration options for checkpoint behaviour +- Modifying other skills (archive, verify, sync, etc.) +- Enforcing checkpoint discipline programmatically (e.g., via hooks or validation) — this change relies on instruction strength only +- Changing commit message conventions beyond the task-referencing format + +## Decisions + +### 1. Replace step 6 rather than adding a new step + +**Decision:** Rewrite step 6 in-place with the checkpoint loop structure (announce → implement → mark complete → commit → confirm). + +**Rationale:** Adding a separate "checkpoint" step would break the existing step numbering and create ambiguity about whether the old loop or new checkpoint step takes precedence. Replacing step 6 keeps the structure clean and makes the checkpoint integral to the loop, not an afterthought. + +**Alternative considered:** Adding a step 6b or inserting between steps 6 and 7. Rejected because it fragments the loop logic across multiple steps. + +### 2. Use sub-steps (a–e) within step 6 + +**Decision:** Structure the checkpoint loop as lettered sub-steps: a. Announce, b. Implement, c. Mark complete, d. Commit, e. Confirm. + +**Rationale:** The sequential lettering makes the order unambiguous. Agents can reference specific sub-steps. The hard gate ("steps c and d must happen before starting the next task") is clear when the steps are named. + +**Alternative considered:** Keeping the bullet-list format. Rejected because bullets imply optional/unordered items, which is exactly the misinterpretation we're fixing. + +### 3. Define "tightly coupled" by coherence, not by file proximity + +**Decision:** Define tightly-coupled tasks as "changes that would be incoherent if split" with the example of a class change and its corresponding test. + +**Rationale:** File-based rules (e.g., "tasks touching the same file") are too brittle — a refactoring task and a feature task might touch the same file but be logically independent. Coherence is the right criterion because it maps to what a reviewer would consider a single logical unit. + +**Alternative considered:** Strict one-task-per-checkpoint with no grouping. Rejected as impractical — splitting a function change from its test would create commits that fail tests, which is worse for recovery. + +### 4. Cap grouped tasks at 2–3 + +**Decision:** "Never let more than 2–3 tasks accumulate without a checkpoint." + +**Rationale:** This is deliberately imprecise. A hard cap of exactly 2 or exactly 3 would invite rules-lawyering. The range signals "this should be rare and small" while leaving room for agent judgment. The typical case is 1 task per checkpoint; grouping is the exception. + +### 5. Include git commit in the checkpoint, not just file update + +**Decision:** The checkpoint includes `git add -A && git commit -m "task N: "`. + +**Rationale:** Marking `[x]` in the tasks file without committing only protects against context window loss (the agent can re-read the file). It does not protect against process crashes, machine failures, or session timeouts — the primary failure modes we're addressing. A git commit makes progress durable on disk. + +**Alternative considered:** Making the commit optional or configurable. Rejected because the whole point is crash recovery — without the commit, the checkpoint is incomplete. + +### 6. Update both template functions in apply-change.ts + +**Decision:** Edit `getApplyChangeSkillTemplate()` and `getOpsxApplyCommandTemplate()` in `apply-change.ts` with matching changes. + +**Rationale:** Both functions contain the same step 6 and guardrails text. They must stay in sync — the skill template drives SKILL.md generation and the command template drives slash command generation. + +### 7. Add MR bundling guidance to the guardrails + +**Decision:** Add a guardrail note that per-task commits should be bundled into a single merge request by default, with user confirmation. + +**Rationale:** Per-task commits are a recovery mechanism, not a review unit. Without this guidance, the granular commit history could produce noisy PRs. The user confirmation step ensures the agent doesn't automatically submit without the user's awareness. + +**Alternative considered:** Squashing commits automatically. Rejected because squashing destroys the recovery trail before the user has confirmed the work is complete, and some users may prefer the granular history. + +## Risks / Trade-offs + +**[Agents may still batch despite stronger instructions]** → The instructions are the strongest lever available without programmatic enforcement. The combination of rationale, sub-step structure, hard gate language, and first-position guardrail is significantly stronger than the current single bullet. If agents still batch, the next step would be hooks or validation — but that's a separate change. + +**[Per-task commits add overhead]** → Each checkpoint adds a git commit, which takes time and creates history. This is intentional — the overhead is the cost of recoverability. The tightly-coupled grouping rule mitigates this by allowing natural units to be committed together. + +**[`git add -A` may stage unintended files]** → This matches the existing convention in the upstream proposal. Projects with sensitive files should have `.gitignore` configured appropriately. Changing to explicit file staging would require the agent to track which files each task modified, adding complexity disproportionate to the risk. + +**[Commit message format is prescriptive]** → The `task N: ` format is simple and traceable. Projects with strict commit conventions can override this via project config rules. diff --git a/openspec/changes/checkpoint-discipline/proposal.md b/openspec/changes/checkpoint-discipline/proposal.md new file mode 100644 index 000000000..b9796eef6 --- /dev/null +++ b/openspec/changes/checkpoint-discipline/proposal.md @@ -0,0 +1,32 @@ +## Why + +The apply-change skill's task loop is the only point where implementation progress becomes durable. If a session crashes mid-implementation, only tasks marked `[x]` and committed to git can be recovered. The current instructions tell agents to "update task checkbox immediately after completing each task," but in practice agents routinely batch multiple tasks and mark them all complete at the end — meaning a crash loses all progress. + +This happens because the instruction competes with stronger signals: coding agents are prompted to maximise parallelism and throughput, no rationale is given for *why* checkpointing matters, no anti-pattern is stated, and the "keep going through tasks until done" guardrail reinforces batching over discipline. The fix is to make checkpoint semantics explicit, explain the recovery purpose, and structure the loop as a hard gate rather than a soft preference. + +## What Changes + +- **Restructure step 6** of the apply-change skill from a simple loop into a checkpoint-disciplined loop with announce → implement → mark complete → commit → confirm stages +- **Explain the "why"** — frame the tasks file as a recovery log so agents understand the purpose of per-task updates +- **Add git commit to the checkpoint** — marking `[x]` without committing still loses progress on crash; committing makes recovery durable +- **Allow pragmatic grouping** — tightly-coupled tasks (e.g., a class change and its test) can be checkpointed together, but unrelated tasks must not be batched +- **Default to squashing per-task commits in merge requests** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single PR by default, with the user prompted to confirm before submission +- **Reorder and strengthen guardrails** — move the checkpoint rule to the top, add an explicit anti-batching statement, remove the weaker "update task checkbox immediately" bullet it replaces + +## Capabilities + +### New Capabilities + +- `apply-checkpoint-discipline`: Defines the checkpoint loop contract for the apply-change skill — when to checkpoint, what a checkpoint includes (mark complete + commit), how tightly-coupled tasks may be grouped, and the maximum batch size + +### Modified Capabilities + +_(none — no existing spec requirements are changing)_ + +## Impact + +- **Files changed:** `src/core/templates/workflows/apply-change.ts` — both `getApplyChangeSkillTemplate()` and `getOpsxApplyCommandTemplate()`. These are the source templates from which all tool-specific SKILL.md and command files are generated during `openspec init`. +- **No breaking changes:** The checkpoint loop is a refinement of the existing loop, not a new workflow step +- **Trade-off: resilience vs parallel efficiency.** Committing after each task adds overhead and prevents agents from parallelising unrelated tasks. This is intentional — the tasks file exists for recoverability, and that value is lost if progress isn't persisted. The tightly-coupled grouping rule (up to 2–3 tasks) provides a pragmatic escape valve so agents aren't forced to split changes that would be incoherent apart. +- **Trade-off: granular commits vs reviewable PRs.** Per-task commits create a clean recovery trail but would produce noisy merge requests if submitted as-is. The default behaviour should be to bundle all per-task commits for a change into a single PR, with user confirmation before submission. +- **No schema changes** — this modifies skill instructions only, not the artifact graph or schema definitions diff --git a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md new file mode 100644 index 000000000..4cac23b36 --- /dev/null +++ b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md @@ -0,0 +1,73 @@ +# apply-checkpoint-discipline Specification + +## Purpose +Define the checkpoint contract for the apply-change implementation loop, ensuring task progress is durable and recoverable after session crashes while allowing pragmatic grouping of tightly-coupled tasks. + +## ADDED Requirements + +### Requirement: Checkpoint after each task +The apply-change skill SHALL checkpoint after completing each task (or small group of tightly-coupled tasks) before starting the next. A checkpoint consists of marking the task complete in the tasks file and committing the changes to git. + +#### Scenario: Single task completion +- **WHEN** an agent completes a single pending task +- **THEN** it SHALL mark the task `[x]` in the tasks file +- **AND** run `git add -A && git commit` with a message referencing the task number and a short description +- **AND** output a confirmation (e.g., "Task N complete") before starting the next task + +#### Scenario: Session crash after checkpoint +- **WHEN** a session crashes after a task has been checkpointed (marked `[x]` and committed) +- **THEN** the completed task's changes SHALL be recoverable from the git history +- **AND** the tasks file SHALL accurately reflect which tasks were completed + +#### Scenario: Session crash before checkpoint +- **WHEN** a session crashes while a task is in progress but before the checkpoint +- **THEN** only the current in-progress task's changes are lost +- **AND** all previously checkpointed tasks remain recoverable + +### Requirement: Recovery rationale in instructions +The apply-change skill instructions SHALL explain that the tasks file serves as a recovery log and that progress is only durable once marked complete and committed. This rationale SHALL appear in the implementation step, not only in guardrails. + +#### Scenario: Agent reads implementation step +- **WHEN** an agent processes the apply-change skill's implementation step +- **THEN** the step SHALL state that the tasks file is a recovery log +- **AND** SHALL state that progress is only durable once tasks are marked complete and committed + +### Requirement: Tightly-coupled task grouping +The apply-change skill SHALL allow tightly-coupled tasks to be checkpointed together in a single commit, but SHALL prohibit batching unrelated tasks. + +#### Scenario: Grouping a class change and its test +- **WHEN** two tasks are tightly coupled (e.g., a class modification and its corresponding unit test) +- **THEN** the agent MAY implement both before checkpointing +- **AND** SHALL commit them together in a single checkpoint + +#### Scenario: Unrelated tasks +- **WHEN** two tasks are not tightly coupled (their changes would be coherent if split) +- **THEN** the agent SHALL checkpoint each task separately +- **AND** SHALL NOT batch them into a single commit + +#### Scenario: Maximum batch size +- **WHEN** an agent groups tightly-coupled tasks +- **THEN** no more than 2–3 tasks SHALL accumulate without a checkpoint + +### Requirement: Checkpoint guardrail prominence +The checkpoint rule SHALL appear as the first guardrail in the apply-change skill and SHALL include an explicit anti-batching statement. + +#### Scenario: Guardrail ordering +- **WHEN** the apply-change skill's guardrails are rendered +- **THEN** the checkpoint guardrail SHALL be the first item in the list + +#### Scenario: Anti-batching statement +- **WHEN** the checkpoint guardrail is rendered +- **THEN** it SHALL explicitly state that large numbers of task completions MUST NOT be batched together + +### Requirement: Per-task commits bundled in merge requests +Per-task commits SHALL be bundled into a single merge request per change by default. The agent SHALL prompt the user for confirmation before creating the merge request. + +#### Scenario: Creating a merge request after implementation +- **WHEN** all tasks for a change are complete and the user requests a merge request +- **THEN** the agent SHALL include all per-task commits for that change in a single merge request by default +- **AND** SHALL prompt the user to confirm before submitting + +#### Scenario: User opts for separate merge requests +- **WHEN** the user declines the default bundling and requests separate merge requests +- **THEN** the agent SHALL accommodate the user's preference diff --git a/openspec/changes/checkpoint-discipline/tasks.md b/openspec/changes/checkpoint-discipline/tasks.md new file mode 100644 index 000000000..122e10df2 --- /dev/null +++ b/openspec/changes/checkpoint-discipline/tasks.md @@ -0,0 +1,23 @@ +## 1. Update step 6 in getApplyChangeSkillTemplate() + +- [ ] 1.1 Replace step 6 heading from "Implement tasks (loop until done or blocked)" to "Implement tasks (checkpoint after each)" +- [ ] 1.2 Add recovery rationale paragraph ("The tasks file is a recovery log...") +- [ ] 1.3 Replace flat bullet list with lettered sub-steps: a. Announce, b. Implement, c. Mark complete, d. Commit, e. Confirm +- [ ] 1.4 Add hard gate statement ("Steps c and d must happen before starting the next task") +- [ ] 1.5 Add tightly-coupled definition and maximum batch size (2–3 tasks) + +## 2. Update guardrails in getApplyChangeSkillTemplate() + +- [ ] 2.1 Replace "Update task checkbox immediately after completing each task" with checkpoint guardrail as first item, including anti-batching statement +- [ ] 2.2 Add MR bundling guardrail: per-task commits bundled into a single PR by default, with user confirmation +- [ ] 2.3 Reorder remaining guardrails (keep going, read context files, etc.) after the checkpoint guardrail + +## 3. Apply matching changes to getOpsxApplyCommandTemplate() + +- [ ] 3.1 Replace step 6 with identical checkpoint loop (matching task group 1) +- [ ] 3.2 Replace guardrails with identical updated guardrails (matching task group 2) + +## 4. Verify + +- [ ] 4.1 Run `pnpm run build` to confirm TypeScript compiles without errors +- [ ] 4.2 Run `pnpm test` to confirm no test regressions From 647bd72f364cf8322aa71a9d2f3314a71a7ae225 Mon Sep 17 00:00:00 2001 From: Marcus Don Date: Mon, 23 Feb 2026 09:38:33 +0000 Subject: [PATCH 2/4] fix: address review feedback on checkpoint-discipline proposal - Replace "squashing" with "bundling" in proposal.md and design.md to match spec terminology and avoid implying history rewriting - Use asterisks for emphasis instead of underscores (MD049) - Inline commit message format in spec scenario - Make "user opts for separate MRs" scenario observable Co-Authored-By: Claude Opus 4.6 --- openspec/changes/checkpoint-discipline/design.md | 2 +- openspec/changes/checkpoint-discipline/proposal.md | 4 ++-- .../specs/apply-checkpoint-discipline/spec.md | 5 +++-- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/openspec/changes/checkpoint-discipline/design.md b/openspec/changes/checkpoint-discipline/design.md index 360e1c305..9374ceef5 100644 --- a/openspec/changes/checkpoint-discipline/design.md +++ b/openspec/changes/checkpoint-discipline/design.md @@ -81,7 +81,7 @@ The current step 6 is a flat bullet list: show task → make changes → mark co **Rationale:** Per-task commits are a recovery mechanism, not a review unit. Without this guidance, the granular commit history could produce noisy PRs. The user confirmation step ensures the agent doesn't automatically submit without the user's awareness. -**Alternative considered:** Squashing commits automatically. Rejected because squashing destroys the recovery trail before the user has confirmed the work is complete, and some users may prefer the granular history. +**Alternative considered:** Automatically bundling commits into a single squashed commit. Rejected because rewriting history destroys the recovery trail before the user has confirmed the work is complete, and some users may prefer the granular history. ## Risks / Trade-offs diff --git a/openspec/changes/checkpoint-discipline/proposal.md b/openspec/changes/checkpoint-discipline/proposal.md index b9796eef6..1d933563c 100644 --- a/openspec/changes/checkpoint-discipline/proposal.md +++ b/openspec/changes/checkpoint-discipline/proposal.md @@ -10,7 +10,7 @@ This happens because the instruction competes with stronger signals: coding agen - **Explain the "why"** — frame the tasks file as a recovery log so agents understand the purpose of per-task updates - **Add git commit to the checkpoint** — marking `[x]` without committing still loses progress on crash; committing makes recovery durable - **Allow pragmatic grouping** — tightly-coupled tasks (e.g., a class change and its test) can be checkpointed together, but unrelated tasks must not be batched -- **Default to squashing per-task commits in merge requests** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single PR by default, with the user prompted to confirm before submission +- **Default to bundling per-task commits in a single merge request** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single PR by default, with the user prompted to confirm before submission - **Reorder and strengthen guardrails** — move the checkpoint rule to the top, add an explicit anti-batching statement, remove the weaker "update task checkbox immediately" bullet it replaces ## Capabilities @@ -21,7 +21,7 @@ This happens because the instruction competes with stronger signals: coding agen ### Modified Capabilities -_(none — no existing spec requirements are changing)_ +*(none — no existing spec requirements are changing)* ## Impact diff --git a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md index 4cac23b36..926cc09e7 100644 --- a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md +++ b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md @@ -11,7 +11,7 @@ The apply-change skill SHALL checkpoint after completing each task (or small gro #### Scenario: Single task completion - **WHEN** an agent completes a single pending task - **THEN** it SHALL mark the task `[x]` in the tasks file -- **AND** run `git add -A && git commit` with a message referencing the task number and a short description +- **AND** run `git add -A && git commit -m "task N: "` where N is the task number - **AND** output a confirmation (e.g., "Task N complete") before starting the next task #### Scenario: Session crash after checkpoint @@ -70,4 +70,5 @@ Per-task commits SHALL be bundled into a single merge request per change by defa #### Scenario: User opts for separate merge requests - **WHEN** the user declines the default bundling and requests separate merge requests -- **THEN** the agent SHALL accommodate the user's preference +- **THEN** the agent SHALL create separate merge requests as directed by the user +- **AND** SHALL NOT bundle per-task commits without the user's explicit confirmation From 808de0361166fe5fdb1c2b978df9f77f69dc7b85 Mon Sep 17 00:00:00 2001 From: Marcus Don Date: Mon, 23 Feb 2026 20:44:07 +0000 Subject: [PATCH 3/4] fix: address second round of review feedback - Align "PR" to "merge request" in proposal.md for consistency - Define grouping by build coherence (minimum tasks to avoid broken build) with hard ceiling of 3 as safety net - Add mid-checkpoint crash scenario (marked [x] but uncommitted) - Note follow-up test idea for template drift in design.md Co-Authored-By: Claude Opus 4.6 --- .../changes/checkpoint-discipline/design.md | 14 ++++++++------ .../changes/checkpoint-discipline/proposal.md | 6 +++--- .../specs/apply-checkpoint-discipline/spec.md | 18 ++++++++++++------ 3 files changed, 23 insertions(+), 15 deletions(-) diff --git a/openspec/changes/checkpoint-discipline/design.md b/openspec/changes/checkpoint-discipline/design.md index 9374ceef5..5efea8748 100644 --- a/openspec/changes/checkpoint-discipline/design.md +++ b/openspec/changes/checkpoint-discipline/design.md @@ -47,19 +47,19 @@ The current step 6 is a flat bullet list: show task → make changes → mark co **Alternative considered:** Keeping the bullet-list format. Rejected because bullets imply optional/unordered items, which is exactly the misinterpretation we're fixing. -### 3. Define "tightly coupled" by coherence, not by file proximity +### 3. Define grouping by build coherence, not by file proximity -**Decision:** Define tightly-coupled tasks as "changes that would be incoherent if split" with the example of a class change and its corresponding test. +**Decision:** Group the minimum number of tasks needed to avoid introducing a commit that would break the build on its own. A class change and its corresponding test are the canonical example — committing one without the other would fail CI. -**Rationale:** File-based rules (e.g., "tasks touching the same file") are too brittle — a refactoring task and a feature task might touch the same file but be logically independent. Coherence is the right criterion because it maps to what a reviewer would consider a single logical unit. +**Rationale:** File-based rules (e.g., "tasks touching the same file") are too brittle — a refactoring task and a feature task might touch the same file but be logically independent. Build coherence is the right criterion because it's objective (does CI pass?) and maps to what a reviewer would consider a single logical unit. **Alternative considered:** Strict one-task-per-checkpoint with no grouping. Rejected as impractical — splitting a function change from its test would create commits that fail tests, which is worse for recovery. -### 4. Cap grouped tasks at 2–3 +### 4. Hard ceiling of 3 tasks per checkpoint -**Decision:** "Never let more than 2–3 tasks accumulate without a checkpoint." +**Decision:** No checkpoint may exceed 3 tasks, regardless of coupling. If more than 3 are needed to avoid a broken build, the tasks should be restructured. -**Rationale:** This is deliberately imprecise. A hard cap of exactly 2 or exactly 3 would invite rules-lawyering. The range signals "this should be rare and small" while leaving room for agent judgment. The typical case is 1 task per checkpoint; grouping is the exception. +**Rationale:** The build-coherence rule (decision 3) provides the primary grouping criterion, but without a ceiling agents could rationalise large batches. The cap of 3 is a safety net — the typical case is 1 task per checkpoint, and needing more than 3 for a non-breaking commit signals the tasks are too granular and should be rewritten. ### 5. Include git commit in the checkpoint, not just file update @@ -75,6 +75,8 @@ The current step 6 is a flat bullet list: show task → make changes → mark co **Rationale:** Both functions contain the same step 6 and guardrails text. They must stay in sync — the skill template drives SKILL.md generation and the command template drives slash command generation. +**Follow-up consideration:** A snapshot or diff-based test asserting that the step-6 and guardrail sections of both rendered outputs are identical (modulo the expected `/opsx:continue` vs `openspec-continue-change` substitution) would prevent silent drift. This is out of scope for this change but worth tracking separately. + ### 7. Add MR bundling guidance to the guardrails **Decision:** Add a guardrail note that per-task commits should be bundled into a single merge request by default, with user confirmation. diff --git a/openspec/changes/checkpoint-discipline/proposal.md b/openspec/changes/checkpoint-discipline/proposal.md index 1d933563c..60b19daf5 100644 --- a/openspec/changes/checkpoint-discipline/proposal.md +++ b/openspec/changes/checkpoint-discipline/proposal.md @@ -9,8 +9,8 @@ This happens because the instruction competes with stronger signals: coding agen - **Restructure step 6** of the apply-change skill from a simple loop into a checkpoint-disciplined loop with announce → implement → mark complete → commit → confirm stages - **Explain the "why"** — frame the tasks file as a recovery log so agents understand the purpose of per-task updates - **Add git commit to the checkpoint** — marking `[x]` without committing still loses progress on crash; committing makes recovery durable -- **Allow pragmatic grouping** — tightly-coupled tasks (e.g., a class change and its test) can be checkpointed together, but unrelated tasks must not be batched -- **Default to bundling per-task commits in a single merge request** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single PR by default, with the user prompted to confirm before submission +- **Allow pragmatic grouping** — group the minimum number of tasks needed to avoid a commit that would break the build (e.g., a class change and its test), with a hard ceiling of 3 tasks per checkpoint +- **Default to bundling per-task commits in a single merge request** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single merge request by default, with the user prompted to confirm before submission - **Reorder and strengthen guardrails** — move the checkpoint rule to the top, add an explicit anti-batching statement, remove the weaker "update task checkbox immediately" bullet it replaces ## Capabilities @@ -27,6 +27,6 @@ This happens because the instruction competes with stronger signals: coding agen - **Files changed:** `src/core/templates/workflows/apply-change.ts` — both `getApplyChangeSkillTemplate()` and `getOpsxApplyCommandTemplate()`. These are the source templates from which all tool-specific SKILL.md and command files are generated during `openspec init`. - **No breaking changes:** The checkpoint loop is a refinement of the existing loop, not a new workflow step -- **Trade-off: resilience vs parallel efficiency.** Committing after each task adds overhead and prevents agents from parallelising unrelated tasks. This is intentional — the tasks file exists for recoverability, and that value is lost if progress isn't persisted. The tightly-coupled grouping rule (up to 2–3 tasks) provides a pragmatic escape valve so agents aren't forced to split changes that would be incoherent apart. +- **Trade-off: resilience vs parallel efficiency.** Committing after each task adds overhead and prevents agents from parallelising unrelated tasks. This is intentional — the tasks file exists for recoverability, and that value is lost if progress isn't persisted. The build-coherence grouping rule (minimum tasks to avoid a broken build, capped at 3) provides a pragmatic escape valve so agents aren't forced to split changes that would break CI independently. - **Trade-off: granular commits vs reviewable PRs.** Per-task commits create a clean recovery trail but would produce noisy merge requests if submitted as-is. The default behaviour should be to bundle all per-task commits for a change into a single PR, with user confirmation before submission. - **No schema changes** — this modifies skill instructions only, not the artifact graph or schema definitions diff --git a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md index 926cc09e7..16a7f6f6d 100644 --- a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md +++ b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md @@ -24,6 +24,11 @@ The apply-change skill SHALL checkpoint after completing each task (or small gro - **THEN** only the current in-progress task's changes are lost - **AND** all previously checkpointed tasks remain recoverable +#### Scenario: Session crash in mid-checkpoint (marked complete, uncommitted) +- **WHEN** a session crashes after marking a task `[x]` but before the `git commit` succeeds +- **THEN** on recovery the agent SHALL verify that each `[x]` task has a corresponding commit in git history +- **AND** SHALL treat any `[x]` task without a corresponding commit as incomplete and re-attempt the checkpoint + ### Requirement: Recovery rationale in instructions The apply-change skill instructions SHALL explain that the tasks file serves as a recovery log and that progress is only durable once marked complete and committed. This rationale SHALL appear in the implementation step, not only in guardrails. @@ -33,21 +38,22 @@ The apply-change skill instructions SHALL explain that the tasks file serves as - **AND** SHALL state that progress is only durable once tasks are marked complete and committed ### Requirement: Tightly-coupled task grouping -The apply-change skill SHALL allow tightly-coupled tasks to be checkpointed together in a single commit, but SHALL prohibit batching unrelated tasks. +The apply-change skill SHALL group tasks into the minimum number needed to avoid introducing a commit that would break the build on its own. Unrelated tasks that can each produce a non-breaking commit SHALL be checkpointed separately. No checkpoint SHALL exceed 3 tasks regardless of coupling. #### Scenario: Grouping a class change and its test -- **WHEN** two tasks are tightly coupled (e.g., a class modification and its corresponding unit test) -- **THEN** the agent MAY implement both before checkpointing -- **AND** SHALL commit them together in a single checkpoint +- **WHEN** committing a class modification without its corresponding unit test would break the build +- **THEN** the agent SHALL group both tasks into a single checkpoint +- **AND** SHALL commit them together #### Scenario: Unrelated tasks -- **WHEN** two tasks are not tightly coupled (their changes would be coherent if split) +- **WHEN** two tasks can each produce a non-breaking commit independently - **THEN** the agent SHALL checkpoint each task separately - **AND** SHALL NOT batch them into a single commit #### Scenario: Maximum batch size - **WHEN** an agent groups tightly-coupled tasks -- **THEN** no more than 2–3 tasks SHALL accumulate without a checkpoint +- **THEN** no more than 3 tasks SHALL accumulate without a checkpoint (the typical case is 1; grouping is the exception) +- **AND** if more than 3 tasks are needed to avoid a broken build, the tasks SHOULD be restructured to allow smaller checkpoints ### Requirement: Checkpoint guardrail prominence The checkpoint rule SHALL appear as the first guardrail in the apply-change skill and SHALL include an explicit anti-batching statement. From 5059d1e2c9eeffdaa0d89a9383214fa3da509245 Mon Sep 17 00:00:00 2001 From: Marcus Don Date: Mon, 23 Feb 2026 20:55:08 +0000 Subject: [PATCH 4/4] fix: address third round of review feedback - Standardise all "PR"/"PRs"/"MR" to "merge request(s)" in design.md and proposal.md - Clarify mid-checkpoint crash recovery: distinguish between uncommitted changes still present (just commit) vs changes lost (re-implement) Co-Authored-By: Claude Opus 4.6 --- openspec/changes/checkpoint-discipline/design.md | 6 +++--- openspec/changes/checkpoint-discipline/proposal.md | 2 +- .../specs/apply-checkpoint-discipline/spec.md | 4 +++- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/openspec/changes/checkpoint-discipline/design.md b/openspec/changes/checkpoint-discipline/design.md index 5efea8748..874fa22a5 100644 --- a/openspec/changes/checkpoint-discipline/design.md +++ b/openspec/changes/checkpoint-discipline/design.md @@ -19,7 +19,7 @@ The current step 6 is a flat bullet list: show task → make changes → mark co - Add git commit as part of the checkpoint so recovery survives process crashes (not just context window loss) - Define tightly-coupled grouping clearly enough that agents can apply it without ambiguity - Promote the checkpoint guardrail to first position with anti-batching language -- Address the PR noise problem by noting that per-task commits should be bundled for review +- Address the merge request noise problem by noting that per-task commits should be bundled for review **Non-Goals:** @@ -77,11 +77,11 @@ The current step 6 is a flat bullet list: show task → make changes → mark co **Follow-up consideration:** A snapshot or diff-based test asserting that the step-6 and guardrail sections of both rendered outputs are identical (modulo the expected `/opsx:continue` vs `openspec-continue-change` substitution) would prevent silent drift. This is out of scope for this change but worth tracking separately. -### 7. Add MR bundling guidance to the guardrails +### 7. Add merge request bundling guidance to the guardrails **Decision:** Add a guardrail note that per-task commits should be bundled into a single merge request by default, with user confirmation. -**Rationale:** Per-task commits are a recovery mechanism, not a review unit. Without this guidance, the granular commit history could produce noisy PRs. The user confirmation step ensures the agent doesn't automatically submit without the user's awareness. +**Rationale:** Per-task commits are a recovery mechanism, not a review unit. Without this guidance, the granular commit history could produce noisy merge requests. The user confirmation step ensures the agent doesn't automatically submit without the user's awareness. **Alternative considered:** Automatically bundling commits into a single squashed commit. Rejected because rewriting history destroys the recovery trail before the user has confirmed the work is complete, and some users may prefer the granular history. diff --git a/openspec/changes/checkpoint-discipline/proposal.md b/openspec/changes/checkpoint-discipline/proposal.md index 60b19daf5..fd1a3351b 100644 --- a/openspec/changes/checkpoint-discipline/proposal.md +++ b/openspec/changes/checkpoint-discipline/proposal.md @@ -28,5 +28,5 @@ This happens because the instruction competes with stronger signals: coding agen - **Files changed:** `src/core/templates/workflows/apply-change.ts` — both `getApplyChangeSkillTemplate()` and `getOpsxApplyCommandTemplate()`. These are the source templates from which all tool-specific SKILL.md and command files are generated during `openspec init`. - **No breaking changes:** The checkpoint loop is a refinement of the existing loop, not a new workflow step - **Trade-off: resilience vs parallel efficiency.** Committing after each task adds overhead and prevents agents from parallelising unrelated tasks. This is intentional — the tasks file exists for recoverability, and that value is lost if progress isn't persisted. The build-coherence grouping rule (minimum tasks to avoid a broken build, capped at 3) provides a pragmatic escape valve so agents aren't forced to split changes that would break CI independently. -- **Trade-off: granular commits vs reviewable PRs.** Per-task commits create a clean recovery trail but would produce noisy merge requests if submitted as-is. The default behaviour should be to bundle all per-task commits for a change into a single PR, with user confirmation before submission. +- **Trade-off: granular commits vs reviewable merge requests.** Per-task commits create a clean recovery trail but would produce noisy merge requests if submitted as-is. The default behaviour should be to bundle all per-task commits for a change into a single merge request, with user confirmation before submission. - **No schema changes** — this modifies skill instructions only, not the artifact graph or schema definitions diff --git a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md index 16a7f6f6d..b297d12ea 100644 --- a/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md +++ b/openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md @@ -27,7 +27,9 @@ The apply-change skill SHALL checkpoint after completing each task (or small gro #### Scenario: Session crash in mid-checkpoint (marked complete, uncommitted) - **WHEN** a session crashes after marking a task `[x]` but before the `git commit` succeeds - **THEN** on recovery the agent SHALL verify that each `[x]` task has a corresponding commit in git history -- **AND** SHALL treat any `[x]` task without a corresponding commit as incomplete and re-attempt the checkpoint +- **AND** for any `[x]` task without a corresponding commit, SHALL check the working directory for uncommitted changes +- **AND** if uncommitted changes are present, SHALL commit them to complete the checkpoint +- **AND** if no uncommitted changes are present, SHALL treat the task as not started and re-implement before committing ### Requirement: Recovery rationale in instructions The apply-change skill instructions SHALL explain that the tasks file serves as a recovery log and that progress is only durable once marked complete and committed. This rationale SHALL appear in the implementation step, not only in guardrails.