feat: checkpoint discipline for apply-change skill#744
feat: checkpoint discipline for apply-change skill#744FasterPHP wants to merge 4 commits intoFission-AI:mainfrom
Conversation
Adds an OpenSpec change proposal to strengthen the apply-change skill's implementation loop with checkpoint discipline for crash recovery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds specification, design, proposal, and task-planning documents that redefine apply-change step 6 into a checkpoint-disciplined loop (announce → implement → mark complete → commit → confirm), mandates per-task durable commits with MR-bundling guidance, and instructs synchronizing two templates implementing step-6 text. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant ApplySkill as Apply-Change Skill
participant Tasks as Tasks File (recovery log)
participant Git as Git Repo / MR
User->>ApplySkill: Start apply-change workflow
ApplySkill->>User: Announce next task
User->>ApplySkill: Implement task
ApplySkill->>Tasks: Mark task complete (persisted)
ApplySkill->>Git: Commit per-task change
ApplySkill->>User: Prompt to bundle/submit MR
User->>Git: Confirm MR submission
Git-->>ApplySkill: Commit / MR confirmed
ApplySkill-->>User: Confirm checkpoint and proceed
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryAdds checkpoint discipline to the apply-change skill to strengthen crash recovery by making per-task checkpointing (marking complete + git commit) the dominant behavioral signal. The proposal restructures step 6 into explicit sub-steps (announce → implement → mark complete → commit → confirm) with recovery rationale inline, allows pragmatic grouping of tightly-coupled tasks (max 2-3), and promotes the checkpoint guardrail to first position with anti-batching language. Key changes:
Minor issue found:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start([Agent starts task N]) --> Announce[a. Announce task N]
Announce --> Implement[b. Implement changes]
Implement --> Coupled{Tightly coupled<br/>with next task?}
Coupled -->|Yes, and < 3 tasks| NextTask[Continue to next task]
Coupled -->|No or at limit| MarkComplete[c. Mark task complete in tasks.md]
NextTask --> Implement
MarkComplete --> GitCommit[d. git add -A && git commit -m 'task N: description']
GitCommit --> Confirm[e. Confirm completion]
Confirm --> MoreTasks{More tasks?}
MoreTasks -->|Yes| Start
MoreTasks -->|No| Done[All tasks complete]
Done --> PR[Bundle commits into single PR<br/>with user confirmation]
Last reviewed commit: 18ee59b |
| - **Explain the "why"** — frame the tasks file as a recovery log so agents understand the purpose of per-task updates | ||
| - **Add git commit to the checkpoint** — marking `[x]` without committing still loses progress on crash; committing makes recovery durable | ||
| - **Allow pragmatic grouping** — tightly-coupled tasks (e.g., a class change and its test) can be checkpointed together, but unrelated tasks must not be batched | ||
| - **Default to squashing per-task commits in merge requests** — per-task commits are a recovery mechanism, not a review unit. When creating a merge request for a change, all per-task commits should be included in a single PR by default, with the user prompted to confirm before submission |
There was a problem hiding this comment.
Terminology inconsistency: Line 13 says "Default to squashing per-task commits" but the rest of the proposal (including the spec) uses "bundled" to describe this behavior. Consider using "bundled" here for consistency, since the proposal isn't actually proposing to squash commits (which would lose the recovery trail).
| **Decision:** Add a guardrail note that per-task commits should be bundled into a single merge request by default, with user confirmation. | ||
|
|
||
| **Rationale:** Per-task commits are a recovery mechanism, not a review unit. Without this guidance, the granular commit history could produce noisy PRs. The user confirmation step ensures the agent doesn't automatically submit without the user's awareness. | ||
|
|
There was a problem hiding this comment.
Same terminology inconsistency as in proposal.md: "Squashing commits automatically" should be "Bundling commits into a single PR" to match the accepted decision and spec terminology.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md (2)
71-73: Vague THEN clause in "User opts for separate merge requests" scenario."SHALL accommodate the user's preference" has no observable outcome. A scenario should specify what accommodation looks like (e.g., creating one MR per task commit, or per logical group) so the behavior can be verified.
✏️ Proposed refinement
#### Scenario: User opts for separate merge requests - **WHEN** the user declines the default bundling and requests separate merge requests -- **THEN** the agent SHALL accommodate the user's preference +- **THEN** the agent SHALL create separate merge requests as directed by the user +- **AND** SHALL NOT bundle per-task commits without the user's explicit confirmation🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md` around lines 71 - 73, The THEN clause "SHALL accommodate the user's preference" in the "User opts for separate merge requests" scenario is non‑observable; replace it with a concrete, verifiable outcome such as "THEN the agent SHALL create separate merge requests according to the user's selected granularity (e.g., one MR per task commit or one MR per logical change group) and include each MR's title, description, and associated commit list so reviewers can verify the split." Update the scenario to require explicit criteria for grouping (commit-level or logical-group) and that the agent surfaces the created MR IDs/URLs and the mapping from tasks/commits to MRs so the behavior can be tested.
14-14: Spec precision: specify the commit message format inline.The scenario references the commit message only as "with a message referencing the task number," but
design.md(Decision 5) specifies the exact format astask N: <short description>. Embedding the format here avoids an implementer having to cross-reference the design doc to satisfy this scenario.✏️ Proposed refinement
-- **AND** run `git add -A && git commit` with a message referencing the task number and a short description +- **AND** run `git add -A && git commit -m "task N: <short description>"` where N is the task number🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md` at line 14, Update the scenario text that currently says to run "git add -A && git commit" with a message referencing the task number to explicitly require the commit message format used in Decision 5; replace or augment that phrase so it reads that the commit should use the format "task N: <short description>" (i.e., include the exact format inline), so implementers don't need to cross-reference the design doc when running git commit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openspec/changes/checkpoint-discipline/proposal.md`:
- Line 24: Replace the underscore-based emphasis on the line that reads "_(none
— no existing spec requirements are changing)_" with asterisks to satisfy MD049;
update that snippet to use "* (none — no existing spec requirements are
changing) *" (i.e., change leading and trailing underscores to asterisks) so the
file openspec/changes/checkpoint-discipline/proposal.md no longer triggers the
markdownlint MD049 warning.
- Line 13: The heading "Default to squashing per-task commits in merge requests"
conflicts with Decision 7 in design.md which rejects squashing; update the
phrase to "Default to bundling per-task commits in merge requests" and adjust
the sentence to explicitly state that per-task commits should be combined into a
single PR without rewriting history (no git squash/rebase), matching spec.md and
tasks.md terminology and preserving the recovery trail.
---
Nitpick comments:
In
`@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md`:
- Around line 71-73: The THEN clause "SHALL accommodate the user's preference"
in the "User opts for separate merge requests" scenario is non‑observable;
replace it with a concrete, verifiable outcome such as "THEN the agent SHALL
create separate merge requests according to the user's selected granularity
(e.g., one MR per task commit or one MR per logical change group) and include
each MR's title, description, and associated commit list so reviewers can verify
the split." Update the scenario to require explicit criteria for grouping
(commit-level or logical-group) and that the agent surfaces the created MR
IDs/URLs and the mapping from tasks/commits to MRs so the behavior can be
tested.
- Line 14: Update the scenario text that currently says to run "git add -A &&
git commit" with a message referencing the task number to explicitly require the
commit message format used in Decision 5; replace or augment that phrase so it
reads that the commit should use the format "task N: <short description>" (i.e.,
include the exact format inline), so implementers don't need to cross-reference
the design doc when running git commit.
- Replace "squashing" with "bundling" in proposal.md and design.md to match spec terminology and avoid implying history rewriting - Use asterisks for emphasis instead of underscores (MD049) - Inline commit message format in spec scenario - Make "user opts for separate MRs" scenario observable Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md (2)
48-50:SHALLwith a2–3range is ambiguous normative language.RFC 2119
SHALLis normative and requires a precise constraint. Pairing it with the range2–3makes the requirement untestable — you cannot definitively fail compliance when the ceiling is a range. Design Decision 4 explicitly calls this "deliberately imprecise," but that rationale belongs in the design doc, not the spec. Consider pinning the spec to the upper bound (3) while preserving the intent in a note:- - **THEN** no more than 2–3 tasks SHALL accumulate without a checkpoint + - **THEN** no more than 3 tasks SHALL accumulate without a checkpoint (the typical case is 1; grouping is the exception)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md` around lines 48 - 50, The normative requirement in the "Scenario: Maximum batch size" should be made precise: replace the ambiguous "SHALL ... 2–3 tasks" wording with a concrete upper bound such as "SHALL not allow more than 3 tasks to accumulate without a checkpoint" (keeping the scenario title "Maximum batch size" and text referring to tightly-coupled tasks), and move any rationale about deliberate imprecision into a non-normative note or the design doc (preserve intent by adding a note that explains the original 2–3 intent but keep the normative line fixed to an exact ceiling of 3).
22-26: Spec gap: partial-checkpoint state (crash after[x]but beforegit commit) is unaddressed.The current "crash before checkpoint" scenario covers "task in progress before the checkpoint," implying the tasks file still shows
[ ]. However, there is a mid-checkpoint failure mode: the agent marks[x]to the filesystem but crashes before thegit commitcompletes. On recovery, the tasks file shows the task as complete but no commit exists — a recovering agent could skip the task entirely, silently losing its uncommitted changes.This is distinct from both existing scenarios and is not covered by the durability rationale in lines 27–33 (which states the rule but defines no recovery action for this state). Suggest adding an explicit scenario:
+#### Scenario: Session crash in mid-checkpoint (marked complete, uncommitted) +- **WHEN** a session crashes after marking a task `[x]` but before the `git commit` succeeds +- **THEN** the agent SHALL verify that each `[x]` task in the tasks file has a corresponding commit in git history +- **AND** SHALL treat any `[x]` task without a corresponding commit as incomplete and re-attempt the checkpoint🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md` around lines 22 - 26, Add a new Scenario titled "Crash after marking [x] but before git commit" that describes: WHEN the agent writes the task as checked "[x]" to the tasks file but crashes before creating the git commit, THEN on recovery the agent must detect the mismatch by checking the tasks file state against the repository history (e.g., look for a commit that records the task completion), and if no matching commit exists treat the task as not completed (reset the task to "[ ]" or re-run the task) and ensure the task’s changes are committed atomically after successful re-execution; reference the tasks file state, the "[x]" marker, and the git commit check in the scenario and recovery action.openspec/changes/checkpoint-discipline/design.md (1)
72-76: Consider a mechanism to prevent silent divergence between the two template functions.Decision 6 correctly identifies that
getApplyChangeSkillTemplate()andgetOpsxApplyCommandTemplate()share identical step-6 and guardrail content and must stay in sync. But with no enforcement, a future edit to one function that misses the other will silently create a behavioural inconsistency between SKILL.md-generated files and slash-command-generated files — exactly the kind of drift that's hard to catch in review.A lightweight option would be a snapshot or diff-based test asserting that the step-6 and guardrail sections of the two rendered outputs are identical (or differ only by the expected
/opsx:continue/openspec-continue-changesubstitution). This would be worth tracking as a follow-up task.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openspec/changes/checkpoint-discipline/design.md` around lines 72 - 76, Add a unit/integration test that prevents silent divergence by rendering both template functions getApplyChangeSkillTemplate() and getOpsxApplyCommandTemplate() and asserting the step-6 and guardrails sections are identical aside from the expected command-name substitution; locate the rendered outputs from those functions, extract the step-6/guardrail block (by delimiter or header strings) and compare them after normalizing the allowed difference (e.g., replace "/opsx:continue" with "openspec-continue-change" or vice versa) so future edits that change one template and not the other will fail the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openspec/changes/checkpoint-discipline/proposal.md`:
- Line 13: The sentence uses inconsistent terminology ("single merge request"
and later "single PR"); update the latter occurrence so both use "merge request"
to match the spec terminology. Locate the bullet containing the phrase "Default
to bundling per-task commits in a single merge request" and replace "PR" with
"merge request" so the entire sentence consistently uses "merge request".
---
Nitpick comments:
In `@openspec/changes/checkpoint-discipline/design.md`:
- Around line 72-76: Add a unit/integration test that prevents silent divergence
by rendering both template functions getApplyChangeSkillTemplate() and
getOpsxApplyCommandTemplate() and asserting the step-6 and guardrails sections
are identical aside from the expected command-name substitution; locate the
rendered outputs from those functions, extract the step-6/guardrail block (by
delimiter or header strings) and compare them after normalizing the allowed
difference (e.g., replace "/opsx:continue" with "openspec-continue-change" or
vice versa) so future edits that change one template and not the other will fail
the test.
In
`@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md`:
- Around line 48-50: The normative requirement in the "Scenario: Maximum batch
size" should be made precise: replace the ambiguous "SHALL ... 2–3 tasks"
wording with a concrete upper bound such as "SHALL not allow more than 3 tasks
to accumulate without a checkpoint" (keeping the scenario title "Maximum batch
size" and text referring to tightly-coupled tasks), and move any rationale about
deliberate imprecision into a non-normative note or the design doc (preserve
intent by adding a note that explains the original 2–3 intent but keep the
normative line fixed to an exact ceiling of 3).
- Around line 22-26: Add a new Scenario titled "Crash after marking [x] but
before git commit" that describes: WHEN the agent writes the task as checked
"[x]" to the tasks file but crashes before creating the git commit, THEN on
recovery the agent must detect the mismatch by checking the tasks file state
against the repository history (e.g., look for a commit that records the task
completion), and if no matching commit exists treat the task as not completed
(reset the task to "[ ]" or re-run the task) and ensure the task’s changes are
committed atomically after successful re-execution; reference the tasks file
state, the "[x]" marker, and the git commit check in the scenario and recovery
action.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
openspec/changes/checkpoint-discipline/design.mdopenspec/changes/checkpoint-discipline/proposal.mdopenspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md
- Align "PR" to "merge request" in proposal.md for consistency - Define grouping by build coherence (minimum tasks to avoid broken build) with hard ceiling of 3 as safety net - Add mid-checkpoint crash scenario (marked [x] but uncommitted) - Note follow-up test idea for template drift in design.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openspec/changes/checkpoint-discipline/design.md`:
- Line 22: Replace inconsistent terminology: locate all occurrences of the
abbreviations "PR", "PRs", and "MR" in the document (notably the Decision 7
heading and the prose instances that currently read "PR"/"PRs") and standardize
them to "merge request" or "merge requests" to match the rest of the spec;
update the Decision 7 heading text and the inline mentions around the earlier
paragraph (the occurrences currently using "PR"/"PRs"/"MR") so they read "merge
request" or "merge requests" with correct pluralization.
In `@openspec/changes/checkpoint-discipline/proposal.md`:
- Line 31: Replace the remaining "PR" terminology in the bold heading and
closing phrase: change the heading text "Trade-off: granular commits vs
reviewable PRs" to use "merge requests" and replace "single PR" at the end of
the sentence with "single merge request" so the sentence consistently reads
about "merge requests" (the mid-sentence "noisy merge requests" is already
correct).
In
`@openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md`:
- Around line 27-31: Clarify the ambiguous "re-attempt the checkpoint" behavior
in the Scenario: Session crash in mid-checkpoint (marked complete, uncommitted)
by updating the THEN/AND clauses to require the agent to (1) verify each `[x]`
task has a corresponding git commit, (2) for any `[x]` without a commit, check
the working directory for uncommitted changes and if changes exist perform the
git commit, and (3) if no working-directory changes exist treat the `[x]` task
as not started and re-implement the task before committing; reference the
Scenario title and the `[x]` task marker so readers can locate the affected
clauses.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
openspec/changes/checkpoint-discipline/design.mdopenspec/changes/checkpoint-discipline/proposal.mdopenspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md
openspec/changes/checkpoint-discipline/specs/apply-checkpoint-discipline/spec.md
Show resolved
Hide resolved
- Standardise all "PR"/"PRs"/"MR" to "merge request(s)" in design.md and proposal.md - Clarify mid-checkpoint crash recovery: distinguish between uncommitted changes still present (just commit) vs changes lost (re-implement) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Also, thanks for only opening a proposal! |
| @@ -0,0 +1,23 @@ | |||
| ## 1. Update step 6 in getApplyChangeSkillTemplate() | |||
There was a problem hiding this comment.
Hmm, what do you think about adding this in as an optional experimental skill/slash command to begin with?
We recently added optional skills that can be installed but are not created/added by default. I'd be open to trying this out as a separate skill to begin with, and if it works well enough, we can replace the main skill with it or keep it as a skill for more advanced users.
I also wonder if per-task commit is too granular in general, maybe per-phase commit? <- people will definitely start asking to control this behaviour, haha, but it's not a pressing concern.
Summary
Artifacts
proposal.md— Problem statement, justification, and impactspecs/apply-checkpoint-discipline/spec.md— Formal requirements with WHEN/THEN scenariosdesign.md— Technical decisions and trade-offstasks.md— Implementation checklistTest plan
🤖 Generated with Claude Code using claude-opus-4-6
Summary by CodeRabbit
New Features
Documentation