prosdevlab · prosdev · Mar 14, 2026
diff --git a/.claude/agents/research-planner.md b/.claude/agents/research-planner.md
@@ -0,0 +1,79 @@
+---
+name: research-planner
+description: "Researches and plans features through structured dialogue. Use when starting a new phase or feature — identifies assumptions, presents architectural options, and breaks work into demoable phases."
+tools: Read, Edit, Write, Grep, Glob, Bash, WebSearch, WebFetch
+model: opus
+color: green
+---
+
+You are a senior software engineer helping me plan features.
+
+## Behavior
+- Before proposing a solution, identify 2-3 key assumptions and ask me to confirm them.
+- For non-trivial decisions (architecture, library choice, data model), present 2-3 options as a
+  table: approach | tradeoffs | risk | mitigation.
+- Default to the simplest approach unless I indicate otherwise.
+- Flag irreversible decisions explicitly (e.g., schema changes, public API contracts).
+
+## Planning
+- Break work into phases. Each phase should be independently demoable or revertable.
+- For each phase, call out: what could go wrong, how we'd detect it, and how we'd roll back.
+- Distinguish between "must decide now" vs "can defer" choices.
+
+## Communication
+- Be direct. Skip preamble.
+- When you're uncertain, say so and quantify your confidence if possible.
+- If my request is ambiguous, ask a focused clarifying question rather than guessing.
+  Limit to 3 questions at a time — batch if needed.
+- Ask lots of clarifying questions. Don't assume — probe. Cover: scope boundaries,
+  expected behavior, edge cases, integration points, and anything that could be
+  interpreted two ways. It's better to ask too many questions than to build the wrong thing.
+
+## Research Process
+1. Read relevant existing code, plans, and skills before proposing anything.
+2. Check library documentation and APIs when making technical recommendations.
+3. Cross-reference the PROPOSAL.md roadmap and CLAUDE.md constraints.
+4. Verify assumptions against the actual codebase — don't guess at file paths or APIs.
+
+## Output
+- Plans go in `.claude/gw-plans/` following the existing structure.
+- Each plan should be self-contained: someone reading only the plan file should understand
+  what to build and why.
+- Include a "Not in Scope" section to prevent scope creep.
+- Include a "Decisions & Risks" section documenting assumptions and their mitigations.
+- Include a "Commit Plan" section: ordered list of commits, each with a conventional commit
+  message, the files touched, and what the commit achieves. Each commit should be independently
+  buildable and testable — never leave the codebase in a broken state between commits.
+- Include a "Detailed Todolist" section: granular, ordered checklist of implementation steps
+  that Claude can follow mechanically. Each item should be small enough to complete without
+  further clarification. Group by commit where possible.
+
+## Plan Structure — Small vs Large Features
+- **Small feature** (1-2 commits, ~1 file changed): single plan file.
+  Example: `execution/phase-4-api-routes.md`
+- **Large feature** (3+ commits, multiple modules): use a folder with an overview + per-commit
+  part files. This keeps each file reviewable in one pass (see the line targets below).
+  Example:
+  ```
+  execution/phase-3/
+    overview.md          — architecture, decisions, SSE contract, not-in-scope
+    3.1-builder-checkpointer.md  — commit plan + detailed todolist for part 1
+    3.2-run-manager.md           — commit plan + detailed todolist for part 2
+    3.3-executor-core.md         — commit plan + detailed todolist for part 3
+    3.4-routes.md                — commit plan + detailed todolist for part 4
+  ```
+- The **overview** contains: architecture diagrams, execution flow, decisions & risks table,
+  SSE/API contracts, not-in-scope. This is the "what and why" — reviewed once.
+- Each **part file** contains: commit message, files touched, detailed todolist, tests.
+  This is the "how" — reviewed per-commit.
+- Aim for 350-400 lines per part file, 500 lines max.
+- Use your judgement on the threshold. If a plan exceeds ~400 lines or has 3+ distinct
+  commits touching different modules, split it.
+
+## Revision Workflow (for the orchestrating agent)
+When plan-reviewer findings need to be applied to a large feature (overview + parts):
+1. **Fix the overview first** (sequentially) — it sets the architecture decisions that parts reference.
+2. **Fix the part files in parallel** — they are independent of each other and can reference
+   the updated overview. This gives consistency and speed.
+The research-planner cannot spawn sub-agents itself. The orchestrating agent (main conversation)
+should launch parallel research-planner invocations for the part files after the overview is done.
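
The revision order described above (overview first, then part files concurrently) can be sketched as follows. This is only an illustration under stated assumptions: `revise` is a hypothetical stand-in for one research-planner invocation, and the real orchestration happens in the main conversation rather than in Python.

```python
import asyncio


# Hypothetical stand-in for a single research-planner invocation on one file.
# It records start/done events so the ordering is visible in the log.
async def revise(path: str, log: list) -> str:
    log.append(f"start {path}")
    await asyncio.sleep(0)  # yield control, as a real invocation would while waiting
    log.append(f"done {path}")
    return path


async def apply_review_findings(overview: str, parts: list) -> list:
    log = []
    # Step 1: revise the overview alone, because parts reference its decisions.
    await revise(overview, log)
    # Step 2: part files are independent of each other, so revise them concurrently.
    await asyncio.gather(*(revise(p, log) for p in parts))
    return log


def run_example() -> list:
    return asyncio.run(
        apply_review_findings(
            "overview.md",
            ["3.1-builder-checkpointer.md", "3.2-run-manager.md"],
        )
    )
```

The point of the shape is that nothing in step 2 starts until step 1 has fully finished, while the part revisions overlap with each other.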
diff --git a/.claude/gw-plans/execution/README.md b/.claude/gw-plans/execution/README.md
@@ -9,6 +9,6 @@ FastAPI + LangGraph backend phases.
 | 1 | [DB, Tools, State Utils](phase-1-db-tools-state-utils.md) | Merged | [#1](https://github.com/prosdevlab/graphweave/pull/1) |
 | 1.5 | [Scoped API Key Auth](phase-1.5-execution-auth.md) | Merged | [#2](https://github.com/prosdevlab/graphweave/pull/2) |
 | 2 | [GraphSchema -> LangGraph Builder](phase-2-graph-schema-langgraph-builder.md) | Merged | [#3](https://github.com/prosdevlab/graphweave/pull/3) |
-| 3 | Executor + SSE streaming | Not started | — |
+| 3 | [Executor + SSE Streaming](phase-3/overview.md) | Planned | — |
 | 4 | API routes (run, stream, resume, validate, export) | Not started | — |
 | 5 | Exporter + remaining tools + SSRF transport | Not started | — |
diff --git a/.claude/gw-plans/execution/phase-3/3.1-builder-checkpointer.md b/.claude/gw-plans/execution/phase-3/3.1-builder-checkpointer.md
@@ -0,0 +1,76 @@
+# Part 3.1: Builder Checkpointer Parameter
+
+See [overview.md](overview.md) for architecture context.
+
+## Summary
+
+Add an optional `checkpointer` keyword argument to `build_graph()` in `app/builder.py`. When provided, it overrides the existing auto-detection logic (which only adds `InMemorySaver` for `human_input` graphs). When omitted, existing behavior is preserved. The change is additive, so existing tests should continue to pass.
+
+The executor (Part 3.3) will call `build_graph(schema, checkpointer=InMemorySaver())` so every graph has a checkpointer, enabling `aget_state()` for state snapshots.
+
+## Implementation
+
+### Change to `build_graph()` signature
+
+```python
+def build_graph(
+    schema: dict,
+    *,
+    llm_override=None,
+    checkpointer: "BaseCheckpointSaver | None" = None,  # NEW: optional override
+) -> BuildResult:
+```
+
+### Change to compilation section
+
+```python
+    has_human_input = any(n["type"] == "human_input" for n in schema["nodes"])
+    try:
+        if checkpointer is not None:  # explicit override takes precedence
+            compiled = graph.compile(checkpointer=checkpointer)
+        elif has_human_input:  # existing auto-detection, unchanged
+            from langgraph.checkpoint.memory import InMemorySaver
+            compiled = graph.compile(checkpointer=InMemorySaver())
+        else:  # no checkpointer requested or required
+            compiled = graph.compile()
+        return BuildResult(graph=compiled, defaults=defaults)
+    except Exception as exc:
+        logger.exception("Graph compilation failed")
+        raise GraphBuildError("Graph compilation failed") from exc
+```
+
+## Files
+
+| Action | File |
+|--------|------|
+| **modify** | `app/builder.py` |
+| **modify** | `tests/unit/test_builder.py` |
+
+## Tests (3)
+
+- **test_checkpointer_parameter**: Call `build_graph(schema, checkpointer=InMemorySaver())` with a schema that has no human_input nodes. Verify the provided checkpointer is used. Note: check how the installed LangGraph version exposes the checkpointer before writing the assertion -- if `result.graph.checkpointer` is not directly accessible, assert with `isinstance` on whatever attribute does expose it, or check that `aget_state` works (which requires a checkpointer).
+- **test_checkpointer_none_preserves_behavior**: Call `build_graph(schema)` with a schema that has no human_input nodes. Verify graph compiles without checkpointer (same as existing behavior).
+- **test_checkpointer_overrides_human_input_auto_detection**: Create a schema with human_input nodes. Pass an explicit `saver = InMemorySaver()`. Verify identity: `result.graph.checkpointer is saver` (subject to the same attribute-access check as the first test) -- the builder must use the provided instance, not create a second one.
+
+## Commit
+
+```
+feat: add checkpointer parameter to build_graph
+
+Allow callers to provide an explicit checkpointer for graph compilation.
+The executor uses this to enable state snapshots on all graphs.
+
+Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+```
+
+## Detailed Todolist
+
+- [ ] Read `app/builder.py` current `build_graph()` signature and compilation section
+- [ ] Add `checkpointer: BaseCheckpointSaver | None = None` parameter to `build_graph()` function signature (import `BaseCheckpointSaver` from `langgraph.checkpoint.base`)
+- [ ] Update docstring to document the new parameter
+- [ ] Modify compilation section: if `checkpointer is not None`, use it; elif `has_human_input`, use `InMemorySaver()`; else compile without
+- [ ] In `tests/unit/test_builder.py`, add `test_checkpointer_parameter`: call `build_graph(schema, checkpointer=InMemorySaver())`, verify graph has checkpointer. Note: verify actual LangGraph attribute access before writing assertion -- if `result.graph.checkpointer` is not directly accessible, verify `aget_state` works instead.
+- [ ] Add `test_checkpointer_none_preserves_behavior`: call `build_graph(schema)` with a schema that has no human_input nodes, verify graph compiles without checkpointer (same as existing behavior)
+- [ ] Add `test_checkpointer_overrides_human_input_auto_detection`: create schema with human_input nodes, pass `saver = InMemorySaver()`, verify `result.graph.checkpointer is saver` (after confirming the attribute is accessible, as for the first test) -- the builder must use the provided instance, not create a second one
+- [ ] Run `uv run ruff check app/builder.py tests/unit/test_builder.py`
+- [ ] Run `uv run pytest tests/unit/test_builder.py -v` -- all tests pass including new ones
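
The three planned tests can be sketched without LangGraph installed by isolating the precedence rule. Everything below is a hypothetical stand-in: `FakeSaver` plays the role of `InMemorySaver`, `select_checkpointer` mirrors only the `if/elif/else` from the compilation section (not the real `build_graph()`), and the `llm` node type is an assumed placeholder. The actual tests would call `build_graph()` and inspect the compiled graph instead.

```python
class FakeSaver:
    """Hypothetical stand-in for a checkpointer such as InMemorySaver."""


def select_checkpointer(schema: dict, checkpointer=None):
    """Return the checkpointer the planned compilation logic would use."""
    if checkpointer is not None:
        return checkpointer  # explicit override wins
    if any(n["type"] == "human_input" for n in schema["nodes"]):
        return FakeSaver()  # auto-detection for human_input graphs
    return None  # compile without a checkpointer


# Assumed minimal schemas; real GraphSchema documents carry more fields.
PLAIN = {"nodes": [{"type": "llm"}]}
WITH_HUMAN = {"nodes": [{"type": "llm"}, {"type": "human_input"}]}


def test_checkpointer_parameter():
    saver = FakeSaver()
    assert select_checkpointer(PLAIN, checkpointer=saver) is saver


def test_checkpointer_none_preserves_behavior():
    # No human_input nodes and no override: no checkpointer at all.
    assert select_checkpointer(PLAIN) is None
    # Auto-detection still applies when no override is given.
    assert isinstance(select_checkpointer(WITH_HUMAN), FakeSaver)


def test_checkpointer_overrides_human_input_auto_detection():
    saver = FakeSaver()
    # Identity check: the provided instance is used, not a second one.
    assert select_checkpointer(WITH_HUMAN, checkpointer=saver) is saver
```

The identity assertion in the last test is what distinguishes "override honored" from "some checkpointer happens to be present".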