feat(orchestration): docs/12 spec/planner/executor pipeline + audit fixes #1
Open
devteapot wants to merge 16 commits into
Phase A of docs/15-executor-routing.md. Replaces the parallel
`model?: string` / `executionMode?: string` shortcuts on the sub-agent
spawn path with a single typed `executor?: ExecutorBinding`
(`{kind:"llm",profileId,modelOverride?} | {kind:"acp",adapterId,timeoutMs?}`)
resolved through one `ExecutorResolver`. Specialists, tasks, role
defaults, overlays, and gates land in later phases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
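A minimal sketch of how the typed binding from this commit could be resolved in one place. Only the `ExecutorBinding` shape comes from the commit message; `ResolvedExecutor`, `ResolverConfig`, `resolveExecutor`, and the defaulting behaviour are hypothetical illustrations, not the PR's actual `ExecutorResolver`.

```typescript
// Shape copied from the commit message.
type ExecutorBinding =
  | { kind: "llm"; profileId: string; modelOverride?: string }
  | { kind: "acp"; adapterId: string; timeoutMs?: number };

// Hypothetical: what a resolver might hand to the spawn path.
type ResolvedExecutor =
  | { kind: "llm"; profileId: string; model: string }
  | { kind: "acp"; adapterId: string; timeoutMs: number };

// Hypothetical configuration the resolver consults.
interface ResolverConfig {
  defaultProfileId: string;
  profileModels: Record<string, string>;
  defaultAcpTimeoutMs: number;
}

function resolveExecutor(
  binding: ExecutorBinding | undefined,
  config: ResolverConfig,
): ResolvedExecutor {
  // Assumption: a missing binding falls back to the default LLM profile.
  const effective: ExecutorBinding =
    binding ?? { kind: "llm", profileId: config.defaultProfileId };

  switch (effective.kind) {
    case "llm": {
      // An explicit override wins over the profile's configured model.
      const model =
        effective.modelOverride ?? config.profileModels[effective.profileId];
      if (model === undefined) {
        throw new Error(`unknown LLM profile: ${effective.profileId}`);
      }
      return { kind: "llm", profileId: effective.profileId, model };
    }
    case "acp":
      return {
        kind: "acp",
        adapterId: effective.adapterId,
        timeoutMs: effective.timeoutMs ?? config.defaultAcpTimeoutMs,
      };
  }
}
```

Keeping the union discriminated on `kind` lets the compiler reject mixed shapes such as an `adapterId` on an LLM binding, which the two parallel string fields could not express.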
Tightens the docs/12 HITL substrate against an audit pass. Each fix
targets a documented invariant the prior implementation did not enforce:

- Plan revision accept supersedes prior-revision non-terminal tasks so
  completion gating reflects the active "complete slice set."
- Plans can only reference accepted spec versions; drafts/active/archived
  are rejected at create_plan_revision and assertPlanSpecFresh.
- Evidence claims validate criterion_id membership against the slice's
  acceptance_criteria; replayable kind requires at least one replayable
  ref; failing replayable checks (exit != 0) cannot satisfy criteria.
- Gate acceptance now reverts to open if the resolution handler throws,
  closing the accepted-but-unapplied window.
- generateDigest decrements activeGenerationCount in finally, fixing a
  liveness leak that disabled triggered digests until restart.
- Tasks tagged with plan_revision_id; new listActiveRevisionTaskIds
  scopes final audit, /orchestration root, retry-budget, digest counts,
  and drift progress to the active revision (excluding superseded).
- create_task retries inherit docs/12 fields and acceptance criteria
  from the source slice.
- New observed_only_coverage warning drift event surfaces in digest
  near_misses; aggregates per-criterion across rows and consults prior
  claims; legacy record_verification path also runs through drift.
- Legacy record_verification no longer treats skipped as
  criterion-satisfying.

Tests: regression coverage added for each fix in
tests/docs12-orchestration.test.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
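The three evidence-claim rules in this commit (criterion membership, at least one replayable ref for the replayable kind, and failing replayable checks not satisfying criteria) can be sketched as a single validation function. Every type and function name below is hypothetical, as is the `"observed"` kind and the assumption that non-replayable claims pass through unchanged; the PR's real data model is not shown in this text.

```typescript
// Hypothetical evidence reference: either a replayable check with an exit
// code, or a non-replayable observation.
type EvidenceRef =
  | { type: "replayable"; exitCode: number }
  | { type: "observed"; note: string };

// Hypothetical claim and slice shapes.
interface EvidenceClaim {
  criterionId: string;
  kind: "replayable" | "observed";
  refs: EvidenceRef[];
}

interface Slice {
  acceptanceCriteria: { id: string }[];
}

type ClaimResult =
  | { ok: true; canSatisfy: boolean }
  | { ok: false; reason: string };

function validateEvidenceClaim(slice: Slice, claim: EvidenceClaim): ClaimResult {
  // Rule 1: the claim must target a criterion that belongs to this slice.
  const known = slice.acceptanceCriteria.some((c) => c.id === claim.criterionId);
  if (!known) {
    return { ok: false, reason: `unknown criterion: ${claim.criterionId}` };
  }

  const replayableExitCodes: number[] = [];
  for (const ref of claim.refs) {
    if (ref.type === "replayable") replayableExitCodes.push(ref.exitCode);
  }

  // Rule 2: a replayable claim needs at least one replayable ref.
  if (claim.kind === "replayable" && replayableExitCodes.length === 0) {
    return { ok: false, reason: "replayable claim has no replayable ref" };
  }

  // Rule 3: any failing replayable check (exit != 0) blocks satisfaction.
  const anyFailed = replayableExitCodes.some((code) => code !== 0);
  return { ok: true, canSatisfy: !anyFailed };
}
```

Separating `ok` (the claim is well-formed) from `canSatisfy` (the claim may count toward a criterion) keeps a recorded but failing check distinguishable from a rejected claim.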
Wires LLM-driven role-scoped sub-agents on top of the docs/12 substrate.

Phase 1 (autonomous executor):
- executor_binding flows through PlanSliceInput / TaskDefinition /
  scheduler to delegation spawn, so plans can per-slice route LLM/ACP.
- Scheduler tags every dispatched spawn with role: "executor" and the
  per-slice binding.
- roleId plumbed end-to-end: spawn_agent affordance →
  DelegationAgentSpawn → SubAgentRunner → SessionRuntime (sub-agents now
  actually carry a role instead of running with the default).
- EXECUTOR_PROMPT rewritten with the submit_evidence_claim contract,
  criterion-mapping rules, and hard rules (no spec/plan/goal authoring,
  no irreversibles without a gate).
- Hub-layer executorRoleRule denies /specs, /goals, plan-revision, and
  delegation.spawn_agent for the executor role.

Phase 2 (autonomous spec-agent and planner):
- Goal.autonomous flag; create_goal accepts autonomous: true.
- AutonomousGoalCoordinator watches /goals + /gates + /specs and spawns
  a spec-agent on autonomous goal creation; spawns the planner when the
  matching spec_accept gate is accepted. plan_accept → executor flow is
  unchanged.
- SPEC_AGENT_PROMPT and PLANNER_PROMPT fleshed out with concrete
  contracts. specAgentRoleRule and plannerRoleRule enforce role
  boundaries (spec-agent can't author plans/evidence/goals; planner
  can't author specs/evidence/goals; neither mutates workspace files).

Tests:
- tests/orchestration-executor-autonomy.test.ts: scheduler dispatch with
  role=executor, mock executor submits evidence and slice gate
  auto-accepts; per-slice executor_binding propagation.
- tests/orchestration-autonomous-goal.test.ts: autonomous goal spawns
  spec-agent → opens spec gate → user accepts → planner spawned;
  non-autonomous goals don't spawn anything.

Out of scope (deferred): real LLM-driven sub-agent execution (tests use
mock runners), cron/daily digest cadence, USD cost calculation, off-plan
slice intent-drift detection, docs/13 specialist routing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
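The executor role boundary described in Phase 1 can be pictured as a deny rule evaluated before an invocation reaches the hub. This is a rough sketch under stated assumptions: the `Invocation` and `Decision` shapes, the `sketchExecutorRoleRule` name, and the use of `create_plan_revision` as the plan-revision action identifier are hypothetical, and the PR's actual `executorRoleRule` may be structured quite differently.

```typescript
// Hypothetical role set; "default" stands for sessions without a role.
type Role = "executor" | "spec-agent" | "planner" | "default";

// Hypothetical description of a call passing through the hub layer.
interface Invocation {
  role: Role;
  path: string;   // provider path being addressed, e.g. "/specs/abc"
  action: string; // affordance being invoked, e.g. "delegation.spawn_agent"
}

type Decision = { allow: true } | { allow: false; reason: string };

// Denied targets taken from the commit message; identifiers are assumed.
const EXECUTOR_DENIED_PATHS = ["/specs", "/goals"];
const EXECUTOR_DENIED_ACTIONS = ["create_plan_revision", "delegation.spawn_agent"];

// True when `path` is the prefix itself or nested under it, so that
// "/specs-archive" is not mistaken for a child of "/specs".
function isUnder(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

function sketchExecutorRoleRule(inv: Invocation): Decision {
  // The rule only constrains executors; other roles fall through.
  if (inv.role !== "executor") return { allow: true };

  const deniedPath = EXECUTOR_DENIED_PATHS.find((p) => isUnder(inv.path, p));
  if (deniedPath !== undefined) {
    return { allow: false, reason: `executor may not act on ${deniedPath}` };
  }
  if (EXECUTOR_DENIED_ACTIONS.includes(inv.action)) {
    return { allow: false, reason: `executor may not invoke ${inv.action}` };
  }
  return { allow: true };
}
```

The same pattern would extend to `specAgentRoleRule` and `plannerRoleRule` by swapping the denied lists, which keeps each role boundary declarative and easy to audit.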