From bf82948747bf588703883095662102e2efe0e745 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Tue, 5 May 2026 16:45:35 -0400 Subject: [PATCH 1/9] Wizard dev skill --- .claude/skills/wizard-development/SKILL.md | 311 ++++++++++++++++++ .../references/ANTI-PATTERNS.md | 202 ++++++++++++ .../references/ARCHITECTURE.md | 209 ++++++++++++ 3 files changed, 722 insertions(+) create mode 100644 .claude/skills/wizard-development/SKILL.md create mode 100644 .claude/skills/wizard-development/references/ANTI-PATTERNS.md create mode 100644 .claude/skills/wizard-development/references/ARCHITECTURE.md diff --git a/.claude/skills/wizard-development/SKILL.md b/.claude/skills/wizard-development/SKILL.md new file mode 100644 index 00000000..de36608d --- /dev/null +++ b/.claude/skills/wizard-development/SKILL.md @@ -0,0 +1,311 @@ +--- +name: wizard-development +description: > + Design philosophy and architectural principles for the PostHog wizard. + Read this before making any structural change — adding features, extending + the runner, modifying the TUI, or introducing new abstractions. Covers + why the architecture is shaped this way, how to evaluate whether a change + fits, and how to extend the system into territory no existing skill covers. + Complements the existing skills in this repo targeted at specific subsystems. +compatibility: Designed for Claude Code or similar coding agents working on the PostHog wizard codebase. +metadata: + author: posthog + version: "1.0" +--- + +# Wizard Development + +This skill teaches the design discipline behind the wizard's architecture. The +procedural skills tell you *how* to add things. This skill tells you *why* the system is shaped the +way it is, so you can make good decisions when extending into territory no +existing skill covers. + +Read this first. Then any procedural skill that's relevant for what you're building. + +## The core discipline + +Every architectural decision in this codebase is guided by one question: + +> **Who will need to change this next, and what's the smallest thing they +> should have to understand?** + +This question produces a specific, testable commitment: **product knowledge +never enters infrastructure code.** The runner pipeline, the TUI store, the +detection loop, the prompt assembler — these are machinery. They don't know +what PostHog is. They don't know what a framework is. They execute a pipeline +driven by configuration. + +The knowledge — what PostHog needs from a Next.js project, how to detect +SvelteKit, what security rules to enforce — lives in configuration surfaces +with typed boundaries: + +| Which domain has leverage | Where it lives | What they need to understand | +|---|---|---| +| Frameworks | `FrameworkConfig` (~70-120 lines) | Detection, env vars, prompts | +| Docs | Skill markdown in context-mill | Workflow steps, example code | +| Security | YARA rules in `yara-scanner.ts` | Regex patterns, severity levels | +| Context | Step array in `workflows/` | Gates, screens, predicates | +| UI | Screen component + primitives | Ink, store getters, layout | +| Agent development | Runner, store, detection loop, tools | The machinery itself | + +These tight boundaries increase the scope of who is able to contribute to the wizard, while lowering the costs of those contributions. PostHog has a variety of skillsets across many kinds of teams. The wizard should be as accessible to contributions as possible to as many contributor skill sets as possible. + +## The five principles + +### 1. Route to the narrowest configuration surface + +When a new concern arises, find the narrowest typed boundary where it belongs. +Don't add it to the runner. Don't add it to the store. Don't create a new +abstraction. Find the existing configuration surface — or, if truly needed, +create a new one that follows the same pattern. + +**The test:** Can someone modify this concern without reading the agent runner? + +**What goes wrong when violated:** Product-specific logic gets inlined into +infrastructure files. The runner grows from hundreds of lines to thousands. +Changes in one product domain risk breaking another. Contributors need to +understand the full system to make local changes. Recovery infrastructure +proliferates to handle the unpredictable interactions between concerns that +should have been separated. + +### 2. Knowledge lives in markdown, not code + +The agent's behavior is shaped by skill content and commandments, not by +conditional logic in the runner. Framework-specific integration knowledge is +authored as static markdown artifacts — reviewed, tested, versioned, and +delivered separately from the engine via the context-mill repo. + +This separation exists because PostHog's product surface is vast (analytics, +session replay, feature flags, experiments, surveys, error tracking, LLM +analytics, revenue analytics, data warehouse). Each domain has its own SDK +patterns and its own update cadence. If integration knowledge lived in wizard +code, every docs update would require a wizard release. + +**The test:** Can the docs team update integration instructions without a +wizard code change? Is the prompt assembler under 70 lines? + +**What goes wrong when violated:** The prompt builder accumulates +framework-specific conditionals. The commandments grow from terse rules into +paragraph-length tutorials. Knowledge that changes at docs-team cadence gets +locked to wizard release cadence. The system prompt bloats, consuming context +window that the agent needs for actual work. + +### 3. Prevent rather than recover + +Invest in making the agent go right rather than building recovery +infrastructure for when it goes wrong. Prevention at the boundary is cheaper +and more reliable than detection-and-repair after the fact. + +Three prevention layers: + +- **L0 — Commandments** (`commandments.ts`): Terse rules in the system prompt. + Keep these short. If a rule needs a paragraph of explanation, it belongs in + a skill reference file, not in the per-turn system prompt. + +- **L1 — canUseTool allowlist** (`agent-interface.ts`): Blocks dangerous bash + commands before execution. The `wizard-tools` MCP fences `.env` files so the + agent never sees secret values. + +- **L2 — YARA scanner** (`yara-scanner.ts` + `yara-hooks.ts`): Regex rules + running as pre/post tool-use hooks. Catches PII in capture calls, hardcoded + keys, prompt injection, secret exfiltration, supply chain attacks. Critical + violations terminate the session. **Fails closed** — scanner error means + block, not pass. + +**The test:** When the agent misbehaves, does the system prevent the damage or +detect it afterward? Does the system fail closed on uncertainty? + +**What goes wrong when violated:** You build circuit breakers, checkpoints, +retry-from-checkpoint, self-heal logic, graceful-exit handling. Each of these +is code that exists because the boundary didn't prevent the problem. The +codebase grows to compensate for the absence of prevention, and the recovery +code itself becomes a source of complexity and bugs. + +### 4. New capability is a new workflow, not a new branch + +When the product needs a new capability (revenue analytics, audit, LLM +analytics), express it as a new workflow — a separate step array with its own +config — not as conditional logic in the existing runner. + +`WorkflowConfig` is a uniform type. The workflow registry is an array. CLI +subcommands, TUI flows, and the router all derive from the registry +automatically. Adding a workflow means: + +1. Create `src/lib/workflows//` with `index.ts` exporting a `WorkflowConfig` +2. Add it to `WORKFLOW_REGISTRY` in `workflow-registry.ts` +3. Add a `Flow` enum entry in `flows.ts` + +No changes to `bin.ts`, the store, or the runner. + +**The test:** Can a new workflow ship without modifying the runner, the store, +or `bin.ts`? + +**What goes wrong when violated:** The runner becomes a monolithic function +with product-specific branches. Each new capability increases the blast radius +of every other capability's changes. The function grows past the point where +anyone can hold it in their head. + +### 5. The UI is a pure function of state + +The screen the user sees is computed from `WizardSession` — the single source +of truth. No component sets its own visibility. No code imperatively navigates. +The router walks the workflow's step list and returns the first step whose +`isComplete` predicate is still false. + +Every session mutation goes through an explicit store setter that calls +`emitChange()`. The version counter bumps. Gate predicates are re-evaluated. +Screen transitions are detected. React re-renders. + +**The test:** Can you predict which screen is active by reading only the +session state? Is there any imperative navigation (`goTo`, `push`, `navigate`) +in the codebase? + +**What goes wrong when violated:** The UI can reach states that don't +correspond to any valid session. Screens get stuck. Flows advance when they +shouldn't. Contributors add imperative transitions to "fix" rendering bugs +that are actually state bugs, creating a tangle of navigation logic that +obscures the real control flow. + +## Extending into new territory + +The five principles above cover the existing patterns. But the wizard will need +capabilities that don't fit neatly into any current configuration surface. +Here's how to evaluate new extensions: + +### Decision framework + +When you're adding something that doesn't have a precedent in the codebase, +ask these questions in order: + +1. **Which domain does this belong to?** If it's framework-specific, it goes + in `FrameworkConfig`. If it's integration knowledge, it goes in skill + content. If it's a security constraint, it goes in YARA rules. If you + can't name a specific domain from the table above, the concern may not + be well-defined yet. + +2. **Does this change at a different rate than the code around it?** Knowledge + that updates weekly shouldn't live in code that releases monthly. If the + concern changes faster than its container, it needs a decoupled delivery + mechanism (like context-mill skills). If it changes slower, it can live in + code. + +3. **Can I express this as configuration rather than logic?** A typed + interface with six fields is better than a function with six conditionals. + The interface is documentation. The conditionals are implementation detail + that future readers must reverse-engineer. + +4. **What happens if this concern is wrong?** If a wrong value causes a bad + user experience, the boundary should provide fast feedback (compiler error, + test failure). If a wrong value causes a security incident, the boundary + should prevent execution (YARA, canUseTool). Match the severity of failure + to the strength of the boundary. + +5. **Does this expand the required knowledge for a domain?** If adding this + feature means working on frameworks now requires understanding the TUI, + or working on security now requires understanding the runner, the + boundary is in the wrong place. Refactor until the required knowledge + for each domain stays within its column. + +### Patterns for common extension types + +**New agent tool (in-process MCP):** +Add it to `wizard-tools.ts` alongside `check_env_keys`, `set_env_values`, etc. +The tool runs locally — secret values never leave the machine. Register the +tool name in `WIZARD_TOOL_NAMES` so the SDK allowlist includes it. Follow the +existing tool pattern: zod schema, path-traversal protection, logging. + +**New security rule:** +Add a `YaraRule` object to the `RULES` array in `yara-scanner.ts`. Each rule +has a name, description, severity, category, `appliesTo` (which hook+tool +combinations), and `patterns` (compiled regex). One match per rule is +sufficient. The hooks in `yara-hooks.ts` don't need to change — they iterate +the rules array automatically. + +**New detection signal:** +If you need to detect something about the project beyond framework identity +(e.g., Stripe presence, LLM SDK usage), add it to `detection/features.ts`. +The `discoverFeatures()` function returns an array of `DiscoveredFeature` +enums. The intro screen and workflow can read discovered features from the +session. Detection functions are pure — no store mutations, no UI calls. + +**New middleware:** +Implement the `Middleware` interface from `middleware/types.ts`: a `name` and +optional lifecycle hooks (`onInit`, `onMessage`, `onPhaseTransition`, +`onFinalize`). Add it to the pipeline construction in `agent-runner.ts`. The +pipeline dispatches to middlewares in order. Each middleware publishes to a +shared store that downstream middleware can read. The pipeline itself doesn't +change shape. + +**New TUI primitive:** +Create the component in `src/ui/tui/primitives/`. Export from +`primitives/index.ts`. Primitives are pure rendering — they take props and +return Ink JSX. They don't import the store, don't call `getUI()`, don't +mutate state. Screens compose primitives; primitives don't know what screen +they're in. See the `ink-tui` skill for the full component catalog and layout +patterns. + +**New post-agent step:** +If you need to do something after the agent completes (upload env vars, install +MCP servers, etc.), use the `postRun` hook on `WorkflowRun`. The hook receives +the session and credentials. It runs after the agent succeeds but before the +outro. It doesn't modify the runner pipeline — it's a callback on the workflow +configuration. + +**New environment variable convention:** +Each framework's `FrameworkConfig.environment.getEnvVars()` returns the env var +names and values. The naming convention is framework-specific (e.g., +`NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN` for Next.js, `PUBLIC_POSTHOG_PROJECT_TOKEN` +for SvelteKit). If you need a new convention, add it to the framework config. +The runner doesn't know or care what the env vars are called. + +## What to watch for + +These are the early warning signs that a change is drifting from the +discipline: + +- **The runner is getting longer.** If you're adding lines to + `agent-runner.ts` or `agent-interface.ts`, ask whether the concern belongs + in a `WorkflowRun` config, a middleware, a post-run hook, or a skill file. + +- **A framework contributor needs to read the runner.** The `FrameworkConfig` + interface should be sufficient. If it's not, the interface is missing a + field — extend the interface, don't add special-case logic to the runner. + +- **The commandments are getting long.** Commandments are per-turn system + prompt — every token counts. If a rule needs explanation, move the + explanation to a skill reference file and leave only the invariant in the + commandment. + +- **You're building recovery infrastructure.** If you're writing retry logic, + checkpoint saving, or self-heal code, ask whether a prevention layer + (YARA rule, canUseTool rule, skill content improvement) would eliminate the + failure mode instead of recovering from it. + +- **You're adding imperative UI transitions.** If you're writing `goTo`, + `navigate`, or `if (screen === X) show Y`, the session state doesn't + accurately represent what the user should see. Fix the state model and let + the router derive the screen. + +- **A change in one workflow breaks another.** Workflows should be + independent. If they share mutable state beyond the session, that state + needs an explicit boundary (a store setter, a workflow config field) instead + of implicit coupling through the runner. + +## Compactness is an indicator, not a goal + +The wizard is ~20K lines of source. This isn't minimalism for its own sake. +It's a side effect of routing each concern to the right configuration surface. + +When concerns are correctly separated, each change is local. Defensive code is +unnecessary because boundaries prevent damage from propagating. The codebase +stays small because it doesn't need recovery infrastructure, special-case +branches, or coordination code between concerns that shouldn't know about each +other. + +If the codebase is growing faster than the capability it delivers, the +boundaries are in the wrong place. Measure the ratio, not the absolute size. + +## Reference files + +- [references/ARCHITECTURE.md](references/ARCHITECTURE.md) — Pipeline anatomy, data flow, security boundaries, screen resolution, MCP topology, middleware pipeline. **Read when you need to understand where a new concern hooks in.** +- [references/ANTI-PATTERNS.md](references/ANTI-PATTERNS.md) — Concrete failure modes with alternatives: inlining product logic, bloating commandments, building recovery instead of prevention, imperative navigation, bundling mismatched-rate knowledge. **Read when evaluating whether a proposed change fits.** diff --git a/.claude/skills/wizard-development/references/ANTI-PATTERNS.md b/.claude/skills/wizard-development/references/ANTI-PATTERNS.md new file mode 100644 index 00000000..6898a0e9 --- /dev/null +++ b/.claude/skills/wizard-development/references/ANTI-PATTERNS.md @@ -0,0 +1,202 @@ +--- +title: Anti-patterns and failure modes +description: Concrete examples of changes that violate the design discipline, what goes wrong downstream, and what to do instead. Read when evaluating whether a proposed change fits the architecture. +--- + +# Anti-patterns + +Each section describes a pattern that looks reasonable in isolation but +degrades the system over time. The failure modes are drawn from real +production codebases — they're not hypothetical. + +## Inlining product logic into the runner + +**The pattern:** A new feature needs something after the agent completes — +committing an event plan to an API, polling for data ingestion, creating a +dashboard server-side. The fastest path is adding the logic directly to the +runner function, after the agent run and before the outro. + +**What happens next:** The runner grows. It now knows about your product +concept (event plans, dashboards, data ingestion). Another feature does the +same thing. The runner now knows about two product concepts. A third. The +function passes 500 lines, then 1000. Each product concern interacts with +error handling — a network failure during dashboard creation needs different +recovery than a failure during event plan commit. The error handling branches +multiply. Token refresh logic appears (long runs outlive the OAuth expiry). +Soft-error vs. hard-error classification appears (did the agent do enough work +that we should continue despite the failure?). Each of these is reasonable in +isolation. Together they produce a function that no one can hold in their head. + +**The cost:** Every future change to any product concern risks breaking every +other product concern. A contributor who wants to modify the event plan step +must understand the dashboard step, the data ingestion step, the error +classification, and the token refresh timing. The required knowledge for a +local change has expanded to the full function. + +**What to do instead:** Use `WorkflowRun.postRun` for post-agent work. If the +post-agent step is complex enough to need its own error handling, extract it to +a module and call it from postRun. If multiple workflows need the same +post-agent step, make it a shared function that postRun calls — not shared code +in the runner. The runner stays generic; the product knowledge stays in +workflow configuration. + +--- + +## Growing commandments into tutorials + +**The pattern:** The agent keeps making the same mistake — using the wrong env +var name, running a project-wide lint instead of scoping to edited files, +installing non-PostHog packages. The fix seems obvious: add more detail to the +commandments so the agent knows exactly what to do. + +**What happens next:** Each commandment grows from one sentence to a paragraph. +Paragraphs acquire bullet lists. Bullet lists acquire sub-bullets with +examples. The commandments file grows from 500 tokens to 5,000. But +commandments are appended to the system prompt on every turn — they're part of +the cached prefix, but they still consume context window. On a long run with +many tool calls, the bloated commandments push useful context (the user's +actual code, the skill content, the conversation history) out of the window. +The agent starts making mistakes because it can't see its own prior work, not +because it lacks instructions. + +**The cost:** Context window is finite. Every token of commandment is a token +not available for the agent's actual task. And long commandments create a +second problem: the agent satisfices across conflicting instructions rather +than following any single instruction precisely. A 20-bullet commandment list +gets partially followed; a 5-line commandment list gets followed completely. + +**What to do instead:** Keep commandments to one sentence per rule — the +invariant, not the explanation. Move the explanation, examples, edge cases, and +rationale to a skill reference file. The agent loads reference files on demand +(progressive disclosure), so the detailed guidance is available when needed +without consuming context window on every turn. The commandment points at the +reference: "API key conventions — see `wizard-prompt-supplement/references/api-keys-and-env.md`." + +--- + +## Building recovery instead of prevention + +**The pattern:** The agent occasionally runs a destructive command, writes a +hardcoded secret, or gets stuck in a retry loop. The response is to build +infrastructure that detects the problem and recovers: a circuit breaker that +kills the run after N consecutive denied commands, a checkpoint system that +saves state so the run can resume, a self-heal module that detects and patches +agent mistakes. + +**What happens next:** The recovery infrastructure works — sort of. The circuit +breaker fires after 47 denied commands instead of preventing the first one. The +checkpoint system adds 300 lines of state serialization. The self-heal module +adds 250 lines of heuristic pattern matching. Each module is tested +independently but their interactions are not — a checkpoint saved during a +self-heal cycle can restore to a state that triggers the circuit breaker. The +recovery code becomes its own source of bugs, and the bugs are harder to +diagnose because they involve interactions between three systems that were each +designed in isolation. + +**The cost:** Recovery is O(n) in the number of failure modes. Each new failure +mode needs new recovery code. Prevention is O(1) — a YARA rule, a canUseTool +entry, or a skill content improvement addresses the root cause once. The +codebase grows linearly with recovery and stays constant with prevention. + +**What to do instead:** For each failure mode, ask: can I prevent this at the +boundary? A command that should never run → add it to the canUseTool deny list. +A pattern that should never appear in written code → add a YARA rule. A mistake +the agent keeps making → improve the skill content or add a one-line +commandment. Reserve recovery infrastructure for truly unpredictable failures +(network outages, API rate limits) — not for agent behavior that can be shaped +by better boundaries. + +--- + +## Imperative UI navigation + +**The pattern:** A screen needs to appear conditionally — only when the agent +encounters a specific error, or only when a feature flag is enabled. The +fastest path is adding a navigation call: `if (condition) goToScreen('error')`. + +**What happens next:** The navigation call works for this case. Another +conditional screen needs the same treatment. A third. Now the flow has three +imperative jumps that interact with the router's predicate-based resolution. +The router says the active screen should be X (based on session state), but an +imperative jump sent the user to Y. The session state says the user is on X. +The screen shows Y. A store setter fires, the router re-resolves, and the user +is yanked from Y back to X mid-interaction. The fix is another imperative jump +to keep the user on Y. The fix for the fix is a flag that suppresses the +router. The router is now partially bypassed. + +**The cost:** The system has two navigation mechanisms that disagree. Every +future screen change must account for both the predicate-based flow and the +imperative jumps. Bugs in this regime are non-local — the cause is a state +mutation in the runner, the symptom is a screen flicker in the TUI, and the +"fix" that someone applies (another imperative jump) makes the problem worse. + +**What to do instead:** Add a field to `WizardSession` that represents the +condition. Add a `show` or `isComplete` predicate on the workflow step that +reads the field. The router derives the correct screen automatically. If the +condition is an overlay (error modal, conflict resolution), use +`store.pushOverlay()` — overlays stack above the flow and pop when dismissed, +without disrupting the flow cursor. + +--- + +## Bundling knowledge that changes at a different rate + +**The pattern:** Integration knowledge (how to set up PostHog in a Next.js App +Router project) is useful and well-written. The simplest delivery mechanism is +to bundle it in the wizard's npm package — ship the skill files alongside the +code. + +**What happens next:** It works fine for a while. Then the docs team updates +the Next.js integration guide. The skill content is now out of date. But the +skill is bundled — updating it requires a wizard release. The wizard release +has its own PR cycle, CI, review, publish. The docs update ships in hours; the +wizard update ships in days. The gap widens as the product surface grows — +more frameworks, more features, more docs updates, each waiting for a wizard +release to propagate. + +**The cost:** Knowledge delivery is gated by code deployment. Teams that own +the knowledge (docs, frameworks, product) can't ship updates without +coordinating with the team that owns the deployment (wizard infrastructure). +This is a coordination cost that scales linearly with the number of knowledge +domains. + +**What to do instead:** Deliver knowledge through a decoupled channel. The +wizard fetches skill packages at runtime from a source that the knowledge +owners can update independently. Context-mill publishes skills as GitHub +release assets — the docs team can update a skill without touching the wizard +repo. The wizard's runtime fetch adds a network dependency, but the tradeoff +is correct: the network cost is paid once per run, the coordination cost of +bundling is paid on every update across every domain. + +**When bundling is fine:** If the knowledge rarely changes and the domain +count is small, bundling avoids the complexity of runtime fetch. The +decision depends on the rate of change, not on a universal preference. + +--- + +## Shared mutable state between workflows + +**The pattern:** Two workflows need the same information — the detected +framework, the user's credentials, a feature discovery result. The fastest +path is having both workflows read and write the same session fields, relying +on execution order to ensure the first workflow populates what the second +needs. + +**What happens next:** The dependency is implicit. A refactor changes the +execution order. The second workflow reads a field the first workflow hasn't +populated yet. The field is null. The second workflow fails with a cryptic +error. Or worse: it succeeds with default values that produce a subtly wrong +integration. The bug is invisible in tests because the test setup populates +the field explicitly. + +**The cost:** Implicit ordering dependencies between workflows make the system +fragile to refactoring. The dependency graph exists in the developer's head, +not in the code. + +**What to do instead:** Use the `requires` field on `WorkflowConfig` to make +workflow dependencies explicit. For shared data, use `frameworkContext` with +documented keys — the `setFrameworkContext` setter ensures `emitChange()` fires +and downstream predicates re-evaluate. If two workflows need the same +detection result, factor the detection into a shared module that both +workflows' `onReady` hooks call, rather than relying on one workflow to +populate what another reads. diff --git a/.claude/skills/wizard-development/references/ARCHITECTURE.md b/.claude/skills/wizard-development/references/ARCHITECTURE.md new file mode 100644 index 00000000..54751940 --- /dev/null +++ b/.claude/skills/wizard-development/references/ARCHITECTURE.md @@ -0,0 +1,209 @@ +--- +title: Pipeline anatomy and data flow +description: How data moves through the wizard — from CLI args to the screen the user sees. Read when you need to understand where a new concern should hook in. +--- + +# Architecture + +## The runner pipeline + +Every wizard run — framework integration, revenue analytics, audit, generic +skill — executes the same pipeline in `agent-runner.ts`. The pipeline is +fixed. What varies is the `WorkflowRun` configuration object. + +``` + 1. Init logging + debug + 2. Health check (skip if TUI already ran it) + 3. Settings conflict detection + resolution + 4. OAuth / credential flow + 5. Skill install (if WorkflowRun.skillId is set) + 6. Agent initialization (MCP servers, tools, sandbox, env) + 7. Prompt assembly (project context + custom prompt + skill path) + 8. Agent execution (SDK query with hooks) + 9. Error classification + handling +10. Post-run hooks (WorkflowRun.postRun — e.g. env var upload) +11. Outro data construction +12. Analytics shutdown +``` + +### Where configuration hooks fire + +- **WorkflowRun.customPrompt** — called at step 7, receives `PromptContext` + (projectId, apiKey, host, skillPath). Returns additional prompt text. +- **WorkflowRun.postRun** — called at step 10, receives session + credentials. + Runs after the agent succeeds but before the outro. +- **WorkflowRun.buildOutroData** — called at step 11 if present. Otherwise + the runner builds default outro data from successMessage/reportFile/docsUrl. +- **WorkflowRun.abortCases** — matched against `[ABORT] ` signals + during step 8. First regex match renders a custom error outro. +- **WorkflowStep.onInit** — fires during store construction (step 0, before + session is assigned). Use for session-independent work only (e.g. health + check prefetch). +- **WorkflowStep.onReady** — fires after `tui.store.session = session` in + bin.ts. Awaited in sequence. Use for session-dependent pre-flow work (e.g. + framework detection, prerequisite scanning). +- **WorkflowStep.gate** — predicate checked on every `emitChange()`. bin.ts + parks on `await store.getGate(stepId)` until the predicate flips true. + +### What the runner does NOT know + +The runner doesn't know what framework is being integrated. It doesn't know +what skills exist. It doesn't know what env vars are called. It doesn't know +what the outro should say. All of that comes from `WorkflowRun` and +`FrameworkConfig` — configuration, not code. + +## Session data flow + +``` +CLI args / env vars + ↓ +buildSession() → WizardSession (flat data bag, all fields initialized) + ↓ +Store assignment → tui.store.session = session + ↓ +onReady hooks → detect framework, gather context, check version + ↓ +TUI screens → user confirms setup, authenticates, etc. + ↓ (each screen calls a store setter → emitChange()) +Agent run → agent reads/writes files, emits signals + ↓ +postRun hooks → env var upload, etc. + ↓ +Outro → session.outroData drives the outro screen +``` + +Session is populated in layers. Early layers provide defaults. Later layers +override. Business logic reads from the session — never calls a prompt. The +session never calls `getUI()`. + +## Agent output flow + +During the agent run (step 8), the SDK emits messages via an async generator. +`handleSDKMessage` in `agent-interface.ts` processes each message: + +``` +SDK message (async generator) + ↓ +handleSDKMessage() + ├─ assistant message + │ ├─ text content → collectedText[] (for signal detection) + │ ├─ [STATUS] marker → getUI().pushStatus() → store.statusMessages + │ └─ TodoWrite tool_use → getUI().syncTodos() → store.tasks + ├─ result message + │ ├─ success → mark receivedSuccessResult + │ └─ error → log + surface to user (unless post-success cleanup noise) + └─ system message (init) → log tools/model/mcpServers +``` + +Key: the agent doesn't know the TUI exists. It uses standard Claude Code +patterns (`[STATUS]` text markers, `TodoWrite` tool calls) and the harness +translates them into store state. Adding a new observation channel means +adding a new signal pattern to `handleSDKMessage` and a new store atom + +setter, not modifying the agent prompt. + +## Security boundary flow + +Three layers, each enforced at a different point in the tool-use lifecycle: + +``` +Agent wants to use a tool + ↓ +canUseTool() [L1] → allow/deny before execution + ↓ (bash allowlist, .env file fencing) +PreToolUse YARA hooks [L2] → scan input, block if matched + ↓ (exfiltration, destructive ops, supply chain) +Tool executes + ↓ +PostToolUse YARA hooks [L2] → scan output, instruct revert or terminate + ↓ (PII in capture, hardcoded keys, prompt injection) +Result returned to agent +``` + +The sandbox (filesystem + network scoping) is configured once in the SDK +`query()` call and enforced by the SDK runtime — not by wizard code. + +Commandments (L0) are in the system prompt and operate at the model's +judgment layer — no code enforcement. They're the first line, not the last. + +## Screen resolution flow + +``` +Store setter called (e.g. store.completeSetup()) + ↓ +$session atom updated + ↓ +emitChange() + ├─ version counter bumps (React re-renders via useSyncExternalStore) + ├─ _checkGates() — resolve any gate whose predicate is now true + └─ _detectTransition() — fire enter-screen hooks, capture analytics + ↓ +router.resolve(session) + ├─ if overlay stack non-empty → return top overlay + └─ walk flow entries: + for each entry: + skip if entry.show(session) === false + skip if entry.isComplete(session) === true + return entry.screen ← first incomplete, visible screen + fallback: last entry (outro) +``` + +No imperative navigation anywhere. The router is a pure function of session +state + overlay stack. If you need to change which screen is active, change +the session state that the predicates read. + +## The WizardUI abstraction + +Business logic never imports the store directly. It calls `getUI()`, which +returns a `WizardUI` interface. Two implementations: + +- **InkUI** — translates calls to store setters. Used in interactive TUI mode. +- **LoggingUI** — translates calls to console output. Used in CI mode. + +This boundary means the runner, the agent interface, and the OAuth flow don't +know whether they're driving a TUI or printing to a log. When adding a new +piece of state that the UI should reflect: + +1. Add the field to `WizardSession` +2. Add a setter to `WizardStore` that calls `emitChange()` +3. Add the method to `WizardUI` interface +4. Implement in both `InkUI` (delegates to store setter) and `LoggingUI` + (prints or no-ops) + +## MCP server topology + +The agent has access to two MCP servers: + +- **posthog-wizard** — remote, HTTP-based. The PostHog MCP server at + `mcp.posthog.com/mcp` (or `mcp-eu.posthog.com/mcp`). Provides query tools + for PostHog data, dashboard creation, etc. Authenticated via Bearer token. + Tool schemas are deferred (`ENABLE_TOOL_SEARCH: 'auto:0'`) to avoid + bloating the system prompt. + +- **wizard-tools** — local, in-process. Created by `createWizardToolsServer()` + in `wizard-tools.ts`. Provides `check_env_keys`, `set_env_values`, + `detect_package_manager`, `load_skill_menu`, `install_skill`. Runs in the + wizard process — secret values never leave the machine. + +Frameworks can add additional MCP servers via +`FrameworkConfig.metadata.additionalMcpServers` (e.g. SvelteKit adds the +official Svelte MCP at `https://mcp.svelte.dev/mcp`). + +## Middleware pipeline + +The middleware system is opt-in (currently used for benchmarking). It +implements `{ onMessage, finalize }` — the same interface the runner expects: + +``` +MiddlewarePipeline + ├─ middleware 1: onInit, onMessage, onPhaseTransition, onFinalize + ├─ middleware 2: ... + └─ middleware N: ... +``` + +Each middleware has a `name` and optional lifecycle hooks. A shared store +(`MiddlewareContext.get` / `MiddlewareStore.set`) lets upstream middleware +publish data that downstream middleware reads. Phase detection is automatic +(from SDK message content) or explicit (`pipeline.startPhase()`). + +To add a middleware: implement the `Middleware` interface, add it to the +pipeline construction in `agent-runner.ts`. The pipeline dispatches in order. From ce9a4a870507d36ca266ea2d99af1c0d27139c1d Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Tue, 5 May 2026 16:59:54 -0400 Subject: [PATCH 2/9] Create CLAUDE.md --- CLAUDE.md | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) create mode 100644 CLAUDE.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..c34cb026 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,82 @@ +# PostHog Wizard + +The PostHog wizard (`npx @posthog/wizard`) is a CLI that adds PostHog to a user's project using an AI agent. It authenticates the user, detects their framework, runs an agent that integrates the SDK and instruments events, and walks the user through their first dashboard. All from the terminal. + +User-facing docs: https://posthog.com/docs/ai-engineering/ai-wizard + +## Design discipline + +This codebase follows a specific design discipline: **product knowledge never enters infrastructure code.** The runner pipeline, the TUI store, the detection loop, and the prompt assembler are machinery. They don't know what PostHog is. They don't know what a framework is. They execute a pipeline driven by typed configuration surfaces. + +Each domain has a dedicated boundary: + +- **Frameworks** → `FrameworkConfig` in `src/frameworks//` +- **Integration knowledge** → markdown skills in the +[context-mill](https://github.com/PostHog/context-mill) repo +- **Security policy** → YARA rules in `src/lib/yara-scanner.ts` +- **Workflows** → step arrays in `src/lib/workflows/` +- **TUI** → screen components and primitives in `src/ui/tui/` + +Adding a new concern means finding the narrowest existing surface, not adding logic to the runner. The wizard is small (~20K lines) because boundaries prevent damage from propagating between concerns. + +## Before making structural changes + +Read `.claude/skills/wizard-development/SKILL.md` first. It covers the design discipline, a decision framework for new extensions, and warning signs that a change is drifting off-pattern. Two reference files extend it: + +- `references/ARCHITECTURE.md` — pipeline anatomy, data flow, security +boundaries, screen resolution +- `references/ANTI-PATTERNS.md` — concrete failure modes with alternatives + +## Skills available + +Four skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill: + +| Skill | When to use | +|---|---| +| `wizard-development` | Before any structural change. Design principles + decision framework. | +| `adding-framework-support` | Adding a new framework integration (e.g. Ruby on Rails, Go, Angular). | +| `adding-skill-workflow` | Adding a new skill-based workflow (e.g. a new product feature setup). | +| `ink-tui` | Building or modifying TUI screens, layouts, and primitives. | + +## Commands + +```bash +pnpm install # Install dependencies +pnpm try --install-dir= # Run the wizard locally against a test project +pnpm build # Compile TypeScript +pnpm test # Unit tests (builds first) +pnpm test:watch # Unit tests in watch mode +pnpm test:e2e # End-to-end tests +pnpm lint # Prettier + ESLint checks +pnpm fix # Auto-fix lint issues +pnpm dev # Build, link globally, watch for changes +``` + +After any change, verify with: + +```bash +pnpm build && pnpm test && pnpm fix +``` + +## Companion projects + +- **[context-mill](https://github.com/PostHog/context-mill)** — builds and +publishes the markdown skills the wizard agent uses for framework-specific integration knowledge. Skills are decoupled from the wizard release cycle so docs and integration patterns can update independently. +- **[wizard-workbench](https://github.com/PostHog/wizard-workbench)** — the +development and testing environment. Houses framework test apps (Next.js, React Router, Django, Flask, Laravel, SvelteKit, Swift, TanStack, FastAPI) with no PostHog installed, plus an `mprocs`-driven local dev stack that runs context-mill + MCP + the wizard together with hot reload. Use this to develop and test wizard changes against real projects. + +## Repository conventions + +- TypeScript everywhere. Use `type` (not `interface`) for framework context +types so they satisfy `Record`. +- All UI calls go through `getUI()` (returns `WizardUI` interface). Never import +the store directly from business logic. +- Session mutations go through explicit store setters that call `emitChange()`. +Never mutate `session` directly — nanostore holds a shallow copy. +- The router resolves the active screen from session state. No imperative +navigation (`goTo`, `navigate`, `push`) anywhere. +- Never write secrets to source code or hardcode API keys. Use the +`wizard-tools` MCP server (`check_env_keys` / `set_env_values`) for `.env` file operations. +- Feedback / issues: wizard@posthog.com or +[GitHub Issues](https://github.com/posthog/wizard/issues). + From 055d16cb0bb56a05b5f7c28d9c57997a4e1bed09 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Tue, 5 May 2026 17:00:44 -0400 Subject: [PATCH 3/9] Clean up hard wraps --- .claude/skills/wizard-development/SKILL.md | 261 ++++-------------- .../references/ANTI-PATTERNS.md | 213 +++----------- .../references/ARCHITECTURE.md | 92 ++---- CLAUDE.md | 1 + 4 files changed, 131 insertions(+), 436 deletions(-) diff --git a/.claude/skills/wizard-development/SKILL.md b/.claude/skills/wizard-development/SKILL.md index de36608d..02863eba 100644 --- a/.claude/skills/wizard-development/SKILL.md +++ b/.claude/skills/wizard-development/SKILL.md @@ -15,10 +15,7 @@ metadata: # Wizard Development -This skill teaches the design discipline behind the wizard's architecture. The -procedural skills tell you *how* to add things. This skill tells you *why* the system is shaped the -way it is, so you can make good decisions when extending into territory no -existing skill covers. +This skill teaches the design discipline behind the wizard's architecture. The procedural skills tell you *how* to add things. This skill tells you *why* the system is shaped the way it is, so you can make good decisions when extending into territory no existing skill covers. Read this first. Then any procedural skill that's relevant for what you're building. @@ -29,15 +26,9 @@ Every architectural decision in this codebase is guided by one question: > **Who will need to change this next, and what's the smallest thing they > should have to understand?** -This question produces a specific, testable commitment: **product knowledge -never enters infrastructure code.** The runner pipeline, the TUI store, the -detection loop, the prompt assembler — these are machinery. They don't know -what PostHog is. They don't know what a framework is. They execute a pipeline -driven by configuration. +This question produces a specific, testable commitment: **product knowledge never enters infrastructure code.** The runner pipeline, the TUI store, the detection loop, the prompt assembler — these are machinery. They don't know what PostHog is. They don't know what a framework is. They execute a pipeline driven by configuration. -The knowledge — what PostHog needs from a Next.js project, how to detect -SvelteKit, what security rules to enforce — lives in configuration surfaces -with typed boundaries: +The knowledge — what PostHog needs from a Next.js project, how to detect SvelteKit, what security rules to enforce — lives in configuration surfaces with typed boundaries: | Which domain has leverage | Where it lives | What they need to understand | |---|---|---| @@ -54,82 +45,43 @@ These tight boundaries increase the scope of who is able to contribute to the wi ### 1. Route to the narrowest configuration surface -When a new concern arises, find the narrowest typed boundary where it belongs. -Don't add it to the runner. Don't add it to the store. Don't create a new -abstraction. Find the existing configuration surface — or, if truly needed, -create a new one that follows the same pattern. +When a new concern arises, find the narrowest typed boundary where it belongs. Don't add it to the runner. Don't add it to the store. Don't create a new abstraction. Find the existing configuration surface — or, if truly needed, create a new one that follows the same pattern. **The test:** Can someone modify this concern without reading the agent runner? -**What goes wrong when violated:** Product-specific logic gets inlined into -infrastructure files. The runner grows from hundreds of lines to thousands. -Changes in one product domain risk breaking another. Contributors need to -understand the full system to make local changes. Recovery infrastructure -proliferates to handle the unpredictable interactions between concerns that -should have been separated. +**What goes wrong when violated:** Product-specific logic gets inlined into infrastructure files. The runner grows from hundreds of lines to thousands. Changes in one product domain risk breaking another. Contributors need to understand the full system to make local changes. Recovery infrastructure proliferates to handle the unpredictable interactions between concerns that should have been separated. ### 2. Knowledge lives in markdown, not code -The agent's behavior is shaped by skill content and commandments, not by -conditional logic in the runner. Framework-specific integration knowledge is -authored as static markdown artifacts — reviewed, tested, versioned, and -delivered separately from the engine via the context-mill repo. +The agent's behavior is shaped by skill content and commandments, not by conditional logic in the runner. Framework-specific integration knowledge is authored as static markdown artifacts — reviewed, tested, versioned, and delivered separately from the engine via the context-mill repo. -This separation exists because PostHog's product surface is vast (analytics, -session replay, feature flags, experiments, surveys, error tracking, LLM -analytics, revenue analytics, data warehouse). Each domain has its own SDK -patterns and its own update cadence. If integration knowledge lived in wizard -code, every docs update would require a wizard release. +This separation exists because PostHog's product surface is vast (analytics, session replay, feature flags, experiments, surveys, error tracking, LLM analytics, revenue analytics, data warehouse). Each domain has its own SDK patterns and its own update cadence. If integration knowledge lived in wizard code, every docs update would require a wizard release. -**The test:** Can the docs team update integration instructions without a -wizard code change? Is the prompt assembler under 70 lines? +**The test:** Can the docs team update integration instructions without a wizard code change? Is the prompt assembler under 70 lines? -**What goes wrong when violated:** The prompt builder accumulates -framework-specific conditionals. The commandments grow from terse rules into -paragraph-length tutorials. Knowledge that changes at docs-team cadence gets -locked to wizard release cadence. The system prompt bloats, consuming context -window that the agent needs for actual work. +**What goes wrong when violated:** The prompt builder accumulates framework-specific conditionals. The commandments grow from terse rules into paragraph-length tutorials. Knowledge that changes at docs-team cadence gets locked to wizard release cadence. The system prompt bloats, consuming context window that the agent needs for actual work. ### 3. Prevent rather than recover -Invest in making the agent go right rather than building recovery -infrastructure for when it goes wrong. Prevention at the boundary is cheaper -and more reliable than detection-and-repair after the fact. +Invest in making the agent go right rather than building recovery infrastructure for when it goes wrong. Prevention at the boundary is cheaper and more reliable than detection-and-repair after the fact. Three prevention layers: -- **L0 — Commandments** (`commandments.ts`): Terse rules in the system prompt. - Keep these short. If a rule needs a paragraph of explanation, it belongs in - a skill reference file, not in the per-turn system prompt. +- **L0 — Commandments** (`commandments.ts`): Terse rules in the system prompt. Keep these short. If a rule needs a paragraph of explanation, it belongs in a skill reference file, not in the per-turn system prompt. -- **L1 — canUseTool allowlist** (`agent-interface.ts`): Blocks dangerous bash - commands before execution. The `wizard-tools` MCP fences `.env` files so the - agent never sees secret values. +- **L1 — canUseTool allowlist** (`agent-interface.ts`): Blocks dangerous bash commands before execution. The `wizard-tools` MCP fences `.env` files so the agent never sees secret values. -- **L2 — YARA scanner** (`yara-scanner.ts` + `yara-hooks.ts`): Regex rules - running as pre/post tool-use hooks. Catches PII in capture calls, hardcoded - keys, prompt injection, secret exfiltration, supply chain attacks. Critical - violations terminate the session. **Fails closed** — scanner error means - block, not pass. +- **L2 — YARA scanner** (`yara-scanner.ts` + `yara-hooks.ts`): Regex rules running as pre/post tool-use hooks. Catches PII in capture calls, hardcoded keys, prompt injection, secret exfiltration, supply chain attacks. Critical violations terminate the session. **Fails closed** — scanner error means block, not pass. -**The test:** When the agent misbehaves, does the system prevent the damage or -detect it afterward? Does the system fail closed on uncertainty? +**The test:** When the agent misbehaves, does the system prevent the damage or detect it afterward? Does the system fail closed on uncertainty? -**What goes wrong when violated:** You build circuit breakers, checkpoints, -retry-from-checkpoint, self-heal logic, graceful-exit handling. Each of these -is code that exists because the boundary didn't prevent the problem. The -codebase grows to compensate for the absence of prevention, and the recovery -code itself becomes a source of complexity and bugs. +**What goes wrong when violated:** You build circuit breakers, checkpoints, retry-from-checkpoint, self-heal logic, graceful-exit handling. Each of these is code that exists because the boundary didn't prevent the problem. The codebase grows to compensate for the absence of prevention, and the recovery code itself becomes a source of complexity and bugs. ### 4. New capability is a new workflow, not a new branch -When the product needs a new capability (revenue analytics, audit, LLM -analytics), express it as a new workflow — a separate step array with its own -config — not as conditional logic in the existing runner. +When the product needs a new capability (revenue analytics, audit, LLM analytics), express it as a new workflow — a separate step array with its own config — not as conditional logic in the existing runner. -`WorkflowConfig` is a uniform type. The workflow registry is an array. CLI -subcommands, TUI flows, and the router all derive from the registry -automatically. Adding a workflow means: +`WorkflowConfig` is a uniform type. The workflow registry is an array. CLI subcommands, TUI flows, and the router all derive from the registry automatically. Adding a workflow means: 1. Create `src/lib/workflows//` with `index.ts` exporting a `WorkflowConfig` 2. Add it to `WORKFLOW_REGISTRY` in `workflow-registry.ts` @@ -137,175 +89,80 @@ automatically. Adding a workflow means: No changes to `bin.ts`, the store, or the runner. -**The test:** Can a new workflow ship without modifying the runner, the store, -or `bin.ts`? +**The test:** Can a new workflow ship without modifying the runner, the store, or `bin.ts`? -**What goes wrong when violated:** The runner becomes a monolithic function -with product-specific branches. Each new capability increases the blast radius -of every other capability's changes. The function grows past the point where -anyone can hold it in their head. +**What goes wrong when violated:** The runner becomes a monolithic function with product-specific branches. Each new capability increases the blast radius of every other capability's changes. The function grows past the point where anyone can hold it in their head. ### 5. The UI is a pure function of state -The screen the user sees is computed from `WizardSession` — the single source -of truth. No component sets its own visibility. No code imperatively navigates. -The router walks the workflow's step list and returns the first step whose -`isComplete` predicate is still false. +The screen the user sees is computed from `WizardSession` — the single source of truth. No component sets its own visibility. No code imperatively navigates. The router walks the workflow's step list and returns the first step whose `isComplete` predicate is still false. -Every session mutation goes through an explicit store setter that calls -`emitChange()`. The version counter bumps. Gate predicates are re-evaluated. -Screen transitions are detected. React re-renders. +Every session mutation goes through an explicit store setter that calls `emitChange()`. The version counter bumps. Gate predicates are re-evaluated. Screen transitions are detected. React re-renders. -**The test:** Can you predict which screen is active by reading only the -session state? Is there any imperative navigation (`goTo`, `push`, `navigate`) -in the codebase? +**The test:** Can you predict which screen is active by reading only the session state? Is there any imperative navigation (`goTo`, `push`, `navigate`) in the codebase? -**What goes wrong when violated:** The UI can reach states that don't -correspond to any valid session. Screens get stuck. Flows advance when they -shouldn't. Contributors add imperative transitions to "fix" rendering bugs -that are actually state bugs, creating a tangle of navigation logic that -obscures the real control flow. +**What goes wrong when violated:** The UI can reach states that don't correspond to any valid session. Screens get stuck. Flows advance when they shouldn't. Contributors add imperative transitions to "fix" rendering bugs that are actually state bugs, creating a tangle of navigation logic that obscures the real control flow. ## Extending into new territory -The five principles above cover the existing patterns. But the wizard will need -capabilities that don't fit neatly into any current configuration surface. -Here's how to evaluate new extensions: +The five principles above cover the existing patterns. But the wizard will need capabilities that don't fit neatly into any current configuration surface. Here's how to evaluate new extensions: ### Decision framework -When you're adding something that doesn't have a precedent in the codebase, -ask these questions in order: - -1. **Which domain does this belong to?** If it's framework-specific, it goes - in `FrameworkConfig`. If it's integration knowledge, it goes in skill - content. If it's a security constraint, it goes in YARA rules. If you - can't name a specific domain from the table above, the concern may not - be well-defined yet. - -2. **Does this change at a different rate than the code around it?** Knowledge - that updates weekly shouldn't live in code that releases monthly. If the - concern changes faster than its container, it needs a decoupled delivery - mechanism (like context-mill skills). If it changes slower, it can live in - code. - -3. **Can I express this as configuration rather than logic?** A typed - interface with six fields is better than a function with six conditionals. - The interface is documentation. The conditionals are implementation detail - that future readers must reverse-engineer. - -4. **What happens if this concern is wrong?** If a wrong value causes a bad - user experience, the boundary should provide fast feedback (compiler error, - test failure). If a wrong value causes a security incident, the boundary - should prevent execution (YARA, canUseTool). Match the severity of failure - to the strength of the boundary. - -5. **Does this expand the required knowledge for a domain?** If adding this - feature means working on frameworks now requires understanding the TUI, - or working on security now requires understanding the runner, the - boundary is in the wrong place. Refactor until the required knowledge - for each domain stays within its column. +When you're adding something that doesn't have a precedent in the codebase, ask these questions in order: + +1. **Which domain does this belong to?** If it's framework-specific, it goes in `FrameworkConfig`. If it's integration knowledge, it goes in skill content. If it's a security constraint, it goes in YARA rules. If you can't name a specific domain from the table above, the concern may not be well-defined yet. + +2. **Does this change at a different rate than the code around it?** Knowledge that updates weekly shouldn't live in code that releases monthly. If the concern changes faster than its container, it needs a decoupled delivery mechanism (like context-mill skills). If it changes slower, it can live in code. + +3. **Can I express this as configuration rather than logic?** A typed interface with six fields is better than a function with six conditionals. The interface is documentation. The conditionals are implementation detail that future readers must reverse-engineer. + +4. **What happens if this concern is wrong?** If a wrong value causes a bad user experience, the boundary should provide fast feedback (compiler error, test failure). If a wrong value causes a security incident, the boundary should prevent execution (YARA, canUseTool). Match the severity of failure to the strength of the boundary. + +5. **Does this expand the required knowledge for a domain?** If adding this feature means working on frameworks now requires understanding the TUI, or working on security now requires understanding the runner, the boundary is in the wrong place. Refactor until the required knowledge for each domain stays within its column. ### Patterns for common extension types -**New agent tool (in-process MCP):** -Add it to `wizard-tools.ts` alongside `check_env_keys`, `set_env_values`, etc. -The tool runs locally — secret values never leave the machine. Register the -tool name in `WIZARD_TOOL_NAMES` so the SDK allowlist includes it. Follow the -existing tool pattern: zod schema, path-traversal protection, logging. - -**New security rule:** -Add a `YaraRule` object to the `RULES` array in `yara-scanner.ts`. Each rule -has a name, description, severity, category, `appliesTo` (which hook+tool -combinations), and `patterns` (compiled regex). One match per rule is -sufficient. The hooks in `yara-hooks.ts` don't need to change — they iterate -the rules array automatically. - -**New detection signal:** -If you need to detect something about the project beyond framework identity -(e.g., Stripe presence, LLM SDK usage), add it to `detection/features.ts`. -The `discoverFeatures()` function returns an array of `DiscoveredFeature` -enums. The intro screen and workflow can read discovered features from the -session. Detection functions are pure — no store mutations, no UI calls. - -**New middleware:** -Implement the `Middleware` interface from `middleware/types.ts`: a `name` and -optional lifecycle hooks (`onInit`, `onMessage`, `onPhaseTransition`, -`onFinalize`). Add it to the pipeline construction in `agent-runner.ts`. The -pipeline dispatches to middlewares in order. Each middleware publishes to a -shared store that downstream middleware can read. The pipeline itself doesn't -change shape. - -**New TUI primitive:** -Create the component in `src/ui/tui/primitives/`. Export from -`primitives/index.ts`. Primitives are pure rendering — they take props and -return Ink JSX. They don't import the store, don't call `getUI()`, don't -mutate state. Screens compose primitives; primitives don't know what screen -they're in. See the `ink-tui` skill for the full component catalog and layout -patterns. - -**New post-agent step:** -If you need to do something after the agent completes (upload env vars, install -MCP servers, etc.), use the `postRun` hook on `WorkflowRun`. The hook receives -the session and credentials. It runs after the agent succeeds but before the -outro. It doesn't modify the runner pipeline — it's a callback on the workflow -configuration. - -**New environment variable convention:** -Each framework's `FrameworkConfig.environment.getEnvVars()` returns the env var -names and values. The naming convention is framework-specific (e.g., -`NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN` for Next.js, `PUBLIC_POSTHOG_PROJECT_TOKEN` -for SvelteKit). If you need a new convention, add it to the framework config. -The runner doesn't know or care what the env vars are called. +**New agent tool (in-process MCP):** Add it to `wizard-tools.ts` alongside `check_env_keys`, `set_env_values`, etc. The tool runs locally — secret values never leave the machine. Register the tool name in `WIZARD_TOOL_NAMES` so the SDK allowlist includes it. Follow the existing tool pattern: zod schema, path-traversal protection, logging. + +**New security rule:** Add a `YaraRule` object to the `RULES` array in `yara-scanner.ts`. Each rule has a name, description, severity, category, `appliesTo` (which hook+tool combinations), and `patterns` (compiled regex). One match per rule is sufficient. The hooks in `yara-hooks.ts` don't need to change — they iterate the rules array automatically. + +**New detection signal:** If you need to detect something about the project beyond framework identity (e.g., Stripe presence, LLM SDK usage), add it to `detection/features.ts`. The `discoverFeatures()` function returns an array of `DiscoveredFeature` enums. The intro screen and workflow can read discovered features from the session. Detection functions are pure — no store mutations, no UI calls. + +**New middleware:** Implement the `Middleware` interface from `middleware/types.ts`: a `name` and optional lifecycle hooks (`onInit`, `onMessage`, `onPhaseTransition`, `onFinalize`). Add it to the pipeline construction in `agent-runner.ts`. The pipeline dispatches to middlewares in order. Each middleware publishes to a shared store that downstream middleware can read. The pipeline itself doesn't change shape. + +**New TUI primitive:** Create the component in `src/ui/tui/primitives/`. Export from `primitives/index.ts`. Primitives are pure rendering — they take props and return Ink JSX. They don't import the store, don't call `getUI()`, don't mutate state. Screens compose primitives; primitives don't know what screen they're in. See the `ink-tui` skill for the full component catalog and layout patterns. + +**New post-agent step:** If you need to do something after the agent completes (upload env vars, install MCP servers, etc.), use the `postRun` hook on `WorkflowRun`. The hook receives the session and credentials. It runs after the agent succeeds but before the outro. It doesn't modify the runner pipeline — it's a callback on the workflow configuration. + +**New environment variable convention:** Each framework's `FrameworkConfig.environment.getEnvVars()` returns the env var names and values. The naming convention is framework-specific (e.g., `NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN` for Next.js, `PUBLIC_POSTHOG_PROJECT_TOKEN` for SvelteKit). If you need a new convention, add it to the framework config. The runner doesn't know or care what the env vars are called. ## What to watch for -These are the early warning signs that a change is drifting from the -discipline: +These are the early warning signs that a change is drifting from the discipline: -- **The runner is getting longer.** If you're adding lines to - `agent-runner.ts` or `agent-interface.ts`, ask whether the concern belongs - in a `WorkflowRun` config, a middleware, a post-run hook, or a skill file. +- **The runner is getting longer.** If you're adding lines to `agent-runner.ts` or `agent-interface.ts`, ask whether the concern belongs in a `WorkflowRun` config, a middleware, a post-run hook, or a skill file. -- **A framework contributor needs to read the runner.** The `FrameworkConfig` - interface should be sufficient. If it's not, the interface is missing a - field — extend the interface, don't add special-case logic to the runner. +- **A framework contributor needs to read the runner.** The `FrameworkConfig` interface should be sufficient. If it's not, the interface is missing a field — extend the interface, don't add special-case logic to the runner. -- **The commandments are getting long.** Commandments are per-turn system - prompt — every token counts. If a rule needs explanation, move the - explanation to a skill reference file and leave only the invariant in the - commandment. +- **The commandments are getting long.** Commandments are per-turn system prompt — every token counts. If a rule needs explanation, move the explanation to a skill reference file and leave only the invariant in the commandment. -- **You're building recovery infrastructure.** If you're writing retry logic, - checkpoint saving, or self-heal code, ask whether a prevention layer - (YARA rule, canUseTool rule, skill content improvement) would eliminate the - failure mode instead of recovering from it. +- **You're building recovery infrastructure.** If you're writing retry logic, checkpoint saving, or self-heal code, ask whether a prevention layer (YARA rule, canUseTool rule, skill content improvement) would eliminate the failure mode instead of recovering from it. -- **You're adding imperative UI transitions.** If you're writing `goTo`, - `navigate`, or `if (screen === X) show Y`, the session state doesn't - accurately represent what the user should see. Fix the state model and let - the router derive the screen. +- **You're adding imperative UI transitions.** If you're writing `goTo`, `navigate`, or `if (screen === X) show Y`, the session state doesn't accurately represent what the user should see. Fix the state model and let the router derive the screen. -- **A change in one workflow breaks another.** Workflows should be - independent. If they share mutable state beyond the session, that state - needs an explicit boundary (a store setter, a workflow config field) instead - of implicit coupling through the runner. +- **A change in one workflow breaks another.** Workflows should be independent. If they share mutable state beyond the session, that state needs an explicit boundary (a store setter, a workflow config field) instead of implicit coupling through the runner. ## Compactness is an indicator, not a goal -The wizard is ~20K lines of source. This isn't minimalism for its own sake. -It's a side effect of routing each concern to the right configuration surface. +The wizard is ~20K lines of source. This isn't minimalism for its own sake. It's a side effect of routing each concern to the right configuration surface. -When concerns are correctly separated, each change is local. Defensive code is -unnecessary because boundaries prevent damage from propagating. The codebase -stays small because it doesn't need recovery infrastructure, special-case -branches, or coordination code between concerns that shouldn't know about each -other. +When concerns are correctly separated, each change is local. Defensive code is unnecessary because boundaries prevent damage from propagating. The codebase stays small because it doesn't need recovery infrastructure, special-case branches, or coordination code between concerns that shouldn't know about each other. -If the codebase is growing faster than the capability it delivers, the -boundaries are in the wrong place. Measure the ratio, not the absolute size. +If the codebase is growing faster than the capability it delivers, the boundaries are in the wrong place. Measure the ratio, not the absolute size. ## Reference files - [references/ARCHITECTURE.md](references/ARCHITECTURE.md) — Pipeline anatomy, data flow, security boundaries, screen resolution, MCP topology, middleware pipeline. **Read when you need to understand where a new concern hooks in.** - [references/ANTI-PATTERNS.md](references/ANTI-PATTERNS.md) — Concrete failure modes with alternatives: inlining product logic, bloating commandments, building recovery instead of prevention, imperative navigation, bundling mismatched-rate knowledge. **Read when evaluating whether a proposed change fits.** + diff --git a/.claude/skills/wizard-development/references/ANTI-PATTERNS.md b/.claude/skills/wizard-development/references/ANTI-PATTERNS.md index 6898a0e9..aafcf4cb 100644 --- a/.claude/skills/wizard-development/references/ANTI-PATTERNS.md +++ b/.claude/skills/wizard-development/references/ANTI-PATTERNS.md @@ -5,198 +5,77 @@ description: Concrete examples of changes that violate the design discipline, wh # Anti-patterns -Each section describes a pattern that looks reasonable in isolation but -degrades the system over time. The failure modes are drawn from real -production codebases — they're not hypothetical. +Each section describes a pattern that looks reasonable in isolation but degrades the system over time. The failure modes are drawn from real production codebases — they're not hypothetical. ## Inlining product logic into the runner -**The pattern:** A new feature needs something after the agent completes — -committing an event plan to an API, polling for data ingestion, creating a -dashboard server-side. The fastest path is adding the logic directly to the -runner function, after the agent run and before the outro. - -**What happens next:** The runner grows. It now knows about your product -concept (event plans, dashboards, data ingestion). Another feature does the -same thing. The runner now knows about two product concepts. A third. The -function passes 500 lines, then 1000. Each product concern interacts with -error handling — a network failure during dashboard creation needs different -recovery than a failure during event plan commit. The error handling branches -multiply. Token refresh logic appears (long runs outlive the OAuth expiry). -Soft-error vs. hard-error classification appears (did the agent do enough work -that we should continue despite the failure?). Each of these is reasonable in -isolation. Together they produce a function that no one can hold in their head. - -**The cost:** Every future change to any product concern risks breaking every -other product concern. A contributor who wants to modify the event plan step -must understand the dashboard step, the data ingestion step, the error -classification, and the token refresh timing. The required knowledge for a -local change has expanded to the full function. - -**What to do instead:** Use `WorkflowRun.postRun` for post-agent work. If the -post-agent step is complex enough to need its own error handling, extract it to -a module and call it from postRun. If multiple workflows need the same -post-agent step, make it a shared function that postRun calls — not shared code -in the runner. The runner stays generic; the product knowledge stays in -workflow configuration. +**The pattern:** A new feature needs something after the agent completes — committing an event plan to an API, polling for data ingestion, creating a dashboard server-side. The fastest path is adding the logic directly to the runner function, after the agent run and before the outro. + +**What happens next:** The runner grows. It now knows about your product concept (event plans, dashboards, data ingestion). Another feature does the same thing. The runner now knows about two product concepts. A third. The function passes 500 lines, then 1000. Each product concern interacts with error handling — a network failure during dashboard creation needs different recovery than a failure during event plan commit. The error handling branches multiply. Token refresh logic appears (long runs outlive the OAuth expiry). Soft-error vs. hard-error classification appears (did the agent do enough work that we should continue despite the failure?). Each of these is reasonable in isolation. Together they produce a function that no one can hold in their head. + +**The cost:** Every future change to any product concern risks breaking every other product concern. A contributor who wants to modify the event plan step must understand the dashboard step, the data ingestion step, the error classification, and the token refresh timing. The required knowledge for a local change has expanded to the full function. + +**What to do instead:** Use `WorkflowRun.postRun` for post-agent work. If the post-agent step is complex enough to need its own error handling, extract it to a module and call it from postRun. If multiple workflows need the same post-agent step, make it a shared function that postRun calls — not shared code in the runner. The runner stays generic; the product knowledge stays in workflow configuration. --- ## Growing commandments into tutorials -**The pattern:** The agent keeps making the same mistake — using the wrong env -var name, running a project-wide lint instead of scoping to edited files, -installing non-PostHog packages. The fix seems obvious: add more detail to the -commandments so the agent knows exactly what to do. - -**What happens next:** Each commandment grows from one sentence to a paragraph. -Paragraphs acquire bullet lists. Bullet lists acquire sub-bullets with -examples. The commandments file grows from 500 tokens to 5,000. But -commandments are appended to the system prompt on every turn — they're part of -the cached prefix, but they still consume context window. On a long run with -many tool calls, the bloated commandments push useful context (the user's -actual code, the skill content, the conversation history) out of the window. -The agent starts making mistakes because it can't see its own prior work, not -because it lacks instructions. - -**The cost:** Context window is finite. Every token of commandment is a token -not available for the agent's actual task. And long commandments create a -second problem: the agent satisfices across conflicting instructions rather -than following any single instruction precisely. A 20-bullet commandment list -gets partially followed; a 5-line commandment list gets followed completely. - -**What to do instead:** Keep commandments to one sentence per rule — the -invariant, not the explanation. Move the explanation, examples, edge cases, and -rationale to a skill reference file. The agent loads reference files on demand -(progressive disclosure), so the detailed guidance is available when needed -without consuming context window on every turn. The commandment points at the -reference: "API key conventions — see `wizard-prompt-supplement/references/api-keys-and-env.md`." +**The pattern:** The agent keeps making the same mistake — using the wrong env var name, running a project-wide lint instead of scoping to edited files, installing non-PostHog packages. The fix seems obvious: add more detail to the commandments so the agent knows exactly what to do. + +**What happens next:** Each commandment grows from one sentence to a paragraph. Paragraphs acquire bullet lists. Bullet lists acquire sub-bullets with examples. The commandments file grows from 500 tokens to 5,000. But commandments are appended to the system prompt on every turn — they're part of the cached prefix, but they still consume context window. On a long run with many tool calls, the bloated commandments push useful context (the user's actual code, the skill content, the conversation history) out of the window. The agent starts making mistakes because it can't see its own prior work, not because it lacks instructions. + +**The cost:** Context window is finite. Every token of commandment is a token not available for the agent's actual task. And long commandments create a second problem: the agent satisfices across conflicting instructions rather than following any single instruction precisely. A 20-bullet commandment list gets partially followed; a 5-line commandment list gets followed completely. + +**What to do instead:** Keep commandments to one sentence per rule — the invariant, not the explanation. Move the explanation, examples, edge cases, and rationale to a skill reference file. The agent loads reference files on demand (progressive disclosure), so the detailed guidance is available when needed without consuming context window on every turn. The commandment points at the reference: "API key conventions — see `wizard-prompt-supplement/references/api-keys-and-env.md`." --- ## Building recovery instead of prevention -**The pattern:** The agent occasionally runs a destructive command, writes a -hardcoded secret, or gets stuck in a retry loop. The response is to build -infrastructure that detects the problem and recovers: a circuit breaker that -kills the run after N consecutive denied commands, a checkpoint system that -saves state so the run can resume, a self-heal module that detects and patches -agent mistakes. - -**What happens next:** The recovery infrastructure works — sort of. The circuit -breaker fires after 47 denied commands instead of preventing the first one. The -checkpoint system adds 300 lines of state serialization. The self-heal module -adds 250 lines of heuristic pattern matching. Each module is tested -independently but their interactions are not — a checkpoint saved during a -self-heal cycle can restore to a state that triggers the circuit breaker. The -recovery code becomes its own source of bugs, and the bugs are harder to -diagnose because they involve interactions between three systems that were each -designed in isolation. - -**The cost:** Recovery is O(n) in the number of failure modes. Each new failure -mode needs new recovery code. Prevention is O(1) — a YARA rule, a canUseTool -entry, or a skill content improvement addresses the root cause once. The -codebase grows linearly with recovery and stays constant with prevention. - -**What to do instead:** For each failure mode, ask: can I prevent this at the -boundary? A command that should never run → add it to the canUseTool deny list. -A pattern that should never appear in written code → add a YARA rule. A mistake -the agent keeps making → improve the skill content or add a one-line -commandment. Reserve recovery infrastructure for truly unpredictable failures -(network outages, API rate limits) — not for agent behavior that can be shaped -by better boundaries. +**The pattern:** The agent occasionally runs a destructive command, writes a hardcoded secret, or gets stuck in a retry loop. The response is to build infrastructure that detects the problem and recovers: a circuit breaker that kills the run after N consecutive denied commands, a checkpoint system that saves state so the run can resume, a self-heal module that detects and patches agent mistakes. + +**What happens next:** The recovery infrastructure works — sort of. The circuit breaker fires after 47 denied commands instead of preventing the first one. The checkpoint system adds 300 lines of state serialization. The self-heal module adds 250 lines of heuristic pattern matching. Each module is tested independently but their interactions are not — a checkpoint saved during a self-heal cycle can restore to a state that triggers the circuit breaker. The recovery code becomes its own source of bugs, and the bugs are harder to diagnose because they involve interactions between three systems that were each designed in isolation. + +**The cost:** Recovery is O(n) in the number of failure modes. Each new failure mode needs new recovery code. Prevention is O(1) — a YARA rule, a canUseTool entry, or a skill content improvement addresses the root cause once. The codebase grows linearly with recovery and stays constant with prevention. + +**What to do instead:** For each failure mode, ask: can I prevent this at the boundary? A command that should never run → add it to the canUseTool deny list. A pattern that should never appear in written code → add a YARA rule. A mistake the agent keeps making → improve the skill content or add a one-line commandment. Reserve recovery infrastructure for truly unpredictable failures (network outages, API rate limits) — not for agent behavior that can be shaped by better boundaries. --- ## Imperative UI navigation -**The pattern:** A screen needs to appear conditionally — only when the agent -encounters a specific error, or only when a feature flag is enabled. The -fastest path is adding a navigation call: `if (condition) goToScreen('error')`. - -**What happens next:** The navigation call works for this case. Another -conditional screen needs the same treatment. A third. Now the flow has three -imperative jumps that interact with the router's predicate-based resolution. -The router says the active screen should be X (based on session state), but an -imperative jump sent the user to Y. The session state says the user is on X. -The screen shows Y. A store setter fires, the router re-resolves, and the user -is yanked from Y back to X mid-interaction. The fix is another imperative jump -to keep the user on Y. The fix for the fix is a flag that suppresses the -router. The router is now partially bypassed. - -**The cost:** The system has two navigation mechanisms that disagree. Every -future screen change must account for both the predicate-based flow and the -imperative jumps. Bugs in this regime are non-local — the cause is a state -mutation in the runner, the symptom is a screen flicker in the TUI, and the -"fix" that someone applies (another imperative jump) makes the problem worse. - -**What to do instead:** Add a field to `WizardSession` that represents the -condition. Add a `show` or `isComplete` predicate on the workflow step that -reads the field. The router derives the correct screen automatically. If the -condition is an overlay (error modal, conflict resolution), use -`store.pushOverlay()` — overlays stack above the flow and pop when dismissed, -without disrupting the flow cursor. +**The pattern:** A screen needs to appear conditionally — only when the agent encounters a specific error, or only when a feature flag is enabled. The fastest path is adding a navigation call: `if (condition) goToScreen('error')`. + +**What happens next:** The navigation call works for this case. Another conditional screen needs the same treatment. A third. Now the flow has three imperative jumps that interact with the router's predicate-based resolution. The router says the active screen should be X (based on session state), but an imperative jump sent the user to Y. The session state says the user is on X. The screen shows Y. A store setter fires, the router re-resolves, and the user is yanked from Y back to X mid-interaction. The fix is another imperative jump to keep the user on Y. The fix for the fix is a flag that suppresses the router. The router is now partially bypassed. + +**The cost:** The system has two navigation mechanisms that disagree. Every future screen change must account for both the predicate-based flow and the imperative jumps. Bugs in this regime are non-local — the cause is a state mutation in the runner, the symptom is a screen flicker in the TUI, and the "fix" that someone applies (another imperative jump) makes the problem worse. + +**What to do instead:** Add a field to `WizardSession` that represents the condition. Add a `show` or `isComplete` predicate on the workflow step that reads the field. The router derives the correct screen automatically. If the condition is an overlay (error modal, conflict resolution), use `store.pushOverlay()` — overlays stack above the flow and pop when dismissed, without disrupting the flow cursor. --- ## Bundling knowledge that changes at a different rate -**The pattern:** Integration knowledge (how to set up PostHog in a Next.js App -Router project) is useful and well-written. The simplest delivery mechanism is -to bundle it in the wizard's npm package — ship the skill files alongside the -code. - -**What happens next:** It works fine for a while. Then the docs team updates -the Next.js integration guide. The skill content is now out of date. But the -skill is bundled — updating it requires a wizard release. The wizard release -has its own PR cycle, CI, review, publish. The docs update ships in hours; the -wizard update ships in days. The gap widens as the product surface grows — -more frameworks, more features, more docs updates, each waiting for a wizard -release to propagate. - -**The cost:** Knowledge delivery is gated by code deployment. Teams that own -the knowledge (docs, frameworks, product) can't ship updates without -coordinating with the team that owns the deployment (wizard infrastructure). -This is a coordination cost that scales linearly with the number of knowledge -domains. - -**What to do instead:** Deliver knowledge through a decoupled channel. The -wizard fetches skill packages at runtime from a source that the knowledge -owners can update independently. Context-mill publishes skills as GitHub -release assets — the docs team can update a skill without touching the wizard -repo. The wizard's runtime fetch adds a network dependency, but the tradeoff -is correct: the network cost is paid once per run, the coordination cost of -bundling is paid on every update across every domain. - -**When bundling is fine:** If the knowledge rarely changes and the domain -count is small, bundling avoids the complexity of runtime fetch. The -decision depends on the rate of change, not on a universal preference. +**The pattern:** Integration knowledge (how to set up PostHog in a Next.js App Router project) is useful and well-written. The simplest delivery mechanism is to bundle it in the wizard's npm package — ship the skill files alongside the code. + +**What happens next:** It works fine for a while. Then the docs team updates the Next.js integration guide. The skill content is now out of date. But the skill is bundled — updating it requires a wizard release. The wizard release has its own PR cycle, CI, review, publish. The docs update ships in hours; the wizard update ships in days. The gap widens as the product surface grows — more frameworks, more features, more docs updates, each waiting for a wizard release to propagate. + +**The cost:** Knowledge delivery is gated by code deployment. Teams that own the knowledge (docs, frameworks, product) can't ship updates without coordinating with the team that owns the deployment (wizard infrastructure). This is a coordination cost that scales linearly with the number of knowledge domains. + +**What to do instead:** Deliver knowledge through a decoupled channel. The wizard fetches skill packages at runtime from a source that the knowledge owners can update independently. Context-mill publishes skills as GitHub release assets — the docs team can update a skill without touching the wizard repo. The wizard's runtime fetch adds a network dependency, but the tradeoff is correct: the network cost is paid once per run, the coordination cost of bundling is paid on every update across every domain. + +**When bundling is fine:** If the knowledge rarely changes and the domain count is small, bundling avoids the complexity of runtime fetch. The decision depends on the rate of change, not on a universal preference. --- ## Shared mutable state between workflows -**The pattern:** Two workflows need the same information — the detected -framework, the user's credentials, a feature discovery result. The fastest -path is having both workflows read and write the same session fields, relying -on execution order to ensure the first workflow populates what the second -needs. - -**What happens next:** The dependency is implicit. A refactor changes the -execution order. The second workflow reads a field the first workflow hasn't -populated yet. The field is null. The second workflow fails with a cryptic -error. Or worse: it succeeds with default values that produce a subtly wrong -integration. The bug is invisible in tests because the test setup populates -the field explicitly. - -**The cost:** Implicit ordering dependencies between workflows make the system -fragile to refactoring. The dependency graph exists in the developer's head, -not in the code. - -**What to do instead:** Use the `requires` field on `WorkflowConfig` to make -workflow dependencies explicit. For shared data, use `frameworkContext` with -documented keys — the `setFrameworkContext` setter ensures `emitChange()` fires -and downstream predicates re-evaluate. If two workflows need the same -detection result, factor the detection into a shared module that both -workflows' `onReady` hooks call, rather than relying on one workflow to -populate what another reads. +**The pattern:** Two workflows need the same information — the detected framework, the user's credentials, a feature discovery result. The fastest path is having both workflows read and write the same session fields, relying on execution order to ensure the first workflow populates what the second needs. + +**What happens next:** The dependency is implicit. A refactor changes the execution order. The second workflow reads a field the first workflow hasn't populated yet. The field is null. The second workflow fails with a cryptic error. Or worse: it succeeds with default values that produce a subtly wrong integration. The bug is invisible in tests because the test setup populates the field explicitly. + +**The cost:** Implicit ordering dependencies between workflows make the system fragile to refactoring. The dependency graph exists in the developer's head, not in the code. + +**What to do instead:** Use the `requires` field on `WorkflowConfig` to make workflow dependencies explicit. For shared data, use `frameworkContext` with documented keys — the `setFrameworkContext` setter ensures `emitChange()` fires and downstream predicates re-evaluate. If two workflows need the same detection result, factor the detection into a shared module that both workflows' `onReady` hooks call, rather than relying on one workflow to populate what another reads. + diff --git a/.claude/skills/wizard-development/references/ARCHITECTURE.md b/.claude/skills/wizard-development/references/ARCHITECTURE.md index 54751940..a96bb762 100644 --- a/.claude/skills/wizard-development/references/ARCHITECTURE.md +++ b/.claude/skills/wizard-development/references/ARCHITECTURE.md @@ -7,9 +7,7 @@ description: How data moves through the wizard — from CLI args to the screen t ## The runner pipeline -Every wizard run — framework integration, revenue analytics, audit, generic -skill — executes the same pipeline in `agent-runner.ts`. The pipeline is -fixed. What varies is the `WorkflowRun` configuration object. +Every wizard run — framework integration, revenue analytics, audit, generic skill — executes the same pipeline in `agent-runner.ts`. The pipeline is fixed. What varies is the `WorkflowRun` configuration object. ``` 1. Init logging + debug @@ -28,29 +26,17 @@ fixed. What varies is the `WorkflowRun` configuration object. ### Where configuration hooks fire -- **WorkflowRun.customPrompt** — called at step 7, receives `PromptContext` - (projectId, apiKey, host, skillPath). Returns additional prompt text. -- **WorkflowRun.postRun** — called at step 10, receives session + credentials. - Runs after the agent succeeds but before the outro. -- **WorkflowRun.buildOutroData** — called at step 11 if present. Otherwise - the runner builds default outro data from successMessage/reportFile/docsUrl. -- **WorkflowRun.abortCases** — matched against `[ABORT] ` signals - during step 8. First regex match renders a custom error outro. -- **WorkflowStep.onInit** — fires during store construction (step 0, before - session is assigned). Use for session-independent work only (e.g. health - check prefetch). -- **WorkflowStep.onReady** — fires after `tui.store.session = session` in - bin.ts. Awaited in sequence. Use for session-dependent pre-flow work (e.g. - framework detection, prerequisite scanning). -- **WorkflowStep.gate** — predicate checked on every `emitChange()`. bin.ts - parks on `await store.getGate(stepId)` until the predicate flips true. +- **WorkflowRun.customPrompt** — called at step 7, receives `PromptContext` (projectId, apiKey, host, skillPath). Returns additional prompt text. +- **WorkflowRun.postRun** — called at step 10, receives session + credentials. Runs after the agent succeeds but before the outro. +- **WorkflowRun.buildOutroData** — called at step 11 if present. Otherwise the runner builds default outro data from successMessage/reportFile/docsUrl. +- **WorkflowRun.abortCases** — matched against `[ABORT] ` signals during step 8. First regex match renders a custom error outro. +- **WorkflowStep.onInit** — fires during store construction (step 0, before session is assigned). Use for session-independent work only (e.g. health check prefetch). +- **WorkflowStep.onReady** — fires after `tui.store.session = session` in bin.ts. Awaited in sequence. Use for session-dependent pre-flow work (e.g. framework detection, prerequisite scanning). +- **WorkflowStep.gate** — predicate checked on every `emitChange()`. bin.ts parks on `await store.getGate(stepId)` until the predicate flips true. ### What the runner does NOT know -The runner doesn't know what framework is being integrated. It doesn't know -what skills exist. It doesn't know what env vars are called. It doesn't know -what the outro should say. All of that comes from `WorkflowRun` and -`FrameworkConfig` — configuration, not code. +The runner doesn't know what framework is being integrated. It doesn't know what skills exist. It doesn't know what env vars are called. It doesn't know what the outro should say. All of that comes from `WorkflowRun` and `FrameworkConfig` — configuration, not code. ## Session data flow @@ -72,14 +58,11 @@ postRun hooks → env var upload, etc. Outro → session.outroData drives the outro screen ``` -Session is populated in layers. Early layers provide defaults. Later layers -override. Business logic reads from the session — never calls a prompt. The -session never calls `getUI()`. +Session is populated in layers. Early layers provide defaults. Later layers override. Business logic reads from the session — never calls a prompt. The session never calls `getUI()`. ## Agent output flow -During the agent run (step 8), the SDK emits messages via an async generator. -`handleSDKMessage` in `agent-interface.ts` processes each message: +During the agent run (step 8), the SDK emits messages via an async generator. `handleSDKMessage` in `agent-interface.ts` processes each message: ``` SDK message (async generator) @@ -95,11 +78,7 @@ handleSDKMessage() └─ system message (init) → log tools/model/mcpServers ``` -Key: the agent doesn't know the TUI exists. It uses standard Claude Code -patterns (`[STATUS]` text markers, `TodoWrite` tool calls) and the harness -translates them into store state. Adding a new observation channel means -adding a new signal pattern to `handleSDKMessage` and a new store atom + -setter, not modifying the agent prompt. +Key: the agent doesn't know the TUI exists. It uses standard Claude Code patterns (`[STATUS]` text markers, `TodoWrite` tool calls) and the harness translates them into store state. Adding a new observation channel means adding a new signal pattern to `handleSDKMessage` and a new store atom + setter, not modifying the agent prompt. ## Security boundary flow @@ -119,11 +98,9 @@ PostToolUse YARA hooks [L2] → scan output, instruct revert or terminate Result returned to agent ``` -The sandbox (filesystem + network scoping) is configured once in the SDK -`query()` call and enforced by the SDK runtime — not by wizard code. +The sandbox (filesystem + network scoping) is configured once in the SDK `query()` call and enforced by the SDK runtime — not by wizard code. -Commandments (L0) are in the system prompt and operate at the model's -judgment layer — no code enforcement. They're the first line, not the last. +Commandments (L0) are in the system prompt and operate at the model's judgment layer — no code enforcement. They're the first line, not the last. ## Screen resolution flow @@ -147,51 +124,35 @@ router.resolve(session) fallback: last entry (outro) ``` -No imperative navigation anywhere. The router is a pure function of session -state + overlay stack. If you need to change which screen is active, change -the session state that the predicates read. +No imperative navigation anywhere. The router is a pure function of session state + overlay stack. If you need to change which screen is active, change the session state that the predicates read. ## The WizardUI abstraction -Business logic never imports the store directly. It calls `getUI()`, which -returns a `WizardUI` interface. Two implementations: +Business logic never imports the store directly. It calls `getUI()`, which returns a `WizardUI` interface. Two implementations: - **InkUI** — translates calls to store setters. Used in interactive TUI mode. - **LoggingUI** — translates calls to console output. Used in CI mode. -This boundary means the runner, the agent interface, and the OAuth flow don't -know whether they're driving a TUI or printing to a log. When adding a new -piece of state that the UI should reflect: +This boundary means the runner, the agent interface, and the OAuth flow don't know whether they're driving a TUI or printing to a log. When adding a new piece of state that the UI should reflect: 1. Add the field to `WizardSession` 2. Add a setter to `WizardStore` that calls `emitChange()` 3. Add the method to `WizardUI` interface -4. Implement in both `InkUI` (delegates to store setter) and `LoggingUI` - (prints or no-ops) +4. Implement in both `InkUI` (delegates to store setter) and `LoggingUI` (prints or no-ops) ## MCP server topology The agent has access to two MCP servers: -- **posthog-wizard** — remote, HTTP-based. The PostHog MCP server at - `mcp.posthog.com/mcp` (or `mcp-eu.posthog.com/mcp`). Provides query tools - for PostHog data, dashboard creation, etc. Authenticated via Bearer token. - Tool schemas are deferred (`ENABLE_TOOL_SEARCH: 'auto:0'`) to avoid - bloating the system prompt. +- **posthog-wizard** — remote, HTTP-based. The PostHog MCP server at `mcp.posthog.com/mcp` (or `mcp-eu.posthog.com/mcp`). Provides query tools for PostHog data, dashboard creation, etc. Authenticated via Bearer token. Tool schemas are deferred (`ENABLE_TOOL_SEARCH: 'auto:0'`) to avoid bloating the system prompt. -- **wizard-tools** — local, in-process. Created by `createWizardToolsServer()` - in `wizard-tools.ts`. Provides `check_env_keys`, `set_env_values`, - `detect_package_manager`, `load_skill_menu`, `install_skill`. Runs in the - wizard process — secret values never leave the machine. +- **wizard-tools** — local, in-process. Created by `createWizardToolsServer()` in `wizard-tools.ts`. Provides `check_env_keys`, `set_env_values`, `detect_package_manager`, `load_skill_menu`, `install_skill`. Runs in the wizard process — secret values never leave the machine. -Frameworks can add additional MCP servers via -`FrameworkConfig.metadata.additionalMcpServers` (e.g. SvelteKit adds the -official Svelte MCP at `https://mcp.svelte.dev/mcp`). +Frameworks can add additional MCP servers via `FrameworkConfig.metadata.additionalMcpServers` (e.g. SvelteKit adds the official Svelte MCP at `https://mcp.svelte.dev/mcp`). ## Middleware pipeline -The middleware system is opt-in (currently used for benchmarking). It -implements `{ onMessage, finalize }` — the same interface the runner expects: +The middleware system is opt-in (currently used for benchmarking). It implements `{ onMessage, finalize }` — the same interface the runner expects: ``` MiddlewarePipeline @@ -200,10 +161,7 @@ MiddlewarePipeline └─ middleware N: ... ``` -Each middleware has a `name` and optional lifecycle hooks. A shared store -(`MiddlewareContext.get` / `MiddlewareStore.set`) lets upstream middleware -publish data that downstream middleware reads. Phase detection is automatic -(from SDK message content) or explicit (`pipeline.startPhase()`). +Each middleware has a `name` and optional lifecycle hooks. A shared store (`MiddlewareContext.get` / `MiddlewareStore.set`) lets upstream middleware publish data that downstream middleware reads. Phase detection is automatic (from SDK message content) or explicit (`pipeline.startPhase()`). + +To add a middleware: implement the `Middleware` interface, add it to the pipeline construction in `agent-runner.ts`. The pipeline dispatches in order. -To add a middleware: implement the `Middleware` interface, add it to the -pipeline construction in `agent-runner.ts`. The pipeline dispatches in order. diff --git a/CLAUDE.md b/CLAUDE.md index c34cb026..29ac0f13 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -80,3 +80,4 @@ navigation (`goTo`, `navigate`, `push`) anywhere. - Feedback / issues: wizard@posthog.com or [GitHub Issues](https://github.com/posthog/wizard/issues). + From a182bc65c798dc560aae1725284008b77130ae28 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Wed, 6 May 2026 15:13:59 -0400 Subject: [PATCH 4/9] Refinements from PR, local testing --- .claude/skills/adding-skill-workflow/SKILL.md | 218 +++++++-------- .../references/WORKFLOW-GUIDE.md | 251 ------------------ .claude/skills/wizard-development/SKILL.md | 3 + .../references/MAINTAINING-SKILLS.md | 85 ++++++ 4 files changed, 191 insertions(+), 366 deletions(-) delete mode 100644 .claude/skills/adding-skill-workflow/references/WORKFLOW-GUIDE.md create mode 100644 .claude/skills/wizard-development/references/MAINTAINING-SKILLS.md diff --git a/.claude/skills/adding-skill-workflow/SKILL.md b/.claude/skills/adding-skill-workflow/SKILL.md index a4d01d96..19fc5b4b 100644 --- a/.claude/skills/adding-skill-workflow/SKILL.md +++ b/.claude/skills/adding-skill-workflow/SKILL.md @@ -1,172 +1,160 @@ --- name: adding-skill-workflow -description: Create a new skill-based workflow for the PostHog wizard. Use when adding a new workflow type (like revenue analytics, error tracking, feature flags) that installs a context-mill skill and runs an agent. Covers workflow steps, detection, flow registration, runner, custom screens, and CLI command. +description: Create a new skill-based workflow for the PostHog wizard. Use when adding a workflow type (like revenue analytics, audit, error tracking) that installs a context-mill skill and runs an agent against it. Covers the createSkillWorkflow factory for the common case, customization via WorkflowRun, and advanced patterns for custom screens or detection. compatibility: Designed for Claude Code working on the PostHog wizard codebase. metadata: author: posthog - version: "1.1" + version: "2.0" --- # Adding a Skill-Based Workflow -## Architecture Overview +A skill-based workflow installs a context-mill skill and runs the agent against it. Examples in the codebase: the `audit` workflow (clean factory call), the `revenue-analytics` workflow (factory + custom intro screen + detect step). -Skill-based workflows (like revenue analytics) follow a different path from framework integrations. Instead of the agent runner building a prompt from a `FrameworkConfig`, a skill-based workflow: +Before reading this, read `wizard-development/SKILL.md` for the architectural context — particularly principle 4 ("New capability is a new workflow, not a new branch"). -1. **Detects prerequisites** and downloads a skill from context-mill -2. **Runs the agent** against the installed skill using the generic `skill-runner.ts` -3. **Shows results** via data-driven outro (no hardcoded messages) +## Architecture -Key files: -- `src/lib/workflow-step.ts` — `WorkflowStep` interface with `gate`, `onInit`, `StoreInitContext` -- `src/lib/skill-runner.ts` — Generic runner: takes a skill path, builds bootstrap prompt, runs agent -- `src/lib/wizard-tools.ts` — `fetchSkillMenu()` and `downloadSkill()` for installing skills via code -- `src/utils/file-utils.ts` — Shared `IGNORED_DIRS` for project-tree scans -- `src/ui/tui/flows.ts` — `Flow` enum, `Screen` enum, `WORKFLOW_STEPS`, `FLOWS` maps -- `src/ui/tui/screen-registry.tsx` — Maps screen IDs to React components -- `src/ui/tui/store.ts` — Gate system derived from workflow step definitions +The wizard's runner pipeline is fixed. What varies between workflows is a `WorkflowRun` configuration object that controls the skill ID, prompt, success message, abort cases, and post-run hooks. A `WorkflowConfig` ties together: the CLI command, the step list, and the `WorkflowRun`. The workflow registry derives all downstream wiring — CLI subcommands, TUI flows, the router — from a single array. **Adding a workflow is configuration, not code.** -## How It Works +## The common case: `createSkillWorkflow` -### Gates and isComplete +For workflows that just install a skill and let the agent run it (most workflows), use the factory in `agent-skill/index.ts`: -- **`isComplete`** — exit condition for the screen. Router advances past the step when true. Defaults to `gate` if unset. -- **`gate`** — define this if your screen needs to await user interactions. bin.ts pauses on `await store.getGate(stepId)` until the predicate becomes true. +```ts +// src/lib/workflows/error-tracking/index.ts +import { createSkillWorkflow } from '../agent-skill/index.js'; -### Detect step pattern +export const errorTrackingConfig = createSkillWorkflow({ + skillId: 'error-tracking-setup', + command: 'errors', + flowKey: 'error-tracking', + description: 'Set up PostHog error tracking', + integrationLabel: 'error-tracking', + successMessage: 'Error tracking configured!', + reportFile: 'posthog-error-tracking-report.md', + docsUrl: 'https://posthog.com/docs/error-tracking', + spinnerMessage: 'Setting up error tracking...', + estimatedDurationMinutes: 5, + requires: ['posthog-integration'], // optional: prior workflows that must run first +}); +``` -Detection is split into two pieces: +Then register it in two places: -1. **A headless `detect` workflow step** with a gate predicate that resolves once `frameworkContext.skillPath` or `frameworkContext.detectError` is set. -2. **An exported `detect*Prerequisites()` async function** that bin.ts calls AFTER the session is assigned to the store. +1. `src/lib/workflows/workflow-registry.ts` — add to `WORKFLOW_REGISTRY` array +2. `src/ui/tui/flows.ts` — add a `Flow` enum entry whose value matches `flowKey` -**Why not `onInit`?** Because `onInit` fires during store construction (inside `_initFromWorkflow`), which runs BEFORE `tui.store.session = session` in bin.ts. Any `onInit` that reads `session.installDir` would get the default `process.cwd()`, not the app directory. `onInit` is fine for session-independent work like the integration flow's health check. +That's the entire workflow. **bin.ts, the store, the agent runner, and the router all derive their wiring from the registry automatically.** Don't add a yargs command. Don't add a runner function. Don't touch bin.ts. -```typescript -// In your workflow file -export async function detectYourPrerequisites( - session: WizardSession, - setFrameworkContext: (key: string, value: unknown) => void, -): Promise { - // Verify session.installDir, scan for required artifacts, fetch + download - // the skill. On failure: setFrameworkContext('detectError', '...'). - // On success: setFrameworkContext('skillPath', '.claude/skills/...'). - // Optionally store any data the intro screen should render. -} +The `audit` workflow (`src/lib/workflows/audit/`) is the cleanest example of this pattern. -export const YOUR_WORKFLOW: Workflow = [ - { - id: 'detect', - label: 'Detecting prerequisites', - gate: (s) => - s.frameworkContext.skillPath != null || - s.frameworkContext.detectError != null, - }, - // ... -]; -``` +## Customizing the agent run -### Error handling — never console.error from inside the TUI +`createSkillWorkflow` accepts these optional fields on `SkillWorkflowOptions`, all of which flow through to the `WorkflowRun`: -When the Ink TUI is rendering, calling `console.error` and `process.exit(1)` mangles the screen. Instead, your custom intro screen reads `frameworkContext.detectError` and renders an error view with an Exit option. bin.ts just awaits the intro gate — the screen handles both success and error states. +| Option | Purpose | +|---|---| +| `customPrompt` | Extra prompt instructions appended after the default project prompt | +| `buildOutroData` | Override the default outro. Receives session, credentials, cloud region. Returns `OutroData`. | +| `abortCases` | Array of `{ match: RegExp, message, body, docsUrl? }` that match `[ABORT] ` signals from the skill | +| `requires` | Other workflow `flowKey`s that must be satisfied first | -### StoreInitContext +For more complex post-agent work (env var upload, dashboard creation, anything that needs to run after the agent completes but before the outro), drop the factory and build the `WorkflowConfig` directly so you can set `WorkflowRun.postRun`. See `posthog-integration` for that pattern. -Available in `onInit` callbacks (use only for session-independent work): -- `ctx.session` — read current session state -- `ctx.setReadinessResult(result)` — store health check results -- `ctx.setFrameworkContext(key, value)` — store detection results -- `ctx.emitChange()` — trigger gate re-evaluation +## Dynamic run configuration -## Steps to Add a Workflow +If your workflow needs to inspect the session before building the run config (read framework context, seed state on disk, set per-session prompt fragments), pass an async function as the workflow's `run`: -### 1. Define workflow steps +```ts +const baseConfig = createSkillWorkflow({ /* ... */ }); -Create `src/lib/workflows/.ts` with a detect step + (optional) intro step + auth + run + outro. +const dynamicRun = async (session: WizardSession): Promise => { + // do per-session work here (e.g. seed a ledger, populate frameworkContext) + if (!baseConfig.run) throw new Error('missing run'); + return typeof baseConfig.run === 'function' + ? baseConfig.run(session) + : baseConfig.run; +}; -Export `detect*Prerequisites()` as a standalone async function — do NOT put detection in `onInit`. - -### 2. Register the flow +export const yourConfig: WorkflowConfig = { + ...baseConfig, + run: dynamicRun, +}; +``` -In `src/ui/tui/flows.ts`: -- Add to `Flow` enum -- Add to `WORKFLOW_STEPS` map -- Add to `FLOWS` record via `workflowToFlowEntries()` +The `audit` workflow uses this pattern to seed a checks ledger on disk before the agent run. -### 3. Create the runner +## Custom screens -The runner is trivial — it reads the skill path from session and delegates to `runSkillBootstrap()`: +Skill-based workflows default to the generic step list in `agent-skill/steps.ts` (intro → auth → run → outro → keep-skills). To use workflow-specific screens (a custom intro that displays detection results, a custom outro with workflow-specific bullets), override the relevant step's `screen` field: -```typescript -import { runSkillBootstrap } from './skill-runner'; +```ts +const SCREEN_BY_STEP: Record = { + intro: 'your-intro', + outro: 'your-outro', +}; -export async function runYourWizard(session: WizardSession): Promise { - const skillPath = session.frameworkContext.skillPath as string; +const yourSteps: Workflow = AGENT_SKILL_STEPS.map((step) => { + const override = SCREEN_BY_STEP[step.id]; + return override ? { ...step, screen: override } : step; +}); - await runSkillBootstrap(session, { - skillPath, - integrationLabel: 'your-workflow', - promptContext: 'Set up X for this project.', - successMessage: 'X configured!', - reportFile: 'posthog-x-report.md', - docsUrl: 'https://posthog.com/docs/x', - spinnerMessage: 'Setting up X...', - estimatedDurationMinutes: 5, - }); -} +export const yourConfig: WorkflowConfig = { + ...baseConfig, + steps: yourSteps, +}; ``` -Use the actual skill ID from context-mill's skill menu — don't guess. +Then: -### 4. (Optional) Custom intro screen +1. Add the screen IDs to the `Screen` enum in `flows.ts` +2. Create the React component(s) under `src/ui/tui/screens/` +3. Register them in `src/ui/tui/screen-registry.tsx` -If you want a workflow-specific welcome screen, create one. The screen should also handle the `detectError` state since that's where errors are rendered. +The screen reads from the store (via `useWizardStore`), renders error states from `frameworkContext.detectError` if present, and calls `store.completeSetup()` (or equivalent) when the user advances. The router resolves the active screen from session state — see `wizard-development/references/ARCHITECTURE.md` for the full screen resolution flow. **Never call `console.error` or imperatively navigate from inside the TUI.** -**a.** Add a screen ID to the `Screen` enum in `src/ui/tui/flows.ts`. +## Detection / prerequisite checking -**b.** Create `src/ui/tui/screens/YourIntroScreen.tsx`. Subscribe to the store, read `detectError` and detection results from `session.frameworkContext`, render either an error view (with Exit) or the welcome view (with Continue/Cancel). On confirm, call `store.completeSetup()`. +If your workflow needs to verify prerequisites before showing the intro screen (e.g. PostHog must already be installed, certain SDKs must be present), add a headless detect step at the top of the workflow with an `onReady` hook: -**c.** Register it in `src/ui/tui/screen-registry.tsx`. - -**d.** Add an intro step to your workflow (after `detect`, before `auth`): -```typescript +```ts { - id: 'intro', - label: 'Welcome', - screen: 'your-intro', - gate: (s) => s.setupConfirmed, - isComplete: (s) => s.setupConfirmed, + id: 'detect', + label: 'Detecting prerequisites', + // No screen — this step is headless + onReady: async (ctx) => { + // ctx.session.installDir is the user's project dir + // On success: ctx.setFrameworkContext('skillPath', '...') + // On failure: ctx.setFrameworkContext('detectError', { kind: '...', ... }) + }, }, ``` -In bin.ts, await the intro gate after detect. Don't pre-set `setupConfirmed = true` if you have a custom intro — the user confirms via the screen. +Use `onReady`, not `onInit` — `onInit` fires during store construction before `session` is assigned, so it can't read `installDir`. The custom intro screen reads `frameworkContext.detectError` and renders an error view (with an Exit option) when present, or the welcome view otherwise. -### 5. Add the CLI command +The `revenue-analytics` workflow is the canonical example of this pattern (detect step + custom intro + abort cases). -In `bin.ts`, add a yargs command. The pattern: -1. Start the TUI with your `Flow` -2. Build session, assign to store -3. Call `detect*Prerequisites()` explicitly -4. Await `getGate('detect')` -5. Await `getGate('intro')` — the screen handles both error and success states -6. Call your runner -7. Wait for `outroDismissed` via store subscribe, then `process.exit(0)` — without this, the process exits before the user can read the outro +## Verification -**Do not** `console.error` or `process.exit` for `detectError` from bin.ts — that mangles the Ink output. Let the intro screen render the error. +```bash +pnpm build +pnpm test +pnpm fix +``` -### 6. Verify +Then run end-to-end against a real test app: ```bash -pnpm build # Must compile -pnpm test # All tests pass +pnpm try --install-dir= ``` -Then run your command end-to-end against a real test app, including failure cases (missing prerequisites, bad directories) to confirm graceful handling. - -## Reference +Test failure cases too — missing prerequisites, bad install directories, network errors during skill download. The wizard should render structured error outros, not stack traces. -See `references/WORKFLOW-GUIDE.md` for the full step-by-step guide with complete code examples. +## Canonical examples in the codebase -## Canonical Example +- `src/lib/workflows/audit/` — clean `createSkillWorkflow` call with abort cases, custom screens, and a dynamic `run` function for per-session seeding +- `src/lib/workflows/revenue-analytics/` — factory + custom intro screen + detect step with prerequisite checking +- `src/lib/workflows/agent-skill/` — the factory itself (`createSkillWorkflow`) and the generic step list (`AGENT_SKILL_STEPS`) -`src/lib/workflows/revenue-analytics.ts` — read this for a full working implementation of every piece described above. +When in doubt, read the directory of the workflow that most resembles what you're building. diff --git a/.claude/skills/adding-skill-workflow/references/WORKFLOW-GUIDE.md b/.claude/skills/adding-skill-workflow/references/WORKFLOW-GUIDE.md deleted file mode 100644 index 9c733979..00000000 --- a/.claude/skills/adding-skill-workflow/references/WORKFLOW-GUIDE.md +++ /dev/null @@ -1,251 +0,0 @@ -# Adding a New Skill-Based Workflow - -How to add a new workflow to the wizard (like revenue analytics) that installs a skill from context-mill and lets the agent follow it. - -## Prerequisites - -- A skill published to context-mill with a `SKILL.md` and workflow files -- The skill registered in the skill menu under a category (e.g. `revenue-analytics-setup`) - -## Steps - -### 1. Define the workflow steps - -Create `wizard/src/lib/workflows/.ts`: - -```typescript -import type { Workflow } from '../workflow-step.js'; -import { RunPhase } from '../wizard-session.js'; - -export const YOUR_WORKFLOW: Workflow = [ - { - id: 'auth', - label: 'Authentication', - screen: 'auth', - isComplete: (s) => s.credentials !== null, - }, - { - id: 'run', - label: 'Your workflow label', - screen: 'run', - isComplete: (s) => - s.runPhase === RunPhase.Completed || s.runPhase === RunPhase.Error, - }, - { - id: 'outro', - label: 'Done', - screen: 'outro', - isComplete: (s) => s.outroDismissed, - }, -]; -``` - -If your workflow needs a health check or setup confirmation, add `gate` predicates and `onInit` callbacks to the relevant steps: - -**Example: Adding a setup confirmation gate** - -A gate blocks bin.ts from proceeding until its predicate returns true. Here, the `intro` step blocks until the user confirms setup: - -```typescript -{ - id: 'intro', - label: 'Welcome', - screen: 'intro', - gate: (s) => s.setupConfirmed, // bin.ts awaits store.getGate('intro') - isComplete: (s) => s.setupConfirmed, // router advances past this screen -}, -``` - -**Example: Adding a health check with async init** - -The `onInit` callback fires during store construction. Here it kicks off a health check while the user is still on the intro screen. The gate blocks until the result arrives: - -```typescript -import { - evaluateWizardReadiness, - WizardReadiness, -} from '../health-checks/readiness.js'; - -{ - id: 'health-check', - label: 'Health check', - screen: 'health-check', - gate: (s) => { - if (!s.readinessResult) return false; - if (s.readinessResult.decision === WizardReadiness.No) - return s.outageDismissed; // user must dismiss blocking outage - return true; - }, - isComplete: (s) => { - if (!s.readinessResult) return false; - if (s.readinessResult.decision === WizardReadiness.No) - return s.outageDismissed; - return true; - }, - onInit: (ctx) => { - evaluateWizardReadiness() - .then((readiness) => ctx.setReadinessResult(readiness)) - .catch(() => ctx.setReadinessResult({ - decision: WizardReadiness.Yes, - health: {} as never, - reasons: [], - })); - }, -}, -``` - -If your workflow doesn't have these steps, the gates simply don't exist and `store.getGate('...')` resolves immediately. - -### 2. Register the flow - -In `wizard/src/ui/tui/flows.ts`: - -1. Add to the `Flow` enum: -```typescript -export enum Flow { - Wizard = 'wizard', - Revenue = 'revenue', - YourFlow = 'your-flow', // add - // ... -} -``` - -2. Add to `WORKFLOW_STEPS`: -```typescript -export const WORKFLOW_STEPS: Partial> = { - [Flow.Wizard]: POSTHOG_INTEGRATION_WORKFLOW, - [Flow.Revenue]: REVENUE_ANALYTICS_WORKFLOW, - [Flow.YourFlow]: YOUR_WORKFLOW, // add -}; -``` - -3. Add to `FLOWS`: -```typescript -[Flow.YourFlow]: workflowToFlowEntries(YOUR_WORKFLOW) as FlowEntry[], -``` - -### 3. Create the runner - -The runner is split into two parts: - -1. **A `detect` workflow step** — checks prerequisites, selects the right skill, and downloads it. This is workflow-specific (framework detection for integration, payment provider for revenue, etc.) -2. **A thin runner function** that reads the skill path from the session and hands off to `runSkillBootstrap` - -The detect step lives in the workflow definition (step 1). Here's how revenue does it: - -```typescript -// In wizard/src/lib/workflows/revenue-analytics.ts - -const POSTHOG_SDKS = ['posthog-js', 'posthog-node', 'posthog-react-native', ...]; -const STRIPE_SDKS = ['stripe', '@stripe/stripe-js']; -const SKILL_ID = 'revenue-analytics-stripe'; - -{ - id: 'detect', - label: 'Detecting prerequisites', - // No screen — headless step. Runs via onInit, blocks via gate. - gate: (s) => - s.frameworkContext.skillPath != null || - s.frameworkContext.detectError != null, - onInit: (ctx) => { - // 1. Read package.json, check for PostHog + Stripe SDKs - // 2. If either missing → ctx.setFrameworkContext('detectError', '...') - // 3. If both found → fetch skill menu, download skill - // 4. On success → ctx.setFrameworkContext('skillPath', '.claude/skills/...') - // 5. On failure → ctx.setFrameworkContext('detectError', '...') - // - // The gate resolves once either skillPath or detectError is set. - // bin.ts awaits store.getGate('detect'), then checks for detectError. - }, -}, -``` - -In `bin.ts`, the revenue command awaits the detect gate and checks for errors: - -```typescript -await tui.store.getGate('detect'); - -const detectError = tui.store.session.frameworkContext.detectError as string | undefined; -if (detectError) { - console.error(detectError); - process.exit(1); - return; -} -``` - -The runner itself is trivial — it reads the skill path and delegates: - -```typescript -// wizard/src/lib/revenue-runner.ts -import { runSkillBootstrap } from './skill-runner'; -import type { WizardSession } from './wizard-session'; - -export async function runRevenueWizard(session: WizardSession): Promise { - // Skill was already downloaded by the detect workflow step - const skillPath = session.frameworkContext.skillPath as string; - - await runSkillBootstrap(session, { - skillPath, - integrationLabel: 'revenue-analytics-setup', - promptContext: 'Set up revenue analytics for this project.', - successMessage: 'Revenue analytics configured!', - reportFile: 'posthog-revenue-report.md', - docsUrl: 'https://posthog.com/docs/revenue-analytics', - spinnerMessage: 'Setting up revenue analytics...', - estimatedDurationMinutes: 5, - }); -} -``` - -This separation matters because different workflows need different detection: -- **Revenue**: checks for PostHog + Stripe, could later detect payment provider -- **Integration**: detects framework, version, project type before picking a skill -- **Future "error tracking"**: might detect error library (Sentry vs Bugsnag) first - -### 4. Add the CLI command - -In `wizard/bin.ts`, add a new command: - -```typescript -program - .command('your-command') - .description('Set up X') - .action(async (options) => { - // Prerequisite checks (e.g. verify required packages are installed) - // ... - - const { startTUI } = await import('./src/ui/tui/start-tui.js'); - const { buildSession } = await import('./src/lib/wizard-session.js'); - const { Flow } = await import('./src/ui/tui/router.js'); - - const tui = startTUI(WIZARD_VERSION, Flow.YourFlow); - - const session = buildSession({ - debug: options.debug, - localMcp: options.localMcp, - installDir, - ci: false, - }); - tui.store.session = session; - tui.store.session.setupConfirmed = true; // skip intro if no intro step - - await tui.store.getGate('health-check'); // resolves immediately if no health step - - const { runYourWizard } = await import('./src/lib/your-runner.js'); - await runYourWizard(tui.store.session); - }); -``` - -### 5. Verify - -1. `npm test` — all tests should pass (no store changes needed) -2. Run your command: `npx posthog-wizard your-command` -3. Verify the outro shows your success message and report file - -## Architecture Notes - -- **Workflow steps** (`workflow-step.ts`) are the single source of truth for flow structure, gates, and init work -- **The store** derives gate promises from step definitions — no per-flow hardcoding -- **The skill runner** (`skill-runner.ts`) handles the full lifecycle: skill install, agent init, prompt, error handling, outro -- **The outro screen** reads `outroData.message` and `outroData.reportFile` — no hardcoded strings -- Adding a new workflow requires **zero changes** to the store or outro screen diff --git a/.claude/skills/wizard-development/SKILL.md b/.claude/skills/wizard-development/SKILL.md index 02863eba..5125dfa6 100644 --- a/.claude/skills/wizard-development/SKILL.md +++ b/.claude/skills/wizard-development/SKILL.md @@ -135,6 +135,8 @@ When you're adding something that doesn't have a precedent in the codebase, ask **New post-agent step:** If you need to do something after the agent completes (upload env vars, install MCP servers, etc.), use the `postRun` hook on `WorkflowRun`. The hook receives the session and credentials. It runs after the agent succeeds but before the outro. It doesn't modify the runner pipeline — it's a callback on the workflow configuration. +**Custom workflow outro:** Set `buildOutroData` on the `WorkflowRun`. The function receives the session, credentials, and cloud region; returns an `OutroData` object that drives the outro screen. Use this for workflow-specific success messages, change lists, dashboard URLs, or seasonal copy. The runner uses sensible defaults from `successMessage`/`reportFile`/`docsUrl` when `buildOutroData` is omitted. + **New environment variable convention:** Each framework's `FrameworkConfig.environment.getEnvVars()` returns the env var names and values. The naming convention is framework-specific (e.g., `NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN` for Next.js, `PUBLIC_POSTHOG_PROJECT_TOKEN` for SvelteKit). If you need a new convention, add it to the framework config. The runner doesn't know or care what the env vars are called. ## What to watch for @@ -165,4 +167,5 @@ If the codebase is growing faster than the capability it delivers, the boundarie - [references/ARCHITECTURE.md](references/ARCHITECTURE.md) — Pipeline anatomy, data flow, security boundaries, screen resolution, MCP topology, middleware pipeline. **Read when you need to understand where a new concern hooks in.** - [references/ANTI-PATTERNS.md](references/ANTI-PATTERNS.md) — Concrete failure modes with alternatives: inlining product logic, bloating commandments, building recovery instead of prevention, imperative navigation, bundling mismatched-rate knowledge. **Read when evaluating whether a proposed change fits.** +- [references/MAINTAINING-SKILLS.md](references/MAINTAINING-SKILLS.md) — How to keep the skills under `.claude/skills/` accurate over time: drift sources, review checklist, review triggers, patterns that age well, the deletion question, versioning convention. **Read when making any major architecture change in this repo.** diff --git a/.claude/skills/wizard-development/references/MAINTAINING-SKILLS.md b/.claude/skills/wizard-development/references/MAINTAINING-SKILLS.md new file mode 100644 index 00000000..662b8b4d --- /dev/null +++ b/.claude/skills/wizard-development/references/MAINTAINING-SKILLS.md @@ -0,0 +1,85 @@ +--- +title: Maintaining the wizard's skills corpus +description: How to keep the skills under .claude/skills/ accurate and useful over time. Read when reviewing, updating, or auditing the skill set — or when feedback from a real task surfaces drift. +--- + +# Maintaining the wizard's skills corpus + +The wizard ships four skills under `.claude/skills/`: `wizard-development`, `adding-framework-support`, `adding-skill-workflow`, `ink-tui`. They guide agents and human contributors through the design discipline (the meta-skill) and three procedural domains. Like any documentation, they go stale silently. This reference codifies what we've learned about keeping them load-bearing. + +## Why skills go stale + +Skills go stale when the codebase changes faster than the skill's prose. The most common drift sources, in order of severity: + +1. **A new factory or abstraction lands.** When a clean factory call exists for the common case (like `createSkillWorkflow` for skill workflows, or `runAgentWizard` for framework integrations), the skill should lead with it. Skills written before the factory still teach the primitives — and following them produces more invasive changes than needed. This is the highest-impact drift: contributors do extra work the architecture doesn't ask for. + +2. **A refactor changes the wiring.** When wiring becomes derived (e.g. bin.ts deriving subcommands from `WORKFLOW_REGISTRY` instead of hand-wiring each one), skills written before the refactor still teach the manual path. Contributors edit files that no longer need editing. + +3. **Path references rot.** Specific file paths (`src/lib/workflows/revenue-analytics.ts`) become directories or move. Directory paths (`src/lib/workflows/revenue-analytics/`) survive better. Pinning to a specific file inside a directory ages worst — the file gets renamed during refactors and the skill's pointer rots first. + +4. **Code snippets duplicate canonical examples.** When the skill embeds code that mirrors a pattern in the codebase, the two copies drift apart. Six months later the canonical example has new fields the snippet lacks; the snippet has parameters the canonical example removed. + +5. **Prose contradicts a code comment.** The header comment in `workflow-registry.ts` says "flows.ts, store.ts, and bin.ts all derive their wiring from this array — no need to touch those files." If the skill still walks through editing those files, the skill is the wrong one. + +## What to check when reviewing a skill + +Run this checklist against the skill: + +- **Path validity.** Every file path in the skill — open it. If it's a directory now, fix the reference or drop the path entirely. If the file moved, find its replacement or remove the pointer. +- **Factory check.** Search the codebase for any function whose name suggests it's the entry point for what the skill teaches (e.g. `create*Workflow`, `add*Framework`, `register*`). If one exists and the skill doesn't mention it, the skill is leading with the wrong abstraction level. +- **Auto-derived wiring.** Search for `getSubcommandWorkflows`, `WORKFLOW_REGISTRY`, `FLOWS`, or similar registry-derived patterns. If the skill teaches manual edits to anything those derive, the skill is teaching dead work. +- **Code snippet drift.** Every code snippet — does an equivalent canonical example exist in the codebase? If yes, replace the snippet with a directory pointer. The codebase is the source of truth; skill snippets are a copy. +- **Code comment vs skill prose.** The most authoritative documentation is the comment at the top of the file the skill describes. If the skill and the comment disagree, the comment wins (it's adjacent to the code that enforces it). +- **API surface check.** For each interface or type the skill names — open the source file. Does the skill list every required field? Every optional field worth mentioning? Are any deprecated fields still in the skill? Are any new fields missing? +- **Test as truth.** Tests in `__tests__/` adjacent to the abstraction lock down its contract. If a skill claims behavior that no test enforces, that claim is more likely to drift. Prefer pointing at tests for invariants. + +## When to trigger a review + +- **While you're shipping a major architectural change.** This is the primary trigger. When you introduce a new factory, derive wiring that used to be manual, deprecate a pattern, rename a load-bearing file, or change the shape of a typed boundary, audit every skill that names that abstraction *as part of the same change*. Updating the skill alongside the refactor is cheap; updating it months later, after a contributor has already followed the stale guidance, is expensive. The skills are downstream of the architecture — keep them in sync at the source. +- **After a real task validation.** The most reliable retroactive staleness signal is feedback from someone (or some agent) who tried to use the skill to do real work. If they had to read source code to figure out something the skill should have told them, that's a gap. If they made edits the skill suggested but the system didn't need, that's drift. Use these reports to catch what slipped past the at-the-source audit. +- **When the skill's `version` is unchanged but the codebase has shipped multiple releases.** A skill at `version: "1.0"` six months and ten releases later is more likely to have drifted than not. Use git log on the source files the skill describes — if there's significant churn, audit. +- **Before a season of contribution.** If you're inviting external contributors (PostHog hackday, an open-source push), audit the skills first. The cost of a stale skill multiplies by the number of people who follow it. + +## Patterns that age well + +- **Pointing at directories, not files.** `src/lib/workflows/audit/` survives file renames within the directory. `src/lib/workflows/audit/index.ts` rots when the file gets restructured. +- **Pointing at canonical examples and letting the reader read.** "The audit workflow is the cleanest example of this pattern" survives refactors of the audit workflow as long as audit remains canonical. A 40-line code snippet inside the skill does not. +- **Stating invariants, not implementations.** "The runner is fixed; what varies is the WorkflowRun config" is an architectural claim that survives any refactor that doesn't violate it. A code snippet showing the runner's current shape rots on every refactor. +- **Listing what NOT to do.** Anti-patterns are sticky because they describe failure modes that recur. The contents of `ANTI-PATTERNS.md` stay valid even when the positive patterns evolve, because they describe what breaks if the discipline lapses. +- **Decision questions, not decision trees.** "Who changes this next?" is a question that produces fresh answers as the system evolves. A flowchart that bakes in current answers rots when the answers shift. + +## Patterns that age poorly + +- **Step-by-step procedures with file paths.** Every path is a hostage to refactoring. Step-by-step prose is a hostage to API changes. +- **Code snippets that duplicate canonical examples.** Two copies always drift. Pick one source of truth and point at it. +- **"You'll also need to edit X" instructions.** When wiring becomes derived, these instructions become work the contributor doesn't need to do — but the skill still tells them to do it. +- **Prose that summarizes what a code comment already says.** When the comment changes, the prose doesn't. Reference the comment instead of paraphrasing it. +- **Cross-skill duplication.** When two skills teach the same concept, they drift independently. One skill should own the concept; others should point at it. + +## The deletion question + +Sometimes the right move is to delete a stale reference file rather than update it. Consider deletion when: + +- The reference is more than 50% stale and the SKILL.md can absorb the still-valid parts. +- The reference duplicates information that's better located in the codebase (canonical examples, tests, type definitions). +- Updating would require maintaining a parallel copy of something that already exists in code. + +Deletion removes drift surface permanently. Update creates drift surface that needs maintenance forever. When in doubt, prefer deletion plus a directory pointer to update plus an obligation. + +## Versioning convention + +Skills carry a `version` field in their frontmatter. Bump it as follows: + +- **Patch** (1.1.0 → 1.1.1): typo fixes, link updates, minor clarifications. No semantic change. +- **Minor** (1.1 → 1.2): new sections, additional patterns, extension of existing guidance. The previous version is still substantially correct. +- **Major** (1.x → 2.0): substantial rewrite. Following the previous version would produce wrong work. The bump signals "everything you knew about this skill needs re-reading." + +When you bump a version, also bump the `version` in any reference file that shipped as part of the change. Version churn is itself a maintenance signal — a skill that's bumped majors three times in a year is probably teaching an abstraction that hasn't stabilized. + +## The maintainer's question + +When you finish updating a skill, ask: "If a contributor with no prior context follows this exactly, will they produce work the architecture currently asks for?" Not "will they produce something that works" — works is necessary but not sufficient. The skill should produce idiomatic output, not just functional output. If following the skill produces a bin.ts edit that the registry would have done automatically, the skill is asking for work that should be automatic. + +The wizard's compactness is a side effect of good design discipline. The skill's accuracy is a side effect of good maintenance discipline. Both decay without active care. + + From 7d5b31b5f1f7d54f0adb1abda3637aa1099315f9 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Wed, 6 May 2026 15:15:29 -0400 Subject: [PATCH 5/9] Playground! --- .claude/skills/ink-tui/SKILL.md | 2 +- .claude/skills/wizard-development/SKILL.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.claude/skills/ink-tui/SKILL.md b/.claude/skills/ink-tui/SKILL.md index 0a017f56..c917c3ad 100644 --- a/.claude/skills/ink-tui/SKILL.md +++ b/.claude/skills/ink-tui/SKILL.md @@ -78,7 +78,7 @@ Read each primitive's source file for its current props interface. Shared style constants (`Colors`, `Icons`, `HAlign`, `VAlign`) live in `src/ui/tui/styles.ts`. -**Playground**: Run `pnpm try --playground` to see all primitives in action. +**Playground**: Run `pnpm try --playground` to see all primitives in action. **When you add a new primitive, also add a demo for it under `src/ui/tui/playground/demos/` and register it in `PlaygroundApp.tsx`.** The playground is the single place a contributor (or future agent) can see every primitive rendered with realistic props — a primitive that's not in the playground is invisible to anyone who didn't write it, and the next person solving the same UI problem will build a parallel component instead of reusing yours. ### Enums everywhere diff --git a/.claude/skills/wizard-development/SKILL.md b/.claude/skills/wizard-development/SKILL.md index 5125dfa6..d05564be 100644 --- a/.claude/skills/wizard-development/SKILL.md +++ b/.claude/skills/wizard-development/SKILL.md @@ -131,7 +131,7 @@ When you're adding something that doesn't have a precedent in the codebase, ask **New middleware:** Implement the `Middleware` interface from `middleware/types.ts`: a `name` and optional lifecycle hooks (`onInit`, `onMessage`, `onPhaseTransition`, `onFinalize`). Add it to the pipeline construction in `agent-runner.ts`. The pipeline dispatches to middlewares in order. Each middleware publishes to a shared store that downstream middleware can read. The pipeline itself doesn't change shape. -**New TUI primitive:** Create the component in `src/ui/tui/primitives/`. Export from `primitives/index.ts`. Primitives are pure rendering — they take props and return Ink JSX. They don't import the store, don't call `getUI()`, don't mutate state. Screens compose primitives; primitives don't know what screen they're in. See the `ink-tui` skill for the full component catalog and layout patterns. +**New TUI primitive:** Create the component in `src/ui/tui/primitives/`. Export from `primitives/index.ts`. Primitives are pure rendering — they take props and return Ink JSX. They don't import the store, don't call `getUI()`, don't mutate state. Screens compose primitives; primitives don't know what screen they're in. **Add a demo under `src/ui/tui/playground/demos/` and register it in `PlaygroundApp.tsx`** — primitives that aren't in the playground are invisible to future contributors, who will build duplicates instead of reusing yours. See the `ink-tui` skill for the full component catalog and layout patterns. **New post-agent step:** If you need to do something after the agent completes (upload env vars, install MCP servers, etc.), use the `postRun` hook on `WorkflowRun`. The hook receives the session and credentials. It runs after the agent succeeds but before the outro. It doesn't modify the runner pipeline — it's a callback on the workflow configuration. From 840834b2faf1f900b487d1bd857867d86c147053 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Wed, 6 May 2026 15:15:48 -0400 Subject: [PATCH 6/9] Update .claude/skills/wizard-development/references/ARCHITECTURE.md Co-authored-by: Vincent (Wen Yu) Ge <29069505+gewenyu99@users.noreply.github.com> --- .claude/skills/wizard-development/references/ARCHITECTURE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/skills/wizard-development/references/ARCHITECTURE.md b/.claude/skills/wizard-development/references/ARCHITECTURE.md index a96bb762..cd4c9fd2 100644 --- a/.claude/skills/wizard-development/references/ARCHITECTURE.md +++ b/.claude/skills/wizard-development/references/ARCHITECTURE.md @@ -36,7 +36,7 @@ Every wizard run — framework integration, revenue analytics, audit, generic sk ### What the runner does NOT know -The runner doesn't know what framework is being integrated. It doesn't know what skills exist. It doesn't know what env vars are called. It doesn't know what the outro should say. All of that comes from `WorkflowRun` and `FrameworkConfig` — configuration, not code. +The `agent-runner.ts` doesn't know what framework is being integrated. It doesn't know what skills exist. It doesn't know what env vars are called. It doesn't know what the outro should say. All of that comes from `WorkflowRun` and `FrameworkConfig` — configuration, not code. ## Session data flow From f7857d68297d2b07c603dc8109edc511af38ce01 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Wed, 6 May 2026 15:17:11 -0400 Subject: [PATCH 7/9] Update ARCHITECTURE.md --- .claude/skills/wizard-development/references/ARCHITECTURE.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/.claude/skills/wizard-development/references/ARCHITECTURE.md b/.claude/skills/wizard-development/references/ARCHITECTURE.md index a96bb762..03e40ff4 100644 --- a/.claude/skills/wizard-development/references/ARCHITECTURE.md +++ b/.claude/skills/wizard-development/references/ARCHITECTURE.md @@ -144,7 +144,7 @@ This boundary means the runner, the agent interface, and the OAuth flow don't kn The agent has access to two MCP servers: -- **posthog-wizard** — remote, HTTP-based. The PostHog MCP server at `mcp.posthog.com/mcp` (or `mcp-eu.posthog.com/mcp`). Provides query tools for PostHog data, dashboard creation, etc. Authenticated via Bearer token. Tool schemas are deferred (`ENABLE_TOOL_SEARCH: 'auto:0'`) to avoid bloating the system prompt. +- **posthog-wizard** — remote, HTTP-based. The PostHog MCP server at `mcp.posthog.com/mcp` (or `mcp-eu.posthog.com/mcp`). Provides query tools for PostHog data, dashboard creation, etc. Authenticated via Bearer token. Tool schemas are deferred (`ENABLE_TOOL_SEARCH: 'auto:0'`) to avoid bloating the system prompt. It's almost never the right move to add tools here, unless a server-side component is the only path forward. - **wizard-tools** — local, in-process. Created by `createWizardToolsServer()` in `wizard-tools.ts`. Provides `check_env_keys`, `set_env_values`, `detect_package_manager`, `load_skill_menu`, `install_skill`. Runs in the wizard process — secret values never leave the machine. @@ -164,4 +164,3 @@ MiddlewarePipeline Each middleware has a `name` and optional lifecycle hooks. A shared store (`MiddlewareContext.get` / `MiddlewareStore.set`) lets upstream middleware publish data that downstream middleware reads. Phase detection is automatic (from SDK message content) or explicit (`pipeline.startPhase()`). To add a middleware: implement the `Middleware` interface, add it to the pipeline construction in `agent-runner.ts`. The pipeline dispatches in order. - From 483cd0c13542aea88ae44562426fea0aa6c3e8bc Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Wed, 6 May 2026 15:25:22 -0400 Subject: [PATCH 8/9] Update SKILL.md --- .claude/skills/wizard-development/SKILL.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/.claude/skills/wizard-development/SKILL.md b/.claude/skills/wizard-development/SKILL.md index d05564be..d5af4238 100644 --- a/.claude/skills/wizard-development/SKILL.md +++ b/.claude/skills/wizard-development/SKILL.md @@ -139,6 +139,25 @@ When you're adding something that doesn't have a precedent in the codebase, ask **New environment variable convention:** Each framework's `FrameworkConfig.environment.getEnvVars()` returns the env var names and values. The naming convention is framework-specific (e.g., `NEXT_PUBLIC_POSTHOG_PROJECT_TOKEN` for Next.js, `PUBLIC_POSTHOG_PROJECT_TOKEN` for SvelteKit). If you need a new convention, add it to the framework config. The runner doesn't know or care what the env vars are called. +## Tests + +Tests use vitest and live in `__tests__/` directories adjacent to the source they cover. The architecture is test-friendly because most hot zones are pure functions: router resolution, gate predicates, detection logic, prompt assembly, YARA rule matching, factory output. Test these directly — no mocking, no fixtures. + +What to test when extending the system: + +- **New factory or typed interface:** add contract tests in a sibling `__tests__/` that lock down shape and defaults. Future readers (and future skill writers) lean on these as the durable record of what the abstraction guarantees. +- **New gate or `isComplete` predicate:** test the predicate as a pure function with mutated `buildSession({})` results. Don't try to integration-test the router around it. +- **New YARA rule:** test a matching string and at least one near-miss to catch regex over-matching. +- **New detection function:** test against fixture inputs (parsed `package.json` shapes, file contents). Detection is pure — no install dir mocking needed if you call the parser directly. + +What NOT to test: + +- The agent run itself, the SDK call, real network calls. These are integration concerns; the unit boundary stops at "the configuration the runner consumes is correct." +- The TUI's full render output. Test screen resolution (router predicates) and store setters; trust Ink to render. +- Things the type system already guarantees. If the compiler catches it, a test for it is redundant. + +Verify your changes with `pnpm build && pnpm test && pnpm fix` before finishing. If you can't write a clean test for what you're adding, that's often a signal that the boundary is in the wrong place. + ## What to watch for These are the early warning signs that a change is drifting from the discipline: From 41a1dcd47e25be8b7fc9021714f0d3688f455bf5 Mon Sep 17 00:00:00 2001 From: Danilo Campos Date: Mon, 11 May 2026 18:24:50 -0400 Subject: [PATCH 9/9] Warlock effects --- .claude/skills/wizard-development/SKILL.md | 14 +++++++------- .../wizard-development/references/ANTI-PATTERNS.md | 4 ++-- .../wizard-development/references/ARCHITECTURE.md | 14 ++++++++------ CLAUDE.md | 3 ++- 4 files changed, 19 insertions(+), 16 deletions(-) diff --git a/.claude/skills/wizard-development/SKILL.md b/.claude/skills/wizard-development/SKILL.md index d5af4238..6ddea8f7 100644 --- a/.claude/skills/wizard-development/SKILL.md +++ b/.claude/skills/wizard-development/SKILL.md @@ -34,7 +34,7 @@ The knowledge — what PostHog needs from a Next.js project, how to detect Svelt |---|---|---| | Frameworks | `FrameworkConfig` (~70-120 lines) | Detection, env vars, prompts | | Docs | Skill markdown in context-mill | Workflow steps, example code | -| Security | YARA rules in `yara-scanner.ts` | Regex patterns, severity levels | +| Security | YARA-X rules in the [warlock](https://github.com/PostHog/warlock) sibling repo | Rule content (patterns, severity, category). Wizard wires the engine via hooks. | | Context | Step array in `workflows/` | Gates, screens, predicates | | UI | Screen component + primitives | Ink, store getters, layout | | Agent development | Runner, store, detection loop, tools | The machinery itself | @@ -71,7 +71,7 @@ Three prevention layers: - **L1 — canUseTool allowlist** (`agent-interface.ts`): Blocks dangerous bash commands before execution. The `wizard-tools` MCP fences `.env` files so the agent never sees secret values. -- **L2 — YARA scanner** (`yara-scanner.ts` + `yara-hooks.ts`): Regex rules running as pre/post tool-use hooks. Catches PII in capture calls, hardcoded keys, prompt injection, secret exfiltration, supply chain attacks. Critical violations terminate the session. **Fails closed** — scanner error means block, not pass. +- **L2 — warlock scanner** (rules live in [warlock](https://github.com/PostHog/warlock); wizard wires hooks in `yara-hooks.ts`): Real YARA-X rules running as pre/post tool-use hooks. Catches PII in capture calls, hardcoded keys, prompt injection, secret exfiltration, supply chain attacks, destructive operations. The scanner is engine-only — it returns matches with category/severity/action metadata; the wizard decides how to respond. Critical violations terminate the session. **Fails closed** — scanner error means block, not pass. (Note: the wizard still ships a legacy hand-rolled regex scanner at `src/lib/yara-scanner.ts` during the warlock migration. New rules go in warlock; the in-repo scanner is being retired.) **The test:** When the agent misbehaves, does the system prevent the damage or detect it afterward? Does the system fail closed on uncertainty? @@ -111,7 +111,7 @@ The five principles above cover the existing patterns. But the wizard will need When you're adding something that doesn't have a precedent in the codebase, ask these questions in order: -1. **Which domain does this belong to?** If it's framework-specific, it goes in `FrameworkConfig`. If it's integration knowledge, it goes in skill content. If it's a security constraint, it goes in YARA rules. If you can't name a specific domain from the table above, the concern may not be well-defined yet. +1. **Which domain does this belong to?** If it's framework-specific, it goes in `FrameworkConfig`. If it's integration knowledge, it goes in skill content. If it's a security constraint, it goes in a warlock rule (the sibling repo). If you can't name a specific domain from the table above, the concern may not be well-defined yet. 2. **Does this change at a different rate than the code around it?** Knowledge that updates weekly shouldn't live in code that releases monthly. If the concern changes faster than its container, it needs a decoupled delivery mechanism (like context-mill skills). If it changes slower, it can live in code. @@ -125,7 +125,7 @@ When you're adding something that doesn't have a precedent in the codebase, ask **New agent tool (in-process MCP):** Add it to `wizard-tools.ts` alongside `check_env_keys`, `set_env_values`, etc. The tool runs locally — secret values never leave the machine. Register the tool name in `WIZARD_TOOL_NAMES` so the SDK allowlist includes it. Follow the existing tool pattern: zod schema, path-traversal protection, logging. -**New security rule:** Add a `YaraRule` object to the `RULES` array in `yara-scanner.ts`. Each rule has a name, description, severity, category, `appliesTo` (which hook+tool combinations), and `patterns` (compiled regex). One match per rule is sufficient. The hooks in `yara-hooks.ts` don't need to change — they iterate the rules array automatically. +**New security rule:** Contribute the rule to [warlock](https://github.com/PostHog/warlock), not to the wizard. Warlock is the YARA-X engine that backs the wizard's security scanning; it's an append-only sibling repo with its own contribution process. Add a `.yar` file under `src/scanner/rules/` with a meta block (description, severity, category, scan_context, action) and a test under `src/scanner/__tests__/rules/`. The wizard's hooks in `yara-hooks.ts` automatically pick up new rules when it bumps its warlock dependency — wizard-side changes are unnecessary unless the response to a new category needs different handling. Filter consumed matches by `category` and `severity` (the append-only API contract), not by individual rule names. **New detection signal:** If you need to detect something about the project beyond framework identity (e.g., Stripe presence, LLM SDK usage), add it to `detection/features.ts`. The `discoverFeatures()` function returns an array of `DiscoveredFeature` enums. The intro screen and workflow can read discovered features from the session. Detection functions are pure — no store mutations, no UI calls. @@ -141,13 +141,13 @@ When you're adding something that doesn't have a precedent in the codebase, ask ## Tests -Tests use vitest and live in `__tests__/` directories adjacent to the source they cover. The architecture is test-friendly because most hot zones are pure functions: router resolution, gate predicates, detection logic, prompt assembly, YARA rule matching, factory output. Test these directly — no mocking, no fixtures. +Tests use vitest and live in `__tests__/` directories adjacent to the source they cover. The architecture is test-friendly because most hot zones are pure functions: router resolution, gate predicates, detection logic, prompt assembly, hook wiring, factory output. Test these directly — no mocking, no fixtures. (Security rule matching is tested in the warlock repo, not here.) What to test when extending the system: - **New factory or typed interface:** add contract tests in a sibling `__tests__/` that lock down shape and defaults. Future readers (and future skill writers) lean on these as the durable record of what the abstraction guarantees. - **New gate or `isComplete` predicate:** test the predicate as a pure function with mutated `buildSession({})` results. Don't try to integration-test the router around it. -- **New YARA rule:** test a matching string and at least one near-miss to catch regex over-matching. +- **New warlock rule:** tests live alongside the rule in the warlock repo (`src/scanner/__tests__/rules/`). Cover at least one matching string and one near-miss to catch over-matching. The wizard doesn't need tests for rules it merely consumes. - **New detection function:** test against fixture inputs (parsed `package.json` shapes, file contents). Detection is pure — no install dir mocking needed if you call the parser directly. What NOT to test: @@ -168,7 +168,7 @@ These are the early warning signs that a change is drifting from the discipline: - **The commandments are getting long.** Commandments are per-turn system prompt — every token counts. If a rule needs explanation, move the explanation to a skill reference file and leave only the invariant in the commandment. -- **You're building recovery infrastructure.** If you're writing retry logic, checkpoint saving, or self-heal code, ask whether a prevention layer (YARA rule, canUseTool rule, skill content improvement) would eliminate the failure mode instead of recovering from it. +- **You're building recovery infrastructure.** If you're writing retry logic, checkpoint saving, or self-heal code, ask whether a prevention layer (warlock rule, canUseTool rule, skill content improvement) would eliminate the failure mode instead of recovering from it. - **You're adding imperative UI transitions.** If you're writing `goTo`, `navigate`, or `if (screen === X) show Y`, the session state doesn't accurately represent what the user should see. Fix the state model and let the router derive the screen. diff --git a/.claude/skills/wizard-development/references/ANTI-PATTERNS.md b/.claude/skills/wizard-development/references/ANTI-PATTERNS.md index aafcf4cb..8069e9b3 100644 --- a/.claude/skills/wizard-development/references/ANTI-PATTERNS.md +++ b/.claude/skills/wizard-development/references/ANTI-PATTERNS.md @@ -37,9 +37,9 @@ Each section describes a pattern that looks reasonable in isolation but degrades **What happens next:** The recovery infrastructure works — sort of. The circuit breaker fires after 47 denied commands instead of preventing the first one. The checkpoint system adds 300 lines of state serialization. The self-heal module adds 250 lines of heuristic pattern matching. Each module is tested independently but their interactions are not — a checkpoint saved during a self-heal cycle can restore to a state that triggers the circuit breaker. The recovery code becomes its own source of bugs, and the bugs are harder to diagnose because they involve interactions between three systems that were each designed in isolation. -**The cost:** Recovery is O(n) in the number of failure modes. Each new failure mode needs new recovery code. Prevention is O(1) — a YARA rule, a canUseTool entry, or a skill content improvement addresses the root cause once. The codebase grows linearly with recovery and stays constant with prevention. +**The cost:** Recovery is O(n) in the number of failure modes. Each new failure mode needs new recovery code. Prevention is O(1) — a warlock rule, a canUseTool entry, or a skill content improvement addresses the root cause once. The codebase grows linearly with recovery and stays constant with prevention. -**What to do instead:** For each failure mode, ask: can I prevent this at the boundary? A command that should never run → add it to the canUseTool deny list. A pattern that should never appear in written code → add a YARA rule. A mistake the agent keeps making → improve the skill content or add a one-line commandment. Reserve recovery infrastructure for truly unpredictable failures (network outages, API rate limits) — not for agent behavior that can be shaped by better boundaries. +**What to do instead:** For each failure mode, ask: can I prevent this at the boundary? A command that should never run → add it to the canUseTool deny list. A pattern that should never appear in written code → add a rule to warlock (the sibling repo that ships the YARA-X scanner). A mistake the agent keeps making → improve the skill content or add a one-line commandment. Reserve recovery infrastructure for truly unpredictable failures (network outages, API rate limits) — not for agent behavior that can be shaped by better boundaries. --- diff --git a/.claude/skills/wizard-development/references/ARCHITECTURE.md b/.claude/skills/wizard-development/references/ARCHITECTURE.md index b4e2108d..cc219483 100644 --- a/.claude/skills/wizard-development/references/ARCHITECTURE.md +++ b/.claude/skills/wizard-development/references/ARCHITECTURE.md @@ -87,17 +87,19 @@ Three layers, each enforced at a different point in the tool-use lifecycle: ``` Agent wants to use a tool ↓ -canUseTool() [L1] → allow/deny before execution - ↓ (bash allowlist, .env file fencing) -PreToolUse YARA hooks [L2] → scan input, block if matched - ↓ (exfiltration, destructive ops, supply chain) +canUseTool() [L1] → allow/deny before execution + ↓ (bash allowlist, .env file fencing) +PreToolUse warlock hook [L2] → scan input, block if matched + ↓ (exfiltration, destructive ops, supply chain) Tool executes ↓ -PostToolUse YARA hooks [L2] → scan output, instruct revert or terminate - ↓ (PII in capture, hardcoded keys, prompt injection) +PostToolUse warlock hook [L2] → scan output, instruct revert or terminate + ↓ (PII in capture, hardcoded keys, prompt injection) Result returned to agent ``` +The L2 detection layer is the [warlock](https://github.com/PostHog/warlock) sibling repo — an engine-only YARA-X scanner that returns matches with `category`, `severity`, and `action` (recommendation: `block` / `revert` / `warn`). The wizard wires it into the SDK's PreToolUse/PostToolUse hooks (`src/lib/yara-hooks.ts`) and decides how to respond per match. Adding a new detection means contributing a rule to warlock, not editing wizard code. (A legacy in-repo regex scanner at `src/lib/yara-scanner.ts` is being retired as warlock takes over.) + The sandbox (filesystem + network scoping) is configured once in the SDK `query()` call and enforced by the SDK runtime — not by wizard code. Commandments (L0) are in the system prompt and operate at the model's judgment layer — no code enforcement. They're the first line, not the last. diff --git a/CLAUDE.md b/CLAUDE.md index 29ac0f13..5a75864d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -13,7 +13,7 @@ Each domain has a dedicated boundary: - **Frameworks** → `FrameworkConfig` in `src/frameworks//` - **Integration knowledge** → markdown skills in the [context-mill](https://github.com/PostHog/context-mill) repo -- **Security policy** → YARA rules in `src/lib/yara-scanner.ts` +- **Security policy** → YARA-X rules in the [warlock](https://github.com/PostHog/warlock) sibling repo. The wizard wires the scanner via PostToolUse/PreToolUse hooks (`src/lib/yara-hooks.ts`); the rule content itself lives in warlock. - **Workflows** → step arrays in `src/lib/workflows/` - **TUI** → screen components and primitives in `src/ui/tui/` @@ -64,6 +64,7 @@ pnpm build && pnpm test && pnpm fix publishes the markdown skills the wizard agent uses for framework-specific integration knowledge. Skills are decoupled from the wizard release cycle so docs and integration patterns can update independently. - **[wizard-workbench](https://github.com/PostHog/wizard-workbench)** — the development and testing environment. Houses framework test apps (Next.js, React Router, Django, Flask, Laravel, SvelteKit, Swift, TanStack, FastAPI) with no PostHog installed, plus an `mprocs`-driven local dev stack that runs context-mill + MCP + the wizard together with hot reload. Use this to develop and test wizard changes against real projects. +- **[warlock](https://github.com/PostHog/warlock)** — the security scanner engine for PostHog's agentic flows. Bundles YARA-X rules for prompt injection, exfiltration, destructive operations, supply chain attacks, hardcoded secrets, and PII. Engine-only: it returns matches with category/severity/action metadata; the wizard decides how to respond. New security rules belong in warlock, not in the wizard. ## Repository conventions