From a6a11042bafd71905245d94324c8572f80328a50 Mon Sep 17 00:00:00 2001 From: Edwin Lim Date: Sat, 9 May 2026 19:31:13 -0400 Subject: [PATCH 1/5] events audit init --- .../skills/events-audit/config.yaml | 14 + .../skills/events-audit/description.md | 83 ++++++ .../events-audit/references/1-detect.md | 42 +++ .../skills/events-audit/references/2-scan.md | 262 ++++++++++++++++ .../events-audit/references/3-extract.md | 54 ++++ .../events-audit/references/4-mcp-query.md | 109 +++++++ .../events-audit/references/5-report.md | 282 ++++++++++++++++++ 7 files changed, 846 insertions(+) create mode 100644 transformation-config/skills/events-audit/config.yaml create mode 100644 transformation-config/skills/events-audit/description.md create mode 100644 transformation-config/skills/events-audit/references/1-detect.md create mode 100644 transformation-config/skills/events-audit/references/2-scan.md create mode 100644 transformation-config/skills/events-audit/references/3-extract.md create mode 100644 transformation-config/skills/events-audit/references/4-mcp-query.md create mode 100644 transformation-config/skills/events-audit/references/5-report.md diff --git a/transformation-config/skills/events-audit/config.yaml b/transformation-config/skills/events-audit/config.yaml new file mode 100644 index 0000000..817a26f --- /dev/null +++ b/transformation-config/skills/events-audit/config.yaml @@ -0,0 +1,14 @@ +type: docs-only +template: description.md +description: Audit PostHog events in a codebase — produce an inventory of every captured event mapped to its file, area, and 30-day volume for the PM to query +tags: [analytics, audit, best-practices] +references: + preamble: "**Read ONLY this file.** Do not read any other reference file until this one tells you to." 
+shared_docs: + - https://posthog.com/docs/product-analytics/best-practices.md + - https://posthog.com/docs/getting-started/identify-users.md +variants: + - id: all + display_name: PostHog events audit + tags: [analytics, audit, best-practices] + docs_urls: [] diff --git a/transformation-config/skills/events-audit/description.md b/transformation-config/skills/events-audit/description.md new file mode 100644 index 0000000..e81dabc --- /dev/null +++ b/transformation-config/skills/events-audit/description.md @@ -0,0 +1,83 @@ +# PostHog events audit + +This skill produces a PM-browseable inventory of every PostHog event your code captures, mapped to the codebase by file path and enriched with 30-day volume from PostHog. The PM does the synthesis on demand by asking follow-up questions against the inventory — the skill itself doesn't cluster events into flows or write per-flow narratives. + +The checklist has three shared checks: `identity-segmentation`, `coverage-map`, `data-quality`. Finish each one. Don't invent new ids. + +## Workflow + +The audit runs as a 5-step chain: + +1. Detect +2. Scan +3. Extract +4. Enrich +5. Report + +Each step file points to the next. Run them in order. Don't explore the source tree on your own. + +The wizard seeds the checklist with the three shared checks before you start. + +Step 1 confirms the shape and reseeds if it's missing or out of date. As you finish each check, patch it with `mcp__wizard-tools__audit_resolve_checks`. + +**Start by reading `references/1-detect.md`** (relative to this skill's directory – typically `.claude/skills/events-audit/references/1-detect.md`). Don't read ahead. Don't re-read a step once you've passed it. Don't re-read SKILL.md. 
+ +Some tools are deferred by the SDK – load each once via `ToolSearch select:` before first use: `Read`, `Bash`, `Glob`, `Grep`, `Write`, `mcp__wizard-tools__audit_resolve_checks`, `mcp__wizard-tools__audit_seed_checks`, and the PostHog query tools `mcp__posthog-wizard__query-run` and `mcp__posthog-wizard__insights-list`. Use `ToolSearch` to load named tools only – don't browse. + +`Agent` is **not** in the default load list. Step 2 is the only place where fan-out is conditional; load `Agent` *inside* step 2, only after deciding to dispatch subagents. + +**Don't call `TodoWrite`.** Progress comes from the checklist and `[STATUS]` lines, not a todo list. + +If the wizard prompt names a framework (e.g. "Framework: Flask"), use it to narrow your scans – skip manifests and language patterns that don't apply. + +## When to trigger + +Trigger when the user asks for an event audit, event inventory, or events documentation; "what events does my code capture"; "find redundant or stale events"; or "which PM questions can my data answer." + +Don't trigger when the user wants to *add* instrumentation (defer to `instrument-product-analytics`) or debug a single missing event (defer to `diagnosing-missing-recordings`). + +## Live activity – `[STATUS]` + +The "Working on …" banner reads from `[STATUS]` lines you emit in plain text. Whenever you start a sub-step, write a line like: + +``` +[STATUS] Scanning capture sites +``` + +The wizard catches these and updates the spinner. Use them freely – they're cheap. Each step file lists the exact strings to emit. Don't invent your own. + +## The audit checklist + +The checklist lives at `.posthog-audit-checks.json` and shows live in the "Audit plan" tab. It's owned by MCP tools – **never `Write` it directly**: + +- `mcp__wizard-tools__audit_resolve_checks({ updates })` - patch one or more checks by `id`. Each `update` is `{ id, status, file?, details? }`. 
Emit one call per check as you finish its analysis – the "Audit plan" tab updates live, so streaming resolutions one-at-a-time gives the user visible progress instead of a single end-of-step flip. Only batch when you genuinely produce two updates in the same model turn (rare). +- `mcp__wizard-tools__audit_seed_checks({ checks })` - replaces the whole checklist atomically. Step 1's fallback uses this when the file is missing or out of date; otherwise don't call it. + +A second file, `.posthog-events-inventory.json`, holds the capture sites with derived `area`/`route`/`enclosing` fields, event names, properties, and per-event volume from PostHog. You write it directly in steps 2 through 4. It's **not** MCP-owned – no `audit_*` tool guards it. **The inventory is the audit's deliverable** — keep it on disk after the report is written so the PM can ask follow-ups against it. + +### Check entry shape + +- `id` - stable kebab-case slug. The three shared ids are `identity-segmentation`, `coverage-map`, `data-quality`. +- `area` - short group name. Shared entries use `Identity`, `Coverage`, `Data quality`. +- `label` - short human name. +- `status` - `pending` | `pass` | `error` | `warning` | `suggestion`. +- `file` - optional `path:line` for findings tied to a location. +- `details` - Markdown bulleted summary in plain language. Describe state and the PM questions blocked. Don't render `status` as a grade in the report; the enum is for filter logic only. + +## Key principles + +- **Show your evidence.** Cite `file:line` for every non-pass finding. +- **Frame findings as product questions.** Every finding describes *what product question or insight it blocks*, not what code rule it breaks. +- **Hand the PM the map. Don't tell the story for them.** The deliverable is an inventory plus three short qualitative checks plus a few suggested follow-ups. 
The PM clusters events into flows on demand by asking targeted follow-up questions against the inventory — the skill doesn't do that synthesis upfront. + +## Abort statuses + +Report aborts with `[ABORT]` prefixed messages. The wizard catches these and stops the run – don't halt yourself. + +- `[ABORT] No PostHog SDK found` +- `[ABORT] No capture call sites found in any detected SDK` +- `[ABORT] MCP project mismatch – enrichment unsafe` + +## Framework guidelines + +{commandments} diff --git a/transformation-config/skills/events-audit/references/1-detect.md b/transformation-config/skills/events-audit/references/1-detect.md new file mode 100644 index 0000000..c51fe5a --- /dev/null +++ b/transformation-config/skills/events-audit/references/1-detect.md @@ -0,0 +1,42 @@ +--- +next_step: 2-scan.md +--- + +# Step 1 – Detect SDKs + +Find every PostHog SDK in the project and remember which language(s) and framework(s) the rest of the audit will work on. **Read-only.** Don't scan code for capture sites – that's step 2. + +## Status + +Emit: + +``` +[STATUS] Detecting SDKs +``` + +## Action + +### a. Find PostHog SDKs + +`Glob` for the project's dependency manifests across every language PostHog ships an SDK for. The full list: + +- `package.json` - npm / pnpm / yarn (Node, web, React, Next.js, Nuxt, Vue, Svelte, Angular, React Native, Expo) +- `requirements.txt`, `pyproject.toml`, `Pipfile`, `setup.py` – Python (Django, Flask, FastAPI) +- `Gemfile` - Ruby / Rails +- `composer.json` - PHP / Laravel +- `go.mod` - Go +- `build.gradle`, `build.gradle.kts`, `pom.xml` – Java / Android +- `Podfile`, `Package.swift` – iOS / Swift +- `pubspec.yaml` - Flutter / Dart +- `*.csproj` - .NET +- `mix.exs` - Elixir + +Read enough of them to identify which PostHog SDK the project uses, what version, and what framework it sits on top of. + +If the project is a monorepo, you may find multiple PostHog SDKs. 
+ +If no PostHog SDK is anywhere in the project, emit `[ABORT] No PostHog SDK found` and stop. The wizard catches `[ABORT]` and terminates the run. + +For each dependency manifest, extract every dependency whose name starts with `posthog` (e.g. `posthog`, `posthog-node`, `posthog-js`, `posthog-python`, `posthog-ruby`). Hold `{ sdk, version, manifest, framework }` per SDK in memory. The next step uses this list. + +If no PostHog SDK is anywhere, emit `[ABORT] No PostHog SDK found`. diff --git a/transformation-config/skills/events-audit/references/2-scan.md b/transformation-config/skills/events-audit/references/2-scan.md new file mode 100644 index 0000000..8d9bde9 --- /dev/null +++ b/transformation-config/skills/events-audit/references/2-scan.md @@ -0,0 +1,262 @@ +--- +next_step: 3-extract.md +--- + +# Step 2 – Scan capture sites (three-phase) + +Find every PostHog capture/identify/group SDK call in the codebase, derive the codebase mapping (`area`, `route`, `enclosing`), and extract per-call fields. Write the inventory to disk **without ever materializing the full enriched JSON in a single model turn.** + +The previous architecture collapsed enrichment + merge into one orchestrator turn and crashed at `max_tokens` on a 51-file project. This step is split into three phases that respect that limit: + +1. **Phase 1 – orchestrator structural pass.** One Grep, write a small base inventory with `file` / `line` / `event_name_hint` per row. +2. **Phase 2 – subagent enrichment fan-out.** All subagents dispatched in **one assistant turn**. Each subagent enriches a slice of rows and writes a part-file. Subagents return a one-line confirmation, never the JSON. +3. **Phase 3 – orchestrator concat via `jq`.** A single Bash call merges part-files into the canonical inventory. Zero output tokens for the merge. + +Don't judge severity, don't infer flows, don't call MCP – those come later. 
+ +## Status + +Emit, in order: + +``` +[STATUS] Scanning capture sites +[STATUS] Writing base inventory +[STATUS] Enriching capture sites +[STATUS] Merging part-files +``` + +## Phase 1 — Orchestrator structural pass + +### a. Grep for direct SDK calls + +Run a single `Grep` for the standard PostHog call shapes. Narrow `--include` to the languages step 1 detected — don't scan `*.kt` if the project is Python. + +``` +Grep -rn -E 'posthog\??\.(capture|identify|alias|group|setPersonProperties|setPersonPropertiesForFlags|reset)|usePostHog\(\)\??\.(capture|identify)|client\??\.capture|PostHog\??\.(shared|capture)|Posthog\(\)\??\.capture' +``` + +The `\??\.` matches both `posthog.capture(...)` and `posthog?.capture(...)` (optional chaining). JS/TS codebases routinely guard SDK calls with `?.` when the SDK may be uninitialised — missing this pattern undercounts the inventory by half or more. + +Common include patterns: + +- Python: `--include='*.py'` +- JS/TS web: `--include='*.ts' --include='*.tsx' --include='*.js' --include='*.jsx' --include='*.vue' --include='*.svelte' --include='*.html'` +- Ruby: `--include='*.rb'` +- Go: `--include='*.go'` +- Java/Kotlin/Android: `--include='*.java' --include='*.kt'` +- iOS/Swift: `--include='*.swift'` +- Flutter: `--include='*.dart'` +- C#/.NET: `--include='*.cs'` +- Elixir: `--include='*.ex' --include='*.exs'` + +**Exclude test files.** Drop hits in paths matching `*.test.*`, `*.spec.*`, `__tests__/**`, `tests/**`, `spec/**`. They pollute the inventory. + +If the result is empty: +- And the project's manifest had a PostHog SDK in step 1 → the codebase likely wraps the SDK behind a custom helper. Write `{ "rows": [], "wrapper_undetected": true }` to `.posthog-events-inventory.json` and skip phases 2 and 3 (move on to step 3). The data-quality check in the report step will flag this. +- And no SDK was in the manifest either → emit `[ABORT] No capture call sites found in any detected SDK`. + +### b. 
Write the base inventory + +Build base rows directly from the grep result text. **Do not read any source files in phase 1.** Each row has only what's available from the grep line itself: + +```jsonc +{ + "id": "capture--", + "file": "src/checkout/Checkout.tsx", + "line": 88, + "raw_match": " posthog.capture(\"purchase_completed\", { revenue, currency });", + "event_name_hint": "purchase_completed" +} +``` + +`event_name_hint` is best-effort: extract the first quoted string from `raw_match` (single, double, or backtick-quoted). For multi-line capture calls (`posthog.capture(\n "...", ...)`) the hint will be `null` — phase 2 resolves the canonical name by reading the file. **Don't try to be clever with regex here.** If the first quoted string is on the same line as the `.capture(` token, take it; otherwise leave `null`. + +`Write` `.posthog-events-inventory.json` with the base rows. This file is small (~40 bytes per row × 100 rows ≈ 4KB) so the Write fits in one turn easily. + +```jsonc +{ + "rows": [ ], + "wrapper_undetected": false, + "_phase": "base" +} +``` + +The `_phase: "base"` marker tells you this file is not yet enriched. Phase 3 overwrites it. + +## Phase 2 — Subagent enrichment fan-out + +### c. Decide the partition + +Count distinct files in the base inventory. + +- **≤ 8 distinct files**: skip fan-out. The orchestrator handles enrichment inline (one subagent's worth of work; the merge is small). Skip phase 2's `Agent` dispatch and proceed straight to enrichment via direct `Read` + `Write` of the part-file convention. +- **> 8 distinct files**: fan out. `N = ceil(files / 10)`, capped at 8. Round-robin assign files alphabetically to N groups; each group's row-id list is what the subagent receives. Don't bother estimating file sizes — the orchestrator's job is dispatch, not load-balancing. + +### d. Spawn N sub-agents in parallel using the `Agent` tool + +Load `Agent` once: `ToolSearch select:Agent`. 
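The partition rule in (c) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the skill: the function name `partition_rows`, its keyword parameters, and the row ids used in the usage note are hypothetical, and an empty result is used here to mean "enrich inline, no fan-out".

```python
import math


def partition_rows(rows, inline_max_files=8, files_per_group=10, max_groups=8):
    """Return one list of row ids per subagent; [] means enrich inline."""
    # Distinct files, alphabetical, as the partition rule describes.
    files = sorted({row["file"] for row in rows})
    if len(files) <= inline_max_files:
        return []  # small project: the orchestrator enriches without fan-out

    # N = ceil(files / 10), capped at 8.
    n = min(math.ceil(len(files) / files_per_group), max_groups)

    # Round-robin files to groups so every row of a file lands in one group.
    group_of = {path: index % n for index, path in enumerate(files)}

    groups = [[] for _ in range(n)]
    for row in rows:
        groups[group_of[row["file"]]].append(row["id"])
    return groups
```

For example, 25 distinct files give `N = 3` groups holding 9, 8, and 8 files. Keeping all rows of a file in the same group is what lets each subagent read a file once.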
+ +**Spawn all N sub-agents in parallel using the `Agent` tool — one assistant turn, N tool_use blocks in the same message.** Sequential dispatch (one Agent per turn) loses ~30s of orchestration latency for no reason; the prior diagnostic confirmed this. Batch them. + +Each `Agent` invocation passes the subagent prompt template (below) plus that subagent's row-id list and the partition number N. Set `run_in_background: false` — you want their results before the merge. + +### e. Subagent prompt template + +Each subagent receives this prompt (substitute `{{N}}` and `{{ROW_IDS}}`): + +``` +You are an events-audit enrichment subagent. You will read source files and write enriched capture rows to a part-file. Do not return the rows in your final message — write to disk only. + +Inputs: +- Read .posthog-events-inventory.json once. The "rows" array contains base rows with id, file, line, raw_match, event_name_hint. +- Process only rows whose id is in this list: {{ROW_IDS}}. + +For each assigned row, read its file ONCE (cache by file path; multiple rows in the same file share one Read). For each row, produce an enriched row with these fields: + +- id, file, line — copy from the base row +- sdk — one of posthog-js, posthog-node, posthog-python, posthog-ruby, posthog-go, posthog-ios, posthog-android, posthog-react-native, posthog-flutter, posthog-php, posthog-dotnet, posthog-elixir +- call_kind — one of capture, identify, set, set_once, group, alias, reset +- event_name — the literal string in the event-name slot (resolve from the full call expression, not just the grep line). For dynamic names (variable, template literal, expression), set null and is_dynamic: true. +- is_dynamic — true if event_name couldn't be resolved to a literal +- properties — array of property keys from the properties argument (object literal / dict / hash). Empty array if the call passes a variable; empty array for non-capture call_kinds. 
+- conditional_fire — true if the call sits inside an if/ternary/guard that depends on something other than user identity +- distinct_id_kind — server-side SDKs only: "variable" | "literal" | "missing". null for client-side rows. +- area — codebase bucket from the file path (rules below) +- route — Next.js route if applicable, otherwise null +- enclosing — nearest enclosing function/component name from a backward scan +- status — "pending" +- volume_30d — null +- last_seen — null + +Skip $pageview and $pageleave from the SDK — they are SDK-internal except in rare manual setups. If a base row's raw_match shows $pageview/$pageleave, drop it (don't emit a row in your part-file). + +When you have all enriched rows, Write .posthog-events-inventory.part-{{N}}.json with a JSON array of the rows (no wrapper object, just [...]). Pretty-print with two-space indent. + +Final message: respond with exactly one line — "wrote part-{{N}} with M rows" — where M is the count. Do NOT include the rows in your message. Do NOT recap. Just the one line. + +Reference: per-SDK signatures, identification surfaces, area/route/enclosing rules are in the parent skill file at .claude/skills/events-audit/references/2-scan.md (sections "Reference: per-SDK signatures" through "Reference: enclosing"). Read that file once if you need them. +``` + +### f. Wait for all subagents to return + +Each subagent returns a single confirmation line. Verify each part-file exists before phase 3: + +``` +Bash: for n in 1 2 ... N; do test -f .posthog-events-inventory.part-$n.json || echo "MISSING: part-$n"; done +``` + +If any part-file is missing, the subagent failed. Re-dispatch only the failed subagent with the same row-id slice. Don't re-run successful subagents. + +## Phase 3 — Concat via jq + +### g. 
Merge part-files into the canonical inventory + +One `Bash` call: + +``` +jq -s '{rows: (add | sort_by(.file, .line)), wrapper_undetected: false}' .posthog-events-inventory.part-*.json > .posthog-events-inventory.json && rm .posthog-events-inventory.part-*.json +``` + +This: +- Slurps every part-file as an array of arrays +- `add` flattens to a single rows array +- `sort_by(.file, .line)` produces a stable, readable order +- Wraps in `{rows, wrapper_undetected}` +- Overwrites the base inventory with the enriched one +- Cleans up part-files + +The orchestrator never has to materialize the merged JSON in a model turn – `jq` does the merge in shell, costing zero output tokens. + +If `jq` isn't available on the user's system, fall back to a single `Bash` call that runs `python3 -c`: + +``` +python3 -c "import json,glob; rows=[] +[rows.extend(json.load(open(f))) for f in sorted(glob.glob('.posthog-events-inventory.part-*.json'))] +rows.sort(key=lambda r: (r['file'], r['line'])) +json.dump({'rows': rows, 'wrapper_undetected': False}, open('.posthog-events-inventory.json','w'), indent=2)" && rm .posthog-events-inventory.part-*.json +``` + +Don't try to merge in a model turn. Merging in a model turn is what crashed the previous run. 
+ +## Reference: per-SDK signatures + +| SDK | Capture pattern | Event-name position | Properties position | +|-----|-----------------|---------------------|---------------------| +| posthog-js | `posthog.capture("event", { props })` | positional 1 | positional 2 (object literal) | +| posthog-js (hook) | `usePostHog().capture("event", { props })` | positional 1 | positional 2 | +| posthog-node | `client.capture({ distinctId, event, properties })` | object key `event` | object key `properties` | +| posthog-python | `posthog.capture(distinct_id, "event", properties)` | positional 2 | positional 3 (dict) | +| posthog-ruby | `posthog.capture({ distinct_id:, event:, properties: })` | hash key `event` | hash key `properties` | +| posthog-go | `client.Enqueue(posthog.Capture{Event: "...", Properties: posthog.NewProperties()...})` | struct field `Event` | struct field `Properties` | +| posthog-ios | `PostHog.shared.capture("event", properties: ["k": "v"])` | positional 1 | named `properties` | +| posthog-android | `PostHog.capture("event", properties = mapOf("k" to "v"))` | positional 1 | named `properties` | +| posthog-react-native | Same shape as posthog-js | positional 1 | positional 2 | +| posthog-flutter | `Posthog().capture(eventName: "...", properties: { ... })` | named `eventName` | named `properties` | +| posthog-php | `PostHog::capture(['distinctId' => ..., 'event' => '...', 'properties' => [...]])` | array key `event` | array key `properties` | +| posthog-dotnet | `client.Capture(distinctId, "event", new() { ["k"] = "v" })` | positional 2 | positional 3 | +| posthog-elixir | `Posthog.capture("event", distinct_id, %{ k: v })` | positional 1 | positional 3 | + +## Reference: identification surfaces + +The scanner records (with `call_kind` set accordingly): + +- `posthog.identify(distinctId, $set, $set_once)` → `identify` +- `posthog.setPersonProperties({ ... 
})` → `set` +- `posthog.setPersonPropertiesForFlags` → `set_once` +- `posthog.group(type, key, properties)` → `group` +- `posthog.alias(alias, distinctId)` → `alias` +- `posthog.reset()` → `reset` (no event name; the identity check uses presence to score cross-device hygiene) + +## Reference: `area` rules + +Strip a single leading `src/`, `app/`, `pages/`, or `apps//` (monorepo). Then apply the first matching rule: + +| Path shape after stripping | `area` | +|---|---| +| `app//...` (Next.js app router) | `` | +| `pages//...` (Next.js pages router) | `` (use `api/` for `pages/api//...`) | +| `components//...` | `` | +| `features//...` | `` | +| `screens//...` | `` (mobile) | +| `routes//...`, `views//...`, `controllers//...` (backend) | `` | +| `hooks/...`, `lib/...`, `utils/...`, `analytics/...`, `services/...`, `helpers/...` | `shared` | +| `app/layout.tsx`, `app/template.tsx`, `_app.tsx`, `_document.tsx`, `app/error.tsx`, `app/not-found.tsx` | `global` | +| Anything else | first path segment after stripping, lowercased | + +Strip only the first matching prefix. + +## Reference: `route` rules (Next.js only) + +- `app/foo/page.tsx` → `/foo` +- `app/foo/bar/page.tsx` → `/foo/bar` +- `app/foo/[id]/page.tsx` → `/foo/[id]` +- `app/(group)/foo/page.tsx` → `/foo` (route groups in parens are ignored) +- `pages/foo.tsx` → `/foo` +- `pages/foo/[id].tsx` → `/foo/[id]` +- `pages/api/` → `/api/` (without the file extension) + +Set `route: null` for any path that isn't router-shaped. + +## Reference: `enclosing` rules + +Backward-scan from the capture line. 
Match these patterns (first match wins above the capture line): + +- `function (\w+)\(` (named function) +- `const (\w+) = \(?` / `const (\w+) = async` +- `export (?:default )?function (\w+)\(` +- `export const (\w+) = ` +- `class (\w+)` +- `def (\w+)\(` (Python) +- `func (\w+)\(` (Go / Swift) +- `fun (\w+)\(` (Kotlin) +- `def (\w+)` (Ruby) + +Take the closest match above the capture line at column 0 or one indent level deeper than the capture's expected wrapper. If nothing matches within ~80 lines above, set `enclosing: null`. Don't read more file context to chase it. + +For unnamed default exports (`export default function () { ... }`), use the file's basename without extension as the enclosing name (e.g. `CheckoutPage`). + +## Notes on wrapper resolution + +This step intentionally does **not** chase wrapper functions (`trackEvent`, `analytics.track`, etc.). Cross-file wrapper resolution doesn't fit cleanly in row-range subagent fan-out, and the reframing principle is "let the PM ask follow-ups." + +If `wrapper_undetected: true` (SDK in deps but no direct calls found), the report step's data-quality check surfaces it, and the suggested-follow-ups list points the PM at: *"find calls to `trackEvent`/`logEvent`/`analytics.track` and resolve their callers as additional capture sites."* diff --git a/transformation-config/skills/events-audit/references/3-extract.md b/transformation-config/skills/events-audit/references/3-extract.md new file mode 100644 index 0000000..d1f360c --- /dev/null +++ b/transformation-config/skills/events-audit/references/3-extract.md @@ -0,0 +1,54 @@ +--- +next_step: 4-mcp-query.md +--- + +# Step 3 – Resolve dynamic event names + +For inventory rows with `is_dynamic: true` or `event_name: null`, try to resolve the literal name by tracing the local code. Anything that doesn't resolve stays dynamic – the data-quality check in the report step treats unresolved dynamics as undercount risk. 
+ +## Status + +Emit: + +``` +[STATUS] Extracting event names +``` + +## Action + +`Read` `.posthog-events-inventory.json`. Filter to rows where `is_dynamic == true` or `event_name == null`. If empty, continue to step 4 immediately. + +For each ambiguous row, `Read` its file once and try the patterns below. + +### Pattern A – constant inlining + +```ts +const EVENT = "signup_completed"; +posthog.capture(EVENT, { method }); +``` + +If `EVENT` is a `const` / `final` / `let` / module-level variable in the same file, has a literal initializer, and is never reassigned, inline its value. If it's reassigned anywhere, leave the row dynamic. + +### Pattern B – enum / object dispatch + +```ts +const EVENTS = { + SIGNUP_COMPLETED: "signup_completed", + CHECKOUT_STARTED: "checkout_started", +} as const; + +posthog.capture(EVENTS.SIGNUP_COMPLETED, { ... }); +``` + +If the property access targets an object literal in the same module and every value is a literal, inline the resolved value. Don't resolve enums imported from other modules – leave dynamic. + +### What you don't resolve + +- Names built with template literals: `` `signup_${variant}` ``. Leave dynamic. The data-quality check flags these as undercount risk. +- Names imported from another module (other than the same-file enum pattern). Leave dynamic. +- Names from network responses or feature-flag values. Leave dynamic. +- **Wrapper / function-arg passthrough.** If the dynamic name is a function parameter (`posthog.capture(eventName, ...)` where `eventName` is the enclosing function's argument), leave dynamic — chasing callers across files is intentionally out of scope. The report step's suggested follow-ups list points the PM at this case so they can ask Claude to resolve specific wrappers on demand. + +When a row can't be resolved, leave it as `is_dynamic: true` with `event_name: null`. 
The data-quality check counts these as undercount risk; the report's by-event table omits them (they appear only in a "dynamic captures" footnote). + +`Write` the updated inventory back. This is the only step that edits the inventory by hand – keep the two-space indent. diff --git a/transformation-config/skills/events-audit/references/4-mcp-query.md b/transformation-config/skills/events-audit/references/4-mcp-query.md new file mode 100644 index 0000000..220bac2 --- /dev/null +++ b/transformation-config/skills/events-audit/references/4-mcp-query.md @@ -0,0 +1,109 @@ +--- +next_step: 5-report.md +--- + +# Step 4 – Query PostHog (MCP) for volume + +Pull 30-day volume and `last_seen` for every event the inventory references. The SQL filters to inventory event names – orphan detection is intentionally out of scope (PostHog projects often span multiple repos, so events without a code match are usually noise from another codebase). After merging, resort `rows[]` by `volume_30d` so the report's by-event table naturally surfaces highest-impact events first. + +## Output discipline + +This step is one MCP call, one in-place merge, one `Write`. Do not re-emit the entire inventory in assistant text before writing – prior runs spent ~150 seconds streaming the JSON into the conversation before invoking `Write`, which is pure output-token waste. The flow is: + +1. Read the inventory. +2. Build the IN-list and call `query-run`. +3. Merge volume/`last_seen` into rows in your working memory. +4. Sort and tag. +5. `Write` directly. No "here's the updated inventory:" preamble. No `details` recap. + +## Status + +Emit: + +``` +[STATUS] Querying PostHog for volume +``` + +## MCP tools + +| MCP tool | When | Use | +|----------|------|-----| +| `mcp__posthog-wizard__query-run` | (c) below | Execute HogQL/SQL. Filtered query returns volume + last_seen for inventory events. | +| `mcp__posthog-wizard__insights-list` | (g) below, optional | List actions for the report appendix. 
| +| `mcp__posthog-wizard__entity-search` | **Avoid.** | Requires project-key permissions; personal API keys get "permission denied". The SQL approach below works regardless. | + +The active project comes from the wizard session – don't pick or switch projects yourself. + +## Action + +### a. Confirm the project + +The active project is whatever the wizard's MCP session targets. If you can't confirm it, or the user said this codebase ships to a different project, emit `[ABORT] MCP project mismatch – enrichment unsafe`. + +### b. Build the event-name list + +`Read` `.posthog-events-inventory.json`. Collect every distinct `event_name` from `rows[]` where `call_kind == "capture"` and `is_dynamic == false` and `event_name != null`. Deduplicate. This is the IN-list for the SQL. + +If the list is empty (every capture row is dynamic), skip the SQL call and proceed to (d) – every row will keep the `volume_30d: null` and `last_seen: null` it was given in step 2. + +### c. Query volume for inventory events + +`mcp__posthog-wizard__query-run` with: + +```sql +SELECT event, + count() AS volume_30d, + max(timestamp) AS last_seen +FROM events +WHERE timestamp > now() - INTERVAL 30 DAY + AND event IN () +GROUP BY event +ORDER BY volume_30d DESC +``` + +The result covers only events the code already references – there is no `definitions[]` of the project's full event universe, by design. + +### d. Merge into the inventory + +The inventory now grows an optional `actions[]` field. Final shape: + +```jsonc +{ + "rows": [ ... ], // existing per-site rows, now with volume_30d + last_seen + "actions": [ ... ] // optional, from insights-list +} +``` + +For each `row` with `call_kind == "capture"` and a non-null `event_name`, copy `volume_30d` and `last_seen` from the SQL result keyed by `event`. Rows whose name isn't in the SQL result get `volume_30d: 0` and keep `last_seen: null` – this is the phantom signal the data-quality check uses. + +### e. 
Resort by volume + +Sort `rows[]` in place by `volume_30d` descending (rows with `null` or `0` volume sink to the bottom; tie-break by `file:line` so ordering is stable). Non-capture rows (`identify`, `set`, `group`, etc.) have no volume – sort them after capture rows but keep them in scan order amongst themselves. + +This is the only place the inventory is reordered. The report step reads in this order – the by-event table benefits from "highest-impact first" without any extra sorting. + +### f. Tag status from volume + +Walk `rows[]` once and set `status` on every `call_kind == "capture"` row: + +- `is_dynamic == true` → `status = "dynamic"`. +- `volume_30d > 0` → `status = "resolved"` (event fired in last 30 days). +- `volume_30d == 0` and `last_seen == null` → `status = "phantom"`, `details = "event referenced in code but not seen in PostHog in last 30 days"`. + +Phantom is the inverse of orphan: the code references an event that PostHog hasn't seen recently. Could be a typo, a code path that no longer fires, or instrumentation that hasn't shipped yet. The data-quality check uses this as undercount risk. + +If the SQL call in (c) was skipped or errored (every row has `volume_30d: null`), leave `status: "pending"` on every row – the report step will note "no MCP volume data available" and judge only on code presence. + +`Write` the inventory back. + +### g. Pull actions for the report appendix (optional) + +Call `mcp__posthog-wizard__insights-list` for actions if available. The audit doesn't analyze actions – they only show up in the report appendix. If the call fails or the API can't filter to actions, drop the appendix and note that in the report. + +### h. Failure handling + +Three failure modes, in order of severity: + +- **No MCP connection or no project id.** Emit `[ABORT] MCP project mismatch – enrichment unsafe`. The wizard halts the run. +- **`query-run` errors out** (misconfigured project, schema drift). 
Set `volume_30d = null` and `last_seen = null` on every row and continue. The report step's data-quality check will note "no MCP volume data available" and judge only on code presence. +- **Empty result** (zero events in the last 30 days for every inventory event). Treat as "no events in PostHog – likely the wrong project" and let the data-quality check flag it. diff --git a/transformation-config/skills/events-audit/references/5-report.md b/transformation-config/skills/events-audit/references/5-report.md new file mode 100644 index 0000000..d812f72 --- /dev/null +++ b/transformation-config/skills/events-audit/references/5-report.md @@ -0,0 +1,282 @@ +--- +next_step: null +--- + +# Step 5 – Render the report + +Produce the audit deliverable in a single pass: a by-event inventory table sorted by 30-day volume, a short by-area index that doubles as a lightweight flow map, three shared qualitative checks (identity & segmentation, coverage map, data quality), and a suggested follow-ups list the PM can paste back to ask Claude. + +The skill's job ends with a map and a few short observations. **Don't cluster events into flows. Don't write per-flow narratives. Don't synthesize a story.** The PM does that on demand. + +## Output discipline + +This is one report `Write`, not a write-then-read-then-rewrite cycle. Prior runs read their own freshly-written report 23 seconds after writing it and regenerated it — that wastes ~3 minutes of generation per cycle. Compose the entire Markdown in one model turn, then call `Write` once. If something is wrong with the result, fix it via `Edit` on the same file — don't `Write` it again. + +Also: don't recap the inventory contents in assistant text before writing. Stream straight from the inventory you already read into the report. 
+ +## Status + +Emit, in order: + +``` +[STATUS] Reading inventory +[STATUS] Computing area index +[STATUS] Analyzing identity & segmentation +[STATUS] Analyzing coverage map +[STATUS] Analyzing data quality +[STATUS] Writing report +``` + +## Action + +### a. Read the inventory + +`Read` `.posthog-events-inventory.json` once. From it you'll work with: + +- `rows[]` – capture rows (sorted by `volume_30d` desc by step 4) with `event_name`, `properties[]`, `area`, `route`, `enclosing`, `volume_30d`, `last_seen`, `status`, etc. +- `actions[]` – optional, for the appendix. +- `wrapper_undetected` – top-level boolean. + +If `rows[]` is empty, render a short report explaining the inventory is empty, resolve all three shared checks with `pending` details (no data to evaluate), and exit. + +### b. Aggregate by event name (the headline view) + +Group capture rows by `event_name` (skip rows where `is_dynamic == true` or `event_name == null`; those go to the dynamic-captures footnote). For each distinct event, compute: + +- `event` – the literal name. +- `volume_30d` – pulled from any one row (all rows for the same event share volume). +- `last_seen` – same. +- `status` – `resolved` | `phantom` | `pending` (one of the three step 4 set). +- `capture_sites[]` – list of `{ file, line, area, route, enclosing }` for every row sharing this event name. +- `properties_seen[]` – union of all `properties[]` across the rows, sorted alphabetically. + +Sort by `volume_30d` desc; phantoms sink to the bottom of the table; ties break by event name. + +### c. Compute the by-area index + +Tally distinct event names per `area`. Build `[{ area, event_count, total_volume_30d }]`, sorted by `total_volume_30d` desc. Use this as the report's "flow map" — a one-line summary at the top of §1 plus a short index. **Don't render per-area narratives.** The index points the PM at where to look; the by-event table is where they read. 
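A minimal sketch of the tally in (c), for illustration only. It assumes the row fields named in this step (`call_kind`, `event_name`, `is_dynamic`, `area`, `volume_30d`); the helper name `area_index` is hypothetical. It also assumes that an event's volume is counted once per area even when several sites in that area fire it, and that a `null` volume counts as `0`.

```python
from collections import defaultdict


def area_index(rows):
    """Build [{area, event_count, total_volume_30d}] from inventory rows."""
    # area -> {event_name: volume}. Keying by event name means several
    # capture sites for the same event in one area are counted once.
    per_area = defaultdict(dict)
    for row in rows:
        if row.get("call_kind") != "capture":
            continue  # identify / set / group rows carry no volume
        name = row.get("event_name")
        if row.get("is_dynamic") or name is None:
            continue  # dynamic names belong to the appendix, not the index
        # Assumption: a null volume (no MCP data) is treated as 0.
        per_area[row.get("area")][name] = row.get("volume_30d") or 0

    index = [
        {
            "area": area,
            "event_count": len(events),
            "total_volume_30d": sum(events.values()),
        }
        for area, events in per_area.items()
    ]
    # Highest volume first; area name as a stable tie-break (an assumption).
    index.sort(key=lambda entry: (-entry["total_volume_30d"], str(entry["area"])))
    return index
```

An event that fires from two areas shows up in both buckets under this sketch, so the per-area totals can add up to more than the project-wide volume.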
+ +If every row collapses to one or two `area` buckets (a flat repo without per-feature directories), say so in plain language ("Capture sites all live in a few shared modules — the area grouping is coarse here") and let the PM scan the by-event table directly. + +### d. Analyze identity & segmentation (shared check) + +Reframe identity rules as PM-facing capabilities. Identification works differently on the client and the server, so judge per SDK family detected in step 1. + +#### Capabilities + +1. **Cross-session tracking.** + - **Client-side family present** (posthog-js, react-native, ios, android, flutter): pass if a `call_kind == "identify"` row exists with a stable user id as first arg (`session.user.id`, `auth.uid()`, JWT `sub`, or similar named variable — not a session-only id, not anonymous), and that identify row precedes the first `capture` row in the same file or auth boundary. + - **Server-side family present** (posthog-node, python, ruby, go, php, dotnet, elixir): pass if **most** capture rows for that SDK have `distinct_id_kind == "variable"`. Server-side identification is per-call by design; an `identify()` row is **not** required. Fail if the dominant pattern is `"missing"` or `"literal"`. + - **Both families present:** both branches must pass independently. If client identifies but server fires personless captures (or vice versa), users will appear as two distinct profiles — call this out. + +2. **Plan-level breakdown.** Passes if any `set` / `set_once` row sets a `plan` (or `tier`, `subscription_tier`) person property; or `plan` appears in ≥1 capture row's `properties[]`. + +3. **Org / team / workspace breakdown.** Passes if any `group` row exists with type `organization`, `team`, `workspace`, or similar. + +4. **Cross-device tracking.** Passes if any `reset` row exists. Server-only projects skip this — the concept doesn't apply. 
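The server-side branch of capability 1 could be sketched as below. This is an illustration under stated assumptions, not the skill's implementation: "most" is read as "strictly more than half of that SDK's capture rows", and the function name `server_cross_session` is hypothetical. The SDK names come from the server-side family listed above.

```python
from collections import Counter, defaultdict

# Server-side SDK family, as listed in the capability description.
SERVER_SDKS = {
    "posthog-node", "posthog-python", "posthog-ruby", "posthog-go",
    "posthog-php", "posthog-dotnet", "posthog-elixir",
}


def server_cross_session(rows):
    """Return {sdk: passes} for each server-side SDK with capture rows."""
    per_sdk = defaultdict(Counter)
    for row in rows:
        if row.get("call_kind") == "capture" and row.get("sdk") in SERVER_SDKS:
            per_sdk[row["sdk"]][row.get("distinct_id_kind")] += 1

    # Assumption: "most" means strictly more than half of the rows
    # carry distinct_id_kind == "variable".
    return {
        sdk: kinds["variable"] * 2 > sum(kinds.values())
        for sdk, kinds in per_sdk.items()
    }
```

An empty result means no server-side captures were found, so only the client-side branch applies.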
+ +#### Rendering shape (§3 in the report) + +Render as **bold lead** + one bold-leading bullet per capability + sub-bullets for granular evidence. **No prose paragraphs.** Every capability gets its own bullet — consistent shape across audits is what makes the section scannable. + +```markdown +**** + +- **Cross-session (client)** — . . + - +- **Cross-session (server)** — . . +- **Plan / tier breakdown** — . . +- **Org / workspace breakdown** — . . +- **Cross-device hygiene** — . . +``` + +If a capability doesn't apply (e.g. server-only project for cross-device), still emit the bullet with `n/a — `. Don't omit it. + +#### Resolve the check + +Call `mcp__wizard-tools__audit_resolve_checks` for `identity-segmentation` with status `pass` if all applicable capabilities pass, `warning` if cross-session is partial or one segmentation breakdown is blocked, `error` if cross-session fails. `details` mirrors the rendering shape above. No grades. + +### e. Analyze coverage map (shared check) + +Walk the by-area index from (c). Coverage is qualitative — describe state, don't grade. + +#### Things to call out + +- **Distribution** — how many areas carry events; what kinds of activity they cover (engagement, conversion, content, server-side, etc.). One bullet, factual. +- **Dark surfaces** — areas where captures exist in code but have zero 30-day volume. Name the area and a representative file. Each dark surface gets its own bullet. +- **Reliance on `shared` / `global`** — if these areas carry a large share of captures, flag it: the coverage map can't tell you which user-visible surface fired the event without a follow-up. +- **Person properties without events** — `setPersonProperties` calls in areas that have no `capture` events. Person properties without events mean you can describe the user but can't count their actions. +- **Wrapper-undetected** — if `wrapper_undetected == true` from step 2: "An SDK is installed but no direct capture sites were found. 
There's likely a wrapper the scanner didn't follow." +- **Coarse grouping** — if only one or two `area` buckets exist: "The repo isn't organized by feature; the by-event table is the primary view." + +#### Rendering shape (§4 in the report) + +Render as **bold lead** + one bold-leading bullet per observation + sub-bullets for evidence. **No prose paragraphs.** Use the bullet labels above (`Distribution`, `Dark surface — `, `Reliance on shared/global`, etc.) so multiple audits stay comparable. + +```markdown +**** + +- **Distribution** — distinct areas carry events: . +- **Dark surface — ** — events implemented at `` and `` have zero 30-day volume. . +- **Reliance on `shared`** — all fire from ``. Without a `source` property, you can't tell which page surface triggered them. +- ** sets person properties but emits no events** — ``. You see who but not what they did. +``` + +Skip bullets that don't apply. Don't render an empty "Wrapper-undetected: n/a" bullet. + +#### Resolve the check + +Call `audit_resolve_checks` for `coverage-map` with status `pass` (broad coverage, multiple areas, no dark surfaces), `warning` (one or more dark surfaces, or heavy reliance on `shared`), or `suggestion` (wrapper-undetected or coarse grouping). `details` mirrors the rendering shape above. + +### f. Analyze data quality (shared check) + +Walk the inventory once. Only flag issues that bite a PM building dashboards. + +1. **Name drift** — same concept under two different keys. Heuristic: lowercase + strip underscores; if two keys collapse to the same string, that's drift. Examples: `user_id` vs `userId`, `signup_method` vs `method`. **Splits funnels.** +2. **Type drift on numeric properties** — for keys named `revenue`, `amount`, `price`, `count`, `duration_*`, `quantity`, scan call-site literals; mixing number and string is an error. **Silently zeros out aggregates.** +3. **Conditional-fire undercount** — count rows with `conditional_fire: true` and list affected events. 
**Funnel undercounts on certain code paths.** +4. **Duplicate-event overcount** — same event name on two SDK families. Skip when one is in test files or one explicitly threads `distinctId` from request context. +5. **Phantom events** — `status == "phantom"` rows. List the top offenders. **Either typo, dead code path, or instrumentation that hasn't shipped.** +6. **Unresolved dynamic names** — rows where step 3 left `is_dynamic: true`. Flag as undercount risk. + +#### Rendering shape (§5 in the report) + +Render as **bold lead** stating the worst issue as a PM cost + one bold-leading bullet per issue + sub-bullets for granular evidence (call sites, property unions, paired events). **No prose paragraphs.** + +```markdown +**** + +- **** — . + - + - +- **** — . +``` + +Sort issues by PM cost: type drift > name drift on flagship events > duplicate captures > conditional fires > phantom clusters > unresolved dynamics. The lead bold sentence names whichever issue tops that list. + +#### Resolve the check + +Call `audit_resolve_checks` for `data-quality` with status `pass` (no issues), `warning` (one or two issues), or `error` (type drift, name drift on flagship events, or many phantoms). `details` mirrors the rendering shape above. + +### g. Render the report + +`Write` `posthog-events-audit-report.md` at the project root. Single Markdown file, composed in one model turn. Strip the `BEGIN-REPORT` / `END-REPORT` markers when writing. + +```markdown + +# PostHog events audit – {{repo_name}} + +_Generated {{timestamp}}_ + +This audit lists every event your code captures, where it fires, and how often PostHog has seen it in the last 30 days. The deliverable is the inventory plus three short observations — use the suggested follow-ups at the end to ask Claude focused questions against the inventory. + +## 1. 
Events by volume + +{{one-line summary: " distinct events captured across areas; top areas: ."}} + +| Event | Volume (30d) | Sites | Areas | Properties | +|-------|--------------|-------|-------|------------| +{{event_rows}} + +Notes column conventions: +- `(phantom)` after the event name when `status == "phantom"`. +- `(conditional)` when any site has `conditional_fire == true`. +- Sites column: count of distinct `file:line` (e.g. `3 sites`). +- Areas column: comma-separated unique `area` values for this event (`checkout`, `Posts`). +- Properties column: comma-separated keys, truncated to ~5 with `… (+N more)` if longer. + +### Capture sites per event + +For each event in the table above, render a collapsible-style block: + +```markdown +
<details> +<summary>purchase_completed — 1,400 events / 3 sites</summary> + +- `src/checkout/Checkout.tsx:88` — area `checkout`, route `/checkout`, enclosing `handleSubmit` +- `mobile/Checkout.tsx:44` — area `checkout`, enclosing `onPaymentSuccess` +- `api/webhooks/stripe.py:120` — area `api/webhooks`, enclosing `handle_payment_intent` + +Properties seen: `revenue`, `currency`, `plan` +</details>
 +``` + +Use HTML `<details>
` so the report stays scannable but every site is one click away. + +## 2. By area + +A coarse map of where instrumentation lives. + +| Area | Events | 30d volume | +|------|--------|------------| +{{area_index_rows}} + +{{one-line note from step (c): coarse-grouping or normal}} + +## 3. Identity & segmentation + +{{identity_segmentation_details}} + +## 4. Coverage map + +{{coverage_map_details}} + +## 5. Data quality + +{{data_quality_details}} + +## Suggested follow-ups + +You can ask Claude any of these against the inventory at `.posthog-events-inventory.json`: + +- Which of these events fire on ``? (e.g. signup, checkout, onboarding) +- Which events have inconsistent property naming or types? +- Build a funnel from `` to `` and tell me the drop-off. +- Which areas have the highest event volume but the thinnest property coverage? +- Which phantom events look like dead instrumentation we can remove? + +## Appendix – dynamic event names + +Events whose name couldn't be resolved at scan time (template literal, network value, or imported enum). Listed for completeness; not in §1's table. + +{{dynamic_appendix}} + +## Appendix – person properties (`identify` / `set` / `set_once`) + +{{person_properties_appendix}} + +## Appendix – groups (`group`) + +{{groups_appendix}} + +## Appendix – actions + +{{actions_appendix}} + +## About this audit + +Generated by the PostHog events-audit skill. The full inventory is at `.posthog-events-inventory.json` (kept after the run for follow-up questions). Re-run `posthog-wizard events-audit` to refresh. + +``` + +### Rendering rules + +- **One `Write` call.** Compose the full Markdown in your turn before invoking `Write`. Don't pre-stream the content into assistant text. +- **Plain language, no grades.** Don't render the check `status` enum (`pass`/`warning`/`error`) as a badge or label in the report. Use prominence and word choice — a missing flagship capability leads its section; a nice-to-have is a footnote bullet. 
+- **`file:line` citations** on every non-pass observation. +- **Fan-out is not used in this step.** The data fits in one turn. + +### h. Surface the deliverables + +The inventory is the deliverable. **Do not delete `.posthog-events-inventory.json`.** + +Emit two trailing lines so the wizard can surface both files to the user: + +``` +Created events audit report: +Kept events inventory: +``` + +## Resolve + +`next_step: null` – the chain ends here. By the end of this step, all three shared checks (`identity-segmentation`, `coverage-map`, `data-quality`) must be resolved via `audit_resolve_checks`. There are no per-flow checks to resolve. From 13e505db04d633728c01f65202d468374a403e84 Mon Sep 17 00:00:00 2001 From: Edwin Lim Date: Sat, 9 May 2026 23:46:18 -0400 Subject: [PATCH 2/5] workflow --- .../skills/events-audit/config.yaml | 2 +- .../skills/events-audit/description.md | 19 +- .../references/2-scan-enrichment.md | 101 +++++ .../references/2-scan-subagent-prompt.md | 43 +++ .../skills/events-audit/references/2-scan.md | 138 +------ .../events-audit/references/3-extract.md | 2 +- .../events-audit/references/4-mcp-query.md | 16 +- .../references/5-report-template.md | 112 ++++++ .../events-audit/references/5-report.md | 344 ++++++++---------- .../events-audit/references/6-dashboard.md | 158 ++++++++ 10 files changed, 594 insertions(+), 341 deletions(-) create mode 100644 transformation-config/skills/events-audit/references/2-scan-enrichment.md create mode 100644 transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md create mode 100644 transformation-config/skills/events-audit/references/5-report-template.md create mode 100644 transformation-config/skills/events-audit/references/6-dashboard.md diff --git a/transformation-config/skills/events-audit/config.yaml b/transformation-config/skills/events-audit/config.yaml index 817a26f..12e462f 100644 --- a/transformation-config/skills/events-audit/config.yaml +++ 
b/transformation-config/skills/events-audit/config.yaml @@ -1,6 +1,6 @@ type: docs-only template: description.md -description: Audit PostHog events in a codebase — produce an inventory of every captured event mapped to its file, area, and 30-day volume for the PM to query +description: Audit PostHog events in a codebase — produce an inventory of every captured event mapped to its file, area, and 30-day volume for the product team to query tags: [analytics, audit, best-practices] references: preamble: "**Read ONLY this file.** Do not read any other reference file until this one tells you to." diff --git a/transformation-config/skills/events-audit/description.md b/transformation-config/skills/events-audit/description.md index e81dabc..4b54d52 100644 --- a/transformation-config/skills/events-audit/description.md +++ b/transformation-config/skills/events-audit/description.md @@ -1,18 +1,19 @@ # PostHog events audit -This skill produces a PM-browseable inventory of every PostHog event your code captures, mapped to the codebase by file path and enriched with 30-day volume from PostHog. The PM does the synthesis on demand by asking follow-up questions against the inventory — the skill itself doesn't cluster events into flows or write per-flow narratives. +This skill produces a product-browseable report of every PostHog event your code captures, mapped to the codebase by file path and enriched with 30-day volume from PostHog. The reader does the synthesis on demand by asking follow-up questions about the report — the skill itself doesn't cluster events into flows or write per-flow narratives. The checklist has three shared checks: `identity-segmentation`, `coverage-map`, `data-quality`. Finish each one. Don't invent new ids. ## Workflow -The audit runs as a 5-step chain: +The audit runs as a 6-step chain: 1. Detect 2. Scan 3. Extract 4. Enrich 5. Report +6. Dashboard Each step file points to the next. Run them in order. Don't explore the source tree on your own. 
@@ -22,17 +23,15 @@ Step 1 confirms the shape and reseeds if it's missing or out of date. As you fin **Start by reading `references/1-detect.md`** (relative to this skill's directory – typically `.claude/skills/events-audit/references/1-detect.md`). Don't read ahead. Don't re-read a step once you've passed it. Don't re-read SKILL.md. -Some tools are deferred by the SDK – load each once via `ToolSearch select:` before first use: `Read`, `Bash`, `Glob`, `Grep`, `Write`, `mcp__wizard-tools__audit_resolve_checks`, `mcp__wizard-tools__audit_seed_checks`, and the PostHog query tools `mcp__posthog-wizard__query-run` and `mcp__posthog-wizard__insights-list`. Use `ToolSearch` to load named tools only – don't browse. +Some tools are deferred by the SDK – load each once via `ToolSearch select:` before first use: `Read`, `Bash`, `Glob`, `Grep`, `Write`, `mcp__wizard-tools__audit_resolve_checks`, `mcp__wizard-tools__audit_seed_checks`, and the PostHog query tool `mcp__posthog-wizard__query-run`. The dashboard write tools `mcp__posthog-wizard__dashboard-create` and `mcp__posthog-wizard__insight-create` are loaded inside step 6. Use `ToolSearch` to load named tools only – don't browse. `Agent` is **not** in the default load list. Step 2 is the only place where fan-out is conditional; load `Agent` *inside* step 2, only after deciding to dispatch subagents. -**Don't call `TodoWrite`.** Progress comes from the checklist and `[STATUS]` lines, not a todo list. - If the wizard prompt names a framework (e.g. "Framework: Flask"), use it to narrow your scans – skip manifests and language patterns that don't apply. ## When to trigger -Trigger when the user asks for an event audit, event inventory, or events documentation; "what events does my code capture"; "find redundant or stale events"; or "which PM questions can my data answer." 
+Trigger when the user asks for an event audit, event inventory, or events documentation; "what events does my code capture"; "find redundant or stale events"; or "which product questions can my data answer." Don't trigger when the user wants to *add* instrumentation (defer to `instrument-product-analytics`) or debug a single missing event (defer to `diagnosing-missing-recordings`). @@ -53,7 +52,9 @@ The checklist lives at `.posthog-audit-checks.json` and shows live in the "Audit - `mcp__wizard-tools__audit_resolve_checks({ updates })` - patch one or more checks by `id`. Each `update` is `{ id, status, file?, details? }`. Emit one call per check as you finish its analysis – the "Audit plan" tab updates live, so streaming resolutions one-at-a-time gives the user visible progress instead of a single end-of-step flip. Only batch when you genuinely produce two updates in the same model turn (rare). - `mcp__wizard-tools__audit_seed_checks({ checks })` - replaces the whole checklist atomically. Step 1's fallback uses this when the file is missing or out of date; otherwise don't call it. -A second file, `.posthog-events-inventory.json`, holds the capture sites with derived `area`/`route`/`enclosing` fields, event names, properties, and per-event volume from PostHog. You write it directly in steps 2 through 4. It's **not** MCP-owned – no `audit_*` tool guards it. **The inventory is the audit's deliverable** — keep it on disk after the report is written so the PM can ask follow-ups against it. +A second file, `.posthog-events-inventory.json`, is the working ledger for steps 2 through 4. It holds the capture sites with derived `package`/`area`/`route`/`enclosing` fields, event names, properties, and per-event volume from PostHog. + +It's **not** MCP-owned – no `audit_*` tool guards it. 
The inventory is **transient scratch state**, not a deliverable: step 5 deletes `.posthog-audit-checks.json` once the report is written, and step 6 deletes the inventory after the optional dashboard step. The report is the only artifact the user keeps. ### Check entry shape @@ -62,13 +63,13 @@ A second file, `.posthog-events-inventory.json`, holds the capture sites with de - `label` - short human name. - `status` - `pending` | `pass` | `error` | `warning` | `suggestion`. - `file` - optional `path:line` for findings tied to a location. -- `details` - Markdown bulleted summary in plain language. Describe state and the PM questions blocked. Don't render `status` as a grade in the report; the enum is for filter logic only. +- `details` - Markdown bulleted summary in plain language. Describe state and the product questions blocked. Don't render `status` as a grade in the report; the enum is for filter logic only. ## Key principles - **Show your evidence.** Cite `file:line` for every non-pass finding. - **Frame findings as product questions.** Every finding describes *what product question or insight it blocks*, not what code rule it breaks. -- **Hand the PM the map. Don't tell the story for them.** The deliverable is an inventory plus three short qualitative checks plus a few suggested follow-ups. The PM clusters events into flows on demand by asking targeted follow-up questions against the inventory — the skill doesn't do that synthesis upfront. +- **Hand the reader the map. Don't tell the story for them.** The deliverable is a single report with three short qualitative checks plus a few suggested follow-ups. The reader clusters events into flows on demand by asking targeted follow-up questions about the report — the skill doesn't do that synthesis upfront. 
## Abort statuses diff --git a/transformation-config/skills/events-audit/references/2-scan-enrichment.md b/transformation-config/skills/events-audit/references/2-scan-enrichment.md new file mode 100644 index 0000000..37384cc --- /dev/null +++ b/transformation-config/skills/events-audit/references/2-scan-enrichment.md @@ -0,0 +1,101 @@ +# Step 2 enrichment reference + +Lookup tables and rules subagents apply during step 2 enrichment. Read this file **once** at the start of your enrichment run. + +This file is supporting material for step 2; it has no `next_step` and is not part of the main step chain. The orchestrator does not read it. + +## Per-SDK capture call signatures + +| SDK | Capture pattern | Event-name position | Properties position | +|-----|-----------------|---------------------|---------------------| +| posthog-js | `posthog.capture("event", { props })` | positional 1 | positional 2 (object literal) | +| posthog-js (hook) | `usePostHog().capture("event", { props })` | positional 1 | positional 2 | +| posthog-node | `client.capture({ distinctId, event, properties })` | object key `event` | object key `properties` | +| posthog-python | `posthog.capture(distinct_id, "event", properties)` | positional 2 | positional 3 (dict) | +| posthog-ruby | `posthog.capture({ distinct_id:, event:, properties: })` | hash key `event` | hash key `properties` | +| posthog-go | `client.Enqueue(posthog.Capture{Event: "...", Properties: posthog.NewProperties()...})` | struct field `Event` | struct field `Properties` | +| posthog-ios | `PostHog.shared.capture("event", properties: ["k": "v"])` | positional 1 | named `properties` | +| posthog-android | `PostHog.capture("event", properties = mapOf("k" to "v"))` | positional 1 | named `properties` | +| posthog-react-native | Same shape as posthog-js | positional 1 | positional 2 | +| posthog-flutter | `Posthog().capture(eventName: "...", properties: { ... 
})` | named `eventName` | named `properties` | +| posthog-php | `PostHog::capture(['distinctId' => ..., 'event' => '...', 'properties' => [...]])` | array key `event` | array key `properties` | +| posthog-dotnet | `client.Capture(distinctId, "event", new() { ["k"] = "v" })` | positional 2 | positional 3 | +| posthog-elixir | `Posthog.capture("event", distinct_id, %{ k: v })` | positional 1 | positional 3 | + +## Identification surfaces + +Set `call_kind` according to the call: + +- `posthog.identify(distinctId, $set, $set_once)` → `identify` +- `posthog.setPersonProperties({ ... })` → `set` +- `posthog.setPersonPropertiesForFlags` → `set_once` +- `posthog.group(type, key, properties)` → `group` +- `posthog.alias(alias, distinctId)` → `alias` +- `posthog.reset()` → `reset` (no event name; the identity check uses presence to score cross-device hygiene) + +## `package` rules (monorepo dimension) + +Compute `package` **before** `area` from the file path. Match the first prefix below; everything after the prefix's package segment is what `area` rules then operate on. + +| Path prefix | `package` | +|---|---| +| `apps//...` | `` | +| `packages//...` | `` | +| `services//...` | `` | +| `projects//...` | `` | +| Anything else | `null` | + +Examples: +- `apps/web/components/Checkout/Checkout.tsx` → `package: "web"`, then `area` rules see `components/Checkout/Checkout.tsx`. +- `packages/sdk/src/track.ts` → `package: "sdk"`, then `area` rules see `src/track.ts`. +- `src/checkout/Checkout.tsx` → `package: null`, `area` rules see the original path. + +Don't fabricate a package from `src/` or `app/` — those are within-package directories, not package roots. + +## `area` rules + +After `package` extraction, strip one leading `src/`, `app/`, or `pages/` from the remaining path. 
Then apply the first matching rule: + +| Path shape after stripping | `area` | +|---|---| +| `app//...` (Next.js app router) | `` | +| `pages//...` (Next.js pages router) | `` (use `api/` for `pages/api//...`) | +| `components//...` | `` | +| `features//...` | `` | +| `screens//...` | `` (mobile) | +| `routes//...`, `views//...`, `controllers//...` (backend) | `` | +| `hooks/...`, `lib/...`, `utils/...`, `analytics/...`, `services/...`, `helpers/...` | `shared` | +| `app/layout.tsx`, `app/template.tsx`, `_app.tsx`, `_document.tsx`, `app/error.tsx`, `app/not-found.tsx` | `global` | +| Anything else | first path segment after stripping, lowercased | + +Strip only the first matching prefix. + +## `route` rules (Next.js only) + +- `app/foo/page.tsx` → `/foo` +- `app/foo/bar/page.tsx` → `/foo/bar` +- `app/foo/[id]/page.tsx` → `/foo/[id]` +- `app/(group)/foo/page.tsx` → `/foo` (route groups in parens are ignored) +- `pages/foo.tsx` → `/foo` +- `pages/foo/[id].tsx` → `/foo/[id]` +- `pages/api/` → `/api/` (without the file extension) + +Set `route: null` for any path that isn't router-shaped. Don't fabricate routes for non-Next.js codebases. + +## `enclosing` rules + +Backward-scan from the capture line. Match these patterns (first match wins above the capture line): + +- `function (\w+)\(` (named function) +- `const (\w+) = \(?` / `const (\w+) = async` +- `export (?:default )?function (\w+)\(` +- `export const (\w+) = ` +- `class (\w+)` +- `def (\w+)\(` (Python) +- `func (\w+)\(` (Go / Swift) +- `fun (\w+)\(` (Kotlin) +- `def (\w+)` (Ruby) + +Take the closest match above the capture line at column 0 or one indent level deeper than the capture's expected wrapper. If nothing matches within ~80 lines above, set `enclosing: null`. Don't read more file context to chase it. + +For unnamed default exports (`export default function () { ... }`), use the file's basename without extension as the enclosing name (e.g. `CheckoutPage`). 
diff --git a/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md b/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md new file mode 100644 index 0000000..b9f7b00 --- /dev/null +++ b/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md @@ -0,0 +1,43 @@ + + +You are an events-audit enrichment subagent. You will read source files and write enriched capture rows to a part-file. Do not return the rows in your final message — write to disk only. + +Inputs: +- Read `.posthog-events-inventory.json` once. The `rows` array contains base rows with `id`, `file`, `line`, `raw_match`, `event_name_hint`. +- Process only rows whose `id` is in this list: {{ROW_IDS}}. + +For each assigned row, read its file **once** (cache by file path; multiple rows in the same file share one `Read`). For each row, produce an enriched row with these fields: + +- `id`, `file`, `line` — copy from the base row. +- `sdk` — one of `posthog-js`, `posthog-node`, `posthog-python`, `posthog-ruby`, `posthog-go`, `posthog-ios`, `posthog-android`, `posthog-react-native`, `posthog-flutter`, `posthog-php`, `posthog-dotnet`, `posthog-elixir`. +- `call_kind` — one of `capture`, `identify`, `set`, `set_once`, `group`, `alias`, `reset`. +- `event_name` — the literal string in the event-name slot (resolve from the full call expression, not just the grep line). For dynamic names (variable, template literal, expression), set `null` and `is_dynamic: true`. +- `is_dynamic` — `true` if `event_name` couldn't be resolved to a literal. +- `properties` — array of property keys from the properties argument (object literal / dict / hash). Empty array if the call passes a variable; empty array for non-capture `call_kind`s. +- `conditional_fire` — `true` if the call sits inside an `if` / ternary / guard that depends on something other than user identity. +- `distinct_id_kind` — server-side SDKs only: `"variable"` | `"literal"` | `"missing"`. 
`null` for client-side rows. +- `package` — monorepo package name from `apps//`, `packages//`, `services//`, or `projects//` prefix. `null` for single-app repos. See the `package` rules in the enrichment reference. +- `area` — codebase bucket from the file path (computed *after* the `package` prefix is stripped). +- `route` — Next.js route if applicable, otherwise `null`. +- `enclosing` — nearest enclosing function/component name from a backward scan. +- `status` — `"pending"`. +- `volume_30d` — `null`. +- `last_seen` — `null`. + +Skip `$pageview` and `$pageleave` from the SDK — they are SDK-internal except in rare manual setups. If a base row's `raw_match` shows `$pageview` / `$pageleave`, drop the row (don't emit it in your part-file). + +When you have all enriched rows, `Write` `.posthog-events-inventory.part-{{N}}.json` with a JSON array of the rows (no wrapper object — just `[...]`). Pretty-print with two-space indent. + +Final message: respond with exactly one line — `"wrote part-{{N}} with M rows"` — where `M` is the count. Do NOT include the rows in your message. Do NOT recap. Just the one line. + +Reference: read `.claude/skills/events-audit/references/2-scan-enrichment.md` once for per-SDK call signatures, identification surfaces, and the `area` / `route` / `enclosing` rules. diff --git a/transformation-config/skills/events-audit/references/2-scan.md b/transformation-config/skills/events-audit/references/2-scan.md index 8d9bde9..f3f3f33 100644 --- a/transformation-config/skills/events-audit/references/2-scan.md +++ b/transformation-config/skills/events-audit/references/2-scan.md @@ -14,6 +14,13 @@ The previous architecture collapsed enrichment + merge into one orchestrator tur Don't judge severity, don't infer flows, don't call MCP — those come later. +## Supporting files + +This step uses two supporting reference files (not part of the chain): + +- `references/2-scan-subagent-prompt.md` — verbatim subagent prompt template. 
Orchestrator reads it once at phase 2 start, substitutes `{{N}}` and `{{ROW_IDS}}`, and passes the result to each `Agent` invocation. +- `references/2-scan-enrichment.md` — per-SDK call signatures, identification surfaces, `area` / `route` / `enclosing` rules. Subagents read it once during enrichment; the orchestrator does not. + ## Status Emit, in order: @@ -96,50 +103,20 @@ Count distinct files in the base inventory. Load `Agent` once: `ToolSearch select:Agent`. -**Spawn all N sub-agents in parallel using the `Agent` tool — one assistant turn, N tool_use blocks in the same message.** Sequential dispatch (one Agent per turn) loses ~30s of orchestration latency for no reason; the prior diagnostic confirmed this. Batch them. - -Each `Agent` invocation passes the subagent prompt template (below) plus that subagent's row-id list and the partition number N. Set `run_in_background: false` — you want their results before the merge. - -### e. Subagent prompt template +Read `references/2-scan-subagent-prompt.md`. Strip the leading HTML comment block (everything between `<!--` and `-->`, inclusive), then substitute: -Each subagent receives this prompt (substitute `{{N}}` and `{{ROW_IDS}}`): - -``` -You are an events-audit enrichment subagent. You will read source files and write enriched capture rows to a part-file. Do not return the rows in your final message — write to disk only. +- `{{N}}` — the partition number for that subagent (`1`, `2`, ..., up to N) +- `{{ROW_IDS}}` — JSON array of the row IDs assigned to that subagent -Inputs: -- Read .posthog-events-inventory.json once. The "rows" array contains base rows with id, file, line, raw_match, event_name_hint. -- Process only rows whose id is in this list: {{ROW_IDS}}. +The substituted text is the full prompt for that subagent. -For each assigned row, read its file ONCE (cache by file path; multiple rows in the same file share one Read). 
For each row, produce an enriched row with these fields: - -- id, file, line — copy from the base row -- sdk — one of posthog-js, posthog-node, posthog-python, posthog-ruby, posthog-go, posthog-ios, posthog-android, posthog-react-native, posthog-flutter, posthog-php, posthog-dotnet, posthog-elixir -- call_kind — one of capture, identify, set, set_once, group, alias, reset -- event_name — the literal string in the event-name slot (resolve from the full call expression, not just the grep line). For dynamic names (variable, template literal, expression), set null and is_dynamic: true. -- is_dynamic — true if event_name couldn't be resolved to a literal -- properties — array of property keys from the properties argument (object literal / dict / hash). Empty array if the call passes a variable; empty array for non-capture call_kinds. -- conditional_fire — true if the call sits inside an if/ternary/guard that depends on something other than user identity -- distinct_id_kind — server-side SDKs only: "variable" | "literal" | "missing". null for client-side rows. -- area — codebase bucket from the file path (rules below) -- route — Next.js route if applicable, otherwise null -- enclosing — nearest enclosing function/component name from a backward scan -- status — "pending" -- volume_30d — null -- last_seen — null - -Skip $pageview and $pageleave from the SDK — they are SDK-internal except in rare manual setups. If a base row's raw_match shows $pageview/$pageleave, drop it (don't emit a row in your part-file). - -When you have all enriched rows, Write .posthog-events-inventory.part-{{N}}.json with a JSON array of the rows (no wrapper object, just [...]). Pretty-print with two-space indent. - -Final message: respond with exactly one line — "wrote part-{{N}} with M rows" — where M is the count. Do NOT include the rows in your message. Do NOT recap. Just the one line. 
+**Spawn all N sub-agents in parallel using the `Agent` tool — one assistant turn, N tool_use blocks in the same message.** Sequential dispatch (one Agent per turn) adds ~30s of orchestration latency for no reason; the prior diagnostic confirmed this. Batch them. -Reference: per-SDK signatures, identification surfaces, area/route/enclosing rules are in the parent skill file at .claude/skills/events-audit/references/2-scan.md (sections "Reference: per-SDK signatures" through "Reference: enclosing"). Read that file once if you need them. -``` +Set `run_in_background: false` — you want their results before the merge. -### f. Wait for all subagents to return +### e. Wait for all subagents to return -Each subagent returns a single confirmation line. Verify each part-file exists before phase 3: +Each subagent returns a single confirmation line (`"wrote part-N with M rows"`). Verify each part-file exists before phase 3: ``` Bash: for n in 1 2 ... N; do test -f .posthog-events-inventory.part-$n.json || echo "MISSING: part-$n"; done @@ -149,7 +126,7 @@ If any part-file is missing, the subagent failed. Re-dispatch only the failed su ## Phase 3 — Concat via jq -### g. Merge part-files into the canonical inventory +### f. Merge part-files into the canonical inventory One `Bash` call: @@ -167,7 +144,7 @@ This: The orchestrator never has to materialize the merged JSON in a model turn — `jq` does the merge in shell, costing zero output tokens. -If `jq` isn't available on the user's system, fall back to a Bash one-liner using `cat` + `python3 -c`: +If `jq` isn't available on the user's system, fall back to a Bash one-liner using Python: ``` python3 -c "import json,glob; rows=[] @@ -178,85 +155,8 @@ json.dump({'rows': rows, 'wrapper_undetected': False}, open('.posthog-events-inv Don't try to merge in a model turn. That's the rule that crashed the previous run. 
-## Reference: per-SDK signatures - -| SDK | Capture pattern | Event-name position | Properties position | -|-----|-----------------|---------------------|---------------------| -| posthog-js | `posthog.capture("event", { props })` | positional 1 | positional 2 (object literal) | -| posthog-js (hook) | `usePostHog().capture("event", { props })` | positional 1 | positional 2 | -| posthog-node | `client.capture({ distinctId, event, properties })` | object key `event` | object key `properties` | -| posthog-python | `posthog.capture(distinct_id, "event", properties)` | positional 2 | positional 3 (dict) | -| posthog-ruby | `posthog.capture({ distinct_id:, event:, properties: })` | hash key `event` | hash key `properties` | -| posthog-go | `client.Enqueue(posthog.Capture{Event: "...", Properties: posthog.NewProperties()...})` | struct field `Event` | struct field `Properties` | -| posthog-ios | `PostHog.shared.capture("event", properties: ["k": "v"])` | positional 1 | named `properties` | -| posthog-android | `PostHog.capture("event", properties = mapOf("k" to "v"))` | positional 1 | named `properties` | -| posthog-react-native | Same shape as posthog-js | positional 1 | positional 2 | -| posthog-flutter | `Posthog().capture(eventName: "...", properties: { ... })` | named `eventName` | named `properties` | -| posthog-php | `PostHog::capture(['distinctId' => ..., 'event' => '...', 'properties' => [...]])` | array key `event` | array key `properties` | -| posthog-dotnet | `client.Capture(distinctId, "event", new() { ["k"] = "v" })` | positional 2 | positional 3 | -| posthog-elixir | `Posthog.capture("event", distinct_id, %{ k: v })` | positional 1 | positional 3 | - -## Reference: identification surfaces - -The scanner records (with `call_kind` set accordingly): - -- `posthog.identify(distinctId, $set, $set_once)` → `identify` -- `posthog.setPersonProperties({ ... 
})` → `set` -- `posthog.setPersonPropertiesForFlags` → `set_once` -- `posthog.group(type, key, properties)` → `group` -- `posthog.alias(alias, distinctId)` → `alias` -- `posthog.reset()` → `reset` (no event name; the identity check uses presence to score cross-device hygiene) - -## Reference: `area` rules - -Strip a single leading `src/`, `app/`, `pages/`, or `apps//` (monorepo). Then apply the first matching rule: - -| Path shape after stripping | `area` | -|---|---| -| `app//...` (Next.js app router) | `` | -| `pages//...` (Next.js pages router) | `` (use `api/` for `pages/api//...`) | -| `components//...` | `` | -| `features//...` | `` | -| `screens//...` | `` (mobile) | -| `routes//...`, `views//...`, `controllers//...` (backend) | `` | -| `hooks/...`, `lib/...`, `utils/...`, `analytics/...`, `services/...`, `helpers/...` | `shared` | -| `app/layout.tsx`, `app/template.tsx`, `_app.tsx`, `_document.tsx`, `app/error.tsx`, `app/not-found.tsx` | `global` | -| Anything else | first path segment after stripping, lowercased | - -Strip only the first matching prefix. - -## Reference: `route` rules (Next.js only) - -- `app/foo/page.tsx` → `/foo` -- `app/foo/bar/page.tsx` → `/foo/bar` -- `app/foo/[id]/page.tsx` → `/foo/[id]` -- `app/(group)/foo/page.tsx` → `/foo` (route groups in parens are ignored) -- `pages/foo.tsx` → `/foo` -- `pages/foo/[id].tsx` → `/foo/[id]` -- `pages/api/` → `/api/` (without the file extension) - -Set `route: null` for any path that isn't router-shaped. - -## Reference: `enclosing` rules - -Backward-scan from the capture line. 
Match these patterns (first match wins above the capture line): - -- `function (\w+)\(` (named function) -- `const (\w+) = \(?` / `const (\w+) = async` -- `export (?:default )?function (\w+)\(` -- `export const (\w+) = ` -- `class (\w+)` -- `def (\w+)\(` (Python) -- `func (\w+)\(` (Go / Swift) -- `fun (\w+)\(` (Kotlin) -- `def (\w+)` (Ruby) - -Take the closest match above the capture line at column 0 or one indent level deeper than the capture's expected wrapper. If nothing matches within ~80 lines above, set `enclosing: null`. Don't read more file context to chase it. - -For unnamed default exports (`export default function () { ... }`), use the file's basename without extension as the enclosing name (e.g. `CheckoutPage`). - ## Notes on wrapper resolution -This step intentionally does **not** chase wrapper functions (`trackEvent`, `analytics.track`, etc.). Cross-file wrapper resolution doesn't fit cleanly in row-range subagent fan-out, and the reframing principle is "let the PM ask follow-ups." +This step intentionally does **not** chase wrapper functions (`trackEvent`, `analytics.track`, etc.). Cross-file wrapper resolution doesn't fit cleanly in row-range subagent fan-out, and the reframing principle is "let the reader ask follow-ups." 
-If `wrapper_undetected: true` (SDK in deps but no direct calls found), the report step's data-quality check surfaces it, and the suggested-follow-ups list points the PM at: *"find calls to `trackEvent`/`logEvent`/`analytics.track` and resolve their callers as additional capture sites."* +If `wrapper_undetected: true` (SDK in deps but no direct calls found), the report step's data-quality check surfaces it, and the suggested-follow-ups list points the reader at: *"find calls to `trackEvent`/`logEvent`/`analytics.track` and resolve their callers as additional capture sites."* diff --git a/transformation-config/skills/events-audit/references/3-extract.md b/transformation-config/skills/events-audit/references/3-extract.md index d1f360c..b6a8234 100644 --- a/transformation-config/skills/events-audit/references/3-extract.md +++ b/transformation-config/skills/events-audit/references/3-extract.md @@ -47,7 +47,7 @@ If the property access targets an object literal in the same module and every va - Names built with template literals: `` `signup_${variant}` ``. Leave dynamic. The data-quality check flags these as undercount risk. - Names imported from another module (other than the same-file enum pattern). Leave dynamic. - Names from network responses or feature-flag values. Leave dynamic. -- **Wrapper / function-arg passthrough.** If the dynamic name is a function parameter (`posthog.capture(eventName, ...)` where `eventName` is the enclosing function's argument), leave dynamic — chasing callers across files is intentionally out of scope. The report step's suggested follow-ups list points the PM at this case so they can ask Claude to resolve specific wrappers on demand. +- **Wrapper / function-arg passthrough.** If the dynamic name is a function parameter (`posthog.capture(eventName, ...)` where `eventName` is the enclosing function's argument), leave dynamic — chasing callers across files is intentionally out of scope. 
The report step's suggested follow-ups list points the reader at this case so they can ask Claude to resolve specific wrappers on demand. When a row can't be resolved, leave it as `is_dynamic: true` with `event_name: null`. The data-quality check counts these as undercount risk; the report's by-event table omits them (they appear only in a "dynamic captures" footnote). diff --git a/transformation-config/skills/events-audit/references/4-mcp-query.md b/transformation-config/skills/events-audit/references/4-mcp-query.md index 220bac2..5777da2 100644 --- a/transformation-config/skills/events-audit/references/4-mcp-query.md +++ b/transformation-config/skills/events-audit/references/4-mcp-query.md @@ -29,7 +29,6 @@ Emit: | MCP tool | When | Use | |----------|------|-----| | `mcp__posthog-wizard__query-run` | (c) below | Execute HogQL/SQL. Filtered query returns volume + last_seen for inventory events. | -| `mcp__posthog-wizard__insights-list` | (f) below, optional | List actions for the report appendix. | | `mcp__posthog-wizard__entity-search` | **Avoid.** | Requires project-key permissions; personal API keys get "permission denied". The SQL approach below works regardless. | The active project comes from the wizard session – don't pick or switch projects yourself. @@ -65,15 +64,6 @@ The result covers only events the code already references – there is no `defin ### d. Merge into the inventory -The inventory now grows an optional `actions[]` field. Final shape: - -```jsonc -{ - "rows": [ ... ], // existing per-site rows, now with volume_30d + last_seen - "actions": [ ... ] // optional, from insights-list -} -``` - For each `row` with `call_kind == "capture"` and a non-null `event_name`, copy `volume_30d` and `last_seen` from the SQL result keyed by `event`. Rows whose name isn't in the SQL result keep `volume_30d: 0` and `last_seen: null` – this is the phantom signal the data-quality check uses. ### e. 
Resort by volume @@ -96,11 +86,7 @@ If the SQL call in (c) was skipped or errored (every row has `volume_30d: null`) `Write` the inventory back. -### g. Pull actions for the report appendix (optional) - -Call `mcp__posthog-wizard__insights-list` for actions if available. The audit doesn't analyze actions – they only show up in the report appendix. If the call fails or the API can't filter to actions, drop the appendix and note that in the report. - -### h. Failure handling +### g. Failure handling Three failure modes, in order of severity: diff --git a/transformation-config/skills/events-audit/references/5-report-template.md b/transformation-config/skills/events-audit/references/5-report-template.md new file mode 100644 index 0000000..fdbe0e7 --- /dev/null +++ b/transformation-config/skills/events-audit/references/5-report-template.md @@ -0,0 +1,112 @@ + + +# PostHog events audit – {{repo_name}} + +_Generated {{timestamp}}_ + +This audit lists every event your code captures, where it fires, and how often PostHog has seen it in the last 30 days. Use the suggested follow-ups at the end to ask Claude focused questions about the events listed here. + +## 1. Overview + +| Metric | Value | +|---|---:| +| Total events volume (30d) | {{total_volume}} | +| Distinct events | {{distinct_count}} | +| Phantom events (no volume) | {{phantom_count}} | +| Top 10 events = % of total volume | {{top_10_share}} | + +> **Live dashboard:** _not linked — `dashboard-create` did not succeed during this run. See the run output for the failure reason, then re-run the audit to retry._ + +{{overview_panels}} + +## 2. Volume map + +Top events by 30-day volume. The bar shows each event's share of total captured volume. + +| # | Event | Volume | Share | Bar | +|--:|-------|------:|------:|:----| +{{volume_map_rows}} + +{{volume_map_footnote}} + +### Capture sites + +{{capture_sites_collapsibles}} + +## 3. Area topology + +Events grouped by codebase area, sorted by area volume. 
+ +{{area_topology_sections}} + +{{area_topology_commentary}} + +## 4. Identity & segmentation + +{{identity_segmentation_details}} + +## Suggested follow-ups + +You can ask Claude any of these about the events in this report: + +- Which of these events fire on ``? (e.g. signup, checkout, onboarding) +- Which events have inconsistent property naming or types? +- Build a funnel from `` to `` and tell me the drop-off. +- Which areas have the highest event volume but the thinnest property coverage? +- Which phantom events look like dead instrumentation we can remove? + +## Appendix – dynamic event names + +Events whose name couldn't be resolved at scan time (template literal, network value, or imported enum). Listed for completeness; not in §2's table. + +{{dynamic_appendix}} + +## Appendix – person properties (`identify` / `set` / `set_once`) + +{{person_properties_appendix}} + +## Appendix – groups (`group`) + +{{groups_appendix}} + +## About this audit + +Generated by the PostHog events-audit skill. Re-run `posthog-wizard events-audit` to refresh. diff --git a/transformation-config/skills/events-audit/references/5-report.md b/transformation-config/skills/events-audit/references/5-report.md index d812f72..b03d94f 100644 --- a/transformation-config/skills/events-audit/references/5-report.md +++ b/transformation-config/skills/events-audit/references/5-report.md @@ -1,12 +1,18 @@ --- -next_step: null +next_step: 6-dashboard.md --- # Step 5 – Render the report -Produce the audit deliverable in a single pass: a by-event inventory table sorted by 30-day volume, a short by-area index that doubles as a lightweight flow map, three shared qualitative checks (identity & segmentation, coverage map, data quality), and a suggested follow-ups list the PM can paste back to ask Claude. +Produce the audit deliverable in a single pass. 
The report has three high-level views the reader scans first — Overview, Volume Map, Area Topology — followed by Identity & Segmentation, suggested follow-ups, and appendices. -The skill's job ends with a map and a few short observations. **Don't cluster events into flows. Don't write per-flow narratives. Don't synthesize a story.** The PM does that on demand. +The skill's job is to give the reader a map plus a few short observations. **Don't cluster events into flows. Don't write per-flow narratives. Don't synthesize a story.** The reader does that on demand. + +## Supporting files + +This step uses one supporting reference file (not part of the chain): + +- `references/5-report-template.md` — verbatim markdown template for the rendered report. Orchestrator reads it once at step (f), strips the leading HTML comment block, substitutes every `{{placeholder}}`, and writes the result to `posthog-events-audit-report.md`. ## Output discipline @@ -20,10 +26,10 @@ Emit, in order: ``` [STATUS] Reading inventory -[STATUS] Computing area index +[STATUS] Computing volume map +[STATUS] Computing area topology +[STATUS] Computing overview KPIs [STATUS] Analyzing identity & segmentation -[STATUS] Analyzing coverage map -[STATUS] Analyzing data quality [STATUS] Writing report ``` @@ -33,34 +39,77 @@ Emit, in order: `Read` `.posthog-events-inventory.json` once. From it you'll work with: -- `rows[]` – capture rows (sorted by `volume_30d` desc by step 4) with `event_name`, `properties[]`, `area`, `route`, `enclosing`, `volume_30d`, `last_seen`, `status`, etc. -- `actions[]` – optional, for the appendix. +- `rows[]` – capture rows (sorted by `volume_30d` desc by step 4) with `event_name`, `properties[]`, `package`, `area`, `route`, `enclosing`, `volume_30d`, `last_seen`, `status`, etc. - `wrapper_undetected` – top-level boolean. 
If `rows[]` is empty, render a short report explaining the inventory is empty, resolve all three shared checks with `pending` details (no data to evaluate), and exit. -### b. Aggregate by event name (the headline view) +### b. Aggregate by event (Volume Map records) -Group capture rows by `event_name` (skip rows where `is_dynamic == true` or `event_name == null`; those go to the dynamic-captures footnote). For each distinct event, compute: +Group capture rows by `event_name` (skip rows where `is_dynamic == true` or `event_name == null`; those go to the dynamic-captures appendix). For each distinct event, compute: - `event` – the literal name. -- `volume_30d` – pulled from any one row (all rows for the same event share volume). +- `volume_30d` – pulled from any one row (rows for the same event share volume). - `last_seen` – same. -- `status` – `resolved` | `phantom` | `pending` (one of the three step 4 set). -- `capture_sites[]` – list of `{ file, line, area, route, enclosing }` for every row sharing this event name. +- `status` – `resolved` | `phantom` | `pending` (from step 4). +- `capture_sites[]` – list of `{ file, line, package, area, route, enclosing }` for every row sharing this name. +- `areas[]` – distinct `area` values across the sites, alphabetical. +- `packages[]` – distinct non-null `package` values across the sites, alphabetical. Empty if all rows have `package: null`. - `properties_seen[]` – union of all `properties[]` across the rows, sorted alphabetically. +- `has_conditional` – `true` if any contributing row has `conditional_fire == true`. + +Sort by `volume_30d` desc; phantoms sink to the bottom. Compute `total_volume_30d = sum(volume_30d)` across all distinct events; per-event `share = volume_30d / total_volume_30d` for the Volume Map bars. + +### c. Group by area (Area Topology records) + +First, count the distinct non-null `package` values across all rows. 
Branch: + +- **Multi-package monorepo (≥2 distinct non-null packages):** group two levels deep — `package > area`. For each distinct `package` (plus a `(unscoped)` bucket if any rows have `package: null`), tally per-area records inside it. +- **Single package or flat repo (0 or 1 distinct non-null package):** group flat by `area` as before. + +For each `area` record (single-level or nested under a package), taking the first listed `(package, area)` per event when an event spans multiple — events appear once in topology, in their primary location — compute: + +- `package` – the package bucket (only used in multi-package mode; `null` otherwise). +- `area` – the bucket name. +- `event_count` – number of distinct events in this area. +- `total_volume_30d` – sum of `volume_30d` for this area's events. +- `events[]` – the area's events, sorted by `volume_30d` desc, each carrying `event`, `volume_30d`, `has_conditional`, `is_phantom`. + +Sort areas by `total_volume_30d` desc within their group; in multi-package mode, sort packages by their summed `total_volume_30d` desc. + +If every event collapses to one or two `area` buckets (a flat repo), note this inline in the rendered topology section — the by-event table becomes the primary view. + +### d. Compute Overview KPIs and panels -Sort by `volume_30d` desc; phantoms sink to the bottom of the table; ties break by event name. +Overview is the action-oriented top section. It's a small KPI grid plus a series of issue panels. -### c. Compute the by-area index +#### KPIs (four numbers) -Tally distinct event names per `area`. Build `[{ area, event_count, total_volume_30d }]`, sorted by `total_volume_30d` desc. Use this as the report's "flow map" — a one-line summary at the top of §1 plus a short index. **Don't render per-area narratives.** The index points the PM at where to look; the by-event table is where they read. +- **Total events (30d)** — `total_volume_30d` from (b). 
+- **Distinct events** — count of by-event records from (b). +- **Phantom (dead code)** — count of by-event records where `status == "phantom"`. +- **Top 10 = % of volume** — sum of the top ten events' `volume_30d` / `total_volume_30d`, rendered as a percentage. This is the concentration headline; high values (>95%) mean the long tail is mostly low-signal and a handful of events dominate ingestion. -If every row collapses to one or two `area` buckets (a flat repo without per-feature directories), say so in plain language ("Capture sites all live in a few shared modules — the area grouping is coarse here") and let the PM scan the by-event table directly. +#### Panels (zero or more, render only those that have content) -### d. Analyze identity & segmentation (shared check) +Each panel is a short bulleted list. Panels are derived deterministically from the inventory. -Reframe identity rules as PM-facing capabilities. Identification works differently on the client and the server, so judge per SDK family detected in step 1. +1. **Phantom events — in code, zero volume.** Events where `status == "phantom"`. Each bullet: `event_name — area`. Sort by area, then event name. +2. **No properties attached — flying blind.** Events where `properties_seen[] == []`. Sort by `volume_30d` desc. Each bullet: `event_name — Xk events flying blind`. Limit to top 12; add `… (+N more)` if longer. +3. **Name drift — same concept, different keys.** Pairs of events whose names collapse to the same string when lowercased and stripped of underscores/spaces. Each bullet: `event_a vs event_b — splits funnels on `. +4. **Type drift — numeric property with mixed types.** Property keys named `revenue`, `amount`, `price`, `count`, `duration_*`, `quantity` whose values mix number and string across call sites. Each bullet: `property — number at file:line, string at file:line — silently zeros aggregates`. +5. **Conditional fires — undercount risk.** Events where `has_conditional == true`. 
Each bullet: `event_name — fires inside at file:line`. Sort by volume desc; cap at 8. +6. **Duplicate captures — same event from multiple SDK families.** Events present in both client- and server-side SDK rows, where neither row is in a test file and neither explicitly threads `distinctId` from request context. Each bullet: `event_name — fires from at file:line and at file:line — risks 2× counting`. +7. **Unresolved dynamic captures.** Inventory rows still flagged `is_dynamic: true` after step 3. Each bullet: `file:line — event name is `. +8. **Volume concentration.** A short text line plus the top 10 events as a bulleted list with bars. Each bullet: `event_name — Xk · share% · ▓▓▓▓░░░░░░`. Bars use Unicode block characters (`▓` for filled, `░` for empty), 12 chars wide, scaled to per-event share of `total_volume_30d`. + +Skip any panel whose source list is empty. Don't render an empty "No phantom events" header — silence is the signal. + +These panels carry the findings that previously lived in the standalone Coverage Map and Data Quality sections; rendering them as Overview panels keeps action items in one place at the top of the report. The `coverage-map` and `data-quality` checks are still resolved separately via `audit_resolve_checks` (their `details` mirror the relevant Overview panels). + +### e. Analyze identity & segmentation (shared check) + +Reframe identity rules as product-facing capabilities. Identification works differently on the client and the server, so judge per SDK family detected in step 1. #### Capabilities @@ -75,12 +124,12 @@ Reframe identity rules as PM-facing capabilities. Identification works different 4. **Cross-device tracking.** Passes if any `reset` row exists. Server-only projects skip this — the concept doesn't apply. -#### Rendering shape (§3 in the report) +#### Rendering shape (Identity & Segmentation section) Render as **bold lead** + one bold-leading bullet per capability + sub-bullets for granular evidence. 
**No prose paragraphs.** Every capability gets its own bullet — consistent shape across audits is what makes the section scannable. ```markdown -**** +**** - **Cross-session (client)** — . . - @@ -90,193 +139,96 @@ Render as **bold lead** + one bold-leading bullet per capability + sub-bullets f - **Cross-device hygiene** — . . ``` -If a capability doesn't apply (e.g. server-only project for cross-device), still emit the bullet with `n/a — `. Don't omit it. - -#### Resolve the check - -Call `mcp__wizard-tools__audit_resolve_checks` for `identity-segmentation` with status `pass` if all applicable capabilities pass, `warning` if cross-session is partial or one segmentation breakdown is blocked, `error` if cross-session fails. `details` mirrors the rendering shape above. No grades. - -### e. Analyze coverage map (shared check) - -Walk the by-area index from (c). Coverage is qualitative — describe state, don't grade. - -#### Things to call out - -- **Distribution** — how many areas carry events; what kinds of activity they cover (engagement, conversion, content, server-side, etc.). One bullet, factual. -- **Dark surfaces** — areas where captures exist in code but have zero 30-day volume. Name the area and a representative file. Each dark surface gets its own bullet. -- **Reliance on `shared` / `global`** — if these areas carry a large share of captures, flag it: the coverage map can't tell you which user-visible surface fired the event without a follow-up. -- **Person properties without events** — `setPersonProperties` calls in areas that have no `capture` events. Person properties without events mean you can describe the user but can't count their actions. -- **Wrapper-undetected** — if `wrapper_undetected == true` from step 2: "An SDK is installed but no direct capture sites were found. There's likely a wrapper the scanner didn't follow." 
-- **Coarse grouping** — if only one or two `area` buckets exist: "The repo isn't organized by feature; the by-event table is the primary view." - -#### Rendering shape (§4 in the report) - -Render as **bold lead** + one bold-leading bullet per observation + sub-bullets for evidence. **No prose paragraphs.** Use the bullet labels above (`Distribution`, `Dark surface — `, `Reliance on shared/global`, etc.) so multiple audits stay comparable. - -```markdown -**** - -- **Distribution** — distinct areas carry events: . -- **Dark surface — ** — events implemented at `` and `` have zero 30-day volume. . -- **Reliance on `shared`** — all fire from ``. Without a `source` property, you can't tell which page surface triggered them. -- ** sets person properties but emits no events** — ``. You see who but not what they did. -``` - -Skip bullets that don't apply. Don't render an empty "Wrapper-undetected: n/a" bullet. - -#### Resolve the check - -Call `audit_resolve_checks` for `coverage-map` with status `pass` (broad coverage, multiple areas, no dark surfaces), `warning` (one or more dark surfaces, or heavy reliance on `shared`), or `suggestion` (wrapper-undetected or coarse grouping). `details` mirrors the rendering shape above. - -### f. Analyze data quality (shared check) - -Walk the inventory once. Only flag issues that bite a PM building dashboards. - -1. **Name drift** — same concept under two different keys. Heuristic: lowercase + strip underscores; if two keys collapse to the same string, that's drift. Examples: `user_id` vs `userId`, `signup_method` vs `method`. **Splits funnels.** -2. **Type drift on numeric properties** — for keys named `revenue`, `amount`, `price`, `count`, `duration_*`, `quantity`, scan call-site literals; mixing number and string is an error. **Silently zeros out aggregates.** -3. **Conditional-fire undercount** — count rows with `conditional_fire: true` and list affected events. **Funnel undercounts on certain code paths.** -4. 
**Duplicate-event overcount** — same event name on two SDK families. Skip when one is in test files or one explicitly threads `distinctId` from request context. -5. **Phantom events** — `status == "phantom"` rows. List the top offenders. **Either typo, dead code path, or instrumentation that hasn't shipped.** -6. **Unresolved dynamic names** — rows where step 3 left `is_dynamic: true`. Flag as undercount risk. - -#### Rendering shape (§5 in the report) - -Render as **bold lead** stating the worst issue as a PM cost + one bold-leading bullet per issue + sub-bullets for granular evidence (call sites, property unions, paired events). **No prose paragraphs.** - -```markdown -**** - -- **** — . - - - - -- **** — . -``` - -Sort issues by PM cost: type drift > name drift on flagship events > duplicate captures > conditional fires > phantom clusters > unresolved dynamics. The lead bold sentence names whichever issue tops that list. - -#### Resolve the check - -Call `audit_resolve_checks` for `data-quality` with status `pass` (no issues), `warning` (one or two issues), or `error` (type drift, name drift on flagship events, or many phantoms). `details` mirrors the rendering shape above. - -### g. Render the report - -`Write` `posthog-events-audit-report.md` at the project root. Single Markdown file, composed in one model turn. Strip the `BEGIN-REPORT` / `END-REPORT` markers when writing. - -```markdown - -# PostHog events audit – {{repo_name}} - -_Generated {{timestamp}}_ - -This audit lists every event your code captures, where it fires, and how often PostHog has seen it in the last 30 days. The deliverable is the inventory plus three short observations — use the suggested follow-ups at the end to ask Claude focused questions against the inventory. - -## 1. 
Events by volume - -{{one-line summary: " distinct events captured across areas; top areas: ."}} - -| Event | Volume (30d) | Sites | Areas | Properties | -|-------|--------------|-------|-------|------------| -{{event_rows}} - -Notes column conventions: -- `(phantom)` after the event name when `status == "phantom"`. -- `(conditional)` when any site has `conditional_fire == true`. -- Sites column: count of distinct `file:line` (e.g. `3 sites`). -- Areas column: comma-separated unique `area` values for this event (`checkout`, `Posts`). -- Properties column: comma-separated keys, truncated to ~5 with `… (+N more)` if longer. - -### Capture sites per event +If a capability doesn't apply, still emit the bullet with `n/a — `. Don't omit it. + +#### Resolve all three shared checks + +After computing the Overview panels in (d) and the identity capabilities in (e), call `mcp__wizard-tools__audit_resolve_checks` once for each shared check. Stream them as separate calls so the audit-plan tab updates progressively. + +- `identity-segmentation` — status `pass` if all applicable capabilities pass, `warning` if cross-session is partial or one breakdown is blocked, `error` if cross-session fails. `details` mirrors the Identity & Segmentation rendering shape above. +- `coverage-map` — status `pass` (broad area distribution, no dark surfaces), `warning` (one or more dark surfaces, heavy reliance on `shared`, or wrapper-undetected), `suggestion` (coarse grouping). `details` is a short bullet list summarizing what the Area Topology section will show: distribution, dark surfaces, reliance on shared, wrapper-undetected, coarse-grouping. Cite `file:line` per non-pass bullet. +- `data-quality` — status `pass` (no Overview panels render), `warning` (one or two issue panels render), `error` (type drift, name drift on flagship events, or many phantoms). `details` mirrors the Overview panels that rendered: lead with the worst issue stated as product cost (e.g. 
"`revenue` type drift will silently zero out checkout aggregates"), then one bullet per issue. + +### f. Render the report + +The markdown report template lives in `references/5-report-template.md`. The orchestrator reads it once, strips the leading HTML comment block (the placeholder catalog), substitutes every `{{placeholder}}` with values computed in steps (b) through (e), and writes the result to `posthog-events-audit-report.md` at the project root. + +#### Substitution conventions + +These rules tell you how to format each placeholder. The placeholder names themselves are documented in the template's header comment. + +- **`{{repo_name}}`** — the project root directory name. +- **`{{timestamp}}`** — short human-readable date (e.g. `2026-05-09`) or full ISO timestamp. +- **`{{total_volume}}`** — formatted with thousands separator (`310,000`) or compact (`310k`); use compact for totals ≥10,000. +- **`{{distinct_count}}`** — integer; from the by-event records in (b). +- **`{{phantom_count}}`** — integer; render as `0` if no phantoms (the row is still useful at all-zeros). +- **`{{top_10_share}}`** — percentage rounded to nearest whole, e.g. `90%`. +- **`{{overview_panels}}`** — concatenation of the panels from (d), each rendered as: + ```markdown + ### + - + - + ``` + Skip panels with no content. If every panel is empty, render the line `_No issues detected. Naming, types, and capture sites all look consistent._` instead. +- **`{{volume_map_rows}}`** — top 10–15 events from (b), one markdown table row each: `| # | \`event_name\` | volume | share | bar |`. Bar column uses a 12-char Unicode block: `▓` × `round(share × 12)`, padded with `░`. Phantom events sink to the bottom of the table; tag them inline with `· phantom` after the event name in the Event column. +- **`{{volume_map_footnote}}`** — one line stating how many events are in the table vs. total, plus a pointer to where the long tail can be found. 
Example: `Showing top 12 of 51 distinct events; the remaining events appear in the Area topology section below.` +- **`{{capture_sites_collapsibles}}`** — for each event in the volume map, one `<details>` block. Include `package ` in each site bullet only when the event's `packages[]` is non-empty (skip otherwise to keep single-package output uncluttered): + ```markdown + <details> + <summary>purchase_completed — 1,400 events / 3 sites</summary> + + - `apps/web/components/Checkout/Checkout.tsx:88` — package `web`, area `checkout`, route `/checkout`, enclosing `handleSubmit` + - `apps/mobile/Checkout.tsx:44` — package `mobile`, area `checkout`, enclosing `onPaymentSuccess` + + Properties seen: `revenue`, `currency`, `plan` + </details> + ``` + Use HTML `<details>` so the report stays scannable but every site is one click away. +- **`{{area_topology_sections}}`** — for each area from (c). In single-package mode, render flat: + ```markdown + ### ( · events) + - `event_a` — Xk · conditional + - `event_b` — Yk + - `event_c` — Zk · phantom + ``` + In multi-package mode, render nested under each package — the package header is `###`, area is `####`: + ```markdown + ### · areas + #### ( · events) + - `event_a` — Xk + ``` + Use `(unscoped)` as the package header for rows with `package: null`. Annotations after the volume: `· conditional` if the event has `has_conditional`, `· phantom` if `is_phantom`. Both can stack. +- **`{{area_topology_commentary}}`** — one or two short bullets if the topology has notable shapes (e.g. "Auth events all live in `shared` — without a `source` property, you can't tell which page surface triggered each login"). Skip when nothing notable applies. +- **`{{identity_segmentation_details}}`** — the bold-lead-plus-bullets shape from step (e). +- **`{{dynamic_appendix}}`** — bulleted list of unresolved-dynamic rows: `file:line — `. +- **`{{person_properties_appendix}}`** — bulleted list of person property keys from `identify` / `set` / `set_once` rows; deduplicate. +- **`{{groups_appendix}}`** — bulleted list of `group` rows: `: `. + +#### Rendering rules + +- **One `Write` call.** Compose the full substituted markdown in your turn before invoking `Write`. Don't pre-stream the content into assistant text. +- **Plain language, no grades.** Don't render the check `status` enum (`pass`/`warning`/`error`) as a badge or label in the report. Use prominence and word choice — a missing flagship capability leads its section; a nice-to-have is a footnote bullet. +- **`file:line` citations** on every non-pass observation. +- **Fan-out is not used in this step.** The data fits in one turn. -For each event in the table above, render a collapsible-style block: +### g. 
Surface the deliverable and clean up the checklist -```markdown -
-purchase_completed — 1,400 events / 3 sites +**Only one file is produced by this skill:** `posthog-events-audit-report.md`. **Do not write any additional summary, recap, or "what was done" file** (e.g. `posthog-audit-report.md`, `audit-summary.md`, `SUMMARY.md`). The single report from step (f) is the entire deliverable. Don't write an end-of-turn summary as a file — keep that in the chat reply only. -- `src/checkout/Checkout.tsx:88` — area `checkout`, route `/checkout`, enclosing `handleSubmit` -- `mobile/Checkout.tsx:44` — area `checkout`, enclosing `onPaymentSuccess` -- `api/webhooks/stripe.py:120` — area `api/webhooks`, enclosing `handle_payment_intent` +After the report is written and all three shared checks are resolved, delete the checklist file — it's a transient progress ledger: -Properties seen: `revenue`, `currency`, `plan` -
``` - -Use HTML `
` so the report stays scannable but every site is one click away. - -## 2. By area - -A coarse map of where instrumentation lives. - -| Area | Events | 30d volume | -|------|--------|------------| -{{area_index_rows}} - -{{one-line note from step (c): coarse-grouping or normal}} - -## 3. Identity & segmentation - -{{identity_segmentation_details}} - -## 4. Coverage map - -{{coverage_map_details}} - -## 5. Data quality - -{{data_quality_details}} - -## Suggested follow-ups - -You can ask Claude any of these against the inventory at `.posthog-events-inventory.json`: - -- Which of these events fire on ``? (e.g. signup, checkout, onboarding) -- Which events have inconsistent property naming or types? -- Build a funnel from `` to `` and tell me the drop-off. -- Which areas have the highest event volume but the thinnest property coverage? -- Which phantom events look like dead instrumentation we can remove? - -## Appendix – dynamic event names - -Events whose name couldn't be resolved at scan time (template literal, network value, or imported enum). Listed for completeness; not in §1's table. - -{{dynamic_appendix}} - -## Appendix – person properties (`identify` / `set` / `set_once`) - -{{person_properties_appendix}} - -## Appendix – groups (`group`) - -{{groups_appendix}} - -## Appendix – actions - -{{actions_appendix}} - -## About this audit - -Generated by the PostHog events-audit skill. The full inventory is at `.posthog-events-inventory.json` (kept after the run for follow-up questions). Re-run `posthog-wizard events-audit` to refresh. - +Bash: rm -f .posthog-audit-checks.json ``` -### Rendering rules - -- **One `Write` call.** Compose the full Markdown in your turn before invoking `Write`. Don't pre-stream the content into assistant text. -- **Plain language, no grades.** Don't render the check `status` enum (`pass`/`warning`/`error`) as a badge or label in the report. 
Use prominence and word choice — a missing flagship capability leads its section; a nice-to-have is a footnote bullet. -- **`file:line` citations** on every non-pass observation. -- **Fan-out is not used in this step.** The data fits in one turn. - -### h. Surface the deliverables - -The inventory is the deliverable. **Do not delete `.posthog-events-inventory.json`.** +**Do not delete `.posthog-events-inventory.json` yet** — step 6 needs it for the IN-list. Step 6 cleans it up after the dashboard step, regardless of whether the user opts into dashboard creation. -Emit two trailing lines so the wizard can surface both files to the user: +Emit one trailing line so the wizard can surface the report to the user: ``` Created events audit report: -Kept events inventory: ``` ## Resolve -`next_step: null` – the chain ends here. By the end of this step, all three shared checks (`identity-segmentation`, `coverage-map`, `data-quality`) must be resolved via `audit_resolve_checks`. There are no per-flow checks to resolve. +`next_step: 6-dashboard.md`. By the end of this step, all three shared checks (`identity-segmentation`, `coverage-map`, `data-quality`) must be resolved via `audit_resolve_checks`. There are no per-flow checks to resolve. Step 6 handles the optional dashboard creation and the final inventory cleanup. diff --git a/transformation-config/skills/events-audit/references/6-dashboard.md b/transformation-config/skills/events-audit/references/6-dashboard.md new file mode 100644 index 0000000..b21d5b3 --- /dev/null +++ b/transformation-config/skills/events-audit/references/6-dashboard.md @@ -0,0 +1,158 @@ +--- +next_step: null +--- + +# Step 6 – Live dashboard + +The static report shows what your code captures. This step creates a live PostHog dashboard pinned to the same code-confirmed event list, so you can watch volume over time and catch phantoms as they appear. The dashboard is part of the standard audit deliverable — don't ask the user whether to create it. 
If the MCP project isn't writable, fail soft (log the reason, leave the placeholder in the report) and clean up as normal. + +## Status + +Emit, in order: + +``` +[STATUS] Creating dashboard +[STATUS] Creating insights +[STATUS] Linking dashboard in report +[STATUS] Cleaning up +``` + +## MCP tools + +| MCP tool | When | Use | +|----------|------|-----| +| `mcp__posthog-wizard__dashboard-create` | (a) below | Create the parent dashboard. Returns a dashboard with `id` and a PostHog URL. | +| `mcp__posthog-wizard__insight-create` | (b) below | Create each insight, attached to the dashboard via `dashboards: []`. | + +Load both via `ToolSearch select:mcp__posthog-wizard__dashboard-create,mcp__posthog-wizard__insight-create` once at the start of (a). They're write tools — every call mutates the user's PostHog project. + +## Action + +### a. Create the dashboard + +`Read` `.posthog-events-inventory.json` once and rebuild the IN-list — same rule as step 4 (b): every distinct `event_name` from `rows[]` where `call_kind == "capture"` and `is_dynamic == false` and `event_name != null`. Hold it as `IN_LIST` in memory; you'll embed it into each insight's HogQL `source`. + +Call `mcp__posthog-wizard__dashboard-create` with: + +```json +{ + "name": "Wizard events audit – ", + "description": "Live volume view of events captured in the codebase. Generated by the PostHog events-audit skill on . The static report at posthog-events-audit-report.md has the code-side findings.", + "tags": ["events-audit", "wizard"] +} +``` + +Capture the returned `id` as `DASHBOARD_ID` and the returned PostHog URL. + +If the call errors (permission denied, project misconfigured, network), emit one line — `Dashboard creation failed: . Skipping insights.` — and skip to (e). Don't retry. Don't fall back to a different approach. + +### b. Create the three insights + +For each insight, call `mcp__posthog-wizard__insight-create` with `dashboards: [DASHBOARD_ID]` so it's attached on creation. 
The `query` field is a `DataVisualizationNode` wrapping a HogQL query — that's the simplest shape for these three views. + +Embed `IN_LIST` directly in each SQL statement as a comma-separated list of single-quoted event names. Do not use parameter placeholders — the MCP `insight-create` tool persists the query verbatim, so the IN-list has to be inlined. + +#### Insight 1 — Daily volume trend + +```json +{ + "name": "Events audit · Daily volume (30d)", + "description": "Total daily count of code-confirmed events over the last 30 days.", + "dashboards": [], + "query": { + "kind": "DataVisualizationNode", + "display": "ActionsLineGraph", + "source": { + "kind": "HogQLQuery", + "query": "SELECT toDate(timestamp) AS day, count() AS volume FROM events WHERE timestamp > now() - INTERVAL 30 DAY AND event IN () GROUP BY day ORDER BY day" + }, + "chartSettings": { + "xAxis": { "column": "day" }, + "yAxis": [{ "column": "volume" }], + "showLegend": false + } + } +} +``` + +#### Insight 2 — Top events by volume + +```json +{ + "name": "Events audit · Top events by volume (30d)", + "description": "Code-confirmed events ranked by 30-day count.", + "dashboards": [], + "query": { + "kind": "DataVisualizationNode", + "display": "ActionsTable", + "source": { + "kind": "HogQLQuery", + "query": "SELECT event, count() AS volume_30d, max(timestamp) AS last_seen FROM events WHERE timestamp > now() - INTERVAL 30 DAY AND event IN () GROUP BY event ORDER BY volume_30d DESC LIMIT 25" + } + } +} +``` + +#### Insight 3 — Phantom watch + +This insight surfaces events the code references but PostHog hasn't seen recently. Build the query with the IN-list as an inline `VALUES`-style CTE: + +```json +{ + "name": "Events audit · Phantom watch", + "description": "Events captured in code but with zero or near-zero volume in the last 30 days. 
A growing list here usually means dead instrumentation, a typo, or a code path that no longer fires.", + "dashboards": [], + "query": { + "kind": "DataVisualizationNode", + "display": "ActionsTable", + "source": { + "kind": "HogQLQuery", + "query": "WITH code_events AS (SELECT 'event_a' AS name UNION ALL SELECT 'event_b' UNION ALL SELECT 'event_c') SELECT ce.name AS event, coalesce(p.volume, 0) AS volume_30d FROM code_events ce LEFT JOIN (SELECT event, count() AS volume FROM events WHERE timestamp > now() - INTERVAL 30 DAY GROUP BY event) p ON p.event = ce.name ORDER BY volume_30d ASC, event ASC" + } + } +} +``` + +In the actual call, replace the `code_events` CTE's `SELECT 'event_a' ... UNION ALL ...` with one `SELECT ''` per IN-list entry, joined by `UNION ALL`. Keep it on one line (HogQL accepts it). + +If any single `insight-create` call errors, log the failure inline (`Insight "" failed: `) and continue with the rest. A partial dashboard is more useful than no dashboard. + +### c. Patch the dashboard URL into the report + +The report (written in step 5) has a one-line blockquote callout inside the Overview section, immediately after the Overview metric table: + +``` +> **Live dashboard:** _not linked — `dashboard-create` did not succeed during this run. See the run output for the failure reason, then re-run the audit to retry._ +``` + +If at least one insight was created successfully, `Edit` `posthog-events-audit-report.md` to swap that callout for a live link. Use the `Edit` tool with: + +- `old_string`: the full blockquote line above, exactly as written (single line; do not include surrounding blank lines). +- `new_string`: a single blockquote line of the form: + ``` + > **Live dashboard:** []() — daily volume trend, top events, and phantom watch. + ``` + +Substitute `` and `` from the `dashboard-create` response. If one or two insights failed and the rest succeeded, trim the trailing list to mention only the insights that exist (e.g. 
"daily volume trend and top events" if phantom watch failed). + +If every `insight-create` call failed in (b), don't patch the report — leave the placeholder as-is. An empty dashboard isn't worth linking to. Delete the empty dashboard if the MCP project has `mcp__posthog-wizard__dashboard-delete` available; otherwise note "Dashboard created but all insights failed; remove it manually at " and move on. + +### d. Surface the dashboard URL + +Emit one line so the wizard can surface the dashboard to the user: + +``` +Created events audit dashboard: +``` + +### e. Clean up the inventory + +Whether creation succeeded, partially succeeded, or failed — delete the inventory now. It's transient scratch state. + +``` +Bash: rm -f .posthog-events-inventory.json +``` + +## Resolve + +`next_step: null` – the chain ends here. No checks to resolve in step 6 (dashboard creation isn't part of the audit checklist). From 76332ed152d2d650950024879ceb33f096444507 Mon Sep 17 00:00:00 2001 From: Edwin Lim Date: Sun, 10 May 2026 13:20:10 -0400 Subject: [PATCH 3/5] another revision --- .../skills/events-audit/description.md | 63 +++++++------------ .../events-audit/references/1-detect.md | 44 ++++++++++++- .../references/2-scan-subagent-prompt.md | 12 ---- .../skills/events-audit/references/2-scan.md | 2 +- .../references/5-report-template.md | 40 ------------ .../events-audit/references/5-report.md | 4 +- 6 files changed, 68 insertions(+), 97 deletions(-) diff --git a/transformation-config/skills/events-audit/description.md b/transformation-config/skills/events-audit/description.md index 4b54d52..ee1b579 100644 --- a/transformation-config/skills/events-audit/description.md +++ b/transformation-config/skills/events-audit/description.md @@ -1,62 +1,37 @@ # PostHog events audit -This skill produces a product-browseable report of every PostHog event your code captures, mapped to the codebase by file path and enriched with 30-day volume from PostHog. 
The reader does the synthesis on demand by asking follow-up questions about the report — the skill itself doesn't cluster events into flows or write per-flow narratives. - -The checklist has three shared checks: `identity-segmentation`, `coverage-map`, `data-quality`. Finish each one. Don't invent new ids. +This skill produces a product-browseable report of every PostHog event your code captures, mapped to its codebase area and enriched with 30-day volume from PostHog. ## Workflow The audit runs as a 6-step chain: -1. Detect -2. Scan -3. Extract -4. Enrich -5. Report -6. Dashboard +1. Detect SDK +2. Scan events +3. Extract event names +4. MCP query +5. Write report +6. Create dashboard Each step file points to the next. Run them in order. Don't explore the source tree on your own. -The wizard seeds the checklist with the three shared checks before you start. - -Step 1 confirms the shape and reseeds if it's missing or out of date. As you finish each check, patch it with `mcp__wizard-tools__audit_resolve_checks`. - **Start by reading `references/1-detect.md`** (relative to this skill's directory – typically `.claude/skills/events-audit/references/1-detect.md`). Don't read ahead. Don't re-read a step once you've passed it. Don't re-read SKILL.md. -Some tools are deferred by the SDK – load each once via `ToolSearch select:` before first use: `Read`, `Bash`, `Glob`, `Grep`, `Write`, `mcp__wizard-tools__audit_resolve_checks`, `mcp__wizard-tools__audit_seed_checks`, and the PostHog query tool `mcp__posthog-wizard__query-run`. The dashboard write tools `mcp__posthog-wizard__dashboard-create` and `mcp__posthog-wizard__insight-create` are loaded inside step 6. Use `ToolSearch` to load named tools only – don't browse. - -`Agent` is **not** in the default load list. Step 2 is the only place where fan-out is conditional; load `Agent` *inside* step 2, only after deciding to dispatch subagents. - -If the wizard prompt names a framework (e.g. 
"Framework: Flask"), use it to narrow your scans – skip manifests and language patterns that don't apply. - -## When to trigger - -Trigger when the user asks for an event audit, event inventory, or events documentation; "what events does my code capture"; "find redundant or stale events"; or "which product questions can my data answer." - -Don't trigger when the user wants to *add* instrumentation (defer to `instrument-product-analytics`) or debug a single missing event (defer to `diagnosing-missing-recordings`). - -## Live activity – `[STATUS]` - -The "Working on …" banner reads from `[STATUS]` lines you emit in plain text. Whenever you start a sub-step, write a line like: - -``` -[STATUS] Scanning capture sites -``` - -The wizard catches these and updates the spinner. Use them freely – they're cheap. Each step file lists the exact strings to emit. Don't invent your own. +Step 1 seeds the audit checklist as its first action. Don't assume the runtime pre-seeds it. ## The audit checklist -The checklist lives at `.posthog-audit-checks.json` and shows live in the "Audit plan" tab. It's owned by MCP tools – **never `Write` it directly**: +The audit checklist has three shared checks in addition to the event map audit: `identity-segmentation`, `coverage-map`, `data-quality`. Finish each one. Don't invent new ids. + +The checklist lives at `.posthog-audit-checks.json`. It's owned by MCP tools – **never `Write` it directly**. -- `mcp__wizard-tools__audit_resolve_checks({ updates })` - patch one or more checks by `id`. Each `update` is `{ id, status, file?, details? }`. Emit one call per check as you finish its analysis – the "Audit plan" tab updates live, so streaming resolutions one-at-a-time gives the user visible progress instead of a single end-of-step flip. Only batch when you genuinely produce two updates in the same model turn (rare). -- `mcp__wizard-tools__audit_seed_checks({ checks })` - replaces the whole checklist atomically. 
Step 1's fallback uses this when the file is missing or out of date; otherwise don't call it. +## The events inventory -A second file, `.posthog-events-inventory.json`, is the working ledger for steps 2 through 4. It holds the capture sites with derived `package`/`area`/`route`/`enclosing` fields, event names, properties, and per-event volume from PostHog. +A second file, `.posthog-events-inventory.json`, is the working event inventory for steps 2 through 4. It holds the capture sites with derived `package`/`area`/`route`/`enclosing` fields, event names, properties, and per-event volume from PostHog. It's **not** MCP-owned – no `audit_*` tool guards it. The inventory is **transient scratch state**, not a deliverable: step 5 deletes `.posthog-audit-checks.json` once the report is written, and step 6 deletes the inventory after the optional dashboard step. The report is the only artifact the user keeps. -### Check entry shape +Check entry shape: - `id` - stable kebab-case slug. The three shared ids are `identity-segmentation`, `coverage-map`, `data-quality`. - `area` - short group name. Shared entries use `Identity`, `Coverage`, `Data quality`. @@ -71,6 +46,16 @@ It's **not** MCP-owned – no `audit_*` tool guards it. The inventory is **trans - **Frame findings as product questions.** Every finding describes *what product question or insight it blocks*, not what code rule it breaks. - **Hand the reader the map. Don't tell the story for them.** The deliverable is a single report with three short qualitative checks plus a few suggested follow-ups. The reader clusters events into flows on demand by asking targeted follow-up questions about the report — the skill doesn't do that synthesis upfront. +## Live activity – `[STATUS]` + +The "Working on …" banner reads from `[STATUS]` lines you emit in plain text. Whenever you start a sub-step, write a line like: + +``` +[STATUS] Scanning capture sites +``` + +The wizard catches these and updates the spinner. 
Use them freely – they're cheap. Each step file lists the exact strings to emit. Don't invent your own. + ## Abort statuses Report aborts with `[ABORT]` prefixed messages. The wizard catches these and stops the run – don't halt yourself. diff --git a/transformation-config/skills/events-audit/references/1-detect.md b/transformation-config/skills/events-audit/references/1-detect.md index c51fe5a..52956fe 100644 --- a/transformation-config/skills/events-audit/references/1-detect.md +++ b/transformation-config/skills/events-audit/references/1-detect.md @@ -4,19 +4,57 @@ next_step: 2-scan.md # Step 1 – Detect SDKs -Find every PostHog SDK in the project and remember which language(s) and framework(s) the rest of the audit will work on. **Read-only.** Don't scan code for capture sites – that's step 2. +Seed the audit checklist, then find every PostHog SDK in the project and remember which language(s) and framework(s) the rest of the audit will work on. **Read-only on the codebase.** Don't scan code for capture sites – that's step 2. + +## Tools + +Load via `ToolSearch select:Read,Glob,mcp__wizard-tools__audit_seed_checks,mcp__wizard-tools__audit_resolve_checks` once at the start of this step. ## Status -Emit: +Emit, in order: ``` +[STATUS] Seeding audit checklist [STATUS] Detecting SDKs ``` ## Action -### a. Find PostHog SDKs +### a. Seed the audit checklist + +The checklist lives at `.posthog-audit-checks.json` and renders live in the "Audit plan" tab. **Don't rely on the runtime pre-seeding it** — call `mcp__wizard-tools__audit_seed_checks` directly here. The tool replaces the file atomically, so calling it once at the start of every run is safe. 
+ +Pass exactly these three shared checks (`identity-segmentation`, `coverage-map`, `data-quality`): + +```json +{ + "checks": [ + { + "id": "identity-segmentation", + "area": "Identity", + "label": "Identity & segmentation", + "status": "pending" + }, + { + "id": "coverage-map", + "area": "Coverage", + "label": "Coverage map", + "status": "pending" + }, + { + "id": "data-quality", + "area": "Data quality", + "label": "Data quality", + "status": "pending" + } + ] +} +``` + +Don't invent new ids — later steps resolve checks by these exact ids. Don't `Write` the file directly; the MCP tool owns it. + +### b. Find PostHog SDKs `Glob` for the project's dependency manifests across every language PostHog ships an SDK for. The full list: diff --git a/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md b/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md index b9f7b00..a647002 100644 --- a/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md +++ b/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md @@ -1,15 +1,3 @@ - - You are an events-audit enrichment subagent. You will read source files and write enriched capture rows to a part-file. Do not return the rows in your final message — write to disk only. Inputs: diff --git a/transformation-config/skills/events-audit/references/2-scan.md b/transformation-config/skills/events-audit/references/2-scan.md index f3f3f33..10f249c 100644 --- a/transformation-config/skills/events-audit/references/2-scan.md +++ b/transformation-config/skills/events-audit/references/2-scan.md @@ -103,7 +103,7 @@ Count distinct files in the base inventory. Load `Agent` once: `ToolSearch select:Agent`. -Read `references/2-scan-subagent-prompt.md`. 
Strip the leading HTML comment block (everything between ``, inclusive), then substitute: +Read `references/2-scan-subagent-prompt.md`, then substitute: - `{{N}}` — the partition number for that subagent (`1`, `2`, ..., up to N) - `{{ROW_IDS}}` — JSON array of the row IDs assigned to that subagent diff --git a/transformation-config/skills/events-audit/references/5-report-template.md b/transformation-config/skills/events-audit/references/5-report-template.md index fdbe0e7..54688c7 100644 --- a/transformation-config/skills/events-audit/references/5-report-template.md +++ b/transformation-config/skills/events-audit/references/5-report-template.md @@ -1,43 +1,3 @@ - - # PostHog events audit – {{repo_name}} _Generated {{timestamp}}_ diff --git a/transformation-config/skills/events-audit/references/5-report.md b/transformation-config/skills/events-audit/references/5-report.md index b03d94f..36e42c4 100644 --- a/transformation-config/skills/events-audit/references/5-report.md +++ b/transformation-config/skills/events-audit/references/5-report.md @@ -12,7 +12,7 @@ The skill's job is to give the reader a map plus a few short observations. **Don This step uses one supporting reference file (not part of the chain): -- `references/5-report-template.md` — verbatim markdown template for the rendered report. Orchestrator reads it once at step (f), strips the leading HTML comment block, substitutes every `{{placeholder}}`, and writes the result to `posthog-events-audit-report.md`. +- `references/5-report-template.md` — verbatim markdown template for the rendered report. Orchestrator reads it once at step (f), substitutes every `{{placeholder}}`, and writes the result to `posthog-events-audit-report.md`. ## Output discipline @@ -151,7 +151,7 @@ After computing the Overview panels in (d) and the identity capabilities in (e), ### f. Render the report -The markdown report template lives in `references/5-report-template.md`. 
The orchestrator reads it once, strips the leading HTML comment block (the placeholder catalog), substitutes every `{{placeholder}}` with values computed in steps (b) through (e), and writes the result to `posthog-events-audit-report.md` at the project root. +The markdown report template lives in `references/5-report-template.md`. The orchestrator reads it once, substitutes every `{{placeholder}}` with values computed in steps (b) through (e), and writes the result to `posthog-events-audit-report.md` at the project root. #### Substitution conventions From 120ef64145a222a265441f2cc0e110412abb3c64 Mon Sep 17 00:00:00 2001 From: Edwin Lim Date: Sun, 10 May 2026 13:44:45 -0400 Subject: [PATCH 4/5] fan out on step 3 --- .../skills/events-audit/description.md | 6 +- .../references/2-scan-subagent-prompt.md | 31 ---- .../skills/events-audit/references/2-scan.md | 161 +++++++----------- ...an-enrichment.md => 3-enrich-reference.md} | 30 +--- .../references/3-enrich-subagent-prompt.md | 72 ++++++++ .../events-audit/references/3-enrich.md | 99 +++++++++++ .../events-audit/references/3-extract.md | 54 ------ .../references/{4-mcp-query.md => 4-query.md} | 2 +- .../references/5-report-template.md | 2 +- .../events-audit/references/5-report.md | 22 ++- .../events-audit/references/6-dashboard.md | 25 +-- 11 files changed, 274 insertions(+), 230 deletions(-) delete mode 100644 transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md rename transformation-config/skills/events-audit/references/{2-scan-enrichment.md => 3-enrich-reference.md} (65%) create mode 100644 transformation-config/skills/events-audit/references/3-enrich-subagent-prompt.md create mode 100644 transformation-config/skills/events-audit/references/3-enrich.md delete mode 100644 transformation-config/skills/events-audit/references/3-extract.md rename transformation-config/skills/events-audit/references/{4-mcp-query.md => 4-query.md} (99%) diff --git 
a/transformation-config/skills/events-audit/description.md b/transformation-config/skills/events-audit/description.md index ee1b579..17c7dc4 100644 --- a/transformation-config/skills/events-audit/description.md +++ b/transformation-config/skills/events-audit/description.md @@ -7,9 +7,9 @@ This skill produces a product-browseable report of every PostHog event your code The audit runs as a 6-step chain: 1. Detect SDK -2. Scan events -3. Extract event names -4. MCP query +2. Scan capture sites (grep only) +3. Enrich (subagent fan-out — the only step that reads source files) +4. Query PostHog for volume 5. Write report 6. Create dashboard diff --git a/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md b/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md deleted file mode 100644 index a647002..0000000 --- a/transformation-config/skills/events-audit/references/2-scan-subagent-prompt.md +++ /dev/null @@ -1,31 +0,0 @@ -You are an events-audit enrichment subagent. You will read source files and write enriched capture rows to a part-file. Do not return the rows in your final message — write to disk only. - -Inputs: -- Read `.posthog-events-inventory.json` once. The `rows` array contains base rows with `id`, `file`, `line`, `raw_match`, `event_name_hint`. -- Process only rows whose `id` is in this list: {{ROW_IDS}}. - -For each assigned row, read its file **once** (cache by file path; multiple rows in the same file share one `Read`). For each row, produce an enriched row with these fields: - -- `id`, `file`, `line` — copy from the base row. -- `sdk` — one of `posthog-js`, `posthog-node`, `posthog-python`, `posthog-ruby`, `posthog-go`, `posthog-ios`, `posthog-android`, `posthog-react-native`, `posthog-flutter`, `posthog-php`, `posthog-dotnet`, `posthog-elixir`. -- `call_kind` — one of `capture`, `identify`, `set`, `set_once`, `group`, `alias`, `reset`. 
-- `event_name` — the literal string in the event-name slot (resolve from the full call expression, not just the grep line). For dynamic names (variable, template literal, expression), set `null` and `is_dynamic: true`. -- `is_dynamic` — `true` if `event_name` couldn't be resolved to a literal. -- `properties` — array of property keys from the properties argument (object literal / dict / hash). Empty array if the call passes a variable; empty array for non-capture `call_kind`s. -- `conditional_fire` — `true` if the call sits inside an `if` / ternary / guard that depends on something other than user identity. -- `distinct_id_kind` — server-side SDKs only: `"variable"` | `"literal"` | `"missing"`. `null` for client-side rows. -- `package` — monorepo package name from `apps//`, `packages//`, `services//`, or `projects//` prefix. `null` for single-app repos. See the `package` rules in the enrichment reference. -- `area` — codebase bucket from the file path (computed *after* the `package` prefix is stripped). -- `route` — Next.js route if applicable, otherwise `null`. -- `enclosing` — nearest enclosing function/component name from a backward scan. -- `status` — `"pending"`. -- `volume_30d` — `null`. -- `last_seen` — `null`. - -Skip `$pageview` and `$pageleave` from the SDK — they are SDK-internal except in rare manual setups. If a base row's `raw_match` shows `$pageview` / `$pageleave`, drop the row (don't emit it in your part-file). - -When you have all enriched rows, `Write` `.posthog-events-inventory.part-{{N}}.json` with a JSON array of the rows (no wrapper object — just `[...]`). Pretty-print with two-space indent. - -Final message: respond with exactly one line — `"wrote part-{{N}} with M rows"` — where `M` is the count. Do NOT include the rows in your message. Do NOT recap. Just the one line. 
- -Reference: read `.claude/skills/events-audit/references/2-scan-enrichment.md` once for per-SDK call signatures, identification surfaces, and the `area` / `route` / `enclosing` rules. diff --git a/transformation-config/skills/events-audit/references/2-scan.md b/transformation-config/skills/events-audit/references/2-scan.md index 10f249c..25b84dd 100644 --- a/transformation-config/skills/events-audit/references/2-scan.md +++ b/transformation-config/skills/events-audit/references/2-scan.md @@ -1,45 +1,34 @@ --- -next_step: 3-extract.md +next_step: 3-enrich.md --- -# Step 2 – Scan capture sites (two-phase) +# Step 2 – Scan capture sites -Find every PostHog capture/identify/group SDK call in the codebase, derive the codebase mapping (`area`, `route`, `enclosing`), and extract per-call fields. Write the inventory to disk **without ever materializing the full enriched JSON in a single model turn.** +Find every PostHog capture/identify/group SDK call in the codebase via a single `Grep` and write a base inventory. **Read-only via Grep.** Don't `Read` any source files in this step — file-level enrichment happens in step 3. -The previous architecture collapsed enrichment + merge into one orchestrator turn and crashed at `max_tokens` on a 51-file project. This step is split into three phases that respect that limit: +This step is one Grep, one Write. No file Reads, no subagents, no MCP. Severity, flows, and identity analysis come later. -1. **Phase 1 — orchestrator structural pass.** One Grep, write a small base inventory with `file` / `line` / `event_name_hint` per row. -2. **Phase 2 — subagent enrichment fan-out.** All subagents dispatched in **one assistant turn**. Each subagent enriches a slice of rows and writes a part-file. Subagents return a one-line confirmation, never the JSON. -3. **Phase 3 — orchestrator concat via `jq`.** A single Bash call merges part-files into the canonical inventory. Zero output tokens for the merge. 
+## Tools -Don't judge severity, don't infer flows, don't call MCP — those come later. - -## Supporting files - -This step uses two supporting reference files (not part of the chain): - -- `references/2-scan-subagent-prompt.md` — verbatim subagent prompt template. Orchestrator reads it once at phase 2 start, substitutes `{{N}}` and `{{ROW_IDS}}`, passes the result to each `Agent` invocation. -- `references/2-scan-enrichment.md` — per-SDK call signatures, identification surfaces, `area` / `route` / `enclosing` rules. Subagents read it once during enrichment; the orchestrator does not. +Load via `ToolSearch select:Grep,Write` once at the start of this step. ## Status Emit, in order: ``` -[STATUS] Scanning capture sites -[STATUS] Writing base inventory -[STATUS] Enriching capture sites -[STATUS] Merging part-files +[STATUS] Scanning SDK capture sites +[STATUS] Writing base event inventory ``` -## Phase 1 — Orchestrator structural pass +## Action -### a. Grep for direct SDK calls +### a. Grep for direct SDK calls (with context) -Run a single `Grep` for the standard PostHog call shapes. Narrow `--include` to the languages step 1 detected — don't scan `*.kt` if the project is Python. +Run a single `Grep` for the standard PostHog call shapes. Use `-A 3` so multi-line capture calls are visible without opening the file. Narrow `--include` to the languages step 1 detected — don't scan `*.kt` if the project is Python. ``` -Grep -rn -E 'posthog\??\.(capture|identify|alias|group|setPersonProperties|setPersonPropertiesForFlags|reset)|usePostHog\(\)\??\.(capture|identify)|client\??\.capture|PostHog\??\.(shared|capture)|Posthog\(\)\??\.capture' +Grep -rn -B 0 -A 3 -E 'posthog\??\.(capture|identify|alias|group|setPersonProperties|setPersonPropertiesForFlags|reset)|usePostHog\(\)\??\.(capture|identify)|client\??\.capture|PostHog\??\.(shared|capture)|Posthog\(\)\??\.capture' ``` The `\??\.` matches both `posthog.capture(...)` and `posthog?.capture(...)` (optional chaining). 
JS/TS codebases routinely guard SDK calls with `?.` when the SDK may be uninitialised — missing this pattern undercounts the inventory by half or more. @@ -58,102 +47,78 @@ Common include patterns: **Exclude test files.** Drop hits in paths matching `*.test.*`, `*.spec.*`, `__tests__/**`, `tests/**`, `spec/**`. They pollute the inventory. +#### Per-SDK call signatures (covered by the regex above) + +Canonical reference for what a PostHog capture call looks like in each SDK. The grep regex above is a union of these shapes; step 3 subagents also use this table to find `event_name` and `properties` slots when extracting (they `Read` this file once at start). + +| SDK | Capture pattern | Event-name position | Properties position | +|-----|-----------------|---------------------|---------------------| +| posthog-js | `posthog.capture("event", { props })` | positional 1 | positional 2 (object literal) | +| posthog-js (hook) | `usePostHog().capture("event", { props })` | positional 1 | positional 2 | +| posthog-node | `client.capture({ distinctId, event, properties })` | object key `event` | object key `properties` | +| posthog-python | `posthog.capture(distinct_id, "event", properties)` | positional 2 | positional 3 (dict) | +| posthog-ruby | `posthog.capture({ distinct_id:, event:, properties: })` | hash key `event` | hash key `properties` | +| posthog-go | `client.Enqueue(posthog.Capture{Event: "...", Properties: posthog.NewProperties()...})` | struct field `Event` | struct field `Properties` | +| posthog-ios | `PostHog.shared.capture("event", properties: ["k": "v"])` | positional 1 | named `properties` | +| posthog-android | `PostHog.capture("event", properties = mapOf("k" to "v"))` | positional 1 | named `properties` | +| posthog-react-native | Same shape as posthog-js | positional 1 | positional 2 | +| posthog-flutter | `Posthog().capture(eventName: "...", properties: { ... 
})` | named `eventName` | named `properties` | +| posthog-php | `PostHog::capture(['distinctId' => ..., 'event' => '...', 'properties' => [...]])` | array key `event` | array key `properties` | +| posthog-dotnet | `client.Capture(distinctId, "event", new() { ["k"] = "v" })` | positional 2 | positional 3 | +| posthog-elixir | `Posthog.capture("event", distinct_id, %{ k: v })` | positional 1 | positional 3 | + If the result is empty: -- And the project's manifest had a PostHog SDK in step 1 → the codebase likely wraps the SDK behind a custom helper. Write `{ "rows": [], "wrapper_undetected": true }` to `.posthog-events-inventory.json` and skip phases 2 and 3 (move on to step 3). The data-quality check in the report step will flag this. + +- And the project's manifest had a PostHog SDK in step 1 → the codebase likely wraps the SDK behind a custom helper. Write `{ "rows": [], "wrapper_undetected": true }` to `.posthog-events-inventory.json` and skip the rest of this step (move on to step 3, which will short-circuit on empty rows). The data-quality check in the report step will flag this. - And no SDK was in the manifest either → emit `[ABORT] No capture call sites found in any detected SDK`. -### b. Write the base inventory +### b. Parse grep output into row groups -Build base rows directly from the grep result text. **Do not read any source files in phase 1.** Each row has only what's available from the grep line itself: +`Grep -A 3` emits one trigger line plus up to three following lines per match, separated by `--` divider lines (when running across files) or contiguous when matches are adjacent. For each match: -```jsonc -{ - "id": "capture--", - "file": "src/checkout/Checkout.tsx", - "line": 88, - "raw_match": " posthog.capture(\"purchase_completed\", { revenue, currency });", - "event_name_hint": "purchase_completed" -} -``` +- The trigger line is `path:line:content` — the `.capture(` / `.identify(` / etc. site. 
+- The following 0–3 lines are continuations from the same file. +- Group them as a "slice" — the trigger line plus its trailing context lines. + +The slice is what you reason about in step (c). You don't need to re-grep or open the file. -`event_name_hint` is best-effort: extract the first quoted string from `raw_match` (single, double, or backtick-quoted). For multi-line capture calls (`posthog.capture(\n "...", ...)`) the hint will be `null` — phase 2 resolves the canonical name by reading the file. **Don't try to be clever with regex here.** If the first quoted string is on the same line as the `.capture(` token, take it; otherwise leave `null`. +### c. Build base rows -`Write` `.posthog-events-inventory.json` with the base rows. This file is small (~40 bytes per row × 100 rows ≈ 4KB) so the Write fits in one turn easily. +For each grouped slice, build one row: ```jsonc { - "rows": [ ], - "wrapper_undetected": false, - "_phase": "base" + "id": "capture--", + "file": "src/checkout/Checkout.tsx", + "line": 88, + "raw_match": "", + "event_name": "purchase_completed", + "is_dynamic": false } ``` -The `_phase: "base"` marker tells you this file is not yet enriched. Phase 3 overwrites it. - -## Phase 2 — Subagent enrichment fan-out - -### c. Decide the partition - -Count distinct files in the base inventory. - -- **≤ 8 distinct files**: skip fan-out. The orchestrator handles enrichment inline (one subagent's worth of work; the merge is small). Skip phase 2's `Agent` dispatch and proceed straight to enrichment via direct `Read` + `Write` of the part-file convention. -- **> 8 distinct files**: fan out. `N = ceil(files / 10)`, capped at 8. Round-robin assign files alphabetically to N groups; each group's row-id list is what the subagent receives. Don't bother estimating file sizes — the orchestrator's job is dispatch, not load-balancing. - -### d. 
Spawn N sub-agents in parallel using the `Agent` tool +`event_name` resolution rule: extract the **first quoted string literal** (single, double, or backtick-quoted) found anywhere in the slice. If the first non-whitespace argument inside the parentheses is a quoted literal, take it. Otherwise: -Load `Agent` once: `ToolSearch select:Agent`. +- The slice contains a quoted literal but it's clearly a property value (e.g. `{ revenue: "USD" }`) and not the event name → keep scanning forward to find the event-name slot, or fall through to dynamic. +- The slice contains no quoted literal at all → set `event_name: null`, `is_dynamic: true`. Step 3's subagents will retry via Pattern A/B (same-file constant / enum) when they read the file. +- The argument is a template literal (`` `name_${...}` ``), variable, or expression → set `event_name: null`, `is_dynamic: true`. -Read `references/2-scan-subagent-prompt.md`, then substitute: +**Don't try to be clever.** If the slice doesn't make the literal obvious, leave it dynamic — step 3 has the file open and will resolve what it can. -- `{{N}}` — the partition number for that subagent (`1`, `2`, ..., up to N) -- `{{ROW_IDS}}` — JSON array of the row IDs assigned to that subagent +Skip `$pageview` and `$pageleave` matches entirely — they're SDK-internal in most setups. Drop those rows; they don't go into the inventory. -The substituted text is the full prompt for that subagent. +### d. Write the base inventory -**Spawn all N sub-agents in parallel using the `Agent` tool — one assistant turn, N tool_use blocks in the same message.** Sequential dispatch (one Agent per turn) loses ~30s of orchestration latency for no reason; the prior diagnostic confirmed this. Batch them. +`Write` `.posthog-events-inventory.json` with the rows: -Set `run_in_background: false` — you want their results before the merge. - -### e. Wait for all subagents to return - -Each subagent returns a single confirmation line (`"wrote part-N with M rows"`). 
Verify each part-file exists before phase 3: - -``` -Bash: for n in 1 2 ... N; do test -f .posthog-events-inventory.part-$n.json || echo "MISSING: part-$n"; done -``` - -If any part-file is missing, the subagent failed. Re-dispatch only the failed subagent with the same row-id slice. Don't re-run successful subagents. - -## Phase 3 — Concat via jq - -### f. Merge part-files into the canonical inventory - -One `Bash` call: - -``` -jq -s '{rows: (add | sort_by(.file, .line)), wrapper_undetected: false}' .posthog-events-inventory.part-*.json > .posthog-events-inventory.json && rm .posthog-events-inventory.part-*.json -``` - -This: -- Slurps every part-file as an array of arrays -- `add` flattens to a single rows array -- `sort_by(.file, .line)` produces a stable, readable order -- Wraps in `{rows, wrapper_undetected}` -- Overwrites the base inventory with the enriched one -- Cleans up part-files - -The orchestrator never has to materialize the merged JSON in a model turn — `jq` does the merge in shell, costing zero output tokens. - -If `jq` isn't available on the user's system, fall back to a Bash one-liner using Python: - -``` -python3 -c "import json,glob; rows=[] -[rows.extend(json.load(open(f))) for f in sorted(glob.glob('.posthog-events-inventory.part-*.json'))] -rows.sort(key=lambda r: (r['file'], r['line'])) -json.dump({'rows': rows, 'wrapper_undetected': False}, open('.posthog-events-inventory.json','w'), indent=2)" && rm .posthog-events-inventory.part-*.json +```jsonc +{ + "rows": [ ], + "wrapper_undetected": false +} ``` -Don't try to merge in a model turn. That's the rule that crashed the previous run. +This file is small (~80 bytes per row × 100 rows ≈ 8KB) so the Write fits in one turn easily. 
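Reviewer sketch (not part of the patch): the `event_name` resolution rule from step (c) of the new `2-scan.md` could look roughly like the following. This is a conservative illustration that assumes the posthog-js call shape, where the event name is the first positional argument. The helper names `resolve_event_name` and `build_base_row` are hypothetical, and the `id` field is left out because its format is not fully visible in this patch.

```python
import re

# Events the scan step drops because they are SDK-internal in most setups.
SKIPPED_EVENTS = {"$pageview", "$pageleave"}


def resolve_event_name(slice_text):
    """Return (event_name, is_dynamic) for one grep slice.

    Only takes a literal when it is the first argument after `.capture(`;
    anything else is left dynamic for step 3 to retry.
    """
    match = re.search(r"\.capture\(\s*", slice_text)
    if match is None:
        return None, True
    rest = slice_text[match.end():]
    if not rest or rest[0] not in "'\"`":
        return None, True  # variable, expression, or object argument
    quote = rest[0]
    end = rest.find(quote, 1)
    if end == -1:
        return None, True  # literal runs past the end of the slice
    name = rest[1:end]
    if quote == "`" and "${" in name:
        return None, True  # template literal with interpolation
    return name, False


def build_base_row(file, line, slice_text):
    """Build one base row, or None when the row should be dropped."""
    name, is_dynamic = resolve_event_name(slice_text)
    if name in SKIPPED_EVENTS:
        return None
    return {
        "file": file,
        "line": line,
        "raw_match": slice_text,
        "event_name": name,
        "is_dynamic": is_dynamic,
    }
```

Because the helper refuses to guess, a property value such as `"USD"` is never mistaken for the event name. SDKs that put the event name elsewhere (for example, positional 2 in posthog-python) would simply come out as dynamic under this sketch.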
## Notes on wrapper resolution diff --git a/transformation-config/skills/events-audit/references/2-scan-enrichment.md b/transformation-config/skills/events-audit/references/3-enrich-reference.md similarity index 65% rename from transformation-config/skills/events-audit/references/2-scan-enrichment.md rename to transformation-config/skills/events-audit/references/3-enrich-reference.md index 37384cc..83305c5 100644 --- a/transformation-config/skills/events-audit/references/2-scan-enrichment.md +++ b/transformation-config/skills/events-audit/references/3-enrich-reference.md @@ -1,26 +1,10 @@ -# Step 2 enrichment reference - -Lookup tables and rules subagents apply during step 2 enrichment. Read this file **once** at the start of your enrichment run. - -This file is supporting material for step 2; it has no `next_step` and is not part of the main step chain. The orchestrator does not read it. - -## Per-SDK capture call signatures - -| SDK | Capture pattern | Event-name position | Properties position | -|-----|-----------------|---------------------|---------------------| -| posthog-js | `posthog.capture("event", { props })` | positional 1 | positional 2 (object literal) | -| posthog-js (hook) | `usePostHog().capture("event", { props })` | positional 1 | positional 2 | -| posthog-node | `client.capture({ distinctId, event, properties })` | object key `event` | object key `properties` | -| posthog-python | `posthog.capture(distinct_id, "event", properties)` | positional 2 | positional 3 (dict) | -| posthog-ruby | `posthog.capture({ distinct_id:, event:, properties: })` | hash key `event` | hash key `properties` | -| posthog-go | `client.Enqueue(posthog.Capture{Event: "...", Properties: posthog.NewProperties()...})` | struct field `Event` | struct field `Properties` | -| posthog-ios | `PostHog.shared.capture("event", properties: ["k": "v"])` | positional 1 | named `properties` | -| posthog-android | `PostHog.capture("event", properties = mapOf("k" to "v"))` | positional 1 
| named `properties` | -| posthog-react-native | Same shape as posthog-js | positional 1 | positional 2 | -| posthog-flutter | `Posthog().capture(eventName: "...", properties: { ... })` | named `eventName` | named `properties` | -| posthog-php | `PostHog::capture(['distinctId' => ..., 'event' => '...', 'properties' => [...]])` | array key `event` | array key `properties` | -| posthog-dotnet | `client.Capture(distinctId, "event", new() { ["k"] = "v" })` | positional 2 | positional 3 | -| posthog-elixir | `Posthog.capture("event", distinct_id, %{ k: v })` | positional 1 | positional 3 | +# Step 3 enrichment reference + +Lookup tables and rules subagents apply during step 3 enrichment. Read this file **once** at the start of your enrichment run. + +This file is supporting material for step 3; it has no `next_step` and is not part of the main step chain. The orchestrator does not read it. + +The per-SDK capture call signatures (where `event_name` and `properties` live in each SDK's call shape) are in `2-scan.md` under "Per-SDK call signatures". Read that section once at the start of your enrichment run alongside this file — you'll need it to extract `event_name` and `properties`. ## Identification surfaces diff --git a/transformation-config/skills/events-audit/references/3-enrich-subagent-prompt.md b/transformation-config/skills/events-audit/references/3-enrich-subagent-prompt.md new file mode 100644 index 0000000..419522a --- /dev/null +++ b/transformation-config/skills/events-audit/references/3-enrich-subagent-prompt.md @@ -0,0 +1,72 @@ +You are an events-audit enrichment subagent. You will read source files and write enriched capture rows to a part-file. Do not return the rows in your final message — write to disk only. + +Inputs: +- Read `.posthog-events-inventory.json` once. The `rows` array contains base rows with `id`, `file`, `line`, `raw_match`, `event_name`, `is_dynamic`. 
Step 2 already resolved most `event_name` values from grep output; rows where the literal wasn't visible from the grep slice are flagged `is_dynamic: true` with `event_name: null`. +- Process only rows whose `id` is in this list: {{ROW_IDS}}. + +For each assigned row, read its file **once** (cache by file path; multiple rows in the same file share one `Read`). For each row, produce an enriched row with these fields: + +- `id`, `file`, `line` — copy from the base row. +- `sdk` — one of `posthog-js`, `posthog-node`, `posthog-python`, `posthog-ruby`, `posthog-go`, `posthog-ios`, `posthog-android`, `posthog-react-native`, `posthog-flutter`, `posthog-php`, `posthog-dotnet`, `posthog-elixir`. +- `call_kind` — one of `capture`, `identify`, `set`, `set_once`, `group`, `alias`, `reset`. +- `event_name` — see "Retroactive name resolution" below. For most rows, copy from the base row. For rows step 2 left dynamic, try Pattern A / Pattern B; otherwise copy `null`. +- `is_dynamic` — `true` if `event_name` couldn't be resolved to a literal (after Pattern A/B retry). `false` once resolution succeeds. +- `properties` — array of property keys from the properties argument (object literal / dict / hash). Empty array if the call passes a variable; empty array for non-capture `call_kind`s. +- `conditional_fire` — `true` if the call sits inside an `if` / ternary / guard that depends on something other than user identity. +- `distinct_id_kind` — server-side SDKs only: `"variable"` | `"literal"` | `"missing"`. `null` for client-side rows. +- `package` — monorepo package name from `apps//`, `packages//`, `services//`, or `projects//` prefix. `null` for single-app repos. See the `package` rules in the enrichment reference. +- `area` — codebase bucket from the file path (computed *after* the `package` prefix is stripped). +- `route` — Next.js route if applicable, otherwise `null`. +- `enclosing` — nearest enclosing function/component name from a backward scan. +- `status` — `"pending"`. 
+- `volume_30d` — `null`. +- `last_seen` — `null`. + +## Retroactive name resolution (only for rows where `is_dynamic: true` from step 2) + +For rows the orchestrator left dynamic, you have the file open already — try the patterns below before falling back to `is_dynamic: true`. Don't try these for rows where step 2 already resolved `event_name`; trust the base row. + +### Pattern A — constant inlining + +```ts +const EVENT = "signup_completed"; +posthog.capture(EVENT, { method }); +``` + +If the capture's first argument is a `const` / `let` / `final` / module-level variable in the same file, has a literal initializer, and is never reassigned, inline its value. If it's reassigned anywhere, leave the row dynamic. + +### Pattern B — enum / object dispatch + +```ts +const EVENTS = { + SIGNUP_COMPLETED: "signup_completed", + CHECKOUT_STARTED: "checkout_started", +} as const; + +posthog.capture(EVENTS.SIGNUP_COMPLETED, { ... }); +``` + +If the property access targets an object literal in the same module and every value is a literal, inline the resolved value. Don't resolve enums imported from other modules — leave dynamic. + +### What you don't resolve + +- Names built with template literals: `` `signup_${variant}` ``. Leave dynamic. +- Names imported from another module (other than the same-file enum pattern). Leave dynamic. +- Names from network responses or feature-flag values. Leave dynamic. +- **Wrapper / function-arg passthrough.** If the dynamic name is a function parameter (`posthog.capture(eventName, ...)` where `eventName` is the enclosing function's argument), leave dynamic — chasing callers across files is intentionally out of scope. The report step's suggested follow-ups list points the reader at this case so they can ask Claude to resolve specific wrappers on demand. + +When a row can't be resolved, leave it `is_dynamic: true` with `event_name: null`. 
The data-quality check counts these as undercount risk; the report's by-event table omits them (they appear only in a "dynamic captures" footnote). + +## $pageview / $pageleave + +Skip `$pageview` and `$pageleave` from the SDK — they are SDK-internal except in rare manual setups. If a base row's `raw_match` shows `$pageview` / `$pageleave`, drop the row (don't emit it in your part-file). Step 2 already drops these in most cases, but double-check. + +## Output + +When you have all enriched rows, `Write` `.posthog-events-inventory.part-{{N}}.json` with a JSON array of the rows (no wrapper object — just `[...]`). Pretty-print with two-space indent. + +Final message: respond with exactly one line — `"wrote part-{{N}} with M rows"` — where `M` is the count. Do NOT include the rows in your message. Do NOT recap. Just the one line. + +References to read once at the start of your run: +- `.claude/skills/events-audit/references/3-enrich-reference.md` — identification surfaces, `package` / `area` / `route` / `enclosing` rules. +- `.claude/skills/events-audit/references/2-scan.md` — only the "Per-SDK call signatures" table, for extracting `event_name` and `properties` from each SDK's call shape. diff --git a/transformation-config/skills/events-audit/references/3-enrich.md b/transformation-config/skills/events-audit/references/3-enrich.md new file mode 100644 index 0000000..932de81 --- /dev/null +++ b/transformation-config/skills/events-audit/references/3-enrich.md @@ -0,0 +1,99 @@ +--- +next_step: 4-query.md +--- + +# Step 3 – Enrich capture sites (subagent fan-out) + +For each row in the base inventory, read the source file once and produce the full enrichment fields: `sdk`, `call_kind`, `properties`, `conditional_fire`, `distinct_id_kind`, `package`, `area`, `route`, `enclosing`. Also retroactively resolve `event_name` for any row step 2 left dynamic (Pattern A: same-file constant inlining; Pattern B: same-file enum dispatch). 
+ +**This is the only step that `Read`s source files.** Step 2 worked from grep output alone; step 3 owns all file I/O. Subagent enrichment fans out across files in parallel, so the orchestrator never materializes the full enriched JSON in a single model turn — that crashed prior runs at `max_tokens`. + +The step has three phases: + +1. **Phase 1 — orchestrator structural pass.** Decide partition based on distinct file count. +2. **Phase 2 — subagent enrichment fan-out.** All subagents dispatched in **one assistant turn**. Each subagent enriches a slice of rows and writes a part-file. +3. **Phase 3 — orchestrator concat via `jq`.** A single Bash call merges part-files into the canonical inventory. + +## Tools + +Load `Read`, `Write`, and `Bash` via `ToolSearch select:Read,Write,Bash` once at the start of this step. Load `Agent` only inside Phase 2 if fan-out is needed (see partition rules below). + +## Supporting files + +This step uses two supporting reference files (not part of the chain): + +- `references/3-enrich-subagent-prompt.md` — verbatim subagent prompt template. Orchestrator reads it once at phase 2 start, substitutes `{{N}}` and `{{ROW_IDS}}`, passes the result to each `Agent` invocation. +- `references/3-enrich-reference.md` — per-SDK call signatures, identification surfaces, `package` / `area` / `route` / `enclosing` rules. Subagents read it once during enrichment; the orchestrator does not. + +## Status + +Emit, in order: + +``` +[STATUS] Spawning sub-agents for synthesis +[STATUS] Enriching capture sites +[STATUS] Merging part-files +``` + +## Phase 1 — Decide the partition + +`Read` `.posthog-events-inventory.json` once. If `rows[]` is empty, skip phases 2 and 3 entirely and continue to step 4. + +Count distinct files in the base inventory. + +- **≤ 8 distinct files**: skip fan-out. The orchestrator handles enrichment inline (one subagent's worth of work; the merge is small). 
Read each file directly, apply the subagent enrichment rules from `3-enrich-reference.md`, and write a single part-file `.posthog-events-inventory.part-1.json`. Then proceed to phase 3. +- **> 8 distinct files**: fan out. `N = ceil(files / 10)`, capped at 8. Round-robin assign files alphabetically to N groups; each group's row-id list is what the subagent receives. Don't bother estimating file sizes — the orchestrator's job is dispatch, not load-balancing. + +## Phase 2 — Spawn N sub-agents in parallel + +Load `Agent` once: `ToolSearch select:Agent`. + +Read `references/3-enrich-subagent-prompt.md`, then substitute: + +- `{{N}}` — the partition number for that subagent (`1`, `2`, ..., up to N) +- `{{ROW_IDS}}` — JSON array of the row IDs assigned to that subagent + +The substituted text is the full prompt for that subagent. + +**Spawn all N sub-agents in parallel using the `Agent` tool — one assistant turn, N tool_use blocks in the same message.** Sequential dispatch (one Agent per turn) loses ~30s of orchestration latency for no reason. Batch them. + +Set `run_in_background: false` — you want their results before the merge. + +### Wait for all subagents to return + +Each subagent returns a single confirmation line (`"wrote part-N with M rows"`). Verify each part-file exists before phase 3: + +``` +Bash: for n in 1 2 ... N; do test -f .posthog-events-inventory.part-$n.json || echo "MISSING: part-$n"; done +``` + +If any part-file is missing, the subagent failed. Re-dispatch only the failed subagent with the same row-id slice. Don't re-run successful subagents. 
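Reviewer sketch (not part of the patch): the partition rule from Phase 1 can be expressed in a few lines. It assumes the thresholds stated above (inline at 8 or fewer distinct files, roughly 10 files per subagent, at most 8 subagents). The function name `partition_rows` and its parameter names are hypothetical.

```python
import math


def partition_rows(rows, files_per_agent=10, max_agents=8, inline_threshold=8):
    """Return one list of row ids per partition.

    A single partition means the orchestrator enriches inline
    (or dispatches exactly one subagent).
    """
    files = sorted({row["file"] for row in rows})  # alphabetical order
    if not files:
        return []  # empty inventory: phases 2 and 3 are skipped
    if len(files) <= inline_threshold:
        return [[row["id"] for row in rows]]
    n = min(max_agents, math.ceil(len(files) / files_per_agent))
    # Round-robin over the sorted file list, so rows from the same file
    # always land in the same partition and each file is read once.
    group_of = {path: index % n for index, path in enumerate(files)}
    groups = [[] for _ in range(n)]
    for row in rows:
        groups[group_of[row["file"]]].append(row["id"])
    return groups
```

Partitioning by file rather than by row is what keeps the "read each file once" rule intact across subagents. Note that 9 or 10 distinct files still produce `N = 1` under the stated formula, so fan-out can mean a single subagent.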
+ +## Phase 3 — Concat via jq + +One `Bash` call: + +``` +jq -s '{rows: (add | sort_by(.file, .line)), wrapper_undetected: false}' .posthog-events-inventory.part-*.json > .posthog-events-inventory.json && rm .posthog-events-inventory.part-*.json +``` + +This: +- Slurps every part-file as an array of arrays +- `add` flattens to a single rows array +- `sort_by(.file, .line)` produces a stable, readable order +- Wraps in `{rows, wrapper_undetected}` +- Overwrites the base inventory with the enriched one +- Cleans up part-files + +The orchestrator never has to materialize the merged JSON in a model turn — `jq` does the merge in shell, costing zero output tokens. + +If `jq` isn't available on the user's system, fall back to a Bash one-liner using Python: + +``` +python3 -c "import json,glob; rows=[] +[rows.extend(json.load(open(f))) for f in sorted(glob.glob('.posthog-events-inventory.part-*.json'))] +rows.sort(key=lambda r: (r['file'], r['line'])) +json.dump({'rows': rows, 'wrapper_undetected': False}, open('.posthog-events-inventory.json','w'), indent=2)" && rm .posthog-events-inventory.part-*.json +``` + +Don't try to merge in a model turn. That's the rule that crashed the previous run. diff --git a/transformation-config/skills/events-audit/references/3-extract.md b/transformation-config/skills/events-audit/references/3-extract.md deleted file mode 100644 index b6a8234..0000000 --- a/transformation-config/skills/events-audit/references/3-extract.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -next_step: 4-mcp-query.md ---- - -# Step 3 – Resolve dynamic event names - -For inventory rows with `is_dynamic: true` or `event_name: null`, try to resolve the literal name by tracing the local code. Anything that doesn't resolve stays dynamic – the data-quality check in the report step treats unresolved dynamics as undercount risk. - -## Status - -Emit: - -``` -[STATUS] Extracting event names -``` - -## Action - -`Read` `.posthog-events-inventory.json`. 
Filter to rows where `is_dynamic == true` or `event_name == null`. If empty, continue to step 4 immediately. - -For each ambiguous row, `Read` its file once and try the patterns below. - -### Pattern A – constant inlining - -```ts -const EVENT = "signup_completed"; -posthog.capture(EVENT, { method }); -``` - -If `EVENT` is a `const` / `final` / `let` / module-level variable in the same file, has a literal initializer, and is never reassigned, inline its value. If it's reassigned anywhere, leave the row dynamic. - -### Pattern B – enum / object dispatch - -```ts -const EVENTS = { - SIGNUP_COMPLETED: "signup_completed", - CHECKOUT_STARTED: "checkout_started", -} as const; - -posthog.capture(EVENTS.SIGNUP_COMPLETED, { ... }); -``` - -If the property access targets an object literal in the same module and every value is a literal, inline the resolved value. Don't resolve enums imported from other modules – leave dynamic. - -### What you don't resolve - -- Names built with template literals: `` `signup_${variant}` ``. Leave dynamic. The data-quality check flags these as undercount risk. -- Names imported from another module (other than the same-file enum pattern). Leave dynamic. -- Names from network responses or feature-flag values. Leave dynamic. -- **Wrapper / function-arg passthrough.** If the dynamic name is a function parameter (`posthog.capture(eventName, ...)` where `eventName` is the enclosing function's argument), leave dynamic — chasing callers across files is intentionally out of scope. The report step's suggested follow-ups list points the reader at this case so they can ask Claude to resolve specific wrappers on demand. - -When a row can't be resolved, leave it as `is_dynamic: true` with `event_name: null`. The data-quality check counts these as undercount risk; the report's by-event table omits them (they appear only in a "dynamic captures" footnote). - -`Write` the updated inventory back. 
This is the only step that edits the inventory by hand – keep the two-space indent. diff --git a/transformation-config/skills/events-audit/references/4-mcp-query.md b/transformation-config/skills/events-audit/references/4-query.md similarity index 99% rename from transformation-config/skills/events-audit/references/4-mcp-query.md rename to transformation-config/skills/events-audit/references/4-query.md index 5777da2..98fe809 100644 --- a/transformation-config/skills/events-audit/references/4-mcp-query.md +++ b/transformation-config/skills/events-audit/references/4-query.md @@ -21,7 +21,7 @@ This step is one MCP call, one in-place merge, one `Write`. Do not re-emit the e Emit: ``` -[STATUS] Querying PostHog for volume +[STATUS] Querying PostHog MCP for volume ``` ## MCP tools diff --git a/transformation-config/skills/events-audit/references/5-report-template.md b/transformation-config/skills/events-audit/references/5-report-template.md index 54688c7..7b86512 100644 --- a/transformation-config/skills/events-audit/references/5-report-template.md +++ b/transformation-config/skills/events-audit/references/5-report-template.md @@ -13,7 +13,7 @@ This audit lists every event your code captures, where it fires, and how often P | Phantom events (no volume) | {{phantom_count}} | | Top 10 events = % of total volume | {{top_10_share}} | -> **Live dashboard:** _not linked — `dashboard-create` did not succeed during this run. See the run output for the failure reason, then re-run the audit to retry._ +{{dashboard_callout}} {{overview_panels}} diff --git a/transformation-config/skills/events-audit/references/5-report.md b/transformation-config/skills/events-audit/references/5-report.md index 36e42c4..79b8cea 100644 --- a/transformation-config/skills/events-audit/references/5-report.md +++ b/transformation-config/skills/events-audit/references/5-report.md @@ -94,18 +94,20 @@ Overview is the action-oriented top section. 
It's a small KPI grid plus a series Each panel is a short bulleted list. Panels are derived deterministically from the inventory. -1. **Phantom events — in code, zero volume.** Events where `status == "phantom"`. Each bullet: `event_name — area`. Sort by area, then event name. -2. **No properties attached — flying blind.** Events where `properties_seen[] == []`. Sort by `volume_30d` desc. Each bullet: `event_name — Xk events flying blind`. Limit to top 12; add `… (+N more)` if longer. -3. **Name drift — same concept, different keys.** Pairs of events whose names collapse to the same string when lowercased and stripped of underscores/spaces. Each bullet: `event_a vs event_b — splits funnels on `. -4. **Type drift — numeric property with mixed types.** Property keys named `revenue`, `amount`, `price`, `count`, `duration_*`, `quantity` whose values mix number and string across call sites. Each bullet: `property — number at file:line, string at file:line — silently zeros aggregates`. -5. **Conditional fires — undercount risk.** Events where `has_conditional == true`. Each bullet: `event_name — fires inside at file:line`. Sort by volume desc; cap at 8. -6. **Duplicate captures — same event from multiple SDK families.** Events present in both client- and server-side SDK rows, where neither row is in a test file and neither explicitly threads `distinctId` from request context. Each bullet: `event_name — fires from at file:line and at file:line — risks 2× counting`. -7. **Unresolved dynamic captures.** Inventory rows still flagged `is_dynamic: true` after step 3. Each bullet: `file:line — event name is `. -8. **Volume concentration.** A short text line plus the top 10 events as a bulleted list with bars. Each bullet: `event_name — Xk · share% · ▓▓▓▓░░░░░░`. Bars use Unicode block characters (`▓` for filled, `░` for empty), 12 chars wide, scaled to per-event share of `total_volume_30d`. +1. 
**Volume concentration.** A short text line plus the top 10 events as a bulleted list with bars. Each bullet: `event_name — Xk · share% · ▓▓▓▓░░░░░░`. Bars use Unicode block characters (`▓` for filled, `░` for empty), 12 chars wide, scaled to per-event share of `total_volume_30d`. +2. **Phantom events — in code, zero volume.** Events where `status == "phantom"`. Each bullet: `event_name — area`. Sort by area, then event name. +3. **No properties attached.** Events where `properties_seen[] == []`. Sort by `volume_30d` desc. Each bullet: `event_name — Xk events flying blind`. Limit to top 12; add `… (+N more)` if longer. +4. **Name drift — same concept, different keys.** Pairs of events whose names collapse to the same string when lowercased and stripped of underscores/spaces. Each bullet: `event_a vs event_b — splits funnels on `. +5. **Type drift — numeric property with mixed types.** Property keys named `revenue`, `amount`, `price`, `count`, `duration_*`, `quantity` whose values mix number and string across call sites. Each bullet: `property — number at file:line, string at file:line — silently zeros aggregates`. +6. **Conditional fires — undercount risk.** Events where `has_conditional == true`. Each bullet: `event_name — fires inside at file:line`. Sort by volume desc; cap at 8. +7. **Duplicate captures — same event from multiple SDK families.** Events present in both client- and server-side SDK rows, where neither row is in a test file and neither explicitly threads `distinctId` from request context. Each bullet: `event_name — fires from at file:line and at file:line — risks 2× counting`. +8. **Unresolved dynamic captures.** Inventory rows still flagged `is_dynamic: true` after step 3. Each bullet: `file:line — event name is `. Skip any panel whose source list is empty. Don't render an empty "No phantom events" header — silence is the signal. 
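The name-drift rule in the panel list above (names that collapse to the same string once lowercased and stripped of underscores and spaces) can be sketched as follows. The helper names `normalize_event_name` and `find_name_drift` are hypothetical, and the sketch returns whole groups of colliding names rather than pairs, which is an assumption about how to handle three or more variants.

```python
from collections import defaultdict


def normalize_event_name(name):
    """Lowercase the name and strip underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")


def find_name_drift(event_names):
    """Return sorted groups of distinct names that share a normalized form."""
    buckets = defaultdict(set)
    for name in event_names:
        buckets[normalize_event_name(name)].add(name)
    # Only buckets holding more than one distinct spelling count as drift.
    return sorted(sorted(names) for names in buckets.values() if len(names) > 1)
```

An empty result corresponds to the "skip any panel whose source list is empty" rule: nothing is rendered for that panel.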
-These panels carry the findings that previously lived in the standalone Coverage Map and Data Quality sections; rendering them as Overview panels keeps action items in one place at the top of the report. The `coverage-map` and `data-quality` checks are still resolved separately via `audit_resolve_checks` (their `details` mirror the relevant Overview panels). +These panels carry the findings that previously lived in the standalone Coverage Map and Data Quality sections; rendering them as Overview panels keeps action items in one place at the top of the report. + +The `coverage-map` and `data-quality` checks are still resolved separately via `audit_resolve_checks` (their `details` mirror the relevant Overview panels). ### e. Analyze identity & segmentation (shared check) @@ -153,6 +155,8 @@ After computing the Overview panels in (d) and the identity capabilities in (e), The markdown report template lives in `references/5-report-template.md`. The orchestrator reads it once, substitutes every `{{placeholder}}` with values computed in steps (b) through (e), and writes the result to `posthog-events-audit-report.md` at the project root. +**Exception: `{{dashboard_callout}}` is intentionally not substituted in this step.** Step 6 fills that placeholder after dashboard creation runs. Leave it as-is in the rendered output — step 6 always resolves it (to a link on success, or empty string on failure), so it never ships to the reader. + #### Substitution conventions These rules tell you how to format each placeholder. The placeholder names themselves are documented in the template's header comment. 
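The substitution pass described above, including the `{{dashboard_callout}}` exception, might look roughly like this. It is a sketch under stated assumptions: `render_template` and its `deferred` parameter are hypothetical, and placeholders are assumed to be word characters wrapped in double braces.

```python
import re

# Assumed placeholder syntax: {{name}} with word characters only.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_template(template, values, deferred=("dashboard_callout",)):
    """Fill every placeholder except the deferred ones, which stay verbatim."""

    def substitute(match):
        key = match.group(1)
        if key in deferred:
            # Left untouched so a later step can resolve it.
            return match.group(0)
        # A missing key raises KeyError rather than shipping a raw placeholder.
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)
```

Raising on an unknown key is a design choice for the sketch: it keeps an unresolved placeholder from silently reaching the reader.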
diff --git a/transformation-config/skills/events-audit/references/6-dashboard.md b/transformation-config/skills/events-audit/references/6-dashboard.md index b21d5b3..194cae0 100644 --- a/transformation-config/skills/events-audit/references/6-dashboard.md +++ b/transformation-config/skills/events-audit/references/6-dashboard.md @@ -99,7 +99,7 @@ This insight surfaces events the code references but PostHog hasn't seen recentl ```json { - "name": "Events audit · Phantom watch", + "name": "Events audit · Phantom events", "description": "Events captured in code but with zero or near-zero volume in the last 30 days. A growing list here usually means dead instrumentation, a typo, or a code path that no longer fires.", "dashboards": [], "query": { @@ -117,25 +117,30 @@ In the actual call, replace the `code_events` CTE's `SELECT 'event_a' ... UNION If any single `insight-create` call errors, log the failure inline (`Insight "" failed: `) and continue with the rest. A partial dashboard is more useful than no dashboard. -### c. Patch the dashboard URL into the report +### c. Resolve the dashboard placeholder in the report -The report (written in step 5) has a one-line blockquote callout inside the Overview section, immediately after the Overview metric table: +Step 5 writes the report with a `{{dashboard_callout}}` placeholder still in it — step 5 intentionally leaves it for step 6 to fill. The placeholder lives inside the Overview section, immediately after the metric table. -``` -> **Live dashboard:** _not linked — `dashboard-create` did not succeed during this run. See the run output for the failure reason, then re-run the audit to retry._ -``` +Step 6 always `Edit`s the placeholder; the substitution depends on outcome. -If at least one insight was created successfully, `Edit` `posthog-events-audit-report.md` to swap that callout for a live link. Use the `Edit` tool with: +**On success (at least one insight created):** swap the placeholder for a live blockquote link. 
-- `old_string`: the full blockquote line above, exactly as written (single line; do not include surrounding blank lines). +- `old_string`: `{{dashboard_callout}}` - `new_string`: a single blockquote line of the form: ``` - > **Live dashboard:** []() — daily volume trend, top events, and phantom watch. + > **Events audit dashboard:** []() — daily volume trend, top events, and phantom watch. Auto-created by the wizard. ``` Substitute `` and `` from the `dashboard-create` response. If one or two insights failed and the rest succeeded, trim the trailing list to mention only the insights that exist (e.g. "daily volume trend and top events" if phantom watch failed). -If every `insight-create` call failed in (c), don't patch the report — leave the placeholder as-is. An empty dashboard isn't worth linking to. Delete the empty dashboard if the MCP project has `mcp__posthog-wizard__dashboard-delete` available; otherwise note "Dashboard created but all insights failed; remove it manually at " and move on. +**On failure (dashboard creation errored, or every `insight-create` call failed):** swap the placeholder for empty string. + +- `old_string`: `{{dashboard_callout}}` +- `new_string`: (empty) + +The report ends up with no dashboard line at all — that's the right UX for "no dashboard available." Don't try to surface the failure reason inside the report; the wizard already shows the failure in the run output. **Always perform this Edit** even on failure — leaving an unresolved `{{dashboard_callout}}` in the report would leak templating internals to the reader. + +If every `insight-create` call failed but the dashboard itself was created, also try to delete the empty dashboard via `mcp__posthog-wizard__dashboard-delete` if that tool is available; otherwise note "Dashboard created but all insights failed; remove it manually at " in the run output and move on. ### d. 
Surface the dashboard URL From b32962553943e0578b9734562eb18520910b8777 Mon Sep 17 00:00:00 2001 From: Edwin Lim Date: Sun, 10 May 2026 20:39:17 -0400 Subject: [PATCH 5/5] mcp handling and dashboard url emissions --- .../skills/events-audit/description.md | 3 +- .../skills/events-audit/references/4-query.md | 30 +++++++++++++------ .../references/5-report-template.md | 2 ++ .../events-audit/references/5-report.md | 25 +++++++++++++--- .../events-audit/references/6-dashboard.md | 24 ++++++++------- 5 files changed, 59 insertions(+), 25 deletions(-) diff --git a/transformation-config/skills/events-audit/description.md b/transformation-config/skills/events-audit/description.md index 17c7dc4..8fd82c2 100644 --- a/transformation-config/skills/events-audit/description.md +++ b/transformation-config/skills/events-audit/description.md @@ -62,7 +62,8 @@ Report aborts with `[ABORT]` prefixed messages. The wizard catches these and sto - `[ABORT] No PostHog SDK found` - `[ABORT] No capture call sites found in any detected SDK` -- `[ABORT] MCP project mismatch – enrichment unsafe` + +MCP failures (project mismatch, query errors, no connection) are **not** abort conditions — step 4 soft-degrades and step 5 renders the report with a `{{mcp_disclaimer}}` callout in place of volume sections. See step 4 for the degradation contract. ## Framework guidelines diff --git a/transformation-config/skills/events-audit/references/4-query.md b/transformation-config/skills/events-audit/references/4-query.md index 98fe809..92d2bde 100644 --- a/transformation-config/skills/events-audit/references/4-query.md +++ b/transformation-config/skills/events-audit/references/4-query.md @@ -6,6 +6,8 @@ next_step: 5-report.md Pull 30-day volume and `last_seen` for every event the inventory references. The SQL filters to inventory event names — orphan detection is intentionally out of scope (PostHog projects often span multiple repos, so events without a code match are usually noise from another codebase). 
After merging, resort `rows[]` by `volume_30d` so the report's by-event table naturally surfaces highest-impact events first. +**Soft-degrade if MCP isn't available.** If the project is wrong, the connection fails, or the query errors, do not abort the run. Set a top-level flag in the inventory and continue to step 5 — the report renders with code-side findings only and a disclaimer at the top. + ## Output discipline This step is one MCP call, one in-place merge, one `Write`. Do not re-emit the entire inventory in assistant text before writing — prior runs spent ~150 seconds streaming the JSON into the conversation before invoking `Write`, which is pure output-token waste. The flow is: @@ -37,13 +39,13 @@ The active project comes from the wizard session – don't pick or switch projec ### a. Confirm the project -The active project is whatever the wizard's MCP session targets. If you can't confirm it, or the user said this codebase ships to a different project, emit `[ABORT] MCP project mismatch – enrichment unsafe`. +The active project is whatever the wizard's MCP session targets. If you can't confirm it, or the user said this codebase ships to a different project, **don't abort** — set `mcp_available: false` and `mcp_skipped_reason: "project mismatch"` (see step (g)) and skip to (g) without running the query. ### b. Build the event-name list `Read` `.posthog-events-inventory.json`. Collect every distinct `event_name` from `rows[]` where `call_kind == "capture"` and `is_dynamic == false` and `event_name != null`. Deduplicate. This is the IN-list for the SQL. -If the list is empty (every capture row is dynamic), skip the SQL call and proceed to (d) – every row will keep `volume_30d: 0` and `last_seen: null`. +If the list is empty (every capture row is dynamic), skip the SQL call and proceed to (d) – every row will keep `volume_30d: 0` and `last_seen: null`.
Set `mcp_available: true` and `mcp_skipped_reason: "no resolved event names to query"` so step 5 knows volume is available in principle but nothing was queried. ### c. Query volume for inventory events @@ -82,14 +84,24 @@ Walk `rows[]` once and set `status` on every `call_kind == "capture"` row: Phantom is the inverse of orphan: the code references an event that PostHog hasn't seen recently. Could be a typo, a code path that no longer fires, or instrumentation that hasn't shipped yet. The data-quality check uses this as undercount risk. -If the SQL call in (c) was skipped or errored (every row has `volume_30d: null`), leave `status: "pending"` on every row – the report step will note "no MCP volume data available" and judge only on code presence. +### g. Set the MCP-availability flag and write + +Set top-level keys on the inventory based on what happened: + +| Outcome | `mcp_available` | `mcp_skipped_reason` | +|---|---|---| +| Query ran successfully and returned rows | `true` | `null` | +| Query ran but the result was empty (zero matching events in last 30d) | `true` | `"empty result — likely wrong project, but proceeding"` | +| IN-list was empty (every capture is dynamic) | `true` | `"no resolved event names to query"` | +| Project couldn't be confirmed in (a) | `false` | `"project mismatch"` | +| `query-run` errored out (misconfigured project, schema drift, network) | `false` | `"query-run failed: "` | +| No MCP connection at all | `false` | `"MCP unavailable"` | -`Write` the inventory back. +When `mcp_available: false`, leave every row's `volume_30d: null`, `last_seen: null`, `status: "pending"`. Step 5 reads these flags and renders a disclaimer in place of volume-dependent sections. Step 6 reads `mcp_available` and skips dashboard creation entirely if false. -### g. Failure handling +`Write` the inventory back. Continue to step 5 in every case — never abort here. -Three failure modes, in order of severity: +### h. 
Notes for the orchestrator -- **No MCP connection or no project id.** Emit `[ABORT] MCP project mismatch – enrichment unsafe`. The wizard halts the run. -- **`query-run` errors out** (misconfigured project, schema drift). Set `volume_30d = null` and `last_seen = null` on every row and continue. The report step's data-quality check will note "no MCP volume data available" and judge only on code presence. -- **Empty result** (zero events in the last 30 days for every inventory event). Treat as "no events in PostHog – likely the wrong project" and let the data-quality check flag it. +- **Don't retry on failure.** One attempt; if it fails, soft-degrade. The wizard logs the failure reason and the user can re-run with a corrected project. +- **Don't try to recover by guessing a different project.** The active project is the wizard's session — switching it is out of scope. diff --git a/transformation-config/skills/events-audit/references/5-report-template.md b/transformation-config/skills/events-audit/references/5-report-template.md index 7b86512..0688760 100644 --- a/transformation-config/skills/events-audit/references/5-report-template.md +++ b/transformation-config/skills/events-audit/references/5-report-template.md @@ -4,6 +4,8 @@ _Generated {{timestamp}}_ This audit lists every event your code captures, where it fires, and how often PostHog has seen it in the last 30 days. Use the suggested follow-ups at the end to ask Claude focused questions about the events listed here. +{{mcp_disclaimer}} + ## 1. 
Overview | Metric | Value | diff --git a/transformation-config/skills/events-audit/references/5-report.md b/transformation-config/skills/events-audit/references/5-report.md index 79b8cea..7b4e73e 100644 --- a/transformation-config/skills/events-audit/references/5-report.md +++ b/transformation-config/skills/events-audit/references/5-report.md @@ -41,9 +41,22 @@ Emit, in order: - `rows[]` – capture rows (sorted by `volume_30d` desc by step 4) with `event_name`, `properties[]`, `package`, `area`, `route`, `enclosing`, `volume_30d`, `last_seen`, `status`, etc. - `wrapper_undetected` – top-level boolean. +- `mcp_available` – top-level boolean from step 4. `false` means PostHog volume data is missing; render the report in degraded mode (see below). +- `mcp_skipped_reason` – optional short string explaining why MCP was skipped or failed. Used in the disclaimer when `mcp_available: false`. If `rows[]` is empty, render a short report explaining the inventory is empty, resolve all three shared checks with `pending` details (no data to evaluate), and exit. +#### Degraded mode (`mcp_available: false`) + +When MCP wasn't reachable in step 4, every row has `volume_30d: null` and `status: "pending"`. Render the report with these adjustments: + +- Substitute `{{mcp_disclaimer}}` with a one-paragraph callout (see substitution conventions in (f)). Otherwise leave it empty. +- Volume KPIs in Overview render as `—` instead of numbers: total volume, phantom count, top-10 share. Distinct-events count still renders (it's code-derived). +- Volume Map section: instead of the events table, the body becomes a single line `_PostHog volume data was not fetched during this run — see the disclaimer above. Capture sites are still listed in the Area topology section below._`. Skip the capture-sites collapsibles too. +- Area Topology section: still renders, but sort areas alphabetically (no volume to sort by) and omit the `` figure from each area heading. 
Each event bullet shows just the event name plus `· conditional` if applicable; no volume number, no `· phantom` tag. +- Overview panels: skip "Volume concentration" and "Phantom events" entirely (both need volume). All other panels (no-properties, name drift, type drift, conditional fires, duplicate captures, unresolved dynamics) still render — they're code-derived. +- The `data-quality` check resolves to whatever the code-derived panels found. Don't penalize for missing volume; that's not a code problem. + ### b. Aggregate by event (Volume Map records) Group capture rows by `event_name` (skip rows where `is_dynamic == true` or `event_name == null`; those go to the dynamic-captures appendix). For each distinct event, compute: @@ -163,10 +176,14 @@ These rules tell you how to format each placeholder. The placeholder names thems - **`{{repo_name}}`** — the project root directory name. - **`{{timestamp}}`** — short human-readable date (e.g. `2026-05-09`) or full ISO timestamp. -- **`{{total_volume}}`** — formatted with thousands separator (`310,000`) or compact (`310k`); use compact for totals ≥10,000. -- **`{{distinct_count}}`** — integer; from the by-event records in (b). -- **`{{phantom_count}}`** — integer; render as `0` if no phantoms (the row is still useful at all-zeros). -- **`{{top_10_share}}`** — percentage rounded to nearest whole, e.g. `90%`. +- **`{{mcp_disclaimer}}`** — empty string when `mcp_available: true`. When `mcp_available: false`, a one-paragraph callout. Use this exact shape, substituting `` from `mcp_skipped_reason`: + ```markdown + > **Volume data not fetched.** PostHog could not be queried during this run (). The events your code captures, where they fire, and how they're identified are still in the report below — but per-event 30-day volume, phantom detection (events seen in code but not in PostHog), and the top-events table are missing. Re-run the audit with PostHog MCP configured to populate them. 
+ ``` +- **`{{total_volume}}`** — formatted with thousands separator (`310,000`) or compact (`310k`); use compact for totals ≥10,000. **Render `—` when `mcp_available: false`.** +- **`{{distinct_count}}`** — integer; from the by-event records in (b). Always renders (code-derived). +- **`{{phantom_count}}`** — integer; render as `0` if no phantoms (the row is still useful at all-zeros). **Render `—` when `mcp_available: false`.** +- **`{{top_10_share}}`** — percentage rounded to nearest whole, e.g. `90%`. **Render `—` when `mcp_available: false`.** - **`{{overview_panels}}`** — concatenation of the panels from (d), each rendered as: ```markdown ### diff --git a/transformation-config/skills/events-audit/references/6-dashboard.md b/transformation-config/skills/events-audit/references/6-dashboard.md index 194cae0..8c3003b 100644 --- a/transformation-config/skills/events-audit/references/6-dashboard.md +++ b/transformation-config/skills/events-audit/references/6-dashboard.md @@ -4,7 +4,9 @@ next_step: null # Step 6 – Live dashboard -The static report shows what your code captures. This step creates a live PostHog dashboard pinned to the same code-confirmed event list, so you can watch volume over time and catch phantoms as they appear. The dashboard is part of the standard audit deliverable — don't ask the user whether to create it. If the MCP project isn't writable, fail soft (log the reason, leave the placeholder in the report) and clean up as normal. +The static report shows what your code captures. This step creates a live PostHog dashboard pinned to the same code-confirmed event list, so you can watch volume over time and catch phantoms as they appear. The dashboard is part of the standard audit deliverable — don't ask the user whether to create it. If the MCP project isn't writable, fail soft (log the reason, resolve the `{{dashboard_callout}}` placeholder to empty string) and clean up as normal. 
+ +**Pre-check:** `Read` `.posthog-events-inventory.json` and check the top-level `mcp_available` flag set by step 4. If `mcp_available: false`, skip directly to step (c) — there's no point attempting `dashboard-create` against an unavailable MCP. Step (c) resolves the placeholder to empty string (the failure path) and step (d) cleans up. ## Status @@ -44,7 +46,15 @@ Call `mcp__posthog-wizard__dashboard-create` with: Capture the returned `id` as `DASHBOARD_ID` and the returned PostHog URL. -If the call errors (permission denied, project misconfigured, network), emit one line — `Dashboard creation failed: . Skipping insights.` — and skip to (e). Don't retry. Don't fall back to a different approach. +**Emit the URL immediately for the wizard.** As soon as `dashboard-create` succeeds, write a single line on its own (no quotes, no surrounding code fence — just plain text in your assistant message): + +``` +[DASHBOARD_URL] +``` + +The wizard scans for the literal marker `[DASHBOARD_URL]` and stores the URL that follows. The marker can sit anywhere in a line, but a dedicated line is cleanest. **Emit this before attempting insight creation** — if insight creation fails afterwards, the wizard already has the dashboard URL and can surface it. + +If the call errors (permission denied, project misconfigured, network), emit one line — `Dashboard creation failed: . Skipping insights.` — and skip to (c). Don't retry. Don't fall back to a different approach. Do not emit `[DASHBOARD_URL]` on failure — there's no URL to surface. ### b. Create the three insights @@ -142,15 +152,7 @@ The report ends up with no dashboard line at all — that's the right UX for "no If every `insight-create` call failed but the dashboard itself was created, also try to delete the empty dashboard via `mcp__posthog-wizard__dashboard-delete` if that tool is available; otherwise note "Dashboard created but all insights failed; remove it manually at " in the run output and move on. -### d. 
Surface the dashboard URL - -Emit one line so the wizard can surface the dashboard to the user: - -``` -Created events audit dashboard: -``` - -### e. Clean up the inventory +### d. Clean up the inventory Whether creation succeeded, partially succeeded, or failed — delete the inventory now. It's transient scratch state.