14 changes: 14 additions & 0 deletions transformation-config/skills/events-audit/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
type: docs-only
template: description.md
description: Audit PostHog events in a codebase — produce an inventory of every captured event mapped to its file, area, and 30-day volume for the product team to query
tags: [analytics, audit, best-practices]
references:
preamble: "**Read ONLY this file.** Do not read any other reference file until this one tells you to."
shared_docs:
- https://posthog.com/docs/product-analytics/best-practices.md
- https://posthog.com/docs/getting-started/identify-users.md
variants:
- id: all
display_name: PostHog events audit
tags: [analytics, audit, best-practices]
docs_urls: []
70 changes: 70 additions & 0 deletions transformation-config/skills/events-audit/description.md
@@ -0,0 +1,70 @@
# PostHog events audit

This skill produces a product-browseable report of every PostHog event your code captures, mapped to the codebase area, and enriched with 30-day volume from PostHog.

## Workflow

The audit runs as a 6-step chain:

1. Detect SDK
2. Scan capture sites (grep only)
3. Enrich (subagent fan-out — the only step that reads source files)
4. Query PostHog for volume
5. Write report
6. Create dashboard

Each step file points to the next. Run them in order. Don't explore the source tree on your own.

**Start by reading `references/1-detect.md`** (relative to this skill's directory – typically `.claude/skills/events-audit/references/1-detect.md`). Don't read ahead. Don't re-read a step once you've passed it. Don't re-read SKILL.md.

Step 1 seeds the audit checklist as its first action. Don't assume the runtime pre-seeds it.

## The audit checklist

The audit checklist has three shared checks in addition to the event map audit: `identity-segmentation`, `coverage-map`, `data-quality`. Finish each one. Don't invent new ids.

The checklist lives at `.posthog-audit-checks.json`. It's owned by MCP tools – **never `Write` it directly**.

## The events inventory

A second file, `.posthog-events-inventory.json`, is the working event inventory for steps 2 through 4. It holds the capture sites with derived `package`/`area`/`route`/`enclosing` fields, event names, properties, and per-event volume from PostHog.

It's **not** MCP-owned – no `audit_*` tool guards it. The inventory is **transient scratch state**, not a deliverable: step 5 deletes `.posthog-audit-checks.json` once the report is written, and step 6 deletes the inventory after the optional dashboard step. The report is the only artifact the user keeps.

Check entry shape (for entries in `.posthog-audit-checks.json`, not the inventory):

- `id` - stable kebab-case slug. The three shared ids are `identity-segmentation`, `coverage-map`, `data-quality`.
- `area` - short group name. Shared entries use `Identity`, `Coverage`, `Data quality`.
- `label` - short human name.
- `status` - `pending` | `pass` | `error` | `warning` | `suggestion`.
- `file` - optional `path:line` for findings tied to a location.
- `details` - Markdown bulleted summary in plain language. Describe state and the product questions blocked. Don't render `status` as a grade in the report; the enum is for filter logic only.
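
The entry shape above can be sanity-checked with a small read-only validator. This is a minimal sketch, not part of the skill: `validate_check` is a hypothetical helper, and the rule that only `id`, `area`, `label`, and `status` are required is an assumption drawn from the seed payload in step 1. It only inspects an entry; it never writes the checklist file, which stays MCP-owned.

```python
import re

# Status values come from the entry shape above.
ALLOWED_STATUSES = {"pending", "pass", "error", "warning", "suggestion"}
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
PATH_LINE = re.compile(r"^.+:\d+$")


def validate_check(entry: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    # Assumption: these four keys are required, as in the step 1 seed payload.
    for key in ("id", "area", "label", "status"):
        value = entry.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"'{key}' must be a non-empty string")
    if isinstance(entry.get("id"), str) and not KEBAB_CASE.match(entry["id"]):
        problems.append("'id' must be a kebab-case slug")
    if entry.get("status") not in ALLOWED_STATUSES:
        problems.append("'status' must be one of: " + ", ".join(sorted(ALLOWED_STATUSES)))
    # `file` is optional, but when present it should be `path:line`.
    if "file" in entry and not PATH_LINE.match(str(entry["file"])):
        problems.append("'file' must look like path:line")
    return problems
```

Returning a list of problems rather than raising lets a caller report every issue in an entry at once.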

## Key principles

- **Show your evidence.** Cite `file:line` for every non-pass finding.
- **Frame findings as product questions.** Every finding describes *what product question or insight it blocks*, not what code rule it breaks.
- **Hand the reader the map. Don't tell the story for them.** The deliverable is a single report with three short qualitative checks plus a few suggested follow-ups. The reader clusters events into flows on demand by asking targeted follow-up questions about the report — the skill doesn't do that synthesis upfront.

## Live activity – `[STATUS]`

The "Working on …" banner reads from `[STATUS]` lines you emit in plain text. Whenever you start a sub-step, write a line like:

```
[STATUS] Scanning capture sites
```

The wizard catches these and updates the spinner. Use them freely – they're cheap. Each step file lists the exact strings to emit. Don't invent your own.

## Abort statuses

Report aborts with `[ABORT]` prefixed messages. The wizard catches these and stops the run – don't halt yourself.

- `[ABORT] No PostHog SDK found`
- `[ABORT] No capture call sites found in any detected SDK`

MCP failures (project mismatch, query errors, no connection) are **not** abort conditions — step 4 soft-degrades and step 5 renders the report with a `{{mcp_disclaimer}}` callout in place of volume sections. See step 4 for the degradation contract.
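
To make the two line prefixes concrete, here is a minimal sketch of how a consumer might tell `[STATUS]` and `[ABORT]` lines apart. The wizard's real parsing is not described in this skill, so `classify_line` and its return shape are hypothetical.

```python
from typing import Optional, Tuple


def classify_line(line: str) -> Optional[Tuple[str, str]]:
    """Return ("status" | "abort", message) for a prefixed line, else None."""
    stripped = line.strip()
    for prefix, kind in (("[STATUS]", "status"), ("[ABORT]", "abort")):
        if stripped.startswith(prefix):
            # Everything after the prefix is the human-readable message.
            return kind, stripped[len(prefix):].strip()
    return None
```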

## Framework guidelines

{commandments}
80 changes: 80 additions & 0 deletions transformation-config/skills/events-audit/references/1-detect.md
@@ -0,0 +1,80 @@
---
next_step: 2-scan.md
---

# Step 1 – Detect SDKs

Seed the audit checklist, then find every PostHog SDK in the project and remember which language(s) and framework(s) the rest of the audit will work on. **Read-only on the codebase.** Don't scan code for capture sites – that's step 2.

## Tools

Load via `ToolSearch select:Read,Glob,mcp__wizard-tools__audit_seed_checks,mcp__wizard-tools__audit_resolve_checks` once at the start of this step.

## Status

Emit, in order:

```
[STATUS] Seeding audit checklist
[STATUS] Detecting SDKs
```

## Action

### a. Seed the audit checklist

The checklist lives at `.posthog-audit-checks.json` and renders live in the "Audit plan" tab. **Don't rely on the runtime pre-seeding it** — call `mcp__wizard-tools__audit_seed_checks` directly here. The tool replaces the file atomically, so calling it once at the start of every run is safe.

Pass exactly these three shared checks (`identity-segmentation`, `coverage-map`, `data-quality`):

```json
{
"checks": [
{
"id": "identity-segmentation",
"area": "Identity",
"label": "Identity & segmentation",
"status": "pending"
},
{
"id": "coverage-map",
"area": "Coverage",
"label": "Coverage map",
"status": "pending"
},
{
"id": "data-quality",
"area": "Data quality",
"label": "Data quality",
"status": "pending"
}
]
}
```

Don't invent new ids — later steps resolve checks by these exact ids. Don't `Write` the file directly; the MCP tool owns it.

### b. Find PostHog SDKs

`Glob` for the project's dependency manifests across every language PostHog ships an SDK for. The full list:

- `package.json` - npm / pnpm / yarn (Node, web, React, Next.js, Nuxt, Vue, Svelte, Angular, React Native, Expo)
- `requirements.txt`, `pyproject.toml`, `Pipfile`, `setup.py` – Python (Django, Flask, FastAPI)
- `Gemfile` - Ruby / Rails
- `composer.json` - PHP / Laravel
- `go.mod` - Go
- `build.gradle`, `build.gradle.kts`, `pom.xml` – Java / Android
- `Podfile`, `Package.swift` – iOS / Swift
- `pubspec.yaml` - Flutter / Dart
- `*.csproj` - .NET
- `mix.exs` - Elixir

Read enough of them to identify which PostHog SDK the project uses, what version, and what framework it sits on top of.

If the project is a monorepo, you may find multiple PostHog SDKs.

If no PostHog SDK is anywhere in the project, emit `[ABORT] No PostHog SDK found` and stop. The wizard catches `[ABORT]` and terminates the run.

For each dependency manifest, extract every dependency whose name starts with `posthog` (e.g. `posthog`, `posthog-node`, `posthog-js`, `posthog-python`, `posthog-ruby`). Hold `{ sdk, version, manifest, framework }` per SDK in memory. The next step uses this list.
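
As an illustration of the extraction rule for one manifest type, the sketch below pulls PostHog dependencies out of a `package.json`. It is a sketch under stated assumptions: `posthog_deps_from_package_json` is a hypothetical helper, it only looks at `dependencies` and `devDependencies`, and it leaves `framework` unset because framework detection needs more than the dependency name. It applies the "name starts with `posthog`" rule literally, so scoped names beginning with `@posthog/` would not match.

```python
import json


def posthog_deps_from_package_json(text: str, manifest: str) -> list:
    """Collect { sdk, version, manifest, framework } for posthog* dependencies."""
    data = json.loads(text)
    found = []
    for section in ("dependencies", "devDependencies"):
        for name, version in data.get(section, {}).items():
            if name.startswith("posthog"):
                found.append({
                    "sdk": name,
                    "version": version,
                    "manifest": manifest,
                    "framework": None,  # resolved separately; not inferred here
                })
    return found
```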

127 changes: 127 additions & 0 deletions transformation-config/skills/events-audit/references/2-scan.md
@@ -0,0 +1,127 @@
---
next_step: 3-enrich.md
---

# Step 2 – Scan capture sites

Find every PostHog capture/identify/group SDK call in the codebase via a single `Grep` and write a base inventory. **Read-only via Grep.** Don't `Read` any source files in this step — file-level enrichment happens in step 3.

This step is one Grep, one Write. No file Reads, no subagents, no MCP. Severity, flows, and identity analysis come later.

## Tools

Load via `ToolSearch select:Grep,Write` once at the start of this step.

## Status

Emit, in order:

```
[STATUS] Scanning SDK capture sites
[STATUS] Writing base event inventory
```

## Action

### a. Grep for direct SDK calls (with context)

Run a single `Grep` for the standard PostHog call shapes. Use `-A 3` so multi-line capture calls are visible without opening the file. Narrow `--include` to the languages step 1 detected — don't scan `*.kt` if the project is Python.

```
Grep -rn -B 0 -A 3 -E 'posthog\??\.(capture|identify|alias|group|setPersonProperties|setPersonPropertiesForFlags|reset)|usePostHog\(\)\??\.(capture|identify)|client\??\.(capture|Capture|Enqueue)|PostHog\??\.(shared|capture)|PostHog::capture|Posthog(\(\))?\??\.capture'
```

The `\??\.` matches both `posthog.capture(...)` and `posthog?.capture(...)` (optional chaining). JS/TS codebases routinely guard SDK calls with `?.` when the SDK may be uninitialised, so a pattern that misses it can undercount the inventory badly.
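
The effect of `\??\.` can be checked in isolation. The sketch below uses only two alternatives from the pattern above; Python's `re` module reads this subset the same way `grep -E` does. `is_capture_site` is a hypothetical name used for illustration.

```python
import re

# A two-alternative subset of the step's pattern, enough to show the `?.` handling.
CALL_RE = re.compile(
    r"posthog\??\.(capture|identify)|usePostHog\(\)\??\.(capture|identify)"
)


def is_capture_site(line: str) -> bool:
    """True when the line contains a plain or optional-chained SDK call."""
    return CALL_RE.search(line) is not None
```

Dropping the `\??` from the pattern makes the optional-chained samples stop matching, which is the undercount the paragraph above warns about.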

Common include patterns:

- Python: `--include='*.py'`
- JS/TS web: `--include='*.ts' --include='*.tsx' --include='*.js' --include='*.jsx' --include='*.vue' --include='*.svelte' --include='*.html'`
- Ruby: `--include='*.rb'`
- Go: `--include='*.go'`
- Java/Kotlin/Android: `--include='*.java' --include='*.kt'`
- iOS/Swift: `--include='*.swift'`
- Flutter: `--include='*.dart'`
- C#/.NET: `--include='*.cs'`
- Elixir: `--include='*.ex' --include='*.exs'`

**Exclude test files.** Drop hits in paths matching `*.test.*`, `*.spec.*`, `__tests__/**`, `tests/**`, `spec/**`. They pollute the inventory.
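
A minimal sketch of the test-path filter, for cases where the hits are post-filtered instead of excluded at grep time. `is_test_path` is a hypothetical helper, and it assumes the directory patterns (`__tests__/**`, `tests/**`, `spec/**`) apply at any depth in the path, not only at the repository root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

TEST_FILE_GLOBS = ("*.test.*", "*.spec.*")
TEST_DIRS = {"__tests__", "tests", "spec"}


def is_test_path(path: str) -> bool:
    """True when a grep hit's path looks like a test file and should be dropped."""
    p = PurePosixPath(path)
    # File-name patterns such as Checkout.test.tsx or cart.spec.ts.
    if any(fnmatchcase(p.name, glob) for glob in TEST_FILE_GLOBS):
        return True
    # Assumption: a test directory anywhere in the path counts.
    return any(part in TEST_DIRS for part in p.parts[:-1])
```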

#### Per-SDK call signatures (covered by the regex above)

Canonical reference for what a PostHog capture call looks like in each SDK. The grep regex above is a union of these shapes; step 3 subagents also use this table to find `event_name` and `properties` slots when extracting (they `Read` this file once at start).

| SDK | Capture pattern | Event-name position | Properties position |
|-----|-----------------|---------------------|---------------------|
| posthog-js | `posthog.capture("event", { props })` | positional 1 | positional 2 (object literal) |
| posthog-js (hook) | `usePostHog().capture("event", { props })` | positional 1 | positional 2 |
| posthog-node | `client.capture({ distinctId, event, properties })` | object key `event` | object key `properties` |
| posthog-python | `posthog.capture(distinct_id, "event", properties)` | positional 2 | positional 3 (dict) |
| posthog-ruby | `posthog.capture({ distinct_id:, event:, properties: })` | hash key `event` | hash key `properties` |
| posthog-go | `client.Enqueue(posthog.Capture{Event: "...", Properties: posthog.NewProperties()...})` | struct field `Event` | struct field `Properties` |
| posthog-ios | `PostHog.shared.capture("event", properties: ["k": "v"])` | positional 1 | named `properties` |
| posthog-android | `PostHog.capture("event", properties = mapOf("k" to "v"))` | positional 1 | named `properties` |
| posthog-react-native | Same shape as posthog-js | positional 1 | positional 2 |
| posthog-flutter | `Posthog().capture(eventName: "...", properties: { ... })` | named `eventName` | named `properties` |
| posthog-php | `PostHog::capture(['distinctId' => ..., 'event' => '...', 'properties' => [...]])` | array key `event` | array key `properties` |
| posthog-dotnet | `client.Capture(distinctId, "event", new() { ["k"] = "v" })` | positional 2 | positional 3 |
| posthog-elixir | `Posthog.capture("event", distinct_id, %{ k: v })` | positional 1 | positional 3 |

If the result is empty:

- And the project's manifest had a PostHog SDK in step 1 → the codebase likely wraps the SDK behind a custom helper. Write `{ "rows": [], "wrapper_undetected": true }` to `.posthog-events-inventory.json` and skip the rest of this step (move on to step 3, which will short-circuit on empty rows). The data-quality check in the report step will flag this.
- And no SDK was in the manifest either → emit `[ABORT] No capture call sites found in any detected SDK`.

### b. Parse grep output into row groups

`Grep -A 3` emits one trigger line plus up to three following lines per match. Groups that aren't contiguous are separated by `--` divider lines; when matches are adjacent or their context overlaps, the groups run together without a divider. For each match:

- The trigger line is `path:line:content`: the `.capture(` / `.identify(` / etc. site.
- The following 0–3 lines are continuations from the same file. Grep prints them as `path-line-content`, with `-` in place of `:`.
- Group them as a "slice": the trigger line plus its trailing context lines.

The slice is what you reason about in step (c). You don't need to re-grep or open the file.
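
The grouping can be sketched as a small parser over the raw grep text. This is an illustration under assumptions, not the skill's implementation: `Slice` and `group_slices` are hypothetical names, trigger lines are taken to look like `path:line:content`, and context lines like `path-line-content`.

```python
import re
from dataclasses import dataclass, field

MATCH_RE = re.compile(r"^(?P<path>.+?):(?P<line>\d+):(?P<content>.*)$")
CONTEXT_RE = re.compile(r"^(?P<line>\d+)-(?P<content>.*)$")


@dataclass
class Slice:
    path: str
    line: int                      # line number of the trigger line
    lines: list = field(default_factory=list)  # trigger content + context


def group_slices(grep_output: str) -> list:
    """Group grep -A output into one Slice per trigger line."""
    slices = []
    current = None
    for raw in grep_output.splitlines():
        if raw == "--":            # divider between non-contiguous groups
            current = None
            continue
        # Context lines repeat the current path, followed by "-<line>-".
        if current is not None and raw.startswith(current.path + "-"):
            ctx = CONTEXT_RE.match(raw[len(current.path) + 1:])
            if ctx:
                current.lines.append(ctx["content"])
                continue
        m = MATCH_RE.match(raw)
        if m:                      # a new trigger line starts a new slice
            current = Slice(m["path"], int(m["line"]), [m["content"]])
            slices.append(current)
    return slices
```

Joining `slice.lines` with `\n` gives the text to reason about for that capture site. Matching context lines against the current slice's path first avoids misreading a context line whose content happens to contain `:<digits>:`.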

### c. Build base rows

For each grouped slice, build one row:

```jsonc
{
"id": "capture-<short-file-slug>-<line>",
"file": "src/checkout/Checkout.tsx",
"line": 88,
"raw_match": "<the trigger line + up to 3 continuation lines, joined by \\n>",
"event_name": "purchase_completed",
"is_dynamic": false
}
```

`event_name` resolution rule: take the quoted string literal (single-, double-, or backtick-quoted without `${...}` interpolation) that sits in the SDK's event-name slot, as listed in the per-SDK table above. For positional-1 SDKs, that means: if the first non-whitespace argument inside the parentheses is a quoted literal, take it. Otherwise:

- The slice contains a quoted literal but it's clearly a property value (e.g. `{ revenue: "USD" }`) and not the event name → keep scanning forward to find the event-name slot, or fall through to dynamic.
- The slice contains no quoted literal at all → set `event_name: null`, `is_dynamic: true`. Step 3's subagents will retry via Pattern A/B (same-file constant / enum) when they read the file.
- The argument is a template literal (`` `name_${...}` ``), variable, or expression → set `event_name: null`, `is_dynamic: true`.

**Don't try to be clever.** If the slice doesn't make the literal obvious, leave it dynamic — step 3 has the file open and will resolve what it can.

Skip `$pageview` and `$pageleave` matches entirely — they're SDK-internal in most setups. Drop those rows; they don't go into the inventory.
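
For SDKs where the event name is the first positional argument, the resolution rule and row shape can be sketched as below. This is a minimal sketch: `resolve_event_name` and `build_row` are hypothetical helpers, the slug is simply the lowercased file stem, escaped quotes inside literals are not handled, and SDKs that pass the event name as an object key or named argument would need their own slot lookup.

```python
import re
from typing import Optional, Tuple

# First argument of .capture( is a quoted literal: "x", 'x', or `x`.
FIRST_ARG_LITERAL = re.compile(r"""\.capture\(\s*(["'`])((?:(?!\1).)*)\1""", re.S)
SKIPPED_EVENTS = {"$pageview", "$pageleave"}


def resolve_event_name(slice_text: str) -> Tuple[Optional[str], bool]:
    """Return (event_name, is_dynamic) for a positional-1 capture call."""
    m = FIRST_ARG_LITERAL.search(slice_text)
    if m is None:
        return None, True          # variable, expression, or no literal at all
    quote, name = m.group(1), m.group(2)
    if quote == "`" and "${" in name:
        return None, True          # interpolated template literal
    return name, False


def build_row(path: str, line: int, slice_text: str) -> Optional[dict]:
    """Build one base row, or None for rows the step drops."""
    event_name, is_dynamic = resolve_event_name(slice_text)
    if event_name in SKIPPED_EVENTS:
        return None
    # Assumption: the "short file slug" is the lowercased file stem.
    slug = path.rsplit("/", 1)[-1].split(".", 1)[0].lower()
    return {
        "id": f"capture-{slug}-{line}",
        "file": path,
        "line": line,
        "raw_match": slice_text,
        "event_name": event_name,
        "is_dynamic": is_dynamic,
    }
```

Anchoring the pattern to `.capture(` keeps a quoted property value later in the slice from being mistaken for the event name.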

### d. Write the base inventory

`Write` `.posthog-events-inventory.json` with the rows:

```jsonc
{
"rows": [ <base rows> ],
"wrapper_undetected": false
}
```

This file is small (~80 bytes per row × 100 rows ≈ 8KB) so the Write fits in one turn easily.

## Notes on wrapper resolution

This step intentionally does **not** chase wrapper functions (`trackEvent`, `analytics.track`, etc.). Cross-file wrapper resolution doesn't fit cleanly in row-range subagent fan-out, and the reframing principle is "let the reader ask follow-ups."

If `wrapper_undetected: true` (SDK in deps but no direct calls found), the report step's data-quality check surfaces it, and the suggested-follow-ups list points the reader at: *"find calls to `trackEvent`/`logEvent`/`analytics.track` and resolve their callers as additional capture sites."*
@@ -0,0 +1,85 @@
# Step 3 enrichment reference

Lookup tables and rules subagents apply during step 3 enrichment. Read this file **once** at the start of your enrichment run.

This file is supporting material for step 3; it has no `next_step` and is not part of the main step chain. The orchestrator does not read it.

The per-SDK capture call signatures (where `event_name` and `properties` live in each SDK's call shape) are in `2-scan.md` under "Per-SDK call signatures". Read that section once at the start of your enrichment run alongside this file — you'll need it to extract `event_name` and `properties`.

## Identification surfaces

Set `call_kind` according to the call:

- `posthog.identify(distinctId, $set, $set_once)` → `identify`
- `posthog.setPersonProperties({ ... })` → `set`
- `posthog.setPersonPropertiesForFlags` → `set_once`
- `posthog.group(type, key, properties)` → `group`
- `posthog.alias(alias, distinctId)` → `alias`
- `posthog.reset()` → `reset` (no event name; the identity check uses presence to score cross-device hygiene)

## `package` rules (monorepo dimension)

Compute `package` **before** `area` from the file path. Match the first prefix below; everything after the prefix's package segment is what `area` rules then operate on.

| Path prefix | `package` |
|---|---|
| `apps/<name>/...` | `<name>` |
| `packages/<name>/...` | `<name>` |
| `services/<name>/...` | `<name>` |
| `projects/<name>/...` | `<name>` |
| Anything else | `null` |

Examples:
- `apps/web/components/Checkout/Checkout.tsx` → `package: "web"`, then `area` rules see `components/Checkout/Checkout.tsx`.
- `packages/sdk/src/track.ts` → `package: "sdk"`, then `area` rules see `src/track.ts`.
- `src/checkout/Checkout.tsx` → `package: null`, `area` rules see the original path.

Don't fabricate a package from `src/` or `app/` — those are within-package directories, not package roots.
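
The prefix table translates into a short function. A minimal sketch, with `split_package` as a hypothetical name; it returns both the package and the remaining path that the `area` rules then see.

```python
from typing import Optional, Tuple

PACKAGE_ROOTS = ("apps", "packages", "services", "projects")


def split_package(path: str) -> Tuple[Optional[str], str]:
    """Return (package, remaining_path) per the prefix table above."""
    parts = path.split("/")
    # `<root>/<name>/...` needs at least one segment after the package name.
    if len(parts) >= 3 and parts[0] in PACKAGE_ROOTS:
        return parts[1], "/".join(parts[2:])
    # `src/` and `app/` are within-package directories, so no package here.
    return None, path
```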

## `area` rules

After `package` extraction, strip one leading `src/`, `app/`, or `pages/` from the remaining path. Then apply the first matching rule:

| Path shape after stripping | `area` |
|---|---|
| `app/<x>/...` (Next.js app router) | `<x>` |
| `pages/<x>/...` (Next.js pages router) | `<x>` (use `api/<seg>` for `pages/api/<seg>/...`) |
| `components/<x>/...` | `<x>` |
| `features/<x>/...` | `<x>` |
| `screens/<x>/...` | `<x>` (mobile) |
| `routes/<x>/...`, `views/<x>/...`, `controllers/<x>/...` (backend) | `<x>` |
| `hooks/...`, `lib/...`, `utils/...`, `analytics/...`, `services/...`, `helpers/...` | `shared` |
| `app/layout.tsx`, `app/template.tsx`, `_app.tsx`, `_document.tsx`, `app/error.tsx`, `app/not-found.tsx` | `global` |
| Anything else | first path segment after stripping, lowercased |

Strip only the first matching prefix.

## `route` rules (Next.js only)

- `app/foo/page.tsx` → `/foo`
- `app/foo/bar/page.tsx` → `/foo/bar`
- `app/foo/[id]/page.tsx` → `/foo/[id]`
- `app/(group)/foo/page.tsx` → `/foo` (route groups in parens are ignored)
- `pages/foo.tsx` → `/foo`
- `pages/foo/[id].tsx` → `/foo/[id]`
- `pages/api/<rest>` → `/api/<rest>` (without the file extension)

Set `route: null` for any path that isn't router-shaped. Don't fabricate routes for non-Next.js codebases.
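
The route examples can be expressed as one function. This is a sketch under assumptions: `derive_route` is a hypothetical name, the input path has already had its package prefix and any leading `src/` removed, and `index` files are not special-cased because the rules above don't mention them.

```python
from typing import Optional


def derive_route(path: str) -> Optional[str]:
    """Map a Next.js-shaped file path to a route, or None if not router-shaped."""
    parts = path.split("/")
    # App router: app/<segments>/page.<ext>
    if parts[0] == "app" and parts[-1].startswith("page."):
        # Route groups such as (group) don't appear in the URL.
        segs = [p for p in parts[1:-1] if not (p.startswith("(") and p.endswith(")"))]
        return "/" + "/".join(segs)
    # Pages router: pages/<segments>/<file>.<ext>, including pages/api/...
    if parts[0] == "pages" and len(parts) > 1:
        last = parts[-1].rsplit(".", 1)[0]  # drop the file extension
        return "/" + "/".join(parts[1:-1] + [last])
    return None
```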

## `enclosing` rules

Backward-scan from the capture line, testing each line against these patterns. The closest matching line above the capture wins:

- `function (\w+)\(` (named function)
- `const (\w+) = \(?` / `const (\w+) = async`
- `export (?:default )?function (\w+)\(`
- `export const (\w+) = `
- `class (\w+)`
- `def (\w+)\(` (Python)
- `func (\w+)\(` (Go / Swift)
- `fun (\w+)\(` (Kotlin)
- `def (\w+)` (Ruby)

Take the closest match above the capture line at column 0 or one indent level deeper than the capture's expected wrapper. If nothing matches within ~80 lines above, set `enclosing: null`. Don't read more file context to chase it.

For unnamed default exports (`export default function () { ... }`), use the file's basename without extension as the enclosing name (e.g. `CheckoutPage.tsx` → `CheckoutPage`).
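
A minimal sketch of the backward scan, assuming the file's lines are already in memory. `find_enclosing` is a hypothetical helper. It collapses the pattern list (for example, one `function` pattern also covers the `export` variants), it takes the closest match without applying the indentation filter, and it does not implement the basename fallback for unnamed default exports.

```python
import re
from typing import Optional

# Collapsed from the pattern list above; \b avoids matching inside longer words.
ENCLOSING_PATTERNS = [re.compile(p) for p in (
    r"\bfunction (\w+)\(",   # JS/TS, including export / export default
    r"\bconst (\w+) = ",     # JS/TS, including export const and async arrows
    r"\bclass (\w+)",
    r"\bdef (\w+)",          # Python and Ruby
    r"\bfunc (\w+)\(",       # Go / Swift
    r"\bfun (\w+)\(",        # Kotlin
)]


def find_enclosing(lines: list, capture_idx: int, max_back: int = 80) -> Optional[str]:
    """Scan upward from the capture line; return the closest enclosing name."""
    start = max(0, capture_idx - max_back)
    for i in range(capture_idx - 1, start - 1, -1):
        for pattern in ENCLOSING_PATTERNS:
            m = pattern.search(lines[i])
            if m:
                return m.group(1)
    return None  # nothing within the window: leave enclosing unset
```

Because the indentation filter is omitted, a local `const` declared between the function header and the capture call would win here; a fuller version would skip matches indented deeper than the expected wrapper.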