
feat: demo-ready CLI consolidation + autonomous orchestration #52

Merged
WellDunDun merged 31 commits into dev from
custom/prefix/router-1773579485701
Mar 15, 2026

Conversation

@WellDunDun
Collaborator

@WellDunDun WellDunDun commented Mar 15, 2026

Summary

Complete demo-ready overhaul of selftune CLI and autonomous operation pipeline.

CLI Consolidation (28 → 21 grouped commands)

  • selftune ingest <agent> — unified ingestion (claude, codex, opencode, openclaw, wrap-codex)
  • selftune grade <mode> — unified grading (auto, baseline, or default session grade)
  • selftune evolve <action> — unified evolution (body, rollback, or default evolve)
  • selftune eval <type> — unified evals (generate, unit-test, composability, import)
  • Bare selftune now shows status dashboard instead of help text

Autonomous Orchestration

  • orchestrate --loop continuous mode with configurable interval and SIGINT handler
  • Phased decision reports for orchestrator explainability
  • Evidence-based candidate selection gating
  • Orchestrate run reports persisted in dashboard

UX Improvements (autoresearch-inspired)

  • Default cheap-loop mode on, --full-model escape hatch
  • Git-style diff display after skill evolution deployments
  • Selftune resource usage in skill reports (replaces misleading session metrics)

Infrastructure

  • Local dashboard SPA (React + Vite) with SQLite materialization
  • Generic scheduling with OpenClaw cron as optional
  • E2E autonomy proof harness for evolution pipeline
  • Hardened LLM calls and improved sync/query filtering

Documentation

  • All 14+ workflow docs updated with new command names
  • SKILL.md, AGENTS.md, ARCHITECTURE.md, README.md aligned
  • Operator guide rewritten for autonomy-first setup
  • Agent definitions updated (diagnosis, evolution-reviewer, pattern-analyst, integration-guide)

Test plan

  • selftune (bare) shows status output
  • selftune --help shows grouped command structure
  • selftune ingest claude routes correctly
  • selftune grade auto routes correctly
  • selftune evolve rollback routes correctly
  • selftune eval unit-test routes correctly
  • selftune orchestrate --loop --loop-interval 120 starts continuous mode
  • bun run lint passes
  • bun test passes (excluding known dashboard timeout)

🤖 Generated with Claude Code

WellDunDun and others added 6 commits March 15, 2026 13:56
Reframes operator guide around autonomy-first setup, adds orchestrate runs
endpoint to architecture/dashboard docs, and updates skill workflows to
recommend --enable-autonomy as the default initialization path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- orchestrate --loop: continuous autonomous improvement cycle with configurable interval
- evolve: default cheap-loop on, add --full-model escape hatch, show diff after deploy
- bare `selftune` shows status dashboard instead of help text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n skill report

Skill report cards now display selftune's own LLM calls and evolution
duration per skill (from orchestrate_runs) instead of misleading
session-level token/duration aggregates. Also extracts tokens and
duration from transcripts into canonical execution facts for future use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Mar 15, 2026

Warning

Rate limit exceeded

@WellDunDun has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 4 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e5405b8a-92cc-44be-bbcb-93ac7096f4da

📥 Commits

Reviewing files that changed from the base of the PR and between 08e4dd9 and c9f39ff.

📒 Files selected for processing (13)
  • ARCHITECTURE.md
  • README.md
  • apps/local-dashboard/src/pages/Status.tsx
  • cli/selftune/hooks/prompt-log.ts
  • cli/selftune/hooks/session-stop.ts
  • cli/selftune/orchestrate.ts
  • llms.txt
  • skill/Workflows/Evals.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Orchestrate.md
  • skill/references/logs.md
  • tests/hooks/signal-detection.test.ts
  • tests/signal-orchestrate.test.ts
📝 Walkthrough

Restructures CLI into grouped subcommands, adds loop-capable orchestrate with lockfile and signal-aware candidate selection, records per-skill elapsed_ms and llm_calls into persisted orchestrate_runs, surfaces aggregated selftune_stats and doctor endpoint on the dashboard API, adds Status UI, extends transcript/ingestor telemetry (tokens/duration), adds signal detection and reactive orchestration spawn, and auto-installs Claude Code hooks on init.

Changes

Cohort / File(s) Summary
CLI routing & commands
cli/selftune/index.ts, cli/selftune/*
Consolidate CLI into nested group subcommands (ingest, eval, grade, evolve, cron/schedule, hook, etc.); default no-arg → status; dynamic imports and updated help text.
Orchestrator, locking & signals
cli/selftune/orchestrate.ts, cli/selftune/constants.ts, cli/selftune/types.ts
Add loop mode (--loop, --loop-interval), lockfile acquire/release, read/consume improvement signals, propagate signaledSkills into candidate selection, and extend run reports / skill actions with elapsed_ms and llm_calls. New exports: acquireLock, releaseLock, markSignalsConsumed, ImprovementSignalRecord and augmented types.
Hooks & signal detection
cli/selftune/hooks/prompt-log.ts, cli/selftune/hooks/session-stop.ts
Detect improvement signals from prompts, append to SIGNAL_LOG, expose getInstalledSkillNames/detectImprovementSignal, add maybeSpawnReactiveOrchestrate() invoked on session-stop; session-stop now records input/output token and duration telemetry.
Telemetry parsing & ingestors
cli/selftune/utils/transcript.ts, cli/selftune/ingestors/claude-replay.ts
Parse and propagate input_tokens, output_tokens, and duration_ms from transcripts/session metrics into TranscriptMetrics and canonical execution facts.
Dashboard server & contract
cli/selftune/dashboard-server.ts, cli/selftune/dashboard-contract.ts, tests/dashboard/*
Add GET /api/v2/doctor; compute and expose selftune_stats by aggregating orchestrate_runs.skill_actions_json; expand SkillReportResponse to include token_usage, duration_stats, and selftune_stats; update tests/fixtures.
Local dashboard UI & client types
apps/local-dashboard/src/pages/SkillReport.tsx, apps/local-dashboard/src/pages/Status.tsx, apps/local-dashboard/src/api.ts, apps/local-dashboard/src/hooks/*, apps/local-dashboard/src/types.ts, apps/local-dashboard/src/components/app-sidebar.tsx, apps/local-dashboard/src/App.tsx
SkillReport switched to selftune_stats fields; add Status page, fetchDoctor/useDoctor, DoctorResult/Health types, sidebar System Status link, routing and cache tuning.
Evolution UX & CLI flags
cli/selftune/evolution/evolve.ts, cli/selftune/evolution/*
Add --full-model flag, change cheap-loop defaulting and model fallbacks, print simple color diffs for proposals/deploys; help text updated.
Init / Claude Code hooks
cli/selftune/init.ts
Add installClaudeCodeHooks() to merge bundled Claude Code hook snippet into user Claude settings during init (idempotent, non‑destructive).
Dashboard tests & fixtures
tests/dashboard/*
Update fixtures to include selftune_stats and adjust token/duration expectations to match new SkillReportResponse contract.
Docs & repository shape
AGENTS.md, ARCHITECTURE.md, PRD.md, README.md, CHANGELOG.md, docs/**, skill/**, apps/*
Major docs overhaul: agent-first framing, operator/architecture/operator-guide, many workflow docs, command renames (hyphenated → grouped subcommands), dashboard/workflow descriptions, SPA/hand-off docs.
Small help/text-only renames
skill/Workflows/*, docs/design-docs/*, .claude/agents/*
Widespread documentation and help-string renames: evals → eval generate, baseline → grade baseline, unit-test → eval unit-test, ingest-* → ingest <service>, rollback → evolve rollback; CLI examples updated to match.
Package & build
package.json
Add prepublishOnly script to build dashboard before publish.
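The changes list mentions lockfile acquire/release (`acquireLock`, `releaseLock`) in orchestrate.ts, but the PR does not show how they work. Below is a minimal sketch, assuming a PID-based lockfile created with Node's exclusive `"wx"` open flag and reclaimed only when the recorded PID is no longer running. The lock path and stale-lock policy are assumptions, not selftune's implementation, and the stale-lock reclaim is not race-free if two processes discover the same dead lock at the same moment.

```typescript
import { closeSync, openSync, readFileSync, unlinkSync, writeSync } from "node:fs";

// Signal 0 checks whether a process exists without actually signalling it.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical helper: returns true if this process now holds the lock.
function acquireLock(lockPath: string): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" creates the file exclusively and throws EEXIST if it already exists.
      const fd = openSync(lockPath, "wx");
      writeSync(fd, String(process.pid));
      closeSync(fd);
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    if (attempt > 0) break;
    let holder = Number.NaN;
    try {
      holder = Number.parseInt(readFileSync(lockPath, "utf8"), 10);
    } catch {
      continue; // lock vanished between the create attempt and the read: retry once
    }
    // A live owner keeps the lock; an unreadable or dead-PID lock is treated as stale.
    if (Number.isInteger(holder) && holder > 0 && isProcessAlive(holder)) return false;
    try {
      unlinkSync(lockPath);
    } catch {
      // Another process already removed it; the next create attempt decides who wins.
    }
  }
  return false;
}

// Hypothetical helper: removes the lock only if this process owns it.
function releaseLock(lockPath: string): void {
  try {
    if (Number.parseInt(readFileSync(lockPath, "utf8"), 10) === process.pid) unlinkSync(lockPath);
  } catch {
    // Lock already gone: nothing to release.
  }
}
```

Failing fast (returning `false`) instead of waiting keeps the sketch simple when a reactive spawn from session-stop and a running `--loop` start close together; the second one just skips its cycle.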

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (selftune)
    participant Orch as Orchestrator
    participant DB as LocalDB (orchestrate_runs)
    participant Hooks as Hooks (prompt-log/session-stop)
    participant DashSrv as Dashboard Server
    participant UI as Local Dashboard

    rect rgba(240,240,255,0.5)
    CLI->>Orch: start orchestrate [--loop | --loop-interval]
    end

    rect rgba(240,255,240,0.5)
    Orch->>DB: query skills, schedules, previous runs
    Orch->>DB: read pending signals (SIGNAL_LOG) via readSignals
    Orch->>Orch: selectCandidates(signaledSkills)
    Orch->>Orch: decide actions per skill (evolve/watch)
    Orch->>DB: persist orchestrate_run (skill_actions_json incl. elapsed_ms,llm_calls)
    Orch-->>CLI: emit JSON report + human summary
    end

    rect rgba(255,240,240,0.5)
    Hooks->>DB: on session-stop append canonical execution + tokens/duration
    Hooks->>Orch: maybeSpawnReactiveOrchestrate() (fire-and-forget)
    end

    rect rgba(255,255,240,0.5)
    DashSrv->>DB: materialize orchestrate_runs -> aggregate elapsed_ms/llm_calls
    DashSrv->>DashSrv: compute selftune_stats
    DashSrv-->>UI: serve SkillReport API (includes selftune_stats) and GET /api/v2/doctor
    end
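The diagram has the orchestrator read pending signals and pass `signaledSkills` into candidate selection, and the changes list adds `markSignalsConsumed`. The fields of `ImprovementSignalRecord` are not shown, so this sketch assumes a hypothetical JSONL record with `skill_name` and `consumed` fields, orders signaled skills first, and rewrites the log with consumed flags set. All function names here are illustrative, and the real implementation may append or track consumption differently.

```typescript
// Hypothetical shape; the real ImprovementSignalRecord fields are not shown in the PR.
interface ImprovementSignalRecord {
  timestamp: string;
  skill_name: string;
  consumed: boolean;
}

function parseSignals(jsonl: string): ImprovementSignalRecord[] {
  const out: ImprovementSignalRecord[] = [];
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    try {
      out.push(JSON.parse(line) as ImprovementSignalRecord);
    } catch {
      // Skip a torn or malformed line instead of failing the whole run.
    }
  }
  return out;
}

// Skills with at least one unconsumed signal.
function signaledSkills(records: ImprovementSignalRecord[]): Set<string> {
  return new Set(records.filter((r) => !r.consumed).map((r) => r.skill_name));
}

// Signaled skills first, then the rest, each group keeping its original order.
function orderCandidates(candidates: string[], signaled: Set<string>): string[] {
  return [
    ...candidates.filter((s) => signaled.has(s)),
    ...candidates.filter((s) => !signaled.has(s)),
  ];
}

// Returns the rewritten JSONL with signals for the given skills marked consumed.
function markConsumed(records: ImprovementSignalRecord[], skills: Set<string>): string {
  return (
    records
      .map((r) => JSON.stringify(skills.has(r.skill_name) ? { ...r, consumed: true } : r))
      .join("\n") + "\n"
  );
}
```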

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cli/selftune/utils/transcript.ts (1)

48-67: ⚠️ Potential issue | 🟠 Major

Use the same timestamp fallback here as the rest of the parser.

extractActionableUserQueries() already checks message.timestamp, but duration_ms only reads entry.timestamp. Any transcript variant that nests timestamps inside message will silently lose duration data.

Suggested fix
-    // Track timestamps for duration calculation
-    const ts = entry.timestamp as string | undefined;
+    const msg = (entry.message as Record<string, unknown>) ?? entry;
+    const ts =
+      (entry.timestamp as string | undefined) ?? (msg.timestamp as string | undefined);
     if (ts) {
       if (!firstTimestamp) firstTimestamp = ts;
       lastTimestamp = ts;
     }
@@
-    // Normalise: unwrap nested message if present
-    const msg = (entry.message as Record<string, unknown>) ?? entry;
+    // Normalise: unwrap nested message if present
     const role = (msg.role as string) ?? (entry.role as string) ?? "";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/utils/transcript.ts` around lines 48 - 67, The duration
timestamp handling only reads entry.timestamp and ignores nested message
timestamps; update the timestamp extraction logic (where firstTimestamp and
lastTimestamp are set) to use the same fallback as elsewhere by checking both
entry.timestamp and (entry.message as Record<string, unknown>)?.timestamp (e.g.,
const ts = (entry.timestamp as string | undefined) ?? ((entry.message as
Record<string, unknown>)?.timestamp as string | undefined)); then use that ts to
set firstTimestamp and lastTimestamp so duration_ms is computed correctly for
transcripts that nest timestamps inside message; ensure you reference the
existing variables firstTimestamp, lastTimestamp and the local entry/msg
extraction.
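A minimal sketch of the suggested fallback, pulled into standalone helpers so the duration logic can be tested in isolation. `entryTimestamp` and `durationMs` are hypothetical names; the logic mirrors the review's suggestion (top-level `timestamp`, else `message.timestamp`, first and last seen) and assumes timestamps are ISO-8601 strings that `Date.parse` understands.

```typescript
type Entry = Record<string, unknown>;

// Same fallback as the review suggests: top-level timestamp, else message.timestamp.
function entryTimestamp(entry: Entry): string | undefined {
  const msg = (entry.message as Entry | undefined) ?? entry;
  return (entry.timestamp as string | undefined) ?? (msg.timestamp as string | undefined);
}

// Duration between the first and last timestamped entries, or undefined if unknown.
function durationMs(entries: Entry[]): number | undefined {
  let first: string | undefined;
  let last: string | undefined;
  for (const entry of entries) {
    const ts = entryTimestamp(entry);
    if (!ts) continue;
    if (!first) first = ts;
    last = ts;
  }
  if (!first || !last) return undefined;
  const ms = Date.parse(last) - Date.parse(first);
  // Unparseable or out-of-order timestamps yield no duration rather than a bogus one.
  return Number.isFinite(ms) && ms >= 0 ? ms : undefined;
}
```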
PRD.md (1)

124-136: ⚠️ Potential issue | 🟡 Minor

Update the platform/adapters count in this section.

This section now lists four platforms/adapters after adding OpenClaw, but the surrounding prose still says “three.” That leaves the section internally inconsistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PRD.md` around lines 124 - 136, Update the opening sentence under the "##
Multi-Tool Architecture" header to reflect four platforms (Claude Code, Codex,
OpenCode, OpenClaw) instead of "three"; locate the phrase mentioning "three
major agent platforms" and change it to "four major agent platforms" so the
prose matches the listed adapters (Claude Code, Codex, OpenCode, OpenClaw).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Line 11: Add a language specifier (e.g., text or plaintext) to the fenced code
block that contains the directory listing "selftune/" so the markdown fence
becomes ```text (or ```plaintext) instead of just ```, ensuring static analysis
warnings are resolved; update the fenced block surrounding the "selftune/" line
accordingly.

In `@ARCHITECTURE.md`:
- Around line 161-176: The ARCHITECTURE.md import matrix is too strict: update
it so it reflects actual dependencies by either (A) expanding the Dashboard row
to include the additional allowed imports (add Status, Observability, Evolution
or list "Status, Observability, Evolution" alongside Shared, LocalDB) to cover
cli/selftune/dashboard-server.ts which depends on status.ts, observability.ts,
and evolution/evidence.ts, and by relaxing the Shared row to permit intra-shared
helper imports used by cli/selftune/utils/transcript.ts, or (B) add a
"Transitional exceptions" subsection listing the specific files
(cli/selftune/dashboard-server.ts and cli/selftune/utils/transcript.ts) and the
exact modules they may import (status.ts, observability.ts,
evolution/evidence.ts, other Shared helpers); choose one approach and update the
matrix and/or add the exceptions so the documented rules match the codebase.
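Either option stays honest more easily if the matrix is also checked mechanically. A sketch of such a check, assuming a hypothetical layer map in the spirit of option A (Dashboard may import Status, Observability and Evolution; same-layer imports, including Shared-to-Shared helpers, are allowed) plus an optional per-file exceptions table for option B. The layer assignments and the example file path in the test are placeholders; the real matrix lives in ARCHITECTURE.md.

```typescript
type Layer = "Shared" | "LocalDB" | "Status" | "Observability" | "Evolution" | "Dashboard";

// Hypothetical option-A matrix: which other layers each layer may import.
// Same-layer imports (e.g. Shared helpers importing other Shared helpers) are always allowed.
const ALLOWED: Record<Layer, Layer[]> = {
  Shared: [],
  LocalDB: ["Shared"],
  Status: ["Shared", "LocalDB"],
  Observability: ["Shared", "LocalDB"],
  Evolution: ["Shared", "LocalDB"],
  Dashboard: ["Shared", "LocalDB", "Status", "Observability", "Evolution"],
};

interface ImportEdge {
  file: string; // importing file
  from: Layer; // layer of the importing file
  to: Layer; // layer of the imported module
}

// Option B: per-file transitional exceptions, keyed by importing file.
type Exceptions = Record<string, Layer[]>;

// Returns every import edge the matrix (plus exceptions) does not permit.
function violations(edges: ImportEdge[], exceptions: Exceptions = {}): ImportEdge[] {
  return edges.filter((e) => {
    if (e.from === e.to) return false;
    if (ALLOWED[e.from].includes(e.to)) return false;
    const extra = Object.prototype.hasOwnProperty.call(exceptions, e.file) ? exceptions[e.file] : [];
    return !extra.includes(e.to);
  });
}
```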

In `@cli/selftune/dashboard-server.ts`:
- Around line 891-897: The current code repurposes durationStats and hardcodes
token_usage = 0, fabricating legacy metrics; instead keep selftuneStats
(avg_elapsed_ms, total_elapsed_ms, run_count) as-is and build separate legacy
execution metrics by aggregating execution_facts for input_tokens,
output_tokens, duration_ms, and errors_encountered; update token_usage to sum
input_tokens+output_tokens from execution_facts, set duration_stats to the
aggregated duration_ms from execution_facts (not selftuneStats), and populate
total_errors from execution_facts errors aggregation; locate and change the
variables durationStats, token_usage, and any uses of selftuneStats in the route
handler to use execution_facts aggregation logic so legacy fields are derived
only from execution_facts while preserving selftuneStats unchanged.
- Around line 833-842: The loop over actions parsed from skill_actions_json
incorrectly increments selftuneRunCount for any matching a.skill, including
skip/watch entries; update the loop (the actions array handling where a.skill is
compared) to only count and aggregate elapsed_ms/llm_calls for actual evolution
runs by checking the action/type field (e.g., a.action or a.type) and excluding
'skip' and 'watch' (or explicitly requiring the evolve/run action name used by
orchestrate_runs.skill_actions); ensure you only increment selftuneRunCount and
add to totalSelftunElapsedMs/totalLlmCalls when that type check passes so
run_count and avg_elapsed_ms reflect real evolution runs.
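A sketch of the counting fix, assuming each `skill_actions_json` row is a JSON array of objects with `skill`, `action`, `elapsed_ms` and `llm_calls`, and that real evolution runs carry `action: "evolve"`. The action name, the `SkillAction` shape and the `total_llm_calls` output field are assumptions; corrupt rows are skipped rather than failing the whole report.

```typescript
// Hypothetical shape of one entry in orchestrate_runs.skill_actions_json.
interface SkillAction {
  skill: string;
  action: "evolve" | "watch" | "skip";
  elapsed_ms?: number;
  llm_calls?: number;
}

interface SelftuneStats {
  run_count: number;
  total_elapsed_ms: number;
  avg_elapsed_ms: number;
  total_llm_calls: number;
}

function selftuneStatsFor(skill: string, runsJson: string[]): SelftuneStats {
  let runCount = 0;
  let totalElapsed = 0;
  let totalLlm = 0;
  for (const json of runsJson) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(json);
    } catch {
      continue; // skip a corrupt row rather than failing the report
    }
    if (!Array.isArray(parsed)) continue;
    for (const a of parsed as SkillAction[]) {
      // Only real evolution runs count; skip/watch entries would inflate run_count.
      if (a.skill !== skill || a.action !== "evolve") continue;
      runCount++;
      totalElapsed += a.elapsed_ms ?? 0;
      totalLlm += a.llm_calls ?? 0;
    }
  }
  return {
    run_count: runCount,
    total_elapsed_ms: totalElapsed,
    avg_elapsed_ms: runCount > 0 ? Math.round(totalElapsed / runCount) : 0,
    total_llm_calls: totalLlm,
  };
}
```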

In `@cli/selftune/evolution/evolve.ts`:
- Around line 146-162: formatSimpleDiff currently compares oldLines[i] to
newLines[i] by index which yields misleading diffs when lines are
inserted/deleted; replace its body to use the diff package (import diffLines
from 'diff' or use diff.diffLines) to compute line-level changes between oldText
and newText, then iterate the resulting change objects and push removed lines
prefixed with red "- ", added lines with green "+ ", and skip or omit unchanged
chunks as before; update the function formatSimpleDiff and add the diff
dependency to package.json (or import) so that diffLines(oldText, newText)
drives the output instead of index-based comparison.

In `@cli/selftune/index.ts`:
- Around line 85-90: In the if (!command) block in cli/selftune/index.ts, call
statusMain() as before but add an explicit return immediately after the
statusMain() call to prevent control-flow fallthrough if statusMain stops
calling process.exit or throws; update the block around the existing statusMain
import/call (symbol: statusMain) so execution does not continue to argv.shift()
and the subsequent switch when command is falsy.

In `@cli/selftune/orchestrate.ts`:
- Around line 804-809: The SIGINT handler should request a cooperative shutdown
rather than calling process.exit directly: replace the immediate exit in the
handler used when isLoop is true with setting a shared cancellation flag (e.g.,
stopRequested or cancelOrchestrate) or invoking an existing cancellation
function so the running cycle (sync, evolve, watch) can detect the request,
flush audit entries and orchestrate_runs.jsonl, and exit cleanly; update the
long-running loop and the functions sync/evolve/watch to periodically check that
flag or accept a CancellationToken and perform final flush/cleanup when set;
apply the same change to the other SIGINT handler used elsewhere (the similar
handler referenced in the review) so both signal handlers follow the cooperative
shutdown pattern.
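A minimal sketch of the cooperative pattern the review describes: the SIGINT handler only sets a flag, the loop checks it between cycles and during the interval wait, and a flush step runs before returning. `StopSignal`, `runLoop`, `flush` and the force-exit on a second Ctrl-C (exit code 130) are illustrative choices, not selftune's code; `cycle` stands in for one sync/evolve/watch pass and should itself check the flag between phases.

```typescript
// Hypothetical cooperative-shutdown loop; names are illustrative, not selftune's API.
interface StopSignal {
  requested: boolean;
}

// Waits up to `ms`, polling the flag so a stop request ends the wait early.
function sleepUnlessStopped(ms: number, stop: StopSignal): Promise<void> {
  return new Promise((resolve) => {
    const started = Date.now();
    const tick = (): void => {
      if (stop.requested || Date.now() - started >= ms) {
        resolve();
        return;
      }
      setTimeout(tick, Math.min(250, ms));
    };
    tick();
  });
}

async function runLoop(
  cycle: (stop: StopSignal) => Promise<void>,
  intervalMs: number,
  stop: StopSignal,
  flush: () => Promise<void>,
): Promise<number> {
  let cycles = 0;
  while (!stop.requested) {
    await cycle(stop); // the cycle should check stop.requested between phases
    cycles++;
    if (stop.requested) break;
    await sleepUnlessStopped(intervalMs, stop);
  }
  await flush(); // persist audit entries and run reports before returning
  return cycles;
}

// Wiring: the first Ctrl-C requests a clean stop; a second one forces exit.
function installSigintHandler(stop: StopSignal): void {
  process.on("SIGINT", () => {
    if (stop.requested) process.exit(130);
    stop.requested = true;
  });
}
```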

In `@PRD.md`:
- Line 315: Add a blank line above the heading "M9 — Trustworthy Autonomy (1.0)"
so the heading does not directly follow the preceding list (this will satisfy
markdownlint rule MD022); locate the heading "M9 — Trustworthy Autonomy (1.0)"
in PRD.md and insert a single empty line immediately before it.

---

Outside diff comments:
In `@cli/selftune/utils/transcript.ts`:
- Around line 48-67: The duration timestamp handling only reads entry.timestamp
and ignores nested message timestamps; update the timestamp extraction logic
(where firstTimestamp and lastTimestamp are set) to use the same fallback as
elsewhere by checking both entry.timestamp and (entry.message as Record<string,
unknown>)?.timestamp (e.g., const ts = (entry.timestamp as string | undefined)
?? ((entry.message as Record<string, unknown>)?.timestamp as string |
undefined)); then use that ts to set firstTimestamp and lastTimestamp so
duration_ms is computed correctly for transcripts that nest timestamps inside
message; ensure you reference the existing variables firstTimestamp,
lastTimestamp and the local entry/msg extraction.

In `@PRD.md`:
- Around line 124-136: Update the opening sentence under the "## Multi-Tool
Architecture" header to reflect four platforms (Claude Code, Codex, OpenCode,
OpenClaw) instead of "three"; locate the phrase mentioning "three major agent
platforms" and change it to "four major agent platforms" so the prose matches
the listed adapters (Claude Code, Codex, OpenCode, OpenClaw).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1be0b74-45d8-49b2-8296-b5323774b7e5

📥 Commits

Reviewing files that changed from the base of the PR and between b14b6d2 and 3bea7f7.

📒 Files selected for processing (24)
  • AGENTS.md
  • ARCHITECTURE.md
  • PRD.md
  • README.md
  • apps/local-dashboard/src/pages/SkillReport.tsx
  • cli/selftune/dashboard-contract.ts
  • cli/selftune/dashboard-server.ts
  • cli/selftune/evolution/evolve.ts
  • cli/selftune/hooks/session-stop.ts
  • cli/selftune/index.ts
  • cli/selftune/ingestors/claude-replay.ts
  • cli/selftune/orchestrate.ts
  • cli/selftune/types.ts
  • cli/selftune/utils/transcript.ts
  • docs/design-docs/index.md
  • docs/design-docs/system-overview.md
  • docs/integration-guide.md
  • docs/operator-guide.md
  • skill/SKILL.md
  • skill/Workflows/Dashboard.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Schedule.md
  • tests/dashboard/badge-routes.test.ts
  • tests/dashboard/dashboard-server.test.ts

Comment on lines +146 to +162
function formatSimpleDiff(oldText: string, newText: string): string {
  const oldLines = oldText.split("\n");
  const newLines = newText.split("\n");
  const output: string[] = [];
  const maxLen = Math.max(oldLines.length, newLines.length);
  for (let i = 0; i < maxLen; i++) {
    const oldLine = oldLines[i];
    const newLine = newLines[i];
    if (oldLine === newLine) continue;
    if (oldLine !== undefined) {
      output.push(`\x1b[31m- ${oldLine}\x1b[0m`);
    }
    if (newLine !== undefined) {
      output.push(`\x1b[32m+ ${newLine}\x1b[0m`);
    }
  }
  return output.join("\n");
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if diff package is already a dependency
cat package.json | jq '.dependencies["diff"] // .devDependencies["diff"] // "not found"'

Repository: selftune-dev/selftune

Length of output: 76


🏁 Script executed:

#!/bin/bash
# Check where formatSimpleDiff is used
rg "formatSimpleDiff" --type ts -B 2 -A 2

Repository: selftune-dev/selftune

Length of output: 817


🏁 Script executed:

#!/bin/bash
# Check the package.json for any dependency constraints or comments
head -50 package.json

Repository: selftune-dev/selftune

Length of output: 1090


🏁 Script executed:

#!/bin/bash
# Check the context around line 805-811 where diff output is used
sed -n '800,815p' cli/selftune/evolution/evolve.ts

Repository: selftune-dev/selftune

Length of output: 744


🏁 Script executed:

#!/bin/bash
# Check for other diff-related utilities in the codebase
rg "diff|patch|similarity" --type ts --type js | grep -i "import\|export\|function" | head -20

Repository: selftune-dev/selftune

Length of output: 1401


🏁 Script executed:

#!/bin/bash
# Check the comment context around formatSimpleDiff to see if there are notes about dependencies
sed -n '135,165p' cli/selftune/evolution/evolve.ts

Repository: selftune-dev/selftune

Length of output: 1055


🏁 Script executed:

#!/bin/bash
# Check if "diff" or similar packages are mentioned in docs
find . -name "*.md" -type f | xargs grep -l "diff\|dependency" 2>/dev/null | head -5

Repository: selftune-dev/selftune

Length of output: 214


Index-based line comparison produces misleading diffs when lines are inserted or deleted.

When a line is inserted, all subsequent lines appear changed because the function compares oldLines[i] with newLines[i] by position rather than content. This makes the skill description diff output confusing to users.

Replace the index-based approach with the diff package:

Use the `diff` package for accurate line-based diffing
+import { diffLines } from "diff";
+
 function formatSimpleDiff(oldText: string, newText: string): string {
-  const oldLines = oldText.split("\n");
-  const newLines = newText.split("\n");
-  const output: string[] = [];
-  const maxLen = Math.max(oldLines.length, newLines.length);
-  for (let i = 0; i < maxLen; i++) {
-    const oldLine = oldLines[i];
-    const newLine = newLines[i];
-    if (oldLine === newLine) continue;
-    if (oldLine !== undefined) {
-      output.push(`\x1b[31m- ${oldLine}\x1b[0m`);
-    }
-    if (newLine !== undefined) {
-      output.push(`\x1b[32m+ ${newLine}\x1b[0m`);
-    }
-  }
-  return output.join("\n");
+  const changes = diffLines(oldText, newText);
+  const output: string[] = [];
+  for (const part of changes) {
+    const lines = part.value.replace(/\n$/, "").split("\n");
+    for (const line of lines) {
+      if (part.added) {
+        output.push(`\x1b[32m+ ${line}\x1b[0m`);
+      } else if (part.removed) {
+        output.push(`\x1b[31m- ${line}\x1b[0m`);
+      }
+    }
+  }
+  return output.join("\n");
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/evolution/evolve.ts` around lines 146 - 162, formatSimpleDiff
currently compares oldLines[i] to newLines[i] by index which yields misleading
diffs when lines are inserted/deleted; replace its body to use the diff package
(import diffLines from 'diff' or use diff.diffLines) to compute line-level
changes between oldText and newText, then iterate the resulting change objects
and push removed lines prefixed with red "- ", added lines with green "+ ", and
skip or omit unchanged chunks as before; update the function formatSimpleDiff
and add the diff dependency to package.json (or import) so that
diffLines(oldText, newText) drives the output instead of index-based comparison.
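If adding the `diff` dependency is undesirable for the CLI, a dependency-free alternative is a longest-common-subsequence line diff. The sketch below (hypothetical `formatLcsDiff`, with a `color` switch added only so the output is easy to test) keeps the original `- `/`+ ` format and, like the original, omits unchanged lines. It uses O(n·m) time and memory, which is fine for skill descriptions but not for large files.

```typescript
const RED = "\x1b[31m";
const GREEN = "\x1b[32m";
const RESET = "\x1b[0m";

// Dependency-free line diff via longest common subsequence.
function formatLcsDiff(oldText: string, newText: string, color = true): string {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const del = (s: string) => (color ? `${RED}- ${s}${RESET}` : `- ${s}`);
  const add = (s: string) => (color ? `${GREEN}+ ${s}${RESET}` : `+ ${s}`);
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++; // unchanged line: omitted, like the original output
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push(del(a[i++]));
    } else {
      out.push(add(b[j++]));
    }
  }
  while (i < a.length) out.push(del(a[i++]));
  while (j < b.length) out.push(add(b[j++]));
  return out.join("\n");
}
```

For `"a\nb\nc"` → `"a\nx\nb\nc"` this reports only `+ x`, whereas the index-based version reports five lines (`- b`, `+ x`, `- c`, `+ b`, `+ c`).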

### M9 — Autonomous (1.0)
- Fully autonomous loop: observe → grade → evolve → deploy → watch
- Human-in-the-loop controls: approve/reject PR, pause evolution, pin a description
### M9 — Trustworthy Autonomy (1.0)

⚠️ Potential issue | 🟡 Minor

Add a blank line before the M9 heading.

markdownlint MD022 is firing here because the heading immediately follows the preceding list.

Suggested fix
 - Devcontainer-based isolation with firewall, no API key needed
+
 ### M9 — Trustworthy Autonomy (1.0)
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 315-315: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PRD.md` at line 315, Add a blank line above the heading "M9 — Trustworthy
Autonomy (1.0)" so the heading does not directly follow the preceding list (this
will satisfy markdownlint rule MD022); locate the heading "M9 — Trustworthy
Autonomy (1.0)" in PRD.md and insert a single empty line immediately before it.

Group 15 related commands under 4 parent commands:
- selftune ingest <agent> (claude, codex, opencode, openclaw, wrap-codex)
- selftune grade [mode] (auto, baseline)
- selftune evolve [target] (body, rollback)
- selftune eval <action> (generate, unit-test, import, composability)

Update all 39 files: router, subcommand help text, SKILL.md, workflow
docs, design docs, README, PRD, CHANGELOG, and agent configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
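The commit above groups the flat commands under four parents. A sketch of a table-driven two-level dispatch for that shape, using the group and subcommand names from the commit message; the handlers are placeholder stubs, not selftune's modules. Two behaviors are assumptions made for this sketch: a leading flag (such as `--full-model`) goes to the group's default handler, and an unknown word is rejected, so a typo like `evolve rolback` fails instead of silently running a full evolution.

```typescript
type Handler = (args: string[]) => Promise<string> | string;

interface Group {
  // Used when no subcommand is given; undefined means a subcommand is required.
  defaultHandler?: Handler;
  subcommands: Record<string, Handler>;
}

// Own-property lookup so names like "toString" never resolve to prototype members.
function own<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Placeholder handler that just echoes what would run.
const stub = (name: string): Handler => (args) => `${name}(${args.join(" ")})`;

const GROUPS: Record<string, Group> = {
  ingest: {
    subcommands: {
      claude: stub("ingest claude"),
      codex: stub("ingest codex"),
      opencode: stub("ingest opencode"),
      openclaw: stub("ingest openclaw"),
      "wrap-codex": stub("ingest wrap-codex"),
    },
  },
  grade: {
    defaultHandler: stub("grade session"),
    subcommands: { auto: stub("grade auto"), baseline: stub("grade baseline") },
  },
  evolve: {
    defaultHandler: stub("evolve"),
    subcommands: { body: stub("evolve body"), rollback: stub("evolve rollback") },
  },
  eval: {
    subcommands: {
      generate: stub("eval generate"),
      "unit-test": stub("eval unit-test"),
      import: stub("eval import"),
      composability: stub("eval composability"),
    },
  },
};

async function dispatch(argv: string[]): Promise<string> {
  const [group, ...rest] = argv;
  const entry = group === undefined ? undefined : own(GROUPS, group);
  if (!entry) throw new Error(`unknown command: ${group ?? "(none)"}`);
  const [sub, ...subArgs] = rest;
  if (sub !== undefined && !sub.startsWith("-")) {
    const handler = own(entry.subcommands, sub);
    // Reject typos such as "evolve rolback" instead of silently running the default.
    if (!handler) throw new Error(`unknown ${group} subcommand: ${sub}`);
    return handler(subArgs);
  }
  if (!entry.defaultHandler) {
    throw new Error(`${group} requires one of: ${Object.keys(entry.subcommands).join(", ")}`);
  }
  // No subcommand, or only flags (e.g. "evolve --full-model"): use the group default.
  return entry.defaultHandler(rest);
}
```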
@WellDunDun WellDunDun changed the title from "feat: show selftune resource usage in skill report" to "feat: demo-ready CLI consolidation + autonomous orchestration" on Mar 15, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
skill/Workflows/EvolveBody.md (1)

60-68: ⚠️ Potential issue | 🟡 Minor

Remove the stale evolve-body spelling from the pre-flight section.

This block still names the old hyphenated command while the executable examples now use selftune evolve body. Keeping both spellings here is ambiguous for agents following the workflow.

As per coding guidelines, skill/**/*.md: review for clear step-by-step instructions and no ambiguous references.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/EvolveBody.md` around lines 60 - 68, Update the Pre-Flight
Configuration section under the "Pre-Flight Configuration" heading to remove the
stale hyphenated command spelling "evolve-body" and replace or unify it with the
current command usage "selftune evolve body" (or simply present "selftune evolve
body" as the recommended example). Ensure the displayed example block and any
references in that paragraph use "selftune evolve body" consistently so there is
no ambiguity for agents following the workflow.
♻️ Duplicate comments (2)
cli/selftune/index.ts (1)

64-69: ⚠️ Potential issue | 🟡 Minor

Prevent the default status path from falling through into the command switch.

If statusMain() ever returns normally, execution continues at Line 75 and the router still reaches the default case with command === undefined. Guard the dispatch with an else/main() boundary instead of relying on statusMain() to terminate the process.

Also applies to: 75-77

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/index.ts` around lines 64 - 69, The default path calls
statusMain() but doesn't prevent fallthrough to the command switch when
statusMain() returns; update the branch that checks command so it either
returns/throws after invoking statusMain() or use an else block around the
command dispatch to ensure the switch/default never runs with command ===
undefined—specifically modify the block that imports and calls statusMain
(cliMain: statusMain) so it exits the current function (or wraps the switch in
an else) to guard the dispatch and avoid hitting the default case.
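A minimal sketch of the guard, with `statusMain` and the command router passed in as parameters so the control flow can be exercised without `process.exit`. These parameters are illustrative only; the real index.ts reads `process.argv` and imports its handlers dynamically.

```typescript
// Hypothetical router skeleton; statusMain stands in for the real status entry point.
async function main(
  argv: string[],
  statusMain: () => Promise<void>,
  route: (command: string, args: string[]) => Promise<void>,
): Promise<void> {
  const [command, ...rest] = argv;
  if (!command) {
    await statusMain();
    return; // never fall through to the switch with command === undefined
  }
  await route(command, rest);
}
```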
PRD.md (1)

173-174: ⚠️ Potential issue | 🟡 Minor

Fix MD022 heading spacing (blank line after headings).

Line 173, Line 280, and Line 315 headings should be followed by a blank line to satisfy markdownlint MD022.

Proposed markdownlint fix
 ### Retroactive Replay (`selftune ingest claude`)
+
 Batch ingestor for existing Claude Code session transcripts. Scans `~/.claude/projects/<hash>/<session-id>.jsonl`, extracts user queries and session metrics, and populates the shared JSONL logs. Idempotent via marker file — safe to run repeatedly. Supports `--since` date filtering, `--dry-run` preview, `--force` re-ingestion, and `--verbose` output. Bootstraps the eval corpus from existing sessions without waiting for hooks to accumulate data.
@@
 ### M7 — Retroactive Replay & Community Contribution (Complete)
+
 - `selftune ingest claude`: batch ingest Claude Code transcripts from `~/.claude/projects/`
@@
 ### M9 — Trustworthy Autonomy (1.0)
+
 - Stronger candidate selection and evidence gating

Also applies to: 280-281, 315-316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@PRD.md` around lines 173 - 174, Add a blank line immediately after the
Markdown heading "### Retroactive Replay (`selftune ingest claude`)" and
likewise insert a blank line after the other top-level or subheadings flagged by
the reviewer (the headings reported around lines 280 and 315) so each heading is
followed by an empty line to satisfy markdownlint rule MD022; edit the PRD.md
file and ensure every heading (e.g., "### Retroactive Replay (`selftune ingest
claude`)", plus the two other flagged headings) has one blank line beneath it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/selftune/index.ts`:
- Around line 231-250: Wrap the parseArgs(...) call in a try/catch to handle and
report parse errors from parseArgs(strict: true) and exit non-zero; then
validate values.window after parsing by ensuring Number.parseInt(values.window,
10) yields a finite integer > 0 (reject NaN, negatives, zero, and non-integer
strings) and print a clear error + exit if invalid. Locate the parsing block
around parseArgs and the uses of values.window/Number.parseInt and ensure you
pass a valid windowSize (number | undefined) into analyzeComposability only
after this validation; keep the error messages descriptive (e.g., "--window must
be a positive integer") and return non-zero exit codes on failure.
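A minimal sketch of the requested `--window` handling, assuming the `parseArgs` in question is the one from `node:util` (available in Bun and Node 18.3+). `parseWindow` and `parseComposabilityArgs` are hypothetical names, only the `--window` flag is modeled, and the sketch throws where the real CLI would print the message and exit non-zero.

```typescript
import { parseArgs } from "node:util";

/**
 * Parse a --window value. Returns undefined when the flag is absent and
 * throws a descriptive Error for anything that is not a positive integer.
 */
export function parseWindow(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  // Reject "abc", "-3", "2.5", "10days" up front; Number.parseInt alone
  // would silently accept "10days" as 10.
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`--window must be a positive integer (got "${raw}")`);
  }
  const n = Number.parseInt(raw, 10);
  if (!Number.isSafeInteger(n) || n <= 0) {
    throw new Error(`--window must be a positive integer (got "${raw}")`);
  }
  return n;
}

/** Hypothetical flag parser for `selftune eval composability`. */
export function parseComposabilityArgs(args: string[]): { windowSize: number | undefined } {
  let rawWindow: string | undefined;
  try {
    const { values } = parseArgs({
      args,
      options: { window: { type: "string" } },
      strict: true,
    });
    rawWindow = values.window;
  } catch (err) {
    // strict mode throws on unknown flags, stray positionals, or a missing value.
    const msg = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid arguments: ${msg}`);
  }
  return { windowSize: parseWindow(rawWindow) };
}
```

Validating the raw string with a regex before `parseInt` is the design choice here: it keeps inputs like `10days` from slipping through as a valid window.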

In `@PRD.md`:
- Around line 134-136: Update the mismatched adapter count: replace the phrase
"All three adapters" with either "All adapters" or "All four adapters" in the
paragraph that follows the OpenClaw bullet (the sentence beginning "All three
adapters write to the same shared log schema.") so the wording matches the
listed adapters (Claude Code, Codex, OpenCode, OpenClaw).

In `@skill/references/invocation-taxonomy.md`:
- Line 4: Replace the ambiguous CLI invocation "eval generate" with the
fully-qualified command "selftune eval generate" wherever it appears (notably
the instance shown and the occurrences around lines 95–96) in
invocation-taxonomy.md so the docs reference the exact top-level command; update
both locations to the exact string "selftune eval generate".

In `@skill/SKILL.md`:
- Around line 35-36: Update the `selftune dashboard` description to explicitly
document both output modes: when invoked with `--export` it produces an HTML
artifact (static file) and may print informational progress lines, and when
invoked with `--serve` it runs a live server that emits server logs/non-JSON
output instead of exporting an artifact; adjust the text that currently claims a
single behavior so it mentions both `--export` (artifact) and `--serve` (live
server logs) and mirror this clarification where `selftune dashboard` is
referenced later in the doc.

In `@skill/Workflows/Composability.md`:
- Around line 21-49: The documented `eval composability` JSON schema in
skill/Workflows/Composability.md conflicts with the actual serialization used by
analyzeComposability() and the shape described in
docs/design-docs/composability-v2.md; pick a single canonical schema (e.g., the
Composability.md shape with keys: skill_name, analyzed_sessions,
co_occurring_skills, conflict_candidates, generated_at), then update
analyzeComposability() and the serializer in cli/selftune/index.ts to emit that
schema (rename or map pairs → co_occurring_skills and total_sessions_analyzed →
analyzed_sessions, conflict_count → length of conflict_candidates), and finally
harmonize docs/design-docs/composability-v2.md and
skill/Workflows/Composability.md to the chosen contract so agents/readers and
tests use one unambiguous format.

In `@skill/Workflows/Dashboard.md`:
- Around line 65-89: The fenced code block that shows the example command
"selftune dashboard --serve --port 8080" needs a language specifier for proper
syntax highlighting; update the triple-backtick opening fence to include "bash"
(i.e., ```bash) for that code block in the Dashboard.md content so the command
is rendered as a bash snippet.
- Around line 90-133: Open the Dashboard.md under skill/Workflows and update the
three JSON fenced code blocks (the "Watch and Evolve" request body, the
"Rollback" request body, and the response example under "All action endpoints")
to include the language specifier by changing the opening backtick fence from
``` to ```json so the blocks render with JSON syntax highlighting; locate the
exact blocks by the headings "Watch and Evolve" and "Rollback" and the response
example within the "Action Endpoints" section.

In `@skill/Workflows/Replay.md`:
- Around line 53-70: The README still uses the verb "replay" inconsistently with
the renamed command surface; update the text in Replay.md to use the canonical
"ingest" terminology everywhere (e.g., replace any occurrences of "replay" or
"replayed" with "ingest" or "ingested") and remove ambiguous phrases like
"replay sessions" so the steps and patterns consistently reference the
executable commands such as `selftune ingest claude`, `selftune ingest claude
--dry-run`, `selftune ingest claude --since`, `selftune ingest claude --force`,
and the verification step `selftune doctor`; ensure the "How do I know it
worked?" guidance mentions "ingested logs" or "ingestion" and checking log file
line counts increased.

---

Outside diff comments:
In `@skill/Workflows/EvolveBody.md`:
- Around line 60-68: Update the Pre-Flight Configuration section under the
"Pre-Flight Configuration" heading to remove the stale hyphenated command
spelling "evolve-body" and replace or unify it with the current command usage
"selftune evolve body" (or simply present "selftune evolve body" as the
recommended example). Ensure the displayed example block and any references in
that paragraph use "selftune evolve body" consistently so there is no ambiguity
for agents following the workflow.

---

Duplicate comments:
In `@cli/selftune/index.ts`:
- Around line 64-69: The default path calls statusMain() but doesn't prevent
fallthrough to the command switch when statusMain() returns; update the branch
that checks command so it either returns/throws after invoking statusMain() or
use an else block around the command dispatch to ensure the switch/default never
runs with command === undefined—specifically modify the block that imports and
calls statusMain (cliMain: statusMain) so it exits the current function (or
wraps the switch in an else) to guard the dispatch and avoid hitting the default
case.

In `@PRD.md`:
- Around line 173-174: Add a blank line immediately after the Markdown heading
"### Retroactive Replay (`selftune ingest claude`)" and likewise insert a blank
line after the other top-level or subheadings flagged by the reviewer (the
headings reported around lines 280 and 315) so each heading is followed by an
empty line to satisfy markdownlint rule MD022; edit the PRD.md file and ensure
every heading (e.g., "### Retroactive Replay (`selftune ingest claude`)", plus
the two other flagged headings) has one blank line beneath it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5df5c7fb-32d4-45aa-bf35-94015759faad

📥 Commits

Reviewing files that changed from the base of the PR and between 3bea7f7 and 41a4b9d.

📒 Files selected for processing (39)
  • .claude/agents/diagnosis-analyst.md
  • .claude/agents/evolution-reviewer.md
  • .claude/agents/integration-guide.md
  • .claude/agents/pattern-analyst.md
  • CHANGELOG.md
  • PRD.md
  • README.md
  • cli/selftune/eval/baseline.ts
  • cli/selftune/eval/unit-test-cli.ts
  • cli/selftune/evolution/evolve-body.ts
  • cli/selftune/evolution/rollback.ts
  • cli/selftune/index.ts
  • docs/design-docs/composability-v2.md
  • docs/design-docs/evolution-pipeline.md
  • docs/design-docs/monitoring-pipeline.md
  • docs/design-docs/sandbox-claude-code.md
  • docs/design-docs/sandbox-test-harness.md
  • docs/design-docs/workflow-support.md
  • docs/exec-plans/active/multi-agent-sandbox.md
  • docs/exec-plans/completed/agent-first-skill-restructure.md
  • docs/exec-plans/scope-expansion-plan.md
  • docs/integration-guide.md
  • skill/SKILL.md
  • skill/Workflows/AutoActivation.md
  • skill/Workflows/Baseline.md
  • skill/Workflows/Composability.md
  • skill/Workflows/Cron.md
  • skill/Workflows/Dashboard.md
  • skill/Workflows/Evals.md
  • skill/Workflows/EvolveBody.md
  • skill/Workflows/ImportSkillsBench.md
  • skill/Workflows/Ingest.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Replay.md
  • skill/Workflows/Rollback.md
  • skill/Workflows/UnitTest.md
  • skill/Workflows/Watch.md
  • skill/references/invocation-taxonomy.md
  • skill/references/setup-patterns.md

Comment on lines 21 to +49
## Output Format

When `skill_usage_log.jsonl` is available, selftune uses the v2 analyzer and
prints co-occurring pairs, detected sequences, workflow candidates, and
conflicts.

```json
{
  "skill_name": "Research",
  "analyzed_sessions": 150,
  "co_occurring_skills": [
    {
      "skill_a": "Research",
      "skill_b": "Browser",
      "co_occurrence_count": 42,
      "conflict_score": 0.12,
      "avg_errors_together": 1.5,
      "avg_errors_alone": 1.3
    }
  ],
  "conflict_candidates": [
    {
      "skill_a": "Research",
      "skill_b": "Content",
      "co_occurrence_count": 15,
      "conflict_score": 0.45,
      "avg_errors_together": 3.2,
      "avg_errors_alone": 1.1
    }
  ],
  "generated_at": "2026-03-04T12:00:00.000Z"
}
```
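If the shape shown above becomes the canonical contract, an agent-side consumer can check it at runtime before trusting it. The interfaces below only transcribe field names from the example JSON; whether `analyzeComposability()` emits exactly this today is the open question raised in the review, so treat the guard as a sketch.

```typescript
/** One skill pair entry, with fields copied from the documented example. */
export interface SkillPair {
  skill_a: string;
  skill_b: string;
  co_occurrence_count: number;
  conflict_score: number;
  avg_errors_together: number;
  avg_errors_alone: number;
}

export interface ComposabilityReport {
  skill_name: string;
  analyzed_sessions: number;
  co_occurring_skills: SkillPair[];
  conflict_candidates: SkillPair[];
  generated_at: string; // ISO-8601 timestamp
}

function isSkillPair(v: unknown): v is SkillPair {
  if (typeof v !== "object" || v === null) return false;
  const p = v as Record<string, unknown>;
  return (
    typeof p.skill_a === "string" &&
    typeof p.skill_b === "string" &&
    ["co_occurrence_count", "conflict_score", "avg_errors_together", "avg_errors_alone"].every(
      (k) => typeof p[k] === "number",
    )
  );
}

/** True only for the documented shape; a v2 `pairs` report is rejected. */
export function isComposabilityReport(v: unknown): v is ComposabilityReport {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    typeof r.skill_name === "string" &&
    typeof r.analyzed_sessions === "number" &&
    Array.isArray(r.co_occurring_skills) &&
    r.co_occurring_skills.every(isSkillPair) &&
    Array.isArray(r.conflict_candidates) &&
    r.conflict_candidates.every(isSkillPair) &&
    typeof r.generated_at === "string"
  );
}
```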

⚠️ Potential issue | 🟠 Major

Unify the documented eval composability output schema.

This workflow shows a `skill_name` / `co_occurring_skills` JSON shape, but `cli/selftune/index.ts` still serializes `analyzeComposability()` directly and `docs/design-docs/composability-v2.md` describes that report as `pairs` / `total_sessions_analyzed` / `conflict_count`. One of these contracts is wrong, and agents cannot reliably parse both.

As per coding guidelines, `skill/**/*.md`: review for valid code examples and no ambiguous references.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Composability.md` around lines 21 - 49, The documented `eval
composability` JSON schema in skill/Workflows/Composability.md conflicts with
the actual serialization used by analyzeComposability() and the shape described
in docs/design-docs/composability-v2.md; pick a single canonical schema (e.g.,
the Composability.md shape with keys: skill_name, analyzed_sessions,
co_occurring_skills, conflict_candidates, generated_at), then update
analyzeComposability() and the serializer in cli/selftune/index.ts to emit that
schema (rename or map pairs → co_occurring_skills and total_sessions_analyzed →
analyzed_sessions, conflict_count → length of conflict_candidates), and finally
harmonize docs/design-docs/composability-v2.md and
skill/Workflows/Composability.md to the chosen contract so agents/readers and
tests use one unambiguous format.

Comment on lines +90 to +133
### Action Endpoints

Action buttons in the dashboard trigger selftune commands via POST
requests. Each endpoint spawns a `bun run` subprocess.

**Watch and Evolve** request body:

```json
{
  "skill": "skill-name",
  "skillPath": "/path/to/SKILL.md"
}
```

**Rollback** request body:

```json
{
  "skill": "skill-name",
  "skillPath": "/path/to/SKILL.md",
  "proposalId": "proposal-uuid"
}
```

All action endpoints return:

```json
{
  "success": true,
  "output": "command stdout",
  "error": null
}
```

On failure, `success` is `false` and `error` contains the error message.

### Browser and Shutdown

The live server auto-opens the dashboard URL in the default browser on
macOS (`open`) and Linux (`xdg-open`).

Graceful shutdown on `SIGINT` (Ctrl+C) and `SIGTERM`: closes all SSE
client connections and stops the server.

**"Start the dashboard without launching a browser"**
> Run `selftune dashboard --no-open`.

**"The dashboard won’t load"**
> Ensure the SPA build exists with `bun run build:dashboard` in the repo, then retry.
> If using the published package, verify the install completed correctly and run `selftune doctor`.

**"I want a per-skill deep link"**
> Open `/skills/<name>` in the SPA, or `/report/<name>` for the HTML report view.

## SPA Development

```bash
# From repo root
bun run dev

# Server only
bun run dev:dashboard

# Or manually:
selftune dashboard --port 7888 --no-open
cd apps/local-dashboard
bun install
bunx vite
```

The Vite dev server runs at `http://localhost:5199` and proxies API traffic to
the dashboard server on `http://localhost:7888`.
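For reference, a Vite config producing the dev setup described in this excerpt (SPA on port 5199, API traffic proxied to the dashboard server on 7888) could look like the following. This is an assumed sketch, not the actual `apps/local-dashboard/vite.config.ts`: the `/api` prefix is inferred from the `/api/v2/...` endpoints mentioned in this thread, and `@vitejs/plugin-react` is assumed because the SPA is React + Vite.

```typescript
// Assumed location: apps/local-dashboard/vite.config.ts (contents hypothetical)
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    port: 5199,
    // Fail instead of silently choosing another port, so the documented URL stays valid.
    strictPort: true,
    proxy: {
      // Forward API calls from the SPA to the selftune dashboard server.
      "/api": "http://localhost:7888",
    },
  },
});
```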

🧹 Nitpick | 🔵 Trivial

Add language specifiers to JSON code blocks.

JSON code blocks should specify `json` as the language for proper syntax highlighting.

🔧 Proposed fix

Apply to all JSON blocks (lines 97-102, 105-111, 115-121):

````diff
-```
+```json
 {
   "skill": "skill-name",
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Dashboard.md` around lines 90 - 133, Open the Dashboard.md
under skill/Workflows and update the three JSON fenced code blocks (the "Watch
and Evolve" request body, the "Rollback" request body, and the response example
under "All action endpoints") to include the language specifier by changing the
opening backtick fence from ``` to ```json so the blocks render with JSON syntax
highlighting; locate the exact blocks by the headings "Watch and Evolve" and
"Rollback" and the response example within the "Action Endpoints" section.

- Filter skip/watch actions from selftune_stats run counts
- Restore legacy token_usage/duration_stats from execution_facts
- Cooperative SIGINT/SIGTERM shutdown for orchestrate loop
- Validate --window as positive integer with error message
- Add process.exit guard for bare selftune status fallthrough
- Update ARCHITECTURE.md import matrix for Dashboard dependencies
- Fix adapter count, code fence languages, and doc terminology

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
skill/SKILL.md (1)

142-146: ⚠️ Potential issue | 🟡 Minor

Add a language tag to the feedback-loop fenced block

This fenced block is unlabeled and is flagged by markdownlint (MD040). Use `text` for the ASCII diagram.

Proposed doc patch
````diff
-```
+```text
 Observe --> Detect --> Diagnose --> Propose --> Validate --> Deploy --> Watch
    |                                                                    |
    +--------------------------------------------------------------------+
````

As per coding guidelines, skill markdown should include valid code examples.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@skill/SKILL.md` around lines 142 - 146, The fenced ASCII diagram block in
SKILL.md is unlabeled and triggers markdownlint MD040; update the
triple-backtick fence for the diagram to include the language tag "text" so the
block becomes a labeled code fence (i.e., change the unlabeled ``` fence around
the Observe --> Detect --> ... Watch ASCII diagram to ```text). This will
satisfy the linter and keep the ASCII diagram intact.

♻️ Duplicate comments (1)
skill/Workflows/Replay.md (1)

63-67: 🧹 Nitpick | 🔵 Trivial

Minor inconsistency: "replay" verb in prose while command uses "ingest".

The pattern headings use "replay" (e.g., "Only replay recent sessions") but the commands use `selftune ingest claude`. Consider aligning the prose for clarity.

Suggested wording adjustment
```diff
-**"Only replay recent sessions"**
+**"Only ingest recent sessions"**
 > Run `selftune ingest claude --since 2026-02-01`
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Replay.md` around lines 63 - 67, The headings say "replay"
but the CLI shown is `selftune ingest claude` — change the prose to match the
command: update headings like "Only replay recent sessions" and "Re-ingest
everything" to use "ingest" (e.g., "Only ingest recent sessions" and "Re-ingest
everything" -> "Force re-ingest") or alternatively change the CLI examples to
`selftune replay ...` if the intended verb is replay; update the text around the
headings that contain the phrases "Only replay recent sessions" and the command
example `selftune ingest claude --since 2026-02-01` (and the force example
`selftune ingest claude --force`) so the verb and command are consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ARCHITECTURE.md`:
- Around line 74-82: The fenced code block containing the coverage spectrum
diagram (the block that starts with "flowchart TD" and includes the line
"Explicit only     -->  Skill is too rigid, users must babysit") is missing a
language specifier; change its opening fence from ``` to ```text so the diagram
snippet is marked as plain text (i.e., replace the current fenced block around
the coverage spectrum with a ```text fence).

In `@cli/selftune/dashboard-server.ts`:
- Around line 829-832: The LIKE pattern currently interpolates skillName
directly into the query (see skill_actions_json, orchestrate_runs and the .all
call), which can lead to SQL injection or unexpected wildcard matches; fix it by
building a safe parameterized pattern instead of string interpolation: escape
SQL LIKE special characters (%) and (_) and backslash in skillName, then create
a pattern like `%escapedSkillName%` and pass that as a bound parameter to the
prepared query (use the existing ? placeholder in the SELECT) rather than
embedding the value into the SQL string.

In `@skill/SKILL.md`:
- Around line 148-154: Update the evolution workflow sequence to explicitly
include an audit trail and rollback capability by inserting two steps (e.g.,
"Audit" — record deployment metadata, validation evidence and diffs; and
"Rollback" — mechanism to restore the previous description backup) into the list
around deployment; modify the sequence containing Observe, Detect, Diagnose,
Propose, Validate, Deploy, Watch to read something like: Observe, Detect,
Diagnose, Propose, Validate, Audit, Deploy (with backup), Rollback (capability),
Watch, and ensure any references to `evolve`, `validate`, `deploy`, and `watch`
callouts note the audit-recording and rollback hooks so operators can trigger
restores using the existing backup.

---

Outside diff comments:
In `@skill/SKILL.md`:
- Around line 142-146: The fenced ASCII diagram block in SKILL.md is unlabeled
and triggers markdownlint MD040; update the triple-backtick fence for the
diagram to include the language tag "text" so the block becomes a labeled code
fence (i.e., change the unlabeled ``` fence around the Observe --> Detect -->
... Watch ASCII diagram to ```text). This will satisfy the linter and keep the
ASCII diagram intact.

---

Duplicate comments:
In `@skill/Workflows/Replay.md`:
- Around line 63-67: The headings say "replay" but the CLI shown is `selftune
ingest claude` — change the prose to match the command: update headings like
"Only replay recent sessions" and "Re-ingest everything" to use "ingest" (e.g.,
"Only ingest recent sessions" and "Re-ingest everything" -> "Force re-ingest")
or alternatively change the CLI examples to `selftune replay ...` if the
intended verb is replay; update the text around the headings that contain the
phrases "Only replay recent sessions" and the command example `selftune ingest
claude --since 2026-02-01` (and the force example `selftune ingest claude
--force`) so the verb and command are consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 363b9192-9a93-43ac-b98d-b81245a6dc55

📥 Commits

Reviewing files that changed from the base of the PR and between 41a4b9d and 43f2d70.

📒 Files selected for processing (9)
  • AGENTS.md
  • ARCHITECTURE.md
  • PRD.md
  • cli/selftune/dashboard-server.ts
  • cli/selftune/index.ts
  • cli/selftune/orchestrate.ts
  • skill/SKILL.md
  • skill/Workflows/Replay.md
  • skill/references/invocation-taxonomy.md

- Escape SQL LIKE wildcards in dashboard skill name query
- Add Audit + Rollback steps to SKILL.md feedback loop
- Fix stale "replay" references in quickstart help text and quickstart.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WellDunDun and others added 3 commits March 15, 2026 16:54
- Fix dashboard-server.ts indentation on LIKE escape pattern
- Prefix unused deployedCount/watchedCount with underscore
- Format api.ts import to multi-line per biome rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Surfaces the doctor health checks (config, log files, hooks, evolution)
through a new /status route in the dashboard SPA, so humans can monitor
selftune health without touching the CLI.

- Add GET /api/v2/doctor endpoint to dashboard server
- Add DoctorResult/HealthCheck types to dashboard contract
- Create Status page with grouped checks, summary cards, auto-refresh
- Add System Status link in sidebar footer
- Update all related docs (ARCHITECTURE, HANDOFF, system-overview, Dashboard workflow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/selftune/dashboard-server.ts`:
- Around line 828-835: The LIKE pattern escaping in the orchestrateRows query
doesn't work because SQLite needs an explicit ESCAPE clause; update the SQL
passed to db.query in dashboard-server.ts where orchestrateRows is defined (the
query that selects skill_actions_json from orchestrate_runs) to append "ESCAPE
'\\'" to the WHERE ... LIKE ? clause so the backslash escapes produced by
skillName.replace(...) are honored; keep the existing pattern construction
(escaping %, _, and \) unchanged.

In `@cli/selftune/quickstart.ts`:
- Around line 86-89: The error message in the catch block that prints `Replay
failed: ${msg}` is stale and should reflect the renamed ingest flow; update the
catch in quickstart.ts (the try/catch around the ingest call) to log something
like `Ingest failed: ${msg}` (or `selftune ingest failed: ${msg}`) so it matches
the guidance `You can run \`selftune ingest claude\` manually to troubleshoot;
ensure the surrounding code still computes msg the same way (`err instanceof
Error ? err.message : String(err)`) and keep the follow-up console.log
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ec51dfce-c57a-41c6-9e2c-ed39a41b402f

📥 Commits

Reviewing files that changed from the base of the PR and between 43f2d70 and 35f67a4.

📒 Files selected for processing (4)
  • cli/selftune/dashboard-server.ts
  • cli/selftune/index.ts
  • cli/selftune/quickstart.ts
  • skill/SKILL.md

- SQLite LIKE needs explicit ESCAPE '\\' for backslash escapes to work
- Rename "Replay failed" to "Ingest failed" in quickstart error output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (1)
cli/selftune/dashboard-server.ts (1)

828-837: ⚠️ Potential issue | 🟠 Major

Missing ESCAPE clause in LIKE query — escapes are ineffective without it.

The code escapes `\`, `%`, and `_` in `skillName`, but SQLite's `LIKE` operator has no default escape character. Without `ESCAPE '\\'` in the SQL, the backslash sequences are treated as literal characters, not escape sequences.

Proposed fix
```diff
         const orchestrateRows = db
           .query(
             `SELECT skill_actions_json FROM orchestrate_runs
-             WHERE skill_actions_json LIKE ?`,
+             WHERE skill_actions_json LIKE ? ESCAPE '\\'`,
           )
           .all(
             `%${skillName.replace(/\\/g, "\\\\").replace(/%/g, "\\%").replace(/_/g, "\\_")}%`,
           ) as Array<{
           skill_actions_json: string;
         }>;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/dashboard-server.ts` around lines 828 - 837, The LIKE query in
the db.query call retrieving orchestrateRows escapes backslashes, % and _ in
skillName but never declares an ESCAPE character, so those backslashes are
treated literally; update the SQL string used in orchestrateRows (the db.query
that selects skill_actions_json) to append " ESCAPE '\\'" to the WHERE ... LIKE
clause and keep the escaped skillName parameter as-is (or switch to a
parameterized bind) so the backslash escape sequences are honored by SQLite.
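The escaping and the `ESCAPE` clause only work as a pair. The helpers below pull them apart so each half can be tested; `escapeLike`, `containsPattern`, and `ORCHESTRATE_ROWS_SQL` are illustrative names, not existing exports of `dashboard-server.ts`. The SQL text itself matches the proposed fix.

```typescript
/**
 * Escape a user-supplied string for use inside a SQLite LIKE pattern.
 * Backslash is escaped first so the escapes added for % and _ are not
 * themselves doubled afterwards.
 */
export function escapeLike(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/%/g, "\\%").replace(/_/g, "\\_");
}

/** Substring-match pattern, meant to be bound to `LIKE ? ESCAPE '\'`. */
export function containsPattern(value: string): string {
  return `%${escapeLike(value)}%`;
}

// SQLite's LIKE has no default escape character, so the query must declare one.
// Inside this template literal, '\\' becomes the single character \ in the SQL.
export const ORCHESTRATE_ROWS_SQL = `SELECT skill_actions_json FROM orchestrate_runs
   WHERE skill_actions_json LIKE ? ESCAPE '\\'`;
```

Because the column holds serialized JSON, a skill name containing `"` or `\` is stored JSON-escaped, so this remains a substring heuristic; parsing `skill_actions_json` and comparing the name field is the exact alternative.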
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/local-dashboard/src/pages/Status.tsx`:
- Around line 161-171: The current filtering (configChecks, logChecks,
hookChecks, evolutionChecks) and groups construction drops any checks that don't
match those predicates; update the logic to compute an "otherChecks" set (checks
not included in configChecks/logChecks/hookChecks/evolutionChecks) and add a
fallback group like { title: "Other", checks: otherChecks } when
otherChecks.length > 0 so unknown/new check types (e.g., "database_connection")
appear in the UI; adjust the variable names (configChecks, logChecks,
hookChecks, evolutionChecks, groups) accordingly.
- Around line 237-239: The React list is using check.name as the key in
group.checks.map which can collide when multiple checks share the same name;
update the key to a stable unique identifier (e.g., use an explicit id property
if available like check.id, or compose a compound key using the parent group and
check such as `${group.name}-${check.name}` or `${groupIndex}-${check.name}`) in
the mapping that renders CheckCard to avoid React reconciliation bugs; locate
the map where CheckCard is rendered and replace the key usage of check.name with
the chosen unique compound or id.
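Both `Status.tsx` findings, plus the later request to default `checks` to `[]`, fit into one grouping helper. This sketch assumes a minimal `HealthCheck` shape and name-prefix predicates; the real page's types and membership rules may differ, and `groupChecks`/`checkKey` are hypothetical names. Empty groups are skipped here, which is also an assumption.

```typescript
// Minimal shape assumed for this sketch; the real HealthCheck type lives in
// the dashboard contract and may carry more fields.
interface HealthCheck {
  name: string;
  status: string;
  message: string;
}

interface CheckGroup {
  title: string;
  checks: HealthCheck[];
}

// Hypothetical predicates; the real page decides membership its own way.
const GROUPS: Array<{ title: string; match: (c: HealthCheck) => boolean }> = [
  { title: "Configuration", match: (c) => c.name.startsWith("config") },
  { title: "Log Files", match: (c) => c.name.startsWith("log_") },
  { title: "Hooks", match: (c) => c.name.startsWith("hook") },
  { title: "Evolution", match: (c) => c.name.startsWith("evolution") },
];

export function groupChecks(checks: HealthCheck[] | undefined | null): CheckGroup[] {
  const all = checks ?? []; // tolerate an API response without `checks`
  const claimed = new Set<HealthCheck>();
  const groups: CheckGroup[] = [];
  for (const { title, match } of GROUPS) {
    const matched = all.filter((c) => !claimed.has(c) && match(c));
    for (const c of matched) claimed.add(c);
    if (matched.length > 0) groups.push({ title, checks: matched });
  }
  // Unknown check types (e.g. a new "database_connection") still get rendered.
  const other = all.filter((c) => !claimed.has(c));
  if (other.length > 0) groups.push({ title: "Other", checks: other });
  return groups;
}

// React key that stays unique even when two checks share a name.
export function checkKey(groupTitle: string, check: HealthCheck, index: number): string {
  return `${groupTitle}-${check.name}-${index}`;
}
```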

In `@cli/selftune/dashboard-contract.ts`:
- Around line 177-194: Remove the duplicated type definitions for HealthStatus,
HealthCheck, and DoctorResult from dashboard-contract.ts and instead import them
from the centralized types.ts module; update the top of the file to import {
HealthStatus, HealthCheck, DoctorResult } from "types.ts" (or the local types
module), remove the local interface/type declarations, and ensure any exports or
usages in functions like DoctorResult references the imported types so all code
uses the single source-of-truth types.

In `@cli/selftune/orchestrate.ts`:
- Around line 807-813: The sleep interruption can deadlock because requestStop
only clears sleepTimer but does not resolve the Promise waiting on setTimeout;
modify the sleep logic so the Promise exposes a resolver (e.g., store a
sleepResolve variable when creating the sleep Promise used at lines around the
setTimeout), and in requestStop after clearing sleepTimer call that resolver to
immediately resolve the pending sleep Promise; ensure sleepResolve is nulled
after use and that requestStop still sets stopRequested and removes listeners if
needed. Use the existing symbols requestStop, sleepTimer, stopRequested and the
Promise created around setTimeout to locate and wire the resolver.
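The fix amounts to an interruptible sleep: `requestStop` must both clear the timer and resolve the pending promise. Here is a self-contained sketch reusing the names `sleepTimer`, `sleepResolve`, `stopRequested`, and `requestStop` from the comment; the factory wrapper `createInterruptibleSleep` is hypothetical, not the structure of `orchestrate.ts`.

```typescript
export function createInterruptibleSleep() {
  let sleepTimer: ReturnType<typeof setTimeout> | null = null;
  let sleepResolve: (() => void) | null = null;
  let stopRequested = false;

  function sleep(ms: number): Promise<void> {
    if (stopRequested) return Promise.resolve(); // never start a new wait after stop
    return new Promise<void>((resolve) => {
      sleepResolve = () => resolve();
      sleepTimer = setTimeout(() => {
        sleepTimer = null;
        sleepResolve = null;
        resolve();
      }, ms);
    });
  }

  function requestStop(): void {
    stopRequested = true;
    if (sleepTimer !== null) {
      clearTimeout(sleepTimer);
      sleepTimer = null;
    }
    // Clearing the timer alone leaves the awaiting loop hanging forever;
    // resolving the pending promise lets it observe stopRequested and exit.
    const resolve = sleepResolve;
    sleepResolve = null;
    resolve?.();
  }

  return { sleep, requestStop, isStopRequested: () => stopRequested };
}

// Usage sketch inside a loop (runOnce and intervalMs are placeholders):
//   const { sleep, requestStop, isStopRequested } = createInterruptibleSleep();
//   process.once("SIGINT", requestStop);
//   process.once("SIGTERM", requestStop);
//   while (!isStopRequested()) { await runOnce(); await sleep(intervalMs); }
```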

In `@skill/Workflows/Dashboard.md`:
- Around line 56-58: Update the dashboard docs to remove SSE-related text and
the __SELFTUNE_LIVE__ flag mention: replace references to /api/events and
Server-Sent Events with a description that the UI uses TanStack Query polling
against /api/v2/overview (15s interval) and /api/v2/orchestrate-runs (30s
interval) to refresh data and trigger action buttons for selftune commands;
rewrite both the paragraph that currently mentions SSE (around the section
describing the live server) and the later section that references
__SELFTUNE_LIVE__ so they accurately describe the polling-based refresh
mechanism, include the exact endpoint names (/api/v2/overview,
/api/v2/orchestrate-runs) and intervals, and remove any mention of /api/events
or the __SELFTUNE_LIVE__ flag.

---

Duplicate comments:
In `@cli/selftune/dashboard-server.ts`:
- Around line 828-837: The LIKE query in the db.query call retrieving
orchestrateRows escapes backslashes, % and _ in skillName but never declares an
ESCAPE character, so those backslashes are treated literally; update the SQL
string used in orchestrateRows (the db.query that selects skill_actions_json) to
append " ESCAPE '\\'" to the WHERE ... LIKE clause and keep the escaped
skillName parameter as-is (or switch to a parameterized bind) so the backslash
escape sequences are honored by SQLite.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ffeaf5c-26b4-4fb0-855c-d221190281ba

📥 Commits

Reviewing files that changed from the base of the PR and between 35f67a4 and aababd8.

📒 Files selected for processing (13)
  • ARCHITECTURE.md
  • apps/local-dashboard/HANDOFF.md
  • apps/local-dashboard/src/App.tsx
  • apps/local-dashboard/src/api.ts
  • apps/local-dashboard/src/components/app-sidebar.tsx
  • apps/local-dashboard/src/hooks/useDoctor.ts
  • apps/local-dashboard/src/pages/Status.tsx
  • apps/local-dashboard/src/types.ts
  • cli/selftune/dashboard-contract.ts
  • cli/selftune/dashboard-server.ts
  • cli/selftune/orchestrate.ts
  • docs/design-docs/system-overview.md
  • skill/Workflows/Dashboard.md

WellDunDun and others added 2 commits March 15, 2026 17:24
- Add "Other" fallback group for unknown check types in Status page
- Use compound key (name+idx) to avoid React key collisions
- Re-export DoctorResult types from types.ts instead of duplicating
- Fix orchestrate loop sleep deadlock on SIGINT/SIGTERM
- Replace stale SSE references with polling-based refresh in Dashboard docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
selftune is a skill consumed by agents, not a CLI tool for humans.
Users install the skill and talk to their agent ("improve my skills"),
the agent reads SKILL.md, routes to workflows, and runs CLI commands.

- AGENTS.md: add Agent-First Architecture section + dev guidance
- ARCHITECTURE.md: add Agent-First Design Principle at top
- SKILL.md: add agent-addressing preamble ("You are the operator")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/local-dashboard/src/pages/Status.tsx`:
- Line 158: The destructuring const { checks, summary, healthy, timestamp } =
data can yield checks as undefined/null and later filter calls will throw;
change the destructure so checks gets a safe default (e.g., use data.checks ??
[]) while still extracting summary, healthy, and timestamp so downstream uses
like checks.filter(...) are protected; update any references to the original
checks variable accordingly (look for the destructuring of data and uses of
checks in this component).

In `@cli/selftune/orchestrate.ts`:
- Line 550: The variables _deployedCount and _watchedCount are declared and
incremented but never read; either remove their declarations and all increments
(clean dead code) or make their purpose explicit (keep the declarations but add
a brief comment like "// intentionally unused — reserved for future telemetry"
or export/consume them where intended) and ensure their names remain
underscore-prefixed if you intend them to be intentionally unused; update
occurrences in orchestrate.ts (references to _deployedCount and _watchedCount at
the reported locations) to follow one of these two options.
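A later commit in this PR removes the counters because the summary is "already computed via array filters in Step 7". Below is a minimal sketch of that filter-based approach. The `CandidateOutcome` shape is hypothetical, since the real orchestrate record type is not shown here.

```typescript
// Hypothetical per-skill outcome recorded by one orchestrate run.
type Action = "deploy" | "watch" | "skip";

interface CandidateOutcome {
  skill: string;
  action: Action;
}

// Derive counts from the outcome list at summary time, instead of keeping
// separate mutable counters (e.g. _deployedCount/_watchedCount) in sync.
function summarizeRun(outcomes: readonly CandidateOutcome[]) {
  const count = (a: Action) => outcomes.filter((o) => o.action === a).length;
  return {
    total: outcomes.length,
    deployed: count("deploy"),
    watched: count("watch"),
    skipped: count("skip"),
  };
}
```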

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1df91075-8b16-4d11-a3fb-0c88e4081c63

📥 Commits

Reviewing files that changed from the base of the PR and between 1f41dca and 1e9f785.

📒 Files selected for processing (7)
  • AGENTS.md
  • ARCHITECTURE.md
  • apps/local-dashboard/src/pages/Status.tsx
  • cli/selftune/dashboard-contract.ts
  • cli/selftune/orchestrate.ts
  • skill/SKILL.md
  • skill/Workflows/Dashboard.md

WellDunDun and others added 4 commits March 15, 2026 17:32
Four parallel agent implementations:

1. SKILL.md trigger keywords: added natural-language triggers across 10
   workflows + 13 new user-facing examples ("set up selftune", "improve
   my skills", "how are my skills doing", etc.)

2. Hook auto-merge: selftune init now automatically merges hooks into
   ~/.claude/settings.json for Claude Code — no manual settings editing.
   Initialize.md updated to reflect auto-install.

3. Cold-start fallback: quickstart detects empty telemetry after ingest
   and shows hook-discovered skills or guidance message instead of blank
   output. No LLM calls, purely data-driven.

4. Dashboard build: added prepublishOnly script to ensure SPA is built
   before npm publish (CI already did this, but local publish was not).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
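Item 2 (auto-merging hooks into `~/.claude/settings.json`) has to be idempotent and must not clobber hooks the user already has. Below is a minimal sketch of such a merge. It assumes Claude Code's settings layout of `hooks` → event name → matcher groups → `{ type: "command", command }` entries. The dedupe-by-command rule and the `mergeHooks` name are assumptions, not necessarily what `init.ts` does.

```typescript
// Assumed Claude Code settings layout for hooks.
interface HookCommand {
  type: "command";
  command: string;
}

interface HookMatcher {
  matcher?: string;
  hooks: HookCommand[];
}

type HooksConfig = Record<string, HookMatcher[]>;

interface ClaudeSettings {
  hooks?: HooksConfig;
  [key: string]: unknown;
}

// Merge `incoming` hooks into existing settings without duplicating any
// command already registered for the same event, and without touching
// unrelated settings keys. Returns a new object; the input is not mutated.
function mergeHooks(settings: ClaudeSettings, incoming: HooksConfig): ClaudeSettings {
  const merged: HooksConfig = { ...(settings.hooks ?? {}) };
  for (const [event, matchers] of Object.entries(incoming)) {
    const groups = [...(merged[event] ?? [])];
    const known = new Set(groups.flatMap((g) => g.hooks.map((h) => h.command)));
    for (const group of matchers) {
      const fresh = group.hooks.filter((h) => !known.has(h.command));
      if (fresh.length === 0) continue;
      groups.push({ ...group, hooks: fresh });
      for (const h of fresh) known.add(h.command);
    }
    merged[event] = groups;
  }
  return { ...settings, hooks: merged };
}
```

Reading and writing the file (parsing the existing settings JSON, then writing the merged object back) is left out. Running the merge twice leaves the settings unchanged.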
- Status.tsx: default checks to [] if API returns undefined
- orchestrate.ts: annotate _deployedCount/_watchedCount as reserved

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Mark Codex/OpenCode/OpenClaw as experimental across docs, SKILL.md,
  CLI help text, and README. Claude Code is the primary platform.
- Unify cron and schedule into `selftune cron` with --platform flag
  for agent-specific setup. `selftune schedule` kept as alias.
- Remove dead _deployedCount/_watchedCount counters from orchestrate.ts
  (summary already computed via array filters in Step 7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ARCHITECTURE.md: add Interactive vs Automated mode explanation,
  document JSONL-first data flow with SQLite as materialized view
- Cron.md: fix stale orchestrate schedule (weekly → every 6 hours),
  correct "agent runs" to "OS scheduler calls CLI directly"
- Orchestrate.md: add execution context table (interactive vs automated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cli/selftune/quickstart.ts (1)

186-203: ⚠️ Potential issue | 🟡 Minor

Missing error handling and exit code in cliMain.

If quickstart() throws an unexpected error, the process crashes with a raw stack trace instead of an actionable message. Per coding guidelines, CLI entry points should exit with code 1 on errors.

Suggested fix
```diff
 export async function cliMain(): Promise<void> {
   // Check for --help
   if (process.argv.includes("--help") || process.argv.includes("-h")) {
     console.log(`selftune quickstart — Guided onboarding
 
 Usage:
   selftune quickstart
 
 Steps:
   1. Runs init if ~/.selftune/config.json doesn't exist
   2. Runs ingest claude if session marker doesn't exist
   3. Shows current status
   4. Suggests top skills to evolve`);
     process.exit(0);
   }
 
-  await quickstart();
+  try {
+    await quickstart();
+  } catch (err) {
+    const msg = err instanceof Error ? err.message : String(err);
+    console.error(`quickstart failed: ${msg}`);
+    process.exit(1);
+  }
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/quickstart.ts` around lines 186 - 203, Wrap the await
quickstart() call in cliMain inside a try/catch so any thrown errors are caught;
in the catch block log a concise actionable message and the error via
console.error (include error.message or the error object) and call
process.exit(1) to ensure the CLI exits with a non-zero status. Specifically
update the cliMain function to catch errors from quickstart() and handle them
instead of letting the process crash with a stack trace.
♻️ Duplicate comments (2)
ARCHITECTURE.md (1)

138-143: ⚠️ Potential issue | 🟡 Minor

Add language specifiers to the remaining plain-text fences.

markdownlint still flags the unlabeled fences on Lines 138, 150, and 166. Mark these examples as text to clear MD040.

Also applies to: 150-155, 166-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ARCHITECTURE.md` around lines 138 - 143, The three unlabeled fenced code
blocks that show the agent workflow examples (the blocks containing "User:
\"improve my skills\" → Agent reads SKILL.md → routes to Orchestrate workflow →
Agent runs: selftune orchestrate → Agent summarizes results to user" and the two
similar examples later) need explicit language specifiers to satisfy
markdownlint MD040; edit each triple-backtick fence and change ``` to ```text so
the examples are marked as plain text (apply this to the blocks currently around
those examples).
cli/selftune/index.ts (1)

255-258: ⚠️ Potential issue | 🟠 Major

Reject partial numeric --window values instead of truncating them.

Line 256 still uses Number.parseInt(), so --window 1.5 becomes 1 and --window 7days becomes 7. That silently changes the analysis window instead of failing fast, so the previous validation gap is still open.

Suggested fix
```diff
-        const windowSize =
-          values.window === undefined ? undefined : Number.parseInt(values.window as string, 10);
-        if (windowSize !== undefined && (!Number.isInteger(windowSize) || windowSize <= 0)) {
+        const windowArg = values.window as string | undefined;
+        const windowSize =
+          windowArg === undefined
+            ? undefined
+            : /^[1-9]\d*$/.test(windowArg)
+              ? Number(windowArg)
+              : undefined;
+        if (windowArg !== undefined && windowSize === undefined) {
           console.error("Invalid --window value. Use a positive integer number of days.");
           process.exit(1);
         }
```

Run this to verify the current behavior. Expected result: "1.5" and "7days" still print as accepted by the current check.

```bash
#!/bin/bash
set -euo pipefail

sed -n '255,258p' cli/selftune/index.ts

node --eval '
for (const raw of ["7", "1.5", "7days", "0", "-2"]) {
  const parsed = Number.parseInt(raw, 10);
  const accepted = Number.isInteger(parsed) && parsed > 0;
  console.log(JSON.stringify({ raw, parsed, accepted }));
}
'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/selftune/index.ts` around lines 255 - 258, The current parsing of
values.window uses Number.parseInt which silently accepts "1.5" and "7days" by
truncation; update the validation around windowSize (the values.window handling
and the const windowSize) to reject non-integer strings: treat values.window as
a string, ensure it matches /^\d+$/ (only digits) or parse with Number(value)
and confirm Number.isInteger(parsed) and String(parsed) === value before
accepting; if it fails, keep the error branch that logs "Invalid --window value.
Use a positive integer number of days." so only strictly positive integer day
values are accepted.
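The same strict rule as the suggested fix can be pulled into a small, testable helper. `parseWindowDays` is a hypothetical name. The regex and error message come from the suggestion and the existing error branch. The extra `Number.isSafeInteger` guard, which rejects absurdly long digit strings, is an addition.

```typescript
// Strictly parse a --window value: only canonical positive integers
// ("7", "30") are accepted; "1.5", "7days", "0", "-2", "007" are rejected.
function parseWindowDays(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined; // flag not given: use the default
  const message = "Invalid --window value. Use a positive integer number of days.";
  if (!/^[1-9]\d*$/.test(raw)) throw new RangeError(message);
  const days = Number(raw);
  // Guard against digit strings too long to represent exactly.
  if (!Number.isSafeInteger(days)) throw new RangeError(message);
  return days;
}
```

In the CLI branch, the caller would catch the `RangeError`, print its message with `console.error`, and call `process.exit(1)`. This keeps the current fail-fast behavior.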
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 165-174: Update the AGENTS.md scheduling section to choose a
single canonical command that matches the CLI entrypoint in
cli/selftune/index.ts — use "selftune cron" as the canonical scheduling command
and explicitly mark "selftune schedule --install" (or "selftune schedule") as a
backward-compatibility alias; edit the bullet that currently reads "`selftune
schedule --install` and `selftune cron` are the primary autonomous loop" to
instead state the canonical command first ("selftune cron" is the primary
autonomous loop) and note the other as an alias and backward-compatible
alternative, ensuring wording matches the grouping/naming used in
cli/selftune/index.ts.
- Around line 48-56: The AGENTS.md repo tree is incomplete and reads as
authoritative while omitting files dispatched by cli/selftune/index.ts (notably
cli/selftune/ingestors/openclaw-ingest.ts and the hooks auto-activate.ts,
skill-change-guard.ts, evolution-guard.ts); update the document to either list
those missing files under the appropriate hooks/ and ingestors/ sections or
change the wording to state the tree is illustrative/partial, and ensure
references to the dispatcher in cli/selftune/index.ts are reflected so readers
can reconcile the manifest with the router.

In `@apps/local-dashboard/src/pages/Status.tsx`:
- Around line 199-200: The refresh button is icon-only (Button with onClick={()
=> refetch()} rendering RefreshCwIcon) and lacks an accessible name; add an
aria-label and a title to the Button (for example aria-label="Refresh" and
title="Refresh") so screen readers and hover/tooltips can identify the control,
e.g., update the Button props where RefreshCwIcon is used to include aria-label
and title that describe the action (you can use a localized string if available)
while leaving onClick calling refetch unchanged.

In `@ARCHITECTURE.md`:
- Around line 163-184: The filenames in the "Source of Truth" section are
inconsistent with the "Shared Local Artifacts" table; pick a single canonical
naming convention (e.g., use the full names session_telemetry_log.jsonl,
skill_usage_log.jsonl, all_queries_log.jsonl) and update the "Source of Truth"
bullets (the entries currently listing telemetry.jsonl, skill-usage.jsonl,
queries.jsonl, etc.) to use those canonical filenames, or alternatively add a
parenthetical note after each short name (e.g., telemetry.jsonl →
session_telemetry_log.jsonl) so both references match; update the mentions in
Core Loop and Materialized View text (readJsonl(TELEMETRY_LOG), TELEMETRY_LOG,
SKILL_LOG, QUERY_LOG, and the file list under Source of Truth) to use the chosen
canonical identifiers.
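The JSONL-first data flow referenced here treats append-only logs as the source of truth, with SQLite as a materialized view. Below is a minimal reader sketch for that flow. The canonical filenames and constant names come from the comment. Placing the files under `~/.claude/` follows the AGENTS.md note that log files live there. `parseJsonlText` and the policy of skipping malformed lines are assumptions.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Canonical log filenames (assumed to live under ~/.claude/).
const LOG_DIR = join(homedir(), ".claude");
const TELEMETRY_LOG = join(LOG_DIR, "session_telemetry_log.jsonl");
const SKILL_LOG = join(LOG_DIR, "skill_usage_log.jsonl");
const QUERY_LOG = join(LOG_DIR, "all_queries_log.jsonl");

// Parse JSONL text: one JSON value per line. Blank lines are ignored and
// malformed lines are counted rather than aborting the whole read.
function parseJsonlText<T>(text: string): { records: T[]; skipped: number } {
  const records: T[] = [];
  let skipped = 0;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      records.push(JSON.parse(trimmed) as T);
    } catch {
      skipped++;
    }
  }
  return { records, skipped };
}

// A missing log (e.g. before the first hook fires) reads as an empty list.
function readJsonl<T>(path: string): T[] {
  if (!existsSync(path)) return [];
  return parseJsonlText<T>(readFileSync(path, "utf8")).records;
}
```

If SQLite is purely a materialized view, it can be rebuilt from `readJsonl(TELEMETRY_LOG)`, `readJsonl(SKILL_LOG)`, and `readJsonl(QUERY_LOG)` at any time.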

In `@cli/selftune/index.ts`:
- Around line 133-139: The grade and evolve command branches treat any missing
subcommand or any argument starting with "-" as “run default”, so passing
-h/--help never shows the grouped subcommand help; update the switch cases for
"grade" and "evolve" to explicitly detect sub === "-h" || sub === "--help"
(before the existing default-action guard that checks !sub ||
sub.startsWith("-")) and dispatch to a help routine that prints the grouped
subcommands (for grade: auto|baseline; for evolve: body|rollback). Edit the case
"grade" and case "evolve" blocks (use the existing sub variable and existing
imports like cliMain in ./grading/grade-session.js and the evolve handler) to
add these explicit help branches so -h/--help is handled before falling through
to the default grader/evolver.
- Around line 228-247: The composability branch uses parseArgs inside the case
"composability" and currently rejects --help as an unknown option; update the
logic so help is handled before parseArgs (or register a boolean "help" option
in the parseArgs options) to allow the command to show usage instead of exiting.
Specifically, either inspect process.argv for "--help" (or "-h") at the start of
the case "composability" and call the existing usage/help printer, or add help:
{ type: "boolean" } to the parseArgs options and after parsing check values.help
to print usage and exit normally; adjust references to parseArgs, values, and
the composability handler accordingly.
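Both help issues above come down to checking for help before anything else consumes the arguments. Below is a sketch under stated assumptions. The `Group` shape and `dispatchGroup` are hypothetical stand-ins for the `switch` in `cli/selftune/index.ts`. Handlers return strings so they can be tested. The `--skill`/`--window` options for composability are illustrative. `parseArgs` is Node's `node:util` parser.

```typescript
import { parseArgs } from "node:util";

type Handler = (args: string[]) => string;

interface Group {
  name: string;
  subcommands: Record<string, Handler>;
  defaultAction: Handler;
}

function groupHelp(g: Group): string {
  return `Usage: selftune ${g.name} [${Object.keys(g.subcommands).join("|")}] [options]`;
}

function dispatchGroup(g: Group, argv: string[]): string {
  const [sub, ...rest] = argv;
  // Help first: otherwise "-h" matches the startsWith("-") guard below and
  // is forwarded to the default action.
  if (sub === "-h" || sub === "--help") return groupHelp(g);
  if (sub === undefined || sub.startsWith("-")) return g.defaultAction(argv);
  const handler = g.subcommands[sub];
  if (handler === undefined) throw new Error(`Unknown ${g.name} subcommand: ${sub}`);
  return handler(rest);
}

// Registering `help` as a boolean option means strict parsing no longer
// rejects --help/-h as an unknown option.
function parseComposabilityArgs(args: string[]) {
  const { values } = parseArgs({
    args,
    options: {
      skill: { type: "string" },
      window: { type: "string" },
      help: { type: "boolean", short: "h" },
    },
    strict: true,
  });
  return values;
}

// Example group mirroring `selftune grade [auto|baseline]`.
const grade: Group = {
  name: "grade",
  subcommands: {
    auto: () => "auto-grade",
    baseline: () => "baseline",
  },
  defaultAction: (args) => `grade session ${args.join(" ")}`.trim(),
};
```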

In `@cli/selftune/quickstart.ts`:
- Line 59: The inline comment "// Step 2: Replay if marker doesn't exist" is
stale; update it to describe the current ingest flow instead (e.g., "// Step 2:
Ingest missing data if marker doesn't exist" or similar) so it accurately
reflects what the surrounding logic in quickstart.ts performs; locate the
comment near the Step 2 block in quickstart.ts and replace the word "Replay"
with wording that matches the ingest behavior implemented.

---

Outside diff comments:
In `@cli/selftune/quickstart.ts`:
- Around line 186-203: Wrap the await quickstart() call in cliMain inside a
try/catch so any thrown errors are caught; in the catch block log a concise
actionable message and the error via console.error (include error.message or the
error object) and call process.exit(1) to ensure the CLI exits with a non-zero
status. Specifically update the cliMain function to catch errors from
quickstart() and handle them instead of letting the process crash with a stack
trace.

---

Duplicate comments:
In `@ARCHITECTURE.md`:
- Around line 138-143: The three unlabeled fenced code blocks that show the
agent workflow examples (the blocks containing "User: \"improve my skills\" →
Agent reads SKILL.md → routes to Orchestrate workflow → Agent runs: selftune
orchestrate → Agent summarizes results to user" and the two similar examples
later) need explicit language specifiers to satisfy markdownlint MD040; edit
each triple-backtick fence and change ``` to ```text so the examples are marked
as plain text (apply this to the blocks currently around those examples).

In `@cli/selftune/index.ts`:
- Around line 255-258: The current parsing of values.window uses Number.parseInt
which silently accepts "1.5" and "7days" by truncation; update the validation
around windowSize (the values.window handling and the const windowSize) to
reject non-integer strings: treat values.window as a string, ensure it matches
/^\d+$/ (only digits) or parse with Number(value) and confirm
Number.isInteger(parsed) and String(parsed) === value before accepting; if it
fails, keep the error branch that logs "Invalid --window value. Use a positive
integer number of days." so only strictly positive integer day values are
accepted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2c6c7e63-6c57-4348-811b-785ccd486c89

📥 Commits

Reviewing files that changed from the base of the PR and between 1e9f785 and a1a2a85.

📒 Files selected for processing (14)
  • AGENTS.md
  • ARCHITECTURE.md
  • README.md
  • apps/local-dashboard/src/pages/Status.tsx
  • cli/selftune/index.ts
  • cli/selftune/init.ts
  • cli/selftune/orchestrate.ts
  • cli/selftune/quickstart.ts
  • package.json
  • skill/SKILL.md
  • skill/Workflows/Cron.md
  • skill/Workflows/Ingest.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Orchestrate.md

Comment on lines +165 to +174
- **selftune is agent-first:** users interact through their coding agent, not the CLI directly. SKILL.md and workflow docs are the product surface; the CLI is the agent's API.
- Claude Code is the primary supported platform; Codex, OpenCode, and OpenClaw adapters are experimental (they exist but are not actively tested). All four write to the same shared log schema
- Source-truth transcripts/rollouts are authoritative; hooks are low-latency hints, not the canonical record
- Grading uses the user's existing agent subscription — no separate API key
- Hooks must be zero-config after installation
- Hooks should be zero-config after installation where the host agent supports them
- Log files are append-only JSONL at `~/.claude/`
- Evolution proposals require validation against eval set before deploy
- `selftune orchestrate` and `selftune schedule --install` are the primary autonomous loop; `selftune cron` is the OpenClaw-specific adapter
- All knowledge lives in-repo, not in external tools
- Zero runtime dependencies uses only Bun built-ins
- The core CLI keeps zero runtime dependencies and uses only Bun built-ins

⚠️ Potential issue | 🟡 Minor

Document one canonical scheduling command here.

Line 172 still teaches selftune schedule --install, but cli/selftune/index.ts now presents selftune cron as the grouped entrypoint and keeps schedule as backward compatibility. Pick one canonical command in this guide and explicitly label the other as an alias.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 165 - 174, Update the AGENTS.md scheduling section to
choose a single canonical command that matches the CLI entrypoint in
cli/selftune/index.ts — use "selftune cron" as the canonical scheduling command
and explicitly mark "selftune schedule --install" (or "selftune schedule") as a
backward-compatibility alias; edit the bullet that currently reads "`selftune
schedule --install` and `selftune cron` are the primary autonomous loop" to
instead state the canonical command first ("selftune cron" is the primary
autonomous loop) and note the other as an alias and backward-compatible
alternative, ensuring wording matches the grouping/naming used in
cli/selftune/index.ts.

Three parallel agents rewrote workflow docs so the agent (not the human)
is the operator:

Critical (3 files): Evolve.md, Evals.md, Baseline.md
- Pre-flight sections now have explicit selection-to-flag mapping tables
- Agent knows exactly how to parse user choices into CLI commands

Moderate (11 files): Initialize, Dashboard, Watch, Grade, Contribute,
UnitTest, Sync, AutoActivation, Orchestrate, Doctor, Replay
- "When to Use" sections rewritten as agent trigger conditions
- "Common Patterns" converted from user quotes to agent decision logic
- Steps use imperative agent voice throughout
- Replay.md renamed to "Ingest (Claude) Workflow" with compatibility note

Minor (8 files): Composability, Schedule, Cron, Badge, Workflows,
EvolutionMemory, ImportSkillsBench, Ingest
- Added missing "When to Use" sections
- Added error handling guidance
- Fixed agent voice consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
skill/Workflows/Baseline.md (1)

99-106: 🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

The fenced block at lines 99-106 (the one starting with "Configuration Summary:") has no language specifier, while a similar block elsewhere in the file already uses text. Markdown linters flag this as a consistency issue. Add text to the opening fence for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Baseline.md` around lines 99 - 106, The fenced code block
that begins with the lines "Configuration Summary:" (the triple-backtick block
shown in the diff) is missing a language specifier; update the opening fence
from ``` to ```text so it matches the other block that uses `text` and resolves
linter consistency errors—ensure the opening fence is exactly ```text and leave
the block contents unchanged.
skill/Workflows/Doctor.md (2)

145-145: ⚠️ Potential issue | 🟡 Minor

Ambiguous command reference.

The bare `init` is unclear; it should be `selftune init` to match the CLI pattern used elsewhere in the document.

🔧 Proposed fix
```diff
-| Hook scripts missing | Verify the selftune repo path. Re-run `init` if the repo was moved. |
+| Hook scripts missing | Verify the selftune repo path. Re-run `selftune init` if the repo was moved. |
```

As per coding guidelines: "No ambiguous references" - command names should be fully qualified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Doctor.md` at line 145, Update the ambiguous command
reference in the Doctor.md line that currently reads "Re-run `init` if the repo
was moved." — replace the bare `init` with the fully qualified CLI command
`selftune init` so it matches the rest of the document's command pattern and
removes ambiguity.

79-79: ⚠️ Potential issue | 🟠 Major

Fix broken documentation references in health checks and troubleshooting sections.

Two file paths cannot be resolved:

  • Line 79: Change references/logs.md to ../references/logs.md (relative to Workflows directory) or reference full path skill/references/logs.md
  • Line 150: Change assets/activation-rules-default.json to skill/assets/activation-rules-default.json (the file exists only in skill/assets/, not root assets/)

These broken links prevent agents from following documentation guidance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Doctor.md` at line 79, The documentation contains broken
relative links in skill/Workflows/Doctor.md: update the reference
`references/logs.md` (seen in the Schema conformance line) to
`../references/logs.md` or `skill/references/logs.md`, and update the activation
rules link `assets/activation-rules-default.json` to
`skill/assets/activation-rules-default.json`; modify these two string literals
in Doctor.md so the health checks and troubleshooting sections point to the
actual files.
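Relative Markdown links resolve against the directory of the file that contains them. That is why `references/logs.md` inside `skill/Workflows/Doctor.md` points at a nonexistent `skill/Workflows/references/logs.md`. Below is a small sketch of a link check based on that rule. The helper names and the in-memory file set are illustrative assumptions.

```typescript
import { posix } from "node:path";

// Resolve a relative link the way a Markdown renderer does: against the
// directory of the document, not the repository root.
function resolveDocLink(docPath: string, link: string): string {
  return posix.join(posix.dirname(docPath), link);
}

// Return the links whose resolved targets are not among the known files.
function findBrokenLinks(
  docPath: string,
  links: readonly string[],
  exists: (repoPath: string) => boolean,
): string[] {
  return links.filter((link) => !exists(resolveDocLink(docPath, link)));
}
```

Under this rule, the repo-root form `skill/assets/activation-rules-default.json` is only correct as a plain path mention. As a link from `skill/Workflows/`, the working relative form would be `../assets/activation-rules-default.json`.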
♻️ Duplicate comments (1)
skill/Workflows/Ingest.md (1)

235-237: ⚠️ Potential issue | 🟡 Minor

Use ingest terminology in this pattern title.

The pattern label says “Replay only recent Claude Code sessions” while the documented command surface is selftune ingest ..., which is inconsistent.

As per coding guidelines, skill/**/*.md should avoid ambiguous references and keep instructions consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Ingest.md` around lines 235 - 237, Update the pattern title
"Replay only recent Claude Code sessions" to use the ingest terminology
consistent with the documented command (e.g., "Ingest only recent Claude Code
sessions" or "Ingest recent Claude Code sessions") so it matches the command
surface shown (`selftune ingest claude --since 2026-02-01`) and aligns with
skill/**/*.md guidelines; ensure the title text changed in Workflows/Ingest.md
and any surrounding references to the pattern are updated to the new phrasing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skill/Workflows/Contribute.md`:
- Around line 74-75: Update Contribute.md to add explicit parsing instructions
for the output of the command `selftune contribute --preview --skill selftune`:
specify the expected output format (e.g., JSON or YAML) and list the key fields
to extract and display (for example: bundle_id/name, bundle_size,
preview_item_count, sample_records (sanitized), detected_sensitive_fields, and
any errors/warnings), describe how to parse nested preview items and sanitize
sensitive fields (PII) before reporting, and include an example schema or sample
parsed summary so readers know exactly what to extract and present to the user.

In `@skill/Workflows/Cron.md`:
- Around line 52-54: The example uses the obsolete flag name "--format" which
conflicts with the document's "--platform" surface; update the alias example for
"selftune schedule" (alias of "selftune cron") to use the current flag
"--platform launchd" instead of "--format launchd" so existing examples and
flags are unambiguous and consistent with the rest of the doc.
- Line 132: The rollback example in Workflows/Cron.md omits the required
--skill-path flag; update the example command string (the line containing
`selftune evolve rollback --skill <name>`) to include `--skill-path <path>` so
it reads like the documented manual rollback form (e.g., `selftune evolve
rollback --skill <name> --skill-path <path>`), ensuring the example in Cron.md
matches the Rollback.md pattern and complies with the skill/**/*.md
command-example guideline.
- Around line 92-106: The fenced flow-diagram block starting with "OS scheduler
fires (cron/launchd/systemd)" should include a language specifier to satisfy
markdown linting; update the opening fence from ``` to ```text so the block is
```text ... ``` (i.e., add the text specifier to the existing flow diagram
fenced code block in Cron.md).

In `@skill/Workflows/Evals.md`:
- Line 190: Replace the HTML entities in the Output Path line so it reads with
literal angle brackets: change the text "evals-&lt;skill&gt;.json" in the line
beginning with "**Output Path:**" (currently "evals-&lt;skill&gt;.json
(default)") to use literal "<skill>" (preferably inside an inline code span) so
it becomes "evals-<skill>.json (default)".

In `@skill/Workflows/Evolve.md`:
- Around line 123-135: Replace the ambiguous "(pareto mode flag)" entry in the
Selection/CLI Flag table for row "6b (pareto)" in Workflows/Evolve.md with the
concrete CLI flag name used by the implementation (e.g., the actual flag such as
--pareto or --pareto-mode), wrapped in backticks, and ensure the table
formatting remains intact; verify the string exactly matches the flag defined in
the CLI parsing code so generated agent commands are deterministic.

In `@skill/Workflows/Grade.md`:
- Around line 135-138: Update the workflow text in Grade.md to state that the
grader writes results to the canonical artifact file grading.json (not stdout);
specifically, change the line describing output for the selftune grade --skill
<name> command to instruct readers to open/parse grading.json for the pass rate
and failure details and remove or replace any wording that says “parse the JSON
output” or implies stdout. Ensure the document consistently references
grading.json as the source of truth wherever the workflow or examples mention
parsing results.

In `@skill/Workflows/Ingest.md`:
- Around line 226-227: The guidance about retrying with "--verbose" is invalid
because the "ingest wrap-codex" command has no documented options; update the
Ingest.md entry for the "ingest wrap-codex" command to either (A) document a
real --verbose flag (add a short CLI option description, usage example like
"ingest wrap-codex --verbose", and show expected output/behavior) or (B) remove
the sentence about retrying with --verbose and replace it with a valid troubleshooting
step (e.g., how to check codex binary accessibility and a sample command to
run). Ensure changes reference the "ingest wrap-codex" command and update the
surrounding code example blocks so the document remains a valid, runnable
example under skill/**/*.md guidelines.

In `@skill/Workflows/Initialize.md`:
- Around line 143-145: The verification command in Initialize.md uses a relative
path 'ls .claude/agents/' which can produce false negatives; update that command
to use the absolute user path 'ls ~/.claude/agents/' (or 'ls
$HOME/.claude/agents/') so the documented step reliably checks the correct
directory in the user's home, and ensure the README line containing 'ls
.claude/agents/' is replaced accordingly.
- Around line 104-106: Update the short CLI example "ingest wrap-codex" to the
full, unambiguous form "selftune ingest wrap-codex" so it matches the rest of
the document and coding guidelines; ensure both examples use the full "selftune
ingest ..." form (e.g., "selftune ingest wrap-codex" and "selftune ingest
codex") and verify there are no other abbreviated CLI invocations in
Initialize.md.

In `@skill/Workflows/Orchestrate.md`:
- Around line 89-104: The docs omit the new autonomous continuous mode; update
the "Two Execution Contexts" section to document the `selftune orchestrate
--loop` execution path as a first-class automated option: describe how to start
it (invoke `selftune orchestrate --loop`), its interval behavior (continuous
loop with optional internal backoff/interval and relation to the evolution
step), how to stop it (Ctrl+C / SIGTERM or a documented stop command if
implemented), and how it differs from `selftune cron setup` (cron is
scheduler-driven; `--loop` is a long-running CLI process that consumes no tokens
except during evolution). Reference the exact flag `--loop`, mention evolution
step and model usage, and add brief step-by-step run/stop instructions
consistent with other skill workflow docs.
- Around line 94-102: Update the "Automated" row in the table to remove the
contradictory "Zero (CLI only, no LLM)" and instead state "No agent-session
token cost; LLM cost only if evolution is triggered" (or equivalent explicit
phrasing), and adjust the following paragraph to mirror that wording: clarify
that the OS calls the CLI with no agent session and thus no session tokens are
consumed, but LLM calls may occur during the evolution step
(proposing/validating description changes) and will incur model-tier token costs
only when that evolution is triggered.

In `@skill/Workflows/Sync.md`:
- Around line 58-60: Update the guidance that treats "synced == 0" as a failure:
change text around the "Check that the synced counts are non-zero for active
sources" check to explain that zero synced can be a valid no-op (already
up-to-date) and instead define success as: no ingestion/hook errors, expected
sources were scanned or explicitly skipped, and marker/progression advanced
where applicable; only recommend running `selftune doctor` when there are
explicit ingestion or hook errors (not when synced==0). Also apply the same
clarification to the other occurrence of this check in the document (the later
paragraph referencing `selftune doctor`) and ensure wording is precise and
stepwise per the doc guidelines.
- Around line 46-47: Update the workflow examples so all `selftune sync`
invocations explicitly include the `--json` flag (e.g., change references like
"Run `selftune sync --dry-run`" to "Run `selftune sync --dry-run --json`") so
output is machine-parseable; also update the guidance around result
interpretation (lines referencing `scanned`/`synced`) to note that `synced == 0`
is expected for `--dry-run` or already-up-to-date sources and that you should
check `scanned > 0` or `repair.ran` (or other provided JSON fields) to determine
whether a sync was meaningful rather than relying solely on `synced > 0`.
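The interpretation rules in the two Sync.md comments can be expressed as a small classifier over the `--json` output. The field names `scanned`, `synced`, and `repair.ran` come from the comment. The surrounding report shape (a `sources` array with optional `errors`) is an assumption about the JSON layout.

```typescript
// Assumed layout of `selftune sync --json` output.
interface SyncSourceResult {
  source: string;
  scanned: number;
  synced: number;
  errors?: string[];
}

interface SyncReport {
  sources: SyncSourceResult[];
  repair?: { ran: boolean };
}

type SourceVerdict = "error" | "synced" | "up-to-date" | "not-scanned";

function classifySource(r: SyncSourceResult): SourceVerdict {
  if (r.errors !== undefined && r.errors.length > 0) return "error";
  if (r.synced > 0) return "synced";
  // synced == 0 after a real scan is a valid no-op, not a failure.
  if (r.scanned > 0) return "up-to-date";
  return "not-scanned";
}

// Meaningful = something was scanned or a repair ran, per the comment above.
function wasMeaningful(report: SyncReport): boolean {
  return report.sources.some((s) => s.scanned > 0) || report.repair?.ran === true;
}

// Only explicit ingestion/hook errors justify pointing the user at `selftune doctor`.
function shouldRunDoctor(report: SyncReport): boolean {
  return report.sources.some((s) => classifySource(s) === "error");
}
```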

---

Outside diff comments:
In `@skill/Workflows/Baseline.md`:
- Around line 99-106: The fenced code block that begins with the lines
"Configuration Summary:" (the triple-backtick block shown in the diff) is
missing a language specifier; update the opening fence from ``` to ```text so it
matches the other block that uses `text` and resolves linter consistency
errors—ensure the opening fence is exactly ```text and leave the block contents
unchanged.

In `@skill/Workflows/Doctor.md`:
- Line 145: Update the ambiguous command reference in the Doctor.md line that
currently reads "Re-run `init` if the repo was moved." — replace the bare `init`
with the fully qualified CLI command `selftune init` so it matches the rest of
the document's command pattern and removes ambiguity.
- Line 79: The documentation contains broken relative links in
skill/Workflows/Doctor.md: update the reference `references/logs.md` (seen in
the Schema conformance line) to `../references/logs.md` or
`skill/references/logs.md`, and update the activation rules link
`assets/activation-rules-default.json` to
`skill/assets/activation-rules-default.json`; modify these two string literals
in Doctor.md so the health checks and troubleshooting sections point to the
actual files.

---

Duplicate comments:
In `@skill/Workflows/Ingest.md`:
- Around line 235-237: Update the pattern title "Replay only recent Claude Code
sessions" to use the ingest terminology consistent with the documented command
(e.g., "Ingest only recent Claude Code sessions" or "Ingest recent Claude Code
sessions") so it matches the command surface shown (`selftune ingest claude
--since 2026-02-01`) and aligns with skill/**/*.md guidelines; ensure the title
text changed in Workflows/Ingest.md and any surrounding references to the
pattern are updated to the new phrasing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3a5f8306-086f-4795-a745-fe80b33b15b3

📥 Commits

Reviewing files that changed from the base of the PR and between a1a2a85 and 79ac970.

📒 Files selected for processing (22)
  • skill/Workflows/AutoActivation.md
  • skill/Workflows/Badge.md
  • skill/Workflows/Baseline.md
  • skill/Workflows/Composability.md
  • skill/Workflows/Contribute.md
  • skill/Workflows/Cron.md
  • skill/Workflows/Dashboard.md
  • skill/Workflows/Doctor.md
  • skill/Workflows/Evals.md
  • skill/Workflows/EvolutionMemory.md
  • skill/Workflows/Evolve.md
  • skill/Workflows/Grade.md
  • skill/Workflows/ImportSkillsBench.md
  • skill/Workflows/Ingest.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Orchestrate.md
  • skill/Workflows/Replay.md
  • skill/Workflows/Schedule.md
  • skill/Workflows/Sync.md
  • skill/Workflows/UnitTest.md
  • skill/Workflows/Watch.md
  • skill/Workflows/Workflows.md

Comment on lines +74 to +75
1. Run `selftune contribute --preview --skill selftune` to preview the contribution bundle
2. Parse the output and report the sanitized data summary to the user for review

🧹 Nitpick | 🔵 Trivial

Clarify parsing instructions for Step 1.

Line 75 states "Parse the output and report the sanitized data summary" but doesn't specify the output format or which fields to extract. Other workflow docs provide explicit parsing guidance (e.g., Watch.md lines 52-64). Consider adding a "Parsing Instructions" section or specifying the expected output format and key fields to report.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Contribute.md` around lines 74 - 75, Update Contribute.md to
add explicit parsing instructions for the output of the command `selftune
contribute --preview --skill selftune`: specify the expected output format
(e.g., JSON or YAML) and list the key fields to extract and display (for
example: bundle_id/name, bundle_size, preview_item_count, sample_records
(sanitized), detected_sensitive_fields, and any errors/warnings), describe how
to parse nested preview items and sanitize sensitive fields (PII) before
reporting, and include an example schema or sample parsed summary so readers
know exactly what to extract and present to the user.
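A minimal sketch of the summary step suggested above, assuming the preview is emitted as JSON. The field names (`bundle_id`, `bundle_size`, `preview_item_count`, `detected_sensitive_fields`, `warnings`) are the reviewer's hypothetical examples, not a confirmed `selftune contribute` schema, so check the real output before relying on them.

```typescript
// Hypothetical shape of `selftune contribute --preview` JSON output.
// Every field name here is an assumption taken from the review suggestion.
interface ContributePreview {
  bundle_id?: string;
  bundle_size?: number;
  preview_item_count?: number;
  detected_sensitive_fields?: string[];
  warnings?: string[];
}

// Turns raw preview JSON into short, user-facing summary lines.
// Missing fields are reported as "unknown" instead of throwing.
function summarizePreview(raw: string): string[] {
  const preview = JSON.parse(raw) as ContributePreview;
  const lines: string[] = [
    `Bundle: ${preview.bundle_id ?? "unknown"}`,
    `Size: ${preview.bundle_size ?? "unknown"}`,
    `Items: ${preview.preview_item_count ?? "unknown"}`,
  ];
  const sensitive = preview.detected_sensitive_fields ?? [];
  lines.push(
    sensitive.length > 0
      ? `Sanitized fields: ${sensitive.join(", ")}`
      : "Sanitized fields: none detected",
  );
  for (const warning of preview.warnings ?? []) {
    lines.push(`Warning: ${warning}`);
  }
  return lines;
}
```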

Comment on lines +46 to +47
Run `selftune sync --dry-run`. Parse the JSON output to check per-source
`scanned` counts. Report the preview summary to the user.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether sync supports explicit JSON output and whether docs mention it.
rg -n -C2 '\bsync\b' skill/Workflows/Sync.md
rg -n -C3 --type=ts --type=js --type=tsx --type=jsx --type=json '\b(sync|--dry-run|--json)\b'

Repository: selftune-dev/selftune

Length of output: 2088


🏁 Script executed:

#!/bin/bash
# Search for sync command implementation to determine JSON output behavior
fd -type f -name "*.ts" -o -name "*.js" | head -20 | xargs grep -l "sync" | head -10
# Find command definitions
find . -type f \( -name "*command*" -o -name "*sync*" \) \( -name "*.ts" -o -name "*.js" \) 2>/dev/null | head -20

Repository: selftune-dev/selftune

Length of output: 296


🏁 Script executed:

#!/bin/bash
# Look for main entry point and CLI structure
ls -la | grep -E "^-.*\.(ts|js|json|bun)$"
fd -type f -name "*.json" | grep -E "(package|tsconfig|bun)" | head -5
git ls-files | grep -E "^(src|bin|lib|commands)" | head -20

Repository: selftune-dev/selftune

Length of output: 668


🏁 Script executed:

#!/bin/bash
# Search for JSON output or flag handling in the codebase
rg -n "json|--json" --type=ts --type=js -i | head -40
rg -n "synced.*count\|output.*format" --type=ts --type=js -i | head -20

Repository: selftune-dev/selftune

Length of output: 3743


🏁 Script executed:

#!/bin/bash
# Read the sync command implementation
wc -l ./cli/selftune/sync.ts
head -100 ./cli/selftune/sync.ts

Repository: selftune-dev/selftune

Length of output: 2811


🏁 Script executed:

#!/bin/bash
# Continue reading sync.ts to see output formatting
tail -100 ./cli/selftune/sync.ts
# Also check what's in the middle
sed -n '100,200p' ./cli/selftune/sync.ts

Repository: selftune-dev/selftune

Length of output: 6908


🏁 Script executed:

#!/bin/bash
# Check the test file to understand expected output format
wc -l ./tests/sync.test.ts
head -150 ./tests/sync.test.ts

Repository: selftune-dev/selftune

Length of output: 4424


Add --json flag to sync commands for machine-parseable output.

The workflow examples instruct "Parse the JSON output" without specifying how to enable it. The --json flag must be explicitly included. JSON is not the default format; without this flag, output is human-readable text (suitable for interactive use only).

Update these lines to include --json:

Required changes
-Run `selftune sync --dry-run`. Parse the JSON output to check per-source
+Run `selftune sync --dry-run --json`. Parse the JSON output to check per-source
 `scanned` counts. Report the preview summary to the user.

-Run `selftune sync`. Parse the JSON output for:
+Run `selftune sync --json`. Parse the JSON output for:
 - Per-source `scanned`, `synced`, and `skipped` counts
 - Repaired overlay totals
 - Any errors or warnings

-Run `selftune sync`. Parse the JSON output and report per-source counts.
+Run `selftune sync --json`. Parse the JSON output and report per-source counts.

Lines 80–81 also need clarification: a zero `synced` count is normal for `--dry-run` or when data is already up to date. Use the available fields (e.g., `scanned > 0` or `repair.ran`) to determine whether the sync was meaningful, rather than relying only on `synced > 0`.

Applies to: Lines 46–47, 51–54, 70, 80–81
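The interpretation rule above can be sketched as a small predicate. The report shape (a `sources` map with per-source `scanned`/`synced`/`skipped` counts, plus `repair.ran`) is an assumption built from the field names mentioned in this review, not a confirmed `selftune sync --json` schema.

```typescript
// Assumed per-source counters in `selftune sync --json` output.
interface SourceCounts {
  scanned: number;
  synced: number;
  skipped: number;
}

// Assumed top-level report shape; `repair.ran` is taken from the review text.
interface SyncReport {
  sources: Record<string, SourceCounts>;
  repair?: { ran: boolean };
}

// A sync "did something meaningful" if any source was scanned or the repair
// pass ran. `synced === 0` alone is not treated as a failure, because it is
// expected for --dry-run and for sources that are already up to date.
function syncWasMeaningful(report: SyncReport): boolean {
  const anyScanned = Object.values(report.sources).some((s) => s.scanned > 0);
  return anyScanned || report.repair?.ran === true;
}
```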

📝 Committable suggestion


Suggested change
-Run `selftune sync --dry-run`. Parse the JSON output to check per-source
-`scanned` counts. Report the preview summary to the user.
+Run `selftune sync --dry-run --json`. Parse the JSON output to check per-source
+`scanned` counts. Report the preview summary to the user.

WellDunDun and others added 4 commits March 15, 2026 18:09
Autonomous mode:
- Evolve, Watch, Grade, Orchestrate workflows now document their
  behavior when called by selftune orchestrate (no user interaction,
  defaults used, pre-flight skipped, auto-rollback enabled)
- SKILL.md routing table marks autonomous workflows with †

Agent connections:
- All 4 agents (.claude/agents/) now have "Connection to Workflows"
  sections explaining when the main agent should spawn them
- Key workflows (Evolve, Doctor, Composability, Initialize) now have
  "Subagent Escalation" sections referencing the relevant agent
- SKILL.md agents table adds "When to spawn" column with triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
findRecentlyEvolvedSkills was identical to findRecentlyDeployedSkills.
Consolidated into one function used for both cooldown gating and
watch targeting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AGENTS.md:
- Add missing hook files and openclaw-ingest to project tree
- Use selftune cron as canonical scheduling command

Status.tsx:
- Add aria-label and title to refresh button for accessibility

ARCHITECTURE.md:
- Use canonical JSONL filenames matching constants.ts
- Add text language specifier to code block

index.ts:
- Add --help handlers for grade and evolve grouped commands
- Add --help handler for eval composability before parseArgs

quickstart.ts:
- Fix stale "Replay" comment to "Ingest"

Workflow docs:
- Cron.md: fix --format to --platform, add text fence, add --skill-path
- Evals.md: fix HTML entities to literal angle brackets
- Evolve.md: replace placeholder with actual --pareto flag
- Grade.md: clarify results come from grading.json not stdout
- Ingest.md: fix wrap-codex error guidance (no --verbose flag)
- Initialize.md: use full selftune command form, fix relative path
- Orchestrate.md: fix token cost contradiction, document --loop mode
- Sync.md: clarify synced=0 is valid, fix output parsing guidance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Core feature: selftune now detects when users correct skill misses
("why didn't you use X?", "please use the commit skill") and triggers
focused improvement automatically when the session ends.

Signal detection (prompt-log.ts):
- Pure regex patterns detect corrections, explicit requests
- Extracts mentioned skill name from query text
- Appends to improvement_signals.jsonl (zero LLM cost)

Reactive trigger (session-stop.ts):
- Checks for pending signals when session ends
- Spawns background selftune orchestrate if signals exist
- Lockfile prevents concurrent runs (30-min stale threshold)

Signal-aware orchestrator (orchestrate.ts):
- Reads pending signals at startup (no new CLI flags)
- Boosts priority of signaled skills (+150 per signal, cap 450)
- Signaled skills bypass evidence and UNGRADED gates
- Marks signals consumed after run completes
- Lockfile acquire/release wrapping full orchestrate body

Tests: 32 new tests across 2 files (signal detection + orchestrator)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
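The priority rule in this commit message (+150 per pending signal, capped at 450) works out as follows. A minimal sketch: `signalBoost` and `effectivePriority` are illustrative names, not the functions in `orchestrate.ts`, and the base score is assumed to come from elsewhere.

```typescript
const BOOST_PER_SIGNAL = 150; // per the commit message
const MAX_SIGNAL_BOOST = 450; // cap per the commit message

// Extra priority for a skill with `pendingSignals` unconsumed signals.
// Three or more signals all hit the cap (3 * 150 = 450).
function signalBoost(pendingSignals: number): number {
  return Math.min(Math.max(pendingSignals, 0) * BOOST_PER_SIGNAL, MAX_SIGNAL_BOOST);
}

// Hypothetical combination with a base score computed elsewhere.
function effectivePriority(baseScore: number, pendingSignals: number): number {
  return baseScore + signalBoost(pendingSignals);
}
```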

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
skill/Workflows/Cron.md (1)

36-40: ⚠️ Potential issue | 🟠 Major

Clarify list/remove scope vs platform-agnostic cron setup

selftune cron setup is documented as cross-platform, but selftune cron list/remove are documented as OpenClaw-backed only. This is ambiguous for users on launchd/systemd/cron and can cause incorrect operator expectations.

Suggested doc patch
 ### `selftune cron list`
 
-Show all registered selftune cron jobs. Reads from
-`~/.openclaw/cron/jobs.json` and filters for `selftune-*` entries.
-No flags.
+Show registered selftune cron jobs.
+
+- For `--platform openclaw`: reads `~/.openclaw/cron/jobs.json` and filters `selftune-*`.
+- For OS-native platforms (`launchd`, `systemd`, `cron`): lists jobs from the platform-specific scheduler.
+
+Use `--platform <name>` to force lookup behavior when needed.
 
 ### `selftune cron remove`
 
-Remove all selftune cron jobs from OpenClaw.
+Remove all registered selftune cron jobs for the selected platform.
 
 | Flag | Description | Default |
 |------|-------------|---------|
+| `--platform <name>` | Target platform (`openclaw`, `cron`, `launchd`, `systemd`) | Auto-detect |
 | `--dry-run` | Preview which jobs would be removed without deleting | Off |

As per coding guidelines, skill/**/*.md must have “Clear step-by-step instructions” and “No ambiguous references.”

Also applies to: 42-49

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Cron.md` around lines 36 - 40, Update the Cron.md entries for
"selftune cron setup", "selftune cron list", and "selftune cron remove" to
remove ambiguity by explicitly stating scope: mark "selftune cron setup" as
cross-platform (works with launchd/systemd/cron) and mark "selftune cron list"
and "selftune cron remove" as OpenClaw-backed operations that only read/write
~/.openclaw/cron/jobs.json and will not inspect or modify platform-native
schedulers; add clear step-by-step instructions for users on how to list/remove
platform-native jobs (e.g., use launchctl/systemctl/crontab commands) and
include a short note telling operators to use those system tools when they
created the cron via the OS rather than via "selftune cron setup".
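For jobs installed through an OS scheduler rather than OpenClaw, the step-by-step instructions this comment asks for boil down to each platform's own listing tool. A sketch of that mapping: `launchctl list`, `systemctl --user list-timers`, and `crontab -l` are standard commands, but whether `selftune cron setup` installs user-level (not system-level) jobs, and whether their labels contain "selftune", are assumptions to verify.

```typescript
type CronPlatform = "openclaw" | "launchd" | "systemd" | "cron";

// Shell command an operator can run to inspect selftune jobs per platform.
// Assumes user-level installs whose names contain "selftune"; system-level
// units would need `systemctl list-timers` (no --user) or `sudo crontab -l`.
function nativeListCommand(platform: CronPlatform): string {
  switch (platform) {
    case "openclaw":
      // Per Cron.md, this reads ~/.openclaw/cron/jobs.json and filters selftune-* entries.
      return "selftune cron list";
    case "launchd":
      return "launchctl list | grep selftune";
    case "systemd":
      return "systemctl --user list-timers | grep selftune";
    case "cron":
      return "crontab -l | grep selftune";
  }
}
```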
♻️ Duplicate comments (1)
skill/Workflows/Sync.md (1)

81-84: ⚠️ Potential issue | 🟡 Minor

Don't require scanned > 0 here.

A healthy no-op sync can still have sources that were explicitly skipped or had nothing to replay. Define success as "no sync errors and expected sources were scanned or skipped" so agents do not treat clean no-op runs as failures.

As per coding guidelines, skill workflow docs must provide clear step-by-step instructions and no ambiguous references.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Workflows/Sync.md` around lines 81 - 84, Update the "Agent needs to
verify sync worked" guidance to stop requiring `scanned > 0`; instead state
success as "no sync errors and each expected source was either scanned or
explicitly skipped," and replace the ambiguous sentence with step-by-step
checks: 1) confirm no sync errors, 2) for each expected source assert it appears
in the report as `scanned` or `skipped`, and 3) treat `synced=0` as normal when
data is up-to-date; revise the lines referencing `scanned`, `synced`, and
`skipped` to exactly use these terms so agents can match them programmatically.
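The three checks above can be expressed as one function. As a sketch only: the report shape (`errors`, a `sources` map of per-source counts) is assumed from the field names in this comment, and "appears in the report as scanned or skipped" is modeled as "has an entry in `sources`", which may not match how the CLI actually reports a skipped source.

```typescript
// Assumed report shape; not confirmed against `selftune sync --json`.
interface SourceCounts {
  scanned: number;
  synced: number;
  skipped: number;
}

interface SyncReport {
  sources: Record<string, SourceCounts>;
  errors?: string[];
}

// Returns a list of problems; an empty list means the sync is healthy.
function syncProblems(report: SyncReport, expectedSources: string[]): string[] {
  const problems: string[] = [];
  // Step 1: no sync errors.
  for (const err of report.errors ?? []) problems.push(`error: ${err}`);
  // Step 2: every expected source was reported (scanned or skipped).
  for (const name of expectedSources) {
    if (!Object.prototype.hasOwnProperty.call(report.sources, name)) {
      problems.push(`missing source: ${name}`);
    }
  }
  // Step 3: synced === 0 is deliberately not checked; it is normal when data is up to date.
  return problems;
}
```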
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/agents/diagnosis-analyst.md:
- Around line 21-34: The "Connection to Workflows" section contains a
contradictory activation policy: it states the agent is "not called directly by
the user" but also lists direct activation phrases/entries; resolve by choosing
the canonical policy (recommend keeping subagent-only) and updating the section
accordingly—remove any direct user-trigger activation phrases or examples that
imply direct invocation, keep and clarify orchestration triggers (e.g., the
spawn cases for selftune doctor, Grade, Status), and ensure wording like "spawn
this agent" and "not called directly by the user" consistently reflect the
subagent-only policy so orchestration/routing is unambiguous.

In @.claude/agents/evolution-reviewer.md:
- Around line 129-130: Update the documentation row that currently points
reviewers to run `selftune eval generate --skill <name>`; instead instruct
reviewers to open the eval file path recorded by the `selftune evolve --skill
<name> --skill-path <path> --dry-run` output or the evolution audit log so they
validate against the exact eval set used during the proposal gate; reference the
"evolve output" or "audit log" as the source of the eval file path and replace
the command suggestion with a note to inspect that recorded file for review.

In @.claude/agents/integration-guide.md:
- Line 174: The guide currently tells users to run "selftune eval generate
--skill <name>" then a bare "selftune evolve", which can evolve a different
target in multi-skill/monorepo setups; update the instructions to pass the same
skill flag to evolve (i.e., instruct users to run "selftune evolve --skill
<name>") so the selected skill from "selftune eval generate --skill <name>" is
carried through to the evolve step.

In `@apps/local-dashboard/src/pages/Status.tsx`:
- Around line 86-118: In CheckCard, STATUS_DISPLAY[check.status] can be
undefined and later dereferenced; change the assignment of display (used in the
Badge) to provide a defensive fallback object when STATUS_DISPLAY[check.status]
is missing (e.g., default variant, a generic icon, and a label derived from
check.status or "Unknown") so accesses to
display.variant/display.icon/display.label are always safe; update the display
variable in the CheckCard function (and keep CHECK_META usage unchanged) to use
this fallback.
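A framework-free sketch of that fallback. The real `STATUS_DISPLAY` in `Status.tsx` maps statuses to Badge variants and icon components; the variant names, the string `icon` field, and `getStatusDisplay` below are placeholders.

```typescript
// Placeholder display record; the dashboard's version holds a React icon component.
interface StatusDisplay {
  variant: string;
  icon: string;
  label: string;
}

// Hypothetical entries standing in for the dashboard's STATUS_DISPLAY map.
const STATUS_DISPLAY: Record<string, StatusDisplay> = {
  pass: { variant: "default", icon: "check", label: "Pass" },
  warn: { variant: "secondary", icon: "alert", label: "Warn" },
  fail: { variant: "destructive", icon: "x", label: "Fail" },
};

// Never returns undefined: unknown or missing statuses get a neutral badge whose
// label echoes the raw status (or "Unknown"). The own-property check keeps
// inherited keys such as "constructor" from being treated as known statuses.
function getStatusDisplay(status: string | undefined): StatusDisplay {
  const known =
    status !== undefined && Object.prototype.hasOwnProperty.call(STATUS_DISPLAY, status)
      ? STATUS_DISPLAY[status]
      : undefined;
  return known ?? { variant: "outline", icon: "help", label: status || "Unknown" };
}
```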

In `@cli/selftune/index.ts`:
- Around line 283-286: The parsing for the --window flag currently uses
Number.parseInt on values.window which silently accepts malformed strings like
"10days" or "1.5"; update the check to validate the raw string (values.window)
with a strict integer regex (e.g. /^\d+$/) before converting so only pure
positive integers are allowed, then parse to Number and set windowSize; use the
existing variables (values.window, windowSize) and the same error branch to
reject invalid formats and non-positive values.
- Around line 62-67: The branch that handles a missing command imports cliMain
as statusMain and calls statusMain(); remove the redundant process.exit(0) that
follows so the code relies on statusMain() to handle exit codes and error
propagation; specifically, update the block referencing command and statusMain
(from "./status.js") to call statusMain() only and delete the trailing
process.exit(0) statement.
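The strict `--window` check from the first bullet can be sketched as below. `parseWindow` is a hypothetical helper; the actual branch in `index.ts` works on `values.window` and `windowSize` directly.

```typescript
// Accepts only base-10 positive integers such as "10".
// Number.parseInt("10days") returns 10 and Number.parseInt("1.5") returns 1,
// which is the silent acceptance the review flags; the regex rejects both.
function parseWindow(raw: string | undefined): number | null {
  if (raw === undefined) return null; // flag not given; caller keeps its default
  if (!/^\d+$/.test(raw)) {
    throw new Error(`--window must be a positive integer, got "${raw}"`);
  }
  const n = Number(raw);
  if (!Number.isSafeInteger(n) || n <= 0) {
    throw new Error(`--window must be a positive integer, got "${raw}"`);
  }
  return n;
}
```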

In `@cli/selftune/quickstart.ts`:
- Around line 100-101: The log message using the step prefix "[2/3]" after step
2 has completed (in quickstart.ts) is inconsistent and may confuse users; update
the console output in the block that checks hasSessions so it no longer reuses
the "[2/3]" prefix—either remove the "[2/3]" prefix entirely or replace it with
a distinct marker like "[info]" or "[check]" in the console.log call that prints
"No sessions found. Checking for skills from hooks...", referencing the
hasSessions check location to make the change.

In `@skill/SKILL.md`:
- Around line 150-154: The fenced ASCII diagram block for the pipeline ("Observe
--> Detect --> Diagnose --> Propose --> Validate --> Audit --> Deploy --> Watch
--> Rollback" with the connecting lines) is missing a language specifier; update
that fenced code block to use the `text` language (i.e., change the opening
fence to ```text) so static analysis recognizes it as plain text.

In `@skill/Workflows/Doctor.md`:
- Around line 176-179: Update the "No telemetry data available" branch to route
fixes by agent platform: in the workflow doc and branching logic around the
"selftune doctor" instruction, detect the agent platform and send Claude users
to the Initialize workflow (Claude hook installation) but send
Codex/OpenCode/OpenClaw users to the ingest or wrapper workflow instead of the
hook path; update the Doctor.md text to include concise step-by-step
instructions for each platform (e.g., 1) detect platform, 2) if Claude -> run
Initialize to install hooks and then run one session, 3) if
Codex/OpenCode/OpenClaw -> run the ingest or wrapper workflow and then run one
session) and clearly state that at least one session must run after installation
to generate telemetry.
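The platform split above can be sketched as a lookup. The ingest subcommands come from this PR's `selftune ingest <agent>` surface (claude, codex, opencode, openclaw, wrap-codex). How the platform is detected, and whether Codex users should be sent to `ingest codex` or `ingest wrap-codex`, are assumptions left to the workflow; `telemetryFix` is a hypothetical name.

```typescript
type AgentPlatform = "claude" | "codex" | "opencode" | "openclaw";

interface TelemetryFix {
  workflow: "Initialize" | "Ingest";
  command: string;
}

// Routes a "No telemetry data available" result to the right setup path:
// Claude Code gets hooks via Initialize; the other agents are ingested instead.
// In every case, at least one session must run afterwards to produce telemetry.
function telemetryFix(platform: AgentPlatform): TelemetryFix {
  if (platform === "claude") {
    return { workflow: "Initialize", command: "selftune init" };
  }
  return { workflow: "Ingest", command: `selftune ingest ${platform}` };
}
```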

In `@skill/Workflows/Evals.md`:
- Around line 179-205: The synthetic pre-flight flow in "Generation Mode" (the
Ask prompt that asks "Reply with your choices or 'use defaults'") does not
collect the required skill path for synthetic mode, leaving the generated CLI
invocation missing the `--skill-path <path>` argument; update the
prompt/questions so when option 1b (synthetic) is chosen the user is explicitly
asked for "Skill path" (or to confirm a default) and ensure the parser maps that
answer into the CLI flag `--synthetic --skill-path <path>` alongside any
selected `--model`, `--max`, and `--out` flags; locate this logic tied to the
Generation Mode prompt and the mapping table to add the new question and include
the captured value in the final command construction.
- Around line 137-138: The docs list an invalid CLI alias `--model haiku`; the
claude CLI only accepts `sonnet` and `opus` as direct aliases (see llm-call.ts
which maps valid --model aliases), so update the table and example: either
remove `haiku` from the listed CLI options or add a clear note that `haiku` is
not a valid --model alias and must be referenced by its full model ID (or
resolved internally by selftune), and ensure examples use either `--model
sonnet|opus` or the full model ID (e.g., claude-sonnet-4-5-20250514) to avoid
CLI errors.
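The answer-to-flag mapping in the first Evals.md bullet can be sketched as an argument builder. The flags (`--skill`, `--synthetic`, `--skill-path`, `--model`, `--max`, `--out`) are the ones named in these comments; the `EvalAnswers` shape and `buildEvalGenerateArgs` are hypothetical.

```typescript
// Hypothetical parsed answers from the Generation Mode pre-flight prompt.
interface EvalAnswers {
  skill: string;
  mode: "log" | "synthetic";
  skillPath?: string; // required when mode === "synthetic"
  model?: string; // synthetic mode only
  max?: number;
  out?: string;
}

// Builds argv for `selftune eval generate`. Synthetic mode without a skill
// path is rejected up front instead of producing an incomplete command.
function buildEvalGenerateArgs(a: EvalAnswers): string[] {
  const args = ["selftune", "eval", "generate", "--skill", a.skill];
  if (a.mode === "synthetic") {
    if (!a.skillPath) throw new Error("synthetic mode needs a skill path");
    args.push("--synthetic", "--skill-path", a.skillPath);
    if (a.model) args.push("--model", a.model);
  }
  if (a.max !== undefined) args.push("--max", String(a.max));
  if (a.out) args.push("--out", a.out);
  return args;
}
```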

In `@skill/Workflows/Ingest.md`:
- Around line 226-227: The troubleshooting step incorrectly refers to hook
health for the wrap-codex telemetry path; update the text for the "ingest
wrap-codex" workflow to advise verifying Codex wrapper/log/telemetry health
rather than hooks—replace "verify hook health" and "selftune doctor" guidance
with concrete steps to check that the codex binary is accessible, that the
target working directory exists, and how to inspect Codex wrapper logs/telemetry
(e.g., check wrapper stdout/stderr or log files used by wrap-codex) so operators
are directed to the correct diagnostics for wrap-codex.
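The first two wrap-codex diagnostics (binary reachable, working directory exists) can be checked with Node's `fs` and `path` modules, which Bun also provides. A sketch only: `checkWrapCodexPrereqs` is hypothetical, the binary name `codex` is assumed, only file existence (not the executable bit) is checked, and Windows `PATHEXT` handling is omitted.

```typescript
import { existsSync, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Returns human-readable problems; an empty list means both prerequisites hold.
function checkWrapCodexPrereqs(cwd: string, pathEnv: string = process.env.PATH ?? ""): string[] {
  const problems: string[] = [];
  // 1) Is there a file named `codex` in any PATH directory?
  const onPath = pathEnv
    .split(delimiter)
    .filter((dir) => dir.length > 0)
    .some((dir) => existsSync(join(dir, "codex")));
  if (!onPath) problems.push("codex binary not found on PATH");
  // 2) Does the target working directory exist and is it a directory?
  if (!existsSync(cwd) || !statSync(cwd).isDirectory()) {
    problems.push(`working directory does not exist: ${cwd}`);
  }
  return problems;
}
```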

In `@skill/Workflows/Initialize.md`:
- Around line 87-91: The fenced code block in Initialize.md is missing a blank
line before the opening fence and lacks a language specifier; update the snippet
around the init output example (the fenced block that currently shows the [INFO]
Installed... line) by inserting a blank line immediately before the ``` and
adding a language tag (e.g., ```text) on the opening fence so it becomes a
proper fenced code block with a language specifier.

In `@skill/Workflows/Orchestrate.md`:
- Around line 118-121: Replace the specific agent-only command "selftune ingest
claude" with the broader sync command "selftune sync" in the Orchestrate
workflow text (the list item currently numbered "1. **Sync**"), so the
documented step reflects the full refresh/repair across all supported agents;
update the line to read something like "1. **Sync** — refresh source-truth
telemetry (`selftune sync`)" to ensure it does not drop non-Claude sources and
matches the intended orchestration flow.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 367aa309-fdbf-494c-b391-f629bec4df47

📥 Commits

Reviewing files that changed from the base of the PR and between 79ac970 and f02ebc5.

📒 Files selected for processing (22)
  • .claude/agents/diagnosis-analyst.md
  • .claude/agents/evolution-reviewer.md
  • .claude/agents/integration-guide.md
  • .claude/agents/pattern-analyst.md
  • AGENTS.md
  • ARCHITECTURE.md
  • apps/local-dashboard/src/pages/Status.tsx
  • cli/selftune/index.ts
  • cli/selftune/orchestrate.ts
  • cli/selftune/quickstart.ts
  • skill/SKILL.md
  • skill/Workflows/Composability.md
  • skill/Workflows/Cron.md
  • skill/Workflows/Doctor.md
  • skill/Workflows/Evals.md
  • skill/Workflows/Evolve.md
  • skill/Workflows/Grade.md
  • skill/Workflows/Ingest.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Orchestrate.md
  • skill/Workflows/Sync.md
  • skill/Workflows/Watch.md

WellDunDun and others added 5 commits March 15, 2026 18:45
- ARCHITECTURE.md: add Signal-Reactive Improvement section with mermaid
  sequence diagram showing signal flow from prompt-log to orchestrate
- Orchestrate.md: add Signal-Reactive Trigger section with guard rails
- evolution-pipeline.md: add signal detection as pipeline input
- system-overview.md: add signal-reactive path to system overview
- logs.md: document improvement_signals.jsonl format and fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents:
- diagnosis-analyst: resolve activation contradiction (subagent-only)
- evolution-reviewer: inspect recorded eval file, not regenerated one
- integration-guide: carry --skill flag through to evolve command

Dashboard:
- Status.tsx: defensive fallback for unknown health status values

CLI:
- index.ts: remove redundant process.exit(0) after statusMain
- index.ts: strict regex validation for --window (reject "10days")

Quickstart:
- Remove misleading [2/3] prefix from post-step check

Workflows:
- SKILL.md: add text language specifier to feedback loop diagram
- Initialize.md: add blank line + text specifier to code block
- Orchestrate.md: fix sync step to use selftune sync not ingest claude
- Doctor.md: route missing-telemetry fixes by agent platform
- Evals.md: add skill-path to synthetic pre-flight, note haiku alias
- Ingest.md: wrap-codex uses wrapper not hooks for telemetry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. AGENTS.md: Add Change Propagation Map — "if you change X, update Y"
   table that agents check before committing. Prevents stale docs.

2. README.md: Refresh for v0.2 architecture:
   - Agent-first framing ("tell your agent" not "run this command")
   - Grouped commands table (ingest, grade, evolve, eval, auto)
   - Signal-reactive detection mentioned in Detect section
   - Automate section with selftune cron setup
   - Removed CLI-centric use case descriptions

3. Exec plan for dashboard signal integration (planned, not started):
   - Schema + materialization + queries + contract + API + UI
   - 3 parallel agent workstreams, ~4.5 hours estimated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update all repo path references (badges, clone URLs, install command,
contribute PR target, security tab link, llms.txt) from personal
WellDunDun/selftune to org selftune-dev/selftune.

Kept as WellDunDun (personal account, not repo path):
- CODEOWNERS (@WellDunDun)
- FUNDING.yml (sponsors/WellDunDun)
- LICENSE copyright
- PRD owner field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/local-dashboard/src/pages/Status.tsx`:
- Around line 162-163: Destructure defensively so healthy and timestamp cannot
be undefined: when extracting from data (the line with const { checks:
rawChecks, summary, healthy, timestamp } = data), supply sensible defaults
(e.g., checks -> [], healthy -> false, timestamp -> 0 or null/Date fallback, and
summary -> '') or guard with data ?? {} before destructuring; update uses of
checks/healthy/timestamp (variables: rawChecks, checks, healthy, timestamp) to
rely on those defaults so later lines that reference healthy and timestamp won't
access undefined.
- Around line 217-240: The CardTitle usages read summary.pass, summary.warn, and
summary.fail directly and can throw if summary is undefined; update those
references (the CardTitle renderings for "pass", "warn", and "fail") to guard
with optional chaining and defaults (e.g., use summary?.pass ?? 0 or destructure
const { pass = 0, warn = 0, fail = 0 } = summary || {}) so the UI always
displays 0 when fields are missing or summary is undefined while preserving the
existing styling logic that checks summary.warn/summary.fail for color classes.
- Around line 200-201: The span rendering Last checked calls timeAgo(timestamp)
without validating timestamp; update the Status component to guard the timestamp
before calling timeAgo (e.g., compute a safe value using a helper or inline
check: if timestamp is falsy or new Date(timestamp) is invalid, render a
fallback like "never" or "N/A" instead of calling timeAgo). Locate the JSX that
contains Last checked {timeAgo(timestamp)} and replace it with a conditional
expression or a small sanitizedTimestamp variable that verifies timestamp (using
new Date(timestamp).getTime() or Date.parse) and only calls
timeAgo(sanitizedTimestamp) when valid.
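The three `Status.tsx` guards can be collected into one normalization step before rendering. A sketch: `HealthResponse`, `normalizeHealth`, and `lastCheckedLabel` are hypothetical names, and `timeAgo` is passed in as a parameter because its real signature in the dashboard is not shown here.

```typescript
interface HealthSummary {
  pass: number;
  warn: number;
  fail: number;
}

// Loosely typed API payload: any field may be missing.
interface HealthResponse {
  checks?: unknown[];
  summary?: Partial<HealthSummary>;
  healthy?: boolean;
  timestamp?: string | number;
}

// Fills every field with a safe default so render code never sees undefined.
function normalizeHealth(data: HealthResponse | null | undefined) {
  const d: HealthResponse = data ?? {};
  const { checks = [], summary = {}, healthy = false, timestamp } = d;
  return {
    checks,
    healthy,
    timestamp,
    summary: { pass: summary.pass ?? 0, warn: summary.warn ?? 0, fail: summary.fail ?? 0 },
  };
}

// Only formats timestamps that parse to a real date; otherwise shows "never".
function lastCheckedLabel(
  timestamp: string | number | undefined,
  timeAgo: (ts: string | number) => string,
): string {
  if (timestamp === undefined || timestamp === "") return "never";
  const ms = new Date(timestamp).getTime();
  return Number.isNaN(ms) ? "never" : timeAgo(timestamp);
}
```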

In `@ARCHITECTURE.md`:
- Around line 145-148: The doc currently recommends two different automation
entrypoints; pick and use one consistently—either standardize on "selftune
schedule --install" as the primary automated scheduler entrypoint or explicitly
reserve "selftune cron setup" for the OpenClaw adapter only. Update the
Automated Mode section to reference the chosen command (replace "selftune cron
setup" with "selftune schedule --install" if consolidating to the primary path),
and add a short note mentioning that the "cron" subcommand is only used when
targeting the OpenClaw adapter (or remove the OpenClaw reference if you choose
cron as primary) so all mentions of selftune scheduling are consistent across
the doc.

In `@cli/selftune/hooks/prompt-log.ts`:
- Around line 58-95: SIGNAL_PATTERNS uses six regexes that capture skill names
with (?<skill>\w+), which drops hyphenated names; update each pattern in the
SIGNAL_PATTERNS array to use (?<skill>[\w-]+) instead of (?<skill>\w+) so
detectImprovementSignal() will capture full hyphenated skill names (e.g.,
integration-guide, wrap-codex) for accurate lookup and telemetry.
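Why the character class matters: `\w` matches only letters, digits, and underscore, so a capture stops at the first hyphen. The two patterns below are hypothetical stand-ins in the style of a `SIGNAL_PATTERNS` entry, not copies of the real list in `prompt-log.ts`.

```typescript
// Hypothetical correction patterns; only the skill capture group differs.
const narrow = /why didn't you use (?:the )?(?<skill>\w+)/i;
const wide = /why didn't you use (?:the )?(?<skill>[\w-]+)/i;

// Returns the captured skill name, or null when the pattern does not match.
function extractSkill(pattern: RegExp, query: string): string | null {
  return pattern.exec(query)?.groups?.skill ?? null;
}

const query = "why didn't you use integration-guide?";
console.log(extractSkill(narrow, query)); // → integration
console.log(extractSkill(wide, query)); // → integration-guide
```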

In `@cli/selftune/hooks/session-stop.ts`:
- Around line 41-57: The current check-only lock logic around
lockPath/LOCK_STALE_MS can race; replace the read-only check with an atomic
claim using openSync(lockPath, "wx") (or equivalent) to create the lock file
only if it doesn't exist, write the timestamp into the newly-created lock, and
proceed to Bun.spawn(...) only on successful create; if openSync fails with
EEXIST, re-read the lock and honor LOCK_STALE_MS as before; if spawn
(Bun.spawn(...)) throws or returns a non-started process, remove/unlink the lock
file to avoid leaving a stale claim, and keep proc.unref() after a successful
spawn.
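
Here is a minimal sketch of the atomic claim the reviewer describes, using Node's `fs` API (which Bun implements). `tryAcquireLock` and `releaseLock` are hypothetical names. Storing a millisecond timestamp in the lock file is an assumption, and the `Bun.spawn` call is left to the caller.

```typescript
import { closeSync, openSync, readFileSync, unlinkSync, writeSync } from "node:fs";

/** Release a lock we own; also call this if spawning the worker fails. */
function releaseLock(lockPath: string): void {
  try {
    unlinkSync(lockPath);
  } catch {
    // Already gone.
  }
}

/**
 * Try to claim lockPath atomically; returns true if this process now owns it.
 * The "wx" flag makes openSync throw EEXIST when the file already exists,
 * so two callers cannot both create it.
 */
function tryAcquireLock(lockPath: string, staleMs: number, now: number = Date.now()): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    let fd: number | null = null;
    try {
      fd = openSync(lockPath, "wx");
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    if (fd !== null) {
      try {
        writeSync(fd, String(now));
      } catch (err) {
        closeSync(fd);
        releaseLock(lockPath); // Don't leave an empty claim behind.
        throw err;
      }
      closeSync(fd);
      return true;
    }
    // The lock exists: honor it unless its timestamp is older than staleMs.
    let raw: string;
    try {
      raw = readFileSync(lockPath, "utf8").trim();
    } catch {
      continue; // The holder released it between our open and read; try again.
    }
    // An empty file may belong to a holder that has not written its timestamp yet.
    const lockedAt = Number(raw);
    const stale = raw !== "" && (!Number.isFinite(lockedAt) || now - lockedAt > staleMs);
    if (!stale) return false;
    releaseLock(lockPath); // Reclaim the stale lock, then retry the atomic create.
  }
  return false;
}
```

In the hook, the caller would spawn the worker only when `tryAcquireLock` returns true and call `releaseLock` if the spawn throws. Reclaiming a stale lock (unlink, then retry) still leaves a small window in which two callers can both judge the same lock stale. This sketch narrows that race but does not eliminate it.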

In `@cli/selftune/orchestrate.ts`:
- Around line 116-147: markSignalsConsumed currently reads the signal log,
updates records, and overwrites the file causing races where another writer
appends signals between read and write; to fix, right before writing back,
re-read signalLogPath (using readJsonl) to capture any new appended records,
merge them by key (`${timestamp}|${session_id}`) so you don't lose records, then
write the merged array with writeFileSync; keep using the same key logic and
fields (consumed, consumed_at, consumed_by_run) so updated records are applied
and newly appended records are preserved.
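
The re-read-and-merge approach could look like the sketch below. A local `readJsonl` stands in for the project's helper (assumption). The record fields and the `${timestamp}|${session_id}` key come from the comment. The consumption updates are applied to a fresh snapshot of the file, so lines appended after the caller's original read are kept.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

interface SignalRecord {
  timestamp: string;
  session_id: string;
  consumed?: boolean;
  consumed_at?: string;
  consumed_by_run?: string;
  [field: string]: unknown;
}

const signalKey = (r: { timestamp: string; session_id: string }): string =>
  `${r.timestamp}|${r.session_id}`;

// Stand-in for the project's readJsonl helper (assumption): one JSON object per non-empty line.
function readJsonl(path: string): SignalRecord[] {
  if (!existsSync(path)) return [];
  return readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SignalRecord);
}

function markSignalsConsumed(
  pending: SignalRecord[],
  runId: string,
  logPath: string,
  now: string = new Date().toISOString(),
): void {
  if (pending.length === 0 || !existsSync(logPath)) return;
  const toConsume = new Set(pending.map(signalKey));
  // Re-read right before writing so records appended since the caller's read survive.
  const latest = readJsonl(logPath);
  const merged = latest.map((record) =>
    toConsume.has(signalKey(record)) && !record.consumed
      ? { ...record, consumed: true, consumed_at: now, consumed_by_run: runId }
      : record,
  );
  writeFileSync(logPath, `${merged.map((r) => JSON.stringify(r)).join("\n")}\n`);
}
```

A small window between the re-read and `writeFileSync` remains. Closing it fully would need a lock or an append-only event log.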

In `@skill/references/logs.md`:
- Around line 212-224: Update the docs to describe append-only semantics by
treating signal consumption as a separate appended event rather than mutating
the original JSONL record: stop showing a mutated "consumed" boolean flip and
instead document an appended consumption event containing fields like
"timestamp", "session_id", "signal_type", "mentioned_skill", "consumed_at", and
"consumed_by_run"; explicitly state that log files are stored under ~/.claude/
and must be append-only for auditing/replication and that consumers should
correlate events by "session_id" (and/or signal identifiers) rather than relying
on in-place updates.
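
The append-only alternative could look like the sketch below: consumption is written as a new line, and readers fold those lines into a set of consumed keys. The event fields come from the review comment. Writing events to their own JSONL file and the helper names are assumptions. The follow-up commit in this thread instead documented consumption as an exception to append-only.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Event shape built from the fields named in the review comment.
interface ConsumptionEvent {
  timestamp: string; // timestamp of the original signal
  session_id: string;
  signal_type: string;
  mentioned_skill?: string;
  consumed_at: string;
  consumed_by_run: string;
}

function appendConsumptionEvent(logPath: string, event: ConsumptionEvent): void {
  // One JSON object per line; existing lines are never rewritten.
  appendFileSync(logPath, `${JSON.stringify(event)}\n`);
}

/** Fold consumption events into the set of consumed signal keys (timestamp|session_id). */
function consumedKeys(logPath: string): Set<string> {
  const keys = new Set<string>();
  if (!existsSync(logPath)) return keys;
  for (const line of readFileSync(logPath, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    const e = JSON.parse(line) as Partial<ConsumptionEvent>;
    if (e.consumed_by_run && e.timestamp && e.session_id) {
      keys.add(`${e.timestamp}|${e.session_id}`);
    }
  }
  return keys;
}
```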

In `@skill/Workflows/Evals.md`:
- Around line 187-208: Summary: Fix inconsistent numbering and selection labels
for "Max Entries", "Model (synthetic mode only)", and "Output Path" and update
the selection-to-CLI mapping so option letters match the document headings. Edit
the Evals.md sections so the list reads sequentially (e.g., 3. Max Entries, 4.
Model (synthetic mode only), 5. Output Path) and change the model mapping rows
from "3a/3b/3c" to "4a/4b/4c"; also ensure the mode selection rows (log-based vs
synthetic) and custom options reference the same option numbers/letters used in
the list and keep the CLI flags exactly as shown (e.g., --synthetic --skill-path
<path>, --max <value>, --model haiku/sonnet/opus, --out <path>) so parsing is
unambiguous.

In `@skill/Workflows/Initialize.md`:
- Around line 128-151: The docs use repo-relative copy/list commands (cp
templates/activation-rules-default.json and instructions to copy agents from the
repository) which won't work from an arbitrary project; update the steps around
activation-rules-default.json and ls ~/.claude/agents/ to either (a) reference
the resolved installed-skill path returned by the installer/runtime (e.g., the
path the skill lives under the user's home install) instead of
"templates/..."/".claude/agents/" in the repo, or (b) instruct the user to run
the provided tooling to resolve/copy assets (call out running selftune init
--force or selftune doctor) so the skill can populate
~/.selftune/activation-rules.json and ~/.claude/agents/ for them; mention the
exact filenames to be copied (activation-rules-default.json and the agent files
diagnosis-analyst.md, pattern-analyst.md, evolution-reviewer.md,
integration-guide.md) and ensure the step explicitly shows either the resolved
path or the command the user must run (selftune init --force / selftune doctor)
to perform the installation.

In `@skill/Workflows/Orchestrate.md`:
- Around line 109-112: The docs state the --loop-interval default as "300s / 5
minutes" but the CLI default in cli/selftune/orchestrate.ts is "3600" (1 hour);
update the text in Workflows/Orchestrate.md (the "Loop mode" paragraph
referencing `selftune orchestrate --loop` and `--loop-interval`) to show the
correct default "3600s / 1 hour" so the documentation matches the default value
defined in orchestrate.ts.
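
The sketch below resolves `--loop-interval` with the 3600-second default that the comment cites from `orchestrate.ts`. The helper name, the interpretation of the flag value as seconds, and the validation rules are assumptions.

```typescript
// 3600 mirrors the default the review cites from cli/selftune/orchestrate.ts.
const DEFAULT_LOOP_INTERVAL_SECONDS = 3600;

/** Convert an optional --loop-interval value (seconds) into milliseconds for setTimeout. */
function resolveLoopIntervalMs(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LOOP_INTERVAL_SECONDS * 1000;
  const seconds = Number(raw);
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`--loop-interval must be a positive number of seconds, got "${raw}"`);
  }
  return seconds * 1000;
}
```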

In `@tests/hooks/signal-detection.test.ts`:
- Around line 107-118: The test "unknown skill still matches pattern but
mentioned_skill from capture" is vacuous because assertions are guarded by if
(result) and the sample text captures the generic word "an" which production
filters; update the test to assert result is not null (no conditional) and use
an input that actually captures a non-generic unknown skill token (e.g., a
clearly-named unknown like "unknownSkill" or "foobar") so
detectImprovementSignal(...) returns a signal; then assert result.signal_type
=== "correction" and that result.mentioned_skill equals the captured unknown
name to exercise the fallback path in detectImprovementSignal.

In `@tests/signal-orchestrate.test.ts`:
- Around line 249-255: The test currently calls markSignalsConsumed with an
empty list so it returns early; change the test to pass a non-empty pending
signal (e.g., a single pending signal object) so execution reaches the branch
that checks for the missing log file and still asserts that
markSignalsConsumed(..., "run_123", signalPath) does not throw; update the test
in tests/signal-orchestrate.test.ts to call markSignalsConsumed with a non-empty
array (referencing markSignalsConsumed) to exercise the missing-log branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 73c4de91-29f1-44fc-9a0e-3f7b38450646

📥 Commits

Reviewing files that changed from the base of the PR and between f02ebc5 and 885af5d.

📒 Files selected for processing (27)
  • .claude/agents/diagnosis-analyst.md
  • .claude/agents/evolution-reviewer.md
  • .claude/agents/integration-guide.md
  • AGENTS.md
  • ARCHITECTURE.md
  • CHANGELOG.md
  • README.md
  • apps/local-dashboard/src/pages/Status.tsx
  • cli/selftune/constants.ts
  • cli/selftune/hooks/prompt-log.ts
  • cli/selftune/hooks/session-stop.ts
  • cli/selftune/index.ts
  • cli/selftune/orchestrate.ts
  • cli/selftune/quickstart.ts
  • cli/selftune/types.ts
  • docs/design-docs/evolution-pipeline.md
  • docs/design-docs/system-overview.md
  • docs/exec-plans/active/dashboard-signal-integration.md
  • skill/SKILL.md
  • skill/Workflows/Doctor.md
  • skill/Workflows/Evals.md
  • skill/Workflows/Ingest.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Orchestrate.md
  • skill/references/logs.md
  • tests/hooks/signal-detection.test.ts
  • tests/signal-orchestrate.test.ts

Code fixes:
- prompt-log.ts: broaden skill capture regex to [\w-]+ for hyphenated names
- session-stop.ts: atomic lock acquisition with openSync("wx") + cleanup
- orchestrate.ts: re-read signal log before write to prevent race condition

Dashboard:
- Status.tsx: defensive defaults for healthy, summary, timestamp

Docs:
- ARCHITECTURE.md: use selftune cron setup as canonical scheduler
- Orchestrate.md: fix loop-interval default (3600s not 300s)
- Evals.md: fix option numbering (1-5) and selection mapping (4a/4b/4c)
- Initialize.md: use selftune init --force instead of repo-relative paths
- logs.md: document signal consumption as exception to append-only

Tests:
- signal-detection: fix vacuous unknown-skill test
- signal-orchestrate: exercise missing-log branch with non-empty signals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@llms.txt`:
- Around line 16-57: The links in this file hardcode "/blob/master/" which can
break when branches change; update all occurrences of "/blob/master/" (e.g., in
the links for README, CONTRIBUTING, CHANGELOG, ARCHITECTURE, SKILL.md,
AGENTS.md, and other referenced docs) to use branch-agnostic forms—either
replace "/blob/master/" with "/blob/HEAD/" or convert to relative repository
paths (e.g., "README.md", "skill/SKILL.md") consistently across the file so
links remain valid across branches.
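
The `/blob/HEAD/` option amounts to a mechanical substitution. The sketch below shows that substitution as a function, with `owner/repo` standing in for the real repository path (hypothetical). Converting the links to relative paths, the other option mentioned, would need a different rewrite.

```typescript
// Rewrite GitHub blob links pinned to master into branch-agnostic HEAD links.
// Links to other branches (e.g. /blob/dev/) are left untouched.
function toBranchAgnostic(text: string): string {
  return text.replace(
    /(https:\/\/github\.com\/[^/\s]+\/[^/\s]+)\/blob\/master\//g,
    "$1/blob/HEAD/",
  );
}
```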

In `@README.md`:
- Line 38: The sentence in README.md repeats timing ("Two minutes." and "Within
minutes"); consolidate to a single concise timeframe by merging them into one
phrase (e.g., "Within two minutes" or "In two minutes") while keeping the rest
of the sentence ("No API keys. No external services. No configuration ceremony.
Uses your existing agent subscription. You'll see which skills are
undertriggering.") intact so the unique copy around the timing remains
unchanged.
- Line 28: Update the intro sentence "Works with **Claude Code**, **Codex**,
**OpenCode**, and **OpenClaw**. Zero runtime dependencies." to clarify that
Claude Code is the primary supported platform and the others are experimental
(for example: "Works with Claude Code (primary); Codex, OpenCode, and OpenClaw
are experimental"), or move the experimental caveat from the Platforms section
up near this sentence so readers see the support status immediately; ensure the
phrasing matches the Platforms section language to avoid contradictory
statements.
- Line 162: Update the README footer sentence "MIT licensed. Free forever. Works
with Claude Code, Codex, OpenCode, and OpenClaw." to clearly state experimental
status: indicate that only Claude Code is fully supported and that Codex,
OpenCode, and OpenClaw are experimental (or link to the Platforms section for
details). Edit the footer text in README.md and ensure consistency with the
Platforms section wording so readers aren’t misled about support levels.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ebce81d8-e948-4572-9738-4da92fc9012c

📥 Commits

Reviewing files that changed from the base of the PR and between 885af5d and 08e4dd9.

📒 Files selected for processing (6)
  • AGENTS.md
  • CONTRIBUTING.md
  • README.md
  • SECURITY.md
  • cli/selftune/contribute/contribute.ts
  • llms.txt

- llms.txt: /blob/master/ → /blob/HEAD/ for branch-agnostic URLs
- README line 28: clarify Claude Code primary, others experimental
- README line 38: remove redundant "Within minutes"
- README footer: match experimental language from Platforms section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WellDunDun merged commit bd44eb3 into dev on Mar 15, 2026
6 checks passed
WellDunDun added a commit that referenced this pull request Mar 15, 2026
* Add make clean-branches target for repo hygiene

Deletes Conductor worktree branches (custom/prefix/router-*),
selftune evolve test branches, and orphaned worktree-agent-* branches.
Also prunes stale remote tracking refs. Run with `make clean-branches`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add composability v2: synergy detection, sequence extraction, workflow candidates

Extends the composability analysis with positive interaction detection
(synergy scores), ordered skill sequence extraction from usage timestamps,
and automatic workflow candidate flagging. Backwards compatible — v1
function and tests unchanged, CLI falls back to v1 when no usage log exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add workflow discovery, SKILL.md writer, and CLI command (v0.3)

Implements multi-skill workflow support: discovers workflow patterns from
existing telemetry, displays them via `selftune workflows`, and codifies
them to SKILL.md via `selftune workflows save`. Includes 48 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback for workflows

* fix: stabilize biome config for CI lint

* fix: address new PR review threads

* Fix all 13 demo findings: grade/evolve/status/doctor/telemetry

BUG-1: Remove false-positive git hook checks, fix hook key names to PascalCase
BUG-2: Auto-derive expectations from SKILL.md when none provided
BUG-3: Add --help output to grade command documenting --session-id
BUG-4: Prefer skills_invoked over skills_triggered in session matching
BUG-5: Add pre-flight validation and human-readable errors to evolve
BUG-6: Distinguish real Skill tool calls from SKILL.md browsing reads
IMP-1: Confirmed templates/ in package.json files array
IMP-2: Auto-install agent files during init
IMP-3: Show UNGRADED instead of CRITICAL when no graded sessions exist
IMP-4: Use portable npx selftune hook <name> instead of absolute paths
IMP-5: Add selftune auto-grade command
IMP-6: Mandate AskUserQuestion in evolve workflows
IMP-7: Add selftune quickstart command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix hook subcommand to spawn hooks as subprocesses

Hook files guard execution behind import.meta.main, so dynamically
importing them was a no-op. Spawn as subprocess instead so stdin
payloads are processed and hooks write telemetry logs correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address review comments

* Fix lint CI failures

* Trigger CI rerun

* Fix lint command resolution

* Address remaining review comments

* Fix grade-session test isolation

* docs: Align selftune skill docs with shipped workflows (#31)

* Update selftune workflow docs and skill versioning

* Improve selftune skill portability and setup docs

* Clarify workflow doc edge cases

* Fix OpenClaw doctor validation and workflow docs

* Polish composability and setup docs

* Fix BUG-7, BUG-8, BUG-9 from demo findings (#32)

* Fix BUG-7, BUG-8, BUG-9 from demo findings

BUG-7: Add try/catch + array validation around eval-set file loading in
evolve() so parse errors surface as user-facing messages instead of
silent exit.

BUG-8: Add cold-start bootstrap — when extractFailurePatterns returns
empty but the eval set has positive entries, treat those positives as
missed queries so evolve can work on skills with zero usage history.

BUG-9: Add --out flag to evals CLI parseArgs as alias for --output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
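
BUG-9's `--out` alias can be sketched with Node's built-in `util.parseArgs` (also available in Bun). Whether the evals CLI actually uses `parseArgs`, and which flag wins when both are passed, are assumptions here.

```typescript
import { parseArgs } from "node:util";

/** Accept both --output and --out; --out wins if both are given (assumption). */
function resolveOutputPath(argv: string[]): string | undefined {
  const { values } = parseArgs({
    args: argv,
    options: {
      output: { type: "string" },
      out: { type: "string" },
    },
    strict: false, // tolerate the CLI's other flags in this sketch
  });
  if (typeof values.out === "string") return values.out;
  if (typeof values.output === "string") return values.output;
  return undefined;
}
```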

* Fix evolve CI regressions

* Isolate blog proof fixture mutations

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Fix dashboard data and extract telemetry contract (#33)

* Fix dashboard export and layout

* Improve telemetry normalization groundwork

* Add test runner state

* Separate and extract telemetry contract

* Fix telemetry CI lint issues

* Fix remaining CI regressions

* Detect zero-trigger monitoring regressions

* Stabilize dashboard report route tests

* Address telemetry review feedback

* Fix telemetry normalization edge cases (#34)

* Fix telemetry follow-up edge cases

* Fix rollback payload and Codex prompt attribution

* Tighten Codex rollout prompt tracking

* Update npm package metadata (#35)

* Prepare 0.2.1 release (#36)

* Prepare 0.2.1 release

* Update README install path

* Use trusted publishing for npm

* feat: harden LLM calls and fix test failures (#38)

* feat: consume @selftune/telemetry-contract as workspace package

Replace relative path imports of telemetry-contract with the published
@selftune/telemetry-contract workspace package. Adds workspace config to
package.json and expands tsconfig includes to cover packages/*.

Closes SEL-10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(telemetry-contract): add versioning, metadata, and golden fixtures

Add version 1.0.0 and package metadata (description, author, license,
repository) to the telemetry-contract package. Create golden fixture file
with one valid example per record kind and a test suite that validates
all fixtures against the contract validator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Make selftune source-truth driven

* Harden live dashboard loading

* Audit cleanup: test split, docs, lint fixes

- Add make test-fast / test-slow targets (5s vs 80s, 16x faster dev loop)
- Add bun run test:fast / test:slow scripts in package.json
- Reposition README as "Claude Code first", update competitive comparison
- Bump PRD.md version to 0.2.1
- Add CHANGELOG unreleased section (source-truth, telemetry-contract, test split)
- Fix pre-existing lint: types.ts formatting, golden.test.ts import order

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add sync flags and hook dispatch to integration guide

- Document all selftune sync flags (--since, --dry-run, --force, etc.)
- Add selftune hook dispatch command with all 6 hook names
- Verified init, activation rules, and source-truth sections already current

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Harden LLM calls and fix pre-existing test failures

Add exponential backoff retry to callViaAgent for transient subprocess
failures. Cap JSONL health-check validation at 500 lines to prevent
timeouts on large log files. Use exported DEFAULT_WINDOW_SESSIONS
constant in dashboard data collection instead of telemetry.length.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
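
The sketch below shows exponential backoff of the kind described for `callViaAgent`. The helper name and defaults are hypothetical. Retrying every error (with no transient/permanent distinction and no jitter) is a simplification, not the PR's implementation.

```typescript
/** Retry fn up to `retries` extra times, doubling the delay after each failure. */
async function withRetry<T>(
  fn: () => Promise<T>,
  { retries = 3, baseDelayMs = 500 }: { retries?: number; baseDelayMs?: number } = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Delays grow as base, 2x base, 4x base, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```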

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add local SQLite materialization layer for dashboard (#42)

* feat: add SQLite materialization layer for dashboard queries

Add a local SQLite database (via bun:sqlite) as an indexed materialized
view store so the dashboard/report UX no longer depends on recomputing
everything from raw JSONL logs on every request.

New module at cli/selftune/localdb/ with:
- schema.ts: 10 tables + 19 indexes mirroring canonical telemetry and
  local log shapes
- db.ts: openDb() lifecycle with WAL mode, meta key-value helpers
- materialize.ts: full rebuild and incremental materialization from
  JSONL source-of-truth logs
- queries.ts: getOverviewPayload(), getSkillReportPayload(),
  getSkillsList() query helpers

Raw JSONL logs remain authoritative — the DB is a disposable cache that
can always be rebuilt. No new npm dependencies (bun:sqlite only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Biome lint and format errors

Auto-fix import ordering, formatting, and replace non-null assertions
with optional chaining in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: implement autonomous selftune orchestrator core loop (#40)

* feat: add selftune orchestrate command for autonomous core loop

Introduces `selftune orchestrate` — a single entry point that chains
sync → status → evolve → watch into one coordinated run. Defaults to
dry-run mode with explicit --auto-approve for deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — lint errors, logic bug, and type completeness

- Replace string concatenation with template literals (Biome lint)
- Add guard in evolve loop for agent-missing skip mutations
- Replace non-null assertion with `as string` cast
- Remove unused EvolutionAuditEntry import
- Complete DoctorResult mock with required fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply Biome formatting and import sorting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Build(deps): Bump oven-sh/setup-bun from 2.1.2 to 2.1.3 (#26)

Bumps [oven-sh/setup-bun](https://github.com/oven-sh/setup-bun) from 2.1.2 to 2.1.3.
- [Release notes](https://github.com/oven-sh/setup-bun/releases)
- [Commits](oven-sh/setup-bun@3d26778...ecf28dd)

---
updated-dependencies:
- dependency-name: oven-sh/setup-bun
  dependency-version: 2.1.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Build(deps): Bump actions/setup-node from 6.2.0 to 6.3.0 (#27)

Bumps [actions/setup-node](https://github.com/actions/setup-node) from 6.2.0 to 6.3.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@6044e13...53b8394)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: 6.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Build(deps): Bump github/codeql-action from 4.32.4 to 4.32.6 (#28)

Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.4 to 4.32.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@89a39a4...0d579ff)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.32.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Improve sync progress and tighten query filtering (#43)

* Improve sync progress and tighten query filtering

* Fix biome formatting errors in sync.ts and query-filter.test.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: generic scheduling and reposition OpenClaw cron as optional (#41)

* feat: add generic scheduling command and reposition OpenClaw cron as optional

The primary automation story is now agent-agnostic. `selftune schedule`
generates ready-to-use snippets for system cron, macOS launchd, and Linux
systemd timers. `selftune cron` is repositioned as an optional OpenClaw
integration rather than the main automation path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — centralize schedule data, fix generators and formatting

Derive SCHEDULE_ENTRIES from DEFAULT_CRON_JOBS (single source of truth),
generate launchd/systemd configs for all 4 entries instead of sync-only,
fix biome formatting, and add markdown language tag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use StartCalendarInterval for fixed-time launchd and shell wrappers for chained commands

- launchd: use StartCalendarInterval (Hour/Minute/Weekday) for fixed-time
  schedules instead of approximating with StartInterval
- launchd/systemd: use /bin/sh -c wrapper for commands with && chains
  so prerequisite steps (like sync) are not silently dropped

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
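
The two launchd fixes can be sketched as small helpers. `ProgramArguments`, `StartCalendarInterval`, `Hour`, `Minute`, and `Weekday` are real launchd.plist keys. The `ScheduleEntry` shape and the helper names are hypothetical, not the generator in this PR.

```typescript
// Hypothetical schedule entry; only the launchd key names produced below are real.
interface ScheduleEntry {
  command: string; // e.g. "selftune sync && selftune status"
  minute?: number;
  hour?: number;
  weekday?: number; // launchd treats both 0 and 7 as Sunday
}

/** Build launchd ProgramArguments for a command string. */
function programArguments(command: string): string[] {
  // Without a shell, "&&" and the steps after it would be passed to the first
  // binary as literal arguments, so chained commands are wrapped in /bin/sh -c.
  if (command.includes("&&")) return ["/bin/sh", "-c", command];
  // Naive whitespace split: fine for simple commands, not for quoted arguments.
  return command.split(/\s+/).filter((part) => part.length > 0);
}

/** Build a StartCalendarInterval dict; omitted keys act as wildcards in launchd. */
function startCalendarInterval(entry: ScheduleEntry): Record<string, number> {
  const interval: Record<string, number> = {};
  if (entry.minute !== undefined) interval.Minute = entry.minute;
  if (entry.hour !== undefined) interval.Hour = entry.hour;
  if (entry.weekday !== undefined) interval.Weekday = entry.weekday;
  return interval;
}
```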

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add local dashboard SPA with React + Vite (#39)

* feat: add local dashboard SPA with React + Vite

Introduces a minimal React SPA at apps/local-dashboard/ with two routes:
overview (KPIs, skill health grid, evolution feed) and per-skill drilldown
(pass rate, invocation breakdown, evaluation records). Consumes existing
dashboard-server API endpoints with SSE live updates, explicit loading/
error/empty states, and design tokens matching the current dashboard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback for local dashboard

Extract shared utils (deriveStatus, formatRate, timeAgo), add SSE exponential
backoff with max retries, filter ungraded skills from avg pass rate, fix stuck
loading state for undefined skillName, use word-boundary regex for evolution
filtering, add focus-visible styles, add typecheck script, and add Vite env
types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address second round CodeRabbit review feedback

Cancel pending SSE reconnect timers on cleanup, add stale-request guard
to useSkillReport, remove redundant decodeURIComponent (React Router
already decodes), quote font names in CSS for stylelint, and format
deriveStatus signature for Biome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: align local dashboard SPA with SQLite v2 data architecture

Migrate SPA from old JSONL-reading /api/data endpoints to new
SQLite-backed /api/v2/* endpoints. Add v2 server routes for overview
and per-skill reports. Replace SSE with 15s polling. Rewrite types
to match materialized query shapes from queries.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback on local dashboard SPA

- Add language identifier to HANDOFF.md fenced code block (MD040)
- Prevent overlapping polls in useOverview with in-flight guard and sequential setTimeout
- Broaden empty-state check in useSkillReport to include evolution/proposals
- Fix Sessions KPI to use counts.sessions instead of counts.telemetry
- Wrap materializeIncremental in try/catch to preserve last good snapshot on failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: sort imports to satisfy Biome organizeImports lint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit nitpicks — cross-platform dev script, stricter types, CSS compat

- Use concurrently for cross-platform dev script instead of shell backgrounding
- Tighten Sidebar counts prop to Partial<Record<SkillHealthStatus, number>>
- Replace color-mix() with rgba fallback for broader browser support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add UNKNOWN status filter and extract header height CSS variable

- Add UNKNOWN to STATUS_OPTIONS so all SkillHealthStatus values are filterable
- Extract hardcoded 56px header height to --header-h CSS variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist sidebar collapse state to layout and add UNKNOWN filter style

- Lift collapsed state from Sidebar to Overview so grid columns resize properly
- Add .sidebar-collapsed grid rules at all breakpoints
- Fix mobile: collapsed sidebar no longer creates dead-end (shows inline)
- Add .filter-pill.active.filter-unknown CSS rule for UNKNOWN status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: serve SPA as default dashboard, legacy at /legacy/

- Dashboard server now serves built SPA from apps/local-dashboard/dist/ at /
- Legacy dashboard moved to /legacy/ route
- SPA fallback for client-side routes (e.g. /skills/:name)
- Static asset serving with content-hashed caching for /assets/*
- Path traversal protection on static file serving
- Add build:dashboard script to root package.json
- Include apps/local-dashboard/dist/ in published files
- Falls back to legacy dashboard if SPA build not found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shadcn theming with dark/light toggle and selftune branding

Migrate dashboard to shadcn theme system with proper light/dark support.
Dark mode uses selftune site colors (navy/cream/copper), light mode uses
standard shadcn defaults. Add ThemeProvider with localStorage persistence,
sun/moon toggle in site header, and SVG logo with currentColor for both themes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: path traversal check and 404 for missing skills

Use path.relative() + isAbsolute() instead of startsWith() for the SPA
static asset path check to prevent directory traversal bypass. Return 404
from /api/v2/skills/:name when the skill has no usage data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
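
The containment check the commit describes can be sketched with Node's `path` module (also implemented by Bun). The function name is hypothetical, as is the assumption that the caller passes an already-decoded URL pathname. The examples assume POSIX paths.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

/**
 * Map a URL pathname onto a file under root, or return null if it would escape root.
 * A plain startsWith(root) check is bypassable: "/srv/dist-evil" starts with "/srv/dist".
 */
function resolveStaticPath(root: string, urlPath: string): string | null {
  const rootAbs = resolve(root);
  // Strip leading slashes so resolve() treats the request as relative to root.
  const candidate = resolve(rootAbs, urlPath.replace(/^\/+/, ""));
  const rel = relative(rootAbs, candidate);
  // Escapes root: "..", "../x", or (on Windows) a path on another drive.
  if (rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) return null;
  return candidate;
}
```

Checking for `..${sep}` instead of a bare `..` prefix still allows legitimate file names such as `..config` inside the root.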

* fix: biome formatting — semicolons, import order, line length

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — dedupe polling, fix stale closures, harden theme/config

- Lift useOverview to DashboardShell, pass as prop to Overview (no double polling)
- Fix stale closure in drag handler by deriving indices from prev state
- Validate localStorage theme values, use undefined context default
- Add relative positioning to theme toggle button for MoonIcon overlay
- Fix falsy check hiding zero values in chart tooltip
- Fix invalid Tailwind selectors in dropdown-menu and toggle-group
- Use ESM-safe fileURLToPath instead of __dirname in vite.config
- Switch manualChunks to function form for Base UI subpath matching
- Align pass-rate threshold with deriveStatus in SkillReport
- Use local theme provider in sonner instead of next-themes
- Add missing React import in skeleton, remove unused Separator import
- Include vite.config.ts in tsconfig for typecheck coverage
- Fix inconsistent JSX formatting in select scroll buttons

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 2 — shared sorting, DnD fixes, Tailwind v4 migration

- Extract sortByPassRateAndChecks to utils.ts, dedupe sorting in App + Overview
- Derive DnD dataIds from row model (not raw data), guard against -1 indexOf
- Hide pagination when table is empty instead of showing "Page 1 of 0"
- Fix ActivityTimeline default tab to prefer non-empty dataset
- Import ReactNode directly instead of undeclared React namespace
- Quote CSS attribute selector in chart style injection
- Use stable composite keys for tooltip and legend items
- Remove unnecessary "use client" directive from dropdown-menu (Vite SPA)
- Migrate outline-none to outline-hidden for Tailwind v4 accessibility
- Fix toggle-group orientation selectors to match data-orientation attribute
- Add missing CSSProperties import in sonner.tsx
- Add dark mode variant for SkillReport row highlight
- Format vite.config.ts with Biome

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add evidence viewer, evolution timeline, and enhanced skill report

Add EvidenceViewer, EvolutionTimeline, and InfoTip components. Enhance
SkillReport with richer data display, expand dashboard server API
endpoints, and update documentation and architecture docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 3 — DnD/sort conflict, theme listener, formatting

- Disable DnD reorder when table sorting is active (skill-health-grid)
- Listen for OS theme preference changes when system theme is active
- Apply Biome formatting to sortByPassRateAndChecks
- Remove unused useEffect import from Overview
- Deduplicate confidence filter in SkillReport
- Materialize session IDs once in dashboard-server to avoid repeated subqueries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show selftune version in sidebar footer

Pass version from API response through to AppSidebar and display
it dynamically instead of hardcoded "dashboard v0.1".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome formatting in dashboard-server — line length wrapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 4 — dedupe formatRate, STATUS_CONFIG, cleanup

- Remove duplicate formatRate from app-sidebar, import from @/utils
- Extract STATUS_CONFIG to shared @/constants module, import in both
  skill-health-grid and SkillReport
- Remove misleading '' fallback from sessionPlaceholders since the
  ternary guards already skip queries when empty

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove redundant items prop from Select to avoid duplication

The SelectItem children already define the options; the items prop
was duplicating them unnecessarily.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add sortableKeyboardCoordinates to KeyboardSensor for proper keyboard DnD

Without this, keyboard navigation moves by pixels instead of jumping
between sortable items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Linear-style dashboard UX — collapsible sidebar, direct skill links, scope grouping

- Simplify sidebar: remove status filters, keep logo + search + skills list
- Add collapsible scope groups (Project/Global) using base-ui Collapsible
- Surface skill_scope from DB query through API to dashboard types
- Replace skill drawer with direct Link navigation to skill report
- Add Scope column to skills table with filter dropdown
- Slim down site header: remove breadcrumbs, reduce to sidebar trigger + theme toggle
- Add side-by-side grid layout: skills table left, activity panel right
- Gitignore pnpm-lock.yaml alongside bun.lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — accessibility, semantics, state reset

- Remove bun.lock from .gitignore to maintain build reproducibility
- Preserve unexpected scope values in sidebar (don't drop unrecognized scopes)
- Add aria-label to skill search input for screen reader accessibility
- Switch status filter from checkbox to radio-group semantics (mutually exclusive)
- Reset selectedProposal when navigating between skills via useEffect on name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add TanStack Query and optimize SQL queries for dashboard performance

Migrate data fetching from manual polling/dedup hooks to TanStack Query
for instant cached navigation, background refetch, and request dedup.
Optimize SQL: replace NOT IN subqueries with LEFT JOIN, move JS dedup
to GROUP BY, add LIMIT 200 to unbounded evidence queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: track root bun.lock for reproducible installs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — collapsible sync, drag handle dedup, a11y, not-found heuristic

- Make sidebar Collapsible controlled so it auto-opens when active skill
  changes (Comment #1)
- Consolidate useSortable to single call per row via React context,
  use setActivatorNodeRef on drag handle button (Comment #2)
- Remove capitalize CSS transform on free-form scope values (Comment #3)
- Broaden isNotFound heuristic to check invocations, prompts, sessions
  in addition to evals/evolution/proposals (Comment #4)
- Move Tooltip outside TabsTrigger to avoid nested interactive elements,
  use Base UI render prop for composition (Comment #5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit nitpicks — version pinning, changelog clarity, shared query helper

- Use caret range for recharts version (^2.15.4) for consistency
- Clarify changelog: SSE was removed, polling via refetchInterval is primary
- Extract getPendingProposals() shared helper in queries.ts, used by both
  getOverviewPayload() and dashboard-server skill report endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 3 — deps, async fs, type safety, deterministic query

- Move @tailwindcss/vite, tailwindcss, shadcn to devDependencies
- Fix trailing space in version display when version is empty
- Type caught error as unknown in refreshV2Data
- Replace sync fs (readFileSync/statSync) with Bun.file() for hot-path asset serving
- Return 404 for missing /assets/* files instead of falling through to SPA
- Add details and eval_set fields to SkillReportPayload.evidence type
- Fix nondeterministic GROUP BY with ROW_NUMBER() CTE in getPendingProposals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Biome lint and format errors in CI

- Replace non-null assertion with type cast in useSkillReport (noNonNullAssertion)
- Break long import line in dashboard-server.ts to satisfy Biome formatter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 4 — CTE subqueries, type alignment, scope index

- Replace dynamic bind-parameter expansion with CTE subquery for session lookups
- Add skill_name to OverviewPayload.pending_proposals type to match runtime shape
- Add composite index on skill_usage(skill_name, skill_scope, timestamp) for scope lookups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 5 — startup guard, 404 heuristic, deterministic tiebreaker

- Guard initial v2 materialization with try/catch to avoid full server crash
- Include evidence in not-found check so evidence-only skills aren't 404'd
- Add ea.id DESC tiebreaker to ROW_NUMBER() for deterministic pending proposals
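The `ROW_NUMBER()` fix can be illustrated with a plain TypeScript equivalent of the selection rule. This is a sketch, not the project's `getPendingProposals()`. It assumes a hypothetical `AuditRow` with `id`, `proposal_id`, and an ISO-8601 `timestamp`, and it keeps the row that `ROW_NUMBER() OVER (PARTITION BY proposal_id ORDER BY timestamp DESC, id DESC)` would number 1. The partition column is an assumption. Without the `id DESC` tiebreaker, two rows that share a timestamp could be returned in either order:

```typescript
// Hypothetical audit row; the real table and column names may differ.
interface AuditRow {
  id: number;
  proposal_id: string;
  action: string;
  timestamp: string; // ISO-8601 text, compared as text (as ORDER BY does on an ASCII TEXT column)
}

// True if `a` sorts before `b` under ORDER BY timestamp DESC, id DESC.
function ranksHigher(a: AuditRow, b: AuditRow): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  return a.id > b.id; // deterministic tiebreaker for identical timestamps
}

// Keep exactly one row per proposal: the one ROW_NUMBER() would number 1.
function latestPerProposal(rows: readonly AuditRow[]): AuditRow[] {
  const best = new Map<string, AuditRow>();
  for (const row of rows) {
    const current = best.get(row.proposal_id);
    if (current === undefined || ranksHigher(row, current)) best.set(row.proposal_id, row);
  }
  return Array.from(best.values());
}
```

Because the comparison is total, the result no longer depends on the order in which rows arrive.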

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 6 — db guard, refresh throttle, deferred 404

- Guard openDb() in try/catch so DB bootstrap failure doesn't crash server
- Make db nullable, return 503 from /api/v2/* when store is unavailable
- Throttle failed refresh attempts with separate lastV2RefreshAttemptAt timestamp
- Move skill 404 check after enrichment queries (evolution, proposals, invocations)
- Use optional chaining for db.close() on shutdown
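The refresh throttle can be sketched as follows. Only `lastV2RefreshAttemptAt` and `refreshV2Data` come from this history; the class, the interval values, and the outcome strings are assumptions. Tracking attempts separately from successes matters because a refresh that keeps failing never advances the success timestamp, so a success-only check would retry on every request:

```typescript
// Hypothetical intervals; the dashboard server's real values are not stated here.
const FRESH_MS = 15_000; // skip refreshing when the last success is this recent
const RETRY_BACKOFF_MS = 60_000; // minimum gap between attempts after a failure

type RefreshOutcome = "fresh" | "throttled" | "refreshed" | "failed";

class V2Refresher {
  private lastV2RefreshAt = Number.NEGATIVE_INFINITY; // last successful refresh
  private lastV2RefreshAttemptAt = Number.NEGATIVE_INFINITY; // last attempt, success or failure
  private readonly refreshV2Data: () => void;

  constructor(refreshV2Data: () => void) {
    this.refreshV2Data = refreshV2Data;
  }

  maybeRefresh(nowMs: number): RefreshOutcome {
    if (nowMs - this.lastV2RefreshAt < FRESH_MS) return "fresh";
    // Back off only when the most recent attempt failed (attempt newer than last success).
    const lastAttemptFailed = this.lastV2RefreshAttemptAt > this.lastV2RefreshAt;
    if (lastAttemptFailed && nowMs - this.lastV2RefreshAttemptAt < RETRY_BACKOFF_MS) {
      return "throttled";
    }
    this.lastV2RefreshAttemptAt = nowMs;
    try {
      this.refreshV2Data();
      this.lastV2RefreshAt = nowMs;
      return "refreshed";
    } catch (err: unknown) {
      console.error("v2 refresh failed:", err instanceof Error ? err.message : String(err));
      return "failed";
    }
  }
}
```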

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* Prepare SPA dashboard release path (#44)

* Promote product planning docs

* Add execution plans for product gaps and evals

* Prepare SPA dashboard release path

* Remove legacy dashboard runtime

* Refresh execution plans after dashboard cutover

* Build dashboard SPA in CI and publish

* Refresh README for SPA release path

* Address dashboard release review comments

* Fix biome lint errors in dashboard tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Make autonomous loop the default scheduler path

* Document orchestrate as the autonomous loop

* Document autonomy-first setup path

* Harden autonomous scheduler install paths

* Clarify sync force usage in README

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: phased decision report for orchestrator explainability (#48)

* feat: add phased decision report to orchestrator

Orchestrate output now explains each decision clearly so users can trust
the autonomous loop. Adds formatOrchestrateReport() with 5-phase human
report (sync, status, decisions, evolution, watch) and enriched JSON
with per-skill decisions array. Supersedes PR #45.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update orchestrate workflow docs and changelog for decision report

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove redundant null check in formatEvolutionPhase

The filter already guarantees evolveResult is defined; use non-null
assertion instead of a runtime guard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add defensive optional chaining for watch snapshot in JSON output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: avoid empty parentheses in watch report when snapshot missing

Consolidates pass_rate and baseline into a single conditional metrics
suffix so lines without a snapshot render cleanly. Addresses CodeRabbit
review feedback.
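The consolidated suffix can be sketched as follows. This is a minimal illustration, not the code in formatOrchestrateReport(): the `WatchSnapshot` shape and the `metricsSuffix` helper are hypothetical names chosen for the example. Metrics are collected into a list, and the parentheses are emitted only when that list is non-empty:

```typescript
// Hypothetical watch snapshot shape; field names are assumptions for illustration.
interface WatchSnapshot {
  pass_rate?: number;
  baseline_pass_rate?: number;
}

// Format a 0..1 ratio as a whole-number percentage.
function formatPct(ratio: number): string {
  return `${Math.round(ratio * 100)}%`;
}

// Build " (pass_rate=..., baseline=...)" from whichever metrics are present,
// or return "" so lines without a snapshot never end in empty parentheses.
function metricsSuffix(snapshot?: WatchSnapshot | null): string {
  const passRate = snapshot?.pass_rate;
  const baseline = snapshot?.baseline_pass_rate;
  const parts: string[] = [];
  if (passRate !== undefined) parts.push(`pass_rate=${formatPct(passRate)}`);
  if (baseline !== undefined) parts.push(`baseline=${formatPct(baseline)}`);
  return parts.length > 0 ? ` (${parts.join(", ")})` : "";
}
```

For example, `metricsSuffix({ pass_rate: 0.8, baseline_pass_rate: 0.9 })` returns `" (pass_rate=80%, baseline=90%)"`, and `metricsSuffix(undefined)` returns an empty string.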

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve biome lint and format errors

Replace non-null assertion (!) with type-safe cast to satisfy
noNonNullAssertion rule, and collapse single-arg lines.push to one line
per biome formatter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: evidence-based candidate selection gating (#50)

* feat: evidence-based candidate selection with cooldown, evidence, and trend gates

Add four gating rules to selectCandidates so autonomous evolution acts on
stronger signals and skips noisy/premature candidates:

- Cooldown gate: skip skills deployed within 24h
- Evidence gate: require 3+ skill_checks for CRITICAL/WARNING
- Weak-signal filter: skip WARNING with 0 missed queries + non-declining trend
- Trend boost: declining skills prioritized higher in sort order
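A minimal sketch of the four gates, assuming a simplified candidate record. The 24-hour cooldown, the 3-check evidence minimum, the weak-signal rule, and the "down" trend value come from this history; the field names, the HEALTHY status, the other trend values, and the exact sort order (severity first, then declining trend) are assumptions:

```typescript
// Hypothetical candidate shape; field names and some enum values are assumptions.
type SkillStatus = "CRITICAL" | "WARNING" | "HEALTHY" | "UNGRADED";
type Trend = "up" | "stable" | "down";

interface Candidate {
  skill: string;
  status: SkillStatus;
  skillChecks: number; // graded skill_checks backing the status
  missedQueries: number;
  trend: Trend;
  lastDeployedAt?: string; // ISO-8601 timestamp of the last deployment, if any
}

const COOLDOWN_HOURS = 24;
const MIN_SKILL_CHECKS = 3;

// Returns the name of the first gate that rejects the candidate, or null if it passes.
function skipReason(c: Candidate, nowMs: number): string | null {
  if (c.status !== "CRITICAL" && c.status !== "WARNING") return "not-unhealthy";
  if (c.lastDeployedAt !== undefined) {
    const deployedMs = Date.parse(c.lastDeployedAt);
    const inCooldown = !Number.isNaN(deployedMs) && nowMs - deployedMs < COOLDOWN_HOURS * 3_600_000;
    if (inCooldown) return "cooldown";
  }
  if (c.skillChecks < MIN_SKILL_CHECKS) return "insufficient-evidence";
  if (c.status === "WARNING" && c.missedQueries === 0 && c.trend !== "down") return "weak-signal";
  return null;
}

// Filter through the gates, then sort CRITICAL before WARNING and declining skills first.
function selectCandidates(candidates: readonly Candidate[], nowMs: number = Date.now()): Candidate[] {
  const severity = (c: Candidate) => (c.status === "CRITICAL" ? 0 : 1);
  const declining = (c: Candidate) => (c.trend === "down" ? 0 : 1);
  return candidates
    .filter((c) => skipReason(c, nowMs) === null)
    .sort((a, b) => severity(a) - severity(b) || declining(a) - declining(b));
}
```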

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use epoch-ms comparison for timestamp gating, use cooldown constant in test

Fixes two CodeRabbit review issues:
- Timestamp comparisons in findRecentlyDeployedSkills and
  findRecentlyEvolvedSkills now use Date.parse + epoch-ms instead of
  lexicographic string comparison, which breaks on non-UTC offsets
- Test derives oldTimestamp from DEFAULT_COOLDOWN_HOURS instead of
  hardcoding 48, fixing the unused import lint error
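The timestamp bug fixed here is easy to reproduce. Two ISO-8601 strings with different UTC offsets can sort in the opposite order from the instants they denote, while `Date.parse` compares the actual instants. `isWithinHours` is a hypothetical helper for illustration, not the project's `findRecentlyDeployedSkills`:

```typescript
// Two instants written with different UTC offsets.
const deployedAt = "2026-03-15T10:00:00+02:00"; // 08:00 UTC
const cutoff = "2026-03-15T09:00:00Z"; // 09:00 UTC

// Lexicographic comparison treats deployedAt as later because "10" > "09"...
const lexicographicSaysRecent = deployedAt >= cutoff; // true (wrong)
// ...but the instant it denotes is an hour earlier than the cutoff.
const epochSaysRecent = Date.parse(deployedAt) >= Date.parse(cutoff); // false (correct)

// Offset-safe check: true if `iso` is within `hours` of `nowMs`; unparseable input is never recent.
function isWithinHours(iso: string, hours: number, nowMs: number = Date.now()): boolean {
  const ts = Date.parse(iso);
  if (Number.isNaN(ts)) return false;
  return nowMs - ts < hours * 60 * 60 * 1000;
}
```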

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix import formatting in orchestrate test to satisfy CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply biome formatting to orchestrate and tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* E2E autonomy proof harness for evolution pipeline (#51)

* feat: add e2e autonomy proof harness for evolution pipeline

Proves three core autonomous evolution claims with 8 deterministic tests:
- Autonomous deploy: orchestrate selects WARNING skill, evolve deploys real SKILL.md
- Regression detection: watch fires alert when pass rate drops below baseline
- Auto-rollback: deploy→regression→rollback restores original file from backup

Uses dependency injection to skip LLM calls while exercising real file I/O
(deployProposal writes, rollback restores, audit trail persists).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve Biome lint errors in autonomy-proof test

Sort imports, fix formatting, remove unused imports, replace non-null
assertions with optional chaining.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: persist orchestrate run reports in dashboard (#49)

* feat: persist orchestrate run reports and expose in dashboard SPA

Orchestrate now writes a structured run report (JSONL) after each run,
materialized into SQLite for the dashboard. A new "Orchestrate Runs"
panel on the Overview page lets users inspect what selftune did, why
skills were selected/skipped/deployed, and review autonomous decisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review findings

- Handle rolledBack state in OrchestrateRunsPanel badge rendering
- Show loading/error states instead of false empty state for orchestrate runs
- Move ORCHESTRATE_RUN_LOG to LOG_DIR (~/.claude) per log-path convention
- Validate limit param with 400 error for non-numeric input
- Derive run report counts from final candidates instead of stale summary
- Include error message in appendJsonl catch for diagnosability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update autonomy-proof test fixtures for new candidate selection gates

After merging dev, selectCandidates gained cooldown, evidence, and
weak-signal gates. The test fixtures used snapshot: null, which caused
skills to be skipped by the insufficient-evidence gate, and
trend: "declining" instead of the renamed trend value "down".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: unify summary totals to prevent CLI/dashboard metric drift

Both result.summary and runReport now derive from a single
finalTotals object computed from the final candidates array,
eliminating the possibility of divergent counts between CLI
output and persisted dashboard data.
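The single-source-of-truth pattern can be sketched as below. The `CandidateOutcome` fields, the action names, and the `buildOutputs` helper are hypothetical; the point is that the CLI summary and the run report both copy one `finalTotals` object computed from the final candidates array:

```typescript
// Hypothetical per-skill outcome; the real candidate record carries more fields.
interface CandidateOutcome {
  skill: string;
  action: "evolve" | "watch" | "skip";
  deployed: boolean;
}

interface Totals {
  total: number;
  evaluated: number;
  deployed: number;
  watched: number;
  skipped: number;
}

// Every count is derived from the final candidates array, nowhere else.
function computeFinalTotals(candidates: readonly CandidateOutcome[]): Totals {
  return {
    total: candidates.length,
    evaluated: candidates.filter((c) => c.action === "evolve").length,
    deployed: candidates.filter((c) => c.action === "evolve" && c.deployed).length,
    watched: candidates.filter((c) => c.action === "watch").length,
    skipped: candidates.filter((c) => c.action === "skip").length,
  };
}

// Both outputs copy the same object, so CLI and dashboard numbers cannot drift apart.
function buildOutputs(candidates: readonly CandidateOutcome[]) {
  const finalTotals = computeFinalTotals(candidates);
  return {
    summary: { ...finalTotals },
    runReport: { timestamp: new Date().toISOString(), ...finalTotals },
  };
}
```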

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: demo-ready CLI consolidation + autonomous orchestration (#52)

* Refresh architecture and operator docs

* docs: align docs and skill workflows with autonomy-first operator path

Reframes operator guide around autonomy-first setup, adds orchestrate runs
endpoint to architecture/dashboard docs, and updates skill workflows to
recommend --enable-autonomy as the default initialization path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: autoresearch-inspired UX improvements for demo readiness

- orchestrate --loop: continuous autonomous improvement cycle with configurable interval
- evolve: default cheap-loop on, add --full-model escape hatch, show diff after deploy
- bare `selftune` shows status dashboard instead of help text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: show selftune resource usage instead of session-level metrics in skill report

Skill report cards now display selftune's own LLM calls and evolution
duration per skill (from orchestrate_runs) instead of misleading
session-level token/duration aggregates. Also extracts tokens and
duration from transcripts into canonical execution facts for future use.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: consolidate CLI from 28 flat commands to 21 grouped commands

Group 15 related commands under 4 parent commands:
- selftune ingest <agent> (claude, codex, opencode, openclaw, wrap-codex)
- selftune grade [mode] (auto, baseline)
- selftune evolve [target] (body, rollback)
- selftune eval <action> (generate, unit-test, import, composability)

Update all 39 files: router, subcommand help text, SKILL.md, workflow
docs, design docs, README, PRD, CHANGELOG, and agent configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit PR review comments

- Filter skip/watch actions from selftune_stats run counts
- Restore legacy token_usage/duration_stats from execution_facts
- Cooperative SIGINT/SIGTERM shutdown for orchestrate loop
- Validate --window as positive integer with error message
- Add process.exit guard for bare selftune status fallthrough
- Update ARCHITECTURE.md import matrix for Dashboard dependencies
- Fix adapter count, code fence languages, and doc terminology
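A sketch of strict `--window` validation, assuming the flag arrives as a raw string. A digits-only regex is used because `parseInt("10days", 10)` returns `10` and would silently accept a malformed value such as "10days", which a later review round explicitly rejects. `parseWindow` is a hypothetical name:

```typescript
// Parse a --window value; reject anything that is not a positive base-10 integer.
function parseWindow(raw: string): number {
  const error = new Error(`--window must be a positive integer, got "${raw}"`);
  // Digits only: rejects "10days", "-5", "1.5", and "".
  if (!/^\d+$/.test(raw)) throw error;
  const value = Number(raw);
  // Rejects "0" and values too large to represent exactly.
  if (!Number.isSafeInteger(value) || value <= 0) throw error;
  return value;
}
```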

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining review comments and stale replay references

- Escape SQL LIKE wildcards in dashboard skill name query
- Add Audit + Rollback steps to SKILL.md feedback loop
- Fix stale "replay" references in quickstart help text and quickstart.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve CI lint failures

- Fix dashboard-server.ts indentation on LIKE escape pattern
- Prefix unused deployedCount/watchedCount with underscore
- Format api.ts import to multi-line per biome rules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: escape backslash in SQL LIKE pattern to satisfy CodeQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add system status page to dashboard with doctor diagnostics

Surfaces the doctor health checks (config, log files, hooks, evolution)
through a new /status route in the dashboard SPA, so humans can monitor
selftune health without touching the CLI.

- Add GET /api/v2/doctor endpoint to dashboard server
- Add DoctorResult/HealthCheck types to dashboard contract
- Create Status page with grouped checks, summary cards, auto-refresh
- Add System Status link in sidebar footer
- Update all related docs (ARCHITECTURE, HANDOFF, system-overview, Dashboard workflow)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add ESCAPE clause to LIKE query and fix stale replay label

- SQLite LIKE needs explicit ESCAPE '\\' for backslash escapes to work
- Rename "Replay failed" to "Ingest failed" in quickstart error output
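The wildcard escaping and the `ESCAPE` clause work together, as in this sketch. The `skill_usage` table appears elsewhere in this history, but the query text, the prefix-match pattern, and the helper names are assumptions for illustration. SQLite's LIKE has no default escape character, so the backslashes added by `escapeLike` only take effect with an explicit `ESCAPE` clause:

```typescript
// Escape LIKE metacharacters (%, _) and the escape character itself so the value
// is matched literally. A single pass avoids double-escaping the backslashes it adds.
function escapeLike(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Hypothetical query. In this JS string literal, '\\' is a single backslash in the SQL text.
const SKILL_LIKE_SQL = "SELECT * FROM skill_usage WHERE skill_name LIKE ? ESCAPE '\\'";

// Hypothetical helper: bind parameters for a literal prefix match on the skill name.
function skillNameParams(skillName: string): [string] {
  return [`${escapeLike(skillName)}%`];
}
```

Without the escaping, a skill named `my_skill` would also match `myXskill`, because `_` matches any single character.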

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address CodeRabbit review comments on status page PR

- Add "Other" fallback group for unknown check types in Status page
- Use compound key (name+idx) to avoid React key collisions
- Re-export DoctorResult types from types.ts instead of duplicating
- Fix orchestrate loop sleep deadlock on SIGINT/SIGTERM
- Replace stale SSE references with polling-based refresh in Dashboard docs
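Cooperative shutdown for `orchestrate --loop` can be sketched with an `AbortSignal`. The helper names are hypothetical. The key detail is that the sleep between cycles must wake when the signal aborts; otherwise a SIGINT received during a long interval has to wait out the whole interval before the loop notices:

```typescript
// Sleep for `ms`, but resolve early as soon as `signal` is aborted.
function interruptibleSleep(ms: number, signal: AbortSignal): Promise<"elapsed" | "aborted"> {
  return new Promise((resolve) => {
    if (signal.aborted) {
      resolve("aborted");
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // drop the pending timer so the process can exit promptly
      resolve("aborted");
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("elapsed");
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Run one cycle, then sleep; stop once the signal is aborted.
// An in-flight cycle is allowed to finish; only the wait between cycles is cut short.
async function runLoop(
  runOnce: () => Promise<void>,
  intervalMs: number,
  signal: AbortSignal,
): Promise<number> {
  let cycles = 0;
  while (!signal.aborted) {
    await runOnce();
    cycles++;
    if ((await interruptibleSleep(intervalMs, signal)) === "aborted") break;
  }
  return cycles;
}
```

In a CLI, `controller.abort()` would be called from `process.once("SIGINT", ...)` and `process.once("SIGTERM", ...)` handlers, and `intervalMs` would come from the loop-interval option (3600 seconds by default, per the Orchestrate.md fix later in this history).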

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: establish agent-first architecture principle across repo

selftune is a skill consumed by agents, not a CLI tool for humans.
Users install the skill and talk to their agent ("improve my skills"),
the agent reads SKILL.md, routes to workflows, and runs CLI commands.

- AGENTS.md: add Agent-First Architecture section + dev guidance
- ARCHITECTURE.md: add Agent-First Design Principle at top
- SKILL.md: add agent-addressing preamble ("You are the operator")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: demo-ready P0 fixes from architecture audit

Four fixes, implemented by parallel agents:

1. SKILL.md trigger keywords: added natural-language triggers across 10
   workflows + 13 new user-facing examples ("set up selftune", "improve
   my skills", "how are my skills doing", etc.)

2. Hook auto-merge: selftune init now automatically merges hooks into
   ~/.claude/settings.json for Claude Code — no manual settings editing.
   Initialize.md updated to reflect auto-install.

3. Cold-start fallback: quickstart detects empty telemetry after ingest
   and shows hook-discovered skills or a guidance message instead of blank
   output. No LLM calls, purely data-driven.

4. Dashboard build: added prepublishOnly script to ensure SPA is built
   before npm publish (CI already built it, but local publishes did not).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: defensive checks fallback and clarify reserved counters

- Status.tsx: default checks to [] if API returns undefined
- orchestrate.ts: annotate _deployedCount/_watchedCount as reserved

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: prioritize Claude Code, unify cron/schedule, remove dead code

- Mark Codex/OpenCode/OpenClaw as experimental across docs, SKILL.md,
  CLI help text, and README. Claude Code is the primary platform.
- Unify cron and schedule into `selftune cron` with --platform flag
  for agent-specific setup. `selftune schedule` kept as alias.
- Remove dead _deployedCount/_watchedCount counters from orchestrate.ts
  (summary already computed via array filters in Step 7).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document two operating modes and data architecture

- ARCHITECTURE.md: add Interactive vs Automated mode explanation,
  document JSONL-first data flow with SQLite as materialized view
- Cron.md: fix stale orchestrate schedule (weekly → every 6 hours),
  correct "agent runs" to "OS scheduler calls CLI directly"
- Orchestrate.md: add execution context table (interactive vs automated)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: rewrite all 22 workflow docs for agent-first consistency

Three parallel agents rewrote workflow docs so the agent (not the human)
is the operator:

Critical (3 files): Evolve.md, Evals.md, Baseline.md
- Pre-flight sections now have explicit selection-to-flag mapping tables
- Agent knows exactly how to parse user choices into CLI commands

Moderate (11 files): Initialize, Dashboard, Watch, Grade, Contribute,
UnitTest, Sync, AutoActivation, Orchestrate, Doctor, Replay
- "When to Use" sections rewritten as agent trigger conditions
- "Common Patterns" converted from user quotes to agent decision logic
- Steps use imperative agent voice throughout
- Replay.md renamed to "Ingest (Claude) Workflow" with compatibility note

Minor (8 files): Composability, Schedule, Cron, Badge, Workflows,
EvolutionMemory, ImportSkillsBench, Ingest
- Added missing "When to Use" sections
- Added error handling guidance
- Fixed agent voice consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add autonomous mode + connect agents to workflows

Autonomous mode:
- Evolve, Watch, Grade, Orchestrate workflows now document their
  behavior when called by selftune orchestrate (no user interaction,
  defaults used, pre-flight skipped, auto-rollback enabled)
- SKILL.md routing table marks autonomous workflows with †

Agent connections:
- All 4 agents (.claude/agents/) now have "Connection to Workflows"
  sections explaining when the main agent should spawn them
- Key workflows (Evolve, Doctor, Composability, Initialize) now have
  "Subagent Escalation" sections referencing the relevant agent
- SKILL.md agents table adds "When to spawn" column with triggers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove duplicate findRecentlyEvolvedSkills function

findRecentlyEvolvedSkills was identical to findRecentlyDeployedSkills.
Consolidated into one function used for both cooldown gating and
watch targeting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 21 CodeRabbit review comments

AGENTS.md:
- Add missing hook files and openclaw-ingest to project tree
- Use selftune cron as canonical scheduling command

Status.tsx:
- Add aria-label and title to refresh button for accessibility

ARCHITECTURE.md:
- Use canonical JSONL filenames matching constants.ts
- Add text language specifier to code block

index.ts:
- Add --help handlers for grade and evolve grouped commands
- Add --help handler for eval composability before parseArgs

quickstart.ts:
- Fix stale "Replay" comment to "Ingest"

Workflow docs:
- Cron.md: fix --format to --platform, add text fence, add --skill-path
- Evals.md: fix HTML entities to literal angle brackets
- Evolve.md: replace placeholder with actual --pareto flag
- Grade.md: clarify results come from grading.json not stdout
- Ingest.md: fix wrap-codex error guidance (no --verbose flag)
- Initialize.md: use full selftune command form, fix relative path
- Orchestrate.md: fix token cost contradiction, document --loop mode
- Sync.md: clarify synced=0 is valid, fix output parsing guidance

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: real-time improvement signal detection and reactive orchestration

Core feature: selftune now detects when users correct skill misses
("why didn't you use X?", "please use the commit skill") and triggers
focused improvement automatically when the session ends.

Signal detection (prompt-log.ts):
- Pure regex patterns detect corrections, explicit requests
- Extracts mentioned skill name from query text
- Appends to improvement_signals.jsonl (zero LLM cost)
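Regex-based detection can be sketched as follows. The two patterns are hypothetical, modeled on the example phrases above; the capture group uses `[\w-]+` so hyphenated skill names survive, in line with a later fix. The `ImprovementSignal` fields are illustrative only:

```typescript
// Hypothetical signal record; logs.md documents the real improvement_signals.jsonl fields.
interface ImprovementSignal {
  signal_type: "correction" | "explicit_request";
  mentioned_skill: string | null;
  query: string;
}

// Hypothetical patterns; group 1 captures a (possibly hyphenated) skill name.
const SIGNAL_PATTERNS: ReadonlyArray<{ type: ImprovementSignal["signal_type"]; re: RegExp }> = [
  { type: "correction", re: /\bwhy\s+didn'?t\s+you\s+use\s+(?:the\s+)?([\w-]+)/i },
  { type: "explicit_request", re: /\bplease\s+use\s+(?:the\s+)?([\w-]+)/i },
];

// Pure function of the prompt text: no LLM call, no I/O.
function detectImprovementSignal(query: string): ImprovementSignal | null {
  for (const { type, re } of SIGNAL_PATTERNS) {
    const match = re.exec(query);
    if (match) return { signal_type: type, mentioned_skill: match[1] ?? null, query };
  }
  return null;
}
```

A detected signal could then be appended to the log as `JSON.stringify(signal) + "\n"`.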

Reactive trigger (session-stop.ts):
- Checks for pending signals when session ends
- Spawns background selftune orchestrate if signals exist
- Lockfile prevents concurrent runs (30-min stale threshold)
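An atomic lockfile with a stale threshold can be sketched with Node's `fs` module. Opening with the `"wx"` flag fails with `EEXIST` if the file already exists, so acquisition is a single atomic step (the approach a later review round adopted). The function names, the PID contents, and the reclaim-once policy are assumptions:

```typescript
import { closeSync, openSync, statSync, unlinkSync, writeSync } from "node:fs";

const STALE_LOCK_MS = 30 * 60 * 1000; // 30-minute stale threshold

// Try to take the lock; reclaim it once if the existing lock file is stale.
function acquireLock(lockPath: string, nowMs: number = Date.now()): boolean {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      const fd = openSync(lockPath, "wx"); // atomic: throws EEXIST if the file exists
      writeSync(fd, String(process.pid));
      closeSync(fd);
      return true;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
      let ageMs: number;
      try {
        ageMs = nowMs - statSync(lockPath).mtimeMs;
      } catch {
        continue; // lock vanished between open and stat; retry
      }
      if (ageMs < STALE_LOCK_MS) return false; // held by a live run
      try {
        unlinkSync(lockPath); // stale: remove and retry once
      } catch {
        // another process may already have removed it
      }
    }
  }
  return false;
}

function releaseLock(lockPath: string): void {
  try {
    unlinkSync(lockPath);
  } catch {
    // already released
  }
}
```

A caller would wrap the orchestrate run in `try { ... } finally { releaseLock(path) }`, so only a killed process leaves a lock behind, and the 30-minute threshold eventually reclaims it.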

Signal-aware orchestrator (orchestrate.ts):
- Reads pending signals at startup (no new CLI flags)
- Boosts priority of signaled skills (+150 per signal, cap 450)
- Signaled skills bypass evidence and UNGRADED gates
- Marks signals consumed after run completes
- Lockfile acquire/release wrapping full orchestrate body
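The signal boost and gate bypass reduce to a few lines. The +150-per-signal boost, the 450 cap, the 3-check evidence minimum, and the bypass of the evidence and UNGRADED gates come from this history; the candidate fields and the additive `baseScore` are assumptions:

```typescript
const BOOST_PER_SIGNAL = 150;
const MAX_SIGNAL_BOOST = 450;
const MIN_SKILL_CHECKS = 3;

// Hypothetical candidate fields for illustration.
interface SignalCandidate {
  skill: string;
  status: "CRITICAL" | "WARNING" | "HEALTHY" | "UNGRADED";
  skillChecks: number;
  baseScore: number;
  pendingSignals: number; // unconsumed improvement signals naming this skill
}

function signalBoost(pendingSignals: number): number {
  return Math.min(pendingSignals * BOOST_PER_SIGNAL, MAX_SIGNAL_BOOST);
}

// Signaled skills skip the evidence and UNGRADED gates; others must pass both.
function passesEvidenceGates(c: SignalCandidate): boolean {
  if (c.pendingSignals > 0) return true;
  return c.status !== "UNGRADED" && c.skillChecks >= MIN_SKILL_CHECKS;
}

// Highest priority first.
function rankSignaled(candidates: readonly SignalCandidate[]): SignalCandidate[] {
  const priority = (c: SignalCandidate) => c.baseScore + signalBoost(c.pendingSignals);
  return candidates.filter(passesEvidenceGates).sort((a, b) => priority(b) - priority(a));
}
```

With these numbers, three or more pending signals all earn the same 450-point boost, so repeated corrections cannot push one skill arbitrarily far ahead.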

Tests: 32 new tests across 2 files (signal detection + orchestrator)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: document signal-reactive improvement across architecture docs

- ARCHITECTURE.md: add Signal-Reactive Improvement section with mermaid
  sequence diagram showing signal flow from prompt-log to orchestrate
- Orchestrate.md: add Signal-Reactive Trigger section with guard rails
- evolution-pipeline.md: add signal detection as pipeline input
- system-overview.md: add signal-reactive path to system overview
- logs.md: document improvement_signals.jsonl format and fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 14 CodeRabbit review comments (round 4)

Agents:
- diagnosis-analyst: resolve activation contradiction (subagent-only)
- evolution-reviewer: inspect recorded eval file, not regenerated one
- integration-guide: carry --skill flag through to evolve command

Dashboard:
- Status.tsx: defensive fallback for unknown health status values

CLI:
- index.ts: remove redundant process.exit(0) after statusMain
- index.ts: strict regex validation for --window (reject "10days")

Quickstart:
- Remove misleading [2/3] prefix from post-step check

Workflows:
- SKILL.md: add text language specifier to feedback loop diagram
- Initialize.md: add blank line + text specifier to code block
- Orchestrate.md: fix sync step to use selftune sync not ingest claude
- Doctor.md: route missing-telemetry fixes by agent platform
- Evals.md: add skill-path to synthetic pre-flight, note haiku alias
- Ingest.md: wrap-codex uses wrapper not hooks for telemetry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: dependency map, README refresh, dashboard signal exec plan

1. AGENTS.md: Add Change Propagation Map — "if you change X, update Y"
   table that agents check before committing. Prevents stale docs.

2. README.md: Refresh for v0.2 architecture:
   - Agent-first framing ("tell your agent" not "run this command")
   - Grouped commands table (ingest, grade, evolve, eval, auto)
   - Signal-reactive detection mentioned in Detect section
   - Automate section with selftune cron setup
   - Removed CLI-centric use case descriptions

3. Exec plan for dashboard signal integration (planned, not started):
   - Schema + materialization + queries + contract + API + UI
   - 3 parallel agent workstreams, ~4.5 hours estimated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: migrate repo URLs from WellDunDun to selftune-dev org

Update all repo path references (badges, clone URLs, install command,
contribute PR target, security tab link, llms.txt) from personal
WellDunDun/selftune to org selftune-dev/selftune.

Kept as WellDunDun (personal account, not repo path):
- CODEOWNERS (@WellDunDun)
- FUNDING.yml (sponsors/WellDunDun)
- LICENSE copyright
- PRD owner field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add repo org/name migration to change propagation map

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 13 CodeRabbit review comments (round 5)

Code fixes:
- prompt-log.ts: broaden skill capture regex to [\w-]+ for hyphenated names
- session-stop.ts: atomic lock acquisition with openSync("wx") + cleanup
- orchestrate.ts: re-read signal log before write to prevent race condition

Dashboard:
- Status.tsx: defensive defaults for healthy, summary, timestamp

Docs:
- ARCHITECTURE.md: use selftune cron setup as canonical scheduler
- Orchestrate.md: fix loop-interval default (3600s not 300s)
- Evals.md: fix option numbering (1-5) and selection mapping (4a/4b/4c)
- Initialize.md: use selftune init --force instead of repo-relative paths
- logs.md: document signal consumption as exception to append-only

Tests:
- signal-detection: fix vacuous unknown-skill test
- signal-orchestrate: exercise missing-log branch with non-empty signals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: llms.txt branch-agnostic links, README experimental clarity

- llms.txt: /blob/master/ → /blob/HEAD/ for branch-agnostic URLs
- README line 28: clarify Claude Code primary, others experimental
- README line 38: remove redundant "Within minutes"
- README footer: match experimental language from Platforms section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>