The Learning Loop

The learning loop is how Cortex turns raw tool usage into behavioral rules. It is an offline pipeline — not real-time — that processes observation logs, extracts patterns, scores their confidence, and promotes the strongest patterns into Claude Code's permanent rule set.

The loop runs on a 7-day rolling window. Patterns must appear consistently over multiple sessions before they earn promotion. This prevents one-off behaviors from becoming permanent habits.

Pipeline Overview

 OBSERVE          ANALYZE           INSTINCT          PROMOTE           DECAY
 (PostToolUse)    (analyze.py)      (instinct files)  (generated-rules) (archive)

 +----------+     +----------+      +----------+      +----------+      +----------+
 | Tool call|---->| 7-day    |----->| Pattern  |----->| Rule in  |      | Archived |
 | logged   |     | rolling  |      | file w/  |      | Claude's |      | pattern  |
 | to JSONL |     | window   |      | score    |      | behavior |      | below    |
 +----------+     +----------+      +----------+      +----------+      | threshold|
      |                |                  |                 |            +----------+
      |                |                  |                 |                 ^
      v                v                  v                 v                 |
  observe.sh       analyze.py        homunculus/       generated-         confidence
  (async)          (on Stop)         instincts/        rules.md           drops over
                                                                          time

Stage 1: Observe

Component: observe.sh (PostToolUse hook)

Every tool call Claude makes is logged to a JSONL file in the homunculus directory. Each observation records:

  • Tool name (Bash, Read, Edit, Grep, etc.)
  • Timestamp
  • Success or failure
  • Duration in milliseconds
  • The two preceding tools (forming a 3-tool chain)
  • Session identifier

The observer runs asynchronously so it does not block Claude's workflow. During a session it only appends to the log; it never reads or rewrites existing entries.

Example JSONL entry:

{"ts":"2026-03-04T14:22:31Z","tool":"Edit","ok":true,"ms":89,"prev":"Read","prev2":"Grep","session":"2026-03-04-a"}
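As a rough illustration of how an entry like the one above could be consumed, the sketch below parses one JSONL line into a typed record and rebuilds its 3-tool chain. The `Observation` class and `parse_observation` helper are hypothetical names, not part of Cortex, and the chain order (`prev2`, then `prev`, then `tool`) is an assumption based on the field names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One logged tool call (hypothetical structure mirroring the JSONL fields)."""
    ts: str
    tool: str
    ok: bool
    ms: int
    prev: Optional[str]
    prev2: Optional[str]
    session: str

    @property
    def chain(self) -> Optional[tuple]:
        """Return the 3-tool chain ending at this call, oldest tool first.

        Assumes prev is the immediately preceding tool and prev2 the one before it.
        """
        if self.prev is None or self.prev2 is None:
            return None
        return (self.prev2, self.prev, self.tool)


def parse_observation(line: str) -> Observation:
    """Parse one JSONL line; prev/prev2 are treated as optional."""
    raw = json.loads(line)
    return Observation(
        ts=raw["ts"],
        tool=raw["tool"],
        ok=raw["ok"],
        ms=raw["ms"],
        prev=raw.get("prev"),
        prev2=raw.get("prev2"),
        session=raw["session"],
    )


EXAMPLE_LINE = (
    '{"ts":"2026-03-04T14:22:31Z","tool":"Edit","ok":true,"ms":89,'
    '"prev":"Read","prev2":"Grep","session":"2026-03-04-a"}'
)
obs = parse_observation(EXAMPLE_LINE)
print(obs.chain)  # ('Grep', 'Read', 'Edit')
```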

Stage 2: Analyze

Component: analyze.py (runs on Stop hook, or manually)

The analyzer processes all observation files within a 7-day rolling window. It performs three operations:

Chain extraction. Groups consecutive tool calls into 3-tool chains and counts their frequency. A chain like Bash -> Read -> Edit that appears 45 times across sessions is a strong workflow signal.
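A minimal sketch of chain counting, assuming overlapping windows of three consecutive calls within a single session. `extract_chains` is a hypothetical helper, not the function in analyze.py.

```python
from collections import Counter
from typing import Iterable


def extract_chains(tools: Iterable[str]) -> Counter:
    """Count every run of three consecutive tool calls in one session."""
    seq = list(tools)
    chains = Counter()
    # Slide a window of size 3 over the call sequence.
    for i in range(len(seq) - 2):
        chains[tuple(seq[i:i + 3])] += 1
    return chains


session_tools = ["Bash", "Read", "Edit", "Bash", "Read", "Edit"]
counts = extract_chains(session_tools)
print(counts[("Bash", "Read", "Edit")])  # 2
```

Summing the per-session counters (`Counter` supports `+`) would give the window-wide frequency for each chain.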

Error-retry detection. Identifies cases where a Bash command fails and is immediately retried. High retry counts suggest pre-validation opportunities.
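One possible reading of "immediately retried" is that the next logged call after a failed Bash call is also Bash. The observation fields listed earlier do not include the command text, so this sketch cannot confirm that the same command was re-run. `count_error_retries` is a hypothetical name.

```python
from typing import Sequence, Tuple


def count_error_retries(calls: Sequence[Tuple[str, bool]]) -> int:
    """Count failed Bash calls whose very next call is also Bash.

    Each item is (tool_name, succeeded), in the order the calls were logged.
    """
    retries = 0
    for (tool, ok), (next_tool, _next_ok) in zip(calls, calls[1:]):
        if tool == "Bash" and not ok and next_tool == "Bash":
            retries += 1
    return retries


calls = [
    ("Bash", False),  # fails, followed by Bash: retry 1
    ("Bash", False),  # fails again, followed by Bash: retry 2
    ("Bash", True),
    ("Read", True),
    ("Bash", False),  # fails, but followed by Read: not a retry
    ("Read", True),
]
print(count_error_retries(calls))  # 2
```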

Confidence scoring. Each pattern receives a confidence score based on three factors:

confidence = frequency x consistency x recency

Where:

  • Frequency is the raw count of observations, normalized against the most common pattern. A pattern seen 45 times when the most common is 50 scores 0.90 on frequency.
  • Consistency is the ratio of sessions where the pattern appeared vs total sessions in the window. A pattern appearing in 6 of 7 sessions scores 0.86 on consistency.
  • Recency applies temporal decay: recent observations weigh more than old ones. It uses the decay function from lib/temporal-decay.js with a configurable half-life (default: 7 days).

The combined score is expressed as a percentage. Patterns scoring below 50% are dropped from active tracking.
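The scoring formula can be sketched as follows, using the frequency and consistency figures from the bullets above. The function names are hypothetical, and the recency value of 0.95 is an assumed input for illustration rather than something computed here.

```python
ACTIVE_THRESHOLD = 0.50  # patterns below this are dropped from active tracking


def frequency_score(count: int, max_count: int) -> float:
    """Raw count normalized against the most common pattern."""
    return count / max_count if max_count else 0.0


def consistency_score(sessions_with_pattern: int, total_sessions: int) -> float:
    """Share of sessions in the window where the pattern appeared."""
    return sessions_with_pattern / total_sessions if total_sessions else 0.0


def confidence(frequency: float, consistency: float, recency: float) -> float:
    """confidence = frequency x consistency x recency, each in [0, 1]."""
    return frequency * consistency * recency


freq = frequency_score(45, 50)   # 45 observations, most common pattern has 50
cons = consistency_score(6, 7)   # seen in 6 of 7 sessions
score = confidence(freq, cons, recency=0.95)  # 0.95 is an assumed recency value

print(f"{freq:.2f} {cons:.2f} {score:.0%}")  # 0.90 0.86 73%
print(score >= ACTIVE_THRESHOLD)             # True
```

Because the three factors are multiplied, a pattern needs to score high on all of them to approach the promotion threshold; a weak factor cannot be offset by the others.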

Output: Instinct files written to homunculus/instincts/, one per pattern, plus a daily-summary.md rollup.

Constraints: Maximum 30 active instinct files at any time. Archive pruning removes files older than 30 days.

Stage 3: Instinct

Component: Instinct files in homunculus/instincts/

An instinct file represents a recognized but not-yet-promoted pattern. It contains:

# Instinct: Bash->Read->Edit chain
- Confidence: 88%
- Observations: 45
- First seen: 2026-02-26
- Last seen: 2026-03-04
- Sessions: 6/7

Instinct files are intermediate state. They are visible to the analyzer (which updates them) and to the promotion check (which reads them), but they do not directly affect Claude's behavior. They are hypotheses about useful patterns, waiting for sufficient evidence.
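To make the file layout concrete, here is a sketch that renders the fields shown above into the same text format. The `Instinct` class is hypothetical; how analyze.py actually builds these files is not specified in this document.

```python
from dataclasses import dataclass


@dataclass
class Instinct:
    """In-memory form of one instinct file (hypothetical structure)."""
    name: str
    confidence: float      # 0.0 to 1.0
    observations: int
    first_seen: str        # ISO date, e.g. "2026-02-26"
    last_seen: str
    sessions_seen: int
    sessions_total: int

    def render(self) -> str:
        """Produce the instinct file body in the layout shown above."""
        return "\n".join([
            f"# Instinct: {self.name}",
            f"- Confidence: {self.confidence:.0%}",
            f"- Observations: {self.observations}",
            f"- First seen: {self.first_seen}",
            f"- Last seen: {self.last_seen}",
            f"- Sessions: {self.sessions_seen}/{self.sessions_total}",
        ])


instinct = Instinct("Bash->Read->Edit chain", 0.88, 45,
                    "2026-02-26", "2026-03-04", 6, 7)
text = instinct.render()
print(text)
```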

Stage 4: Promote

Component: /bye skill + analyze.py promotion check

A pattern is promoted when it meets both thresholds:

  • Confidence >= 90%
  • Observations >= 5 (across at least 2 distinct sessions)

Promoted patterns are written to generated-rules.md, which is loaded as a cross-cutting rule by Claude Code on every session. Once promoted, the pattern directly influences Claude's behavior.

Example promoted rules:

- **[95%, 127x seen]** Frequent tool chain: Bash->Bash->Bash (seen 127x)
- **[95%, 65x seen]** Frequent tool chain: Read->Read->Read (seen 65x)
- **[93%, 21x seen]** Bash error-retry pairs detected (21x). Consider pre-validation.

The promotion check runs during /bye (session consolidation). This ensures patterns are evaluated when observation data is fresh and complete.
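The promotion criteria and the rule line format shown above could be expressed roughly like this. Both function names are hypothetical, and the sketch covers only chain rules, not the error-retry wording.

```python
from typing import Sequence

PROMOTION_CONFIDENCE = 0.90
MIN_OBSERVATIONS = 5
MIN_SESSIONS = 2


def should_promote(confidence: float, observations: int, distinct_sessions: int) -> bool:
    """Apply the confidence threshold and the observation/session minimums."""
    return (
        confidence >= PROMOTION_CONFIDENCE
        and observations >= MIN_OBSERVATIONS
        and distinct_sessions >= MIN_SESSIONS
    )


def format_chain_rule(chain: Sequence[str], confidence: float, observations: int) -> str:
    """Render a promoted tool chain in the generated-rules.md line format."""
    joined = "->".join(chain)
    return (
        f"- **[{confidence:.0%}, {observations}x seen]** "
        f"Frequent tool chain: {joined} (seen {observations}x)"
    )


print(should_promote(0.95, 30, 4))  # True
print(should_promote(0.78, 14, 4))  # False: confidence below 90%
print(format_chain_rule(["Grep", "Read", "Edit"], 0.95, 30))
```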

Stage 5: Decay

Patterns that are not reinforced lose confidence over time. The decay function (lib/temporal-decay.js) applies exponential decay with a configurable half-life:

decayed_score = score x 0.5^(days_since_last_seen / half_life)

Default half-life is 7 days. A pattern with 90% confidence that is not seen for 14 days drops to ~22.5% (0.90 x 0.5^2), well below the 50% active-tracking threshold. At that point, the archive pruner removes it from active instinct files and moves it to the archive.
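A Python rendering of the decay formula, for illustration only; the actual implementation lives in lib/temporal-decay.js and may differ. `days_until_below` is a hypothetical helper that inverts the formula to show when a score would cross a given threshold.

```python
import math


def decayed_score(score: float, days_since_last_seen: float,
                  half_life_days: float = 7.0) -> float:
    """decayed_score = score x 0.5^(days_since_last_seen / half_life)."""
    return score * 0.5 ** (days_since_last_seen / half_life_days)


def days_until_below(score: float, threshold: float,
                     half_life_days: float = 7.0) -> float:
    """Days without reinforcement before score decays to the threshold."""
    if score <= threshold:
        return 0.0
    return half_life_days * math.log2(score / threshold)


print(round(decayed_score(0.90, 14), 3))     # 0.225
print(round(decayed_score(0.90, 7), 3))      # 0.45
print(round(days_until_below(0.90, 0.50), 1))  # 5.9
```

Under these defaults, a 90% pattern would fall under the 50% tracking threshold after roughly six days without reinforcement.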

This prevents stale patterns from cluttering the rule set. If a workflow changes — say, a new tool replaces an old chain — the old pattern naturally fades without manual intervention.

Before and After

Without the learning loop, Claude approaches every session as a blank slate (beyond its training). It has no awareness of the user's workflow patterns, common tool sequences, or recurring mistakes. Every session rediscovers the same approaches.

| Situation | Without Loop | With Loop |
|---|---|---|
| User opens a file for editing | Claude may Grep, then Read the whole file, then Edit | Claude knows "Grep -> targeted Read -> Edit" is the established pattern and follows it directly |
| Bash command fails | Claude retries immediately, possibly multiple times | Promoted rule flags error-retry pairs, prompting pre-validation before execution |
| Multi-file implementation | Claude picks an ad-hoc order | Promoted chains guide tool sequencing: Read dependencies first, then implement, then test |
| Session start | Cold start, no context | Synergatis loads state, Cortex routes to relevant capabilities, promoted rules shape behavior |

With the learning loop, Claude's behavior converges toward the user's actual workflow over time. The loop is conservative by design — it takes multiple sessions and consistent evidence before a pattern earns behavioral influence.

Concrete Example

Observation phase (sessions 1-4):

The PostToolUse observer logs the following across four sessions:

Session 1: Bash(grep) -> Read(target) -> Edit(fix)     x3
Session 2: Bash(grep) -> Read(target) -> Edit(fix)     x5
Session 3: Bash(grep) -> Read(target) -> Edit(fix)     x4
Session 4: Grep -> Read(target) -> Edit(fix)            x6
            Bash(grep) -> Read(target) -> Edit(fix)     x2

Analysis phase (after session 4):

analyze.py processes the 7-day window:

  • Pattern Bash->Read->Edit: 14 observations across 4/4 sessions
  • Pattern Grep->Read->Edit: 6 observations across 1/4 sessions

Confidence for Bash->Read->Edit:

  • Frequency: 14 observations (normalized against the most common pattern in the window) = 0.82
  • Consistency: 4/4 sessions = 1.00
  • Recency: all within 7 days = ~0.95
  • Combined: 0.82 x 1.00 x 0.95 = 78% -- not yet promoted (threshold: 90%)

Confidence for Grep->Read->Edit:

  • Frequency: 6 observations = 0.35
  • Consistency: 1/4 sessions = 0.25
  • Combined: too low for consideration
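The counts in this analysis can be reproduced from the session log above. In the sketch below, the per-session tallies are copied from the example, while the frequency (0.82) and recency (0.95) factors are taken as given from the text rather than derived. `summarize` is a hypothetical helper.

```python
# Chain counts per session, copied from the observation phase above.
sessions = {
    1: {"Bash->Read->Edit": 3},
    2: {"Bash->Read->Edit": 5},
    3: {"Bash->Read->Edit": 4},
    4: {"Grep->Read->Edit": 6, "Bash->Read->Edit": 2},
}


def summarize(pattern: str):
    """Return (observations, sessions_seen, consistency) for one pattern."""
    counts = [s.get(pattern, 0) for s in sessions.values()]
    observations = sum(counts)
    sessions_seen = sum(1 for c in counts if c > 0)
    return observations, sessions_seen, sessions_seen / len(sessions)


bash_obs, bash_sessions, bash_consistency = summarize("Bash->Read->Edit")
grep_obs, grep_sessions, grep_consistency = summarize("Grep->Read->Edit")

# Frequency and recency factors as stated in the example text.
combined = 0.82 * bash_consistency * 0.95

print(bash_obs, bash_sessions, bash_consistency)  # 14 4 1.0
print(grep_obs, grep_sessions, grep_consistency)  # 6 1 0.25
print(f"{combined:.0%}")                          # 78%
```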

Observation phase (sessions 5-7):

The pattern evolves — Grep replaces Bash for search:

Session 5: Grep -> Read -> Edit    x8
Session 6: Grep -> Read -> Edit    x7
Session 7: Grep -> Read -> Edit    x9

Analysis phase (after session 7):

With sessions 1-3 now outside the 7-day window, the analyzer sees:

  • Pattern Grep->Read->Edit: 30 observations across 4 sessions (4, 5, 6, 7)
  • Frequency: 30 observations = 1.00 (dominant pattern in window)
  • Consistency: 4/4 active sessions = 1.00
  • Recency: 24 of 30 observations within the last 3 sessions = ~0.95
  • Combined: 1.00 x 1.00 x 0.95 = 95%
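Sessions 1-3 drop out of this analysis because their observations are older than the rolling window. A minimal sketch of such a cut-off, assuming it is applied per observation using the `ts` field and measured from the time of analysis (`parse_ts` and `in_window` are hypothetical names):

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp in the log format, e.g. 2026-03-04T14:22:31Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def in_window(ts: str, now: datetime, window_days: int = 7) -> bool:
    """True if the observation is no older than window_days at analysis time."""
    age = now - parse_ts(ts)
    return timedelta(0) <= age <= timedelta(days=window_days)


now = datetime(2026, 3, 4, 15, 0, 0, tzinfo=timezone.utc)
print(in_window("2026-03-04T14:22:31Z", now))  # True: logged the same day
print(in_window("2026-02-24T09:00:00Z", now))  # False: more than 7 days old
```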

Promotion: The pattern crosses both thresholds (95% confidence, 30 observations >= 5 minimum). It is written to generated-rules.md:

- **[95%, 30x seen]** Frequent tool chain: Grep->Read->Edit (seen 30x)

Effect: In session 8 and beyond, Claude's rule set includes this pattern. When faced with a search-then-edit task, Claude's behavior is nudged toward the Grep->Read->Edit sequence rather than ad-hoc alternatives.

Configuration

| Parameter | Default | Location |
|---|---|---|
| Rolling window | 7 days | analyze.py |
| Promotion confidence threshold | 90% | analyze.py |
| Minimum observations for promotion | 5 | analyze.py |
| Decay half-life | 7 days | lib/decay-config.json |
| Max active instinct files | 30 | analyze.py |
| Archive pruning age | 30 days | analyze.py |
| Forced eval interval | Every 5 turns | hooks/prompt-router.sh |

All parameters are configurable. The defaults are tuned for daily use with 1-3 sessions per day. Users with higher session frequency may want to increase the promotion threshold to avoid premature pattern lock-in.
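For reference, the defaults above can be gathered into one structure. This is only an illustrative sketch: the `LoopConfig` class and its field names are hypothetical, and the real settings are spread across the files listed in the table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoopConfig:
    """Learning-loop defaults from the table above (field names are hypothetical)."""
    rolling_window_days: int = 7
    promotion_confidence: float = 0.90
    min_observations: int = 5
    decay_half_life_days: float = 7.0
    max_active_instincts: int = 30
    archive_pruning_age_days: int = 30
    forced_eval_interval_turns: int = 5


defaults = LoopConfig()

# A user running many sessions per day might raise the promotion threshold.
stricter = replace(defaults, promotion_confidence=0.95)

print(defaults.promotion_confidence)  # 0.9
print(stricter.promotion_confidence)  # 0.95
```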