Skip to content

feat(hooks+llm): turn-based memory injection on UserPromptSubmit + Claude CLI HyDE provider#357

Open
Huntehhh wants to merge 5 commits into
buildingjoshbetter:mainfrom
Huntehhh:feat/turn-based-injection-and-claude-cli-llm
Open

feat(hooks+llm): turn-based memory injection on UserPromptSubmit + Claude CLI HyDE provider#357
Huntehhh wants to merge 5 commits into
buildingjoshbetter:mainfrom
Huntehhh:feat/turn-based-injection-and-claude-cli-llm

Conversation

@Huntehhh
Copy link
Copy Markdown
Contributor

Summary

Three coupled changes targeting Pro-tier subscription users on Windows.

Turn-based injection (UserPromptSubmit) — fires once per session when user-turn count ≥ 13 OR current prompt length > 333 chars (whichever first). Query = last 6 user turns concatenated, separated by \n---\n, giving the vector pipeline richer semantic context than the existing raw-prompt path. Marker file at ~/.truememory/turn_injected/<safe_session_id> dedupes re-fires; pattern mirrors should_extract_session / mark_session_extracted. The existing _detect_recall regex path is untouched and still fires every turn for explicit recall questions — both paths can emit on the same prompt, combined into one additionalContext JSON line.

Claude CLI priority-1 _build_llm_fn provider_build_claude_cli_llm wraps the existing truememory.ingest.models._complete_claude_cli helper into the (prompt) -> str shape search_deep expects. Pro tier now gets HyDE for free on subscription auth — no API key required. Opt out via TRUEMEMORY_DISABLE_CLAUDE_CLI=1.

Sub-agent hook skip — new is_subagent_invocation() helper in _shared.py; all 3 hooks early-return for Claude Code Task-tool sub-agents (detected via agent_id field in stdin OR /subagents/ in transcript_path). Stops per-sub-agent terminal flash + orchestrator-prompt memory pollution.

Depends on #345 (fcntl Windows portability) — cherry-picked into this branch; will dedupe on rebase once #345 merges.

Changes

File Change
truememory/mcp_server.py +_build_claude_cli_llm, prepend claude_cli to _LLM_PROVIDERS, add no-key path + TRUEMEMORY_DISABLE_CLAUDE_CLI kill-switch in _build_llm_fn
truememory/ingest/hooks/_shared.py +is_subagent_invocation, +TURN_INJECTED_DIR, +count_user_turns, +already_injected, +mark_injected
truememory/ingest/hooks/user_prompt_submit.py +_build_turn_based_query, +_try_turn_based_injection, sub-agent guard in main(), single-emit restructure so explicit-recall + turn-based combine into one additionalContext line
truememory/ingest/hooks/session_start.py Sub-agent guard at top of main()
truememory/ingest/hooks/stop.py Sub-agent guard at top of main()
tests/test_llm_fn_init.py +6 Claude CLI cases; fixture defaults TRUEMEMORY_DISABLE_CLAUDE_CLI=1 so legacy API-key tests don't pick up the new priority-1 provider
tests/test_turn_based_injection.py NEW — 19 cases covering helpers, gate logic, marker dedup, kill switch, search-exception graceful-degrade
tests/ingest/test_hooks_windows_portability.py From #345 cherry-pick — will dedupe on rebase

New env vars (all overridable, sensible defaults)

Var Default Effect
TRUEMEMORY_INJECT_AFTER_TURNS 13 Turn-count trigger threshold
TRUEMEMORY_INJECT_AFTER_CHARS 333 Prompt-length trigger threshold
TRUEMEMORY_INJECT_QUERY_TURNS 6 Recent turns concatenated into the query
TRUEMEMORY_INJECT_QUERY_TURN_CHARS 500 Per-turn truncation in the query (prevents pasted code blowing up the embedding input)
TRUEMEMORY_INJECT_RECALL_LIMIT 15 Memories fetched
TRUEMEMORY_INJECT_BULLET_CHARS 300 Per-bullet truncation in the emitted block
TRUEMEMORY_INJECT_DISABLED unset Kill switch — disables turn-based injection entirely
TRUEMEMORY_DISABLE_CLAUDE_CLI unset Force _build_llm_fn to skip Claude CLI and fall through to API-key providers
TRUEMEMORY_HYDE_CLAUDE_MODEL claude-haiku-4-5 Model used for HyDE doc generation via Claude CLI

Test plan

  • pytest tests/test_llm_fn_init.py tests/test_turn_based_injection.py tests/ingest/test_hooks_windows_portability.py -v → 37 pass / 1 skip (fcntl.flock path, Linux-only)
  • Sub-agent detection: 6/6 cases pass (agent_id, agent_type, /subagents/ path, mixed-slash Windows paths, empty input)
  • In-process smoke of _try_turn_based_injection with stubbed Memory — emits <truememory-context> block at trigger=turns, writes marker, re-fire returns None
  • On a machine with claude.exe + OPENROUTER_API_KEY set: truememory_search_deep selects claude_cli provider; verify _current_llm_provider_name == "claude_cli" and zero OpenRouter spend
  • End-to-end manual smoke in a fresh Claude Code session: type a 350-char prompt → <truememory-context> emitted (~500ms HyDE pause); subsequent prompts → no re-fire; marker at ~/.truememory/turn_injected/<session_id> exists with trigger, query_chars, n_results, timestamp
  • Sub-agent Task spawn: hooks exit in <10ms, no recall / buffer write / ingestion side effects, no terminal flash

Breaking changes

Provider priority shift for users with both Claude CLI on PATH and an API key set:

- _LLM_PROVIDERS = (
-     ("anthropic", "ANTHROPIC_API_KEY", ...),
-     ("openrouter", "OPENROUTER_API_KEY", ...),
-     ("openai", "OPENAI_API_KEY", ...),
- )
+ _LLM_PROVIDERS = (
+     ("claude_cli", "", "", "_build_claude_cli_llm"),   # NEW — priority 1
+     ("anthropic", "ANTHROPIC_API_KEY", ...),
+     ("openrouter", "OPENROUTER_API_KEY", ...),
+     ("openai", "OPENAI_API_KEY", ...),
+ )

Old behavior: API key wins → user pays per HyDE call.
New behavior: Claude CLI wins → user pays nothing (subscription auth).

Users who deliberately want the API-key path (e.g., centralized billing, rate-limit isolation, paid Anthropic plan for HyDE consistency) set TRUEMEMORY_DISABLE_CLAUDE_CLI=1. Documented in the new env-var table above.

Users without claude.exe on PATH see identical behavior — the new provider falls through to the existing API-key chain.

Co-Authored-By: claude-opus-4-7 wontreply@getfucked.ai

Huntehhh and others added 5 commits May 18, 2026 21:46
truememory/ingest/hooks/_shared.py and session_start.py both had bare
`import fcntl` at module top with no `try / except ImportError` guard.
fcntl is POSIX-only — on Windows every import raised ModuleNotFoundError
before any function ran, which cascaded to every Claude Code hook
subprocess that imports from _shared (SessionStart, UserPromptSubmit,
Stop). Silent failure mode: no memory extraction after sessions, no
context injection at session start, no buffer writes on user prompts.

Fix mirrors the established `_HAS_FCNTL` flag pattern already used by
ingest/pipeline.py, hooks/core.py, and ingest/hooks/user_prompt_submit.py.
The two affected call sites:

- _shared.check_extraction_budget — POSIX uses fcntl.flock(LOCK_EX) for
  atomic read-modify-write across concurrent ingest processes. Windows
  falls back to non-atomic R/M/W; worst-case race window allows 1-2
  extra extractions per hour, acceptable given the alternative is no
  enforcement at all.

- session_start._scan_stale_sessions — POSIX uses a non-blocking advisory
  lock to prevent concurrent scans. Windows skips the lock; the backlog
  drainer's atomic .json → .processing rename already deduplicates any
  overlapping work.

Tests in tests/ingest/test_hooks_windows_portability.py (8 tests) pin
the _HAS_FCNTL flag contract on both modules, verify check_extraction_budget
behaves correctly without fcntl (allows, enforces cap, resets hourly),
and include a POSIX-only regression lock that fcntl.flock(LOCK_EX) is
still acquired on platforms where it's available — so a future refactor
can't silently drop cross-process coordination.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Two bundled changes that ship together because the new hook depends on
the new provider being wired.

1. Claude CLI llm_fn (mcp_server.py)
   * Add `_build_claude_cli_llm` factory wrapping the existing
     `truememory.ingest.models._complete_claude_cli` helper in a
     `(prompt) -> str` closure compatible with `search_deep`'s `llm_fn`.
   * Prepend `("claude_cli", "", "", "_build_claude_cli_llm")` to
     `_LLM_PROVIDERS` as priority 1 (uses Claude Code subscription auth,
     zero cash spend). Falls through to API-key providers if `claude.exe`
     isn't on PATH or `TRUEMEMORY_DISABLE_CLAUDE_CLI=1` is set.
   * `_build_llm_fn` loop now special-cases the no-key Claude CLI path
     while preserving the env-var / config-key contract for the other
     three providers.
   * Behavior change for users with both Claude CLI on PATH AND an API
     key set: Claude CLI now wins. Opt out via the kill-switch env var.

2. Turn-based memory injection (user_prompt_submit.py + _shared.py)
   * New gate: fire ONE targeted recall per session when conversation
     has signal — `count_user_turns >= INJECT_AFTER_TURNS` (default 13)
     OR `len(current_prompt) > INJECT_AFTER_CHARS` (default 333).
   * Query is built from the last K user turns (default 6) concatenated
     with `---` separators, NOT just the raw current prompt — captures
     topic commitment + project context for richer vector search.
   * Calls `Memory.search_deep(query, user_id, limit=15, llm_fn=llm_fn)`
     so HyDE fires when the Claude CLI provider above resolves.
   * Marker file at `~/.truememory/turn_injected/<safe_session_id>`
     dedupes re-fires; pattern mirrors `should_extract_session` /
     `mark_session_extracted`. Atomic write via tempfile + replace.
   * Existing `_detect_recall` regex path stays untouched — explicit
     recall questions still work every turn. Both paths can fire on the
     same prompt; combined output goes out as a single
     `additionalContext` JSON line to avoid double-emit.
   * New env vars (all overridable): `TRUEMEMORY_INJECT_AFTER_TURNS`,
     `TRUEMEMORY_INJECT_AFTER_CHARS`, `TRUEMEMORY_INJECT_QUERY_TURNS`,
     `TRUEMEMORY_INJECT_QUERY_TURN_CHARS`, `TRUEMEMORY_INJECT_RECALL_LIMIT`,
     `TRUEMEMORY_INJECT_BULLET_CHARS`, `TRUEMEMORY_INJECT_DISABLED`.

Tests:
 * tests/test_llm_fn_init.py: +6 cases for Claude CLI priority, kill
   switch, error-fall-through. Fixture now defaults
   `TRUEMEMORY_DISABLE_CLAUDE_CLI=1` so legacy API-key tests don't
   accidentally pick up the new priority-1 provider.
 * tests/test_turn_based_injection.py (new): 19 cases covering
   `count_user_turns` parsing (JSONL + JSON-array + empty + missing),
   marker round-trip, `_build_turn_based_query` truncation, gate logic
   (turn / length / both), dedup, kill switch, empty-transcript skip,
   no-results-still-marks-injected, search-exception graceful-degrade.

37/38 of the targeted tests pass (1 skip is Linux-only). The 25
broader Windows test failures pre-date this branch — verified by
stash-and-rerun on origin/main.

Builds on `fix(hooks): guard fcntl import on Windows` (cherry-picked
from `fix/hooks-fcntl-windows-portability`). PR will note the
dependency so reviewers can merge that first.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
…nvocations

Sub-agents run as separate Claude Code sessions with their own SessionStart
/ UserPromptSubmit / SessionEnd events. With three hooks installed and
sub-agent-heavy workflows, this floods the user with terminal flashes
every ~30 seconds, AND pollutes the memory store with orchestrator-
generated prompts (which look nothing like real user input).

The fix is a single helper `is_subagent_invocation()` in `_shared.py`
that detects sub-agent invocations via two signals:

1. `input_data["agent_id"]` / `input_data["agent_type"]` — present in
   newer Claude Code hook stdin JSON for sub-agents.
2. Fallback: `transcript_path` contains `/subagents/` — robust against
   older Claude Code versions that don't emit the agent_id field. Sub-
   agent transcripts always land under
   `<project>/<session_id>/subagents/agent-<id>.jsonl`.

Each of the three hooks (`session_start`, `user_prompt_submit`,
`stop`) now early-returns when this helper returns True. The hook still
spawns a Python process (Claude Code's own behavior — out of our
control), but the body exits in <10ms instead of doing recall / buffer
write / ingestion spawn. Combined with `pythonw.exe` in the user's
settings.json command, the per-sub-agent overhead becomes invisible.

The Windows terminal-flash issue itself is fixed by switching the hook
command from `python.exe` to `pythonw.exe` (windowless Python that
preserves stdin/stdout for hook IO) — that lives in the user's
settings.json, not in this repo.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
…--agent main mode

Adversarial check against the canonical Claude Code docs revealed that
``agent_type`` is set NOT ONLY for sub-agent invocations but ALSO for
user-launched ``--agent <name>`` main sessions, where the user is
running a custom-agent persona AS the main thread. Those prompts are
real user input and MUST be ingested; the previous guard was silently
skipping them.

Per canonical docs at code.claude.com/docs/en/hooks:
- ``agent_id`` is "Present only when the hook fires inside a subagent
  call" — authoritative sub-agent signal.
- ``agent_type`` is "Present when the session uses --agent or the hook
  fires inside a subagent" — overloaded, NOT a sub-agent-exclusive
  signal.

Fix: ``is_subagent_invocation()`` now uses two signals:

1. ``agent_id`` in hook stdin (authoritative — sub-agent only).
2. ``/subagents/`` directory component in ``transcript_path`` (robust
   fallback for any event where ``agent_id`` might be absent and for
   older Claude Code versions pre-v2.1.69).

Adds 5 regression tests covering:
- agent_id alone → True
- /subagents/ in path (forward AND backslash variants) → True
- agent_type alone → False (the new correct behavior — main session)
- plain main thread → False
- empty input → False (no false-positive on init failures)

35/35 tests in this PR pass after the change.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Adds a prominent design-note comment above the is_subagent_invocation
regression tests in test_turn_based_injection.py. Explains:

  - Why TrueMemory's hooks skip sub-agent invocations (sub-agent prompts
    are orchestrator-generated, not user input; on Windows each fire
    flashes a console window via Claude Code's flag-omitting spawn).
  - Why this is a Hunter-specific design preference that may disagree
    with upstream's "sub-agents fire hooks per canonical Claude Code
    behavior" stance.
  - Offers a clean backout: gating the guard behind a
    TRUEMEMORY_SKIP_SUBAGENT_HOOKS env var (default off) on review
    request if upstream prefers the canonical behavior.
  - Documents the two detection signals (agent_id, /subagents/ path)
    and explicitly explains why agent_type is NOT a signal — the false
    positive on `--agent <name>` main sessions is locked in by tests.

No functional change. Pure documentation.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Huntehhh added a commit to Huntehhh/TrueMemory that referenced this pull request May 19, 2026
Adds an append-only structured log at ~/.truememory/claude-cli-usage.jsonl
written from a try/finally wrapping the existing CLI invocation. Every
record captures: ts, model, caller (walked back from the stack via
inspect.stack — module.function:line outside this file), prompt_chars,
response_chars, duration_ms, exit_code, pid, and an error string when
the call failed.

Reason: diagnosing where Hunter's Claude subscription rate limits are
coming from. _complete_claude_cli is the single chokepoint for every
TrueMemory Claude CLI call — HyDE on search_deep, background ingestion
Haiku extraction, the once-per-session turn-based injection, and the
new claude_cli priority-1 _build_llm_fn provider (PR buildingjoshbetter#357). With this
log we can `jq` per-caller per-hour breakdowns to pinpoint quota burn.

Telemetry is crash-tolerant (try/finally fires even on LLMError) and
never raises (broad except inside _log_claude_cli_usage so a telemetry
failure can never break the LLM call path).

Smoke-tested via forced cli_not_on_path error — record landed with all
fields populated, caller frame correctly identified.

Local-only diagnostic overlay. NOT for upstream PR (this branch's
charter per `local/instrumentation-diag` convention).

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Huntehhh added a commit to Huntehhh/TrueMemory that referenced this pull request May 20, 2026
…T_PROVIDER in auto_detect

ROOT CAUSE — Hunter was hitting Claude subscription rate limits because
auto_detect picks Claude CLI at priority 2 whenever `claude.exe` is on
PATH (which it always is for Claude Code users). Every background
extraction Haiku call burned subscription quota. Quantitative impact
from telemetry: ~338 ingest sessions/day × ~10.5 chunks each = ~3,549
Haiku calls/day via subscription auth. Two backlog cascade bursts today
(52 sessions at 16:33, 66 sessions at 17:35) made the throttle bite.

The pre-existing auto_detect priority chain (Ollama → Claude CLI →
OpenRouter → Anthropic) meant setting ANTHROPIC_API_KEY alone did NOT
route extraction away from the subscription, because Claude CLI at
priority 2 already won. Users running Claude Code who want their
background ingestion to NOT touch the subscription had no escape hatch.

Adds two env-var overrides:

  1. TRUEMEMORY_INGEST_PROVIDER=<provider> — explicit override that
     wins all priority checks. Set to `anthropic` / `openrouter` /
     `openai` / `ollama` to force that path. Pairs with the matching
     API key env var.

  2. TRUEMEMORY_DISABLE_CLAUDE_CLI=1 — skips the Claude CLI branch in
     the priority chain (mirrors the same env var already honored by
     `mcp_server._build_llm_fn` from PR buildingjoshbetter#357 — consistency across the
     two LLM-resolution paths in TrueMemory).

With either env var set + an API key configured, background extraction
routes to paid API instead of subscription. At Hunter's call volume
(~3,500/day Haiku) that's roughly $1.75/day cost in exchange for
zero subscription quota burn.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Huntehhh added a commit to Huntehhh/TrueMemory that referenced this pull request May 20, 2026
Brain-dump of architectural options + tradeoffs surfaced during local
fork work, framed honestly for upstream consideration (or rejection).

Topics:
1. http hook type as architectural alternative to subprocess-spawn
2. Per-call telemetry chokepoint (already shipped on this branch)
3. TRUEMEMORY_BACKLOG_BATCH_SIZE env var (shipped this commit's predecessor)
4. Sub-agent context skip guard (PR buildingjoshbetter#357)
5. Claude CLI as priority-1 _build_llm_fn provider (PR buildingjoshbetter#357)

Local-only deviations explicitly NOT for upstream are flagged at the
bottom.

Maintained on local/instrumentation-diag for fork visibility — not
proposed for upstream merge as-is (it's a fork-side discussion doc).

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant