feat(hooks+llm): turn-based memory injection on UserPromptSubmit + Claude CLI HyDE provider by Huntehhh · Pull Request #357 · buildingjoshbetter/TrueMemory

Huntehhh · 2026-05-19T03:54:00Z

Summary

Three coupled changes targeting Pro-tier subscription users on Windows.

Turn-based injection (UserPromptSubmit) — fires once per session when user-turn count ≥ 13 OR current prompt length > 333 chars (whichever first). Query = last 6 user turns concatenated, separated by \n---\n, giving the vector pipeline richer semantic context than the existing raw-prompt path. Marker file at ~/.truememory/turn_injected/<safe_session_id> dedupes re-fires; pattern mirrors should_extract_session / mark_session_extracted. The existing _detect_recall regex path is untouched and still fires every turn for explicit recall questions — both paths can emit on the same prompt, combined into one additionalContext JSON line.

Claude CLI priority-1 _build_llm_fn provider — _build_claude_cli_llm wraps the existing truememory.ingest.models._complete_claude_cli helper into the (prompt) -> str shape search_deep expects. Pro tier now gets HyDE for free on subscription auth — no API key required. Opt out via TRUEMEMORY_DISABLE_CLAUDE_CLI=1.

Sub-agent hook skip — new is_subagent_invocation() helper in _shared.py; all 3 hooks early-return for Claude Code Task-tool sub-agents (detected via agent_id field in stdin OR /subagents/ in transcript_path). Stops per-sub-agent terminal flash + orchestrator-prompt memory pollution.

Depends on #345 (fcntl Windows portability) — cherry-picked into this branch; will dedupe on rebase once #345 merges.

Changes

File	Change
`truememory/mcp_server.py`	`+_build_claude_cli_llm`, prepend `claude_cli` to `_LLM_PROVIDERS`, add no-key path + `TRUEMEMORY_DISABLE_CLAUDE_CLI` kill-switch in `_build_llm_fn`
`truememory/ingest/hooks/_shared.py`	`+is_subagent_invocation`, `+TURN_INJECTED_DIR`, `+count_user_turns`, `+already_injected`, `+mark_injected`
`truememory/ingest/hooks/user_prompt_submit.py`	`+_build_turn_based_query`, `+_try_turn_based_injection`, sub-agent guard in `main()`, single-emit restructure so explicit-recall + turn-based combine into one `additionalContext` line
`truememory/ingest/hooks/session_start.py`	Sub-agent guard at top of `main()`
`truememory/ingest/hooks/stop.py`	Sub-agent guard at top of `main()`
`tests/test_llm_fn_init.py`	+6 Claude CLI cases; fixture defaults `TRUEMEMORY_DISABLE_CLAUDE_CLI=1` so legacy API-key tests don't pick up the new priority-1 provider
`tests/test_turn_based_injection.py`	NEW — 19 cases covering helpers, gate logic, marker dedup, kill switch, search-exception graceful-degrade
`tests/ingest/test_hooks_windows_portability.py`	From #345 cherry-pick — will dedupe on rebase

New env vars (all overridable, sensible defaults)

Var	Default	Effect
`TRUEMEMORY_INJECT_AFTER_TURNS`	`13`	Turn-count trigger threshold
`TRUEMEMORY_INJECT_AFTER_CHARS`	`333`	Prompt-length trigger threshold
`TRUEMEMORY_INJECT_QUERY_TURNS`	`6`	Recent turns concatenated into the query
`TRUEMEMORY_INJECT_QUERY_TURN_CHARS`	`500`	Per-turn truncation in the query (prevents pasted code blowing up the embedding input)
`TRUEMEMORY_INJECT_RECALL_LIMIT`	`15`	Memories fetched
`TRUEMEMORY_INJECT_BULLET_CHARS`	`300`	Per-bullet truncation in the emitted block
`TRUEMEMORY_INJECT_DISABLED`	unset	Kill switch — disables turn-based injection entirely
`TRUEMEMORY_DISABLE_CLAUDE_CLI`	unset	Force `_build_llm_fn` to skip Claude CLI and fall through to API-key providers
`TRUEMEMORY_HYDE_CLAUDE_MODEL`	`claude-haiku-4-5`	Model used for HyDE doc generation via Claude CLI

Test plan

pytest tests/test_llm_fn_init.py tests/test_turn_based_injection.py tests/ingest/test_hooks_windows_portability.py -v → 37 pass / 1 skip (fcntl.flock path, Linux-only)
Sub-agent detection: 6/6 cases pass (agent_id, agent_type, /subagents/ path, mixed-slash Windows paths, empty input)
In-process smoke of _try_turn_based_injection with stubbed Memory — emits <truememory-context> block at trigger=turns, writes marker, re-fire returns None
On a machine with claude.exe + OPENROUTER_API_KEY set: truememory_search_deep selects claude_cli provider; verify _current_llm_provider_name == "claude_cli" and zero OpenRouter spend
End-to-end manual smoke in a fresh Claude Code session: type a 350-char prompt → <truememory-context> emitted (~500ms HyDE pause); subsequent prompts → no re-fire; marker at ~/.truememory/turn_injected/<session_id> exists with trigger, query_chars, n_results, timestamp
Sub-agent Task spawn: hooks exit in <10ms, no recall / buffer write / ingestion side effects, no terminal flash

Breaking changes

Provider priority shift for users with both Claude CLI on PATH and an API key set:

- _LLM_PROVIDERS = (
-     ("anthropic", "ANTHROPIC_API_KEY", ...),
-     ("openrouter", "OPENROUTER_API_KEY", ...),
-     ("openai", "OPENAI_API_KEY", ...),
- )
+ _LLM_PROVIDERS = (
+     ("claude_cli", "", "", "_build_claude_cli_llm"),   # NEW — priority 1
+     ("anthropic", "ANTHROPIC_API_KEY", ...),
+     ("openrouter", "OPENROUTER_API_KEY", ...),
+     ("openai", "OPENAI_API_KEY", ...),
+ )

Old behavior: API key wins → user pays per HyDE call.
New behavior: Claude CLI wins → user pays nothing (subscription auth).

Users who deliberately want the API-key path (e.g., centralized billing, rate-limit isolation, paid Anthropic plan for HyDE consistency) set TRUEMEMORY_DISABLE_CLAUDE_CLI=1. Documented in the new env-var table above.

Users without claude.exe on PATH see identical behavior — the new provider falls through to the existing API-key chain.

Co-Authored-By: claude-opus-4-7 wontreply@getfucked.ai

truememory/ingest/hooks/_shared.py and session_start.py both had bare `import fcntl` at module top with no `try / except ImportError` guard. fcntl is POSIX-only — on Windows every import raised ModuleNotFoundError before any function ran, which cascaded to every Claude Code hook subprocess that imports from _shared (SessionStart, UserPromptSubmit, Stop). Silent failure mode: no memory extraction after sessions, no context injection at session start, no buffer writes on user prompts. Fix mirrors the established `_HAS_FCNTL` flag pattern already used by ingest/pipeline.py, hooks/core.py, and ingest/hooks/user_prompt_submit.py. The two affected call sites: - _shared.check_extraction_budget — POSIX uses fcntl.flock(LOCK_EX) for atomic read-modify-write across concurrent ingest processes. Windows falls back to non-atomic R/M/W; worst-case race window allows 1-2 extra extractions per hour, acceptable given the alternative is no enforcement at all. - session_start._scan_stale_sessions — POSIX uses a non-blocking advisory lock to prevent concurrent scans. Windows skips the lock; the backlog drainer's atomic .json → .processing rename already deduplicates any overlapping work. Tests in tests/ingest/test_hooks_windows_portability.py (8 tests) pin the _HAS_FCNTL flag contract on both modules, verify check_extraction_budget behaves correctly without fcntl (allows, enforces cap, resets hourly), and include a POSIX-only regression lock that fcntl.flock(LOCK_EX) is still acquired on platforms where it's available — so a future refactor can't silently drop cross-process coordination. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

Two bundled changes that ship together because the new hook depends on the new provider being wired. 1. Claude CLI llm_fn (mcp_server.py) * Add `_build_claude_cli_llm` factory wrapping the existing `truememory.ingest.models._complete_claude_cli` helper in a `(prompt) -> str` closure compatible with `search_deep`'s `llm_fn`. * Prepend `("claude_cli", "", "", "_build_claude_cli_llm")` to `_LLM_PROVIDERS` as priority 1 (uses Claude Code subscription auth, zero cash spend). Falls through to API-key providers if `claude.exe` isn't on PATH or `TRUEMEMORY_DISABLE_CLAUDE_CLI=1` is set. * `_build_llm_fn` loop now special-cases the no-key Claude CLI path while preserving the env-var / config-key contract for the other three providers. * Behavior change for users with both Claude CLI on PATH AND an API key set: Claude CLI now wins. Opt out via the kill-switch env var. 2. Turn-based memory injection (user_prompt_submit.py + _shared.py) * New gate: fire ONE targeted recall per session when conversation has signal — `count_user_turns >= INJECT_AFTER_TURNS` (default 13) OR `len(current_prompt) > INJECT_AFTER_CHARS` (default 333). * Query is built from the last K user turns (default 6) concatenated with `---` separators, NOT just the raw current prompt — captures topic commitment + project context for richer vector search. * Calls `Memory.search_deep(query, user_id, limit=15, llm_fn=llm_fn)` so HyDE fires when the Claude CLI provider above resolves. * Marker file at `~/.truememory/turn_injected/<safe_session_id>` dedupes re-fires; pattern mirrors `should_extract_session` / `mark_session_extracted`. Atomic write via tempfile + replace. * Existing `_detect_recall` regex path stays untouched — explicit recall questions still work every turn. Both paths can fire on the same prompt; combined output goes out as a single `additionalContext` JSON line to avoid double-emit. * New env vars (all overridable): `TRUEMEMORY_INJECT_AFTER_TURNS`, `TRUEMEMORY_INJECT_AFTER_CHARS`, `TRUEMEMORY_INJECT_QUERY_TURNS`, `TRUEMEMORY_INJECT_QUERY_TURN_CHARS`, `TRUEMEMORY_INJECT_RECALL_LIMIT`, `TRUEMEMORY_INJECT_BULLET_CHARS`, `TRUEMEMORY_INJECT_DISABLED`. Tests: * tests/test_llm_fn_init.py: +6 cases for Claude CLI priority, kill switch, error-fall-through. Fixture now defaults `TRUEMEMORY_DISABLE_CLAUDE_CLI=1` so legacy API-key tests don't accidentally pick up the new priority-1 provider. * tests/test_turn_based_injection.py (new): 19 cases covering `count_user_turns` parsing (JSONL + JSON-array + empty + missing), marker round-trip, `_build_turn_based_query` truncation, gate logic (turn / length / both), dedup, kill switch, empty-transcript skip, no-results-still-marks-injected, search-exception graceful-degrade. 37/38 of the targeted tests pass (1 skip is Linux-only). The 25 broader Windows test failures pre-date this branch — verified by stash-and-rerun on origin/main. Builds on `fix(hooks): guard fcntl import on Windows` (cherry-picked from `fix/hooks-fcntl-windows-portability`). PR will note the dependency so reviewers can merge that first. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

…nvocations Sub-agents run as separate Claude Code sessions with their own SessionStart / UserPromptSubmit / SessionEnd events. With three hooks installed and sub-agent-heavy workflows, this floods the user with terminal flashes every ~30 seconds, AND pollutes the memory store with orchestrator- generated prompts (which look nothing like real user input). The fix is a single helper `is_subagent_invocation()` in `_shared.py` that detects sub-agent invocations via two signals: 1. `input_data["agent_id"]` / `input_data["agent_type"]` — present in newer Claude Code hook stdin JSON for sub-agents. 2. Fallback: `transcript_path` contains `/subagents/` — robust against older Claude Code versions that don't emit the agent_id field. Sub- agent transcripts always land under `<project>/<session_id>/subagents/agent-<id>.jsonl`. Each of the three hooks (`session_start`, `user_prompt_submit`, `stop`) now early-returns when this helper returns True. The hook still spawns a Python process (Claude Code's own behavior — out of our control), but the body exits in <10ms instead of doing recall / buffer write / ingestion spawn. Combined with `pythonw.exe` in the user's settings.json command, the per-sub-agent overhead becomes invisible. The Windows terminal-flash issue itself is fixed by switching the hook command from `python.exe` to `pythonw.exe` (windowless Python that preserves stdin/stdout for hook IO) — that lives in the user's settings.json, not in this repo. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

…--agent main mode Adversarial check against the canonical Claude Code docs revealed that ``agent_type`` is set NOT ONLY for sub-agent invocations but ALSO for user-launched ``--agent <name>`` main sessions, where the user is running a custom-agent persona AS the main thread. Those prompts are real user input and MUST be ingested; the previous guard was silently skipping them. Per canonical docs at code.claude.com/docs/en/hooks: - ``agent_id`` is "Present only when the hook fires inside a subagent call" — authoritative sub-agent signal. - ``agent_type`` is "Present when the session uses --agent or the hook fires inside a subagent" — overloaded, NOT a sub-agent-exclusive signal. Fix: ``is_subagent_invocation()`` now uses two signals: 1. ``agent_id`` in hook stdin (authoritative — sub-agent only). 2. ``/subagents/`` directory component in ``transcript_path`` (robust fallback for any event where ``agent_id`` might be absent and for older Claude Code versions pre-v2.1.69). Adds 5 regression tests covering: - agent_id alone → True - /subagents/ in path (forward AND backslash variants) → True - agent_type alone → False (the new correct behavior — main session) - plain main thread → False - empty input → False (no false-positive on init failures) 35/35 tests in this PR pass after the change. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

Adds a prominent design-note comment above the is_subagent_invocation regression tests in test_turn_based_injection.py. Explains: - Why TrueMemory's hooks skip sub-agent invocations (sub-agent prompts are orchestrator-generated, not user input; on Windows each fire flashes a console window via Claude Code's flag-omitting spawn). - Why this is a Hunter-specific design preference that may disagree with upstream's "sub-agents fire hooks per canonical Claude Code behavior" stance. - Offers a clean backout: gating the guard behind a TRUEMEMORY_SKIP_SUBAGENT_HOOKS env var (default off) on review request if upstream prefers the canonical behavior. - Documents the two detection signals (agent_id, /subagents/ path) and explicitly explains why agent_type is NOT a signal — the false positive on `--agent <name>` main sessions is locked in by tests. No functional change. Pure documentation. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

Adds an append-only structured log at ~/.truememory/claude-cli-usage.jsonl written from a try/finally wrapping the existing CLI invocation. Every record captures: ts, model, caller (walked back from the stack via inspect.stack — module.function:line outside this file), prompt_chars, response_chars, duration_ms, exit_code, pid, and an error string when the call failed. Reason: diagnosing where Hunter's Claude subscription rate limits are coming from. _complete_claude_cli is the single chokepoint for every TrueMemory Claude CLI call — HyDE on search_deep, background ingestion Haiku extraction, the once-per-session turn-based injection, and the new claude_cli priority-1 _build_llm_fn provider (PR buildingjoshbetter#357). With this log we can `jq` per-caller per-hour breakdowns to pinpoint quota burn. Telemetry is crash-tolerant (try/finally fires even on LLMError) and never raises (broad except inside _log_claude_cli_usage so a telemetry failure can never break the LLM call path). Smoke-tested via forced cli_not_on_path error — record landed with all fields populated, caller frame correctly identified. Local-only diagnostic overlay. NOT for upstream PR (this branch's charter per `local/instrumentation-diag` convention). Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

…T_PROVIDER in auto_detect ROOT CAUSE — Hunter was hitting Claude subscription rate limits because auto_detect picks Claude CLI at priority 2 whenever `claude.exe` is on PATH (which it always is for Claude Code users). Every background extraction Haiku call burned subscription quota. Quantitative impact from telemetry: ~338 ingest sessions/day × ~10.5 chunks each = ~3,549 Haiku calls/day via subscription auth. Two backlog cascade bursts today (52 sessions at 16:33, 66 sessions at 17:35) made the throttle bite. The pre-existing auto_detect priority chain (Ollama → Claude CLI → OpenRouter → Anthropic) meant setting ANTHROPIC_API_KEY alone did NOT route extraction away from the subscription, because Claude CLI at priority 2 already won. Users running Claude Code who want their background ingestion to NOT touch the subscription had no escape hatch. Adds two env-var overrides: 1. TRUEMEMORY_INGEST_PROVIDER=<provider> — explicit override that wins all priority checks. Set to `anthropic` / `openrouter` / `openai` / `ollama` to force that path. Pairs with the matching API key env var. 2. TRUEMEMORY_DISABLE_CLAUDE_CLI=1 — skips the Claude CLI branch in the priority chain (mirrors the same env var already honored by `mcp_server._build_llm_fn` from PR buildingjoshbetter#357 — consistency across the two LLM-resolution paths in TrueMemory). With either env var set + an API key configured, background extraction routes to paid API instead of subscription. At Hunter's call volume (~3,500/day Haiku) that's roughly $1.75/day cost in exchange for zero subscription quota burn. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

Brain-dump of architectural options + tradeoffs surfaced during local fork work, framed honestly for upstream consideration (or rejection). Topics: 1. http hook type as architectural alternative to subprocess-spawn 2. Per-call telemetry chokepoint (already shipped on this branch) 3. TRUEMEMORY_BACKLOG_BATCH_SIZE env var (shipped this commit's predecessor) 4. Sub-agent context skip guard (PR buildingjoshbetter#357) 5. Claude CLI as priority-1 _build_llm_fn provider (PR buildingjoshbetter#357) Local-only deviations explicitly NOT for upstream are flagged at the bottom. Maintained on local/instrumentation-diag for fork visibility — not proposed for upstream merge as-is (it's a fork-side discussion doc). Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

Huntehhh and others added 5 commits May 18, 2026 21:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(hooks+llm): turn-based memory injection on UserPromptSubmit + Claude CLI HyDE provider#357

feat(hooks+llm): turn-based memory injection on UserPromptSubmit + Claude CLI HyDE provider#357
Huntehhh wants to merge 5 commits into
buildingjoshbetter:mainfrom
Huntehhh:feat/turn-based-injection-and-claude-cli-llm

Huntehhh commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Huntehhh commented May 19, 2026

Summary

Changes

New env vars (all overridable, sensible defaults)

Test plan

Breaking changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant