feat(hooks+llm): turn-based memory injection on UserPromptSubmit + Claude CLI HyDE provider#357
Open
Huntehhh wants to merge 5 commits into
Conversation
truememory/ingest/hooks/_shared.py and session_start.py both had bare `import fcntl` at module top with no `try / except ImportError` guard. fcntl is POSIX-only — on Windows every import raised ModuleNotFoundError before any function ran, which cascaded to every Claude Code hook subprocess that imports from _shared (SessionStart, UserPromptSubmit, Stop). Silent failure mode: no memory extraction after sessions, no context injection at session start, no buffer writes on user prompts. Fix mirrors the established `_HAS_FCNTL` flag pattern already used by ingest/pipeline.py, hooks/core.py, and ingest/hooks/user_prompt_submit.py. The two affected call sites: - _shared.check_extraction_budget — POSIX uses fcntl.flock(LOCK_EX) for atomic read-modify-write across concurrent ingest processes. Windows falls back to non-atomic R/M/W; worst-case race window allows 1-2 extra extractions per hour, acceptable given the alternative is no enforcement at all. - session_start._scan_stale_sessions — POSIX uses a non-blocking advisory lock to prevent concurrent scans. Windows skips the lock; the backlog drainer's atomic .json → .processing rename already deduplicates any overlapping work. Tests in tests/ingest/test_hooks_windows_portability.py (8 tests) pin the _HAS_FCNTL flag contract on both modules, verify check_extraction_budget behaves correctly without fcntl (allows, enforces cap, resets hourly), and include a POSIX-only regression lock that fcntl.flock(LOCK_EX) is still acquired on platforms where it's available — so a future refactor can't silently drop cross-process coordination. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Two bundled changes that ship together because the new hook depends on
the new provider being wired.
1. Claude CLI llm_fn (mcp_server.py)
* Add `_build_claude_cli_llm` factory wrapping the existing
`truememory.ingest.models._complete_claude_cli` helper in a
`(prompt) -> str` closure compatible with `search_deep`'s `llm_fn`.
* Prepend `("claude_cli", "", "", "_build_claude_cli_llm")` to
`_LLM_PROVIDERS` as priority 1 (uses Claude Code subscription auth,
zero cash spend). Falls through to API-key providers if `claude.exe`
isn't on PATH or `TRUEMEMORY_DISABLE_CLAUDE_CLI=1` is set.
* `_build_llm_fn` loop now special-cases the no-key Claude CLI path
while preserving the env-var / config-key contract for the other
three providers.
* Behavior change for users with both Claude CLI on PATH AND an API
key set: Claude CLI now wins. Opt out via the kill-switch env var.
2. Turn-based memory injection (user_prompt_submit.py + _shared.py)
* New gate: fire ONE targeted recall per session when conversation
has signal — `count_user_turns >= INJECT_AFTER_TURNS` (default 13)
OR `len(current_prompt) > INJECT_AFTER_CHARS` (default 333).
* Query is built from the last K user turns (default 6) concatenated
with `---` separators, NOT just the raw current prompt — captures
topic commitment + project context for richer vector search.
* Calls `Memory.search_deep(query, user_id, limit=15, llm_fn=llm_fn)`
so HyDE fires when the Claude CLI provider above resolves.
* Marker file at `~/.truememory/turn_injected/<safe_session_id>`
dedupes re-fires; pattern mirrors `should_extract_session` /
`mark_session_extracted`. Atomic write via tempfile + replace.
* Existing `_detect_recall` regex path stays untouched — explicit
recall questions still work every turn. Both paths can fire on the
same prompt; combined output goes out as a single
`additionalContext` JSON line to avoid double-emit.
* New env vars (all overridable): `TRUEMEMORY_INJECT_AFTER_TURNS`,
`TRUEMEMORY_INJECT_AFTER_CHARS`, `TRUEMEMORY_INJECT_QUERY_TURNS`,
`TRUEMEMORY_INJECT_QUERY_TURN_CHARS`, `TRUEMEMORY_INJECT_RECALL_LIMIT`,
`TRUEMEMORY_INJECT_BULLET_CHARS`, `TRUEMEMORY_INJECT_DISABLED`.
Tests:
* tests/test_llm_fn_init.py: +6 cases for Claude CLI priority, kill
switch, error-fall-through. Fixture now defaults
`TRUEMEMORY_DISABLE_CLAUDE_CLI=1` so legacy API-key tests don't
accidentally pick up the new priority-1 provider.
* tests/test_turn_based_injection.py (new): 19 cases covering
`count_user_turns` parsing (JSONL + JSON-array + empty + missing),
marker round-trip, `_build_turn_based_query` truncation, gate logic
(turn / length / both), dedup, kill switch, empty-transcript skip,
no-results-still-marks-injected, search-exception graceful-degrade.
37/38 of the targeted tests pass (1 skip is Linux-only). The 25
broader Windows test failures pre-date this branch — verified by
stash-and-rerun on origin/main.
Builds on `fix(hooks): guard fcntl import on Windows` (cherry-picked
from `fix/hooks-fcntl-windows-portability`). PR will note the
dependency so reviewers can merge that first.
Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
…nvocations Sub-agents run as separate Claude Code sessions with their own SessionStart / UserPromptSubmit / SessionEnd events. With three hooks installed and sub-agent-heavy workflows, this floods the user with terminal flashes every ~30 seconds, AND pollutes the memory store with orchestrator- generated prompts (which look nothing like real user input). The fix is a single helper `is_subagent_invocation()` in `_shared.py` that detects sub-agent invocations via two signals: 1. `input_data["agent_id"]` / `input_data["agent_type"]` — present in newer Claude Code hook stdin JSON for sub-agents. 2. Fallback: `transcript_path` contains `/subagents/` — robust against older Claude Code versions that don't emit the agent_id field. Sub- agent transcripts always land under `<project>/<session_id>/subagents/agent-<id>.jsonl`. Each of the three hooks (`session_start`, `user_prompt_submit`, `stop`) now early-returns when this helper returns True. The hook still spawns a Python process (Claude Code's own behavior — out of our control), but the body exits in <10ms instead of doing recall / buffer write / ingestion spawn. Combined with `pythonw.exe` in the user's settings.json command, the per-sub-agent overhead becomes invisible. The Windows terminal-flash issue itself is fixed by switching the hook command from `python.exe` to `pythonw.exe` (windowless Python that preserves stdin/stdout for hook IO) — that lives in the user's settings.json, not in this repo. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
…--agent main mode Adversarial check against the canonical Claude Code docs revealed that ``agent_type`` is set NOT ONLY for sub-agent invocations but ALSO for user-launched ``--agent <name>`` main sessions, where the user is running a custom-agent persona AS the main thread. Those prompts are real user input and MUST be ingested; the previous guard was silently skipping them. Per canonical docs at code.claude.com/docs/en/hooks: - ``agent_id`` is "Present only when the hook fires inside a subagent call" — authoritative sub-agent signal. - ``agent_type`` is "Present when the session uses --agent or the hook fires inside a subagent" — overloaded, NOT a sub-agent-exclusive signal. Fix: ``is_subagent_invocation()`` now uses two signals: 1. ``agent_id`` in hook stdin (authoritative — sub-agent only). 2. ``/subagents/`` directory component in ``transcript_path`` (robust fallback for any event where ``agent_id`` might be absent and for older Claude Code versions pre-v2.1.69). Adds 5 regression tests covering: - agent_id alone → True - /subagents/ in path (forward AND backslash variants) → True - agent_type alone → False (the new correct behavior — main session) - plain main thread → False - empty input → False (no false-positive on init failures) 35/35 tests in this PR pass after the change. Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Adds a prominent design-note comment above the is_subagent_invocation
regression tests in test_turn_based_injection.py. Explains:
- Why TrueMemory's hooks skip sub-agent invocations (sub-agent prompts
are orchestrator-generated, not user input; on Windows each fire
flashes a console window via Claude Code's flag-omitting spawn).
- Why this is a Hunter-specific design preference that may disagree
with upstream's "sub-agents fire hooks per canonical Claude Code
behavior" stance.
- Offers a clean backout: gating the guard behind a
TRUEMEMORY_SKIP_SUBAGENT_HOOKS env var (default off) on review
request if upstream prefers the canonical behavior.
- Documents the two detection signals (agent_id, /subagents/ path)
and explicitly explains why agent_type is NOT a signal — the false
positive on `--agent <name>` main sessions is locked in by tests.
No functional change. Pure documentation.
Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Huntehhh
added a commit
to Huntehhh/TrueMemory
that referenced
this pull request
May 19, 2026
Adds an append-only structured log at ~/.truememory/claude-cli-usage.jsonl written from a try/finally wrapping the existing CLI invocation. Every record captures: ts, model, caller (walked back from the stack via inspect.stack — module.function:line outside this file), prompt_chars, response_chars, duration_ms, exit_code, pid, and an error string when the call failed. Reason: diagnosing where Hunter's Claude subscription rate limits are coming from. _complete_claude_cli is the single chokepoint for every TrueMemory Claude CLI call — HyDE on search_deep, background ingestion Haiku extraction, the once-per-session turn-based injection, and the new claude_cli priority-1 _build_llm_fn provider (PR buildingjoshbetter#357). With this log we can `jq` per-caller per-hour breakdowns to pinpoint quota burn. Telemetry is crash-tolerant (try/finally fires even on LLMError) and never raises (broad except inside _log_claude_cli_usage so a telemetry failure can never break the LLM call path). Smoke-tested via forced cli_not_on_path error — record landed with all fields populated, caller frame correctly identified. Local-only diagnostic overlay. NOT for upstream PR (this branch's charter per `local/instrumentation-diag` convention). Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Huntehhh
added a commit
to Huntehhh/TrueMemory
that referenced
this pull request
May 20, 2026
…T_PROVIDER in auto_detect
ROOT CAUSE — Hunter was hitting Claude subscription rate limits because
auto_detect picks Claude CLI at priority 2 whenever `claude.exe` is on
PATH (which it always is for Claude Code users). Every background
extraction Haiku call burned subscription quota. Quantitative impact
from telemetry: ~338 ingest sessions/day × ~10.5 chunks each = ~3,549
Haiku calls/day via subscription auth. Two backlog cascade bursts today
(52 sessions at 16:33, 66 sessions at 17:35) made the throttle bite.
The pre-existing auto_detect priority chain (Ollama → Claude CLI →
OpenRouter → Anthropic) meant setting ANTHROPIC_API_KEY alone did NOT
route extraction away from the subscription, because Claude CLI at
priority 2 already won. Users running Claude Code who want their
background ingestion to NOT touch the subscription had no escape hatch.
Adds two env-var overrides:
1. TRUEMEMORY_INGEST_PROVIDER=<provider> — explicit override that
wins all priority checks. Set to `anthropic` / `openrouter` /
`openai` / `ollama` to force that path. Pairs with the matching
API key env var.
2. TRUEMEMORY_DISABLE_CLAUDE_CLI=1 — skips the Claude CLI branch in
the priority chain (mirrors the same env var already honored by
`mcp_server._build_llm_fn` from PR buildingjoshbetter#357 — consistency across the
two LLM-resolution paths in TrueMemory).
With either env var set + an API key configured, background extraction
routes to paid API instead of subscription. At Hunter's call volume
(~3,500/day Haiku) that's roughly $1.75/day cost in exchange for
zero subscription quota burn.
Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
Huntehhh
added a commit
to Huntehhh/TrueMemory
that referenced
this pull request
May 20, 2026
Brain-dump of architectural options + tradeoffs surfaced during local fork work, framed honestly for upstream consideration (or rejection). Topics: 1. http hook type as architectural alternative to subprocess-spawn 2. Per-call telemetry chokepoint (already shipped on this branch) 3. TRUEMEMORY_BACKLOG_BATCH_SIZE env var (shipped this commit's predecessor) 4. Sub-agent context skip guard (PR buildingjoshbetter#357) 5. Claude CLI as priority-1 _build_llm_fn provider (PR buildingjoshbetter#357) Local-only deviations explicitly NOT for upstream are flagged at the bottom. Maintained on local/instrumentation-diag for fork visibility — not proposed for upstream merge as-is (it's a fork-side discussion doc). Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three coupled changes targeting Pro-tier subscription users on Windows.
Turn-based injection (
UserPromptSubmit) — fires once per session when user-turn count ≥ 13 OR current prompt length > 333 chars (whichever first). Query = last 6 user turns concatenated, separated by\n---\n, giving the vector pipeline richer semantic context than the existing raw-prompt path. Marker file at~/.truememory/turn_injected/<safe_session_id>dedupes re-fires; pattern mirrorsshould_extract_session/mark_session_extracted. The existing_detect_recallregex path is untouched and still fires every turn for explicit recall questions — both paths can emit on the same prompt, combined into oneadditionalContextJSON line.Claude CLI priority-1
_build_llm_fnprovider —_build_claude_cli_llmwraps the existingtruememory.ingest.models._complete_claude_clihelper into the(prompt) -> strshapesearch_deepexpects. Pro tier now gets HyDE for free on subscription auth — no API key required. Opt out viaTRUEMEMORY_DISABLE_CLAUDE_CLI=1.Sub-agent hook skip — new
is_subagent_invocation()helper in_shared.py; all 3 hooks early-return for Claude Code Task-tool sub-agents (detected viaagent_idfield in stdin OR/subagents/intranscript_path). Stops per-sub-agent terminal flash + orchestrator-prompt memory pollution.Depends on #345 (fcntl Windows portability) — cherry-picked into this branch; will dedupe on rebase once #345 merges.
Changes
truememory/mcp_server.py+_build_claude_cli_llm, prependclaude_clito_LLM_PROVIDERS, add no-key path +TRUEMEMORY_DISABLE_CLAUDE_CLIkill-switch in_build_llm_fntruememory/ingest/hooks/_shared.py+is_subagent_invocation,+TURN_INJECTED_DIR,+count_user_turns,+already_injected,+mark_injectedtruememory/ingest/hooks/user_prompt_submit.py+_build_turn_based_query,+_try_turn_based_injection, sub-agent guard inmain(), single-emit restructure so explicit-recall + turn-based combine into oneadditionalContextlinetruememory/ingest/hooks/session_start.pymain()truememory/ingest/hooks/stop.pymain()tests/test_llm_fn_init.pyTRUEMEMORY_DISABLE_CLAUDE_CLI=1so legacy API-key tests don't pick up the new priority-1 providertests/test_turn_based_injection.pytests/ingest/test_hooks_windows_portability.pyNew env vars (all overridable, sensible defaults)
TRUEMEMORY_INJECT_AFTER_TURNS13TRUEMEMORY_INJECT_AFTER_CHARS333TRUEMEMORY_INJECT_QUERY_TURNS6TRUEMEMORY_INJECT_QUERY_TURN_CHARS500TRUEMEMORY_INJECT_RECALL_LIMIT15TRUEMEMORY_INJECT_BULLET_CHARS300TRUEMEMORY_INJECT_DISABLEDTRUEMEMORY_DISABLE_CLAUDE_CLI_build_llm_fnto skip Claude CLI and fall through to API-key providersTRUEMEMORY_HYDE_CLAUDE_MODELclaude-haiku-4-5Test plan
pytest tests/test_llm_fn_init.py tests/test_turn_based_injection.py tests/ingest/test_hooks_windows_portability.py -v→ 37 pass / 1 skip (fcntl.flockpath, Linux-only)agent_id,agent_type,/subagents/path, mixed-slash Windows paths, empty input)_try_turn_based_injectionwith stubbedMemory— emits<truememory-context>block attrigger=turns, writes marker, re-fire returnsNoneclaude.exe+OPENROUTER_API_KEYset:truememory_search_deepselectsclaude_cliprovider; verify_current_llm_provider_name == "claude_cli"and zero OpenRouter spend<truememory-context>emitted (~500ms HyDE pause); subsequent prompts → no re-fire; marker at~/.truememory/turn_injected/<session_id>exists withtrigger,query_chars,n_results,timestampBreaking changes
Provider priority shift for users with both Claude CLI on PATH and an API key set:
Old behavior: API key wins → user pays per HyDE call.
New behavior: Claude CLI wins → user pays nothing (subscription auth).
Users who deliberately want the API-key path (e.g., centralized billing, rate-limit isolation, paid Anthropic plan for HyDE consistency) set
TRUEMEMORY_DISABLE_CLAUDE_CLI=1. Documented in the new env-var table above.Users without
claude.exeon PATH see identical behavior — the new provider falls through to the existing API-key chain.Co-Authored-By: claude-opus-4-7 wontreply@getfucked.ai