Skip to content

feat: add MEMCTX_LLM_MODEL env, async agent_end, model attribution#71

Open
ersintarhan wants to merge 2 commits into
weauratech:mainfrom
ersintarhan:feat/memctx-llm-model-async-agent-end
Open

feat: add MEMCTX_LLM_MODEL env, async agent_end, model attribution#71
ersintarhan wants to merge 2 commits into
weauratech:mainfrom
ersintarhan:feat/memctx-llm-model-async-agent-end

Conversation

@ersintarhan
Copy link
Copy Markdown

Closes #70

Motivation

Heavy host models (Claude Opus, etc.) make pi-memctx's agent_end hook block the parent agent turn 10-30s on rich turns, degrading coding UX. Additionally, there's no way to delegate pi-memctx's internal LLM work (curate, gateway judge, query expansion, pack-generate) to a faster/cheaper model while keeping the host on a premium model for reasoning.

Changes (three orthogonal, opt-in improvements)

1. Async agent_end — highest impact

Hook now wraps curate logic in void (async () => { ... })() so the parent turn returns immediately. Curate runs in the background; errors logged through ctx.logger?.error. Validated in production: opus host now feels instant on rich turns.

2. MEMCTX_LLM_MODEL env override

export MEMCTX_LLM_MODEL=anthropic/claude-haiku-4-5
# or plain id
export MEMCTX_LLM_MODEL=claude-haiku-4-5

New resolveLlmModel(ctx) helper:

  • Looks up env value in ctx.modelRegistry by id or provider/model
  • Warns + falls back to ctx.model if invalid (no hard fail)
  • Replaces direct ctx.model reads in 10 call sites
  • completeJsonWithLlm and completeJsonWithModel also accept an explicit model param

3. Optional model: frontmatter attribution

buildNote and llmArchitectureNote accept an optional model parameter. When provided, frontmatter gets:

---
type: observation
model: claude-haiku-4-5
...
---

Audit trail for which model wrote each memory. Old notes without the field continue to work.

Tests

  • 10 new unit tests in test/unit.test.ts:
    • 5 × resolveLlmModel (unset, by id, provider/model, invalid → warn + fallback, no model → undefined)
    • 2 × buildNote model attribution (with model, without)
    • 2 × llmArchitectureNote model attribution (with model, without)
    • 1 × memctx_save integration test (saved note carries model: field)
  • Total: 111 tests passing (101 baseline + 10 new)
  • bun run ci (typecheck + tests + e2e) green

Backward compatibility

Zero breaking changes:

  • MEMCTX_LLM_MODEL unset + no explicit model param → identical to v0.13.1
  • agent_end hook signature unchanged
  • model: frontmatter only rendered when supplied

Real-world validation

Patch deployed to a Sentirum dev workspace via symlink (~/.pi/agent/extensions/pi-memctx) and run against a 49-doc pack with Claude Opus as the host model. Observed:

  • ✅ Parent turn ends immediately on rich turns (async hook proven)
  • ✅ Generated memories carry model: <id> field (attribution proven)
  • ✅ Curate quality unchanged (English, structured, taxonomy-correct: action / context / observation / runbook)
  • ✅ Three pi-memctx packs running side-by-side (cli-tools, pi-memctx self-pack, senti-sportsbook) with no conflict

Notes

  • No build step changes (still raw TS via files: ['index.ts', ...])
  • No new dependencies
  • ~/.pi/agent/extensions/*/index.ts symlink loader confirmed working for local dev iteration

Use case for the env override

A typical setup that benefits:

# Host model = premium reasoning
# pi-memctx model = fast + cheap for curate volume
MEMCTX_LLM_MODEL=anthropic/claude-haiku-4-5 pi

This effectively splits the model budget: keep premium reasoning for the user-facing agent, delegate background memory work to a cheaper tier.

Three orthogonal improvements; fully backward compatible.

1. MEMCTX_LLM_MODEL env override
   - New resolveLlmModel(ctx) helper checks process.env.MEMCTX_LLM_MODEL
   - Supports plain 'id' and 'provider/model' formats via ctx.modelRegistry
   - Warns + falls back to ctx.model if env value is invalid
   - Refactored 10 call sites that previously used ctx.model directly
   - completeJsonWithLlm and completeJsonWithModel accept optional model param

2. Async agent_end (fire-and-forget curate)
   - agent_end hook now wraps curate in 'void (async () => { ... })()'
   - Parent turn returns immediately; curate runs in the background
   - Errors logged via ctx.logger?.error, never thrown to host
   - Significantly improves UX with slow host models (opus, etc.)

3. Optional 'model:' frontmatter attribution
   - buildNote and llmArchitectureNote accept optional model parameter
   - Renders 'model: <id>' field only when supplied (backward compatible)
   - saveMemoryCandidate threads resolveLlmModel(ctx)?.id through
   - Enables audit trail of which model curated each memory

Tests: 10 new unit tests (5 resolveLlmModel + 2 buildNote + 2 llmArchitectureNote
+ 1 memctx_save integration). Total: 111 passing. 'bun run ci' green.

Backward compat: with MEMCTX_LLM_MODEL unset and no model passed, behavior is
identical to v0.13.1.
Copilot AI review requested due to automatic review settings May 14, 2026 15:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds opt-in LLM model selection and model attribution for pi-memctx memory generation, while making agent_end curation run asynchronously to reduce user-facing latency.

Changes:

  • Adds resolveLlmModel and routes LLM call sites through MEMCTX_LLM_MODEL fallback logic.
  • Adds optional model: frontmatter to generated notes and LLM architecture notes.
  • Converts agent_end memory curation to fire-and-forget background execution and adds related unit tests.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
index.ts Implements model resolution, model attribution, and async agent_end curation.
test/unit.test.ts Adds tests for model resolution and model frontmatter attribution.
.gitignore Ignores local .brv/ test artifact directory.
Comments suppressed due to low confidence (5)

index.ts:5314

  • Because the fire-and-forget task continues after this handler returns while activePack/activePackPath remain global mutable state, a pack switch or /memctx-init during the background LLM call can cause the turn's candidates to be saved or queued against the wrong pack. Snapshot the active pack/path for this task and ensure the save path uses that snapshot instead of the current globals.
		void (async () => {
			try {
				const messages = (event as any).messages ?? [];
				const { snippets, hasWrites, richDiscovery } = collectTurnSnippets(messages);
				if (snippets.length === 0) return;

index.ts:5403

  • The PR description says async curate errors are logged through ctx.logger?.error, but this implementation writes to console.error instead. If the host only captures extension logs via ctx.logger, background failures may be invisible to users/operators despite the advertised behavior.
			} catch (err) {
				console.error("[pi-memctx] curate failed (async)", err);
			}

index.ts:913

  • An invalid MEMCTX_LLM_MODEL logs a warning every time this helper is called. Several request paths call resolveLlmModel multiple times (and pack refresh can call it per repository), so one bad env value can flood logs; consider caching the resolution/warn result or warning once per value.
		const allModels = ctx.modelRegistry.getAll();
		const found = allModels.find((m) => m.id === envModel);
		if (found) return found;
		// Not found — warn and fall back to ctx.model
		console.warn(`[pi-memctx] MEMCTX_LLM_MODEL=${envModel} not found in registry, falling back to ctx.model`);

index.ts:5314

  • This newly fire-and-forget agent_end path is not covered by the added tests, even though it changes execution ordering and error handling. Please add a hook test that verifies the handler returns without awaiting curation and that background failures are captured/logged, so regressions in the async behavior are caught.
		void (async () => {
			try {
				const messages = (event as any).messages ?? [];
				const { snippets, hasWrites, richDiscovery } = collectTurnSnippets(messages);
				if (snippets.length === 0) return;

index.ts:905

  • When MEMCTX_LLM_MODEL is set, this assumes ctx.modelRegistry is always present. Several existing tests invoke changed call paths such as memctx_save with an empty context object, so running the suite with the new env var set will throw here instead of falling back; guard the registry access or update the affected contexts/tests to keep the env override truly opt-in.
	if (envModel) {
		// Try provider/model format first
		const slashIdx = envModel.indexOf("/");
		if (slashIdx > 0) {
			const provider = envModel.slice(0, slashIdx);
			const modelId = envModel.slice(slashIdx + 1);
			if (provider && modelId) {
				const resolved = ctx.modelRegistry.find(provider, modelId);
				if (resolved) return resolved;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread index.ts
Comment on lines +888 to +895
/**
* Resolve the LLM model to use for pi-memctx operations.
*
* Priority:
* 1. MEMCTX_LLM_MODEL env var (if set and found in registry)
* 2. ctx.model (pi's active model — existing behavior)
*/
export function resolveLlmModel(ctx: ExtensionContext): ExtensionContext["model"] {
Copy link
Copy Markdown
Contributor

@magnonta magnonta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I did a review of this alongside #70. The direction is solid and the problem is real — I just found a few things that need fixing before we can merge.

I posted design direction on the five open questions in #70, please check that first.


Blocking

1. resolveLlmModel needs to go through MemctxConfig / envOrConfig

The way I see it, this is the main architectural issue. The PR reads process.env.MEMCTX_LLM_MODEL directly on every call to resolveLlmModel(), but we already have the MemctxConfig / applyMemctxConfig() / envOrConfig() pattern for all 15 existing settings. Bypassing that creates three problems:

  • Inconsistent with how everything else is configured
  • No config.json support — user can't persist llmModel without env vars
  • console.warn fires on every call when the env value is invalid (flood)

The fix is straightforward: add llmModel?: string to MemctxConfig, resolve once in applyMemctxConfig() via envOrConfig(), and have resolveLlmModel use the already-resolved module-level variable.

2. confirm mode inside fire-and-forget

This one is a real bug. The entire agent_end body is wrapped in void (async () => { ... })(), but when autosaveMode === "confirm" it still calls ctx.ui.confirm() inside that block — after the turn already returned. ctx.ui.confirm needs an active turn to prompt the user, so this will either deadlock, throw, or silently fail.

The confirm path needs to run synchronously (before the void wrapper), or we skip curate entirely when in confirm mode.

3. Global state race condition

The fire-and-forget closure reads activePack and activePackPath by reference. If the user switches packs while background curate is still running, candidates get saved to the wrong pack. Snapshot both values at the top:

void (async () => {
    const pack = activePack;
    const packPath = activePackPath;
    // use these instead of globals
})();

4. memctx_save model attribution is wrong

memctx_save passes resolveLlmModel(ctx)?.id as the model field, but memctx_save is called by the host agent. The content was generated by the host model (e.g. Opus), not the internal memctx model (e.g. Haiku). It should be ctx.model?.id here. resolveLlmModel(ctx)?.id is correct for curate-generated and pack-generate notes — just not for memctx_save.


Important

5. ctx.signal in fire-and-forget

completeJsonWithModel passes ctx.signal to the LLM call. After the turn returns, the host may abort the signal and background curate fails silently. Maybe the fire-and-forget path should use its own AbortController, or at least handle signal abort gracefully.

6. Double resolveLlmModel calls

Several spots resolve the model for a guard check, then call completeJsonWithLlm which resolves it again internally. Example in judgeGatewayMemory:

const resolvedModel = resolveLlmModel(ctx);        // first
if (precisionRepoQuestion && resolvedModel) {
    await completeJsonWithModel(ctx, ...);          // resolves again inside
}

Same in buildRetrievalQueries, curateMemoryCandidatesFromTurn, and others. Each call hits process.env + getAll(). With the envOrConfig fix this becomes a non-issue, but worth cleaning up.

7. No tests for the async behavior

The 10 new tests cover resolveLlmModel and attribution well, but nothing verifies that agent_end actually returns without awaiting, that background errors are captured, or the confirm mode edge case.

8. /memctx-review approve attribution timing

The model is resolved at approve time, but the candidate was curated earlier — possibly in a different session with a different model config. The model should be captured at curate time and stored with the queued candidate, not resolved when the user approves.


Minor

9. .brv/ in .gitignore

This looks like a local test artifact from your workspace. Should go in .git/info/exclude rather than the shared .gitignore.

10. Indentation

The first few lines inside void (async () => { are re-indented, but the rest of the body keeps old indentation. Mixed indentation in the same block — not a runtime issue but makes the code inconsistent.

11. ctx.modelRegistry guard

If MEMCTX_LLM_MODEL is set but ctx.modelRegistry is absent (some test contexts), resolveLlmModel throws. Worth guarding.

12. Curate errors should show in /memctx-status

console.error alone for background failures is not enough. If curate is failing systematically, the user has no way to know. A counter in llmStats (e.g. curateErrors) that shows in /memctx-status would help.


Happy to review a v2 once these are addressed. The core ideas are good — just need to fit them into the existing patterns.

@ersintarhan
Copy link
Copy Markdown
Author

ersintarhan commented May 15, 2026

Addressed the review feedback in a5aaf8f.

What changed

  • Moved the internal LLM model override into the memctx config flow:
    • llmModel in config
    • MEMCTX_LLM_MODEL env override
    • model registry guard
    • one-time warning for invalid configured models
  • Fixed model attribution:
    • memctx_save now attributes to the host model (ctx.model?.id)
    • reviewed candidates preserve curate-time model attribution
  • Hardened async agent_end autosave:
    • snapshots pack/path/model before background work
    • skips detached UI confirm prompts
    • uses a separate abort signal
    • tracks background curate errors in /memctx-status
  • Removed shared .brv/ ignore entry and moved it to local .git/info/exclude.
  • Updated docs and added unit coverage for:
    • registry fallback
    • one-time invalid-model warnings
    • host-model attribution

Validation

  • bun run ci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Async agent_end + MEMCTX_LLM_MODEL env override + model attribution

3 participants