feat: add MEMCTX_LLM_MODEL env, async agent_end, model attribution#71
feat: add MEMCTX_LLM_MODEL env, async agent_end, model attribution#71ersintarhan wants to merge 2 commits into
Conversation
Three orthogonal improvements; fully backward compatible.
1. MEMCTX_LLM_MODEL env override
- New resolveLlmModel(ctx) helper checks process.env.MEMCTX_LLM_MODEL
- Supports plain 'id' and 'provider/model' formats via ctx.modelRegistry
- Warns + falls back to ctx.model if env value is invalid
- Refactored 10 call sites that previously used ctx.model directly
- completeJsonWithLlm and completeJsonWithModel accept optional model param
2. Async agent_end (fire-and-forget curate)
- agent_end hook now wraps curate in 'void (async () => { ... })()'
- Parent turn returns immediately; curate runs in the background
- Errors logged via ctx.logger?.error, never thrown to host
- Significantly improves UX with slow host models (opus, etc.)
3. Optional 'model:' frontmatter attribution
- buildNote and llmArchitectureNote accept optional model parameter
- Renders 'model: <id>' field only when supplied (backward compatible)
- saveMemoryCandidate threads resolveLlmModel(ctx)?.id through
- Enables audit trail of which model curated each memory
Tests: 10 new unit tests (5 resolveLlmModel + 2 buildNote + 2 llmArchitectureNote
+ 1 memctx_save integration). Total: 111 passing. 'bun run ci' green.
Backward compat: with MEMCTX_LLM_MODEL unset and no model passed, behavior is
identical to v0.13.1.
There was a problem hiding this comment.
Pull request overview
Adds opt-in LLM model selection and model attribution for pi-memctx memory generation, while making agent_end curation run asynchronously to reduce user-facing latency.
Changes:
- Adds
resolveLlmModeland routes LLM call sites throughMEMCTX_LLM_MODELfallback logic. - Adds optional
model:frontmatter to generated notes and LLM architecture notes. - Converts
agent_endmemory curation to fire-and-forget background execution and adds related unit tests.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
index.ts |
Implements model resolution, model attribution, and async agent_end curation. |
test/unit.test.ts |
Adds tests for model resolution and model frontmatter attribution. |
.gitignore |
Ignores local .brv/ test artifact directory. |
Comments suppressed due to low confidence (5)
index.ts:5314
- Because the fire-and-forget task continues after this handler returns while
activePack/activePackPathremain global mutable state, a pack switch or/memctx-initduring the background LLM call can cause the turn's candidates to be saved or queued against the wrong pack. Snapshot the active pack/path for this task and ensure the save path uses that snapshot instead of the current globals.
void (async () => {
try {
const messages = (event as any).messages ?? [];
const { snippets, hasWrites, richDiscovery } = collectTurnSnippets(messages);
if (snippets.length === 0) return;
index.ts:5403
- The PR description says async curate errors are logged through
ctx.logger?.error, but this implementation writes toconsole.errorinstead. If the host only captures extension logs viactx.logger, background failures may be invisible to users/operators despite the advertised behavior.
} catch (err) {
console.error("[pi-memctx] curate failed (async)", err);
}
index.ts:913
- An invalid
MEMCTX_LLM_MODELlogs a warning every time this helper is called. Several request paths callresolveLlmModelmultiple times (and pack refresh can call it per repository), so one bad env value can flood logs; consider caching the resolution/warn result or warning once per value.
const allModels = ctx.modelRegistry.getAll();
const found = allModels.find((m) => m.id === envModel);
if (found) return found;
// Not found — warn and fall back to ctx.model
console.warn(`[pi-memctx] MEMCTX_LLM_MODEL=${envModel} not found in registry, falling back to ctx.model`);
index.ts:5314
- This newly fire-and-forget
agent_endpath is not covered by the added tests, even though it changes execution ordering and error handling. Please add a hook test that verifies the handler returns without awaiting curation and that background failures are captured/logged, so regressions in the async behavior are caught.
void (async () => {
try {
const messages = (event as any).messages ?? [];
const { snippets, hasWrites, richDiscovery } = collectTurnSnippets(messages);
if (snippets.length === 0) return;
index.ts:905
- When
MEMCTX_LLM_MODELis set, this assumesctx.modelRegistryis always present. Several existing tests invoke changed call paths such asmemctx_savewith an empty context object, so running the suite with the new env var set will throw here instead of falling back; guard the registry access or update the affected contexts/tests to keep the env override truly opt-in.
if (envModel) {
// Try provider/model format first
const slashIdx = envModel.indexOf("/");
if (slashIdx > 0) {
const provider = envModel.slice(0, slashIdx);
const modelId = envModel.slice(slashIdx + 1);
if (provider && modelId) {
const resolved = ctx.modelRegistry.find(provider, modelId);
if (resolved) return resolved;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Resolve the LLM model to use for pi-memctx operations. | ||
| * | ||
| * Priority: | ||
| * 1. MEMCTX_LLM_MODEL env var (if set and found in registry) | ||
| * 2. ctx.model (pi's active model — existing behavior) | ||
| */ | ||
| export function resolveLlmModel(ctx: ExtensionContext): ExtensionContext["model"] { |
magnonta
left a comment
There was a problem hiding this comment.
Hey, I did a review of this alongside #70. The direction is solid and the problem is real — I just found a few things that need fixing before we can merge.
I posted design direction on the five open questions in #70, please check that first.
Blocking
1. resolveLlmModel needs to go through MemctxConfig / envOrConfig
The way I see it, this is the main architectural issue. The PR reads process.env.MEMCTX_LLM_MODEL directly on every call to resolveLlmModel(), but we already have the MemctxConfig / applyMemctxConfig() / envOrConfig() pattern for all 15 existing settings. Bypassing that creates three problems:
- Inconsistent with how everything else is configured
- No
config.jsonsupport — user can't persistllmModelwithout env vars console.warnfires on every call when the env value is invalid (flood)
The fix is straightforward: add llmModel?: string to MemctxConfig, resolve once in applyMemctxConfig() via envOrConfig(), and have resolveLlmModel use the already-resolved module-level variable.
2. confirm mode inside fire-and-forget
This one is a real bug. The entire agent_end body is wrapped in void (async () => { ... })(), but when autosaveMode === "confirm" it still calls ctx.ui.confirm() inside that block — after the turn already returned. ctx.ui.confirm needs an active turn to prompt the user, so this will either deadlock, throw, or silently fail.
The confirm path needs to run synchronously (before the void wrapper), or we skip curate entirely when in confirm mode.
3. Global state race condition
The fire-and-forget closure reads activePack and activePackPath by reference. If the user switches packs while background curate is still running, candidates get saved to the wrong pack. Snapshot both values at the top:
void (async () => {
const pack = activePack;
const packPath = activePackPath;
// use these instead of globals
})();4. memctx_save model attribution is wrong
memctx_save passes resolveLlmModel(ctx)?.id as the model field, but memctx_save is called by the host agent. The content was generated by the host model (e.g. Opus), not the internal memctx model (e.g. Haiku). It should be ctx.model?.id here. resolveLlmModel(ctx)?.id is correct for curate-generated and pack-generate notes — just not for memctx_save.
Important
5. ctx.signal in fire-and-forget
completeJsonWithModel passes ctx.signal to the LLM call. After the turn returns, the host may abort the signal and background curate fails silently. Maybe the fire-and-forget path should use its own AbortController, or at least handle signal abort gracefully.
6. Double resolveLlmModel calls
Several spots resolve the model for a guard check, then call completeJsonWithLlm which resolves it again internally. Example in judgeGatewayMemory:
const resolvedModel = resolveLlmModel(ctx); // first
if (precisionRepoQuestion && resolvedModel) {
await completeJsonWithModel(ctx, ...); // resolves again inside
}Same in buildRetrievalQueries, curateMemoryCandidatesFromTurn, and others. Each call hits process.env + getAll(). With the envOrConfig fix this becomes a non-issue, but worth cleaning up.
7. No tests for the async behavior
The 10 new tests cover resolveLlmModel and attribution well, but nothing verifies that agent_end actually returns without awaiting, that background errors are captured, or the confirm mode edge case.
8. /memctx-review approve attribution timing
The model is resolved at approve time, but the candidate was curated earlier — possibly in a different session with a different model config. The model should be captured at curate time and stored with the queued candidate, not resolved when the user approves.
Minor
9. .brv/ in .gitignore
This looks like a local test artifact from your workspace. Should go in .git/info/exclude rather than the shared .gitignore.
10. Indentation
The first few lines inside void (async () => { are re-indented, but the rest of the body keeps old indentation. Mixed indentation in the same block — not a runtime issue but makes the code inconsistent.
11. ctx.modelRegistry guard
If MEMCTX_LLM_MODEL is set but ctx.modelRegistry is absent (some test contexts), resolveLlmModel throws. Worth guarding.
12. Curate errors should show in /memctx-status
console.error alone for background failures is not enough. If curate is failing systematically, the user has no way to know. A counter in llmStats (e.g. curateErrors) that shows in /memctx-status would help.
Happy to review a v2 once these are addressed. The core ideas are good — just need to fit them into the existing patterns.
|
Addressed the review feedback in What changed
Validation
|
Closes #70
Motivation
Heavy host models (Claude Opus, etc.) make pi-memctx's
agent_endhook block the parent agent turn 10-30s on rich turns, degrading coding UX. Additionally, there's no way to delegate pi-memctx's internal LLM work (curate, gateway judge, query expansion, pack-generate) to a faster/cheaper model while keeping the host on a premium model for reasoning.Changes (three orthogonal, opt-in improvements)
1. Async
agent_end— highest impactHook now wraps curate logic in
void (async () => { ... })()so the parent turn returns immediately. Curate runs in the background; errors logged throughctx.logger?.error. Validated in production: opus host now feels instant on rich turns.2.
MEMCTX_LLM_MODELenv overrideNew
resolveLlmModel(ctx)helper:ctx.modelRegistrybyidorprovider/modelctx.modelif invalid (no hard fail)ctx.modelreads in 10 call sitescompleteJsonWithLlmandcompleteJsonWithModelalso accept an explicitmodelparam3. Optional
model:frontmatter attributionbuildNoteandllmArchitectureNoteaccept an optionalmodelparameter. When provided, frontmatter gets:Audit trail for which model wrote each memory. Old notes without the field continue to work.
Tests
test/unit.test.ts:resolveLlmModel(unset, by id, provider/model, invalid → warn + fallback, no model → undefined)buildNotemodel attribution (with model, without)llmArchitectureNotemodel attribution (with model, without)memctx_saveintegration test (saved note carriesmodel:field)bun run ci(typecheck + tests + e2e) greenBackward compatibility
Zero breaking changes:
MEMCTX_LLM_MODELunset + no explicitmodelparam → identical to v0.13.1agent_endhook signature unchangedmodel:frontmatter only rendered when suppliedReal-world validation
Patch deployed to a Sentirum dev workspace via symlink (
~/.pi/agent/extensions/pi-memctx) and run against a 49-doc pack with Claude Opus as the host model. Observed:model: <id>field (attribution proven)Notes
files: ['index.ts', ...])~/.pi/agent/extensions/*/index.tssymlink loader confirmed working for local dev iterationUse case for the env override
A typical setup that benefits:
This effectively splits the model budget: keep premium reasoning for the user-facing agent, delegate background memory work to a cheaper tier.