diff --git a/docs/beam-10m-smoke-plan.md b/docs/beam-10m-smoke-plan.md deleted file mode 100644 index 8088d81..0000000 --- a/docs/beam-10m-smoke-plan.md +++ /dev/null @@ -1,150 +0,0 @@ -# BEAM-10M Smoke + Multirun Plan - -**Date:** 2026-05-07 -**Branch:** `worktree-tbc-prototype` -**Status:** plan only. Smoke runs in T3.2 (post-T2.4 wire-up); full multirun in T4.1. - ---- - -## 1. Dataset acquisition - -**Source:** `Mohammadta/BEAM-10M` on HuggingFace. -**Shape:** 10 conversations × 200 total questions (20 per conv × 10 abilities × 2 each). -**Approx context:** ~1.4M tokens per conversation × 10 = ~14M tokens total. -**On-disk size:** ~140 MB JSON. - -Acquisition command: -```bash -huggingface-cli download Mohammadta/BEAM-10M --repo-type dataset \ - --local-dir atomicmemory-benchmarks/data/beam-10m -``` - -Then `src/eval/beam-10m-loader.ts::loadBeam10MDataset()` parses the local JSON. The current loader is a typed stub; the real implementation lands when the file is in place. - ---- - -## 2. Cost estimate - -Per `estimateBeamCost()` (per-seed, full 10-conv × 200-question run): - -| Component | Cost | -|---|---| -| Ingest (10 convs × ~150 facts × $0.002/fact) | ~$3.00 | -| Hierarchical summaries (10 convs × ~50 sessions × $0.001/session + $0.005/conv-summary) | ~$0.55 | -| Search + answer + judge (200 q × $0.10/q multi-iter) | ~$20.00 | -| **Per-seed total** | **~$23.55** | -| **Multirun n=3 with bootstrap CI** | **~$70.65** | - -Numbers are conservative — actual BEAM-100K runs came in ~50% under the per-question estimate. Likely real cost: **$60–80 for full n=3 multirun**. - -LiteLLM remaining budget: $68 of $100 → fits comfortably with no extra spend approval. - ---- - -## 3. Smoke test scope - -| Property | Value | -|---|---| -| Conversations | conv-1 only | -| Questions | 20 (smoke = full conv-1 question set) | -| Seeds | n=1 | -| Stack | H1.1 (token budget=4000, top_k=100) + classifier on + ability_hint OFF + TBC ON + hierarchical ON | -| Backbone | Haiku 4.5 via LiteLLM | -| Cost cap | **$5** (hard cap; abort if exceeded) | -| Wall-time cap | 30 minutes | - -**Why these knobs:** every Phase 1–2 lever set to its current best value; Phase 3 architectural additions (TBC + hierarchical) enabled to validate they don't crash at 10M scale. - -### Smoke success criteria - -| Criterion | Threshold | Action if missed | -|---|---|---| -| No crashes / unhandled errors | 0 | block T4.1, diagnose | -| Wall time | ≤ 30 min | tune top_k or skip TBC for multirun | -| LiteLLM cost | ≤ $5 | scale knobs down before multirun | -| AM ingest completes for 1.4M-token conversation | yes | check for token-limit errors; chunk if needed | -| All 20 questions answered (no parse-errors) | ≥ 18/20 | investigate judge prompt; lower threshold ok | -| Composite ≥ 0.40 on conv-1 | yes | gates the $70+ multirun | - -A composite of **0.40** on conv-1 is a sanity floor (matches conv-1 BEAM-100K performance). If conv-1 BEAM-10M is below 0.40, the architecture isn't reaching the right facts at scale and a multirun would burn budget on a known-bad config. - ---- - -## 4. Multirun protocol (T4.1) - -**Trigger:** smoke test passes all success criteria. 
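For concreteness, a minimal TypeScript sketch of how that smoke gate could be checked before firing the multirun — the `SmokeResult` shape and function name are illustrative only, not an existing module:

```typescript
// Hypothetical gate check for the conv-1 smoke run. The SmokeResult fields
// and the function name are illustrative only — no such module exists yet.
interface SmokeResult {
  unhandledErrors: number;
  wallTimeMinutes: number;
  litellmCostUsd: number;
  ingestCompleted: boolean;   // full 1.4M-token conv-1 ingest finished
  answeredQuestions: number;  // out of 20 (no parse-errors)
  composite: number;          // conv-1 composite
}

/** Empty return means every criterion passed and T4.1 may fire. */
function smokeGateViolations(r: SmokeResult): string[] {
  const violations: string[] = [];
  if (r.unhandledErrors > 0) violations.push('crashes / unhandled errors');
  if (r.wallTimeMinutes > 30) violations.push('wall time > 30 min');
  if (r.litellmCostUsd > 5) violations.push('LiteLLM cost > $5');
  if (!r.ingestCompleted) violations.push('1.4M-token ingest did not complete');
  if (r.answeredQuestions < 18) violations.push('fewer than 18/20 questions answered');
  if (r.composite < 0.40) violations.push('conv-1 composite below the 0.40 sanity floor');
  return violations;
}
```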
- -| Property | Value | -|---|---| -| Conversations | all 10 (full BEAM-10M canonical set) | -| Questions | 200 | -| Seeds | n=3 | -| Bootstrap | 95% CI over per-conv composites | -| Backbone | Haiku 4.5 | -| Hard cost cap | $130 (deny on overrun) | -| Hard wall-time cap | 8 hours | - -### Multirun seeds - -Seeds are different **answer-LLM seeds** + **session-summary-LLM seeds**, both forwarded via the existing seed plumbing. Ingest is deterministic per fixture so seeds only affect generation steps. - -### Result reporting - -| File | Contents | -|---|---| -| `data/beam-10m/results-T4.1-.json` | per-conv per-question per-seed verdicts | -| `data/beam-10m/results-T4.1--summary.md` | composite + per-ability breakdown + bootstrap CI | -| `verified-results-extended.csv` row | `AM TBC+hierarchical, Haiku 4.5, BEAM-10M, , n=3` | - -### Comparison targets - -| System | Reference | Source | -|---|---|---| -| Mem0 OSS BEAM-10M | 0.486 | `mem0ai/memory-benchmarks` published | -| Truncation baseline (this sprint, BEAM-10M) | TBD via T4.0 | needs separate $25 truncation run | -| AM H1.1 + TBC + hierarchical (this run) | TBD | T4.1 output | - ---- - -## 5. Decision gate after T4.1 - -| 3-conv composite mean | Interpretation | Next action | -|---|---|---| -| **≥ 0.55** | Strong SOTA over Mem0 0.486 (+0.06+) | Write **paper variant 5.1-A**: "AM beats Mem0 on BEAM-10M with TBC + hierarchical" | -| **0.49–0.55** | Marginal SOTA (+0.00–0.06) | Write **paper variant 5.1-B**: "AM matches/edges Mem0 on BEAM-10M with reproducible OSS architecture" | -| **0.45–0.49** | Tied within noise | Methodology paper (5.1-C) + BEAM-100K headline; framing notes 10M parity | -| **< 0.45** | Below Mem0 baseline | Pivot fully to methodology paper (5.1-C) using existing BEAM-100K data | - -**Probability estimates** (based on Haiku BEAM-100K +0.10 lift from H1.1 + Tier 2 unmeasured architectural lift): - -| Outcome | Probability | -|---|---| -| ≥ 0.55 (clean SOTA) | ~25% | -| 0.49–0.55 (marginal SOTA) | ~30% | -| 0.45–0.49 (tied) | ~25% | -| < 0.45 (below baseline) | ~20% | - -Combined "we hit SOTA-or-tied" probability: **~80%.** Fallback paper exists either way. - ---- - -## 6. Pre-T4.1 dependencies - -Before firing the multirun, these must land: - -| Dependency | Status | Blocker for T4.1? | -|---|---|---| -| T2.4: hierarchical arm wired into memory-search.ts RRF fusion | not yet defined | yes (without this, hierarchical does nothing) | -| TBC dual-write hook installed in production runtime | not yet defined | only if TBC is in the run | -| BEAM-10M dataset downloaded locally | T3.2 follow-up | yes | -| AM server restarted with `HIERARCHICAL_RETRIEVAL_ENABLED=true` and `TBC_ENABLED=true` | env config | yes | -| BEAM-10M smoke (conv-1) passes all criteria | T3.2 | yes | - ---- - -## 7. What we are NOT doing in this sprint - -- BEAM-1M tier (managed-platform-only Mem0 number, unreproducible) -- Cross-backbone GPT-5 BEAM-10M (cost-prohibitive given the existing $32 spent + remaining headroom) -- Multi-bench validation (LongMemEval, MultiSessionChat) — defer to follow-up sprint -- Custom retrievers / trained classifiers (research-grade, multi-month) diff --git a/docs/hierarchical-retrieval.md b/docs/hierarchical-retrieval.md deleted file mode 100644 index ea8479f..0000000 --- a/docs/hierarchical-retrieval.md +++ /dev/null @@ -1,239 +0,0 @@ -# Hierarchical Retrieval — Design (T2.1) - -**Date:** 2026-05-06 -**Branch:** `worktree-tbc-prototype` -**Status:** design only. 
Implementation in T2.2 (session-summary generation) and T2.3 (5th RRF arm). -**Target:** BEAM-10M tier (10 conversations × ~1.4M tokens each = ~14M total context per system). - ---- - -## Why hierarchical retrieval is needed for BEAM-10M - -**BEAM-100K** (the tier we've been running) has 3 conversations × ~33k tokens each = ~100k total context. Top-K=100 over a flat vector index can plausibly recall the right facts. - -**BEAM-10M** has 10 conversations × ~1.4M tokens each ≈ **14M total context**. The fact store grows to ~3,000–5,000 atomic claims per system. A flat vector retrieval over 5,000 facts at top-K=100 returns 2% of the store; if the right facts are not in that 2%, the answer LLM has nothing useful. - -Hindsight publishes 0.486 BEAM-10M with their TEMPR architecture (4 retrieval arms + cross-encoder rerank + token budget). Their advantage at this tier is precisely the **multi-arm retrieval** that separately surfaces (a) topically similar facts, (b) lexically matching facts, (c) entity-graph-connected facts, (d) temporally-relevant facts. - -We have arms (a) and (b) (vector + BM25). We're missing the **hierarchical** retrieval shape that handles 14M-token context: **first pick the right conversation/session, then expand to atomic facts within**. Without this, the 4th and 5th RRF arms (temporal, graph) are also recall-bound by the same flat-store problem. - ---- - -## The architecture - -### Three-level memory hierarchy - -``` - conversation (level 2) - ├── conv_summary (~200 tokens, embedded) - │ - ├── session (level 1) - │ ├── session_summary (~100 tokens, embedded) - │ ├── session_topics (existing FactMetadata field) - │ │ - │ └── atomic claim (level 0 — existing memories table) - │ ├── content - │ ├── embedding - │ ├── classifier metadata (existing) - │ └── belief state (TBC Phase 3 — confidence, tier, edges) -``` - -Level 0 already exists: the `memories` table with per-claim atomic storage. -Level 1 (session) and Level 2 (conversation) are new. 
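As an application-level view of the two new levels (interface names are assumptions, not existing code; the SQL below is the authoritative shape):

```typescript
// Assumed application-level row shapes for the two new summary levels.
// They mirror the session_summaries / conv_summaries tables defined below.
interface SessionSummaryRow {
  id: string;
  userId: string;
  conversationId: string;
  sessionIndex: number;
  summaryText: string;          // ~100 tokens, LLM-generated
  summaryEmbedding: number[];
  topics: string[];
  factCount: number;
  occurredStart: Date | null;
  occurredEnd: Date | null;
}

interface ConvSummaryRow {
  id: string;
  userId: string;
  conversationId: string;
  summaryText: string;          // ~200 tokens, LLM-generated
  summaryEmbedding: number[];
  sessionCount: number;
  factCount: number;
  occurredStart: Date | null;
  occurredEnd: Date | null;
}
```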
- -### Schema additions - -**New tables** (Phase 5 of TBC roadmap): - -```sql -CREATE TABLE IF NOT EXISTS session_summaries ( - id UUID PRIMARY KEY DEFAULT gen_random_uuid(), - user_id TEXT NOT NULL, - session_id TEXT NOT NULL, -- BEAM-style session anchor - conversation_id TEXT NOT NULL, - session_index INTEGER NOT NULL, - summary_text TEXT NOT NULL, -- LLM-generated, ~100 tokens - summary_embedding vector({{EMBEDDING_DIMENSIONS}}) NOT NULL, - topics TEXT[] NOT NULL DEFAULT '{}', -- denormalized from session_topics metadata - fact_count INTEGER NOT NULL DEFAULT 0, - occurred_start TIMESTAMPTZ DEFAULT NULL, - occurred_end TIMESTAMPTZ DEFAULT NULL, - created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - workspace_id UUID DEFAULT NULL, - agent_id UUID DEFAULT NULL -); - -CREATE TABLE IF NOT EXISTS conv_summaries ( - id UUID PRIMARY KEY DEFAULT gen_random_uuid(), - user_id TEXT NOT NULL, - conversation_id TEXT NOT NULL, - summary_text TEXT NOT NULL, -- LLM-generated, ~200 tokens - summary_embedding vector({{EMBEDDING_DIMENSIONS}}) NOT NULL, - session_count INTEGER NOT NULL DEFAULT 0, - fact_count INTEGER NOT NULL DEFAULT 0, - occurred_start TIMESTAMPTZ DEFAULT NULL, - occurred_end TIMESTAMPTZ DEFAULT NULL, - created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - workspace_id UUID DEFAULT NULL, - agent_id UUID DEFAULT NULL -); - -CREATE INDEX IF NOT EXISTS idx_session_summaries_user_conv - ON session_summaries (user_id, conversation_id, session_index); -CREATE INDEX IF NOT EXISTS idx_session_summaries_embedding - ON session_summaries USING hnsw (summary_embedding vector_cosine_ops) - WITH (m = 16, ef_construction = 200); -CREATE INDEX IF NOT EXISTS idx_conv_summaries_user - ON conv_summaries (user_id, conversation_id); -CREATE INDEX IF NOT EXISTS idx_conv_summaries_embedding - ON conv_summaries USING hnsw (summary_embedding vector_cosine_ops) - WITH (m = 16, ef_construction = 200); -``` - ---- - -## When summaries are generated - -| Granularity | Trigger | Latency budget | Cost per summary | -|---|---|---|---| -| Session summary | end-of-session ingest (last batch chunk lands) | 1-3 s | ~$0.001 (Haiku, ~500 input tokens) | -| Conversation summary | end-of-conversation ingest (final session ingested) | 2-5 s | ~$0.005 (Haiku, ~2000 input tokens) | - -The LLM call is gated by config flag `HIERARCHICAL_RETRIEVAL_ENABLED`. When off, no summaries generated, no rows written. - -For BEAM-10M: -- 10 conversations × ~50 sessions each = 500 session summaries × $0.001 = $0.50 -- 10 conversation summaries × $0.005 = $0.05 -- **Total summary-generation cost: ~$0.55 per system per BEAM-10M run.** Negligible. 
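A minimal sketch of the end-of-session hook as it might land in T2.2 — the helper names and deps wiring here are assumptions, not existing code:

```typescript
// Sketch of the T2.2 end-of-session hook (names are assumptions, not existing code).
interface IngestedFact { id: string; content: string; observedAt: Date | null; }

interface SessionSummaryDeps {
  hierarchicalRetrievalEnabled: boolean;                                  // HIERARCHICAL_RETRIEVAL_ENABLED
  llmSummarize: (facts: readonly IngestedFact[]) => Promise<string>;      // ~100-token Haiku call
  embed: (text: string) => Promise<number[]>;
  insertSessionSummary: (row: Record<string, unknown>) => Promise<void>;  // writes session_summaries
}

async function onSessionIngestComplete(
  deps: SessionSummaryDeps,
  userId: string,
  conversationId: string,
  sessionId: string,
  sessionIndex: number,
  facts: readonly IngestedFact[],
): Promise<void> {
  if (!deps.hierarchicalRetrievalEnabled || facts.length === 0) return; // flag off → no row written
  const summaryText = await deps.llmSummarize(facts);
  const summaryEmbedding = await deps.embed(summaryText);
  const observed = facts.flatMap(f => (f.observedAt ? [f.observedAt.getTime()] : []));
  await deps.insertSessionSummary({
    userId, conversationId, sessionId, sessionIndex,
    summaryText, summaryEmbedding, factCount: facts.length,
    occurredStart: observed.length ? new Date(Math.min(...observed)) : null,
    occurredEnd: observed.length ? new Date(Math.max(...observed)) : null,
  });
}
```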
- ---- - -## How retrieval changes — the 5th RRF arm - -### Existing 3-arm pipeline (current AM) -``` -query - ├── vector arm → top-K via pgvector cosine on embeddings - ├── BM25 arm → top-K via tsvector - └── (existing extras: lessons gate, consensus filter) - ↓ - RRF fusion - ↓ - top-K facts -``` - -### New 5-arm pipeline (T2.3) -``` -query - ├── vector arm (existing) - ├── BM25 arm (existing) - ├── temporal arm (H1.2 — pending; date-range parsed from query) - ├── graph arm (H1.3 — pending; entity-link 1-hop expansion) - └── HIERARCHICAL arm (this doc) - ├── stage 1: vector-search conv_summaries → top-3 conversations - ├── stage 2: vector-search session_summaries WHERE conv ∈ stage-1 → top-10 sessions - ├── stage 3: vector-search memories WHERE session ∈ stage-2 → top-50 facts - └── candidates feed into RRF as a 5th arm - ↓ - RRF fusion (k=60) - ↓ - top-300 candidates → cross-encoder rerank - ↓ - fill prompt to SEARCH_TOKEN_BUDGET tokens -``` - -### Why the hierarchical arm complements vector - -Vector arm answers "which atomic facts are most similar to the query?" — fast at small scale, drowns in a 5000-fact store. - -Hierarchical arm answers "which session's *gist* matches the query, and what facts live in that session?" — gives the answerer a coherent slice of conversation rather than a scattered sample. - -For "What did we agree on at the project kickoff?" — vector might return 100 unrelated atoms; hierarchical filters to the kickoff-session summary first, then expands within. - -For "What did we discuss about the API design over the last month?" — hierarchical surfaces 3-5 sessions whose summaries mention "API"; vector arm picks the specific atomic facts inside those sessions; both go through RRF. - ---- - -## Per-ability hypotheses - -Hierarchical arm is hypothesised to lift these BEAM abilities: - -| Ability | Why hierarchical helps | -|---|---| -| **MSR** (multi-session reasoning) | The arm explicitly surfaces 3+ distinct sessions before atomic expansion — exactly what MSR questions need. Currently broken at 0/6 across our runs. | -| **EO** (event ordering) | Session summaries carry `occurred_start`/`occurred_end` time anchors; ordering is geometric. Currently 0/6. | -| **TR** (temporal reasoning) | "Last month" → conv_summary filter by occurred_at range → expand. Cheaper than the standalone temporal RRF arm. | -| **SUM** (summarization) | Conv summaries ARE the summarisation — the arm directly returns them when SUM queries hit. | - -Conservatively expected lift: **+0.10 on BEAM-10M composite** from MSR/EO alone, more if SUM benefits. - ---- - -## Why this is a 5th arm, not a replacement - -A fully hierarchical retriever (LLM walks the document tree, paradigm 5 in the 19-system survey) replaces vector entirely. We **do not** want that — it's slow and expensive. Hindsight, Mem0, and other paradigm-4 systems found that hierarchical retrieval works best as a **fused arm** alongside vector + BM25 + temporal, not as the dominant strategy. - -Our hierarchical arm: -- Returns ~50 atomic-level candidates (same scale as the other arms) -- Goes through the existing RRF fusion (k=60) -- Per-arm weights stay equal (per Hindsight's empirical finding) -- Cross-encoder reranks the union - -The arm earns its slot in the union by surfacing facts that vector+BM25 miss because they're more similar to the *session gist* than to the literal query. 
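A minimal sketch of the three-stage arm as it could be wired in T2.3 — table names follow the schema above, but the function itself, the `memories.session_id` column, and the pool wiring are assumptions, not existing code:

```typescript
// Sketch of the T2.3 hierarchical arm. Candidates feed into RRF as the 5th arm.
import pg from 'pg';

interface ArmCandidate { memoryId: string; rank: number; }

async function hierarchicalArm(
  pool: pg.Pool,
  userId: string,
  queryEmbedding: number[],
): Promise<ArmCandidate[]> {
  const vec = `[${queryEmbedding.join(',')}]`;

  // Stage 1: top-3 conversations by conv-summary cosine similarity.
  const convs = await pool.query(
    `SELECT conversation_id FROM conv_summaries
      WHERE user_id = $1
      ORDER BY summary_embedding <=> $2::vector LIMIT 3`,
    [userId, vec],
  );
  const convIds = convs.rows.map(r => r.conversation_id);
  if (convIds.length === 0) return [];

  // Stage 2: top-10 sessions inside those conversations.
  const sessions = await pool.query(
    `SELECT session_id FROM session_summaries
      WHERE user_id = $1 AND conversation_id = ANY($2)
      ORDER BY summary_embedding <=> $3::vector LIMIT 10`,
    [userId, convIds, vec],
  );
  const sessionIds = sessions.rows.map(r => r.session_id);
  if (sessionIds.length === 0) return [];

  // Stage 3: top-50 atomic facts inside those sessions. When TBC Phase 3 lands,
  // a belief-tier filter (e.g. AND belief_tier <> 'retracted') is added here.
  const facts = await pool.query(
    `SELECT id FROM memories
      WHERE user_id = $1 AND session_id = ANY($2)
      ORDER BY embedding <=> $3::vector LIMIT 50`,
    [userId, sessionIds, vec],
  );
  return facts.rows.map((r, i) => ({ memoryId: r.id, rank: i + 1 }));
}
```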
- ---- - -## Cost model for BEAM-10M - -| Phase | Cost | -|---|---| -| Ingest (10 convs × ~50 sessions × ~150 facts) | ~$5 LLM (existing pipeline) | -| Session-summary generation | ~$0.55 | -| Conv-summary generation | ~$0.05 | -| Question phase (200 questions × multi-iter answer + judge) | ~$25-40 | -| **Total per seed** | **~$30-45** | -| Multirun n=3 with bootstrap CI | **~$90-135** | - -Fits within remaining LiteLLM budget ($68 of $100). - ---- - -## Implementation order (after Phase 3 schema lands) - -| Step | Task | Output | -|---|---|---| -| 1 | T2.1 design (this doc) | ✓ | -| 2 | T2.2: session-summary generation in ingest pipeline | new module `session-summary-generator.ts`; gated by env flag | -| 3 | T2.2: conv-summary generation hook | added to ingest pipeline finalizer | -| 4 | Schema migration: append session_summaries + conv_summaries to schema.sql | (additive, IF NOT EXISTS) | -| 5 | T2.3: hierarchical retrieval arm in memory-search | wired into RRF fusion; gated by `HIERARCHICAL_RETRIEVAL_ENABLED` | -| 6 | Smoke test on BEAM-100K (single conv) | confirm no regression when flag off; lift on MSR/EO when on | -| 7 | T3.1: BEAM-10M smoke (single conv) | end-to-end at scale | - -Steps 2-5 are ~2-3 weeks of focused engineering. Step 6 is the regression gate before BEAM-10M cost. - ---- - -## Open questions for implementation - -1. **Summary prompt template.** Should session summaries be facts-style ("Alice mentioned she's switching to TypeScript; team agreed on Postgres") or topics-style ("TypeScript migration discussion; database choice debate")? Topics-style aligns with `session_topics` metadata; facts-style is more retrievable. Recommend topics-style for v1. - -2. **Re-summarisation under update.** When a session has new facts ingested after summary generation (e.g., late-arriving messages), do we regenerate the summary? Default: yes, replace; the table's `created_at` reflects the latest gen. - -3. **Cross-conversation summary.** A user's "career trajectory" might span multiple conversations. Should there be a higher-level "user_summary" tier? Defer to Phase 6+; not needed for BEAM-10M. - -4. **Hot-path latency.** Hierarchical arm adds 2 vector lookups + 1 SQL filter per query. With HNSW indexes both lookups are <50ms. Total query latency: vector(50ms) + BM25(20ms) + hierarchical(120ms) + temporal(30ms) + graph(150ms) + RRF(5ms) + rerank(80ms) = ~450ms. Acceptable for non-realtime BEAM eval. - -5. **Belief-tier integration.** When TBC Phase 3 lands, hierarchical retrieval should respect `belief_tier`. Concretely: stage-3 atomic-fact retrieval filters out `tier='retracted'`; directives surface first. This is a one-line WHERE clause addition in T2.3. - ---- - -## Why this is the right next step (not Phase 3 of TBC) - -Two architectural commitments are in flight: -- **TBC Phase 3** (T1.2, T1.3) — typed belief operators with a queryable graph. -- **Hierarchical retrieval** (T2) — multi-level summary indexing for BEAM-10M scale. - -These are **orthogonal** — TBC Phase 3 changes WHAT we store; hierarchical changes HOW we retrieve. They compose cleanly: hierarchical retrieval at stage 3 (atomic) reads the new TBC `belief_tier` column to skip retracted claims and surface directives. - -The right ordering: TBC Phase 3 schema first (T1.2 done), then hierarchical retrieval implementation (T2.2 + T2.3), then dual-write integration (T1.3) so the search layer can read both new shapes simultaneously. Both ship together as the "BEAM-10M architecture commit." 
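For reference, the RRF fusion referenced throughout (k=60, equal per-arm weights) in sketch form — the production fusion in memory-search already implements this for the existing arms; this is only an illustration of the scoring:

```typescript
// Reciprocal Rank Fusion: score(d) = Σ over arms of 1 / (k + rank(d)), equal weights.
function rrfFuse(arms: ReadonlyArray<ReadonlyArray<string>>, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of arms) {
    ranking.forEach((id, idx) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + idx + 1)); // ranks are 1-based
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```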
diff --git a/docs/superpowers/plans/2026-05-11-beam-085-phase0-phase1.md b/docs/superpowers/plans/2026-05-11-beam-085-phase0-phase1.md deleted file mode 100644 index 50d5cd6..0000000 --- a/docs/superpowers/plans/2026-05-11-beam-085-phase0-phase1.md +++ /dev/null @@ -1,2196 +0,0 @@ -# BEAM 0.85+ — Phase 0 + Phase 1 Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Implement Phase 0 (L1-patched, L3-deleted, baseline relock) and Phase 1 (async Reflect step via Sonnet with `session_reflections` table + `## OBSERVATIONS` retrieval channel) per `docs/superpowers/specs/2026-05-11-beam-085-anthropic-only-design.md`. - -**Architecture:** Phase 0 is surgical cleanup of the broken L3 mechanism and tightening of the L1 over-constraint that hurt event-ordering. Phase 1 adds an async Sonnet-driven consolidation step that runs at session boundaries via a Postgres-backed job queue, writes synthesized observations to a new table, and surfaces them at retrieval time as a separate prompt channel routed by question type. - -**Tech Stack:** TypeScript ESM, Express, Postgres + pgvector, Anthropic SDK (Haiku 4.5 + Sonnet 4.6), Vitest, Docker compose. - -**Gates:** Phase 0 must PASS (composite Δ ≥ +0.05 vs prev baseline 0.411 AND no per-ability regression > 0.10 at 4-conv n=80) before Phase 1 implementation begins. Phase 1 must PASS the same gate before we re-enter brainstorming for Phase 2. - -**Working directory:** `/Users/moralespanitz/me/supernet/atomicmemory-core/.claude/worktrees/tbc-prototype` - -**Constraints (from CLAUDE.md):** -- TypeScript ESM, no `any` -- Files ≤ 400 lines (excluding comments) -- Functions ≤ 40 lines (excluding catch/finally) -- No `process.env` reads outside `src/config.ts` -- Mutations fail closed (no silent fallback) -- JSDoc at top of every file -- Pre-commit: `npx tsc --noEmit`, `npm test`, `fallow --no-cache` - ---- - -## File Structure - -### Phase 0 — modify only - -| File | Change | -|---|---| -| `src/services/answer-format.ts` | Patch ORDERED_LIST hint (relax "EXACTLY"). 
Tighten classifier to require both `list` AND a numeric/spelled-out token | -| `src/services/__tests__/answer-format.test.ts` | New cases for patched behavior | -| `src/services/counter-edge-surface.ts` | **DELETE** | -| `src/services/__tests__/counter-edge-surface.test.ts` | **DELETE** | -| `src/config.ts` | Remove `counterEdgeSurfaceEnabled` flag | -| `src/app/runtime-container.ts` | Remove `counterEdgeSurfaceEnabled` from interface and unused belief-edges retrieval wiring | -| `src/services/search-pipeline.ts` | Remove call to `maybeSurfaceCounterEdges` | -| `src/services/retrieval-format.ts` | Remove `[CONTRADICTS prior fact]` marker emission | -| `src/db/repository-types.ts` | Remove `counterOf?: string` from `SearchResult` | - -### Phase 1 — create - -| File | Role | -|---|---| -| `src/db/migrations/20260512_session_reflections.sql` | Schema for `session_reflections` + `reflection_jobs` | -| `src/db/reflections-repository.ts` | CRUD + cosine-similarity search on `session_reflections` | -| `src/db/reflection-jobs-repository.ts` | Postgres-backed job queue ops | -| `src/services/reflect-prompts.ts` | Sonnet system prompt + Anthropic tool-use schema for consolidation | -| `src/services/reflect.ts` | Orchestrator: load session memories → call Sonnet → parse → persist | -| `src/services/reflect-jobs.ts` | Worker: poll queue, run reflect, mark status | -| `src/services/reflect-retrieval.ts` | Query-time top-K reflection fetch by cosine similarity | -| `src/services/__tests__/reflect.test.ts` | Unit tests for orchestrator | -| `src/services/__tests__/reflect-jobs.test.ts` | Worker unit tests | -| `src/services/__tests__/reflect-retrieval.test.ts` | Retrieval-side tests | -| `src/db/__tests__/reflections-repository.test.ts` | Repo integration tests (real Postgres) | -| `src/db/__tests__/reflection-jobs-repository.test.ts` | Queue integration tests | - -### Phase 1 — modify - -| File | Change | -|---|---| -| `src/services/memory-ingest.ts` | After AUDN commit, enqueue `reflection_jobs` row for `(userId, conversationId)` | -| `src/services/search-pipeline.ts` | When query classifier matches SUM/KU/MSR/CR/PF/IE, fetch top-5 reflections and pass through stores to retrieval-format | -| `src/services/retrieval-format.ts` | Emit `## OBSERVATIONS` prompt channel when reflections are present | -| `src/db/stores.ts` | Add `reflections: ReflectionsRepository` and `reflectionJobs: ReflectionJobsRepository` | -| `src/app/runtime-container.ts` | Instantiate the new repos and start the reflect-jobs worker | -| `src/config.ts` | Add `REFLECT_ENABLED`, `REFLECT_MODEL`, `REFLECT_MAX_OBSERVATIONS`, `REFLECT_JOB_POLL_MS`, `REFLECT_DEBOUNCE_MS`, `REFLECT_RETRIEVAL_TOP_K` | -| `src/routes/reflect.ts` | NEW route file mounted at `/v1/reflect/flush` for synchronous benchmark-mode flush | - -### Validation env files (Phase 0 + Phase 1) - -| File | Role | -|---|---| -| `.env.phase0-l1patched` | Kept stack + L1-patched (`ANSWER_FORMAT_ALIGNMENT_ENABLED=true`) + L3-deleted. Port 3102/5502. | -| `.env.phase1-reflect` | Phase 0 + `REFLECT_ENABLED=true`, `REFLECT_MODEL=claude-sonnet-4-5`. Port 3103/5503. | - ---- - -# PHASE 0 — Foundation cleanup - -Goal: relock baseline above 0.411 (kept-stack reproduction) by patching L1's toxic ORDERED_LIST rule and deleting L3's broken `[CONTRADICTS]` marker mechanism. 
- -## Task 0.1: Patch L1 ORDERED_LIST hint to allow partial answers - -**Files:** -- Modify: `src/services/answer-format.ts` (line 67, FORMAT_HINTS map) -- Test: `src/services/__tests__/answer-format.test.ts` - -- [ ] **Step 1: Write the failing test for the new hint wording** - -Edit `src/services/__tests__/answer-format.test.ts`. Add this test inside the existing `describe('classifyQuestion', () => {})` block at the bottom: - -```typescript -describe('getOutputFormatHint (patched)', () => { - it('ORDERED_LIST hint allows partial answers when items < requested count', () => { - const hint = getOutputFormatHint(QuestionType.ORDERED_LIST); - expect(hint).toContain('if retrievable'); - expect(hint.toLowerCase()).not.toMatch(/exactly the count requested/); - }); -}); -``` - -- [ ] **Step 2: Run the test to verify it fails** - -```bash -cd /Users/moralespanitz/me/supernet/atomicmemory-core/.claude/worktrees/tbc-prototype -npx vitest run src/services/__tests__/answer-format.test.ts -t "ORDERED_LIST hint allows partial" -``` - -Expected: FAIL — current hint contains "EXACTLY the count requested". - -- [ ] **Step 3: Patch the hint in `answer-format.ts`** - -Replace the ORDERED_LIST entry in the `FORMAT_HINTS` map (currently at line ~67): - -```typescript - [QuestionType.ORDERED_LIST]: - "FORMAT: Numbered list. Include EXACTLY the count requested if retrievable from the facts; otherwise list only the items that ARE retrievable and state that fewer than N are available. Format: '1) {item}, 2) {item}, ...'", -``` - -- [ ] **Step 4: Run the test to verify it passes** - -```bash -npx vitest run src/services/__tests__/answer-format.test.ts -t "ORDERED_LIST hint allows partial" -``` - -Expected: PASS. - -- [ ] **Step 5: Run the full answer-format suite to verify no regressions** - -```bash -npx vitest run src/services/__tests__/answer-format.test.ts -``` - -Expected: 11/11 (10 existing + 1 new) PASS. - -- [ ] **Step 6: Commit** - -```bash -git add src/services/answer-format.ts src/services/__tests__/answer-format.test.ts -git commit -m "fix(answer-format): relax ORDERED_LIST hint to allow partial answers - -Sprint 3 L1 diagnostic showed the strict 'EXACTLY the count requested' -phrasing forced Haiku to fabricate items when retrieved facts fell short -of the count. This hurt event_ordering by -0.175 on conv 2 n=20. -Updated hint instructs the model to enumerate only retrievable items -and explicitly state the shortage when count cannot be reached." -``` - -## Task 0.2: Tighten L1 classifier to require numeric token for ORDERED_LIST - -**Files:** -- Modify: `src/services/answer-format.ts` (line 38, ORDERED_LIST_PATTERN) -- Test: `src/services/__tests__/answer-format.test.ts` - -- [ ] **Step 1: Add failing test cases** - -In `src/services/__tests__/answer-format.test.ts`, add inside the `describe('classifyQuestion', () => {})` block: - -```typescript - it('does NOT classify "list common errors" as ORDERED_LIST (no numeric token)', () => { - expect(classifyQuestion('What are some common responses when an API fails? List them.')).toBe( - QuestionType.OTHER, - ); - }); - - it('classifies "list five items in order" as ORDERED_LIST (numeric token present)', () => { - expect(classifyQuestion('Can you list five items in order?')).toBe(QuestionType.ORDERED_LIST); - }); - - it('classifies "Mention ONLY three items" as ORDERED_LIST (numeric token present)', () => { - expect(classifyQuestion('List them in order. 
Mention ONLY three items.')).toBe( - QuestionType.ORDERED_LIST, - ); - }); -``` - -- [ ] **Step 2: Run to verify the new tests fail** - -```bash -npx vitest run src/services/__tests__/answer-format.test.ts -t "ORDERED_LIST" -``` - -Expected: 2 fail (the "common errors" case currently MATCHES ORDERED_LIST because of the loose pattern, and "Mention ONLY three" doesn't match because no "in order"/sequence/chronological token). - -- [ ] **Step 3: Tighten the regex in `answer-format.ts`** - -Replace the ORDERED_LIST_PATTERN constant (line ~38) with: - -```typescript -// Requires either: -// (a) "list ... in order" / "order in which" / "chronological" — explicit ordering verb, OR -// (b) ordering verb + a spelled-out or digit count token ("three", "5", "ONLY five items") -// This prevents false-positives on generic "list X" / "list common errors" queries. -const ORDERED_LIST_NUMERIC = /\b(\d+|one|two|three|four|five|six|seven|eight|nine|ten)\b/i; -const ORDERED_LIST_HINT = /\b(list|sequence|order|chronological|mention)\b/i; -const ORDERED_LIST_EXPLICIT = /\b(list\s+(?:.*?\s+)?in order|order in which|chronological order)\b/i; -``` - -Then replace the corresponding classifier branch (line ~54) with: - -```typescript - if (ORDERED_LIST_EXPLICIT.test(query)) return QuestionType.ORDERED_LIST; - if (ORDERED_LIST_HINT.test(query) && ORDERED_LIST_NUMERIC.test(query)) { - return QuestionType.ORDERED_LIST; - } -``` - -(Place these BEFORE the existing CONTRADICTION/SUMMARY branches — priority must be preserved.) - -- [ ] **Step 4: Run the new tests to verify they pass** - -```bash -npx vitest run src/services/__tests__/answer-format.test.ts -t "ORDERED_LIST" -``` - -Expected: all PASS, including the existing 1 from prior tasks. - -- [ ] **Step 5: Run the full module suite** - -```bash -npx vitest run src/services/__tests__/answer-format.test.ts -``` - -Expected: 14/14 PASS. - -- [ ] **Step 6: tsc clean** - -```bash -npx tsc --noEmit -``` - -Expected: no errors. - -- [ ] **Step 7: Commit** - -```bash -git add src/services/answer-format.ts src/services/__tests__/answer-format.test.ts -git commit -m "fix(answer-format): require numeric token for ORDERED_LIST classification - -Sprint 3 conv 2 diagnostic: 'What are some common responses' falsely -matched the loose ORDERED_LIST regex via the bare 'list' verb, then -triggered the count-enforcement hint and damaged instruction_following -(-0.50 on that question alone). Tightened classifier to require either -an explicit 'in order' phrase or a numeric/spelled-out count token -alongside the list verb. Adds two negative-case tests." 
-``` - -## Task 0.3: Delete L3 (counter-edge-surface) module and its config flag - -**Files:** -- Delete: `src/services/counter-edge-surface.ts` -- Delete: `src/services/__tests__/counter-edge-surface.test.ts` -- Modify: `src/config.ts` (remove `counterEdgeSurfaceEnabled`) -- Modify: `src/app/runtime-container.ts` (remove from interface) -- Modify: `src/services/search-pipeline.ts` (remove `maybeSurfaceCounterEdges` call) -- Modify: `src/services/retrieval-format.ts` (remove `[CONTRADICTS prior fact]` marker) -- Modify: `src/db/repository-types.ts` (remove `counterOf` field from `SearchResult`) - -- [ ] **Step 1: Verify nothing other than these files imports counter-edge-surface** - -```bash -grep -rn "counter-edge-surface\|CounterEdgeSurface\|counterEdgeSurfaceEnabled\|counterOf" src/ \ - --include="*.ts" | grep -v __tests__ | grep -v counter-edge-surface.ts -``` - -Expected output (only these references): -- `src/config.ts` — flag declaration -- `src/app/runtime-container.ts` — flag in interface -- `src/services/search-pipeline.ts` — function call -- `src/services/retrieval-format.ts` — marker emission + `counterOf` checks -- `src/db/repository-types.ts` — field definition - -If you see more, STOP and consult the user. - -- [ ] **Step 2: Delete the module and its tests** - -```bash -rm src/services/counter-edge-surface.ts -rm src/services/__tests__/counter-edge-surface.test.ts -``` - -- [ ] **Step 3: Remove the config flag** - -In `src/config.ts`, find and delete the line: - -```typescript - counterEdgeSurfaceEnabled: (optionalEnv('COUNTER_EDGE_SURFACE_ENABLED') ?? 'false') === 'true', -``` - -Also remove the corresponding entry from the `RuntimeConfig` interface and the `INTERNAL_POLICY_CONFIG_FIELDS` array. - -- [ ] **Step 4: Remove from runtime-container interface** - -In `src/app/runtime-container.ts`, find and delete the line: - -```typescript - counterEdgeSurfaceEnabled: boolean; -``` - -from the `CoreRuntimeConfig` interface. - -- [ ] **Step 5: Remove the call site from search-pipeline.ts** - -In `src/services/search-pipeline.ts`, find and delete the import and the `maybeSurfaceCounterEdges` block (it lives between `applyExpansionAndReranking` and the namespace filter). Replace the `surfaced = ...` line with a direct assignment from the prior `selected` variable. - -- [ ] **Step 6: Remove the `[CONTRADICTS]` marker from retrieval-format.ts** - -In `src/services/retrieval-format.ts`, find every reference to `counterOf` (typically in `formatFullLine`, `formatStagedLine`, `formatSubjectSection`, `formatTieredLine`). Remove the conditional that prepends `[CONTRADICTS prior fact ]:` and emit the line normally. - -- [ ] **Step 7: Remove `counterOf` from `repository-types.ts`** - -In `src/db/repository-types.ts`, find and delete the `counterOf?: string;` field from `SearchResult`. - -- [ ] **Step 8: tsc clean** - -```bash -npx tsc --noEmit -``` - -Expected: no errors. If you see "Cannot find name 'counterOf'" or similar, you missed a reference — grep again. - -- [ ] **Step 9: Run the full test suite to verify nothing else broke** - -```bash -npm test -- --reporter=basic 2>&1 | tail -20 -``` - -Expected: all suites pass except any tests directly testing the deleted module (those were deleted in step 2). - -- [ ] **Step 10: Commit** - -```bash -git add -A -git commit -m "chore: delete Layer 3 (counter-edge-surface) — replaced by CR specialist later - -Sprint 3 conv 2 n=20 measurement: L3-only composite 0.377 vs baseline 0.520 -(-0.143). 
The mechanism designed to lift CR actually dropped CR to 0.125 -(-0.188) because the [CONTRADICTS prior fact] marker confused Haiku into -picking the unmarked side of the contradiction. Per the Phase 0 cleanup -in the BEAM-0.85 design, L3 is deleted entirely. A reworked CR specialist -with explicit FACT A / FACT B framing will arrive in Phase 2.2." -``` - -## Task 0.4: Author the Phase 0 validation env file - -**Files:** -- Create: `.env.phase0-l1patched` - -- [ ] **Step 1: Write the env file** - -Create `/Users/moralespanitz/me/supernet/atomicmemory-core/.claude/worktrees/tbc-prototype/.env.phase0-l1patched`: - -```bash -POSTGRES_PORT=5502 -APP_PORT=3102 -DATABASE_URL=postgresql://atomicmemory:atomicmemory@localhost:5502/atomicmemory -LLM_PROVIDER=anthropic -LLM_API_URL= -LLM_API_KEY= -LLM_MODEL=claude-haiku-4-5 -EMBEDDING_PROVIDER=transformers -EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 -EMBEDDING_DIMENSIONS=384 -ANTHROPIC_API_KEY= -ATOMICMEMORY_API_URL=http://localhost:3102 -COST_CAP_DAILY=200 -COST_CAP_ITER=20 -# Kept stack (h3-timeline) -TBC_ENABLED=true -TOPIC_ABSTRACTION_ENABLED=false -TOPIC_SEARCH_ENABLED=false -RERANKER_ENABLED=true -RECAP_LAYER_ENABLED=false -RECAP_SEARCH_ENABLED=false -HIERARCHICAL_RETRIEVAL_ENABLED=false -CHUNKED_EXTRACTION_ENABLED=true -CHUNKED_EXTRACTION_FALLBACK_ENABLED=true -TIMELINE_CHANNEL_ENABLED=true -PACKAGING_USE_OBSERVED_AT=true -# Layer 1 patched + on. Layer 3 deleted (no flag needed). -ANSWER_FORMAT_ALIGNMENT_ENABLED=true -``` - -- [ ] **Step 2: DO NOT commit this file** - -`.env.*` files are blocked by `.gitignore` (everything except `.env.example`). -Real API keys live in these files. The file exists on disk for the docker -stack to read; it never enters git. - -If you `git add` it accidentally, the .gitignore will block it. If you -force-add with `-f`, you have done a security violation — revert before push. - -## Task 0.5: Run the Phase 0 4-conv n=80 validation - -**Files:** -- Will create: `benchmarks-sprint3/results/haiku080/phase0-l1patched/summary.json` - -- [ ] **Step 1: Verify ports 3102 and 5502 are free** - -```bash -for p in 3102 5502; do - if lsof -nP -iTCP:$p -sTCP:LISTEN >/dev/null 2>&1; then echo "$p BUSY"; else echo "$p free"; fi -done -``` - -Expected: both free. If busy, identify and kill the holding process. - -- [ ] **Step 2: Verify LiteLLM is up (still required by the existing runner)** - -```bash -curl -sfm 2 http://localhost:4000/health/liveliness -``` - -Expected: `"I'm alive!"`. If down, start LiteLLM before proceeding. - -- [ ] **Step 3: Run all 4 conversations sequentially** - -```bash -/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/tools/run_parallel_cell.sh \ - phase0-l1patched .env.phase0-l1patched 3102 5502 am-phase0-l1patched \ - 1,2,3,4 anthropic-haiku-4-5 anthropic-haiku-4-5 -``` - -Expected wall time: ~25 min. Expected cost: ~$2. - -- [ ] **Step 4: Inspect the summary** - -```bash -cat /Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/results/haiku080/phase0-l1patched/summary.json -``` - -Record the composite and per-ability scores. Compare against `h3-timeline/summary.json` (the previous baseline). 
- -- [ ] **Step 5: Run the per-question diff** - -```bash -python3 << 'PYEOF' -import json -prev = json.load(open('/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/results/haiku080/h3-timeline/summary.json')) -new = json.load(open('/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/results/haiku080/phase0-l1patched/summary.json')) -print(f"composite: {prev['composite']:.3f} -> {new['composite']:.3f} (delta {new['composite']-prev['composite']:+.3f})") -print("per-ability:") -for ab in sorted(prev['per_ability']): - p, n = prev['per_ability'][ab], new['per_ability'][ab] - print(f" {ab:25s}: {p:.3f} -> {n:.3f} (delta {n-p:+.3f})") -PYEOF -``` - -- [ ] **Step 6: Apply the Phase 0 gate** - -Decision table: - -| Composite Δ | Worst per-ability Δ | Verdict | -|---|---|---| -| ≥ +0.05 | ≥ -0.10 | **PASS** — proceed to Phase 1 | -| ∈ [-0.03, +0.05] | ≥ -0.10 | **PLATEAU** — diagnose via per-question diff, document, then proceed to Phase 1 anyway (L1 is the only Phase 0 change and the patch is theory-correct; we don't expect a large lift, we expect "doesn't regress") | -| < -0.03 OR any ability < -0.10 | — | **REGRESS** — stop. Run `git revert HEAD~5..HEAD` to undo Phase 0. Open a diagnostic and consult the user. | - -- [ ] **Step 7: Write the Phase 0 diagnostic doc** - -Create `/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint5/phase-0-diagnostic.md` with: -- Composite before / after / delta -- Per-ability table -- 1-paragraph diagnosis from the per-question diff -- Verdict (PASS / PLATEAU / REGRESS) -- Next step - -- [ ] **Step 8: Commit** - -```bash -cd /Users/moralespanitz/me/supernet/atomicmemory-research -git add memory-research/benchmarks-sprint3/results/haiku080/phase0-l1patched/ \ - memory-research/benchmarks-sprint5/phase-0-diagnostic.md -git commit -m "results: Phase 0 4-conv n=80 validation" -``` - -- [ ] **Step 9: Gate** - -If verdict is REGRESS, stop and consult the user. Otherwise proceed to Phase 1. - ---- - -# PHASE 1 — Reflect step (async Sonnet consolidation) - -Goal: add an async LLM consolidation step that runs at session boundaries, writes synthesized observations to a new `session_reflections` table, and surfaces them via a `## OBSERVATIONS` retrieval channel for SUM / KU / MSR / CR / PF / IE queries. - -## Task 1.1: Author the migration for `session_reflections` and `reflection_jobs` - -**Files:** -- Create: `src/db/migrations/20260512_session_reflections.sql` - -- [ ] **Step 1: Write the migration** - -```sql --- 20260512_session_reflections.sql --- Phase 1 of BEAM-0.85 plan: async Reflect step storage. 
--- --- Two tables: --- session_reflections: synthesized observations per (user_id, conversation_id), --- each citing evidence_memory_ids and embedded for retrieval --- reflection_jobs: Postgres-backed async work queue for the reflect worker - -CREATE TABLE IF NOT EXISTS session_reflections ( - id UUID PRIMARY KEY DEFAULT gen_random_uuid(), - user_id TEXT NOT NULL, - conversation_id TEXT NOT NULL, - observation TEXT NOT NULL, - observation_type TEXT NOT NULL CHECK (observation_type IN ( - 'entity_state', 'event_summary', 'preference', - 'contradiction', 'decision', 'numeric_value' - )), - evidence_memory_ids TEXT[] NOT NULL, - embedding vector(384), - created_at TIMESTAMPTZ NOT NULL DEFAULT now() -); - -CREATE INDEX IF NOT EXISTS ix_session_reflections_user_conv - ON session_reflections (user_id, conversation_id); - -CREATE INDEX IF NOT EXISTS ix_session_reflections_embedding - ON session_reflections USING hnsw (embedding vector_cosine_ops); - -CREATE TABLE IF NOT EXISTS reflection_jobs ( - id UUID PRIMARY KEY DEFAULT gen_random_uuid(), - user_id TEXT NOT NULL, - conversation_id TEXT NOT NULL, - status TEXT NOT NULL DEFAULT 'pending' - CHECK (status IN ('pending', 'in_progress', 'completed', 'failed')), - attempts INTEGER NOT NULL DEFAULT 0, - last_error TEXT, - created_at TIMESTAMPTZ NOT NULL DEFAULT now(), - last_tried_at TIMESTAMPTZ -); - -CREATE UNIQUE INDEX IF NOT EXISTS ix_reflection_jobs_pending_unique - ON reflection_jobs (user_id, conversation_id) - WHERE status IN ('pending', 'in_progress'); - -CREATE INDEX IF NOT EXISTS ix_reflection_jobs_status_created - ON reflection_jobs (status, created_at); -``` - -- [ ] **Step 2: Run migration on the test DB** - -```bash -npm run migrate:test -``` - -Expected: no errors; tables created. - -- [ ] **Step 3: Verify schema** - -```bash -dotenv -e .env.test -- psql "$DATABASE_URL" -c "\d session_reflections" -c "\d reflection_jobs" -``` - -Expected: both tables shown with the columns above. - -- [ ] **Step 4: Commit** - -```bash -git add src/db/migrations/20260512_session_reflections.sql -git commit -m "migration: add session_reflections + reflection_jobs tables" -``` - -## Task 1.2: Implement `ReflectionsRepository` (CRUD + similarity search) - -**Files:** -- Create: `src/db/reflections-repository.ts` -- Create: `src/db/__tests__/reflections-repository.test.ts` - -- [ ] **Step 1: Write the failing integration test** - -```typescript -// src/db/__tests__/reflections-repository.test.ts -/** - * Integration tests for ReflectionsRepository. Uses the .env.test Postgres - * instance; assumes the 20260512_session_reflections migration has been - * applied (via `npm run migrate:test`). 
- */ -import { afterAll, beforeEach, describe, expect, it } from 'vitest'; -import pg from 'pg'; -import { ReflectionsRepository, type NewReflection } from '../reflections-repository.js'; -import { config } from '../../config.js'; - -const pool = new pg.Pool({ connectionString: config.databaseUrl }); -const repo = new ReflectionsRepository(pool); - -afterAll(async () => { await pool.end(); }); - -beforeEach(async () => { - await pool.query("DELETE FROM session_reflections WHERE user_id LIKE 'test-refl-%'"); -}); - -const USER = 'test-refl-1'; -const CONV = 'conv-A'; -const VEC = (n: number): number[] => Array.from({ length: 384 }, () => n); - -describe('ReflectionsRepository', () => { - it('inserts and reads back reflections by (userId, conversationId)', async () => { - const rows: NewReflection[] = [ - { userId: USER, conversationId: CONV, - observation: 'User uses Flask-Login v0.6.2', - observationType: 'entity_state', - evidenceMemoryIds: ['m1', 'm2'], - embedding: VEC(0.1) }, - ]; - await repo.insertMany(rows); - const found = await repo.findByConversation(USER, CONV); - expect(found).toHaveLength(1); - expect(found[0].observation).toBe('User uses Flask-Login v0.6.2'); - expect(found[0].observationType).toBe('entity_state'); - expect(found[0].evidenceMemoryIds).toEqual(['m1', 'm2']); - }); - - it('findSimilar returns the most cosine-similar reflections first', async () => { - await repo.insertMany([ - { userId: USER, conversationId: CONV, - observation: 'similar', observationType: 'event_summary', - evidenceMemoryIds: ['m1'], embedding: VEC(0.1) }, - { userId: USER, conversationId: CONV, - observation: 'far', observationType: 'event_summary', - evidenceMemoryIds: ['m2'], embedding: VEC(-0.9) }, - ]); - const hits = await repo.findSimilar(USER, VEC(0.1), 2); - expect(hits[0].observation).toBe('similar'); - expect(hits[1].observation).toBe('far'); - }); - - it('returns empty array when no reflections exist', async () => { - const hits = await repo.findSimilar(USER, VEC(0.5), 5); - expect(hits).toEqual([]); - }); -}); -``` - -- [ ] **Step 2: Run the test to verify it fails** - -```bash -dotenv -e .env.test -- npx vitest run src/db/__tests__/reflections-repository.test.ts -``` - -Expected: FAIL — module not found. - -- [ ] **Step 3: Implement the repository** - -```typescript -// src/db/reflections-repository.ts -/** - * ReflectionsRepository — CRUD plus cosine-similarity search for the - * session_reflections table. Each row is an LLM-synthesized observation about - * a conversation, with citations to the supporting memory ids and an embedding - * for retrieval-side similarity search. - * - * Pure SQL via pg.Pool. No ORM. Mutations fail closed: caller catches errors, - * we propagate them with the original error attached. 
- */ -import pg from 'pg'; - -export type ObservationType = - | 'entity_state' - | 'event_summary' - | 'preference' - | 'contradiction' - | 'decision' - | 'numeric_value'; - -export interface NewReflection { - userId: string; - conversationId: string; - observation: string; - observationType: ObservationType; - evidenceMemoryIds: string[]; - embedding: number[]; -} - -export interface Reflection extends NewReflection { - id: string; - createdAt: Date; -} - -function vectorLiteral(vec: number[]): string { - return `[${vec.join(',')}]`; -} - -export class ReflectionsRepository { - constructor(private readonly pool: pg.Pool) {} - - async insertMany(rows: readonly NewReflection[]): Promise { - if (rows.length === 0) return; - const sql = ` - INSERT INTO session_reflections - (user_id, conversation_id, observation, observation_type, evidence_memory_ids, embedding) - VALUES ($1, $2, $3, $4, $5, $6::vector) - `; - const client = await this.pool.connect(); - try { - await client.query('BEGIN'); - for (const r of rows) { - await client.query(sql, [ - r.userId, r.conversationId, r.observation, r.observationType, - r.evidenceMemoryIds, vectorLiteral(r.embedding), - ]); - } - await client.query('COMMIT'); - } catch (e) { - await client.query('ROLLBACK'); - throw e; - } finally { - client.release(); - } - } - - async findByConversation(userId: string, conversationId: string): Promise { - const { rows } = await this.pool.query( - `SELECT id, user_id, conversation_id, observation, observation_type, - evidence_memory_ids, created_at - FROM session_reflections - WHERE user_id = $1 AND conversation_id = $2 - ORDER BY created_at ASC`, - [userId, conversationId], - ); - return rows.map(mapRow); - } - - async findSimilar(userId: string, queryEmbedding: number[], topK: number): Promise { - const { rows } = await this.pool.query( - `SELECT id, user_id, conversation_id, observation, observation_type, - evidence_memory_ids, created_at - FROM session_reflections - WHERE user_id = $1 - ORDER BY embedding <=> $2::vector - LIMIT $3`, - [userId, vectorLiteral(queryEmbedding), topK], - ); - return rows.map(mapRow); - } -} - -function mapRow(r: pg.QueryResultRow): Reflection { - return { - id: r.id, - userId: r.user_id, - conversationId: r.conversation_id, - observation: r.observation, - observationType: r.observation_type, - evidenceMemoryIds: r.evidence_memory_ids, - embedding: [], - createdAt: r.created_at, - }; -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -dotenv -e .env.test -- npx vitest run src/db/__tests__/reflections-repository.test.ts -``` - -Expected: 3/3 PASS. - -- [ ] **Step 5: tsc clean** - -```bash -npx tsc --noEmit -``` - -Expected: no errors. 
- -- [ ] **Step 6: Commit** - -```bash -git add src/db/reflections-repository.ts src/db/__tests__/reflections-repository.test.ts -git commit -m "feat(reflections): add ReflectionsRepository with insertMany, findByConversation, findSimilar" -``` - -## Task 1.3: Implement `ReflectionJobsRepository` (Postgres-backed queue) - -**Files:** -- Create: `src/db/reflection-jobs-repository.ts` -- Create: `src/db/__tests__/reflection-jobs-repository.test.ts` - -- [ ] **Step 1: Write the failing test** - -```typescript -// src/db/__tests__/reflection-jobs-repository.test.ts -import { afterAll, beforeEach, describe, expect, it } from 'vitest'; -import pg from 'pg'; -import { ReflectionJobsRepository } from '../reflection-jobs-repository.js'; -import { config } from '../../config.js'; - -const pool = new pg.Pool({ connectionString: config.databaseUrl }); -const repo = new ReflectionJobsRepository(pool); - -afterAll(async () => { await pool.end(); }); - -beforeEach(async () => { - await pool.query("DELETE FROM reflection_jobs WHERE user_id LIKE 'test-rjq-%'"); -}); - -const USER = 'test-rjq-1'; -const CONV = 'conv-A'; - -describe('ReflectionJobsRepository', () => { - it('enqueue creates a pending job', async () => { - await repo.enqueue(USER, CONV); - const ready = await repo.fetchPending(10); - expect(ready).toHaveLength(1); - expect(ready[0].status).toBe('pending'); - expect(ready[0].userId).toBe(USER); - }); - - it('enqueue is idempotent per (userId, conversationId) while pending or in_progress', async () => { - await repo.enqueue(USER, CONV); - await repo.enqueue(USER, CONV); - const ready = await repo.fetchPending(10); - expect(ready).toHaveLength(1); - }); - - it('markInProgress / markCompleted / markFailed flow', async () => { - await repo.enqueue(USER, CONV); - const [job] = await repo.fetchPending(10); - await repo.markInProgress(job.id); - let row = await repo.findById(job.id); - expect(row?.status).toBe('in_progress'); - await repo.markCompleted(job.id); - row = await repo.findById(job.id); - expect(row?.status).toBe('completed'); - - await repo.enqueue(USER, 'conv-B'); - const [other] = await repo.fetchPending(10); - await repo.markFailed(other.id, 'boom'); - row = await repo.findById(other.id); - expect(row?.status).toBe('failed'); - expect(row?.lastError).toBe('boom'); - }); - - it('after completion, enqueue for same (user, conv) creates a new job', async () => { - await repo.enqueue(USER, CONV); - const [j] = await repo.fetchPending(10); - await repo.markInProgress(j.id); - await repo.markCompleted(j.id); - await repo.enqueue(USER, CONV); - const again = await repo.fetchPending(10); - expect(again).toHaveLength(1); - expect(again[0].id).not.toBe(j.id); - }); -}); -``` - -- [ ] **Step 2: Verify fail** - -```bash -dotenv -e .env.test -- npx vitest run src/db/__tests__/reflection-jobs-repository.test.ts -``` - -Expected: FAIL — module not found. - -- [ ] **Step 3: Implement** - -```typescript -// src/db/reflection-jobs-repository.ts -/** - * Postgres-backed work queue for the async Reflect step. - * - * Idempotent enqueue: a unique partial index on (user_id, conversation_id) - * WHERE status IN ('pending','in_progress') guarantees one in-flight job per - * conversation at a time. Re-enqueue after completion creates a new job (the - * unique index excludes 'completed' and 'failed'). - * - * The worker (services/reflect-jobs.ts) drives the lifecycle: fetchPending → - * markInProgress → run reflect → markCompleted | markFailed. 
- */ -import pg from 'pg'; - -export type JobStatus = 'pending' | 'in_progress' | 'completed' | 'failed'; - -export interface ReflectionJob { - id: string; - userId: string; - conversationId: string; - status: JobStatus; - attempts: number; - lastError: string | null; - createdAt: Date; - lastTriedAt: Date | null; -} - -export class ReflectionJobsRepository { - constructor(private readonly pool: pg.Pool) {} - - async enqueue(userId: string, conversationId: string): Promise { - await this.pool.query( - `INSERT INTO reflection_jobs (user_id, conversation_id) VALUES ($1, $2) - ON CONFLICT DO NOTHING`, - [userId, conversationId], - ); - } - - async fetchPending(limit: number): Promise { - const { rows } = await this.pool.query( - `SELECT id, user_id, conversation_id, status, attempts, last_error, - created_at, last_tried_at - FROM reflection_jobs - WHERE status = 'pending' - ORDER BY created_at ASC - LIMIT $1`, - [limit], - ); - return rows.map(mapJob); - } - - async markInProgress(id: string): Promise { - await this.pool.query( - `UPDATE reflection_jobs - SET status = 'in_progress', attempts = attempts + 1, last_tried_at = now() - WHERE id = $1`, - [id], - ); - } - - async markCompleted(id: string): Promise { - await this.pool.query( - `UPDATE reflection_jobs SET status = 'completed' WHERE id = $1`, - [id], - ); - } - - async markFailed(id: string, error: string): Promise { - await this.pool.query( - `UPDATE reflection_jobs SET status = 'failed', last_error = $2 WHERE id = $1`, - [id, error], - ); - } - - async findById(id: string): Promise { - const { rows } = await this.pool.query( - `SELECT id, user_id, conversation_id, status, attempts, last_error, - created_at, last_tried_at - FROM reflection_jobs WHERE id = $1`, - [id], - ); - return rows[0] ? mapJob(rows[0]) : null; - } -} - -function mapJob(r: pg.QueryResultRow): ReflectionJob { - return { - id: r.id, - userId: r.user_id, - conversationId: r.conversation_id, - status: r.status, - attempts: r.attempts, - lastError: r.last_error, - createdAt: r.created_at, - lastTriedAt: r.last_tried_at, - }; -} -``` - -- [ ] **Step 4: Verify pass** - -```bash -dotenv -e .env.test -- npx vitest run src/db/__tests__/reflection-jobs-repository.test.ts -``` - -Expected: 4/4 PASS. 
- -- [ ] **Step 5: Commit** - -```bash -git add src/db/reflection-jobs-repository.ts src/db/__tests__/reflection-jobs-repository.test.ts -git commit -m "feat(reflect): add Postgres-backed reflection_jobs queue repository" -``` - -## Task 1.4: Author `reflect-prompts.ts` — Sonnet system prompt + tool-use schema - -**Files:** -- Create: `src/services/reflect-prompts.ts` -- Create: `src/services/__tests__/reflect-prompts.test.ts` - -- [ ] **Step 1: Write the failing unit test** - -```typescript -// src/services/__tests__/reflect-prompts.test.ts -import { describe, expect, it } from 'vitest'; -import { buildReflectMessages, REFLECT_TOOL_SCHEMA } from '../reflect-prompts.js'; - -describe('reflect-prompts', () => { - it('REFLECT_TOOL_SCHEMA defines record_observations with required fields', () => { - expect(REFLECT_TOOL_SCHEMA.name).toBe('record_observations'); - const props = REFLECT_TOOL_SCHEMA.input_schema.properties; - expect(props).toBeDefined(); - expect(props.observations).toBeDefined(); - expect(props.observations.type).toBe('array'); - const items = props.observations.items; - expect(items.required).toEqual(expect.arrayContaining(['text', 'type', 'evidence_memory_ids'])); - expect(items.properties.type.enum).toEqual(expect.arrayContaining([ - 'entity_state', 'event_summary', 'preference', - 'contradiction', 'decision', 'numeric_value', - ])); - }); - - it('buildReflectMessages includes each memory id and observation type list', () => { - const memories = [ - { id: 'm1', text: 'User uses Flask 2.3', observedAt: new Date('2026-03-01') }, - { id: 'm2', text: 'User never used Flask', observedAt: new Date('2026-03-15') }, - ]; - const { system, user } = buildReflectMessages(memories); - expect(system).toContain('observations'); - expect(user).toContain('m1'); - expect(user).toContain('m2'); - expect(user).toContain('User uses Flask 2.3'); - expect(user).toContain('User never used Flask'); - }); -}); -``` - -- [ ] **Step 2: Verify fail** - -```bash -npx vitest run src/services/__tests__/reflect-prompts.test.ts -``` - -Expected: FAIL — module not found. - -- [ ] **Step 3: Implement** - -```typescript -// src/services/reflect-prompts.ts -/** - * Prompt assembly + Anthropic tool-use schema for the async Reflect step. - * - * The Reflect call presents Sonnet with a chronologically-sorted list of the - * session's raw memories (each with its memory id and observed_at) and asks - * Sonnet to consolidate them into a small set of typed observations. Each - * observation MUST cite the memory_ids it draws from, so retrieval can verify - * evidence still exists when the observation is later read by the answer LLM. - * - * Tool-use guarantees structured output — Sonnet returns a JSON payload that - * matches REFLECT_TOOL_SCHEMA, eliminating the freeform-prose parsing failures - * we saw with the Sprint 3 verifier pass. 
- */ - -export interface ReflectMemoryInput { - id: string; - text: string; - observedAt: Date; -} - -export interface ReflectMessages { - system: string; - user: string; -} - -const SYSTEM_PROMPT = [ - 'You are consolidating a single conversation\'s raw memories into a small set of typed observations.', - 'Each observation must (a) be answerable from the cited evidence_memory_ids alone, (b) prefer concrete factual claims over narrative, (c) avoid restating the raw facts verbatim.', - '', - 'Observation types (use exactly one per observation):', - '- entity_state: the current value of an attribute on an entity, with the latest-known value', - '- event_summary: a discrete event or action that happened', - '- preference: a stated user preference, opinion, or choice', - '- contradiction: two facts in the session that disagree (include both sides)', - '- decision: a user decision made during the session', - '- numeric_value: a numeric fact (count, amount, duration, percentage)', - '', - 'Output 5–15 observations covering distinct claims. Call the record_observations tool.', -].join('\n'); - -export const REFLECT_TOOL_SCHEMA = { - name: 'record_observations', - description: 'Persist the consolidated observations for this conversation.', - input_schema: { - type: 'object', - properties: { - observations: { - type: 'array', - items: { - type: 'object', - required: ['text', 'type', 'evidence_memory_ids'], - properties: { - text: { type: 'string' }, - type: { - type: 'string', - enum: [ - 'entity_state', 'event_summary', 'preference', - 'contradiction', 'decision', 'numeric_value', - ], - }, - evidence_memory_ids: { - type: 'array', - items: { type: 'string' }, - }, - }, - }, - }, - }, - required: ['observations'], - }, -} as const; - -export function buildReflectMessages(memories: readonly ReflectMemoryInput[]): ReflectMessages { - const lines = memories.map( - m => `[${m.id}] (${m.observedAt.toISOString().slice(0, 10)}) ${m.text}`, - ); - const user = ['Memories from this conversation (chronological):', '', ...lines].join('\n'); - return { system: SYSTEM_PROMPT, user }; -} -``` - -- [ ] **Step 4: Verify pass** - -```bash -npx vitest run src/services/__tests__/reflect-prompts.test.ts -``` - -Expected: 2/2 PASS. 
- -- [ ] **Step 5: Commit** - -```bash -git add src/services/reflect-prompts.ts src/services/__tests__/reflect-prompts.test.ts -git commit -m "feat(reflect): add Sonnet system prompt + tool-use schema for record_observations" -``` - -## Task 1.5: Implement `reflect.ts` orchestrator - -**Files:** -- Create: `src/services/reflect.ts` -- Create: `src/services/__tests__/reflect.test.ts` - -- [ ] **Step 1: Write the failing test (mocked LLM + repo)** - -```typescript -// src/services/__tests__/reflect.test.ts -import { describe, expect, it, vi } from 'vitest'; -import { runReflectForConversation, type ReflectDeps } from '../reflect.js'; - -const memories = [ - { id: 'm1', text: 'first', observedAt: new Date('2026-03-01') }, - { id: 'm2', text: 'second', observedAt: new Date('2026-03-02') }, -]; - -const toolOutput = { - observations: [ - { text: 'O1', type: 'event_summary' as const, evidence_memory_ids: ['m1', 'm2'] }, - { text: 'O2', type: 'preference' as const, evidence_memory_ids: ['m1'] }, - ], -}; - -describe('runReflectForConversation', () => { - it('calls LLM with built messages, embeds each observation, persists with citations', async () => { - const insertMany = vi.fn().mockResolvedValue(undefined); - const llmTool = vi.fn().mockResolvedValue(toolOutput); - const embed = vi.fn().mockResolvedValue([0.1, 0.2]); - const fetchMemories = vi.fn().mockResolvedValue(memories); - const deps: ReflectDeps = { - fetchMemories, - llmCallTool: llmTool, - embed, - reflections: { insertMany } as any, - maxObservations: 15, - }; - const res = await runReflectForConversation(deps, 'u1', 'c1'); - expect(fetchMemories).toHaveBeenCalledWith('u1', 'c1'); - expect(llmTool).toHaveBeenCalledTimes(1); - expect(embed).toHaveBeenCalledTimes(2); - expect(insertMany).toHaveBeenCalledTimes(1); - const inserted = insertMany.mock.calls[0][0]; - expect(inserted).toHaveLength(2); - expect(inserted[0].observation).toBe('O1'); - expect(inserted[0].evidenceMemoryIds).toEqual(['m1', 'm2']); - expect(res.count).toBe(2); - }); - - it('returns count=0 when conversation has no memories', async () => { - const deps: ReflectDeps = { - fetchMemories: vi.fn().mockResolvedValue([]), - llmCallTool: vi.fn(), - embed: vi.fn(), - reflections: { insertMany: vi.fn() } as any, - maxObservations: 15, - }; - const res = await runReflectForConversation(deps, 'u1', 'c1'); - expect(res.count).toBe(0); - expect(deps.llmCallTool).not.toHaveBeenCalled(); - }); - - it('truncates observations to maxObservations', async () => { - const insertMany = vi.fn().mockResolvedValue(undefined); - const big = { observations: Array.from({ length: 20 }, (_, i) => ({ - text: `O${i}`, type: 'event_summary' as const, evidence_memory_ids: ['m1'], - })) }; - const deps: ReflectDeps = { - fetchMemories: vi.fn().mockResolvedValue(memories), - llmCallTool: vi.fn().mockResolvedValue(big), - embed: vi.fn().mockResolvedValue([0.1]), - reflections: { insertMany } as any, - maxObservations: 5, - }; - const res = await runReflectForConversation(deps, 'u1', 'c1'); - expect(res.count).toBe(5); - }); -}); -``` - -- [ ] **Step 2: Verify fail** - -```bash -npx vitest run src/services/__tests__/reflect.test.ts -``` - -Expected: FAIL — module not found. - -- [ ] **Step 3: Implement** - -```typescript -// src/services/reflect.ts -/** - * Reflect orchestrator. Pulls a conversation's memories, sends them to the - * answer-LLM tool-use endpoint with the record_observations schema, embeds - * each returned observation, and persists them to session_reflections. 
-
- * Pure dependency-injected — the worker (reflect-jobs) supplies real
- * implementations; tests supply mocks. No I/O of its own beyond what the
- * injected dependencies do.
- */
-import type {
-  ReflectionsRepository,
-  NewReflection,
-  ObservationType,
-} from '../db/reflections-repository.js';
-import {
-  buildReflectMessages,
-  REFLECT_TOOL_SCHEMA,
-  type ReflectMemoryInput,
-} from './reflect-prompts.js';
-
-export interface ReflectToolOutput {
-  observations: Array<{
-    text: string;
-    type: ObservationType;
-    evidence_memory_ids: string[];
-  }>;
-}
-
-export interface ReflectDeps {
-  fetchMemories: (userId: string, conversationId: string) => Promise<ReflectMemoryInput[]>;
-  llmCallTool: (system: string, user: string, toolSchema: typeof REFLECT_TOOL_SCHEMA)
-    => Promise<ReflectToolOutput>;
-  embed: (text: string) => Promise<number[]>;
-  reflections: Pick<ReflectionsRepository, 'insertMany'>;
-  maxObservations: number;
-}
-
-export interface ReflectResult {
-  count: number;
-}
-
-export async function runReflectForConversation(
-  deps: ReflectDeps,
-  userId: string,
-  conversationId: string,
-): Promise<ReflectResult> {
-  const memories = await deps.fetchMemories(userId, conversationId);
-  if (memories.length === 0) return { count: 0 };
-
-  const { system, user } = buildReflectMessages(memories);
-  const out = await deps.llmCallTool(system, user, REFLECT_TOOL_SCHEMA);
-
-  const truncated = out.observations.slice(0, deps.maxObservations);
-  const rows: NewReflection[] = [];
-  for (const o of truncated) {
-    const embedding = await deps.embed(o.text);
-    rows.push({
-      userId,
-      conversationId,
-      observation: o.text,
-      observationType: o.type,
-      evidenceMemoryIds: o.evidence_memory_ids,
-      embedding,
-    });
-  }
-
-  await deps.reflections.insertMany(rows);
-  return { count: rows.length };
-}
-```
-
-- [ ] **Step 4: Verify pass**
-
-```bash
-npx vitest run src/services/__tests__/reflect.test.ts
-```
-
-Expected: 3/3 PASS.
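-
-Optional hardening, not one of this plan's steps: the design spec calls for `evidence_memory_ids` validation (it filters at retrieval time; the same check can also run at write time). A minimal sketch of that guard, which could run between the LLM call and `insertMany` — the helper name is an assumption, nothing exports it today:
-
-```typescript
-import type { ReflectMemoryInput } from './reflect-prompts.js';
-import type { ReflectToolOutput } from './reflect.js';
-
-// Hypothetical guard: keep only observations whose every cited evidence id
-// matches a memory that was actually fetched for this conversation.
-export function filterObservationsWithValidEvidence(
-  out: ReflectToolOutput,
-  memories: readonly ReflectMemoryInput[],
-): ReflectToolOutput {
-  const known = new Set(memories.map(m => m.id));
-  const observations = out.observations.filter(
-    o => o.evidence_memory_ids.length > 0
-      && o.evidence_memory_ids.every(id => known.has(id)),
-  );
-  return { observations };
-}
-```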
- -- [ ] **Step 5: Commit** - -```bash -git add src/services/reflect.ts src/services/__tests__/reflect.test.ts -git commit -m "feat(reflect): add runReflectForConversation orchestrator with DI" -``` - -## Task 1.6: Implement `reflect-jobs.ts` worker - -**Files:** -- Create: `src/services/reflect-jobs.ts` -- Create: `src/services/__tests__/reflect-jobs.test.ts` - -- [ ] **Step 1: Write the failing test** - -```typescript -// src/services/__tests__/reflect-jobs.test.ts -import { describe, expect, it, vi } from 'vitest'; -import { processOnePendingJob, type JobsWorkerDeps } from '../reflect-jobs.js'; - -const baseDeps = (): JobsWorkerDeps => ({ - jobs: { - fetchPending: vi.fn(), - markInProgress: vi.fn().mockResolvedValue(undefined), - markCompleted: vi.fn().mockResolvedValue(undefined), - markFailed: vi.fn().mockResolvedValue(undefined), - } as any, - runReflect: vi.fn().mockResolvedValue({ count: 3 }), -}); - -describe('processOnePendingJob', () => { - it('returns false when no pending job available', async () => { - const deps = baseDeps(); - (deps.jobs.fetchPending as any).mockResolvedValue([]); - const did = await processOnePendingJob(deps); - expect(did).toBe(false); - expect(deps.runReflect).not.toHaveBeenCalled(); - }); - - it('marks in_progress, runs reflect, marks completed on success', async () => { - const deps = baseDeps(); - (deps.jobs.fetchPending as any).mockResolvedValue([ - { id: 'j1', userId: 'u', conversationId: 'c' }, - ]); - const did = await processOnePendingJob(deps); - expect(did).toBe(true); - expect(deps.jobs.markInProgress).toHaveBeenCalledWith('j1'); - expect(deps.runReflect).toHaveBeenCalledWith('u', 'c'); - expect(deps.jobs.markCompleted).toHaveBeenCalledWith('j1'); - expect(deps.jobs.markFailed).not.toHaveBeenCalled(); - }); - - it('marks failed when runReflect throws', async () => { - const deps = baseDeps(); - (deps.jobs.fetchPending as any).mockResolvedValue([ - { id: 'j2', userId: 'u', conversationId: 'c' }, - ]); - (deps.runReflect as any).mockRejectedValue(new Error('boom')); - const did = await processOnePendingJob(deps); - expect(did).toBe(true); - expect(deps.jobs.markFailed).toHaveBeenCalledWith('j2', expect.stringContaining('boom')); - expect(deps.jobs.markCompleted).not.toHaveBeenCalled(); - }); -}); -``` - -- [ ] **Step 2: Verify fail** - -```bash -npx vitest run src/services/__tests__/reflect-jobs.test.ts -``` - -Expected: FAIL — module not found. - -- [ ] **Step 3: Implement** - -```typescript -// src/services/reflect-jobs.ts -/** - * Reflect worker. Pulls one pending job at a time from reflection_jobs, - * marks it in_progress, invokes the Reflect orchestrator, and records the - * outcome on the job row. - * - * Mutations fail closed: if Reflect throws, the job is marked failed with - * the error message — the loop continues with the next job. The worker never - * silently swallows errors. - * - * Designed for single-instance deployment; multi-instance leasing is out of - * scope for v1 (the unique partial index on (user_id, conversation_id) WHERE - * status IN ('pending','in_progress') keeps work bounded if accidentally - * double-deployed, but doesn't prevent two workers picking different jobs). 
-
- */
-import type { ReflectionJobsRepository } from '../db/reflection-jobs-repository.js';
-import type { ReflectResult } from './reflect.js';
-
-export interface JobsWorkerDeps {
-  jobs: Pick<ReflectionJobsRepository, 'fetchPending' | 'markInProgress' | 'markCompleted' | 'markFailed'>;
-  runReflect: (userId: string, conversationId: string) => Promise<ReflectResult>;
-}
-
-export async function processOnePendingJob(deps: JobsWorkerDeps): Promise<boolean> {
-  const [job] = await deps.jobs.fetchPending(1);
-  if (!job) return false;
-  await deps.jobs.markInProgress(job.id);
-  try {
-    await deps.runReflect(job.userId, job.conversationId);
-    await deps.jobs.markCompleted(job.id);
-  } catch (e) {
-    const msg = e instanceof Error ? e.message : String(e);
-    await deps.jobs.markFailed(job.id, msg);
-  }
-  return true;
-}
-
-export interface WorkerHandle {
-  stop: () => void;
-}
-
-export function startReflectWorker(deps: JobsWorkerDeps, pollMs: number): WorkerHandle {
-  let stopped = false;
-  const tick = async (): Promise<void> => {
-    if (stopped) return;
-    try {
-      const didWork = await processOnePendingJob(deps);
-      if (!didWork) {
-        await new Promise(r => setTimeout(r, pollMs));
-      }
-    } catch (e) {
-      // Worker-level errors (DB conn drop, etc.) — log to stderr and back off.
-      console.error('[reflect-worker] unexpected error:', e);
-      await new Promise(r => setTimeout(r, pollMs * 2));
-    }
-    if (!stopped) void tick();
-  };
-  void tick();
-  return { stop: () => { stopped = true; } };
-}
-```
-
-- [ ] **Step 4: Verify pass**
-
-```bash
-npx vitest run src/services/__tests__/reflect-jobs.test.ts
-```
-
-Expected: 3/3 PASS.
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/services/reflect-jobs.ts src/services/__tests__/reflect-jobs.test.ts
-git commit -m "feat(reflect): add reflect-jobs worker with fail-closed job lifecycle"
-```
-
-## Task 1.7: Wire enqueue into `memory-ingest.ts`
-
-**Files:**
-- Modify: `src/services/memory-ingest.ts`
-
-- [ ] **Step 1: Inspect current ingest entry point**
-
-```bash
-grep -n "export async function\|export function" src/services/memory-ingest.ts | head -10
-```
-
-Identify the function that wraps a complete ingest of one or more memories.
-
-- [ ] **Step 2: Add the enqueue call at the end of the ingest flow**
-
-Find the function that returns after AUDN commits. Inject an optional `reflectionJobs` dependency and a guarded enqueue call after the commit:
-
-```typescript
-// near the top of the file, add to deps interface
-import type { ReflectionJobsRepository } from '../db/reflection-jobs-repository.js';
-
-// in the deps shape:
-interface IngestDeps {
-  // ... existing ...
-  reflectionJobs?: ReflectionJobsRepository;
-  reflectEnabled: boolean;
-}
-
-// at the end of the main ingest function, after the existing commit:
-if (deps.reflectEnabled && deps.reflectionJobs) {
-  try {
-    await deps.reflectionJobs.enqueue(userId, conversationId);
-  } catch (e) {
-    // Enqueue failure must not block the ingest response.
-    console.error('[memory-ingest] reflection enqueue failed:', e);
-  }
-}
-```
-
-Adapt to actual deps-shape and conversation_id source in the existing file. If `conversationId` is not currently threaded into ingest, thread it from the route handler.
-
-- [ ] **Step 3: tsc clean**
-
-```bash
-npx tsc --noEmit
-```
-
-Expected: no errors.
-
-- [ ] **Step 4: Add a unit test that confirms enqueue is called**
-
-If `memory-ingest.ts` already has tests, extend them. Add one test (in the existing memory-ingest tests file) where `reflectEnabled=true` and `reflectionJobs.enqueue` is asserted called once with `(userId, conversationId)`.
Add a complement where `reflectEnabled=false` and `enqueue` is NOT called. - -- [ ] **Step 5: Run tests** - -```bash -npx vitest run src/services/__tests__/memory-ingest.test.ts -``` - -(If the test file name differs, find the right one with `grep -rl memory-ingest src/services/__tests__/`.) - -Expected: PASS including the two new assertions. - -- [ ] **Step 6: Commit** - -```bash -git add src/services/memory-ingest.ts src/services/__tests__/memory-ingest*.ts -git commit -m "feat(ingest): enqueue reflection job after AUDN commit when REFLECT_ENABLED" -``` - -## Task 1.8: Implement `reflect-retrieval.ts` for query-time fetch - -**Files:** -- Create: `src/services/reflect-retrieval.ts` -- Create: `src/services/__tests__/reflect-retrieval.test.ts` - -- [ ] **Step 1: Write the failing test** - -```typescript -// src/services/__tests__/reflect-retrieval.test.ts -import { describe, expect, it, vi } from 'vitest'; -import { fetchReflectionsForQuery, type ReflectRetrievalDeps } from '../reflect-retrieval.js'; -import { QuestionType } from '../answer-format.js'; - -const reflection = (text: string): any => ({ - id: 'r1', userId: 'u', conversationId: 'c', observation: text, - observationType: 'event_summary', evidenceMemoryIds: ['m1'], - embedding: [], createdAt: new Date(), -}); - -describe('fetchReflectionsForQuery', () => { - it('returns empty when reflect retrieval disabled', async () => { - const deps: ReflectRetrievalDeps = { - reflections: { findSimilar: vi.fn() } as any, - embed: vi.fn(), - topK: 5, - enabled: false, - }; - const out = await fetchReflectionsForQuery(deps, 'u', 'How many?', QuestionType.NUMERIC_COUNT); - expect(out).toEqual([]); - expect(deps.reflections.findSimilar).not.toHaveBeenCalled(); - }); - - it('returns empty when question type is OTHER', async () => { - const deps: ReflectRetrievalDeps = { - reflections: { findSimilar: vi.fn() } as any, - embed: vi.fn(), - topK: 5, - enabled: true, - }; - const out = await fetchReflectionsForQuery(deps, 'u', 'unrelated', QuestionType.OTHER); - expect(out).toEqual([]); - expect(deps.reflections.findSimilar).not.toHaveBeenCalled(); - }); - - it('embeds and fetches top-K when type is in the routed set', async () => { - const findSimilar = vi.fn().mockResolvedValue([reflection('R1'), reflection('R2')]); - const embed = vi.fn().mockResolvedValue([0.1, 0.2]); - const deps: ReflectRetrievalDeps = { - reflections: { findSimilar } as any, embed, topK: 5, enabled: true, - }; - const out = await fetchReflectionsForQuery(deps, 'u', 'Summary please.', QuestionType.SUMMARY); - expect(embed).toHaveBeenCalledWith('Summary please.'); - expect(findSimilar).toHaveBeenCalledWith('u', [0.1, 0.2], 5); - expect(out).toHaveLength(2); - }); -}); -``` - -- [ ] **Step 2: Verify fail** - -```bash -npx vitest run src/services/__tests__/reflect-retrieval.test.ts -``` - -Expected: FAIL. - -- [ ] **Step 3: Implement** - -```typescript -// src/services/reflect-retrieval.ts -/** - * Query-time reflection retrieval. When the question classifier returns one of - * the "synthesis-heavy" types (summary, contradiction, preference, ...), this - * module embeds the query and pulls top-K reflections by cosine similarity. - * The result is later emitted as a ## OBSERVATIONS prompt channel by - * retrieval-format.ts. - * - * Returns [] when disabled or when the question type is OTHER — the caller - * passes the empty array through and downstream packaging emits no - * observations block. 
-
- */
-import type { ReflectionsRepository, Reflection } from '../db/reflections-repository.js';
-import { QuestionType } from './answer-format.js';
-
-const ROUTED_TYPES: ReadonlySet<QuestionType> = new Set([
-  QuestionType.SUMMARY,
-  QuestionType.CONTRADICTION,
-  QuestionType.PREFERENCE,
-  QuestionType.NUMERIC_COUNT,
-  QuestionType.EXACT_DATE,
-  QuestionType.ORDERED_LIST,
-]);
-
-export interface ReflectRetrievalDeps {
-  reflections: Pick<ReflectionsRepository, 'findSimilar'>;
-  embed: (text: string) => Promise<number[]>;
-  topK: number;
-  enabled: boolean;
-}
-
-export async function fetchReflectionsForQuery(
-  deps: ReflectRetrievalDeps,
-  userId: string,
-  query: string,
-  questionType: QuestionType,
-): Promise<Reflection[]> {
-  if (!deps.enabled) return [];
-  if (!ROUTED_TYPES.has(questionType)) return [];
-  const embedding = await deps.embed(query);
-  return deps.reflections.findSimilar(userId, embedding, deps.topK);
-}
-```
-
-- [ ] **Step 4: Verify pass**
-
-```bash
-npx vitest run src/services/__tests__/reflect-retrieval.test.ts
-```
-
-Expected: 3/3 PASS.
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/services/reflect-retrieval.ts src/services/__tests__/reflect-retrieval.test.ts
-git commit -m "feat(reflect): add query-time reflection retrieval gated by question type"
-```
-
-## Task 1.9: Wire reflection retrieval into `search-pipeline.ts`
-
-**Files:**
-- Modify: `src/services/search-pipeline.ts`
-
-- [ ] **Step 1: Identify where the final retrieval result is assembled**
-
-```bash
-grep -n "applyExpansionAndReranking\|surfaced\|selected\|topK\|return" src/services/search-pipeline.ts | head -30
-```
-
-Identify the function/section where the final selected set is passed downstream.
-
-- [ ] **Step 2: Thread reflections into the pipeline output**
-
-Add a `reflections` field to the pipeline output type (or to the deps that flow to retrieval-format). Call `fetchReflectionsForQuery` with the query, classified type, and configured top-K. Attach the array (possibly empty) to the result; one possible output-type shape is sketched after Step 4.
-
-Insertion shape (illustrative):
-
-```typescript
-import { fetchReflectionsForQuery } from './reflect-retrieval.js';
-import { classifyQuestion } from './answer-format.js';
-
-// in the orchestration function, after `selected = await applyExpansionAndReranking(...)`:
-const reflections = await fetchReflectionsForQuery(
-  {
-    reflections: deps.stores.reflections!,
-    embed: deps.embed,
-    topK: deps.config.reflectRetrievalTopK,
-    enabled: deps.config.reflectEnabled,
-  },
-  userId,
-  query,
-  classifyQuestion(query),
-);
-// pass `reflections` downstream alongside `selected`
-```
-
-- [ ] **Step 3: tsc clean**
-
-```bash
-npx tsc --noEmit
-```
-
-- [ ] **Step 4: Smoke test the pipeline still compiles end-to-end**
-
-```bash
-npm run build 2>&1 | tail -5
-```
-
-Expected: build succeeds.
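-
-For Step 2, one possible shape for the output-type change (the interface name here is a placeholder — extend whatever the pipeline actually exports):
-
-```typescript
-import type { Reflection } from '../db/reflections-repository.js';
-
-// Placeholder name — use the pipeline's real output interface.
-export interface SearchPipelineResult {
-  // ... existing fields (selected memories, scores, telemetry) ...
-  reflections: readonly Reflection[]; // empty when Reflect is off or the type isn't routed
-}
-```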
- -- [ ] **Step 5: Commit** - -```bash -git add src/services/search-pipeline.ts -git commit -m "feat(search-pipeline): wire reflection retrieval branch into orchestration" -``` - -## Task 1.10: Emit `## OBSERVATIONS` channel in `retrieval-format.ts` - -**Files:** -- Modify: `src/services/retrieval-format.ts` -- Modify: `src/services/__tests__/retrieval-format.test.ts` - -- [ ] **Step 1: Write the failing test** - -In `src/services/__tests__/retrieval-format.test.ts`, add a new describe block: - -```typescript -import type { Reflection } from '../../db/reflections-repository.js'; - -const sampleReflection = (text: string, type: any = 'event_summary'): Reflection => ({ - id: 'r1', userId: 'u', conversationId: 'c', - observation: text, observationType: type, - evidenceMemoryIds: ['m1', 'm2'], - embedding: [], createdAt: new Date(), -}); - -describe('buildInjection with reflections', () => { - it('emits ## OBSERVATIONS section when reflections array is non-empty', () => { - const out = buildInjection({ - // existing required args — adapt to the existing helper signature - memories: [], - reflections: [sampleReflection('Observation 1')], - // ... other args as currently required - } as any); - expect(out).toContain('## OBSERVATIONS'); - expect(out).toContain('Observation 1'); - expect(out).toContain('event_summary'); - }); - - it('omits the OBSERVATIONS section when reflections array is empty', () => { - const out = buildInjection({ - memories: [], - reflections: [], - } as any); - expect(out).not.toContain('## OBSERVATIONS'); - }); -}); -``` - -- [ ] **Step 2: Verify fail** - -```bash -npx vitest run src/services/__tests__/retrieval-format.test.ts -t "buildInjection with reflections" -``` - -Expected: FAIL. - -- [ ] **Step 3: Implement the channel emission** - -In `src/services/retrieval-format.ts`, extend the `buildInjection` argument type to accept `reflections?: readonly Reflection[]`. Add a small helper: - -```typescript -import type { Reflection } from '../db/reflections-repository.js'; - -function buildObservationsChannel(reflections: readonly Reflection[] | undefined): string { - if (!reflections || reflections.length === 0) return ''; - const lines = reflections.map(r => { - const evidence = r.evidenceMemoryIds.join(', '); - return `- [${r.observationType}] ${r.observation}\n evidence: ${evidence}`; - }); - return `## OBSERVATIONS\n${lines.join('\n')}\n\n`; -} -``` - -In the main assembly path of `buildInjection`, prepend the result of `buildObservationsChannel(args.reflections)` to the existing injection text (BEFORE `## TIMELINE` or after, your choice — but consistent). - -- [ ] **Step 4: Verify pass** - -```bash -npx vitest run src/services/__tests__/retrieval-format.test.ts -``` - -Expected: all tests PASS (existing + new 2). - -- [ ] **Step 5: Commit** - -```bash -git add src/services/retrieval-format.ts src/services/__tests__/retrieval-format.test.ts -git commit -m "feat(retrieval-format): emit ## OBSERVATIONS channel for reflections" -``` - -## Task 1.11: Add config flags + flush route - -**Files:** -- Modify: `src/config.ts` -- Create: `src/routes/reflect.ts` - -- [ ] **Step 1: Add the config keys** - -In `src/config.ts`, inside the `RuntimeConfig` interface, add: - -```typescript - reflectEnabled: boolean; - reflectModel: string; - reflectMaxObservations: number; - reflectJobPollMs: number; - reflectDebounceMs: number; - reflectRetrievalTopK: number; -``` - -In the config constructor function, parse them: - -```typescript - reflectEnabled: (optionalEnv('REFLECT_ENABLED') ?? 
'false') === 'true', - reflectModel: optionalEnv('REFLECT_MODEL') ?? 'claude-sonnet-4-5', - reflectMaxObservations: parseInt(optionalEnv('REFLECT_MAX_OBSERVATIONS') ?? '12', 10), - reflectJobPollMs: parseInt(optionalEnv('REFLECT_JOB_POLL_MS') ?? '5000', 10), - reflectDebounceMs: parseInt(optionalEnv('REFLECT_DEBOUNCE_MS') ?? '60000', 10), - reflectRetrievalTopK: parseInt(optionalEnv('REFLECT_RETRIEVAL_TOP_K') ?? '5', 10), -``` - -Add to `INTERNAL_POLICY_CONFIG_FIELDS` if other flags follow that pattern. - -In `src/app/runtime-container.ts`, mirror the fields in `CoreRuntimeConfig`. - -- [ ] **Step 2: Implement the flush route** - -```typescript -// src/routes/reflect.ts -/** - * Synchronous reflect-flush endpoint for benchmark / eval mode. - * Processes all pending reflection_jobs serially and returns the count - * of jobs processed. Returns 503 if Reflect is disabled. - */ -import type { Request, Response } from 'express'; -import type { JobsWorkerDeps } from '../services/reflect-jobs.js'; -import { processOnePendingJob } from '../services/reflect-jobs.js'; - -export function makeReflectFlushHandler( - deps: JobsWorkerDeps, - enabled: boolean, -): (req: Request, res: Response) => Promise { - return async (_req, res) => { - if (!enabled) { - res.status(503).json({ error: 'reflect_disabled' }); - return; - } - let processed = 0; - let cap = 1000; - while (cap-- > 0) { - const did = await processOnePendingJob(deps); - if (!did) break; - processed++; - } - res.json({ processed }); - }; -} -``` - -Mount it in `server.ts` or wherever routes are registered: `POST /v1/reflect/flush`. - -- [ ] **Step 3: tsc clean** - -```bash -npx tsc --noEmit -``` - -- [ ] **Step 4: Commit** - -```bash -git add src/config.ts src/app/runtime-container.ts src/routes/reflect.ts src/server.ts -git commit -m "feat(reflect): config flags + POST /v1/reflect/flush sync endpoint" -``` - -## Task 1.12: Wire dependencies in `runtime-container.ts` and start the worker - -**Files:** -- Modify: `src/app/runtime-container.ts` -- Modify: `src/db/stores.ts` - -- [ ] **Step 1: Add the two repos to the stores bundle** - -In `src/db/stores.ts`, extend `CoreStores`: - -```typescript -import { ReflectionsRepository } from './reflections-repository.js'; -import { ReflectionJobsRepository } from './reflection-jobs-repository.js'; - -export interface CoreStores { - // ... existing fields ... - reflections: ReflectionsRepository | null; - reflectionJobs: ReflectionJobsRepository | null; -} -``` - -- [ ] **Step 2: Instantiate them in `createCoreRuntime`** - -In `src/app/runtime-container.ts`, after the other repository instantiations: - -```typescript -import { ReflectionsRepository } from '../db/reflections-repository.js'; -import { ReflectionJobsRepository } from '../db/reflection-jobs-repository.js'; -import { runReflectForConversation } from '../services/reflect.js'; -import { startReflectWorker } from '../services/reflect-jobs.js'; -import { embed as embedQuery } from '../services/embedding.js'; // adapt to actual API - -const reflections = runtimeConfig.reflectEnabled ? new ReflectionsRepository(pool) : null; -const reflectionJobs = runtimeConfig.reflectEnabled ? new ReflectionJobsRepository(pool) : null; - -// ... after building stores ... 
-
-stores.reflections = reflections;
-stores.reflectionJobs = reflectionJobs;
-
-if (runtimeConfig.reflectEnabled && reflections && reflectionJobs) {
-  const reflectModel = runtimeConfig.reflectModel;
-  const workerDeps = {
-    jobs: reflectionJobs,
-    runReflect: (userId: string, conversationId: string) =>
-      runReflectForConversation(
-        {
-          fetchMemories: async (u, c) => {
-            const rows = await memory.findByConversation(u, c);
-            return rows.map(r => ({ id: r.id, text: r.text, observedAt: r.observedAt }));
-          },
-          llmCallTool: (system, user, toolSchema) =>
-            callAnthropicTool(reflectModel, system, user, toolSchema),
-          embed: embedQuery,
-          reflections,
-          maxObservations: runtimeConfig.reflectMaxObservations,
-        },
-        userId,
-        conversationId,
-      ),
-  };
-  startReflectWorker(workerDeps, runtimeConfig.reflectJobPollMs);
-}
-```
-
-### Sub-step 2a — Add `findByConversation` to `MemoryRepository` if missing
-
-In `src/db/memory-repository.ts`, add (skip if it already exists):
-
-```typescript
-async findByConversation(
-  userId: string,
-  conversationId: string,
-): Promise<Array<{ id: string; text: string; observedAt: Date }>> {
-  const { rows } = await this.pool.query(
-    `SELECT id, content as text, observed_at
-     FROM memories
-     WHERE user_id = $1 AND conversation_id = $2
-     ORDER BY observed_at ASC`,
-    [userId, conversationId],
-  );
-  return rows.map(r => ({ id: r.id, text: r.text, observedAt: r.observed_at }));
-}
-```
-
-(Adapt the column name `content`/`text` and `conversation_id` to whatever the existing memories schema uses — `\d memories` to confirm.)
-
-### Sub-step 2b — Add `callAnthropicTool` to `services/llm.ts` if missing
-
-```typescript
-import Anthropic from '@anthropic-ai/sdk';
-import { config } from '../config.js';
-
-const client = new Anthropic({ apiKey: config.anthropicApiKey });
-
-interface AnthropicToolSchema {
-  name: string;
-  description: string;
-  input_schema: Record<string, unknown>;
-}
-
-export async function callAnthropicTool<T>(
-  model: string,
-  system: string,
-  user: string,
-  toolSchema: AnthropicToolSchema,
-): Promise<T> {
-  const response = await client.messages.create({
-    model,
-    max_tokens: 4096,
-    system,
-    messages: [{ role: 'user', content: user }],
-    tools: [toolSchema],
-    tool_choice: { type: 'tool', name: toolSchema.name },
-  });
-  for (const block of response.content) {
-    if (block.type === 'tool_use' && block.name === toolSchema.name) {
-      return block.input as T;
-    }
-  }
-  throw new Error(`Anthropic tool-use returned no ${toolSchema.name} block`);
-}
-```
-
-(Adapt `apiKey` to whatever shape the existing config exposes — it's likely `config.llmApiKey` based on prior env-files.)
-
-- [ ] **Step 3: tsc clean**
-
-```bash
-npx tsc --noEmit
-```
-
-- [ ] **Step 4: Run all tests**
-
-```bash
-npm test 2>&1 | tail -20
-```
-
-Expected: PASS.
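-
-One thing the wiring above leaves open is shutdown: `startReflectWorker` returns a `WorkerHandle` that is currently discarded. If the runtime container has a teardown path, a thin wrapper like this keeps the worker stoppable (the `registerShutdownHook` parameter is an assumed shape, not an existing API):
-
-```typescript
-import { startReflectWorker, type JobsWorkerDeps, type WorkerHandle } from '../services/reflect-jobs.js';
-
-// Sketch: start the worker and register a stop callback with whatever
-// teardown mechanism the container already has.
-export function startReflectWorkerWithShutdown(
-  deps: JobsWorkerDeps,
-  pollMs: number,
-  registerShutdownHook: (fn: () => void) => void,
-): WorkerHandle {
-  const handle = startReflectWorker(deps, pollMs);
-  registerShutdownHook(() => handle.stop()); // current tick finishes; no further polls start
-  return handle;
-}
-```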
- -- [ ] **Step 5: Commit** - -```bash -git add -A -git commit -m "feat(reflect): wire repos + worker in runtime-container, expose via stores" -``` - -## Task 1.13: Author the Phase 1 validation env file - -**Files:** -- Create: `.env.phase1-reflect` - -- [ ] **Step 1: Write the env file** - -```bash -POSTGRES_PORT=5503 -APP_PORT=3103 -DATABASE_URL=postgresql://atomicmemory:atomicmemory@localhost:5503/atomicmemory -LLM_PROVIDER=anthropic -LLM_API_URL= -LLM_API_KEY= -LLM_MODEL=claude-haiku-4-5 -EMBEDDING_PROVIDER=transformers -EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 -EMBEDDING_DIMENSIONS=384 -ANTHROPIC_API_KEY= -ATOMICMEMORY_API_URL=http://localhost:3103 -COST_CAP_DAILY=200 -COST_CAP_ITER=20 -# Kept stack (h3-timeline) + L1-patched -TBC_ENABLED=true -TOPIC_ABSTRACTION_ENABLED=false -TOPIC_SEARCH_ENABLED=false -RERANKER_ENABLED=true -RECAP_LAYER_ENABLED=false -RECAP_SEARCH_ENABLED=false -HIERARCHICAL_RETRIEVAL_ENABLED=false -CHUNKED_EXTRACTION_ENABLED=true -CHUNKED_EXTRACTION_FALLBACK_ENABLED=true -TIMELINE_CHANNEL_ENABLED=true -PACKAGING_USE_OBSERVED_AT=true -ANSWER_FORMAT_ALIGNMENT_ENABLED=true -# Phase 1 — Reflect ON -REFLECT_ENABLED=true -REFLECT_MODEL=claude-sonnet-4-5 -REFLECT_MAX_OBSERVATIONS=12 -REFLECT_JOB_POLL_MS=5000 -REFLECT_DEBOUNCE_MS=10000 -REFLECT_RETRIEVAL_TOP_K=5 -``` - -- [ ] **Step 2: DO NOT commit this file** - -`.env.*` files are blocked by `.gitignore` (real API keys). File exists on -disk for docker; it never enters git. - -## Task 1.14: Run Phase 1 4-conv n=80 validation - -- [ ] **Step 1: Verify ports 3103 and 5503 are free** - -```bash -for p in 3103 5503; do - if lsof -nP -iTCP:$p -sTCP:LISTEN >/dev/null 2>&1; then echo "$p BUSY"; else echo "$p free"; fi -done -``` - -- [ ] **Step 2: Modify the runner to call `/v1/reflect/flush` between ingest and query phases** - -The current `run_parallel_cell.sh` does ingest and query inside one `omb run` invocation. We need a hook to call `POST http://localhost:$PORT/v1/reflect/flush` after the harness ingests the conversation but before it starts asking questions. Two options: - -(a) **Easiest:** add `BEAM_POST_INGEST_HOOK_URL` to the omb harness env (modify `agent-memory-benchmark/src/memory_bench/run.py`) and have it `requests.post` after ingest. - -(b) **No-harness-change:** patch the AM `/v1/memories` endpoint to also call `processOnePendingJob` synchronously when an env flag is on (only used in eval mode). Acceptable hack for the validation. - -Pick (b) for speed. Add to `routes/memories.ts` (the ingest handler): if `runtimeConfig.reflectEnabled` AND the post-ingest hook flag is set, await one job-drain cycle before responding. Document the flag clearly: `REFLECT_SYNC_DRAIN_ON_INGEST=true` (eval-mode only). - -- [ ] **Step 3: Run the validation** - -```bash -/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/tools/run_parallel_cell.sh \ - phase1-reflect .env.phase1-reflect 3103 5503 am-phase1-reflect \ - 1,2,3,4 anthropic-haiku-4-5 anthropic-haiku-4-5 -``` - -Expected wall time: ~35 min (Reflect adds ~10 min for the Sonnet calls). Expected cost: ~$4 (Haiku $2 + Sonnet $2). 
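-
-For the Step 2(b) hook above, a minimal sketch of the eval-mode drain (assumptions: the ingest route can reach the same `JobsWorkerDeps` the worker uses, and `REFLECT_SYNC_DRAIN_ON_INGEST` is surfaced through the config layer rather than read from `process.env` directly):
-
-```typescript
-import { processOnePendingJob, type JobsWorkerDeps } from '../services/reflect-jobs.js';
-
-// Drain pending reflection jobs serially so reflections exist before the
-// harness starts asking questions. Eval-mode only; capped to stay bounded.
-export async function drainReflectionJobs(deps: JobsWorkerDeps, maxJobs = 100): Promise<number> {
-  let processed = 0;
-  while (processed < maxJobs && await processOnePendingJob(deps)) {
-    processed++;
-  }
-  return processed;
-}
-
-// In routes/memories.ts, after the ingest work and before the response
-// (config field name is an assumption for the env flag above):
-//   if (config.reflectEnabled && config.reflectSyncDrainOnIngest) {
-//     await drainReflectionJobs(reflectWorkerDeps);
-//   }
-```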
- -- [ ] **Step 4: Per-question diff vs Phase 0 baseline** - -```bash -python3 << 'PYEOF' -import json -prev = json.load(open('/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/results/haiku080/phase0-l1patched/summary.json')) -new = json.load(open('/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/results/haiku080/phase1-reflect/summary.json')) -print(f"composite: {prev['composite']:.3f} -> {new['composite']:.3f} (delta {new['composite']-prev['composite']:+.3f})") -print("per-ability:") -for ab in sorted(prev['per_ability']): - p, n = prev['per_ability'][ab], new['per_ability'][ab] - print(f" {ab:25s}: {p:.3f} -> {n:.3f} (delta {n-p:+.3f})") -PYEOF -``` - -- [ ] **Step 5: Apply Phase 1 strict gate** - -| Composite Δ | Worst per-ability Δ | Verdict | -|---|---|---| -| ≥ +0.05 | ≥ -0.10 | **PASS** — Phase 1 ships, proceed to Phase 2 brainstorm | -| Composite plateau, but SUM/KU/CR/MSR each ≥ +0.05 | ≥ -0.10 | **PASS** (per-ability win) — proceed to Phase 2 | -| ∈ [-0.03, +0.05] otherwise | ≥ -0.10 | **PLATEAU** — diagnose, document, decide keep-flagged-off vs continue-on | -| < -0.03 OR any ability < -0.10 | — | **REGRESS** — revert Phase 1 commits, root-cause, consult user | - -- [ ] **Step 6: Multirun if borderline** - -If composite Δ is within ±0.03 of the +0.05 threshold, run 3 seeds × n=80 to kill noise: - -```bash -for seed in 1 2 3; do - /Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint3/tools/run_parallel_cell.sh \ - phase1-reflect-s${seed} .env.phase1-reflect 3103 5503 am-phase1-reflect-s${seed} \ - 1,2,3,4 anthropic-haiku-4-5 anthropic-haiku-4-5 -done -``` - -Compute the 9-cell mean and the bootstrap 95% CI. Decide based on the CI lower bound. - -- [ ] **Step 7: Write the Phase 1 diagnostic doc** - -Create `/Users/moralespanitz/me/supernet/atomicmemory-research/memory-research/benchmarks-sprint5/phase-1-diagnostic.md`: -- Composite + per-ability before/after -- Sample of Reflect outputs (read 5 random rows from `session_reflections`) -- Per-question diff: which question types lifted, which regressed -- Verdict (PASS / PLATEAU / REGRESS) -- Next step - -- [ ] **Step 8: Commit** - -```bash -cd /Users/moralespanitz/me/supernet/atomicmemory-research -git add memory-research/benchmarks-sprint3/results/haiku080/phase1-reflect/ \ - memory-research/benchmarks-sprint5/phase-1-diagnostic.md -git commit -m "results: Phase 1 4-conv n=80 validation" -``` - -- [ ] **Step 9: Final gate** - -If PASS or PASS (per-ability win): announce ready for Phase 2 (specialists). The next implementation plan begins with another brainstorm → spec update → writing-plans cycle. - -If PLATEAU: diagnose. Common Reflect-specific failure modes to check first: -- Reflect not producing observations (check `session_reflections` row count) -- Observations not being retrieved (check `## OBSERVATIONS` appearing in query logs) -- Observations contradicting raw memories (per-question diff shows answer flipping wrong) -- Sonnet voice mismatch (try `REFLECT_MODEL=claude-haiku-4-5` and re-validate) - -If REGRESS: revert all Phase 1 commits (`git revert HEAD~N..HEAD` where N is the Phase 1 task count), open a diagnostic, consult user. 
- ---- - -## Plan-level pre-commit checklist (after Phase 1 lands) - -Before declaring Phase 1 done: - -- [ ] `npx tsc --noEmit` clean -- [ ] `npm test` all pass (no skipped suites) -- [ ] `fallow --no-cache` clean (or remaining items consciously deferred and documented) -- [ ] All new files ≤ 400 lines, all new functions ≤ 40 lines -- [ ] No `any`, no direct `process.env` reads -- [ ] Phase 0 + Phase 1 diagnostic docs committed in `benchmarks-sprint5/` -- [ ] Both `summary.json` files committed in `benchmarks-sprint3/results/haiku080/` - -## Out of scope (deferred to later plans) - -- Per-ability specialists (Phase 2 — separate plan) -- Hybrid model routing (Phase 3) -- TEMPR graph arm (Phase 4) -- Mental Models / Mission-Directives (Phase 5) -- Reflect storage TTL / compaction -- BEAM-1M / BEAM-10M tier validation -- Multi-instance worker leasing (single-instance only for v1) diff --git a/docs/superpowers/specs/2026-05-11-beam-085-anthropic-only-design.md b/docs/superpowers/specs/2026-05-11-beam-085-anthropic-only-design.md deleted file mode 100644 index 939883c..0000000 --- a/docs/superpowers/specs/2026-05-11-beam-085-anthropic-only-design.md +++ /dev/null @@ -1,244 +0,0 @@ -# BEAM 0.85+ via Anthropic-Only Hybrid Architecture — Design - -**Date:** 2026-05-11 -**Author:** AtomicStrata research (Claude + Moralespanitz) -**Status:** Spec — pending user approval before transition to writing-plans -**Target:** Composite ≥0.85 on BEAM-100K (stretch 0.90+) under Anthropic-judge, beating Hindsight (0.75 published, Gemini-judge) and Mem0 on the public leaderboard. - -## Goal - -Lift AtomicMemory's BEAM-100K composite from **0.411 (strict Haiku × Haiku-judge, kept stack)** to **0.85–0.92** without leaving Anthropic's model family for the pipeline. Quality is the primary objective; Pareto position is secondary but tracked as a constraint (target: stay below Hindsight's published $0.075/q cost). Reach the BEAM-1M and BEAM-10M tiers as a stretch goal. - -## Architecture - -Approach C — **shared spine + per-ability specialists for the bottom 3 abilities**. The current AM pipeline (RRF + reranker + TBC + timeline + packaging + Haiku answer) is preserved as the default "shared spine." A new async **Reflect step** consolidates session memories at ingest time. A new **question-type router** dispatches MSR / CR / KU+IE queries to specialist branches that bypass parts of the shared spine for those specific question types. All other questions take the shared spine unchanged. 
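-
-A minimal sketch of the deterministic router this paragraph describes (module path from the components table below; the regexes are the illustrative ones in the diagram, not a final pattern set):
-
-```typescript
-// services/specialists/question-router.ts — sketch only; patterns get tuned in Phase 2.
-export type SpecialistRoute = 'msr' | 'cr' | 'ku_ie' | 'shared';
-
-const MSR_PATTERN = /how many|total|across all/i;
-const CR_PATTERN = /have i ever|conflicting/i;
-const KU_IE_PATTERN = /what is the|when does/i;
-
-export function routeQuestion(query: string): SpecialistRoute {
-  if (MSR_PATTERN.test(query)) return 'msr';
-  if (CR_PATTERN.test(query)) return 'cr';
-  if (KU_IE_PATTERN.test(query)) return 'ku_ie';
-  return 'shared'; // fail-open: anything unrecognized takes the shared spine
-}
-```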
- -### Diagram - -``` -Ingest path - user turn → AUDN (Haiku) → memories table - └─→ literal-extractor (Haiku) → entity_values table - └─→ [async, session boundary] - Reflect (Sonnet) → session_reflections table - -Retrieve path - query → question-type classifier (deterministic regex) - ├─ shared spine (default): - │ RRF (sem + BM25 + temporal) → reranker → kept stack - │ packaging (+ ## TIMELINE + ## OBSERVATIONS if reflect retrieved) - │ → Haiku answer with L1 format-aligned prompt - ├─ MSR specialist (/how many|total|across all/): - │ retrieve → memory-aggregate (group by entity) → - │ Haiku with answer_with_count tool-use call - ├─ CR specialist (/have I ever|conflicting/): - │ retrieve → bilateral COUNTER fetch (both sides) → - │ Haiku with answer_contradiction tool-use call (FACT A / FACT B framing) - └─ KU/IE specialist (/what is the|when does/): - entity_values SQL lookup → hit: literal value - miss: fall through to shared spine -``` - -### Components (new modules, all ≤400 lines) - -| Module | Role | Phase | -|---|---|---| -| `services/reflect.ts` | Reflect orchestrator | 1 | -| `services/reflect-prompts.ts` | Sonnet system prompt + tool-use schema | 1 | -| `services/reflect-jobs.ts` | Postgres-backed async job queue | 1 | -| `services/reflect-retrieval.ts` | Query-time reflection fetch | 1 | -| `db/reflections-repository.ts` | CRUD for session_reflections | 1 | -| `db/reflection-jobs-repository.ts` | CRUD for reflection_jobs | 1 | -| `db/migrations/20260512_session_reflections.sql` | Schema | 1 | -| `services/specialists/question-router.ts` | Deterministic classifier + dispatch | 2 | -| `services/specialists/msr-specialist.ts` | MSR aggregator + count tool-use | 2.1 | -| `services/specialists/cr-specialist.ts` | CR bilateral framing + tool-use | 2.2 | -| `services/specialists/ku-ie-specialist.ts` | Literal SQL lookup | 2.3 | -| `services/specialists/specialist-types.ts` | Shared specialist types | 2 | -| `services/literal-extractor.ts` | Ingest-side literal-field extraction | 2.3 | -| `db/entity-values-repository.ts` | CRUD for entity_values | 2.3 | -| `db/migrations/20260513_entity_values.sql` | Schema | 2.3 | -| `services/model-router.ts` | Per-ability model selection (Phase 3) | 3 | -| `services/graph-arm.ts` | TEMPR 4th retrieval arm via belief_edges | 4 | - -### Modified files - -| File | Change | Phase | -|---|---|---| -| `services/answer-format.ts` | Patch ORDERED_LIST hint; tighten classifier (require numeric token) | 0 | -| `services/counter-edge-surface.ts` | **DELETE** — replaced by CR specialist | 0 | -| `config.ts` | New flags: `REFLECT_ENABLED`, `SPECIALIST_*_ENABLED`, `LITERAL_EXTRACTOR_ENABLED`, etc. | per phase | -| `services/memory-ingest.ts` | Call literal-extractor + write reflection_job after AUDN | 1, 2.3 | -| `services/memory-search.ts` | Dispatch via question-router; integrate reflection-retrieval | 1, 2 | -| `services/search-pipeline.ts` | Add reflection retrieval branch; add graph arm in Phase 4 | 1, 4 | -| `services/retrieval-format.ts` | Emit `## OBSERVATIONS` prompt channel | 1 | -| `app/runtime-container.ts` | Wire new repositories + job worker | 1+ | - -## Data flow - -### Ingest -1. User turn arrives via HTTP `POST /v1/memories`. -2. AUDN extracts atomic facts (Haiku call, current behavior preserved). -3. **NEW** (Phase 2.3): literal-extractor runs on each new fact, extracting `(entity, attribute, value, value_type, observed_at)` tuples into `entity_values`. -4. 
**NEW** (Phase 1): a `reflection_jobs` row is written for the affected `(user_id, conversation_id)`. Status = `pending`. Response returns to caller immediately. -5. **NEW** (Phase 1, async): worker polls `reflection_jobs WHERE status = 'pending' AND age > debounce_threshold`. For each ready job, fetches all memories for the conversation, calls Sonnet with the Reflect prompt + tool-use schema. Writes resulting observations to `session_reflections`. Marks job `completed`. - -### Retrieve -1. Query arrives via `POST /v1/memories/search`. -2. Question-type classifier (deterministic, no LLM) inspects query → returns one of `{msr, cr, ku_ie, shared}`. -3. Dispatch: - - `shared`: existing RRF + rerank + packaging + Haiku. PLUS — if classifier sub-flag `summary_or_preference_or_knowledge_update` matches, also fetch top-5 reflections via cosine similarity and emit `## OBSERVATIONS` channel in packaging. - - `msr`: shared retrieval + memory-aggregate post-process + Haiku tool-use call `answer_with_count`. - - `cr`: shared retrieval + bilateral COUNTER fetch (query `belief_edges` for both directions) + Haiku tool-use call `answer_contradiction`. - - `ku_ie`: query `entity_values` directly via SQL. On hit: return literal value in minimal answer template. On miss: fall through to shared spine. -4. Answer returned. Telemetry records which branch was taken. - -## Conditional cases / phase decision tree - -Sequential phases. Each phase gates the next. - -### Strict gate definition - -- **PASS:** composite Δ ≥ +0.05 vs prev-phase baseline AND no per-ability regression > 0.10 at 4-conv n=80. -- **PLATEAU:** composite Δ ∈ [−0.03, +0.05] with no ability < −0.10. **Diagnose via per-question diff** before deciding. May ship behind flag without claim. -- **REGRESS:** composite Δ < −0.03 OR any ability < −0.10. **Revert immediately**, root-cause, then optionally retry with modification. -- **Borderline:** for composite Δ within ±0.03 of the +0.05 threshold, run multirun (3 seeds × n=80) to kill single-run noise. - -### Phase tree - -``` -Phase 0 — L1-patched, L3-deleted, relock baseline - PASS → Phase 1 - PLATEAU → keep, no claim → Phase 1 - REGRESS → revert L1, investigate (ask user) - -Phase 1 — Reflect step (async, Sonnet) - PASS → Phase 2.1 - PLATEAU on composite, per-ability lifts on SUM/KU/CR/MSR ≥+0.05 each → keep ON → Phase 2.1 - PLATEAU else → flag OFF, diagnose, retry with modified prompt - REGRESS → OFF, diagnose: - - bad observations (evidence cite validity?) - - prompt-slot competition (verify ## OBSERVATIONS only fires when routed)? - - voice mismatch (try Haiku for Reflect)? - -Phase 2.1 — MSR specialist - PASS (MSR Δ ≥+0.15 AND composite Δ ≥+0.03) → Phase 2.2 - PLATEAU → ablate: classifier accuracy? aggregator grouping? - REGRESS → OFF, no retry without root cause - -Phase 2.2 — CR specialist - PASS (CR Δ ≥+0.15 AND composite Δ ≥+0.02) → Phase 2.3 - PLATEAU → ablate: COUNTER edges sparse? tool-use schema ambiguous? - REGRESS → OFF, investigate - -Phase 2.3 — KU/IE specialist + literal-extractor - PASS (KU OR IE Δ ≥+0.20) → Phase 3 - PLATEAU → ablate: entity_values population rate? (entity, attribute) extraction accuracy? 
- REGRESS → OFF, investigate - -Phase 3 — Hybrid model routing (Sonnet/Opus for hard abilities) - PASS (composite Δ ≥+0.05) → Phase 4 - PLATEAU → stay on Haiku (Sprint 3 documented: stronger LLM hurts strict-judge) - REGRESS → all-Haiku, document the model-vs-judge pattern - -Phase 4 — TEMPR 4th arm (graph retrieval) - PASS → Phase 5 - PLATEAU → OFF, document - REGRESS → OFF - -Phase 5 — Mental Models + Mission/Directives (polish) - PASS → DONE, run 4-conv multirun + BEAM-1M / BEAM-10M tiers - PLATEAU → DONE at current best - REGRESS → revert, claim previous best -``` - -### Cross-phase rules - -1. **Never stack two unvalidated changes.** Each phase ships in isolation, validated solo, then stacks. -2. **Never ship a mechanism that regresses any ability by >0.15** even if composite lifts. (L3 regressed CR by −0.188 while *targeting* CR — exact failure pattern this rule prevents.) -3. **Always keep previous-known-good as rollback.** Feature branch per phase, merge to `main` only after phase PASS. -4. **Always run per-question diff** before claiming PASS — composite numbers can hide mechanism-level damage (dedup looked +0.024 single-conv, was −0.024 at 4-conv). -5. **Rate-limit guard:** if Anthropic rate-limit pool exhausts during a phase, pause and reschedule. Do not parallelize harder. - -### Escalation ladder - -- **Phase regresses 3× with different mods** → architecture review: consider pivoting from Approach C → Approach B (full per-ability refactor) for the stuck ability, or scope down + skip. -- **Composite plateaus < 0.65 after Phase 3** → likely model-capacity bound. Add Phase 3.5: hybrid with Sonnet for ALL answer generation, validate. -- **Composite plateaus < 0.75 after Phase 4** → likely judge-calibration bound. Document the ceiling. Consider Sonnet-as-judge as an alternative anchor. - -## Error handling (per-mechanism) - -| Mechanism | Failure mode | Mitigation | -|---|---|---| -| Reflect | LLM call fails | Retry 3× exponential backoff. Then mark job `failed`, no reflections written. Query path unaffected. | -| Reflect | Hallucinated observation | Every observation cites `evidence_memory_ids`. Reflections with missing/invalid evidence filtered at retrieval. | -| Reflect | Contradicts raw memory | Raw memories always ground truth. Reflections are supplementary, never override. | -| Reflect | Storage growth | Out of scope for v1 (deferred to a separate compaction sprint). | -| MSR specialist | Aggregator returns 0 items | Fall through to shared spine. | -| MSR specialist | Tool-use call fails | Retry once, then fall through. | -| CR specialist | No COUNTER edges | Fall through. | -| CR specialist | Both sides missing | Fall through. | -| KU/IE specialist | entity_values miss | Fall through to shared spine (this will be common until table fills). | -| Question router | Classifier crash | Default to `shared` (fail-open routing is safe here). | -| Hybrid model router | Sonnet/Opus rate limit | Downgrade routed queries to Haiku for this request only; log + alert. | - -## Testing - -### Test types - -- **Unit (Vitest, in-tree):** every new module gets a `__tests__` file. Coverage: classifier regexes, aggregator grouping, Reflect prompt assembly, tool-use schema validation, repo CRUD. -- **Integration (Vitest + Postgres test DB):** Reflect end-to-end (ingest → job → worker → DB). Specialist dispatch (query → router → specialist → answer). Fall-through behavior (specialist miss → shared spine). -- **Smoke benchmark** (conv 2 n=20, ~5 min, $0.50): after each mechanism, before 4-conv. 
Catches catastrophic regressions. Never used to claim PASS. -- **Validation benchmark** (4-conv n=80, ~25 min, $2): the gate. Produces composite + per-ability summary.json. -- **Multirun (borderline)** (3 seeds × n=80, ~75 min, $6): when composite Δ within ±0.03 of +0.05 threshold. -- **Per-question diff** (Python over c{1,2,3,4}.json): after every validation run. Surfaces which questions moved and why. - -### Validation artifacts per phase - -Each phase produces: -- Branch: `feature/phase-N-{mechanism}` -- Results JSON: `benchmarks-sprint5/results/phase-N/summary.json` -- 1-page diagnostic: `benchmarks-sprint5/phase-N-diagnostic.md` (per-question diff + decision) -- Merge commit on `main` (only after PASS) OR revert commit (on REGRESS) - -## Risks - -1. **Reflect hallucinations contaminate retrieval.** → evidence_memory_ids validation + fail-closed. -2. **Specialist classifier misfires** (false-positive routing). → 100% fall-through on miss + verbose dispatch logging + per-question diff catches it. -3. **Stronger LLM hurts strict-judge composite** (documented Sprint 3 pattern). → Phase 3 has explicit "stay on Haiku" branch if regression. -4. **Approach C plateaus below 0.75.** → Phase 5 escalation ladder includes architecture review + Approach B pivot for stuck abilities. -5. **Anthropic rate limits during multirun.** → pause + reschedule, not parallelize harder. Budget per phase: max ~$15 in API calls. - -## Out of scope - -- LongMemEval-S, LoCoMo10, PersonaMem benchmarks (separate sprint after BEAM-100K target). -- Gemini-judge cross-calibration (Anthropic-judge only per user requirement). -- Working memory / scratchpad (Phase 5 stretch only; not required for 0.85 target). -- BGE Small EN v1.5 embedding swap (not LLM-dependent; can ship if Phase 4 plateaus). -- Reflect storage compaction / TTL (defer to later sprint). - -## Constraints (from CLAUDE.md) - -- TypeScript ESM -- Files ≤ 400 lines (excluding comments). New modules designed for this. -- Functions ≤ 40 lines (excluding catch/finally). -- No `any`. -- No `process.env` reads outside `src/config.ts`. -- Mutations fail closed. -- Pre-commit: `npx tsc --noEmit`, `npm test`, `fallow --no-cache`. - -## Success criteria - -- **Required:** BEAM-100K composite ≥ 0.75 at 4-conv n=80 under Haiku × Haiku-judge. -- **Target:** BEAM-100K composite ≥ 0.85. -- **Stretch:** BEAM-100K composite ≥ 0.90 AND BEAM-1M / BEAM-10M tiers measured. -- **Constraint:** Pareto cost ≤ Hindsight's $0.075/q across all measured tiers. -- **Quality bar:** every claimed PASS reproducible at 4-conv n=80 (multirun if borderline). - -## Scope of the first implementation plan - -This spec covers all 5 phases as the strategic umbrella. The **first implementation plan** (produced by `superpowers:writing-plans`) will cover only **Phase 0 + Phase 1** — foundation cleanup + Reflect step. After Phase 1 ships and gates, we re-enter brainstorming → spec-update → next plan for Phase 2 specialists. This keeps each implementation plan tractable and lets us re-plan based on Phase 1's actual measured results. - -## Next step - -Transition to `superpowers:writing-plans` to produce the Phase 0 + Phase 1 TDD implementation plan with one task per step. diff --git a/docs/tbc-phase-3-schema.md b/docs/tbc-phase-3-schema.md deleted file mode 100644 index d936281..0000000 --- a/docs/tbc-phase-3-schema.md +++ /dev/null @@ -1,214 +0,0 @@ -# TBC Phase 3 — Schema Migration Design - -**Date:** 2026-05-06 -**Branch:** `worktree-tbc-prototype` -**Status:** design (T1.1 deliverable). 
Implementation in T1.2. - ---- - -## Goal - -Phase 2 wrote belief state to `memories.metadata` (JSONB). Phase 3 promotes belief state to first-class columns + a new typed-edge table, so search-time consumers can read normalized fields without parsing JSONB. - -**Migration is strictly additive.** Pre-migration databases stay queryable; tbc-execution.ts dual-writes during the migration window. - ---- - -## Schema additions - -### 1. New columns on `memories` - -```sql --- Confidence in [0,1]; default 1.0 means "fully believed" (matches AUDN's no-confidence-tracking baseline). -ALTER TABLE memories ADD COLUMN IF NOT EXISTS confidence REAL DEFAULT 1.0 - CHECK (confidence >= 0.0 AND confidence <= 1.0); - --- Belief tier — controls how the claim influences answer generation. --- standard: default tier, normal weight in retrieval --- directive: promoted; injected as a "must follow" rule in answer prompt --- demoted: challenged; lower weight + flagged for re-evaluation --- retracted: believed false; excluded from default retrieval -ALTER TABLE memories ADD COLUMN IF NOT EXISTS belief_tier TEXT DEFAULT 'standard' - CHECK (belief_tier IN ('standard', 'directive', 'demoted', 'retracted')); - --- The TBC operator that most recently mutated this memory. -ALTER TABLE memories ADD COLUMN IF NOT EXISTS mutation_type TEXT DEFAULT NULL - CHECK (mutation_type IS NULL OR mutation_type IN ( - 'AFFIRM', 'UPDATE', 'RETRACT', 'SUPERSEDE', - 'PROMOTE', 'DEMOTE', 'EVIDENCE_FOR', 'COUNTER' - )); -``` - -### 2. New table `belief_edges` - -```sql -CREATE TABLE IF NOT EXISTS belief_edges ( - id UUID PRIMARY KEY DEFAULT gen_random_uuid(), - user_id TEXT NOT NULL, - source_id UUID NOT NULL, - target_id UUID NOT NULL, - edge_type TEXT NOT NULL CHECK (edge_type IN ( - 'evidence_for', -- source supports target's confidence - 'counter', -- source contradicts target's confidence - 'supersedes', -- source replaces target (more specific or general) - 'promotes', -- source promoted target to directive tier - 'demotes' -- source challenged target without retracting - )), - weight REAL NOT NULL DEFAULT 0.0 - CHECK (weight >= -1.0 AND weight <= 1.0), - rationale TEXT NOT NULL DEFAULT '', - created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), - workspace_id UUID DEFAULT NULL, - agent_id UUID DEFAULT NULL -); -``` - -### 3. 
Indexes - -```sql --- For "all evidence pointing at this claim" queries (queryable belief state) -CREATE INDEX IF NOT EXISTS idx_belief_edges_target - ON belief_edges (target_id, edge_type, created_at DESC); - --- For "all claims this evidence supports/counters" queries -CREATE INDEX IF NOT EXISTS idx_belief_edges_source - ON belief_edges (source_id, edge_type); - --- User-scoped target traversal (multi-tenant safety) -CREATE INDEX IF NOT EXISTS idx_belief_edges_user_target - ON belief_edges (user_id, target_id); - --- Tier-aware retrieval (directives surface fast) -CREATE INDEX IF NOT EXISTS idx_memories_belief_tier - ON memories (user_id, belief_tier) - WHERE deleted_at IS NULL AND expired_at IS NULL AND belief_tier != 'standard'; - --- Confidence-weighted retrieval (low-confidence demotion) -CREATE INDEX IF NOT EXISTS idx_memories_confidence - ON memories (user_id, confidence DESC) - WHERE deleted_at IS NULL AND expired_at IS NULL; -``` - ---- - -## Migration semantics - -| Property | Value | -|---|---| -| Additive only | yes — no DROP, no destructive change | -| Backfill | implicit via DEFAULT clauses; existing rows: `confidence=1.0`, `belief_tier='standard'`, `mutation_type=NULL` | -| Rollback | `ALTER TABLE memories DROP COLUMN ...` + `DROP TABLE belief_edges`; no data loss in pre-existing columns | -| Dual-write window | tbc-execution.ts writes both `metadata.confidence/mutation_type` AND new columns | -| Read path | search/repository can read either; prefer columns when populated, fall back to metadata | - ---- - -## How tbc-execution.ts changes - -Phase 2 wrote everything into `memories.metadata`. Phase 3 changes the executor to **dual-write** during the migration window: - -| Operator | Phase 2 (metadata-only) | Phase 3 (dual-write) | -|---|---|---| -| Affirm | metadata.confidence += delta | + `UPDATE memories SET confidence = confidence + delta` | -| Update | mutation_type=UPDATE in metadata | + `UPDATE memories SET mutation_type='UPDATE'` | -| Retract | tier=retracted in metadata | + `UPDATE memories SET belief_tier='retracted', mutation_type='RETRACT'` | -| Supersede | revision_history append | + `INSERT INTO belief_edges (..., edge_type='supersedes')` | -| Promote | tier=directive in metadata | + `UPDATE memories SET belief_tier='directive'` + edge insert | -| Demote | tier=demoted, conf-= in metadata | + `UPDATE memories SET belief_tier='demoted', confidence=...` + edge insert | -| EvidenceFor | revision_history append | + `INSERT INTO belief_edges (..., edge_type='evidence_for')` | -| Counter | revision_history append | + `INSERT INTO belief_edges (..., edge_type='counter')` | - -The Phase 2 metadata writes stay **as a fallback** for pre-migration databases. After Phase 3 lands and migration is verified, a cleanup commit removes the metadata writes. 
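-
-As a concrete illustration of one row in the table above, the Demote dual-write might look roughly like this (sketch only — the real executor deps live in tbc-execution.ts; column names follow the migration above, and the -0.5 edge weight is illustrative):
-
-```typescript
-import type { Pool } from 'pg';
-
-// Sketch of a Phase 3 dual-write for Demote: column update + typed edge in one
-// transaction. The Phase 2 metadata fallback write is omitted here for brevity.
-export async function executeDemoteDualWrite(
-  pool: Pool,
-  userId: string,
-  sourceId: string,      // the challenging claim
-  targetId: string,      // the claim being demoted
-  newConfidence: number,
-  rationale: string,
-): Promise<void> {
-  const client = await pool.connect();
-  try {
-    await client.query('BEGIN');
-    await client.query(
-      `UPDATE memories
-         SET belief_tier = 'demoted', confidence = $3, mutation_type = 'DEMOTE'
-       WHERE id = $2 AND user_id = $1`,
-      [userId, targetId, newConfidence],
-    );
-    await client.query(
-      `INSERT INTO belief_edges (user_id, source_id, target_id, edge_type, weight, rationale)
-       VALUES ($1, $2, $3, 'demotes', -0.5, $4)`,
-      [userId, sourceId, targetId, rationale],
-    );
-    await client.query('COMMIT');
-  } catch (e) {
-    await client.query('ROLLBACK');
-    throw e; // mutations fail closed
-  } finally {
-    client.release();
-  }
-}
-```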
-
-
----
-
-## New repository: `belief-edges-repository.ts`
-
-API:
-```ts
-export interface BeliefEdge {
-  id: string;
-  source_id: string;
-  target_id: string;
-  edge_type: 'evidence_for' | 'counter' | 'supersedes' | 'promotes' | 'demotes';
-  weight: number;
-  rationale: string;
-  created_at: Date;
-}
-
-export async function appendEdge(
-  userId: string,
-  source: string,
-  target: string,
-  edge_type: BeliefEdge['edge_type'],
-  weight: number,
-  rationale: string,
-): Promise<BeliefEdge>;
-
-export async function getEdgesForTarget(
-  userId: string,
-  target_id: string,
-): Promise<BeliefEdge[]>;
-
-export async function aggregateConfidenceDelta(
-  userId: string,
-  target_id: string,
-): Promise<number>; // sum of weights for evidence_for - sum of weights for counter
-```
-
-Aggregation function `aggregateConfidenceDelta` is the bridge to a future "queryable belief state" search operator: given a claim, fold all evidence/counter edges into a current-confidence reading.
-
----
-
-## Behavioral guarantees (regression)
-
-When `TBC_ENABLED=false` (default):
-- No new columns are read or written
-- AUDN code path is byte-for-byte unchanged
-- belief_edges table stays empty for that user
-- 62 regression tests still pass
-
-When `TBC_ENABLED=true`:
-- Existing memories' rows pre-migration: `confidence=1.0`, `belief_tier='standard'` — TBC reads default values, writes update them on next mutation
-- New ingest goes through tbc-execution dual-write
-- Search consumers can read either columns or metadata; prefer columns
-
----
-
-## Open questions for Phase 3 implementation
-
-1. **Confidence aggregation rule.** Two candidates fill the same claim slot, both get Affirm — does confidence update sum, max, or weighted average? Default proposal: weighted average by `weight` field, capped at 1.0.
-
-2. **Promote auto-eligibility.** Currently Promote is only LLM-triggered. Should the system auto-promote claims whose evidence-edge sum ≥ threshold (e.g., 3 EvidenceFor edges over time)? Defer to Phase 4.
-
-3. **Retract → directive tier interaction.** If a Promoted claim is later Retracted, what happens to dependent reasoning? Phase 3 just sets `belief_tier='retracted'`; downstream-edge invalidation deferred.
-
-4. **Belief_edges and workspace scoping.** Should an edge cross workspaces? Default: no; both source and target must be in the same workspace.
-
-5. **Pruning policy for belief_edges.** Without pruning, the edge table grows quadratically with conversation length. Phase 5+ adds a retention policy (e.g., compress edges older than N days into aggregate weights).
-
----
-
-## Migration execution plan (T1.2)
-
-1. Author migration as a separate file `src/db/migrations/2026-05-06-tbc-phase3.sql` (or similar — discover the project's migration convention)
-2. Apply to local dev DB; run existing test suite to confirm no regression
-3. Apply to test DB; run TBC unit tests against real schema
-4. Document the rollback SQL alongside the forward migration
-5. Update `tbc-execution.ts` to dual-write
-6. Add `belief-edges-repository.ts`
-7.
Smoke test: ingest 5 facts, mutate via TBC, query belief_edges and confirm rows - ---- - -## Phase 4+ (post-Phase 3) - -Phase 4 wires belief-state queries into search: -- New search request field: `recall_belief_state(attribute, as_of?)` → returns current believed value with provenance -- Used to attack BEAM-100K KU/CR/ABS abilities specifically - -Phase 5 brings belief-state into hierarchical retrieval (T2 line of work): -- Session-summary embeddings filter to "high-confidence + non-retracted" claims at retrieval time -- Belief tier influences answer prompt (directives go first) - -The Phase 3 migration is the foundation; Phases 4+5 are paper-shape contributions. diff --git a/docs/typed-belief-calculus.md b/docs/typed-belief-calculus.md deleted file mode 100644 index 4f64712..0000000 --- a/docs/typed-belief-calculus.md +++ /dev/null @@ -1,244 +0,0 @@ -# Typed Belief Calculus (TBC) — Design Document - -**Status:** Phase 2 prototype (uncommitted, behind `TBC_ENABLED=false` by default) -**Owner:** AtomicMemory core -**Source rationale:** `atomicmemory-research/memory-research/landscape/2026-05-06-typed-belief-calculus-thinking.md` - -## 1. Why TBC - -Today AUDN reconciles every inbound atomic claim against existing memories -and emits one of `Add | Update | Delete | No-op | Supersede | Clarify`. This -is already finer-grained than any peer system in the 19-system landscape, but -it still treats updates as discrete state changes. Beliefs in agent memory -are continuous: evidence accumulates, contradicts, qualifies, generalizes. - -The Typed Belief Calculus (TBC) extends AUDN's decision space to **eight typed -operators**, each with explicit storage semantics. AUDN remains a strict subset -— every existing AUDN action maps to a TBC operator, and the rollout is gated -by a single flag so the prototype can ride alongside production without risk. - -## 2. The eight operators - -### Affirm -New evidence **supports** an existing claim. No new canonical fact is created. -The target claim's confidence is incremented and an evidence pointer is -recorded against its current version. This is the TBC analog of `NOOP` when -the candidate is genuinely a duplicate, but with the explicit signal that the -duplicate carries informational weight. - -### Update -A claim about the same attribute now holds a **different value** (e.g., "lives -in Boston" → "lives in Seattle"). Versioned supersession: the old version is -retained as historic state, the new version becomes current, and the -revision history records the operator that drove the change. This is the -direct heir of AUDN `UPDATE`. - -### Retract -The claim is now believed **false** with no replacement. Mark `RETRACTED` -rather than deleting the row, and preserve the original as evidence so -future agents can see "this was once asserted and was withdrawn." This is -finer-grained than AUDN `DELETE`: deletion erases, retraction is a typed -non-belief. - -### Supersede -Replaced by a more **specific or general** claim ("uses a Python web -framework" → "uses FastAPI"). Old and new are linked, both queryable. Maps -1-to-1 with AUDN `SUPERSEDE` but TBC additionally records direction -(specialization vs. generalization) for downstream query rewriting. - -### Promote -A claim has been **strong and repeated** enough to become a **directive** — -a constraint that influences answer assembly, not just one fact among many. -Promotion moves the claim into a "directive" tier and bumps its prompt -priority. This is genuinely new: AUDN has no analog. 
Phase 2 will define the -threshold (count of Affirms, confidence floor) and whether promotion is -implicit or explicit (see open questions). - -### Demote -The claim has been **challenged but not retracted** — fresh evidence is -inconsistent enough to lower confidence and flag the belief for -re-evaluation, but not enough to retract. Confidence drops; a "needs -re-evaluation" tag attaches; the claim remains queryable. Adds visibility -to the soft-conflict regime AUDN currently routes to `CLARIFY`. - -### EvidenceFor -Adds a **supports** edge from the inbound claim to a target claim's current -version. Does not introduce a new canonical fact and does not change the -target's content — only the evidence graph. Distinct from `Affirm` in that -the inbound text is itself novel (it stays as its own node) but its semantic -weight contributes to a different claim's confidence. - -### Counter -Adds a **contradicts** edge from the inbound claim to a target claim's -current version. Like `EvidenceFor`, this is a graph-only operator — -neither claim is mutated; the edge records the tension. Aggregating edges -is what eventually drives `Demote` or `Retract`. - -## 3. Schema additions (Phase 3 plan) - -Phase 1 is non-schema. The following are projected for Phase 3. - -### New columns -- `memories.confidence` (`real`, default `1.0`) — current belief strength. -- `memories.belief_tier` (`text`, default `'standard'`, candidate values - `'standard' | 'directive'`) — tier promoted via `Promote`. -- `claim_versions.mutation_type` extends to include the eight TBC operators - (current set: `add | update | supersede | delete | clarify`). - -### New table — `belief_edges` -``` -id uuid primary key -user_id uuid not null -source_id uuid not null -- inbound claim (memory id) -target_id uuid not null -- supported/contradicted claim (memory id) -edge_type text not null -- 'evidence_for' | 'counter' -weight real not null -- in [0, 1]; aggregated into confidence_delta -rationale text -created_at timestamptz default now() -``` - -This is the substrate that turns memory from a fact list into a graph. -Search and retrieval read it through aggregation views; ingest writes one -row per `EvidenceFor` / `Counter` decision. - -### New table — `belief_revision_history` (optional, Phase 3) -Normalized form of the in-metadata `revision_history` array — useful when -a claim accumulates more than a handful of revisions. Phase 1/2 keep the -list inline in `MemoryMetadata` because revision counts will be small. - -## 4. AUDN integration - -The integration seam is intentionally narrow. When `TBC_ENABLED=false` -(default) nothing in the AUDN path changes; the new types exist but are -never read. When `TBC_ENABLED=true`: - -1. `resolveAndExecuteAudn` (in `src/services/memory-audn.ts`) checks - `deps.config.tbcEnabled`. -2. If true, it calls `decideBeliefOperator(newClaim, candidates)` from - `src/services/typed-belief-calculus.ts` instead of `cachedResolveAUDN`. -3. The resulting `BeliefOperationDecision` is translated to either: - - an existing AUDN executor (Affirm → NOOP+evidence, Update → UPDATE, - Retract → DELETE, Supersede → SUPERSEDE), or - - a new TBC-only executor (Promote, Demote, EvidenceFor, Counter) - landing in Phase 2 alongside the LLM resolver. -4. The trace shape (`IngestFactTrace`) gains an optional `beliefOperator` - field so existing traces remain valid; this lands in Phase 2 with the - first executor. 
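-
-A minimal sketch of that seam, with the dependency shape and argument
-types reduced to illustrative stand-ins (the real `memory-audn.ts`
-signatures are not reproduced here); only the single-flag branch is
-taken from the steps above:
-
-```ts
-// Simplified stand-in types; the real claim and decision shapes live in
-// memory-service-types.ts and typed-belief-calculus.ts.
-type Claim = { id: string; text: string };
-type Decision = {
-  operator: string;
-  targetClaimId?: string;
-  confidenceDelta: number;
-  rationale: string;
-};
-
-interface Deps {
-  config: { tbcEnabled: boolean };
-  cachedResolveAUDN: (claim: Claim, candidates: Claim[]) => Promise<Decision>;
-  decideBeliefOperator: (claim: Claim, candidates: Claim[]) => Promise<Decision>;
-}
-
-// Fast-path and deferred-AUDN short-circuits are assumed to have
-// returned before this point; the flag rewires only the LLM decision step.
-async function decideMutation(
-  deps: Deps,
-  claim: Claim,
-  candidates: Claim[],
-): Promise<Decision> {
-  return deps.config.tbcEnabled
-    ? deps.decideBeliefOperator(claim, candidates)
-    : deps.cachedResolveAUDN(claim, candidates);
-}
-```
-
-With the flag off, the ternary collapses to the existing AUDN call,
-which is what keeps current behavior (and its regression suite)
-untouched.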
- -Critically, **the fast-path AUDN and deferred-AUDN routes remain -unchanged.** They continue to short-circuit before TBC is consulted; the -LLM call is the only step we rewire. - -## 5. Migration path - -Phase 1 (this PR) is **strictly additive and gated off**: -- New file `src/services/typed-belief-calculus.ts` (types + stub resolver) -- One config flag (`tbcEnabled`, default false) -- One IngestRuntimeConfig field -- This design doc - -No DB migration. No data migration. No production code path branches on -`tbcEnabled` yet; the flag exists so Phase 2's wiring stays a one-line -change. - -Phase 2 (LLM resolver) and Phase 3 (schema additions) are tracked -separately; both will be additive — existing rows without `confidence` -default to `1.0`, existing memories without a `mutation_type` default to -the AUDN-era value, and the `belief_edges` table is unreferenced when -the flag is off. There is no rollback hazard because there is no -destructive migration. - -## 6. Open questions - -1. **Confidence aggregation.** When several `EvidenceFor` edges fire over - time, how does the target's `confidence` move? Linear sum capped at - 1.0? Bayesian update with a fixed prior? Beta posterior parameterized - by edge count? Phase 2 needs to fix this; Phase 1 leaves it - unspecified because no aggregation runs yet. -2. **Promote: implicit vs. explicit.** Should `Promote` fire automatically - when an Affirm count crosses a threshold, or only when the LLM - resolver explicitly chooses it given a candidate's history? Implicit - is simpler; explicit gives the LLM a knob to tune directive strength - per-domain. -3. **Counter without a known target.** What does the resolver do with - an inbound claim that contradicts something we don't have? AUDN - today treats it as `ADD`; TBC could record a "challenge in waiting" - so a future ingest of the matching claim is auto-demoted. -4. **Revision-history bound.** The `BeliefMetadata.revision_history` - array is unbounded by design (audit). Phase 2 may need a rotation - policy for long-lived directive claims. -5. **Demote and search ranking.** Once `confidence` lands as a column, - should retrieval scoring weight by it? Almost certainly yes for - directive-tier claims; less obviously for `standard`. Phase 4 (search) - territory. - -## 7. Phase status - -| Phase | Scope | Status | -|---|---|---| -| 1 | Type surface, config flag, design doc | Done | -| 2 | LLM resolver, executors for the four new operators, trace extension | **In progress (this PR)** | -| 3 | DB migration: confidence column, belief_edges table, mutation_type expansion | Not started | -| 4 | Search integration: confidence-weighted ranking, directive-tier injection | Not started | -| 5 | Benchmark validation: BEAM CR/KU/ABS lift under TBC vs. AUDN baseline | Not started | - -## 8. Phase 2 status — wired vs. deferred - -**Wired in this PR:** - -- `decideBeliefOperator(newClaim, candidates, llm?)` is now an LLM-backed - resolver in `src/services/typed-belief-calculus.ts`. It builds a TBC - prompt around the inbound claim plus up to 3 conflict candidates with - their current belief state and demands a JSON response with - `{operator, target_claim_id?, confidence_delta, rationale}`. JSON parse - failures, transport failures, invalid operators, and out-of-set target - IDs all raise the typed `BeliefResolverError` — there is no silent - fallback to ADD. 
-- `resolveAndExecuteAudn` (in `src/services/memory-audn.ts`) now branches - on `deps.config.tbcEnabled` after fast-audn / deferred-audn short-circuit - and delegates to `resolveAndExecuteTbc` in - `src/services/tbc-execution.ts`. With the flag off, the file-byte-diff - inside `resolveAndExecuteAudn` is a single guarded `if`; nothing under - the AUDN code path changes. -- `tbc-execution.ts` translates each of the eight operators: - - **Affirm** → existing AUDN `NOOP` (records evidence on the existing - claim version). - - **Update** → existing AUDN `UPDATE` executor. - - **Retract** → existing AUDN `DELETE` executor. - - **Supersede** → existing AUDN `SUPERSEDE` executor. - - **Promote** → in-memory metadata write of `mutation_type=PROMOTE` plus - `directive: true` and a bumped `confidence`, with a new - `revision_history` entry. - - **Demote** → in-memory metadata write of `mutation_type=DEMOTE` plus - a lowered `confidence`, with a new `revision_history` entry. - - **EvidenceFor** → graph-only edge appended to `belief_edges` with a - positive `weight` derived from `confidence_delta`. - - **Counter** → graph-only edge appended to `belief_edges` with a - negative `weight`. - All four new operators write into the existing JSONB `metadata` column — - no DB migration in this phase. -- `IngestFactTrace` (and its inner `IngestTraceDecision`) gain an optional - `beliefOperator?: BeliefOperator` field plus eight new - `tbc-*` reason codes and a `'tbc'` decision source. AUDN traces remain - unchanged when the flag is off. -- Unit tests live at - `src/services/__tests__/typed-belief-calculus.test.ts` and cover the - resolver (eight operators, confidence clamping, three fail-closed - paths), the executor (each TBC-only operator, the AUDN-mappable - routing, the confidence-math sequence), and the flag-off regression. - -**Deferred to Phase 3 / 4:** - -- A real `belief_edges` table — Phase 2 stores edges in metadata only. - When the table lands, the Phase-2 `belief_edges` metadata blob becomes - a write-through cache to seed the new schema. -- A `confidence` column on `memories` and a `belief_tier` enum. Today - confidence and tier live in metadata; query-time reads default to - `1.0` / `'standard'` until the column exists. -- Reading hydrated `BeliefMetadata` into the resolver's prompt. The Phase-2 - prompt currently reports `confidence: 1.0` and `mutation_type: NONE` - for every candidate; Phase 4 wires the loaded state through. -- Search-side use of TBC state — confidence-weighted scoring and - directive-tier injection are Phase 4 (search integration) territory. -- Aggregating accumulated `belief_edges` into a confidence drift signal. - Phase 2 records the edges; Phase 3/4 read them back. 
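-
-To close, a small sketch of the fail-closed contract described above.
-The names are illustrative (the real resolver lives in
-`typed-belief-calculus.ts` and its exact shapes are not reproduced);
-what is taken from the design is the contract itself: malformed JSON,
-unknown operators, and out-of-set target IDs raise
-`BeliefResolverError` rather than silently falling back to ADD.
-
-```ts
-const OPERATORS = [
-  'Affirm', 'Update', 'Retract', 'Supersede',
-  'Promote', 'Demote', 'EvidenceFor', 'Counter',
-] as const;
-type BeliefOperator = (typeof OPERATORS)[number];
-
-class BeliefResolverError extends Error {}
-
-interface BeliefOperationDecision {
-  operator: BeliefOperator;
-  target_claim_id?: string;
-  confidence_delta: number;
-  rationale: string;
-}
-
-// Validate one resolver response. Every failure path throws; nothing
-// silently degrades to an ADD decision.
-function parseResolverResponse(
-  raw: string,
-  candidateIds: Set<string>,
-): BeliefOperationDecision {
-  let parsed: unknown;
-  try {
-    parsed = JSON.parse(raw);
-  } catch {
-    throw new BeliefResolverError('resolver returned non-JSON output');
-  }
-  if (typeof parsed !== 'object' || parsed === null) {
-    throw new BeliefResolverError('resolver response is not an object');
-  }
-  const d = parsed as Partial<BeliefOperationDecision>;
-  if (!d.operator || !(OPERATORS as readonly string[]).includes(d.operator)) {
-    throw new BeliefResolverError(`invalid operator: ${String(d.operator)}`);
-  }
-  if (d.target_claim_id !== undefined && !candidateIds.has(d.target_claim_id)) {
-    throw new BeliefResolverError(
-      `target ${d.target_claim_id} is not one of the conflict candidates`,
-    );
-  }
-  if (typeof d.confidence_delta !== 'number' || Number.isNaN(d.confidence_delta)) {
-    throw new BeliefResolverError('missing or non-numeric confidence_delta');
-  }
-  // Clamp into [-1, 1]; the exact clamping rule is an assumption here.
-  const confidence_delta = Math.max(-1, Math.min(1, d.confidence_delta));
-  return {
-    operator: d.operator,
-    target_claim_id: d.target_claim_id,
-    confidence_delta,
-    rationale: d.rationale ?? '',
-  };
-}
-```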
diff --git a/package-lock.json b/package-lock.json index 9a340d3..73dedb7 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@atomicmemory/core", - "version": "1.0.0", + "version": "1.0.1", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@atomicmemory/core", - "version": "1.0.0", + "version": "1.0.1", "license": "Apache-2.0", "dependencies": { "@anthropic-ai/claude-agent-sdk": "^0.2.140", diff --git a/package.json b/package.json index 0fb4058..b146aae 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@atomicmemory/core", - "version": "1.0.0", + "version": "1.0.1", "description": "Open-source memory engine for AI applications — semantic retrieval, AUDN mutation, and contradiction-safe claim versioning.", "type": "module", "license": "Apache-2.0", diff --git a/src/config.ts b/src/config.ts index ca637bc..fdc6b56 100644 --- a/src/config.ts +++ b/src/config.ts @@ -191,7 +191,7 @@ export interface RuntimeConfig { * Typed Belief Calculus (TBC) gate. When true, the AUDN decision step * defers to `decideBeliefOperator` from `services/typed-belief-calculus.ts`. * Default false — Phase 1 ships only the type surface and stub resolver, - * so existing AUDN behavior is unchanged. See `docs/typed-belief-calculus.md`. + * so existing AUDN behavior is unchanged. */ tbcEnabled: boolean; /** @@ -199,7 +199,7 @@ export interface RuntimeConfig { * conversation/session summaries first, then expands to atomic facts within * the matched sessions. Targets BEAM-10M scale (~14M tokens of context per * system) where flat top-K retrieval loses signal. - * Default false. See `docs/hierarchical-retrieval.md`. + * Default false. * Env var: HIERARCHICAL_RETRIEVAL_ENABLED=true */ hierarchicalRetrievalEnabled: boolean; diff --git a/src/db/belief-edges-repository.ts b/src/db/belief-edges-repository.ts index c551f63..5f5a28c 100644 --- a/src/db/belief-edges-repository.ts +++ b/src/db/belief-edges-repository.ts @@ -4,7 +4,7 @@ * Promote / Demote operators of the typed belief calculus. * * Schema lives in src/db/schema.sql under "TBC Phase 3" section. - * Activated only when `TBC_ENABLED=true`; see docs/typed-belief-calculus.md. + * Activated only when `TBC_ENABLED=true`. */ import pg from 'pg'; diff --git a/src/db/schema.sql b/src/db/schema.sql index dadb242..029f549 100644 --- a/src/db/schema.sql +++ b/src/db/schema.sql @@ -520,7 +520,7 @@ CREATE INDEX IF NOT EXISTS idx_memory_foresight_workspace -- Promotes belief state from `memories.metadata` JSONB into typed columns + -- a new `belief_edges` table. All additions are idempotent (IF NOT EXISTS). -- Pre-migration databases stay queryable; tbc-execution.ts dual-writes --- during the migration window. Design doc: docs/tbc-phase-3-schema.md. +-- during the migration window. -- Activated only when TBC_ENABLED=true; defaults preserve existing behavior. -- --------------------------------------------------------------------------- @@ -593,7 +593,7 @@ CREATE INDEX IF NOT EXISTS idx_belief_edges_user_target -- BEAM-10M scale (10 conversations × ~1.4M tokens each = ~14M total context). -- session_summaries + conv_summaries indexed via HNSW on summary_embedding. -- Activated only when HIERARCHICAL_RETRIEVAL_ENABLED=true; defaults preserve --- existing flat-retrieval behavior. Design doc: docs/hierarchical-retrieval.md. +-- existing flat-retrieval behavior. 
-- --------------------------------------------------------------------------- CREATE TABLE IF NOT EXISTS session_summaries ( diff --git a/src/db/summaries-repository.ts b/src/db/summaries-repository.ts index d70fa16..2645cbc 100644 --- a/src/db/summaries-repository.ts +++ b/src/db/summaries-repository.ts @@ -1,8 +1,7 @@ /** * Repository for hierarchical-retrieval session + conversation summaries. * Schema lives in src/db/schema.sql under "Hierarchical Retrieval" section. - * Activated only when `HIERARCHICAL_RETRIEVAL_ENABLED=true`; see - * docs/hierarchical-retrieval.md. + * Activated only when `HIERARCHICAL_RETRIEVAL_ENABLED=true`. * * Reads use pgvector cosine distance (`embedding <=> $1`) returning * `1 - distance` as similarity. The `pgvector` package converts JS diff --git a/src/services/memory-service-types.ts b/src/services/memory-service-types.ts index 64ea18a..c867422 100644 --- a/src/services/memory-service-types.ts +++ b/src/services/memory-service-types.ts @@ -371,7 +371,6 @@ export interface IngestRuntimeConfig { * Hierarchical retrieval gate. When true, ingest generates session + * conversation summaries (session-summary-generator.ts); search adds a 5th * RRF arm over those summaries. Default false — no runtime effect today. - * See `docs/hierarchical-retrieval.md`. */ hierarchicalRetrievalEnabled: boolean; /** diff --git a/src/services/retrieval-policy.ts b/src/services/retrieval-policy.ts index 5489ee0..914b99c 100644 --- a/src/services/retrieval-policy.ts +++ b/src/services/retrieval-policy.ts @@ -75,7 +75,7 @@ const RECALL_BYPASS_REASONS = { * * Validated 2026-04-01: 0/15 false positives across 2,173 benchmark queries * (7 datasets). 4 borderline date-pinned queries are harmless (extra depth, - * no accuracy impact). See: docs/.../current-marker-fp-analysis-2026-04-01.md + * no accuracy impact). * * If editing this list, re-run the FP scan: * classifyQueryDetailed() against all eval dataset queries. diff --git a/src/services/typed-belief-calculus.ts b/src/services/typed-belief-calculus.ts index cdbd48d..ac0045b 100644 --- a/src/services/typed-belief-calculus.ts +++ b/src/services/typed-belief-calculus.ts @@ -9,8 +9,7 @@ * Phase 2 (this revision) wires `decideBeliefOperator` to a real LLM call * and lets `memory-audn.ts` route through it when `RuntimeConfig.tbcEnabled` * is true. Schema is unchanged — TBC mutations write to existing JSONB - * metadata only. See `tbc-execution.ts` for the executor and - * `docs/typed-belief-calculus.md` for design rationale. + * metadata only. See `tbc-execution.ts` for the executor. */ import type { ChatMessage, LLMProvider } from './llm.js';