Problem
After KS69-KS71 (consolidation redesign + Tier 2), 3 benchmark cases consistently fail in both embedding-only and consolidation modes. The correct child facts ARE extracted and stored, but they don't score high enough to rank in top-5 results.
All 3 share the same root cause: embedding similarity gap — the child's embedding is too distant from the query, and high-activation organic entries dominate.
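The similarity gap can be made concrete: retrieval compares query and entry vectors, and in the failing cases the (query, child) cosine score lands below the (query, organic-entry) scores. A minimal sketch of the comparison — the real scoring pipeline in echo.rs layers activation and label boosts on top of this:

```rust
/// Cosine similarity over embedding vectors (384-dim for BGE-small-EN-v1.5).
/// "Too distant" means this value for (query, child) falls below the values
/// for (query, high-activation organic entry).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```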
Failing Cases
KU-3: "What IDE does Sam use?"
- topic:tools:editor label (triggers +0.06 label_topic_boost)
TR-2: "Where has Sam traveled recently?"
PT-3: "What language is Sam learning?"
- Expected: Japanese/JLPT in top-3
- Child exists: "I practiced my Japanese — I'm at JLPT N3 level" (subject: "Japanese")
- Child rank: Not in top-5 at all
- Blocking entries: Programming language memories (Rust, Go, Python) dominate because "language" is ambiguous between natural and programming
- Gap: BGE-small-EN-v1.5 doesn't distinguish "natural language learning" from "programming language preference" well enough
What's been tried (KS68-KS71)
- Hand-labeling children (topic:tools:editor) — helps in the seeded benchmark, but the LLM doesn't produce this label
Potential fix directions (not yet implemented)
A. HyDE query expansion
Ask an LLM to generate a hypothetical answer before embedding the query. "What IDE does Sam use?" → "Sam uses Neovim as his editor" → embed that instead. The expanded query embeds much closer to the child fact. EchoConfig already has a hyde_enabled field stub.
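Direction A could be wired in roughly as below. This is a sketch, not the crate's API: `llm_complete` and `embed` stand in for whatever LLM and embedding clients shrimpk-memory already has, and the prompt wording is an assumption; the `hyde_enabled` flag gates the expansion.

```rust
/// HyDE sketch: when enabled, embed a hypothetical LLM-generated answer
/// instead of the raw question, so the query vector lands nearer the
/// stored child facts. `llm_complete` and `embed` are placeholder closures.
fn hyde_query<F, E>(query: &str, hyde_enabled: bool, llm_complete: F, embed: E) -> Vec<f32>
where
    F: Fn(&str) -> String,
    E: Fn(&str) -> Vec<f32>,
{
    if !hyde_enabled {
        // Fall back to embedding the question as-is.
        return embed(query);
    }
    // Prompt wording is an assumption for illustration.
    let prompt = format!("Write a one-sentence plausible answer to: {query}");
    embed(&llm_complete(&prompt))
}
```

The trade-off is one extra LLM round-trip per query, which is why it should stay behind the existing `hyde_enabled` stub.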
B. Importance boost for superseding children
When a child's parent supersedes another memory, boost the child's importance score. "Sam uses Neovim" (child of superseding parent) gets +0.1 importance when the VS Code→Neovim edge exists.
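Direction B as a sketch; the `Child` struct and the `parent_supersedes` flag are hypothetical stand-ins for the real memory and edge types in consolidation.rs:

```rust
/// Hypothetical child record; the real representation differs.
struct Child {
    importance: f32,
    /// True when the child's parent has an outgoing supersedes edge
    /// (e.g. the VS Code -> Neovim supersession).
    parent_supersedes: bool,
}

/// +0.1 importance for children of superseding parents, clamped so the
/// score stays in [0, 1].
fn boosted_importance(child: &Child) -> f32 {
    if child.parent_supersedes {
        (child.importance + 0.1).min(1.0)
    } else {
        child.importance
    }
}
```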
C. Stronger label classification for LLM-extracted children
Currently children inherit parent labels (Tier 1 keyword). Add Tier 2 label enrichment specifically for children — classify "Sam uses Neovim" with topic:tools:editor so label_topic_boost fires.
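Direction C could look roughly like this; the keyword table is illustrative, not the real labels.rs ruleset:

```rust
/// Sketch of a Tier 2-style enrichment pass run on children only:
/// scan the child text for cue phrases and append any matching label
/// the child didn't inherit from its parent.
fn enrich_child_labels(text: &str, labels: &mut Vec<String>) {
    let t = text.to_lowercase();
    // Illustrative cue -> label table; the real rules live in labels.rs.
    let rules: &[(&str, &str)] = &[
        ("neovim", "topic:tools:editor"),
        ("vs code", "topic:tools:editor"),
        ("jlpt", "topic:language:natural"),
    ];
    for &(cue, label) in rules {
        if t.contains(cue) && !labels.iter().any(|l| l.as_str() == label) {
            labels.push(label.to_string());
        }
    }
}
```

With this in place, "Sam uses Neovim" carries topic:tools:editor and label_topic_boost can fire without hand-seeded labels.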
D. Embedding model upgrade
BGE-small-EN-v1.5 (384-dim) conflates "language" (natural vs programming). A larger model (e.g., BGE-base or E5-large) may separate these better. Trade-off: latency + memory.
E. Query-type disambiguation
Detect "learning" in query → boost action:learning labeled entries. Detect "IDE"/"editor" → boost topic:tools:editor. Already partially exists in label_topic_boost but needs the child to carry the right label.
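Direction E as a sketch; the cue-to-label table and the 0.06 boost value are assumptions mirroring the existing label_topic_boost, not the actual query classifier in labels.rs:

```rust
/// Map query cue words to the label that should receive a boost.
/// Tokenizes on non-alphanumeric characters so "IDE?" still matches.
fn query_label_boost(query: &str) -> Option<(&'static str, f32)> {
    let q = query.to_lowercase();
    let has = |w: &str| q.split(|c: char| !c.is_alphanumeric()).any(|t| t == w);
    if has("learning") {
        Some(("action:learning", 0.06))
    } else if has("ide") || has("editor") {
        Some(("topic:tools:editor", 0.06))
    } else {
        None
    }
}
```

Note this only pays off if the child actually carries the boosted label, which is why E depends on C.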
Benchmark context
- Embedding-only: 16-17/20 (KU-3, TR-2, PT-3 always fail)
- Consolidation (qwen2.5:1.5b): 17/20 (same 3 fail)
- Seeded (deterministic children): 20/20 (all pass with hand-crafted labels + confidence)
- The 3-case gap between seeded and consolidation is entirely due to this issue
Files
crates/shrimpk-memory/src/echo.rs — scoring pipeline, label_topic_boost, importance_boost
crates/shrimpk-memory/src/labels.rs — query classification, Tier 1/2 labels
crates/shrimpk-memory/src/consolidation.rs — child creation, label inheritance
tests/echo_micro_benchmark.rs — benchmark definitions