KU-3, TR-2, PT-3: Child facts exist but outranked by high-activation organic entries #5

@Liorrr

Problem

After KS69-KS71 (consolidation redesign + Tier 2), 3 benchmark cases consistently fail in both embedding-only and consolidation modes. The correct child facts ARE extracted and stored, but they don't score high enough to rank in top-5 results.

All 3 share the same root cause: embedding similarity gap — the child's embedding is too distant from the query, and high-activation organic entries dominate.

Failing Cases

KU-3: "What IDE does Sam use?"

TR-2: "Where has Sam traveled recently?"

PT-3: "What language is Sam learning?"

  • Expected: Japanese/JLPT in top-3
  • Child exists: "I practiced my Japanese — I'm at JLPT N3 level" (subject: "Japanese")
  • Child rank: Not in top-5 at all
  • Blocking entries: programming-language memories (Rust, Go, Python) dominate because "language" is ambiguous between natural and programming languages
  • Gap: BGE-small-EN-v1.5 doesn't distinguish "natural language learning" from "programming language preference" well enough

What's been tried (KS68-KS71)

  • Label topic boost (+0.06 for topic:tools:editor) — helps in seeded benchmark but LLM doesn't produce this label
  • Supersession demotion (-0.15) — works for KU-1 (Shopify→Stripe) but gap is too small for KU-3
  • Self-contained proposition extraction (KS69 prompt v3) — facts are good quality, problem is ranking not extraction
  • Subject fix (KS71 P0) — subjects now "Neovim"/"Tokyo"/"Japanese" not "the user"
  • Quality gate (KS71 P1) — filters fragments, doesn't affect ranking
  • Soft invalidation (KS71 P3) — 0.5x demotion on superseded children, helps but not enough
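Taken together, the tweaks tried so far are small additive or multiplicative adjustments layered on top of the base embedding similarity. A minimal sketch of how they stack (hypothetical signature and flag names; the real pipeline lives in `echo.rs`):

```rust
/// Hypothetical combination of the KS68-KS71 scoring tweaks.
/// Base score is embedding similarity; adjustments stack on top.
fn adjusted_score(
    base_similarity: f32,
    topic_label_matches: bool,   // label topic boost fires (e.g. topic:tools:editor)
    is_superseded: bool,         // supersession demotion applies
    is_invalidated_child: bool,  // soft invalidation (KS71 P3)
) -> f32 {
    let mut score = base_similarity;
    if topic_label_matches {
        score += 0.06; // label topic boost
    }
    if is_superseded {
        score -= 0.15; // supersession demotion
    }
    if is_invalidated_child {
        score *= 0.5; // soft invalidation
    }
    score
}
```

This makes the failure mode concrete: when the similarity gap between the child and a blocking organic entry exceeds ~0.06, the label boost alone cannot close it.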

Potential fix directions (not yet implemented)

A. HyDE query expansion

Ask an LLM to generate a hypothetical answer before embedding the query. "What IDE does Sam use?" → "Sam uses Neovim as his editor" → embed that instead. The expanded query embeds much closer to the child fact. EchoConfig already has a hyde_enabled field stub.
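A rough sketch of the flow, with `generate` and `embed` as stand-ins for the LLM and embedding-model calls (neither is the real API; the flag mirrors the existing `hyde_enabled` stub in EchoConfig):

```rust
/// Sketch of HyDE query expansion: embed a hypothetical answer instead
/// of the raw query. `generate` and `embed` are illustrative stand-ins,
/// not the real client APIs.
fn hyde_query(
    query: &str,
    hyde_enabled: bool,
    generate: impl Fn(&str) -> String,
    embed: impl Fn(&str) -> Vec<f32>,
) -> Vec<f32> {
    if hyde_enabled {
        // e.g. "What IDE does Sam use?" -> "Sam uses Neovim as his editor"
        let prompt =
            format!("Write a one-sentence hypothetical answer to: {query}");
        embed(&generate(&prompt))
    } else {
        embed(query)
    }
}
```

The retrieval side is untouched: only the query vector changes, so this slots in ahead of the existing scoring pipeline at the cost of one extra LLM round trip per query.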

B. Importance boost for superseding children

When a child's parent supersedes another memory, boost the child's importance score. "Sam uses Neovim" (child of superseding parent) gets +0.1 importance when the VS Code→Neovim edge exists.
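A minimal sketch of direction B, assuming a hypothetical `Child` shape and a +0.1 boost clamped so importance stays in [0, 1]:

```rust
/// Hypothetical child record; field names are illustrative, not the
/// real consolidation types.
struct Child {
    importance: f32,
    parent_supersedes_other: bool, // e.g. the "VS Code -> Neovim" edge exists
}

/// Boost a child's importance when its parent supersedes another memory.
fn boosted_importance(child: &Child) -> f32 {
    if child.parent_supersedes_other {
        (child.importance + 0.1).min(1.0) // clamp to keep scores in range
    } else {
        child.importance
    }
}
```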

C. Stronger label classification for LLM-extracted children

Currently children inherit parent labels (Tier 1 keyword). Add Tier 2 label enrichment specifically for children — classify "Sam uses Neovim" with topic:tools:editor so label_topic_boost fires.
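A toy version of the child-side enrichment, with illustrative keyword rules standing in for a real Tier 2 LLM classification pass:

```rust
/// Sketch of Tier-2-style label enrichment for children: attach a topic
/// label to the child's text so label_topic_boost can fire. The keyword
/// rules here are illustrative only.
fn classify_child(text: &str) -> Option<&'static str> {
    let lower = text.to_lowercase();
    if ["neovim", "vs code", "editor"].iter().any(|&k| lower.contains(k)) {
        Some("topic:tools:editor")
    } else if lower.contains("jlpt") || lower.contains("learning") {
        Some("action:learning")
    } else {
        None
    }
}
```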

D. Embedding model upgrade

BGE-small-EN-v1.5 (384-dim) conflates "language" (natural vs programming). A larger model (e.g., BGE-base or E5-large) may separate these better. Trade-off: latency + memory.

E. Query-type disambiguation

Detect "learning" in query → boost action:learning labeled entries. Detect "IDE"/"editor" → boost topic:tools:editor. Already partially exists in label_topic_boost but needs the child to carry the right label.
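Sketched as a query-side rule table (keywords and label names are illustrative; the real mapping would extend the query classification in `labels.rs`):

```rust
/// Sketch of query-type disambiguation: map query keywords to the label
/// whose entries should be boosted. Naive substring matching for brevity.
fn query_boost_label(query: &str) -> Option<&'static str> {
    let q = query.to_lowercase();
    if q.contains("learning") || q.contains("studying") {
        Some("action:learning")
    } else if q.contains("ide") || q.contains("editor") {
        Some("topic:tools:editor")
    } else {
        None
    }
}
```

Note this only pays off in combination with direction C: the query-side boost needs the child to carry the matching label in the first place.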

Benchmark context

  • Embedding-only: 16-17/20 (KU-3, TR-2, PT-3 always fail)
  • Consolidation (qwen2.5:1.5b): 17/20 (same 3 fail)
  • Seeded (deterministic children): 20/20 (all pass with hand-crafted labels + confidence)
  • The 3-case gap between seeded and consolidation mode is entirely due to this issue

Files

  • crates/shrimpk-memory/src/echo.rs — scoring pipeline, label_topic_boost, importance_boost
  • crates/shrimpk-memory/src/labels.rs — query classification, Tier 1/2 labels
  • crates/shrimpk-memory/src/consolidation.rs — child creation, label inheritance
  • tests/echo_micro_benchmark.rs — benchmark definitions
