Problem
ShrimPK has 9 separate mechanisms that independently approximate entity identification without a unifying layer:
- subject field on MemoryEntry (core/memory.rs:234) — extracted per child fact
- Entity labels (entity:* prefix in labels.rs) — keyword-based, not linked to entities
- Triples (subject, predicate, object) — stored per memory, disconnected from other entity refs
- fix_degenerate_subject() (consolidation.rs:700) — 100-line heuristic to repair "the user"/"I"/"me"
- child_topic_matches_query() (echo.rs:3008) — label/subject gate, fallback-based
- Subject diversity cap (echo.rs:3283) — per-(subject, topic) result limiting
- subjects_overlap() (consolidation.rs:1225) — case-insensitive subject comparison for supersession
- detect_relationship() (consolidation.rs:897) — regex relationship type extraction
- Hebbian co-activation (hebbian.rs) — typed edges (WorksAt, LivesIn, etc.) but not entity-anchored
These mechanisms don't talk to each other. "Sam", "Sam Torres", "the user", and "I" are treated as different subjects across different mechanisms. No centralized entity registry exists.
Impact
- Supersession fails when subject heuristics disagree (e.g., "the user" vs "Sam" in different facts)
- No entity profiles — can't answer "tell me everything about Sam" without full embedding scan
- Contradiction detection impossible — can't check for conflicting facts about the same entity without knowing they're the same entity
- GDPR deletion incomplete — can't find all traces of an entity across subjects, triples, labels, and Hebbian edges
- 6 roadmap features blocked: contradiction detection, tombstone propagation, faithfulness scoring, collaborative memory, memory-type routing, entity-centric retrieval
Proposed Solution: EntityFrame
A lightweight entity registry (EntityFrame) that absorbs existing mechanisms rather than adding another layer:
- Replaces: fix_degenerate_subject() (~100 lines), subjects_overlap() (~30 lines), store entity_index
- Absorbs: subject field becomes entity_id: Option<EntityId>, entity: labels generated from frame, triple subjects resolve to EntityId
- Untouched: Hebbian graph structure, non-entity labels, scoring pipeline, confidence/quality gates
Net complexity: roughly 250 new lines added, roughly 300 lines of heuristics removed. The system gets simpler.
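To make the "Replaces" and "Absorbs" items concrete, here is a minimal sketch of what such a registry could look like. Only the names EntityFrame and EntityId come from this issue; the field layout, EntityRegistry, register(), resolve(), and same_entity() are hypothetical and would need to follow the design brief. The idea is that one alias table answers both "who is this subject?" and "do these two subjects overlap?", which is the job currently split between fix_degenerate_subject() and subjects_overlap().

```rust
use std::collections::HashMap;

/// Hypothetical ID type: the issue names `EntityId` but not its representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EntityId(pub u32);

/// Hypothetical frame contents: a canonical name plus its known aliases.
#[derive(Debug, Clone)]
pub struct EntityFrame {
    pub id: EntityId,
    pub canonical_name: String,
    pub aliases: Vec<String>,
}

/// Hypothetical registry: frames plus a normalized alias -> ID index.
#[derive(Debug, Default)]
pub struct EntityRegistry {
    frames: Vec<EntityFrame>,
    alias_index: HashMap<String, EntityId>,
}

/// Case-insensitive, whitespace-trimmed key used for every lookup.
fn normalize(s: &str) -> String {
    s.trim().to_lowercase()
}

impl EntityRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers an entity under its canonical name and all aliases.
    pub fn register(&mut self, canonical_name: &str, aliases: &[&str]) -> EntityId {
        let id = EntityId(self.frames.len() as u32);
        self.alias_index.insert(normalize(canonical_name), id);
        for alias in aliases {
            self.alias_index.insert(normalize(alias), id);
        }
        self.frames.push(EntityFrame {
            id,
            canonical_name: canonical_name.to_string(),
            aliases: aliases.iter().map(|a| a.to_string()).collect(),
        });
        id
    }

    /// Maps a raw subject string to an entity, or None if it is unknown.
    pub fn resolve(&self, subject: &str) -> Option<EntityId> {
        self.alias_index.get(&normalize(subject)).copied()
    }

    /// True only when both subjects resolve to the same known entity.
    pub fn same_entity(&self, a: &str, b: &str) -> bool {
        match (self.resolve(a), self.resolve(b)) {
            (Some(x), Some(y)) => x == y,
            _ => false,
        }
    }
}
```

With "Sam Torres" registered under the aliases "Sam", "the user", "I", and "me", resolve("The User") and resolve("sam") return the same EntityId, and an unregistered subject returns None, which maps naturally onto entity_id: Option<EntityId>.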
Store-time detection: Aho-Corasick over known aliases (zero LLM calls). Entity-less memories (pure events, moods) get no entity assignment — not force-fitted.
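The store-time step can be sketched as follows. This is not Aho-Corasick: to stay dependency-free it uses a naive per-alias substring scan from the standard library as a stand-in, with the same inputs and outputs. A real implementation would presumably build a single automaton over all aliases (for example with the aho-corasick crate) so the text is scanned once. The function name detect_entities, the (alias, id) pair layout, and the whole-word boundary rule are assumptions for illustration.

```rust
/// Returns the sorted, de-duplicated IDs of entities whose aliases occur in
/// `text` as whole words, ignoring case. An empty result means the memory is
/// entity-less and should get no entity assignment.
///
/// NOTE: naive stand-in for Aho-Corasick; it scans the text once per alias.
pub fn detect_entities(text: &str, aliases: &[(&str, u32)]) -> Vec<u32> {
    let hay = text.to_lowercase();
    let mut found: Vec<u32> = Vec::new();

    for (alias, id) in aliases {
        let needle = alias.trim().to_lowercase();
        if needle.is_empty() {
            continue; // an empty pattern would match at every position
        }
        for (start, _) in hay.match_indices(needle.as_str()) {
            let end = start + needle.len();
            // Require non-alphanumeric neighbours so "Sam" does not hit "Same".
            let before_ok = hay[..start]
                .chars()
                .next_back()
                .map_or(true, |c| !c.is_alphanumeric());
            let after_ok = hay[end..]
                .chars()
                .next()
                .map_or(true, |c| !c.is_alphanumeric());
            if before_ok && after_ok {
                found.push(*id);
                break; // one hit per alias is enough
            }
        }
    }

    found.sort_unstable();
    found.dedup();
    found
}
```

Given the aliases ("Sam Torres", 1), ("Sam", 1), and ("Acme", 2), the text "Sam joined Acme last year" yields IDs 1 and 2, while "Feeling tired today" yields an empty list, so a mood-only memory is left without an entity rather than force-fitted.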
Design Brief
Full design brief with EntityFrame structure, open questions, and roadmap impact at:
Obsidian Vault/ShrimPK Kernel/Design Brief — Entity Identification.md
Related Research
- Zep/Graphiti: NER + LLM entity extraction + temporal validity per entity edge
- Mem0: User profile building via fact extraction
- MIRIX (arXiv 2507.07957): "core biography" memory type per entity
- A-MEM (NeurIPS 2025): Zettelkasten notes with retroactive link updates on entity changes
Labels
- enhancement
- architecture
- consolidation