graph-population PR A: extraction service + store ops #50
Merged
Foundation for the graph-population pipeline (spec:
docs/superpowers/specs/2026-05-08-lore-graph-population-design.md).
This PR adds the extraction service and the persistence-layer ops it
needs; PR B wires it into the create-time route handlers and adds the
backfill endpoint.
New service: src/lore/services/graph_extraction.py
* extract_and_persist(store, *, org_id, memory_id, content, context,
spawn_fn=None, timeout=None) -> ExtractionResult — run a
`claude -p` subagent with a deterministic extraction prompt, parse
the JSON from the final assistant message, and persist entities
(with case-insensitive + alias dedup) / mentions / relationships.
Idempotent: existing edges for the memory are deleted before insert.
* Concurrency capped via LORE_GRAPH_EXTRACTION_CONCURRENCY (default 2)
so dream-finalize bursts don't spawn 50 subprocesses at once.
* Spawn flags pinned: --output-format stream-json --verbose
--permission-mode default. The dream/capture saga (PRs #48, #49)
burned us once on the first two; the regression test at
test_spawn_claude_args.test_passes_required_flags catches future
silent-empty-graph failures.
* Failure modes (timeout / parse error / non-zero exit / claude not on PATH)
are all swallowed and reported via ExtractionResult.error; no exception
bubbles up to the caller.
* is_enabled() defaults to on iff `claude` is on PATH; setting
LORE_GRAPH_EXTRACTION_ENABLED overrides the auto-detection explicitly.
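A minimal sketch of how the concurrency cap and the swallow-everything failure handling described above might fit together. This is not the code in graph_extraction.py: `read_concurrency`, `run_capped`, the fields of this `ExtractionResult`, and the choice to clamp values below 1 up to 1 are assumptions for illustration. Only the environment variable name and the default of 2 come from this PR.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, Mapping, Optional

DEFAULT_CONCURRENCY = 2  # default stated in this PR


def read_concurrency(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the cap from the environment (hypothetical helper).

    Assumption: unparseable values fall back to the default and values
    below 1 are clamped to 1.
    """
    env = os.environ if env is None else env
    raw = env.get("LORE_GRAPH_EXTRACTION_CONCURRENCY", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_CONCURRENCY
    return max(1, value)


@dataclass
class ExtractionResult:
    # Hypothetical shape; the real class may carry more fields.
    entities: int = 0
    error: Optional[str] = None


async def run_capped(
    sem: asyncio.Semaphore,
    spawn_fn: Callable[[], Awaitable[list]],
    timeout: float,
) -> ExtractionResult:
    """Run one extraction under the semaphore; never raise to the caller."""
    async with sem:  # at most N coroutines get past this point at once
        try:
            payload = await asyncio.wait_for(spawn_fn(), timeout)
        except asyncio.TimeoutError:
            return ExtractionResult(error="timeout")
        except FileNotFoundError:
            # What a subprocess spawn raises when the binary is missing.
            return ExtractionResult(error="claude not on PATH")
        except Exception as exc:  # parse errors, non-zero exit, etc.
            return ExtractionResult(error=f"{type(exc).__name__}: {exc}")
    return ExtractionResult(entities=len(payload))
```

Creating the semaphore once and passing it to every call is what keeps a burst of tasks from starting more than the configured number of subprocesses; callers only ever inspect `result.error`.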
New store ops on Postgres + SQLite + protocol:
* find_entity_by_name_or_alias — case-insensitive name + alias lookup
(LOWER(name) match, then alias scan). PG uses jsonb_array_elements_text;
SQLite does a Python-side scan to stay portable across aiosqlite builds.
* replace_memory_mentions — DELETE existing rows for memory_id, INSERT
the supplied set, atomic via transaction.
* replace_memory_relationships — same shape on relationships
WHERE source_memory_id = ?. Active-edge UNIQUE conflicts (the
partial index on (source, target, type) WHERE valid_until IS NULL)
are silently skipped because the edge already exists from another
memory and re-asserting it from a different source isn't an error.
* list_memories_without_mentions — LEFT JOIN entity_mentions IS NULL
so the backfill endpoint (PR B) can find what to process.
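For illustration, the SQLite side of three of these ops could look roughly like the sketch below. It uses the synchronous stdlib `sqlite3` module instead of aiosqlite, and every table and column name here (`entities.aliases` as a JSON text column, `entity_mentions.memory_id`, and so on) plus the `create_demo_schema` helper are hypothetical, not the real Lore schema.

```python
import json
import sqlite3
from typing import Iterable, List, Optional


def create_demo_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical minimal schema, only for this example."""
    conn.executescript(
        """
        CREATE TABLE entities (
            id INTEGER PRIMARY KEY,
            org_id TEXT NOT NULL,
            name TEXT NOT NULL,
            aliases TEXT NOT NULL DEFAULT '[]'  -- JSON array of strings
        );
        CREATE TABLE memories (id INTEGER PRIMARY KEY, org_id TEXT NOT NULL);
        CREATE TABLE entity_mentions (
            memory_id INTEGER NOT NULL,
            entity_id INTEGER NOT NULL
        );
        """
    )


def find_entity_by_name_or_alias(
    conn: sqlite3.Connection, org_id: str, name: str
) -> Optional[int]:
    """Case-insensitive name match first, then a Python-side alias scan."""
    needle = name.strip().lower()
    row = conn.execute(
        "SELECT id FROM entities WHERE org_id = ? AND LOWER(name) = ?",
        (org_id, needle),
    ).fetchone()
    if row is not None:
        return row[0]
    # Scanning aliases in Python avoids depending on SQLite's JSON functions.
    rows = conn.execute(
        "SELECT id, aliases FROM entities WHERE org_id = ?", (org_id,)
    ).fetchall()
    for entity_id, aliases_json in rows:
        if any(a.lower() == needle for a in json.loads(aliases_json or "[]")):
            return entity_id
    return None


def replace_memory_mentions(
    conn: sqlite3.Connection, memory_id: int, entity_ids: Iterable[int]
) -> None:
    """Delete then insert in one transaction, so a replay never doubles rows."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "DELETE FROM entity_mentions WHERE memory_id = ?", (memory_id,)
        )
        conn.executemany(
            "INSERT INTO entity_mentions (memory_id, entity_id) VALUES (?, ?)",
            [(memory_id, e) for e in entity_ids],
        )


def list_memories_without_mentions(
    conn: sqlite3.Connection, org_id: str
) -> List[int]:
    """Anti-join: memories that have no row in entity_mentions."""
    rows = conn.execute(
        "SELECT m.id FROM memories AS m "
        "LEFT JOIN entity_mentions AS em ON em.memory_id = m.id "
        "WHERE m.org_id = ? AND em.memory_id IS NULL ORDER BY m.id",
        (org_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Note that SQLite's built-in `LOWER()` only folds ASCII characters, so a sketch like this would need extra care for non-ASCII entity names.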
Tests: tests/services/test_graph_extraction.py (39 cases — 17 run
unconditionally, 22 gated on the parametrized PG+SQLite store fixture
that CI's python-postgres job exercises). Coverage:
* Prompt builder: content + optional context block, schema lists
every entity type.
* Stream-json parser: picks last assistant text, handles json fences,
skips tool_use mid-stream events, returns None on no-JSON / empty.
* Spawn-args sanity: regression guard for the flag saga.
* Happy path: 2 entities + 1 relationship round-trip via real store.
* Dedup by case-insensitive name; dedup by alias.
* Idempotent re-extraction: replay produces same row count, not
doubled.
* Drops relationships referencing undeclared entities (LLM jitter
guard).
* Failure modes: timeout, parse error, non-zero exit, missing claude.
* Empty extraction persists nothing (clean no-op).
* Feature flag: explicit true/false + auto-on with claude on PATH.
* Concurrency cap holds under 10-task burst.
* Env-knob validators (concurrency min 1, timeout min 1s, invalid
falls back to default).
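To make the stream-json parser cases above concrete, here is a rough sketch of a parser with the same behaviour: take the last assistant text, unwrap a JSON code fence if present, ignore tool_use events, and return None when nothing parses. The event shape (one JSON object per line, `type: "assistant"` with `message.content` blocks) and the names `last_assistant_text` and `parse_extraction` are assumptions for illustration, not the implementation in this PR.

```python
import json
import re
from typing import Any, Dict, Optional

_TICKS = "`" * 3  # a Markdown code fence, built indirectly to keep this block readable
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def last_assistant_text(stream: str) -> Optional[str]:
    """Return the text of the last assistant event that contains text blocks."""
    last = None
    for line in stream.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise in the stream
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue
        message = event.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        # tool_use blocks carry no "text" type and are skipped here.
        texts = [
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        ]
        if texts:
            last = "".join(texts)
    return last


def parse_extraction(stream: str) -> Optional[Dict[str, Any]]:
    """Parse the JSON object from the final assistant message, or None."""
    text = last_assistant_text(stream)
    if not text:
        return None
    fenced = _FENCE.search(text)
    candidate = fenced.group(1) if fenced else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(candidate[start : end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

Returning None rather than raising keeps this consistent with the failure-mode contract described earlier, where a parse failure becomes an error value instead of an exception.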
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Foundation for the graph-population pipeline. Implements the extraction service and the new persistence-layer ops it needs; no route wiring or live LLM calls yet — that's PR B.
Spec: `docs/superpowers/specs/2026-05-08-lore-graph-population-design.md` (decided to use Claude via `claude -p` rather than OpenAI per review).
New service: `src/lore/services/graph_extraction.py`
New store ops (Postgres + SQLite + protocol)
Test plan
What's next
🤖 Generated with Claude Code