graph-population PR A: extraction service + store ops by amitpaz1 · Pull Request #50 · agentkitai/lore

amitpaz1 · 2026-05-08T11:02:06Z

Summary

Foundation for the graph-population pipeline. Implements the extraction service and the new persistence-layer ops it needs; no route wiring or live LLM calls yet — that's PR B.

Spec: `docs/superpowers/specs/2026-05-08-lore-graph-population-design.md` (per review, the design uses Claude via `claude -p` rather than OpenAI).

New service: `src/lore/services/graph_extraction.py`

- `extract_and_persist(store, *, org_id, memory_id, content, context, spawn_fn=None, timeout=None) -> ExtractionResult`: spawns a `claude -p` subagent with a deterministic extraction prompt, parses JSON from the final assistant message, and persists entities (with case-insensitive + alias dedup), mentions, and relationships. Idempotent: existing edges for the memory are deleted before insert.
- Concurrency is capped via `LORE_GRAPH_EXTRACTION_CONCURRENCY` (default 2) so dream-finalize bursts don't spawn 50 subprocesses at once.
- Spawn flags are pinned: `--output-format stream-json --verbose --permission-mode default`. The dream/capture saga (#48, "fix(capture+dream): pass --verbose to claude -p stream-json", and #49, "fix(capture+dream): bypassPermissions for trusted subagents") burned us once on the first two flags; the regression test (`test_spawn_claude_args.test_passes_required_flags`) catches future silent-empty-graph failures.
- Failure modes (timeout, parse error, non-zero exit, `claude` not on PATH) are all swallowed and reported via `ExtractionResult.error`; no exception bubbles up.
- `is_enabled()` is auto-on iff `claude` is on PATH, with an explicit override via `LORE_GRAPH_EXTRACTION_ENABLED`.
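Parsing "JSON from the final assistant message" is the step most likely to fail silently, so here is a minimal sketch of such a parser. It follows the behaviour the commit message lists for the real parser (last assistant text wins, JSON fences are handled, `tool_use` events are skipped, `None` on no JSON or empty input), but it assumes each stream-json line is an event object and that assistant turns look like `{"type": "assistant", "message": {"content": [...]}}` with `text` / `tool_use` blocks; that event shape is an assumption, and `last_assistant_text` / `parse_extraction` are hypothetical names.

```python
import json
import re
from typing import Any, Optional

# Matches a ```json ... ``` (or bare ``` ... ```) fence around the payload.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def last_assistant_text(stream: str) -> Optional[str]:
    """Return the text of the last assistant event that contains any text."""
    last: Optional[str] = None
    for line in stream.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON lines
        if not isinstance(event, dict) or event.get("type") != "assistant":
            continue  # system / user / result events are ignored
        blocks = event.get("message", {}).get("content", [])
        # Only text blocks count; tool_use blocks mid-stream are skipped.
        text = "".join(
            b.get("text", "")
            for b in blocks
            if isinstance(b, dict) and b.get("type") == "text"
        )
        if text.strip():
            last = text
    return last


def parse_extraction(stream: str) -> Optional[dict[str, Any]]:
    """Parse the extraction payload, or return None if there is no usable JSON."""
    text = last_assistant_text(stream)
    if text is None:
        return None
    match = _FENCE.search(text)
    candidate = match.group(1) if match else text.strip()
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

Falling back to the raw text when no fence is present lets a bare-JSON reply parse too, while anything that isn't a JSON object maps to `None` so the caller can report a parse error instead of raising.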

New store ops (Postgres + SQLite + protocol)

- `find_entity_by_name_or_alias`: case-insensitive name + alias lookup.
- `replace_memory_mentions`: atomically DELETE the existing rows for `memory_id` and INSERT the supplied set.
- `replace_memory_relationships`: same shape on `relationships WHERE source_memory_id = ?`; active-edge UNIQUE conflicts are silently skipped (the edge already exists from another memory).
- `list_memories_without_mentions`: LEFT JOIN `entity_mentions` and keep rows where it IS NULL, so PR B's backfill endpoint can find what to process.
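The SQLite side of these ops can be sketched with the standard-library `sqlite3` module. The schema is a hypothetical, pared-down stand-in (column names such as `source_id`, `target_id`, `rel_type` are assumptions), the synchronous connection-passing signatures are simplifications, and `INSERT OR IGNORE` is just one way to get "silently skip active-edge conflicts"; the PR doesn't say how the real store does it.

```python
import json
import sqlite3
from typing import Iterable, Optional

# Hypothetical, pared-down schema; column names are assumptions.
SCHEMA = """
CREATE TABLE memories (id TEXT PRIMARY KEY);
CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT NOT NULL,
                       aliases TEXT NOT NULL DEFAULT '[]');  -- JSON array
CREATE TABLE entity_mentions (memory_id TEXT NOT NULL, entity_id TEXT NOT NULL);
CREATE TABLE relationships (
    source_memory_id TEXT NOT NULL,
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    rel_type TEXT NOT NULL,
    valid_until TEXT            -- NULL means the edge is active
);
-- At most one *active* edge per (source, target, type).
CREATE UNIQUE INDEX relationships_active_uq
    ON relationships (source_id, target_id, rel_type)
    WHERE valid_until IS NULL;
"""


def find_entity_by_name_or_alias(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Case-insensitive name match first, then a Python-side alias scan."""
    row = conn.execute(
        "SELECT id FROM entities WHERE LOWER(name) = LOWER(?)", (name,)
    ).fetchone()
    if row:
        return row[0]
    wanted = name.lower()
    for entity_id, aliases in conn.execute("SELECT id, aliases FROM entities"):
        if any(alias.lower() == wanted for alias in json.loads(aliases)):
            return entity_id
    return None


def replace_memory_relationships(
    conn: sqlite3.Connection, memory_id: str, edges: Iterable[tuple[str, str, str]]
) -> None:
    """Delete this memory's edges, then insert the new set, in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "DELETE FROM relationships WHERE source_memory_id = ?", (memory_id,)
        )
        # OR IGNORE skips rows that would violate the partial unique index,
        # i.e. an active edge already asserted by another memory.
        conn.executemany(
            "INSERT OR IGNORE INTO relationships "
            "(source_memory_id, source_id, target_id, rel_type) VALUES (?, ?, ?, ?)",
            [(memory_id, src, dst, rel) for src, dst, rel in edges],
        )


def list_memories_without_mentions(conn: sqlite3.Connection) -> list[str]:
    """Memories with no entity_mentions rows yet (backfill candidates)."""
    rows = conn.execute(
        "SELECT m.id FROM memories m "
        "LEFT JOIN entity_mentions em ON em.memory_id = m.id "
        "WHERE em.memory_id IS NULL ORDER BY m.id"
    )
    return [r[0] for r in rows]
```

Because the DELETE and INSERT share one transaction, replaying an extraction for the same memory leaves the row count unchanged rather than doubling it. Note that `OR IGNORE` also swallows other constraint violations such as NOT NULL, so a stricter version would catch `sqlite3.IntegrityError` for the unique index specifically.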

Test plan

- `ruff check src/ tests/`: clean
- `pytest tests/services/test_graph_extraction.py`: 17 passed, 22 skipped (the skipped cases use the parametrized store fixture, which CI's python-postgres job exercises)
- `pytest tests/ --ignore=test_http_store_integration`: 2803 passed, no regressions
- CI green

What's next

- PR B: wire `extract_and_persist` into the `routes/memories.py` and `routes/observations.py` create handlers as a fire-and-forget task; add a `POST /v1/graph/backfill` endpoint and retarget the broken `lore graph-backfill` CLI to it.
- PR C (optional): UI graph polish if the populated graph still feels sparse.

🤖 Generated with Claude Code

Foundation for the graph-population pipeline (spec: docs/superpowers/specs/2026-05-08-lore-graph-population-design.md). This PR adds the extraction service and the persistence-layer ops it needs; PR B wires it into the create-time route handlers and adds the backfill endpoint.

New service: src/lore/services/graph_extraction.py

* extract_and_persist(store, *, org_id, memory_id, content, context, spawn_fn=None, timeout=None) -> ExtractionResult: run a `claude -p` subagent with a deterministic extraction prompt, parse the JSON from the final assistant message, and persist entities (with case-insensitive + alias dedup) / mentions / relationships. Idempotent: existing edges for the memory are deleted before insert.
* Concurrency capped via LORE_GRAPH_EXTRACTION_CONCURRENCY (default 2) so dream-finalize bursts don't spawn 50 subprocesses at once.
* Spawn flags pinned: --output-format stream-json --verbose --permission-mode default. The dream/capture saga (PRs #48, #49) burned us once on the first two; the regression test at test_spawn_claude_args.test_passes_required_flags catches future silent-empty-graph failures.
* Failure modes (timeout / parse / non-zero exit / claude not on PATH) are all swallowed and returned as ExtractionResult.error; no exception bubbles.
* is_enabled() auto-on iff `claude` is on PATH; explicit override via LORE_GRAPH_EXTRACTION_ENABLED.

New store ops on Postgres + SQLite + protocol:

* find_entity_by_name_or_alias: case-insensitive name + alias lookup (LOWER(name) match, then alias scan). PG uses jsonb_array_elements_text; SQLite does a Python-side scan to stay portable across aiosqlite builds.
* replace_memory_mentions: DELETE existing rows for memory_id, INSERT the supplied set, atomic via transaction.
* replace_memory_relationships: same shape on relationships WHERE source_memory_id = ?. Active-edge UNIQUE conflicts (the partial index on (source, target, type) WHERE valid_until IS NULL) are silently skipped because the edge already exists from another memory and re-asserting it from a different source isn't an error.
* list_memories_without_mentions: LEFT JOIN entity_mentions IS NULL so the backfill endpoint (PR B) can find what to process.

Tests: tests/services/test_graph_extraction.py (39 cases: 17 run unconditionally, 22 gated on the parametrized PG+SQLite store fixture that CI's python-postgres job exercises). Coverage:

* Prompt builder: content + optional context block, schema lists every entity type.
* Stream-json parser: picks last assistant text, handles json fences, skips tool_use mid-stream events, returns None on no-JSON / empty.
* Spawn-args sanity: regression guard for the flag saga.
* Happy path: 2 entities + 1 relationship round-trip via real store.
* Dedup by case-insensitive name; dedup by alias.
* Idempotent re-extraction: replay produces same row count, not doubled.
* Drops relationships referencing undeclared entities (LLM jitter guard).
* Failure modes: timeout, parse error, non-zero exit, missing claude.
* Empty extraction persists nothing (clean no-op).
* Feature flag: explicit true/false + auto-on with claude on PATH.
* Concurrency cap holds under 10-task burst.
* Env-knob validators (concurrency min 1, timeout min 1s, invalid falls back to default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
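The "concurrency cap holds under 10-task burst" and env-knob cases can be illustrated with a minimal asyncio sketch. It assumes a semaphore-based cap and an injectable async spawner that returns the parsed payload; `env_int`, `extract_one`, `burst`, the fake spawner, and every `ExtractionResult` field except `error` are hypothetical. Whether the real validator clamps or falls back for below-minimum values is also an assumption (here it falls back to the default, as it does for unparseable values).

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional

DEFAULT_CONCURRENCY = 2  # the PR's stated default


def env_int(name: str, default: int, minimum: int) -> int:
    """Read an integer knob; unparseable or below-minimum values fall back to default.

    Assumption: the real validator may clamp instead of falling back.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= minimum else default


@dataclass
class ExtractionResult:
    # Hypothetical shape: only `.error` is named in the PR.
    entity_count: int = 0
    error: Optional[str] = None


SpawnFn = Callable[[str], Awaitable[dict[str, Any]]]


async def extract_one(
    spawn_fn: SpawnFn, content: str, *, sem: asyncio.Semaphore, timeout: float
) -> ExtractionResult:
    async with sem:  # caps how many spawns run at once
        try:
            payload = await asyncio.wait_for(spawn_fn(content), timeout)
        except asyncio.TimeoutError:
            return ExtractionResult(error="timeout")
        except FileNotFoundError:
            return ExtractionResult(error="claude not on PATH")
        except Exception as exc:  # parse errors, non-zero exit, ...: never re-raised
            return ExtractionResult(error=f"{type(exc).__name__}: {exc}")
    return ExtractionResult(entity_count=len(payload.get("entities", [])))


async def burst(n_tasks: int = 10, timeout: float = 0.1) -> tuple[int, list[ExtractionResult]]:
    """Run a burst of fake extractions and report the peak number in flight."""
    sem = asyncio.Semaphore(
        env_int("LORE_GRAPH_EXTRACTION_CONCURRENCY", DEFAULT_CONCURRENCY, 1)
    )
    in_flight = peak = 0

    async def fake_spawn(content: str) -> dict[str, Any]:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(1.0 if content == "slow" else 0.01)
            return {"entities": [{"name": content}]}
        finally:
            in_flight -= 1

    jobs = [f"memory-{i}" for i in range(n_tasks - 1)] + ["slow"]
    results = await asyncio.gather(
        *(extract_one(fake_spawn, job, sem=sem, timeout=timeout) for job in jobs)
    )
    return peak, results
```

Acquiring the semaphore before starting `wait_for` means queued jobs don't spend their timeout budget waiting for a slot; the PR doesn't say which order the real service uses.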

amitpaz1 merged commit a970185 into main May 8, 2026
6 checks passed

amitpaz1 mentioned this pull request May 8, 2026

graph-population PR B: route wiring + backfill endpoint #51 · Merged · 6 tasks

amitpaz1 merged 1 commit into main from graph-extraction-foundation
