
Releases: andrew-hernandez-paragon/code-graph-context

v4.0.0 — Graph as Persistent Memory

26 Apr 18:59

Changed (BREAKING for tooling that relies on CLEAR_PROJECT nuking everything)

  • parse_typescript_project with clearExisting: true (the default) no longer
    deletes SessionNote, SessionBookmark, Pheromone, or Project nodes for
    the target project. Code nodes (SourceFile, ClassDeclaration, etc.) are still
    rebuilt as before. The denylist is exposed as the PRESERVED_LABELS constant
    in src/storage/neo4j/neo4j.service.ts.
  • After parse completes, :ABOUT, :REFERENCES, and :MARKS edges from
    preserved nodes to the (rebuilt) code nodes are automatically recreated using
    the deterministic node IDs persisted on each preserved node. Orphan references
    (IDs no longer present after reparse) are surfaced via the parse-success
    message and session_recall's per-note staleAboutNodeIds.
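
The preserved-label behavior amounts to a label guard on the project-scoped delete. A minimal sketch, assuming a Cypher query built from the PRESERVED_LABELS constant; buildClearProjectQuery is illustrative, not the actual service code:

```typescript
// Illustrative query builder; PRESERVED_LABELS mirrors the constant described
// above, and buildClearProjectQuery is not the actual neo4j.service.ts code.
const PRESERVED_LABELS = ["SessionNote", "SessionBookmark", "Pheromone", "Project"];

function buildClearProjectQuery(preserved: string[]): string {
  // Nodes carrying any preserved label survive the reparse.
  const guard = preserved.map((label) => `NOT n:${label}`).join(" AND ");
  return `MATCH (n { projectId: $projectId }) WHERE ${guard} DETACH DELETE n`;
}
```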

Added

  • SessionNote.aboutNodeIds, lastValidated, supersededBy properties
    (auto-migrated on first MCP startup after upgrade). aboutNodeIds enables
    :ABOUT-edge recovery after reparse; lastValidated powers the recall
    freshness rerank; supersededBy is the single mechanism for "is this
    current?" — non-null filters the note out of default recall.
  • session_recall returns lastValidated, supersededBy, aboutNodeIds,
    staleAboutNodeIds per note. Filters out superseded notes by default; pass
    includeSuperseded: true to surface history.
  • SessionBookmark.embedding — bookmarks are now embedded on save and
    recallable via semantic search. New session_bookmarks_idx vector index;
    existing bookmarks are backfilled in idempotent, paginated batches on first
    MCP startup.
    session_recall with query and no sessionId now returns the top
    semantically-matched bookmark across all sessions, not just the current one.
  • session_update — new tool for in-place revision of a SessionNote
    (typo, severity, lastValidated bump, minor content correction, aboutNodeIds
    resync, supersession marker). Re-embeds on content change; drops/recreates
    :ABOUT edges on aboutNodeIds change. For substantive changes prefer
    session_save with supersededBy set so history is preserved as a new note.
  • CLEAR_PROJECT_FORCE — internal-only Cypher query (no denylist) for
    tests and explicit-nuke scenarios. Not exposed via the parse tool surface.
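
The supersededBy filtering described above reduces to a simple predicate over recalled notes. A minimal sketch, with an assumed note shape (the real SessionNote carries more properties):

```typescript
// Assumed minimal note shape; the real SessionNote has more properties.
interface SessionNoteLike {
  id: string;
  supersededBy: string | null; // non-null means the note is no longer current
}

// Default recall drops superseded notes; includeSuperseded: true surfaces history.
function filterForRecall(notes: SessionNoteLike[], includeSuperseded = false): SessionNoteLike[] {
  return includeSuperseded ? notes : notes.filter((n) => n.supersededBy === null);
}
```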

Changed (UX)

  • session_recall default limit 10 → 5. Pass limit: 10 explicitly when
    broader retrieval is needed. Reduces conversation-context bloat.
  • session_recall re-ranking: within similarity-sorted vector results,
    secondary order is lastValidated DESC, severity DESC so fresher and more
    critical notes surface first when scores are close. Filter-mode ordering
    switches to coalesce(lastValidated, createdAt) DESC, createdAt DESC.
  • session_recall: query embedding is computed once per call and reused for
    both bookmark and note semantic searches (avoids the previous double-embed).
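
The re-ranking rule can be sketched as a comparator that falls back to lastValidated and severity when similarity scores are close. The band width and field shapes here are assumptions, not the shipped values:

```typescript
// Assumed shapes and band width; the shipped rerank may use different values.
interface RankedNote {
  score: number;         // vector similarity, higher is better
  lastValidated: number; // epoch millis, 0 if never validated
  severity: number;      // higher is more critical
}

// Sort by similarity; when two scores sit within `band` of each other,
// prefer the fresher note, then the more severe one.
function rerank(notes: RankedNote[], band = 0.02): RankedNote[] {
  return [...notes].sort((a, b) => {
    if (Math.abs(a.score - b.score) > band) return b.score - a.score;
    if (a.lastValidated !== b.lastValidated) return b.lastValidated - a.lastValidated;
    return b.severity - a.severity;
  });
}
```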

Fixed

  • Latent bug where the Project node's status update silently no-op'd because
    the node had been deleted by CLEAR_PROJECT between the upsert (status:
    'parsing') and the post-import update (status: 'complete'). Now Project
    metadata survives reparse and the status update lands as intended.

Migration notes

  • All schema additions are auto-migrated on first MCP startup after upgrade.
    The migrateSessionNoteProperties and backfillBookmarkEmbeddings functions
    in src/mcp/service-init.ts are idempotent — subsequent startups touch zero
    rows once data is migrated.
  • No manual migration steps required. Failure of either migration is non-fatal
    and logged.

v3.0.16

14 Apr 03:25

Fix — async parse mode

Silence dotenv stdout writes in Worker threads. This is the true root cause of the async-mode SIGTERM mystery tracked across v3.0.14 and v3.0.15.

Root cause (confirmed via Claude Code's MCP client log)

Connection error: JSON Parse error: Unrecognized token '◇'
Closing transport (stdio transport error: SyntaxError)

dotenv@17.2.3 calls console.log('[dotenv@X.Y.Z] injecting env (N) from ... -- tip: 🔐 ...') on every load. That writes to stdout. Worker threads share stdout with the parent MCP process, so each Worker's dotenv.config() call emitted a non-JSON line onto the JSON-RPC pipe to Claude Code. The client errored with SyntaxError and closed the transport, then SIGTERMed the server ~2s later.
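
The failure mode follows from the transport contract: a JSON-RPC stdio client must be able to JSON.parse every line it reads, and the dotenv banner is not JSON. A small illustrative check (not project code):

```typescript
// Illustrative check: every line on a JSON-RPC stdio pipe must parse as JSON.
function framesAreValidJson(lines: string[]): boolean {
  return lines.every((line) => {
    try {
      JSON.parse(line);
      return true;
    } catch {
      return false; // any stray banner corrupts the transport
    }
  });
}
```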

The error fired 4 times in rapid succession — once per Worker: the parse-coordinator plus the 3 chunk workers spawned by the worker pool. Sync mode never hits this because it never spawns those Workers.

The main MCP server already passed quiet: true at mcp.server.ts:19 — the two Worker files were just missed.

Changes

  • Add quiet: true to dotenv.config() in parse-coordinator.ts and chunk.worker.ts with a comment explaining why it is mandatory.
  • Revert the v3.0.15 "pre-warm embedding sidecar on main thread" change — that was based on a wrong hypothesis and added ~7s latency to the async tool return with no benefit.

Full Changelog: v3.0.15...v3.0.16

v3.0.15

14 Apr 03:12

Fix

Pre-warm the embedding sidecar on the MCP main thread before spawning the async parse Worker.

Root cause

When parse_typescript_project was called with async: true, a parse-coordinator Worker thread was spawned. That Worker initialized its own EmbeddingsService and, on the first embed call, tried to spawn the Python sentence-transformer sidecar from inside the Worker thread via child_process.spawn.

Spawning a subprocess from a Worker thread (while the main MCP thread is idle waiting on stdio) briefly disturbs the MCP stdio pipe to Claude Code. The harness responds with SIGTERM ~0.6s after the embedding batch starts — killing the server mid-parse and orphaning the async job. Subsequent check_parse_status calls return Not connected or Job not found because the server restarts with an empty in-memory job map.

Sync mode (async: false) was unaffected because embeddings run on the MCP main thread, which owns the stdio pipe and can spawn subprocesses cleanly.

Fix

Start the embedding sidecar on the main thread before spawning the parse Worker. The Worker's module-scoped sidecar singleton detects the server is already healthy on localhost via the health check in doStart() and skips the subprocess spawn entirely — it just makes HTTP calls to the already-running sidecar.

Full Changelog: v3.0.14...v3.0.15

v3.0.14

14 Apr 03:00

Fix

Reduce stderr back-pressure during parse to avoid MCP harness SIGTERMing the server mid-parse.

Parallel chunk workers plus per-batch import logs were flooding stderr through the MCP stdio pipe. When the harness back-pressured the pipe, Node would block the workers, and the harness would eventually kill the server — orphaning async parse jobs (the in-memory job map is lost on restart, so check_parse_status returned Not connected / Job not found).

Changes

  • Remove the per-batch "Created X nodes/edges in batch Y-Z" logging in graph-generator.handler.ts (fired hundreds of times per parse across parallel workers). All info is still captured via debugLog to the debug file.
  • Drop emoji/banner console.error calls in parse-typescript-project.tool.ts and parser-factory.ts that duplicated debugLog entries.
  • Gate Python sidecar stderr forwarding behind CGC_DEBUG=1 env var. The sentence-transformer sidecar emits continuous per-batch progress that was streaming live to MCP stderr during embedding.
  • Rewrite remaining error-path console.error calls as structured JSON for consistency with server lifecycle logging.
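
The CGC_DEBUG gate reduces to a one-line predicate. A sketch, assuming the flag is read as a plain string comparison (the actual check may differ):

```typescript
// Assumed flag handling: forward sidecar stderr only under CGC_DEBUG=1.
function shouldForwardSidecarStderr(env: Record<string, string | undefined>): boolean {
  return env.CGC_DEBUG === "1";
}
```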

Full Changelog: v3.0.13...v3.0.14

v3.0.0 — MCP Tool Improvements

17 Mar 23:36

Breaking Changes

  • Session tools consolidated (5→3): save_session_bookmark, restore_session_bookmark, save_session_note, recall_session_notes replaced by session_save and session_recall. cleanup_session unchanged.
  • swarm_claim_task split into 3 tools: release/abandon → swarm_release_task; start/force-start → swarm_advance_task. The action parameter is replaced by a startImmediately boolean.
  • traverse_from_node display params nested: includeCode, snippetLength, summaryOnly, maxNodesPerChain, maxTotalNodes, limit moved into optional displayOptions object (13→8 top-level params).
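
The nesting change looks like this at the call site. The nodeId value and all option values below are illustrative, not defaults:

```typescript
// Illustrative traverse_from_node arguments after v3.0.0: display params
// now live under displayOptions instead of the top level.
const args = {
  nodeId: "example-node-id", // made-up ID for illustration
  displayOptions: {
    includeCode: true,
    snippetLength: 200,
    summaryOnly: false,
    maxNodesPerChain: 10,
    maxTotalNodes: 50,
    limit: 25,
  },
};
```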

Added

  • session_save — unified save tool, auto-detects bookmark vs note from input
  • session_recall — unified recall tool, combines bookmark restore + semantic note search
  • swarm_release_task — release or abandon claimed tasks
  • swarm_advance_task — start or force-start claimed tasks
  • autoResolveProjectId — detect_dead_code and detect_duplicate_code now auto-resolve the project when only one project exists
  • createEmptyResponse helper — standardized { status: 'empty', message, suggestion } shape

Changed

  • All tool descriptions restructured with category/usage hints for better LLM tool selection
  • Parameter descriptions trimmed (~290 lines removed) — no more restated defaults, types, or enum values
  • Summary stats always included in detect_dead_code/detect_duplicate_code regardless of summaryOnly
  • Error responses standardized across all tools

Fixed

  • useWeightedTraversal doc bug (description said default false, schema had true)
  • chunkSize doc bug (description said default 50, schema had 100)
  • success: false inside success response in session-bookmark

v2.14.1

17 Mar 02:36

Fix: NL-to-Cypher JSON parsing failures

The natural_language_to_cypher tool was failing because GPT-4o returned prose/markdown instead of raw JSON.

Root causes

  • file_search tool encouraged the LLM to narrate its findings before producing JSON
  • 170-line prompt buried the "JSON only" instruction under redundant rules
  • No response_format enforcement at the API level

Changes

  • Removed file_search and code_interpreter from the OpenAI assistant — schema is now injected directly into each message
  • Added response_format: { type: 'json_object' } to enforce JSON output at the API level
  • Condensed prompt from ~170 lines to ~25 lines — focused on Cypher rules, no redundancy
  • Enriched schema injection with full details (counts, properties, connection patterns)
  • Added extractJson() fallback for backward compatibility with cached OPENAI_ASSISTANT_ID
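
An extractJson() fallback of this kind typically tries a strict parse first, then salvages the first object literal from prose or a markdown fence. A sketch, not the shipped implementation:

```typescript
// Illustrative fallback: strict parse first, then salvage the first object
// literal from prose or a markdown fence.
function extractJson(text: string): unknown {
  try {
    return JSON.parse(text); // happy path: the model returned raw JSON
  } catch {
    // fall through to salvage mode
  }
  const match = text.match(/\{[\s\S]*\}/); // greedy: first '{' to last '}'
  if (!match) throw new Error("no JSON object found in model output");
  return JSON.parse(match[0]);
}
```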

Note

If you have OPENAI_ASSISTANT_ID set in your environment, clear it so a new assistant is created with these settings.

v2.14.0

16 Mar 06:19

What's Changed

Schema Introspection Overhaul

  • Replaced apoc.meta.schema() with project-scoped DISCOVER_* queries — schema file is now clean, structured JSON with node types (+ property keys), relationship types (+ all connection patterns), semantic types (+ parent labels), and common patterns
  • Schema is compact (~5KB) and suitable for direct injection into LLM prompts

Environment Variable Restructuring

  • BREAKING: OPENAI_ENABLED renamed to OPENAI_EMBEDDINGS_ENABLED to separate embedding provider control from NL-to-Cypher
  • Old OPENAI_ENABLED still works with a deprecation warning
  • OPENAI_API_KEY alone now enables natural_language_to_cypher (independent of embedding provider choice)

Bug Fixes

  • Fixed discovery queries missing projectId parameter (broken since multi-project support in v2.10.0)
  • Fixed natural_language_to_cypher silently failing without OpenAI — now skips init cleanly with actionable error message
  • Fixed Error objects serializing as {} in debug logs

Migration

If you use OPENAI_ENABLED=true, rename it to OPENAI_EMBEDDINGS_ENABLED=true. The old variable will continue to work but logs a deprecation warning.

Full Changelog: v2.13.3...v2.14.0

v2.13.3

15 Mar 23:33

What's new

  • Config file ingestion — parsing now ingests JSON, YAML, .env, Dockerfiles, shell scripts, and Python files as ConfigFile nodes in the graph, with configurable glob patterns
  • filePath resolution in swarm tools — swarm_pheromone and swarm_sense now accept filePath as an alternative to nodeId, auto-resolving it to the corresponding SourceFile node
  • Code formatting cleanup across embeddings and swarm modules

v2.13.2

11 Mar 22:33

Fix: Concurrency semaphore prevents embedding timeouts

When parallel workers (15+) send large embed requests simultaneously, the sidecar's GPU processes them serially — later requests queue in uvicorn and time out at 60s.

Fix: A semaphore in EmbeddingSidecar.embed() limits concurrent requests to 2 (configurable via EMBEDDING_MAX_CONCURRENT). Excess callers queue in Node.js with no timeout pressure. The 60s timer only starts after acquiring a slot.

Before: 15 workers × 50 large texts → 15/15 time out at 60s
After: 15 workers × 50 large texts → 15/15 succeed

Combined with v2.13.1's HTTP batch cap, this completes the fix for large codebase parsing failures introduced in v2.13.0.

v2.13.1

11 Mar 22:05

Fix: Embedding sidecar OOM under parallel load

v2.13.0's batch consolidation sent unlimited texts per HTTP request to the embedding sidecar. When multiple parallel workers hit it simultaneously (~15 chunks × 50+ texts), the sidecar ran out of memory and crashed — causing "fetch failed" across all workers.

Fix: Cap HTTP requests at 50 texts (configurable via EMBEDDING_HTTP_BATCH_LIMIT). This bounds peak memory per request while still being a major improvement over the old per-8-text batching.

Scenario            HTTP requests for 50 texts
v2.12.x             ~6 (8 texts each)
v2.13.0 (broken)    1 (all 50; OOM when concurrent)
v2.13.1             1 (fits in the 50-text limit; splits only for larger chunks)

New env variable: EMBEDDING_HTTP_BATCH_LIMIT (default: 50) — max texts per HTTP request to the sidecar.
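
The cap amounts to slicing the text list before posting to the sidecar. A sketch, with chunk() as an illustrative helper (the shipped code may differ):

```typescript
// Illustrative batching: slice texts into requests of at most `limit` items,
// where `limit` plays the role of EMBEDDING_HTTP_BATCH_LIMIT (default 50).
function chunk<T>(items: T[], limit = 50): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += limit) {
    out.push(items.slice(i, i + limit));
  }
  return out;
}
```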