Atrv-Shrn · May 14, 2026
diff --git a/.claude/reviews/readme-finish-spec.md b/.claude/reviews/readme-finish-spec.md
@@ -0,0 +1,22 @@
+# Review: readme-finish-spec
+
+## P1 Findings (Breaking)
+
+- [README.md:1093] "Allowed characters are letters, digits, hyphens, underscores, dots, and colons" -- colons are NOT allowed. The source code (`_ILLEGAL_CHARS = set('<>:"|?*')` in `vault.py:450`) includes `:` as a rejected character. The README's own next sentence also lists `:` as a rejected Windows-illegal character, creating an internal contradiction. A user who follows the "allowed characters" claim and uses a colon in an agent_id will get a 400 error.
+
+## P2 Findings (Minor)
+
+- [README.md:32] Forbidden word "just" -- "find memories by meaning, not just exact keyword matches" violates Rule 2 (no "simply", "just", "obviously", "easily", "merely").
+
+- [README.md:650-666] System Prompt Endpoint has no curl example -- Rule 5 requires every API endpoint to have a complete curl example with full request and full response body. This endpoint only shows `GET /agents/{agent_id}/system-prompt` without a curl command.
+
+- [README.md:664] Truncated output with "..." in example response body -- `"system_prompt_block": "## 0. MEMORY MANDATE\n\n..."` violates Rule 6 (no "..." or truncated output in examples).
+
+- [README.md:24] Glossary claims all REST endpoints live under `/agents/{agent_id}/memories/...` -- factually incorrect. The inject endpoint (`/agents/{agent_id}/inject`), system-prompt endpoint (`/agents/{agent_id}/system-prompt`), and shared inject endpoint (`/shared/inject`) do not live under `/memories/...`. Also uses `...` which violates Rule 6.
+
+- [README.md:859-954] Walkthrough section repeats full response shapes instead of cross-referencing the API Reference section, violating Rule 8 (cross-reference response shapes instead of repeating them). The write response (lines 859-865), read response (lines 878-889), search response (lines 910-919), and inject response (lines 938-954) all duplicate shapes already shown in the REST API Reference.
+
+- No troubleshooting entry for the "shared" reserved agent_id error -- Using `shared` as an agent_id returns `{"detail": "agent_id 'shared' is reserved"}` (400), but there is no troubleshooting entry for this specific error. The existing "agent_id contains illegal characters" entry does not cover this case. Violates Rule 3.
+
+## Verdict
+REWORK (P1 found)
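
The P1 finding and the last P2 finding both concern `agent_id` validation, so a minimal sketch of the two checks may help when rewording the README. It assumes only what the review quotes: the `_ILLEGAL_CHARS` set from `vault.py:450` and the reserved name `shared`. The helper name `check_agent_id` and the wording of the illegal-character message are hypothetical; the real code may apply further rules not quoted here.

```python
from typing import Optional

# Character set quoted in the review from vault.py:450.
_ILLEGAL_CHARS = set('<>:"|?*')
# Reserved name, per the final P2 finding.
_RESERVED_IDS = {"shared"}


def check_agent_id(agent_id: str) -> Optional[str]:
    """Return an error detail string, or None if the id passes both checks.

    Hypothetical helper for illustration only.
    """
    if agent_id in _RESERVED_IDS:
        # Detail string as quoted in the review.
        return f"agent_id '{agent_id}' is reserved"
    if _ILLEGAL_CHARS & set(agent_id):
        # Assumed wording, borrowed from the troubleshooting entry title.
        return "agent_id contains illegal characters"
    return None
```

Under these assumptions `check_agent_id("team:alpha")` reports illegal characters, which is the case the README's "allowed characters" sentence currently gets wrong.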
diff --git a/.claude/settings.local.json b/.claude/settings.local.json
@@ -0,0 +1,7 @@
+{
+  "permissions": {
+    "allow": [
+      "Bash(curl *)"
+    ]
+  }
+}
diff --git a/.claude/specs/consolidation.md b/.claude/specs/consolidation.md
@@ -0,0 +1,37 @@
+# Spec — feat/consolidation
+
+## Branch Scope
+
+Background Memory Consolidation: config fields, Consolidator class, app lifecycle, tests.
+
+## Source Files
+
+| File                                         | Change                                                                                                                                                                                                                                                                                                         |
+| -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `src/memstack/core/config.py`                | Add `consolidation_enabled`, `consolidation_interval`, `consolidation_batch_size`, `consolidation_model` fields; add `effective_consolidation_model` property; add validators: `consolidation_interval >= 60`, `1 <= consolidation_batch_size <= 100`                                                          |
+| `src/memstack/intelligence/consolidation.py` | New module with `Consolidator` class: `run()` async loop, `_consolidate_all_agents()`, `_consolidate_agent(agent_id)`, `stop()`; supports rewrite/merge/split/enrich operations; uses VaultStore methods for all file ops; skips `shared/` directory; skips operations on missing IDs; graceful on LLM failure |
+| `src/memstack/intelligence/__init__.py`      | Export `Consolidator`                                                                                                                                                                                                                                                                                          |
+| `src/memstack/interfaces/rest/app.py`        | Add `app.state.consolidator`; start consolidator in startup event if `consolidation_enabled`; stop in shutdown event                                                                                                                                                                                           |
+| `tests/test_consolidation.py`                | 15 tests: batch selection, fewer memories than batch, skip empty agents, skip shared, rewrite/enrich/merge/split operations, skip on missing ID, LLM failure continues, stop cancels iteration, shared mode merge/split, interval validation, batch_size validation                                            |
+| `tests/test_config.py`                       | Add assertions for all 4 consolidation config fields, defaults, overrides, and validators                                                                                                                                                                                                                      |
+| `.env.example`                               | Add `MEMSTACK_CONSOLIDATION_ENABLED`, `MEMSTACK_CONSOLIDATION_INTERVAL`, `MEMSTACK_CONSOLIDATION_BATCH_SIZE`, `MEMSTACK_CONSOLIDATION_MODEL`                                                                                                                                                                   |
+
+## Out of Scope
+
+- Synthesis feature
+- Version bump
+- README/CHANGELOG narrative updates
+- Changes to vault format/search/pipeline logic
+
+## Acceptance Criteria
+
+1. When `consolidation_enabled=True`, background task runs every `consolidation_interval` seconds
+2. Supports 4 operations: rewrite, merge, split, enrich — all via VaultStore methods
+3. Rewrite/enrich preserve ID, update only body + `updated`
+4. Merge creates new memory with averaged importance, deletes originals
+5. Split creates new memories with original importance, deletes original
+6. Missing memory ID -> skip entire operation, log warning
+7. `consolidator.stop()` cancels next iteration cleanly
+8. `shared/` directory never processed
+9. Interval < 60 and batch_size outside 1-100 rejected by Settings
+10. All consolidation tests pass, ruff lint clean, >=90% coverage of new code
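
Acceptance criteria 1 and 7 describe a periodic loop that `stop()` can end cleanly. One way this could look is sketched below, assuming an `asyncio.Event` is used as the stop signal. `LoopSketch` and its `passes` counter are hypothetical stand-ins; the spec does not say how `Consolidator` implements `run()` or `stop()`.

```python
import asyncio


class LoopSketch:
    """Hypothetical stand-in for the Consolidator run()/stop() pair."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self.passes = 0
        self._stop = asyncio.Event()

    async def run(self) -> None:
        while not self._stop.is_set():
            self.passes += 1  # placeholder for _consolidate_all_agents()
            try:
                # Sleep for one interval, but wake early if stop() is called.
                await asyncio.wait_for(self._stop.wait(), timeout=self.interval)
            except asyncio.TimeoutError:
                continue  # interval elapsed without a stop request

    def stop(self) -> None:
        # The pass in progress finishes; no further pass starts.
        self._stop.set()
```

Waiting on the event instead of calling `asyncio.sleep()` means a stop request does not have to wait out the remainder of a 60-second (or longer) interval.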
diff --git a/.claude/specs/dimension-validation.md b/.claude/specs/dimension-validation.md
@@ -0,0 +1,40 @@
+# Spec: feat/dimension-validation
+
+## Branch Scope
+
+Dimension validation + auto-reindex at startup. When the embedding model changes (e.g., from ollama/nomic-embed-text to fastembed/BAAI/bge-small-en-v1.5), the vector dimension in LanceDB may mismatch. This branch adds a `validate_dimension()` method that detects mismatches and triggers a full reindex.
+
+## Files to Change
+
+1. `src/memstack/search/index.py` — add `validate_dimension()` method
+2. `src/memstack/interfaces/rest/app.py` — call `validate_dimension()` after SearchIndex creation
+3. `src/memstack/interfaces/mcp/server.py` — call `validate_dimension()` after SearchIndex creation
+4. `tests/test_search_index.py` — add 4 tests
+
+## Implementation Details
+
+### validate_dimension() method on SearchIndex
+
+- Read the existing LanceDB table schema to get the current vector dimension
+- If no table exists yet (`_table is None` and `open_table` raises), skip gracefully (no warning, no reindex)
+- If no embedding provider (`_embedding_provider is None`), skip gracefully (no warning, no reindex)
+- Embed a test string with the current provider to get the provider's dimension
+- Compare provider dimension with table dimension
+- If they match → no action, no log
+- If they mismatch → log a warning naming both dimensions, then call `self.reindex()`
+
+### App startup calls
+
+- REST app (`app.py`): call `search_index.validate_dimension()` after SearchIndex creation, before vault scan
+- MCP server (`server.py`): call `search_index.validate_dimension()` after SearchIndex creation in `mcp_lifespan`
+
+### Tests
+
+1. `test_validate_dimension_no_table` — no existing table → no reindex, no warning
+2. `test_validate_dimension_matching` — matching dimensions → no reindex, no warning
+3. `test_validate_dimension_mismatch_triggers_reindex` — mismatch → warning logged, reindex called
+4. `test_validate_dimension_no_provider` — no embedding provider → skip gracefully
+
+## Out of Scope
+
+- Embedding config changes, .env.example, config.py defaults, autofallback removal, version bump, CHANGELOG
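
The decision logic listed under "validate_dimension() method on SearchIndex" can be sketched as a plain function. This is an illustration under stated assumptions, not the branch's implementation: the table dimension, the embedding function, and the reindex callback are passed in as parameters so that no LanceDB or provider API has to be guessed, and the logger name is made up.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("memstack.sketch")  # hypothetical logger name


def validate_dimension(
    table_dim: Optional[int],
    embed: Optional[Callable[[str], List[float]]],
    reindex: Callable[[], None],
) -> bool:
    """Return True if a reindex was triggered.

    table_dim: vector width read from the existing table, or None if no table.
    embed: the provider's embedding function, or None if no provider is set.
    """
    if table_dim is None or embed is None:
        return False  # nothing to compare; skip without warning or reindex
    provider_dim = len(embed("dimension probe"))
    if provider_dim == table_dim:
        return False  # match: no action, no log
    logger.warning(
        "Embedding dimension mismatch: table=%d, provider=%d; reindexing",
        table_dim,
        provider_dim,
    )
    reindex()
    return True
```

The four branches correspond one-to-one to the four tests named in the spec.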
diff --git a/.claude/specs/synthesis.md b/.claude/specs/synthesis.md
@@ -0,0 +1,28 @@
+# Spec: feat/synthesis — LLM Memory Synthesis
+
+## Branch Scope
+
+From `reference/branch-plan.md`:
+
+- `src/memstack/core/config.py` — add `synthesis_enabled: bool = False`, `synthesis_model: str = ""` fields, add `effective_synthesis_model` property
+- `src/memstack/intelligence/synthesis.py` — new module with `synthesize()` function
+- `src/memstack/intelligence/__init__.py` — export `synthesize`
+- `src/memstack/interfaces/rest/memories.py` — add synthesis check in `create_memory()`
+- `tests/test_synthesis.py` — 7 tests
+- `tests/test_config.py` — add assertions for synthesis config
+- `tests/test_routes_memories.py` — add 4 tests for synthesis in routes
+- `.env.example` — add synthesis env vars
+
+## Acceptance Criteria
+
+- When `synthesis_enabled=True` and `"auto-capture"` in tags, LLM extracts statements; each enters pipeline independently
+- When `synthesis_enabled=False` or no `"auto-capture"` tag, write passes through unchanged
+- LLM failure, unparseable JSON, or empty statements → `[content]` fallback (never loses data)
+- HTTP response returns first statement's decision; subsequent statements logged at INFO
+- All synthesis tests pass, ruff lint clean on changed files, ≥90% coverage of new code
+
+## Out of Scope
+
+- Consolidation feature (separate branch)
+- Version bump, README/CHANGELOG narrative updates
+- Changes to vault format, search, pipeline logic
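
The third acceptance criterion (fall back to `[content]` so data is never lost) could be sketched as below. The sketch assumes the LLM is asked for a JSON array of strings and that a failed call is represented as `None`; neither detail is stated in the spec, and `parse_statements` is a hypothetical name.

```python
import json
from typing import List, Optional


def parse_statements(llm_output: Optional[str], content: str) -> List[str]:
    """Return extracted statements, or [content] if extraction fails."""
    try:
        data = json.loads(llm_output)  # TypeError covers llm_output=None
    except (json.JSONDecodeError, TypeError):
        return [content]
    if not isinstance(data, list):
        return [content]  # parseable JSON, but not the assumed array shape
    statements = [s.strip() for s in data if isinstance(s, str) and s.strip()]
    return statements or [content]  # an empty list also falls back
```

Every failure path returns the original text as a single statement, so the write proceeds as it would with synthesis disabled.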
diff --git a/.claude/specs/thresholds.md b/.claude/specs/thresholds.md
@@ -0,0 +1,27 @@
+# Spec: feat/thresholds
+
+## Branch Scope
+
+Bug 3: Lower similarity threshold defaults, add similarity_ignore_enabled toggle.
+
+## Files In Scope
+
+- `src/memstack/core/config.py` — lower `similarity_add_threshold` 0.3→0.25, lower `similarity_ignore_threshold` 0.92→0.85, add `similarity_ignore_enabled: bool = False`
+- `.env.example` — update threshold values, add `MEMSTACK_SIMILARITY_IGNORE_ENABLED=false`, update version header to 1.4.3
+- `README.md` — update config table values, add new row for `MEMSTACK_SIMILARITY_IGNORE_ENABLED`, update smart write pipeline description, update troubleshooting
+- `tests/test_config.py` — update threshold default assertions, add `similarity_ignore_enabled` default and override assertions
+- `tests/test_pipeline.py` — update `_make_settings` helper threshold values to 0.25/0.85
+
+## Files Out of Scope
+
+- Pipeline logic changes (feat/merge-pipeline)
+- LLM prompt changes (feat/merge-pipeline)
+- VaultStore.merge() (feat/merge-pipeline)
+- Version bump in `__init__.py`/pyproject.toml/MCP server (feat/integration at merge time)
+
+## Acceptance Criteria
+
+1. `Settings()` returns `similarity_add_threshold=0.25`, `similarity_ignore_threshold=0.85`, `similarity_ignore_enabled=False`
+2. `.env.example` documents all three with correct defaults
+3. README config table reflects new values
+4. All existing tests pass with updated assertions
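
How the three settings interact can be sketched as a small routing function. It follows the `.env.example` comments for this feature (scores at or above the add threshold go to the LLM unless auto-ignore is enabled) and assumes that scores below the add threshold are written as new memories, which the threshold's name suggests but the spec does not state. The function name `route` and the returned labels are hypothetical.

```python
def route(
    score: float,
    add_threshold: float = 0.25,
    ignore_threshold: float = 0.85,
    ignore_enabled: bool = False,
) -> str:
    """Return which path a write with this similarity score would take."""
    if score < add_threshold:
        return "add"  # assumption: no close match, store as a new memory
    if ignore_enabled and score >= ignore_threshold:
        return "ignore"  # old auto-ignore behavior, only when enabled
    return "llm"  # default: let the LLM decide
```

With the defaults, a score of 0.9 is sent to the LLM; it is ignored only when `ignore_enabled=True`.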
diff --git a/.env.example b/.env.example
@@ -1,4 +1,4 @@
-# MemStack v1.4.4 Configuration
+# MemStack v1.4.5 Configuration
 # Copy this file to .env and fill in the values
 
 # Required: Path to the vault directory where memory files are stored
@@ -12,11 +12,6 @@ MEMSTACK_HOST=127.0.0.1
 # Default: 7777
 MEMSTACK_PORT=7777
 
-# Whether to format memory files for Obsidian compatibility
-# Reserved: accepted but unused in current version
-# Default: true
-MEMSTACK_OBSIDIAN_MODE=true
-
 # Enable shared mode (multiple agents read/write a common pool)
 # Default: false
 MEMSTACK_SHARED_MODE=false
@@ -46,16 +41,12 @@ MEMSTACK_LOG_RETENTION=7 days
 MEMSTACK_STATE_FILE=~/.memstack/state.json
 
 # Embedding provider: ollama or fastembed
-# Default: ollama
-MEMSTACK_EMBEDDING_PROVIDER=ollama
+# Default: fastembed
+MEMSTACK_EMBEDDING_PROVIDER=fastembed
 
 # Embedding model name (provider-specific)
-# Default: nomic-embed-text
-MEMSTACK_EMBEDDING_MODEL=nomic-embed-text
-
-# Auto-fallback from ollama to fastembed if ollama is unavailable
-# Default: true
-MEMSTACK_EMBEDDING_AUTOFALLBACK=true
+# Default: BAAI/bge-small-en-v1.5
+MEMSTACK_EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
 
 # Maximum tokens per chunk for semantic chunking
 # Default: 512
@@ -81,16 +72,10 @@ MEMSTACK_INDEX_PATH=~/.memstack/index
 # Default: 0.25
 MEMSTACK_SIMILARITY_ADD_THRESHOLD=0.25
 
-# Similarity threshold above which memories are ignored as duplicates (0.0 to 1.0)
+# Similarity threshold for high-similarity matches (0.0 to 1.0)
 # Default: 0.85
 MEMSTACK_SIMILARITY_IGNORE_THRESHOLD=0.85
 
-# Enable auto-ignore for scores at or above the ignore threshold (true/false)
-# When false (default), all scores at or above the add threshold go to the LLM
-# When true, the old auto-ignore behavior is restored for high-similarity matches
-# Default: false
-MEMSTACK_SIMILARITY_IGNORE_ENABLED=false
-
 # Half-life in days for importance score decay
 # Default: 7.0
 MEMSTACK_IMPORTANCE_DECAY_HALFLIFE=7.0

diff --git a/.gitignore b/.gitignore
@@ -48,18 +48,8 @@ coverage.xml
 
 # Node.js
 node_modules/
-openclaw-bridge/dist/
 .tsbuildinfo
 
-# Dev-only files — not for release
-.claude/
-reference/
+# Dev-only files — not for release (kept on feat/integration, excluded from main via .gitattributes)
 evals/
-tests/
-CLAUDE.md
-CONTEXT.md
-STATUS.md
-progress.md
-ARCHITECTURE.md
-CONTRIBUTING.md