From 230bf164bf58695ef550970a88a6a23016f01362 Mon Sep 17 00:00:00 2001 From: Lior Cohen Date: Fri, 10 Apr 2026 05:20:21 +0300 Subject: [PATCH 1/3] KS78: Sync docs -- ROADMAP v0.7.5, CHANGELOG, MCP tool count - Rewrite ROADMAP.md from stale v0.5.0 to current v0.7.0 state - Add CHANGELOG [0.7.5] section covering KS67-KS77 changes - Fix MCP tool count: 9 -> 12 in CONTRIBUTING.md, CHANGELOG.md, ARCHITECTURE.md - Update SECURITY.md supported version: 0.5.x -> 0.7.x - List all 12 MCP tools where tool names are enumerated Co-Authored-By: Claude Opus 4.6 (1M context) --- CHANGELOG.md | 36 ++++- CONTRIBUTING.md | 2 +- SECURITY.md | 4 +- docs/ARCHITECTURE.md | 2 +- docs/ROADMAP.md | 356 ++++++++++++------------------------------- 5 files changed, 138 insertions(+), 262 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index c4647fa..284b379 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/). ## [Unreleased] +## [0.7.5] -- 2026-04-10 + +### Added +- **Schema-driven fact extraction** (KS67): structured extraction pipeline replacing free-form LLM output +- **Entity unification** (KS73): EntityFrame, EntityId resolution, alias tracking, supersession rewrite +- **Configurable embedding** (KS75): EmbeddingProvider trait, 10 fastembed models, OpenAI API support +- **Universal prompt** (KS76): single consolidation prompt for all reader models (no per-model tuning) +- **Temporal boost** (KS76): recency-weighted scoring for time-sensitive queries +- **Importance scoring** (KS76): 5-signal importance scoring (entity density, temporal salience, novelty, info density, user signal) +- **Design system foundation** (KS77): design tokens, component spec for viz app +- **Negative recall benchmark**: 3/3 baseline for "I don't know" scenarios +- **Abstention benchmark**: 5/5 -- engine correctly abstains when no relevant memory exists + +### Changed +- **Consolidation redesign** (KS69): child memory pipeline 
rewrite with quality gates, dedup, soft invalidation +- **Consolidation Tier 2** (KS71): subject fix, quality gate, dedup, soft invalidation +- **Child keyword labels** (KS72): labels assigned at child creation time +- **Default enrichment model**: switched to `qwen2.5:1.5b` +- **MCP server**: now exposes 12 tools (was 9) -- added `memory_graph`, `memory_related`, `memory_get` + +### Fixed +- **KU-3 recall** (KS77): knowledge update scenario now passes in seeded benchmark +- **IE-3, TR-3, ME-4, PT-3 recall** (KS68): multiple recall fixes across LME categories +- **Temporal label dedup trap** (KS77): avoid adding temporal labels to children when parent has temporal content +- **Persistence format version**: format mismatch fix for MCP store/echo + +### Performance +- Seeded micro-benchmark: 19/20 (up from 55% baseline) +- Abstention: 5/5 +- Negative recall: 3/3 +- LME-S baseline (GPT-4o judge): 24.2% overall + ## [0.7.0] — 2026-04-02 ### Added @@ -153,8 +185,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/). - Competitive scan update (MuninnDB, memU, Hindsight, NeuralMemory) ### Added — KS7: MCP Server -- **MCP Server** (`shrimpk-mcp`): 9 tools over JSON-RPC 2.0 stdio - - store, echo, stats, forget, dump, config_show, config_set, persist, status +- **MCP Server** (`shrimpk-mcp`): 12 tools over JSON-RPC 2.0 stdio + - store, echo, memory_graph, memory_related, memory_get, stats, forget, dump, config_show, config_set, persist, status - Lazy engine init (fastembed loads on first tool call, not on handshake) - Auto-persist after store/forget, stdout sacred (logs to stderr) - Registered globally via `claude mcp add --scope user` diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4d61ee5..7c3c8dd 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -30,7 +30,7 @@ Unit tests run entirely in-memory and complete in seconds. 
Integration tests dow | `shrimpk-security` | Sandbox, permissions | Planned (stub) | | `shrimpk-kernel` | Integration facade | Stable | | `shrimpk-python` | PyO3 bindings | Exists (untested in CI) | -| `shrimpk-mcp` | MCP server (9 tools) | Stable | +| `shrimpk-mcp` | MCP server (12 tools) | Stable | | `shrimpk-daemon` | HTTP daemon + proxy | Stable | | `shrimpk-tray` | System tray app | Stable | diff --git a/SECURITY.md b/SECURITY.md index 402e1a4..437fc8f 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -4,8 +4,8 @@ | Version | Supported | |---------|-----------| -| 0.5.x (latest) | Yes | -| < 0.5.0 | No | +| 0.7.x (latest) | Yes | +| < 0.7.0 | No | Only the latest released version receives security fixes. If you are running an older version, please upgrade before reporting. diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index c283c75..8b4ffee 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -662,7 +662,7 @@ Integration layer that wires together `shrimpk-memory`, `shrimpk-context`, and ` ### shrimpk-mcp -Model Context Protocol server. Exposes Echo Memory as MCP tools (`store`, `echo`, `stats`, `forget`, `status`, `config_show`, `dump`) via JSON-RPC 2.0 over stdio. Compatible with any MCP-aware AI client. +Model Context Protocol server. Exposes Echo Memory as 12 MCP tools (`store`, `echo`, `memory_graph`, `memory_related`, `memory_get`, `stats`, `forget`, `status`, `config_show`, `config_set`, `dump`, `persist`) via JSON-RPC 2.0 over stdio. Compatible with any MCP-aware AI client. Key design: the `EchoEngine` is lazily initialized on first tool call. The server starts in milliseconds; fastembed model loading (a few seconds) is deferred until a memory operation is actually requested. diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 0468bf2..9aa8736 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -1,331 +1,175 @@ # ShrimPK Roadmap -This roadmap reflects the current state of the kernel and planned directions for future releases. 
-Dates are aspirational. Contributions are welcome at any stage — see the Contribution Opportunities -section for specific items you can pick up today. +This roadmap reflects the current state of the kernel and planned directions. +Dates are aspirational. Contributions welcome -- see Contribution Opportunities below. --- -## Current State — v0.5.0 +## Current State -- v0.7.0 -Released March 2026. The core pipeline is stable and benchmarked. +Released April 2026. The core pipeline is stable with 11 crates + CLI, hybrid GraphRAG retrieval, and entity unification. ### What is shipped and working -**Echo pipeline** +**Echo pipeline (hybrid GraphRAG)** -The full retrieval chain is operational: Bloom filter pre-screening (O(1) topic elimination), -LSH candidate retrieval (sub-linear at scale), cosine reranking, Hebbian co-activation boosting, -and recency decay. Optional HyDE (hypothetical document expansion) and LLM reranking are -available via config flags. +Full retrieval chain: Bloom filter pre-screening, LSH candidate retrieval, cosine reranking, +Hebbian co-activation boosting, FSRS decay, ACT-R activation, label-based pre-filtering, +schema-driven fact extraction, entity unification with supersession, and temporal boosting. +Optional HyDE and LLM reranking via config flags. -**Text memory — BGE-small-EN-v1.5** +**Configurable embedding** -Primary embedding model: `BAAI/bge-small-en-v1.5` via fastembed. The pipeline achieves 84% -top-3 recall (combined HyDE + LLM reranker config) on a realistic 41-memory, 25-query benchmark -spanning five LongMemEval categories: information extraction, multi-session reasoning, temporal -reasoning, knowledge update, and preference tracking. Temporal queries hit 100% (5/5) across -all pipeline configs. +EmbeddingProvider trait with 10 fastembed models (BGE-small-EN-v1.5 default, 384-dim) and +OpenAI API support. Runtime-switchable via config without restart. 
-**Vision memory — CLIP ViT-B/32** +**Multimodal SHRM v2 format** -Image memories are embedded using CLIP ViT-B/32 (512-dim) via fastembed's `ClipVitB32` variant. -Cross-modal retrieval (text queries retrieving image memories) works in the same embedding space. -The vision feature is gated behind `--features vision`. +Memory-mapped binary format with 3 channels: text (384-dim), vision (512-dim), speech (640-dim). +32-bit CRC per entry, atomic flush, crash recovery. Per-channel LSH indices. -**Sleep consolidation** - -A background consolidation pass runs during idle periods (configurable schedule). It uses a local -LLM via Ollama to extract atomic facts from raw memories, de-duplicate, and merge related entries. -In benchmarks, consolidation lifted top-3 recall from 72% to 76% over the baseline (no -consolidation) configuration. - -**SHRM v2 storage format** - -Memory-mapped binary format with 32-bit CRC per entry, atomic flush, and crash recovery. Stores -text embeddings (384-dim), optional vision embeddings (512-dim), optional speech embeddings -(640-dim field, populated from v0.6.0 onward), metadata, and sensitivity labels. - -**Speech architecture (structure only)** - -`shrimpk-memory/src/speech.rs` defines the full `SpeechEmbedder` struct with dimension constants -(`SPEAKER_DIM=256`, `PROSODY_DIM=384`, `SPEECH_DIM=640`), Whisper log-Mel preprocessing, and -ONNX sessions wired in v0.6.0. The 16 kHz resampler uses linear interpolation. - -**MCP server** - -`shrimpk-mcp` exposes nine tools over stdio: `store`, `echo`, `forget`, `stats`, `status`, -`config_show`, `config_set`, `dump`, `persist` (plus `store_image` and `store_audio` when -multimodal features are enabled). Compatible with Claude Desktop and any MCP client. - -**Daemon + tray** - -`shrimpk-daemon` runs as a background HTTP service on `localhost:11435`. `shrimpk-tray` provides -a system tray icon and launch/stop controls on Windows. 
- -**Performance (release build, i7-1165G7)** - -| Metric | Result | -|--------|--------| -| P50 echo latency at 10K memories | 3.50ms | -| P50 echo latency at 100K memories | 23.79ms (regression — see Known Issues) | -| Store throughput | ~128 memories/sec | -| RAM (10K text memories) | ~85 MB | - ---- - -## v0.6.0 — Speech and Vision Upgrade +**Speech pipeline (640-dim)** -Target: Q2 2026. Focus: wire the speech ONNX models and upgrade the vision model. +ECAPA-TDNN (256-dim speaker) + Whisper-tiny encoder (384-dim prosody). ONNX inference via ort, +auto-download from HuggingFace Hub. Silero VAD gating. Feature-gated behind `--features speech`. -### Speech: ONNX models wired (640-dim — DONE in KS51) +**Vision pipeline (512-dim)** -The speech pipeline is **640-dim** (ECAPA-TDNN 256 + Whisper-tiny encoder 384). The emotion -channel (Wav2Small, CC-BY-NC-SA-4.0) was dropped as license-incompatible. Both wired models -carry permissive licenses: ECAPA-TDNN (Apache-2.0) and Whisper-tiny (MIT). +CLIP ViT-B/32 via fastembed. Cross-modal text-to-image retrieval. Feature-gated behind `--features vision`. -#### ECAPA-TDNN 256-dim — speaker identification +**Entity unification** -Model: `Wespeaker/wespeaker-cnceleb-resnet34-LM` (`cnceleb_resnet34_LM.onnx`, ~24 MB, -Apache 2.0). Loaded via `ort` (ONNX Runtime Rust crate). Auto-downloads from HuggingFace Hub. +EntityFrame with EntityId resolution, alias tracking, supersession rewrite. Entities unify +across memories for consistent knowledge updates. -Input: 80-bin FBank features, shape `(1, frames, 80)`, 25ms frame, 10ms hop, 16 kHz. -Output: 256-dim L2-normalized speaker embedding (output name: `embs`). - -#### Whisper-tiny encoder 384-dim — prosody - -Model: `onnx-community/whisper-tiny` (`onnx/encoder_model.onnx`, 32.9 MB, MIT). The encoder -takes 80-bin Whisper log-Mel spectrogram, shape `(batch, 80, 3000)`, padded to 30 seconds. -Mean-pooling over the sequence dimension produces a 384-dim prosody vector. 
- -#### Spectrogram preprocessing - -Two spectrogram pipelines run in parallel: - -- **Kaldi fbank** for ECAPA-TDNN: 80 Mel bins, 25ms frame, 10ms hop, 16 kHz. Implementation via - the `mel-spec` crate (v0.3.4, MIT). -- **Whisper log-Mel** for the encoder: 80 Mel bins, N_FFT=400, hop=160 samples, normalized as - `(log_spec + 4.0) / 4.0`. Also handled by `mel-spec`. - -#### Band-limited resampling +**Sleep consolidation** -The current `resample_linear()` stub in `speech.rs` introduces aliasing at high downsample ratios -(e.g., 48 kHz → 16 kHz). v0.6.0 replaces it with the `rubato` crate (v1.0.1), which provides -sinc-interpolation and FFT-based resamplers that are alias-free. +Background LLM-driven fact extraction via Ollama. Schema-driven extraction with quality gates, +dedup, soft invalidation. Universal prompt works across all reader models. -#### VAD gate — Silero VAD +**Importance scoring** -A Voice Activity Detection pass runs before the ECAPA and Whisper sessions. Silent frames -(below a configurable threshold) are skipped entirely to avoid embedding noise as speech. -Silero VAD is loaded as a small ONNX model (~2 MB, MIT license) via a direct `ort::Session`. -The `silero-vad` crate on crates.io is GPL-2.0 and is explicitly avoided — the ONNX model -is loaded directly. +5-signal importance scoring: entity density, temporal salience, novelty, information density, +and user-signal weighting. -#### ort version pinning +**MCP server** -fastembed v5.x pins `ort = "=2.0.0-rc.11"`. The speech code must use the exact same version -to avoid Cargo dependency conflicts. Do not add `ort` as a direct workspace dependency with a -different version specifier. +`shrimpk-mcp` exposes 12 tools over stdio: `store`, `echo`, `memory_graph`, `memory_related`, +`memory_get`, `stats`, `forget`, `status`, `config_show`, `config_set`, `dump`, `persist`. +Compatible with Claude Desktop and any MCP client. 
-#### Model download on first use +**Daemon + proxy** -Models are downloaded on first `SpeechEmbedder::from_config()` call if not already cached, -following the fastembed pattern: `hf-hub` crate + `dirs::cache_dir()/shrimpk/models/speech/`. -Total first-use download: ~60 MB (ECAPA 25 MB + Whisper encoder 33 MB + Silero VAD 2 MB). +`shrimpk-daemon` on `localhost:11435`. OpenAI-compatible proxy (`/v1/chat/completions`) with +transparent memory injection. Health, debug, and stats endpoints. -### Vision: CLIP ViT-B/32 → Nomic Embed Vision v1.5 (512 → 768-dim) +**System tray** -`NomicEmbedVisionV15` is already a first-class variant in fastembed v5 (`ImageEmbeddingModel` -enum). The swap is a single-line change in `embedder.rs`. The quality improvement is substantial: -+7.8 percentage points on ImageNet zero-shot (71.0% vs 63.2%) and dramatically better cross-modal -MTEB quality (62.28 vs 43.82 for the paired text model). The q4-quantized ONNX is 62 MB vs -CLIP's unquantized 352 MB — a 6x size reduction. +`shrimpk-tray` provides Windows system tray controls. -The 512 → 768 dimension change is a **breaking migration** for stored vision embeddings. The -SHRM v2 format header records embedding dimensions per modality. On first launch after upgrade, -the kernel will detect the dimension mismatch, re-embed all stored vision memories, and rewrite -the store. For the v0.5.0 → v0.6.0 transition the user base is small and a hard-cut re-embed -is the correct strategy. A migration guide will be included in the release notes. +**CLI** -Cross-modal text queries against vision memories must use Nomic Text v1.5 with the mandatory -`search_query:` prefix. This is handled internally by the embedder — callers do not need to -add the prefix manually. +`store`, `echo`, `status`, `explore` (ratatui TUI), `detect`, `dump`, `bench`, `config`. -### Fix: 100K latency regression +### Benchmarks -The P50 latency at 100K memories is 23.79ms against a 4.0ms target. 
Investigation is required -before v0.6.0 ships. See Known Issues for details. +| Metric | Result | +|--------|--------| +| Seeded micro-benchmark | 19/20 | +| Abstention | 5/5 | +| Negative recall | 3/3 | +| LME-S baseline (GPT-4o judge) | 24.2% overall, 25.3% task-avg | +| P50 echo latency (10K) | 3.50ms | +| Test count | ~481 | + +### Workspace (11 crates + CLI) + +| Crate | Purpose | +|-------|---------| +| `shrimpk-core` | Types: MemoryEntry, EchoResult, EchoConfig, Modality | +| `shrimpk-memory` | Engine: EchoEngine, embedding, LSH, Bloom, Hebbian, labels, FSRS, ACT-R | +| `shrimpk-daemon` | HTTP server: axum, proxy, routes | +| `shrimpk-mcp` | MCP server (stdio): 12 tools | +| `shrimpk-context` | ContextAssembler: token-budgeted prompt compilation | +| `shrimpk-router` | CascadeRouter: provider routing | +| `shrimpk-security` | PII masking (6 categories, 14 regex patterns) | +| `shrimpk-kernel` | Facade crate re-exporting core + memory + context | +| `shrimpk-python` | PyO3 bindings (maturin) | +| `shrimpk-ros2` | ROS2 bridge (stub) | +| `shrimpk-tray` | Windows system tray (win32) | +| `cli/` | CLI binary | --- -## v0.7.0 — Robotics, Speaker Upgrade, and Quantization +## Upcoming -Target: Q3 2026. Focus: ROS2 integration, model quality improvements, and memory footprint. +### KS78 -- Critical Fixes (April 2026) -### ROS2 bridge — `shrimpk-ros2` crate +- Persistence format version mismatch fix (Issue #16) +- Documentation sync (ROADMAP, CHANGELOG, MCP tool count) +- Design system v2 implementation -A new workspace crate `crates/shrimpk-ros2` will provide a ROS2 node that exposes ShrimPK -memory over standard ROS2 topics and services. 
+### KS79 -- Multi-Resolution Retrieval -The node subscribes to: -- `/shrimpk/store/text` (`std_msgs/String`) — text memories -- `/shrimpk/store/image` (`sensor_msgs/CompressedImage`) — visual memories via CLIP -- `/shrimpk/store/audio` (`audio_common_msgs/AudioStamped`) — speech memories +- Hierarchical retrieval across raw memories, extracted facts, and entity summaries +- Adaptive context window based on query complexity -The node publishes to: -- `/shrimpk/echo` (`shrimpk_msgs/EchoResults`) — push-activated memories -- `/shrimpk/context` (`std_msgs/String`, latched) — current context string for downstream LLMs -- `/shrimpk/status` (`std_msgs/String`, JSON) — health and latency stats +### KS80 -- Memory Lifecycle Improvements -A `/shrimpk/query` service (`shrimpk_msgs/EchoQuery`) supports pull-based querying for nodes -that prefer request/response semantics over the push model. +- Smarter consolidation scheduling based on memory age and access patterns +- Improved supersession confidence scoring -Primary integration path: `rclrs` 0.7+ with colcon on ROS2 Jazzy (Ubuntu 24.04). -Alternative: `r2r` for simpler `cargo build` integration without colcon. -Optional feature flag: `ros2-native` using `ros2-client` (pure Rust DDS, no ROS2 install needed) -for distribution to users who do not have a full ROS2 environment. +--- -The echo latency budget is feasible: 3.50ms ShrimPK echo is well within a 30 Hz camera frame -(33ms). The full pipeline including embedding and topic publish should stay under 15–20ms. +## Future -- No Fixed Timeline -No other push-based memory system has a ROS2 bridge. ReMEmbR (NVIDIA) is pull-based and -Python-only. `shrimpk-ros2` would be the first native-Rust, push-activated memory layer for ROS2. +### Vision model upgrade (CLIP -> Nomic Embed Vision v1.5) -### Speaker upgrade: ECAPA-TDNN → CAM++ +512 -> 768-dim. +7.8pp ImageNet zero-shot. 6x smaller model (62 MB vs 352 MB). +Breaking migration for stored vision embeddings. 
-CAM++ (Context-Aware Masking) achieves lower equal error rate than ECAPA-TDNN on VoxCeleb1/2 -at comparable model size. The upgrade is a drop-in replacement at the 512-dim output level -provided an Apache 2.0-compatible ONNX export is available. If no suitable pre-built ONNX exists, -the ECAPA-TDNN model ships in v0.7.0 and CAM++ is deferred to v0.8.0. +### ROS2 bridge -- full implementation -### f16 quantization for vision and speech embeddings +`shrimpk-ros2` topics for text/image/audio store, echo publish, query service. +Target: ROS2 Jazzy via rclrs. -Stored vision and speech embeddings currently use f32 (4 bytes/dimension). A v0.7.0 storage -format revision (SHRM v3) will store these as f16 (2 bytes/dimension) with promotion to f32 -at query time. Impact: ~50% reduction in disk and memory footprint for vision/speech memories, -no measurable quality loss for cosine similarity. +### Speaker upgrade (ECAPA-TDNN -> CAM++) -SHRM v3 will include automatic migration from v2 on first launch. +Lower EER at comparable model size. Blocked on Apache 2.0 ONNX availability. ---- +### f16 quantization (SHRM v3) -## Future — No Fixed Timeline - -These items are research directions or require dependencies that are not yet settled. +~50% disk/memory reduction for vision and speech embeddings. ### Custom fine-tuned embedding model -The text embedding model (BGE-small) is a general-purpose model trained on web text. A model -fine-tuned specifically on personal memory data (short episodic sentences, user preferences, -recurring entities) could improve recall quality without increasing model size. This requires -a labeled dataset and an ML training pipeline — it is a research item, not an implementation task. +BGE-small fine-tuned on personal memory data for improved recall. ### crates.io publish -Publishing `shrimpk-core`, `shrimpk-memory`, and (eventually) `shrimpk-ros2` to crates.io -is planned once the API stabilizes beyond v0.6.0. 
The current pre-1.0 semver signals that -breaking changes are expected. +`shrimpk-core`, `shrimpk-memory` once API stabilizes past v1.0. ### Cloud sync -Optional encrypted sync of the memory store across devices. End-to-end encrypted, the server -sees only ciphertext. The key design question is key management — the server must never hold -decryption keys. This is a future research and design item. - -### Emotion channel - -The 3-dim arousal/dominance/valence emotion channel is architecturally present in `speech.rs` -(`EMOTION_DIM=3`) but has no available ONNX model under a permissive license. If a suitable -Apache 2.0 or MIT model emerges, the emotion channel can be re-enabled without a breaking change -to the storage format (the slot is reserved). Alternatively, a categorical speech emotion -recognition model (4-class: angry, happy, sad, neutral) under a permissive license could -replace the dimensional approach. +Optional E2E encrypted memory sync across devices. --- ## Contribution Opportunities -All issues below are open for contribution. The project uses Apache 2.0. Opening a discussion -issue before starting significant work is encouraged to avoid duplication. - ### Good first issue -**Fix vision feature flag propagation** (difficulty: low, Rust knowledge required) -Vision benchmarks (`echo_multimodal_bench.rs`) are blocked because -`#[cfg(feature = "vision")]` checks the root test crate's features, not `shrimpk-memory`'s. -The fix is adding a forwarding `vision` feature to the root `Cargo.toml` that enables -`shrimpk-memory/vision`. Estimated: 1–2 hours. - -**Add `search_query:` prefix for cross-modal text queries** (difficulty: low, Rust) -When Nomic Embed Vision v1.5 is the active vision model (v0.6.0), text queries used in -cross-modal retrieval must be prefixed with `"search_query: "`. This should be applied -automatically in `MultiEmbedder` when the Nomic vision model is active, not pushed to callers. 
-Requires reading the fastembed API and adding a model-variant check. - -**Extend the Tier 2 benchmark with a CrossEncoder config** (difficulty: low, Rust) -The realistic Tier 2 benchmark tests four pipeline configs (Baseline, HyDE, Reranker-LLM, -Combined). A CrossEncoder-only config was benchmarked separately and showed strong results -(2,823ms average at 100% recall on 6 regression cases). Adding it to the standard Tier 2 -suite would complete the comparison matrix. +- **Fix vision feature flag propagation** -- forwarding `vision` feature to root `Cargo.toml` +- **CrossEncoder config in Tier 2 benchmark** -- add to standard benchmark suite ### Help wanted -**Investigate 100K latency regression** (difficulty: medium, Rust + profiling) -P50 at 100K memories is 23.79ms against a 4.0ms target. Likely causes: LSH bucket saturation -with BGE-small embedding distribution, brute-force fallback frequency, or Windows I/O interference -during the benchmark. The investigation should profile LSH hit rate, Bloom false-positive rate, -and brute-force fallback frequency at scale. Tools: `perf`, `cargo flamegraph`, or the -`tracing` spans already in the echo path. A fix might involve tuning LSH parameters -(hash count, bucket width) for the BGE-small distribution. - -**~~Wire ECAPA-TDNN ONNX session~~** — DONE (KS51). Wespeaker ResNet34 256-dim, FBank -preprocessing implemented in pure Rust (`compute_fbank_flat()`), `ort` version matches -fastembed's pinned `=2.0.0-rc.11`. - -**~~Wire Whisper-tiny encoder ONNX session~~** — DONE (KS51). Whisper-tiny encoder takes -`(1, 80, 3000)` log-Mel spectrogram, outputs `(1, 1500, 384)` hidden states, mean-pooled -to 384-dim. -Preprocessing uses the Whisper log-Mel formula implemented in `mel-spec`. Can be done in -parallel with the ECAPA item by a different contributor. 
- -**Implement band-limited resampling with `rubato`** (difficulty: medium, Rust + DSP) -Replace `resample_linear()` in `speech.rs` with sinc or FFT-based resampling from the `rubato` -crate (v1.0.1). The current linear resampler causes aliasing at high downsample ratios and is -documented as a placeholder. The replacement should pass the existing `resample_*` unit tests -and add a new test verifying that a 1 kHz sine wave downsampled from 48 kHz to 16 kHz does not -contain aliasing artifacts above 8 kHz. - -**Linux CI hardening** (difficulty: medium, DevOps + Rust) -The kernel builds and tests pass on CI for Linux and macOS, but the test coverage is lower than -on the primary Windows development machine. Specifically: daemon startup tests, tray icon tests, -and file locking tests need Linux-specific validation. Contributions improving Linux CI coverage -are welcome. +- **100K latency regression** -- P50 23.79ms vs 4.0ms target. Needs LSH profiling. +- **Band-limited resampling** -- replace `resample_linear()` with `rubato` sinc resampling +- **Linux CI hardening** -- daemon startup, file locking, tray icon tests ### Research needed -**Emotion model under permissive license** (difficulty: high, ML research) -The 3-dim arousal/dominance/valence emotion slot in the speech pipeline is reserved but empty -because all mature dimensional emotion models (Wav2Small, wav2vec2-large-robust) carry -CC-BY-NC-SA-4.0 licenses. Options: (1) identify an existing Apache 2.0 / MIT categorical -speech emotion model that can be exported to ONNX and mapped to a valence proxy, (2) train a -small distillation model on CC0 or public-domain audio corpora, or (3) propose an alternative -paralinguistic dimension that has available permissive models. - -**LSH parameter tuning for BGE-small distribution** (difficulty: high, information retrieval) -The LSH index was tuned for `all-MiniLM-L6-v2` embeddings. 
The upgrade to `BGE-small-EN-v1.5` -changed the embedding distribution in ways that may require different hash count, bucket width, -or candidate list size to maintain sub-10ms P50 at 100K scale. This is an empirical research -task: vary LSH parameters, run the 100K latency benchmark, and identify the configuration that -recovers the 4.0ms target. - -**CAM++ Apache 2.0 ONNX availability** (difficulty: medium, ML research) -The v0.7.0 speaker upgrade to CAM++ depends on finding or producing an Apache 2.0-compatible -ONNX export. WeSpeaker provides CAM++ checkpoints but the license status of any pre-built -ONNX exports needs verification. This research item should produce a clear verdict: model ID, -license, ONNX file location, and input/output specification. - -**SigLIP 2 fastembed support** (difficulty: high, ML + Rust) -SigLIP 2 ViT-B/16 achieves 78.2% ImageNet zero-shot (vs Nomic Vision v1.5 at 71.0%) but has -no official ONNX model and no fastembed support as of March 2026. If an Apache 2.0 ONNX export -emerges, contributing a `SigLIP2VitB16` variant to fastembed and then updating ShrimPK's -vision channel would be a meaningful quality improvement. 
+- **Emotion model under permissive license** -- 3-dim A/D/V slot reserved, no Apache 2.0 model +- **LSH parameter tuning for BGE-small** -- hash count, bucket width optimization at 100K scale +- **SigLIP 2 fastembed support** -- 78.2% ImageNet zero-shot, no ONNX export yet From 312515e3ee1234f0744c82b7e6dd084237e62c94 Mon Sep 17 00:00:00 2001 From: Lior Cohen Date: Fri, 10 Apr 2026 05:22:42 +0300 Subject: [PATCH 2/3] fix: revert accidental hebbian_boosts changes, keep only recency epsilon MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove merged supersession-demotion changes (hebbian_boosts type change, demotion→superseded_count rename, multiplicative demotion) that belong on a separate branch - Keep only the recency tie-breaker epsilon (step 7c7) and google_result test fix from PR #13 Co-Authored-By: Claude Opus 4.6 (1M context) --- crates/shrimpk-memory/src/echo.rs | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/crates/shrimpk-memory/src/echo.rs b/crates/shrimpk-memory/src/echo.rs index 3164c77..25bc401 100644 --- a/crates/shrimpk-memory/src/echo.rs +++ b/crates/shrimpk-memory/src/echo.rs @@ -1624,6 +1624,15 @@ impl EchoEngine { } } + // 7c7. KS78: Recency tie-breaker (#13) — after all boosts and caps, add a + // negligible epsilon derived from created_at so newer memories win ties. + for result in &mut results { + if let Some(entry) = store.get(&result.memory_id) { + let recency_epsilon = (entry.created_at.timestamp_micros() as f64) * 1e-18; + result.final_score += recency_epsilon; + } + } + // 7d. 
Re-sort by final_score (similarity + hebbian boost) results.sort_by(|a, b| { b.final_score @@ -3616,9 +3625,12 @@ mod tests { assert!(results.len() >= 2, "Should have at least 2 results"); - // Find both memories in results + // Find both memories in results (the Meta memory also mentions "Google", + // so match the Google-only memory by excluding results that mention "Meta") let meta_result = results.iter().find(|r| r.content.contains("Meta")); - let google_result = results.iter().find(|r| r.content.contains("Google")); + let google_result = results + .iter() + .find(|r| r.content.contains("Google") && !r.content.contains("Meta")); assert!(meta_result.is_some(), "Meta memory should surface"); assert!(google_result.is_some(), "Google memory should surface"); From e6297078cd4747943e5c08a038ac6aae63312c86 Mon Sep 17 00:00:00 2001 From: Lior Cohen Date: Fri, 10 Apr 2026 16:15:14 +0300 Subject: [PATCH 3/3] =?UTF-8?q?fix:=20ROADMAP=20header=20v0.7.0=E2=86=92v0?= =?UTF-8?q?.7.5=20+=20CHANGELOG=20heading=20style=20(Greptile)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - docs/ROADMAP.md: "Current State" header said v0.7.0 but content describes v0.7.5 features (12 MCP tools, entity unification, etc.) - CHANGELOG.md: v0.7.5 entry used "--" instead of em dash "—", inconsistent with all other version headings in the file Co-Authored-By: Claude Opus 4.6 (1M context) --- CHANGELOG.md | 2 +- docs/ROADMAP.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 284b379..ad5f20d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,7 +6,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/). 
## [Unreleased] -## [0.7.5] -- 2026-04-10 +## [0.7.5] — 2026-04-10 ### Added - **Schema-driven fact extraction** (KS67): structured extraction pipeline replacing free-form LLM output diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 9aa8736..988af38 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -5,7 +5,7 @@ Dates are aspirational. Contributions welcome -- see Contribution Opportunities --- -## Current State -- v0.7.0 +## Current State -- v0.7.5 Released April 2026. The core pipeline is stable with 11 crates + CLI, hybrid GraphRAG retrieval, and entity unification.
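
The recency tie-breaker added in PATCH 2/3 (step 7c7) can be sketched in isolation as follows. This is a minimal illustration, not the engine's code: `ScoredResult`, `apply_recency_tiebreak`, and `RECENCY_SCALE` are hypothetical names, timestamps are plain microsecond integers instead of the store's `created_at` values, and the descending sort with `partial_cmp` is an assumption about how the re-sort in step 7d behaves. Only the `1e-18` scale factor is taken from the patch.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a scored echo result (not the real ShrimPK type).
#[derive(Debug, Clone)]
pub struct ScoredResult {
    pub memory_id: u32,
    pub final_score: f64,
    /// Creation time as microseconds since the Unix epoch.
    pub created_at_micros: i64,
}

/// Scale factor used in the patch: one second of age difference
/// moves the score by 1e-12.
pub const RECENCY_SCALE: f64 = 1e-18;

/// Add a tiny recency-derived epsilon to every score, then sort descending.
/// Every result gets nearly the same offset, so only near-ties are reordered.
pub fn apply_recency_tiebreak(results: &mut [ScoredResult]) {
    for r in results.iter_mut() {
        r.final_score += (r.created_at_micros as f64) * RECENCY_SCALE;
    }
    // Assumed re-sort: highest final_score first, NaN treated as equal.
    results.sort_by(|a, b| {
        b.final_score
            .partial_cmp(&a.final_score)
            .unwrap_or(Ordering::Equal)
    });
}
```

With two results that share a score of 0.5 and were created one day apart, the newer one sorts first, while a clearly higher-scoring older result keeps its lead. Because the epsilon is added to an `f64` score, creation times that differ by well under a millisecond may round to the same value and remain tied.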