v0.10.0 — omc-memory-plus axes 1-4: 5,356× context compression on real Claude Code dev work

RandomCoder-lab released this on 18 May 01:53

This release pushes the OMC Memory+ compression ceiling beyond v1.0's 297× along four orthogonal axes. All four ship with round-trip verification against this codebase's own chapter write-ups.

Headline

| axis | mechanism | measured win |
|---|---|---|
| 1 | Merkle manifest hashes | 5,356× context compression (19 chapters → 1 hash) |
| 2 | Cross-namespace dedup pool | 5× disk on a 5-way duplicate (scales linearly with N namespaces) |
| 3 | Aged-tier zlib (`OMCZ` magic) | 2.19× disk on Markdown |
| 4 | Substrate tokenizer (`OMCT` magic) | 2.37× disk on OMC source (≈ ties raw zlib's 2.48× on the same source) |
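Axis 1 works because a manifest can itself be an ordinary content-addressed body: the ordered list of its leaf hashes. The sketch below models that with an in-memory map. `Store`, `put`, `create_manifest`, and `expand_manifest` are hypothetical names (not the signatures of the MCP tools), and 64-bit FNV-1a is only a stand-in for whatever digest `memory.rs` actually uses. Re-hashing each leaf on expansion is an assumed form of the round-trip verification described above.

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a: a stand-in digest for this sketch only. A real
/// content-addressed store would use a cryptographic hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hex-encode the digest so it can serve as a content hash string.
fn hash_hex(bytes: &[u8]) -> String {
    format!("{:016x}", fnv1a64(bytes))
}

/// Hypothetical in-memory content-addressed store (hash -> body).
#[derive(Default)]
struct Store {
    bodies: HashMap<String, String>,
}

impl Store {
    /// Store a body under its own hash; identical bodies are stored once.
    fn put(&mut self, body: &str) -> String {
        let h = hash_hex(body.as_bytes());
        self.bodies.entry(h.clone()).or_insert_with(|| body.to_string());
        h
    }

    /// A manifest is just another body: the newline-joined leaf hashes.
    /// Its hash therefore commits to every leaf and to their order.
    fn create_manifest(&mut self, leaves: &[String]) -> String {
        self.put(&leaves.join("\n"))
    }

    /// Recall a manifest and fetch every leaf, re-hashing each body to check
    /// it still matches its address. None if anything is missing or corrupt.
    fn expand_manifest(&self, manifest_hash: &str) -> Option<Vec<String>> {
        let manifest = self.bodies.get(manifest_hash)?;
        manifest
            .lines()
            .map(|leaf| {
                let body = self.bodies.get(leaf)?;
                (hash_hex(body.as_bytes()) == leaf).then(|| body.clone())
            })
            .collect()
    }
}

fn main() {
    let mut store = Store::default();
    // 19 chapter bodies -> 19 leaf hashes.
    let leaves: Vec<String> = (1..=19)
        .map(|i| store.put(&format!("chapter {i} text")))
        .collect();
    // 19 leaf hashes -> 1 manifest hash that can be carried instead.
    let manifest = store.create_manifest(&leaves);
    let chapters = store.expand_manifest(&manifest).expect("all leaves present");
    println!("manifest {manifest} expands to {} chapters", chapters.len());
}
```

Because the manifest body lists leaf hashes in order, changing, dropping, or reordering any chapter produces a different manifest hash, which is what lets one short hash stand in for all 19 chapters.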

What's new

4 new MCP tools, plus 2 storage changes

  • `omc_memory_create_manifest(namespace, entries)` — bundle N leaf hashes into 1 manifest hash
  • `omc_memory_recall_manifest(content_hash, expand?)` — recall manifest, optionally fetch all leaves
  • `omc_memory_compact(namespace, age_threshold_secs)` — re-deflate aged pool bodies as OMCZ
  • `omc_memory_compact_substrate(namespace, age_threshold_secs)` — re-encode aged bodies via substrate tokenizer as OMCT
  • Auto-decompression of OMCZ + OMCT bodies on recall (transparent)
  • Cross-namespace dedup pool at `~/.omc/memory/_pool//.txt`
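Both compaction tools take an `age_threshold_secs` and tag whatever they rewrite with a 4-byte magic. A minimal sketch of that selection-and-tagging step follows, assuming each pool body records a last-write time in Unix seconds. The encoder is passed in as a closure because the OMCZ and OMCT payload formats aren't described here, and the toy run-length encoder only stands in for flate2's deflate. `PoolBody`, `compact`, `toy_rle`, and the keep-only-if-smaller rule are illustrative assumptions, not the code in `memory.rs`.

```rust
/// Magic prefixes named in the release notes.
const OMCZ: &[u8; 4] = b"OMCZ";
const OMCT: &[u8; 4] = b"OMCT";

/// Hypothetical pool entry: stored bytes plus last-write time (Unix seconds).
struct PoolBody {
    bytes: Vec<u8>,
    written_at: u64,
}

/// True if a body already carries one of the compaction magics.
fn is_compacted(bytes: &[u8]) -> bool {
    bytes.starts_with(OMCZ) || bytes.starts_with(OMCT)
}

/// Re-encode bodies at least `age_threshold_secs` old with `encode`, prefixed
/// by `magic`. Already-compacted bodies are skipped, and a body is replaced
/// only if the tagged result is smaller. Returns the number of bytes saved.
fn compact<F>(
    bodies: &mut [PoolBody],
    now: u64,
    age_threshold_secs: u64,
    magic: &[u8; 4],
    encode: F,
) -> usize
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let mut saved = 0;
    for body in bodies.iter_mut() {
        let aged = now.saturating_sub(body.written_at) >= age_threshold_secs;
        if !aged || is_compacted(&body.bytes) {
            continue;
        }
        let mut tagged = magic.to_vec();
        tagged.extend(encode(body.bytes.as_slice()));
        if tagged.len() < body.bytes.len() {
            saved += body.bytes.len() - tagged.len();
            body.bytes = tagged;
        }
    }
    saved
}

/// Toy run-length encoder: (run length, byte) pairs, runs capped at 255.
/// Stands in for deflate in this sketch only.
fn toy_rle(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        let mut run = 1;
        while i + run < input.len() && input[i + run] == b && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

fn main() {
    let mut pool = vec![
        PoolBody { bytes: vec![b'#'; 200], written_at: 0 },     // aged, compressible
        PoolBody { bytes: vec![b'#'; 200], written_at: 9_990 }, // younger than threshold
    ];
    let saved = compact(&mut pool, 10_000, 3_600, OMCZ, toy_rle);
    println!("saved {saved} bytes; young body untouched: {}", pool[1].bytes.len() == 200);
}
```

Skipping bodies that already start with a magic keeps repeated compaction runs idempotent; in this sketch the magic prefix is the only signal a reader needs to pick a decoder.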

Architecture

  • All bodies content-addressed to a global pool with 256-shard fanout by hash top byte
  • Per-namespace dirs hold only the chronological index (`_index.jsonl`)
  • Recall: pool first → legacy per-namespace fallback → `maybe_decompress` (OMCZ / OMCT / plain)
  • `flate2` added as an `omnimcode-core` dependency (`rust_backend` feature, so no system zlib is required)
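The write and recall paths can be modeled with two maps: a global pool keyed by content hash and, per namespace, an append-only list of hashes standing in for `_index.jsonl`. This is a minimal in-memory sketch, assuming hex-encoded hashes whose first two digits are the "top byte" that selects one of 256 shards. `MemoryModel`, `remember`, `recall`, `classify`, `shard_of`, and `content_hash` are hypothetical names, FNV-1a stands in for the real digest, and actual OMCZ/OMCT decoding is omitted. Writing one body into five namespaces stores one pool body and five index entries, which is the 5× Axis 2 figure in miniature.

```rust
use std::collections::HashMap;

/// FNV-1a 64-bit, hex-encoded: a stand-in digest for this sketch only.
fn content_hash(bytes: &[u8]) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    format!("{:016x}", h)
}

/// Shard index = top byte of the hash (its first two hex digits): 0..=255.
fn shard_of(hash: &str) -> u8 {
    u8::from_str_radix(&hash[..2], 16).expect("hex-encoded hash")
}

/// What recall hands back after checking the 4-byte magic.
#[derive(Debug, PartialEq)]
enum Decoded {
    Plain(Vec<u8>),
    Zlib(Vec<u8>),      // OMCZ payload: would still need inflating
    Substrate(Vec<u8>), // OMCT payload: would still need detokenizing
}

/// Magic dispatch in the spirit of `maybe_decompress`; decoding is omitted.
fn classify(bytes: &[u8]) -> Decoded {
    if let Some(rest) = bytes.strip_prefix(b"OMCZ") {
        Decoded::Zlib(rest.to_vec())
    } else if let Some(rest) = bytes.strip_prefix(b"OMCT") {
        Decoded::Substrate(rest.to_vec())
    } else {
        Decoded::Plain(bytes.to_vec())
    }
}

/// Hypothetical model of the storage layout.
#[derive(Default)]
struct MemoryModel {
    pool: HashMap<String, Vec<u8>>,             // global: hash -> body
    legacy: HashMap<(String, String), Vec<u8>>, // old layout: (namespace, hash) -> body
    index: HashMap<String, Vec<String>>,        // namespace -> chronological hashes
}

impl MemoryModel {
    /// The body goes to the shared pool once; each namespace only logs the hash.
    fn remember(&mut self, namespace: &str, body: &[u8]) -> String {
        let hash = content_hash(body);
        self.pool.entry(hash.clone()).or_insert_with(|| body.to_vec());
        self.index.entry(namespace.to_string()).or_default().push(hash.clone());
        hash
    }

    /// Pool first, then the legacy per-namespace store, then magic dispatch.
    fn recall(&self, namespace: &str, hash: &str) -> Option<Decoded> {
        let raw = self
            .pool
            .get(hash)
            .or_else(|| self.legacy.get(&(namespace.to_string(), hash.to_string())))?;
        Some(classify(raw))
    }
}

fn main() {
    let mut mem = MemoryModel::default();
    for ns in ["ns1", "ns2", "ns3", "ns4", "ns5"] {
        mem.remember(ns, b"# Chapter 1\n...");
    }
    let hash = &mem.index["ns3"][0];
    println!(
        "{} pool body, {} index entries, shard {}",
        mem.pool.len(),
        mem.index.values().map(Vec::len).sum::<usize>(),
        shard_of(hash)
    );
}
```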

How the compounding works

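Axis 1 attacks context cost (tokens in the LLM's working set). Axes 2-4 attack disk cost (bytes on the filesystem). Axis 1 is what the LLM pays on every turn; axes 2-4 are what the user pays in storage. Because they target different scarce resources, the context win and the disk wins compound independently, and within disk the dedup and zlib ratios multiply (5× × 2.19× ≈ 11×).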

Example — 19 chapters duplicated across 5 namespaces, all aged into Axis 3 compaction:

| version | disk bytes | context tokens needed to reference everything |
|---|---|---|
| v1.0 naive | 570,760 (95 files) | 95 hash refs = 475 tokens |
| v0.9.2 pool dedup | 114,152 (19 files) | 95 refs = 475 tokens |
| v0.9.3 + zlib aged | ~52,000 | 95 refs = 475 tokens |
| v0.9.1 manifest | (same disk) | 5 tokens (1 manifest hash) |
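A quick check on the table's own numbers, assuming the measured 2.19× Markdown ratio for the aged zlib tier and the 5 tokens per hash implied by "95 hash refs = 475 tokens". The two disk ratios multiply, while the manifest's context saving in this example is measured against 95 individual hash references:

```latex
\begin{aligned}
\text{dedup (Axis 2):}\quad & \frac{570{,}760}{114{,}152} = 5\times \\
\text{aged zlib (Axis 3):}\quad & \frac{114{,}152}{2.19} \approx 52{,}124 \text{ bytes} \\
\text{disk, combined:}\quad & 5 \times 2.19 = 10.95\times \quad \left(\tfrac{570{,}760}{52{,}124} \approx 10.95\right) \\
\text{context (Axis 1):}\quad & \frac{95 \times 5 \text{ tokens}}{1 \times 5 \text{ tokens}} = \frac{475}{5} = 95\times
\end{aligned}
```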

The Axis 1 manifest hash is the headline win for LLM context cost. The other axes are the foundation that keeps disk + retrieval cheap as memory grows.

Honest framing on Axis 4

Substrate tokenizer compaction was hypothesized to beat raw zlib on OMC-flavored content, because the substrate dictionary was tuned for OMC syntax. Measured: 2.37× vs raw zlib's 2.48× on the same content, so the two are essentially tied, with the tokenizer slightly behind. Axis 4 still ships, as the substrate-native compression path that enables future Axis 6 HBit dual-band work, even though its raw byte savings are on par with Axis 3.
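The release does not document the OMCT format. Purely to illustrate what a dictionary tokenizer with varint codes can look like (the Files section mentions varint helpers in `memory.rs`), here is a hypothetical greedy scheme: each dictionary hit becomes its index as an unsigned LEB128 varint, and runs of unmatched bytes are emitted as a length-prefixed literal block. The dictionary, the code layout, and the names `tokenize`, `detokenize`, `write_varint`, `read_varint`, and `flush_literal` are assumptions, not the substrate tokenizer itself.

```rust
/// Append `v` as an unsigned LEB128 varint: 7 bits per byte, high bit = "more".
fn write_varint(out: &mut Vec<u8>, mut v: u64) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Read one LEB128 varint at `*pos`, advancing it. None on truncation/overflow.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut v, mut shift) = (0u64, 0u32);
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        v |= ((byte & 0x7f) as u64) << shift;
        if byte & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
}

/// Emit pending literal bytes as: code 0, varint length, raw bytes.
fn flush_literal(out: &mut Vec<u8>, literal: &mut Vec<u8>) {
    if !literal.is_empty() {
        write_varint(out, 0);
        write_varint(out, literal.len() as u64);
        out.append(literal);
    }
}

/// Code 0 = literal block; code k >= 1 = dictionary entry k - 1 (greedy longest match).
fn tokenize(input: &[u8], dict: &[&[u8]]) -> Vec<u8> {
    let (mut out, mut literal, mut i) = (Vec::new(), Vec::new(), 0);
    while i < input.len() {
        let best = dict
            .iter()
            .enumerate()
            .filter(|&(_, entry)| !entry.is_empty() && input[i..].starts_with(entry))
            .max_by_key(|&(_, entry)| entry.len());
        match best {
            Some((idx, entry)) => {
                flush_literal(&mut out, &mut literal);
                write_varint(&mut out, idx as u64 + 1);
                i += entry.len();
            }
            None => {
                literal.push(input[i]);
                i += 1;
            }
        }
    }
    flush_literal(&mut out, &mut literal);
    out
}

/// Inverse of `tokenize`; None if the code stream is malformed.
fn detokenize(codes: &[u8], dict: &[&[u8]]) -> Option<Vec<u8>> {
    let (mut out, mut pos) = (Vec::new(), 0);
    while pos < codes.len() {
        match read_varint(codes, &mut pos)? {
            0 => {
                let len = read_varint(codes, &mut pos)? as usize;
                let end = pos.checked_add(len)?;
                out.extend_from_slice(codes.get(pos..end)?);
                pos = end;
            }
            code => out.extend_from_slice(dict.get(code as usize - 1)?),
        }
    }
    Some(out)
}

fn main() {
    // Illustrative "OMC-tuned" dictionary; the real substrate dictionary is not shown.
    let dict: [&[u8]; 3] = [b"fn ", b"let ", b"return "];
    let src = b"fn area(w, h) { let a = w * h; return a; }";
    let codes = tokenize(src, &dict);
    assert_eq!(detokenize(&codes, &dict).as_deref(), Some(&src[..]));
    println!("{} bytes -> {} bytes", src.len(), codes.len());
}
```

Unlike deflate, this sketch cannot exploit repeats that are absent from its fixed dictionary, so its savings depend entirely on how well the dictionary matches the content.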

Still on the roadmap

| axis | mechanism | est. additional win |
|---|---|---|
| 5 | Delta compression between similar entries | 10-100× on iterative content |
| 6 | HBit dual-band codec | 2-3× over Axis 4 |
| 7 | LLM-assisted lossy + hash verification | 10-50× more on prose with regen |
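Axis 5 is roadmap only, so there is no implementation to show. To illustrate why iterative content is a good delta target, here is a deliberately simple stand-in: store only the bytes that changed between the common prefix and common suffix of two versions. This prefix/suffix trim is not the planned Axis 5 algorithm, and `Delta`, `diff`, and `apply` are hypothetical names.

```rust
/// Minimal delta against a base version: keep `prefix` leading bytes and
/// `suffix` trailing bytes of the base, and replace the middle with `middle`.
#[derive(Debug, PartialEq)]
struct Delta {
    prefix: usize,
    suffix: usize,
    middle: Vec<u8>,
}

/// Trim the longest common prefix, then the longest common suffix that does
/// not overlap it in either buffer, and keep only what changed in between.
fn diff(base: &[u8], next: &[u8]) -> Delta {
    let prefix = base.iter().zip(next).take_while(|(a, b)| a == b).count();
    let max_suffix = base.len().min(next.len()) - prefix;
    let suffix = base
        .iter()
        .rev()
        .zip(next.iter().rev())
        .take(max_suffix)
        .take_while(|(a, b)| a == b)
        .count();
    Delta { prefix, suffix, middle: next[prefix..next.len() - suffix].to_vec() }
}

/// Rebuild the newer version from the base plus the delta.
fn apply(base: &[u8], delta: &Delta) -> Option<Vec<u8>> {
    if delta.prefix + delta.suffix > base.len() {
        return None;
    }
    let mut out = base[..delta.prefix].to_vec();
    out.extend_from_slice(&delta.middle);
    out.extend_from_slice(&base[base.len() - delta.suffix..]);
    Some(out)
}

fn main() {
    let v1: &[u8] = b"Axis 2 dedups bodies across namespaces via a shared pool.";
    let v2: &[u8] = b"Axis 2 dedups identical bodies across namespaces via a shared pool.";
    let delta = diff(v1, v2);
    assert_eq!(apply(v1, &delta).as_deref(), Some(v2));
    println!("stored {} changed bytes instead of {}", delta.middle.len(), v2.len());
}
```

A single inserted word costs its own bytes plus two lengths, regardless of chapter size; a real delta codec would also need to handle several scattered edits, which this trim does not.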

Tests

1111/1111 OMC tests pass. An end-to-end MCP integration test verifies round-trip recall on both Markdown and OMC source.

Files

  • `omnimcode-core/src/memory.rs` — Axis 1-4 implementations + maybe_decompress + varint helpers
  • `omnimcode-core/Cargo.toml` — flate2 added
  • `omnimcode-mcp/src/main.rs` — 4 new tool registrations + dispatch