# Benchmark Results

Back to README | All docs | Handbook

Auto-generated by `npm run bench:save`. Do not edit manually.

v1.3.0 · Generated: 2026-03-22


## Summary

| Metric | Value |
| --- | --- |
| Scenarios | 8 |
| Average compression | 2.01x |
| Best compression | 4.90x |
| Round-trip integrity | all PASS |
| Average quality score | 0.985 |
| Average entity retention | 96% |
```mermaid
pie title "Message Outcomes"
    "Preserved" : 90
    "Compressed" : 65
```

## Compression by Scenario

8 scenarios · 2.01x avg ratio · 1.00x–4.90x range · all round-trips PASS

```mermaid
xychart-beta
    title "Compression Ratio by Scenario"
    x-axis ["Coding", "Long Q&A", "Tool-heavy", "Short", "Deep", "Technical", "Structured", "Agentic"]
    y-axis "Char Ratio"
    bar [1.94, 4.90, 1.40, 1.00, 2.50, 1.00, 1.86, 1.48]
```
| Scenario | Ratio | Reduction | Token Ratio | Messages | Compressed | Preserved |
| --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 48% | 1.93 | 13 | 5 | 8 |
| Long Q&A | 4.90 | 80% | 4.88 | 10 | 4 | 6 |
| Tool-heavy | 1.40 | 29% | 1.39 | 18 | 2 | 16 |
| Short conversation | 1.00 | 0% | 1.00 | 7 | 0 | 7 |
| Deep conversation | 2.50 | 60% | 2.49 | 51 | 50 | 1 |
| Technical explanation | 1.00 | 0% | 1.00 | 11 | 0 | 11 |
| Structured content | 1.86 | 46% | 1.85 | 12 | 2 | 10 |
| Agentic coding session | 1.48 | 32% | 1.47 | 33 | 2 | 31 |
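For reference, the Reduction column is derivable from the Ratio column (a relationship inferred from the table values, not taken from the benchmark code):

```javascript
// Reduction is the share of characters removed, derived from the compression
// ratio: a 1.94x ratio leaves 1/1.94 of the input, i.e. roughly 48% smaller.
const reduction = (ratio) => Math.round((1 - 1 / ratio) * 100);

reduction(1.94); // 48
reduction(4.90); // 80
reduction(1.00); // 0
```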

## Deduplication Impact

```mermaid
xychart-beta
    title "Deduplication Impact (recencyWindow=0)"
    x-axis ["Long Q&A", "Agentic"]
    y-axis "Char Ratio"
    bar [4.00, 1.20]
    bar [4.90, 1.48]
```

First bar: no dedup · Second bar: with dedup

| Scenario | No Dedup (rw=0) | Dedup (rw=0) | No Dedup (rw=4) | Dedup (rw=4) | Deduped |
| --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.94 | 1.61 | 1.61 | 0 |
| Long Q&A | 4.00 | 4.90 | 1.76 | 1.92 | 1 |
| Tool-heavy | 1.40 | 1.40 | 1.40 | 1.40 | 0 |
| Short conversation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Deep conversation | 2.50 | 2.50 | 2.24 | 2.24 | 0 |
| Technical explanation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Structured content | 1.86 | 1.86 | 1.33 | 1.33 | 0 |
| Agentic coding session | 1.20 | 1.48 | 1.20 | 1.48 | 4 |

## Fuzzy Dedup

| Scenario | Exact Deduped | Fuzzy Deduped | Ratio | vs Base |
| --- | --- | --- | --- | --- |
| Coding assistant | 0 | 0 | 1.94 | - |
| Long Q&A | 1 | 0 | 4.90 | - |
| Tool-heavy | 0 | 0 | 1.40 | - |
| Short conversation | 0 | 0 | 1.00 | - |
| Deep conversation | 0 | 0 | 2.50 | - |
| Technical explanation | 0 | 0 | 1.00 | - |
| Structured content | 0 | 0 | 1.86 | - |
| Agentic coding session | 4 | 2 | 2.35 | +59% |
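The exact/fuzzy split above can be illustrated with a toy sketch: exact duplicates are dropped by normalized string equality, fuzzy duplicates by a similarity threshold, and the last `recencyWindow` messages are never deduped. This is a hypothetical simplification for illustration, not the library's actual `dedup.js` algorithm.

```javascript
// Toy exact + fuzzy dedup outside a recency window (hypothetical sketch).
function dedup(messages, { recencyWindow = 0, fuzzyThreshold = 0.9 } = {}) {
  const cutoff = messages.length - recencyWindow; // indices >= cutoff are protected
  const seen = [];
  const out = [];
  let deduped = 0;
  for (let i = 0; i < messages.length; i++) {
    const text = messages[i];
    if (i < cutoff) {
      const norm = text.toLowerCase().replace(/\s+/g, " ").trim();
      const exact = seen.includes(norm);
      const fuzzy = !exact && seen.some((s) => similarity(s, norm) >= fuzzyThreshold);
      if (exact || fuzzy) { deduped++; continue; }
      seen.push(norm);
    }
    out.push(text);
  }
  return { messages: out, deduped };
}

// Crude token-set Jaccard similarity as a stand-in for real fuzzy matching.
function similarity(a, b) {
  const ta = new Set(a.split(" "));
  const tb = new Set(b.split(" "));
  let inter = 0;
  for (const t of ta) if (tb.has(t)) inter++;
  return inter / (ta.size + tb.size - inter);
}
```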

## ANCS-Inspired Features

Importance scoring preserves high-value messages outside the recency window. Contradiction detection compresses superseded messages.

| Scenario | Baseline | +Importance | +Contradiction | Combined | Imp. Preserved | Contradicted |
| --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | 2.37 | 2.37 | 2.37 | 2.37 | 0 | 0 |
| Agentic coding session | 1.47 | 1.24 | 1.47 | 1.24 | 4 | 0 |
| Iterative design | 1.62 | 1.26 | 1.62 | 1.26 | 6 | 2 |
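As a rough illustration of the idea (not the library's actual `importance.js` heuristics), an importance scorer might preserve messages mentioning decisions, errors, or code even when they fall outside the recency window:

```javascript
// Toy importance scorer — hypothetical signals, chosen for illustration only.
function importanceScore(text) {
  let score = 0;
  if (/\b(decided|must|always|never|TODO)\b/i.test(text)) score += 2; // decisions/constraints
  if (/error|exception|failed/i.test(text)) score += 2;               // failure context
  if (/```|\bfunction\b|\bclass\b/.test(text)) score += 1;            // code content
  return score;
}

// Preserve messages that are recent OR score above a threshold; mark the rest compressible.
function planCompression(messages, { recencyWindow = 4, threshold = 2 } = {}) {
  const cutoff = messages.length - recencyWindow;
  return messages.map((text, i) => ({
    text,
    action: i >= cutoff || importanceScore(text) >= threshold ? "preserve" : "compress",
  }));
}
```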

## Quality Metrics

| Scenario | Entity Retention | Structural Integrity | Reference Coherence | Quality Score |
| --- | --- | --- | --- | --- |
| Coding assistant | 100% | 100% | 100% | 1.000 |
| Long Q&A | 100% | 100% | 100% | 1.000 |
| Tool-heavy | 93% | 100% | 100% | 0.972 |
| Deep conversation | 100% | 100% | 100% | 1.000 |
| Structured content | 100% | 100% | 100% | 1.000 |
| Agentic coding session | 85% | 100% | 100% | 0.939 |

## Token Budget

Target: 2000 tokens · 1/4 fit

| Scenario | Dedup | Tokens | Fits | recencyWindow | Compressed | Preserved | Deduped |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | no | 3188 | no | 0 | 50 | 1 | 0 |
| Deep conversation | yes | 3188 | no | 0 | 50 | 1 | 0 |
| Agentic coding session | no | 2223 | no | 0 | 4 | 33 | 0 |
| Agentic coding session | yes | 1900 | yes | 9 | 1 | 32 | 4 |
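One way a budget fit like the above could work (a hypothetical sketch, assuming a crude chars/4 token estimate rather than the library's real tokenizer) is to shrink the recency window until the rendered conversation fits:

```javascript
// Crude chars/4 token estimate — an assumption, not the library's tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Shrink the recency window until the rendered conversation fits the budget.
function fitToBudget(messages, { budget = 2000, startWindow = 9 } = {}) {
  let last = null;
  for (let rw = startWindow; rw >= 0; rw--) {
    // Keep the last `rw` messages verbatim; stub older ones with a short marker.
    const cutoff = messages.length - rw;
    const rendered = messages.map((m, i) => (i < cutoff ? "[compressed]" : m));
    const tokens = rendered.reduce((sum, m) => sum + estimateTokens(m), 0);
    last = { fits: tokens <= budget, recencyWindow: rw, tokens };
    if (last.fits) return last;
  }
  return last; // best effort: even rw=0 exceeds the budget
}
```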

## Bundle Size

Zero-dependency ESM library — tracked per-file to catch regressions.

| File | Size | Gzip |
| --- | --- | --- |
| adapters.js | 4.1 KB | 1.3 KB |
| classifier.js | 4.5 KB | 1.6 KB |
| classify.js | 10.7 KB | 4.3 KB |
| cluster.js | 8.4 KB | 2.8 KB |
| compress.js | 84.6 KB | 16.6 KB |
| contradiction.js | 7.5 KB | 2.7 KB |
| coreference.js | 4.2 KB | 1.5 KB |
| dedup.js | 10.0 KB | 2.8 KB |
| discourse.js | 6.6 KB | 2.4 KB |
| entities.js | 8.2 KB | 2.6 KB |
| entropy.js | 1.9 KB | 832 B |
| expand.js | 2.7 KB | 934 B |
| feedback.js | 11.6 KB | 2.9 KB |
| flow.js | 7.8 KB | 2.0 KB |
| importance.js | 4.6 KB | 1.8 KB |
| index.js | 1.8 KB | 761 B |
| ml-classifier.js | 3.0 KB | 1.2 KB |
| summarizer.js | 2.5 KB | 993 B |
| types.js | 11 B | 31 B |
| **total** | **185.0 KB** | **49.9 KB** |

## LLM vs Deterministic

Results are non-deterministic — LLM outputs vary between runs. Saved as reference data, not used for regression testing.

### Deterministic vs ollama/llama3.2

```
Coding assistant        Det ████████████░░░░░░░░░░░░░░░░░░ 1.94x
                        LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.55x

Long Q&A                Det ██████████████████████████████ 4.90x
                        LLM ███████████████████████████░░░ 4.49x

Tool-heavy              Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.40x
                        LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.28x

Deep conversation       Det ███████████████░░░░░░░░░░░░░░░ 2.50x
                        LLM ████████████████████░░░░░░░░░░ 3.28x  ★

Technical explanation   Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x

Structured content      Det ███████████░░░░░░░░░░░░░░░░░░░ 1.86x
                        LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.46x

Agentic coding session  Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.48x
                        LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.40x
```

★ = LLM wins
### Deterministic vs openai/gpt-4.1-mini

```
Coding assistant        Det ███████████░░░░░░░░░░░░░░░░░░░ 1.94x
                        LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.64x

Long Q&A                Det ███████████████████████████░░░ 4.90x
                        LLM ██████████████████████████████ 5.37x  ★

Tool-heavy              Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.40x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.12x

Deep conversation       Det ██████████████░░░░░░░░░░░░░░░░ 2.50x
                        LLM █████████████░░░░░░░░░░░░░░░░░ 2.37x

Technical explanation   Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                        LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x

Structured content      Det ██████████░░░░░░░░░░░░░░░░░░░░ 1.86x
                        LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.29x

Agentic coding session  Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.48x
                        LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.43x
```

★ = LLM wins

## Provider Summary

| Provider | Model | Avg Ratio | Avg vsDet | Round-trip | Budget Fits | Avg Time |
| --- | --- | --- | --- | --- | --- | --- |
| ollama | llama3.2 | 2.09x | 0.96 | all PASS | 1/4 | 4.2s |
| openai | gpt-4.1-mini | 2.09x | 0.92 | all PASS | 2/4 | 8.1s |

Key findings:

- LLM wins on prose-heavy scenarios: Deep conversation, Technical explanation
- Deterministic wins on structured/technical content: Coding assistant, Long Q&A, Tool-heavy, Structured content

### ollama (llama3.2)

Generated: 2026-02-25

#### Scenario details

| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.48 | 1.48 | 0.88 | 5 | 8 | PASS | 5.9s |
| | llm-escalate | 1.55 | 1.55 | 0.92 | 5 | 8 | PASS | 3.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 4.31 | 4.28 | 0.70 | 4 | 6 | PASS | 4.1s |
| | llm-escalate | 4.49 | 4.46 | 0.73 | 4 | 6 | PASS | 3.7s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 2ms |
| | llm-basic | 1.12 | 1.11 | 0.86 | 2 | 16 | PASS | 2.3s |
| | llm-escalate | 1.28 | 1.28 | 0.99 | 2 | 16 | PASS | 2.8s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 3.12 | 3.11 | 1.47 | 50 | 1 | PASS | 22.7s |
| | llm-escalate | 3.28 | 3.26 | 1.54 | 50 | 1 | PASS | 23.3s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 0 | 11 | PASS | 3.2s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 2 | 9 | PASS | 785ms |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.46 | 1.45 | 0.75 | 2 | 10 | PASS | 3.5s |
| | llm-escalate | 1.38 | 1.38 | 0.71 | 2 | 10 | PASS | 3.7s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.35 | 1.34 | 0.94 | 2 | 31 | PASS | 3.3s |
| | llm-escalate | 1.40 | 1.40 | 0.98 | 2 | 31 | PASS | 5.4s |

#### Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 12ms |
| | llm-escalate | 2593 | false | 0 | 3.08 | PASS | 132.0s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 2003 | false | 9 | 1.33 | PASS | 4.1s |

### openai (gpt-4.1-mini)

Generated: 2026-02-25

#### Scenario details

| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.64 | 1.63 | 0.98 | 5 | 8 | PASS | 5.6s |
| | llm-escalate | 1.63 | 1.63 | 0.97 | 5 | 8 | PASS | 6.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 5.37 | 5.33 | 0.87 | 4 | 6 | PASS | 5.9s |
| | llm-escalate | 5.35 | 5.31 | 0.87 | 4 | 6 | PASS | 7.0s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 0ms |
| | llm-basic | 1.11 | 1.10 | 0.85 | 2 | 16 | PASS | 3.5s |
| | llm-escalate | 1.12 | 1.12 | 0.86 | 2 | 16 | PASS | 5.3s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 2.34 | 2.33 | 1.10 | 50 | 1 | PASS | 50.4s |
| | llm-escalate | 2.37 | 2.36 | 1.11 | 50 | 1 | PASS | 50.8s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 2.6s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 3.3s |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.23 | 1.23 | 0.64 | 2 | 10 | PASS | 10.2s |
| | llm-escalate | 1.29 | 1.29 | 0.67 | 2 | 10 | PASS | 4.8s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.43 | 1.43 | 1.00 | 2 | 31 | PASS | 5.8s |
| | llm-escalate | 1.32 | 1.32 | 0.93 | 1 | 32 | PASS | 9.5s |

#### Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 10ms |
| | llm-escalate | 3391 | false | 0 | 2.35 | PASS | 280.5s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 1915 | true | 3 | 1.39 | PASS | 28.1s |

## Version History

| Version | Date | Avg Char Ratio | Avg Token Ratio | Scenarios |
| --- | --- | --- | --- | --- |
| 1.3.0 | 2026-03-22 | 2.01 | 2.00 | 8 |
| 1.2.0 | 2026-03-20 | 2.01 | 2.00 | 8 |
| 1.1.0 | 2026-03-20 | 2.01 | 2.00 | 8 |
| 1.0.0 | 2026-03-10 | 2.01 | 2.00 | 8 |

### v1.2.0 → v1.3.0

2.01x → 2.01x avg compression (0.00%)

| Scenario | v1.2.0 | v1.3.0 | Change | Token Δ |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94x | 1.94x | 0.00% | 0.00% |
| Long Q&A | 4.90x | 4.90x | 0.00% | 0.00% |
| Tool-heavy | 1.40x | 1.40x | 0.00% | 0.00% |
| Short conversation | 1.00x | 1.00x | 0.00% | 0.00% |
| Deep conversation | 2.50x | 2.50x | 0.00% | 0.00% |
| Technical explanation | 1.00x | 1.00x | 0.00% | 0.00% |
| Structured content | 1.86x | 1.86x | 0.00% | 0.00% |
| Agentic coding session | 1.48x | 1.48x | 0.00% | 0.00% |

Bundle: 183.5 KB → 185.0 KB (+0.86%)

### v1.2.0 (2026-03-20) — 2.01x avg

| Scenario | Char Ratio | Token Ratio | Compressed | Preserved |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.93 | 5 | 8 |
| Long Q&A | 4.90 | 4.88 | 4 | 6 |
| Tool-heavy | 1.40 | 1.39 | 2 | 16 |
| Short conversation | 1.00 | 1.00 | 0 | 7 |
| Deep conversation | 2.50 | 2.49 | 50 | 1 |
| Technical explanation | 1.00 | 1.00 | 0 | 11 |
| Structured content | 1.86 | 1.85 | 2 | 10 |
| Agentic coding session | 1.48 | 1.47 | 2 | 31 |

### v1.1.0 (2026-03-20) — 2.01x avg

| Scenario | Char Ratio | Token Ratio | Compressed | Preserved |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.93 | 5 | 8 |
| Long Q&A | 4.90 | 4.88 | 4 | 6 |
| Tool-heavy | 1.41 | 1.40 | 2 | 16 |
| Short conversation | 1.00 | 1.00 | 0 | 7 |
| Deep conversation | 2.50 | 2.49 | 50 | 1 |
| Technical explanation | 1.00 | 1.00 | 0 | 11 |
| Structured content | 1.86 | 1.85 | 2 | 10 |
| Agentic coding session | 1.48 | 1.47 | 2 | 31 |

### v1.0.0 (2026-03-10) — 2.01x avg

| Scenario | Char Ratio | Token Ratio | Compressed | Preserved |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.93 | 5 | 8 |
| Long Q&A | 4.90 | 4.88 | 4 | 6 |
| Tool-heavy | 1.41 | 1.40 | 2 | 16 |
| Short conversation | 1.00 | 1.00 | 0 | 7 |
| Deep conversation | 2.50 | 2.49 | 50 | 1 |
| Technical explanation | 1.00 | 1.00 | 0 | 11 |
| Structured content | 1.86 | 1.85 | 2 | 10 |
| Agentic coding session | 1.48 | 1.47 | 2 | 31 |

## Methodology

- All deterministic results use the same input → same output guarantee
- Metrics: compression ratio, token ratio, message counts, dedup counts
- Timing is excluded from baselines (hardware-dependent)
- LLM benchmarks are saved as reference data, not used for regression testing
- Round-trip integrity is verified for every scenario (compress then uncompress)
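The round-trip check in the last bullet can be sketched as follows. This is a toy lossless scheme for illustration only, not the library's actual `compress.js`/`expand.js` API:

```javascript
// Toy lossless round trip: the compressed form keeps a short visible summary
// and stashes the original text so expansion can restore it exactly.
function compressMsg(text) {
  const summary = text.split(". ")[0]; // first sentence as the visible summary
  return { summary, original: text };
}

function uncompressMsg(compressed) {
  return compressed.original;
}

// Round-trip integrity: every message must survive compress → uncompress unchanged.
function roundTripPasses(messages) {
  return messages.every((m) => uncompressMsg(compressMsg(m)) === m);
}
```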