Auto-generated by `npm run bench:save`. Do not edit manually.
v1.3.0 · Generated: 2026-03-22
| Metric | Value |
| --- | --- |
| Scenarios | 8 |
| Average compression | 2.01x |
| Best compression | 4.90x |
| Round-trip integrity | all PASS |
| Average quality score | 0.985 |
| Average entity retention | 96% |
```mermaid
pie title "Message Outcomes"
    "Preserved" : 90
    "Compressed" : 65
```
8 scenarios · 2.01x avg ratio · 1.00x – 4.90x range · all round-trips PASS
```mermaid
xychart-beta
    title "Compression Ratio by Scenario"
    x-axis ["Coding", "Long Q&A", "Tool-heavy", "Short", "Deep", "Technical", "Structured", "Agentic"]
    y-axis "Char Ratio"
    bar [1.94, 4.90, 1.40, 1.00, 2.50, 1.00, 1.86, 1.48]
```
| Scenario | Ratio | Reduction | Token Ratio | Messages | Compressed | Preserved |
| --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 48% | 1.93 | 13 | 5 | 8 |
| Long Q&A | 4.90 | 80% | 4.88 | 10 | 4 | 6 |
| Tool-heavy | 1.40 | 29% | 1.39 | 18 | 2 | 16 |
| Short conversation | 1.00 | 0% | 1.00 | 7 | 0 | 7 |
| Deep conversation | 2.50 | 60% | 2.49 | 51 | 50 | 1 |
| Technical explanation | 1.00 | 0% | 1.00 | 11 | 0 | 11 |
| Structured content | 1.86 | 46% | 1.85 | 12 | 2 | 10 |
| Agentic coding session | 1.48 | 32% | 1.47 | 33 | 2 | 31 |
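As a quick consistency check, the summary's headline numbers follow directly from the per-scenario char ratios above:

```javascript
// Per-scenario char ratios from the table above.
const ratios = [1.94, 4.90, 1.40, 1.00, 2.50, 1.00, 1.86, 1.48];

const avg = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
const best = Math.max(...ratios);

console.log(avg.toFixed(2));  // → 2.01 (matches "Average compression")
console.log(best.toFixed(2)); // → 4.90 (matches "Best compression")
```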
```mermaid
xychart-beta
    title "Deduplication Impact (recencyWindow=0)"
    x-axis ["Long Q&A", "Agentic"]
    y-axis "Char Ratio"
    bar [4.00, 1.20]
    bar [4.90, 1.48]
```

First bar: no dedup · Second bar: with dedup
| Scenario | No Dedup (rw=0) | Dedup (rw=0) | No Dedup (rw=4) | Dedup (rw=4) | Deduped |
| --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.94 | 1.61 | 1.61 | 0 |
| Long Q&A | 4.00 | 4.90 | 1.76 | 1.92 | 1 |
| Tool-heavy | 1.40 | 1.40 | 1.40 | 1.40 | 0 |
| Short conversation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Deep conversation | 2.50 | 2.50 | 2.24 | 2.24 | 0 |
| Technical explanation | 1.00 | 1.00 | 1.00 | 1.00 | 0 |
| Structured content | 1.86 | 1.86 | 1.33 | 1.33 | 0 |
| Agentic coding session | 1.20 | 1.48 | 1.20 | 1.48 | 4 |
| Scenario | Exact Deduped | Fuzzy Deduped | Ratio | vs Base |
| --- | --- | --- | --- | --- |
| Coding assistant | 0 | 0 | 1.94 | - |
| Long Q&A | 1 | 0 | 4.90 | - |
| Tool-heavy | 0 | 0 | 1.40 | - |
| Short conversation | 0 | 0 | 1.00 | - |
| Deep conversation | 0 | 0 | 2.50 | - |
| Technical explanation | 0 | 0 | 1.00 | - |
| Structured content | 0 | 0 | 1.86 | - |
| Agentic coding session | 4 | 2 | 2.35 | +59% |
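The Exact/Fuzzy columns reflect a two-pass idea: drop messages that are byte-identical after normalization, then drop near-duplicates by similarity. A minimal sketch of that idea (the `normalize`/`jaccard` helpers and the 0.9 threshold are illustrative assumptions, not the actual internals of `dedup.js`):

```javascript
// Illustrative two-pass message deduplication. Helper names and the
// 0.9 threshold are assumptions, not this library's implementation.

// Collapse whitespace and case so trivially different copies compare equal.
const normalize = (text) => text.toLowerCase().replace(/\s+/g, ' ').trim();

// Token-set Jaccard similarity for the fuzzy pass.
function jaccard(a, b) {
  const A = new Set(normalize(a).split(' '));
  const B = new Set(normalize(b).split(' '));
  let inter = 0;
  for (const t of A) if (B.has(t)) inter++;
  return inter / (A.size + B.size - inter);
}

function dedupe(messages, { fuzzyThreshold = 0.9 } = {}) {
  const kept = [];
  const seen = new Set();
  let exact = 0, fuzzy = 0;
  for (const msg of messages) {
    const key = normalize(msg.content);
    if (seen.has(key)) { exact++; continue; }  // exact pass
    if (kept.some((k) => jaccard(k.content, msg.content) >= fuzzyThreshold)) {
      fuzzy++; continue;                        // fuzzy pass
    }
    seen.add(key);
    kept.push(msg);
  }
  return { kept, exact, fuzzy };
}
```

A higher threshold is safer (fewer false merges) but catches fewer near-duplicates; the fuzzy pass is what distinguishes repeated-but-reworded messages from genuinely new content.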
Importance scoring preserves high-value messages outside the recency window. Contradiction detection compresses superseded messages.
| Scenario | Baseline | +Importance | +Contradiction | Combined | Imp. Preserved | Contradicted |
| --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | 2.37 | 2.37 | 2.37 | 2.37 | 0 | 0 |
| Agentic coding session | 1.47 | 1.24 | 1.47 | 1.24 | 4 | 0 |
| Iterative design | 1.62 | 1.26 | 1.62 | 1.26 | 6 | 2 |
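The two signals can be combined into a per-message keep/compress decision: recent messages are kept, older messages are kept only if they score high enough, and superseded messages are always compressible. A minimal sketch, assuming simple keyword heuristics for importance and an "actually, ..." pattern for contradictions (all names, weights, and patterns here are hypothetical, not the library's implementation):

```javascript
// Illustrative importance scoring + contradiction handling.
// Heuristics, weights, and option names are assumptions for
// illustration, not this library's actual implementation.

// Crude importance score: code spans, decisions, and errors rank high.
function importanceScore(msg) {
  let score = 0;
  if (/`/.test(msg.content)) score += 2;  // code spans
  if (/\b(decided|agreed|must|TODO)\b/i.test(msg.content)) score += 1;
  if (/\berror\b/i.test(msg.content)) score += 1;
  return score;
}

function planCompression(messages, { recencyWindow = 4, minImportance = 2 } = {}) {
  const superseded = new Set();
  // Naive contradiction pass: "Actually, ..." supersedes the previous message.
  messages.forEach((m, i) => {
    if (i > 0 && /^actually[,\s]/i.test(m.content)) superseded.add(i - 1);
  });
  return messages.map((m, i) => {
    const recent = i >= messages.length - recencyWindow;
    if (superseded.has(i)) return { ...m, action: 'compress' };
    if (recent || importanceScore(m) >= minImportance) return { ...m, action: 'preserve' };
    return { ...m, action: 'compress' };
  });
}
```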
| Scenario | Entity Retention | Structural Integrity | Reference Coherence | Quality Score |
| --- | --- | --- | --- | --- |
| Coding assistant | 100% | 100% | 100% | 1.000 |
| Long Q&A | 100% | 100% | 100% | 1.000 |
| Tool-heavy | 93% | 100% | 100% | 0.972 |
| Deep conversation | 100% | 100% | 100% | 1.000 |
| Structured content | 100% | 100% | 100% | 1.000 |
| Agentic coding session | 85% | 100% | 100% | 0.939 |
Target: 2000 tokens · 1/4 fit
| Scenario | Dedup | Tokens | Fits | recencyWindow | Compressed | Preserved | Deduped |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | no | 3188 | no | 0 | 50 | 1 | 0 |
| Deep conversation | yes | 3188 | no | 0 | 50 | 1 | 0 |
| Agentic coding session | no | 2223 | no | 0 | 4 | 33 | 0 |
| Agentic coding session | yes | 1900 | yes | 9 | 1 | 32 | 4 |
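One way to implement budget fitting is to retry with progressively tighter settings until the token estimate fits. A sketch under stated assumptions: `countTokens` uses a crude 4-chars-per-token estimate, and `compressWith` is a caller-supplied hook, not this library's API:

```javascript
// Illustrative budget-fitting loop. countTokens() is a stand-in
// (4 chars ≈ 1 token); compressWith() is a hypothetical hook.
const countTokens = (messages) =>
  Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);

function fitToBudget(messages, compressWith, { budget = 2000, windows = [8, 4, 2, 0] } = {}) {
  let best = { tokens: countTokens(messages), recencyWindow: null, messages };
  for (const recencyWindow of windows) {  // progressively tighter windows
    const out = compressWith(messages, { recencyWindow });
    const tokens = countTokens(out);
    if (tokens < best.tokens) best = { tokens, recencyWindow, messages: out };
    if (tokens <= budget) return { ...best, fits: true };
  }
  return { ...best, fits: best.tokens <= budget };
}
```

As the table shows, a budget can remain out of reach even at the tightest setting (Deep conversation never fits); the loop then reports the best effort rather than failing.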
Zero-dependency ESM library — tracked per-file to catch regressions.
| File | Size | Gzip |
| --- | --- | --- |
| adapters.js | 4.1 KB | 1.3 KB |
| classifier.js | 4.5 KB | 1.6 KB |
| classify.js | 10.7 KB | 4.3 KB |
| cluster.js | 8.4 KB | 2.8 KB |
| compress.js | 84.6 KB | 16.6 KB |
| contradiction.js | 7.5 KB | 2.7 KB |
| coreference.js | 4.2 KB | 1.5 KB |
| dedup.js | 10.0 KB | 2.8 KB |
| discourse.js | 6.6 KB | 2.4 KB |
| entities.js | 8.2 KB | 2.6 KB |
| entropy.js | 1.9 KB | 832 B |
| expand.js | 2.7 KB | 934 B |
| feedback.js | 11.6 KB | 2.9 KB |
| flow.js | 7.8 KB | 2.0 KB |
| importance.js | 4.6 KB | 1.8 KB |
| index.js | 1.8 KB | 761 B |
| ml-classifier.js | 3.0 KB | 1.2 KB |
| summarizer.js | 2.5 KB | 993 B |
| types.js | 11 B | 31 B |
| **total** | 185.0 KB | 49.9 KB |
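Per-file tracking like the table above can be reproduced with a short Node script that gzips each built file and records both sizes. A sketch (the directory argument is the caller's choice; nothing here is specific to this project's build layout):

```javascript
// Illustrative per-file size report: raw bytes plus gzipped bytes.
import { gzipSync } from 'node:zlib';
import { readFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';

const sizeOf = (buf) => ({ raw: buf.length, gzip: gzipSync(buf).length });

function bundleReport(dir) {
  const rows = readdirSync(dir)
    .filter((f) => f.endsWith('.js'))
    .map((f) => ({ file: f, ...sizeOf(readFileSync(join(dir, f))) }));
  const total = rows.reduce(
    (t, r) => ({ raw: t.raw + r.raw, gzip: t.gzip + r.gzip }),
    { raw: 0, gzip: 0 },
  );
  return { rows, total };
}
```

Note that gzip has fixed header overhead, which is why a tiny file like types.js (11 B) gzips to more bytes (31 B) than its raw size.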
Results are non-deterministic — LLM outputs vary between runs. Saved as reference data, not used for regression testing.
Deterministic vs ollama/llama3.2

```
Coding assistant       Det ████████████░░░░░░░░░░░░░░░░░░ 1.94x
                       LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.55x
Long Q&A               Det ██████████████████████████████ 4.90x
                       LLM ███████████████████████████░░░ 4.49x
Tool-heavy             Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.40x
                       LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.28x
Deep conversation      Det ███████████████░░░░░░░░░░░░░░░ 2.50x
                       LLM ████████████████████░░░░░░░░░░ 3.28x ★
Technical explanation  Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                       LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
Structured content     Det ███████████░░░░░░░░░░░░░░░░░░░ 1.86x
                       LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.46x
Agentic coding session Det █████████░░░░░░░░░░░░░░░░░░░░░ 1.48x
                       LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.40x
```

★ = LLM wins
Deterministic vs openai/gpt-4.1-mini

```
Coding assistant       Det ███████████░░░░░░░░░░░░░░░░░░░ 1.94x
                       LLM █████████░░░░░░░░░░░░░░░░░░░░░ 1.64x
Long Q&A               Det ███████████████████████████░░░ 4.90x
                       LLM ██████████████████████████████ 5.37x ★
Tool-heavy             Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.40x
                       LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.12x
Deep conversation      Det ██████████████░░░░░░░░░░░░░░░░ 2.50x
                       LLM █████████████░░░░░░░░░░░░░░░░░ 2.37x
Technical explanation  Det ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
                       LLM ██████░░░░░░░░░░░░░░░░░░░░░░░░ 1.00x
Structured content     Det ██████████░░░░░░░░░░░░░░░░░░░░ 1.86x
                       LLM ███████░░░░░░░░░░░░░░░░░░░░░░░ 1.29x
Agentic coding session Det ████████░░░░░░░░░░░░░░░░░░░░░░ 1.48x
                       LLM ████████░░░░░░░░░░░░░░░░░░░░░░ 1.43x
```

★ = LLM wins
| Provider | Model | Avg Ratio | Avg vsDet | Round-trip | Budget Fits | Avg Time |
| --- | --- | --- | --- | --- | --- | --- |
| ollama | llama3.2 | 2.09x | 0.96 | all PASS | 1/4 | 4.2s |
| openai | gpt-4.1-mini | 2.09x | 0.92 | all PASS | 2/4 | 8.1s |
Key findings:

- LLM wins on prose-heavy scenarios: Deep conversation (llama3.2: 3.28x vs 2.50x), Long Q&A (gpt-4.1-mini: 5.37x vs 4.90x)
- Deterministic wins or ties on structured/technical content: Coding assistant, Tool-heavy, Technical explanation (tied at 1.00x), Structured content, Agentic coding session
ollama/llama3.2 · Generated: 2026-02-25

Scenario details
| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.48 | 1.48 | 0.88 | 5 | 8 | PASS | 5.9s |
| | llm-escalate | 1.55 | 1.55 | 0.92 | 5 | 8 | PASS | 3.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 4.31 | 4.28 | 0.70 | 4 | 6 | PASS | 4.1s |
| | llm-escalate | 4.49 | 4.46 | 0.73 | 4 | 6 | PASS | 3.7s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 2ms |
| | llm-basic | 1.12 | 1.11 | 0.86 | 2 | 16 | PASS | 2.3s |
| | llm-escalate | 1.28 | 1.28 | 0.99 | 2 | 16 | PASS | 2.8s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 3.12 | 3.11 | 1.47 | 50 | 1 | PASS | 22.7s |
| | llm-escalate | 3.28 | 3.26 | 1.54 | 50 | 1 | PASS | 23.3s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 0 | 11 | PASS | 3.2s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 2 | 9 | PASS | 785ms |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.46 | 1.45 | 0.75 | 2 | 10 | PASS | 3.5s |
| | llm-escalate | 1.38 | 1.38 | 0.71 | 2 | 10 | PASS | 3.7s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.35 | 1.34 | 0.94 | 2 | 31 | PASS | 3.3s |
| | llm-escalate | 1.40 | 1.40 | 0.98 | 2 | 31 | PASS | 5.4s |
Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 12ms |
| | llm-escalate | 2593 | false | 0 | 3.08 | PASS | 132.0s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 2003 | false | 9 | 1.33 | PASS | 4.1s |
openai/gpt-4.1-mini · Generated: 2026-02-25

Scenario details
| Scenario | Method | Char Ratio | Token Ratio | vsDet | Compressed | Preserved | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Coding assistant | deterministic | 1.68 | 1.67 | - | 5 | 8 | PASS | 0ms |
| | llm-basic | 1.64 | 1.63 | 0.98 | 5 | 8 | PASS | 5.6s |
| | llm-escalate | 1.63 | 1.63 | 0.97 | 5 | 8 | PASS | 6.0s |
| Long Q&A | deterministic | 6.16 | 6.11 | - | 4 | 6 | PASS | 1ms |
| | llm-basic | 5.37 | 5.33 | 0.87 | 4 | 6 | PASS | 5.9s |
| | llm-escalate | 5.35 | 5.31 | 0.87 | 4 | 6 | PASS | 7.0s |
| Tool-heavy | deterministic | 1.30 | 1.29 | - | 2 | 16 | PASS | 0ms |
| | llm-basic | 1.11 | 1.10 | 0.85 | 2 | 16 | PASS | 3.5s |
| | llm-escalate | 1.12 | 1.12 | 0.86 | 2 | 16 | PASS | 5.3s |
| Deep conversation | deterministic | 2.12 | 2.12 | - | 50 | 1 | PASS | 3ms |
| | llm-basic | 2.34 | 2.33 | 1.10 | 50 | 1 | PASS | 50.4s |
| | llm-escalate | 2.37 | 2.36 | 1.11 | 50 | 1 | PASS | 50.8s |
| Technical explanation | deterministic | 1.00 | 1.00 | - | 0 | 11 | PASS | 1ms |
| | llm-basic | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 2.6s |
| | llm-escalate | 1.00 | 1.00 | 1.00 | 1 | 10 | PASS | 3.3s |
| Structured content | deterministic | 1.93 | 1.92 | - | 2 | 10 | PASS | 0ms |
| | llm-basic | 1.23 | 1.23 | 0.64 | 2 | 10 | PASS | 10.2s |
| | llm-escalate | 1.29 | 1.29 | 0.67 | 2 | 10 | PASS | 4.8s |
| Agentic coding session | deterministic | 1.43 | 1.43 | - | 2 | 31 | PASS | 1ms |
| | llm-basic | 1.43 | 1.43 | 1.00 | 2 | 31 | PASS | 5.8s |
| | llm-escalate | 1.32 | 1.32 | 0.93 | 1 | 32 | PASS | 9.5s |
Token Budget (target: 2000 tokens)

| Scenario | Method | Tokens | Fits | recencyWindow | Ratio | Round-trip | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deep conversation | deterministic | 3738 | false | 0 | 2.12 | PASS | 10ms |
| | llm-escalate | 3391 | false | 0 | 2.35 | PASS | 280.5s |
| Agentic coding session | deterministic | 1957 | true | 9 | 1.36 | PASS | 2ms |
| | llm-escalate | 1915 | true | 3 | 1.39 | PASS | 28.1s |
| Version | Date | Avg Char Ratio | Avg Token Ratio | Scenarios |
| --- | --- | --- | --- | --- |
| 1.3.0 | 2026-03-22 | 2.01 | 2.00 | 8 |
| 1.2.0 | 2026-03-20 | 2.01 | 2.00 | 8 |
| 1.1.0 | 2026-03-20 | 2.01 | 2.00 | 8 |
| 1.0.0 | 2026-03-10 | 2.01 | 2.00 | 8 |
2.01x → 2.01x avg compression (0.00%)
| Scenario | v1.2.0 | v1.3.0 | Change | Token Δ | Trend |
| --- | --- | --- | --- | --- | --- |
| Coding assistant | 1.94x | 1.94x | 0.00% | 0.00% | ─ |
| Long Q&A | 4.90x | 4.90x | 0.00% | 0.00% | ─ |
| Tool-heavy | 1.40x | 1.40x | 0.00% | 0.00% | ─ |
| Short conversation | 1.00x | 1.00x | 0.00% | 0.00% | ─ |
| Deep conversation | 2.50x | 2.50x | 0.00% | 0.00% | ─ |
| Technical explanation | 1.00x | 1.00x | 0.00% | 0.00% | ─ |
| Structured content | 1.86x | 1.86x | 0.00% | 0.00% | ─ |
| Agentic coding session | 1.48x | 1.48x | 0.00% | 0.00% | ─ |
Bundle: 183.5 KB → 185.0 KB (+0.86%)
v1.2.0 (2026-03-20) · 2.01x avg

| Scenario | Char Ratio | Token Ratio | Compressed | Preserved |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.93 | 5 | 8 |
| Long Q&A | 4.90 | 4.88 | 4 | 6 |
| Tool-heavy | 1.40 | 1.39 | 2 | 16 |
| Short conversation | 1.00 | 1.00 | 0 | 7 |
| Deep conversation | 2.50 | 2.49 | 50 | 1 |
| Technical explanation | 1.00 | 1.00 | 0 | 11 |
| Structured content | 1.86 | 1.85 | 2 | 10 |
| Agentic coding session | 1.48 | 1.47 | 2 | 31 |
v1.1.0 (2026-03-20) · 2.01x avg

| Scenario | Char Ratio | Token Ratio | Compressed | Preserved |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.93 | 5 | 8 |
| Long Q&A | 4.90 | 4.88 | 4 | 6 |
| Tool-heavy | 1.41 | 1.40 | 2 | 16 |
| Short conversation | 1.00 | 1.00 | 0 | 7 |
| Deep conversation | 2.50 | 2.49 | 50 | 1 |
| Technical explanation | 1.00 | 1.00 | 0 | 11 |
| Structured content | 1.86 | 1.85 | 2 | 10 |
| Agentic coding session | 1.48 | 1.47 | 2 | 31 |
v1.0.0 (2026-03-10) · 2.01x avg

| Scenario | Char Ratio | Token Ratio | Compressed | Preserved |
| --- | --- | --- | --- | --- |
| Coding assistant | 1.94 | 1.93 | 5 | 8 |
| Long Q&A | 4.90 | 4.88 | 4 | 6 |
| Tool-heavy | 1.41 | 1.40 | 2 | 16 |
| Short conversation | 1.00 | 1.00 | 0 | 7 |
| Deep conversation | 2.50 | 2.49 | 50 | 1 |
| Technical explanation | 1.00 | 1.00 | 0 | 11 |
| Structured content | 1.86 | 1.85 | 2 | 10 |
| Agentic coding session | 1.48 | 1.47 | 2 | 31 |
- All deterministic results carry a same input → same output guarantee
- Metrics: compression ratio, token ratio, message counts, dedup counts
- Timing is excluded from baselines (hardware-dependent)
- LLM benchmarks are saved as reference data, not used for regression testing
- Round-trip integrity is verified for every scenario (compress then uncompress)
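The metrics listed above reduce to simple arithmetic. A sketch, with a crude 4-chars-per-token estimate standing in for a real tokenizer and the compressed/round-tripped message lists supplied by the caller (this is not the library's reporting code):

```javascript
// Illustrative metric computation: char ratio, token ratio, round-trip check.
const chars = (msgs) => msgs.reduce((n, m) => n + m.content.length, 0);
const tokens = (msgs) => Math.ceil(chars(msgs) / 4); // crude 4-chars/token stand-in

function report(original, compressed, roundTripped) {
  return {
    charRatio: +(chars(original) / chars(compressed)).toFixed(2),
    tokenRatio: +(tokens(original) / tokens(compressed)).toFixed(2),
    // Round-trip integrity: uncompress(compress(x)) must reproduce x.
    roundTrip: JSON.stringify(roundTripped) === JSON.stringify(original) ? 'PASS' : 'FAIL',
  };
}
```

Token ratio tracks char ratio closely (2.50 vs 2.49 above) because most tokenizers approximate a fixed chars-per-token rate on English text.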