Commit ac04bef

Merge pull request #19 from SimplyLiz/feature/v2-improvements
feat: v2 compression features — quality metrics, flow detection, tiered budget, depth control
2 parents 11cabc3 + a75f1d4 commit ac04bef

37 files changed

Lines changed: 6456 additions & 254 deletions

CHANGELOG.md

Lines changed: 32 additions & 4 deletions
@@ -7,12 +7,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [1.2.0] - 2026-03-20
+
 ### Added

-- **Importance-weighted retention** (`importanceScoring: true`) — per-message importance scoring based on forward-reference density (how many later messages share entities with this one), decision/correction content signals, and recency. Messages scoring above `importanceThreshold` (default 0.35) are preserved even outside the recency window. `forceConverge` truncates low-importance messages first. New stats: `messages_importance_preserved`.
-- **Contradiction detection** (`contradictionDetection: true`) — detects later messages that correct or override earlier ones using topic-overlap gating (word-level Jaccard) and correction signal patterns (`actually`, `don't use`, `instead`, `scratch that`, etc.). Superseded messages are compressed with a provenance annotation (`[cce:superseded by ...]`) linking to the correction. New stats: `messages_contradicted`. New decision action: `contradicted`.
-- New exports: `computeImportance`, `scoreContentSignals`, `DEFAULT_IMPORTANCE_THRESHOLD`, `analyzeContradictions` for standalone use outside `compress()`.
-- New types: `ImportanceMap`, `ContradictionAnnotation`.
+- **Quality metrics** — `entity_retention`, `structural_integrity`, `reference_coherence`, and composite `quality_score` (0–1) computed automatically on every compression. Tracks identifier preservation, code fence survival, and reference coherence.
+- **Relevance threshold** (`relevanceThreshold`) — drops low-value messages to compact stubs instead of producing low-quality summaries. Consecutive stubs grouped. New stat: `messages_relevance_dropped`.
+- **Tiered budget strategy** (`budgetStrategy: 'tiered'`) — alternative to binary search that keeps recency window fixed and progressively compresses older content (tighten → stub → truncate).
+- **Entropy scorer** (`entropyScorer`) — plug in a small causal LM for information-theoretic sentence scoring. Modes: `'augment'` (weighted average with heuristic) or `'replace'` (entropy only).
+- **Conversation flow detection** (`conversationFlow: true`) — groups Q&A pairs, request→action→confirmation chains, corrections, and acknowledgments into compression units for more coherent summaries.
+- **Cross-message coreference** (`coreference: true`) — inlines entity definitions into compressed summaries when a preserved message references an entity defined only in a compressed message.
+- **Semantic clustering** (`semanticClustering: true`) — groups consecutive messages by topic using TF-IDF cosine similarity + entity overlap Jaccard, compresses each cluster as a unit.
+- **Compression depth** (`compressionDepth`) — `'gentle'` (default), `'moderate'` (tighter budgets), `'aggressive'` (entity-only stubs), `'auto'` (progressive escalation until `tokenBudget` fits).
+- **Discourse-aware summarization** (`discourseAware: true`) — experimental EDU-lite decomposition with dependency tracking. Reduces ratio 8–28% without a custom ML scorer; use exported `segmentEDUs`/`scoreEDUs`/`selectEDUs` directly instead.
+- **ML token classifier** (`mlTokenClassifier`) — per-token keep/remove classification via user-provided model (LLMLingua-2 style). Includes `createMockTokenClassifier` for testing.
+- **Importance-weighted retention** (`importanceScoring: true`) — per-message importance scoring based on forward-reference density, decision/correction content signals, and recency. Default threshold raised to 0.65.
+- **Contradiction detection** (`contradictionDetection: true`) — detects later messages that correct earlier ones. Superseded messages compressed with provenance annotation.
+- **A/B comparison tool** (`npm run bench:compare`) — side-by-side comparison of default vs v2 features.
+- **V2 Features Comparison** section in benchmark output — per-feature and recommended combo vs default.
+- **Adversarial test suite** — 8 edge-case tests (pronoun-heavy, scattered entities, correction chains, code-interleaved prose, near-duplicates, 10k+ char messages, mixed SQL/JSON/bash, full round-trip with all features).
+- New modules: `entities.ts`, `entropy.ts`, `flow.ts`, `coreference.ts`, `cluster.ts`, `discourse.ts`, `ml-classifier.ts`.
+- New types: `ImportanceMap`, `ContradictionAnnotation`, `MLTokenClassifier`, `TokenClassification`, `FlowChain`, `MessageCluster`, `EDU`, `EntityDefinition`.
+- Comprehensive [V2 features documentation](docs/v2-features.md) with tradeoff analysis per feature.
+
+### Changed
+
+- Adaptive summary budgets scale with content density when `compressionDepth` is set to `'moderate'` or higher (entity-dense content gets up to 45% budget, sparse content down to 15%).
+- Default path (no v2 options) produces identical output to v1.1.0 — all new features are opt-in.
+- Quality metrics section added to benchmark reporter and generated docs.
+
+### Fixed
+
+- Flow chains no longer skip non-member messages between chain endpoints.
+- Semantic clusters restricted to consecutive indices to preserve round-trip ordering.
+- Flow chains exclude messages with code fences to prevent structural integrity loss.

 ## [1.1.0] - 2026-03-19

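Several of the features in this changelog rely on set-overlap measures: contradiction detection gates on word-level Jaccard, and semantic clustering combines TF-IDF cosine similarity with entity-overlap Jaccard. Below is a minimal sketch of the word-level variant only. The tokenizer regex and the names `wordSet` and `jaccard` are assumptions made for illustration; they are not exports of this package, and the threshold the library applies is not stated in the source.

```typescript
// Split text into a set of lowercase word tokens.
// The token pattern is an assumption; the library's tokenizer may differ.
function wordSet(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9_]+/g) || [];
  return new Set<string>(words);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Returns 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((w) => {
    if (b.has(w)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

const earlier = wordSet('use redis for caching');
const later = wordSet('actually use memcached for caching');
console.log(jaccard(earlier, later)); // 0.5
```

The two example messages share 3 of 6 distinct words, so the score is 0.5. A topic-overlap gate would compare a score like this against a cutoff before looking for correction signals such as `actually` or `instead`.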
bench/baseline.ts

Lines changed: 31 additions & 0 deletions
@@ -46,6 +46,13 @@ export interface RetentionResult {
   structuralRetention: number;
 }

+export interface QualityResult {
+  entityRetention: number;
+  structuralIntegrity: number;
+  referenceCoherence: number;
+  qualityScore: number;
+}
+
 export interface AncsResult {
   baselineRatio: number;
   importanceRatio: number;
@@ -62,6 +69,7 @@ export interface BenchmarkResults {
   fuzzyDedup: Record<string, FuzzyDedupResult>;
   bundleSize: Record<string, BundleSizeResult>;
   retention?: Record<string, RetentionResult>;
+  quality?: Record<string, QualityResult>;
   ancs?: Record<string, AncsResult>;
 }

@@ -1192,6 +1200,13 @@ export function generateBenchmarkDocs(baselinesDir: string, outputPath: string):
   lines.push(`| Average compression | ${fix(avgR)}x |`);
   lines.push(`| Best compression | ${fix(Math.max(...ratios))}x |`);
   lines.push(`| Round-trip integrity | all PASS |`);
+  if (latest.results.quality && Object.keys(latest.results.quality).length > 0) {
+    const qualityEntries = Object.values(latest.results.quality);
+    const avgQ = qualityEntries.reduce((s, q) => s + q.qualityScore, 0) / qualityEntries.length;
+    lines.push(`| Average quality score | ${fix(avgQ, 3)} |`);
+    const avgER = qualityEntries.reduce((s, q) => s + q.entityRetention, 0) / qualityEntries.length;
+    lines.push(`| Average entity retention | ${(avgER * 100).toFixed(0)}% |`);
+  }
   lines.push('');

   // --- Pie chart: message outcome distribution ---
@@ -1219,6 +1234,22 @@ export function generateBenchmarkDocs(baselinesDir: string, outputPath: string):
     lines.push('');
   }

+  // --- Quality ---
+  if (latest.results.quality && Object.keys(latest.results.quality).length > 0) {
+    lines.push('## Quality Metrics');
+    lines.push('');
+    lines.push(
+      '| Scenario | Entity Retention | Structural Integrity | Reference Coherence | Quality Score |',
+    );
+    lines.push('| --- | --- | --- | --- | --- |');
+    for (const [name, q] of Object.entries(latest.results.quality)) {
+      lines.push(
+        `| ${name} | ${(q.entityRetention * 100).toFixed(0)}% | ${(q.structuralIntegrity * 100).toFixed(0)}% | ${(q.referenceCoherence * 100).toFixed(0)}% | ${q.qualityScore.toFixed(3)} |`,
+      );
+    }
+    lines.push('');
+  }
+
   // --- Token budget ---
   lines.push(...generateTokenBudgetSection(latest.results));
   lines.push('');

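The table-rendering hunk in `bench/baseline.ts` can be exercised in isolation. The sketch below copies the header and row formatting from the diff into a standalone helper. `renderQualityTable` is a hypothetical name that does not appear in the commit, and the two sample entries reuse values from the `quality` block of `bench/baselines/current.json`.

```typescript
interface QualityResult {
  entityRetention: number;
  structuralIntegrity: number;
  referenceCoherence: number;
  qualityScore: number;
}

// Hypothetical helper: same header and row format as the diff,
// extracted so it can be called without the rest of generateBenchmarkDocs.
function renderQualityTable(quality: Record<string, QualityResult>): string[] {
  const names = Object.keys(quality);
  if (names.length === 0) return []; // mirrors the guard: no section when empty

  // Fractions in [0, 1] are shown as whole-number percentages.
  const pct = (x: number): string => `${(x * 100).toFixed(0)}%`;

  const lines: string[] = [
    '## Quality Metrics',
    '',
    '| Scenario | Entity Retention | Structural Integrity | Reference Coherence | Quality Score |',
    '| --- | --- | --- | --- | --- |',
  ];
  for (const name of names) {
    const q = quality[name];
    lines.push(
      `| ${name} | ${pct(q.entityRetention)} | ${pct(q.structuralIntegrity)} | ${pct(q.referenceCoherence)} | ${q.qualityScore.toFixed(3)} |`,
    );
  }
  lines.push('');
  return lines;
}

const sample: Record<string, QualityResult> = {
  'Tool-heavy': { entityRetention: 0.931, structuralIntegrity: 1, referenceCoherence: 1, qualityScore: 0.972 },
  'Agentic coding session': { entityRetention: 0.848, structuralIntegrity: 1, referenceCoherence: 1, qualityScore: 0.939 },
};
const rendered = renderQualityTable(sample);
console.log(rendered.join('\n'));
```

Because percentages are rounded with `toFixed(0)`, an entity retention of 0.848 is displayed as 85%; the three-decimal quality score column keeps more precision.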
bench/baselines/current.json

Lines changed: 82 additions & 16 deletions
@@ -1,6 +1,6 @@
 {
-  "version": "1.1.0",
-  "generated": "2026-03-20T18:05:08.551Z",
+  "version": "1.2.0",
+  "generated": "2026-03-20T22:34:22.455Z",
   "results": {
     "basic": {
       "Coding assistant": {
@@ -16,8 +16,8 @@
         "preserved": 6
       },
       "Tool-heavy": {
-        "ratio": 1.4128440366972477,
-        "tokenRatio": 1.4043583535108959,
+        "ratio": 1.4009797060881735,
+        "tokenRatio": 1.3908872901678657,
         "compressed": 2,
         "preserved": 16
       },
@@ -102,10 +102,10 @@
         "deduped": 1
       },
       "Tool-heavy": {
-        "rw0Base": 1.4128440366972477,
-        "rw0Dup": 1.4128440366972477,
-        "rw4Base": 1.4128440366972477,
-        "rw4Dup": 1.4128440366972477,
+        "rw0Base": 1.4009797060881735,
+        "rw0Dup": 1.4009797060881735,
+        "rw4Base": 1.4009797060881735,
+        "rw4Dup": 1.4009797060881735,
         "deduped": 0
       },
       "Short conversation": {
@@ -158,7 +158,7 @@
       "Tool-heavy": {
         "exact": 0,
         "fuzzy": 0,
-        "ratio": 1.4128440366972477
+        "ratio": 1.4009797060881735
       },
       "Short conversation": {
         "exact": 0,
@@ -199,18 +199,38 @@
         "bytes": 10994,
         "gzipBytes": 4452
       },
+      "cluster.js": {
+        "bytes": 7587,
+        "gzipBytes": 2471
+      },
       "compress.js": {
-        "bytes": 53439,
-        "gzipBytes": 11671
+        "bytes": 86117,
+        "gzipBytes": 16727
       },
       "contradiction.js": {
         "bytes": 7700,
         "gzipBytes": 2717
       },
+      "coreference.js": {
+        "bytes": 4321,
+        "gzipBytes": 1500
+      },
       "dedup.js": {
         "bytes": 10260,
         "gzipBytes": 2864
       },
+      "discourse.js": {
+        "bytes": 6792,
+        "gzipBytes": 2495
+      },
+      "entities.js": {
+        "bytes": 8403,
+        "gzipBytes": 2665
+      },
+      "entropy.js": {
+        "bytes": 1979,
+        "gzipBytes": 832
+      },
       "expand.js": {
         "bytes": 2795,
         "gzipBytes": 934
@@ -219,13 +239,21 @@
         "bytes": 11923,
         "gzipBytes": 2941
       },
+      "flow.js": {
+        "bytes": 7967,
+        "gzipBytes": 2086
+      },
       "importance.js": {
         "bytes": 4759,
-        "gzipBytes": 1849
+        "gzipBytes": 1850
       },
       "index.js": {
-        "bytes": 854,
-        "gzipBytes": 405
+        "bytes": 1809,
+        "gzipBytes": 761
+      },
+      "ml-classifier.js": {
+        "bytes": 3096,
+        "gzipBytes": 1208
       },
       "summarizer.js": {
         "bytes": 2542,
@@ -236,8 +264,46 @@
         "gzipBytes": 31
       },
       "total": {
-        "bytes": 114084,
-        "gzipBytes": 31813
+        "bytes": 187862,
+        "gzipBytes": 50483
+      }
+    },
+    "quality": {
+      "Coding assistant": {
+        "entityRetention": 1,
+        "structuralIntegrity": 1,
+        "referenceCoherence": 1,
+        "qualityScore": 1
+      },
+      "Long Q&A": {
+        "entityRetention": 1,
+        "structuralIntegrity": 1,
+        "referenceCoherence": 1,
+        "qualityScore": 1
+      },
+      "Tool-heavy": {
+        "entityRetention": 0.931,
+        "structuralIntegrity": 1,
+        "referenceCoherence": 1,
+        "qualityScore": 0.972
+      },
+      "Deep conversation": {
+        "entityRetention": 1,
+        "structuralIntegrity": 1,
+        "referenceCoherence": 1,
+        "qualityScore": 1
+      },
+      "Structured content": {
+        "entityRetention": 1,
+        "structuralIntegrity": 1,
+        "referenceCoherence": 1,
+        "qualityScore": 1
+      },
+      "Agentic coding session": {
+        "entityRetention": 0.848,
+        "structuralIntegrity": 1,
+        "referenceCoherence": 1,
+        "qualityScore": 0.939
       }
     },
     "retention": {

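This diff does not show how `qualityScore` is derived from its three components. The two baselines with imperfect entity retention are at least consistent with a weighted sum that puts roughly 0.4 on entity retention: 1 - 0.4 × (1 - 0.931) ≈ 0.972 and 1 - 0.4 × (1 - 0.848) ≈ 0.939. The sketch below only checks that arithmetic. The split of the remaining 0.6 between the other two components is an arbitrary assumption (both are 1 in every baseline, so the data cannot distinguish them), and `compositeQuality` is a hypothetical name, not a function from the package.

```typescript
interface QualityParts {
  entityRetention: number;
  structuralIntegrity: number;
  referenceCoherence: number;
}

// ASSUMED weights. Only the 0.4 on entity retention is suggested by the
// baseline numbers; the 0.3 / 0.3 split is a guess for illustration.
const ASSUMED_WEIGHTS = { entity: 0.4, structural: 0.3, coherence: 0.3 };

// Weighted sum of the three components, rounded to three decimals
// to match the precision used in the baseline file.
function compositeQuality(p: QualityParts): number {
  const score =
    ASSUMED_WEIGHTS.entity * p.entityRetention +
    ASSUMED_WEIGHTS.structural * p.structuralIntegrity +
    ASSUMED_WEIGHTS.coherence * p.referenceCoherence;
  return Math.round(score * 1000) / 1000;
}

const toolHeavy = compositeQuality({
  entityRetention: 0.931,
  structuralIntegrity: 1,
  referenceCoherence: 1,
});
const agentic = compositeQuality({
  entityRetention: 0.848,
  structuralIntegrity: 1,
  referenceCoherence: 1,
});
console.log(toolHeavy, agentic); // 0.972 0.939
```

Matching two data points does not confirm the formula; a scenario with imperfect structural integrity or reference coherence would be needed to pin down the other weights.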