diff --git a/.gitignore b/.gitignore index 4023cf2..217fc86 100644 --- a/.gitignore +++ b/.gitignore @@ -47,6 +47,9 @@ data/*.json _internal/ mr-data/ +# Claude Code skills authored locally, not part of the published library +_skills/ + # Per-namespace runtime artifacts. `_oplog.json` is the live delta-sync # ringbuffer, regenerated on every mutation. `_audit/` is the tamper-evident # hash chain at data-dir root, not in packs. diff --git a/CHANGELOG.md b/CHANGELOG.md index be5d9de..48cbe7c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,66 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 --- +## [Unreleased] — Lexical groups (per-namespace morph + abbrev normalization) + +### Added — `LexicalGroup` primitive + +Per-namespace, per-language lexical normalization that runs at +tokenization time (both index-time on seeds and query-time on resolves). +Two kinds: + +- **`morph`** — inflectional variants of one root (e.g. + `child` ⇄ `children`, `predict` ⇄ `predicts` ⇄ `predicting`). +- **`abbrev`** — short forms of a longer phrase (e.g. + `rbi` → `real-time biometric identification`, `csam` → `child sexual + abuse material`). + +Distinct from synonyms by design: groups only collapse forms that share +the same surface meaning. Synonym expansion was tried in the L1 graph +era and removed because of pollution (one source intent leaking into +unrelated sibling intents). Lexical groups don't have that failure mode +because they only affect the literal token, not its semantic neighbours. + +Stored per-namespace in `_ns.json`, persistence is round-tripped, and +mutations rebuild the index so existing seeds re-tokenize through the +new groups. + +#### Surface + +- **Library (Rust):** `microresolve::{LexicalGroup, LexicalKind}` is + re-exported from the crate root. Engine API on `NamespaceHandle`: + `list_lexical_groups`, `add_lexical_group`, `remove_lexical_group`, + `update_lexical_group`. 
+- **Server:** `GET/POST /api/lexical-groups`, + `DELETE/PATCH /api/lexical-groups/{idx}`, plus + `POST /api/lexical-groups/suggest` for operator-triggered LLM + proposals (returns proposals; nothing applies until you approve). + Every mutation lands in the per-key audit chain + (`lexical_group.add` / `.remove` / `.update`). +- **Studio UI:** new "Lexicon" page under **Build** with tabs for + Inflections / Abbreviations, manual add form, and an LLM Suggest panel + that grounds proposals in the namespace's actual vocabulary + + intent descriptions. +- **Python bindings:** `LexicalGroup` class plus + `Namespace.list_lexical_groups`, `add_lexical_group`, + `remove_lexical_group`, `update_lexical_group`. +- **Node bindings:** `LexicalGroup` interface plus the same four + methods on `Namespace` (camelCase per napi convention). + +#### Pack support + +The `eu-ai-act-prohibited` pack ships with 10 morph groups (child, +warrant, predict, person, score, manipulate, infer, categorize, +exploit, scrape) and 3 abbreviations (rbi, ncii, csam) — measured as ++2.5pp F1 vs. the no-lexical baseline on a 100-prohibited / 80-benign +hand-curated eval, with zero regression on CLINC150 + BANKING77. + +Note: older "lexical" mentions in this CHANGELOG (the L0/L1 graph +removal) are unrelated — those layers were removed in v0.1. The new +primitive is bounded, per-namespace, and operator-controlled. + +--- + ## [0.2.2] — 2026-05-08 ### Added — Tamper-evident audit log (continuation of v0.2.0 compliance packs) diff --git a/README.md b/README.md index 3a57991..8e01132 100644 --- a/README.md +++ b/README.md @@ -188,6 +188,24 @@ Maps onto **EU AI Act Art. 13**, **HIPAA §164.312(b)**, **SOC 2 CC7.2**, **NIST AI RMF Govern**. Suitable for SMB / regulated-but-not-certified deployments; no SOC 2 attestation, no managed service required. 
+## Lexical groups (per-namespace morph + abbrev) + +Two kinds of token-level normalization run at tokenize time, both +per-namespace and language-tagged: + +- **`morph`** — collapse inflectional variants of one root + (`child` ⇄ `children`, `predict` ⇄ `predicts` ⇄ `predicting`). +- **`abbrev`** — expand short forms of a phrase + (`rbi` → `real-time biometric identification`). + +Distinct from synonyms: groups only collapse forms with the same surface +meaning, so they don't pollute sibling intents the way synonym expansion +did in earlier versions. Manage via the Studio "Lexicon" page (manual +add or LLM-suggested with operator approval), or via the library / +HTTP / Python / Node bindings — all four surfaces ship the same four +methods (`list`, `add`, `remove`, `update`). Mutations land in the +audit chain. + ## Architecture, multi-intent, multilingual, HTTP API Deeper concept docs live on the [documentation site](https://gladius.github.io/microresolve/concepts/): diff --git a/benchmarks/eu_ai_act_eval.py b/benchmarks/eu_ai_act_eval.py new file mode 100644 index 0000000..f4a1add --- /dev/null +++ b/benchmarks/eu_ai_act_eval.py @@ -0,0 +1,274 @@ +"""Confusion-matrix evaluation of the eu-ai-act-prohibited pack. + +Per-intent TP/FN/FP/TN, macro precision/recall/F1, threshold sweep, +benign aggregate FP rate. Adjacent-legal benigns (looks-like-prohibited +but carved out by Feb 2025 Commission guidelines) are tracked separately. 
+ +Run: python benchmarks/eu_ai_act_eval.py +""" +import json +import shutil +from collections import defaultdict +from pathlib import Path + +import microresolve + +PACK_NAME = "eu-ai-act-prohibited" +PACK_SRC = Path("packs") / PACK_NAME +CORPUS = Path("_internal/EU_AI_ACT_EVAL_CORPUS.json") +THRESHOLDS = [0.5, 0.8, 1.0, 1.3, 1.5, 1.8, 2.0, 2.5] + + +def stage_pack(): + p = Path("/tmp/eu_ai_act_eval_data") + if p.exists(): + shutil.rmtree(p) + p.mkdir(parents=True) + shutil.copytree(PACK_SRC, p / PACK_NAME) + return p + + +def intents_from_pack(): + return sorted(p.stem for p in PACK_SRC.glob("*.json") if p.name != "_ns.json") + + +def top1(result): + """Return (intent_id, score, band) of the top-ranked hit in any band, else None; callers check the band.""" + if not result.intents: + return None + top = result.intents[0] + return (top.id, top.score, top.band) + + +def eval_at_threshold(threshold, intents, prohibited, benigns): + data = stage_pack() + e = microresolve.MicroResolve(data_dir=str(data)) + ns = e.namespace(PACK_NAME) + ns.update_namespace({"default_threshold": threshold}) + + # Per-intent confusion matrix: intent → {tp, fn, fp, tn} + cm = {i: {"tp": 0, "fn": 0, "fp": 0, "tn": 0} for i in intents} + + fn_examples = [] # missed prohibited + fp_examples = [] # benign hits and adjacent-legal hits + routing_misses = [] # prohibited routed to wrong intent + + # Walk prohibited queries: each one is positive for its expected_intent + # and negative for every other intent. 
+ for entry in prohibited: + text = entry["text"] + expected = entry["expected_intent"] + r = ns.resolve(text) + top = top1(r) + predicted = top[0] if top and top[2] == "High" else None + + for intent in intents: + is_positive = (intent == expected) + is_predicted = (predicted == intent) + if is_positive and is_predicted: + cm[intent]["tp"] += 1 + elif is_positive and not is_predicted: + cm[intent]["fn"] += 1 + if predicted is None: + fn_examples.append((text, expected, "no_high_band", top[1] if top else 0.0)) + else: + routing_misses.append((text, expected, predicted, top[1])) + fn_examples.append((text, expected, f"->{predicted}", top[1])) + elif not is_positive and is_predicted: + cm[intent]["fp"] += 1 + else: + cm[intent]["tn"] += 1 + + # Walk benigns: a benign hitting any High-band prohibited intent (excluding + # legitimate_use, which is the negative class) is a false positive. + benign_high_hits = 0 + benign_legitimate_hits = 0 + adjacent_high_hits = 0 + adjacent_legitimate_hits = 0 + by_category = defaultdict(lambda: {"high_hits": 0, "legit_hits": 0, "total": 0}) + + for entry in benigns: + text = entry["text"] + cat = entry.get("category", "generic_benign") + r = ns.resolve(text) + top = top1(r) + is_high = top is not None and top[2] == "High" + is_legit = is_high and top[0] == "legitimate_use" + is_prohibited_hit = is_high and top[0] != "legitimate_use" + + by_category[cat]["total"] += 1 + if is_prohibited_hit: + by_category[cat]["high_hits"] += 1 + if is_legit: + by_category[cat]["legit_hits"] += 1 + + is_adjacent = cat.startswith("adjacent_") + if is_adjacent: + if is_prohibited_hit: + adjacent_high_hits += 1 + if is_legit: + adjacent_legitimate_hits += 1 + else: + if is_prohibited_hit: + benign_high_hits += 1 + if is_legit: + benign_legitimate_hits += 1 + + if is_prohibited_hit: + fp_examples.append((text, top[0], top[1], cat)) + + # For confusion matrix: for each non-legitimate intent, a high-band hit + # on a benign query is an FP. 
Already counted via prohibited loop only, + # so add benign FPs here. + for intent in intents: + if is_high and intent == top[0] and intent != "legitimate_use": + cm[intent]["fp"] += 1 + else: + cm[intent]["tn"] += 1 + + # Compute per-intent P/R/F1 + def f1(tp, fp, fn): + prec = tp / (tp + fp) if (tp + fp) else 0.0 + rec = tp / (tp + fn) if (tp + fn) else 0.0 + f = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0 + return prec, rec, f + + per_intent = {} + for intent in intents: + c = cm[intent] + prec, rec, f = f1(c["tp"], c["fp"], c["fn"]) + per_intent[intent] = { + **c, + "precision": prec, + "recall": rec, + "f1": f, + } + + # Macro averages over PROHIBITED intents only (exclude legitimate_use, + # which is the negative class). + macro_intents = [i for i in intents if i != "legitimate_use"] + macro_precision = sum(per_intent[i]["precision"] for i in macro_intents) / len(macro_intents) + macro_recall = sum(per_intent[i]["recall"] for i in macro_intents) / len(macro_intents) + macro_f1 = sum(per_intent[i]["f1"] for i in macro_intents) / len(macro_intents) + + n_generic = sum(1 for b in benigns if not b.get("category", "").startswith("adjacent_")) + n_adjacent = sum(1 for b in benigns if b.get("category", "").startswith("adjacent_")) + + return { + "threshold": threshold, + "per_intent": per_intent, + "macro_precision": macro_precision, + "macro_recall": macro_recall, + "macro_f1": macro_f1, + "generic_benign_fp_rate": benign_high_hits / n_generic if n_generic else 0.0, + "adjacent_benign_fp_rate": adjacent_high_hits / n_adjacent if n_adjacent else 0.0, + "generic_legitimate_routing_rate": benign_legitimate_hits / n_generic if n_generic else 0.0, + "adjacent_legitimate_routing_rate": adjacent_legitimate_hits / n_adjacent if n_adjacent else 0.0, + "by_category": dict(by_category), + "fn_examples": fn_examples, + "fp_examples": fp_examples, + "routing_misses": routing_misses, + } + + +def main(): + corpus = json.load(open(CORPUS)) + prohibited = 
corpus["prohibited"] + benigns = corpus["benign"] + intents = intents_from_pack() + + print(f"Pack: {PACK_NAME}") + print(f"Intents: {len(intents)}: {intents}") + print(f"Prohibited queries: {len(prohibited)}") + n_generic = sum(1 for b in benigns if not b.get("category", "").startswith("adjacent_")) + n_adjacent = sum(1 for b in benigns if b.get("category", "").startswith("adjacent_")) + print(f"Benigns: {len(benigns)} ({n_generic} generic + {n_adjacent} adjacent-legal)") + print() + + print(f"{'thr':>5} {'macroP':>7} {'macroR':>7} {'macroF1':>8} {'genFP%':>7} {'adjFP%':>7} {'adjLegit%':>9}") + print("-" * 72) + results = [] + for t in THRESHOLDS: + r = eval_at_threshold(t, intents, prohibited, benigns) + results.append(r) + print( + f"{r['threshold']:>5.2f} " + f"{r['macro_precision'] * 100:>6.1f}% " + f"{r['macro_recall'] * 100:>6.1f}% " + f"{r['macro_f1'] * 100:>7.1f}% " + f"{r['generic_benign_fp_rate'] * 100:>6.1f}% " + f"{r['adjacent_benign_fp_rate'] * 100:>6.1f}% " + f"{r['adjacent_legitimate_routing_rate'] * 100:>8.1f}%" + ) + + print() + # Detail at default threshold + default_t = 1.5 + r = next((x for x in results if x["threshold"] == default_t), results[0]) + print(f"=== Per-intent detail at threshold = {default_t} ===\n") + print(f"{'intent':<35} {'TP':>4} {'FN':>4} {'FP':>4} {'TN':>4} {'P':>6} {'R':>6} {'F1':>6}") + print("-" * 80) + for intent in intents: + c = r["per_intent"][intent] + print( + f"{intent:<35} {c['tp']:>4} {c['fn']:>4} {c['fp']:>4} {c['tn']:>4} " + f"{c['precision'] * 100:>5.1f}% {c['recall'] * 100:>5.1f}% {c['f1'] * 100:>5.1f}%" + ) + print() + + if r["routing_misses"]: + print(f"=== Routing misses (top intent != expected, but still High band) ===") + for text, expected, predicted, score in r["routing_misses"][:15]: + print(f" expected={expected:<30} got={predicted:<30} score={score:.2f} {text[:80]}") + print() + + if r["fn_examples"]: + print(f"=== Missed prohibited (FN; top {min(15, len(r['fn_examples']))}) ===") + for text, 
expected, reason, score in r["fn_examples"][:15]: + print(f" exp={expected:<30} {reason:<22} score={score:.2f} {text[:80]}") + print() + + if r["fp_examples"]: + print(f"=== Benign false positives (top {min(15, len(r['fp_examples']))}) ===") + for text, intent, score, cat in r["fp_examples"][:15]: + print(f" cat={cat:<25} hit={intent:<30} score={score:.2f} {text[:80]}") + print() + + print("=== Adjacent-benign performance by sub-category at thr=1.5 ===") + for cat, stats in sorted(r["by_category"].items()): + if not cat.startswith("adjacent_"): + continue + total = stats["total"] + bad = stats["high_hits"] + legit = stats["legit_hits"] + print(f" {cat:<28} total={total:>3} prohibited_hits={bad:>2} legit_route={legit:>2}") + print() + + out = { + "pack": PACK_NAME, + "intents": intents, + "n_prohibited": len(prohibited), + "n_benign_generic": n_generic, + "n_benign_adjacent": n_adjacent, + "results": [ + { + "threshold": r["threshold"], + "per_intent": r["per_intent"], + "macro_precision": r["macro_precision"], + "macro_recall": r["macro_recall"], + "macro_f1": r["macro_f1"], + "generic_benign_fp_rate": r["generic_benign_fp_rate"], + "adjacent_benign_fp_rate": r["adjacent_benign_fp_rate"], + "by_category": r["by_category"], + } + for r in results + ], + } + out_path = Path("benchmarks/results/eu_ai_act_eval.json") + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_text(json.dumps(out, indent=2)) + print(f"Full results written to {out_path}") + + +if __name__ == "__main__": + main() diff --git a/benchmarks/test_lexical_groups.py b/benchmarks/test_lexical_groups.py new file mode 100644 index 0000000..13f49ed --- /dev/null +++ b/benchmarks/test_lexical_groups.py @@ -0,0 +1,146 @@ +"""Phase 2 smoke test: does lexical_groups normalize plurals + verb tenses +the way we expect, and does it pollute anything when applied? + +Test setup: + 1. Load eu-ai-act-prohibited pack (v0.2.2 baseline — 6 intents). + 2. 
Add a small set of hand-authored morph groups to _ns.json. + 3. Run a focused diagnostic: + a. queries that use SINGULAR form — should match seeds (baseline behavior) + b. queries that use PLURAL form / verb tenses — should now match where + previously they wouldn't (the feature working) + c. queries with random vocabulary that shouldn't change behavior + (regression check — no pollution) +""" +import json +import shutil +from pathlib import Path + +import microresolve + +PACK_NAME = "eu-ai-act-prohibited" +PACK_SRC = Path("packs") / PACK_NAME + +# Hand-authored morph groups we'll inject for the test. +MORPH_GROUPS = [ + {"kind": "morph", "lang": "en", "canonical": "child", + "variants": ["child", "children", "child's"]}, + {"kind": "morph", "lang": "en", "canonical": "warrant", + "variants": ["warrant", "warrants"]}, + {"kind": "morph", "lang": "en", "canonical": "predict", + "variants": ["predict", "predicts", "predicted", "predicting", "prediction"]}, + {"kind": "morph", "lang": "en", "canonical": "person", + "variants": ["person", "persons", "people"]}, + {"kind": "morph", "lang": "en", "canonical": "score", + "variants": ["score", "scores", "scoring", "scored"]}, + {"kind": "morph", "lang": "en", "canonical": "manipulate", + "variants": ["manipulate", "manipulates", "manipulating", "manipulated", "manipulation"]}, +] + + +def stage(suffix: str, with_groups: bool) -> Path: + data = Path(f"/tmp/lexical_test_{suffix}") + if data.exists(): + shutil.rmtree(data) + data.mkdir(parents=True) + shutil.copytree(PACK_SRC, data / PACK_NAME) + if with_groups: + ns_path = data / PACK_NAME / "_ns.json" + d = json.loads(ns_path.read_text()) + d["lexical_groups"] = MORPH_GROUPS + ns_path.write_text(json.dumps(d, indent=2)) + return data + + +def resolve(ns, query: str): + r = ns.resolve(query) + if not r.intents: + return ("(none)", 0.0, "Low") + top = r.intents[0] + return (top.id, top.score, top.band) + + +def main(): + print("=" * 80) + print("PHASE 2 — lexical_groups 
behavioral smoke test") + print("=" * 80) + + # Pairs where the second query is a morph variant the first form's seeds + # would normally match. We expect: + # - baseline: first matches well, second matches poorly (or differently) + # - with groups: both match similarly (the feature working) + PAIRS = [ + ("query about a child manipulation", + "query about children manipulation", + "expects: child = children should give same result"), + ("predict criminal behavior", + "predicting criminal behavior", + "expects: predict = predicting"), + ("subliminal manipulation", + "subliminal manipulating", + "expects: manipulation = manipulating"), + ("biometric scoring of people", + "biometric scores of persons", + "expects: score = scores AND person = persons"), + ] + + # Random benign queries — should NOT change between baseline and with-groups. + BENIGNS = [ + "weather forecasting model for agriculture", + "recommend movies based on genre preferences", + "translate documents between languages", + "voice transcription for meeting notes", + ] + + print("\n--- Setup ---") + base_dir = stage("baseline", with_groups=False) + morph_dir = stage("with_morph", with_groups=True) + print(f"Baseline pack: {base_dir}/{PACK_NAME} (no lexical_groups)") + print(f"With-morph pack: {morph_dir}/{PACK_NAME} ({len(MORPH_GROUPS)} groups)") + + e_base = microresolve.MicroResolve(data_dir=str(base_dir)) + ns_base = e_base.namespace(PACK_NAME) + + e_morph = microresolve.MicroResolve(data_dir=str(morph_dir)) + ns_morph = e_morph.namespace(PACK_NAME) + + print("\n--- Behavioral pairs (singular vs morph variant) ---") + print(f"{'Query':<48} {'Baseline':<32} {'With morph':<32}") + print("-" * 116) + for first, second, _exp in PAIRS: + r1_base, r2_base = resolve(ns_base, first), resolve(ns_base, second) + r1_morph, r2_morph = resolve(ns_morph, first), resolve(ns_morph, second) + # Show the second (variant) query's behavior: does it now match closer + # to what the first (canonical) query produces? 
+ print(f" {first[:46]:<48} {f'{r1_base[0]} {r1_base[1]:.2f} ({r1_base[2]})':<32} {f'{r1_morph[0]} {r1_morph[1]:.2f} ({r1_morph[2]})':<32}") + print(f" {second[:46]:<48} {f'{r2_base[0]} {r2_base[1]:.2f} ({r2_base[2]})':<32} {f'{r2_morph[0]} {r2_morph[1]:.2f} ({r2_morph[2]})':<32}") + # Improvement check: did the variant's score get closer to the canonical's? + improved = abs(r2_morph[1] - r1_morph[1]) < abs(r2_base[1] - r1_base[1]) + print(f" variant gap closed? baseline_gap={abs(r2_base[1]-r1_base[1]):.2f} morph_gap={abs(r2_morph[1]-r1_morph[1]):.2f} → {'YES' if improved else 'no'}") + print() + + print("\n--- Regression check: benign queries should not change ---") + print(f"{'Query':<58} {'Baseline':<24} {'With morph':<24} {'changed?'}") + print("-" * 124) + regressions = 0 + for q in BENIGNS: + r_base = resolve(ns_base, q) + r_morph = resolve(ns_morph, q) + same = r_base[0] == r_morph[0] and abs(r_base[1] - r_morph[1]) < 0.01 + if not same and (r_base[2] == "High" or r_morph[2] == "High"): + regressions += 1 + print(f" {q[:56]:<58} {f'{r_base[0]} {r_base[1]:.2f}':<24} {f'{r_morph[0]} {r_morph[1]:.2f}':<24} {'NO' if same else 'YES'}") + + print(f"\nRegressions on benigns (different High-band routing): {regressions}") + if regressions == 0: + print("✓ PASS — no benign pollution from morph groups") + else: + print("✗ FAIL — morph groups changed benign routing") + + # Variant index check + print("\n--- Internal: how many variants did the LexicalIndex load? 
---") + print(f"Baseline namespace (no groups): {ns_base.intent_count()} intents") + print(f"With-morph namespace (groups loaded): {ns_morph.intent_count()} intents") + + +if __name__ == "__main__": + main() diff --git a/node/Cargo.lock b/node/Cargo.lock index 87120e6..2efd1e1 100644 --- a/node/Cargo.lock +++ b/node/Cargo.lock @@ -720,7 +720,7 @@ checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" [[package]] name = "microresolve" -version = "0.2.1" +version = "0.2.2" dependencies = [ "aho-corasick", "regex", @@ -735,7 +735,7 @@ dependencies = [ [[package]] name = "microresolve-node" -version = "0.2.1" +version = "0.2.2" dependencies = [ "microresolve", "napi", diff --git a/node/index.d.ts b/node/index.d.ts index 940be6b..cdaf2bf 100644 --- a/node/index.d.ts +++ b/node/index.d.ts @@ -165,6 +165,18 @@ export declare class Namespace { applyReview(missedPhrases: Record>, spansToLearn: Array, wrongDetections: Array, originalQuery: string, negativeAlpha?: number | undefined | null): number /** Remove a single phrase from an intent. Returns `true` if the phrase existed. */ removePhrase(intentId: string, phrase: string): boolean + /** List all lexical groups in this namespace. */ + listLexicalGroups(): Array + /** + * Add a lexical group. Returns the index of the new group. + * Rebuilds the index — every existing seed is re-tokenized through + * the new group set. + */ + addLexicalGroup(group: LexicalGroup): number + /** Remove the lexical group at `idx`. Rebuilds the index. */ + removeLexicalGroup(idx: number): LexicalGroup + /** Replace the lexical group at `idx`. Rebuilds the index. */ + updateLexicalGroup(idx: number, group: LexicalGroup): void } /** Options for `new MicroResolve(options)`. */ @@ -213,6 +225,21 @@ export interface IntentMatch { band: string } +/** + * A per-namespace lexical normalization group: either a `morph` (inflection + * variants of one root) or `abbrev` (short forms of a longer phrase). 
+ */ +export interface LexicalGroup { + /** `"morph"` or `"abbrev"`. */ + kind: string + /** Language code (e.g. `"en"`). */ + lang: string + /** The form every variant normalizes to. */ + canonical: string + /** All variants (canonical is included automatically). */ + variants: Array +} + /** * Edit options accepted by `updateNamespace`. * diff --git a/node/index.js b/node/index.js index 46ec88b..b2403a8 100644 --- a/node/index.js +++ b/node/index.js @@ -77,8 +77,8 @@ function requireNative() { try { const binding = require('microresolve-android-arm64') const bindingPackageVersion = require('microresolve-android-arm64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -93,8 +93,8 @@ function requireNative() { try { const binding = require('microresolve-android-arm-eabi') const bindingPackageVersion = require('microresolve-android-arm-eabi/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -114,8 +114,8 @@ function requireNative() { try { const binding = require('microresolve-win32-x64-gnu') const bindingPackageVersion = require('microresolve-win32-x64-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -130,8 +130,8 @@ function requireNative() { try { const binding = require('microresolve-win32-x64-msvc') const bindingPackageVersion = require('microresolve-win32-x64-msvc/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -147,8 +147,8 @@ function requireNative() { try { const binding = require('microresolve-win32-ia32-msvc') const bindingPackageVersion = require('microresolve-win32-ia32-msvc/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -163,8 +163,8 @@ function requireNative() { try { const binding = require('microresolve-win32-arm64-msvc') const bindingPackageVersion = require('microresolve-win32-arm64-msvc/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -182,8 +182,8 @@ function requireNative() { try { const binding = require('microresolve-darwin-universal') const bindingPackageVersion = require('microresolve-darwin-universal/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -198,8 +198,8 @@ function requireNative() { try { const binding = require('microresolve-darwin-x64') const bindingPackageVersion = require('microresolve-darwin-x64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -214,8 +214,8 @@ function requireNative() { try { const binding = require('microresolve-darwin-arm64') const bindingPackageVersion = require('microresolve-darwin-arm64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -234,8 +234,8 @@ function requireNative() { try { const binding = require('microresolve-freebsd-x64') const bindingPackageVersion = require('microresolve-freebsd-x64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -250,8 +250,8 @@ function requireNative() { try { const binding = require('microresolve-freebsd-arm64') const bindingPackageVersion = require('microresolve-freebsd-arm64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -271,8 +271,8 @@ function requireNative() { try { const binding = require('microresolve-linux-x64-musl') const bindingPackageVersion = require('microresolve-linux-x64-musl/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -287,8 +287,8 @@ function requireNative() { try { const binding = require('microresolve-linux-x64-gnu') const bindingPackageVersion = require('microresolve-linux-x64-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -305,8 +305,8 @@ function requireNative() { try { const binding = require('microresolve-linux-arm64-musl') const bindingPackageVersion = require('microresolve-linux-arm64-musl/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -321,8 +321,8 @@ function requireNative() { try { const binding = require('microresolve-linux-arm64-gnu') const bindingPackageVersion = require('microresolve-linux-arm64-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -339,8 +339,8 @@ function requireNative() { try { const binding = require('microresolve-linux-arm-musleabihf') const bindingPackageVersion = require('microresolve-linux-arm-musleabihf/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -355,8 +355,8 @@ function requireNative() { try { const binding = require('microresolve-linux-arm-gnueabihf') const bindingPackageVersion = require('microresolve-linux-arm-gnueabihf/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -373,8 +373,8 @@ function requireNative() { try { const binding = require('microresolve-linux-loong64-musl') const bindingPackageVersion = require('microresolve-linux-loong64-musl/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -389,8 +389,8 @@ function requireNative() { try { const binding = require('microresolve-linux-loong64-gnu') const bindingPackageVersion = require('microresolve-linux-loong64-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -407,8 +407,8 @@ function requireNative() { try { const binding = require('microresolve-linux-riscv64-musl') const bindingPackageVersion = require('microresolve-linux-riscv64-musl/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -423,8 +423,8 @@ function requireNative() { try { const binding = require('microresolve-linux-riscv64-gnu') const bindingPackageVersion = require('microresolve-linux-riscv64-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -440,8 +440,8 @@ function requireNative() { try { const binding = require('microresolve-linux-ppc64-gnu') const bindingPackageVersion = require('microresolve-linux-ppc64-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -456,8 +456,8 @@ function requireNative() { try { const binding = require('microresolve-linux-s390x-gnu') const bindingPackageVersion = require('microresolve-linux-s390x-gnu/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -476,8 +476,8 @@ function requireNative() { try { const binding = require('microresolve-openharmony-arm64') const bindingPackageVersion = require('microresolve-openharmony-arm64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -492,8 +492,8 @@ function requireNative() { try { const binding = require('microresolve-openharmony-x64') const bindingPackageVersion = require('microresolve-openharmony-x64/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { @@ -508,8 +508,8 @@ function requireNative() { try { const binding = require('microresolve-openharmony-arm') const bindingPackageVersion = require('microresolve-openharmony-arm/package.json').version - if (bindingPackageVersion !== '0.2.1' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { - throw new Error(`Native binding package version mismatch, expected 0.2.1 but got ${bindingPackageVersion}. You can reinstall dependencies to fix this issue.`) + if (bindingPackageVersion !== '0.2.2' && process.env.NAPI_RS_ENFORCE_VERSION_CHECK && process.env.NAPI_RS_ENFORCE_VERSION_CHECK !== '0') { + throw new Error(`Native binding package version mismatch, expected 0.2.2 but got ${bindingPackageVersion}. 
You can reinstall dependencies to fix this issue.`) } return binding } catch (e) { diff --git a/node/src/lib.rs b/node/src/lib.rs index 726e69a..b6663a2 100644 --- a/node/src/lib.rs +++ b/node/src/lib.rs @@ -635,4 +635,97 @@ impl Namespace { .remove_phrase(&intent_id, &phrase) .map_err(|e| Error::from_reason(e.to_string())) } + + // ── Lexical groups (per-namespace morph + abbrev normalization) ── + + /// List all lexical groups in this namespace. + #[napi] + pub fn list_lexical_groups(&self) -> Vec<LexicalGroup> { + self.engine + .namespace(&self.id) + .list_lexical_groups() + .into_iter() + .map(lex_to_node) + .collect() + } + + /// Add a lexical group. Returns the index of the new group. + /// Rebuilds the index — every existing seed is re-tokenized through + /// the new group set. + #[napi] + pub fn add_lexical_group(&self, group: LexicalGroup) -> Result<u32> { + let core_group = lex_from_node(group)?; + self.engine + .namespace(&self.id) + .add_lexical_group(core_group) + .map(|n| n as u32) + .map_err(|e| Error::from_reason(e.to_string())) + } + + /// Remove the lexical group at `idx`. Rebuilds the index. + #[napi] + pub fn remove_lexical_group(&self, idx: u32) -> Result<LexicalGroup> { + self.engine + .namespace(&self.id) + .remove_lexical_group(idx as usize) + .map(lex_to_node) + .map_err(|e| Error::from_reason(e.to_string())) + } + + /// Replace the lexical group at `idx`. Rebuilds the index. + #[napi] + pub fn update_lexical_group(&self, idx: u32, group: LexicalGroup) -> Result<()> { + let core_group = lex_from_node(group)?; + self.engine + .namespace(&self.id) + .update_lexical_group(idx as usize, core_group) + .map_err(|e| Error::from_reason(e.to_string())) + } +} + +// ── LexicalGroup (Node) ─────────────────────────────────────────────────────── + +/// A per-namespace lexical normalization group: either a `morph` (inflection +/// variants of one root) or `abbrev` (short forms of a longer phrase). +#[napi(object)] +pub struct LexicalGroup { + /// `"morph"` or `"abbrev"`. 
+ pub kind: String, + /// Language code (e.g. `"en"`). + pub lang: String, + /// The form every variant normalizes to. + pub canonical: String, + /// All variants (canonical is included automatically). + pub variants: Vec<String>, +} + +fn lex_to_node(g: microresolve_core::LexicalGroup) -> LexicalGroup { + LexicalGroup { + kind: match g.kind { + microresolve_core::LexicalKind::Morph => "morph".to_string(), + microresolve_core::LexicalKind::Abbrev => "abbrev".to_string(), + }, + lang: g.lang, + canonical: g.canonical, + variants: g.variants, + } +} + +fn lex_from_node(g: LexicalGroup) -> Result<microresolve_core::LexicalGroup> { + let kind = match g.kind.as_str() { + "morph" => microresolve_core::LexicalKind::Morph, + "abbrev" => microresolve_core::LexicalKind::Abbrev, + other => { + return Err(Error::from_reason(format!( + "kind must be 'morph' or 'abbrev', got {:?}", + other + ))) + } + }; + Ok(microresolve_core::LexicalGroup { + kind, + lang: g.lang, + canonical: g.canonical, + variants: g.variants, + }) } diff --git a/packs/eu-ai-act-prohibited/_ns.json b/packs/eu-ai-act-prohibited/_ns.json index ac7a0bd..efa4a0a 100644 --- a/packs/eu-ai-act-prohibited/_ns.json +++ b/packs/eu-ai-act-prohibited/_ns.json @@ -1,10 +1,53 @@ { "name": "eu-ai-act-prohibited", - "description": "EU AI Act Article 5 prohibited-practice triage. Detects whether a deployment intent matches one of the 6 explicitly prohibited categories: biometric categorization, emotion recognition in workplace/education, exploitation of vulnerability, predictive policing on natural persons, social scoring, subliminal manipulation. Pre-LLM filter for compliance review.", + "status": "experimental", + "description": "EU AI Act Article 5 prohibited-practice triage. 
Detects whether a query describes one of the prohibited categories: subliminal manipulation 5(1)(a), exploitation of vulnerability 5(1)(b), social scoring 5(1)(c), predictive policing 5(1)(d), untargeted facial scraping 5(1)(e), emotion recognition in workplace/education 5(1)(f), biometric categorisation 5(1)(g), real-time remote biometric identification 5(1)(h), and the new prohibitions added by the Digital AI Omnibus political agreement (7 May 2026, compliance 2 Dec 2026) — non-consensual intimate imagery generation and AI-generated CSAM. Includes a `legitimate_use` negative class to absorb adjacent-but-legal queries (single-context credit scoring, fraud detection, driver-fatigue safety, voluntary face-unlock). Pre-LLM triage filter — pair with lawyer review for final determination.", "default_threshold": 1.5, "default_min_voting_tokens": 2, "compliance_frameworks": [ - "EU AI Act Art. 5", + "EU AI Act Art. 5(1)(a)", + "EU AI Act Art. 5(1)(b)", + "EU AI Act Art. 5(1)(c)", + "EU AI Act Art. 5(1)(d)", + "EU AI Act Art. 5(1)(e)", + "EU AI Act Art. 5(1)(f)", + "EU AI Act Art. 5(1)(g)", + "EU AI Act Art. 5(1)(h)", + "EU AI Act Art. 5 (NCII, Omnibus 7 May 2026)", + "EU AI Act Art. 5 (AI-CSAM, Omnibus 7 May 2026)", "EU AI Act Art. 
13" + ], + "policy_overrides": [ + {"_comment": "5(1)(h) carve-out — explicit law-text exception: targeted search for missing victims", + "words": ["missing", "child"], "intent": "legitimate_use", "bonus": 2.5}, + {"_comment": "5(1)(h) carve-out — explicit law-text exception: identification of named suspect under warrant", + "words": ["arrest", "warrant"], "intent": "legitimate_use", "bonus": 2.5}, + {"_comment": "5(1)(d) carve-out — Feb 2025 Commission guidelines: profiling + objective verifiable facts", + "words": ["witness", "reports"], "intent": "legitimate_use", "bonus": 2.5}, + {"_comment": "5(1)(d) carve-out — Feb 2025 Commission guidelines: profiling + outstanding warrants", + "words": ["outstanding", "warrants"], "intent": "legitimate_use", "bonus": 2.5}, + {"_comment": "Omnibus 5/2026 — CSAM detection (moderation) is NOT CSAM generation: hard policy distinction", + "words": ["csam", "detection"], "intent": "legitimate_use", "bonus": 3.0}, + {"_comment": "5(1)(f) carve-out — Feb 2025 Commission canonical example: driver fatigue is safety, not workplace surveillance", + "words": ["driver", "fatigue"], "intent": "legitimate_use", "bonus": 2.5}, + {"_comment": "5(1)(g) carve-out — voluntary biometric authentication on user's own device is not categorisation", + "words": ["face", "unlock"], "intent": "legitimate_use", "bonus": 2.0}, + {"_comment": "5(1)(e) carve-out — explicitly consented opt-in is not untargeted scraping", + "words": ["consented", "opt-in"], "intent": "legitimate_use", "bonus": 2.0} + ], + "lexical_groups": [ + {"kind": "morph", "lang": "en", "canonical": "child", "variants": ["child", "children", "child's"]}, + {"kind": "morph", "lang": "en", "canonical": "warrant", "variants": ["warrant", "warrants"]}, + {"kind": "morph", "lang": "en", "canonical": "predict", "variants": ["predict", "predicts", "predicted", "predicting", "prediction"]}, + {"kind": "morph", "lang": "en", "canonical": "person", "variants": ["person", "persons", "people"]}, + 
{"kind": "morph", "lang": "en", "canonical": "score", "variants": ["score", "scores", "scoring", "scored"]}, + {"kind": "morph", "lang": "en", "canonical": "manipulate", "variants": ["manipulate", "manipulates", "manipulating", "manipulated", "manipulation"]}, + {"kind": "morph", "lang": "en", "canonical": "infer", "variants": ["infer", "infers", "inferring", "inferred", "inference"]}, + {"kind": "morph", "lang": "en", "canonical": "categorize", "variants": ["categorize", "categorise", "categorizes", "categorising", "categorization", "categorisation"]}, + {"kind": "morph", "lang": "en", "canonical": "exploit", "variants": ["exploit", "exploits", "exploiting", "exploited", "exploitation"]}, + {"kind": "morph", "lang": "en", "canonical": "scrape", "variants": ["scrape", "scrapes", "scraping", "scraped"]}, + {"kind": "abbrev", "lang": "en", "canonical": "rbi", "variants": ["rbi"]}, + {"kind": "abbrev", "lang": "en", "canonical": "ncii", "variants": ["ncii"]}, + {"kind": "abbrev", "lang": "en", "canonical": "csam", "variants": ["csam"]} ] -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/ai_generated_csam.json b/packs/eu-ai-act-prohibited/ai_generated_csam.json new file mode 100644 index 0000000..d48606e --- /dev/null +++ b/packs/eu-ai-act-prohibited/ai_generated_csam.json @@ -0,0 +1,29 @@ +{ + "description": "Art. 5 — NEW prohibition added by Digital AI Omnibus political agreement (7 May 2026): AI systems generating child sexual abuse material (AI-CSAM) including text-to-image and image-to-image diffusion outputs depicting minors. Compliance starts 2 Dec 2026; status: provisional. Already separately criminal in most jurisdictions; this intent flags requests to BUILD or DEPLOY such generators. 
Split from NCII intent because of different downstream routing (immediate criminal report vs legal/HR review).", + "guardrails": [], + "instructions": "", + "persona": "", + "phrases": { + "en": [ + "AI-generated CSAM", + "CSAM by prompt", + "generate CSAM from text", + "child sexual abuse material AI", + "synthetic CSAM", + "CSAM generator", + "AI sexually explicit imagery of minors", + "generate sexual imagery of children", + "AI imagery of underage subjects", + "CSAM via diffusion model", + "text-to-CSAM model", + "AI explicit imagery depicting minors", + "minor sexual deepfake", + "generate child explicit content with AI", + "synthetic underage explicit imagery" + ] + }, + "schema": null, + "source": null, + "target": null, + "type": "action" +} diff --git a/packs/eu-ai-act-prohibited/biometric_categorization.json b/packs/eu-ai-act-prohibited/biometric_categorization.json index 5587b5d..8043e6b 100644 --- a/packs/eu-ai-act-prohibited/biometric_categorization.json +++ b/packs/eu-ai-act-prohibited/biometric_categorization.json @@ -1,26 +1,29 @@ { - "description": "", + "description": "Art. 5(1)(g) — Biometric categorisation that categorises natural persons individually based on biometric data to deduce or infer race, political opinions, trade-union membership, religious or philosophical beliefs, sex life, or sexual orientation. 
Narrow law-enforcement carve-out exists for lawfully acquired datasets.", "guardrails": [], "instructions": "", "persona": "", "phrases": { "en": [ - "infer ethnicity from face scan", - "classify religion from facial features", - "determine sexual orientation from photos", - "categorize political views from biometrics", - "use face recognition to sort by race", - "facial analysis for religious classification", - "biometric grouping by ethnicity for ads", - "infer protected attributes from face scan", - "classify union membership via biometrics", - "sort individuals into racial categories from photos", - "voiceprint inferring native ethnic group of speaker", - "video analytics labeling pedestrians by ethnic bucket" + "infer race from face", + "infer religion from face", + "predict sexual orientation from photo", + "detect political views from biometrics", + "classify trade-union from biometric", + "biometric race detector", + "biometric religion classifier", + "biometric political profiling", + "biometric orientation classifier", + "infer ethnicity from face", + "predict religion from voice biometric", + "categorize face by sexual orientation", + "classify ethnicity from gait", + "biometric belief detector", + "biometric protected-attribute inference" ] }, "schema": null, "source": null, "target": null, "type": "action" -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/emotion_recognition_workplace.json b/packs/eu-ai-act-prohibited/emotion_recognition_workplace.json index b15658a..cfdaf06 100644 --- a/packs/eu-ai-act-prohibited/emotion_recognition_workplace.json +++ b/packs/eu-ai-act-prohibited/emotion_recognition_workplace.json @@ -1,24 +1,29 @@ { - "description": "", + "description": "Art. 5(1)(f) — AI inferring emotions of natural persons in the areas of workplace OR educational institutions. 
Per Feb 2025 Commission guidelines: workplace/education only; medical and safety carve-outs (driver fatigue OK; classifying call-centre staff as 'happy/unhappy' is not).", "guardrails": [], "instructions": "", "persona": "", "phrases": { "en": [ - "detect employee emotions during meetings", - "monitor worker stress via webcam", - "classify call center agent moods", - "emotion AI to rate employee engagement", - "detect student emotions in classroom", - "analyze worker facial expressions for performance", - "detect anger in customer service agents via camera", - "monitor teacher emotions in classroom via AI", - "emotion recognition on employees during reviews", - "student emotional state detection during lectures" + "emotion AI in workplace", + "emotion AI in classroom", + "detect employee emotions", + "detect student emotions", + "workplace mood tracker", + "classroom feeling monitor", + "monitor agent emotions", + "score employee engagement emotionally", + "infer teacher mood", + "detect student stress in exam", + "webcam emotion at work", + "emotion monitoring during meetings", + "affect detection in school", + "employee sentiment from voice", + "classroom emotional climate AI" ] }, "schema": null, "source": null, "target": null, "type": "action" -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/exploitation_vulnerability.json b/packs/eu-ai-act-prohibited/exploitation_vulnerability.json index 62f1ffe..447db9c 100644 --- a/packs/eu-ai-act-prohibited/exploitation_vulnerability.json +++ b/packs/eu-ai-act-prohibited/exploitation_vulnerability.json @@ -1,25 +1,29 @@ { - "description": "", + "description": "Art. 5(1)(b) — AI exploiting vulnerabilities of natural persons due to age, disability, or specific social or economic situation, with the objective or effect of materially distorting behaviour and causing significant harm. 
Per Feb 2025 Commission guidelines: vulnerabilities are the closed list (age / disability / socio-economic), NOT generic emotional fragility.", "guardrails": [], "instructions": "", "persona": "", "phrases": { "en": [ - "target elderly with cognitive decline for loans", - "exploit childrens age for manipulative marketing", - "prey on people with disabilities financially", - "target low-income users with predatory products", - "manipulate dementia patients into purchases", - "exploit minors vulnerability to manipulate behavior", - "prey on economic hardship to push high-cost loans", - "manipulation targeting cognitive disabilities", - "ads targeting children to exploit age-based naivete", - "target vulnerable addicts with gambling offers", - "system exploiting youth social pressure for purchases" + "exploit elderly users", + "prey on seniors", + "exploit children's age", + "exploit cognitive disability", + "prey on dementia", + "exploit low-income borrowers", + "predatory loan to poor", + "predatory upsell to pensioners", + "exploit disability benefits", + "exploit minor age", + "exploit poverty desperation", + "high-pressure pitch to seniors", + "exploit age-related decline", + "predatory ads for kids", + "exploit socio-economic vulnerability" ] }, "schema": null, "source": null, "target": null, "type": "action" -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/legitimate_use.json b/packs/eu-ai-act-prohibited/legitimate_use.json new file mode 100644 index 0000000..bc2796f --- /dev/null +++ b/packs/eu-ai-act-prohibited/legitimate_use.json @@ -0,0 +1,29 @@ +{ + "description": "Negative class — adjacent uses that look like Article 5 prohibited practices but are explicitly carved out by the Act or Feb 2025 Commission guidelines. 
Routes here when the query describes a legitimate, NON-prohibited use case: single-context credit scoring, fraud detection, driver-fatigue safety monitoring, voluntary biometric authentication, personalised advertising on first-party data, predictive policing using objective facts plus profiling. Used to absorb FPs and keep prohibited intents tight.", + "guardrails": [], + "instructions": "", + "persona": "", + "phrases": { + "en": [ + "single-context credit score", + "driver fatigue detection", + "fraud detection score", + "biometric authentication unlock", + "face unlock my phone", + "voluntary biometric login", + "transaction-level fraud signal", + "credit risk for one loan", + "driver drowsiness alert", + "airport biometric verification", + "personalized ads from purchase history", + "recommend products from views", + "predictive policing with witness reports", + "alertness monitoring for safety", + "content recommendation from history" + ] + }, + "schema": null, + "source": null, + "target": null, + "type": "action" +} diff --git a/packs/eu-ai-act-prohibited/ncii_adult.json b/packs/eu-ai-act-prohibited/ncii_adult.json new file mode 100644 index 0000000..3be286f --- /dev/null +++ b/packs/eu-ai-act-prohibited/ncii_adult.json @@ -0,0 +1,29 @@ +{ + "description": "Art. 5 — NEW prohibition added by Digital AI Omnibus political agreement (7 May 2026): AI systems generating non-consensual sexually explicit or intimate imagery of adults ('nudifier' apps, deepfake pornography services). Compliance starts 2 Dec 2026; status: provisional pending formal Council + Parliament endorsement. 
Split from CSAM-generation intent because vocabularies and downstream actions differ.", + "guardrails": [], + "instructions": "", + "persona": "", + "phrases": { + "en": [ + "nudify app", + "undress photo AI", + "AI undressing tool", + "deepfake porn generator", + "face-swap into pornography", + "non-consensual nude generation", + "AI strip clothing from photo", + "deepfake nudity tool", + "generate explicit deepfake", + "fake nude generator", + "AI undress any photo", + "non-consensual intimate imagery", + "AI face-swap to adult video", + "synthetic nude from portrait", + "NCII generation service" + ] + }, + "schema": null, + "source": null, + "target": null, + "type": "action" +} diff --git a/packs/eu-ai-act-prohibited/predictive_policing.json b/packs/eu-ai-act-prohibited/predictive_policing.json index 65e5b43..ee6e7b7 100644 --- a/packs/eu-ai-act-prohibited/predictive_policing.json +++ b/packs/eu-ai-act-prohibited/predictive_policing.json @@ -1,25 +1,29 @@ { - "description": "", + "description": "Art. 5(1)(d) — AI for risk assessment of natural persons to predict criminal offences, based SOLELY on profiling or assessing personality traits and characteristics. Per Feb 2025 Commission guidelines: only prohibited if solely on profiling. 
Models combining profiling with objective verifiable facts (warrants, witness reports) are not prohibited.", "guardrails": [], "instructions": "", "persona": "", "phrases": { "en": [ - "predict who will commit crimes", - "profile individuals for likelihood of offending", - "AI system forecasting criminal behavior of person", - "risk score for future crime by individual", - "predict arrests based on personality profile", - "individual crime risk prediction from profile", - "forecast individual recidivism solely from profile", - "single out persons likely to commit future crimes", - "personality-based prediction of criminal propensity", - "predicting future criminality of named individuals", - "AI flagging specific people as likely criminals without any crime" + "predictive policing", + "predict future crime", + "predict future offender", + "profile-based crime prediction", + "personality crime risk", + "pre-crime profiling", + "predict crime by personality", + "profiling for police watchlist", + "profile-only criminal risk", + "forecast criminality from profile", + "personality-based pre-crime", + "predict offender by demographic profile", + "flag future criminals by profile", + "predict who will commit crime", + "criminal risk score from profile" ] }, "schema": null, "source": null, "target": null, "type": "action" -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/realtime_remote_biometric_id.json b/packs/eu-ai-act-prohibited/realtime_remote_biometric_id.json new file mode 100644 index 0000000..fd9b6fc --- /dev/null +++ b/packs/eu-ai-act-prohibited/realtime_remote_biometric_id.json @@ -0,0 +1,29 @@ +{ + "description": "Art. 5(1)(h) — Real-time remote biometric identification (RBI) in publicly accessible spaces for the purposes of law enforcement. 
Subject to narrow exceptions (targeted search for victims of specific crimes, prevention of specific imminent threat, identification of suspects of listed offences) — anything outside is prohibited.", + "guardrails": [], + "instructions": "", + "persona": "", + "phrases": { + "en": [ + "real-time face match in public", + "live biometric ID in public", + "live face match for police", + "real-time RBI for law enforcement", + "live face match in train station", + "real-time identification in mall", + "public CCTV face match for police", + "real-time biometric in airport", + "live face recognition on streets", + "real-time gait ID in public", + "continuous face match in stadium", + "live RBI across metro", + "routine police face match in public", + "real-time pedestrian biometric ID", + "live face scan in public square" + ] + }, + "schema": null, + "source": null, + "target": null, + "type": "action" +} diff --git a/packs/eu-ai-act-prohibited/social_scoring.json b/packs/eu-ai-act-prohibited/social_scoring.json index c83d6e6..d166f82 100644 --- a/packs/eu-ai-act-prohibited/social_scoring.json +++ b/packs/eu-ai-act-prohibited/social_scoring.json @@ -1,28 +1,29 @@ { - "description": "", + "description": "Art. 5(1)(c) — Social scoring of natural persons over time, across multiple unrelated contexts, leading to detrimental or unfavourable treatment disproportionate to the original data. Per Feb 2025 Commission guidelines: requires (i) over time, (ii) across multiple unrelated contexts, (iii) detrimental treatment disproportionate to the data. 
Single-context creditworthiness assessment is NOT prohibited.", "guardrails": [], "instructions": "", "persona": "", "phrases": { "en": [ - "rank citizens by overall trustworthiness", - "score people based on social media behavior", - "general purpose citizen reputation score", - "classify people as good or bad citizens", - "social credit system scoring individuals", - "assign trustworthiness scores across unrelated contexts", - "detrimental scoring based on social behavior", - "nationwide behavior-based person ranking", - "rate individuals across life domains by conduct", - "social scoring for welfare eligibility decisions", - "lifetime behavior points decide benefits access", - "one number that follows you everywhere summarizing your behavior", - "cross-context trust score driving denial of services", - "AI giving each adult a single life score" + "social credit score", + "social scoring", + "citizen reputation score", + "trustworthiness score", + "cross-context trust score", + "multi-domain reputation", + "lifetime behavior score", + "national citizen ranking", + "social media reputation rating", + "cross-context life score", + "nationwide trust ranking", + "social scoring for welfare", + "citizen score for services", + "cross-domain person rating", + "unified life score" ] }, "schema": null, "source": null, "target": null, "type": "action" -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/subliminal_manipulation.json b/packs/eu-ai-act-prohibited/subliminal_manipulation.json index 988b5db..eac08f5 100644 --- a/packs/eu-ai-act-prohibited/subliminal_manipulation.json +++ b/packs/eu-ai-act-prohibited/subliminal_manipulation.json @@ -1,26 +1,29 @@ { - "description": "", + "description": "Art. 5(1)(a) — AI deploying subliminal techniques beyond a person's consciousness, or purposefully manipulative or deceptive techniques, with the objective or effect of materially distorting behaviour and causing significant harm. 
Anchored on Recital 29 + Feb 2025 Commission guidelines: requires technique + behaviour distortion + significant harm. Personalised advertising is not inherently prohibited.", "guardrails": [], "instructions": "", "persona": "", "phrases": { "en": [ - "subliminal messages to change voting behavior", - "below awareness stimuli to influence purchases", - "manipulative techniques beyond consumer awareness", - "hidden audio cues to shift opinion", - "imperceptible prompts to alter decisions", - "subliminal video frames to steer choices", - "deceptive techniques to distort behavior materially", - "covert cues that bypass conscious reasoning", - "subliminal ad imagery to change behavior", - "techniques below perceptual threshold to influence", - "below-threshold suggestion system for political persuasion", - "manipulation outside user awareness targeting kids to buy" + "subliminal nudge", + "subliminal manipulation", + "subliminal advertising", + "subliminal frames", + "imperceptible cues", + "imperceptible audio prompts", + "below-conscious manipulation", + "below-threshold persuasion", + "covert behavioral steering", + "override user autonomy", + "ultrasound nudging in store", + "microsecond frame insertion", + "nudge below awareness", + "influence below perception", + "hidden persuasion technique" ] }, "schema": null, "source": null, "target": null, "type": "action" -} \ No newline at end of file +} diff --git a/packs/eu-ai-act-prohibited/untargeted_facial_scraping.json b/packs/eu-ai-act-prohibited/untargeted_facial_scraping.json new file mode 100644 index 0000000..bd84590 --- /dev/null +++ b/packs/eu-ai-act-prohibited/untargeted_facial_scraping.json @@ -0,0 +1,29 @@ +{ + "description": "Art. 5(1)(e) — AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage. Targets Clearview-style data harvesting at scale. 
Effective since 2 Feb 2025.", + "guardrails": [], + "instructions": "", + "persona": "", + "phrases": { + "en": [ + "scrape internet for faces", + "mass face scraping", + "harvest faces from web", + "bulk facial scraping", + "internet face database", + "CCTV face harvesting", + "untargeted face collection", + "scrape Instagram for faces", + "scrape LinkedIn faces", + "Clearview-style face database", + "crawl web for face data", + "bulk download social media faces", + "ingest CCTV streams for faces", + "global face index from web", + "indiscriminate face harvesting" + ] + }, + "schema": null, + "source": null, + "target": null, + "type": "action" +} diff --git a/packs/hipaa-triage/_ns.json b/packs/hipaa-triage/_ns.json index 55dcf96..fbd9de2 100644 --- a/packs/hipaa-triage/_ns.json +++ b/packs/hipaa-triage/_ns.json @@ -1,5 +1,6 @@ { "name": "hipaa-triage", + "status": "experimental", "description": "Medical query triage: route patient inquiries to clinical_urgent / clinical_routine / mental_health_crisis / administrative / billing / scheduling. NOT a HIPAA compliance solution \u2014 a triage filter intended as a deterministic pre-LLM layer for healthcare assistant apps. Best used as a top-3 candidate filter; pair with LLM judgment for top-1 selection.", "default_threshold": 1.5, "default_min_voting_tokens": 3, diff --git a/packs/mcp-tools-generic/_ns.json b/packs/mcp-tools-generic/_ns.json index 0d04b5f..d723099 100644 --- a/packs/mcp-tools-generic/_ns.json +++ b/packs/mcp-tools-generic/_ns.json @@ -1,5 +1,6 @@ { "name": "mcp-tools-generic", + "status": "experimental", "description": "Tool prefilter for the most common MCP / function-calling categories. 
Routes a user request to the broad tool family (search, send, fetch, file, db, code, calendar) so the agent only has to choose among that family's specific tools.", "default_threshold": 1.5, "default_min_voting_tokens": 2 diff --git a/packs/safety-filter/_ns.json b/packs/safety-filter/_ns.json index aac720f..fb0e2a8 100644 --- a/packs/safety-filter/_ns.json +++ b/packs/safety-filter/_ns.json @@ -1,5 +1,6 @@ { "name": "safety-filter", + "status": "experimental", "description": "Pre-LLM jailbreak / prompt-injection detection. Five intents covering the canonical attack taxonomy: prompt injection, system-prompt extraction, role override (DAN-style), safety bypass, and encoding-based evasion.", "default_threshold": 1.5, "default_min_voting_tokens": 3 diff --git a/python/Cargo.lock b/python/Cargo.lock index 840b2cd..1a64535 100644 --- a/python/Cargo.lock +++ b/python/Cargo.lock @@ -637,7 +637,7 @@ checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79" [[package]] name = "microresolve" -version = "0.2.0" +version = "0.2.2" dependencies = [ "aho-corasick", "regex", @@ -652,7 +652,7 @@ dependencies = [ [[package]] name = "microresolve-python" -version = "0.2.0" +version = "0.2.2" dependencies = [ "microresolve", "pyo3", diff --git a/python/src/lib.rs b/python/src/lib.rs index 9b662ce..d3895c4 100644 --- a/python/src/lib.rs +++ b/python/src/lib.rs @@ -185,6 +185,86 @@ fn info_to_py(info: microresolve_core::IntentInfo) -> IntentInfo { } } +// ── LexicalGroup ────────────────────────────────────────────────────────────── + +/// A per-namespace lexical normalization group: either a `morph` (inflection +/// variants of one root) or `abbrev` (short forms of a longer phrase). +/// Both `canonical` and `variants` are normalized to lowercase at load. +#[pyclass(get_all, from_py_object)] +#[derive(Clone)] +struct LexicalGroup { + /// `"morph"` or `"abbrev"`. + pub kind: String, + /// Language code (e.g. `"en"`). 
+ pub lang: String, + /// The form every variant normalizes to. + pub canonical: String, + /// All variants (canonical is included automatically). + pub variants: Vec<String>, +} + +#[pymethods] +impl LexicalGroup { + #[new] + #[pyo3(signature = (kind, canonical, variants, lang="en".to_string()))] + fn py_new( + kind: String, + canonical: String, + variants: Vec<String>, + lang: String, + ) -> PyResult<Self> { + if kind != "morph" && kind != "abbrev" { + return Err(pyo3::exceptions::PyValueError::new_err( + "kind must be 'morph' or 'abbrev'", + )); + } + Ok(Self { + kind, + lang, + canonical, + variants, + }) + } + + fn __repr__(&self) -> String { + format!( + "LexicalGroup(kind={:?}, lang={:?}, canonical={:?}, variants={:?})", + self.kind, self.lang, self.canonical, self.variants + ) + } +} + +fn lex_to_py(g: microresolve_core::lexical::LexicalGroup) -> LexicalGroup { + LexicalGroup { + kind: match g.kind { + microresolve_core::lexical::LexicalKind::Morph => "morph".to_string(), + microresolve_core::lexical::LexicalKind::Abbrev => "abbrev".to_string(), + }, + lang: g.lang, + canonical: g.canonical, + variants: g.variants, + } +} + +fn lex_from_py(g: &LexicalGroup) -> PyResult<microresolve_core::lexical::LexicalGroup> { + let kind = match g.kind.as_str() { + "morph" => microresolve_core::lexical::LexicalKind::Morph, + "abbrev" => microresolve_core::lexical::LexicalKind::Abbrev, + other => { + return Err(pyo3::exceptions::PyValueError::new_err(format!( + "kind must be 'morph' or 'abbrev', got {:?}", + other + ))) + } + }; + Ok(microresolve_core::lexical::LexicalGroup { + kind, + lang: g.lang.clone(), + canonical: g.canonical.clone(), + variants: g.variants.clone(), + }) +} + // ── Namespace ───────────────────────────────────────────────────────────────── /// Per-namespace handle. Obtain via `engine.namespace(id)`. 
@@ -551,6 +631,47 @@ impl Namespace { .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string())) } + // ── Lexical groups (per-namespace morph + abbrev normalization) ── + + /// List all lexical groups in this namespace. + fn list_lexical_groups(&self) -> Vec<LexicalGroup> { + self.engine + .namespace(&self.id) + .list_lexical_groups() + .into_iter() + .map(lex_to_py) + .collect() + } + + /// Add a lexical group. Returns the index of the new group. + /// Rebuilds the index — every existing seed is re-tokenized through + /// the new group set. + fn add_lexical_group(&self, group: &LexicalGroup) -> PyResult<usize> { + let core_group = lex_from_py(group)?; + self.engine + .namespace(&self.id) + .add_lexical_group(core_group) + .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string())) + } + + /// Remove the lexical group at `idx`. Rebuilds the index. + fn remove_lexical_group(&self, idx: usize) -> PyResult<LexicalGroup> { + self.engine + .namespace(&self.id) + .remove_lexical_group(idx) + .map(lex_to_py) + .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string())) + } + + /// Replace the lexical group at `idx`. Rebuilds the index. + fn update_lexical_group(&self, idx: usize, group: &LexicalGroup) -> PyResult<()> { + let core_group = lex_from_py(group)?; + self.engine + .namespace(&self.id) + .update_lexical_group(idx, core_group) + .map_err(|e| pyo3::exceptions::PyValueError::new_err(e.to_string())) + } + /// Namespace identifier. 
#[getter] fn id(&self) -> &str { @@ -684,5 +805,6 @@ fn microresolve(m: &Bound<'_, PyModule>) -> PyResult<()> { m.add_class::()?; m.add_class::()?; m.add_class::()?; + m.add_class::()?; Ok(()) } diff --git a/src/bin/server/main.rs b/src/bin/server/main.rs index 400df5c..a335bef 100644 --- a/src/bin/server/main.rs +++ b/src/bin/server/main.rs @@ -21,6 +21,7 @@ mod routes_events; mod routes_git; mod routes_import; mod routes_intents; +mod routes_lexical; mod routes_logs; mod routes_phrases; mod routes_projects; @@ -378,6 +379,7 @@ async fn main() { let protected_api = axum::Router::new() .merge(routes_core::routes()) .merge(routes_intents::routes()) + .merge(routes_lexical::routes()) .merge(routes_logs::routes()) .merge(routes_phrases::routes()) .merge(routes_settings::routes()) diff --git a/src/bin/server/routes_lexical.rs b/src/bin/server/routes_lexical.rs new file mode 100644 index 0000000..fd72144 --- /dev/null +++ b/src/bin/server/routes_lexical.rs @@ -0,0 +1,322 @@ +//! Lexical group CRUD — per-namespace morph + abbrev normalization. +//! +//! Each group has a `kind` (`morph` or `abbrev`), a `lang` (language code), +//! a `canonical` form, and a list of `variants`. At tokenization time +//! (both index-time when seeds are added, and query-time at resolve), +//! every token gets normalized to its canonical form before scoring. +//! +//! Mutations rebuild the IntentIndex (the existing seeds need to be +//! re-tokenized through the new lexical group set). Each mutation lands +//! in the audit log. +//! +//! See `_internal/V0_3_LEXICAL_GROUPS_PLAN.md` for design + history. 
+ +use crate::pipeline::{call_llm, extract_json}; +use crate::state::*; +use axum::{ + extract::{Path, State}, + http::{HeaderMap, StatusCode}, + routing::{get, post}, + Extension, Json, +}; +use microresolve::lexical::{LexicalGroup, LexicalKind}; + +pub fn routes() -> axum::Router { + axum::Router::new() + .route("/api/lexical-groups", get(list).post(add)) + .route( + "/api/lexical-groups/{idx}", + axum::routing::delete(remove).patch(update), + ) + .route("/api/lexical-groups/suggest", post(suggest)) +} + +#[derive(serde::Deserialize)] +pub struct GroupPayload { + pub kind: LexicalKind, + #[serde(default = "default_lang")] + pub lang: String, + pub canonical: String, + pub variants: Vec, +} + +fn default_lang() -> String { + "en".to_string() +} + +fn group_to_json(idx: usize, g: &LexicalGroup) -> serde_json::Value { + serde_json::json!({ + "idx": idx, + "kind": g.kind, + "lang": g.lang, + "canonical": g.canonical, + "variants": g.variants, + }) +} + +pub async fn list( + State(state): State, + headers: HeaderMap, +) -> Result, (StatusCode, String)> { + let app_id = app_id_from_headers(&headers); + let h = state.engine.try_namespace(&app_id).ok_or(( + StatusCode::NOT_FOUND, + format!("namespace '{}' not found", app_id), + ))?; + let groups = h.list_lexical_groups(); + let arr: Vec = groups + .iter() + .enumerate() + .map(|(i, g)| group_to_json(i, g)) + .collect(); + Ok(Json(serde_json::json!({ "lexical_groups": arr }))) +} + +pub async fn add( + State(state): State, + headers: HeaderMap, + Extension(KeyName(kid)): Extension, + Json(req): Json, +) -> Result, (StatusCode, String)> { + let app_id = app_id_from_headers(&headers); + let h = state.engine.try_namespace(&app_id).ok_or(( + StatusCode::NOT_FOUND, + format!("namespace '{}' not found", app_id), + ))?; + + let canonical_audit = req.canonical.clone(); + let variants_audit = req.variants.clone(); + let lang_audit = req.lang.clone(); + let kind_audit = req.kind; + + let group = LexicalGroup { + kind: req.kind, + 
lang: req.lang, + canonical: req.canonical, + variants: req.variants, + }; + let idx = h + .add_lexical_group(group) + .map_err(|e| (StatusCode::BAD_REQUEST, e.to_string()))?; + + audit_mutation( + &state, + &kid, + &app_id, + "lexical_group.add", + serde_json::json!({ + "idx": idx, + "kind": kind_audit, + "lang": lang_audit, + "canonical": canonical_audit, + "variants": variants_audit, + }), + ); + + let _ = h.flush(); + maybe_commit(&state, &app_id); + + Ok(Json(serde_json::json!({ "idx": idx }))) +} + +pub async fn remove( + State(state): State, + headers: HeaderMap, + Extension(KeyName(kid)): Extension, + Path(idx): Path, +) -> Result { + let app_id = app_id_from_headers(&headers); + let h = state.engine.try_namespace(&app_id).ok_or(( + StatusCode::NOT_FOUND, + format!("namespace '{}' not found", app_id), + ))?; + let removed = h + .remove_lexical_group(idx) + .map_err(|e| (StatusCode::NOT_FOUND, e.to_string()))?; + audit_mutation( + &state, + &kid, + &app_id, + "lexical_group.remove", + serde_json::json!({ + "idx": idx, + "kind": removed.kind, + "lang": removed.lang, + "canonical": removed.canonical, + "variants": removed.variants, + }), + ); + let _ = h.flush(); + maybe_commit(&state, &app_id); + Ok(StatusCode::NO_CONTENT) +} + +pub async fn update( + State(state): State, + headers: HeaderMap, + Extension(KeyName(kid)): Extension, + Path(idx): Path, + Json(req): Json, +) -> Result { + let app_id = app_id_from_headers(&headers); + let h = state.engine.try_namespace(&app_id).ok_or(( + StatusCode::NOT_FOUND, + format!("namespace '{}' not found", app_id), + ))?; + let canonical_audit = req.canonical.clone(); + let variants_audit = req.variants.clone(); + let lang_audit = req.lang.clone(); + let kind_audit = req.kind; + let group = LexicalGroup { + kind: req.kind, + lang: req.lang, + canonical: req.canonical, + variants: req.variants, + }; + h.update_lexical_group(idx, group) + .map_err(|e| (StatusCode::BAD_REQUEST, e.to_string()))?; + audit_mutation( + &state, + 
&kid, + &app_id, + "lexical_group.update", + serde_json::json!({ + "idx": idx, + "kind": kind_audit, + "lang": lang_audit, + "canonical": canonical_audit, + "variants": variants_audit, + }), + ); + let _ = h.flush(); + maybe_commit(&state, &app_id); + Ok(StatusCode::NO_CONTENT) +} + +#[derive(serde::Deserialize)] +pub struct SuggestPayload { + /// `morph` or `abbrev` — different prompts target each. + pub kind: LexicalKind, + #[serde(default = "default_lang")] + pub lang: String, +} + +/// Operator-triggered LLM proposal of lexical groups for the namespace's +/// current vocabulary. Returns proposals as JSON; operator approves +/// individually via separate POST /api/lexical-groups calls. +pub async fn suggest( + State(state): State, + headers: HeaderMap, + Json(req): Json, +) -> Result, (StatusCode, String)> { + let app_id = app_id_from_headers(&headers); + let h = state.engine.try_namespace(&app_id).ok_or(( + StatusCode::NOT_FOUND, + format!("namespace '{}' not found", app_id), + ))?; + + // Gather current vocabulary (tokens present in the index) + intent + // descriptions. The LLM uses both to ground proposals in real pack data. 
+ let (vocab, intent_descs): (Vec, Vec<(String, String)>) = h.with_resolver(|r| { + let mut tokens: Vec<&String> = r.index().word_intent.keys().collect(); + tokens.sort(); + let vocab: Vec = tokens.iter().take(400).map(|s| s.to_string()).collect(); + let descs: Vec<(String, String)> = r + .intent_ids() + .into_iter() + .filter_map(|id| { + r.intent(&id) + .map(|info| (id.clone(), info.description.clone())) + }) + .collect(); + (vocab, descs) + }); + + let intent_block: String = intent_descs + .iter() + .map(|(id, d)| { + let short = d.chars().take(140).collect::(); + format!(" - {}: {}", id, short) + }) + .collect::>() + .join("\n"); + + let vocab_str = vocab.join(", "); + let lang = &req.lang; + + let prompt = match req.kind { + LexicalKind::Morph => format!( + "You are extending a per-namespace lexical dictionary for an intent classifier.\n\n\ + Namespace intents:\n{intent_block}\n\n\ + Tokens currently in the index (lowercase): {vocab_str}\n\n\ + Language: {lang}\n\n\ + For language {lang}, identify which of these tokens are inflectional\n\ + variants of the same root word (the lexeme). Group them so the engine\n\ + can normalize variants to a canonical form.\n\n\ + VALID examples (these ARE inflectional variants of one root):\n\ + - {{\"canonical\": \"child\", \"variants\": [\"child\", \"children\"]}}\n\ + - {{\"canonical\": \"predict\", \"variants\": [\"predict\", \"predicts\", \"predicting\"]}}\n\n\ + INVALID examples (these are different words, not inflections):\n\ + - {{\"canonical\": \"act\", \"variants\": [\"act\", \"action\", \"active\"]}} ← \"active\" is NOT an inflection of \"act\"\n\ + - {{\"canonical\": \"police\", \"variants\": [\"police\", \"policy\"]}} ← \"policy\" is NOT inflection of \"police\"\n\n\ + Be conservative. Skip a token if you're unsure.\n\n\ + Output: ONLY a JSON array of groups. 
No preamble, no commentary.\n\ + Format: [{{\"canonical\": \"...\", \"variants\": [\"...\", \"...\"]}}, ...]" + ), + LexicalKind::Abbrev => format!( + "You are identifying domain abbreviations in a namespace's vocabulary.\n\n\ + Namespace intents:\n{intent_block}\n\n\ + Tokens / phrases currently in the index: {vocab_str}\n\n\ + Language: {lang}\n\n\ + Find tokens that are abbreviations or acronyms of longer phrases\n\ + relevant to this namespace. The canonical form is the FULL phrase\n\ + (lowercase); the abbreviation is the shortened variant.\n\n\ + Examples:\n\ + - {{\"canonical\": \"real-time biometric identification\", \"variants\": [\"rbi\"]}}\n\ + - {{\"canonical\": \"non-consensual intimate imagery\", \"variants\": [\"ncii\"]}}\n\ + - {{\"canonical\": \"child sexual abuse material\", \"variants\": [\"csam\"]}}\n\n\ + Skip ambiguous abbreviations (multiple plausible expansions).\n\n\ + Output: ONLY a JSON array of groups. Format same as above." + ), + }; + + let response = call_llm(&state, &prompt, 1500).await?; + let json_str = extract_json(&response); + + // Parse loose JSON — tolerate either bare array or {"groups": [...]}. + let parsed: serde_json::Value = serde_json::from_str(json_str) + .map_err(|e| (StatusCode::BAD_GATEWAY, format!("LLM JSON parse: {}", e)))?; + let arr = parsed + .as_array() + .or_else(|| parsed.get("groups").and_then(|g| g.as_array())) + .cloned() + .unwrap_or_default(); + + // Build proposals tagged with kind + lang. Don't auto-apply — operator + // approves via POST /api/lexical-groups. 
+ let proposals: Vec<serde_json::Value> = arr + .iter() + .filter_map(|g| { + let canonical = g.get("canonical").and_then(|c| c.as_str())?; + let variants = g.get("variants").and_then(|v| v.as_array())?; + let v: Vec<String> = variants + .iter() + .filter_map(|x| x.as_str().map(|s| s.to_lowercase())) + .collect(); + if v.is_empty() { + return None; + } + Some(serde_json::json!({ + "kind": req.kind, + "lang": req.lang, + "canonical": canonical.to_lowercase(), + "variants": v, + })) + }) + .collect(); + + Ok(Json(serde_json::json!({ + "proposals": proposals, + "count": proposals.len(), + }))) +} diff --git a/src/engine.rs b/src/engine.rs index d2e2a09..975435b 100644 --- a/src/engine.rs +++ b/src/engine.rs @@ -528,6 +528,118 @@ impl<'e> NamespaceHandle<'e> { .with_resolver_mut(&self.id, |r| r.rebuild_index()) } + /// List all lexical_groups (morph + abbrev) currently active on this namespace. + pub fn list_lexical_groups(&self) -> Vec<crate::lexical::LexicalGroup> { + self.engine + .with_resolver(&self.id, |r| r.lexical_groups.clone()) + } + + /// Add a new lexical_group. Validates: ≥1 variant (after dedup) and a + /// non-empty canonical form. Triggers full index rebuild because the + /// new group changes how existing seeds tokenize. 
+ pub fn add_lexical_group(&self, group: crate::lexical::LexicalGroup) -> Result<usize, Error> { + let canonical = group.canonical.trim().to_lowercase(); + if canonical.is_empty() { + return Err(Error::Parse( + "lexical_group canonical must not be empty".into(), + )); + } + let mut variants: Vec<String> = group + .variants + .iter() + .map(|v| v.trim().to_lowercase()) + .filter(|v| !v.is_empty()) + .collect(); + if !variants.contains(&canonical) { + variants.push(canonical.clone()); + } + variants.sort(); + variants.dedup(); + if variants.is_empty() { + return Err(Error::Parse( + "lexical_group must have ≥1 non-empty variant".into(), + )); + } + let normalised = crate::lexical::LexicalGroup { + kind: group.kind, + lang: group.lang, + canonical, + variants, + }; + self.engine.with_resolver_mut(&self.id, |r| { + r.lexical_groups.push(normalised); + // Rebuild derived LexicalIndex + re-tokenize all seeds through + // the new group set so existing index entries reflect canonical + // forms. + r.index.lexical = crate::lexical::LexicalIndex::from_groups(&r.lexical_groups); + r.rebuild_index(); + Ok(r.lexical_groups.len() - 1) + })? + } + + /// Remove the lexical_group at the given index. Returns the removed group. + /// Triggers full index rebuild. + pub fn remove_lexical_group(&self, idx: usize) -> Result<crate::lexical::LexicalGroup, Error> { + self.engine.with_resolver_mut(&self.id, |r| { + if idx >= r.lexical_groups.len() { + return Err(Error::Parse(format!( + "lexical_group index {} out of range (len={})", + idx, + r.lexical_groups.len() + ))); + } + let removed = r.lexical_groups.remove(idx); + r.index.lexical = crate::lexical::LexicalIndex::from_groups(&r.lexical_groups); + r.rebuild_index(); + Ok(removed) + })? + } + + /// Replace the lexical_group at the given index. Same validation as + /// `add_lexical_group`. Triggers full index rebuild. 
+ pub fn update_lexical_group( + &self, + idx: usize, + group: crate::lexical::LexicalGroup, + ) -> Result<(), Error> { + let canonical = group.canonical.trim().to_lowercase(); + if canonical.is_empty() { + return Err(Error::Parse( + "lexical_group canonical must not be empty".into(), + )); + } + let mut variants: Vec<String> = group + .variants + .iter() + .map(|v| v.trim().to_lowercase()) + .filter(|v| !v.is_empty()) + .collect(); + if !variants.contains(&canonical) { + variants.push(canonical.clone()); + } + variants.sort(); + variants.dedup(); + let normalised = crate::lexical::LexicalGroup { + kind: group.kind, + lang: group.lang, + canonical, + variants, + }; + self.engine.with_resolver_mut(&self.id, |r| { + if idx >= r.lexical_groups.len() { + return Err(Error::Parse(format!( + "lexical_group index {} out of range (len={})", + idx, + r.lexical_groups.len() + ))); + } + r.lexical_groups[idx] = normalised; + r.index.lexical = crate::lexical::LexicalIndex::from_groups(&r.lexical_groups); + r.rebuild_index(); + Ok(()) + })? + } + /// Lower-level phrase ingestion: tokenizes + indexes the phrase into the scoring index without /// the duplicate-check or stop-word filtering that `add_phrase` applies. Use `add_phrase` /// for user-driven additions; use `index_phrase` only for trusted, pre-validated phrases diff --git a/src/lexical.rs b/src/lexical.rs new file mode 100644 index 0000000..23f93f5 --- /dev/null +++ b/src/lexical.rs @@ -0,0 +1,245 @@ +//! Per-namespace lexical normalization — morph + abbrev. +//! +//! A `LexicalIndex` maps token variants to a canonical form. Both index-time +//! tokenization (when seeds get added) and query-time tokenization (when a +//! user query comes in) apply the same normalization, so equivalent variants +//! are stored and looked up under the same key. +//! +//! Two kinds of equivalence are supported: +//! +//! - **Morph** — inflectional variants of the same lexeme (`child` ⇄ +//! `children`, `predict` ⇄ `predicts` ⇄ `predicting`). 
Lexical fact, +//! context-independent. +//! - **Abbrev** — operator-defined references (`RBI` ⇄ `real-time biometric +//! identification`). Per-namespace shorthand for domain phrases. +//! +//! We deliberately do NOT support synonyms (the `cancel` ≈ `abort` kind). +//! Synonyms are semantic claims that vary by context and were the part of +//! the historical L1 lexical graph that polluted the index across packs. +//! See `_internal/V0_3_LEXICAL_GROUPS_PLAN.md` for the full rationale. +//! +//! Groups live in `_ns.json` (loaded by `resolver_persist`); the LexicalIndex +//! rebuilds from them on every namespace load. + +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; + +#[derive(Serialize, Deserialize, Clone, Copy, Debug, PartialEq, Eq)] +#[serde(rename_all = "lowercase")] +pub enum LexicalKind { + Morph, + Abbrev, +} + +/// A single equivalence group: variants of one canonical form, scoped to a +/// language code. Stored as part of `_ns.json` and approved by the operator +/// (manually or via the LLM-suggest review queue). +#[derive(Serialize, Deserialize, Clone, Debug)] +pub struct LexicalGroup { + pub kind: LexicalKind, + /// Language code ("en", "fr", "de", "zh", ...). Used by the LLM + /// suggester to scope morphology proposals; runtime normalization is + /// language-blind (a token simply matches or doesn't). + pub lang: String, + /// Lowercase canonical form. All variants normalize to this. + pub canonical: String, + /// Lowercase variant tokens. Should include the canonical itself. + pub variants: Vec<String>, +} + +/// Built from a `Vec<LexicalGroup>` at namespace-load time. Provides O(1) +/// variant → canonical lookup. +#[derive(Clone, Debug, Default)] +pub struct LexicalIndex { + /// variant_lowercase → canonical_lowercase + by_variant: HashMap<String, String>, +} + +impl LexicalIndex { + pub fn new() -> Self { + Self::default() + } + + /// Build a normalization index from a list of approved groups. 
Later + /// groups that conflict with earlier ones (same variant, different + /// canonical) are silently dropped — load-time validation is the + /// caller's job. + pub fn from_groups(groups: &[LexicalGroup]) -> Self { + let mut by_variant: HashMap<String, String> = HashMap::new(); + for g in groups { + let canonical = g.canonical.to_lowercase(); + for v in &g.variants { + let variant = v.to_lowercase(); + if variant.is_empty() { + continue; + } + by_variant + .entry(variant) + .or_insert_with(|| canonical.clone()); + } + // Ensure the canonical itself maps to itself. + by_variant.entry(canonical.clone()).or_insert(canonical); + } + Self { by_variant } + } + + /// Returns true if this index has any groups loaded. Callers can + /// short-circuit normalization when empty (the common case for packs + /// without authored groups). + pub fn is_empty(&self) -> bool { + self.by_variant.is_empty() + } + + /// Map a single token to its canonical form. Returns the original token + /// (as a borrowed slice when no rewrite happens, or an owned String + /// when it does) — callers can collect into `Vec<String>` cheaply. + pub fn normalize<'a>(&'a self, token: &'a str) -> &'a str { + if self.by_variant.is_empty() { + return token; + } + // Tokens may carry the `not_` negation prefix from the tokenizer. + // Normalize the base form and re-prefix if needed. + if let Some(base) = token.strip_prefix("not_") { + self.by_variant + .get(base) + .map(|s| s.as_str()) + .unwrap_or(base) + } else { + self.by_variant + .get(token) + .map(|s| s.as_str()) + .unwrap_or(token) + } + } + + /// Rewrite each token in `tokens` to its canonical form. Preserves the + /// `not_` negation prefix when a normalized form is found for the base. 
+ pub fn normalize_in_place(&self, tokens: &mut [String]) { + if self.by_variant.is_empty() { + return; + } + for tok in tokens.iter_mut() { + if let Some(base) = tok.strip_prefix("not_") { + if let Some(canonical) = self.by_variant.get(base) { + *tok = format!("not_{}", canonical); + } + } else if let Some(canonical) = self.by_variant.get(tok.as_str()) { + *tok = canonical.clone(); + } + } + } + + /// Diagnostic helper: how many distinct variants are registered. + pub fn variant_count(&self) -> usize { + self.by_variant.len() + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn morph_group(canonical: &str, variants: &[&str]) -> LexicalGroup { + LexicalGroup { + kind: LexicalKind::Morph, + lang: "en".into(), + canonical: canonical.into(), + variants: variants.iter().map(|s| s.to_string()).collect(), + } + } + + #[test] + fn empty_index_is_identity() { + let idx = LexicalIndex::new(); + assert!(idx.is_empty()); + let mut tokens = vec!["foo".to_string(), "bar".to_string()]; + idx.normalize_in_place(&mut tokens); + assert_eq!(tokens, vec!["foo", "bar"]); + } + + #[test] + fn morph_normalizes_plural_to_singular() { + let groups = vec![morph_group("child", &["child", "children", "child's"])]; + let idx = LexicalIndex::from_groups(&groups); + assert_eq!(idx.normalize("children"), "child"); + assert_eq!(idx.normalize("child"), "child"); + assert_eq!(idx.normalize("child's"), "child"); + assert_eq!(idx.normalize("dog"), "dog"); + } + + #[test] + fn normalize_in_place_rewrites() { + let groups = vec![ + morph_group("child", &["child", "children"]), + morph_group("warrant", &["warrant", "warrants"]), + ]; + let idx = LexicalIndex::from_groups(&groups); + let mut tokens = vec![ + "missing".to_string(), + "children".to_string(), + "outstanding".to_string(), + "warrants".to_string(), + ]; + idx.normalize_in_place(&mut tokens); + assert_eq!(tokens, vec!["missing", "child", "outstanding", "warrant"]); + } + + #[test] + fn negation_prefix_survives_normalization() { + let 
groups = vec![morph_group("child", &["child", "children"])]; + let idx = LexicalIndex::from_groups(&groups); + let mut tokens = vec!["not_children".to_string(), "not_dog".to_string()]; + idx.normalize_in_place(&mut tokens); + assert_eq!(tokens, vec!["not_child", "not_dog"]); + } + + #[test] + fn abbrev_kind_round_trips() { + // Same machinery serves both kinds. Single-token abbreviations + // (RBI → "real-time biometric identification") work; multi-token + // abbreviations would need to expand at index/query time before + // tokenization, which is a separate feature (out of scope for + // Phase 1). + let groups = vec![LexicalGroup { + kind: LexicalKind::Abbrev, + lang: "en".into(), + canonical: "rbi".into(), + variants: vec!["rbi".into()], + }]; + let idx = LexicalIndex::from_groups(&groups); + assert_eq!(idx.normalize("rbi"), "rbi"); + } + + #[test] + fn first_canonical_wins_on_conflict() { + // Author error: same variant in two groups. First registration + // wins; second is dropped. Logged elsewhere as a warning. + let groups = vec![ + morph_group("child", &["child", "children"]), + morph_group("kid", &["kid", "children"]), // children already taken + ]; + let idx = LexicalIndex::from_groups(&groups); + assert_eq!(idx.normalize("children"), "child"); + } + + #[test] + fn variant_count_reflects_loaded_groups() { + let groups = vec![morph_group("child", &["child", "children", "child's"])]; + let idx = LexicalIndex::from_groups(&groups); + // 3 variants + canonical (child is already in variants, so 3 total). 
+ assert_eq!(idx.variant_count(), 3); + } + + #[test] + fn case_insensitivity_at_load_time() { + let groups = vec![LexicalGroup { + kind: LexicalKind::Morph, + lang: "en".into(), + canonical: "CHILD".into(), + variants: vec!["Child".into(), "CHILDREN".into()], + }]; + let idx = LexicalIndex::from_groups(&groups); + assert_eq!(idx.normalize("child"), "child"); + assert_eq!(idx.normalize("children"), "child"); + } +} diff --git a/src/lib.rs b/src/lib.rs index 86224a2..cee0052 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -22,6 +22,8 @@ // of rustdoc + IDE autocomplete. Library users go through `MicroResolve` + // `NamespaceHandle`; these modules are not part of the semver surface. #[doc(hidden)] +pub mod lexical; +#[doc(hidden)] pub mod phrase; #[doc(hidden)] pub mod scoring; @@ -47,6 +49,7 @@ mod engine; pub use engine::{ Band, Disposition, IntentMatch, MicroResolve, NamespaceHandle, ResolveResult, ResolveTrace, }; +pub use lexical::{LexicalGroup, LexicalKind}; /// Default routing threshold (cascade fallback). pub const DEFAULT_THRESHOLD: f32 = 1.0; @@ -130,6 +133,10 @@ pub struct Resolver { /// this namespace. Persisted in `_ns.json`. Use `rebuild_index()` + clear /// to undo. Rail 2 of three: visible action, reversible, bounded. negative_training_log: Vec, + /// Per-namespace morph + abbrev groups (the source of truth). The + /// derived `IntentIndex.lexical` lookup is rebuilt from this Vec on + /// every mutation. Persisted in `_ns.json`. + pub(crate) lexical_groups: Vec, /// Delta-sync oplog: bounded ring of (version, Op) pairs. Not persisted /// via serde — saved/loaded as `_oplog.json` alongside `_index.json`. 
#[doc(hidden)] diff --git a/src/resolver_core.rs b/src/resolver_core.rs index 936f8d6..e28567a 100644 --- a/src/resolver_core.rs +++ b/src/resolver_core.rs @@ -24,6 +24,7 @@ impl Resolver { namespace_default_min_voting_tokens: None, domain_descriptions: HashMap::new(), negative_training_log: Vec::new(), + lexical_groups: Vec::new(), oplog: std::collections::VecDeque::new(), } } @@ -109,6 +110,7 @@ impl Resolver { namespace_default_min_voting_tokens: None, domain_descriptions: HashMap::new(), negative_training_log: Vec::new(), + lexical_groups: Vec::new(), oplog: std::collections::VecDeque::new(), }; @@ -166,8 +168,9 @@ impl Resolver { let token_lists: Vec> = raw_queries .iter() .map(|q| { - crate::tokenizer::tokenize(q) - .into_iter() + let mut toks = crate::tokenizer::tokenize(q); + self.index.lexical.normalize_in_place(&mut toks); + toks.into_iter() .map(|t| { if let Some(stripped) = t.strip_prefix("not_") { stripped.to_string() @@ -228,7 +231,8 @@ impl Resolver { } pub(crate) fn index_phrase_no_rebuild(&mut self, intent_id: &str, phrase: &str) { - let words = crate::tokenizer::tokenize(phrase); + let mut words = crate::tokenizer::tokenize(phrase); + self.index.lexical.normalize_in_place(&mut words); let word_refs: Vec<&str> = words.iter().map(|s| s.as_str()).collect(); if !word_refs.is_empty() { self.index.learn_phrase(&word_refs, intent_id); @@ -344,7 +348,8 @@ impl Resolver { for (intent_id, phrases) in missed_phrases { for phrase in phrases { // Snapshot before indexing. - let words_pre = crate::tokenizer::tokenize(phrase); + let mut words_pre = crate::tokenizer::tokenize(phrase); + self.index.lexical.normalize_in_place(&mut words_pre); let snap_pairs: Vec<(String, String)> = words_pre .iter() .map(|w| (w.clone(), intent_id.clone())) @@ -383,7 +388,8 @@ impl Resolver { // 2. Learn LLM-extracted query spans as intent-bearing words. 
for (intent_id, span_text) in spans_to_learn { - let span_words: Vec<String> = crate::tokenizer::tokenize(span_text); + let mut span_words: Vec<String> = crate::tokenizer::tokenize(span_text); + self.index.lexical.normalize_in_place(&mut span_words); let snap_pairs: Vec<(String, String)> = span_words .iter() .map(|w| (w.as_str(), intent_id.as_str())) @@ -445,7 +451,8 @@ impl Resolver { return; } - let tokens = crate::tokenizer::tokenize(query); + let mut tokens = crate::tokenizer::tokenize(query); + self.index.lexical.normalize_in_place(&mut tokens); let scored_ids: FxHashSet<&str> = scored.iter().map(|(id, _)| id.as_str()).collect(); // For each token, count it toward an intent only if that intent is the diff --git a/src/resolver_intents.rs b/src/resolver_intents.rs index 706542d..72142e0 100644 --- a/src/resolver_intents.rs +++ b/src/resolver_intents.rs @@ -227,8 +227,10 @@ impl Resolver { // Index phrase into L2 atomically. self.index_phrase(intent_id, seed); - // Collect weight changes for all tokens in this phrase. - let words = crate::tokenizer::tokenize(seed); + // Collect weight changes for all tokens in this phrase. Normalize so + // we look up the same canonical forms `index_phrase` just stored. 
+ let mut words = crate::tokenizer::tokenize(seed); + self.index.lexical.normalize_in_place(&mut words); let mut changes: Vec<(String, String, f32)> = Vec::new(); for word in &words { if let Some(w) = self.index.get_weight(word, intent_id) { diff --git a/src/resolver_learning.rs b/src/resolver_learning.rs index c5ab784..e8084c5 100644 --- a/src/resolver_learning.rs +++ b/src/resolver_learning.rs @@ -66,7 +66,8 @@ impl Resolver { phrase: query.to_string(), lang: lang.to_string(), }); - let words = crate::tokenizer::tokenize(query); + let mut words = crate::tokenizer::tokenize(query); + self.index.lexical.normalize_in_place(&mut words); let mut changes: Vec<(String, String, f32)> = Vec::new(); for word in &words { if let Some(w) = self.index.get_weight(word, correct_intent) { diff --git a/src/resolver_persist.rs b/src/resolver_persist.rs index cbf8a2b..d94427b 100644 --- a/src/resolver_persist.rs +++ b/src/resolver_persist.rs @@ -128,6 +128,27 @@ impl Resolver { // enough to make rebuild user-visible, add a content-hashed cache // at that point. + // Load lexical_groups from _ns.json BEFORE seeds get indexed below. + // The LexicalIndex must be populated so seed tokenization picks up + // canonical forms (otherwise variants get stored separately, then + // queries after lexical lookup find nothing). + // + // Source of truth: Resolver.lexical_groups (persisted in _ns.json). + // Derived: IntentIndex.lexical (HashMap rebuilt from groups on every + // mutation). Always rebuild together — never let them drift. 
+ if let Ok(json) = std::fs::read_to_string(path.join("_ns.json")) { + if let Ok(val) = serde_json::from_str::(&json) { + if let Some(groups) = val.get("lexical_groups").and_then(|g| g.as_array()) { + let parsed: Vec = groups + .iter() + .filter_map(|g| serde_json::from_value(g.clone()).ok()) + .collect(); + router.index.lexical = crate::lexical::LexicalIndex::from_groups(&parsed); + router.lexical_groups = parsed; + } + } + } + // Propagate namespace-level voting-gate default to the live index. // _ns.json is the source of truth; _index.json's serialized field // (if any) gets overwritten by the namespace setting. @@ -224,17 +245,36 @@ impl Resolver { crate::Error::Persistence(format!("cannot create {}: {}", path.display(), e)) })?; - // Namespace metadata - let mut ns_meta = serde_json::json!({ - "name": self.namespace_name, - "description": self.namespace_description, - }); + // Namespace metadata. Preserve any pack-author fields the engine + // doesn't model directly (compliance_frameworks, policy_overrides, + // anything else). Read the existing _ns.json if present, update + // only engine-managed fields, write back. + let mut ns_meta: serde_json::Value = std::fs::read_to_string(path.join("_ns.json")) + .ok() + .and_then(|s| serde_json::from_str(&s).ok()) + .unwrap_or_else(|| serde_json::json!({})); + + ns_meta["name"] = serde_json::json!(self.namespace_name); + ns_meta["description"] = serde_json::json!(self.namespace_description); if let Some(t) = self.namespace_default_threshold { ns_meta["default_threshold"] = serde_json::json!(t); } if let Some(v) = self.namespace_default_min_voting_tokens { ns_meta["default_min_voting_tokens"] = serde_json::json!(v); } + // Persist lexical_groups (source of truth on disk; LexicalIndex is + // rebuilt from this on every load). 
+ if !self.lexical_groups.is_empty() { + ns_meta["lexical_groups"] = serde_json::to_value(&self.lexical_groups) + .unwrap_or_else(|_| serde_json::json!([])); + } else if ns_meta.get("lexical_groups").is_some() { + // Operator removed all groups → drop the field instead of writing [] + // so the file stays clean. + if let Some(obj) = ns_meta.as_object_mut() { + obj.remove("lexical_groups"); + } + } + std::fs::write( path.join("_ns.json"), serde_json::to_string_pretty(&ns_meta).unwrap_or_default(), diff --git a/src/scoring.rs b/src/scoring.rs index 460c59c..9674906 100644 --- a/src/scoring.rs +++ b/src/scoring.rs @@ -87,6 +87,12 @@ pub struct IntentIndex { /// supporting evidence. #[serde(default)] pub min_voting_tokens: u32, + + /// Per-namespace lexical normalization (morph + abbrev). Built fresh + /// from `_ns.json` on namespace load; not serialized with the index + /// (the source of truth is the lexical_groups list in `_ns.json`). + #[serde(skip)] + pub lexical: crate::lexical::LexicalIndex, } impl IntentIndex { @@ -387,7 +393,8 @@ impl IntentIndex { std::borrow::Cow::Borrowed(normalized) }; - let tokens = crate::tokenizer::tokenize(&query_for_tokenize); + let mut tokens = crate::tokenizer::tokenize(&query_for_tokenize); + self.lexical.normalize_in_place(&mut tokens); let mut scores: FxHashMap = FxHashMap::default(); let mut has_negation = cjk_negated; // Voting-token tracking: distinct (intent, base_token) pairs that @@ -487,7 +494,8 @@ impl IntentIndex { std::borrow::Cow::Borrowed(normalized) }; - let all_tokens: Vec = crate::tokenizer::tokenize(&query_for_tokenize); + let mut all_tokens: Vec = crate::tokenizer::tokenize(&query_for_tokenize); + self.lexical.normalize_in_place(&mut all_tokens); let has_negation = cjk_negated || all_tokens.iter().any(|t| t.starts_with("not_")); let mut remaining: Vec = all_tokens; @@ -728,7 +736,8 @@ impl IntentIndex { return; } - let tokens = crate::tokenizer::tokenize(query); + let mut tokens = 
crate::tokenizer::tokenize(query); + self.lexical.normalize_in_place(&mut tokens); let confirmed_ids: FxHashSet<&str> = confirmed.iter().map(|(id, _)| id.as_str()).collect(); let mut unique_count: FxHashMap<&str, usize> = FxHashMap::default(); diff --git a/ui/src/App.tsx b/ui/src/App.tsx index d67f4cf..6f77a21 100644 --- a/ui/src/App.tsx +++ b/ui/src/App.tsx @@ -6,6 +6,7 @@ import RouterPage from '@/pages/RouterPage'; import SimulatePage from '@/pages/SimulatePage'; import ReviewPage from '@/pages/ReviewPage'; import IntentsPage from '@/pages/IntentsPage'; +import LexicalGroupsPage from '@/pages/LexicalGroupsPage'; import SettingsPage from '@/pages/SettingsPage'; import NamespacesPage from '@/pages/NamespacesPage'; import ModelsPage from '@/pages/ModelsPage'; @@ -123,6 +124,7 @@ export default function App() { } /> } /> } /> + } /> } /> } /> } /> diff --git a/ui/src/api/client.ts b/ui/src/api/client.ts index ac33f6c..01c36f4 100644 --- a/ui/src/api/client.ts +++ b/ui/src/api/client.ts @@ -214,6 +214,23 @@ export interface IntentInfo { schema?: Record; } +export type LexicalKind = 'morph' | 'abbrev'; + +export interface LexicalGroup { + idx?: number; + kind: LexicalKind; + lang: string; + canonical: string; + variants: string[]; +} + +export interface LexicalSuggestion { + kind: LexicalKind; + lang: string; + canonical: string; + variants: string[]; +} + export interface ReviewAnalysis { correct: string[]; false_positives: { id: string; reason: string }[]; @@ -556,6 +573,18 @@ export const api = { }).then(r => r.json() as Promise<{ remote_url: string | null; auto_push: boolean; has_repo: boolean }>), gitPushNow: () => post<{ ok: boolean; error: string | null }>('/git/push', {}), + // Lexical groups (per-namespace morph + abbrev) + listLexicalGroups: () => + get<{ lexical_groups: LexicalGroup[] }>('/lexical-groups'), + addLexicalGroup: (group: Omit) => + post<{ idx: number }>('/lexical-groups', group), + removeLexicalGroup: (idx: number) => + 
del(`/lexical-groups/${idx}`), + updateLexicalGroup: (idx: number, group: Omit) => + patch(`/lexical-groups/${idx}`, group), + suggestLexicalGroups: (kind: LexicalKind, lang = 'en') => + post<{ proposals: LexicalSuggestion[]; count: number }>('/lexical-groups/suggest', { kind, lang }), + // Spec Import importSpec: (spec: string) => post<{ diff --git a/ui/src/components/Layout.tsx b/ui/src/components/Layout.tsx index 3d86f95..05c970e 100644 --- a/ui/src/components/Layout.tsx +++ b/ui/src/components/Layout.tsx @@ -180,6 +180,8 @@ export default function Layout() { items: [ { to: '/intents', label: 'Intents', icon: '◆', hint: 'Manage intents, training phrases, metadata' }, + { to: '/lexical', label: 'Lexicon', icon: '⌥', + hint: 'Per-namespace morph + abbrev normalization' }, ], }, { diff --git a/ui/src/pages/LexicalGroupsPage.tsx b/ui/src/pages/LexicalGroupsPage.tsx new file mode 100644 index 0000000..53df3c6 --- /dev/null +++ b/ui/src/pages/LexicalGroupsPage.tsx @@ -0,0 +1,310 @@ +import { useState, useEffect, useMemo } from 'react'; +import { useAppStore } from '@/store'; +import { api } from '@/api/client'; +import type { LexicalGroup, LexicalSuggestion } from '@/api/client'; +import Page from '@/components/Page'; + +type Tab = 'morph' | 'abbrev'; + +export default function LexicalGroupsPage() { + const { settings } = useAppStore(); + const ns = settings.selectedNamespaceId; + const enabledLangs = settings.languages.length > 0 ? 
settings.languages : ['en']; + + const [tab, setTab] = useState<Tab>('morph'); + const [groups, setGroups] = useState<LexicalGroup[]>([]); + const [loading, setLoading] = useState(true); + const [err, setErr] = useState<string | null>(null); + + const [draftLang, setDraftLang] = useState(enabledLangs[0] || 'en'); + const [draftCanonical, setDraftCanonical] = useState(''); + const [draftVariants, setDraftVariants] = useState(''); + const [adding, setAdding] = useState(false); + + const [suggesting, setSuggesting] = useState(false); + const [proposals, setProposals] = useState<LexicalSuggestion[] | null>(null); + const [proposalLang, setProposalLang] = useState('en'); + + const reload = async () => { + setLoading(true); + try { + const r = await api.listLexicalGroups(); + setGroups(r.lexical_groups); + setErr(null); + } catch (e) { + setErr(String(e)); + } finally { + setLoading(false); + } + }; + + useEffect(() => { reload(); /* eslint-disable-next-line */ }, [ns]); + + const filtered = useMemo( + () => groups.filter(g => g.kind === tab), + [groups, tab], + ); + + const add = async () => { + const canonical = draftCanonical.trim().toLowerCase(); + const variants = draftVariants + .split(',') + .map(v => v.trim().toLowerCase()) + .filter(Boolean); + if (!canonical || variants.length === 0) { + setErr('Canonical and at least one variant required.'); + return; + } + setAdding(true); + setErr(null); + try { + await api.addLexicalGroup({ + kind: tab, + lang: draftLang, + canonical, + variants, + }); + setDraftCanonical(''); + setDraftVariants(''); + reload(); + } catch (e) { + setErr(String(e)); + } finally { + setAdding(false); + } + }; + + const remove = async (idx: number) => { + try { + await api.removeLexicalGroup(idx); + reload(); + } catch (e) { + setErr(String(e)); + } + }; + + const suggest = async () => { + setSuggesting(true); + setErr(null); + setProposals(null); + try { + const r = await api.suggestLexicalGroups(tab, proposalLang); + setProposals(r.proposals); + } catch (e) { + setErr(String(e)); + } finally { + 
setSuggesting(false); + } + }; + + const approveProposal = async (p: LexicalSuggestion) => { + try { + await api.addLexicalGroup({ + kind: p.kind, + lang: p.lang, + canonical: p.canonical, + variants: p.variants, + }); + setProposals(prev => prev ? prev.filter(x => x !== p) : null); + reload(); + } catch (e) { + setErr(String(e)); + } + }; + + const rejectProposal = (p: LexicalSuggestion) => { + setProposals(prev => prev ? prev.filter(x => x !== p) : null); + }; + + const heading = tab === 'morph' ? 'Inflection groups' : 'Abbreviations'; + const description = tab === 'morph' + ? 'Group inflectional variants of a word (child/children, predict/predicts/predicting). Variants get normalized to the canonical at index time and query time.' + : 'Map short forms to their full phrase (rbi → real-time biometric identification). Abbreviations get expanded to the canonical at index time and query time.'; + + return ( + per-namespace normalization for {ns}} + size="md" + > +
+ +
+
Per-namespace lexical normalization
+

+ Two distinct kinds of mapping the engine applies during tokenization: + morph (inflection variants of one root word) and + abbrev (short forms of a longer phrase). + Both are stored per-namespace, persist in _ns.json, and rebuild the index on every change. +

+

+ These are NOT synonyms: synonym expansion causes pollution. Only group forms that share the same surface meaning.

+
+ + {/* Tabs */} +
+ {(['morph', 'abbrev'] as const).map(t => ( + + ))} +
+ + {err && ( +
+ {err} +
+ )} + + {/* Add form */} +
+
{heading}
+
{description}
+ +
+ + setDraftCanonical(e.target.value)} + placeholder={tab === 'morph' ? 'canonical (e.g. child)' : 'full phrase (e.g. real-time biometric identification)'} + className="col-span-4 bg-zinc-800 border border-zinc-700 rounded px-2 py-1.5 text-xs text-zinc-100 placeholder-zinc-600 focus:outline-none focus:border-emerald-500" + /> + setDraftVariants(e.target.value)} + onKeyDown={e => e.key === 'Enter' && add()} + placeholder={tab === 'morph' ? 'variants comma-separated (child, children)' : 'variants comma-separated (rbi)'} + className="col-span-5 bg-zinc-800 border border-zinc-700 rounded px-2 py-1.5 text-xs text-zinc-100 placeholder-zinc-600 focus:outline-none focus:border-emerald-500" + /> + +
+
+ + {/* LLM Suggest */} +
+
+
+
LLM suggester
+
+ Operator-triggered. Reads namespace vocabulary, proposes {tab === 'morph' ? 'inflection groups' : 'abbreviation expansions'}. Nothing applies until you approve each one. +
+
+
+ + +
+
+ + {proposals && proposals.length === 0 && ( +
+ No proposals returned. Try again or add manually above. +
+ )} + + {proposals && proposals.length > 0 && ( +
+ {proposals.map((p, i) => ( +
+ {p.lang} + {p.canonical} + + + {p.variants.join(', ')} + + + +
+ ))} +
+ )} +
+ + {/* Existing groups */} +
+
+ Existing — {filtered.length} +
+ {loading ? ( +
Loading…
+ ) : filtered.length === 0 ? ( +
+ No {tab === 'morph' ? 'inflection groups' : 'abbreviations'} yet. +
+ ) : ( +
+ {filtered.map(g => ( +
+ {g.lang} + {g.canonical} + + + {g.variants.join(', ')} + + +
+ ))} +
+ )} +
+ +
+
+ ); +}
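The tests in `src/lexical.rs` pin down the lookup behaviour (lowercasing at load time, first registration winning on conflict, the `not_` prefix surviving normalization), but the `from_groups` constructor itself falls outside this excerpt. The following is a minimal sketch of a lookup that would satisfy those tests, not the crate's actual implementation. `Group` and `SketchIndex` are hypothetical stand-ins for `LexicalGroup` and `LexicalIndex`, the `kind` and `lang` fields are omitted, and registering the canonical form as its own variant is an assumption. Unlike the diff, where the `not_` handling lives in `normalize_in_place`, the sketch folds it into a single `normalize` method for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `LexicalGroup` (kind and lang omitted).
struct Group {
    canonical: String,
    variants: Vec<String>,
}

/// Sketch of a variant -> canonical lookup table.
#[derive(Default)]
struct SketchIndex {
    by_variant: HashMap<String, String>,
}

impl SketchIndex {
    fn from_groups(groups: &[Group]) -> Self {
        let mut by_variant = HashMap::new();
        for g in groups {
            // Lowercase at load time so lookups on tokenizer output match.
            let canonical = g.canonical.to_lowercase();
            // Assumption: the canonical maps to itself even when not listed.
            for v in std::iter::once(&g.canonical).chain(g.variants.iter()) {
                // First registration wins; a later group cannot steal a variant.
                by_variant
                    .entry(v.to_lowercase())
                    .or_insert_with(|| canonical.clone());
            }
        }
        Self { by_variant }
    }

    fn normalize(&self, token: &str) -> String {
        // Keep the `not_` negation marker and normalize only the base token.
        if let Some(base) = token.strip_prefix("not_") {
            return match self.by_variant.get(base) {
                Some(c) => format!("not_{c}"),
                None => token.to_string(),
            };
        }
        self.by_variant
            .get(token)
            .cloned()
            .unwrap_or_else(|| token.to_string())
    }
}
```

Using `entry(..).or_insert_with(..)` is one straightforward way to get first-wins semantics. The tests note that conflicts are logged elsewhere as a warning; the sketch leaves that out.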