
Explorer FTS Track 5: GO/NO-GO decision gate #172

@rdhyee

Description

Updated 2026-05-08 (rounds 1 + 2 per Codex review on #165). Round 1 added quality-gate cells. Round 2 sharpened: (a) "non-empty" is necessary but not sufficient — concept-only and stopword-heavy checks now require top-K relevance, not just any results; (b) added pathological-behavior hard-fails covering tokenizer parity, all-stopword queries, duplicate terms, edge-length tokens, missing display-join rows, filter composition; (c) NO-GO framing makes hosted-search a permanent contingency for either v1 failure or future v2+ quality requirements.

Sub-issue of #165. Depends on #171 (browser query prototype + benchmark data).

Goal

Mechanical decision gate. No budget renegotiation here — the budgets in #169's SEARCH_INDEX_V1.md are the contract.

Decision criterion

Does the prototype meet ALL of the cells below — every latency/bytes target, every quality target, and every hard-fail check?

A fast-but-mediocre search that fails any quality cell or any hard-fail check is NOT a GO, regardless of how it performs on the latency/bytes table.

Performance gates (hard)

| metric | contract | prototype | pass? |
| --- | --- | --- | --- |
| cold first search (P50) | ≤ 2 s | (fill) | |
| warm repeat-same-query search | ≤ 500 ms | (fill) | |
| warm new-query-after-warm-up search | ≤ 500 ms | (fill) | |
| filter-composed cold search | ≤ 3 s | (fill) | |
| bytes transferred cold | ≤ 5 MB | (fill) | |
| bytes transferred warm | ≤ 1 MB | (fill) | |
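Once #171's measurements land, filling the pass column can be mechanical. A minimal sketch of the budget check — the metric keys and the `gate` helper are illustrative, not part of any existing harness; the budget values mirror the contract column above:

```python
# Contract budgets from SEARCH_INDEX_V1.md (#169); keys are illustrative.
BUDGETS = {
    "cold_first_search_p50_s": 2.0,
    "warm_repeat_search_s": 0.5,
    "warm_new_query_search_s": 0.5,
    "filter_composed_cold_s": 3.0,
    "bytes_cold_mb": 5.0,
    "bytes_warm_mb": 1.0,
}

def gate(measured: dict) -> dict:
    """Return the metrics that blow their budget; empty dict = all cells pass."""
    return {k: v for k, v in measured.items() if v > BUDGETS[k]}
```

An empty return from `gate` is the "all performance cells pass" condition in the GO outcome below; any non-empty return is a NO-GO with the failed-cell data already in hand.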

Quality gates (hard, not advisory)

| metric | contract | prototype | pass? |
| --- | --- | --- | --- |
| top-3 overlap vs hand-labeled set | ≥ TBD% | (fill) | |
| top-10 overlap vs hand-labeled set | ≥ TBD% | (fill) | |
| top-10 overlap vs DuckDB FTS local oracle | ≥ TBD% | (fill) | |
| concept-only top-3 relevance: each of `ceramic`, `bone`, `mammal` (+ 1-2 more) returns hand-labeled known-good PIDs | ≥ 2 of 3 each | (fill) | |
| stopword-heavy near-equivalence: top-10 Jaccard between `pottery from Cyprus` and `pottery Cyprus` (stopword-stripped form) | ≥ 0.8 | (fill) | |

Numeric thresholds get filled in once #167 baseline + #171 prototype + DuckDB FTS oracle numbers land, so we know what "beats ILIKE" and "approaches BM25 oracle" mean on the canonical query set.
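The overlap and Jaccard cells above reduce to simple set arithmetic once result lists exist. A sketch of the two metrics — function names are illustrative, not from the benchmark harness:

```python
def top_k_overlap(labeled: list, results: list, k: int) -> float:
    """Fraction of the hand-labeled top-k that appears in the prototype's top-k."""
    labeled_set, result_set = set(labeled[:k]), set(results[:k])
    return len(labeled_set & result_set) / max(len(labeled_set), 1)

def top_k_jaccard(a: list, b: list, k: int) -> float:
    """Jaccard similarity of two top-k result-ID sets (order-insensitive)."""
    sa, sb = set(a[:k]), set(b[:k])
    if not sa and not sb:
        return 1.0  # two empty result sets count as identical
    return len(sa & sb) / len(sa | sb)
```

`top_k_overlap` backs the three overlap rows; `top_k_jaccard` backs the stopword near-equivalence row (`pottery from Cyprus` vs `pottery Cyprus` at k=10).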

Hard-fail checks (any single failure = NO-GO)

Semantics

| check | pass? |
| --- | --- |
| Concept-only queries (`ceramic`, `bone`, `mammal`) all return non-empty results | |
| Concept-only queries hit the top-3 relevance bar in the quality table above | |
| Stopword-heavy queries (`pottery from Cyprus`) return non-empty results | |
| Stopword-heavy queries hit the near-equivalence bar in the quality table above | |
| Diacritic queries (`Çatalhöyük`) match the diacritic-stripped index | |
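For the diacritic check, the query side must strip marks the same way the index build did. A sketch assuming the stripping is Unicode-decomposition based (the actual normalization rules belong to #169's substrate spec):

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # NFKD decomposition splits base characters from combining marks;
    # dropping the combining marks leaves the base form the index stores.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

If the query path applies this and the index was built with the same rule, `Çatalhöyük` and `Catalhoyuk` hit the same postings; any asymmetry here is exactly the hard-fail this row catches.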

Tokenizer + query parsing

| check | pass? |
| --- | --- |
| Tokenizer parity: Python and JS produce identical token sequences for every term in the curated benchmark (not just the regression set) | |
| All-stopword query (`a the of`) yields a controlled empty state with helpful copy, not an error or a full-corpus dump | |
| Duplicate terms (`pottery pottery cyprus`) produce the same top-K result identity as `pottery cyprus`, within ranking-order tolerance | |
| Empty / 1-char / very-long token queries: do not fetch broad shards; return an empty or error-with-copy state without long stalls | |
| Wildcard literals (`%`, `_`) tokenize without errors | |
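The parity row is checkable by dumping the JS tokenizer's output for every benchmark term to JSON and diffing it against the Python side. A sketch of the check's shape — the stopword list and tokenizer here are illustrative stand-ins for the shared implementation, not the real rules:

```python
import re

# Illustrative stopword list; the real list lives with the shared tokenizer.
STOPWORDS = {"a", "an", "and", "from", "of", "the"}

def tokenize(text: str) -> list:
    # Illustrative: lowercase, split on non-alphanumerics, drop stopwords.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower())
            if t and t not in STOPWORDS]

def parity_failures(terms: list, js_tokens_by_term: dict) -> list:
    """Terms whose Python token sequence differs from the JS-produced one.

    js_tokens_by_term would be loaded from a JSON dump written by the JS side.
    """
    return [t for t in terms if tokenize(t) != js_tokens_by_term.get(t)]
```

The same tokenizer is what makes the all-stopword row testable: `a the of` tokenizes to an empty list, and the UI's job is to turn that empty list into the controlled empty state rather than an error or a full-corpus fetch.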

Display + composition

| check | pass? |
| --- | --- |
| Missing display-join rows: a substrate hit whose `pid` has no row in `samples_map_lite` does not crash and does not silently drop a top hit (either show with a placeholder or document as a known limit) | |
| Filter composition matches a labeled expectation, in one of two modes per (query, filter) pair, tested on at least 3 distinct pairs | |

The two filter-composition modes: (a) the pair has a hand-labeled expected filtered top-K (in `tests/search_benchmark.json`), and the substrate's filtered top-K must match it; or (b) the filter is chosen so that ALL hand-labeled unfiltered top-K results satisfy it (e.g., a source filter whose set covers every top-K result's source), and the filtered top-K must then equal the unfiltered top-K. The earlier "implicitly satisfies" wording was too loose: a top result that doesn't satisfy the filter legitimately drops out, so a raw top-K change is not necessarily a bug; the invariant has to be tied to a labeled expectation.
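The two filter-composition modes can be driven from the benchmark file mechanically. A sketch — the pair field names are illustrative, not the real `tests/search_benchmark.json` schema:

```python
def check_filter_composition(pair: dict, search) -> bool:
    """Evaluate one (query, filter) pair in one of the two labeled modes.

    pair   -- dict mirroring a benchmark entry (field names illustrative)
    search -- callable(query, filter_or_None) -> ordered list of result PIDs
    """
    k = pair.get("k", 10)
    filtered = search(pair["query"], pair["filter"])
    if "expected_filtered_top_k" in pair:
        # mode (a): compare against the hand-labeled filtered top-K
        return filtered[:k] == pair["expected_filtered_top_k"][:k]
    # mode (b): the filter covers every unfiltered top-K hit by construction,
    # so filtering must leave the top-K unchanged
    return filtered[:k] == search(pair["query"], None)[:k]
```

Running this over at least 3 distinct (query, filter) pairs, all returning `True`, satisfies the row above.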

Two outcomes

GO

  • All performance cells pass.
  • All quality cells pass.
  • All hard-fail checks pass.
  • Open ship issue: remove ?fts=v1 flag, route doSearch() permanently to substrate path, deprecate the ILIKE path.
  • Update query-spec.qmd:225 to describe the substrate-backed search.
  • Close Improve Interactive Explorer full-text search substrate #165 once ship issue lands.

A v1 GO does not close the hosted-search-backend question. It defers it. See NO-GO framing below for why hosted search remains a permanent contingency for v2+ requirements (richer analyzers, phrase search, typo tolerance, v2 field growth).

NO-GO

  • At least one cell fails.
  • File Explorer FTS Track 6: Hosted-search backend issue with:
    • the failed-cell data attached (which budgets, which quality, which hard-fails)
    • a starter requirements doc referencing Solr searchText semantics from query-spec.qmd:213-221
    • the DuckDB FTS local oracle numbers from Explorer FTS Track 4: Browser query prototype + benchmark #171 §5 as the relevance bar to clear
    • explicit framing: hosted-search is the answer if the static substrate is structurally limited; static-site constraint should not permanently cap search quality
  • Keep the ?fts=v1 flag in place as a measurement tool until the hosted backend lands.
  • Close Improve Interactive Explorer full-text search substrate #165 with a pointer to the hosted-search issue.

Hosted-search backend as a permanent contingency

The Track 6 hosted-search-backend issue may be triggered by either:

  • (a) v1 GO/NO-GO failure — at least one cell fails the gate above.
  • (b) Post-ship v2+ requirements — even on a v1 GO, future quality requirements (phrase search, typo tolerance, richer analyzers, v2 field growth that exceeds the static substrate's byte budget) may exceed what a static-Parquet substrate can deliver. When that happens, Track 6 fires for the same reasons the budget data would have triggered it under (a).

Both triggers file the same downstream issue with the same starter requirements doc.

Refs

#165, #169, #171

Metadata
Labels: enhancement (New feature or request) · explorer (Interactive Explorer features)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions