feat(tracing): granular spans + auto-index migration for SQLite by anoop-narang · Pull Request #10 · hotdata-dev/datafusion-vector-search-ext

anoop-narang · 2026-03-18T14:02:44Z

Summary

Add granular tracing spans inside usearch_execute to break down the opaque single span into:
- usearch_hnsw_search: pure HNSW index lookup (records k and dims)
- usearch_sqlite_fetch: SQLite point lookup by keys (records fetch count)
- usearch_attach_distances: distance column assembly
Auto-create an index on the key column for legacy SQLite tables that were built before the INTEGER PRIMARY KEY fix (PR #9, fix(sqlite-provider): use caller-provided key column name). On startup, ensure_key_index() checks for an existing PK or secondary index and creates one if missing.

This fixed a roughly 40x end-to-end performance regression: legacy tables without a key index were doing full table scans (216ms to fetch 5 rows from 1.2M); with the index, the same fetch takes 0.1ms.

Test plan

cargo fmt --check — clean
cargo clippy -- -D warnings — clean
cargo test --features sqlite-provider — all 7 tests pass
Verified auto-migration creates index on legacy table (no PK)
Verified spans appear in Jaeger via OTEL
Benchmarked: 250ms → 6ms end-to-end on 1.2M row dataset
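
The two sets of timings quoted in this description measure different things, and a quick calculation shows which one the "40x" figure corresponds to. A minimal sketch, where the helper name `speedup` is hypothetical and the inputs are the numbers quoted above:

```rust
/// Speedup factor: how many times faster `after_ms` is than `before_ms`.
/// Hypothetical helper, used only to check the figures quoted in the PR.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}
```

`speedup(250.0, 6.0)` is about 41.7, which lines up with the 40x claim for the end-to-end query. `speedup(216.0, 0.1)` is about 2160, which is the gain on the SQLite fetch step alone.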

…ch_distances

Break down the opaque usearch_execute span into sub-spans:
- usearch_hnsw_search: pure HNSW index lookup (with k and dims)
- usearch_sqlite_fetch: point lookup by keys (with fetch count)
- usearch_attach_distances: distance column assembly
Applied to both unfiltered and filtered (high-selectivity) paths.

claude · 2026-03-18T14:07:37Z

src/sqlite_provider.rs

+    let has_index: bool = conn
+        .query_row(
+            "SELECT COUNT(*) FROM sqlite_master WHERE type='index' AND tbl_name=?1 AND sql LIKE ?2",
+            rusqlite::params![table_name, format!("%{}%", quote_ident(key_col))],


P2 — LIKE wildcards in column name cause false positives

quote_ident(key_col) wraps the name in double-quotes but doesn't escape SQLite LIKE metacharacters (% and _). With the commonly-used key column _key, the pattern becomes %"_key"%, where _ is a LIKE wildcard matching any single character. This matches index SQL containing "akey", "bkey", etc. — any column whose name ends in key. If such an index exists on the table, has_index returns true and the migration is silently skipped.
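
To make the false positive concrete, here is a small stand-alone model of LIKE matching. It is an illustrative sketch, not SQLite's implementation: `like_match` is a hypothetical helper that supports only `%`, `_`, and ASCII case-insensitive literals, with no ESCAPE handling.

```rust
/// Minimal model of SQL LIKE: `%` matches any run of characters (including
/// none), `_` matches exactly one character, and any other character matches
/// itself ignoring ASCII case. No ESCAPE support; for illustration only.
fn like_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    like_rec(&p, &t)
}

fn like_rec(p: &[char], t: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the text is exhausted too.
        None => t.is_empty(),
        // `%`: try every possible split point of the remaining text.
        Some((&'%', rest)) => (0..=t.len()).any(|i| like_rec(rest, &t[i..])),
        // `_`: consume exactly one character, whatever it is.
        Some((&'_', rest)) => !t.is_empty() && like_rec(rest, &t[1..]),
        // Literal: must equal the next text character (ASCII case-insensitive).
        Some((c, rest)) => match t.split_first() {
            Some((d, t_rest)) => c.eq_ignore_ascii_case(d) && like_rec(rest, t_rest),
            None => false,
        },
    }
}
```

With the pattern `%"_key"%`, this model matches index SQL that mentions `"akey"` just as readily as SQL that mentions `"_key"`, which is the situation described above.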

Suggested fix — escape % and _ in the pattern and add an ESCAPE clause:

Suggested change

-    rusqlite::params![table_name, format!("%{}%", quote_ident(key_col))],
+    rusqlite::params![table_name, format!("%{}%", quote_ident(key_col).replace('%', "\\%").replace('_', "\\_"))],

And change the query to:

... AND sql LIKE ?2 ESCAPE '\'

Alternatively, avoid SQL text matching entirely and use pragma_index_list / pragma_index_info to inspect indexes programmatically — that approach handles all edge cases (unquoted identifiers, multi-column indexes, auto indexes).
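
The escaping step can also be pulled into a small helper. This is a sketch under assumptions: `escape_like` and `like_contains_pattern` are hypothetical names, and the helper additionally escapes the backslash itself, which the one-line suggestion above does not do.

```rust
/// Escape LIKE metacharacters so `input` is matched literally when the query
/// uses `LIKE ?2 ESCAPE '\'`. The escape character itself is escaped as well.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build the `%...%` substring pattern around an already-quoted identifier.
fn like_contains_pattern(quoted_ident: &str) -> String {
    format!("%{}%", escape_like(quoted_ident))
}
```

For the quoted identifier `"_key"`, the resulting pattern is `%"\_key"%`, so the underscore only matches a literal underscore once the ESCAPE clause is in place.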

claude · 2026-03-18T14:07:41Z

src/sqlite_provider.rs

+    // Check if any index already covers the key column.
+    let has_index: bool = conn
+        .query_row(
+            "SELECT COUNT(*) FROM sqlite_master WHERE type='index' AND tbl_name=?1 AND sql LIKE ?2",


P3 — SQL text matching is brittle for manually created indexes

Matching against sqlite_master.sql with a LIKE pattern on the quoted identifier means indexes created without quoting (e.g. CREATE INDEX idx ON t(key_col)) won't be detected. The result is a harmless duplicate index (different name, SQLite allows it), but it's silent noise on every startup for any table where the index was created outside this code path.

More robust: use pragma_index_list(?1) to enumerate indexes on the table, then pragma_index_info(idx_name) to check column names — no SQL text matching needed.
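
The structural check can be sketched independently of the database layer. The example below assumes the rows from pragma_index_list and pragma_index_info have already been collected into a hypothetical `IndexInfo` struct (columns ordered by `seqno`). It also makes one design choice that goes beyond the suggestion above: it only counts an index whose leading column is the key, since that is the case that serves a point lookup by key.

```rust
/// One index on the table, with its columns in index order.
/// Hypothetical shape: `columns` would be filled from `pragma_index_info`
/// rows ordered by `seqno`, and `name` from `pragma_index_list`.
struct IndexInfo {
    name: String,
    columns: Vec<String>,
}

/// Return the name of the first index whose leading column is `key_col`.
/// Column names are compared ignoring ASCII case, as SQLite does.
fn covering_index<'a>(indexes: &'a [IndexInfo], key_col: &str) -> Option<&'a str> {
    indexes
        .iter()
        .find(|idx| {
            idx.columns
                .first()
                .map_or(false, |c| c.eq_ignore_ascii_case(key_col))
        })
        .map(|idx| idx.name.as_str())
}
```

Because this compares column names rather than SQL text, it does not matter whether the index was created with quoted or unquoted identifiers. Note that an INTEGER PRIMARY KEY column is an alias for the rowid and has no separate index entry, so that case would need its own check.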

claude · 2026-03-18T14:11:29Z

src/sqlite_provider.rs

+            col = quote_ident(key_col),
+        ),
+        [],
+    )


P1 — Race condition: missing IF NOT EXISTS

If two processes start simultaneously against the same database (a realistic deployment pattern), both can pass the has_index check above and then both attempt CREATE INDEX "idx_table_key". The second attempt hits SQLITE_ERROR: index already exists and propagates as a DataFusionError, aborting provider initialization for that process.

The same failure occurs if the has_index LIKE check produces a false negative (see below) and an index with the same auto-generated name already exists in the database.

Suggested change

-    )
+    conn.execute(
+        &format!(
+            "CREATE INDEX IF NOT EXISTS {idx} ON {tn}({col})",
+            idx = quote_ident(&format!("idx_{table_name}_{key_col}")),
+            tn = quote_ident(table_name),
+            col = quote_ident(key_col),
+        ),
+        [],
+    )
+    .map_err(|e| DataFusionError::Execution(format!("failed to create key index: {e}")))?;
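
For reference, the statement this suggestion produces can be checked without a database. This is a sketch: the `quote_ident` below is a stand-in (the PR's own implementation is not shown in the review) that uses standard double-quote identifier quoting, and `create_key_index_sql` is a hypothetical wrapper name.

```rust
/// Stand-in for the PR's `quote_ident`, whose body is not shown in the review:
/// wrap the name in double quotes and double any embedded double quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

/// Build an idempotent CREATE INDEX statement for the key column,
/// using the same index naming scheme as the suggested change.
fn create_key_index_sql(table_name: &str, key_col: &str) -> String {
    format!(
        "CREATE INDEX IF NOT EXISTS {idx} ON {tn}({col})",
        idx = quote_ident(&format!("idx_{table_name}_{key_col}")),
        tn = quote_ident(table_name),
        col = quote_ident(key_col),
    )
}
```

IF NOT EXISTS is keyed on the index name, so it prevents the "index already exists" error for the auto-generated name, but it does not by itself detect an equivalent index that was created under a different name.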

claude · 2026-03-18T14:11:40Z

src/sqlite_provider.rs

+    let has_index: bool = conn
+        .query_row(
+            "SELECT COUNT(*) FROM sqlite_master WHERE type='index' AND tbl_name=?1 AND sql LIKE ?2",
+            rusqlite::params![table_name, format!("%{}%", quote_ident(key_col))],
+            |row| row.get::<_, i64>(0),
+        )
+        .map(|n| n > 0)
+        .unwrap_or(false);


P2 (suggestion) — has_index check against sqlite_master.sql text is fragile

Two problems with the LIKE-on-SQL approach:

False negatives: If an index was created with unquoted identifiers (CREATE INDEX idx ON t(col) rather than CREATE INDEX "idx" ON "t"("col")), the pattern %"col"% won't match even though the index exists. Combined with the missing IF NOT EXISTS (see other comment), this can cause a startup failure if the auto-generated index name collides with the existing one.

LIKE wildcards in column names: quote_ident(key_col) doesn't escape % or _, so a column named e.g. key_100% would produce a LIKE pattern with unescaped metacharacters.

More robust approach — use pragma_index_list + pragma_index_info to check coverage structurally rather than parsing SQL text:

let has_index: bool = match conn.prepare(&format!(
    "SELECT ii.name FROM pragma_index_list({tn}) AS il \
     JOIN pragma_index_info(il.name) AS ii ON ii.name = ?1",
    tn = quote_ident(table_name),
)) {
    // Treat a failed prepare or query as "no index found".
    Ok(mut stmt) => stmt.exists(rusqlite::params![key_col]).unwrap_or(false),
    Err(_) => false,
};

Non-blocking suggestion — adding IF NOT EXISTS (other comment) makes the current check safe enough for this use case.

On startup, ensure_key_index() checks whether the key column has a PRIMARY KEY or secondary index. If neither exists, it creates one. This prevents accidental full table scans if a table is ever built or altered without a proper key index.

claude · 2026-03-18T14:15:24Z

src/planner.rs

+                .await
+        }
+        .instrument(tracing::info_span!(
+            "usearch_sqlite_fetch",


Suggestion (non-blocking): The span name "usearch_sqlite_fetch" is baked into lookup_provider.fetch_by_keys(...), but lookup_provider is provider-agnostic — it could be a ParquetLookupProvider. This will appear as usearch_sqlite_fetch in traces even when no SQLite is involved, which may confuse consumers of the telemetry.

Consider "usearch_lookup_fetch" (or similar) here and at the corresponding site in adaptive_filtered_execute (~line 517).

anoop-narang · 2026-03-18T14:15:40Z

Addressed all bot review comments:

P1 (race condition): Added IF NOT EXISTS to CREATE INDEX statement.
P2 (LIKE wildcards): Replaced sqlite_master LIKE matching with pragma_index_list + pragma_index_info — inspects indexes programmatically, no SQL text matching.
P3 (brittle SQL matching): Same fix as P2.

anoop-narang · 2026-03-18T14:18:08Z

Re: span name usearch_sqlite_fetch — acknowledged, but keeping as-is for now. SQLite is the only lookup provider implementation and the specific name helps in trace analysis. If we add other backends we can rename then.

anoop-narang force-pushed the feat/granular-tracing-spans branch 2 times, most recently from bd02ce8 to 4cfb4cc · March 18, 2026 14:04

claude bot reviewed Mar 18, 2026

claude bot previously approved these changes Mar 18, 2026

anoop-narang dismissed claude[bot]’s stale review via 37c3d92 · March 18, 2026 14:08

anoop-narang force-pushed the feat/granular-tracing-spans branch from 4cfb4cc to 37c3d92 · March 18, 2026 14:08

claude bot reviewed Mar 18, 2026

anoop-narang force-pushed the feat/granular-tracing-spans branch from 37c3d92 to 6862b99 · March 18, 2026 14:12

anoop-narang force-pushed the feat/granular-tracing-spans branch from 6862b99 to 09f63d8 · March 18, 2026 14:12

claude bot reviewed Mar 18, 2026

claude bot approved these changes Mar 18, 2026

anoop-narang merged commit a016358 into main Mar 18, 2026
5 checks passed

anoop-narang deleted the feat/granular-tracing-spans branch March 18, 2026 14:18

anoop-narang merged 2 commits into main from feat/granular-tracing-spans
