feat(tracing): granular spans + auto-index migration for SQLite#10
feat(tracing): granular spans + auto-index migration for SQLite#10anoop-narang merged 2 commits intomainfrom
Conversation
…ch_distances Break down the opaque usearch_execute span into sub-spans: - usearch_hnsw_search: pure HNSW index lookup (with k and dims) - usearch_sqlite_fetch: point lookup by keys (with fetch count) - usearch_attach_distances: distance column assembly Applied to both unfiltered and filtered (high-selectivity) paths.
bd02ce8 to
4cfb4cc
Compare
src/sqlite_provider.rs
Outdated
| let has_index: bool = conn | ||
| .query_row( | ||
| "SELECT COUNT(*) FROM sqlite_master WHERE type='index' AND tbl_name=?1 AND sql LIKE ?2", | ||
| rusqlite::params![table_name, format!("%{}%", quote_ident(key_col))], |
There was a problem hiding this comment.
P2 — LIKE wildcards in column name cause false positives
quote_ident(key_col) wraps the name in double-quotes but doesn't escape SQLite LIKE metacharacters (% and _). With the commonly-used key column _key, the pattern becomes %"_key"%, where _ is a LIKE wildcard matching any single character. This matches index SQL containing "akey", "bkey", etc. — any column whose name ends in key. If such an index exists on the table, has_index returns true and the migration is silently skipped.
Suggested fix — escape % and _ in the pattern and add an ESCAPE clause:
| rusqlite::params![table_name, format!("%{}%", quote_ident(key_col))], | |
| rusqlite::params![table_name, format!("%{}%", quote_ident(key_col).replace('%', "\\%").replace('_', "\\_")), |
And change the query to:
... AND sql LIKE ?2 ESCAPE '\'Alternatively, avoid SQL text matching entirely and use pragma_index_list / pragma_index_info to inspect indexes programmatically — that approach handles all edge cases (unquoted identifiers, multi-column indexes, auto indexes).
src/sqlite_provider.rs
Outdated
| // Check if any index already covers the key column. | ||
| let has_index: bool = conn | ||
| .query_row( | ||
| "SELECT COUNT(*) FROM sqlite_master WHERE type='index' AND tbl_name=?1 AND sql LIKE ?2", |
There was a problem hiding this comment.
P3 — SQL text matching is brittle for manually created indexes
Matching against sqlite_master.sql with a LIKE pattern on the quoted identifier means indexes created without quoting (e.g. CREATE INDEX idx ON t(key_col)) won't be detected. The result is a harmless duplicate index (different name, SQLite allows it), but it's silent noise on every startup for any table where the index was created outside this code path.
More robust: use pragma_index_list(?1) to enumerate indexes on the table, then pragma_index_info(idx_name) to check column names — no SQL text matching needed.
4cfb4cc to
37c3d92
Compare
| col = quote_ident(key_col), | ||
| ), | ||
| [], | ||
| ) |
There was a problem hiding this comment.
P1 — Race condition: missing IF NOT EXISTS
If two processes start simultaneously against the same database (a realistic deployment pattern), both can pass the has_index check above and then both attempt CREATE INDEX "idx_table_key". The second attempt hits SQLITE_ERROR: index already exists and propagates as a DataFusionError, aborting provider initialization for that process.
The same failure occurs if the has_index LIKE check produces a false negative (see below) and an index with the same auto-generated name already exists in the database.
| ) | |
| conn.execute( | |
| &format!( | |
| "CREATE INDEX IF NOT EXISTS {idx} ON {tn}({col})", | |
| idx = quote_ident(&format!("idx_{table_name}_{key_col}")), | |
| tn = quote_ident(table_name), | |
| col = quote_ident(key_col), | |
| ), | |
| [], | |
| ) | |
| .map_err(|e| DataFusionError::Execution(format!("failed to create key index: {e}")))?; |
src/sqlite_provider.rs
Outdated
| let has_index: bool = conn | ||
| .query_row( | ||
| "SELECT COUNT(*) FROM sqlite_master WHERE type='index' AND tbl_name=?1 AND sql LIKE ?2", | ||
| rusqlite::params![table_name, format!("%{}%", quote_ident(key_col))], | ||
| |row| row.get::<_, i64>(0), | ||
| ) | ||
| .map(|n| n > 0) | ||
| .unwrap_or(false); |
There was a problem hiding this comment.
P2 (suggestion) — has_index check against sqlite_master.sql text is fragile
Two problems with the LIKE-on-SQL approach:
-
False negatives: If an index was created with unquoted identifiers (
CREATE INDEX idx ON t(col)rather thanCREATE INDEX "idx" ON "t"("col")), the pattern%"col"%won't match even though the index exists. Combined with the missingIF NOT EXISTS(see other comment), this can cause a startup failure if the auto-generated index name collides with the existing one. -
LIKE wildcards in column names:
quote_ident(key_col)doesn't escape%or_, so a column named e.g.key_100%would produce a LIKE pattern with unescaped metacharacters.
More robust approach — use pragma_index_list + pragma_index_info to check coverage structurally rather than parsing SQL text:
let has_index: bool = {
let mut stmt = conn
.prepare(&format!(
"SELECT ii.name FROM pragma_index_list({tn}) AS il
JOIN pragma_index_info(il.name) AS ii ON ii.name = ?1",
tn = quote_ident(table_name),
))
.unwrap_or_else(|_| return);
stmt.exists(rusqlite::params![key_col]).unwrap_or(false)
};Non-blocking suggestion — adding IF NOT EXISTS (other comment) makes the current check safe enough for this use case.
37c3d92 to
6862b99
Compare
On startup, ensure_key_index() checks whether the key column has a PRIMARY KEY or secondary index. If neither exists, it creates one. This prevents accidental full table scans if a table is ever built or altered without a proper key index.
6862b99 to
09f63d8
Compare
| .await | ||
| } | ||
| .instrument(tracing::info_span!( | ||
| "usearch_sqlite_fetch", |
There was a problem hiding this comment.
Suggestion (non-blocking): The span name "usearch_sqlite_fetch" is baked into lookup_provider.fetch_by_keys(...), but lookup_provider is provider-agnostic — it could be a ParquetLookupProvider. This will appear as usearch_sqlite_fetch in traces even when no SQLite is involved, which may confuse consumers of the telemetry.
Consider "usearch_lookup_fetch" (or similar) here and at the corresponding site in adaptive_filtered_execute (~line 517).
|
Addressed all bot review comments:
|
|
Re: span name |
Summary
usearch_executeto break down the opaque single span into:usearch_hnsw_search: pure HNSW index lookup (records k and dims)usearch_sqlite_fetch: SQLite point lookup by keys (records fetch count)usearch_attach_distances: distance column assemblyINTEGER PRIMARY KEYfix (PR fix(sqlite-provider): use caller-provided key column name #9). On startup,ensure_key_index()checks for an existing PK or secondary index and creates one if missing.This fixed a 40x performance regression — legacy tables without a key index were doing full table scans (216ms for 5 rows from 1.2M), now 0.1ms.
Test plan
cargo fmt --check— cleancargo clippy -- -D warnings— cleancargo test --features sqlite-provider— all 7 tests pass