[phase-31 4/4] PostgreSQL metastore — migration + compaction columns#6245
[phase-31 4/4] PostgreSQL metastore — migration + compaction columns#6245g-talbot wants to merge 5 commits intogtt/phase-31-writer-wiringfrom
Conversation
|
|
||
| -- Compaction scope index: supports the compaction planner's primary query pattern | ||
| -- "give me all Published splits for a given (index_uid, sort_fields, window_start) triple." | ||
| CREATE INDEX IF NOT EXISTS idx_metrics_splits_compaction_scope |
There was a problem hiding this comment.
On the log side, we have a column (maturity_timestamp) that is used (I think?) to filter further the scope of splits candidate for merge: they need to be published and immature.
I would prefer to keep the same logic, but it should work without it.
Generally speaking AI hurts the consistency of this project. I guess that's the price to pay for the increased productivity.
| } | ||
|
|
||
| // Filter by compaction scope | ||
| if let Some(ws) = query.window_start |
There was a problem hiding this comment.
can we avoid two letter variable name. This is just hurting readability here.
ws -> window_start
| if let Some(ws) = query.window_start | ||
| && split.metadata.window_start != Some(ws) | ||
| { | ||
| return false; |
There was a problem hiding this comment.
So this checks for window_start being equal. What is ListMetricsSplitsQuery?
If it is for the read path, it should check if the window intersect, shoudn't it?
| let mut size_bytes_list = Vec::with_capacity(splits_metadata.len()); | ||
| let mut split_metadata_jsons = Vec::with_capacity(splits_metadata.len()); | ||
| let mut window_starts: Vec<Option<i64>> = Vec::with_capacity(splits_metadata.len()); | ||
| let mut window_duration_secs_list: Vec<Option<i32>> = |
There was a problem hiding this comment.
I expected u32 here. You made it unsigned in the file backed metastore. Does postgres not allow this?
| pub tag_region: Option<String>, | ||
| /// Host tag filter. | ||
| pub tag_host: Option<String>, | ||
| /// Window start filter for compaction scope queries. |
There was a problem hiding this comment.
ah ok, so if I understand correctly, this query object exists both for compaction and for the read path.
For the read path we use range_start/range_end, for compaction we use window_start.
Do we ever change the window duration in reality?
Could we drop window_start and query for [time_range_start..time_range_start + 15mn) ?
c8836d6 to
eccf657
Compare
3bbfb71 to
95c3596
Compare
6eeaecd to
f2113e5
Compare
179ccd2 to
ed6d687
Compare
f2113e5 to
0d561b4
Compare
ed6d687 to
a4d0d36
Compare
8ba9201 to
0761d11
Compare
a4d0d36 to
f05d4e7
Compare
0761d11 to
b551b13
Compare
f05d4e7 to
4a0507e
Compare
b551b13 to
398cdf2
Compare
4a0507e to
bc9458d
Compare
3289f58 to
dbbbd8b
Compare
2599f67 to
46903b3
Compare
54a1f95 to
9696d99
Compare
74bfd04 to
de0f8c6
Compare
9696d99 to
bf23ab3
Compare
…publish Add compaction metadata to the PostgreSQL metastore: Migration 27: - 6 new columns: window_start, window_duration_secs, sort_fields, num_merge_ops, row_keys, zonemap_regexes - Partial index idx_metrics_splits_compaction_scope on (index_uid, sort_fields, window_start) WHERE split_state = 'Published' stage_metrics_splits: - INSERT extended from 15 to 21 bind parameters for compaction columns - ON CONFLICT SET updates all compaction columns list_metrics_splits: - PgMetricsSplit construction includes compaction fields (defaults from JSON) Also fixes pre-existing compilation errors on upstream-10b-parquet-actors: - Missing StageMetricsSplitsRequestExt import - index_id vs index_uid type mismatches in publish/mark/delete - IndexUid binding (to_string() for sqlx) - ListMetricsSplitsResponseExt trait disambiguation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ublish validation Close critical gaps identified during port review: split_writer.rs: - Store table_config on ParquetSplitWriter (not just pass-through) - Compute window_start from batch time range using table_config.window_duration_secs - Populate sort_fields, window_duration_secs, parquet_files on metadata before write - Call write_to_file_with_metadata(Some(&metadata)) to embed KV metadata in Parquet - Update size_bytes after write completes metastore/mod.rs: - Add window_start and sort_fields fields to ListMetricsSplitsQuery - Add with_compaction_scope() builder method metastore/postgres/metastore.rs: - Add compaction scope filters (AND window_start = $N, AND sort_fields = $N) to list query - Add replaced_split_ids count verification in publish_metrics_splits - Bind compaction scope query parameters ingest/config.rs: - Add table_config: TableConfig field to ParquetIngestConfig Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ad code removal - file_backed_index/mod.rs: Add window_start and sort_fields filtering to metrics_split_matches_query() for compaction scope queries - writer.rs: Add test_meta07_self_describing_parquet_roundtrip test (writes compaction metadata to Parquet, reads back from cold file, verifies all fields roundtrip correctly) - fields.rs: Remove dead sort_order() method (replaced by TableConfig) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…regexes
Gap 1: Change window_duration_secs from i32 to Option<i32> in both
PgMetricsSplit and InsertableMetricsSplit. Pre-Phase-31 splits now
correctly map 0 → NULL in PostgreSQL, enabling Phase 32 compaction
queries to use `WHERE window_duration_secs IS NOT NULL` instead of
the fragile `WHERE window_duration_secs > 0`.
Gap 2: Change zonemap_regexes from String to serde_json::Value in
both structs. This maps directly to JSONB in sqlx, avoiding ambiguity
when PostgreSQL JSONB operators are used in Phase 34/35 zonemap pruning.
Gap 3: Add two missing tests:
- test_insertable_from_metadata_with_compaction_fields: verifies all 6
compaction fields round-trip through InsertableMetricsSplit
- test_insertable_from_metadata_pre_phase31_defaults: verifies pre-Phase-31
metadata produces window_duration_secs: None, zonemap_regexes: json!({})
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
de0f8c6 to
2d9e6eb
Compare
bf23ab3 to
a60b361
Compare
Summary
Add compaction metadata to the PostgreSQL metastore with migration 27 and updated SQL queries (Phase 31 Metadata Foundation, PR 4 of 4).
Stacks on
gtt/phase-31-writer-wiring(PR #6244).What's included
Migration 27 (
27_add-compaction-metadata.{up,down}.sql):window_start,window_duration_secs,sort_fields,num_merge_ops,row_keys,zonemap_regexesidx_metrics_splits_compaction_scopeon(index_uid, sort_fields, window_start) WHERE split_state = 'Published'stage_metrics_splits:
list_metrics_splits:
Bug fixes (pre-existing on upstream-10b-parquet-actors):
StageMetricsSplitsRequestExtimportindex_idvsindex_uidtype mismatches in publish, mark, and delete functionsIndexUidbinding to sqlx (use.to_string())ListMetricsSplitsResponseExt::try_from_splitstrait disambiguationVerification
cargo build -p quickwit-metastore --features quickwit-metastore/postgres✅Test plan
make test-metrics-e2e🤖 Generated with Claude Code