# [phase-31 4/4] PostgreSQL metastore — migration + compaction columns #6245
Base branch: `gtt/phase-31-writer-wiring`
Changes from all commits: ff605b9, 723168f, 9ca263d, 73a20ef, 75c15a0, ef21859, b4dac46, db51a96, 605708e
**Down migration** (new file, `@@ -0,0 +1,8 @@`):

```sql
-- Reverse Phase 31: Remove compaction metadata columns.
DROP INDEX IF EXISTS idx_metrics_splits_compaction_scope;
ALTER TABLE metrics_splits DROP COLUMN IF EXISTS zonemap_regexes;
ALTER TABLE metrics_splits DROP COLUMN IF EXISTS row_keys;
ALTER TABLE metrics_splits DROP COLUMN IF EXISTS num_merge_ops;
ALTER TABLE metrics_splits DROP COLUMN IF EXISTS sort_fields;
ALTER TABLE metrics_splits DROP COLUMN IF EXISTS window_duration_secs;
ALTER TABLE metrics_splits DROP COLUMN IF EXISTS window_start;
```
**Up migration** (new file, `@@ -0,0 +1,14 @@`):

```sql
-- Phase 31: Add compaction metadata columns to metrics_splits.
-- These columns support time-windowed compaction planning and execution.
ALTER TABLE metrics_splits ADD COLUMN IF NOT EXISTS window_start BIGINT;
ALTER TABLE metrics_splits ADD COLUMN IF NOT EXISTS window_duration_secs INTEGER;
ALTER TABLE metrics_splits ADD COLUMN IF NOT EXISTS sort_fields TEXT NOT NULL DEFAULT '';
ALTER TABLE metrics_splits ADD COLUMN IF NOT EXISTS num_merge_ops INTEGER NOT NULL DEFAULT 0;
ALTER TABLE metrics_splits ADD COLUMN IF NOT EXISTS row_keys BYTEA;
ALTER TABLE metrics_splits ADD COLUMN IF NOT EXISTS zonemap_regexes JSONB NOT NULL DEFAULT '{}';

-- Compaction scope index: supports the compaction planner's primary query pattern
-- "give me all Published splits for a given (index_uid, sort_fields, window_start) triple."
CREATE INDEX IF NOT EXISTS idx_metrics_splits_compaction_scope
    ON metrics_splits (index_uid, sort_fields, window_start)
    WHERE split_state = 'Published';
```
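The `window_start`/`window_duration_secs` pair suggests that splits are bucketed into fixed-size time windows. A minimal sketch of that bucketing, assuming `window_start` is the record timestamp truncated down to the window boundary (the helper name and the 15-minute duration are illustrative, not taken from the PR):

```rust
// Hypothetical helper: derive the compaction window for a record timestamp.
// Assumes `window_start` stores the timestamp truncated down to a fixed-size
// window boundary; 900s (15 minutes) is only an example duration.
fn window_start_for(timestamp_secs: i64, window_duration_secs: i64) -> i64 {
    // Euclidean remainder keeps negative (pre-epoch) timestamps in the
    // correct bucket as well.
    timestamp_secs - timestamp_secs.rem_euclid(window_duration_secs)
}

fn main() {
    // 2021-01-01T00:07:30Z falls in the 15-minute window starting at 00:00:00Z.
    let ts = 1_609_459_650;
    let ws = window_start_for(ts, 900);
    assert_eq!(ws, 1_609_459_200);
    println!("{ws}");
}
```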
**`ListMetricsSplitsQuery` fields** (`@@ -75,6 +75,10 @@ pub struct ListMetricsSplitsQuery {`):

```rust
    pub tag_region: Option<String>,
    /// Host tag filter.
    pub tag_host: Option<String>,
    /// Window start filter for compaction scope queries.
    pub window_start: Option<i64>,
    /// Sort fields filter for compaction scope queries.
    pub sort_fields: Option<String>,
    /// Limit number of results.
    pub limit: Option<usize>,
}
```

A collaborator commented on the `window_start` field:

> ah ok, so if I understand correctly, this query object exists both for compaction and for the read path. For the read path we use `range_start`/`range_end`, for compaction we use `window_start`. Could we drop `window_start` and query for `[time_range_start .. time_range_start + 15mn)`?
@@ -107,6 +111,17 @@ impl ListMetricsSplitsQuery { | |
| self.metric_names = names; | ||
| self | ||
| } | ||
|
|
||
| /// Filter by compaction scope (window_start + sort_fields). | ||
| pub fn with_compaction_scope( | ||
| mut self, | ||
| window_start: i64, | ||
| sort_fields: impl Into<String>, | ||
| ) -> Self { | ||
| self.window_start = Some(window_start); | ||
| self.sort_fields = Some(sort_fields.into()); | ||
| self | ||
| } | ||
| } | ||
|
|
||
| /// Splits batch size returned by the stream splits API | ||
|
|
||
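A minimal sketch of how the builder might be used, with a stub of the query struct reduced to the fields this PR touches (the real type in the metastore has many more fields; `"host,region"` is an illustrative sort-fields value, not from the PR):

```rust
// Minimal stand-in for ListMetricsSplitsQuery, reduced to the fields touched
// by this PR; names mirror the diff, but the struct itself is a stub.
#[derive(Default, Debug)]
struct ListMetricsSplitsQuery {
    window_start: Option<i64>,
    sort_fields: Option<String>,
    limit: Option<usize>,
}

impl ListMetricsSplitsQuery {
    /// Filter by compaction scope (window_start + sort_fields), as in the PR.
    fn with_compaction_scope(mut self, window_start: i64, sort_fields: impl Into<String>) -> Self {
        self.window_start = Some(window_start);
        self.sort_fields = Some(sort_fields.into());
        self
    }
}

fn main() {
    let query = ListMetricsSplitsQuery::default()
        .with_compaction_scope(1_609_459_200, "host,region");
    assert_eq!(query.window_start, Some(1_609_459_200));
    assert_eq!(query.sort_fields.as_deref(), Some("host,region"));
    assert_eq!(query.limit, None);
    println!("{query:?}");
}
```

The consuming-builder style (`mut self` → `Self`) matches the chained-setter pattern visible elsewhere in the diff (`self.metric_names = names; self`).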
A collaborator commented:

> On the log side, we have a column (`maturity_timestamp`) that is used (I think?) to further filter the scope of splits that are candidates for merge: they need to be published and immature. I would prefer to keep the same logic, but it should work without it.
>
> Generally speaking, AI hurts the consistency of this project. I guess that's the price to pay for the increased productivity.
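The maturity check the comment describes could be reduced to a simple predicate. This is a hypothetical sketch (the enum, struct, and field names are illustrative, not taken from the real metastore), assuming a split is a merge candidate while it is Published and its maturity timestamp still lies in the future:

```rust
// Illustrative reduction of the merge-candidate filter described in the
// review comment; not the actual metastore types.
#[derive(Debug, PartialEq)]
enum SplitState {
    Staged,
    Published,
    MarkedForDeletion,
}

struct Split {
    state: SplitState,
    /// Epoch seconds after which the split is considered mature
    /// (i.e. no longer a candidate for merge).
    maturity_timestamp_secs: i64,
}

/// A split is a merge candidate while it is published and still immature.
fn is_merge_candidate(split: &Split, now_secs: i64) -> bool {
    split.state == SplitState::Published && now_secs < split.maturity_timestamp_secs
}

fn main() {
    let immature = Split { state: SplitState::Published, maturity_timestamp_secs: 2_000 };
    let mature = Split { state: SplitState::Published, maturity_timestamp_secs: 500 };
    let staged = Split { state: SplitState::Staged, maturity_timestamp_secs: 2_000 };
    assert!(is_merge_candidate(&immature, 1_000));
    assert!(!is_merge_candidate(&mature, 1_000));
    assert!(!is_merge_candidate(&staged, 1_000));
    println!("ok");
}
```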