Conversation
d51894c to 37162f3
… retention config
993d043 to fa5eee3
```rust
/// Deletes a single batch of parquet splits from storage and metastore.
/// Returns (succeeded, failed).
async fn delete_parquet_splits_from_storage_and_metastore(
```
This mimics the logic in delete_splits_from_storage_and_metastore.
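The (succeeded, failed) return shape mentioned in the doc comment can be sketched as below. This is a simplified, self-contained illustration, not the actual quickwit implementation: `delete_split` and the string-based error are hypothetical stand-ins.

```rust
// Hypothetical stand-in for a single storage/metastore deletion.
fn delete_split(split_id: &str) -> Result<(), String> {
    if split_id.starts_with("bad") {
        Err(format!("cannot delete {split_id}"))
    } else {
        Ok(())
    }
}

/// Attempts every deletion in the batch and partitions the results,
/// mirroring the "Returns (succeeded, failed)" contract.
fn delete_parquet_splits(split_ids: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut succeeded = Vec::new();
    let mut failed = Vec::new();
    for &id in split_ids {
        match delete_split(id) {
            Ok(()) => succeeded.push(id.to_string()),
            Err(_) => failed.push(id.to_string()),
        }
    }
    (succeeded, failed)
}

fn main() {
    let (ok, bad) = delete_parquet_splits(&["split-1", "bad-2", "split-3"]);
    assert_eq!(ok, vec!["split-1", "split-3"]);
    assert_eq!(bad, vec!["bad-2"]);
}
```

Returning both lists lets the caller retry or report partial failures instead of aborting the whole batch on the first error.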
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d08467270
```rust
Err(err) => {
    error!(index_uid=%index_uid, error=?err, "failed to list metrics splits");
    break;
```
Propagate metastore list failures from parquet GC
When list_metrics_splits fails, this branch only logs and breaks, and delete_marked_parquet_splits later returns Ok(removal_info), so run_parquet_garbage_collect reports success even when no cleanup could be performed. During a production metastore outage affecting metrics indexes, the janitor's success counters and metrics are still incremented and operators lose the failure signal, so parquet GC can be silently ineffective for entire runs.
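The suggested fix can be sketched as propagating the listing error with `?` instead of logging and breaking. The types below (`MetastoreError`, `RemovalInfo`, `list_metrics_splits`) are simplified stand-ins for the quickwit ones, not the real API.

```rust
#[derive(Debug)]
struct MetastoreError(String);

#[derive(Debug, Default)]
struct RemovalInfo {
    removed_split_ids: Vec<String>,
}

// Simulated metastore listing that can fail, e.g. during an outage.
fn list_metrics_splits(metastore_up: bool) -> Result<Vec<String>, MetastoreError> {
    if metastore_up {
        Ok(vec!["split-1".to_string(), "split-2".to_string()])
    } else {
        Err(MetastoreError("metastore unavailable".to_string()))
    }
}

/// Propagates the listing failure with `?` instead of `log + break`,
/// so the caller never sees a spurious Ok(removal_info).
fn delete_marked_parquet_splits(metastore_up: bool) -> Result<RemovalInfo, MetastoreError> {
    let mut removal_info = RemovalInfo::default();
    let splits = list_metrics_splits(metastore_up)?;
    removal_info.removed_split_ids.extend(splits);
    Ok(removal_info)
}

fn main() {
    // An outage now surfaces as an Err, so GC success metrics are not incremented.
    assert!(delete_marked_parquet_splits(false).is_err());
    assert_eq!(
        delete_marked_parquet_splits(true).unwrap().removed_split_ids.len(),
        2
    );
}
```

With this shape, run_parquet_garbage_collect can distinguish "nothing to clean" from "could not list splits" and report the latter as a failed run.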
```rust
let query = ListMetricsSplitsQuery::for_index(index_uid.clone())
    .with_max_time_range_end(max_retention_timestamp);
```
Paginate parquet retention scans before marking splits
This retention path queries expired metrics splits without a limit/cursor and then marks all returned split IDs in one request, which scales poorly compared to the paginated GC path added in this commit. On indexes with many expired parquet splits, the response/request payload can become large enough to hit RPC/message-size limits or memory pressure, causing the retention execution to fail and leave old data unmarked.
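A paginated retention scan along the lines the comment suggests could look like the sketch below. The listing and marking functions are hypothetical stand-ins for the metastore API; a real implementation would use the query's own limit/offset or cursor support.

```rust
// Page size kept small here for illustration; real GC paths typically
// use a few hundred to a few thousand splits per page.
const PAGE_SIZE: usize = 2;

// Simulated metastore listing with a limit and an offset cursor.
fn list_expired_splits(all_expired: &[u64], offset: usize) -> Vec<u64> {
    all_expired.iter().skip(offset).take(PAGE_SIZE).copied().collect()
}

/// Marks expired splits page by page instead of fetching and marking
/// every expired split ID in one unbounded request.
fn mark_expired_splits_paginated(all_expired: &[u64]) -> usize {
    let mut marked = 0;
    let mut offset = 0;
    loop {
        let page = list_expired_splits(all_expired, offset);
        if page.is_empty() {
            break;
        }
        // A real implementation would call mark_splits_for_deletion(&page) here.
        marked += page.len();
        offset += page.len();
    }
    marked
}

fn main() {
    let expired: Vec<u64> = (0..5).collect();
    // All 5 splits are marked, but no single request carries more than PAGE_SIZE IDs.
    assert_eq!(mark_expired_splits_paginated(&expired), 5);
}
```

Bounding each request keeps payloads under RPC/message-size limits and caps memory use regardless of how many splits have expired.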
Description
This PR can be reviewed commit by commit.
Enables the janitor to clean up parquet files for metrics indexes. Functionally, this should behave the same as tantivy split cleanup.
How was this PR tested?
Describe how you tested this PR.