Skip to content

[metrics] Janitor cleanup for parquet files#6248

Open
mattmkim wants to merge 9 commits intomainfrom
matthew.kim/parquet-janitor
Open

[metrics] Janitor cleanup for parquet files#6248
mattmkim wants to merge 9 commits intomainfrom
matthew.kim/parquet-janitor

Conversation

@mattmkim
Copy link
Copy Markdown
Contributor

@mattmkim mattmkim commented Mar 30, 2026

Description

This PR can be reviewed commit by commit.

Enables the janitor to clean up parquet files, for metrics indexes. Functionally, should be the same as tantivy split cleanup.

How was this PR tested?

Describe how you tested this PR.

@mattmkim mattmkim force-pushed the matthew.kim/parquet-janitor branch from d51894c to 37162f3 Compare March 30, 2026 20:56
@mattmkim mattmkim force-pushed the matthew.kim/parquet-janitor branch from 993d043 to fa5eee3 Compare March 30, 2026 21:11
@mattmkim mattmkim changed the title [draft] parquet janitor [metrics] Janitor cleanup for parquet files Mar 30, 2026

/// Deletes a single batch of parquet splits from storage and metastore.
/// Returns (succeeded, failed).
async fn delete_parquet_splits_from_storage_and_metastore(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmkim mattmkim marked this pull request as ready for review March 30, 2026 21:27
@mattmkim
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d08467270

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +294 to +296
Err(err) => {
error!(index_uid=%index_uid, error=?err, "failed to list metrics splits");
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Propagate metastore list failures from parquet GC

When list_metrics_splits fails, this branch only logs and breaks, and delete_marked_parquet_splits later returns Ok(removal_info); run_parquet_garbage_collect therefore reports success even when no cleanup could be performed. In production metastore outages for metrics indexes, janitor success counters/metrics are incremented and operators lose the failure signal, so parquet GC can be silently ineffective for entire runs.

Useful? React with 👍 / 👎.

Comment on lines +109 to +110
let query = ListMetricsSplitsQuery::for_index(index_uid.clone())
.with_max_time_range_end(max_retention_timestamp);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Paginate parquet retention scans before marking splits

This retention path queries expired metrics splits without a limit/cursor and then marks all returned split IDs in one request, which scales poorly compared to the paginated GC path added in this commit. On indexes with many expired parquet splits, the response/request payload can become large enough to hit RPC/message-size limits or memory pressure, causing the retention execution to fail and leave old data unmarked.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant