
[metrics] Support DDSketch in the parquet pipeline#6257

Open
mattmkim wants to merge 7 commits into matthew.kim/metrics-wide-schema from matthew.kim/parquet-sketches


Contributor

@mattmkim mattmkim commented Mar 31, 2026

Description

This PR can be reviewed commit by commit.

This PR updates the parquet pipeline to process DDSketches. See https://datadoghq.atlassian.net/wiki/spaces/QKHS/pages/6291357728/DDSketch+in+Parquet for more information about the DDSketch spec.

How was this PR tested?

Describe how you tested this PR.

@mattmkim mattmkim force-pushed the matthew.kim/parquet-sketches branch from db0e0db to 86c034b on March 31, 2026 21:14
@mattmkim mattmkim changed the title from "[draft] parquet ddsketch engine" to "[metrics] Support DDSketch in the parquet pipeline" on Mar 31, 2026
@mattmkim mattmkim marked this pull request as ready for review March 31, 2026 21:30
@mattmkim mattmkim force-pushed the matthew.kim/parquet-sketches branch from c3fc790 to 2261237 on March 31, 2026 21:38

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 727f085864


Comment on lines +264 to +267
    let is_sketch = new_splits
        .first()
        .map(|s| s.kind == quickwit_parquet_engine::split::ParquetSplitKind::Sketches)
        .unwrap_or(false);

P2: Derive sketch publish RPC from index, not split list

This branch infers whether to call publish_sketch_splits by looking at new_splits.first(), but the parquet indexer intentionally sends checkpoint-only updates where new_splits is empty (e.g., force-commit/EOF paths), so sketch pipelines fall back to metrics RPCs via unwrap_or(false). That means empty-flush checkpoint publishes for sketch indexes can hit the wrong metastore endpoint, which is incorrect routing and can break backends/tests that enforce sketch-specific publish methods.
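The routing concern above can be illustrated with a minimal sketch. All types here (`ParquetSplit`, `PublishRpc`, and a local `ParquetSplitKind` with a `Metrics` variant) are simplified, hypothetical stand-ins rather than the PR's actual definitions, and `rpc_from_index_kind` is only one possible shape for the fix, assuming the pipeline knows its kind independently of the splits it publishes.

```rust
// Hypothetical stand-ins for the types discussed in the review comment.
// They are NOT the definitions from quickwit_parquet_engine.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ParquetSplitKind {
    Metrics,
    Sketches,
}

// Hypothetical label for which metastore publish RPC would be called.
#[derive(Debug, PartialEq, Eq)]
enum PublishRpc {
    PublishMetricsSplits,
    PublishSketchSplits,
}

struct ParquetSplit {
    kind: ParquetSplitKind,
}

/// Mirrors the reviewed lines: the kind is inferred from the first split,
/// so an empty (checkpoint-only) publish falls back to the metrics RPC.
fn rpc_from_splits(new_splits: &[ParquetSplit]) -> PublishRpc {
    let is_sketch = new_splits
        .first()
        .map(|s| s.kind == ParquetSplitKind::Sketches)
        .unwrap_or(false);
    if is_sketch {
        PublishRpc::PublishSketchSplits
    } else {
        PublishRpc::PublishMetricsSplits
    }
}

/// Possible alternative: the kind comes from the index or pipeline
/// configuration (an assumption), so the split list is never consulted.
fn rpc_from_index_kind(index_kind: ParquetSplitKind) -> PublishRpc {
    match index_kind {
        ParquetSplitKind::Sketches => PublishRpc::PublishSketchSplits,
        ParquetSplitKind::Metrics => PublishRpc::PublishMetricsSplits,
    }
}
```

With an empty `new_splits`, `rpc_from_splits` selects the metrics RPC even for a sketch index, while `rpc_from_index_kind(ParquetSplitKind::Sketches)` selects the sketch RPC regardless of how many splits are attached.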


Comment on lines +2581 to +2583
    let query: ListParquetSplitsQuery = serde_utils::from_json_str(&request.query_json)?;
    let splits = self
        .list_parquet_splits_impl(ParquetSplitKind::Sketches, query)

P2: Enforce request index UID in sketch split listing

The Postgres sketch listing path ignores request.index_uid and directly trusts the index_uid embedded in request.query_json, so a mismatched request can query a different index than the RPC envelope indicates. This is an authorization/data-scope footgun for callers that rely on the top-level request field; the implementation should validate equality or overwrite query.index_uid with request.index_uid() before executing the query.
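The two remedies named above (validate equality, or overwrite the query's index UID) could look roughly like the following. This is a sketch under simplifying assumptions: index UIDs are modeled as plain strings, `ListParquetSplitsQuery` is reduced to a single hypothetical field and treated as already deserialized, and `IndexUidMismatch`, `check_index_uid`, and `overwrite_index_uid` are illustrative names that do not come from the PR.

```rust
use std::fmt;

// Simplified, hypothetical stand-in: the real query type has more fields
// and is deserialized from `request.query_json`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ListParquetSplitsQuery {
    index_uid: String,
}

// Hypothetical error describing a disagreement between the RPC envelope
// and the embedded query.
#[derive(Debug, PartialEq, Eq)]
struct IndexUidMismatch {
    envelope: String,
    query: String,
}

impl fmt::Display for IndexUidMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "request index_uid `{}` does not match query index_uid `{}`",
            self.envelope, self.query
        )
    }
}

/// Option 1: reject the request when the two index UIDs disagree.
fn check_index_uid(
    request_index_uid: &str,
    query: ListParquetSplitsQuery,
) -> Result<ListParquetSplitsQuery, IndexUidMismatch> {
    if query.index_uid == request_index_uid {
        Ok(query)
    } else {
        Err(IndexUidMismatch {
            envelope: request_index_uid.to_string(),
            query: query.index_uid,
        })
    }
}

/// Option 2: treat the envelope as authoritative and overwrite the query.
fn overwrite_index_uid(
    request_index_uid: &str,
    mut query: ListParquetSplitsQuery,
) -> ListParquetSplitsQuery {
    query.index_uid = request_index_uid.to_string();
    query
}
```

Rejecting surfaces caller bugs early, while overwriting keeps the call permissive but guarantees the query is scoped to the index named in the envelope; which one fits depends on how existing callers populate the two fields.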


@fulmicoton-dd
Collaborator

I cannot review this PR due to lack of context. I know what DDSketches are, but I do not know what they are used for in the context of the metrics ingestion pipeline, why they are stored in different files, etc.


2 participants