[metrics] Support DDSketch in the parquet pipeline #6257
mattmkim wants to merge 7 commits into matthew.kim/metrics-wide-schema
…or processors, writers, and metastore RPCs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 727f085864
```rust
let is_sketch = new_splits
    .first()
    .map(|s| s.kind == quickwit_parquet_engine::split::ParquetSplitKind::Sketches)
    .unwrap_or(false);
```
Derive sketch publish RPC from index, not split list
This branch infers whether to call publish_sketch_splits by looking at new_splits.first(), but the parquet indexer intentionally sends checkpoint-only updates where new_splits is empty (e.g., force-commit/EOF paths), so sketch pipelines fall back to metrics RPCs via unwrap_or(false). That means empty-flush checkpoint publishes for sketch indexes can hit the wrong metastore endpoint, which is incorrect routing and can break backends/tests that enforce sketch-specific publish methods.
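One way to address this suggestion is sketched below: the RPC is chosen from a kind known at the pipeline or index level, so an empty `new_splits` (a checkpoint-only publish) still routes correctly. Everything here is a hypothetical stand-in for illustration, not the PR's actual code: the `Metrics` variant, the `ParquetSplit` struct, the `PublishRpc` enum, and `select_publish_rpc` are assumed names, and how the pipeline would learn its kind (index config, pipeline params, etc.) is left open.

```rust
/// Hypothetical stand-in for the split kind referenced in the diff.
/// Only `Sketches` appears in the PR; `Metrics` is an assumed name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParquetSplitKind {
    Metrics,
    Sketches,
}

/// Hypothetical stand-in for a staged parquet split.
struct ParquetSplit {
    kind: ParquetSplitKind,
}

/// Hypothetical label for the metastore endpoint to call.
#[derive(Debug, PartialEq, Eq)]
enum PublishRpc {
    PublishMetricsSplits,
    PublishSketchSplits,
}

/// Chooses the publish RPC from the kind configured for the index,
/// not from the contents of `new_splits`, so checkpoint-only updates
/// (empty `new_splits`) are routed the same way as regular publishes.
fn select_publish_rpc(index_kind: ParquetSplitKind, new_splits: &[ParquetSplit]) -> PublishRpc {
    // In debug builds, catch splits whose kind disagrees with the index.
    debug_assert!(new_splits.iter().all(|split| split.kind == index_kind));
    match index_kind {
        ParquetSplitKind::Metrics => PublishRpc::PublishMetricsSplits,
        ParquetSplitKind::Sketches => PublishRpc::PublishSketchSplits,
    }
}
```

With this shape, `select_publish_rpc(ParquetSplitKind::Sketches, &[])` selects the sketch endpoint even though there is no split to inspect, which is the case the `unwrap_or(false)` fallback misroutes.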
```rust
let query: ListParquetSplitsQuery = serde_utils::from_json_str(&request.query_json)?;
let splits = self
    .list_parquet_splits_impl(ParquetSplitKind::Sketches, query)
```
Enforce request index UID in sketch split listing
The Postgres sketch listing path ignores request.index_uid and directly trusts the index_uid embedded in request.query_json, so a mismatched request can query a different index than the RPC envelope indicates. This is an authorization/data-scope footgun for callers that rely on the top-level request field; the implementation should validate equality or overwrite query.index_uid with request.index_uid() before executing the query.
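The two remedies named in this comment (validate equality, or overwrite the query's index UID) could look roughly like the sketch below. It works on an already-deserialized query and uses plain strings for index UIDs to stay self-contained. The struct layout, `ScopeError`, `check_index_scope`, and `pin_index_scope` are hypothetical names for illustration, not the metastore's real types.

```rust
/// Hypothetical, simplified stand-in for the deserialized query;
/// the real `ListParquetSplitsQuery` likely carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ListParquetSplitsQuery {
    index_uid: String,
}

/// Hypothetical error for an envelope/query scope mismatch.
#[derive(Debug, PartialEq, Eq)]
enum ScopeError {
    IndexUidMismatch { envelope: String, query: String },
}

/// Option 1: reject the request when the index UID embedded in the
/// query differs from the one on the RPC envelope.
fn check_index_scope(
    request_index_uid: &str,
    query: &ListParquetSplitsQuery,
) -> Result<(), ScopeError> {
    if query.index_uid == request_index_uid {
        Ok(())
    } else {
        Err(ScopeError::IndexUidMismatch {
            envelope: request_index_uid.to_string(),
            query: query.index_uid.clone(),
        })
    }
}

/// Option 2: treat the envelope as authoritative and overwrite the
/// query's index UID before the query is executed.
fn pin_index_scope(
    request_index_uid: &str,
    mut query: ListParquetSplitsQuery,
) -> ListParquetSplitsQuery {
    query.index_uid = request_index_uid.to_string();
    query
}
```

Rejecting surfaces caller bugs early, while overwriting is more lenient but silently changes what the caller asked for. Which is preferable depends on how existing callers populate the two fields.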
I cannot review this PR due to lack of context. I know what DDSketches are, but I do not know what they are used for in the context of the metrics ingestion pipeline, why they are stored in different files, etc.
Description
This PR can be reviewed commit by commit.
This PR updates the parquet pipeline to process DDSketches. See https://datadoghq.atlassian.net/wiki/spaces/QKHS/pages/6291357728/DDSketch+in+Parquet for more information about the DDSketch spec.
How was this PR tested?
Describe how you tested this PR.