[metrics] Materialize all metrics tags into top level columns by mattmkim · Pull Request #6237 · quickwit-oss/quickwit

mattmkim · 2026-03-30T13:59:36Z

Description

This PR can be reviewed commit by commit.

Currently, we define a static schema for metrics data in Parquet. This PR makes the schema dynamic: every tag is materialized into its own top-level column during ingestion. Incoming metric points must include the fields "metric_name", "metric_type", "timestamp_secs", and "value".
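
To make the dynamic schema concrete, here is a minimal sketch of the idea using only the standard library. It is not the code in this PR: the `Point` type alias and the functions `missing_required_fields` and `union_columns` are hypothetical names chosen for illustration, and the column ordering (required fields first, then tags sorted by name) is an assumption. Only the four required field names come from the description above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Fields every incoming point must carry, per the PR description.
const REQUIRED_FIELDS: [&str; 4] = ["metric_name", "metric_type", "timestamp_secs", "value"];

/// Hypothetical point representation: each field or tag name maps to a
/// stringified value. The real code works with typed Arrow arrays.
type Point = BTreeMap<String, String>;

/// Returns the names of the required fields that `point` is missing.
fn missing_required_fields(point: &Point) -> Vec<&'static str> {
    REQUIRED_FIELDS
        .iter()
        .copied()
        .filter(|field| !point.contains_key(*field))
        .collect()
}

/// Computes the column list for a batch: the required fields first, then
/// every tag seen in any point of the batch, sorted by name (assumed order).
fn union_columns(points: &[Point]) -> Vec<String> {
    let tags: BTreeSet<&str> = points
        .iter()
        .flat_map(|point| point.keys().map(String::as_str))
        .filter(|name| !REQUIRED_FIELDS.contains(name))
        .collect();
    REQUIRED_FIELDS
        .iter()
        .map(|field| field.to_string())
        .chain(tags.into_iter().map(str::to_string))
        .collect()
}
```

In this sketch, a batch containing one point tagged `host` and another tagged `env` yields six columns. A point that lacks a given tag would hold a null in that column.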

As before, much of the metrics parsing and Arrow logic lives in quickwit-opentelemetry, where it does not belong. We will refactor this eventually :)

How was this PR tested?

Describe how you tested this PR.

mattmkim · 2026-03-31T21:34:38Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5eae799d9a

chatgpt-codex-connector · 2026-03-31T21:40:10Z

quickwit/quickwit-indexing/src/actors/parquet_doc_processor.rs

            let empty_batch =
-                RecordBatch::new_empty(self.processor.schema().arrow_schema().clone());
+                RecordBatch::new_empty(std::sync::Arc::new(arrow::datatypes::Schema::empty()));


Preserve required schema on checkpoint-only flush batches

Constructing the fallback checkpoint batch with Schema::empty() causes a hard failure when force_commit is true and no valid docs were forwarded (for example, all docs in the raw batch failed parsing): the indexer still flushes this batch, and ParquetWriter::write_to_file now rejects it because required columns are missing. In that path, the packager exits with an error instead of forwarding the checkpoint, which can stall ingestion progress for that shard.
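
The failure mode described above can be sketched without Arrow. This is an illustration under stated assumptions, not Quickwit's writer: `validate_columns`, `flush`, and `FlushOutcome` are hypothetical names, and skipping the file write for zero-row batches is only one possible way to keep checkpoint-only flushes working. The alternative suggested by the review title is to build the empty batch with the required columns present.

```rust
/// Columns assumed to be required, taken from the PR description.
const REQUIRED_COLUMNS: [&str; 4] = ["metric_name", "metric_type", "timestamp_secs", "value"];

/// Hypothetical result of flushing a batch.
#[derive(Debug, PartialEq)]
enum FlushOutcome {
    /// Rows were validated and would be written to a file.
    Written,
    /// Nothing to write; only the checkpoint is forwarded.
    CheckpointOnly,
}

/// Rejects a column list that lacks any required column.
/// An empty schema fails here, which is the reported problem.
fn validate_columns(columns: &[&str]) -> Result<(), String> {
    for required in REQUIRED_COLUMNS.iter() {
        if !columns.contains(required) {
            return Err(format!("missing required column: {}", required));
        }
    }
    Ok(())
}

/// One possible mitigation (an assumption, not the PR's fix): treat a
/// zero-row batch as checkpoint-only and skip validation and writing.
fn flush(columns: &[&str], num_rows: usize) -> Result<FlushOutcome, String> {
    if num_rows == 0 {
        return Ok(FlushOutcome::CheckpointOnly);
    }
    validate_columns(columns)?;
    Ok(FlushOutcome::Written)
}
```

With this shape, `validate_columns(&[])` returns an error, while `flush(&[], 0)` still succeeds and lets the checkpoint advance.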

chatgpt-codex-connector · 2026-03-31T21:40:10Z

quickwit/quickwit-opentelemetry/src/otlp/otel_metrics.rs

        for scope_metrics in resource_metrics.scope_metrics {
            for metric in scope_metrics.metrics {
-                parse_metric(
-                    &metric,
-                    &service_name,
-                    &resource_attributes,
-                    &mut data_points,
-                )?;
+                parse_metric(&metric, &service_name, &mut data_points, &mut num_rejected);
            }


Materialize resource-level metric attributes into tags

Resource attributes are parsed but no longer propagated into per-point tags: after extracting service.name, metrics are parsed with only service_name, so dimensions that commonly live at resource scope (like env, region, host, custom resource tags) are dropped entirely. This is a data-loss regression versus the previous behavior and breaks filtering/grouping on those tags for OTLP senders that attach dimensions at the resource level.
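
A minimal sketch of the propagation the review asks for, assuming tags are held in a string-to-string map. The function `merge_tags` is a hypothetical helper, and the precedence rule (a point-level attribute overrides a resource-level attribute with the same key) is an assumption rather than something the PR or the review specifies.

```rust
use std::collections::HashMap;

/// Builds the tag set for one data point by starting from the resource
/// attributes and overlaying the point's own attributes.
/// Assumed rule: on a key conflict, the point-level value wins.
fn merge_tags(
    resource_attributes: &HashMap<String, String>,
    point_attributes: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut tags = resource_attributes.clone();
    tags.extend(
        point_attributes
            .iter()
            .map(|(key, value)| (key.clone(), value.clone())),
    );
    tags
}
```

With a resource carrying `env` and `region` and a point carrying `env` and `path`, the merged map has three keys, and `env` takes the point's value.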

mattmkim force-pushed the matthew.kim/metrics-wide-schema branch 3 times, most recently from fd2f214 to 7621e22 Compare March 30, 2026 17:52

mattmkim changed the title ~~[draft] [metrics] materialize all metrics tags into columns~~ [metrics] Materialize all metrics tags into top level columns Mar 30, 2026

mattmkim force-pushed the matthew.kim/metrics-wide-schema branch from 7621e22 to 1b50f21 Compare March 30, 2026 18:12

mattmkim marked this pull request as ready for review March 30, 2026 18:14

mattmkim added 2 commits March 30, 2026 14:21

feat: replace fixed MetricDataPoint fields with dynamic tag HashMap

d892f23

feat: replace ParquetField enum with constants and dynamic validation

4e928fe

mattmkim force-pushed the matthew.kim/metrics-wide-schema branch from 1b50f21 to 0b642de Compare March 30, 2026 18:26

mattmkim added 6 commits March 30, 2026 14:44

feat: derive sort order and bloom filters from batch schema

d8d71ed

feat: union schema accumulation and schema-agnostic ingest validation

ad115bf

feat: dynamic column lookup in split writer

d0a995e

feat: remove ParquetSchema dependency from indexing actors

35c3942

refactor: deduplicate test batch helpers

33c4070

lint

7a5979f

mattmkim force-pushed the matthew.kim/metrics-wide-schema branch from 0b642de to 7a5979f Compare March 30, 2026 18:45

mattmkim added 2 commits March 31, 2026 09:41

Merge branch 'main' into matthew.kim/metrics-wide-schema

440631b

Merge branch 'main' into matthew.kim/metrics-wide-schema

5eae799

chatgpt-codex-connector bot reviewed Mar 31, 2026

mattmkim wants to merge 10 commits into main from matthew.kim/metrics-wide-schema

1 participant
