Attach metric name labels #36633

Draft
SangJunBak wants to merge 4 commits into MaterializeInc:main from SangJunBak:jun/attach-metric-name-labels

@SangJunBak

Adds a feature flag for the upcoming federated
/metrics/external endpoint on environmentd, which will fan out a single
scrape across env's local metrics and every clusterd replica's /metrics.

Adds a new mod rule with a declarative Rule enum (ClusterNameLookup,
ReplicaNameLookup, ObjectNameLookup), a CatalogNameLookup trait, and a
Rule::apply that stamps resolved name labels onto a MetricFamily.
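
As a rough illustration of the shape described above, here is a minimal sketch. It is not the PR's code: `Labels`, the simplified `MetricFamily`, the `id_label` / `name_label` fields, and `MapCatalog` are hypothetical stand-ins, and IDs are plain strings rather than typed catalog IDs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one sample's label set.
pub type Labels = BTreeMap<String, String>;

/// Hypothetical, simplified stand-in for a Prometheus metric family.
#[derive(Debug, Default)]
pub struct MetricFamily {
    pub name: String,
    pub samples: Vec<Labels>,
}

/// Resolves catalog IDs to human-readable names.
pub trait CatalogNameLookup {
    fn cluster_name(&self, id: &str) -> Option<String>;
    fn replica_name(&self, id: &str) -> Option<String>;
    fn object_name(&self, id: &str) -> Option<String>;
}

/// Declarative rule: read an ID from `id_label`, write the resolved name
/// to `name_label`. The field layout is an assumption for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Rule {
    ClusterNameLookup { id_label: String, name_label: String },
    ReplicaNameLookup { id_label: String, name_label: String },
    ObjectNameLookup { id_label: String, name_label: String },
}

impl Rule {
    /// Stamps the resolved name onto every sample that carries the ID label.
    /// Samples whose ID is missing or unknown are left untouched.
    pub fn apply(&self, family: &mut MetricFamily, catalog: &dyn CatalogNameLookup) {
        let (id_label, name_label) = match self {
            Rule::ClusterNameLookup { id_label, name_label }
            | Rule::ReplicaNameLookup { id_label, name_label }
            | Rule::ObjectNameLookup { id_label, name_label } => (id_label, name_label),
        };
        for labels in &mut family.samples {
            let resolved = labels.get(id_label).and_then(|id| match self {
                Rule::ClusterNameLookup { .. } => catalog.cluster_name(id),
                Rule::ReplicaNameLookup { .. } => catalog.replica_name(id),
                Rule::ObjectNameLookup { .. } => catalog.object_name(id),
            });
            if let Some(name) = resolved {
                labels.insert(name_label.clone(), name);
            }
        }
    }
}

/// Toy in-memory catalog, used only to exercise the sketch.
pub struct MapCatalog(pub BTreeMap<String, String>);

impl CatalogNameLookup for MapCatalog {
    fn cluster_name(&self, id: &str) -> Option<String> {
        self.0.get(id).cloned()
    }
    fn replica_name(&self, id: &str) -> Option<String> {
        self.0.get(id).cloned()
    }
    fn object_name(&self, id: &str) -> Option<String> {
        self.0.get(id).cloned()
    }
}
```

Keeping the rule as plain data rather than a closure is what lets it be serialized and shipped to another process later.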

Threads an optional rules: [...] field through the metric! macro and
MakeCollectorOpts. MetricsRegistry::register stores those rules keyed by
the fully-qualified Prometheus metric name and exposes them via a new
rules_by_metric() accessor.
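
The registry side could look roughly like the following sketch. The struct fields and the `register` signature are assumptions made for illustration (the real `register` also builds and registers a collector); only the names `MakeCollectorOpts`, `MetricsRegistry`, and `rules_by_metric` come from the description above.

```rust
use std::collections::BTreeMap;

/// Simplified rule type; the real enum may carry more data per variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Rule {
    ClusterNameLookup,
    ReplicaNameLookup,
    ObjectNameLookup,
}

/// Hypothetical, trimmed-down collector options.
pub struct MakeCollectorOpts {
    /// Fully-qualified Prometheus metric name.
    pub name: String,
    /// Enrichment rules declared via `rules: [...]`; empty when omitted.
    pub rules: Vec<Rule>,
}

#[derive(Default)]
pub struct MetricsRegistry {
    rules: BTreeMap<String, Vec<Rule>>,
}

impl MetricsRegistry {
    /// Records the rules for a metric. Metrics without rules are not stored,
    /// so the map only lists families that need enrichment.
    pub fn register(&mut self, opts: MakeCollectorOpts) {
        if !opts.rules.is_empty() {
            self.rules.insert(opts.name, opts.rules);
        }
    }

    /// Read-only view of the rules, keyed by metric name.
    pub fn rules_by_metric(&self) -> &BTreeMap<String, Vec<Rule>> {
        &self.rules
    }
}
```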

No HTTP wiring yet; follow-ups connect the registry to the federated
endpoint via a response header.

Adds two header constants and a wants_enrich_rules helper. When a caller
sends X-Materialize-Accept-Enrich-Rules, handle_prometheus serializes
registry.rules_by_metric() and attaches it as
X-Materialize-Enrich-Rules in the response.
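
A minimal sketch of the opt-in check, assuming headers are available as name/value pairs and that the mere presence of the request header is enough to opt in. The constant identifiers, the `response_headers` helper, and the already-serialized `serialized_rules` argument are hypothetical; the actual wire format is not shown in this PR description.

```rust
/// Request header a caller sends to ask for per-metric rules.
pub const ACCEPT_ENRICH_RULES: &str = "X-Materialize-Accept-Enrich-Rules";
/// Response header that carries the serialized rules.
pub const ENRICH_RULES: &str = "X-Materialize-Enrich-Rules";

/// Returns true when the caller opted in. HTTP header names are
/// case-insensitive, so the comparison ignores ASCII case.
pub fn wants_enrich_rules(headers: &[(String, String)]) -> bool {
    headers
        .iter()
        .any(|(name, _)| name.eq_ignore_ascii_case(ACCEPT_ENRICH_RULES))
}

/// Builds the response headers for a scrape. The rules header is attached
/// only for callers that asked for it, so plain scrapers see no extra header.
pub fn response_headers(
    request_headers: &[(String, String)],
    serialized_rules: &str,
) -> Vec<(String, String)> {
    let mut out = vec![(
        "Content-Type".to_string(),
        "text/plain; version=0.0.4".to_string(),
    )];
    if wants_enrich_rules(request_headers) {
        out.push((ENRICH_RULES.to_string(), serialized_rules.to_string()));
    }
    out
}
```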

Header is opt-in so default Prometheus scrapers see clean responses;
only consumers that understand the per-metric rules wire format request
it (today: env's federated /metrics/public scraper).

Wires env's federated /metrics/public endpoint to the per-metric rules
mechanism: scraper sends the opt-in header on every replica scrape,
parses X-Materialize-Enrich-Rules off the response as a map of metric
name -> rules, and applies them only to the matching family.
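
The "only the matching family" behavior can be sketched as below. Everything here is simplified and hypothetical: `Rule` is reduced to a pair of label names, `MetricFamily` is a stand-in for the parsed Prometheus family, and `resolve` stands in for the catalog lookup.

```rust
use std::collections::BTreeMap;

pub type Labels = BTreeMap<String, String>;

/// Hypothetical, simplified metric family.
pub struct MetricFamily {
    pub name: String,
    pub samples: Vec<Labels>,
}

/// Hypothetical, flattened rule: copy the name resolved from `id_label`
/// into `name_label`.
#[derive(Clone)]
pub struct Rule {
    pub id_label: String,
    pub name_label: String,
}

/// Applies each family's own rules and nothing else. A family with no entry
/// in `rules_by_metric` is skipped, even if it happens to carry the same
/// ID label as an enriched family.
pub fn enrich_families(
    families: &mut [MetricFamily],
    rules_by_metric: &BTreeMap<String, Vec<Rule>>,
    resolve: &dyn Fn(&str) -> Option<String>,
) {
    for family in families.iter_mut() {
        let Some(rules) = rules_by_metric.get(&family.name) else {
            continue;
        };
        for rule in rules {
            for labels in &mut family.samples {
                let name = labels
                    .get(&rule.id_label)
                    .and_then(|id| resolve(id.as_str()));
                if let Some(name) = name {
                    labels.insert(rule.name_label.clone(), name);
                }
            }
        }
    }
}
```

Keying the lookup on the family name is what keeps a rule declared for one metric from adding labels to an unrelated one.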

Adds CatalogLookup wrapping &Catalog (implements CatalogNameLookup with
typed ClusterId / ReplicaId / GlobalId parsing) and splits the old
add_replica_labels into stamp_scrape_context_labels (the three labels
that come from the connection) plus rule application. Env-local
metrics get the same per-metric treatment via
metrics_registry.rules_by_metric().

E2E test_metrics_public_endpoint is expected to be red after this commit
until callers declare rules on their metrics in the next change.

Migrates the three subsystems with metrics carrying cluster_id /
replica_id / source_id / parent_source_id / collection_id labels to
declare their enrichment rules inline via the metric! macro's new
rules: [...] field:

- cluster-client: ControllerMetrics adds ClusterNameLookup,
  ReplicaNameLookup, and ObjectNameLookup (for collection_id) to each
  of the three dataflow wallclock-lag metrics.
- compute-client: ComputeControllerMetrics gains instance_name_rule()
  and replica_name_rule() helpers; every metric with an instance_id /
  replica_id label is decorated.
- storage/statistics: SourceStatisticsMetricDefs gains
  source_name_rule() and parent_source_name_rule() helpers; every
  source metric with source_id / parent_source_id is decorated.

These declarations replace what the prototype's global register_rule
calls used to do, but scoped per-metric so labels can't bleed across
unrelated families. E2E test_metrics_public_endpoint is back to green
via cluster-client's mz_dataflow_wallclock_lag_seconds rule.
@SangJunBak force-pushed the jun/attach-metric-name-labels branch from 129175c to 3412db4 on May 20, 2026 17:00