release-26.2: obs: add per-CODEOWNER metric count to Prometheus scrape#167439
angles-n-daemons wants to merge 3 commits into cockroachdb:release-26.2
Conversation
Add a runtime metric `obs.metric_export.codeowner.metric_count` (a `GaugeVec` labeled by `codeowner`) that reports metric counts per owning team during each scrape. Counts reflect downstream ingestion: simple metrics count as 1, histograms expand to their computed metrics (percentiles, count, sum, avg, max). This follows the same pattern as the existing `obs.metric_export.child.count` metric and enables teams to understand their contribution to scrape volume.

To make metric-to-team ownership data available at runtime, embed `metric_owners.yaml` (generated by `gen-metric-owners`) into the `metricscan` package via `go:embed`. The `./dev generate docs` step copies the YAML into the package directory. The data is loaded once at `MetricsRecorder` construction and used during each scrape to attribute metric families to their CODEOWNER team.

Gate output behind `obs.metric_export.codeowner_count.enabled` (default `false`) so customer clusters don't see internal team names in their metrics. Internal CRL clusters override this to `true` via managed-service.

Fix 8 `admission_cpu_time_tokens_per_tenant_*` metrics showing as `codeowner="unknown"` by adding format-string `Name` fields to the base metadata templates in `cpu_time_token_metrics.go`, so the AST scanner detects them as patterns.

Also fix `check_generated_code` CI to run `gen-metric-owners` before `//pkg/gen`, ensuring that `metric_owners.yaml` staleness is caught and new metrics get their `owner` field populated in `metrics.yaml`.

Follow-up: CLOUDOPS-19945
Resolves: cockroachdb#166692
Release note: None
Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
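The counting rule described above (simple metrics count as 1, histograms expand to their computed metrics) can be sketched roughly as follows. The `family` type, the `ownerOf` lookup, and the `histogramExpansion` constant are simplified stand-ins for the real `prometheusgo.MetricFamily`, the embedded `metric_owners.yaml` data, and `len(metric.HistogramMetricComputers)`:

```go
package main

import "fmt"

// family is a simplified stand-in for a Prometheus metric family.
type family struct {
	name        string
	isHistogram bool
}

// histogramExpansion stands in for len(metric.HistogramMetricComputers):
// each histogram is exported as several computed series (percentiles,
// count, sum, avg, max). The value here is hypothetical.
const histogramExpansion = 9

// countPerOwner attributes each family to its owning team and counts the
// series it contributes to a scrape; unowned families land in "unknown".
func countPerOwner(families []family, ownerOf func(string) string) map[string]int64 {
	counts := map[string]int64{}
	for _, f := range families {
		owner := ownerOf(f.name)
		if owner == "" {
			owner = "unknown"
		}
		if f.isHistogram {
			counts[owner] += histogramExpansion
		} else {
			counts[owner]++
		}
	}
	return counts
}

func main() {
	// Hypothetical ownership data; in CockroachDB this comes from the
	// embedded metric_owners.yaml.
	owners := map[string]string{"sql.query.count": "sql-queries", "exec.latency": "kv"}
	ownerOf := func(name string) string { return owners[name] }
	fams := []family{
		{name: "sql.query.count"},
		{name: "exec.latency", isHistogram: true},
		{name: "mystery.metric"},
	}
	fmt.Println(countPerOwner(fams, ownerOf))
}
```

The resulting map is what would back the `codeowner`-labeled gauge on each scrape.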
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.
Remove metrics that exist on master but not on release-26.2 (rpc.drpc.enabled, rpc.server.requests.total, rpc.client.request.duration.nanos, rpc.client.requests.total, storage.wal.failover.secondary.disk.available, storage.wal.failover.secondary.disk.capacity) from metrics.yaml and metric_owners.yaml to fix check_generated_code CI. Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
pkg/server/status/recorder.go
Outdated
```go
settings.ApplicationLevel, "obs.metric_export.codeowner_count.enabled",
"enables the reporting of per-CODEOWNER metric counts in the Prometheus scrape",
false,
settings.WithPublic)
```
nit: this setting doesn't need to be public
Oh, nice catch! I'll change that and open a PR against master for the same.
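The fix under discussion is to drop the public option from the setting registration. A minimal stand-in sketch of that idea, using a toy registry rather than CockroachDB's real `settings` package (all names here are hypothetical):

```go
package main

import "fmt"

// Setting is a minimal stand-in for a cluster setting; the real settings
// package has a far richer API.
type Setting struct {
	Key         string
	Description string
	Default     bool
	Public      bool // non-public settings are hidden from customer-facing docs
}

var registry = map[string]Setting{}

// registerBoolSetting registers a setting without any "public" option,
// so Public stays at its zero value (false), i.e. internal-only.
func registerBoolSetting(key, desc string, def bool) Setting {
	s := Setting{Key: key, Description: desc, Default: def}
	registry[key] = s
	return s
}

func main() {
	s := registerBoolSetting(
		"obs.metric_export.codeowner_count.enabled",
		"enables the reporting of per-CODEOWNER metric counts in the Prometheus scrape",
		false,
	)
	fmt.Println(s.Key, s.Public)
}
```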
```go
// suffix is appended in makeCPUTimeTokenMetrics. See the comment on
// cpuTimeTokenMetrics.AdmittedCountPerTenant for the rationale.
cpuTimeTokenAdmittedCountPerTenantMetaBase = metric.Metadata{
	Name: "admission.cpu_time_tokens.per_tenant.admitted_count.%s",
```
why are these changed here?
They need to be near the Metadata definition so that the code scanner (which associates metrics with owners) can pick them up.
They're actually overridden with the same name further down in the code; this bit just allows us to associate them with an owner.
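The pattern being described can be sketched like this: the base template carries a format-string `Name` that a static AST scanner can match against an owner, while the concrete name is only produced at construction time. Types and helper names below are simplified stand-ins for the real code in `cpu_time_token_metrics.go`:

```go
package main

import "fmt"

// Metadata is a simplified stand-in for the metric.Metadata struct.
type Metadata struct {
	Name string
}

// The base template carries a format-string Name so a static scanner can
// associate the metric pattern with an owner, even though the concrete
// name is only filled in later.
var cpuTimeTokenAdmittedCountPerTenantMetaBase = Metadata{
	Name: "admission.cpu_time_tokens.per_tenant.admitted_count.%s",
}

// makeMeta instantiates the template for a concrete suffix, mirroring how
// the real constructor overrides Name with the same pattern further down.
func makeMeta(base Metadata, suffix string) Metadata {
	base.Name = fmt.Sprintf(base.Name, suffix)
	return base
}

func main() {
	fmt.Println(makeMeta(cpuTimeTokenAdmittedCountPerTenantMetaBase, "regular").Name)
}
```

Because `Metadata` is passed by value, the base template is never mutated and can be instantiated once per suffix.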
```go
// test_gauge and test_counter are simple metrics: 1 each.
require.Regexp(t,
	`obs_metric_export_codeowner_metric_count\{`+
```
for next time: I'd strongly prefer datadriven tests for stuff like this so we can clearly see the output instead of a regex. This is tough to verify visually.
Ah, I'll try to keep that in mind for next time.
```go
var count int64
for _, m := range family.Metric {
	if m.Histogram != nil {
		count += int64(len(metric.HistogramMetricComputers))
```
Not blocking, but this is not quite correct in terms of the docstring. `len(metric.HistogramMetricComputers)` records the number of items in the TSDB's persisted set. The Datadog estimate is tough to gauge because it converts to a native representation.
That's a good point, I'll update the docstring. Is there a better way we can somewhat quantify the impact, from a cost perspective? The Computers just felt like a low-hanging approximation.
I think the bucket count is the proxy here since that affects the size of the histogram
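A rough sketch of that suggestion: using the histogram's bucket count as the volume proxy, since bucket cardinality is what drives the size of the converted histogram. The `histogram` type and the `+2` adjustment below are simplified assumptions, not the real decoded Prometheus types:

```go
package main

import "fmt"

// histogram is a simplified stand-in for a decoded Prometheus histogram.
type histogram struct {
	buckets []float64 // upper bounds of each bucket
}

// seriesEstimate uses the bucket count as a proxy for downstream cost:
// each bucket contributes to the converted representation, plus the
// implicit count and sum series (a hypothetical adjustment).
func seriesEstimate(h histogram) int64 {
	return int64(len(h.buckets)) + 2
}

func main() {
	h := histogram{buckets: []float64{0.1, 0.5, 1, 5, 10}}
	fmt.Println(seriesEstimate(h))
}
```

Unlike a fixed per-histogram constant, this estimate scales with how finely each histogram is bucketed.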
Metrics change detected

This PR adds or updates one or more CRDB metrics. If you want these metrics to be exported by CRDB Cloud clusters to Internal CRL Datadog and/or included in the customer metric export integration (Essential metrics for standard deployment, and Essential metrics for advanced deployment), refer to this Installation and Usage guide of a CLI tool that syncs the metric mappings in managed-service. Run this CLI tool after your CRDB PR is merged.

Note: Your metric will appear in Internal CRL Datadog only after the managed-service PR merges and the new OTel configuration rolls out to at least one cluster running a CRDB build that includes this metric.

Docs: cockroach-metric-sync
Questions: reach out to @obs-india-prs
Address review feedback: the `obs.metric_export.codeowner_count.enabled` setting is internal-only and doesn't need to be publicly visible. Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Backport 1/1 commits from #166695 on behalf of @angles-n-daemons.

/cc @cockroachdb/release

Add a runtime metric `obs.metric_export.codeowner.metric_count` (a `GaugeVec` labeled by `codeowner`) that reports metric counts per owning team during each scrape. Counts reflect downstream ingestion: simple metrics count as 1, histograms expand to their computed metrics (percentiles, count, sum, avg, max). This follows the same pattern as the existing `obs.metric_export.child.count` metric and enables teams to understand their contribution to scrape volume.

To make metric-to-team ownership data available at runtime, embed `metric_owners.yaml` (generated by `gen-metric-owners`) into the `metricscan` package via `go:embed`. The `./dev generate docs` step copies the YAML into the package directory. The data is loaded once at `MetricsRecorder` construction and used during each scrape to attribute metric families to their CODEOWNER team.

Also fix `check_generated_code` CI to run `gen-metric-owners` before `//pkg/gen`, ensuring that `metric_owners.yaml` staleness is caught and new metrics get their `owner` field populated in `metrics.yaml`.

Resolves: #166692

Release note: None

Release justification: Allows us to report on metrics output volume by codeowner; it's a low-risk change.