feat(metrics): Add UDS support, migrate to metrics-exporter-dogstatsd, and propagate runtime global tags#7796

Open
phacops wants to merge 25 commits into master from feat/dogstatsd-uds-support

Conversation

Contributor

@phacops phacops commented Mar 4, 2026

Summary

  • Add Unix domain socket support for DogStatsD metrics and replace the previous metrics pipeline with metrics-exporter-dogstatsd for both UDP and UDS transports
  • Add runtime global tag propagation so tags set via set_global_tag() (e.g., assigned_partitions, min_partition during Kafka rebalancing) are included in every DogStatsD metric payload
  • Remove duplicate tag setting for storage and consumer_group in consumer.rs since those are already passed as static labels to the DogStatsD builder

Python: Controlled by the use_dogstatsd_uds runtime config flag (Redis). When set to "1", create_metrics() passes socket_path to the DogStatsd client instead of host/port, allowing toggling without redeployment. The socket path is configured via SNUBA_DOGSTATSD_SOCKET_PATH (defaults to None, opt-in).

Rust: Replaces the previous metrics pipeline with metrics-exporter-dogstatsd, which provides native DogStatsD protocol support, client-side aggregation, and both UDP and Unix domain socket transports. The new DogStatsDBackend adapts arroyo's Recorder trait to the metrics crate facade. Histograms are sent as DogStatsD histograms (h), not distributions. When dogstatsd_socket_path is set in the consumer config, UDS is preferred over UDP.
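
The transport preference described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Transport` enum and `choose_transport` function are hypothetical names, and treating an empty path as "unset" is an assumption.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// Where metrics would be sent (illustrative type, not taken from the PR).
#[derive(Debug, PartialEq)]
enum Transport {
    Uds(PathBuf),
    Udp(SocketAddr),
}

/// Prefer UDS whenever a socket path is configured; otherwise fall back to UDP.
/// Treating an empty string as "not configured" is an assumption of this sketch.
fn choose_transport(socket_path: Option<&str>, udp_addr: SocketAddr) -> Transport {
    match socket_path {
        Some(path) if !path.is_empty() => Transport::Uds(PathBuf::from(path)),
        _ => Transport::Udp(udp_addr),
    }
}
```

Calling `choose_transport(None, addr)` yields the UDP variant, which mirrors the opt-in behaviour: nothing changes for a consumer until a socket path is supplied.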

Runtime global tags: A thread-safe LazyLock<RwLock<BTreeMap>> stores tags set at runtime. set_global_tag() writes to both this map and the Sentry scope. record_metric() reads the map and appends entries as additional labels on every metric, ensuring tags like assigned_partitions and min_partition appear in DogStatsD payloads.
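
A minimal sketch of such a tag store, using only the standard library. The function names and signatures follow the test snippet quoted later in this conversation; the Sentry scope write and the `record_metric()` integration are deliberately left out, so this is an approximation rather than the PR's implementation.

```rust
use std::collections::BTreeMap;
use std::sync::{LazyLock, RwLock};

/// Process-wide tag store. BTreeMap keeps keys sorted and unique.
static GLOBAL_TAGS: LazyLock<RwLock<BTreeMap<String, String>>> =
    LazyLock::new(|| RwLock::new(BTreeMap::new()));

/// Insert or overwrite a tag. (The real function also sets the Sentry scope tag.)
pub fn set_global_tag(key: String, value: String) {
    GLOBAL_TAGS
        .write()
        .expect("global tag lock poisoned")
        .insert(key, value);
}

/// Snapshot of the current tags in key order, ready to append as metric labels.
pub fn get_global_tags() -> Vec<(String, String)> {
    GLOBAL_TAGS
        .read()
        .expect("global tag lock poisoned")
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Because writes replace the value for an existing key, a rebalance that updates `min_partition` leaves exactly one `min_partition` tag in the snapshot.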

Follows the same pattern from getsentry/relay#5675.

Test plan

  • cargo test — all 103 tests pass
  • cargo clippy — no warnings
  • Unit tests for global tag store: set/get, overwrite deduplication, BTreeMap sort order
  • Verify tags propagate in DogStatsD payloads in staging

🤖 Generated with Claude Code

phacops added 5 commits March 4, 2026 13:52
Add UDS as an alternative transport for reporting metrics to DogStatsd,
alongside the existing UDP path. In Python, this is controlled by the
`use_dogstatsd_uds` runtime config flag. In Rust, the socket path is
passed through the consumer config pipeline and preferred over UDP when
present.

The default socket path is `/var/run/datadog/dsd.socket`, configurable
via the `SNUBA_DOGSTATSD_SOCKET_PATH` environment variable.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/df5Kw0WzWjehG9JtCoOzCyP26xppU-Mei2HymuQJatM
Extract the common global tag setup into a single block, choosing the
backend (UDS vs UDP) first and then applying shared configuration.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/PXyfvNv6PB8v5UP2payHqvUh6lSTy_8lE7QH6H8tU9o
…ixUpstream

Replace the custom UnixUpstream middleware with cadence's built-in
BufferedUnixMetricSink, which already provides the same 512-byte
buffered UDS transport. This removes a file and leverages a well-tested
library instead.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/mpfDfVQ7ALBfclRYP4cw_R3ypl2IhChN8cxM7pcaUyY
Pass storage and consumer_group tags to the UDS backend via
StatsdRecorder::with_tag instead of the statsdproxy AddGlobalTags
middleware. set_global_tag is still called for Sentry scope tags.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/V26YRfVXmvlDYDieSQ0UFx5soTUFz1D5coqoNKUGApk
…ogstatsd

Use metrics-exporter-dogstatsd for both UDP and UDS transports,
removing the cadence and statsdproxy dependencies. The new
DogStatsDBackend provides native DogStatsD protocol support with
client-side aggregation built in.

This simplifies the metrics pipeline from
statsdproxy (Upstream/AggregateMetrics/AddGlobalTags) + cadence to a
single exporter with built-in aggregation and global labels.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/wuD5d2s6g_PVGjhylOUjPEALVpYSHWc_Uqq4OQ8RD6Q
@phacops phacops changed the title from "feat(metrics): Add Unix domain socket support for DogStatsd" to "feat(metrics): Add UDS support and migrate to metrics-exporter-dogstatsd" Mar 5, 2026
@phacops phacops marked this pull request as ready for review March 5, 2026 17:10
@phacops phacops requested a review from a team as a code owner March 5, 2026 17:10
phacops added 2 commits March 5, 2026 09:27
The DogStatsDBuilder's set_global_prefix() already prepends the prefix
to all metric keys. The manual prefix in record_metric() was applying
it a second time, resulting in names like snuba.consumer.snuba.consumer.key.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/yNYhYQs5p2MewtRCxK7gIDSQmeP3GQUcNzmewFdG1n0
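
The double-prefix bug fixed by the commit above can be reproduced in miniature. Here `exporter_emit` is a hypothetical stand-in for the exporter's global-prefix handling, and joining with a dot is assumed from the `snuba.consumer.snuba.consumer.key` example in the commit message.

```rust
/// Stand-in for the exporter applying its global prefix (hypothetical helper).
fn exporter_emit(global_prefix: &str, key: &str) -> String {
    format!("{global_prefix}.{key}")
}

/// Before the fix: the adapter prefixed the key, and then the exporter did too.
fn record_metric_before_fix(prefix: &str, key: &str) -> String {
    let already_prefixed = format!("{prefix}.{key}");
    exporter_emit(prefix, &already_prefixed)
}

/// After the fix: only the exporter applies the prefix.
fn record_metric_after_fix(prefix: &str, key: &str) -> String {
    exporter_emit(prefix, key)
}
```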
Without this, the socket path always has a value, causing the Rust
consumer to unconditionally prefer UDS over UDP and crash in
environments where the socket doesn't exist.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/gKz1Hb8a0jqABMB6pXUrvW8vduz1-NWXBsA1ck7OB20
The metrics-exporter-dogstatsd crate defaults to sending histograms as
distributions (d). Disable this to keep using DogStatsD histograms (h),
matching the existing behavior.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/8WIrttAgOHZtq804L6DAZsBUhsP7cvdCPxGosPgN3mg
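
On the wire, the difference between the two metric types in the commit above is a single type character in the DogStatsD datagram. The formatter below is an illustrative sketch of that datagram layout, not the exporter's code; the metric name and tag in the usage note are made up.

```rust
/// Format one DogStatsD datagram as `name:value|type|#k:v,...`.
/// `as_distribution` selects the `d` type; otherwise the histogram type `h` is used.
fn format_datagram(
    name: &str,
    value: f64,
    as_distribution: bool,
    tags: &[(&str, &str)],
) -> String {
    let kind = if as_distribution { "d" } else { "h" };
    let mut line = format!("{name}:{value}|{kind}");
    if !tags.is_empty() {
        let joined: Vec<String> = tags.iter().map(|(k, v)| format!("{k}:{v}")).collect();
        line.push_str("|#");
        line.push_str(&joined.join(","));
    }
    line
}
```

With `as_distribution` set to `false`, a call such as `format_datagram("snuba.consumer.latency", 12.5, false, &[("storage", "errors")])` produces a `|h|` payload, which is the behaviour the commit preserves.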
Re-resolve dependencies so sentry_arroyo 2.38.3 uses sentry-core 0.41.0
instead of pulling in a second sentry-core 0.46.2.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/rjZ2lzFmeZvcAjnOk0zEK8nc-zhGDPIFUu4itpW1eAQ
Update rust-toolchain.toml from pinned 1.85.0 to stable channel so
newer dependency versions (e.g. time 0.3.47) can be used without
manually tracking MSRV.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/wbVtV_UrXNIcixIo7NhOCJn5XUpV-Zx38aXqCPjQvpE
phacops and others added 2 commits March 5, 2026 11:14
rdkafka-sys 4.10.0 (pulled in by sentry_arroyo 2.38.3) requires
libcurl headers for building librdkafka from source.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/gKXtckTzheTqwwfUjVL-pvJdVs2zUGu5PMvDmGHK0Kk
sentry_protos 0.7.1 added TraceItemType::ProcessingError. Map it to
"processing_errors" for COGS tracking.

Co-Authored-By: Claude <noreply@anthropic.com>
- Allow clippy::result_large_err on accumulator closures in factory_v2
- Allow clippy::large_enum_variant on errors::Message enum
- Use .is_multiple_of() instead of manual % check in generic_metrics

Co-Authored-By: Claude <noreply@anthropic.com>
Upgrade rdkafka-sys to 4.10.0 (matching relay) and install
libcurl4-openssl-dev in CI/Docker where needed for librdkafka 2.12.1.
Fix uuid type ambiguity in replays tests exposed by Rust 1.94, update
bench to use renamed DogStatsDBackend, suppress result_large_err lint
in python_processor_infinite, and use static metadata string in
DogStatsD adapter instead of misleading module_path!().

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/syhPGGyt06o4xJ3ijv1AtTtCMCh5v_jYJ-BqcSsYyUA
@phacops phacops requested a review from a team as a code owner March 5, 2026 19:54
Update insta snapshots to account for new fields (fingerprint, etc.)
from updated sentry-kafka-schemas and sentry_protos versions.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/Ky2hBIs_19wk0Z4QAJIvvGZlLkSg8i82ONi2XjxTzc4
Bump sentry_protos from 0.7.0 to 0.8.2 to match sentry-kafka-schemas
requirement, eliminating the duplicate version in the dependency tree.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/fPkFf5voEYv68QlCBq8-nX6rZr-fPT4YkgWScfOsrL8
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

phacops added 2 commits March 5, 2026 13:54
Only use DogStatsD UDS when both the runtime flag is enabled AND
DOGSTATSD_SOCKET_PATH is set. Previously, enabling use_dogstatsd_uds
without setting the socket path would silently create a DogStatsd
client with socket_path=None, which falls back to UDP on localhost.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/ec09jY1m4lwIGkggAdXJokgh9wM_EniJgDMQw_jBFFo
Update all dependency version specs in Cargo.toml to match the
versions currently resolved in Cargo.lock.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/8ChD4HxI6k7RZdeMW2EQx7AamG57TGMO1D_RzWgs8r0
phacops added 2 commits March 5, 2026 16:30
Add a thread-safe runtime tag store (LazyLock<RwLock<BTreeMap>>) so
that tags set via set_global_tag() are injected into every DogStatsD
metric at recording time. Previously, runtime tags like
assigned_partitions and min_partition were only set on the Sentry
scope and not included in DogStatsD payloads.

Remove duplicate set_global_tag calls for storage and consumer_group
in consumer.rs since those are already passed as static labels to
the DogStatsD builder.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/Ve6LpfPGP6lW2GvlRIEXQn7AwvYYXPiUV1w01aWfXBA
Cover set/get, key overwrite deduplication, and BTreeMap sort order
guarantees for the global tag store.

Co-Authored-By: Claude <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/JguW33qwU0IxK8GpNx7e4TpL7Ifn5C_zJcceVwPaYW4
@phacops phacops changed the title from "feat(metrics): Add UDS support and migrate to metrics-exporter-dogstatsd" to "feat(metrics): Add UDS support, migrate to metrics-exporter-dogstatsd, and propagate runtime global tags" Mar 6, 2026
phacops added 3 commits March 9, 2026 08:29
…y scope tags

- Add libcurl4-openssl-dev to docs.yml Sphinx job and ci.yml
  bump-version-test job (rdkafka-sys 4.10.0 requires libcurl headers)
- Re-add sentry::configure_scope calls for storage and consumer_group
  tags that were inadvertently removed during the DogStatsD refactor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/FXtdbmByA-JtkwZUMAwbrEa8XgDlYDppXznDQsc0seM
The accepted_outcomes_consumer was added to master after this branch
diverged and uses the old StatsDBackend. Update it to use DogStatsDBackend
with UDS support and sentry::configure_scope for scope tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/_VZmDBU1meBcYkja3xLdiWGMxC47dU2TYUssKDYNtRo
Comment on lines +30 to 38
DogStatsDBuilder::default()
    .with_remote_address(addr)
    .expect("invalid DogStatsD address")
    .set_global_prefix(prefix)
    .with_global_labels(global_labels)
    .send_histograms_as_distributions(false)
    .install()
    .expect("failed to install DogStatsD exporter");

Contributor

Bug: DogStatsDBackend attempts to install a global metrics recorder on every instantiation. Subsequent instantiations in the same process will panic, causing test failures.
Severity: MEDIUM

Suggested Fix

Ensure the metrics recorder installation happens only once per process. This can be achieved by using a once-initialization pattern like OnceCell or LazyLock around the .install() call, or by guarding the call with a check to see if a recorder is already installed.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: rust_snuba/src/metrics/statsd.rs#L30-L38

Potential issue: The `build()` method in `DogStatsDBackend` calls `install()` to set a
global metrics recorder. The `metrics` crate only allows this to be done once per
process. Any subsequent attempt to instantiate `DogStatsDBackend` within the same
process, such as during parallel test execution, will trigger a panic with the message
"failed to install DogStatsD exporter". This will lead to flaky or failing tests and
introduces fragility into the application's metric initialization.

Contributor Author

Verified — not a practical concern. DogStatsDBackend is instantiated exactly once per consumer process via metrics::init(). The single test in this module only creates one backend instance. The install() will only be called once per process lifetime.

— Claude Code
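
For reference, the once-per-process guard suggested in the review could look roughly like this. It is a standard-library sketch only: `install_exporter` is a hypothetical stand-in for the real builder chain ending in `.install()`, and the counter exists purely to make the behaviour observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (stand-in) installation actually ran.
static INSTALL_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Set the first time installation succeeds; later calls see it and return early.
static RECORDER_INSTALLED: OnceLock<()> = OnceLock::new();

/// Hypothetical stand-in for building and installing the global recorder.
fn install_exporter() {
    INSTALL_COUNT.fetch_add(1, Ordering::SeqCst);
}

/// Safe to call from several backends or tests; installs at most once per process.
pub fn ensure_exporter_installed() {
    RECORDER_INSTALLED.get_or_init(install_exporter);
}
```

Whether the guard is worth adding depends on the point made in the reply: if the backend really is constructed once per consumer process, the guard never does anything.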

Comment on lines +28 to 37
+#[test]
+fn test_set_and_get_global_tags() {
+    set_global_tag("env".to_owned(), "production".to_owned());
+    set_global_tag("region".to_owned(), "us-east".to_owned());
+
-impl<F> Middleware for FnStep<F>
-where
-    F: FnMut(&mut Metric),
-{
-    fn submit(&mut self, metric: &mut Metric) {
-        (self.0)(metric)
-    }
+    let tags = get_global_tags();
+    assert!(tags.contains(&("env".to_owned(), "production".to_owned())));
+    assert!(tags.contains(&("region".to_owned(), "us-east".to_owned())));
 }

Contributor

Bug: Tests concurrently modify the shared GLOBAL_TAGS static variable without isolation, leading to race conditions and flaky test results.
Severity: MEDIUM

Suggested Fix

Isolate the tests to prevent them from interfering with each other. This can be done by using a test isolation crate like serial_test, clearing the GLOBAL_TAGS map in a setup/teardown function for each test, or refactoring the tests to not rely on mutable global state.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: rust_snuba/src/metrics/global_tags.rs#L28-L37

Potential issue: Tests for global tags modify a shared, static, mutable `GLOBAL_TAGS`
map. Since tests run in parallel by default and there is no isolation or cleanup
mechanism between them, this creates race conditions. One test can overwrite a value
another test is asserting, or a test's assertions can be affected by state left over
from a previously run test. This will cause intermittent and unpredictable test failures
in the CI environment.

Contributor Author

Verified — not a real issue. The tests are robust against shared state because: (1) BTreeMap overwrites duplicate keys, so concurrent writes to the same key are safe; (2) assertions use contains or check overwrite semantics rather than exact tag set equality; (3) RwLock prevents data races. All tests pass consistently in CI.

— Claude Code
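
If stricter isolation were ever wanted, one of the options the review mentions (clearing the map around each test) might be sketched as below. `TEST_GUARD` and `isolated_tags` are hypothetical names, and the tag store here is a simplified copy so that the example stands alone; it is not the PR's test setup.

```rust
use std::collections::BTreeMap;
use std::sync::{LazyLock, Mutex, MutexGuard, RwLock};

static GLOBAL_TAGS: LazyLock<RwLock<BTreeMap<String, String>>> =
    LazyLock::new(|| RwLock::new(BTreeMap::new()));

/// Serializes tests that touch GLOBAL_TAGS (hypothetical test-only helper).
static TEST_GUARD: Mutex<()> = Mutex::new(());

fn set_global_tag(key: String, value: String) {
    GLOBAL_TAGS.write().expect("lock poisoned").insert(key, value);
}

fn get_global_tags() -> Vec<(String, String)> {
    let tags = GLOBAL_TAGS.read().expect("lock poisoned");
    tags.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
}

/// Take the test lock and start from an empty map. Hold the returned guard
/// for the whole test so no other guarded test can interleave.
fn isolated_tags() -> MutexGuard<'static, ()> {
    // A panic in another test poisons the mutex; the guard is still usable.
    let guard = TEST_GUARD.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    GLOBAL_TAGS.write().expect("lock poisoned").clear();
    guard
}
```

A test would begin with `let _guard = isolated_tags();` and could then assert on the exact tag set rather than on `contains`.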
