feat(metrics): Add UDS support, migrate to metrics-exporter-dogstatsd, and propagate runtime global tags #7796
Add UDS as an alternative transport for reporting metrics to DogStatsd, alongside the existing UDP path. In Python, this is controlled by the `use_dogstatsd_uds` runtime config flag. In Rust, the socket path is passed through the consumer config pipeline and preferred over UDP when present. The default socket path is `/var/run/datadog/dsd.socket`, configurable via the `SNUBA_DOGSTATSD_SOCKET_PATH` environment variable. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/df5Kw0WzWjehG9JtCoOzCyP26xppU-Mei2HymuQJatM
Extract the common global tag setup into a single block, choosing the backend (UDS vs UDP) first and then applying shared configuration. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/PXyfvNv6PB8v5UP2payHqvUh6lSTy_8lE7QH6H8tU9o
…ixUpstream Replace the custom UnixUpstream middleware with cadence's built-in BufferedUnixMetricSink, which already provides the same 512-byte buffered UDS transport. This removes a file and leverages a well-tested library instead. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/mpfDfVQ7ALBfclRYP4cw_R3ypl2IhChN8cxM7pcaUyY
Pass storage and consumer_group tags to the UDS backend via StatsdRecorder::with_tag instead of the statsdproxy AddGlobalTags middleware. set_global_tag is still called for Sentry scope tags. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/V26YRfVXmvlDYDieSQ0UFx5soTUFz1D5coqoNKUGApk
…ogstatsd Use metrics-exporter-dogstatsd for both UDP and UDS transports, removing the cadence and statsdproxy dependencies. The new DogStatsDBackend provides native DogStatsD protocol support with client-side aggregation built in. This simplifies the metrics pipeline from statsdproxy (Upstream/AggregateMetrics/AddGlobalTags) + cadence to a single exporter with built-in aggregation and global labels. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/wuD5d2s6g_PVGjhylOUjPEALVpYSHWc_Uqq4OQ8RD6Q
…support # Conflicts: # rust_snuba/Cargo.lock Agent transcript: https://claudescope.sentry.dev/share/hGwX0afKzwa4EE72ryquHfk8W77IOSqA5TGW56k5K58
The DogStatsDBuilder's set_global_prefix() already prepends the prefix to all metric keys. The manual prefix in record_metric() was applying it a second time, resulting in names like snuba.consumer.snuba.consumer.key. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/yNYhYQs5p2MewtRCxK7gIDSQmeP3GQUcNzmewFdG1n0
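A minimal sketch of how the double prefix arises, assuming a simplified exporter that prepends its own prefix. `PrefixingExporter`, `record_metric_buggy`, and `record_metric_fixed` are hypothetical stand-ins for illustration, not the actual `metrics-exporter-dogstatsd` or snuba code:

```rust
/// Hypothetical stand-in for an exporter that prepends a global prefix itself,
/// as set_global_prefix() is described to do.
struct PrefixingExporter {
    prefix: String,
}

impl PrefixingExporter {
    /// Returns the final metric name the exporter would put on the wire.
    fn emit(&self, key: &str) -> String {
        format!("{}.{}", self.prefix, key)
    }
}

/// Buggy variant: the caller adds the prefix before handing the key over,
/// so the exporter applies it a second time.
fn record_metric_buggy(exporter: &PrefixingExporter, key: &str) -> String {
    let prefixed = format!("{}.{}", exporter.prefix, key);
    exporter.emit(&prefixed)
}

/// Fixed variant: pass the bare key and let the exporter add the prefix once.
fn record_metric_fixed(exporter: &PrefixingExporter, key: &str) -> String {
    exporter.emit(key)
}
```

With the prefix `snuba.consumer` and the key `key`, the buggy variant yields `snuba.consumer.snuba.consumer.key` and the fixed one yields `snuba.consumer.key`.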
Without this, the socket path always has a value, causing the Rust consumer to unconditionally prefer UDS over UDP and crash in environments where the socket doesn't exist. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/gKz1Hb8a0jqABMB6pXUrvW8vduz1-NWXBsA1ck7OB20
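The selection rule this commit relies on can be sketched as follows. `Transport` and `choose_transport` are hypothetical names, and treating an empty string as "unset" is an assumption of this sketch rather than confirmed behavior:

```rust
use std::path::PathBuf;

/// Hypothetical description of where metrics are sent.
#[derive(Debug, PartialEq)]
enum Transport {
    Uds(PathBuf),
    Udp { host: String, port: u16 },
}

/// Prefer UDS only when a socket path was explicitly configured;
/// otherwise fall back to the existing UDP host/port.
fn choose_transport(socket_path: Option<&str>, host: &str, port: u16) -> Transport {
    match socket_path {
        // Assumption: an empty path counts as "not configured".
        Some(path) if !path.is_empty() => Transport::Uds(PathBuf::from(path)),
        _ => Transport::Udp {
            host: host.to_owned(),
            port,
        },
    }
}
```

Because the path is an `Option` that defaults to `None`, environments without a socket keep using UDP instead of failing at startup.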
The metrics-exporter-dogstatsd crate defaults to sending histograms as distributions (d). Disable this to keep using DogStatsD histograms (h), matching the existing behavior. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/8WIrttAgOHZtq804L6DAZsBUhsP7cvdCPxGosPgN3mg
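For context, the difference is visible in the type field of the DogStatsD datagram (`<name>:<value>|<type>|#<tags>`). The sketch below formats a single sample by hand; `SampleType` and `format_sample` are hypothetical helpers, and the metric name in the usage note is made up:

```rust
/// How a histogram-style sample is typed on the wire.
#[derive(Clone, Copy)]
enum SampleType {
    Histogram,    // DogStatsD type `h`
    Distribution, // DogStatsD type `d`
}

/// Format one DogStatsD datagram: `<name>:<value>|<type>|#<k>:<v>,...`.
fn format_sample(name: &str, value: f64, ty: SampleType, tags: &[(&str, &str)]) -> String {
    let type_code = match ty {
        SampleType::Histogram => "h",
        SampleType::Distribution => "d",
    };
    let mut line = format!("{name}:{value}|{type_code}");
    if !tags.is_empty() {
        // Tags are appended as a comma-separated `key:value` list after `|#`.
        let joined: Vec<String> = tags.iter().map(|(k, v)| format!("{k}:{v}")).collect();
        line.push_str("|#");
        line.push_str(&joined.join(","));
    }
    line
}
```

For example, `format_sample("consumer.latency_ms", 12.5, SampleType::Histogram, &[("storage", "errors")])` produces `consumer.latency_ms:12.5|h|#storage:errors`; with `SampleType::Distribution` only the `h` changes to `d`.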
Re-resolve dependencies so sentry_arroyo 2.38.3 uses sentry-core 0.41.0 instead of pulling in a second sentry-core 0.46.2. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/rjZ2lzFmeZvcAjnOk0zEK8nc-zhGDPIFUu4itpW1eAQ
Update rust-toolchain.toml from pinned 1.85.0 to stable channel so newer dependency versions (e.g. time 0.3.47) can be used without manually tracking MSRV. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/wbVtV_UrXNIcixIo7NhOCJn5XUpV-Zx38aXqCPjQvpE
rdkafka-sys 4.10.0 (pulled in by sentry_arroyo 2.38.3) requires libcurl headers for building librdkafka from source. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/gKXtckTzheTqwwfUjVL-pvJdVs2zUGu5PMvDmGHK0Kk
sentry_protos 0.7.1 added TraceItemType::ProcessingError. Map it to "processing_errors" for COGS tracking. Co-Authored-By: Claude <noreply@anthropic.com>
- Allow clippy::result_large_err on accumulator closures in factory_v2
- Allow clippy::large_enum_variant on errors::Message enum
- Use .is_multiple_of() instead of manual % check in generic_metrics
Co-Authored-By: Claude <noreply@anthropic.com>
Upgrade rdkafka-sys to 4.10.0 (matching relay) and install libcurl4-openssl-dev in CI/Docker where needed for librdkafka 2.12.1. Fix uuid type ambiguity in replays tests exposed by Rust 1.94, update bench to use renamed DogStatsDBackend, suppress result_large_err lint in python_processor_infinite, and use static metadata string in DogStatsD adapter instead of misleading module_path!(). Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/syhPGGyt06o4xJ3ijv1AtTtCMCh5v_jYJ-BqcSsYyUA
Update insta snapshots to account for new fields (fingerprint, etc.) from updated sentry-kafka-schemas and sentry_protos versions. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/Ky2hBIs_19wk0Z4QAJIvvGZlLkSg8i82ONi2XjxTzc4
Bump sentry_protos from 0.7.0 to 0.8.2 to match sentry-kafka-schemas requirement, eliminating the duplicate version in the dependency tree. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/fPkFf5voEYv68QlCBq8-nX6rZr-fPT4YkgWScfOsrL8
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Only use DogStatsD UDS when both the runtime flag is enabled AND DOGSTATSD_SOCKET_PATH is set. Previously, enabling use_dogstatsd_uds without setting the socket path would silently create a DogStatsd client with socket_path=None, which falls back to UDP on localhost. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/ec09jY1m4lwIGkggAdXJokgh9wM_EniJgDMQw_jBFFo
Update all dependency version specs in Cargo.toml to match the versions currently resolved in Cargo.lock. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/8ChD4HxI6k7RZdeMW2EQx7AamG57TGMO1D_RzWgs8r0
Add a thread-safe runtime tag store (LazyLock<RwLock<BTreeMap>>) so that tags set via set_global_tag() are injected into every DogStatsD metric at recording time. Previously, runtime tags like assigned_partitions and min_partition were only set on the Sentry scope and not included in DogStatsD payloads. Remove duplicate set_global_tag calls for storage and consumer_group in consumer.rs since those are already passed as static labels to the DogStatsD builder. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/Ve6LpfPGP6lW2GvlRIEXQn7AwvYYXPiUV1w01aWfXBA
Cover set/get, key overwrite deduplication, and BTreeMap sort order guarantees for the global tag store. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/JguW33qwU0IxK8GpNx7e4TpL7Ifn5C_zJcceVwPaYW4
…y scope tags
- Add libcurl4-openssl-dev to docs.yml Sphinx job and ci.yml bump-version-test job (rdkafka-sys 4.10.0 requires libcurl headers)
- Re-add sentry::configure_scope calls for storage and consumer_group tags that were inadvertently removed during the DogStatsD refactor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/FXtdbmByA-JtkwZUMAwbrEa8XgDlYDppXznDQsc0seM
The accepted_outcomes_consumer was added to master after this branch diverged and uses the old StatsDBackend. Update it to use DogStatsDBackend with UDS support and sentry::configure_scope for scope tags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/_VZmDBU1meBcYkja3xLdiWGMxC47dU2TYUssKDYNtRo
    DogStatsDBuilder::default()
        .with_remote_address(addr)
        .expect("invalid DogStatsD address")
        .set_global_prefix(prefix)
        .with_global_labels(global_labels)
        .send_histograms_as_distributions(false)
        .install()
        .expect("failed to install DogStatsD exporter");
Bug: DogStatsDBackend attempts to install a global metrics recorder on every instantiation. Subsequent instantiations in the same process will panic, causing test failures.
Severity: MEDIUM
Suggested Fix
Ensure the metrics recorder installation happens only once per process. This can be achieved by using a once-initialization pattern like OnceCell or LazyLock around the .install() call, or by guarding the call with a check to see if a recorder is already installed.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: rust_snuba/src/metrics/statsd.rs#L30-L38
Potential issue: The `build()` method in `DogStatsDBackend` calls `install()` to set a
global metrics recorder. The `metrics` crate only allows this to be done once per
process. Any subsequent attempt to instantiate `DogStatsDBackend` within the same
process, such as during parallel test execution, will trigger a panic with the message
"failed to install DogStatsD exporter". This will lead to flaky or failing tests and
introduces fragility into the application's metric initialization.
Verified — not a practical concern. DogStatsDBackend is instantiated exactly once per consumer process via metrics::init(). The single test in this module only creates one backend instance. The install() will only be called once per process lifetime.
— Claude Code
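For reference, the once-initialization guard mentioned in the suggested fix could look roughly like this. It is a sketch only and not something this PR implements: `install_recorder` is a hypothetical stand-in for the real exporter `.install()` call, and the counter exists purely to make the behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (stand-in) install routine actually ran.
static INSTALL_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Set exactly once, the first time a backend is built in this process.
static RECORDER_INSTALLED: OnceLock<()> = OnceLock::new();

/// Hypothetical stand-in for installing the global metrics recorder.
fn install_recorder() {
    INSTALL_COUNT.fetch_add(1, Ordering::SeqCst);
}

/// Safe to call from every backend constructor: only the first call installs,
/// later calls (including concurrent ones) are no-ops.
fn ensure_recorder_installed() {
    RECORDER_INSTALLED.get_or_init(install_recorder);
}
```

Calling `ensure_recorder_installed()` any number of times leaves `INSTALL_COUNT` at 1, which is the property the reviewer's suggestion is after.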
    #[test]
    fn test_set_and_get_global_tags() {
        set_global_tag("env".to_owned(), "production".to_owned());
        set_global_tag("region".to_owned(), "us-east".to_owned());

        let tags = get_global_tags();
        assert!(tags.contains(&("env".to_owned(), "production".to_owned())));
        assert!(tags.contains(&("region".to_owned(), "us-east".to_owned())));
    }

    impl<F> Middleware for FnStep<F>
    where
        F: FnMut(&mut Metric),
    {
        fn submit(&mut self, metric: &mut Metric) {
            (self.0)(metric)
        }
    }
Bug: Tests concurrently modify the shared GLOBAL_TAGS static variable without isolation, leading to race conditions and flaky test results.
Severity: MEDIUM
Suggested Fix
Isolate the tests to prevent them from interfering with each other. This can be done by using a test isolation crate like serial_test, clearing the GLOBAL_TAGS map in a setup/teardown function for each test, or refactoring the tests to not rely on mutable global state.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: rust_snuba/src/metrics/global_tags.rs#L28-L37
Potential issue: Tests for global tags modify a shared, static, mutable `GLOBAL_TAGS`
map. Since tests run in parallel by default and there is no isolation or cleanup
mechanism between them, this creates race conditions. One test can overwrite a value
another test is asserting, or a test's assertions can be affected by state left over
from a previously run test. This will cause intermittent and unpredictable test failures
in the CI environment.
Verified — not a real issue. The tests are robust against shared state because: (1) BTreeMap overwrites duplicate keys, so concurrent writes to the same key are safe; (2) assertions use contains or check overwrite semantics rather than exact tag set equality; (3) RwLock prevents data races. All tests pass consistently in CI.
— Claude Code
Summary
- Use `metrics-exporter-dogstatsd` for both UDP and UDS transports
- Runtime tags set via `set_global_tag()` (e.g., `assigned_partitions`, `min_partition` during Kafka rebalancing) are included in every DogStatsD metric payload
- Remove duplicate `set_global_tag` calls for `storage` and `consumer_group` in `consumer.rs` since those are already passed as static labels to the DogStatsD builder

Python: Controlled by the `use_dogstatsd_uds` runtime config flag (Redis). When set to `"1"`, `create_metrics()` passes `socket_path` to the `DogStatsd` client instead of `host`/`port`, allowing toggling without redeployment. The socket path is configured via `SNUBA_DOGSTATSD_SOCKET_PATH` (defaults to `None`, opt-in).

Rust: Replaces the previous metrics pipeline with `metrics-exporter-dogstatsd`, which provides native DogStatsD protocol support, client-side aggregation, and both UDP and Unix domain socket transports. The new `DogStatsDBackend` adapts arroyo's `Recorder` trait to the `metrics` crate facade. Histograms are sent as DogStatsD histograms (`h`), not distributions. When `dogstatsd_socket_path` is set in the consumer config, UDS is preferred over UDP.

Runtime global tags: A thread-safe `LazyLock<RwLock<BTreeMap>>` stores tags set at runtime. `set_global_tag()` writes to both this map and the Sentry scope. `record_metric()` reads the map and appends entries as additional labels on every metric, ensuring tags like `assigned_partitions` and `min_partition` appear in DogStatsD payloads.

Follows the same pattern from getsentry/relay#5675.
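The runtime tag store described above can be sketched with the standard library alone. This is an approximation under stated assumptions: the function signatures mirror the test shown in the review thread, the Sentry scope write is omitted, `labels_for_metric` is a hypothetical helper standing in for what `record_metric()` does with the map, and lock poisoning is handled with a plain `unwrap()`.

```rust
use std::collections::BTreeMap;
use std::sync::{LazyLock, RwLock};

/// Process-wide tag store. BTreeMap keeps keys sorted and unique,
/// so re-setting a key overwrites the previous value.
static GLOBAL_TAGS: LazyLock<RwLock<BTreeMap<String, String>>> =
    LazyLock::new(|| RwLock::new(BTreeMap::new()));

/// Store or overwrite a runtime tag (the Sentry scope update is left out here).
pub fn set_global_tag(key: String, value: String) {
    GLOBAL_TAGS.write().unwrap().insert(key, value);
}

/// Snapshot of all runtime tags, in key order.
pub fn get_global_tags() -> Vec<(String, String)> {
    GLOBAL_TAGS
        .read()
        .unwrap()
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Hypothetical helper: call-site tags first, then every runtime global tag.
fn labels_for_metric(call_site_tags: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut labels: Vec<(String, String)> = call_site_tags
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    labels.extend(get_global_tags());
    labels
}
```

Reading the map at recording time, rather than baking tags in when the exporter is built, is what lets values that change during a Kafka rebalance show up on subsequent metrics.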
Test plan
- `cargo test`: all 103 tests pass
- `cargo clippy`: no warnings

🤖 Generated with Claude Code