[do not merge] feat: Span streaming & new span API #5551
sentrivana wants to merge 151 commits into master
Conversation
Semver Impact of This PR
⚪ None (no version bump detected)

📋 Changelog Preview
This is how your changes will appear in the changelog. This PR will not appear in the changelog.
🤖 This preview updates automatically when you update the PR.
Codecov Results 📊
✅ 13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 4.74s

📊 Comparison with Base Branch
✨ No test changes detected. All tests are passing successfully.
❌ Patch coverage is 22.69%. Project has 14388 uncovered lines.

Files with missing lines (43)
Coverage diff

```
@@           Coverage Diff            @@
##             main      #PR    +/-   ##
==========================================
+ Coverage   25.65%   30.31%   +4.66%
==========================================
  Files         189      189        —
  Lines       19838    20646     +808
  Branches     6430     6792     +362
==========================================
+ Hits         5089     6258    +1169
- Misses      14749    14388     -361
- Partials      421      503      +82
```

Generated by Codecov Action
Race condition causes span loss when buffer is at flush threshold (sentry_sdk/_span_batcher.py:19)
When MAX_BEFORE_FLUSH (1000) equals MAX_BEFORE_DROP (1000), a race condition exists where spans are unnecessarily dropped. After the 1000th span triggers a flush and releases the lock, subsequent add() calls can acquire the lock before the flush thread clears the buffer, seeing size >= MAX_BEFORE_DROP and dropping spans. This results in data loss during high-throughput scenarios.
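One way to close the window is to make the flush clear the buffer before the lock is released, so a concurrent `add()` never observes the pre-flush size. A minimal sketch, assuming hypothetical internals (`_lock`, `_spans`, `_flush_locked()`) that are not the PR's actual identifiers:

```python
import threading

class SpanBatcher:
    # Hypothetical internals; the real _span_batcher.py differs.
    MAX_BEFORE_FLUSH = 1000
    MAX_BEFORE_DROP = 1000

    def __init__(self):
        self._lock = threading.Lock()
        self._spans = []

    def add(self, span):
        with self._lock:
            if len(self._spans) >= self.MAX_BEFORE_DROP:
                # Flush instead of dropping: the buffer is emptied before
                # the lock is released, so no other add() can race against
                # an in-flight background flush and see a stale size.
                self._flush_locked()
            self._spans.append(span)
            if len(self._spans) >= self.MAX_BEFORE_FLUSH:
                self._flush_locked()

    def _flush_locked(self):
        # Caller must hold self._lock.
        batch, self._spans = self._spans, []
        # ... hand `batch` off to the transport ...
```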
Async Redis spans are not closed when exceptions occur (sentry_sdk/integrations/redis/_async_common.py:135)
In _sentry_execute_command, spans are created via __enter__() but __exit__() is called outside of a try/finally block. If old_execute_command raises an exception, the db_span and cache_span will never be closed, causing span leaks. The sync version in _sync_common.py correctly wraps this in a try/finally block (lines 141-151).
AttributeError when legacy Span is on scope during streaming mode (sentry_sdk/scope.py:1249)
At line 1249, parent_span is assigned from self.span or self.get_current_scope().span, which can be a legacy Span (from sentry_sdk.tracing). However, at line 1284, the code accesses parent_span.segment, an attribute that only exists on StreamedSpan, not on the legacy Span class. If streaming mode is enabled but a legacy Span ends up on the scope (e.g., from a third-party integration or mixed code), this will cause an AttributeError: 'Span' object has no attribute 'segment'.
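A minimal guard for this path, assuming `StreamedSpan` is importable from `sentry_sdk.traces` (where this PR defines it) and that a legacy parent can be treated as having no segment:

```python
from sentry_sdk.traces import StreamedSpan  # defined by this PR

parent_span = self.span or self.get_current_scope().span

if isinstance(parent_span, StreamedSpan):
    segment = parent_span.segment
else:
    # A legacy Span (or None) has no .segment attribute; fall back to
    # no parent segment instead of raising AttributeError.
    segment = None
```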
Span silently dropped when end() called without start() (sentry_sdk/traces.py:341)
When span.end() is called without first calling span.start() or using the context manager, the _context_manager_state attribute is not initialized. The code at line 342 attempts to unpack this attribute, and the resulting AttributeError is swallowed by capture_internal_exceptions(). The span is silently dropped without any warning to the user, and the scope's span reference is not restored.
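A defensive `end()` could detect the missing state and warn instead of letting `capture_internal_exceptions()` swallow the AttributeError. A sketch of the method body (the warning text and early return are assumptions):

```python
from sentry_sdk.utils import logger

def end(self):
    state = getattr(self, "_context_manager_state", None)
    if state is None:
        # start() was never called, so there is nothing to unpack and no
        # previous scope span to restore; warn instead of failing silently.
        logger.warning("Span.end() called on a span that was never started.")
        return
    # ... normal teardown: restore the scope's previous span, emit, etc.
```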
Identified by Warden find-bugs
Missing try/finally causes span leak when Redis command raises exception (sentry_sdk/integrations/redis/_async_common.py:137)
In _sentry_execute_command, the async version does not wrap the await old_execute_command() call in a try/finally block, unlike the sync version in _sync_common.py. If the Redis command raises an exception, db_span.__exit__() and cache_span.__exit__() will never be called, causing the spans to remain unclosed. This could lead to resource leaks and corrupted tracing state.
Scope corruption when real_putrequest raises exception in streaming mode (sentry_sdk/integrations/stdlib.py:127)
In the span streaming code path (lines 109-127), span.start() is called which sets the span as active on the scope and saves the old span in _context_manager_state. If real_putrequest() at line 148 raises an exception, span.end() in getresponse is never called, leaving the scope's span attribute pointing to an orphaned span and never restoring the previous span. This corrupts the scope state for subsequent operations in the same request/thread.
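A sketch of one fix, using the names from the report (the wrapper's surrounding code is assumed): end the span before propagating any exception from `real_putrequest`, so the scope's previous span is restored.

```python
span.start()  # sets the span active on the scope, saves the old span
try:
    rv = real_putrequest(self, method, url, *args, **kwargs)
except Exception:
    # Restore the scope's previous span before re-raising so an
    # orphaned span does not stay active for the rest of the thread.
    span.end()
    raise
```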
Dict rules with unrecognized keys in ignore_spans config silently ignore ALL spans (sentry_sdk/tracing_utils.py:1498)
When ignore_spans contains a dict with only unrecognized keys (e.g., a typo like {"nme": "/health"} instead of {"name": "/health"}), both name_matches and attributes_match default to True, causing the rule to match ALL spans. This could silently drop all trace data due to a simple configuration mistake.
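A stricter matcher could reject rules that contain no recognized keys at all. A sketch, assuming the recognized keys are `name` and `attributes` (inferred from the `name_matches`/`attributes_match` flags; the helper below is hypothetical):

```python
_KNOWN_RULE_KEYS = {"name", "attributes"}

def _rule_matches(rule, span_name, span_attributes):
    if not set(rule) & _KNOWN_RULE_KEYS:
        # A dict rule with only unrecognized keys (e.g. the typo
        # {"nme": "/health"}) should match nothing, not everything.
        return False
    name_matches = "name" not in rule or rule["name"] == span_name
    attributes_match = "attributes" not in rule or all(
        span_attributes.get(key) == value
        for key, value in rule["attributes"].items()
    )
    return name_matches and attributes_match
```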
Identified by Warden find-bugs
| type="span", | ||
| content_type="application/vnd.sentry.items.span.v2+json", | ||
| headers={ | ||
| "item_count": len(spans), |
item_count header reports total spans instead of actual batch size when splitting envelopes
When spans exceed MAX_ENVELOPE_SIZE (1000), the code correctly splits them into multiple envelopes. However, the item_count header is always set to len(spans) (the total count for the trace) instead of the actual number of items in each batch. For example, if there are 2500 spans, three envelopes would be created with item_counts of 2500, 2500, and 2500 instead of 1000, 1000, and 500. This causes a mismatch between the reported item_count and actual payload size, potentially causing issues on the receiving server.
Verification
Read the full _span_batcher.py file. The loop at line 121 iterates over batches of MAX_ENVELOPE_SIZE spans using slicing at lines 140-143, but line 134 always uses len(spans) which is the total count, not the slice size.
Suggested fix: Calculate the actual batch size for each envelope slice and use that for item_count.
| "item_count": len(spans), | |
| batch = spans[ | |
| i * self.MAX_ENVELOPE_SIZE : (i + 1) * self.MAX_ENVELOPE_SIZE | |
| ] | |
| "item_count": len(batch), | |
| self._to_transport_format(span) for span in batch |
Identified by Warden code-review · WDF-CMZ
```diff
     finally:
         span = sentry_sdk.get_current_span()
-        if span is not None and span.status == SPANSTATUS.INTERNAL_ERROR:
+        if isinstance(span, Span) and span.status == SPANSTATUS.INTERNAL_ERROR:
```
StreamedSpan error cleanup is not handled, potentially leaking span resources
The change from span is not None to isinstance(span, Span) correctly prevents AttributeError (since StreamedSpan uses get_status() instead of .status), but it also means StreamedSpan instances will never have __exit__ called on error. When set_span_errored() is called for a StreamedSpan, it sets SpanStatus.ERROR (not SPANSTATUS.INTERNAL_ERROR), and now the isinstance check excludes StreamedSpan entirely. This could result in unclosed spans when using streaming mode with errors.
Verification
Read anthropic.py (lines 540-614), tracing_utils.py (set_span_errored function at lines 1109-1126), traces.py (StreamedSpan class at lines 191-475, SpanStatus enum at lines 44-49), and consts.py (SPANSTATUS at line 896). Confirmed that: 1) StreamedSpan has no .status property (uses get_status()/set_status() methods), 2) StreamedSpan errors use SpanStatus.ERROR='error' vs Span uses SPANSTATUS.INTERNAL_ERROR='internal_error', 3) The isinstance check prevents checking status on StreamedSpan but also prevents calling exit on errored StreamedSpan instances.
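A sketch that handles both span types, using the identifiers confirmed in the verification above (`get_status()` and `SpanStatus.ERROR` for `StreamedSpan`; `.status` and `SPANSTATUS.INTERNAL_ERROR` for the legacy `Span`); the exact call site is assumed:

```python
import sentry_sdk
from sentry_sdk.consts import SPANSTATUS
from sentry_sdk.tracing import Span
from sentry_sdk.traces import SpanStatus, StreamedSpan  # defined by this PR

span = sentry_sdk.get_current_span()
if isinstance(span, Span) and span.status == SPANSTATUS.INTERNAL_ERROR:
    span.__exit__(None, None, None)
elif isinstance(span, StreamedSpan) and span.get_status() == SpanStatus.ERROR:
    # StreamedSpan has no .status property and uses a different errored
    # status value; close it here so streaming mode does not leak spans.
    span.__exit__(None, None, None)
```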
Identified by Warden code-review · YB7-EKF
```python
                db_span.__enter__()

                set_db_data_fn(db_span, self)
```
Async spans leak on exception - missing try/finally block
In _sentry_execute_command, when old_execute_command raises an exception, both db_span and cache_span will never have their __exit__ methods called. This leaks spans that are never properly finished, causing incorrect timing data and potential memory leaks. The sync version (_sync_common.py) correctly wraps this in a try/finally block (lines 141-150), but this async version lacks equivalent exception handling.
Verification
Read the full file at sentry_sdk/integrations/redis/_async_common.py and compared with sentry_sdk/integrations/redis/_sync_common.py. The sync version uses try/finally (lines 141-150) to ensure spans are closed. The async version calls enter on line 120 and 135 but only calls exit in the success path (lines 142, 146). Traced StreamedSpan.exit in sentry_sdk/traces.py (lines 331-345) which properly handles exception info when passed.
Suggested fix: Wrap the async command execution in a try/finally block to ensure spans are properly closed even when exceptions occur, similar to the sync implementation.
```diff
 db_span.__enter__()
-set_db_data_fn(db_span, self)
+with capture_internal_exceptions():
+    set_db_data_fn(db_span, self)
+    _set_client_data(db_span, is_cluster, name, *args)
+try:
+    value = await old_execute_command(self, name, *args, **kwargs)
+finally:
+    db_span.__exit__(None, None, None)
+    if cache_span:
+        with capture_internal_exceptions():
+            _set_cache_data(cache_span, self, cache_properties, value)
+        cache_span.__exit__(None, None, None)
```
Identified by Warden code-review · MW4-P2M
Introduce a new `start_span()` API with a simpler and more intuitive signature to eventually replace the original `start_span()` and `start_transaction()` APIs.

Additionally, introduce a new streaming mode (`sentry_sdk.init(_experiments={"trace_lifecycle": "stream"})`) that will send spans as they finish, rather than by transaction.

The new API MUST be used with the new streaming mode, and the old API MUST be used in the legacy non-streaming (static) mode.

Migration guide: getsentry/sentry-docs#16072
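A minimal usage sketch of the pairing described above, assuming only what the description states: `start_span()` takes a `name` and works as a context manager, and streaming is enabled via the `_experiments` flag (the DSN is a placeholder):

```python
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder
    # Opt in to the new streaming lifecycle: spans are sent as they
    # finish instead of being buffered until the transaction ends.
    _experiments={"trace_lifecycle": "stream"},
)

# New API: a single start_span() entry point replaces the old
# start_transaction() + start_span() pair.
with sentry_sdk.start_span(name="process-checkout"):
    with sentry_sdk.start_span(name="charge-card"):
        ...  # nested child span, streamed when it finishes
```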
Notes
- Eventually remove the legacy `Span` and drop the new `StreamedSpan` in `tracing.py` as a replacement.
- Spans are batched per `trace_id` (we can't send spans from different traces in the same envelope).

Release Plan
Project
https://linear.app/getsentry/project/span-first-sdk-python-727da28dd037/overview