
[do not merge] feat: Span streaming & new span API#5551

Draft
sentrivana wants to merge 151 commits into master from feat/span-first

Conversation

@sentrivana (Contributor)

Introduce a new start_span() API with a simpler and more intuitive signature to eventually replace the original start_span() and start_transaction() APIs.

Additionally, introduce a new streaming mode (sentry_sdk.init(_experiments={"trace_lifecycle": "stream"})) that will send spans as they finish, rather than by transaction.

import sentry_sdk

sentry_sdk.init(
    _experiments={"trace_lifecycle": "stream"},
)

with sentry_sdk.traces.start_span(name="my_span"):
    ...

The new API MUST be used with the new streaming mode, and the old API MUST be used in the legacy non-streaming (static) mode.

Migration guide: getsentry/sentry-docs#16072

Notes

  • The diff is huge mostly because I've optimized for easy removal of the legacy code in the next major, deliberately duplicating a lot of it. I'll of course split it up into reviewable PRs once it's ready.
    • I chose to go with a new file and a new span class so that we can simply remove the old Span and drop in the new StreamedSpan in tracing.py as a replacement.
  • The batcher for spans is a bit different from the logs and metrics batchers because it needs to batch by trace_id (we can't send spans from different traces in the same envelope).
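The trace_id batching requirement above can be sketched as follows. This is a minimal illustration of the grouping step only; the real _span_batcher also handles flushing, locking, and size limits, and the helper name here is hypothetical:

```python
from collections import defaultdict

def batch_by_trace(spans):
    # Group finished spans by trace_id so that each envelope only
    # carries spans belonging to a single trace. Hypothetical helper,
    # not the SDK's actual _span_batcher implementation.
    batches = defaultdict(list)
    for span in spans:
        batches[span["trace_id"]].append(span)
    return dict(batches)

spans = [
    {"trace_id": "a", "name": "db.query"},
    {"trace_id": "b", "name": "http.client"},
    {"trace_id": "a", "name": "cache.get"},
]
batches = batch_by_trace(spans)  # two batches: one per trace_id
```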

Release Plan

  • There will be prereleases for internal testing.
  • We'll release the new API in a minor version as opt-in.
  • In the next major, we'll drop the legacy API.

Project

https://linear.app/getsentry/project/span-first-sdk-python-727da28dd037/overview

@sentrivana sentrivana changed the title Feat/span first [do not merge] feat: Span streaming & new span API Feb 26, 2026
github-actions bot commented Feb 26, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


This PR will not appear in the changelog.


🤖 This preview updates automatically when you update the PR.

github-actions bot commented Feb 26, 2026

Codecov Results 📊

13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 4.74s

📊 Comparison with Base Branch

No change in total, passed, failed, or skipped test counts.

✨ No test changes detected

All tests are passing successfully.

❌ Patch coverage is 22.69%. Project has 14388 uncovered lines.
✅ Project coverage is 30.31%. Comparing base (base) to head (head).

Files with missing lines (43)
File Patch % Lines
tracing_utils.py 39.24% ⚠️ 429 Missing and 28 partials
__init__.py 5.28% ⚠️ 377 Missing
starlette.py 5.26% ⚠️ 360 Missing
scope.py 67.05% ⚠️ 289 Missing and 69 partials
client.py 55.38% ⚠️ 224 Missing and 59 partials
anthropic.py 9.03% ⚠️ 252 Missing
traces.py 27.14% ⚠️ 247 Missing
strawberry.py 7.76% ⚠️ 226 Missing
utils.py 16.35% ⚠️ 220 Missing
huggingface_hub.py 9.42% ⚠️ 173 Missing
rust_tracing.py 0.00% ⚠️ 163 Missing
asgi.py 19.38% ⚠️ 129 Missing
spotlight.py 28.47% ⚠️ 103 Missing and 8 partials
asgi.py 0.00% ⚠️ 109 Missing
envelope.py 54.04% ⚠️ 91 Missing and 17 partials
caching.py 0.00% ⚠️ 106 Missing
asyncpg.py 11.86% ⚠️ 104 Missing
stdlib.py 53.51% ⚠️ 86 Missing and 15 partials
templates.py 0.00% ⚠️ 100 Missing
utils.py 13.79% ⚠️ 100 Missing
httpx.py 12.15% ⚠️ 94 Missing
middleware.py 0.00% ⚠️ 90 Missing
_wsgi_common.py 30.71% ⚠️ 88 Missing and 1 partials
graphene.py 13.48% ⚠️ 77 Missing
sqlalchemy.py 10.71% ⚠️ 75 Missing
monitoring.py 17.44% ⚠️ 71 Missing
transactions.py 0.00% ⚠️ 67 Missing
__init__.py 86.43% ⚠️ 38 Missing and 23 partials
api.py 63.52% ⚠️ 58 Missing
views.py 0.00% ⚠️ 50 Missing
_span_batcher.py 33.33% ⚠️ 48 Missing
signals_handlers.py 0.00% ⚠️ 44 Missing
threading.py 63.16% ⚠️ 35 Missing and 5 partials
utils.py 68.54% ⚠️ 28 Missing and 11 partials
caches.py 47.62% ⚠️ 33 Missing and 2 partials
_async_common.py 75.36% ⚠️ 17 Missing and 7 partials
_compat.py 41.03% ⚠️ 23 Missing
_sync_common.py 76.06% ⚠️ 17 Missing and 6 partials
redis_cluster.py 52.94% ⚠️ 16 Missing
feature_flags.py 57.58% ⚠️ 14 Missing
_types.py 60.00% ⚠️ 12 Missing
queries.py 88.57% ⚠️ 4 Missing and 3 partials
consts.py 99.43% ⚠️ 2 Missing
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    25.65%    30.31%    +4.66%
==========================================
  Files          189       189         —
  Lines        19838     20646      +808
  Branches      6430      6792      +362
==========================================
+ Hits          5089      6258     +1169
- Misses       14749     14388      -361
- Partials       421       503       +82

Generated by Codecov Action

github-actions bot left a comment

Race condition causes span loss when buffer is at flush threshold (sentry_sdk/_span_batcher.py:19)

When MAX_BEFORE_FLUSH (1000) equals MAX_BEFORE_DROP (1000), a race condition exists where spans are unnecessarily dropped. After the 1000th span triggers a flush and releases the lock, subsequent add() calls can acquire the lock before the flush thread clears the buffer, seeing size >= MAX_BEFORE_DROP and dropping spans. This results in data loss during high-throughput scenarios.
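One way to close this race is to make the drop check and the buffer hand-off atomic: swap the buffer out while still holding the lock, so a concurrent add() can never observe a full buffer that is merely waiting to be flushed. A minimal single-file sketch under assumed names (this is not the actual _span_batcher code):

```python
import threading

MAX_BEFORE_FLUSH = 1000
MAX_BEFORE_DROP = 1000

class SpanBuffer:
    # Sketch of a race-free add(): the buffer is swapped atomically
    # under the lock, so the drop check only sees genuinely full
    # buffers. Hypothetical class, not the shipped batcher.

    def __init__(self, flush_fn):
        self._lock = threading.Lock()
        self._spans = []
        self._flush_fn = flush_fn
        self.dropped = 0

    def add(self, span):
        to_flush = None
        with self._lock:
            if len(self._spans) >= MAX_BEFORE_DROP:
                self.dropped += 1
                return
            self._spans.append(span)
            if len(self._spans) >= MAX_BEFORE_FLUSH:
                # Swap the buffer while still holding the lock, so
                # later add() calls start from an empty buffer.
                to_flush, self._spans = self._spans, []
        if to_flush is not None:
            self._flush_fn(to_flush)  # flush outside the lock
```

With this shape, 2500 sequential add() calls produce two full flushes and 500 buffered spans, with nothing dropped.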

Async Redis spans are not closed when exceptions occur (sentry_sdk/integrations/redis/_async_common.py:135)

In _sentry_execute_command, spans are created via __enter__() but __exit__() is called outside of a try/finally block. If old_execute_command raises an exception, the db_span and cache_span will never be closed, causing span leaks. The sync version in _sync_common.py correctly wraps this in a try/finally block (lines 141-151).

AttributeError when legacy Span is on scope during streaming mode (sentry_sdk/scope.py:1249)

At line 1249, parent_span is assigned from self.span or self.get_current_scope().span, which can be a legacy Span (from sentry_sdk.tracing). However, at line 1284, the code accesses parent_span.segment, an attribute that only exists on StreamedSpan, not on the legacy Span class. If streaming mode is enabled but a legacy Span ends up on the scope (e.g., from a third-party integration or mixed code), this will cause an AttributeError: 'Span' object has no attribute 'segment'.

Span silently dropped when end() called without start() (sentry_sdk/traces.py:341)

When span.end() is called without first calling span.start() or using the context manager, the _context_manager_state attribute is not initialized. The code at line 342 attempts to unpack this attribute, and the resulting AttributeError is swallowed by capture_internal_exceptions(). The span is silently dropped without any warning to the user, and the scope's span reference is not restored.
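A defensive end() could detect the never-started case explicitly and warn, instead of relying on capture_internal_exceptions() to swallow the AttributeError. A simplified stand-in (not the real StreamedSpan class):

```python
import logging

logger = logging.getLogger(__name__)

class Span:
    # Simplified sketch: end() refuses to silently drop a span that
    # was never started. Hypothetical class, not sentry_sdk's.

    def __init__(self, name):
        self.name = name
        self._started = False
        self.finished = False

    def start(self):
        # Real code would also save the previous scope span here.
        self._started = True
        return self

    def end(self):
        if not self._started:
            logger.warning(
                "Span %r ended without being started; dropping.", self.name
            )
            return False
        self.finished = True
        return True
```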

Identified by Warden find-bugs

github-actions bot left a comment

Missing try/finally causes span leak when Redis command raises exception (sentry_sdk/integrations/redis/_async_common.py:137)

In _sentry_execute_command, the async version does not wrap the await old_execute_command() call in a try/finally block, unlike the sync version in _sync_common.py. If the Redis command raises an exception, db_span.__exit__() and cache_span.__exit__() will never be called, causing the spans to remain unclosed. This could lead to resource leaks and corrupted tracing state.

Scope corruption when real_putrequest raises exception in streaming mode (sentry_sdk/integrations/stdlib.py:127)

In the span streaming code path (lines 109-127), span.start() is called which sets the span as active on the scope and saves the old span in _context_manager_state. If real_putrequest() at line 148 raises an exception, span.end() in getresponse is never called, leaving the scope's span attribute pointing to an orphaned span and never restoring the previous span. This corrupts the scope state for subsequent operations in the same request/thread.
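One way to avoid the orphaned span is to end it on the exception path before re-raising, so the previously active span is restored on the scope. A hedged sketch with a stand-in span object (hypothetical helper, not the actual stdlib integration code):

```python
def call_with_span(span, real_putrequest, *args, **kwargs):
    # If the wrapped call raises, end the span so the scope's
    # previous span is restored instead of leaking the orphan.
    # Hypothetical helper, not the shipped integration.
    try:
        return real_putrequest(*args, **kwargs)
    except Exception:
        span.end()
        raise

class FakeSpan:
    # Simplified stand-in that records whether end() was called.
    def __init__(self):
        self.ended = False

    def end(self):
        self.ended = True
```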

Dict rules with unrecognized keys in ignore_spans config silently ignore ALL spans (sentry_sdk/tracing_utils.py:1498)

When ignore_spans contains a dict with only unrecognized keys (e.g., a typo like {"nme": "/health"} instead of {"name": "/health"}), both name_matches and attributes_match default to True, causing the rule to match ALL spans. This could silently drop all trace data due to a simple configuration mistake.
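A simple guard against this failure mode is to reject dict rules that contain no recognized keys at all. A sketch of one possible fix; the key names here are assumptions for illustration, not the SDK's actual rule schema:

```python
# Hypothetical set of recognized ignore_spans rule keys.
KNOWN_KEYS = {"name", "op", "attributes"}

def rule_is_valid(rule):
    # Reject dict rules with no recognized keys, so a typo like
    # {"nme": "/health"} cannot default to matching every span.
    # Sketch only, not the shipped tracing_utils code.
    if not isinstance(rule, dict):
        return True  # non-dict rules are handled elsewhere
    return bool(KNOWN_KEYS & rule.keys())
```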

Identified by Warden find-bugs

type="span",
content_type="application/vnd.sentry.items.span.v2+json",
headers={
"item_count": len(spans),

item_count header reports total spans instead of actual batch size when splitting envelopes

When spans exceed MAX_ENVELOPE_SIZE (1000), the code correctly splits them into multiple envelopes. However, the item_count header is always set to len(spans) (the total count for the trace) instead of the actual number of items in each batch. For example, if there are 2500 spans, three envelopes would be created with item_counts of 2500, 2500, and 2500 instead of 1000, 1000, and 500. This causes a mismatch between the reported item_count and actual payload size, potentially causing issues on the receiving server.

Verification

Read the full _span_batcher.py file. The loop at line 121 iterates over batches of MAX_ENVELOPE_SIZE spans using slicing at lines 140-143, but line 134 always uses len(spans) which is the total count, not the slice size.

Suggested fix: Calculate the actual batch size for each envelope slice and use that for item_count.

Suggested change

-    "item_count": len(spans),
+    batch = spans[
+        i * self.MAX_ENVELOPE_SIZE : (i + 1) * self.MAX_ENVELOPE_SIZE
+    ]
+    "item_count": len(batch),
+    self._to_transport_format(span) for span in batch

Identified by Warden code-review · WDF-CMZ

    finally:
        span = sentry_sdk.get_current_span()
-       if span is not None and span.status == SPANSTATUS.INTERNAL_ERROR:
+       if isinstance(span, Span) and span.status == SPANSTATUS.INTERNAL_ERROR:

StreamedSpan error cleanup is not handled, potentially leaking span resources

The change from span is not None to isinstance(span, Span) correctly prevents AttributeError (since StreamedSpan uses get_status() instead of .status), but it also means StreamedSpan instances will never have __exit__ called on error. When set_span_errored() is called for a StreamedSpan, it sets SpanStatus.ERROR (not SPANSTATUS.INTERNAL_ERROR), and now the isinstance check excludes StreamedSpan entirely. This could result in unclosed spans when using streaming mode with errors.

Verification

Read anthropic.py (lines 540-614), tracing_utils.py (set_span_errored function at lines 1109-1126), traces.py (StreamedSpan class at lines 191-475, SpanStatus enum at lines 44-49), and consts.py (SPANSTATUS at line 896). Confirmed that: 1) StreamedSpan has no .status property (uses get_status()/set_status() methods), 2) StreamedSpan errors use SpanStatus.ERROR='error' vs Span uses SPANSTATUS.INTERNAL_ERROR='internal_error', 3) The isinstance check prevents checking status on StreamedSpan but also prevents calling exit on errored StreamedSpan instances.
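One way to handle both span flavors is explicit dispatch on the type, checking each class's own error status. A sketch with simplified stand-ins for the two classes; the status values are taken from the comment above, but these are not the real sentry_sdk classes:

```python
class Span:
    # Legacy span: error status is exposed as a .status attribute.
    def __init__(self, status=None):
        self.status = status

class StreamedSpan:
    # New span: status is read via get_status().
    def __init__(self, status=None):
        self._status = status

    def get_status(self):
        return self._status

def span_errored(span):
    # Check the error status on either span flavor instead of
    # silently skipping StreamedSpan. Sketch of one possible fix.
    if isinstance(span, StreamedSpan):
        return span.get_status() == "error"
    if isinstance(span, Span):
        return span.status == "internal_error"
    return False
```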

Identified by Warden code-review · YB7-EKF

Comment on lines 135 to 137
db_span.__enter__()

set_db_data_fn(db_span, self)

Async spans leak on exception - missing try/finally block

In _sentry_execute_command, when old_execute_command raises an exception, both db_span and cache_span will never have their __exit__ methods called. This leaks spans that are never properly finished, causing incorrect timing data and potential memory leaks. The sync version (_sync_common.py) correctly wraps this in a try/finally block (lines 141-150), but this async version lacks equivalent exception handling.

Verification

Read the full file at sentry_sdk/integrations/redis/_async_common.py and compared with sentry_sdk/integrations/redis/_sync_common.py. The sync version uses try/finally (lines 141-150) to ensure spans are closed. The async version calls enter on line 120 and 135 but only calls exit in the success path (lines 142, 146). Traced StreamedSpan.exit in sentry_sdk/traces.py (lines 331-345) which properly handles exception info when passed.

Suggested fix: Wrap the async command execution in a try/finally block to ensure spans are properly closed even when exceptions occur, similar to the sync implementation.

Suggested change

-    db_span.__enter__()
-    set_db_data_fn(db_span, self)
+    with capture_internal_exceptions():
+        set_db_data_fn(db_span, self)
+        _set_client_data(db_span, is_cluster, name, *args)
+    try:
+        value = await old_execute_command(self, name, *args, **kwargs)
+    finally:
+        db_span.__exit__(None, None, None)
+        if cache_span:
+            with capture_internal_exceptions():
+                _set_cache_data(cache_span, self, cache_properties, value)
+            cache_span.__exit__(None, None, None)

Identified by Warden code-review · MW4-P2M
