49 changes: 0 additions & 49 deletions develop-docs/sdk/telemetry/telemetry-buffer/index.mdx

This file was deleted.

@@ -1,26 +1,30 @@
---
-title: Backend Telemetry Buffer
-description: Detailed backend telemetry buffer design.
+title: Backend Telemetry Processor
+description: Detailed backend telemetry processor design.
sidebar_order: 1
---

-## Telemetry Buffer Layer: Prioritized, Bounded, Rate-Aware Envelope Delivery
+<Alert level="warning">
+🚧 This document is work in progress.
+</Alert>
+
+## Telemetry Processor Layer: Prioritized, Bounded, Rate-Aware Envelope Delivery

### Overview

-The buffer system sits between the SDK client and the HTTP transport layer, ensuring that critical telemetry like errors take priority over high-volume data like logs and traces. This prevents important events from getting lost when your application is under heavy load or sending large amounts of telemetry.
+The TelemetryProcessor sits between the SDK client and the HTTP transport layer, ensuring that critical telemetry like errors takes priority over high-volume data like logs and traces. This prevents important events from getting lost when your application is under heavy load or sending large amounts of telemetry.

### Motivation

-- Aggregation lives in a unified buffer layer (this way we avoid creating multiple batch processors for different telemetry types).
+- Aggregation lives in a unified layer (this way we avoid creating multiple batch processors for different telemetry types).
- All telemetry types use capture APIs (CaptureX) routed through the Client.
- Rate-limit awareness is built-in across categories.
- Buffers support two modes: normal ring buffer and bucket-by-trace (for spans).
- For spans, dropping an entire trace under pressure is preferable to dropping individual spans and delivering partial traces.

### Architecture Overview

-Introduce a `Buffer` layer between the `Client` and the `Transport`. This `Buffer` wraps prioritization and scheduling and exposes a minimal API to the SDK:
+Introduce a `TelemetryProcessor` layer between the `Client` and the `Transport`. This `TelemetryProcessor` wraps prioritization and scheduling and exposes a minimal API to the SDK (sketched in Go after the diagram below):

- Add(item).
- Flush(timeout).
@@ -34,19 +38,19 @@ Introduce a `Buffer` layer between the `Client` and the `Transport`. This `Buffe

┌────────────────────────────────────────────────────────────────────────────┐
-Buffer
+TelemetryProcessor
│ Add(item) · Flush(timeout) · Close(timeout) │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ ┌──────────────────┐ │
-│ │ Error Store │ │ Check-in Store │ │ Log Store │ │
+│ │ Error Buffer │ │ Check-in Buffer │ │ Log Buffer │ │
│ │ (CRITICAL) │ │ (HIGH) │ │ (LOW) │ │
│ │ Timeout: N/A │ │ Timeout: N/A │ │ Timeout: 5s │ │
│ │ BatchSize: 1 │ │ BatchSize: 1 │ │ BatchSize: 100 │ │
│ └──────────────────────┘ └──────────────────────┘ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
-│ │ Scheduler (Weighted Round-Robin) │ │
+│ │ TelemetryScheduler (Weighted Round-Robin) │ │
│ │ - Priority weights: CRITICAL=5, HIGH=4, MEDIUM=3, LOW=2, LOWEST=1 │ │
│ │ - Processes a batch of items based on BatchSize and/or Timeout │ │
│ │ - Builds envelopes from batch │ │
@@ -61,10 +65,10 @@ Introduce a `Buffer` layer between the `Client` and the `Transport`. This `Buffe
└────────────────────────────────────────────────────────────────────────────┘
```
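
For illustration, the minimal surface described above could be expressed in Go roughly as follows. The interface and the `TelemetryItem` type are sketches under assumed names, not a prescribed API; each SDK maps this onto its own envelope-item abstraction.

```go
package telemetry

import "time"

// TelemetryItem stands in for whatever envelope-item abstraction the
// SDK already has; the name is an assumption made for this sketch.
type TelemetryItem interface {
	// Category reports the data category used for prioritization and
	// rate limiting (error, check-in, log, span, ...).
	Category() string
}

// TelemetryProcessor sketches the minimal API exposed to the SDK.
type TelemetryProcessor interface {
	// Add enqueues an item into the buffer for its category and
	// signals the scheduler that new data is available.
	Add(item TelemetryItem)
	// Flush drains all buffers into envelopes, hands them to the
	// transport, and waits at most the given timeout.
	Flush(timeout time.Duration)
	// Close flushes remaining data and stops the background worker.
	Close(timeout time.Duration)
}
```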

-#### How the Buffer works
+#### How the Processor works

- **Smart batching**: Logs are batched into single requests; errors, transactions, and monitors are sent immediately.
-- **Pre-send rate limiting**: The scheduler checks rate limits before serialization to avoid unnecessary processing. When a telemetry is rate-limited the selected batch should
+- **Pre-send rate limiting**: The TelemetryScheduler checks rate limits before serialization to avoid unnecessary processing (see the sketch after this list). When a telemetry category is rate-limited, the selected batch should
be dropped to avoid filling up the buffers.
- **Category isolation**: Separate ring buffers for each telemetry type prevent head-of-line blocking.
- **Weighted scheduling**: High-priority telemetry gets sent more frequently via round-robin selection.
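
To make the pre-send rate limiting concrete, a single processing step could look roughly like this. `TelemetryBuffer`, `protocol.EnvelopeItemConvertible`, and `ratelimit.Category` come from the snippets later on this page; the rate-limiter method, the helper names, and the drop reason are assumptions, and the document's own `processItems` fills this role.

```go
// Sketch only: pick up a ready batch, check limits before doing any
// serialization work, then build and send the envelope.
func (s *TelemetryScheduler) sendReadyBatch(buffer TelemetryBuffer[protocol.EnvelopeItemConvertible], category ratelimit.Category) {
	items := buffer.PollIfReady()
	if len(items) == 0 {
		return
	}

	// A rate-limited category drops the whole batch so the buffer
	// does not fill up with items that cannot be sent anyway.
	if s.rateLimits.IsRateLimited(category) {
		s.reportDropped(category, len(items), "ratelimit_backoff")
		return
	}

	envelope := buildEnvelope(items)
	s.transport.SendEnvelope(envelope)
}
```
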
@@ -81,9 +85,9 @@ Configurable via weights.
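
As an illustration of the weighted round-robin, a priority cycle could be expanded from the configured weights as shown below. The weights match the ones in the diagram above; the function and type names are assumptions.

```go
// Priority levels as used on this page, ordered so they can be
// compared numerically; names and values are illustrative.
type Priority int

const (
	Lowest Priority = iota + 1
	Low
	Medium
	High
	Critical
)

// buildCycle expands weights such as {Critical: 5, High: 4, Medium: 3,
// Low: 2, Lowest: 1} into a flat cycle like [Critical, Critical, ...,
// High, ...]. The scheduler walks this slice with a wrapping index
// (cyclePos in the snippets below).
func buildCycle(weights map[Priority]int) []Priority {
	var cycle []Priority
	for p := Critical; p >= Lowest; p-- {
		for i := 0; i < weights[p]; i++ {
			cycle = append(cycle, p)
		}
	}
	return cycle
}
```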

### Components

-#### Storage
+#### TelemetryBuffers

-Each telemetry category maintains a store interface; a fixed-size circular array/ring buffer (not to be confused with the `Buffer` wrapper) that stores items before transmission:
+Each telemetry category maintains a buffer implementing a common interface: a fixed-size circular array/ring buffer that stores items before transmission:

- **Bounded capacity**: Defaults to 100 items for errors, logs, and monitors; 1000 for transactions. This prevents unbounded memory growth regardless of telemetry volume and provides backpressure handling.
- **Overflow policies** (optional):
@@ -100,10 +104,10 @@ Each telemetry category maintains a store interface; a fixed-size circular array
- Offer semantics (a Go sketch follows this list): if not full, append; when full, apply `overflowPolicy`:
- `drop_oldest`: evict the oldest item, insert the new one, and invoke the dropped callback with reason `buffer_full_drop_oldest`.
- `drop_newest`: reject the new item and invoke the dropped callback with reason `buffer_full_drop_newest`.
-- Readiness: a store is ready when `size >= batchSize` or when `timeout` has elapsed since `lastFlushTime` (and it is non-empty).
-- Polling: `PollIfReady()` returns up to `batchSize` items and updates `lastFlushTime`; `Drain()` empties the store.
+- Readiness: a buffer is ready when `size >= batchSize` or when `timeout` has elapsed since `lastFlushTime` (and it is non-empty).
+- Polling: `PollIfReady()` returns up to `batchSize` items and updates `lastFlushTime`; `Drain()` empties the buffer.
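
A sketch of the offer semantics listed above, under assumed field names. The struct shown here only makes the overflow behavior concrete and is not the reference implementation.

```go
package telemetry

import "sync"

type OverflowPolicy int

const (
	DropOldest OverflowPolicy = iota // "drop_oldest"
	DropNewest                       // "drop_newest"
)

// Assumed shape only; the real buffer also tracks batchSize, timeout,
// and lastFlushTime for the readiness check, and a true circular
// array can replace the plain slice used here for brevity.
type RingBuffer[T any] struct {
	mu             sync.Mutex
	items          []T
	capacity       int
	overflowPolicy OverflowPolicy
	onDropped      func(item T, reason string)
}

// Offer appends the item if there is room; when full, it applies the
// configured overflow policy and reports the drop.
func (b *RingBuffer[T]) Offer(item T) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if len(b.items) < b.capacity {
		b.items = append(b.items, item)
		return
	}

	switch b.overflowPolicy {
	case DropOldest:
		dropped := b.items[0]
		b.items = append(b.items[1:], item)
		if b.onDropped != nil {
			b.onDropped(dropped, "buffer_full_drop_oldest")
		}
	case DropNewest:
		if b.onDropped != nil {
			b.onDropped(item, "buffer_full_drop_newest")
		}
	}
}
```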

-##### Bucketed-by-trace storage (spans)
+##### Bucketed-by-trace buffer (spans)

- **Purpose**: keep spans from the same trace together and flush them as a unit to avoid partial-trace delivery under pressure. This addresses a gap in standard implementations where individual span drops can create incomplete traces.
- **Grouping**: a new bucket is created per trace id; a map (`traceIndex`) provides O(1) lookup.
@@ -119,11 +123,11 @@ Each telemetry category maintains a store interface; a fixed-size circular array
There still remains a small subset of cases that might result in partial traces: either an old trace bucket was dropped and a new span with the same trace id arrived afterwards, or an incoming span of that trace was dropped.
The preferred overflow behavior in most cases should be `drop_oldest`, since it results in the fewest incomplete traces across these two scenarios.
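
A simplified, non-generic sketch of the bucketing described above. The names here are assumptions and differ from the `BucketedBuffer[T]` shown later; the point is the `traceIndex` map for O(1) lookup and that overflow evicts an entire trace bucket rather than a single span.

```go
package telemetry

import "sync"

type Span struct {
	TraceID string
	// ... remaining span fields omitted
}

type traceBucket struct {
	traceID string
	spans   []Span
}

// traceBucketBuffer groups spans by trace id so that a trace is
// flushed, or dropped, as a unit.
type traceBucketBuffer struct {
	mu         sync.Mutex
	buckets    []*traceBucket          // insertion order, oldest first
	traceIndex map[string]*traceBucket // O(1) lookup by trace id
	maxBuckets int
}

func (b *traceBucketBuffer) Offer(span Span) {
	b.mu.Lock()
	defer b.mu.Unlock()

	if b.traceIndex == nil {
		b.traceIndex = make(map[string]*traceBucket)
	}

	// Existing trace: keep its spans together in one bucket.
	if bkt, ok := b.traceIndex[span.TraceID]; ok {
		bkt.spans = append(bkt.spans, span)
		return
	}

	// New trace while full: drop_oldest evicts the entire oldest
	// bucket, which is preferable to shipping a partial trace.
	if len(b.buckets) >= b.maxBuckets {
		oldest := b.buckets[0]
		b.buckets = b.buckets[1:]
		delete(b.traceIndex, oldest.traceID)
	}

	bkt := &traceBucket{traceID: span.TraceID, spans: []Span{span}}
	b.buckets = append(b.buckets, bkt)
	b.traceIndex[span.TraceID] = bkt
}
```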

-Stores are mapped to [DataCategories](https://github.com/getsentry/relay/blob/master/relay-base-schema/src/data_category.rs), which determine their scheduling priority and rate limits.
+Buffers are mapped to [DataCategories](https://github.com/getsentry/relay/blob/master/relay-base-schema/src/data_category.rs), which determine their scheduling priority and rate limits.

-#### Scheduler
+#### TelemetryScheduler

-The scheduler runs as a background worker, coordinating the flow of telemetry from storage to the transport:
+The TelemetryScheduler runs as a background worker, coordinating the flow of telemetry from buffers to the transport:

- **Initialization**: Constructs a weighted priority cycle (e.g., `[CRITICAL×5, HIGH×4, MEDIUM×3, ...]`) based on configured weights.
- **Event loop**: Wakes when explicitly signaled from the `captureX` methods on the client when new data is available (if the language does not support this, then a periodic ticker can be used).
@@ -191,7 +195,7 @@ type Storage[T any] interface {
}


-// Single item store
+// Single item buffer
func (b *RingBuffer[T]) PollIfReady() []T {
b.mu.Lock()
defer b.mu.Unlock()
@@ -226,7 +230,7 @@ func (b *RingBuffer[T]) PollIfReady() []T {
return result
}

-// Bucketed store
+// Bucketed buffer
func (b *BucketedBuffer[T]) PollIfReady() []T {
b.mu.Lock()
defer b.mu.Unlock()
@@ -257,10 +261,10 @@ func (b *BucketedBuffer[T]) PollIfReady() []T {

```
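
The buffer interface itself is collapsed in the diff above. Inferred from the methods referenced throughout this page, it could look roughly like the following (a reconstruction, not the verbatim definition; `Priority` is the type sketched earlier):

```go
// A possible shape for the per-category buffer interface.
type TelemetryBuffer[T any] interface {
	// Offer inserts an item, applying the overflow policy when full.
	Offer(item T)
	// IsReadyToFlush reports whether size >= batchSize, or whether the
	// timeout has elapsed since lastFlushTime and the buffer is non-empty.
	IsReadyToFlush() bool
	// PollIfReady returns up to batchSize items and updates lastFlushTime.
	PollIfReady() []T
	// Drain empties the buffer and returns everything it still holds.
	Drain() []T
	// IsEmpty reports whether the buffer currently holds no items.
	IsEmpty() bool
	// Priority returns the scheduling priority of the buffer's category.
	Priority() Priority
}
```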

-#### Scheduler Processing
+#### TelemetryScheduler Processing

```go
-func (s *Scheduler) run() {
+func (s *TelemetryScheduler) run() {
for {
s.mu.Lock()

@@ -274,7 +278,7 @@ func (s *Scheduler) run() {
}
}

-func (s *Scheduler) hasWork() bool {
+func (s *TelemetryScheduler) hasWork() bool {
for _, buffer := range s.buffers {
if buffer.IsReadyToFlush() {
return true
@@ -283,15 +287,15 @@ func (s *Scheduler) hasWork() bool {
return false
}

-func (s *Scheduler) processNextBatch() {
+func (s *TelemetryScheduler) processNextBatch() {
if len(s.currentCycle) == 0 {
return
}

priority := s.currentCycle[s.cyclePos]
s.cyclePos = (s.cyclePos + 1) % len(s.currentCycle)

-var bufferToProcess Storage[protocol.EnvelopeItemConvertible]
+var bufferToProcess TelemetryBuffer[protocol.EnvelopeItemConvertible]
var categoryToProcess ratelimit.Category
for category, buffer := range s.buffers {
if buffer.Priority() == priority && buffer.IsReadyToFlush() {
@@ -311,8 +315,8 @@ func (s *Scheduler) processNextBatch() {
#### Flushing

```go
-func (s *Scheduler) flush() {
-// should process all store buffers and send to transport
+func (s *TelemetryScheduler) flush() {
+// should process all buffers and send to transport
for category, buffer := range s.buffers {
if !buffer.IsEmpty() {
s.processItems(buffer, category, true)
@@ -2,11 +2,12 @@
title: Batch Processor (deprecated)
redirect_from:
- /sdk/telemetry/spans/batch-processor/
+- /sdk/telemetry/telemetry-buffer/batch-processor/
sidebar_order: 10
---

<Alert level="warning">
-The BatchProcessor is deprecated. Please use the [Telemetry Buffer](/sdk/telemetry/telemetry-buffer/) instead.
+The BatchProcessor is deprecated. Please use the [Telemetry Processor](/sdk/telemetry/telemetry-processor/) instead.
</Alert>

<Alert>
@@ -15,7 +16,7 @@ sidebar_order: 10

# BatchProcessor (deprecated)

-This section covers the initial specification of the BatchProcessor, which some SDKs use as a reference when implementing logs. This exists only as a reference until we fully spec out the [telemetry buffer](/sdk/telemetry/telemetry-buffer/) across all platforms.
+This section covers the initial specification of the BatchProcessor, which some SDKs use as a reference when implementing logs. This exists only as a reference until we fully spec out the [telemetry processor](/sdk/telemetry/telemetry-processor/) across all platforms.

## Overview

@@ -37,7 +38,7 @@ The BatchProcessor **MUST** forward all spans and logs in memory to the transpor
2. When the user calls `SentrySDK.close()`, the BatchProcessor **MUST** forward all data in memory to the transport. SDKs **SHOULD** keep their existing closing behavior.
3. When the application shuts down gracefully, the BatchProcessor **SHOULD** forward all data in memory to the transport. The transport **SHOULD** keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call a transport `flush`. This is mostly relevant for mobile SDKs already subscribed to these hooks, such as [applicationWillTerminate](https://developer.apple.com/documentation/uikit/uiapplicationdelegate/applicationwillterminate(_:)) on iOS.
4. When the application moves to the background, the BatchProcessor **SHOULD** forward all the data in memory to the transport and stop the timer. The transport **SHOULD** keep its existing behavior, which usually stores the data to disk as an envelope. It is not required to call the transport `flush`. This is mostly relevant for mobile SDKs.
-5. Mobile SDKs **MUST** minimize data loss when sudden process terminations occur. Refer to the [Mobile Telemetry Buffer](/sdk/telemetry/telemetry-buffer/mobile-telemetry-buffer) section for more details.
+5. Mobile SDKs **MUST** minimize data loss when sudden process terminations occur. Refer to the [Mobile Telemetry Processor](/sdk/telemetry/telemetry-processor/mobile-telemetry-processor) section for more details.

The detailed specification is written in the [Gherkin syntax](https://cucumber.io/docs/gherkin/reference/). The specification uses spans as an example, but the same applies to logs or any other future telemetry data.

@@ -0,0 +1,7 @@
---
title: Browser Telemetry Processor
description: Detailed browser telemetry processor design.
sidebar_order: 2
---

To be defined — full spec lives here.
@@ -0,0 +1,7 @@
---
title: GDX Telemetry Processor
description: Detailed GDX telemetry processor design.
sidebar_order: 3
---

To be defined — full spec lives here.