[BUG] OTEL span reports accumulated_usage instead of per-invocation usage, causing inflated token metrics in Langfuse #2010

### Checks

- [x] I have updated to the latest minor and patch version of Strands
- [x] I have checked the documentation and this is not expected behavior
- [x] I have searched [./issues](./issues?q=) and there are no duplicates of my issue

### Strands Version

v1.33

### Python Version

3.12

### Operating System

macOS

### Installation Method

pip

### Steps to Reproduce

- Install Strands Python SDK via pip

- Configure OTEL env vars to export traces to Langfuse:

  - `OTEL_EXPORTER_OTLP_ENDPOINT`
  - `OTEL_EXPORTER_OTLP_HEADERS`
  - `OTEL_EXPORTER_OTLP_PROTOCOL`
  - `DISABLE_ADOT_OBSERVABILITY=true`
  - `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_tool_definitions`

- Run an agent across multiple invocations in a multi-chat session

- Observe token usage reported in Langfuse per span
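
For anyone reproducing this, here is a minimal sketch of how the environment variables above might be populated for Langfuse. The endpoint path (`/api/public/otel`), the Basic auth header built from the public and secret keys, and the `http/protobuf` protocol value are my assumptions based on Langfuse's OTLP setup; they are not taken from my actual configuration, and `langfuse_otel_env` is a hypothetical helper name.

```python
import base64


def langfuse_otel_env(public_key: str, secret_key: str, host: str) -> dict[str, str]:
    """Build the OTEL environment variables listed above (values are assumptions)."""
    # Assumed: Langfuse accepts HTTP Basic auth over "public_key:secret_key".
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        # Assumed endpoint path for Langfuse's OTLP ingestion.
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{host.rstrip('/')}/api/public/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "DISABLE_ADOT_OBSERVABILITY": "true",
        "OTEL_SEMCONV_STABILITY_OPT_IN": "gen_ai_tool_definitions",
    }


if __name__ == "__main__":
    # Placeholder credentials; prints shell exports to source before starting the agent.
    env = langfuse_otel_env("pk-lf-example", "sk-lf-example", "https://cloud.langfuse.com")
    for name, value in env.items():
        print(f"export {name}='{value}'")
```

With these exported, running the same agent several times in one session should be enough to see the per-span token counts grow.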


### Expected Behavior

Each OTEL span should report token usage for that specific invocation only. For example, in a session with 10 invocations of ~100k tokens each:

- Request 1 → 100k tokens
- Request 2 → 100k tokens
- Request 3 → 100k tokens

Total across all 10 requests: ~1M tokens


### Actual Behavior

Each OTEL span reports the session-lifetime `accumulated_usage` instead of per-invocation usage. For example:

- Request 1 → 100k tokens ✅
- Request 2 → 200k tokens ❌ (should be 100k)
- Request 3 → 300k tokens ❌ (should be 100k)

Langfuse then sums these per-span values, so the session's token counts and cost estimates come out heavily inflated. Additionally, `reset_usage_metrics()` does NOT reset `accumulated_usage`. This appears to be intentional per the test suite, but it means there is no workaround.
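
To put a number on the inflation, here is a small sketch of the arithmetic for the 10 x 100k example, assuming the backend simply adds up whatever each span reports. `summed_span_usage` is a hypothetical helper for illustration only.

```python
def summed_span_usage(per_invocation_tokens: list[int]) -> tuple[int, int]:
    """Return (true_total, total_obtained_by_summing_cumulative_spans)."""
    true_total = sum(per_invocation_tokens)
    running = 0
    summed_from_spans = 0
    for tokens in per_invocation_tokens:
        running += tokens             # value each span reports today (running total)
        summed_from_spans += running  # what a backend gets by adding the spans up
    return true_total, summed_from_spans


true_total, inflated = summed_span_usage([100_000] * 10)
print(true_total, inflated, inflated / true_total)  # 1000000 5500000 5.5
```

So for this session the summed figure would be about 5.5M tokens instead of ~1M, and the overcount grows with every additional invocation.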


### Additional Context

Root cause identified in `src/strands/telemetry/tracer.py`: The OTEL span reporter explicitly uses `accumulated_usage` when setting span attributes:

```python
accumulated_usage = response.metrics.accumulated_usage
attributes.update({
    "gen_ai.usage.prompt_tokens": accumulated_usage["inputTokens"],
    "gen_ai.usage.input_tokens":  accumulated_usage["inputTokens"],
    "gen_ai.usage.output_tokens": accumulated_usage["outputTokens"],
    ...
})
```

`reset_usage_metrics()` in `metrics.py` is intentionally designed to NOT clear `accumulated_usage`; the test suite asserts this explicitly:

```python
# Verify accumulated_usage is NOT cleared
assert event_loop_metrics.accumulated_usage["inputTokens"] == 11
```
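
The interaction between the two counters can be illustrated with a toy stand-in. This is not the Strands implementation: `ToyMetrics` and `ToyInvocation` are hypothetical classes, and I am assuming that a reset opens a fresh per-invocation record while leaving the running total untouched, which is consistent with the assertion above.

```python
from dataclasses import dataclass, field


def _empty_usage() -> dict[str, int]:
    return {"inputTokens": 0, "outputTokens": 0}


@dataclass
class ToyInvocation:
    usage: dict[str, int] = field(default_factory=_empty_usage)


@dataclass
class ToyMetrics:
    """Hypothetical stand-in for the metrics object; not the Strands class."""

    accumulated_usage: dict[str, int] = field(default_factory=_empty_usage)
    agent_invocations: list[ToyInvocation] = field(default_factory=list)

    def reset_usage_metrics(self) -> None:
        # Assumed behavior: start a new per-invocation record, keep the running total.
        self.agent_invocations.append(ToyInvocation())

    def update_usage(self, input_tokens: int, output_tokens: int) -> None:
        # Both the session total and the current invocation receive the new tokens.
        for usage in (self.accumulated_usage, self.agent_invocations[-1].usage):
            usage["inputTokens"] += input_tokens
            usage["outputTokens"] += output_tokens


metrics = ToyMetrics()
for _ in range(3):
    metrics.reset_usage_metrics()
    metrics.update_usage(input_tokens=11, output_tokens=4)

print(metrics.accumulated_usage["inputTokens"])            # 33
print(metrics.agent_invocations[-1].usage["inputTokens"])  # 11
```

Under that assumption the per-invocation value stays flat while the accumulated value keeps growing, which matches what shows up on the spans.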


### Possible Solution

In `tracer.py`, replace `accumulated_usage` with the latest invocation's usage:

```python
agent_invocations[-1].usage
```

instead of:

```python
response.metrics.accumulated_usage
```

This would report only the current invocation's token usage on each span.
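
A rough sketch of what the selection could look like, written against plain mappings so it runs on its own. `usage_for_span` is a hypothetical helper, not an existing function in `tracer.py`, and the fallback to the accumulated total when no invocation record exists is my own assumption. Only the three attributes quoted above are included.

```python
from typing import Mapping, Sequence


def usage_for_span(
    accumulated_usage: Mapping[str, int],
    invocation_usages: Sequence[Mapping[str, int]],
) -> dict[str, int]:
    """Choose the usage a span should report and map it to gen_ai.* attributes."""
    # Prefer the latest invocation; fall back to the running total only when
    # no per-invocation record is available (assumed fallback).
    usage = invocation_usages[-1] if invocation_usages else accumulated_usage
    return {
        "gen_ai.usage.prompt_tokens": usage["inputTokens"],
        "gen_ai.usage.input_tokens": usage["inputTokens"],
        "gen_ai.usage.output_tokens": usage["outputTokens"],
    }


# Third request in a session: the running total is 300k, the latest invocation is 100k.
accumulated = {"inputTokens": 300_000, "outputTokens": 3_000}
invocations = [{"inputTokens": 100_000, "outputTokens": 1_000} for _ in range(3)]
print(usage_for_span(accumulated, invocations)["gen_ai.usage.input_tokens"])  # 100000
```

The fallback keeps the attributes populated in the no-invocation case; whether that case can actually occur in the tracer is something the maintainers would know better than I do.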


### Related Issues

_No response_
