
[BUG] OTEL span reports accumulated_usage instead of per-invocation usage, causing inflated token metrics in Langfuse #2010

@afarntrog

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

v1.33

Python Version

3.12

Operating System

macOS

Installation Method

pip

Steps to Reproduce

  • Install Strands Python SDK via pip

  • Configure OTEL env vars to export traces to Langfuse:

    • OTEL_EXPORTER_OTLP_ENDPOINT
    • OTEL_EXPORTER_OTLP_HEADERS
    • OTEL_EXPORTER_OTLP_PROTOCOL
    • DISABLE_ADOT_OBSERVABILITY=true
    • OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_tool_definitions
  • Run an agent across multiple invocations within a single multi-turn chat session

  • Observe token usage reported in Langfuse per span
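The environment configuration from the steps above can be applied from Python before the agent is created. This is a minimal sketch: the variable names, plus the `DISABLE_ADOT_OBSERVABILITY` and `OTEL_SEMCONV_STABILITY_OPT_IN` values, come from this issue, while the endpoint and header strings are placeholders and `http/protobuf` is an assumed protocol choice.

```python
import os

# Variable names (and the last two values) are taken from the issue.
# The endpoint and header values are placeholders, and the protocol value
# is an assumption; substitute your own Langfuse settings.
OTEL_SETTINGS = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "<your-langfuse-otlp-endpoint>",
    "OTEL_EXPORTER_OTLP_HEADERS": "<your-auth-header>",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "DISABLE_ADOT_OBSERVABILITY": "true",
    "OTEL_SEMCONV_STABILITY_OPT_IN": "gen_ai_tool_definitions",
}

# Apply the settings before any tracer or exporter is set up, since OTLP
# exporters typically read these variables when they are constructed.
os.environ.update(OTEL_SETTINGS)
```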

Expected Behavior

Each OTEL span should report token usage for that specific invocation only. For example, in a session with 10 invocations of ~100k tokens each:

  • Request 1 → 100k tokens
  • Request 2 → 100k tokens
  • Request 3 → 100k tokens

Total across all 10 requests: ~1M tokens

Actual Behavior

Each OTEL span reports the session-lifetime accumulated_usage instead of the per-invocation usage. For example:

  • Request 1 → 100k tokens ✅
  • Request 2 → 200k tokens ❌ (should be 100k)
  • Request 3 → 300k tokens ❌ (should be 100k)

Langfuse then sums these, resulting in wildly inflated token counts and cost estimates. Additionally, reset_usage_metrics() does NOT reset accumulated_usage. This appears to be intentional per the test suite, but it means there is no workaround.
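The size of the inflation can be worked out from the example numbers above. This sketch assumes one span per invocation and that Langfuse simply sums the per-span token values, as described in this issue:

```python
# Illustrative numbers from the example: 10 invocations of ~100k tokens each.
PER_INVOCATION = 100_000
INVOCATIONS = 10

# Expected: each span carries only its own invocation's usage.
expected_spans = [PER_INVOCATION] * INVOCATIONS

# Actual: each span carries the running session total (accumulated_usage),
# so span k reports k * 100k tokens.
actual_spans = [PER_INVOCATION * k for k in range(1, INVOCATIONS + 1)]

# Langfuse sums the per-span values.
expected_total = sum(expected_spans)
actual_total = sum(actual_spans)

print(expected_total)  # 1000000
print(actual_total)  # 5500000
print(actual_total / expected_total)  # 5.5
```

Because the reported total is n(n+1)/2 times the per-invocation usage, the overcount grows roughly quadratically with the number of invocations in a session: 10 invocations already inflate the total 5.5x.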

Additional Context

Root cause identified in src/strands/telemetry/tracer.py: the OTEL span reporter explicitly uses accumulated_usage when setting span attributes:

accumulated_usage = response.metrics.accumulated_usage
attributes.update({
    "gen_ai.usage.prompt_tokens": accumulated_usage["inputTokens"],
    "gen_ai.usage.input_tokens": accumulated_usage["inputTokens"],
    "gen_ai.usage.output_tokens": accumulated_usage["outputTokens"],
    ...
})

The tests for metrics.py confirm that reset_usage_metrics() is intentionally designed NOT to clear accumulated_usage:

# Verify accumulated_usage is NOT cleared
assert event_loop_metrics.accumulated_usage["inputTokens"] == 11

Possible Solution

In tracer.py, replace accumulated_usage with the latest invocation's usage:

agent_invocations[-1].usage

instead of:

response.metrics.accumulated_usage

This would report only the current invocation's token usage on each span.
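The proposed change can be sketched with simplified stand-ins for the metrics objects. `Invocation`, `Metrics`, `record()`, and `usage_attributes()` are hypothetical illustrations, not the actual Strands classes; only the `accumulated_usage` and `agent_invocations[-1].usage` names and the span attribute keys come from this issue. The fallback to `accumulated_usage` when no invocation has been recorded is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """Hypothetical stand-in for one entry of agent_invocations."""
    usage: dict[str, int]


@dataclass
class Metrics:
    """Hypothetical stand-in for response.metrics (structure assumed)."""
    accumulated_usage: dict[str, int] = field(
        default_factory=lambda: {"inputTokens": 0, "outputTokens": 0, "totalTokens": 0}
    )
    agent_invocations: list[Invocation] = field(default_factory=list)

    def record(self, usage: dict[str, int]) -> None:
        # Store this invocation's usage and fold it into the session-lifetime total.
        self.agent_invocations.append(Invocation(usage=dict(usage)))
        for key, value in usage.items():
            self.accumulated_usage[key] += value


def usage_attributes(metrics: Metrics) -> dict[str, int]:
    # Prefer the latest invocation's usage; fall back to the accumulated
    # total only when no invocation has been recorded (assumed behavior).
    if metrics.agent_invocations:
        usage = metrics.agent_invocations[-1].usage
    else:
        usage = metrics.accumulated_usage
    return {
        "gen_ai.usage.prompt_tokens": usage["inputTokens"],
        "gen_ai.usage.input_tokens": usage["inputTokens"],
        "gen_ai.usage.output_tokens": usage["outputTokens"],
    }


metrics = Metrics()
reported = []
for _ in range(3):
    metrics.record({"inputTokens": 90_000, "outputTokens": 10_000, "totalTokens": 100_000})
    attrs = usage_attributes(metrics)
    reported.append(attrs["gen_ai.usage.input_tokens"] + attrs["gen_ai.usage.output_tokens"])

print(reported)  # [100000, 100000, 100000]
print(metrics.accumulated_usage["totalTokens"])  # 300000
```

Each span now reports ~100k tokens, while the session-lifetime total remains available in accumulated_usage for anything that still needs it.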

Related Issues

No response

Metadata


Assignees

Labels

area-community (Related to community and contributor health), bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
