[BUG] OTEL span reports accumulated_usage instead of per-invocation usage, causing inflated token metrics in Langfuse #2010
Description
Checks
- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
v1.33
Python Version
3.12
Operating System
macOS
Installation Method
pip
Steps to Reproduce
- Install the Strands Python SDK via pip.
- Configure the OTEL environment variables to export traces to Langfuse:
  - OTEL_EXPORTER_OTLP_ENDPOINT
  - OTEL_EXPORTER_OTLP_HEADERS
  - OTEL_EXPORTER_OTLP_PROTOCOL
  - DISABLE_ADOT_OBSERVABILITY=true
  - OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_tool_definitions
- Run an agent across multiple invocations in a multi-turn chat session.
- Observe the token usage reported per span in Langfuse.
Expected Behavior
Each OTEL span should report token usage for that specific invocation only. For example, in a session with 10 invocations of ~100k tokens each:
- Request 1 → 100k tokens
- Request 2 → 100k tokens
- Request 3 → 100k tokens
Total across all 10 requests: ~1M tokens
Actual Behavior
Each OTEL span reports the session-lifetime accumulated_usage instead of per-invocation usage. For example:
- Request 1 → 100k tokens ✅
- Request 2 → 200k tokens ❌ (should be 100k)
- Request 3 → 300k tokens ❌ (should be 100k)
Langfuse then sums these per-span values, resulting in wildly inflated token counts and cost estimates. Additionally, reset_usage_metrics() does NOT reset accumulated_usage. This appears to be intentional per the test suite, but it means there is no workaround.
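To make the inflation concrete, here is a small arithmetic sketch using the illustrative numbers from the example (10 invocations of ~100k tokens each; the exact figures are assumptions chosen for round numbers). It compares the total Langfuse would compute from per-invocation spans with the total it computes when every span carries the running sum.

```python
# Illustrative numbers only: 10 invocations of ~100k tokens each.
PER_INVOCATION_TOKENS = 100_000
NUM_INVOCATIONS = 10

# Expected: each span carries only its own invocation's usage.
expected_spans = [PER_INVOCATION_TOKENS for _ in range(NUM_INVOCATIONS)]

# Observed: span k carries the running total after k invocations.
observed_spans = [PER_INVOCATION_TOKENS * k for k in range(1, NUM_INVOCATIONS + 1)]

expected_total = sum(expected_spans)  # what the dashboard should show
observed_total = sum(observed_spans)  # what summing cumulative values produces

print(f"expected total: {expected_total:,}")   # expected total: 1,000,000
print(f"observed total: {observed_total:,}")   # observed total: 5,500,000
print(f"inflation factor: {observed_total / expected_total:.1f}x")  # 5.5x
```

In general, with n equally sized invocations the summed total is n(n+1)/2 times the per-invocation figure instead of n times, so the inflation factor is (n+1)/2 and keeps growing with session length.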
Additional Context
Root cause identified in src/strands/telemetry/tracer.py: The OTEL span reporter explicitly uses accumulated_usage when setting span attributes:

    accumulated_usage = response.metrics.accumulated_usage
    attributes.update({
        "gen_ai.usage.prompt_tokens": accumulated_usage["inputTokens"],
        "gen_ai.usage.input_tokens": accumulated_usage["inputTokens"],
        "gen_ai.usage.output_tokens": accumulated_usage["outputTokens"],
        # ...
    })

In metrics.py, reset_usage_metrics() is intentionally designed NOT to clear accumulated_usage, as this assertion from the test suite shows:

    # Verify accumulated_usage is NOT cleared
    assert event_loop_metrics.accumulated_usage["inputTokens"] == 11

Possible Solution
In tracer.py, replace accumulated_usage with the latest invocation's usage:

    agent_invocations[-1].usage

instead of:

    response.metrics.accumulated_usage

This would report only the current invocation's token usage on each span.
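The proposed change can be sketched in isolation. The snippet below does not import Strands: Usage, AgentInvocation, Metrics, record_invocation, current_attributes and proposed_attributes are hypothetical stand-ins that only mimic the names quoted in this issue (accumulated_usage, agent_invocations, inputTokens, outputTokens), and placing agent_invocations on the same object as accumulated_usage is an assumption. It contrasts span attributes built from the running total with attributes built from agent_invocations[-1].usage.

```python
from dataclasses import dataclass, field
from typing import TypedDict


class Usage(TypedDict):
    # Hypothetical stand-in for the SDK's usage dict (key names as quoted from tracer.py).
    inputTokens: int
    outputTokens: int


@dataclass
class AgentInvocation:
    # Hypothetical stand-in: one record per agent call.
    usage: Usage


@dataclass
class Metrics:
    # Hypothetical stand-in for response.metrics; field placement is an assumption.
    accumulated_usage: Usage = field(
        default_factory=lambda: Usage(inputTokens=0, outputTokens=0)
    )
    agent_invocations: list[AgentInvocation] = field(default_factory=list)

    def record_invocation(self, input_tokens: int, output_tokens: int) -> None:
        # Store this call's usage and add it to the session-lifetime running total.
        usage = Usage(inputTokens=input_tokens, outputTokens=output_tokens)
        self.agent_invocations.append(AgentInvocation(usage=usage))
        self.accumulated_usage["inputTokens"] += input_tokens
        self.accumulated_usage["outputTokens"] += output_tokens


def _attributes_from(usage: Usage) -> dict[str, int]:
    # Same attribute keys the issue quotes from tracer.py.
    return {
        "gen_ai.usage.prompt_tokens": usage["inputTokens"],
        "gen_ai.usage.input_tokens": usage["inputTokens"],
        "gen_ai.usage.output_tokens": usage["outputTokens"],
    }


def current_attributes(metrics: Metrics) -> dict[str, int]:
    # Behavior described in the issue: the running total is reported on every span.
    return _attributes_from(metrics.accumulated_usage)


def proposed_attributes(metrics: Metrics) -> dict[str, int]:
    # Proposed fix: report only the latest invocation's usage.
    # Falling back to the running total when nothing is recorded is an assumption.
    if metrics.agent_invocations:
        return _attributes_from(metrics.agent_invocations[-1].usage)
    return _attributes_from(metrics.accumulated_usage)


# Three invocations of 90k input / 10k output tokens each (illustrative numbers).
metrics = Metrics()
current, proposed = [], []
for _ in range(3):
    metrics.record_invocation(input_tokens=90_000, output_tokens=10_000)
    current.append(current_attributes(metrics)["gen_ai.usage.input_tokens"])
    proposed.append(proposed_attributes(metrics)["gen_ai.usage.input_tokens"])

print(current)   # [90000, 180000, 270000]
print(proposed)  # [90000, 90000, 90000]
```

The fallback to accumulated_usage when no invocation has been recorded is a design choice of this sketch, made so the attributes are never empty; an actual fix in tracer.py may handle that case differently.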
Related Issues
No response