@Focadecombate Focadecombate commented Dec 9, 2025

PR Description

feat(crewai): Add tool spans and enhance GenAI semantic conventions

Summary

This PR enhances the CrewAI instrumentation by adding tool execution spans and aligning with the GenAI semantic conventions for better observability.

Changes

New Features:

  • Added instrumentation for ToolUsage._use to capture tool execution spans
  • Tool spans include:
    • gen_ai.tool.name - the name of the tool being executed
    • gen_ai.tool.call.arguments - JSON serialized tool arguments
    • gen_ai.tool.call.result - JSON serialized tool execution result
    • CrewAI-specific attributes like current_usage_count, max_usage_count, and result_as_answer
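As a rough illustration of the tool span attributes listed above, the sketch below collects them into a plain dictionary. `build_tool_span_attributes` is a hypothetical helper, not the function added in this PR. Only the three `gen_ai.tool.*` keys and the `execute_tool` operation name come from this description; the CrewAI-specific keys are left out because their exact attribute names are not given here.

```python
import json


def build_tool_span_attributes(tool_name, arguments, result):
    """Hypothetical helper: collect the tool attributes named in this PR."""
    return {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": tool_name,
        # Arguments and result are stored as JSON strings, as described above.
        "gen_ai.tool.call.arguments": json.dumps(arguments),
        "gen_ai.tool.call.result": json.dumps(result),
    }


attrs = build_tool_span_attributes("search", {"query": "otel"}, "3 hits")
for key, value in attrs.items():
    print(f"{key} = {value}")
```

Keeping arguments and results as JSON strings means each attribute value is a single string regardless of how nested the tool input is.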

Enhanced GenAI Semantic Conventions:

  • Added gen_ai.operation.name attribute across all span types:
    • invoke_agent for agent execution spans
    • chat for LLM call spans
    • execute_tool for tool execution spans
  • Added gen_ai.input.messages and gen_ai.output.messages to LLM call spans for better message tracing

Dependencies:

  • Updated opentelemetry-api from ^1.38.0 to ^1.39.0
  • Expanded CrewAI version compatibility from ^0.80.0 to >=0.80.0,<0.203.0
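To see why the CrewAI change is a widening, recall that Poetry's caret on a 0.x version only allows patch-level updates, so `^0.80.0` means `>=0.80.0,<0.81.0`. The sketch below compares the two ranges with simple tuple comparison. The helper names and the sample version strings are illustrative assumptions (they are not claimed to be real CrewAI releases), and pre-release suffixes are not handled.

```python
def parse(version):
    """Turn '0.95.0' into (0, 95, 0) so versions compare as tuples."""
    return tuple(int(part) for part in version.split("."))


def in_old_range(version):
    # Poetry caret on a 0.x version: ^0.80.0 means >=0.80.0,<0.81.0
    return parse("0.80.0") <= parse(version) < parse("0.81.0")


def in_new_range(version):
    # The explicit range from this PR: >=0.80.0,<0.203.0
    return parse("0.80.0") <= parse(version) < parse("0.203.0")


for candidate in ("0.80.0", "0.95.0", "0.203.0"):
    print(candidate, in_old_range(candidate), in_new_range(candidate))
```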

Checklist

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change. - Will add Jaeger screenshots
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Add tool execution spans and enhance GenAI semantic conventions in CrewAI instrumentation, updating dependencies accordingly.

  • New Features:
    • Add wrap_tool_use to instrumentation.py for capturing tool execution spans with attributes like gen_ai.tool.name, gen_ai.tool.call.arguments, and gen_ai.tool.call.result.
    • Include CrewAI-specific attributes: current_usage_count, max_usage_count, and result_as_answer.
  • Enhanced GenAI Semantic Conventions:
    • Add gen_ai.operation.name to all span types: invoke_agent, chat, execute_tool.
    • Add gen_ai.input.messages and gen_ai.output.messages to LLM call spans.
  • Dependencies:
    • Update opentelemetry-api to ^1.39.0 in pyproject.toml.
    • Expand CrewAI version compatibility to >=0.80.0,<0.203.0.

This description was created by Ellipsis for 051d531.

Summary by CodeRabbit

  • New Features

    • Added distributed tracing for tool usage with span instrumentation capturing tool context and results.
  • Improvements

    • Expanded tracing observability: more span attributes (usage counts, args/results, duration, status) and metrics extended to cover tool calls alongside agent and LLM operations.
  • Chores

    • Updated OpenTelemetry API to 1.39.0 and broadened CrewAI test compatibility.


@CLAassistant
Copy link

CLAassistant commented Dec 9, 2025

CLA assistant check
All committers have signed the CLA.

@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Added instrumentation for tool usage through a new wrap_tool_use wrapper. It creates a tracing span for each tool call, records the tool context, arguments, results, and status on the span, and emits duration and token metrics. The wrapper is registered and unregistered in the package's _instrument/_uninstrument flow, and dependency bounds were bumped.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tool instrumentation implementation (packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py) | Added wrap_tool_use(tracer, duration_histogram, token_histogram, wrapped, instance, args, kwargs) to instrument tool executions. Creates spans named "{tool_name}.tool", captures tool context, args/results (JSON-serialized), duration, status, and emits metrics. |
| Wrapper registration & helpers (packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py) | Registered the new ToolUsage wrapper in _instrument and added corresponding unwrapping in _uninstrument. Updated helper with_tracer_wrapper and existing wrappers (wrap_kickoff, wrap_agent_execute_task, wrap_task_execute, wrap_llm_call) to multiline signatures and explicit typing. |
| Tracing attributes & metrics (packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py) | Expanded span attributes and metric emissions: LLM call now emits GEN_AI_OUTPUT_MESSAGES; agent kickoff and task/agent wrappers include additional span attributes and conditional result attributes; token and duration histograms extended to cover tool usage. |
| Dependency updates (packages/opentelemetry-instrumentation-crewai/pyproject.toml) | Bumped opentelemetry-api from ^1.38.0 to ^1.39.0. Relaxed crewai test dependency from ^0.80.0 to >=0.80.0,<0.203.0. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Instrumentation
    participant Tracer
    participant Tool
    participant Span

    Caller->>Instrumentation: invoke tool (wrapped by wrap_tool_use)
    Instrumentation->>Tracer: start span "{tool_name}.tool"
    Tracer-->>Span: span started

    Instrumentation->>Span: set attributes (tool_name, context, args, counts)
    Instrumentation->>Tool: call underlying tool
    Tool-->>Instrumentation: return result / raise error

    Instrumentation->>Span: set result attributes (JSON result, status)
    Instrumentation->>Instrumentation: record duration & token metrics
    Instrumentation->>Tracer: end span
    Span-->>Tracer: span completed

    Instrumentation-->>Caller: return tool result
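The flow in the diagram can be sketched without any tracing library. Everything below is a stand-in: `FakeSpan` replaces a real span, plain lists replace the tracer and the duration histogram, and `traced_tool_call` is a hypothetical name, not the PR's `wrap_tool_use`. Token metrics are omitted for brevity.

```python
import json
import time


class FakeSpan:
    """Stand-in for a tracing span; records attributes in a dict."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.status = "UNSET"
        self.ended = False

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def end(self):
        self.ended = True


def traced_tool_call(tool_name, tool_fn, spans, durations, **tool_args):
    """Hypothetical wrapper following the steps in the diagram."""
    span = FakeSpan(f"{tool_name}.tool")
    spans.append(span)
    span.set_attribute("gen_ai.tool.name", tool_name)
    span.set_attribute(
        "gen_ai.tool.call.arguments", json.dumps(tool_args, default=str)
    )
    start = time.perf_counter()
    try:
        result = tool_fn(**tool_args)
    except Exception:
        span.status = "ERROR"
        raise
    else:
        # Result attributes are only set when the tool returned normally.
        span.set_attribute(
            "gen_ai.tool.call.result", json.dumps(result, default=str)
        )
        span.status = "OK"
        return result
    finally:
        # Duration is recorded and the span is ended on both paths.
        durations.append(time.perf_counter() - start)
        span.end()


spans, durations = [], []
result = traced_tool_call("adder", lambda a, b: a + b, spans, durations, a=2, b=3)
```

Using `try/except/else/finally` keeps the success-only attributes in the `else` branch while guaranteeing that the span ends and the duration is recorded even when the tool raises.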

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review span creation/closing and attribute names in wrap_tool_use
  • Verify registration/unregistration in _instrument / _uninstrument
  • Check metric histogram usage for token/duration consistency
  • Validate JSON serialization of args/results and error handling

Poem

🐰 I hopped in to wrap a clever new tool,
Spans stitched each hop like a neat little spool,
I counted the tokens, the time, and the cheer,
Traces now sparkle when tools appear ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main changes: adding tool spans instrumentation and enhancing GenAI semantic conventions for CrewAI. |

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 051d531 in 2 minutes and 23 seconds. Click for details.
  • Reviewed 328 lines of code in 2 files
  • Skipped 1 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py:257
  • Draft comment:
    Using math.inf as the default for max_usage_count may cause JSON serialization issues downstream. Consider using a finite value or converting it to a string.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. The concern is potentially valid: math.inf is a float representing infinity, and standard JSON has no representation for it (though some implementations handle it differently), so serialization during trace export could fail. However, OpenTelemetry's span attribute handling may already cope with this, for example by converting the value to a string, and the comment only says the default "may cause" issues without demonstrating an actual failure. The review rules state that speculative comments should be deleted unless there is definite evidence of a problem, so this comment was dropped.
2. packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py:54
  • Draft comment:
    It looks like there might be a typographical error in the function call on this line. Previously, the task execution wrapper was named wrap_task_execute_task, but here it has been changed to wrap_task_execute. If this change was unintentional, please update the function name to maintain consistency.
  • Reason this comment was not posted:
    Comment was on unchanged code.
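The JSON concern raised in the first draft comment above can be checked with the standard library alone. The sketch below shows what Python's `json` module does with `math.inf`. It says nothing about how any particular OpenTelemetry exporter behaves, which this thread does not establish, and the string workaround at the end is an assumption rather than the PR's approach.

```python
import json
import math

# By default Python emits the token Infinity, which is not valid standard JSON.
default_output = json.dumps({"max_usage_count": math.inf})

# With allow_nan=False the same value is rejected with a ValueError.
try:
    json.dumps({"max_usage_count": math.inf}, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

# One possible workaround (an assumption): store the value as a string.
safe_output = json.dumps({"max_usage_count": str(math.inf)})

print(default_output)
print(type(strict_error).__name__)
print(safe_output)
```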

Workflow ID: wflow_zA308dx3WFdk9Nun


Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-crewai/pyproject.toml (1)

23-23: Version mismatch between opentelemetry-api and opentelemetry-sdk.

opentelemetry-api is bumped to ^1.39.0, but opentelemetry-sdk in test dependencies (line 39) remains at ^1.38.0. Consider aligning these versions to ensure consistency during testing.

-opentelemetry-sdk = "^1.38.0"
+opentelemetry-sdk = "^1.39.0"
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (1)

213-215: Input messages format may not match semantic conventions.

json.dumps(args) serializes the raw args tuple, which may not conform to the expected message structure. The output messages (lines 221-224) use a structured format {"role": "...", "content": "..."}. Consider formatting input messages consistently.

-            GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(args),
+            GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(
+                [{"role": "user", "content": str(arg)} for arg in args if arg]
+            ),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 53920e3 and 051d531.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-crewai/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (8 hunks)
  • packages/opentelemetry-instrumentation-crewai/pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
📚 Learning: 2025-08-22T14:41:26.962Z
Learnt from: prane-eth
Repo: traceloop/openllmetry PR: 3336
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:8-8
Timestamp: 2025-08-22T14:41:26.962Z
Learning: In the openllmetry project, the `packaging` library is available as a transitive dependency through other packages (visible in poetry.lock) and doesn't need to be explicitly declared in pyproject.toml dependencies.

Applied to files:

  • packages/opentelemetry-instrumentation-crewai/pyproject.toml
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (4)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (3)
  • Meters (36-61)
  • SpanAttributes (64-245)
  • TraceloopSpanKindValues (285-290)
packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py (1)
  • get_tracer (226-227)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/crewai_span_attributes.py (2)
  • set_span_attribute (6-10)
  • CrewAISpanAttributes (13-163)
packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py (1)
  • tool (61-71)
🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py

67-67: Unused method argument: kwargs

(ARG002)


98-98: Unused function argument: duration_histogram

(ARG001)


99-99: Unused function argument: token_histogram

(ARG001)


125-125: Consider moving this statement to an else block

(TRY300)


133-133: Unused function argument: duration_histogram

(ARG001)


180-180: Unused function argument: duration_histogram

(ARG001)


180-180: Unused function argument: token_histogram

(ARG001)


206-206: Unused function argument: token_histogram

(ARG001)


245-245: Unused function argument: token_histogram

(ARG001)


245-245: Unused function argument: instance

(ARG001)

🔇 Additional comments (7)
packages/opentelemetry-instrumentation-crewai/pyproject.toml (1)

35-35: Wide version range for crewai test dependency.

The range >=0.80.0,<0.203.0 spans many versions. While this provides flexibility, consider whether CI tests against multiple CrewAI versions to catch breaking changes across this range.

packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (6)

1-21: LGTM!

Import reorganization is clean and aligns with the new functionality requirements.


41-65: LGTM!

The new ToolUsage._use instrumentation follows the established pattern for the other wrappers.


96-128: Approve with note on unused parameters.

The duration_histogram and token_histogram parameters are unused in this wrapper but kept for interface consistency with with_tracer_wrapper. This is acceptable for maintaining a uniform API.


131-175: LGTM!

Good enhancement with GEN_AI_OPERATION_NAME and token histogram recording. The use of set_span_attribute helper maintains consistency with the codebase patterns.


178-201: LGTM!

Task execution wrapper follows the established pattern.


279-287: Status and result set outside try block.

Setting span status and result attributes after the try/except block works but could be clearer within an else clause. The current structure is correct since raise prevents reaching lines 283-287 on error.

@Focadecombate Focadecombate force-pushed the feature/add-tool-spans-for-crewai branch from 7bc1c23 to 5458b2b Compare December 9, 2025 01:34
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (1)

94-103: Unused wrapper parameters will keep Ruff noisy.

Several wrapper parameters are currently unused (duration_histogram / token_histogram in wrap_kickoff, wrap_task_execute, wrap_llm_call, wrap_tool_use, and instance in wrap_tool_use). Given the shared signature enforced by with_tracer_wrapper, you can either:

  • Prefix currently unused parameters with _ (e.g., _duration_histogram) to silence lint warnings, or
  • Remove them from the wrapper factories where you are sure you will not need them.

This will keep Ruff/Flake8 clean without changing behavior.

Also applies to: 130-133, 178-180, 205-206, 243-245
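A minimal sketch of the underscore-prefix option, assuming a wrapper factory shaped like the seven-argument signature quoted in the walkthrough. The body of `with_tracer_wrapper` here is an assumption modeled on that signature rather than a copy of the package's code, `wrap_example` is a hypothetical wrapper, and a list stands in for a real tracer.

```python
def with_tracer_wrapper(func):
    """Bind tracer and histograms first, then expose the 4-argument wrapper shape."""

    def _with_tracer(tracer, duration_histogram, token_histogram):
        def wrapper(wrapped, instance, args, kwargs):
            return func(
                tracer, duration_histogram, token_histogram,
                wrapped, instance, args, kwargs,
            )

        return wrapper

    return _with_tracer


@with_tracer_wrapper
def wrap_example(tracer, _duration_histogram, _token_histogram,
                 wrapped, _instance, args, kwargs):
    # The underscore prefix marks parameters this wrapper accepts but ignores.
    tracer.append("called")  # a list stands in for a real tracer here
    return wrapped(*args, **kwargs)


calls = []
bound = wrap_example(calls, None, None)
result = bound(lambda x: x * 2, None, (21,), {})
```

Because the factory passes every argument positionally, renaming a parameter with a leading underscore does not change how the wrapper is called.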


📥 Commits

Reviewing files that changed from the base of the PR and between 051d531 and 5458b2b.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (8 hunks)
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py

66-66: Unused method argument: kwargs

(ARG002)


97-97: Unused function argument: duration_histogram

(ARG001)


98-98: Unused function argument: token_histogram

(ARG001)


124-124: Consider moving this statement to an else block

(TRY300)


132-132: Unused function argument: duration_histogram

(ARG001)


179-179: Unused function argument: duration_histogram

(ARG001)


179-179: Unused function argument: token_histogram

(ARG001)


205-205: Unused function argument: token_histogram

(ARG001)


244-244: Unused function argument: token_histogram

(ARG001)


244-244: Unused function argument: instance

(ARG001)

🔇 Additional comments (3)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (3)

40-64: ToolUsage wrapper wiring looks correct; confirm target API remains stable.

The new wrap_function_wrapper("crewai.tools.tool_usage", "ToolUsage._use", ...) and matching unwrap("crewai.tools.tool_usage.ToolUsage", "_use") are consistent with the existing Crew/Agent/Task/LLM patterns and should correctly hook tool usage spans.

Please double‑check against the CrewAI versions you support (≥0.80.0,<0.203.0) that ToolUsage._use still lives in crewai.tools.tool_usage and retains the same method name, so instrumentation does not silently stop working across versions.

Also applies to: 66-71
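One way to make that check mechanical is a small import-time probe that could run in a test matrix across CrewAI versions. `has_patch_target` is a hypothetical helper, and the sketch is demonstrated on the standard library so it runs without CrewAI installed.

```python
import importlib


def has_patch_target(module_name, dotted_attr):
    """Return True if module_name imports and dotted_attr resolves on it."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in dotted_attr.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return False
    return True


# Demonstrated on the standard library so the sketch runs anywhere.
print(has_patch_target("json", "JSONEncoder.encode"))
print(has_patch_target("json", "JSONEncoder.no_such_method"))

# In a CrewAI test environment one might instead assert:
# has_patch_target("crewai.tools.tool_usage", "ToolUsage._use")
```

A probe like this only confirms that the name exists. It does not verify that the method's signature or behavior is unchanged between versions.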


203-215: Guard JSON serialization of LLM messages to avoid runtime failures.

json.dumps(args) and json.dumps([{"role": "assistant", "content": result}]) may raise TypeError if args or result contain non-JSON-serializable objects (e.g., pydantic models, custom classes). Because json.dumps(args) is evaluated while building the attributes dict passed to start_as_current_span, that error would be raised before the span is created and before the wrapped LLM call runs, so the instrumentation would break the application's LLM call rather than just lose a span attribute.

Use a safer serialization approach, for example via default=str:

     with tracer.start_as_current_span(
         f"{llm}.llm",
         kind=SpanKind.CLIENT,
         attributes={
-            GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.CHAT.value,
-            GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(args),
+            GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.CHAT.value,
+            GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(args, default=str),
         },
     ) as span:
@@
-            span.set_attribute(
-                GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
-                json.dumps([{"role": "assistant", "content": result}]),
-            )
+            span.set_attribute(
+                GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
+                json.dumps([{"role": "assistant", "content": result}], default=str),
+            )
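The failure mode and the `default=str` fallback can be reproduced with the standard library. `Message` below is a stand-in for any object that `json` cannot encode natively. It is not a CrewAI or pydantic type.

```python
import json


class Message:
    """Stand-in for a non-JSON-serializable object such as a model instance."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return f"Message({self.text})"


payload = [{"role": "assistant", "content": Message("hello")}]

try:
    json.dumps(payload)
    plain_failed = False
except TypeError:
    plain_failed = True

# default=str is called only for objects json cannot encode natively.
fallback = json.dumps(payload, default=str)

print(plain_failed)
print(fallback)
```

Note that `default=str` is not a complete guard: it is not applied to dictionary keys, and circular references still raise `ValueError`, so a surrounding `try/except` may still be worth considering.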

242-293: Improve tool span robustness: handle positional tool arguments, ensure consistent attributes, and protect JSON serialization.

wrap_tool_use needs several defensive improvements:

  1. Support positional tool argument. If the wrapped method (ToolUsage._use) is called with tool as a positional argument, tool = kwargs.get("tool") will miss it. Add a fallback:
-    tool = kwargs.get("tool")
+    tool = kwargs.get("tool")
+    if tool is None and args:
+        tool = args[0]
  2. Set GEN_AI_SYSTEM and GEN_AI_TOOL_NAME consistently. Both should be present on every span for uniform querying:
-    attributes: dict[str, Any] = {
-        GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.EXECUTE_TOOL.value,
-    }
+    attributes: dict[str, Any] = {
+        GenAIAttributes.GEN_AI_SYSTEM: "crewai",
+        GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.EXECUTE_TOOL.value,
+    }

And ensure GEN_AI_TOOL_NAME is always set:

-    if tool:
-        tool_name = tool.name
-        attributes.update({
-            GenAIAttributes.GEN_AI_TOOL_NAME: tool_name,
+    if tool:
+        tool_name = getattr(tool, "name", tool_name)
+        attributes[GenAIAttributes.GEN_AI_TOOL_NAME] = tool_name
+    else:
+        attributes[GenAIAttributes.GEN_AI_TOOL_NAME] = tool_name
  3. Protect JSON serialization with default=str. Both arguments and responses may contain non-JSON-serializable types:
-            GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS: json.dumps(
-                getattr(tool, "args", {})
-            ),
+            GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS: json.dumps(
+                getattr(tool, "args", {}),
+                default=str,
+            ),

And for the response:

-        span.set_attribute(
-            GenAIAttributes.GEN_AI_TOOL_CALL_RESULT, json.dumps(response)
-        )
+        span.set_attribute(
+            GenAIAttributes.GEN_AI_TOOL_CALL_RESULT,
+            json.dumps(response, default=str),
+        )
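The positional fallback and the always-present tool name from the suggestions above can be sketched as two small helpers. Both function names, the `"unknown"` placeholder, and `DummyTool` are hypothetical. The assumption that the tool is the first positional argument comes from the suggested diff and may not hold for every CrewAI version.

```python
def resolve_tool(args, kwargs):
    """Hypothetical helper: find the tool whether passed by keyword or position."""
    tool = kwargs.get("tool")
    if tool is None and args:
        # Assumes the tool is the first positional argument.
        tool = args[0]
    return tool


def resolve_tool_name(tool, default="unknown"):
    """Fall back to a placeholder when the tool is missing or has no name."""
    return getattr(tool, "name", None) or default


class DummyTool:
    name = "calculator"


by_keyword = resolve_tool((), {"tool": DummyTool()})
by_position = resolve_tool((DummyTool(),), {})

print(resolve_tool_name(by_keyword))
print(resolve_tool_name(by_position))
print(resolve_tool_name(None))
```

Resolving the name through one helper means the tool-name attribute can be set unconditionally, which avoids duplicating the assignment in both branches of an `if/else`.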
