feat(crewai): Add tool spans and enhance GenAI semantic conventions #3509
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Added instrumentation for tool usage by introducing a new wrap_tool_use wrapper that creates tracing spans for tool calls, records context, arguments, results, durations, and token/duration metrics; the wrapper is registered/unregistered in the package's _instrument/_uninstrument flow, and dependency bounds were bumped.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant Instrumentation
    participant Tracer
    participant Tool
    participant Span
    Caller->>Instrumentation: invoke tool (wrapped by wrap_tool_use)
    Instrumentation->>Tracer: start span "{tool_name}.tool"
    Tracer-->>Span: span started
    Instrumentation->>Span: set attributes (tool_name, context, args, counts)
    Instrumentation->>Tool: call underlying tool
    Tool-->>Instrumentation: return result / raise error
    Instrumentation->>Span: set result attributes (JSON result, status)
    Instrumentation->>Instrumentation: record duration & token metrics
    Instrumentation->>Tracer: end span
    Span-->>Tracer: span completed
    Instrumentation-->>Caller: return tool result
```
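The diagram can be read as roughly the following shape. This is a minimal sketch of the flow, not the PR's actual code: the wrapper and argument names (wrap_tool_use, duration_histogram, token_histogram) follow the hunks quoted later in the review, and the gen_ai.* attribute keys are taken from the PR description.

```python
import json
import time

from opentelemetry.trace import SpanKind, Status, StatusCode


def wrap_tool_use(tracer, duration_histogram, token_histogram, wrapped, instance, args, kwargs):
    # Resolve the tool being executed; in this sketch it arrives via kwargs.
    tool = kwargs.get("tool")
    tool_name = getattr(tool, "name", "tool") if tool else "tool"

    with tracer.start_as_current_span(f"{tool_name}.tool", kind=SpanKind.CLIENT) as span:
        # Record the call context before invoking the underlying tool.
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.tool.name", tool_name)
        span.set_attribute(
            "gen_ai.tool.call.arguments",
            json.dumps(getattr(tool, "args", {}), default=str),
        )

        start_time = time.time()
        try:
            result = wrapped(*args, **kwargs)
        except Exception as exc:
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise
        # Success path: record the result, status, and duration metric.
        span.set_attribute("gen_ai.tool.call.result", json.dumps(result, default=str))
        span.set_status(Status(StatusCode.OK))
        duration_histogram.record(
            time.time() - start_time,
            attributes={"gen_ai.operation.name": "execute_tool", "gen_ai.tool.name": tool_name},
        )
        return result
```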
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Caution
Changes requested ❌
Reviewed everything up to 051d531 in 2 minutes and 23 seconds.
- Reviewed 328 lines of code in 2 files
- Skipped 1 file when reviewing
- Skipped posting 2 draft comments. View those below.
- Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py:257
Draft comment: Using math.inf as the default for max_usage_count may cause JSON serialization issues downstream. Consider using a finite value or converting it to a string.
Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The comment raises a potentially valid concern: Python's math.inf is a float representing infinity, and when OpenTelemetry serializes span attributes for export, JSON serialization could fail because JSON has no standard representation for infinity (though some implementations handle it differently). However: (1) OpenTelemetry's span attribute system may already handle this gracefully by converting it to a string or a special value; (2) the comment is speculative, saying the value "may cause" issues rather than demonstrating an actual failure; (3) without evidence that this breaks anything in practice, it remains a theoretical concern that would likely be caught during testing or at runtime. While serializing infinity in standard JSON is genuinely problematic (it is not part of the JSON spec), the review rules require deleting speculative comments unless there is definite evidence of a problem, so this comment was not posted.
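For context on the underlying behavior the draft comment worries about, standard-library json handles infinity like this; this illustrates the serialization concern only and says nothing about how a given OTLP exporter handles the value:

```python
import json
import math

# Python's json module accepts infinity by default, but emits a token that is
# not valid per the JSON spec.
print(json.dumps(math.inf))  # Infinity

# Strict mode refuses out-of-range floats entirely.
try:
    json.dumps(math.inf, allow_nan=False)
except ValueError as err:
    print(err)  # Out of range float values are not JSON compliant
```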
2. packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py:54
Draft comment: It looks like there might be a typographical error in the function call on this line. Previously, the task execution wrapper was named wrap_task_execute_task, but here it has been changed to wrap_task_execute. If this change was unintentional, please update the function name to maintain consistency.
Reason this comment was not posted: Comment was on unchanged code.
Actionable comments posted: 4
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-crewai/pyproject.toml (1)
23-23: Version mismatch between opentelemetry-api and opentelemetry-sdk.
opentelemetry-api is bumped to ^1.39.0, but opentelemetry-sdk in test dependencies (line 39) remains at ^1.38.0. Consider aligning these versions to ensure consistency during testing.

```diff
-opentelemetry-sdk = "^1.38.0"
+opentelemetry-sdk = "^1.39.0"
```

packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (1)
213-215: Input messages format may not match semantic conventions.
json.dumps(args) serializes the raw args tuple, which may not conform to the expected message structure. The output messages (lines 221-224) use a structured format {"role": "...", "content": "..."}. Consider formatting input messages consistently.

```diff
-            GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(args),
+            GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(
+                [{"role": "user", "content": str(arg)} for arg in args if arg]
+            ),
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
packages/opentelemetry-instrumentation-crewai/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
- packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (8 hunks)
- packages/opentelemetry-instrumentation-crewai/pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: duanyutong
Repo: traceloop/openllmetry PR: 3487
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:177-178
Timestamp: 2025-12-02T21:09:48.690Z
Learning: The opentelemetry-instrumentation-openai and opentelemetry-instrumentation-openai-agents packages must remain independent and not share code, so code duplication between them is acceptable.
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
📚 Learning: 2025-08-22T14:41:26.962Z
Learnt from: prane-eth
Repo: traceloop/openllmetry PR: 3336
File: packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py:8-8
Timestamp: 2025-08-22T14:41:26.962Z
Learning: In the openllmetry project, the `packaging` library is available as a transitive dependency through other packages (visible in poetry.lock) and doesn't need to be explicitly declared in pyproject.toml dependencies.
Applied to files:
packages/opentelemetry-instrumentation-crewai/pyproject.toml
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (4)
- packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (3): Meters (36-61), SpanAttributes (64-245), TraceloopSpanKindValues (285-290)
- packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py (1): get_tracer (226-227)
- packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/crewai_span_attributes.py (2): set_span_attribute (6-10), CrewAISpanAttributes (13-163)
- packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py (1): tool (61-71)
🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py
67-67: Unused method argument: kwargs
(ARG002)
98-98: Unused function argument: duration_histogram
(ARG001)
99-99: Unused function argument: token_histogram
(ARG001)
125-125: Consider moving this statement to an else block
(TRY300)
133-133: Unused function argument: duration_histogram
(ARG001)
180-180: Unused function argument: duration_histogram
(ARG001)
180-180: Unused function argument: token_histogram
(ARG001)
206-206: Unused function argument: token_histogram
(ARG001)
245-245: Unused function argument: token_histogram
(ARG001)
245-245: Unused function argument: instance
(ARG001)
🔇 Additional comments (7)
packages/opentelemetry-instrumentation-crewai/pyproject.toml (1)
35-35: Wide version range for crewai test dependency.
The range >=0.80.0,<0.203.0 spans many versions. While this provides flexibility, consider whether CI tests against multiple CrewAI versions to catch breaking changes across this range.

packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (6)

1-21: LGTM! Import reorganization is clean and aligns with the new functionality requirements.

41-65: LGTM! The new ToolUsage._use instrumentation follows the established pattern for the other wrappers.

96-128: Approve with note on unused parameters.
The duration_histogram and token_histogram parameters are unused in this wrapper but kept for interface consistency with with_tracer_wrapper. This is acceptable for maintaining a uniform API.

131-175: LGTM! Good enhancement with GEN_AI_OPERATION_NAME and token histogram recording. The use of the set_span_attribute helper maintains consistency with the codebase patterns.

178-201: LGTM! Task execution wrapper follows the established pattern.

279-287: Status and result set outside try block.
Setting span status and result attributes after the try/except block works but could be clearer within an else clause. The current structure is correct since raise prevents reaching lines 283-287 on error.
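For reference, the else-clause shape this comment (and the TRY300 hint in the Ruff findings above) point toward looks roughly like the sketch below; span and wrapped are stand-ins for the real objects, not the PR's code.

```python
import json

from opentelemetry.trace import Status, StatusCode


def call_with_span(span, wrapped, *args, **kwargs):
    try:
        result = wrapped(*args, **kwargs)
    except Exception as exc:
        # Error path: mark the span and propagate the exception.
        span.set_status(Status(StatusCode.ERROR, str(exc)))
        span.record_exception(exc)
        raise
    else:
        # Success path only runs when the try body did not raise.
        span.set_attribute("gen_ai.tool.call.result", json.dumps(result, default=str))
        span.set_status(Status(StatusCode.OK))
        return result
```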
Force-pushed from 7bc1c23 to 5458b2b.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (1)
94-103: Unused wrapper parameters will keep Ruff noisy.
Several wrapper parameters are currently unused (duration_histogram/token_histogram in wrap_kickoff, wrap_task_execute, wrap_llm_call, wrap_tool_use, and instance in wrap_tool_use). Given the shared signature enforced by with_tracer_wrapper, you can either:
- Prefix currently unused parameters with _ (e.g., _duration_histogram) to silence lint warnings, or
- Remove them from the wrapper factories where you are sure you will not need them.
This will keep Ruff/Flake8 clean without changing behavior; a minimal sketch of the first option follows the note below.
Also applies to: 130-133, 178-180, 205-206, 243-245
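The sketch below shows the first option with illustrative names only (the real wrappers are built through with_tracer_wrapper and carry package-specific attributes); it is not the package's exact code.

```python
# Keep the shared wrapper signature, but prefix the arguments this particular
# wrapper does not need with "_" so ARG001/ARG002 stop firing.
def wrap_kickoff(tracer, _duration_histogram, _token_histogram, wrapped, _instance, args, kwargs):
    # Only the tracer, wrapped callable, and call arguments are used here; the
    # histograms stay in the signature purely for interface consistency.
    with tracer.start_as_current_span("crewai.workflow") as span:
        span.set_attribute("traceloop.span.kind", "workflow")
        return wrapped(*args, **kwargs)
```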
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Instrumentation packages must leverage the semantic conventions package and emit OTel-compliant spans
🪛 Ruff (0.14.8)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py
66-66: Unused method argument: kwargs
(ARG002)
97-97: Unused function argument: duration_histogram
(ARG001)
98-98: Unused function argument: token_histogram
(ARG001)
124-124: Consider moving this statement to an else block
(TRY300)
132-132: Unused function argument: duration_histogram
(ARG001)
179-179: Unused function argument: duration_histogram
(ARG001)
179-179: Unused function argument: token_histogram
(ARG001)
205-205: Unused function argument: token_histogram
(ARG001)
244-244: Unused function argument: token_histogram
(ARG001)
244-244: Unused function argument: instance
(ARG001)
🔇 Additional comments (3)
packages/opentelemetry-instrumentation-crewai/opentelemetry/instrumentation/crewai/instrumentation.py (3)
40-64: ToolUsage wrapper wiring looks correct; confirm target API remains stable.
The new wrap_function_wrapper("crewai.tools.tool_usage", "ToolUsage._use", ...) and matching unwrap("crewai.tools.tool_usage.ToolUsage", "_use") are consistent with the existing Crew/Agent/Task/LLM patterns and should correctly hook tool usage spans.
Please double-check against the CrewAI versions you support (>=0.80.0, <0.203.0) that ToolUsage._use still lives in crewai.tools.tool_usage and retains the same method name, so instrumentation does not silently stop working across versions.
Also applies to: 66-71
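A rough sketch of how that wiring fits together, assuming a flat wrapper like the one sketched after the sequence diagram; the real code builds the wrapper through with_tracer_wrapper rather than functools.partial, and the helper names here are illustrative.

```python
import functools

from opentelemetry.instrumentation.utils import unwrap
from wrapt import wrap_function_wrapper

# `wrap_tool_use` refers to the flat wrapper sketched earlier in this thread.


def _instrument_tool_use(tracer, duration_histogram, token_histogram):
    # Hook ToolUsage._use so every tool invocation gets its own span.
    wrap_function_wrapper(
        "crewai.tools.tool_usage",
        "ToolUsage._use",
        functools.partial(wrap_tool_use, tracer, duration_histogram, token_histogram),
    )


def _uninstrument_tool_use():
    # Remove the wrapper when the instrumentation is torn down.
    unwrap("crewai.tools.tool_usage.ToolUsage", "_use")
```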
203-215: Guard JSON serialization of LLM messages to avoid runtime failures.
json.dumps(args) and json.dumps([{"role": "assistant", "content": result}]) may raise TypeError if args or result contain non-JSON-serializable objects (e.g., pydantic models, custom classes). Since json.dumps() is evaluated inside the attributes dict passed to start_as_current_span, such an error would occur before the span is created and before the wrapped LLM call runs, breaking instrumentation.
Use a safer serialization approach, for example via default=str:

```diff
 with tracer.start_as_current_span(
     f"{llm}.llm",
     kind=SpanKind.CLIENT,
     attributes={
-        GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.CHAT.value,
-        GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(args),
+        GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.CHAT.value,
+        GenAIAttributes.GEN_AI_INPUT_MESSAGES: json.dumps(args, default=str),
     },
 ) as span:
@@
-    span.set_attribute(
-        GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
-        json.dumps([{"role": "assistant", "content": result}]),
-    )
+    span.set_attribute(
+        GenAIAttributes.GEN_AI_OUTPUT_MESSAGES,
+        json.dumps([{"role": "assistant", "content": result}], default=str),
+    )
```
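A quick standalone illustration of what default=str buys here; a plain class stands in for a pydantic model or other non-serializable argument:

```python
import json


class ToolArg:
    """Stand-in for a non-JSON-serializable argument such as a pydantic model."""

    def __str__(self):
        return "ToolArg(query='weather in Paris')"


args = (ToolArg(),)
# json.dumps(args) would raise: TypeError: Object of type ToolArg is not JSON serializable
print(json.dumps(args, default=str))  # ["ToolArg(query='weather in Paris')"]
```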
242-293: Improve tool span robustness: handle positional tool arguments, ensure consistent attributes, and protect JSON serialization.
wrap_tool_use needs several defensive improvements:

1. Support a positional tool argument. If Tool.use() or similar methods pass tool positionally, tool = kwargs.get("tool") will miss it. Add a fallback:

```diff
-    tool = kwargs.get("tool")
+    tool = kwargs.get("tool")
+    if tool is None and args:
+        tool = args[0]
```

2. Set GEN_AI_SYSTEM and GEN_AI_TOOL_NAME consistently. Both should be present on every span for uniform querying:

```diff
-    attributes: dict[str, Any] = {
-        GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.EXECUTE_TOOL.value,
-    }
+    attributes: dict[str, Any] = {
+        GenAIAttributes.GEN_AI_SYSTEM: "crewai",
+        GenAIAttributes.GEN_AI_OPERATION_NAME: GenAIAttributes.GenAiOperationNameValues.EXECUTE_TOOL.value,
+    }
```

And ensure GEN_AI_TOOL_NAME is always set:

```diff
-    if tool:
-        tool_name = tool.name
-        attributes.update({
-            GenAIAttributes.GEN_AI_TOOL_NAME: tool_name,
+    if tool:
+        tool_name = getattr(tool, "name", tool_name)
+        attributes[GenAIAttributes.GEN_AI_TOOL_NAME] = tool_name
+    else:
+        attributes[GenAIAttributes.GEN_AI_TOOL_NAME] = tool_name
```

3. Protect JSON serialization with default=str. Both arguments and responses may contain non-JSON-serializable types:

```diff
-        GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS: json.dumps(
-            getattr(tool, "args", {})
-        ),
+        GenAIAttributes.GEN_AI_TOOL_CALL_ARGUMENTS: json.dumps(
+            getattr(tool, "args", {}),
+            default=str,
+        ),
```

And for the response:

```diff
-    span.set_attribute(
-        GenAIAttributes.GEN_AI_TOOL_CALL_RESULT, json.dumps(response)
-    )
+    span.set_attribute(
+        GenAIAttributes.GEN_AI_TOOL_CALL_RESULT,
+        json.dumps(response, default=str),
+    )
```
PR Description
feat(crewai): Add tool spans and enhance GenAI semantic conventions
Summary
This PR enhances the CrewAI instrumentation by adding tool execution spans and aligning with the GenAI semantic conventions for better observability.
Changes
New Features:
- Instruments ToolUsage._use to capture tool execution spans
- gen_ai.tool.name - the name of the tool being executed
- gen_ai.tool.call.arguments - JSON serialized tool arguments
- gen_ai.tool.call.result - JSON serialized tool execution result
- Tool usage context: current_usage_count, max_usage_count, and result_as_answer

Enhanced GenAI Semantic Conventions:
- gen_ai.operation.name attribute across all span types:
  - invoke_agent for agent execution spans
  - chat for LLM call spans
  - execute_tool for tool execution spans
- gen_ai.input_messages and gen_ai.output_messages added to LLM call spans for better message tracing

Dependencies:
- Bumped opentelemetry-api from ^1.38.0 to ^1.39.0
- Widened the crewai test dependency range from ^0.80.0 to >=0.80.0,<0.203.0

Checklist
- PR title follows the feat(instrumentation): ... or fix(instrumentation): ... convention.

Important
Add tool execution spans and enhance GenAI semantic conventions in CrewAI instrumentation, updating dependencies accordingly.
- Adds wrap_tool_use to instrumentation.py for capturing tool execution spans with attributes like gen_ai.tool.name, gen_ai.tool.call.arguments, and gen_ai.tool.call.result.
- Records current_usage_count, max_usage_count, and result_as_answer.
- Adds gen_ai.operation.name to all span types: invoke_agent, chat, execute_tool.
- Adds gen_ai.input_messages and gen_ai.output_messages to LLM call spans.
- Bumps opentelemetry-api to ^1.39.0 in pyproject.toml.
- Widens the crewai test dependency range to >=0.80.0,<0.203.0.