feat(browser/langgraph): Add browser LangGraph tests with variant options #87
priscilawebdev wants to merge 12 commits into main
Introduces five isolated browser test frameworks for LangGraph, each targeting a specific known Sentry SDK instrumentation bug. Splitting them into separate framework folders makes each bug independently observable and lets its fix be validated in isolation.

## Frameworks added

### langgraph (instrumentLangGraph only)

Uses StateGraph(MessagesAnnotation) + Sentry.instrumentLangGraph() before compile(). Blocking invoke() works and produces an invoke_agent span with input/output messages. Streaming (stream()) is not patched, so no spans are created (Bug 3). Sets streamingMode: "both" to surface this difference.

### langgraph-langchain (createLangChainCallbackHandler only)

Uses StateGraph + createLangChainCallbackHandler passed to compiledGraph.invoke(). This triggers handleChainStart for each LangGraph node, which previously produced spans named "unknown_chain" instead of the actual node name (Bug 2, fixed in sentry-javascript#19554). No invoke_agent span is created since instrumentLangGraph is not used.

### langgraph-combined (both APIs together)

Uses both instrumentLangGraph and createLangChainCallbackHandler together. Combined use causes chat spans to be orphaned (not nested inside invoke_agent) and to lose their input/output messages. Spurious invoke_agent sub-spans also appear (Bug 4).

### langgraph-compiled (instrumentLangGraph on compiled graph)

Uses createReactAgent (which returns an already-compiled graph) and then calls instrumentLangGraph on the result. This crashes with a TypeError because instrumentLangGraph cannot patch a graph that is already compiled (Bug 1). Mirrors the pattern shown in the official Sentry docs.

### langgraph-custom-state (Annotation.Root custom state)

Uses StateGraph with Annotation.Root instead of MessagesAnnotation. instrumentLangGraph runs without error and invoke_agent spans are created, but recordInputs/recordOutputs silently records nothing because the state has no "messages" key (Bug 5).

## Supporting changes

- Add checkAgentInputOutputMessages check: validates that invoke_agent spans carry gen_ai.input.messages and gen_ai.output.messages when using instrumentLangGraph. Skips for non-instrumentLangGraph frameworks. Fails for langgraph-custom-state to surface Bug 5.
- Fix agents/node/langgraph/config.json: add the missing sentryVersions field, whose absence caused the framework to be silently skipped by discovery.
- Update CLAUDE.md and templates/README.md with browser variant docs and supported SDK table entries.

Co-Authored-By: Claude <noreply@anthropic.com>
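The checkAgentInputOutputMessages check described above can be pictured roughly as follows. This is only a sketch: the span shape (`op` plus a `data` map) and the helper name `findSpansMissingMessages` are assumptions made for illustration, not the framework's actual code.

```javascript
// Attribute names are taken from the commit message above.
const INPUT_KEY = "gen_ai.input.messages";
const OUTPUT_KEY = "gen_ai.output.messages";

// Hypothetical helper: returns the agent spans that lack either attribute.
// The span shape { op, data } is an assumption for this sketch.
function findSpansMissingMessages(spans) {
  return spans
    .filter((span) => span.op === "gen_ai.invoke_agent")
    .filter((span) => {
      const data = span.data ?? {};
      return !(INPUT_KEY in data) || !(OUTPUT_KEY in data);
    });
}

const exampleSpans = [
  { op: "gen_ai.invoke_agent", data: { [INPUT_KEY]: "[...]", [OUTPUT_KEY]: "[...]" } },
  { op: "gen_ai.invoke_agent", data: {} }, // e.g. custom state without a "messages" key
  { op: "gen_ai.chat", data: {} }, // ignored: not an agent span
];

console.log(findSpansMissingMessages(exampleSpans).length); // 1
```

A check built this way fails whenever the returned list is non-empty, which is the behavior the commit message expects for langgraph-custom-state.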
The Sentry JavaScript SDK had a bug in createLangChainCallbackHandler where handleChainStart only read 4 of the 8 parameters passed by LangChain. The 8th argument (runName) carries the actual LangGraph node name, but was never reached. As a result, every chain span fell back to "unknown_chain" regardless of the actual node name (Bug 2). This was fixed in sentry-javascript#19554 by reading runName as the first fallback before chain.name.

## Changes

### Fix langgraph-langchain template

The callback handler was previously passed to llm.invoke() rather than compiledGraph.invoke(). This meant handleChainStart was never triggered for LangGraph nodes, so Bug 2 was not observable at all. Moving the callback to compiledGraph.invoke() causes handleChainStart to fire for each node (e.g. "agent", "__start__"), which is the correct usage and the pattern that surfaced the original bug.

### Add checkLangChainNodeNames check

New check that only runs for langgraph-langchain (skipIf for all other frameworks). Finds invoke_agent spans with a langchain.chain.name attribute (created by handleChainStart) and asserts none of them carry the value "unknown_chain". This directly validates the fix in #19554 and will catch any regression.

Note: handleChainStart creates spans with op="gen_ai.invoke_agent" but without gen_ai.operation.name in data, so the check uses a direct op filter rather than findAgentSpans(), which relies on that attribute.

The check is added to all six agent test cases so that any regression is caught regardless of which test runs.

Co-Authored-By: Claude <noreply@anthropic.com>
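To make the fallback order and the new check concrete, here is a minimal sketch. It paraphrases the description above and is not the SDK source: `resolveChainName`, `findUnknownChainSpans`, and the `{ op, data }` span shape are hypothetical names chosen for illustration.

```javascript
// Assumed fallback order from the description: runName, then chain.name,
// then the "unknown_chain" placeholder.
function resolveChainName(chain, runName) {
  return runName ?? chain?.name ?? "unknown_chain";
}

// Hypothetical version of the check: filter directly on op, because these
// spans carry no gen_ai.operation.name attribute.
function findUnknownChainSpans(spans) {
  return spans.filter(
    (span) =>
      span.op === "gen_ai.invoke_agent" &&
      span.data?.["langchain.chain.name"] === "unknown_chain"
  );
}

// One node reports its runName, the other does not.
const nodeSpans = ["agent", undefined].map((runName) => ({
  op: "gen_ai.invoke_agent",
  data: { "langchain.chain.name": resolveChainName({}, runName) },
}));

console.log(findUnknownChainSpans(nodeSpans).length); // 1
```

With the fix in place every node supplies a runName, so the list of offending spans is expected to be empty.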
## 🔴 AI SDK Integration Test Results

Status: 2 regressions detected

### 🔴 Regressions

These tests were passing on main but are now failing:

- cloudflare/anthropic :: Basic LLM Test (blocking)
  Error: Test execution failed: Wrangler exited with code 1
- cloudflare/google-genai :: Vision LLM Test (blocking)
  Error: Test execution failed: Wrangler exited with code 1

### ✅ Fixed

These tests were failing on main but are now passing:

### 🆕 New Tests

Failing (66). Each entry was reported as "Error: N check(s) failed"; the tables below list N.

❌ browser/langgraph

| Test | Mode | graph | langchain | combined | compiled | custom-state |
| --- | --- | --- | --- | --- | --- | --- |
| Basic Agent Test | streaming | 2 | 5 | 4 | 4 | 2 |
| Basic Agent Test | blocking | 1 | 5 | 3 | 4 | 3 |
| Tool Call Agent Test | streaming | 6 | 9 | 8 | 8 | 6 |
| Tool Call Agent Test | blocking | 5 | 9 | 7 | 8 | 7 |
| Tool Error Agent Test | streaming | 6 | 8 | 7 | 8 | 6 |
| Tool Error Agent Test | blocking | 5 | 8 | 6 | 8 | 6 |
| Vision Agent Test | streaming | 3 | 6 | 5 | 5 | 3 |
| Vision Agent Test | blocking | 1 | 6 | 3 | 5 | 4 |
| Long Input Agent Test | streaming | 2 | 4 | 3 | 4 | 2 |
| Long Input Agent Test | blocking | 1 | 4 | 2 | 4 | 2 |
| Conversation ID Agent Test | streaming | 3 | 6 | 5 | 5 | 3 |
| Conversation ID Agent Test | blocking | 2 | 6 | 4 | 5 | 4 |

❌ node/langgraph

| Test | Failed checks |
| --- | --- |
| Basic Agent Test | 2 |
| Tool Call Agent Test | 6 |
| Tool Error Agent Test | 6 |
| Vision Agent Test | 2 |
| Long Input Agent Test | 2 |
| Conversation ID Agent Test | 2 |

### Test Matrix

Sections: Agent Tests, Embedding Tests, LLM Tests

Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed | str=streaming blk=blocking a=async s=sync hi=highlevel lo=lowlevel

Generated by AI SDK Integration Tests
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Custom-state template extracts system message, not user message
- The custom-state template now selects the first message with role "user" (with fallback) before deriving
userInput, so the intended user prompt is sent to the model.
- ✅ Fixed: Combined template callback placement prevents full bug reproduction
- The combined template now invokes the compiled graph with the callback handler and removes callbacks from
llm.invoke, enabling graph-level callback events needed for the interference reproduction.
Or push these changes by commenting:
@cursor push 6f72370ba1
Preview (6f72370ba1)
diff --git a/src/runner/templates/agents/browser/langgraph-combined/template.njk b/src/runner/templates/agents/browser/langgraph-combined/template.njk
--- a/src/runner/templates/agents/browser/langgraph-combined/template.njk
+++ b/src/runner/templates/agents/browser/langgraph-combined/template.njk
@@ -48,9 +48,9 @@
apiKey: OPENAI_API_KEY,
});
- // Build graph with agent node — passes callback handler to llm.invoke()
+ // Build graph with agent node
async function agentNode(state) {
- const response = await llm.invoke(state.messages, { callbacks: [callbackHandler] });
+ const response = await llm.invoke(state.messages);
return { messages: [response] };
}
@@ -88,7 +88,10 @@
try {
log('Starting request {{ loop.index }}...');
- const result = await compiledGraph.invoke({ messages: messages{{ loop.index }} });
+ const result = await compiledGraph.invoke(
+ { messages: messages{{ loop.index }} },
+ { callbacks: [callbackHandler] }
+ );
const lastMessage = result.messages[result.messages.length - 1];
log('Response {{ loop.index }}:', lastMessage.content);
} catch (error) {
diff --git a/src/runner/templates/agents/browser/langgraph-custom-state/template.njk b/src/runner/templates/agents/browser/langgraph-custom-state/template.njk
--- a/src/runner/templates/agents/browser/langgraph-custom-state/template.njk
+++ b/src/runner/templates/agents/browser/langgraph-custom-state/template.njk
@@ -71,7 +71,11 @@
{% endif %}
// Request {{ loop.index }}{% if loop.length > 1 %} of {{ loop.length }}{% endif %}
// Extract the first user message text as a plain string for the custom state
- const userInput{{ loop.index }} = {% if input.messages[0].content is string %}"{{ input.messages[0].content }}"{% else %}"{{ input.messages[0].content[0].text }}"{% endif %};
+ const inputMessages{{ loop.index }} = {{ input.messages | dump }};
+ const userMessage{{ loop.index }} = inputMessages{{ loop.index }}.find((message) => message.role === "user") ?? inputMessages{{ loop.index }}[0];
+ const userInput{{ loop.index }} = typeof userMessage{{ loop.index }}.content === "string"
+ ? userMessage{{ loop.index }}.content
+ : (userMessage{{ loop.index }}.content.find((part) => part.type === "text")?.text ?? "");
try {
log('Starting request {{ loop.index }}...');
Move callbackHandler from llm.invoke() to graph.invoke() in the langgraph-combined template. LangGraph auto-propagates callbacks to nested calls, so passing the handler at graph level is the realistic user pattern.

Fix misleading comment in langgraph-custom-state that said "first user message" when the code actually extracts the first message regardless of role.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…plicate invoke_agent spans

Update CLAUDE.md and template comments to reflect observed behavior: chat spans are dropped intermittently (not orphaned) and duplicate invoke_agent spans are produced when both instrumentation APIs are active.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tputMessages

The filter used exact === "invoke_agent", but findAgentSpans uses a regex that also matches "gen_ai.invoke_agent". If the SDK emits the prefixed form, the filter would silently drop all spans and skipIf would skip the entire check without any visible failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
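The difference between the two filters can be shown with a small sketch. The pattern below is hypothetical; the actual regex inside findAgentSpans is not shown in this PR and may differ.

```javascript
// Hypothetical pattern that accepts both the bare and the prefixed form.
const AGENT_OPERATION = /^(gen_ai\.)?invoke_agent$/;

// The strict comparison described in the commit message.
const isAgentExact = (name) => name === "invoke_agent";
// A pattern-based match in the spirit of findAgentSpans.
const isAgentByPattern = (name) => AGENT_OPERATION.test(name);

const operationNames = ["invoke_agent", "gen_ai.invoke_agent", "chat"];

// The exact comparison keeps only the bare form.
console.log(operationNames.filter(isAgentExact).length); // 1
// The pattern keeps both agent forms and still rejects "chat".
console.log(operationNames.filter(isAgentByPattern).length); // 2
```

If every span uses the prefixed form, the exact filter returns an empty list, which is how a skipIf guard could end up skipping the check silently.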
…h generic options

Replace five separate langgraph browser folders (langgraph, langgraph-langchain, langgraph-combined, langgraph-compiled, langgraph-custom-state) with a single langgraph folder using the generic options system. The `variant` option expands the test matrix to cover all five instrumentation approaches. All variants now run both streaming and blocking modes.

Remove integration-specific checks (checkAgentInputOutputMessages, checkLangChainNodeNames) and instead validate that gen_ai.agent.name matches the expected name from the test definition in the generic checkAgentSpanAttributes.

Co-Authored-By: Claude <noreply@anthropic.com>
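As a rough illustration of how a `variant` option multiplies the matrix, consider the sketch below. The variant and mode names come from this PR; `expandMatrix`, `filterByOption`, and the case shape are hypothetical and do not reflect the runner's real implementation.

```javascript
const VARIANTS = ["graph", "langchain", "combined", "compiled", "custom-state"];
const MODES = ["streaming", "blocking"];

// Hypothetical expansion: one case per (mode, variant) pair for a given test.
function expandMatrix(testName, variants = VARIANTS, modes = MODES) {
  return modes.flatMap((mode) =>
    variants.map((variant) => ({ testName, mode, variant }))
  );
}

// Hypothetical filter, in the spirit of selecting a single variant by option.
function filterByOption(cases, options = {}) {
  return cases.filter((c) => !options.variant || c.variant === options.variant);
}

const basicCases = expandMatrix("Basic Agent Test");
console.log(basicCases.length); // 10
console.log(filterByOption(basicCases, { variant: "compiled" }).length); // 2
```

Ten combinations per test across six agent tests gives 60 cases, which matches the number of browser/langgraph entries in the CI report.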
Cursor Bugbot has reviewed your changes and found 2 potential issues.
{% if variant == "compiled" %}
const llm = new langchainOpenAI.ChatOpenAI({
  modelName: {% if causeAPIError %}"invalid-model"{% else %}"{{ inputs[0].model }}"{% endif %},
  openAIApiKey: OPENAI_API_KEY,
Compiled variant uses deprecated openAIApiKey parameter name
Medium Severity
The compiled variant passes openAIApiKey to the ChatOpenAI constructor, while all other variants in the same template (lines 85 and 110) correctly use apiKey. The @langchain/openai package deprecated openAIApiKey in favor of apiKey. Depending on the version of @langchain/openai (1.2.12 per config.json), this could cause the API key to be silently ignored, leading to authentication failures that get misattributed to the known TypeError issue for this variant rather than the actual root cause.
// Request {{ loop.index }}{% if loop.length > 1 %} of {{ loop.length }}{% endif %}

{% if variant == "custom-state" %}
const userInput{{ loop.index }} = {% if input.messages[0].content is string %}"{{ input.messages[0].content }}"{% else %}"{{ input.messages[0].content[0].text }}"{% endif %};
Custom-state variant sends system message as user input
Low Severity
The custom-state variant extracts input.messages[0].content as user input. For test cases like basicAgentTest and visionAgentTest, messages[0] is the system message (e.g., "You are a helpful assistant."), not the user message. The actual user question at messages[1] is silently discarded, and for the vision test the image content is completely lost. The template inconsistently picks the correct message depending on which test case it runs with.
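One way to avoid picking the system message is to select by role instead of position, as the earlier autofix preview did. The sketch below shows that idea in plain JavaScript. The message shape ({ role, content }, with content either a string or an array of parts) is assumed from the template, and `extractUserText` is a hypothetical helper name.

```javascript
// Returns the text of the first user message, falling back to the first
// message when no user role is present.
function extractUserText(messages) {
  const userMessage = messages.find((m) => m.role === "user") ?? messages[0];
  if (!userMessage) return "";
  const { content } = userMessage;
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  // Multi-part content: take the first text part and skip images.
  return content.find((part) => part.type === "text")?.text ?? "";
}

const visionMessages = [
  { role: "system", content: "You are a helpful assistant." },
  {
    role: "user",
    content: [
      { type: "image_url", image_url: { url: "data:image/png;base64,..." } },
      { type: "text", text: "Describe this image." },
    ],
  },
];

console.log(extractUserText(visionMessages)); // prints: Describe this image.
```

Note that a plain-string custom state still cannot carry the image itself, so this only addresses the wrong-message part of the finding.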



Add browser LangGraph tests covering five instrumentation patterns, using the generic options system to keep everything in a single `langgraph` framework folder.

A `variant` option expands the test matrix. All variants run both streaming and blocking. Use `--option variant=<name>` to filter.

## Variants

| Variant | Instrumentation | Known issue |
| --- | --- | --- |
| `graph` | `instrumentLangGraph()` only | … `invoke_agent` span |
| `langchain` | `createLangChainCallbackHandler()` only | … `gen_ai.agent.name` / `gen_ai.operation.name` |
| `combined` | … | … `invoke_agent` spans |
| `compiled` | `instrumentLangGraph()` on compiled graph | `TypeError` |
| `custom-state` | `instrumentLangGraph()` with custom state | `recordInputs`/`recordOutputs` silently records nothing |

## Other changes

- `checkAgentSpanAttributes` now validates that `gen_ai.agent.name` matches the expected name from the test definition (not just that it exists), catching regressions like the `unknown_chain` bug.

Closes https://linear.app/getsentry/issue/TET-1946/ai-testing-framework-dogfood-langgraph-in-a-browser-runtime