fix(workflow-executor): scope MCP tool fetch to the step's target server (PRD-363)#1607
Conversation
…ver (PRD-363) Every MCP step opened connections to every configured MCP server and listed their tools, then filtered in-memory by step.mcpServerId. With N configured servers, an MCP step paid N× the connect/list-tools roundtrip cost, hit every upstream server (billable/rate-limited), and over-allocated MCP connections. Filter configs by cfg.id (DB id from PRD-360) inside Runner.fetchRemoteTools before delegating to loadRemoteTools. The MCP branch of StepExecutorFactory forwards step.mcpServerId — the Runner stays generic. The executor's getFilteredTools collapses into requireTools (pre-scoped list → assert non-empty). fixes PRD-363 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 new issues
|
|
Coverage Impact ⬆️ Merging this pull request will increase total coverage on Modified Files with Diff Coverage (6)
🛟 Help
|
Address review feedback: - Strengthen tests with divergent mcpServerId fixtures so a future re-introduced filter or per-dispatch memoization regression fails. - Update the MCP integration test to exercise the cfg.id matching. - Surface a warn log when the orchestrator returns configs but none matches the step's mcpServerId — disambiguates "no configs" from "missing server config" in ops logs. - Strip ticket refs and internal taxonomy from inline comments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the disambiguation comment on the diagnostic warn log to lead with the WHY, and drop a marginal restatement comment in the legacy- fallback test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Distinguish "no configs at all" vs "no match" warns in Runner.fetchRemoteTools. - Detect partial load failure (case b): error-log failed sourceIds so ops can tell "wrong target server" from "MCP server down". - Warn about MCP configs lacking an id when mcpServerId is set (partial PRD-360 migration); filter undefined out of availableMcpServerIds payload. - Drop dead loadedMcpServerIds param from NoMcpToolsError; remove duplicate log in McpStepExecutor.requireTools (BaseStepExecutor already logs the error). - Strengthen tests: assert warn payloads in runner.test.ts, scoped call in the integration test, rename stale test in mcp-step-executor.test.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scra3
left a comment
There was a problem hiding this comment.
Question sur le scoping des configs MCP non identifiées.
| // Configs without id cannot be matched against a defined mcpServerId. Surface them so a | ||
| // partial PRD-360 migration doesn't masquerade as "wrong target server" downstream. | ||
| const unidentifiedConfigNames = Object.entries(configs) | ||
| .filter(([, cfg]) => cfg.id === undefined) |
There was a problem hiding this comment.
question: PRD-360 ayant introduit mcpServerId, l'orchestrateur ne devrait-il pas toujours renvoyer un id désormais ? Si la migration est déployée partout, ce champ devrait être obligatoire (non-optionnel) et tout ce bloc unidentifiedConfigNames deviendrait du code mort à supprimer.
There was a problem hiding this comment.
Après échanges, on peut supprimer. Le seul point qui resterait c'est le typage de cet ID, strict coté backend, mais beaucoup moins ensuite (string). Hors de cette PR.
Scra3
left a comment
There was a problem hiding this comment.
Readability suggestion on fetchRemoteTools.
|
|
||
| private async fetchRemoteTools(): Promise<RemoteTool[]> { | ||
| // Match by config.id, not by Record key: server names can collide across configs. | ||
| private async fetchRemoteTools(mcpServerId?: string): Promise<RemoteTool[]> { |
There was a problem hiding this comment.
suggestion: This method mixes one behavior (scope configs + load tools) with three distinct diagnostics, making it ~57 lines and hard to grasp at a glance; extracting a pure scopeConfigsToServer(configs, mcpServerId) plus one private helper per diagnostic would shrink the body to its intent.
Scra3
left a comment
There was a problem hiding this comment.
Follow-up: extract MCP tool fetching out of Runner.
|
|
||
| private async fetchRemoteTools(): Promise<RemoteTool[]> { | ||
| // Match by config.id, not by Record key: server names can collide across configs. | ||
| private async fetchRemoteTools(mcpServerId?: string): Promise<RemoteTool[]> { |
There was a problem hiding this comment.
suggestion: Beyond splitting the body, this whole concern could move into its own RemoteToolFetcher (new file) constructed with workflowPort, aiModelPort and logger; Runner is already 473 lines spanning lifecycle, polling and run chaining, MCP fetching only touches those two ports, and a standalone module makes the scoping logic genuinely unit-testable with mocked ports.
There was a problem hiding this comment.
My favorite proposition between both (alban lol)
Scra3
left a comment
There was a problem hiding this comment.
Skeptic-validated findings: one diagnostic bug for Forest connectors, plus the test that would lock it in.
| // succeeded. Compare scoped keys to the tools' sourceIds so ops can tell "wrong config" | ||
| // from "MCP server down". | ||
| const loadedSourceIds = new Set(tools.map(t => t.sourceId)); | ||
| const failedSourceIds = Object.keys(scoped).filter(name => !loadedSourceIds.has(name)); |
There was a problem hiding this comment.
issue (non-blocking): This assumes sourceId === config Record key, true for MCP (mcp-client.ts sets sourceId: name) but false for Forest integrations whose sourceId is a hardcoded literal (zendesk/kolar/snowflake), so any successfully-loaded Forest connector keyed otherwise always lands in failedSourceIds and emits a false error on the happy path — restrict the comparison to MCP configs (!isForestIntegrationConfig) and rename to failedConfigNames to surface the two namespaces.
| workflowPort.getMcpServerConfigs.mockResolvedValue({ | ||
| 'server-A': { id: 'id-A', url: 'https://a.example', type: 'http', headers: {} }, | ||
| }); | ||
| // McpClient swallows per-server load errors — simulate the empty-result case here. |
There was a problem hiding this comment.
suggestion: With a single config and loadRemoteTools resolving to [], this is the all-failed case (which routes to NoMcpToolsError), not a genuine partial failure — add a case with two scoped configs where one loads and one fails, asserting the run proceeds with the survivor while failedSourceIds logs the other, since that is the only test that locks the sourceId-vs-key semantics.
…t connectors in partial-failure check Splits MCP fetch concern out of Runner into a dedicated module with one helper per diagnostic, shrinking Runner by ~50 lines and isolating the scoping logic for unit testing. errorOnPartialLoadFailure now excludes Forest integrations from the sourceId-vs-Record-key comparison: Forest connectors carry a hardcoded sourceId (zendesk/kolar/snowflake/...) that does not match the Record key, so a happy-path load was being reported as failed. Field renamed failedSourceIds -> failedConfigNames to surface the right namespace. Adds a genuine partial-failure runner test (survivor + failed) that locks the sourceId-vs-key semantics end-to-end by inspecting the captured McpStepExecutor instance — needed because the global BaseStepExecutor.execute spy hides downstream behaviour otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D-360) PRD-360 is delivered on the orchestrator: every config emitted on /liana/mcp-server-configs-with-details carries a non-null DB id, on both MCP and Forest-connector branches. Lock the executor's trust boundary accordingly and remove the legacy-fallback code that handled the pre-migration "no mcpServerId / no config.id" shape. - zod McpStepDefinition.mcpServerId becomes required — a missing id is now a malformed run instead of a silent broad fetch. - step-definition mapper switches on the union discriminant so TS narrows mcpServerId; drops the defensive 'in task' spread. - StepExecutorFactory + NoMcpToolsError signatures tightened to string. - RemoteToolFetcher.fetch is now string-only; warnUnidentifiedConfigs (the partial-PRD-360-migration sentinel) is gone. - Tests pruned of legacy-fallback scenarios; mapper boundary now has a fail-fast assertion for the missing-id case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ia tool.mcpServerId Post-review follow-ups for PR #1607. errorOnPartialLoadFailure now discriminates on tool.mcpServerId instead of tool.sourceId, which both McpClient and ForestIntegrationClient populate from the orchestrator's persisted id. This removes the Forest carve-out and fixes a real silent failure: a Forest connector that fails to load entirely is now flagged with its Record key in the error log, instead of bubbling up as a generic NoMcpToolsError with no indication of which connector is down. Other follow-ups: - z.string().min(1) on mcpServerId at the boundary — empty string is no longer parseable. - NoMcpToolsError userMessage made actionable, aligned with the StepTimeoutError pattern ("X happened. [Action]"). - Stale references corrected: mcp-step-executor's requireTools comment no longer mentions Runner; integration test no longer references the removed Runner.fetchRemoteTools symbol. - New unit tests for getMcpServerConfigs / loadRemoteTools rejection paths pin the contract that errors propagate instead of being swallowed into an empty tool list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Summary
Before this change, every MCP step opened connections to every configured MCP server in the customer's project and listed their tools, then filtered in-memory by
step.mcpServerId. With N configured servers, each MCP step paid N× the per-step connect/list-tools cost, hit every upstream server (billable / rate-limited), and over-allocated MCP connections.This PR introduces
RemoteToolFetcher, a dedicated module that scopes the configs Record bycfg.id === mcpServerId(matching by DB id from PRD-360, not by Record key which can collide on server names) before delegating toloadRemoteTools. Only the targeted server's connection is opened.Design
RemoteToolFetcher— Standalone class extracted fromRunner. Owns the scoping logic + three operational diagnostics (missing target server, partial load failure). Pure helperscopeConfigsToServerexported for unit testing.mcpServerId(PRD-360) — The zod schema onMcpStepDefinitionrequiresmcpServerId: z.string().min(1), locking the boundary fail-fast against orchestrator regressions. PRD-360 is delivered orchestrator-side — every config (MCP + Forest) carries the persisted DB id.errorOnPartialLoadFailurediscriminates ontool.mcpServerId === cfg.id, which bothMcpClientandForestIntegrationClientpopulate from the orchestrator id. A Forest connector that fails to load entirely is now flagged infailedConfigNames(previously silently swallowed).McpStepExecutorno longer filters tools internally (getFilteredTools→requireTools). Tools are pre-scoped upstream; the executor asserts non-empty and throwsNoMcpToolsErrorwith an actionable user message.Behaviour
step.mcpServerIdmatches a config id → one config scoped → one MCP connection.step.mcpServerIdmatches no config → empty Record → executor throwsNoMcpToolsError. Diagnostic distinguishes "no configs at all" from "configs exist but none match".step.mcpServerIdmissing from wire payload →DomainValidationErrorat the zod boundary → reported as malformed run.loadRemoteToolsreturns fewer tools than expected →failedConfigNameslists the Record keys whose id is missing from the loaded tools. Uniform across MCP and Forest providers.Test plan
yarn workspace @forestadmin/workflow-executor test— 819 passedyarn workspace @forestadmin/workflow-executor lint— 0 errorsyarn workspace @forestadmin/workflow-executor build— cleanNew/updated tests:
scopeConfigsToServer: matches bycfg.id, not Record key; empty result when no match.RemoteToolFetcher: scoping, partial-failure (MCP + Forest), rejection paths forgetMcpServerConfigsandloadRemoteTools.mcpServerIdat the zod boundary.fixes PRD-363
🤖 Generated with Claude Code
Note
Scope MCP tool fetching to the step's target server in workflow executor
Runner.fetchRemoteToolsnow accepts an optionalmcpServerIdand filters MCP configs byconfig.idbefore callingloadRemoteTools, so each step only loads tools from its target server.StepExecutorFactory.createpasses the step'smcpServerIdtofetchRemoteTools, replacing the previous unscopedloadToolsdependency.McpStepExecutorremoves internal filtering bymcpServerId; it now asserts the provided tool set is non-empty via a newrequireToolsmethod and uses tools as-is.loadRemoteToolsreturns fewer tools than expected.mcpServerIdreceive an empty tool set and throwNoMcpToolsErrorinstead of silently filtering.Changes since #1607 opened
RemoteToolFetcherclass within theworkflow-executorpackage [3b5632c]Runnerclass to delegate remote tool fetching toRemoteToolFetcher[3b5632c]failedSourceIdstofailedConfigNamesand excluded Forest integrations from failure detection [3b5632c]scopeConfigsToServerandRemoteToolFetcher.fetchand updated existingRunnertests [3b5632c]mcpServerIdfrom optional to required acrossworkflow-executorpackage for MCP step definitions, remote tool fetching, and error handling [dbfd26d]mcpServerIdis specified [dbfd26d]ServerTaskTypeEnuminstead of using intermediateTASK_TYPE_TO_STEP_TYPEconstant [dbfd26d]workflow-executorpackage to provide requiredmcpServerIdfor MCP steps and align with new scoping behavior [dbfd26d]remote-tool-fetchermodule to uniformly comparetool.mcpServerIdagainstconfig.idfor all server types [e607979]McpStepDefinitionSchemato require non-emptymcpServerIdstrings [e607979]NoMcpToolsErrorclass constructor to use user-friendly error message without internal identifiers [e607979]errors.test.ts,mcp-step-executor.test.ts,runner.test.ts, andremote-tool-fetcher.test.tsto verify new scoping behavior and error messages [e607979]McpStepExecutor.requireToolsmethod andworkflow-execution.test.tsintegration test [e607979]Macroscope summarized 96ca091.