Skip to content

fix(workflow-executor): scope MCP tool fetch to the step's target server (PRD-363)#1607

Open
christophebrun-forest wants to merge 7 commits into
feat/prd-214-server-step-mapperfrom
fix/prd-363-scope-mcp-tool-fetch
Open

fix(workflow-executor): scope MCP tool fetch to the step's target server (PRD-363)#1607
christophebrun-forest wants to merge 7 commits into
feat/prd-214-server-step-mapperfrom
fix/prd-363-scope-mcp-tool-fetch

Conversation

@christophebrun-forest
Copy link
Copy Markdown
Member

@christophebrun-forest christophebrun-forest commented May 28, 2026

Summary

Before this change, every MCP step opened connections to every configured MCP server in the customer's project and listed their tools, then filtered in-memory by step.mcpServerId. With N configured servers, each MCP step paid N× the per-step connect/list-tools cost, hit every upstream server (billable / rate-limited), and over-allocated MCP connections.

This PR introduces RemoteToolFetcher, a dedicated module that scopes the configs Record by cfg.id === mcpServerId (matching by DB id from PRD-360, not by Record key which can collide on server names) before delegating to loadRemoteTools. Only the targeted server's connection is opened.

Design

  • RemoteToolFetcher — Standalone class extracted from Runner. Owns the scoping logic + three operational diagnostics (missing target server, partial load failure). Pure helper scopeConfigsToServer exported for unit testing.
  • Required mcpServerId (PRD-360) — The zod schema on McpStepDefinition requires mcpServerId: z.string().min(1), locking the boundary fail-fast against orchestrator regressions. PRD-360 is delivered orchestrator-side — every config (MCP + Forest) carries the persisted DB id.
  • Uniform partial-failure check across providerserrorOnPartialLoadFailure discriminates on tool.mcpServerId === cfg.id, which both McpClient and ForestIntegrationClient populate from the orchestrator id. A Forest connector that fails to load entirely is now flagged in failedConfigNames (previously silently swallowed).
  • Executor responsibilitiesMcpStepExecutor no longer filters tools internally (getFilteredToolsrequireTools). Tools are pre-scoped upstream; the executor asserts non-empty and throws NoMcpToolsError with an actionable user message.

Behaviour

  • step.mcpServerId matches a config id → one config scoped → one MCP connection.
  • step.mcpServerId matches no config → empty Record → executor throws NoMcpToolsError. Diagnostic distinguishes "no configs at all" from "configs exist but none match".
  • step.mcpServerId missing from wire payloadDomainValidationError at the zod boundary → reported as malformed run.
  • loadRemoteTools returns fewer tools than expected → failedConfigNames lists the Record keys whose id is missing from the loaded tools. Uniform across MCP and Forest providers.

Test plan

  • yarn workspace @forestadmin/workflow-executor test — 819 passed
  • yarn workspace @forestadmin/workflow-executor lint — 0 errors
  • yarn workspace @forestadmin/workflow-executor build — clean

New/updated tests:

  • scopeConfigsToServer: matches by cfg.id, not Record key; empty result when no match.
  • RemoteToolFetcher: scoping, partial-failure (MCP + Forest), rejection paths for getMcpServerConfigs and loadRemoteTools.
  • Runner: re-scopes per dispatch on chained MCP steps; partial-failure diagnostic is non-blocking.
  • Step-definition mapper: rejects missing mcpServerId at the zod boundary.

fixes PRD-363

🤖 Generated with Claude Code

Note

Scope MCP tool fetching to the step's target server in workflow executor

  • Runner.fetchRemoteTools now accepts an optional mcpServerId and filters MCP configs by config.id before calling loadRemoteTools, so each step only loads tools from its target server.
  • StepExecutorFactory.create passes the step's mcpServerId to fetchRemoteTools, replacing the previous unscoped loadTools dependency.
  • McpStepExecutor removes internal filtering by mcpServerId; it now asserts the provided tool set is non-empty via a new requireTools method and uses tools as-is.
  • Warnings are emitted when a requested server ID matches no config, configs lack IDs, or loadRemoteTools returns fewer tools than expected.
  • Behavioral Change: MCP tool loading is now scoped per-step at fetch time rather than post-fetch; steps requesting an unknown mcpServerId receive an empty tool set and throw NoMcpToolsError instead of silently filtering.

Changes since #1607 opened

  • Extracted remote tool fetching logic into a new RemoteToolFetcher class within the workflow-executor package [3b5632c]
  • Refactored Runner class to delegate remote tool fetching to RemoteToolFetcher [3b5632c]
  • Changed partial failure logging from failedSourceIds to failedConfigNames and excluded Forest integrations from failure detection [3b5632c]
  • Added test coverage for scopeConfigsToServer and RemoteToolFetcher.fetch and updated existing Runner tests [3b5632c]
  • Changed mcpServerId from optional to required across workflow-executor package for MCP step definitions, remote tool fetching, and error handling [dbfd26d]
  • Removed legacy behavior of loading remote tools across all MCP server configs when no mcpServerId is specified [dbfd26d]
  • Refactored task-to-step mapping in step-definition-mapper to switch directly on ServerTaskTypeEnum instead of using intermediate TASK_TYPE_TO_STEP_TYPE constant [dbfd26d]
  • Updated test files across workflow-executor package to provide required mcpServerId for MCP steps and align with new scoping behavior [dbfd26d]
  • Changed partial-load failure detection in remote-tool-fetcher module to uniformly compare tool.mcpServerId against config.id for all server types [e607979]
  • Strengthened validation in McpStepDefinitionSchema to require non-empty mcpServerId strings [e607979]
  • Updated NoMcpToolsError class constructor to use user-friendly error message without internal identifiers [e607979]
  • Updated test expectations across errors.test.ts, mcp-step-executor.test.ts, runner.test.ts, and remote-tool-fetcher.test.ts to verify new scoping behavior and error messages [e607979]
  • Updated inline comments in McpStepExecutor.requireTools method and workflow-execution.test.ts integration test [e607979]

Macroscope summarized 96ca091.

…ver (PRD-363)

Every MCP step opened connections to every configured MCP server and listed
their tools, then filtered in-memory by step.mcpServerId. With N configured
servers, an MCP step paid N× the connect/list-tools roundtrip cost, hit every
upstream server (billable/rate-limited), and over-allocated MCP connections.

Filter configs by cfg.id (DB id from PRD-360) inside Runner.fetchRemoteTools
before delegating to loadRemoteTools. The MCP branch of StepExecutorFactory
forwards step.mcpServerId — the Runner stays generic. The executor's
getFilteredTools collapses into requireTools (pre-scoped list → assert
non-empty).

fixes PRD-363

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear Bot commented May 28, 2026

PRD-363

@qltysh
Copy link
Copy Markdown

qltysh Bot commented May 28, 2026

6 new issues

Tool Category Rule Count
qlty Structure Function with many returns (count = 6): mapTask 3
qlty Structure Function with high complexity (count = 12): executeToolAndPersist 2
qlty Structure Function with many parameters (count = 5): create 1

@qltysh
Copy link
Copy Markdown

qltysh Bot commented May 28, 2026

Qlty


Coverage Impact

⬆️ Merging this pull request will increase total coverage on feat/prd-214-server-step-mapper by 0.01%.

Modified Files with Diff Coverage (6)

RatingFile% DiffUncovered Line #s
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/runner.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/executors/step-executor-factory.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/errors.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/executors/mcp-step-executor.ts100.0%
Coverage rating: A Coverage rating: A
packages/workflow-executor/src/adapters/step-definition-mapper.ts100.0%
New Coverage rating: A
packages/workflow-executor/src/remote-tool-fetcher.ts100.0%
Total100.0%
🚦 See full report on Qlty Cloud »

🛟 Help
  • Diff Coverage: Coverage for added or modified lines of code (excludes deleted files). Learn more.

  • Total Coverage: Coverage for the whole repository, calculated as the sum of all File Coverage. Learn more.

  • File Coverage: Covered Lines divided by Covered Lines plus Missed Lines. (Excludes non-executable lines including blank lines and comments.)

    • Indirect Changes: Changes to File Coverage for files that were not modified in this PR. Learn more.

christophebrun-forest and others added 3 commits May 28, 2026 16:57
Address review feedback:
- Strengthen tests with divergent mcpServerId fixtures so a future
  re-introduced filter or per-dispatch memoization regression fails.
- Update the MCP integration test to exercise the cfg.id matching.
- Surface a warn log when the orchestrator returns configs but none
  matches the step's mcpServerId — disambiguates "no configs" from
  "missing server config" in ops logs.
- Strip ticket refs and internal taxonomy from inline comments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tighten the disambiguation comment on the diagnostic warn log to lead
with the WHY, and drop a marginal restatement comment in the legacy-
fallback test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Distinguish "no configs at all" vs "no match" warns in Runner.fetchRemoteTools.
- Detect partial load failure (case b): error-log failed sourceIds so ops can tell
  "wrong target server" from "MCP server down".
- Warn about MCP configs lacking an id when mcpServerId is set (partial PRD-360
  migration); filter undefined out of availableMcpServerIds payload.
- Drop dead loadedMcpServerIds param from NoMcpToolsError; remove duplicate log
  in McpStepExecutor.requireTools (BaseStepExecutor already logs the error).
- Strengthen tests: assert warn payloads in runner.test.ts, scoped call in the
  integration test, rename stale test in mcp-step-executor.test.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@Scra3 Scra3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question sur le scoping des configs MCP non identifiées.

// Configs without id cannot be matched against a defined mcpServerId. Surface them so a
// partial PRD-360 migration doesn't masquerade as "wrong target server" downstream.
const unidentifiedConfigNames = Object.entries(configs)
.filter(([, cfg]) => cfg.id === undefined)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: PRD-360 ayant introduit mcpServerId, l'orchestrateur ne devrait-il pas toujours renvoyer un id désormais ? Si la migration est déployée partout, ce champ devrait être obligatoire (non-optionnel) et tout ce bloc unidentifiedConfigNames deviendrait du code mort à supprimer.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Après échanges, on peut supprimer. Le seul point qui resterait c'est le typage de cet ID, strict coté backend, mais beaucoup moins ensuite (string). Hors de cette PR.

Copy link
Copy Markdown
Member

@Scra3 Scra3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability suggestion on fetchRemoteTools.


private async fetchRemoteTools(): Promise<RemoteTool[]> {
// Match by config.id, not by Record key: server names can collide across configs.
private async fetchRemoteTools(mcpServerId?: string): Promise<RemoteTool[]> {
Copy link
Copy Markdown
Member

@Scra3 Scra3 May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: This method mixes one behavior (scope configs + load tools) with three distinct diagnostics, making it ~57 lines and hard to grasp at a glance; extracting a pure scopeConfigsToServer(configs, mcpServerId) plus one private helper per diagnostic would shrink the body to its intent.

Copy link
Copy Markdown
Member

@Scra3 Scra3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: extract MCP tool fetching out of Runner.


private async fetchRemoteTools(): Promise<RemoteTool[]> {
// Match by config.id, not by Record key: server names can collide across configs.
private async fetchRemoteTools(mcpServerId?: string): Promise<RemoteTool[]> {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Beyond splitting the body, this whole concern could move into its own RemoteToolFetcher (new file) constructed with workflowPort, aiModelPort and logger; Runner is already 473 lines spanning lifecycle, polling and run chaining, MCP fetching only touches those two ports, and a standalone module makes the scoping logic genuinely unit-testable with mocked ports.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My favorite proposition between both (alban lol)

Copy link
Copy Markdown
Member

@Scra3 Scra3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skeptic-validated findings: one diagnostic bug for Forest connectors, plus the test that would lock it in.

// succeeded. Compare scoped keys to the tools' sourceIds so ops can tell "wrong config"
// from "MCP server down".
const loadedSourceIds = new Set(tools.map(t => t.sourceId));
const failedSourceIds = Object.keys(scoped).filter(name => !loadedSourceIds.has(name));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (non-blocking): This assumes sourceId === config Record key, true for MCP (mcp-client.ts sets sourceId: name) but false for Forest integrations whose sourceId is a hardcoded literal (zendesk/kolar/snowflake), so any successfully-loaded Forest connector keyed otherwise always lands in failedSourceIds and emits a false error on the happy path — restrict the comparison to MCP configs (!isForestIntegrationConfig) and rename to failedConfigNames to surface the two namespaces.

workflowPort.getMcpServerConfigs.mockResolvedValue({
'server-A': { id: 'id-A', url: 'https://a.example', type: 'http', headers: {} },
});
// McpClient swallows per-server load errors — simulate the empty-result case here.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: With a single config and loadRemoteTools resolving to [], this is the all-failed case (which routes to NoMcpToolsError), not a genuine partial failure — add a case with two scoped configs where one loads and one fails, asserting the run proceeds with the survivor while failedSourceIds logs the other, since that is the only test that locks the sourceId-vs-key semantics.

christophebrun-forest and others added 3 commits May 29, 2026 11:40
…t connectors in partial-failure check

Splits MCP fetch concern out of Runner into a dedicated module with one
helper per diagnostic, shrinking Runner by ~50 lines and isolating the
scoping logic for unit testing.

errorOnPartialLoadFailure now excludes Forest integrations from the
sourceId-vs-Record-key comparison: Forest connectors carry a hardcoded
sourceId (zendesk/kolar/snowflake/...) that does not match the Record
key, so a happy-path load was being reported as failed. Field renamed
failedSourceIds -> failedConfigNames to surface the right namespace.

Adds a genuine partial-failure runner test (survivor + failed) that
locks the sourceId-vs-key semantics end-to-end by inspecting the
captured McpStepExecutor instance — needed because the global
BaseStepExecutor.execute spy hides downstream behaviour otherwise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D-360)

PRD-360 is delivered on the orchestrator: every config emitted on
/liana/mcp-server-configs-with-details carries a non-null DB id, on both
MCP and Forest-connector branches. Lock the executor's trust boundary
accordingly and remove the legacy-fallback code that handled the
pre-migration "no mcpServerId / no config.id" shape.

- zod McpStepDefinition.mcpServerId becomes required — a missing id is
  now a malformed run instead of a silent broad fetch.
- step-definition mapper switches on the union discriminant so TS
  narrows mcpServerId; drops the defensive 'in task' spread.
- StepExecutorFactory + NoMcpToolsError signatures tightened to string.
- RemoteToolFetcher.fetch is now string-only; warnUnidentifiedConfigs
  (the partial-PRD-360-migration sentinel) is gone.
- Tests pruned of legacy-fallback scenarios; mapper boundary now has a
  fail-fast assertion for the missing-id case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ia tool.mcpServerId

Post-review follow-ups for PR #1607.

errorOnPartialLoadFailure now discriminates on tool.mcpServerId instead
of tool.sourceId, which both McpClient and ForestIntegrationClient
populate from the orchestrator's persisted id. This removes the Forest
carve-out and fixes a real silent failure: a Forest connector that
fails to load entirely is now flagged with its Record key in the error
log, instead of bubbling up as a generic NoMcpToolsError with no
indication of which connector is down.

Other follow-ups:
- z.string().min(1) on mcpServerId at the boundary — empty string is no
  longer parseable.
- NoMcpToolsError userMessage made actionable, aligned with the
  StepTimeoutError pattern ("X happened. [Action]").
- Stale references corrected: mcp-step-executor's requireTools comment
  no longer mentions Runner; integration test no longer references the
  removed Runner.fetchRemoteTools symbol.
- New unit tests for getMcpServerConfigs / loadRemoteTools rejection
  paths pin the contract that errors propagate instead of being
  swallowed into an empty tool list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants