
F099: MCP Proxy - Interception and control of tool calls #353

@pocky


F099: MCP Proxy — Tool call interception + plugin tool exposure

Note: this is a simplified scope focused on two priorities. Everything else (policy, approval, cache, recorder/playback, middleware, external MCP-as-Plugin, snapshot, virtual tools) is explicitly deferred.

Objective

Intercept tool calls of all 5 supported agent providers (Claude, Gemini, Codex, OpenCode, OpenAI Compatible) by routing them through an AWF-controlled local MCP server, and let existing AWF gRPC plugins create new tools that those agents can call.

This is a deliberately reduced scope: it ships the two capabilities listed above and nothing else. Everything that the original F099 layered on top of those primitives (policy, approval, cache, recorder, middleware, external MCP servers, snapshot isolation, virtual tools) is explicitly out of scope and tracked separately.

Decisions

| Decision | Choice | Alternative considered | Trade-off |
| --- | --- | --- | --- |
| Interception mode | Active proxy: AWF re-exposes built-ins and becomes the sole tool source | Passive NDJSON observation; additive-only (plugin tools added next to native built-ins) | Active proxy is the only way to actually control what the agent calls; additive mode is preserved as a per-step opt-in (see intercept_builtins: false) |
| Provider coverage | All 5 (Claude, Gemini, Codex, OpenCode, OpenAI Compatible) | Claude only; Claude + OpenAI Compatible only | Full coverage validates the abstraction across the two fundamentally different mechanisms (CLI subprocess vs. HTTP native tools[]); Codex/OpenCode accept coexistence mode |
| Plugin tool exposure | Explicit per-step declaration with plugin_tools[].expose: [...] | Implicit (all plugins, all ops); plugin-level toggle | Aligns with the AWF "explicit > implicit" philosophy; keeps tools/list minimal per step |
| Built-in interception toggle | intercept_builtins: true (default) + opt-out | Always intercept; always additive | One knob covers both the "full control" and "just add plugin tools" use cases at the cost of a single extra if-statement |
| Observability | OTel spans + zap logging | Recorder JSONL + EventBus events + spans | Spans cost next to nothing when telemetry is disabled, and both reuse existing infrastructure; the recorder is a separate feature (it needs a playback consumer) |
| External MCP server bridging | Out of scope | Bridge external MCP servers (GitHub, Postgres) as plugins via type: mcp | Different feature (subprocess lifecycle, handshake, schema mapping); tracked as a future F |
| Policy / approval / cache / middleware / recorder | Out of scope | Ship one or more in v1 | None of these block the two stated priorities; all can be added later behind ports without breaking the v1 schema |

In Scope

  • Local MCP server (stdio JSON-RPC 2.0) injected into agent CLIs as the sole tool source via per-provider mechanisms
  • Six built-in tools re-implemented and re-exposed by the proxy: Read, Write, Edit, Bash, Glob, Grep
  • Per-provider built-in disablement: full on Claude and Gemini; coexistence with a startup warning on Codex and OpenCode
  • Native HTTP tools[] interception for OpenAI Compatible (extension of chatCompletionsRequest, role: tool messages, multi-turn loop, SSE delta assembly, infinite-loop guard)
  • Plugin Bridge: existing gRPC OperationProvider exposed as MCP tools via adapter, with OperationSchema → JSON Schema mapping
  • Per-step subprocess lifecycle (start awf mcp-serve, graceful shutdown on step end / failure / SIGINT)
  • Tool name namespacing for plugin tools (<plugin>_<op>) with collision detection at step startup
  • YAML schema mcp_proxy: block (4 keys total) and awf validate rules with stable error codes USER.MCP_PROXY.*
  • OpenTelemetry spans per tool call (child of step span) and zap log line per tool call
  • Hexagonal architecture compliance: domain port ToolProvider, application services, infrastructure adapters; .go-arch-lint.yml updated

Out of Scope (explicit non-goals)

The following items are not delivered. Each is independently addable later without breaking the v1 schema or architecture:

  • Policy Engine (allow/deny lists, filesystem sandboxing)
  • Human-in-the-loop approval (approval: always / pattern)
  • Content-addressed result cache (path + mtime + size keying, TTL, invalidation)
  • Tool call recorder (JSONL append-only) and awf playback command
  • Composable middleware chain (redact_secrets, truncate_large_files, rate_limit, inject_context)
  • MCP-as-Plugin bridging (external MCP servers like @modelcontextprotocol/server-github registered as type: mcp plugins)
  • Bypass detection via NDJSON output parsing for Codex/OpenCode coexistence
  • EventBus events (tool.call.start/end/denied/bypassed)
  • Snapshot isolation (CoW filesystem overlay) for parallel steps
  • Virtual composite tools (pipelines with rollback)

Architecture

┌─────────────────────────────────────────────────────────────┐
│ INTERFACES (cli + YAML)                                     │
│   - YAML block  mcp_proxy: { enable, intercept_builtins,    │
│                              plugin_tools }                 │
│   - awf validate runs the block validation                  │
│   - Internal command `awf mcp-serve --config=<path>`        │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────────┐
│ APPLICATION                                                 │
│   - ToolProxyService : orchestrates the per-step lifecycle  │
│   - ToolRouter       : aggregates ToolProviders, routes     │
│                        by name, detects collisions          │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────────┐
│ DOMAIN — ports                                              │
│   - ToolProvider : ListTools(), CallTool(name, args), Close │
│   - (NO ToolPolicy / ToolMiddleware / ToolCache)            │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────┴──────────────────────────────────┐
│ INFRASTRUCTURE                                              │
│   - pkg/mcpserver       : reusable MCP server (stdio,       │
│                           zero internal/ imports, NFR-005)  │
│   - BuiltinToolProvider : Read/Write/Edit/Bash/Glob/Grep    │
│   - PluginToolAdapter   : OperationProvider → ToolProvider  │
│   - Provider injection  : buildExecuteArgs extension for    │
│                           Claude/Gemini/Codex/OpenCode      │
│                           + chatCompletionsRequest for      │
│                           OpenAI Compatible                 │
└─────────────────────────────────────────────────────────────┘

Key invariants:

  • One MCP server per step (lifetime bound to step, graceful shutdown via defer)
  • The MCP server runs as a separate subprocess (awf mcp-serve --config=<tmpfile>) for stdio providers (Claude/Gemini/Codex/OpenCode); for OpenAI Compatible there is no subprocess — ToolRouter is invoked in-process by the HTTP provider
  • Each CallTool opens a child OTel span of the current step span; attributes: tool name, source (builtin / plugin:<name>), duration, error
  • One zap log line per tool call; zero persistent storage

Components

1. pkg/mcpserver — Reusable MCP Server

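Standalone package with zero imports from internal/ (preserves NFR-005). Implements the stable MCP subset: the initialize request, the notifications/initialized notification, tools/list, and tools/call. MCP defines no shutdown method; on stdio, shutdown happens at the transport level (the client closes the server's stdin, then escalates to SIGTERM and SIGKILL).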

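```go
package mcpserver

import (
	"context"
	"encoding/json"
	"io"
)

// Public surface only; function bodies are omitted in this listing.
type Server struct { /* registered tools, protocol state */ }

func New() *Server
func (s *Server) RegisterTool(name string, schema InputSchema, handler ToolHandler)
func (s *Server) Serve(ctx context.Context, stdin io.Reader, stdout io.Writer) error

// ToolHandler runs one tools/call; args is the raw "arguments" object.
type ToolHandler func(ctx context.Context, args json.RawMessage) (Result, error)

type InputSchema struct { /* JSON Schema document */ }

// ContentBlock is one MCP content item, e.g. {"type": "text", "text": "..."}.
type ContentBlock struct { /* ... */ }

// Result maps to the tools/call result; IsError flags a tool-level failure.
type Result struct {
	Content []ContentBlock
	IsError bool
}
```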

Out of scope for v1: notifications/progress, prompts, resources, sampling.

2. Domain port — internal/domain/ports/tool_provider.go

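```go
package ports

import "context"

// ToolProvider is the only v1 port: a source of tools the router can list and call.
type ToolProvider interface {
	ListTools(ctx context.Context) ([]ToolDefinition, error)
	CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error)
	Close(ctx context.Context) error
}

type ToolDefinition struct {
	Name        string
	Description string
	InputSchema map[string]any // JSON Schema
	Source      string         // "builtin" | "plugin:<plugin_name>"
}

type ToolResult struct {
	Content []ToolContent
	IsError bool
}

// ToolContent is one content item of a result (fields not specified in this issue).
type ToolContent struct { /* ... */ }
```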

No ToolPolicy, ToolMiddleware, ToolCache ports are introduced in v1.

3. Infrastructure adapters

| Adapter | Location | Responsibility |
| --- | --- | --- |
| BuiltinToolProvider | internal/infrastructure/tools/builtins/ | Implements Read, Write, Edit, Bash, Glob, Grep. Uses the existing Executor for Bash and os/filepath helpers for the file operations. No filesystem sandboxing (out of scope). |
| PluginToolAdapter | internal/infrastructure/tools/plugin_adapter.go | Wraps a ports.OperationProvider. For each operation listed in expose:, maps OperationSchema → InputSchema (JSON Schema). Prefixes tool names with <plugin_name>_. |
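A minimal sketch of the PluginToolAdapter mapping, assuming a hypothetical OperationSchema shape (a map of named inputs, each with a scalar type, a description, and a required flag); the real AWF type is not described in this issue, and InputSpec, ToolDef, toJSONSchema, and exposeTools are illustrative names. Only scalar types are mapped; anything else fails at registration instead of being silently dropped.

```go
package main

import (
	"fmt"
	"sort"
)

// InputSpec and OperationSchema are hypothetical stand-ins for AWF's OperationSchema.
type InputSpec struct {
	Type        string // "string" | "integer" | "number" | "boolean"
	Description string
	Required    bool
}

type OperationSchema struct {
	Inputs map[string]InputSpec
}

// ToolDef is the adapter's output: namespaced name, source tag, JSON Schema.
type ToolDef struct {
	Name        string
	Source      string
	InputSchema map[string]any
}

// toJSONSchema maps an operation's inputs to a JSON Schema object.
func toJSONSchema(op string, s OperationSchema) (map[string]any, error) {
	names := make([]string, 0, len(s.Inputs))
	for n := range s.Inputs {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic "required" order
	props := map[string]any{}
	required := []string{}
	for _, n := range names {
		in := s.Inputs[n]
		switch in.Type {
		case "string", "integer", "number", "boolean":
		default:
			return nil, fmt.Errorf("operation %q, input %q: unsupported type %q", op, n, in.Type)
		}
		prop := map[string]any{"type": in.Type}
		if in.Description != "" {
			prop["description"] = in.Description
		}
		props[n] = prop
		if in.Required {
			required = append(required, n)
		}
	}
	schema := map[string]any{"type": "object", "properties": props}
	if len(required) > 0 {
		schema["required"] = required
	}
	return schema, nil
}

// exposeTools builds one tool per operation listed in expose, named <plugin>_<op>.
func exposeTools(plugin string, ops map[string]OperationSchema, expose []string) ([]ToolDef, error) {
	var defs []ToolDef
	for _, op := range expose {
		s, ok := ops[op]
		if !ok {
			return nil, fmt.Errorf("plugin %q has no operation %q", plugin, op)
		}
		schema, err := toJSONSchema(op, s)
		if err != nil {
			return nil, err
		}
		defs = append(defs, ToolDef{Name: plugin + "_" + op, Source: "plugin:" + plugin, InputSchema: schema})
	}
	return defs, nil
}

func main() {
	ops := map[string]OperationSchema{
		"kubectl_apply": {Inputs: map[string]InputSpec{
			"manifest":  {Type: "string", Required: true},
			"namespace": {Type: "string"},
		}},
	}
	defs, err := exposeTools("kubernetes", ops, []string{"kubectl_apply"})
	if err != nil {
		panic(err)
	}
	fmt.Println(defs[0].Name, defs[0].InputSchema["required"])
}
```

Plain <plugin>_<op> concatenation is not injective: plugin a_b with operation c and plugin a with operation b_c both yield a_b_c. Collision detection therefore has to run on the final names, not on the (plugin, op) pairs.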

4. Application services

| Service | Location | Responsibility |
| --- | --- | --- |
| ToolRouter | internal/application/tools/router.go | Aggregates multiple ToolProviders. Builds the consolidated tools/list. Routes tools/call by name. Detects collisions at registration (fatal at step startup, not at runtime). Wraps each call with an OTel span and a zap log line. |
| ToolProxyService | internal/application/tools/proxy_service.go | Per-step coordinator: reads the mcp_proxy: config, instantiates ToolProviders, builds the ToolRouter, spawns awf mcp-serve (stdio providers) or hands the router to the HTTP provider (OpenAI Compatible), returns the provider-specific config payload, and shuts everything down on step end. |

5. Internal CLI command — awf mcp-serve

Not exposed in user help. Launched as a subprocess by ToolProxyService. Takes --config=<path> pointing to a tmp file describing the tools to expose (built-ins flag + plugin_tools list). Starts an mcpserver.Server, registers the tool handlers, calls Serve() on stdin/stdout.

6. Provider injection extensions

Each provider's buildExecuteArgs (or HTTP request builder for OpenAI Compatible) is extended to inject the proxy when mcp_proxy.enable: true. See Per-Provider Injection below for full flag tables.

7. OTel + Logging (cross-cutting)

Wired in ToolRouter.CallTool:

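```go
// Inside ToolRouter.CallTool, once name has been resolved to provider and source.
// Uses go.opentelemetry.io/otel/attribute, go.opentelemetry.io/otel/codes, go.uber.org/zap.
ctx, span := tracer.Start(ctx, "tool.call."+name)
defer span.End()
span.SetAttributes(
	attribute.String("tool.name", name),
	attribute.String("tool.source", source), // "builtin" or "plugin:<name>"
)

start := time.Now()
result, err := provider.CallTool(ctx, name, args)
duration := time.Since(start)

logger.Info("tool call",
	zap.String("tool", name),
	zap.String("source", source),
	zap.Duration("duration", duration),
	zap.Error(err), // zap.Error(nil) adds no field
)

span.SetAttributes(attribute.Int64("tool.duration_ms", duration.Milliseconds()))
if err != nil {
	span.RecordError(err)
	span.SetStatus(codes.Error, err.Error()) // RecordError alone does not mark the span as failed
}
return result, err
```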

Spans cost next to nothing when no telemetry exporter is configured (existing AWF behavior); the zap log line is written regardless.

YAML Schema

Grammar

```yaml
mcp_proxy:
  enable: bool                    # default: false. Activates the proxy on this step.
  intercept_builtins: bool        # default: true. If false → native built-ins stay active,
                                  #                 proxy only adds plugin_tools.
  plugin_tools:                   # optional. Plugins to expose.
    - plugin: string              # name from .awf/plugins.yaml
      expose: [string, ...]       # operations to expose as MCP tools
```
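Because intercept_builtins defaults to true, a Go struct for this block cannot use a plain bool: an absent key and an explicit false would both decode to false. A sketch, assuming a YAML decoder such as gopkg.in/yaml.v3 that leaves a *bool field nil when the key is absent (decoding itself is not shown); the type and method names are hypothetical.

```go
package main

import "fmt"

// MCPProxyConfig mirrors the mcp_proxy: block (hypothetical names).
type MCPProxyConfig struct {
	Enable            bool              `yaml:"enable"`
	InterceptBuiltins *bool             `yaml:"intercept_builtins"` // nil means the key is absent
	PluginTools       []PluginToolsSpec `yaml:"plugin_tools"`
}

type PluginToolsSpec struct {
	Plugin string   `yaml:"plugin"`
	Expose []string `yaml:"expose"`
}

// InterceptsBuiltins applies the documented default: absent means true.
func (c MCPProxyConfig) InterceptsBuiltins() bool {
	return c.InterceptBuiltins == nil || *c.InterceptBuiltins
}

func main() {
	off := false
	fmt.Println(MCPProxyConfig{Enable: true}.InterceptsBuiltins())                         // true
	fmt.Println(MCPProxyConfig{Enable: true, InterceptBuiltins: &off}.InterceptsBuiltins()) // false
}
```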

Examples

Case 1 — proxy unused (default, backwards compatible)

```yaml
states:
  refactor:
    type: step
    agent:
      provider: claude
      prompt: "Refactor src/foo.go"
    # no mcp_proxy: → identical behavior to today, native built-ins.
```

Case 2 — full interception, built-ins only (pure observability)

```yaml
states:
  refactor:
    type: step
    agent:
      provider: claude
      prompt: "Refactor src/foo.go"
    mcp_proxy:
      enable: true
      # Read/Write/Edit/Bash/Glob/Grep re-exposed by AWF.
      # No plugin_tools → only the 6 built-ins.
      # Use case: zap logging + OTel spans on every FS/shell op the agent performs.
```

Case 3 — full interception + plugin tools

```yaml
states:
  deploy:
    type: step
    agent:
      provider: claude
      prompt: "Apply the new k8s manifest"
    mcp_proxy:
      enable: true
      plugin_tools:
        - plugin: kubernetes
          expose: [kubectl_apply, kubectl_get]
      # Agent sees: Read, Write, Edit, Bash, Glob, Grep,
      #             kubernetes_kubectl_apply, kubernetes_kubectl_get
```

Case 4 — additive proxy (native built-ins intact, plugin tools added)

```yaml
states:
  deploy:
    type: step
    agent:
      provider: claude
      prompt: "Apply the new k8s manifest"
    mcp_proxy:
      enable: true
      intercept_builtins: false
      plugin_tools:
        - plugin: kubernetes
          expose: [kubectl_apply]
      # Agent sees: its NATIVE Read/Write/Edit/Bash/Glob/Grep +
      #             kubernetes_kubectl_apply (via AWF).
      # OTel/logging only on kubernetes_kubectl_apply.
```

Validation rules (awf validate)

| Error code | Condition |
| --- | --- |
| USER.MCP_PROXY.UNKNOWN_KEY | Unknown key in the mcp_proxy: block (typo, future schema, unsupported sub-key) |
| USER.MCP_PROXY.UNKNOWN_PLUGIN | plugin_tools[].plugin does not match any plugin declared in .awf/plugins.yaml |
| USER.MCP_PROXY.UNKNOWN_OPERATION | plugin_tools[].expose[] references an operation the plugin does not expose |
| USER.MCP_PROXY.NAME_COLLISION | Two tools (built-in or plugin, after namespacing) resolve to the same name |
| USER.MCP_PROXY.EMPTY_PROXY | enable: true + intercept_builtins: false + empty or missing plugin_tools: an effective no-op, reported as an explicit error to flag the dead config |
| USER.MCP_PROXY.UNSUPPORTED_PROVIDER (warning only) | Step uses Codex or OpenCode; a startup warning about coexistence mode is logged |
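A sketch of three of these rules, assuming the config is already decoded and the declared plugins are available as a plugin-to-operations map (a hypothetical view of .awf/plugins.yaml). UNKNOWN_KEY belongs in the decoding step itself (for example, strict decoding with yaml.v3's KnownFields(true)), and NAME_COLLISION needs the final namespaced names, so both are left out; all names are illustrative.

```go
package main

import "fmt"

type PluginToolsSpec struct {
	Plugin string
	Expose []string
}

type MCPProxyConfig struct {
	Enable            bool
	InterceptBuiltins *bool // nil = default (true)
	PluginTools       []PluginToolsSpec
}

// ValidationError pairs a stable code with a human-readable message.
type ValidationError struct{ Code, Msg string }

func (e ValidationError) Error() string { return e.Code + ": " + e.Msg }

// validate checks EMPTY_PROXY, UNKNOWN_PLUGIN and UNKNOWN_OPERATION.
func validate(c MCPProxyConfig, catalog map[string][]string) []ValidationError {
	var errs []ValidationError
	intercept := c.InterceptBuiltins == nil || *c.InterceptBuiltins
	if c.Enable && !intercept && len(c.PluginTools) == 0 {
		errs = append(errs, ValidationError{"USER.MCP_PROXY.EMPTY_PROXY",
			"enable: true with intercept_builtins: false and no plugin_tools exposes nothing"})
	}
	for _, pt := range c.PluginTools {
		ops, ok := catalog[pt.Plugin]
		if !ok {
			errs = append(errs, ValidationError{"USER.MCP_PROXY.UNKNOWN_PLUGIN",
				fmt.Sprintf("plugin %q is not declared", pt.Plugin)})
			continue
		}
		known := make(map[string]bool, len(ops))
		for _, op := range ops {
			known[op] = true
		}
		for _, op := range pt.Expose {
			if !known[op] {
				errs = append(errs, ValidationError{"USER.MCP_PROXY.UNKNOWN_OPERATION",
					fmt.Sprintf("plugin %q does not expose %q", pt.Plugin, op)})
			}
		}
	}
	return errs
}

func main() {
	catalog := map[string][]string{"kubernetes": {"kubectl_apply", "kubectl_get"}}
	off := false
	for _, c := range []MCPProxyConfig{
		{Enable: true, InterceptBuiltins: &off},
		{Enable: true, PluginTools: []PluginToolsSpec{{Plugin: "kubernetes", Expose: []string{"kubectl_delete"}}}},
		{Enable: true, PluginTools: []PluginToolsSpec{{Plugin: "helm"}}},
	} {
		fmt.Println(validate(c, catalog))
	}
}
```

The function collects every violation instead of stopping at the first, so one awf validate run reports all problems in the block.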

Per-Provider Injection

Mode intercept_builtins: true (default)

| Provider | Flags / mechanism | MCP-only isolation |
| --- | --- | --- |
| Claude | --mcp-config <path> + --tools "" + --strict-mcp-config | Guaranteed |
| Gemini | --mcp-server awf-proxy=<cmd> + --allowed-mcp-server-names awf-proxy + -e "" (fallback: --policy <deny-all-path> if -e "" does not fully disable extensions) | To be validated in Phase 3 |
| Codex | -c 'mcp_servers.awf-proxy.command="<path>"' + -c 'mcp_servers.awf-proxy.args=[...]' + -s read-only + system prompt mitigation ("Use only MCP tools, never built-in tools") | Coexistence: built-ins remain accessible. Startup warning emitted. |
| OpenCode | opencode mcp add awf-proxy -- <cmd> (persistent config, applied before exec) + system prompt mitigation | Coexistence, same as Codex. Startup warning emitted. |
| OpenAI Compatible | No CLI flags. Native mechanism: chatCompletionsRequest.tools[] carries the 6 built-ins + plugin tools, tool_choice: "auto", role: "tool" messages with tool_call_id, multi-turn execution loop, SSE delta assembly for tool_calls. Loop guard: len(tool_calls) == 0 && finish_reason == "tool_calls" → structured error. | Guaranteed (AWF is the HTTP client) |

Mode intercept_builtins: false

Native built-ins remain active; the MCP server is injected alongside, carrying only the plugin tools.

| Provider | Difference vs. full-interception mode |
| --- | --- |
| Claude | Drop --tools "" and --strict-mcp-config; keep only --mcp-config <path>. |
| Gemini | Drop -e "" and --allowed-mcp-server-names; keep only --mcp-server awf-proxy=<cmd>. |
| Codex | Same flags as full-interception mode (there is no --tools flag to omit; full mode was already coexistence). Drop the system prompt mitigation. |
| OpenCode | Same as Codex. |
| OpenAI Compatible | chatCompletionsRequest.tools[] carries only plugin tools (no built-ins). |

Subprocess lifecycle (Claude / Gemini / Codex / OpenCode)

```text
ToolProxyService.Start(step) {
  1. Build config file (tmp): describes tools to expose
  2. Spawn `awf mcp-serve --config=<tmpfile>` as subprocess
  3. Generate provider-specific MCP config (.json for Claude, etc.)
  4. Return: (mcpConfigPath, cleanupFunc)
}

→ Agent CLI invoked with injected flags pointing to mcpConfigPath
→ Agent connects via stdio to awf mcp-serve subprocess
→ Agent issues tools/list and tools/call via JSON-RPC

ToolProxyService.Close(step) {
  1. Send SIGTERM to the mcp-serve subprocess
  2. Wait up to 5s for graceful exit
  3. SIGKILL if still alive
  4. Remove tmpfile
}
```

For OpenAI Compatible: no subprocess. The ToolRouter is invoked directly in-process by the HTTP provider during its multi-turn loop.

Startup warning for Codex / OpenCode

When a step launches with intercept_builtins: true on Codex or OpenCode, log via zap at WARN:

WARN: mcp_proxy on provider=codex runs in coexistence mode.
      Built-in tools cannot be disabled and may bypass the proxy.
      Use 'claude' or 'openai-compatible' for guaranteed MCP-only isolation.

The user accepts this trade-off implicitly by choosing the provider — no additional opt-in.

Phasing

| Phase | Deliverable | Effort estimate |
| --- | --- | --- |
| 1: Foundation + Claude | pkg/mcpserver, ToolProvider port, BuiltinToolProvider, ToolRouter, ToolProxyService, awf mcp-serve command, Claude injection, YAML schema + validation (codes UNKNOWN_KEY and EMPTY_PROXY), intercept_builtins knob, OTel + logging, .go-arch-lint.yml update. End-to-end: a Claude step exercises the 6 built-ins via the proxy. | ~1-2 weeks |
| 2: Plugin Bridge | PluginToolAdapter, OperationSchema → JSON Schema mapping, <plugin>_<op> namespacing, collision detection, YAML plugin_tools: support, validation codes UNKNOWN_PLUGIN, UNKNOWN_OPERATION, NAME_COLLISION. End-to-end: a Claude step exposes a gRPC plugin's operation as an MCP tool. | ~3-5 days |
| 3: Multi-provider stdio | Gemini injection (with -e "" validation + --policy fallback), Codex injection (coexistence + prompt mitigation), OpenCode injection (opencode mcp add + prompt mitigation), startup warning for Codex/OpenCode, validation code UNSUPPORTED_PROVIDER. | ~1 week |
| 4: OpenAI Compatible native tools[] | chatCompletionsRequest extension (tools[], tool_choice), role: "tool" message support, multi-turn execution loop, SSE delta assembly for tool_calls, infinite-loop guard. Reuses ToolRouter directly (no subprocess). | ~1 week |

Total: ~4-5 weeks for one full-time engineer. Size: L.

Dependencies: Phases 2, 3, and 4 all depend on Phase 1. Phases 2, 3, and 4 are independent of each other and may be parallelized.

MVP: Phases 1 + 2 deliver both priorities on Claude in roughly 2-3 weeks (size M). The full scope is committed, but a partial cut is shippable.

Acceptance Criteria

| ID | Criterion |
| --- | --- |
| AC-1 | A step with mcp_proxy.enable: true on Claude, Gemini, or OpenAI Compatible exercises Read and Bash exclusively through awf mcp-serve (or the in-process ToolRouter for OpenAI Compatible). An OTel span is emitted as a child of the step span, and a zap log line is written. |
| AC-2 | A step with intercept_builtins: false + plugin_tools: [{plugin: P, expose: [op]}] results in the agent seeing the native built-ins plus the namespaced plugin tool. The plugin tool call dispatches to OperationProvider.Execute(op, args). |
| AC-3 | A name collision between two plugin tools, or between a plugin tool and a built-in, fails at step startup with USER.MCP_PROXY.NAME_COLLISION. Runtime collisions are impossible. |
| AC-4 | awf validate rejects mcp_proxy: blocks with unknown keys (UNKNOWN_KEY), unknown plugins (UNKNOWN_PLUGIN), unknown operations (UNKNOWN_OPERATION), and dead configs (EMPTY_PROXY). |
| AC-5 | Ctrl+C during a step with the proxy active terminates the awf mcp-serve subprocess cleanly (no zombies, verified by a pgrep integration test). |
| AC-6 | A Codex or OpenCode step with intercept_builtins: true logs the expected coexistence warning at startup. |
| AC-7 | An OpenAI Compatible step whose response carries zero tool calls with finish_reason: "tool_calls" errors out instead of looping. |
| AC-8 | make build, make lint, make lint-arch, make test, make test-race all pass with zero violations. |
| AC-9 | pkg/mcpserver has zero imports from internal/ (verified by make lint-arch). |

Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Gemini -e "" does not disable extensions as documented | Medium | Medium | Fallback to --policy <deny-all-path>, validated in Phase 3 before merge |
| Codex / OpenCode prompt mitigation insufficient to prevent native built-in use | High | Low | Accepted trade-off: documented, warning emitted; users in sensitive contexts use Claude or OpenAI Compatible |
| OpenAI Compatible SSE delta assembly subtly broken when tool_calls arguments span multiple chunks | Medium | High | Integration test with a tool whose arguments span 2+ chunks; the loop guard prevents an infinite loop |
| OperationSchema → JSON Schema mapping loses information (e.g., complex types, defaults) | Medium | Medium | Phase 2 lands with a curated mapping; unsupported features fail explicitly at registration instead of being silently dropped |
| awf mcp-serve subprocess orphaned after a parent crash | Low | Medium | SIGTERM with a 5s timeout, then SIGKILL; an integration test with pgrep verifies no orphans on Ctrl+C |

Future Work (explicitly deferred)

Each item below can be added behind the existing ToolProvider port and YAML schema without breaking changes:

  • Policy Engine (allow/deny, filesystem sandboxing) — adds a ToolPolicy port wrapping ToolRouter
  • Human-in-the-loop approval — extension of Policy Engine
  • Content-addressed result cache — adds a ToolCache decorator around ToolProvider
  • Tool call recorder (JSONL) + awf playback <id> command
  • Composable middleware chain — adds a ToolMiddleware port; chain composed at ToolRouter level
  • MCP-as-Plugin (external MCP servers as plugins via type: mcp in plugin config)
  • Bypass detection via NDJSON parsing for Codex/OpenCode
  • EventBus events (tool.call.start/end/denied/bypassed)
  • Snapshot isolation (CoW filesystem overlay) for parallel steps
  • Virtual composite tools (pipelines with rollback)

Metadata

  • Status: backlog
  • Version: v0.10.0
  • Priority: high
  • Estimation: L (was XL)

Dependencies

  • Blocked by: none (gRPC plugin system C066–C069 is a prerequisite for Phase 2 but is already implemented)
  • Unblocks: future Policy/Cache/Recorder/Middleware features (all designed to plug behind the ToolProvider port without breaking changes)
