From 7f0587295ea5d3ce94f058deb5a7307d2bc6f32b Mon Sep 17 00:00:00 2001
From: Shivang <shivang.iitk@gmail.com>
Date: Wed, 20 May 2026 01:28:59 -0700
Subject: [PATCH 1/4] feat(mcp): run_command tool for visible-pane shell with
 read-back
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Today AI clients (Claude Code, Codex, Cursor, …) running shell commands
go through their built-in Bash tool — invisible to the user. The user
asked for shell work to happen in their actual BossTerm window while
the agent still receives stdout/stderr + exit code, so they can watch
what's running and the agent isn't blind.

New write tool mcp__bossterm__run_command, a blocking variant of
run_in_panel:
- Spawns (or reuses) a visible BossTerm pane, writes the script, waits
  for OSC 133;D, returns {exitCode, output, durationMs, paneId,
  truncated, error}.
- Pane reuse: first call splits below the focused pane (or whatever
  mcpRunCommandDefaultPanel says); subsequent calls in the same tab
  reuse that pane. The cache (tabId → paneId) is guarded by a per-tab
  Mutex so two concurrent calls can't both miss the cache and orphan
  a pane. A per-pane Mutex serializes overlapping calls into the same
  pane's stdin.
- Output capture: absolute history-line tracking via the OSC 133;B and
  ;D events, so the slice stays correct when output scrolls into
  history mid-command. ANSI-stripped via the emulator's per-line .text
  accessor (same path read_scrollback uses).
- Alt-screen / TUI detection: polls textBuffer.isUsingAlternateBuffer
  while waiting for D; on transition, completes with
  error: "TUI detected …" instead of timing out. The process stays
  alive so the caller can drive it via send_input + read_scrollback.
- Reserved tool: added to UNDISABLABLE_TOOLS so manage_tools rejects
  attempts to hide it (otherwise the PreToolUse hook below would route
  Bash to a tool that's no longer on the wire). applyDisabledSet also
  filters reserved names defensively against hand-edits of
  disabledMcpTools in settings.json.
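
The row arithmetic behind the output-capture bullet can be sketched as
below. This is a minimal illustration rather than the code in this
patch: `Mark` and `outputRows` are hypothetical names, and the real
implementation reads these values from the emulator's text buffer.

```kotlin
// Hypothetical stand-in for the values sampled at OSC 133;B and ;D.
// cursorRow is 0-indexed within the visible screen.
data class Mark(val historyLines: Int, val cursorRow: Int)

/**
 * Translate the start mark into a row index relative to the end snapshot.
 * Negative rows live in history. Returns null when there is nothing to slice.
 */
fun outputRows(start: Mark, end: Mark, historyAvailable: Int): IntRange? {
    // Lines that scrolled into history while the command ran.
    val scrolled = end.historyLines - start.historyLines
    // Clamp to the oldest line the history buffer still holds.
    val first = maxOf(start.cursorRow - scrolled, -historyAvailable)
    // The end cursor sits on the row of the next prompt; output ends above it.
    val last = end.cursorRow - 1
    return if (last < first) null else first..last
}

fun main() {
    // 30 lines scrolled into history between B and D.
    println(outputRows(Mark(100, 20), Mark(130, 23), historyAvailable = 130))
    // prints -10..22
}
```

Because both marks include the history count, the start row simply goes
negative when output scrolls, instead of pointing at the wrong line.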

Cross-client preference advertising:
- Server now passes a BOSSTERM_MCP_INSTRUCTIONS string into the SDK's
  4-arg Server constructor. The MCP initialize response surfaces it
  into every client's system prompt, so the "prefer run_command over
  your built-in shell" guidance lands without per-project config.

Port marker file for cheap external probes:
- BossTermMcpManager atomically writes the actual bound port (after
  the 7676→7685 fallback) to ~/.bossterm/mcp.port on every successful
  bind, deletes it on clean stop. Lets the user-global PreToolUse hook
  decide reachability with stat + nc -z (~5ms) instead of an HTTP
  probe with timeout (~300ms worst case).
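
A rough sketch of the marker round trip, assuming a reader written in
Kotlin rather than the shell hook the docs describe. `writeMarker` and
`reachablePort` are illustrative names, the 50 ms connect timeout is an
arbitrary choice, and the demo writes to a temp directory instead of
~/.bossterm.

```kotlin
import java.io.IOException
import java.net.InetAddress
import java.net.InetSocketAddress
import java.net.ServerSocket
import java.net.Socket
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

/** Write the port via temp file + atomic rename so readers never see a partial file. */
fun writeMarker(marker: Path, port: Int) {
    marker.parent?.let { Files.createDirectories(it) }
    val tmp = marker.resolveSibling(".mcp.port.tmp")
    Files.writeString(tmp, port.toString())
    Files.move(tmp, marker, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING)
}

/** Read the marker and attempt a short TCP connect; null means "not reachable". */
fun reachablePort(marker: Path, connectTimeoutMs: Int = 50): Int? {
    if (!Files.exists(marker)) return null
    val port = Files.readString(marker).trim().toIntOrNull()
        ?.takeIf { it in 1..65535 } ?: return null
    return try {
        Socket().use { it.connect(InetSocketAddress("127.0.0.1", port), connectTimeoutMs) }
        port
    } catch (e: IOException) {
        null // stale marker or nothing listening
    }
}

fun main() {
    val marker = Files.createTempDirectory("bossterm-demo").resolve("mcp.port")
    // A throwaway listener stands in for the MCP server's bound socket.
    ServerSocket(0, 1, InetAddress.getByName("127.0.0.1")).use { server ->
        writeMarker(marker, server.localPort)
        println(reachablePort(marker) == server.localPort)
    }
}
```

The connect check matters because a crash can leave a stale marker
behind; the file alone only says a bind once succeeded.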

Settings:
- mcpRunCommandDefaultTimeoutMs (default 120_000, UI input)
- mcpRunCommandDefaultPanel (default horizontal_split, UI dropdown;
  invalid values silently fall back to horizontal_split — never bubble
  the user's bad setting up as a per-call error)
- mcpRunCommandMaxOutputBytes (default 120_000, advanced — sized to
  fit under mcpMaxAnswerChars=150_000 with JSON-wrapper headroom)
- mcpRunCommandShellReadyTimeoutMs (default 1_500, advanced)
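
The silent fallback for mcpRunCommandDefaultPanel could look roughly
like this. The set of accepted values is an assumption inferred from the
tool's panel modes, and `effectivePanel` is a hypothetical helper, not a
function in this patch.

```kotlin
// Assumed valid values for the setting ("reuse" is a per-call mode only).
val validPanels = setOf("horizontal_split", "vertical_split", "new_tab")

/** Resolve the panel mode for a call that has to create a new pane. */
fun effectivePanel(requested: String, configuredDefault: String): String {
    // A bad setting never surfaces as an error; it decays to horizontal_split.
    val fallback = configuredDefault.lowercase()
        .takeIf { it in validPanels } ?: "horizontal_split"
    val mode = requested.lowercase()
    return if (mode == "reuse") fallback else mode
}

fun main() {
    println(effectivePanel("reuse", "diagonal"))       // horizontal_split
    println(effectivePanel("reuse", "Vertical_Split")) // vertical_split
    println(effectivePanel("new_tab", "diagonal"))     // new_tab
}
```

An explicit per-call `panel` value is passed through unchanged here, so
a caller's own typo can still be reported as a per-call error.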

UI:
- Two new rows under "BossTerm MCP Server" (panel dropdown + timeout
  input).
- ExposedToolsSection grays out reserved tools and gains a blurb
  surfacing the always-exposed manage_tools meta-tool (which lives
  outside BUILT_IN_READ/WRITE_TOOLS).

Docs:
- README intro updated; new "Using as Claude Code's default shell"
  subsection pointing at the docs.
- docs/mcp-server.md: full run_command schema + response shape +
  error contract, ~/.bossterm/mcp.port marker file note, initialize-
  time instructions section, settings reference rows, "Using as
  Claude Code's default shell" how-to (copy-paste hook script +
  settings.json + CLAUDE.md), multi-instance + sudo + nc caveats in
  Troubleshooting.

Verification:
- ./gradlew :compose-ui:compileKotlinDesktop --no-daemon compiles
  cleanly.
- Hook script smoke-tested across all four branches (no marker /
  dead port / live port / no nc).
- End-to-end (MCP Inspector + two concurrent run_command calls + real
  Bash hooking) is still manual; the listener-before-send, alt-screen
  detection, and cache lock can't be exercised without a live UI.

User-global config (~/.claude/CLAUDE.md, hooks/prefer-bossterm.sh,
settings.json hook entry) is per-user and lives outside this repo —
the docs walk through it.

Generated with [Claude Code](https://claude.com/claude-code)
---
 README.md                                     |  16 +-
 .../compose/mcp/BossTermMcpManager.kt         |  46 ++
 .../bossterm/compose/mcp/BossTermMcpServer.kt | 538 +++++++++++++++++-
 .../compose/mcp/McpTerminalRegistry.kt        |  52 ++
 .../compose/settings/TerminalSettings.kt      |  59 ++
 .../settings/sections/McpSettingsSection.kt   |  69 ++-
 docs/mcp-server.md                            | 189 +++++-
 7 files changed, 947 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index 1fce3118..4ce99461 100644
--- a/README.md
+++ b/README.md
@@ -461,7 +461,9 @@ BossTerm ships an in-process [Model Context Protocol](https://modelcontextprotoc
 server that exposes the running terminal to MCP-aware clients (Claude Code,
 Codex, Gemini CLI, OpenCode). Clients can enumerate tabs, read scrollback,
 search output, capture the last completed command, and — when write tools
-are enabled — drive shells, send signals, and open new splits.
+are enabled — drive shells, send signals, open new splits, and **run
+commands in a visible pane** while still capturing stdout/stderr and exit
+code (`run_command` — recommended default shell for AI clients).
 
 - **Endpoint**: `http://127.0.0.1:7676/` over Server-Sent Events, configurable
   via Settings → BossTerm MCP → Port.
@@ -480,6 +482,18 @@ are enabled — drive shells, send signals, and open new splits.
    register the endpoint with. Re-attachment is idempotent and happens
    silently on subsequent launches.
 
+### Using as Claude Code's default shell
+
+Out of the box, the server's initialize-time `instructions` already tell
+Claude Code to prefer `run_command` over its built-in `Bash` whenever the
+MCP is attached — commands run in a visible BossTerm pane and the output
+still comes back to the agent. For a hard guarantee, add the user-global
+`PreToolUse` hook described in
+[docs/mcp-server.md](docs/mcp-server.md#using-as-claude-codes-default-shell):
+the hook checks the `~/.bossterm/mcp.port` marker BossTerm writes on every
+successful bind and routes `Bash` calls to `mcp__bossterm__run_command` when
+BossTerm is running, falling through silently when it isn't.
+
 ### Embedding it (as a developer)
 
 ```kotlin
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt
index f3b082da..73400568 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt
@@ -28,10 +28,13 @@ import kotlinx.coroutines.sync.Mutex
 import kotlinx.coroutines.sync.withLock
 import kotlinx.coroutines.withContext
 import org.slf4j.LoggerFactory
+import java.io.File
 import java.io.IOException
 import java.net.BindException
 import java.net.InetSocketAddress
 import java.net.ServerSocket
+import java.nio.file.Files
+import java.nio.file.StandardCopyOption
 
 /**
  * Lifecycle wrapper that brings up the BossTerm in-process MCP server on a
@@ -360,6 +363,7 @@ class BossTermMcpManager(
             runningPort = port
             runningServer = mcpServerWrapper
             registry.setRunning(port)
+            writePortMarker(port)
             log.info(
                 "BossTerm MCP server ready: http://{}:{}{} (SSE transport, {} state(s) registered)",
                 HOST, port, PATH, registry.stateCount()
@@ -461,9 +465,51 @@ class BossTermMcpManager(
             runningPort = null
             runningServer = null
             registry.setStopped()
+            deletePortMarker()
         }
     }
 
+    /**
+     * Atomic write of the bound port to `~/.bossterm/mcp.port` so the user-global
+     * Claude Code `PreToolUse` hook can decide whether to route `Bash` through
+     * `mcp__bossterm__run_command` with a single stat + `nc -z` instead of an
+     * HTTP probe (~5ms vs ~300ms worst case per Bash call).
+     *
+     * Reflects the *actual* bound port, including the 7676→7685 fallback range,
+     * so the hook doesn't need to know about fallback. Best-effort: any I/O
+     * failure is logged at WARN and ignored — the marker is an optimization,
+     * not a correctness lever.
+     */
+    private fun writePortMarker(port: Int) {
+        try {
+            val target = mcpPortMarkerFile()
+            target.parentFile?.mkdirs()
+            val tmp = File(target.parentFile, ".mcp.port.tmp")
+            tmp.writeText(port.toString())
+            // ATOMIC_MOVE so concurrent hook reads never see a partial file.
+            Files.move(
+                tmp.toPath(), target.toPath(),
+                StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING
+            )
+        } catch (e: Throwable) {
+            log.warn("Failed to write MCP port marker: {}", e.message)
+        }
+    }
+
+    private fun deletePortMarker() {
+        try {
+            val target = mcpPortMarkerFile()
+            if (target.exists() && !target.delete()) {
+                log.warn("Failed to delete MCP port marker at {}", target)
+            }
+        } catch (e: Throwable) {
+            log.warn("Error while deleting MCP port marker: {}", e.message)
+        }
+    }
+
+    private fun mcpPortMarkerFile(): File =
+        File(System.getProperty("user.home"), ".bossterm/mcp.port")
+
     private data class McpRuntimeConfig(val enabled: Boolean, val port: Int)
 
     private companion object {
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
index 69763068..7539eb37 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
@@ -1,9 +1,12 @@
 package ai.rever.bossterm.compose.mcp
 
 import ai.rever.bossterm.compose.TabbedTerminalState
+import ai.rever.bossterm.compose.TerminalSession
 import ai.rever.bossterm.compose.debug.ChunkSource
 import ai.rever.bossterm.compose.settings.SettingsManager
 import ai.rever.bossterm.compose.tabs.TerminalTab
+import ai.rever.bossterm.terminal.model.CommandStateListener
+import ai.rever.bossterm.terminal.model.TerminalTextBuffer
 import io.modelcontextprotocol.kotlin.sdk.server.Server
 import io.modelcontextprotocol.kotlin.sdk.server.ServerOptions
 import io.modelcontextprotocol.kotlin.sdk.types.CallToolResult
@@ -11,6 +14,13 @@ import io.modelcontextprotocol.kotlin.sdk.types.Implementation
 import io.modelcontextprotocol.kotlin.sdk.types.ServerCapabilities
 import io.modelcontextprotocol.kotlin.sdk.types.TextContent
 import io.modelcontextprotocol.kotlin.sdk.types.ToolSchema
+import kotlinx.coroutines.CompletableDeferred
+import kotlinx.coroutines.coroutineScope
+import kotlinx.coroutines.delay
+import kotlinx.coroutines.isActive
+import kotlinx.coroutines.launch
+import kotlinx.coroutines.sync.withLock
+import kotlinx.coroutines.withTimeoutOrNull
 import kotlinx.serialization.Serializable
 import kotlinx.serialization.json.Json
 import kotlinx.serialization.json.JsonArray
@@ -108,7 +118,8 @@ class BossTermMcpServer(
     private val writeToolRegistrations: Map<String, (Server) -> Unit> = mapOf(
         "send_input" to ::registerSendInput,
         "send_signal" to ::registerSendSignal,
-        "run_in_panel" to ::registerRunInPanel
+        "run_in_panel" to ::registerRunInPanel,
+        "run_command" to ::registerRunCommand
     )
 
     /** Reserved tools that callers cannot disable. */
@@ -148,7 +159,8 @@ class BossTermMcpServer(
                 capabilities = ServerCapabilities(
                     tools = ServerCapabilities.Tools(listChanged = true)
                 )
-            )
+            ),
+            instructions = BOSSTERM_MCP_INSTRUCTIONS
         )
         serverRef = server
 
@@ -200,10 +212,16 @@ class BossTermMcpServer(
             // Read the ref inside the lock so a concurrent detachServer() can't
             // null it between the check and the mutations.
             val server = serverRef ?: return
+            // Defense in depth: even if settings.json is hand-edited to put
+            // `manage_tools` or `run_command` in `disabledMcpTools`, never
+            // take them off the wire. The `manage_tools` handler already
+            // refuses these names, but applyDisabledSet trusts the persisted
+            // set blindly, so the filter here covers the hand-edit path.
+            val effectiveDisabled = disabled - undisablableTools
             for (name in availableToolNames()) {
                 val prefixed = toolName(name)
                 val present = server.tools.containsKey(prefixed)
-                val shouldBeExposed = name !in disabled
+                val shouldBeExposed = name !in effectiveDisabled
                 if (shouldBeExposed && !present) {
                     (readToolRegistrations[name] ?: writeToolRegistrations[name])?.invoke(server)
                 } else if (!shouldBeExposed && present) {
@@ -898,6 +916,445 @@ class BossTermMcpServer(
         }
     }
 
+    // -----------------------------------------------------------------
+    // Tool: run_command
+    // -----------------------------------------------------------------
+
+    private fun registerRunCommand(server: Server) {
+        server.addTool(
+            name = toolName("run_command"),
+            description = describe(
+                "run_command",
+                "Run a shell command in a visible BossTerm pane and return its stdout/stderr, " +
+                        "exit code, and duration. Reuses the same pane across calls within a tab — " +
+                        "pass back the `pane_id` from a prior call to keep using the same pane. " +
+                        "Requires OSC 133 shell integration on the user's shell. Use this instead " +
+                        "of your built-in shell tool when the BossTerm MCP is attached. For TUIs " +
+                        "(vim, less, htop, git commit without -m), returns `error: \"TUI detected\"`; " +
+                        "switch to send_input / read_scrollback to drive those."
+            ),
+            inputSchema = ToolSchema(
+                properties = buildJsonObject {
+                    putJsonObject("script") {
+                        put("type", "string")
+                        put("description", "Shell command to run. A trailing newline is added if absent.")
+                    }
+                    putJsonObject("pane_id") {
+                        put("type", "string")
+                        put("description", "Optional MCP pane to reuse. Defaults to the pane this " +
+                                "tool last created for `tab_id`; if none, a new pane is created.")
+                    }
+                    putJsonObject("tab_id") {
+                        put("type", "string")
+                        put("description", "Source tab. Defaults to the primary window's active tab.")
+                    }
+                    putJsonObject("panel") {
+                        put("type", "string")
+                        put("description", "Panel mode when creating a new pane: reuse (default), " +
+                                "horizontal_split, vertical_split, new_tab.")
+                    }
+                    putJsonObject("split_ratio") {
+                        put("type", "number")
+                        put("description", "Fraction of parent dimension the NEW pane gets " +
+                                "(0.05..0.95). Defaults to `mcpDefaultSplitRatio`.")
+                    }
+                    putJsonObject("working_dir") {
+                        put("type", "string")
+                        put("description", "Working directory for the new pane. Splits inherit " +
+                                "cwd via OSC 7 by default.")
+                    }
+                    putJsonObject("timeout_ms") {
+                        put("type", "integer")
+                        put("description", "Hard timeout in milliseconds. Default 120000, max 600000.")
+                    }
+                },
+                required = listOf("script")
+            )
+        ) { request ->
+            val args = request.arguments
+            val script = args.requireString("script")
+                ?: return@addTool errorResult("Missing required argument: script")
+            if (script.isEmpty()) {
+                return@addTool errorResult("'script' must not be empty")
+            }
+            val explicitPaneId = args.requireString("pane_id")
+            val requestedTabId = args.requireString("tab_id")
+            val panel = (args.requireString("panel") ?: "reuse").lowercase()
+            val workingDir = args.requireString("working_dir")
+            val userSettings = settingsManager.settings.value
+            val defaultTimeoutMs = userSettings.mcpRunCommandDefaultTimeoutMs
+                .coerceIn(MIN_RUN_COMMAND_TIMEOUT_MS, MAX_RUN_COMMAND_TIMEOUT_MS)
+            val timeoutMs = (args.optionalInt("timeout_ms") ?: defaultTimeoutMs)
+                .coerceIn(MIN_RUN_COMMAND_TIMEOUT_MS, MAX_RUN_COMMAND_TIMEOUT_MS)
+            val maxOutputBytes = userSettings.mcpRunCommandMaxOutputBytes.coerceAtLeast(1024)
+            val shellReadyTimeoutMs = userSettings.mcpRunCommandShellReadyTimeoutMs
+                .coerceAtLeast(0).toLong()
+
+            val state = if (requestedTabId != null) {
+                registry.findState(requestedTabId)
+                    ?: return@addTool errorResult("Unknown tab_id: $requestedTabId")
+            } else {
+                registry.primaryState()
+                    ?: return@addTool errorResult("No registered terminal window")
+            }
+            val tabId = requestedTabId
+                ?: state.activeTabId
+                ?: return@addTool errorResult("No active tab")
+
+            // Resolve target pane in priority order: explicit > cached > newly-created.
+            // freshlyCreated controls whether we wait for OSC 133;A before sending.
+            //
+            // The cached-or-create path runs under the per-tab cache lock so two
+            // concurrent calls with the same tabId can't both miss the cache and
+            // both create new panes (orphaning one). Explicit pane_id skips the
+            // lock — it doesn't read or mutate the cache.
+            var session: TerminalSession? = null
+            var paneId: String? = null
+            var freshlyCreated = false
+
+            if (explicitPaneId != null) {
+                session = state.findSession(tabId, explicitPaneId)
+                    ?: return@addTool errorResult(
+                        "Unknown pane_id '$explicitPaneId' in tab '$tabId'"
+                    )
+                paneId = explicitPaneId
+                // Keep the cache in sync with explicit use so later `panel:
+                // "reuse"` calls return the same pane the user is already
+                // driving. Idempotent under concurrent explicit calls.
+                registry.setScratchPane(tabId, explicitPaneId)
+            } else {
+                val resolveResult = registry.tabCacheLock(tabId).withLock {
+                    val cached = registry.getScratchPane(tabId)
+                    if (cached != null) {
+                        val cachedSession = state.findSession(tabId, cached)
+                        if (cachedSession != null) {
+                            return@withLock PaneResolution.Hit(cached, cachedSession, fresh = false)
+                        }
+                        // Stale entry (user closed the pane). Drop cache + per-pane mutex.
+                        registry.clearScratchPane(tabId, paneId = cached)
+                    }
+
+                    // Cache-miss path: create a new pane, cache it, return it.
+                    val configuredDefault = userSettings.mcpDefaultSplitRatio
+                    val requestedRatio = args.optionalFloat("split_ratio")
+                    val effectiveRatio = (requestedRatio ?: configuredDefault)
+                        .coerceIn(0.05f, 0.95f)
+                    // The setting only controls what `panel: "reuse"` decays
+                    // to on first creation; invalid setting values quietly
+                    // fall back to "horizontal_split" so users don't get an
+                    // error referring to their own setting as the bad value.
+                    val configuredPanel = userSettings.mcpRunCommandDefaultPanel.lowercase()
+                    val defaultPanel = if (configuredPanel in VALID_RUN_COMMAND_PANELS) {
+                        configuredPanel
+                    } else {
+                        "horizontal_split"
+                    }
+                    val effectivePanel = if (panel == "reuse") defaultPanel else panel
+                    val newPaneId = when (effectivePanel) {
+                        "horizontal_split" -> state.splitHorizontal(
+                            tabId = tabId,
+                            ratio = effectiveRatio,
+                            initialCommand = null
+                        )
+                        "vertical_split" -> state.splitVertical(
+                            tabId = tabId,
+                            ratio = effectiveRatio,
+                            initialCommand = null
+                        )
+                        "new_tab" -> state.createTab(
+                            workingDir = workingDir,
+                            initialCommand = null
+                        )
+                        else -> return@withLock PaneResolution.BadPanel(effectivePanel)
+                    } ?: return@withLock PaneResolution.CreateFailed
+                    val newSession = state.findSession(tabId, newPaneId)
+                        ?: state.findSession(newPaneId)
+                        ?: return@withLock PaneResolution.Unresolvable(newPaneId)
+                    registry.setScratchPane(tabId, newPaneId)
+                    PaneResolution.Hit(newPaneId, newSession, fresh = true)
+                }
+                when (resolveResult) {
+                    is PaneResolution.Hit -> {
+                        paneId = resolveResult.paneId
+                        session = resolveResult.session
+                        freshlyCreated = resolveResult.fresh
+                    }
+                    is PaneResolution.BadPanel -> return@addTool errorResult(
+                        "Unknown panel: '${resolveResult.value}'. Expected one of: reuse, " +
+                            "horizontal_split, vertical_split, new_tab. Check your `panel` " +
+                            "argument or the `mcpRunCommandDefaultPanel` setting."
+                    )
+                    PaneResolution.CreateFailed -> return@addTool errorResult(
+                        "Failed to create pane (terminal too small?)"
+                    )
+                    is PaneResolution.Unresolvable -> return@addTool errorResult(
+                        "Created pane ${resolveResult.paneId} but cannot resolve session"
+                    )
+                }
+            }
+
+            val resolvedPaneId = paneId!!
+            val resolvedSession = session!!
+
+            // Per-pane mutex queues concurrent calls FIFO. MCP transport allows
+            // overlapping JSON-RPC requests on one session, so without this two
+            // pipelined calls would interleave their scripts in the shell's stdin.
+            val mutex = registry.paneMutex(resolvedPaneId)
+            val startTimeMs = System.currentTimeMillis()
+            val outcome = mutex.withLock {
+                executeInPane(
+                    session = resolvedSession,
+                    script = script,
+                    timeoutMs = timeoutMs,
+                    freshlyCreated = freshlyCreated,
+                    shellReadyTimeoutMs = shellReadyTimeoutMs,
+                    maxOutputBytes = maxOutputBytes
+                )
+            }
+            val durationMs = System.currentTimeMillis() - startTimeMs
+
+            val result = RunCommandResult(
+                ok = outcome.error == null,
+                tabId = tabId,
+                paneId = resolvedPaneId,
+                exitCode = outcome.exitCode,
+                durationMs = durationMs,
+                output = outcome.output,
+                truncated = outcome.truncated,
+                error = outcome.error
+            )
+            successJson(json.encodeToString(RunCommandResult.serializer(), result))
+        }
+    }
+
+    /**
+     * Drive a single command on [session]: register an OSC 133 listener, wait
+     * for the prompt (only on freshly-created panes), write the script, then
+     * suspend until OSC 133;D OR the terminal enters the alternate buffer (TUI)
+     * OR [timeoutMs] elapses. Slices the captured output via absolute history
+     * line tracking so it survives scrollback wraparound during the command.
+     *
+     * Caller MUST hold the per-pane mutex; concurrent calls on the same pane
+     * would interleave writes to the shell's stdin.
+     */
+    private suspend fun executeInPane(
+        session: TerminalSession,
+        script: String,
+        timeoutMs: Int,
+        freshlyCreated: Boolean,
+        shellReadyTimeoutMs: Long,
+        maxOutputBytes: Int
+    ): RunOutcome {
+        val terminal = session.terminal
+        val textBuffer = session.textBuffer
+
+        val promptReadySignal = CompletableDeferred<Unit>()
+        val finishedSignal = CompletableDeferred<CommandFinish>()
+        // Snapshotted in onCommandStarted. -1 sentinel = "B never fired"; falls
+        // back to (historyAtSend, cursorYAtSend) sampled right before write.
+        var historyAtB = -1
+        var cursorYAtB = -1
+
+        val listener = object : CommandStateListener {
+            override fun onPromptStarted() {
+                // promptReadySignal is one-shot; subsequent A events (the next
+                // prompt that fires right after D) are ignored.
+                if (!promptReadySignal.isCompleted) promptReadySignal.complete(Unit)
+            }
+            override fun onCommandStarted() {
+                historyAtB = textBuffer.historyLinesCount
+                cursorYAtB = terminal.cursorY - 1
+            }
+            override fun onCommandFinished(exitCode: Int) {
+                if (!finishedSignal.isCompleted) {
+                    finishedSignal.complete(CommandFinish.Done(exitCode))
+                }
+            }
+        }
+        terminal.addCommandStateListener(listener)
+
+        return try {
+            // Freshly-created panes haven't seen their first prompt yet.
+            // Reused panes already received A from PROMPT_COMMAND after the
+            // previous D, so our listener wouldn't see another A — skip the wait.
+            if (freshlyCreated && shellReadyTimeoutMs > 0) {
+                val ready = withTimeoutOrNull(shellReadyTimeoutMs) {
+                    promptReadySignal.await()
+                }
+                if (ready == null) {
+                    log.debug(
+                        "run_command: OSC 133;A not seen within {} ms on fresh pane; " +
+                                "sending script anyway (shell integration may be missing)",
+                        shellReadyTimeoutMs
+                    )
+                }
+            }
+
+            // Fallback start mark, in case B never fires (shell-integration missing
+            // or a degenerate command path).
+            val historyAtSend = textBuffer.historyLinesCount
+            val cursorYAtSend = terminal.cursorY - 1
+
+            val toWrite = if (script.endsWith("\n")) script else script + "\n"
+            session.writeUserInput(toWrite)
+
+            val finish = withTimeoutOrNull(timeoutMs.toLong()) {
+                coroutineScope {
+                    // Alternate-screen poll runs alongside the OSC 133;D wait.
+                    // Whichever fires first completes finishedSignal. TUI detection
+                    // doesn't kill the process — the user can still send_input.
+                    val tuiPoller = launch {
+                        while (isActive) {
+                            if (textBuffer.isUsingAlternateBuffer) {
+                                if (!finishedSignal.isCompleted) {
+                                    finishedSignal.complete(CommandFinish.TuiDetected)
+                                }
+                                break
+                            }
+                            delay(TUI_POLL_INTERVAL_MS)
+                        }
+                    }
+                    val outcome = finishedSignal.await()
+                    tuiPoller.cancel()
+                    outcome
+                }
+            }
+
+            val historyAtEnd = textBuffer.historyLinesCount
+            val cursorYAtEnd = terminal.cursorY - 1
+            val startHistory = if (historyAtB >= 0) historyAtB else historyAtSend
+            val startCursorY = if (cursorYAtB >= 0) cursorYAtB else cursorYAtSend
+
+            when (finish) {
+                null -> {
+                    val sliced = sliceCommandOutput(
+                        textBuffer = textBuffer,
+                        startHistory = startHistory,
+                        startCursorY = startCursorY,
+                        endHistory = historyAtEnd,
+                        endCursorY = cursorYAtEnd,
+                        maxOutputBytes = maxOutputBytes
+                    )
+                    RunOutcome(
+                        exitCode = null,
+                        output = sliced.text,
+                        truncated = true,
+                        error = "Timed out after ${timeoutMs}ms waiting for command to finish. " +
+                                "Partial output captured."
+                    )
+                }
+                is CommandFinish.TuiDetected -> RunOutcome(
+                    exitCode = null,
+                    output = "",
+                    truncated = false,
+                    error = "TUI detected (alternate screen entered). Use send_input + " +
+                            "read_scrollback to drive the program, or rerun with " +
+                            "non-interactive flags."
+                )
+                is CommandFinish.Done -> {
+                    val sliced = sliceCommandOutput(
+                        textBuffer = textBuffer,
+                        startHistory = startHistory,
+                        startCursorY = startCursorY,
+                        endHistory = historyAtEnd,
+                        endCursorY = cursorYAtEnd,
+                        maxOutputBytes = maxOutputBytes
+                    )
+                    RunOutcome(
+                        exitCode = finish.exitCode,
+                        output = sliced.text,
+                        truncated = sliced.truncated,
+                        error = null
+                    )
+                }
+            }
+        } finally {
+            terminal.removeCommandStateListener(listener)
+        }
+    }
+
+    /**
+     * Capture the buffer slice covering the most-recent command's output.
+     *
+     * Coordinate system: absolute "history-line" numbers, captured as
+     * `historyLinesCount + cursorY-0-indexed` at B-time and D-time. Translating
+     * back to current-snapshot row indices uses the delta between [startHistory]
+     * and [endHistory], so the slice stays correct when output scrolls into
+     * history during the command.
+     *
+     * If the start mark scrolled past the history cap (very long outputs), the
+     * slice starts at the oldest retained history line. Caps the result at
+     * [maxOutputBytes] (counted in chars) and reports `truncated=true` then.
+     */
+    private fun sliceCommandOutput(
+        textBuffer: TerminalTextBuffer,
+        startHistory: Int,
+        startCursorY: Int,
+        endHistory: Int,
+        endCursorY: Int,
+        maxOutputBytes: Int
+    ): SlicedOutput {
+        val snapshot = textBuffer.createSnapshot()
+        val historyDelta = endHistory - startHistory
+        // First output row at end-snapshot time. May be negative (in history).
+        var startRow = startCursorY - historyDelta
+        // Last output row inclusive. cursorYAtEnd is the row where the *next*
+        // prompt will be drawn (after the final newline), so the last output
+        // line is one row above it.
+        val endRowInclusive = endCursorY - 1
+
+        val oldestAvailableRow = -snapshot.historyLinesCount
+        if (startRow < oldestAvailableRow) startRow = oldestAvailableRow
+        if (endRowInclusive < startRow) return SlicedOutput("", false)
+
+        val sb = StringBuilder()
+        var truncated = false
+        var row = startRow
+        while (row <= endRowInclusive) {
+            val line = snapshot.getLine(row).text.trimEnd()
+            // +1 for the newline that joins lines.
+            if (sb.length + line.length + 1 > maxOutputBytes) {
+                truncated = true
+                break
+            }
+            if (sb.isNotEmpty()) sb.append('\n')
+            sb.append(line)
+            row++
+        }
+        return SlicedOutput(sb.toString(), truncated)
+    }
+
+    private data class RunOutcome(
+        val exitCode: Int?,
+        val output: String,
+        val truncated: Boolean,
+        val error: String?
+    )
+
+    private sealed class CommandFinish {
+        data class Done(val exitCode: Int) : CommandFinish()
+        object TuiDetected : CommandFinish()
+    }
+
+    /**
+     * Result of the cache-locked pane resolution. Carries either a usable
+     * session or one of the discrete failure modes the caller needs to
+     * translate into MCP error responses. Sealed so the `when` in the
+     * caller is exhaustive — no silent fall-through.
+     */
+    private sealed class PaneResolution {
+        data class Hit(
+            val paneId: String,
+            val session: TerminalSession,
+            val fresh: Boolean
+        ) : PaneResolution()
+        data class BadPanel(val value: String) : PaneResolution()
+        object CreateFailed : PaneResolution()
+        data class Unresolvable(val paneId: String) : PaneResolution()
+    }
+
+    private data class SlicedOutput(val text: String, val truncated: Boolean)
+
     // -----------------------------------------------------------------
     // Tool: read_debug_console
     // -----------------------------------------------------------------
@@ -1333,6 +1790,22 @@ class BossTermMcpServer(
         val paneId: String?
     )
 
+    @Serializable
+    data class RunCommandResult(
+        val ok: Boolean,
+        val tabId: String,
+        val paneId: String,
+        /** Process exit code from OSC 133;D, or null on timeout / TUI / shell-integration missing. */
+        val exitCode: Int?,
+        val durationMs: Long,
+        /** Captured stdout/stderr, ANSI-stripped per the emulator's per-line `.text`. */
+        val output: String,
+        /** True when output was capped at `mcpRunCommandMaxOutputBytes` or the command timed out. */
+        val truncated: Boolean,
+        /** Non-null on timeout, TUI detection, or other recoverable failures. */
+        val error: String?
+    )
+
     @Serializable
     data class DebugConsoleChunk(
         val index: Int,
@@ -1367,6 +1840,33 @@ class BossTermMcpServer(
         private const val DEFAULT_SEARCH_MAX_MATCHES = 50
         private const val DEFAULT_DEBUG_CHUNKS = 100
 
+        /**
+         * Per-call clamp on `timeout_ms`. Even if a user sets
+         * `mcpRunCommandDefaultTimeoutMs` outside these bounds, every call is
+         * still coerced into `[MIN, MAX]` — protects against typos in
+         * settings.json and against agents passing absurd values.
+         */
+        private const val MIN_RUN_COMMAND_TIMEOUT_MS = 100
+        /** Upper bound — a single MCP request that hogs the pane for >10 min is suspect. */
+        private const val MAX_RUN_COMMAND_TIMEOUT_MS = 600_000
+
+        /**
+         * How often `run_command` checks `isUsingAlternateBuffer` while waiting
+         * for OSC 133;D. 100ms is the typical human-perception threshold and
+         * cheap relative to the surrounding wait. Not user-tunable — exposing
+         * it would invite agents to set it sky-high and lose TUI detection.
+         */
+        private const val TUI_POLL_INTERVAL_MS = 100L
+
+        /**
+         * Valid values for `mcpRunCommandDefaultPanel`. Excludes "reuse"
+         * intentionally — `reuse` only makes sense as a per-call argument
+         * that defers to this setting. If a user puts `reuse` here we fall
+         * back silently to `horizontal_split`.
+         */
+        private val VALID_RUN_COMMAND_PANELS: Set<String> =
+            setOf("horizontal_split", "vertical_split", "new_tab")
+
         /**
          * Unprefixed built-in read tool names, in display order. Single source of
          * truth shared with the settings UI so toggle rows can't drift from the
@@ -1386,14 +1886,40 @@ class BossTermMcpServer(
         val BUILT_IN_WRITE_TOOLS: List<String> = listOf(
             "send_input",
             "send_signal",
-            "run_in_panel"
+            "run_in_panel",
+            "run_command"
         )
 
         /**
          * Tools that may never be disabled. `manage_tools` is the only escape hatch
          * once everything else has been turned off, so disabling it would brick the
-         * MCP surface.
+         * MCP surface. `run_command` is the tool the user-global PreToolUse hook
+         * (~/.claude/hooks/prefer-bossterm.sh) routes Bash calls to, so disabling
+         * it would leave Claude staring at a "deny: use run_command" message with
+         * no run_command on the wire.
+         */
+        val UNDISABLABLE_TOOLS: Set<String> = setOf("manage_tools", "run_command")
+
+        /**
+         * Surfaced to clients in the MCP `initialize` response so they see this
+         * preference in their system prompt without any user-side config.
+         *
+         * The Kotlin MCP SDK 0.8.3 puts the string here into [io.modelcontextprotocol
+         * .kotlin.sdk.server.ServerSession.instructions], which the spec says
+         * clients SHOULD surface to the LLM as system-prompt guidance. Both
+         * Claude Code and Codex do.
          */
-        val UNDISABLABLE_TOOLS: Set<String> = setOf("manage_tools")
+        @Suppress("ConstPropertyName")
+        const val BOSSTERM_MCP_INSTRUCTIONS: String =
+            "The `run_command` tool runs a shell command in a visible BossTerm pane and " +
+                "returns its stdout/stderr, exit code, and duration. Prefer it over your " +
+                "built-in shell tool when this server is attached — the user sees commands " +
+                "run in their actual terminal, output is still returned to you, and " +
+                "consecutive calls reuse the same pane so the UI stays clean. Pass back " +
+                "the `pane_id` from a prior call to keep using the same pane. Use " +
+                "`run_in_panel` only when you explicitly want a fire-and-forget visible " +
+                "action (long-running dev servers, REPLs). If `run_command` returns " +
+                "`error: \"TUI detected\"`, switch to `send_input` + `read_scrollback` to " +
+                "drive the program — do not retry the same command."
     }
 }
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
index 1ab3e3d2..dea5e38c 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
@@ -6,6 +6,8 @@ import ai.rever.bossterm.compose.tabs.TerminalTab
 import kotlinx.coroutines.flow.MutableStateFlow
 import kotlinx.coroutines.flow.StateFlow
 import kotlinx.coroutines.flow.asStateFlow
+import kotlinx.coroutines.sync.Mutex
+import java.util.concurrent.ConcurrentHashMap
 import java.util.concurrent.CopyOnWriteArrayList
 
 /**
@@ -147,6 +149,56 @@ object McpTerminalRegistry {
         }
     }
 
+    // -----------------------------------------------------------------
+    // run_command pane reuse — cache the "MCP scratch pane" per tab so
+    // consecutive run_command calls stack into one visible split instead
+    // of spawning a new one each time. Eviction is lazy: the cache holds
+    // a hint, and run_command verifies the paneId still resolves via
+    // state.findSession before reusing. No listener wiring needed.
+    //
+    // Per-pane Mutex serializes concurrent run_command calls hitting the
+    // same pane. Without it, two pipelined calls would interleave their
+    // scripts in the shell's stdin buffer. Stale mutex entries can pile up
+    // when panes close without clearScratchPane(tabId, paneId) evicting
+    // them, but growth is limited by pane-creation rate (human-scale).
+    // -----------------------------------------------------------------
+
+    private val mcpScratchPanes = ConcurrentHashMap<String /*tabId*/, String /*paneId*/>()
+    private val paneMutexes = ConcurrentHashMap<String /*paneId*/, Mutex>()
+    private val tabCacheLocks = ConcurrentHashMap<String /*tabId*/, Mutex>()
+
+    /** Most recent MCP scratch pane recorded for [tabId], or null if none. */
+    internal fun getScratchPane(tabId: String): String? = mcpScratchPanes[tabId]
+
+    /** Record [paneId] as the active MCP scratch pane for [tabId]. */
+    internal fun setScratchPane(tabId: String, paneId: String) {
+        mcpScratchPanes[tabId] = paneId
+    }
+
+    /**
+     * Drop the recorded scratch pane for [tabId] (called when the pane is gone).
+     * Pass [paneId] to also evict the stale per-pane mutex, preventing the
+     * mutex map from accumulating entries over long-lived sessions.
+     */
+    internal fun clearScratchPane(tabId: String, paneId: String? = null) {
+        mcpScratchPanes.remove(tabId)
+        if (paneId != null) paneMutexes.remove(paneId)
+    }
+
+    /** Per-pane mutex; created on first use. */
+    internal fun paneMutex(paneId: String): Mutex =
+        paneMutexes.computeIfAbsent(paneId) { Mutex() }
+
+    /**
+     * Per-tab lock that guards the scratch-pane read-or-create-and-cache
+     * critical section. Without this, two concurrent run_command calls with
+     * the same tabId can each miss the cache, each create a fresh pane, and
+     * orphan one of them. Created on first use; held briefly so contention
+     * is low.
+     */
+    internal fun tabCacheLock(tabId: String): Mutex =
+        tabCacheLocks.computeIfAbsent(tabId) { Mutex() }
+
     private fun persist(targets: Set<McpAttachTarget>) {
         // Sort by enum-declaration order so settings.json is deterministic
         // across saves (kotlinx.serialization writes whatever the Set
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt
index bb507c70..2c314a45 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt
@@ -779,6 +779,65 @@ data class TerminalSettings(
      */
     val mcpDefaultSplitRatio: Float = 0.3f,
 
+    /**
+     * Default hard timeout (in milliseconds) for `run_command`, applied when
+     * the caller doesn't pass `timeout_ms`. Default and per-call overrides
+     * are both clamped to `100..600_000` server-side. Lower this if you'd
+     * rather see "timed out" results sooner; raise it for long builds.
+     *
+     * Default `120_000` (2 minutes) covers the majority of dev workflows
+     * (tests, builds, package installs); the `600_000` clamp is the 10-minute
+     * ceiling on how long a runaway script can hog the pane. Long builds
+     * (e.g. cold gradle builds) may need a higher value: pass `timeout_ms`
+     * per call, or raise the default.
+     */
+    val mcpRunCommandDefaultTimeoutMs: Int = 120_000,
+
+    /**
+     * Cap on the captured `output` field returned by `run_command`. Beyond
+     * this, output is truncated and the response carries `truncated: true`.
+     *
+     * Default `120_000` (~120 KB) is sized to fit under `mcpMaxAnswerChars`
+     * (`150_000` soft response cap) with headroom for the JSON wrapper, so
+     * a maxed-out `run_command` reply never trips the response-shortening
+     * ladder. Raise it (and `mcpMaxAnswerChars`) together for tooling that
+     * emits very large dumps; lower it for tight-context clients.
+     *
+     * Minimum enforced: `1024` bytes — smaller values are silently raised
+     * so a single typical output line still fits.
+     *
+     * Advanced setting — no UI control, edit settings.json directly.
+     */
+    val mcpRunCommandMaxOutputBytes: Int = 120_000,
+
+    /**
+     * Fallback delay `run_command` waits for OSC 133;A on a freshly-created
+     * pane before sending the script anyway. Only kicks in when the user's
+     * shell hasn't been configured for OSC 133 prompt-ready notifications,
+     * so most users never see this matter.
+     *
+     * Default `1_500` ms. Raise it if your shell rc files are very slow to
+     * load; lower it if you want faster "shell integration missing" feedback.
+     * Set `0` to skip the wait entirely — the script is sent immediately on
+     * a freshly-created pane (cached panes never wait regardless).
+     *
+     * Advanced setting — no UI control, edit settings.json directly.
+     */
+    val mcpRunCommandShellReadyTimeoutMs: Int = 1_500,
+
+    /**
+     * Panel mode `run_command` uses when it has to create a new MCP scratch
+     * pane (no cached pane for the tab, no explicit `pane_id`, and the
+     * caller passed `panel: "reuse"` or omitted `panel`). One of:
+     * `horizontal_split` (default — splits below the focused pane),
+     * `vertical_split` (splits beside), or `new_tab` (opens a fresh tab).
+     *
+     * Subsequent `run_command` calls reuse the pane created here, so this
+     * is "what does the first call's UI look like" — it doesn't kick in
+     * every call.
+     */
+    val mcpRunCommandDefaultPanel: String = "horizontal_split",
+
     /**
      * Names (enum `.name`) of [ai.rever.bossterm.compose.mcp.McpAttachTarget]s
      * that this BossTerm endpoint is registered with via the user's
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/sections/McpSettingsSection.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/sections/McpSettingsSection.kt
index c3a5715d..fd4fc857 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/sections/McpSettingsSection.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/sections/McpSettingsSection.kt
@@ -36,6 +36,7 @@ import ai.rever.bossterm.compose.settings.SettingsTheme.AccentColor
 import ai.rever.bossterm.compose.settings.SettingsTheme.TextMuted
 import ai.rever.bossterm.compose.settings.SettingsTheme.TextPrimary
 import ai.rever.bossterm.compose.settings.TerminalSettings
+import ai.rever.bossterm.compose.settings.components.SettingsDropdown
 import ai.rever.bossterm.compose.settings.components.SettingsNumberInput
 import ai.rever.bossterm.compose.settings.components.SettingsSection
 import ai.rever.bossterm.compose.settings.components.SettingsToggle
@@ -102,7 +103,7 @@ fun McpSettingsSection(
             )
 
             ai.rever.bossterm.compose.settings.components.SettingsSlider(
-                label = "Default Split Size for `run_in_panel`",
+                label = "Default Split Size for `run_in_panel` / `run_command`",
                 value = settings.mcpDefaultSplitRatio,
                 onValueChange = { onSettingsChange(settings.copy(mcpDefaultSplitRatio = it)) },
                 onValueChangeFinished = onSettingsSave,
@@ -112,10 +113,37 @@ fun McpSettingsSection(
                 valueRange = 0.05f..0.95f,
                 steps = 17, // 0.05, 0.10, ..., 0.95 = 19 stops, 17 internal steps
                 valueDisplay = { "${(it * 100).toInt()}%" },
-                description = "When an MCP agent opens a split via `run_in_panel` without " +
-                        "specifying split_ratio, the new pane gets this fraction of the " +
-                        "parent's size. Smaller values (~30%) keep the agent's main pane " +
-                        "visible; larger values give the script more real estate.",
+                description = "When an MCP agent opens a split without specifying split_ratio, " +
+                        "the new pane gets this fraction of the parent's size. Smaller values " +
+                        "(~30%) keep the agent's main pane visible; larger values give the " +
+                        "script more real estate.",
+                enabled = settings.mcpEnabled
+            )
+
+            SettingsDropdown(
+                label = "Default Panel Mode for `run_command`",
+                options = listOf("horizontal_split", "vertical_split", "new_tab"),
+                selectedOption = settings.mcpRunCommandDefaultPanel,
+                onOptionSelected = {
+                    onSettingsChange(settings.copy(mcpRunCommandDefaultPanel = it))
+                },
+                description = "Where `run_command` creates its scratch pane on the first call " +
+                        "in a tab. Subsequent calls reuse that pane regardless of this setting. " +
+                        "`horizontal_split` puts a strip below the agent's pane; `vertical_split` " +
+                        "puts it beside; `new_tab` opens a fresh tab.",
+                enabled = settings.mcpEnabled
+            )
+
+            SettingsNumberInput(
+                label = "Default `run_command` Timeout (ms)",
+                value = settings.mcpRunCommandDefaultTimeoutMs,
+                onValueChange = {
+                    onSettingsChange(settings.copy(mcpRunCommandDefaultTimeoutMs = it))
+                },
+                range = 100..600_000,
+                description = "Hard timeout `run_command` uses when the caller doesn't pass " +
+                        "`timeout_ms` explicitly. 120000 = 2 minutes (default). Range " +
+                        "100..600000. Per-call values from the agent still override this.",
                 enabled = settings.mcpEnabled
             )
         }
@@ -184,9 +212,10 @@ private fun ExposedToolsSection(
         Text(
             text = "Pick which built-in BossTerm MCP tools clients can call. Toggling here " +
                     "is equivalent to calling the `manage_tools` MCP tool — both update the " +
-                    "same setting and apply live without restarting the server. The " +
-                    "`manage_tools` tool itself is always exposed so disabling everything " +
-                    "leaves a way back.",
+                    "same setting and apply live without restarting the server. Two tools " +
+                    "are always exposed and cannot be disabled: `manage_tools` (so there's " +
+                    "always a way to re-enable the others from MCP), and `run_command` (the " +
+                    "Claude Code PreToolUse hook depends on routing Bash to it).",
             color = TextMuted,
             fontSize = 12.sp,
             modifier = Modifier.padding(bottom = 8.dp)
@@ -207,12 +236,17 @@ private fun ExposedToolsSection(
             Spacer(modifier = Modifier.height(8.dp))
             ToolGroupLabel("Write tools")
             BossTermMcpServer.BUILT_IN_WRITE_TOOLS.forEach { name ->
+                // Reserved tools (run_command) cannot be disabled — the Claude
+                // Code PreToolUse hook routes Bash to it, so taking it off the
+                // wire would brick that integration. Show the row so users
+                // know it exists, but gate the toggle.
+                val reserved = name in BossTermMcpServer.UNDISABLABLE_TOOLS
                 SettingsToggle(
                     label = name,
                     checked = name !in disabled,
                     onCheckedChange = { setEnabled(name, it) },
                     description = toolDescription(name),
-                    enabled = settings.mcpEnabled
+                    enabled = settings.mcpEnabled && !reserved
                 )
             }
         } else {
@@ -223,6 +257,18 @@ private fun ExposedToolsSection(
                 fontSize = 11.sp
             )
         }
+
+        Spacer(modifier = Modifier.height(8.dp))
+        Text(
+            // Always-exposed meta-tool isn't in BUILT_IN_READ_TOOLS or
+            // BUILT_IN_WRITE_TOOLS, so the loops above never render it.
+            // Surface its existence here so users know it's there.
+            text = "Plus the always-exposed meta-tool `manage_tools` — lets clients " +
+                    "enable/disable the tools above at runtime. It cannot be hidden from " +
+                    "this surface.",
+            color = TextMuted,
+            fontSize = 11.sp
+        )
     }
 }
 
@@ -247,7 +293,10 @@ private fun toolDescription(name: String): String = when (name) {
     "read_debug_console" -> "Read recent entries from a tab's debug-data buffer."
     "send_input" -> "Write raw text (including newlines) to a tab/pane's stdin."
     "send_signal" -> "Send ctrl_c / ctrl_d / ctrl_z to a tab/pane."
-    "run_in_panel" -> "Open a new tab or split pane and run a script in it."
+    "run_in_panel" -> "Open a new tab or split pane and run a script in it (fire-and-forget)."
+    "run_command" ->
+        "Run a shell command in a visible pane and return its output + exit code. " +
+            "Reserved — cannot be disabled (the Claude Code PreToolUse hook depends on it)."
     else -> name
 }
 
diff --git a/docs/mcp-server.md b/docs/mcp-server.md
index 57082ece..86b6be41 100644
--- a/docs/mcp-server.md
+++ b/docs/mcp-server.md
@@ -62,6 +62,29 @@ The full advertised URL is logged at startup:
 INFO  BossTermMcpManager - BossTerm MCP server ready: http://127.0.0.1:7676/ (SSE transport, N state(s) registered)
 ```
 
+### `~/.bossterm/mcp.port` marker file
+
+Every successful bind atomically writes the **bound** port (after fallback,
+not the configured one) to `~/.bossterm/mcp.port`. It's deleted on a clean
+stop. The marker exists so external tooling — primarily the user-global
+Claude Code PreToolUse hook described below — can decide whether BossTerm
+MCP is reachable with a stat + cheap TCP probe (`nc -z 127.0.0.1 <port>`),
+instead of an HTTP request with a timeout.
+
+The marker is an optimization, not a security boundary. Any local user
+process can already reach the loopback endpoint while it's running.
+
+### Initialize-time instructions
+
+The server's `initialize` response includes an MCP-spec `instructions` string
+telling the client to prefer `run_command` over its built-in shell tool.
+Claude Code and Codex both surface MCP server instructions in the model's
+system prompt at session start, so the preference is communicated
+out-of-band — no per-project config required for it to take effect.
+
+The full string is the `BOSSTERM_MCP_INSTRUCTIONS` constant in
+[`BossTermMcpServer.kt`](../compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt).
+
 ## Built-in tools
 
 Tool names are unprefixed below. If the embedder sets
@@ -282,6 +305,65 @@ with shell startup.
   `paneId` is `null` for `new_tab`; for splits it's the new pane's session
   id, which you can pass back as `pane_id` to other tools.
 
+### `run_command` (write tool)
+
+Blocking variant of `run_in_panel`: runs a shell command in a visible BossTerm
+pane and waits for OSC 133;D before returning the **exit code, captured
+stdout/stderr, and duration**. Consecutive calls in the same tab reuse one
+"MCP scratch pane" so the UI doesn't accumulate splits.
+
+This is the tool to prefer over your client's built-in shell tool when the
+BossTerm MCP is attached — the user sees commands run in their actual
+terminal *and* the output still comes back to the agent. The server
+advertises that preference in its [initialize-time instructions](#initialize-time-instructions).
+
+Requires OSC 133 shell integration on the user's shell. See
+[`.claude/rules/shell-integration.md`](../.claude/rules/shell-integration.md).
+
+- Required:
+  - `script` (string) — shell command. A trailing newline is added if absent.
+- Optional:
+  - `pane_id` (string) — reuse a specific MCP pane. Defaults to the pane this
+    tool last created for `tab_id`; if none, a new pane is created.
+  - `tab_id` (string) — source tab. Defaults to the primary window's active
+    tab.
+  - `panel` — panel mode used **only when creating a new pane**: `"reuse"`
+    (default; defers to the `mcpRunCommandDefaultPanel` setting for the first call),
+    `"horizontal_split"`, `"vertical_split"`, or `"new_tab"`.
+  - `split_ratio` (number, `0.05..0.95`) — only used on the first call that
+    creates the pane. Defaults to `mcpDefaultSplitRatio`.
+  - `working_dir` (string) — only used when creating a new pane.
+  - `timeout_ms` (integer, `100..600_000`) — hard timeout. Defaults to `mcpRunCommandDefaultTimeoutMs` (`120_000`).
+- Returns:
+  ```json
+  {
+    "ok": true,
+    "tabId": "<uuid>",
+    "paneId": "<uuid>",
+    "exitCode": 0,
+    "durationMs": 42,
+    "output": "captured stdout/stderr (ANSI-stripped)",
+    "truncated": false,
+    "error": null
+  }
+  ```
+  - `exitCode` is `null` on timeout, TUI detection, or shell-integration
+    missing (no OSC 133;D ever arrived).
+  - `output` is capped at `mcpRunCommandMaxOutputBytes` (default `120_000`);
+    `truncated` becomes `true` when the cap is hit *or* the command timed out.
+  - `error` is set when `ok` is `false`. Notable values:
+    - `"TUI detected (alternate screen entered). Use send_input + read_scrollback ..."`
+      — the command entered an alternate-screen program (`vim`, `less`,
+      `htop`, `git commit` without `-m`, etc.). The pane stays alive so the
+      caller can drive it via `send_input` + `read_scrollback`. **Do not
+      retry the same call**: it will trip TUI detection again.
+    - `"Timed out after Nms ..."` — `timeout_ms` elapsed before OSC 133;D
+      arrived. Partial output is still captured.
+
+Concurrent calls on the same `pane_id` are serialized FIFO (per-pane mutex)
+so two pipelined commands cannot interleave their input in the shell's stdin
+buffer.
+
 ## `manage_tools` meta-tool
 
 Always exposed. Use it to introspect or change which built-in tools are
@@ -305,7 +387,8 @@ omitted when `allowWriteTools = false`).
     { "name": "read_debug_console",  "enabled": true },
     { "name": "send_input",          "enabled": false },
     { "name": "send_signal",         "enabled": true },
-    { "name": "run_in_panel",        "enabled": true }
+    { "name": "run_in_panel",        "enabled": true },
+    { "name": "run_command",         "enabled": true }
   ]
 }
 ```
@@ -324,8 +407,16 @@ Response:
 { "ok": true }
 ```
 
-Unknown names error out before any change is written. `manage_tools` itself
-is reserved and cannot be disabled — that would brick the surface.
+Unknown names error out before any change is written. Two tools are reserved
+and cannot be disabled:
+
+- `manage_tools` — disabling it would brick the surface (no way to re-enable
+  anything from MCP).
+- `run_command` — the user-global Claude Code PreToolUse hook
+  (`~/.claude/hooks/prefer-bossterm.sh`, see
+  [Using as Claude Code's default shell](#using-as-claude-codes-default-shell))
+  routes `Bash` calls to it. Disabling it would leave Claude facing a
+  "use run_command instead" message with no `run_command` on the wire.
 
 ## Attaching to AI CLIs
 
@@ -357,6 +448,84 @@ under the button. **Codex caveat**: registration succeeds with codex-cli 0.130,
 but Codex currently speaks streamable HTTP only, so the runtime connection
 will fail against the SSE endpoint until BossTerm's MCP SDK is upgraded.
 
+## Using as Claude Code's default shell
+
+`run_command` plus the initialize-time `instructions` field steer Claude Code
+toward the MCP tool whenever the BossTerm MCP is attached. For a stronger
+guarantee — Claude *can't* fall back to its built-in `Bash` when BossTerm
+is running — wire up a user-global `PreToolUse` hook. The hook reads the
+[`~/.bossterm/mcp.port` marker](#bosstermmcpport-marker-file) to decide
+whether to enforce; if the file is missing or the port isn't listening, it
+exits silently so non-BossTerm sessions are unaffected.
+
+**1. Hook script** at `~/.claude/hooks/prefer-bossterm.sh` (chmod +x):
+
+```sh
+#!/bin/sh
+set -e
+cat >/dev/null 2>&1 || true  # discard the stdin payload Claude sends
+marker="$HOME/.bossterm/mcp.port"
+[ -f "$marker" ] || exit 0
+port=$(cat "$marker" 2>/dev/null) || exit 0
+case "$port" in ''|*[!0-9]*) exit 0 ;; esac
+# Without nc the marker can't be verified as live, so let Bash through.
+command -v nc >/dev/null 2>&1 || exit 0
+nc -z 127.0.0.1 "$port" >/dev/null 2>&1 || exit 0
+cat <<'JSON'
+{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"BossTerm MCP is attached. Use mcp__bossterm__run_command instead of Bash. Pass back pane_id from a prior call to reuse the pane."}}
+JSON
+```
+
+**2. Register it** in `~/.claude/settings.json`:
+
+```json
+{
+  "hooks": {
+    "PreToolUse": [
+      {
+        "matcher": "Bash",
+        "hooks": [
+          { "type": "command", "command": "$HOME/.claude/hooks/prefer-bossterm.sh" }
+        ]
+      }
+    ]
+  }
+}
+```
+
+**3. Optional `~/.claude/CLAUDE.md` addendum** (advisory, anchors the
+preference even before the hook fires on the first attempt):
+
+```markdown
+### Shell execution
+
+If `mcp__bossterm__run_command` is available, prefer it over `Bash`.
+Reuse the same pane by passing the prior call's `pane_id`. If it returns
+`error: "TUI detected"`, switch to `send_input` + `read_scrollback`.
+```
+
+Behavior with all three pieces in place:
+
+- BossTerm closed → marker absent → hook exits silently → Claude uses
+  `Bash` normally in every project.
+- BossTerm open with MCP enabled → hook denies `Bash` calls with the routing
+  reason → Claude retries with `mcp__bossterm__run_command` → command runs
+  in a visible pane and output returns to Claude.
+- BossTerm killed mid-session → marker may go stale, but the `nc` probe fails →
+  next `Bash` call passes through (degrades gracefully without restarting Claude).
+
+Caveats:
+
+- **Claude and BossTerm must run as the same OS user.** The hook reads
+  `$HOME/.bossterm/mcp.port`; BossTerm writes to `${user.home}/.bossterm/mcp.port`.
+  Under `sudo claude` or `su` those resolve to different paths, the hook
+  finds no marker, and routing degrades to no-op (Bash still works — it
+  just doesn't go through BossTerm).
+- **`nc` must be available** on the user's `PATH` for the probe step. The
+  hook fails open (lets Bash through) when `nc` is missing, since it
+  can't verify the marker isn't stale. macOS and most Linux distributions
+  ship `nc` by default; minimal Alpine containers don't.
+
 ## Settings reference
 
 All MCP-related fields live in
@@ -368,9 +537,13 @@ and are persisted to `~/.bossterm/settings.json`.
 | `mcpEnabled`              | `Boolean`           | `false`     | Bind the MCP server. Toggles the engine on/off live.                                                   |
 | `mcpPort`                 | `Int`               | `7676`      | Localhost TCP port. Changing while enabled performs stop-then-start.                                   |
 | `mcpShowStatusIndicator`  | `Boolean`           | `true`      | Show the green "BossTerm MCP on" pill in the tab bar.                                                  |
-| `mcpDefaultSplitRatio`    | `Float`             | `0.3`       | Default new-pane size for `run_in_panel` splits when `split_ratio` is omitted. Range `0.05..0.95`.     |
+| `mcpDefaultSplitRatio`    | `Float`             | `0.3`       | Default new-pane size for `run_in_panel` / `run_command` splits when `split_ratio` is omitted. Range `0.05..0.95`. |
+| `mcpRunCommandDefaultTimeoutMs` | `Int`         | `120_000`   | Default hard timeout for `run_command` when the caller doesn't pass `timeout_ms`. Clamped per-call to `100..600_000`. |
+| `mcpRunCommandMaxOutputBytes`   | `Int`         | `120_000`   | Cap on the captured `output` field returned by `run_command`. Beyond it, output is truncated and `truncated: true` is set. Sized to fit under `mcpMaxAnswerChars` (150_000) with JSON-wrapper headroom; raise both together for tooling that emits very large dumps. Minimum enforced: 1024. Advanced; no UI control. |
+| `mcpRunCommandShellReadyTimeoutMs` | `Int`      | `1_500`     | Fallback delay `run_command` waits for OSC 133;A on a freshly-created pane before sending anyway. Set `0` to skip the wait entirely. Advanced; no UI control. |
+| `mcpRunCommandDefaultPanel`     | `String`      | `"horizontal_split"` | Panel mode `run_command` uses when it has to create a new MCP scratch pane and the caller passed `panel: "reuse"` (or omitted it). One of `horizontal_split`, `vertical_split`, `new_tab`. |
 | `mcpAttachedTo`           | `Set<String>`       | `{}`        | Stable `persistenceKey`s (e.g. `"CLAUDE_CODE"`) of attached AI CLIs. Used for silent re-attach.        |
-| `disabledMcpTools`        | `Set<String>`       | `{}`        | Unprefixed built-in tool names hidden from clients. Edited via the UI or `manage_tools`.               |
+| `disabledMcpTools`        | `Set<String>`       | `{}`        | Unprefixed built-in tool names hidden from clients. Edited via the UI or `manage_tools`. `manage_tools` and `run_command` are reserved and ignored if added by hand. |
 | `mcpMaxAnswerChars`       | `Int`               | `150_000`   | Soft ceiling on tool response size. When exceeded, the tool returns a progressively smaller summary instead of the full payload — see [Response shortening](#response-shortening). Advanced; no UI control. |
 | `mcpConfigured`           | `Boolean`           | `false`     | Internal first-launch marker. Once `true`, embedder defaults no longer override the user's choice.     |
 
@@ -705,3 +878,9 @@ auto-refreshes once the SDK is upgraded.
 **Port stuck after Force-Quit.** The 1.5 s Ktor shutdown grace is normally
 enough, but a `kill -9` leaves the port in `TIME_WAIT`. Wait ~30 s or change
 the port temporarily.
+
+**Multiple BossTerm processes at once.** Unsupported. Each process writes
+`~/.bossterm/mcp.port` on bind; the last writer wins. Older instances are
+still reachable on their own ports, but the Claude Code PreToolUse hook
+(and anything else reading the marker) only ever sees one of them. Close
+extras before relying on the routing.

From 4f65af29639aa5b042114d30bd5178e917c81003 Mon Sep 17 00:00:00 2001
From: Shivang <shivang.iitk@gmail.com>
Date: Wed, 20 May 2026 01:54:11 -0700
Subject: [PATCH 2/4] feat(mcp): route tools to the calling client's window,
 not first-registered
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Today McpTerminalRegistry.primaryState() returns the first-registered
window, so run_command / run_in_panel / get_active_tab without a tab_id
target the wrong window whenever multiple BossTerm windows are open —
the user sees commands run somewhere they aren't looking. The correct
invariant is "the window the calling MCP client is running INSIDE."

New file ProcessAncestry.kt resolves that window by:

1. Resolving the PID owning the loopback TCP socket on the client side
   (lsof on macOS; /proc/net/tcp + /proc/<pid>/fd on Linux).
2. Walking the parent process tree from that PID.
3. Finding the first PID on that walk that equals a tracked pane's
   shell PID, across all registered states.
4. Returning the state owning that pane — or null if no ancestor
   matches (client running outside any BossTerm pane: Claude Desktop,
   external Inspector, CI…).
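
The walk in steps 2-4 can be sketched as below. This is a minimal
illustration, not the code in ProcessAncestry.kt: `resolveWindow`, the
generic window type `W`, and the injected `parentOf` lookup are
hypothetical names chosen so the loop can run without lsof or /proc.

```kotlin
// Hypothetical stand-in for the resolver's parent walk. The pane map and
// the parent lookup are injected, so no lsof, ps, or /proc access is needed.
fun <W> resolveWindow(
    clientPid: Long,
    windowsByShellPid: Map<Long, W>,
    parentOf: (Long) -> Long?,
    maxHops: Int = 16
): W? {
    var pid = clientPid
    repeat(maxHops) {
        // A hit on a tracked shell PID identifies the client's window.
        windowsByShellPid[pid]?.let { return it }
        // Stop at an unknown parent, at init (PID 1), or on a self-loop.
        val parent = parentOf(pid) ?: return null
        if (parent <= 1L || parent == pid) return null
        pid = parent
    }
    // Hop cap reached without a match.
    return null
}
```

Returning null is what lets the caller fall back to the existing
primaryState() lookup.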

BossTermMcpManager's Ktor interceptor calls the resolver on every
incoming MCP request and records the result in
McpTerminalRegistry.lastResolvedClientWindow (AtomicReference). The
three tool handlers that previously used registry.primaryState()
(get_active_tab, run_in_panel, run_command) now read the resolved
client window first, with the existing primaryState() as fallback.

Design notes:

- Resolution failures are silent: the caller falls back to the last
  successful resolution, or to first-registered if there is none. A
  parent-walk hiccup never crashes a tool call.
- Concurrent multi-client requests race last-writer-wins, which is
  acceptable for v1: the dominant single-client case stays consistent.
- Resolver runs INSIDE the existing intercept block, AFTER the
  DNS-rebinding check, so a hostile request never triggers lsof.
- Stale lastResolvedClientWindow entries (window closed) are detected
  lazily on read (state is no longer in `states` list) and cleared.
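
The "null is a no-op" write and the lazy invalidation on read can be
sketched together as below. `CallerWindowCache` and its generic window
type are hypothetical; the real logic lives in McpTerminalRegistry and
works on TabbedTerminalState.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model of the registry's last-resolved-window slot.
class CallerWindowCache<W : Any>(private val live: CopyOnWriteArrayList<W>) {
    private val last = AtomicReference<W?>(null)

    // A non-resolving request (null) must not erase an earlier resolution.
    fun record(window: W?) {
        if (window != null) last.set(window)
    }

    // Stale entries are detected on read: if the cached window is no longer
    // registered, clear it (only if nobody replaced it meanwhile).
    fun current(): W? {
        val cached = last.get() ?: return null
        if (cached !in live) {
            last.compareAndSet(cached, null)
            return null
        }
        return cached
    }
}
```

Using compareAndSet for the clear means a concurrent writer that has
just recorded a fresh window is not overwritten by the stale-entry
cleanup.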

Platform support: macOS + Linux. Windows and unknown platforms skip
the resolver (resolveClientWindow returns null) and the server falls
back to first-registered.

Cost: ~50-150ms per request on macOS (one lsof shell-out, then one ps
shell-out per ancestor, with real depth up to 4 hops), ~5-15ms on Linux
(pure /proc reads).

Docs: new "Caller-window resolution" subsection in mcp-server.md
covering mechanism, scenarios, platform support, and the multi-client
race caveat.

Verification:
- ./gradlew :compose-ui:compileKotlinDesktop --no-daemon compiles cleanly.
- Live end-to-end testing needs a restart of BossTerm (currently
  running binary is pre-change) and a multi-window scenario — manual,
  reviewer to confirm before merge.

Generated with [Claude Code](https://claude.com/claude-code)
---
 .../compose/mcp/BossTermMcpManager.kt         |  17 ++
 .../bossterm/compose/mcp/BossTermMcpServer.kt |  19 +-
 .../compose/mcp/McpTerminalRegistry.kt        |  45 ++++
 .../bossterm/compose/mcp/ProcessAncestry.kt   | 225 ++++++++++++++++++
 docs/mcp-server.md                            |  34 +++
 5 files changed, 336 insertions(+), 4 deletions(-)
 create mode 100644 compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/ProcessAncestry.kt

diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt
index 73400568..bbaea121 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpManager.kt
@@ -346,6 +346,23 @@ class BossTermMcpManager(
                         finish()
                         return@intercept
                     }
+
+                    // Resolve which BossTerm window the calling client lives in
+                    // (process-tree walk from the client's PID) and record it
+                    // so tools that default to "primary window" target the
+                    // caller's window rather than first-registered. Failure
+                    // here is silent — the resolver returns null and the
+                    // server keeps using the prior resolution (or
+                    // primaryState() if there is none). This runs only
+                    // AFTER the rebinding check passes, so we never spawn
+                    // lsof for a hostile request.
+                    val remotePort = call.request.local.remotePort
+                    val resolved = try {
+                        ProcessAncestry.resolveClientWindow(remotePort, registry)
+                    } catch (_: Throwable) {
+                        null
+                    }
+                    registry.setLastResolvedClientWindow(resolved)
                 }
                 // SDK 0.8.3 quirk: both `Route.mcp { ... }` and
                 // `Routing.mcp(path, ...) { ... }` end up mounting SSE +
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
index 7539eb37..ea110890 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
@@ -309,7 +309,11 @@ class BossTermMcpServer(
         ) { request ->
             val args = request.arguments
             val includeFields = args.optionalStringSet("include_fields")
-            val primary = registry.primaryState()
+            // Prefer the window the calling client lives in (process-tree
+            // walk in ProcessAncestry, populated by BossTermMcpManager's
+            // Ktor interceptor). Falls back to first-registered when no
+            // ancestor matches a tracked pane.
+            val primary = registry.lastResolvedClientWindow() ?: registry.primaryState()
             val activeId = primary?.activeTabId
             val info = primary?.activeTab?.toTabInfo(activeId)
             // The literal JSON `null` is valid output — clients calling
@@ -867,12 +871,15 @@ class BossTermMcpServer(
             val workingDir = args.requireString("working_dir")
 
             // Resolve the target state. If tab_id given, find the state that
-            // owns it. Otherwise use the primary registered window.
+            // owns it. Otherwise prefer the window the calling client lives
+            // in (process-tree walk via ProcessAncestry); fall back to
+            // first-registered if the client can't be traced to any pane.
             val state: TabbedTerminalState = if (requestedTabId != null) {
                 registry.findState(requestedTabId)
                     ?: return@addTool errorResult("Unknown tab_id: $requestedTabId")
             } else {
-                registry.primaryState()
+                registry.lastResolvedClientWindow()
+                    ?: registry.primaryState()
                     ?: return@addTool errorResult("No registered terminal window")
             }
 
@@ -994,7 +1001,11 @@ class BossTermMcpServer(
                 registry.findState(requestedTabId)
                     ?: return@addTool errorResult("Unknown tab_id: $requestedTabId")
             } else {
-                registry.primaryState()
+                // Prefer the window the calling client lives in (process-
+                // tree walk via ProcessAncestry); fall back to first-
+                // registered when no ancestor matches a tracked pane.
+                registry.lastResolvedClientWindow()
+                    ?: registry.primaryState()
                     ?: return@addTool errorResult("No registered terminal window")
             }
             val tabId = requestedTabId
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
index dea5e38c..197da06c 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
@@ -9,6 +9,7 @@ import kotlinx.coroutines.flow.asStateFlow
 import kotlinx.coroutines.sync.Mutex
 import java.util.concurrent.ConcurrentHashMap
 import java.util.concurrent.CopyOnWriteArrayList
+import java.util.concurrent.atomic.AtomicReference
 
 /**
  * Process-wide registry of live [TabbedTerminalState] instances exposed to the
@@ -199,6 +200,50 @@ object McpTerminalRegistry {
     internal fun tabCacheLock(tabId: String): Mutex =
         tabCacheLocks.computeIfAbsent(tabId) { Mutex() }
 
+    // -----------------------------------------------------------------
+    // Caller-window resolution — pick the window an MCP client is
+    // running INSIDE (via process-tree walk in ProcessAncestry) so tools
+    // that default to "the primary window" target the window the calling
+    // client lives in, not whichever window happened to register first.
+    //
+    // Updated by the Ktor interceptor in BossTermMcpManager on each
+    // incoming request; read by tool handlers in BossTermMcpServer that
+    // previously called primaryState() directly.
+    //
+    // Race: in a multi-client multi-window setup, concurrent requests
+    // from different clients write here last-writer-wins. Acceptable —
+    // the single-client case (the common one) is consistent; the rare
+    // multi-client race may target one client's window for another's
+    // call for a single request, never permanently.
+    // -----------------------------------------------------------------
+
+    private val lastResolvedClient = AtomicReference<TabbedTerminalState?>(null)
+
+    /**
+     * Most recently resolved "calling-client window", or null if no
+     * request has been resolved yet OR the resolved state has been
+     * unregistered (window closed). Lazy invalidation: stale entries
+     * are detected on read and cleared.
+     */
+    internal fun lastResolvedClientWindow(): TabbedTerminalState? {
+        val cached = lastResolvedClient.get() ?: return null
+        if (cached !in states) {
+            lastResolvedClient.compareAndSet(cached, null)
+            return null
+        }
+        return cached
+    }
+
+    /**
+     * Manager-only. Pass a resolved state to record it as the most-recent
+     * caller window. Null is a no-op — a non-resolving request (client
+     * running outside any BossTerm pane) shouldn't blow away the prior
+     * resolution from a request that DID resolve.
+     */
+    internal fun setLastResolvedClientWindow(state: TabbedTerminalState?) {
+        if (state != null) lastResolvedClient.set(state)
+    }
+
     private fun persist(targets: Set<McpAttachTarget>) {
         // Sort by enum-declaration order so settings.json is deterministic
         // across saves (kotlinx.serialization writes whatever the Set
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/ProcessAncestry.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/ProcessAncestry.kt
new file mode 100644
index 00000000..9bf1e23e
--- /dev/null
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/ProcessAncestry.kt
@@ -0,0 +1,225 @@
+package ai.rever.bossterm.compose.mcp
+
+import ai.rever.bossterm.compose.TabbedTerminalState
+import ai.rever.bossterm.compose.shell.ShellCustomizationUtils
+import org.slf4j.LoggerFactory
+import java.io.File
+import java.nio.file.Files
+import java.util.concurrent.TimeUnit
+
+/**
+ * Resolves which registered [TabbedTerminalState] (window) owns the MCP
+ * client process connected on a given loopback TCP source port.
+ *
+ * Tools that default to "the primary window" (e.g. `run_command`,
+ * `run_in_panel`, `get_active_tab` called with no `tab_id`) want to target
+ * the window the calling client is running INSIDE, not whichever window
+ * happens to be first in the registry. This object answers that.
+ *
+ * Algorithm:
+ *  1. Resolve the PID owning the loopback TCP socket on the given port.
+ *  2. Walk the parent process tree.
+ *  3. First ancestor PID that matches a tracked pane's shell PID identifies
+ *     the client's pane → that pane's state is the answer.
+ *  4. If no ancestor matches (client running outside any BossTerm pane —
+ *     Claude Desktop, an external Inspector, a CI script…), return null
+ *     and the caller falls back to first-registered.
+ *
+ * Platform support: macOS via `lsof` + `ps`, Linux via `/proc/net/tcp` +
+ * `/proc/<pid>/status`. Windows and unknown platforms return null silently.
+ *
+ * Cost: a fresh resolution on macOS shells out once to `lsof`, then once
+ * to `ps -o ppid=` per ancestor. Real pane→Claude depth is 2-4 hops, so
+ * budget ~50-150ms. Linux is faster (pure /proc reads).
+ *
+ * Failures are logged at DEBUG and return null — never crash a tool call
+ * on the strength of a parent-walk hiccup.
+ */
+internal object ProcessAncestry {
+
+    private val log = LoggerFactory.getLogger(ProcessAncestry::class.java)
+
+    /**
+     * Max ancestors to walk before giving up. Real pane→Claude depth is
+     * typically 2-4 (shell → maybe tmux → Claude Code's node launcher →
+     * claude itself). The cap protects against PID cycles or pathologies.
+     */
+    private const val MAX_PARENT_WALK = 16
+
+    /** Per-subprocess wait cap — these shell-outs should be near-instant. */
+    private const val SUBPROCESS_TIMEOUT_S = 2L
+
+    /**
+     * @param remotePort the client-side ephemeral port — i.e. what Ktor
+     *   reports as `call.request.local.remotePort`.
+     * @param registry source of truth for tracked panes' shell PIDs.
+     * @return the [TabbedTerminalState] containing the pane that owns the
+     *   client process, or null if no ancestor matches.
+     */
+    fun resolveClientWindow(
+        remotePort: Int,
+        registry: McpTerminalRegistry
+    ): TabbedTerminalState? {
+        if (remotePort <= 0) return null
+        val statesByShellPid = collectStateByShellPid(registry)
+        if (statesByShellPid.isEmpty()) return null
+
+        val clientPid = findClientPid(remotePort) ?: return null
+        var pid = clientPid
+        var hops = 0
+        while (hops < MAX_PARENT_WALK) {
+            statesByShellPid[pid]?.let { return it }
+            val parent = parentPid(pid)
+            if (parent == null || parent <= 1L || parent == pid) return null
+            pid = parent
+            hops++
+        }
+        return null
+    }
+
+    /**
+     * Flatten every tracked tab's shell PID across every registered state
+     * into a single lookup map. Cheap and rebuilt per resolution so pane
+     * creation / closure is reflected immediately without a refresh hook.
+     */
+    private fun collectStateByShellPid(
+        registry: McpTerminalRegistry
+    ): Map<Long, TabbedTerminalState> {
+        val out = HashMap<Long, TabbedTerminalState>()
+        for (state in registry.allStates()) {
+            for (tab in state.tabs) {
+                val pid = tab.processHandle.value?.getPid() ?: continue
+                out[pid] = state
+            }
+        }
+        return out
+    }
+
+    // ---------------------------------------------------------------------
+    // Client PID resolution: socket → owning process.
+    // ---------------------------------------------------------------------
+
+    private fun findClientPid(remotePort: Int): Long? = when {
+        ShellCustomizationUtils.isMacOS() -> findClientPidMacOS(remotePort)
+        ShellCustomizationUtils.isLinux() -> findClientPidLinux(remotePort)
+        else -> null
+    }
+
+    /**
+     * `lsof -nP -iTCP@127.0.0.1:<port> -sTCP:ESTABLISHED -F p` lists every
+     * process touching the loopback TCP endpoint on that port — both ends
+     * of the connection match the filter when both are local. Each end is
+     * one line `p<PID>`. Our own PID is the server side; the *other* PID
+     * is the client we're looking for.
+     */
+    private fun findClientPidMacOS(remotePort: Int): Long? {
+        val ourPid = ProcessHandle.current().pid()
+        return try {
+            val process = ProcessBuilder(
+                "lsof", "-nP",
+                "-iTCP@127.0.0.1:$remotePort",
+                "-sTCP:ESTABLISHED",
+                "-F", "p"
+            ).redirectErrorStream(true).start()
+            val out = process.inputStream.bufferedReader().readLines()
+            process.waitFor(SUBPROCESS_TIMEOUT_S, TimeUnit.SECONDS)
+            out.asSequence()
+                .filter { it.startsWith("p") }
+                .mapNotNull { it.drop(1).toLongOrNull() }
+                .firstOrNull { it != ourPid }
+        } catch (e: Throwable) {
+            log.debug("macOS lsof PID lookup failed for port {}: {}", remotePort, e.message)
+            null
+        }
+    }
+
+    /**
+     * Linux: scan `/proc/net/tcp` for ESTABLISHED rows whose local or
+     * remote address ends with `:<portHex>`. Each row carries the socket
+     * inode. Then scan `/proc/<pid>/fd/N` symlinks (over every PID's fd
+     * directory) for `socket:[<inode>]` targets to find the owning PID.
+     *
+     * Pure file reads — no shell-out, so this is fast (~5-10ms).
+     */
+    private fun findClientPidLinux(remotePort: Int): Long? {
+        val portHex = remotePort.toString(16).uppercase().padStart(4, '0')
+        return try {
+            val inodes = File("/proc/net/tcp").readText()
+                .lineSequence()
+                .drop(1) // header
+                .mapNotNull { line ->
+                    val fields = line.trim().split(Regex("\\s+"))
+                    if (fields.size < 10) return@mapNotNull null
+                    val local = fields[1]
+                    val remote = fields[2]
+                    val state = fields[3]
+                    if (state != "01") return@mapNotNull null // 01 = TCP_ESTABLISHED
+                    if (local.endsWith(":$portHex") || remote.endsWith(":$portHex")) {
+                        fields[9].toLongOrNull()
+                    } else null
+                }
+                .toSet()
+            if (inodes.isEmpty()) return null
+
+            val ourPid = ProcessHandle.current().pid()
+            File("/proc")
+                .listFiles { f -> f.isDirectory && f.name.toLongOrNull() != null }
+                ?.asSequence()
+                ?.mapNotNull { pidDir ->
+                    val pid = pidDir.name.toLongOrNull() ?: return@mapNotNull null
+                    if (pid == ourPid) return@mapNotNull null
+                    val fdDir = File(pidDir, "fd")
+                    val fds = fdDir.listFiles() ?: return@mapNotNull null
+                    val match = fds.any { fd ->
+                        val target = try {
+                            Files.readSymbolicLink(fd.toPath()).toString()
+                        } catch (_: Throwable) {
+                            return@any false
+                        }
+                        if (!target.startsWith("socket:[") || !target.endsWith("]")) {
+                            return@any false
+                        }
+                        val inode = target.substring("socket:[".length, target.length - 1)
+                            .toLongOrNull() ?: return@any false
+                        inode in inodes
+                    }
+                    if (match) pid else null
+                }
+                ?.firstOrNull()
+        } catch (e: Throwable) {
+            log.debug("Linux /proc PID lookup failed for port {}: {}", remotePort, e.message)
+            null
+        }
+    }
+
+    // ---------------------------------------------------------------------
+    // Parent PID walking.
+    // ---------------------------------------------------------------------
+
+    private fun parentPid(pid: Long): Long? = when {
+        ShellCustomizationUtils.isMacOS() -> parentPidMacOS(pid)
+        ShellCustomizationUtils.isLinux() -> parentPidLinux(pid)
+        else -> null
+    }
+
+    private fun parentPidMacOS(pid: Long): Long? = try {
+        val process = ProcessBuilder("ps", "-o", "ppid=", "-p", pid.toString())
+            .redirectErrorStream(true).start()
+        val out = process.inputStream.bufferedReader().readText().trim()
+        process.waitFor(SUBPROCESS_TIMEOUT_S, TimeUnit.SECONDS)
+        out.toLongOrNull()
+    } catch (e: Throwable) {
+        log.debug("macOS ps ppid lookup failed for pid {}: {}", pid, e.message)
+        null
+    }
+
+    private fun parentPidLinux(pid: Long): Long? = try {
+        File("/proc/$pid/status").readText()
+            .lineSequence()
+            .firstOrNull { it.startsWith("PPid:") }
+            ?.substringAfter("PPid:")?.trim()?.toLongOrNull()
+    } catch (e: Throwable) {
+        log.debug("Linux /proc/{}/status PPid read failed: {}", pid, e.message)
+        null
+    }
+}
diff --git a/docs/mcp-server.md b/docs/mcp-server.md
index 86b6be41..0b5fbdd4 100644
--- a/docs/mcp-server.md
+++ b/docs/mcp-server.md
@@ -74,6 +74,40 @@ instead of an HTTP request with a timeout.
 The marker is an optimization, not a security boundary. Any local user
 process can already reach the loopback endpoint while it's running.
 
+### Caller-window resolution
+
+For tools that default to "the primary window" — `get_active_tab` with no
+`tab_id`, `run_in_panel` / `run_command` with no `tab_id` — the server
+picks the window the **calling client is running inside**, not whichever
+window happened to register first.
+
+Mechanism: a Ktor interceptor on every incoming MCP request looks up the
+PID owning the loopback TCP socket on the client side, walks the parent
+process tree, and matches the first ancestor PID against the shell PIDs
+of every tracked pane. The match identifies the client's pane and
+therefore its window.
+
+Behavior across scenarios:
+
+- Single BossTerm window: identical to before.
+- Multiple windows, Claude Code running in window B: tools without
+  `tab_id` target window B.
+- Multiple windows, Claude Desktop / external Inspector / a CI script
+  (no BossTerm pane in the client's ancestry): falls back to whichever
+  window most recently resolved successfully, or to first-registered if
+  none ever has.
+
+Platform support: macOS (via `lsof` + `ps`) and Linux (via
+`/proc/net/tcp` + `/proc/<pid>/status`). Cost: roughly 50–150 ms per
+request on macOS, 5–15 ms on Linux. Windows and other platforms skip
+the resolver and fall back to first-registered. All failure modes log
+at DEBUG and degrade — never crash a tool call.
+
+Concurrent multi-client racing: if two clients in different windows
+issue requests simultaneously, the resolved window is last-writer-wins
+for the duration of that overlap. Both windows are still individually
+addressable via explicit `tab_id`.
+
 ### Initialize-time instructions
 
 The server's `initialize` response includes an MCP-spec `instructions` string

From 8d825d351098ae11ff083cff5483308d43f32105 Mon Sep 17 00:00:00 2001
From: Shivang <shivang.iitk@gmail.com>
Date: Wed, 20 May 2026 02:07:57 -0700
Subject: [PATCH 3/4] feat(mcp): stack run_in_panel splits side-by-side in
 bottom row
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Today consecutive `run_in_panel` calls with `panel: "horizontal_split"`
each split whichever pane currently has focus — so a second long-running
script lands wherever the user happened to click last, often splitting
the agent's own pane and pushing the prior MCP pane around. With three
or four parallel fire-and-forget launches the layout becomes random and
unpleasant.

Anchor the new pane off the last MCP-created scratch pane instead of
the focused pane: a horizontal_split splits the existing scratch pane
*vertically* (right of it), and the new pane becomes the next anchor.
Result: consecutive MCP scratch panes line up as a horizontal strip
along the bottom of the tab, and the agent's primary pane keeps its
real estate.

New: `TabbedTerminalState.splitVerticalFromPane(tabId, anchorPaneId,
ratio, initialCommand)`. Implemented by extending the existing
`performSplit` helper with an optional `anchorPaneId` argument — when
provided, `SplitViewState.setFocusedPane` aims focus at the anchor
before the existing `splitFocusedPane` call runs, so the same code path
handles both flavors. Anchor lookup degrades gracefully: if the cached
paneId no longer resolves to a session (user closed the pane), the
caller falls back to the original `splitHorizontal` (focused-pane)
behavior.

run_in_panel changes:
 - When `panel == "horizontal_split"` and `registry.getScratchPane(tab)`
   points at a still-live pane, route through `splitVerticalFromPane`
   to stack the new pane right of the anchor.
 - Every successful split now updates the scratch-pane cache to the
   new paneId, so the *next* call sees the new pane as its anchor
   (chains horizontally). Previously only run_command updated the
   cache; now run_in_panel does too, and the two tools share the same
   anchor for the same tab.

`vertical_split` and `new_tab` keep their existing semantics — no
behavior change for callers that explicitly opt out of the anchor
stacking.

run_command is intentionally untouched: it reuses the cached pane on
the next call (no new pane created), so the anchor-stacking rule
doesn't apply.
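
The chaining rule can be modelled as below. This is a sketch under
simplifying assumptions, not the run_in_panel handler: `ScratchAnchors`
is a hypothetical class, pane ids are plain strings, and `livePanes`
stands in for the session lookup that decides whether a cached anchor
still exists.

```kotlin
// Hypothetical model of the per-tab scratch-pane anchor cache.
class ScratchAnchors(private val livePanes: MutableSet<String>) {
    private val anchorByTab = HashMap<String, String>()
    private var next = 0

    /** Returns where the new pane was placed, paired with its id. */
    fun horizontalSplit(tabId: String): Pair<String, String> {
        // Use the cached anchor only if that pane is still alive.
        val anchor = anchorByTab[tabId]?.takeIf { it in livePanes }
        val newPane = "pane-${next++}"
        livePanes += newPane
        // Every successful split becomes the anchor for the next call.
        anchorByTab[tabId] = newPane
        val placement =
            if (anchor != null) "right of $anchor" else "below focused pane"
        return placement to newPane
    }
}
```

The first call has no live anchor and splits below the focused pane;
later calls chain to the right until the anchor pane is closed.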

Verification:
 - `./gradlew :compose-ui:compileKotlinDesktop --no-daemon` clean.
 - Manual (reviewer): two run_in_panel horizontal_split calls in a row
   should yield two panes side-by-side in the bottom row, not nested.

Generated with [Claude Code](https://claude.com/claude-code)
---
 .../bossterm/compose/TabbedTerminalState.kt   | 35 ++++++++++++++++++-
 .../bossterm/compose/mcp/BossTermMcpServer.kt | 30 +++++++++++++---
 docs/mcp-server.md                            |  9 +++++
 3 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/TabbedTerminalState.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/TabbedTerminalState.kt
index e346c7bd..9db47887 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/TabbedTerminalState.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/TabbedTerminalState.kt
@@ -601,18 +601,43 @@ class TabbedTerminalState {
         return performSplit(SplitOrientation.HORIZONTAL, tabId, ratio, initialCommand)
     }
 
+    /**
+     * Split a specific pane (by id) vertically — the new pane appears to the
+     * right of [anchorPaneId]. Equivalent to focusing [anchorPaneId] then
+     * calling [splitVertical], but in a single call. If the anchor
+     * pane doesn't exist, falls back to splitting the currently focused pane.
+     *
+     * Designed for the MCP tools (run_command / run_in_panel) so they can
+     * stack scratch panes horizontally in the bottom row instead of
+     * blindly splitting whatever pane happens to be focused.
+     *
+     * @return The session id of the new pane, or null if the split failed.
+     */
+    fun splitVerticalFromPane(
+        tabId: String,
+        anchorPaneId: String,
+        ratio: Float? = null,
+        initialCommand: String? = null
+    ): String? = performSplit(
+        SplitOrientation.VERTICAL, tabId, ratio, initialCommand, anchorPaneId
+    )
+
     /**
      * Internal helper to perform a split in the given orientation.
      *
      * @param initialCommand Optional command to run in the new pane once its shell is ready
      *   (OSC 133;A or fallback delay). Held by the session bootstrap so the bytes are not
      *   eaten by shell startup output (banner, rc-file sourcing, prompt draw).
+     * @param anchorPaneId Optional pane to split. When provided, that pane is
+     *   focused before the split so [SplitViewState.splitFocusedPane] targets
+     *   it. Ignored if the pane doesn't exist (falls back to current focus).
      */
     private fun performSplit(
         orientation: SplitOrientation,
         tabId: String?,
         ratio: Float? = null,
-        initialCommand: String? = null
+        initialCommand: String? = null,
+        anchorPaneId: String? = null
     ): String? {
         val resolvedTabId = resolveTabId(tabId) ?: return null
         val controller = tabController ?: return null
@@ -620,6 +645,14 @@ class TabbedTerminalState {
         val splitState = getOrCreateSplitState(resolvedTabId) ?: return null
         val settings = SettingsManager.instance.settings.value
 
+        // Anchor focus before split, when an explicit pane was requested.
+        // SplitViewState.setFocusedPane silently no-ops on unknown ids, so
+        // an unknown anchor leaves focus unchanged and the split falls back
+        // to the currently focused pane; no read-back check is performed.
+        if (anchorPaneId != null) {
+            splitState.setFocusedPane(anchorPaneId)
+        }
+
         val workingDir = if (settings.splitInheritWorkingDirectory) {
             splitState.getFocusedSession()?.workingDirectory?.value
         } else null
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
index ea110890..11e7e263 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
@@ -908,11 +908,33 @@ class BossTermMcpServer(
                     // path auto-appends '\n', matching the new_tab branch. An empty script
                     // means "just split, don't run anything".
                     val normalizedScript = script.removeSuffix("\n").ifEmpty { null }
-                    val paneId = if (panel == "horizontal_split") {
-                        state.splitHorizontal(targetTabId, ratio = effectiveRatio, initialCommand = normalizedScript)
-                    } else {
-                        state.splitVertical(targetTabId, ratio = effectiveRatio, initialCommand = normalizedScript)
+                    // Anchor stacking: if there's already an MCP scratch pane for
+                    // this tab and the caller asked for horizontal_split, stack
+                    // the new pane to the RIGHT of that existing pane instead of
+                    // splitting whatever's focused — keeps consecutive MCP panes
+                    // in a horizontal strip along the bottom rather than fighting
+                    // for the focused pane's real estate.
+                    val anchor = if (panel == "horizontal_split") {
+                        registry.getScratchPane(targetTabId)
+                            ?.takeIf { state.findSession(targetTabId, it) != null }
+                    } else null
+                    val paneId = when {
+                        anchor != null -> state.splitVerticalFromPane(
+                            tabId = targetTabId,
+                            anchorPaneId = anchor,
+                            ratio = effectiveRatio,
+                            initialCommand = normalizedScript
+                        )
+                        panel == "horizontal_split" -> state.splitHorizontal(
+                            targetTabId, ratio = effectiveRatio, initialCommand = normalizedScript
+                        )
+                        else -> state.splitVertical(
+                            targetTabId, ratio = effectiveRatio, initialCommand = normalizedScript
+                        )
                     } ?: return@addTool errorResult("Split failed (terminal too small?)")
+                    // Record the new pane as the latest scratch pane so the
+                    // next call has it as its anchor (chains horizontally).
+                    registry.setScratchPane(targetTabId, paneId)
                     val payload = RunInPanelResult(ok = true, tabId = targetTabId, paneId = paneId)
                     successJson(json.encodeToString(RunInPanelResult.serializer(), payload))
                 }
diff --git a/docs/mcp-server.md b/docs/mcp-server.md
index 0b5fbdd4..a4f32677 100644
--- a/docs/mcp-server.md
+++ b/docs/mcp-server.md
@@ -339,6 +339,15 @@ with shell startup.
   `paneId` is `null` for `new_tab`; for splits it's the new pane's session
   id, which you can pass back as `pane_id` to other tools.
 
+**Stacking behavior with `horizontal_split`:** when the tab already has an
+MCP-created scratch pane (from a prior `run_in_panel` or `run_command`),
+the next `horizontal_split` call splits the existing scratch pane *to the
+right*, not the focused pane downward. Consecutive fire-and-forget
+launches therefore line up as a horizontal strip along the bottom of the
+tab rather than fighting each other for whatever pane the user happens to
+have clicked on. `vertical_split` and `new_tab` keep their straightforward
+semantics (target the focused pane or open a fresh tab).
+
 ### `run_command` (write tool)
 
 Blocking variant of `run_in_panel`: runs a shell command in a visible BossTerm

From 476f11bc94b5fdac9ebf41ec3b47221841af619c Mon Sep 17 00:00:00 2001
From: Shivang <shivang.iitk@gmail.com>
Date: Wed, 20 May 2026 02:24:43 -0700
Subject: [PATCH 4/4] fix(mcp): six correctness fixes from review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

P0 #1 — hook script in docs/mcp-server.md "Using as Claude Code's default
shell" didn't match the actual hook file shipped to ~/.claude/hooks/
prefer-bossterm.sh: the documented version falls through and emits the
deny JSON when nc is missing, contradicting the doc's own "fails closed"
prose. Match the shipped script's `if ! command -v nc; then exit 0; fi;
nc -z … || exit 0` shape so a missing nc never routes Claude to an
unverified port.

P0 #2 — `panel: "new_tab"` in run_command was structurally broken.
`state.createTab(...)` returns a TAB id, but the code stuffed it into
`newPaneId`, cached `mcpScratchPanes[origTabId] = newTabId`, and returned
`RunCommandResult(tabId = origTabId, paneId = newTabId)`. Consequences:
the next call with the same `tab_id` would look up `findSession(origTab,
newTab)` (which can't resolve — newTab isn't a pane in origTab's split
tree), evict as stale, and create yet another fresh tab — so the new_tab
mode never got its reuse. The response's tabId was also wrong.

Refactor:
 - `PaneResolution.Hit` now carries a `tabId` field holding the effective
   tab id. Splits and cache-hits carry the input tab id forward; the
   new_tab path overrides it with the freshly-created tab's id (and uses
   `findSession(newTab, newTab)` for a single-pane lookup that actually
   resolves).
 - The cache writes / reads land on the new tab id when applicable, so
   `panel: "reuse"` follow-ups on the same effective tab actually reuse.
 - `RunCommandResult.tabId` is now `resolveResult.tabId` (the effective
   tab id), so callers passing it back land in the right tab.

P0 #3 — Multi-statement scripts (raw "\n" inside `script`) corrupted the
result. The shell fires B/D once per statement; `onCommandStarted`
unconditionally overwrote historyAtB/cursorYAtB, so by the time the
awaiter woke up the start mark could point at the SECOND statement
while the exit code was from the first.

Fix: guard the snapshot with an `ourCommandStarted` AtomicBoolean
flipped via `compareAndSet(false, true)`. Only the first B after our
write records markers; later statements' B events no-op. Also added a
schema note in the docs telling callers to use `bash -lc '…'` for
compound logic, since the captured exit code is still only the first
statement's even with the fix.

P0 #4 — Stale OSC 133;D from a prior writer on the same pane (a
fire-and-forget run_in_panel script, a send_input keystroke, the user
manually typing) could complete `finishedSignal` before our script
even left the gate, returning that command's exit code with a garbled
slice. The per-pane Mutex serializes against other run_command calls
but not other writers on the pane.

Fix: an AtomicBoolean `weHaveWritten` is flipped right BEFORE the
`session.writeUserInput(...)` call. The listener ignores any B until
`weHaveWritten` is set, and ignores any D until `ourCommandStarted`
has been set (which only happens on a B we accepted). Together they
gate the entire B/D cycle to OUR command. The flags and markers are
atomics because captured `var`s compile to kotlin.jvm.internal.Ref boxes
whose .element field isn't volatile, so cross-thread visibility isn't
guaranteed.
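
A minimal sketch of that gating, for illustration only (`CommandGate` and
its method names are hypothetical stand-ins, not the listener in
BossTermMcpServer.kt):

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the listener's B/D gating logic.
class CommandGate {
    private val weHaveWritten = AtomicBoolean(false)
    private val ourCommandStarted = AtomicBoolean(false)
    private val startMark = AtomicInteger(-1)

    // Call right before writing the script to the pane.
    fun markWritten() {
        weHaveWritten.set(true)
    }

    // Returns true only for the first B observed after markWritten().
    fun onB(historyLines: Int): Boolean {
        if (!weHaveWritten.get()) return false                        // prior writer's B
        if (!ourCommandStarted.compareAndSet(false, true)) return false // later statement
        startMark.set(historyLines)
        return true
    }

    // Returns true only when a B was accepted beforehand.
    fun onD(): Boolean = ourCommandStarted.get()

    // -1 means no B has been accepted yet.
    fun recordedStart(): Int = startMark.get()
}

fun main() {
    val gate = CommandGate()
    println(gate.onB(10))         // false: nothing written yet
    println(gate.onD())           // false: stale D is ignored
    gate.markWritten()
    println(gate.onB(42))         // true: first B after the write
    println(gate.onB(50))         // false: second statement's B
    println(gate.onD())           // true
    println(gate.recordedStart()) // 42
}
```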

P1 #5 — `mcpRunCommandMaxOutputBytes` was a misnomer: `sb.length` and
`line.length` are Kotlin String lengths (UTF-16 code units), not UTF-8
bytes. For ASCII it matches; for CJK-heavy output the real byte count
can reach 3× the configured value (emoji, as surrogate pairs, about 2×).
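
A quick illustration of the unit mismatch, as a sketch (the helper names
are hypothetical; only `String.length` and `toByteArray` are real APIs):

```kotlin
// What sb.length / line.length measure: UTF-16 code units.
fun utf16Units(s: String): Int = s.length

// What a byte-based cap would measure: encoded UTF-8 size.
fun utf8Bytes(s: String): Int = s.toByteArray(Charsets.UTF_8).size

fun main() {
    for (s in listOf("abc", "日本語", "😀😀")) {
        println("$s: ${utf16Units(s)} UTF-16 units, ${utf8Bytes(s)} UTF-8 bytes")
    }
    // abc: 3 UTF-16 units, 3 UTF-8 bytes
    // 日本語: 3 UTF-16 units, 9 UTF-8 bytes
    // 😀😀: 4 UTF-16 units, 8 UTF-8 bytes
}
```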

Renamed everywhere to `mcpRunCommandMaxOutputChars` and updated the
kdoc to spell out the unit choice (apples-to-apples with the also-
char-counting `mcpMaxAnswerChars`). Settings.json entries set under
the old name silently revert to the default — acceptable for a setting
that shipped on the same dev branch.

P1 #6 — `paneMutexes` only sheds entries on the cache-miss-with-stale-
pane path. If a user closes the cached pane and never re-runs against
that tab, the mutex sits forever. Growth is paced by the pane-creation
rate, but nothing caps the total over a long-lived session.

Added opportunistic GC in `McpTerminalRegistry.setScratchPane`: when
the mutex map exceeds 32 entries, walk every state's tabs + pane
snapshots, collect live ids, evict everything else. Naturally batches
behind real new-pane work; no listener wiring needed. Also sheds
matching stale `mcpScratchPanes` entries while the sweep is hot.

Build: `./gradlew :compose-ui:compileKotlinDesktop --no-daemon` clean.

Generated with [Claude Code](https://claude.com/claude-code)
---
 .../bossterm/compose/mcp/BossTermMcpServer.kt | 119 ++++++++++++++----
 .../compose/mcp/McpTerminalRegistry.kt        |  40 ++++++
 .../compose/settings/TerminalSettings.kt      |  27 ++--
 docs/mcp-server.md                            |  17 ++-
 4 files changed, 167 insertions(+), 36 deletions(-)

diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
index 11e7e263..bdb6c849 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/BossTermMcpServer.kt
@@ -22,6 +22,8 @@ import kotlinx.coroutines.launch
 import kotlinx.coroutines.sync.withLock
 import kotlinx.coroutines.withTimeoutOrNull
 import kotlinx.serialization.Serializable
+import java.util.concurrent.atomic.AtomicBoolean
+import java.util.concurrent.atomic.AtomicInteger
 import kotlinx.serialization.json.Json
 import kotlinx.serialization.json.JsonArray
 import kotlinx.serialization.json.JsonElement
@@ -1015,7 +1017,7 @@ class BossTermMcpServer(
                 .coerceIn(MIN_RUN_COMMAND_TIMEOUT_MS, MAX_RUN_COMMAND_TIMEOUT_MS)
             val timeoutMs = (args.optionalInt("timeout_ms") ?: defaultTimeoutMs)
                 .coerceIn(MIN_RUN_COMMAND_TIMEOUT_MS, MAX_RUN_COMMAND_TIMEOUT_MS)
-            val maxOutputBytes = userSettings.mcpRunCommandMaxOutputBytes.coerceAtLeast(1024)
+            val maxOutputChars = userSettings.mcpRunCommandMaxOutputChars.coerceAtLeast(1024)
             val shellReadyTimeoutMs = userSettings.mcpRunCommandShellReadyTimeoutMs
                 .coerceAtLeast(0).toLong()
 
@@ -1044,6 +1046,12 @@ class BossTermMcpServer(
             var session: TerminalSession? = null
             var paneId: String? = null
             var freshlyCreated = false
+            // Tracks where the resolved pane ACTUALLY lives. Stays as the
+            // input `tabId` for explicit-pane and cache-hit / split-create
+            // paths; switches to the new tab's id when the cache-miss path
+            // resolves `panel: "new_tab"` (the pane is in the new tab, not
+            // the original).
+            var resolvedTabId = tabId
 
             if (explicitPaneId != null) {
                 session = state.findSession(tabId, explicitPaneId)
@@ -1061,7 +1069,12 @@ class BossTermMcpServer(
                     if (cached != null) {
                         val cachedSession = state.findSession(tabId, cached)
                         if (cachedSession != null) {
-                            return@withLock PaneResolution.Hit(cached, cachedSession, fresh = false)
+                            return@withLock PaneResolution.Hit(
+                                tabId = tabId,
+                                paneId = cached,
+                                session = cachedSession,
+                                fresh = false
+                            )
                         }
                         // Stale entry (user closed the pane). Drop cache + per-pane mutex.
                         registry.clearScratchPane(tabId, paneId = cached)
@@ -1083,7 +1096,7 @@ class BossTermMcpServer(
                         "horizontal_split"
                     }
                     val effectivePanel = if (panel == "reuse") defaultPanel else panel
-                    val newPaneId = when (effectivePanel) {
+                    val newId = when (effectivePanel) {
                         "horizontal_split" -> state.splitHorizontal(
                             tabId = tabId,
                             ratio = effectiveRatio,
@@ -1100,17 +1113,45 @@ class BossTermMcpServer(
                         )
                         else -> return@withLock PaneResolution.BadPanel(effectivePanel)
                     } ?: return@withLock PaneResolution.CreateFailed
-                    val newSession = state.findSession(tabId, newPaneId)
-                        ?: state.findSession(newPaneId)
-                        ?: return@withLock PaneResolution.Unresolvable(newPaneId)
-                    registry.setScratchPane(tabId, newPaneId)
-                    PaneResolution.Hit(newPaneId, newSession, fresh = true)
+
+                    // `state.createTab` returns a TAB id; the splits return a
+                    // pane id within the existing tab. Disambiguate so the
+                    // cache write + response use the right (tabId, paneId)
+                    // pair — caching under the original tab id with the new
+                    // tab's id as the "pane id" would miss every subsequent
+                    // call (the id isn't in origTab's split tree) and the
+                    // response's tabId would point at the wrong tab.
+                    val effectiveTabId: String
+                    val effectivePaneId: String
+                    val newSession: TerminalSession
+                    if (effectivePanel == "new_tab") {
+                        // Freshly created tab: single-pane, so pane id == tab id
+                        // (PaneSnapshot for an unsplit tab uses tab.id for both).
+                        effectiveTabId = newId
+                        effectivePaneId = newId
+                        newSession = state.findSession(newId, newId)
+                            ?: state.findSession(newId)
+                            ?: return@withLock PaneResolution.Unresolvable(newId)
+                    } else {
+                        effectiveTabId = tabId
+                        effectivePaneId = newId
+                        newSession = state.findSession(tabId, newId)
+                            ?: return@withLock PaneResolution.Unresolvable(newId)
+                    }
+                    registry.setScratchPane(effectiveTabId, effectivePaneId)
+                    PaneResolution.Hit(
+                        tabId = effectiveTabId,
+                        paneId = effectivePaneId,
+                        session = newSession,
+                        fresh = true
+                    )
                 }
                 when (resolveResult) {
                     is PaneResolution.Hit -> {
                         paneId = resolveResult.paneId
                         session = resolveResult.session
                         freshlyCreated = resolveResult.fresh
+                        resolvedTabId = resolveResult.tabId
                     }
                     is PaneResolution.BadPanel -> return@addTool errorResult(
                         "Unknown panel: '${resolveResult.value}'. Expected one of: reuse, " +
@@ -1141,14 +1182,14 @@ class BossTermMcpServer(
                     timeoutMs = timeoutMs,
                     freshlyCreated = freshlyCreated,
                     shellReadyTimeoutMs = shellReadyTimeoutMs,
-                    maxOutputBytes = maxOutputBytes
+                    maxOutputChars = maxOutputChars
                 )
             }
             val durationMs = System.currentTimeMillis() - startTimeMs
 
             val result = RunCommandResult(
                 ok = outcome.error == null,
-                tabId = tabId,
+                tabId = resolvedTabId,
                 paneId = resolvedPaneId,
                 exitCode = outcome.exitCode,
                 durationMs = durationMs,
@@ -1176,17 +1217,32 @@ class BossTermMcpServer(
         timeoutMs: Int,
         freshlyCreated: Boolean,
         shellReadyTimeoutMs: Long,
-        maxOutputBytes: Int
+        maxOutputChars: Int
     ): RunOutcome {
         val terminal = session.terminal
         val textBuffer = session.textBuffer
 
         val promptReadySignal = CompletableDeferred<Unit>()
         val finishedSignal = CompletableDeferred<CommandFinish>()
-        // Snapshotted in onCommandStarted. -1 sentinel = "B never fired"; falls
-        // back to (historyAtSend, cursorYAtSend) sampled right before write.
-        var historyAtB = -1
-        var cursorYAtB = -1
+        // Snapshotted on the FIRST onCommandStarted that fires after our
+        // writeUserInput. -1 sentinel = "no B observed for our command yet";
+        // falls back to (historyAtSend, cursorYAtSend) sampled right before
+        // write. The "first-B" gate covers both:
+        //  - compound scripts (multi-statement, embedded \n) — keep the
+        //    start mark anchored at our command's FIRST B, don't let later
+        //    statements' B events stomp it.
+        //  - prior pane activity — `weHaveWritten` is flipped right before
+        //    we send, so any B/D events from a prior writer (run_in_panel,
+        //    send_input, user typing) are ignored.
+        //
+        // Atomic primitives because the listener fires on the emulator
+        // thread while our coroutine writes from a Dispatcher worker;
+        // captured `var`s would be kotlin.jvm.internal.Ref boxes whose
+        // `element` field isn't volatile, so no JMM ordering guarantees.
+        val historyAtB = AtomicInteger(-1)
+        val cursorYAtB = AtomicInteger(-1)
+        val weHaveWritten = AtomicBoolean(false)
+        val ourCommandStarted = AtomicBoolean(false)
 
         val listener = object : CommandStateListener {
             override fun onPromptStarted() {
@@ -1195,10 +1251,13 @@ class BossTermMcpServer(
                 if (!promptReadySignal.isCompleted) promptReadySignal.complete(Unit)
             }
             override fun onCommandStarted() {
-                historyAtB = textBuffer.historyLinesCount
-                cursorYAtB = terminal.cursorY - 1
+                if (!weHaveWritten.get()) return    // prior writer's B
+                if (!ourCommandStarted.compareAndSet(false, true)) return // later statement
+                historyAtB.set(textBuffer.historyLinesCount)
+                cursorYAtB.set(terminal.cursorY - 1)
             }
             override fun onCommandFinished(exitCode: Int) {
+                if (!ourCommandStarted.get()) return // D from a prior writer's command
                 if (!finishedSignal.isCompleted) {
                     finishedSignal.complete(CommandFinish.Done(exitCode))
                 }
@@ -1229,6 +1288,12 @@ class BossTermMcpServer(
             val cursorYAtSend = terminal.cursorY - 1
 
             val toWrite = if (script.endsWith("\n")) script else script + "\n"
+            // Flip the gate BEFORE the write so the listener counts the next
+            // B as ours. Tiny race window: a stdin byte the user types in the
+            // microsecond between this write and the shell consuming it could
+            // be misattributed. Acceptable — the alternative (post-write flip)
+            // would lose B events for very-fast shells.
+            weHaveWritten.set(true)
             session.writeUserInput(toWrite)
 
             val finish = withTimeoutOrNull(timeoutMs.toLong()) {
@@ -1255,8 +1320,10 @@ class BossTermMcpServer(
 
             val historyAtEnd = textBuffer.historyLinesCount
             val cursorYAtEnd = terminal.cursorY - 1
-            val startHistory = if (historyAtB >= 0) historyAtB else historyAtSend
-            val startCursorY = if (cursorYAtB >= 0) cursorYAtB else cursorYAtSend
+            val bSnapHistory = historyAtB.get()
+            val bSnapCursorY = cursorYAtB.get()
+            val startHistory = if (bSnapHistory >= 0) bSnapHistory else historyAtSend
+            val startCursorY = if (bSnapCursorY >= 0) bSnapCursorY else cursorYAtSend
 
             when (finish) {
                 null -> {
@@ -1266,7 +1333,7 @@ class BossTermMcpServer(
                         startCursorY = startCursorY,
                         endHistory = historyAtEnd,
                         endCursorY = cursorYAtEnd,
-                        maxOutputBytes = maxOutputBytes
+                        maxOutputChars = maxOutputChars
                     )
                     RunOutcome(
                         exitCode = null,
@@ -1291,7 +1358,7 @@ class BossTermMcpServer(
                         startCursorY = startCursorY,
                         endHistory = historyAtEnd,
                         endCursorY = cursorYAtEnd,
-                        maxOutputBytes = maxOutputBytes
+                        maxOutputChars = maxOutputChars
                     )
                     RunOutcome(
                         exitCode = finish.exitCode,
@@ -1325,7 +1392,7 @@ class BossTermMcpServer(
         startCursorY: Int,
         endHistory: Int,
         endCursorY: Int,
-        maxOutputBytes: Int
+        maxOutputChars: Int
     ): SlicedOutput {
         val snapshot = textBuffer.createSnapshot()
         val historyDelta = endHistory - startHistory
@@ -1346,7 +1413,7 @@ class BossTermMcpServer(
         while (row <= endRowInclusive) {
             val line = snapshot.getLine(row).text.trimEnd()
             // +1 for the newline that joins lines.
-            if (sb.length + line.length + 1 > maxOutputBytes) {
+            if (sb.length + line.length + 1 > maxOutputChars) {
                 truncated = true
                 break
             }
@@ -1374,9 +1441,15 @@ class BossTermMcpServer(
      * session or one of the discrete failure modes the caller needs to
      * translate into MCP error responses. Sealed so the `when` in the
      * caller is exhaustive — no silent fall-through.
+     *
+     * [Hit.tabId] is the EFFECTIVE tab id — equals the input `tab_id` for
+     * cache-hits and split-create paths, but for `panel: "new_tab"` it's
+     * the freshly-created tab's id so the response and cache writes line
+     * up with the tab the pane actually lives in.
      */
     private sealed class PaneResolution {
         data class Hit(
+            val tabId: String,
             val paneId: String,
             val session: TerminalSession,
             val fresh: Boolean
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
index 197da06c..8156e1da 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/mcp/McpTerminalRegistry.kt
@@ -174,8 +174,48 @@ object McpTerminalRegistry {
     /** Record [paneId] as the active MCP scratch pane for [tabId]. */
     internal fun setScratchPane(tabId: String, paneId: String) {
         mcpScratchPanes[tabId] = paneId
+        // Opportunistic GC: shed paneMutexes entries for panes / sessions that
+        // no longer exist anywhere in the registry. Without this the mutex
+        // map grows unbounded — clearScratchPane only sheds entries on the
+        // cache-miss-with-stale-pane path, so a user who closes the cached
+        // pane and never re-runs against that tab leaks a Mutex forever.
+        // Triggered from setScratchPane because it runs on every successful
+        // new-pane creation, naturally batching the sweep behind real work.
+        if (paneMutexes.size > MCP_MUTEX_GC_THRESHOLD) {
+            gcStalePaneMutexes()
+        }
     }
 
+    /**
+     * Walk every registered state's tabs + split-tree pane snapshots,
+     * collect all live ids (tab id, pane id, session id), and evict any
+     * [paneMutexes] entries that aren't in that set. Also sweeps stale
+     * [mcpScratchPanes] entries pointing at dead panes — they'd be
+     * lazy-cleared on next read anyway, but better to drop them now while
+     * the sweep is hot.
+     */
+    private fun gcStalePaneMutexes() {
+        val live = HashSet<String>()
+        for (state in states) {
+            for (tab in state.tabs) {
+                live.add(tab.id)
+                for (snapshot in state.getPaneSnapshots(tab.id)) {
+                    live.add(snapshot.id)
+                    live.add(snapshot.sessionId)
+                }
+            }
+        }
+        paneMutexes.keys.removeIf { it !in live }
+        mcpScratchPanes.entries.removeIf { (k, v) -> k !in live || v !in live }
+    }
+
+    /**
+     * Sweep threshold for the opportunistic [gcStalePaneMutexes]. Picked to
+     * comfortably exceed the realistic pane-creation count in a single
+     * session while still bounding map growth for very long-lived ones.
+     */
+    private const val MCP_MUTEX_GC_THRESHOLD = 32
+
     /**
      * Drop the recorded scratch pane for [tabId] (called when the pane is gone).
      * Pass [paneId] to also evict the stale per-pane mutex, preventing the
diff --git a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt
index 2c314a45..c154dbf0 100644
--- a/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt
+++ b/compose-ui/src/desktopMain/kotlin/ai/rever/bossterm/compose/settings/TerminalSettings.kt
@@ -794,21 +794,30 @@ data class TerminalSettings(
     val mcpRunCommandDefaultTimeoutMs: Int = 120_000,
 
     /**
-     * Cap on the captured `output` field returned by `run_command`. Beyond
-     * this, output is truncated and the response carries `truncated: true`.
+     * Cap on the captured `output` field returned by `run_command`, in
+     * UTF-16 chars (Kotlin String length). Beyond this, output is truncated
+     * and the response carries `truncated: true`.
      *
-     * Default `120_000` (~120 KB) is sized to fit under `mcpMaxAnswerChars`
-     * (`150_000` soft response cap) with headroom for the JSON wrapper, so
-     * a maxed-out `run_command` reply never trips the response-shortening
-     * ladder. Raise it (and `mcpMaxAnswerChars`) together for tooling that
-     * emits very large dumps; lower it for tight-context clients.
+     * Default `120_000` is sized to fit under `mcpMaxAnswerChars`
+     * (`150_000` soft response cap, also UTF-16 chars) with headroom for
+     * the JSON wrapper, so a maxed-out `run_command` reply never trips the
+     * response-shortening ladder. Raise it (and `mcpMaxAnswerChars`)
+     * together for tooling that emits very large dumps; lower it for
+     * tight-context clients.
      *
-     * Minimum enforced: `1024` bytes — smaller values are silently raised
+     * Unit choice: UTF-16 chars (not UTF-8 bytes) so the comparison against
+     * `mcpMaxAnswerChars` is apples-to-apples. For ASCII-heavy output 1
+     * char ≈ 1 byte; for CJK the UTF-8 byte count is up to 3x the char
+     * count (about 2x for emoji), so the real network payload for such
+     * output could exceed the value here. Tighten further if your
+     * transport cares about bytes.
+     *
+     * Minimum enforced: `1024` chars — smaller values are silently raised
      * so a single typical output line still fits.
      *
      * Advanced setting — no UI control, edit settings.json directly.
      */
-    val mcpRunCommandMaxOutputBytes: Int = 120_000,
+    val mcpRunCommandMaxOutputChars: Int = 120_000,
 
     /**
      * Fallback delay `run_command` waits for OSC 133;A on a freshly-created
diff --git a/docs/mcp-server.md b/docs/mcp-server.md
index a4f32677..ab2f1a6c 100644
--- a/docs/mcp-server.md
+++ b/docs/mcp-server.md
@@ -364,7 +364,12 @@ Requires OSC 133 shell integration on the user's shell. See
 [`.claude/rules/shell-integration.md`](../.claude/rules/shell-integration.md).
 
 - Required:
-  - `script` (string) — shell command. A trailing newline is added if absent.
+  - `script` (string) — shell command. A trailing newline is added if
+    absent. **Avoid embedded `\n`** for multi-statement scripts (the shell
+    fires multiple OSC 133;B/D cycles; the response carries the FIRST D's
+    exit code and the slice covers from the first B onward). Use
+    `bash -lc '…'` or `sh -c '…'` to bundle compound logic into a single
+    shell command — that emits a single B/D pair.
 - Optional:
   - `pane_id` (string) — reuse a specific MCP pane. Defaults to the pane this
     tool last created for `tab_id`; if none, a new pane is created.
@@ -511,9 +516,13 @@ marker="$HOME/.bossterm/mcp.port"
 [ -f "$marker" ] || exit 0
 port=$(cat "$marker" 2>/dev/null) || exit 0
 case "$port" in ''|*[!0-9]*) exit 0 ;; esac
-if command -v nc >/dev/null 2>&1; then
-    nc -z 127.0.0.1 "$port" >/dev/null 2>&1 || exit 0
+# Fail closed (let Bash through) if nc is unavailable — without a probe we
+# can't verify the marker isn't stale, and routing Claude to a dead port
+# is worse than skipping the routing entirely.
+if ! command -v nc >/dev/null 2>&1; then
+    exit 0
 fi
+nc -z 127.0.0.1 "$port" >/dev/null 2>&1 || exit 0
 cat <<'JSON'
 {"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"BossTerm MCP is attached. Use mcp__bossterm__run_command instead of Bash. Pass back pane_id from a prior call to reuse the pane."}}
 JSON
@@ -582,7 +591,7 @@ and are persisted to `~/.bossterm/settings.json`.
 | `mcpShowStatusIndicator`  | `Boolean`           | `true`      | Show the green "BossTerm MCP on" pill in the tab bar.                                                  |
 | `mcpDefaultSplitRatio`    | `Float`             | `0.3`       | Default new-pane size for `run_in_panel` / `run_command` splits when `split_ratio` is omitted. Range `0.05..0.95`. |
 | `mcpRunCommandDefaultTimeoutMs` | `Int`         | `120_000`   | Default hard timeout for `run_command` when the caller doesn't pass `timeout_ms`. Clamped per-call to `100..600_000`. |
-| `mcpRunCommandMaxOutputBytes`   | `Int`         | `120_000`   | Cap on the captured `output` field returned by `run_command`. Beyond it, output is truncated and `truncated: true` is set. Sized to fit under `mcpMaxAnswerChars` (150_000) with JSON-wrapper headroom; raise both together for tooling that emits very large dumps. Minimum enforced: 1024. Advanced; no UI control. |
+| `mcpRunCommandMaxOutputChars`   | `Int`         | `120_000`   | Cap on the captured `output` field returned by `run_command`, in UTF-16 chars. Beyond it, output is truncated and `truncated: true` is set. Sized to fit under `mcpMaxAnswerChars` (150_000, also chars) with JSON-wrapper headroom; raise both together for tooling that emits very large dumps. Minimum enforced: 1024. Advanced; no UI control. |
 | `mcpRunCommandShellReadyTimeoutMs` | `Int`      | `1_500`     | Fallback delay `run_command` waits for OSC 133;A on a freshly-created pane before sending anyway. Set `0` to skip the wait entirely. Advanced; no UI control. |
 | `mcpRunCommandDefaultPanel`     | `String`      | `"horizontal_split"` | Panel mode `run_command` uses when it has to create a new MCP scratch pane and the caller passed `panel: "reuse"` (or omitted it). One of `horizontal_split`, `vertical_split`, `new_tab`. |
 | `mcpAttachedTo`           | `Set<String>`       | `{}`        | Stable `persistenceKey`s (e.g. `"CLAUDE_CODE"`) of attached AI CLIs. Used for silent re-attach.        |