Skip to content

Conversation

@adolago
Copy link

@adolago adolago commented Jan 17, 2026

Summary

This PR adds several reliability improvements for async operations in the codebase.

Changes

MCP State Synchronization (mcp/index.ts)

  • Added per-server mutex (serverMutexes Map + withServerMutex) to prevent concurrent state mutations for the same server
  • Wrapped add, connect, disconnect, and reconnect operations with mutex
  • Added transport cleanup on connection failure and for unused pending OAuth transports

LSP Client Cleanup (lsp/client.ts)

  • Added connection disposal and server process cleanup on initialization failure
  • Prevents resource leaks when LSP server fails to initialize

PTY Subscriber Management (pty/index.ts)

  • Added proactive cleanup of stale subscribers (those not in readyState 1) when new client connects
  • Added onError handler alongside onClose for WebSocket error handling

Improved Error Logging

  • Replaced silent .catch(() => {}) blocks with debug logging in:
    • config/config.ts - Plugin installation errors
    • file/watcher.ts - Subscription cleanup failures
    • project/instance.ts - Instance disposal errors

Testing

These changes have been tested on the agent-core fork and all 721 tests pass.

Motivation

These fixes address:

  1. Race conditions - Concurrent MCP server operations could corrupt state
  2. Resource leaks - Failed LSP/transport connections weren't being cleaned up
  3. Debuggability - Silent catches made it difficult to diagnose issues

- Add per-server mutex to MCP to prevent concurrent state mutations
- Clean up transports on connection failure and for unused connections
- Add connection/process cleanup on LSP client init failure
- Add proactive stale subscriber cleanup and onError handler in PTY
- Replace silent catches with debug logging in config, watcher, instance

These fixes address potential race conditions, resource leaks, and
improve debuggability by logging previously swallowed errors.
@github-actions
Copy link
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions
Copy link
Contributor

The following comment was made by an LLM, it may be inaccurate:

The search results show the current PR (#9047) and two other PRs related to memory leak prevention:

Potential Related PRs:

  1. fix(core): add dispose functions to prevent subscription memory leaks #7032 - fix(core): add dispose functions to prevent subscription memory leaks

    • Related to resource cleanup and memory leak prevention similar to the current PR's focus on cleaning up failed connections and resources
  2. fix(core): add dispose functions to prevent subscription memory leaks #7914 - fix(core): add dispose functions to prevent subscription memory leaks

These PRs address similar concerns around cleanup and disposal of resources, which aligns with the current PR's work on LSP client cleanup and transport cleanup. However, they appear to be older PRs focused on subscription memory leaks rather than the comprehensive async operation reliability improvements in PR #9047.

The current PR #9047 appears to be a newer, more comprehensive fix that builds on or complements these earlier cleanup efforts.

@adolago adolago closed this Jan 17, 2026
@adolago adolago deleted the upstream-reliability-fixes branch January 17, 2026 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant