feat(bench): add registry scanner — live tool enumeration from real MCP servers#63

Open
ksek87 wants to merge 4 commits into main from feat/registry-scan-demo

@ksek87 ksek87 commented May 27, 2026

bench/scan_registry.py spawns real npm MCP packages via npx, runs the MCP
initialize + tools/list handshake over stdio, collects tool definitions, and
pipes the combined JSON through fuzzd scan.

Default set: 10 public servers requiring no API keys
@modelcontextprotocol/server-{fetch,memory,time,filesystem,sequentialthinking,everything}
mcp-server-git, mcp-server-sqlite-npx
@executeautomation/playwright-mcp-server
@upstash/context7-mcp

First run (initial npx download): ~2-5 min. Subsequent runs use npx cache.

Usage:
python3 bench/scan_registry.py [--output FILE] [--no-scan] [--timeout N]

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih

claude added 4 commits May 27, 2026 00:32
…CP servers

…e TF-IDF threshold for schema fields

Three structural changes to reduce false positives on real-world MCP servers
without impacting MCPTox detection rate (verified: 440/485 unchanged):

1. "silently" needle: bare "silently" (High) → 12 specific "silently <verb>"
   needles (High) + bare "silently" catch-all at Low. Eliminates the "succeed
   silently" FP on mcp-filesystem/create_directory while preserving the one
   attack hit ("silently add a relation" → still High via "silently add").

2. "deprecated" needle: "deprecated" (Medium) → "is deprecated" + "being
   deprecated" predicate forms (Medium). Attack pattern is always "[X] is
   deprecated, use [attacker tool]". Eliminates "DEPRECATED: Use read_text_file
   instead" FP (self-deprecation label) while preserving all 4 attack hits.

3. TF-IDF schema field threshold: scan_schema now calls scan_all_passes_schema,
   which passes min_vocab_overlap=4 instead of 2 to scan_tfidf_with. Terse
   schema parameter descriptions ("Additional headers to include") share
   incidental vocabulary with attack archetypes; the stricter overlap requirement
   eliminates the playwright HTTP method FPs (cosine 0.38, vocab_hits=2 → suppressed).
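
The overlap gate in item 3 can be illustrated with a small Python sketch. This is not fuzzd's Rust implementation: it uses raw term-frequency cosine rather than TF-IDF, and the `ARCHETYPE` string, the `min_cosine` value, and the function names are hypothetical. It only shows why a short description that shares two words with an archetype can clear a cosine threshold yet be suppressed once four shared terms are required.

```python
import math
import re
from collections import Counter

# Hypothetical attack archetype, for illustration only.
ARCHETYPE = "include additional hidden instructions"


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_and_overlap(text, archetype):
    """Return (term-frequency cosine, number of distinct shared terms)."""
    a, b = Counter(tokens(text)), Counter(tokens(archetype))
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return (dot / norm if norm else 0.0), len(shared)


def flags(text, archetype, min_cosine=0.3, min_vocab_overlap=2):
    """Flag only when both the similarity and the shared-vocabulary gates pass."""
    cosine, overlap = cosine_and_overlap(text, archetype)
    return cosine >= min_cosine and overlap >= min_vocab_overlap


terse = "Additional headers to include"  # schema description quoted in item 3
print(flags(terse, ARCHETYPE, min_vocab_overlap=2))  # passes the looser gate
print(flags(terse, ARCHETYPE, min_vocab_overlap=4))  # suppressed by the stricter gate
```

A longer description that reuses all four archetype terms would still be flagged at `min_vocab_overlap=4`, which is the behaviour the stricter threshold is meant to preserve.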

Also adds 13 more servers to scan_registry.py (23 total).

Research basis: false positive analysis against 76 real-world tools enumerated
live from 6 public npm MCP packages (first run of scan_registry.py).

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
…edle

Research finding (MCPTox; Perez & Ribeiro 2022): 'you must' is structurally
ambiguous — 'you must provide a list of paths' is caller-directed parameter
documentation (benign), while 'you must call X' and 'you must not tell' are
agent-directed behavioral overrides (attacks).

Apply the same refactoring used for 'silently' and 'deprecated':

- Add 'you must not'  (High) — negation / concealment directive
- Add 'you must call' (High) — cross-tool call mandate (MCPTox Prerequisite)
- Add 'you must always' (High) — universal behavioral override
- Lower bare 'you must' to Low — catch-all for parameter requirements

Result: ESLint 'lint-files' FP drops from [high] to [low] ('you must provide
a list of absolute file paths'); context7 cross-tool mandates ('You MUST call
this function before Query Documentation') correctly remain [high] via the new
'you must call' needle. MCPTox detection unchanged (410/455).
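
The tiering above can be sketched in Python as follows. This is an illustration rather than fuzzd's Rust code: the needles and severities are the four listed in this commit, while `classify`, `SEVERITY_RANK`, and the case-insensitive substring matching are assumptions about how the lookup might work.

```python
# Severity ordering used to pick the strongest hit (assumed representation).
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}

# Specific agent-directed forms at High, bare catch-all at Low.
NEEDLES = [
    ("you must not", "high"),
    ("you must call", "high"),
    ("you must always", "high"),
    ("you must", "low"),
]


def classify(description):
    """Return the highest-severity (needle, severity) hit, or None if nothing matches."""
    # Lowercase and collapse whitespace so "You MUST  call" still matches.
    text = " ".join(description.lower().split())
    hits = [(needle, sev) for needle, sev in NEEDLES if needle in text]
    if not hits:
        return None
    return max(hits, key=lambda hit: SEVERITY_RANK[hit[1]])


print(classify("you must provide a list of absolute file paths"))
print(classify("You MUST call this function before Query Documentation"))
```

Because the bare needle also matches every specific form, the result takes the maximum severity across all hits, so the catch-all never downgrades a specific match.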

Also scope MIN_VOCAB_OVERLAP to #[cfg(test)] to fix dead_code clippy warning.

https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih