feat(bench): add registry scanner — live tool enumeration from real MCP servers #63
Open
ksek87 wants to merge 4 commits into
…CP servers
bench/scan_registry.py spawns real npm MCP packages via npx, runs the MCP
initialize + tools/list handshake over stdio, collects tool definitions, and
pipes the combined JSON through fuzzd scan.
Default set: 10 public servers requiring no API keys
@modelcontextprotocol/server-{fetch,memory,time,filesystem,sequentialthinking,everything}
mcp-server-git, mcp-server-sqlite-npx
@executeautomation/playwright-mcp-server
@upstash/context7-mcp
First run (initial npx download): ~2-5 min. Subsequent runs use npx cache.
Usage:
python3 bench/scan_registry.py [--output FILE] [--no-scan] [--timeout N]
https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
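The initialize + tools/list handshake described above can be sketched roughly as follows. This is a minimal illustration, not the code in bench/scan_registry.py: the function names `handshake_messages` and `extract_tools`, the client name, and the protocol version string are assumptions made for the example. It only builds the newline-delimited JSON-RPC lines a client would write to the server's stdin and picks the tool list out of the lines read back from stdout.

```python
import json

# Assumption: placeholder protocol version; the real script may send another.
PROTOCOL_VERSION = "2024-11-05"


def handshake_messages(client_name="registry-scan-sketch"):
    """Return the JSON-RPC lines a client writes to the server's stdin."""
    messages = [
        # 1. initialize request (expects a response with the same id)
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.0.0"},
            },
        },
        # 2. initialized notification (no id, so no response is expected)
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. tools/list request
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
    return [json.dumps(m) + "\n" for m in messages]


def extract_tools(stdout_lines, request_id=2):
    """Return the tools array from the tools/list response, or [] if absent."""
    for line in stdout_lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise printed by a server
        if isinstance(msg, dict) and msg.get("id") == request_id and "result" in msg:
            return msg["result"].get("tools", [])
    return []
```

In a live run these lines would be written to a child process started through npx (for example with `subprocess.Popen` and piped stdin/stdout), with a timeout around the read; that part is omitted here so the sketch stays runnable without network access.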
…e TF-IDF threshold for schema fields
Three structural changes to reduce false positives on real-world MCP servers
without impacting MCPTox detection rate (verified: 440/485 unchanged):
1. "silently" needle: bare "silently" (High) → 12 specific "silently <verb>"
needles (High) + bare "silently" catch-all at Low. Eliminates the "succeed
silently" FP on mcp-filesystem/create_directory while preserving the one
attack hit ("silently add a relation" → still High via "silently add").
2. "deprecated" needle: "deprecated" (Medium) → "is deprecated" + "being
deprecated" predicate forms (Medium). Attack pattern is always "[X] is
deprecated, use [attacker tool]". Eliminates "DEPRECATED: Use read_text_file
instead" FP (self-deprecation label) while preserving all 4 attack hits.
3. TF-IDF schema field threshold: scan_schema now calls scan_all_passes_schema
which passes min_vocab_overlap=4 instead of 2 to scan_tfidf_with. Terse
schema parameter descriptions ("Additional headers to include") share
incidental vocabulary with attack archetypes; stricter overlap requirement
eliminates playwright HTTP method FPs (cosine 0.38, vocab_hits=2→suppressed).
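The effect of raising the overlap requirement in item 3 can be illustrated with a small gate in front of the similarity check. This is a sketch under assumptions: the tokenizer, the function names, and the archetype vocabulary below are hypothetical and are not taken from fuzzd, which is not shown in this PR description. Only the thresholds 2 and 4 and the example description come from the commit message.

```python
import re


def vocab_hits(description, archetype_vocab):
    """Count distinct description tokens that also occur in the archetype vocabulary."""
    tokens = set(re.findall(r"[a-z0-9_]+", description.lower()))
    return len(tokens & archetype_vocab)


def tfidf_match_allowed(description, archetype_vocab, min_vocab_overlap):
    """Gate: only let a TF-IDF match through if enough vocabulary is shared."""
    return vocab_hits(description, archetype_vocab) >= min_vocab_overlap


# Hypothetical archetype vocabulary, chosen so the example shares two terms.
ARCHETYPE_VOCAB = {"include", "headers", "credentials", "send", "secret", "exfiltrate"}

desc = "Additional headers to include"
print(vocab_hits(desc, ARCHETYPE_VOCAB))              # shared terms: headers, include
print(tfidf_match_allowed(desc, ARCHETYPE_VOCAB, 2))  # description-field threshold
print(tfidf_match_allowed(desc, ARCHETYPE_VOCAB, 4))  # schema-field threshold
```

With two shared terms the terse parameter description passes a threshold of 2 but is suppressed at 4, which matches the behaviour described for the playwright schema fields.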
Also adds 13 more servers to scan_registry.py (23 total).
Research basis: false positive analysis against 76 real-world tools enumerated
live from 6 public npm MCP packages (first run of scan_registry.py).
https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
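The "silently" and "deprecated" changes amount to replacing one broad needle with narrower phrases plus, for "silently", a low-severity catch-all. A rough sketch of that tiering, assuming case-insensitive substring matching and a report of the worst severity found (both are assumptions, as is the name `worst_severity`). Only "silently add" is listed here because the other eleven "silently <verb>" needles are not named in the commit message.

```python
# (needle, severity) pairs, all lowercase.
NEEDLES = [
    ("silently add", "high"),        # specific form named in the commit message
    ("silently", "low"),             # bare catch-all, demoted from high
    ("is deprecated", "medium"),     # predicate form
    ("being deprecated", "medium"),  # predicate form
]

RANK = {"low": 1, "medium": 2, "high": 3}


def worst_severity(text):
    """Return the highest severity among matching needles, or None if none match."""
    lowered = text.lower()
    hits = [severity for needle, severity in NEEDLES if needle in lowered]
    return max(hits, key=RANK.__getitem__) if hits else None


print(worst_severity("succeed silently"))                         # catch-all only
print(worst_severity("silently add a relation"))                  # specific needle wins
print(worst_severity("DEPRECATED: Use read_text_file instead"))   # label, not predicate
print(worst_severity("read_file is deprecated, use other_tool"))  # predicate form
```

The self-deprecation label no longer matches because it contains neither predicate form, while "X is deprecated, use Y" still does.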
…ndshake
Commented out rather than deleted so the reason is documented inline.
https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
…edle
Research finding (MCPTox; Perez & Ribeiro 2022): 'you must' is structurally
ambiguous — 'you must provide a list of paths' is caller-directed parameter
documentation (benign), while 'you must call X' and 'you must not tell' are
agent-directed behavioral overrides (attacks).
Apply the same refactoring used for 'silently' and 'deprecated':
- Add 'you must not' (High) — negation / concealment directive
- Add 'you must call' (High) — cross-tool call mandate (MCPTox Prerequisite)
- Add 'you must always' (High) — universal behavioral override
- Lower bare 'you must' to Low — catch-all for parameter requirements
Result: ESLint 'lint-files' FP drops from [high] to [low] ('you must provide
a list of absolute file paths'); context7 cross-tool mandates ('You MUST call
this function before Query Documentation') correctly remain [high] via the new
'you must call' needle. MCPTox detection unchanged (410/455).
Also scope MIN_VOCAB_OVERLAP to #[cfg(test)] to fix dead_code clippy warning.
https://claude.ai/code/session_01G4f8mN9SeSHSGY1dWfFzih
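The 'you must' split can be checked against the two examples quoted above with an ordered needle list, where specific phrases are tried before the bare catch-all. This is an illustrative sketch, not fuzzd's matcher: the name `classify`, the first-match ordering, and the case-insensitive comparison are assumptions.

```python
# Specific needles come first so the bare catch-all only fires when none of them match.
YOU_MUST_NEEDLES = [
    ("you must not", "high"),     # negation / concealment directive
    ("you must call", "high"),    # cross-tool call mandate
    ("you must always", "high"),  # universal behavioral override
    ("you must", "low"),          # catch-all for parameter requirements
]


def classify(description):
    """Return (needle, severity) for the first matching needle, or None."""
    lowered = description.lower()
    for needle, severity in YOU_MUST_NEEDLES:
        if needle in lowered:
            return needle, severity
    return None


print(classify("you must provide a list of absolute file paths"))
print(classify("You MUST call this function before Query Documentation"))
```

The first description falls through to the low-severity catch-all, and the second is caught by the 'you must call' needle regardless of capitalization.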