@W-21082863 - feat(code-analyzer): cap query results at 10; add allowLargeResultSet override #367
base: main
Conversation
packages/mcp-provider-code-analyzer/src/tools/list_code_analyzer_rules.ts (review comment resolved)
packages/mcp-provider-code-analyzer/src/tools/query_code_analyzer_results.ts (review comment resolved, on outdated code)
`Query a Code Analyzer results JSON file and return filtered violations.\n` +
`Supports filters like severity, category/tag, engine, rule, and file name, plus top-N and sorting.\n` +
`Use this after running "run_code_analyzer" to read the generated results file.\n` +
`After completion, this tool will summarize and explain the filtered results to the user.\n` +
Unrelated: Backticks (`) support multiline strings without needing to concatenate strings or add newline characters (\n) (MDN docs).
This could be rewritten like so:
const DESCRIPTION: string = `Query a Code Analyzer results JSON file and return filtered violations.
Supports filters like severity, category/tag, engine, rule, and file name, plus top-N and sorting.
Use this after running "run_code_analyzer" to read the generated results file.
After completion, this tool will summarize and explain the filtered results to the user.
[...etc...]`;

resultsFile: z.string().describe("Absolute path to a results JSON file produced by the code analyzer., if results file is not provided, call run_code_analyzer tool to generate a results file first."),
selector: z.string().describe('Selector (same semantics as "list_code_analyzer_rules"): colon-separated tokens with optional OR-groups in parentheses, e.g., "Security:(pmd,eslint):High".'),
topN: z.number().int().positive().max(1000).default(5).describe("Return at most this many violations after filtering and sorting (default 5)."),
topN: z.number().int().positive().max(1000).default(5).describe(`Return at most this many violations after filtering and sorting (default 5). Requests >${DEFAULT_TOPN_POLICY_LIMIT} require allowLargeResultSet=true.`),
Should these two defaults be updated to use DEFAULT_TOPN_POLICY_LIMIT (which is currently 10) instead of a hard-coded 5?
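For illustration, the schema default could be tied to the constant so the default, description, and policy limit cannot drift apart. A minimal sketch, assuming DEFAULT_TOPN_POLICY_LIMIT is reachable from the schema module (it is declared inline here only to keep the sketch self-contained):

import { z } from "zod";

// Declared inline for the sketch; in the PR this constant lives in the policy code.
const DEFAULT_TOPN_POLICY_LIMIT = 10;

const topNSchema = z.number().int().positive().max(1000)
  .default(DEFAULT_TOPN_POLICY_LIMIT)
  .describe(`Return at most this many violations after filtering and sorting (default ${DEFAULT_TOPN_POLICY_LIMIT}).`);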
  allowLargeResultSet: boolean | undefined,
  limit: number = DEFAULT_TOPN_POLICY_LIMIT
): { ok: true } | { ok: false; message: string; code: PolicyErrorCode } {
  const value = topN ?? 5;
Same here: should this be a hard-coded 5 or DEFAULT_TOPN_POLICY_LIMIT (10)?
selector: z.string().describe('Selector (same semantics as "list_code_analyzer_rules"): colon-separated tokens with optional OR-groups in parentheses, e.g., "Security:(pmd,eslint):High".'),
topN: z.number().int().positive().max(1000).default(5).describe("Return at most this many violations after filtering and sorting (default 5)."),
topN: z.number().int().positive().max(1000).default(5).describe(`Return at most this many violations after filtering and sorting (default 5). Requests >${DEFAULT_TOPN_POLICY_LIMIT} require allowLargeResultSet=true.`),
allowLargeResultSet: z.boolean().optional().describe(`Explicit opt-in to request more than the top ${DEFAULT_TOPN_POLICY_LIMIT} violations. Defaults to false and is not recommended.`),
Could you explain why both the topN and allowLargeResultSet inputs are needed? You already have the topN input that defaults to 5 (or 10?). If a user asks for more results than the default, why do they also need to set a boolean to get what they asked for? Explicitly asking for more than 10 results feels like it should be enough.
For example, if I ran:
sf data query --query "SELECT Id, Name from Contact LIMIT 25"
I would find it odd if I got an error back saying:
You need to set --return-large-results to get more than 10 results
I already asked for 25, just give me 25.
You could update the description of the topN input to say something like "Never increase this number unless the user explicitly asks for it to be larger".
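A sketch of that single-knob alternative, with the guard moved entirely into the description (limit value and wording are illustrative, not from the PR):

import { z } from "zod";

// One input: the description tells the agent not to inflate topN on its own.
const topN = z.number().int().positive().max(1000).default(10)
  .describe(
    "Return at most this many violations after filtering and sorting (default 10). " +
    "Never increase this number unless the user explicitly asks for a larger result set."
  );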
if (truncated) {
  contentItems.push({
    type: "text",
    text: `Note: Showing only the first ${DEFAULT_TOPN_POLICY_LIMIT} violations. Set allowLargeResultSet=true to retrieve more.`
I think this might waste an LLM cycle because it does not mention that you also need to increase topN along with passing allowLargeResultSet: true.
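For example, the note could name both inputs so a follow-up call changes everything it needs to (illustrative wording only):

// Sketch: the constant mirrors the PR's DEFAULT_TOPN_POLICY_LIMIT of 10.
const DEFAULT_TOPN_POLICY_LIMIT = 10;
const truncationNote =
  `Note: Showing only the first ${DEFAULT_TOPN_POLICY_LIMIT} violations. ` +
  `To retrieve more, increase topN and set allowLargeResultSet=true.`;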
What does this PR do?
@W-21082863
Enforces a default cap of the top 10 violations returned by query_code_analyzer_results. Any request for larger result sets now requires explicit user opt-in to prevent excessive token consumption.

What issues does this PR fix or reference?

- Adds allowLargeResultSet?: boolean to the input schema (default: false)
- Rejects topN > 10 unless allowLargeResultSet === true

Why this change?
This makes token usage explicit and predictable, and clearly signals to users that querying large result sets can result in high token consumption.
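For readers skimming the thread, a self-contained sketch of how such an opt-in check could look. The parameter names and return shape follow the diff quoted in the review above; the function name, error-code value, and message wording are assumptions rather than the PR's actual implementation:

const DEFAULT_TOPN_POLICY_LIMIT = 10;
type PolicyErrorCode = "topNExceedsLimit"; // hypothetical value

// Hypothetical helper name; signature mirrors the snippet quoted in the review thread.
function checkTopNPolicy(
  topN: number | undefined,
  allowLargeResultSet: boolean | undefined,
  limit: number = DEFAULT_TOPN_POLICY_LIMIT
): { ok: true } | { ok: false; message: string; code: PolicyErrorCode } {
  const value = topN ?? 5; // default under discussion in the review thread
  if (value > limit && allowLargeResultSet !== true) {
    return {
      ok: false,
      code: "topNExceedsLimit",
      message: `topN=${value} exceeds the cap of ${limit}. Pass allowLargeResultSet=true to opt in.`,
    };
  }
  return { ok: true };
}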