Skip to content

Conversation

@aruntyagiTutu
Copy link
Contributor

@aruntyagiTutu aruntyagiTutu commented Feb 2, 2026

What does this PR do?

@W-21082863
Enforces a default cap of the top 10 violations returned by query_code_analyzer_results. Any request for larger result sets now requires explicit user opt-in to prevent excessive token consumption.


What issues does this PR fix or reference?

  • Adds allowLargeResultSet?: boolean to the input schema (default: false)
  • Rejects requests with topN > 10 unless allowLargeResultSet === true
  • Updates tool descriptions to clearly document the token consumption implications
  • Adds test coverage for both rejection and explicit allowance paths

Why this change?

This makes token usage explicit and predictable, and clearly signals to users that querying large result sets can result in high token consumption.

@aruntyagiTutu aruntyagiTutu requested a review from a team as a code owner February 2, 2026 07:48
@aruntyagiTutu aruntyagiTutu requested a review from a team as a code owner February 3, 2026 10:53
`Query a Code Analyzer results JSON file and return filtered violations.\n` +
`Supports filters like severity, category/tag, engine, rule, and file name, plus top-N and sorting.\n` +
`Use this after running "run_code_analyzer" to read the generated results file.\n` +
`After completion, this tool will summarize and explain the filtered results to the user.\n` +
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated: Backticks (`) support multiline without needing to concatenate strings or add newline characters (\n). MDN docs

This could be rewritten to like so:

const DESCRIPTION: string = `Query a Code Analyzer results JSON file and return filtered violations.
Supports filters like severity, category/tag, engine, rule, and file name, plus top-N and sorting.
Use this after running "run_code_analyzer" to read the generated results file.
After completion, this tool will summarize and explain the filtered results to the user.
[...etc...]`;

resultsFile: z.string().describe("Absolute path to a results JSON file produced by the code analyzer., if results file is not provided, call run_code_analyzer tool to generate a results file first."),
selector: z.string().describe('Selector (same semantics as "list_code_analyzer_rules"): colon-separated tokens with optional OR-groups in parentheses, e.g., "Security:(pmd,eslint):High".'),
topN: z.number().int().positive().max(1000).default(5).describe("Return at most this many violations after filtering and sorting (default 5)."),
topN: z.number().int().positive().max(1000).default(5).describe(`Return at most this many violations after filtering and sorting (default 5). Requests >${DEFAULT_TOPN_POLICY_LIMIT} require allowLargeResultSet=true.`),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these two defaults be updated to use DEFAULT_TOPN_POLICY_LIMIT (which is currently 10), instead of a hard coded 5?

allowLargeResultSet: boolean | undefined,
limit: number = DEFAULT_TOPN_POLICY_LIMIT
): { ok: true } | { ok: false; message: string; code: PolicyErrorCode } {
const value = topN ?? 5;
Copy link
Collaborator

@iowillhoit iowillhoit Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should this be hardcoded 5 or DEFAULT_TOPN_POLICY_LIMIT (10)?

selector: z.string().describe('Selector (same semantics as "list_code_analyzer_rules"): colon-separated tokens with optional OR-groups in parentheses, e.g., "Security:(pmd,eslint):High".'),
topN: z.number().int().positive().max(1000).default(5).describe("Return at most this many violations after filtering and sorting (default 5)."),
topN: z.number().int().positive().max(1000).default(5).describe(`Return at most this many violations after filtering and sorting (default 5). Requests >${DEFAULT_TOPN_POLICY_LIMIT} require allowLargeResultSet=true.`),
allowLargeResultSet: z.boolean().optional().describe(`Explicit opt-in to request more than the top ${DEFAULT_TOPN_POLICY_LIMIT} violations. Defaults to false and is not recommended.`),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why both the topN and allowLargeResultSet inputs are needed? You already have the topN input that defaults to 5 (or 10?). If a user asks for more results than the default, why do they also need to set a boolean to get what they asked for? Explicitly asking for more that 10 results feels like it should be enough.

For example, if I ran:
sf data query --query "SELECT Id, Name from Contact LIMIT 25"
I would find it odd if I got an error back saying:
You need to set --return-large-results to get more than 10 results
I already asked for 25, just give me 25.

You could update the description of the topN input to say something like "Never increase this number unless the users explicitly asks for it to be larger"

if (truncated) {
contentItems.push({
type: "text",
text: `Note: Showing only the first ${DEFAULT_TOPN_POLICY_LIMIT} violations. Set allowLargeResultSet=true to retrieve more.`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might waste an LLM cycle because it does not mentioned that you also need to increase topN along with passing allowLargeResultSet: true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants