feat: Science based system prompt overspec audit and mitigation #16174
Closed
micuintus wants to merge 8 commits into anomalyco:dev
Contributor
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
Contributor
Hey! Your PR title Please update it to start with one of:
Where See CONTRIBUTING.md for details.
36bbfa3 to c4538b1
added 8 commits March 5, 2026 12:47
Create script/audit-overspecification.ts that:
- Audits all session and agent prompt files
- Measures tokens (chars/4), directives (MUST/NEVER/ALWAYS/IMPORTANT/CRITICAL), and examples (XML and markdown patterns)
- Compares against thresholds by prompt type:
  - Provider: ≤1500 tokens, ≤12 directives, ≤5 examples
  - Utility: ≤200 tokens, ≤4 directives, ≤0 examples
  - Agent: ≤400 tokens, ≤6 directives, ≤3 examples
  - Meta: ≤800 tokens, ≤0 directives, ≤0 examples
- Outputs structured report to stdout, violations to stderr
- Runs in warning mode (exit 0) for CI integration
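The measurement and threshold check described in this commit message could look roughly like the sketch below. It is not the contents of script/audit-overspecification.ts: every function and type name, the example-detection patterns, the violation message format, and the `summary.txt` file name are assumptions made for illustration. Only the chars/4 estimate, the directive keywords, and the per-type limits come from the commit message.

```typescript
// Hypothetical sketch; names and patterns are illustrative assumptions.
type PromptType = "provider" | "utility" | "agent" | "meta";

interface Counts {
  tokens: number;
  directives: number;
  examples: number;
}

// Limits per prompt type, as listed in the commit message.
const LIMITS: Record<PromptType, Counts> = {
  provider: { tokens: 1500, directives: 12, examples: 5 },
  utility: { tokens: 200, directives: 4, examples: 0 },
  agent: { tokens: 400, directives: 6, examples: 3 },
  meta: { tokens: 800, directives: 0, examples: 0 },
};

// Rough token estimate: one token per four characters, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Count whole-word, upper-case directive keywords (case-sensitive).
function countDirectives(text: string): number {
  const hits = text.match(/\b(?:MUST|NEVER|ALWAYS|IMPORTANT|CRITICAL)\b/g);
  return hits ? hits.length : 0;
}

// Assumption: an example is an opening <example> tag, or a line that
// starts with a markdown heading or bold marker followed by "Example".
function countExamples(text: string): number {
  const xml = text.match(/<example\b[^>]*>/gi) ?? [];
  const markdown = text.match(/^(?:#{1,6}\s+|\*\*)Example\b/gim) ?? [];
  return xml.length + markdown.length;
}

function measure(text: string): Counts {
  return {
    tokens: estimateTokens(text),
    directives: countDirectives(text),
    examples: countExamples(text),
  };
}

// Return one message per metric that exceeds the limit for this prompt type.
function audit(name: string, type: PromptType, text: string): string[] {
  const limits = LIMITS[type];
  const counts = measure(text);
  const violations: string[] = [];
  for (const key of ["tokens", "directives", "examples"] as const) {
    if (counts[key] > limits[key]) {
      violations.push(`${name}: ${key} ${counts[key]} > ${limits[key]}`);
    }
  }
  return violations;
}

// Usage: warning mode reports violations on stderr without failing the run.
const sample = "You MUST be concise. NEVER guess.\n<example>hi</example>";
const violations = audit("summary.txt", "utility", sample);
for (const line of violations) console.error(line);
```

Keeping the counting functions pure and separate from the reporting step means the same checks could later be switched from warning mode to a failing exit code without touching the measurement logic.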
Reduce token count from 3843 to ~2400 by:
- Remove New Applications section (~500 tokens)
- Remove 3 verbose examples (Delete temp, Write tests, Find config)
- Keep 5 most illustrative examples
- Remove redundant tool usage instructions
Reduce token count from 2770 to ~1200 by:
- Consolidate repetitive 'keep going' directives
- Simplify 10-step workflow to 9 concise steps
- Remove Reading Files section (redundant)
- Remove Writing Prompts section (not relevant)
- Compress Communication Guidelines
- Keep Memory section as flagged for review
Reduce token count from 2425 to ~1700 by:
- Remove duplicate IMPORTANT statements about conciseness
- Reduce examples from 18 to 7
- Remove redundant tool usage instructions
- Consolidate security warnings
Reduce token count from 1937 to ~1400 by:
- Reduce examples from 18 to 7
- Remove duplicate IMPORTANT statements
- Consolidate verbosity guidelines
Remove verbose task management examples and explanations. Streamline tool usage policy section. Keep core functionality intact.
Reduce examples from 10 to 5. Bring token count under 400 threshold.
Add npm script to run prompt overspecification audit. Usage: bun run audit:prompts
c4538b1 to d0af084
Contributor
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Issue for this PR
Closes #16170
Type of change
What does this PR do?
How did you verify your code works?
Testing