feat: add qa-changes plugin for automated PR QA validation#135
Draft
feat: add qa-changes plugin for automated PR QA validation#135
Conversation
Add a new plugin that goes beyond code review by actually running the code to verify PR changes work as described. Plugin structure: - skills/qa-changes/SKILL.md: Generic QA methodology skill - plugins/qa-changes/action.yml: Composite GitHub Action - plugins/qa-changes/scripts/agent_script.py: Main QA agent - plugins/qa-changes/scripts/prompt.py: Prompt template - plugins/qa-changes/workflows/: Example workflow file - plugins/qa-changes/README.md: Documentation The QA agent follows a five-phase methodology: 1. Understand the change (classify diff) 2. Set up the environment (install deps, build) 3. Run the test suite (establish baseline, detect regressions) 4. Exercise changed behavior (manually test features/fixes) 5. Report results (structured PR comment with verdict) Co-authored-by: openhands <openhands@all-hands.dev>
…ful failure Key changes to the QA skill: - Merge Setup + Test into one phase; check CI status first, only run tests CI doesn't cover - Raise the bar for Exercise phase: frontend changes must use a real browser (Playwright/browser automation), CLI changes must run the actual CLI, API changes must make real HTTP requests - Add specific guidance per change type (frontend, CLI, API, bug fix, library, refactor, config) - Add 'Knowing When to Give Up' section: three attempts per approach, two approaches max, then report honestly and suggest AGENTS.md guidance - Add PARTIAL verdict for when some behavior could not be verified - Update prompt, README to match new four-phase methodology Co-authored-by: openhands <openhands@all-hands.dev>
OpenHands SDK performs best with tmux available for terminal management. Co-authored-by: openhands <openhands@all-hands.dev>
Add .plugin/plugin.json manifest and update agent_script.py to load the qa-changes plugin via the SDK's Plugin system. This properly loads skills, hooks, and MCP config bundled in the plugin directory. Previously the script only loaded project skills via load_project_skills() and missed the plugin's own skills entirely. See #136 for the same issue in the pr-review plugin. Co-authored-by: openhands <openhands@all-hands.dev>
…arketplace entry - Enable browser tools (enable_browser=True) so the QA agent can actually verify UI changes in a real browser, matching the SKILL.md methodology - Switch workflow from pull_request_target to pull_request to avoid executing untrusted fork code with the base repo's secrets - Isolate untrusted PR body in the prompt with an explicit warning to mitigate prompt injection - Add qa-changes skill to marketplaces/default.json (required by CI) - Add comments explaining tmux (OpenHands runtime) and gh dependencies - Update README security section to reflect the pull_request change Co-authored-by: openhands <openhands@all-hands.dev>
- Add max-budget ($10 default), timeout-minutes (30 default), and max-iterations (200 default) as action inputs - Enforce budget via a Conversation callback that raises BudgetExceeded when accumulated LLM cost exceeds the limit - Enforce timeout via GHA step-level timeout-minutes - Enforce iteration cap via SDK's max_iteration_per_run parameter - Pass all three values through as env vars (MAX_BUDGET, MAX_ITERATIONS) and document them - Add tests for format_prompt, truncate_diff, and validate_environment (20 tests covering fields, edge cases, defaults, and custom overrides) - Add missing skills/qa-changes/README.md (required by CI) - Update README action inputs table with new parameters Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
…ll timeout GitHub Actions does not support `timeout-minutes` on steps inside composite actions (only at the job level). The `Set up job` step fails with: 'Unexpected value timeout-minutes'. Replace with the coreutils `timeout` command, passing the `timeout-minutes` input via an environment variable and converting to seconds in the shell. Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
- Add lmnr-api-key input to action.yml with env var and --with lmnr - Add Laminar trace artifact upload step - Add save_trace_context() to agent_script.py for trace persistence - Create evaluate_qa_changes.py for post-close evaluation - Create qa-changes-evaluation.yml workflow template - Update workflow template to pass lmnr-api-key Co-authored-by: openhands <openhands@all-hands.dev>
Test extract_qa_report, extract_human_responses, truncate_text, calculate_engagement_score, and load_trace_info. Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Switch from posting QA results as a plain PR comment (gh pr comment) to posting them as a GitHub code review thread using the /github-pr-review skill. The agent now: - Triggers both /qa-changes and /github-pr-review skills - Posts a structured review body with the full QA report - Adds inline review comments on specific lines for issues found - Uses priority labels (🔴🟠🟡🟢) from the github-pr-review skill - Bundles everything into a single review API call Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Add symlink skills/github-pr-review -> ../../../skills/github-pr-review so the skill is explicitly available to the agent, matching the pattern used by the pr-review plugin. Co-authored-by: openhands <openhands@all-hands.dev>
Update the QA skill and prompt to produce more scannable reports: - Verdict + one-sentence summary at the top for instant readability - Status table gives at-a-glance phase results - All evidence (code snippets, logs, command output) goes inside HTML <details> collapsible blocks - Explicit formatting rules: no repetition across sections, omit empty sections, issues always visible - Prompt reinforces compact format and collapsible evidence Motivated by verbose QA reports on PRs like #2798 in software-agent-sdk where long inline evidence made the report hard to scan. Co-authored-by: openhands <openhands@all-hands.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add a new
qa-changesplugin that goes beyond code review by actually running the code to verify PR changes work as described. While the existingpr-reviewplugin reads diffs and posts inline code comments, this plugin sets up the environment, runs the test suite, exercises changed behavior, and posts a structured QA report.Plugin Structure
Five-Phase QA Methodology
The skill defines a generic, language-agnostic methodology:
How It Differs from PR Review
Usage
Triggers:
qa-thislabel oropenhands-agentreviewer request.Design Decisions
AGENTS.mdor custom skills.agent_script.pyandaction.ymlfollow the same patterns aspr-reviewfor consistency, but the prompt and skill are completely different.FIRST_TIME_CONTRIBUTORandNONEfrom automatic triggers since QA executes code.Related
This PR was created by an AI assistant (OpenHands).
@xingyaoww can click here to continue refining the PR