feat: add qa-changes plugin for automated PR QA validation by xingyaoww · Pull Request #135 · OpenHands/extensions

xingyaoww · 2026-04-02T18:05:08Z

Summary

Add a new qa-changes plugin that goes beyond code review by actually running the code to verify PR changes work as described. While the existing pr-review plugin reads diffs and posts inline code comments, this plugin sets up the environment, runs the test suite, exercises changed behavior, and posts a structured QA report.

Plugin Structure

skills/qa-changes/SKILL.md              # Generic QA methodology skill
plugins/qa-changes/
├── README.md                           # Plugin documentation
├── action.yml                          # Composite GitHub Action
├── scripts/
│   ├── agent_script.py                 # Main QA agent script
│   └── prompt.py                       # Prompt template
├── skills/
│   └── qa-changes -> ../../../skills/qa-changes
└── workflows/
    └── qa-changes-by-openhands.yml     # Example workflow

Five-Phase QA Methodology

The skill defines a generic, language-agnostic methodology:

Understand — Read the diff, classify changes (new feature, bug fix, refactor, config/docs)
Setup — Bootstrap the repo: install deps, build, establish test baseline
Test — Run the test suite, record pass/fail counts, detect regressions
Exercise — Go beyond tests: execute new features, reproduce fixed bugs, try edge cases
Report — Post structured PR comment with evidence and verdict (PASS / PASS WITH ISSUES / FAIL)

How It Differs from PR Review

Aspect	PR Review	QA Changes
Method	Reads the diff	Runs the code
Speed	2-3 minutes	5-15 minutes
Catches	Style, security, logic issues	Regressions, broken features, build failures
Output	Inline code comments	Structured QA report with evidence

Usage

- name: Run QA Changes
  uses: OpenHands/extensions/plugins/qa-changes@main
  with:
    llm-model: anthropic/claude-sonnet-4-5-20250929
    llm-api-key: ${{ secrets.LLM_API_KEY }}
    github-token: ${{ secrets.GITHUB_TOKEN }}

Triggers: qa-this label or openhands-agent reviewer request.

Design Decisions

Generic skill: The SKILL.md is intentionally language/framework-agnostic. It teaches the agent how to think about QA, not specific commands. Project-specific details come from AGENTS.md or custom skills.
Structured from pr-review: The agent_script.py and action.yml follow the same patterns as pr-review for consistency, but the prompt and skill are completely different.
Security: Excludes FIRST_TIME_CONTRIBUTOR and NONE from automatic triggers since QA executes code.

Add a new plugin that goes beyond code review by actually running the code to verify PR changes work as described. Plugin structure: - skills/qa-changes/SKILL.md: Generic QA methodology skill - plugins/qa-changes/action.yml: Composite GitHub Action - plugins/qa-changes/scripts/agent_script.py: Main QA agent - plugins/qa-changes/scripts/prompt.py: Prompt template - plugins/qa-changes/workflows/: Example workflow file - plugins/qa-changes/README.md: Documentation The QA agent follows a five-phase methodology: 1. Understand the change (classify diff) 2. Set up the environment (install deps, build) 3. Run the test suite (establish baseline, detect regressions) 4. Exercise changed behavior (manually test features/fixes) 5. Report results (structured PR comment with verdict) Co-authored-by: openhands <openhands@all-hands.dev>

…ful failure Key changes to the QA skill: - Merge Setup + Test into one phase; check CI status first, only run tests CI doesn't cover - Raise the bar for Exercise phase: frontend changes must use a real browser (Playwright/browser automation), CLI changes must run the actual CLI, API changes must make real HTTP requests - Add specific guidance per change type (frontend, CLI, API, bug fix, library, refactor, config) - Add 'Knowing When to Give Up' section: three attempts per approach, two approaches max, then report honestly and suggest AGENTS.md guidance - Add PARTIAL verdict for when some behavior could not be verified - Update prompt, README to match new four-phase methodology Co-authored-by: openhands <openhands@all-hands.dev>

OpenHands SDK performs best with tmux available for terminal management. Co-authored-by: openhands <openhands@all-hands.dev>

Add .plugin/plugin.json manifest and update agent_script.py to load the qa-changes plugin via the SDK's Plugin system. This properly loads skills, hooks, and MCP config bundled in the plugin directory. Previously the script only loaded project skills via load_project_skills() and missed the plugin's own skills entirely. See #136 for the same issue in the pr-review plugin. Co-authored-by: openhands <openhands@all-hands.dev>

…arketplace entry - Enable browser tools (enable_browser=True) so the QA agent can actually verify UI changes in a real browser, matching the SKILL.md methodology - Switch workflow from pull_request_target to pull_request to avoid executing untrusted fork code with the base repo's secrets - Isolate untrusted PR body in the prompt with an explicit warning to mitigate prompt injection - Add qa-changes skill to marketplaces/default.json (required by CI) - Add comments explaining tmux (OpenHands runtime) and gh dependencies - Update README security section to reflect the pull_request change Co-authored-by: openhands <openhands@all-hands.dev>

- Add max-budget ($10 default), timeout-minutes (30 default), and max-iterations (200 default) as action inputs - Enforce budget via a Conversation callback that raises BudgetExceeded when accumulated LLM cost exceeds the limit - Enforce timeout via GHA step-level timeout-minutes - Enforce iteration cap via SDK's max_iteration_per_run parameter - Pass all three values through as env vars (MAX_BUDGET, MAX_ITERATIONS) and document them - Add tests for format_prompt, truncate_diff, and validate_environment (20 tests covering fields, edge cases, defaults, and custom overrides) - Add missing skills/qa-changes/README.md (required by CI) - Update README action inputs table with new parameters Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

…ll timeout GitHub Actions does not support `timeout-minutes` on steps inside composite actions (only at the job level). The `Set up job` step fails with: 'Unexpected value timeout-minutes'. Replace with the coreutils `timeout` command, passing the `timeout-minutes` input via an environment variable and converting to seconds in the shell. Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

- Add lmnr-api-key input to action.yml with env var and --with lmnr - Add Laminar trace artifact upload step - Add save_trace_context() to agent_script.py for trace persistence - Create evaluate_qa_changes.py for post-close evaluation - Create qa-changes-evaluation.yml workflow template - Update workflow template to pass lmnr-api-key Co-authored-by: openhands <openhands@all-hands.dev>

Test extract_qa_report, extract_human_responses, truncate_text, calculate_engagement_score, and load_trace_info. Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

Switch from posting QA results as a plain PR comment (gh pr comment) to posting them as a GitHub code review thread using the /github-pr-review skill. The agent now: - Triggers both /qa-changes and /github-pr-review skills - Posts a structured review body with the full QA report - Adds inline review comments on specific lines for issues found - Uses priority labels (🔴🟠🟡🟢) from the github-pr-review skill - Bundles everything into a single review API call Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

Add symlink skills/github-pr-review -> ../../../skills/github-pr-review so the skill is explicitly available to the agent, matching the pattern used by the pr-review plugin. Co-authored-by: openhands <openhands@all-hands.dev>

Update the QA skill and prompt to produce more scannable reports: - Verdict + one-sentence summary at the top for instant readability - Status table gives at-a-glance phase results - All evidence (code snippets, logs, command output) goes inside HTML <details> collapsible blocks - Explicit formatting rules: no repetition across sections, omit empty sections, issues always visible - Prompt reinforces compact format and collapsible evidence Motivated by verbose QA reports on PRs like #2798 in software-agent-sdk where long inline evidence made the report hard to scan. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww mentioned this pull request Apr 2, 2026

docs: add QA Changes use case and SDK workflow guide OpenHands/docs#431

Draft

openhands-agent added 2 commits April 2, 2026 18:13

fix: add tmux as system dependency in action.yml

1c3516e

OpenHands SDK performs best with tmux available for terminal management. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww mentioned this pull request Apr 2, 2026

pr-review plugin: agent_script.py should use Plugin.load() instead of manual skill loading #136

Closed

openhands-agent added 13 commits April 2, 2026 18:21

chore: bump default max-iterations from 200 to 500

f6b6365

Co-authored-by: openhands <openhands@all-hands.dev>

chore: remove accidentally committed uv.lock

6d4cbae

Co-authored-by: openhands <openhands@all-hands.dev>

test: add tests for evaluate_qa_changes.py

fd9259c

Test extract_qa_report, extract_human_responses, truncate_text, calculate_engagement_score, and load_trace_info. Co-authored-by: openhands <openhands@all-hands.dev>

chore: remove accidentally committed uv.lock

1938a52

Co-authored-by: openhands <openhands@all-hands.dev>

chore: add uv.lock to .gitignore and remove from tracking

ea104ea

Co-authored-by: openhands <openhands@all-hands.dev>

fix(qa-changes): link github-pr-review skill to plugin

91189b8

Add symlink skills/github-pr-review -> ../../../skills/github-pr-review so the skill is explicitly available to the agent, matching the pattern used by the pr-review plugin. Co-authored-by: openhands <openhands@all-hands.dev>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add qa-changes plugin for automated PR QA validation#135

feat: add qa-changes plugin for automated PR QA validation#135
xingyaoww wants to merge 16 commits intomainfrom
feat/qa-changes-plugin

xingyaoww commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xingyaoww commented Apr 2, 2026

Summary

Plugin Structure

Five-Phase QA Methodology

How It Differs from PR Review

Usage

Design Decisions

Related

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants