Skip to content

feat: add qa-changes plugin for automated PR QA validation#135

Draft
xingyaoww wants to merge 16 commits intomainfrom
feat/qa-changes-plugin
Draft

feat: add qa-changes plugin for automated PR QA validation#135
xingyaoww wants to merge 16 commits intomainfrom
feat/qa-changes-plugin

Conversation

@xingyaoww
Copy link
Copy Markdown
Contributor

Summary

Add a new qa-changes plugin that goes beyond code review by actually running the code to verify PR changes work as described. While the existing pr-review plugin reads diffs and posts inline code comments, this plugin sets up the environment, runs the test suite, exercises changed behavior, and posts a structured QA report.

Plugin Structure

skills/qa-changes/SKILL.md              # Generic QA methodology skill
plugins/qa-changes/
├── README.md                           # Plugin documentation
├── action.yml                          # Composite GitHub Action
├── scripts/
│   ├── agent_script.py                 # Main QA agent script
│   └── prompt.py                       # Prompt template
├── skills/
│   └── qa-changes -> ../../../skills/qa-changes
└── workflows/
    └── qa-changes-by-openhands.yml     # Example workflow

Five-Phase QA Methodology

The skill defines a generic, language-agnostic methodology:

  1. Understand — Read the diff, classify changes (new feature, bug fix, refactor, config/docs)
  2. Setup — Bootstrap the repo: install deps, build, establish test baseline
  3. Test — Run the test suite, record pass/fail counts, detect regressions
  4. Exercise — Go beyond tests: execute new features, reproduce fixed bugs, try edge cases
  5. Report — Post structured PR comment with evidence and verdict (PASS / PASS WITH ISSUES / FAIL)

How It Differs from PR Review

Aspect PR Review QA Changes
Method Reads the diff Runs the code
Speed 2-3 minutes 5-15 minutes
Catches Style, security, logic issues Regressions, broken features, build failures
Output Inline code comments Structured QA report with evidence

Usage

- name: Run QA Changes
  uses: OpenHands/extensions/plugins/qa-changes@main
  with:
    llm-model: anthropic/claude-sonnet-4-5-20250929
    llm-api-key: ${{ secrets.LLM_API_KEY }}
    github-token: ${{ secrets.GITHUB_TOKEN }}

Triggers: qa-this label or openhands-agent reviewer request.

Design Decisions

  • Generic skill: The SKILL.md is intentionally language/framework-agnostic. It teaches the agent how to think about QA, not specific commands. Project-specific details come from AGENTS.md or custom skills.
  • Structured from pr-review: The agent_script.py and action.yml follow the same patterns as pr-review for consistency, but the prompt and skill are completely different.
  • Security: Excludes FIRST_TIME_CONTRIBUTOR and NONE from automatic triggers since QA executes code.

Related

This PR was created by an AI assistant (OpenHands).

@xingyaoww can click here to continue refining the PR

Add a new plugin that goes beyond code review by actually running
the code to verify PR changes work as described.

Plugin structure:
- skills/qa-changes/SKILL.md: Generic QA methodology skill
- plugins/qa-changes/action.yml: Composite GitHub Action
- plugins/qa-changes/scripts/agent_script.py: Main QA agent
- plugins/qa-changes/scripts/prompt.py: Prompt template
- plugins/qa-changes/workflows/: Example workflow file
- plugins/qa-changes/README.md: Documentation

The QA agent follows a five-phase methodology:
1. Understand the change (classify diff)
2. Set up the environment (install deps, build)
3. Run the test suite (establish baseline, detect regressions)
4. Exercise changed behavior (manually test features/fixes)
5. Report results (structured PR comment with verdict)

Co-authored-by: openhands <openhands@all-hands.dev>
…ful failure

Key changes to the QA skill:
- Merge Setup + Test into one phase; check CI status first, only run
  tests CI doesn't cover
- Raise the bar for Exercise phase: frontend changes must use a real
  browser (Playwright/browser automation), CLI changes must run the
  actual CLI, API changes must make real HTTP requests
- Add specific guidance per change type (frontend, CLI, API, bug fix,
  library, refactor, config)
- Add 'Knowing When to Give Up' section: three attempts per approach,
  two approaches max, then report honestly and suggest AGENTS.md guidance
- Add PARTIAL verdict for when some behavior could not be verified
- Update prompt, README to match new four-phase methodology

Co-authored-by: openhands <openhands@all-hands.dev>
OpenHands SDK performs best with tmux available for terminal management.

Co-authored-by: openhands <openhands@all-hands.dev>
Add .plugin/plugin.json manifest and update agent_script.py to load the
qa-changes plugin via the SDK's Plugin system. This properly loads skills,
hooks, and MCP config bundled in the plugin directory.

Previously the script only loaded project skills via load_project_skills()
and missed the plugin's own skills entirely.

See #136 for the same issue in the pr-review plugin.

Co-authored-by: openhands <openhands@all-hands.dev>
…arketplace entry

- Enable browser tools (enable_browser=True) so the QA agent can
  actually verify UI changes in a real browser, matching the SKILL.md
  methodology
- Switch workflow from pull_request_target to pull_request to avoid
  executing untrusted fork code with the base repo's secrets
- Isolate untrusted PR body in the prompt with an explicit warning
  to mitigate prompt injection
- Add qa-changes skill to marketplaces/default.json (required by CI)
- Add comments explaining tmux (OpenHands runtime) and gh dependencies
- Update README security section to reflect the pull_request change

Co-authored-by: openhands <openhands@all-hands.dev>
- Add max-budget ($10 default), timeout-minutes (30 default), and
  max-iterations (200 default) as action inputs
- Enforce budget via a Conversation callback that raises BudgetExceeded
  when accumulated LLM cost exceeds the limit
- Enforce timeout via GHA step-level timeout-minutes
- Enforce iteration cap via SDK's max_iteration_per_run parameter
- Pass all three values through as env vars (MAX_BUDGET, MAX_ITERATIONS)
  and document them
- Add tests for format_prompt, truncate_diff, and validate_environment
  (20 tests covering fields, edge cases, defaults, and custom overrides)
- Add missing skills/qa-changes/README.md (required by CI)
- Update README action inputs table with new parameters

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
…ll timeout

GitHub Actions does not support `timeout-minutes` on steps inside
composite actions (only at the job level). The `Set up job` step
fails with: 'Unexpected value timeout-minutes'.

Replace with the coreutils `timeout` command, passing the
`timeout-minutes` input via an environment variable and converting
to seconds in the shell.

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
- Add lmnr-api-key input to action.yml with env var and --with lmnr
- Add Laminar trace artifact upload step
- Add save_trace_context() to agent_script.py for trace persistence
- Create evaluate_qa_changes.py for post-close evaluation
- Create qa-changes-evaluation.yml workflow template
- Update workflow template to pass lmnr-api-key

Co-authored-by: openhands <openhands@all-hands.dev>
Test extract_qa_report, extract_human_responses, truncate_text,
calculate_engagement_score, and load_trace_info.

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Switch from posting QA results as a plain PR comment (gh pr comment)
to posting them as a GitHub code review thread using the
/github-pr-review skill. The agent now:
- Triggers both /qa-changes and /github-pr-review skills
- Posts a structured review body with the full QA report
- Adds inline review comments on specific lines for issues found
- Uses priority labels (🔴🟠🟡🟢) from the github-pr-review skill
- Bundles everything into a single review API call

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Add symlink skills/github-pr-review -> ../../../skills/github-pr-review
so the skill is explicitly available to the agent, matching the pattern
used by the pr-review plugin.

Co-authored-by: openhands <openhands@all-hands.dev>
Update the QA skill and prompt to produce more scannable reports:

- Verdict + one-sentence summary at the top for instant readability
- Status table gives at-a-glance phase results
- All evidence (code snippets, logs, command output) goes inside
  HTML <details> collapsible blocks
- Explicit formatting rules: no repetition across sections, omit
  empty sections, issues always visible
- Prompt reinforces compact format and collapsible evidence

Motivated by verbose QA reports on PRs like #2798 in software-agent-sdk
where long inline evidence made the report hard to scan.

Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants