From 38f3cdc45a23e9acfb119ef44906e094db14b8b7 Mon Sep 17 00:00:00 2001
From: openhands
Date: Thu, 2 Apr 2026 18:04:29 +0000
Subject: [PATCH 1/2] docs: add QA Changes use case and SDK workflow guide

Add documentation for the new QA Changes plugin:
- openhands/usage/use-cases/qa-changes.mdx: Use case page with overview,
  quick start, customization, and troubleshooting
- sdk/guides/github-workflows/qa-changes.mdx: SDK guide with reference
  workflow, action inputs, and skill customization
- docs.json: Add both pages to navigation

Co-authored-by: openhands
---
 docs.json                                  |   2 +
 openhands/usage/use-cases/qa-changes.mdx   | 239 +++++++++++++++++++++
 sdk/guides/github-workflows/qa-changes.mdx | 167 ++++++++++++++
 3 files changed, 408 insertions(+)
 create mode 100644 openhands/usage/use-cases/qa-changes.mdx
 create mode 100644 sdk/guides/github-workflows/qa-changes.mdx

diff --git a/docs.json b/docs.json
index ca8084359..9b4904434 100644
--- a/docs.json
+++ b/docs.json
@@ -213,6 +213,7 @@
             "pages": [
               "openhands/usage/use-cases/vulnerability-remediation",
               "openhands/usage/use-cases/code-review",
+              "openhands/usage/use-cases/qa-changes",
               "openhands/usage/use-cases/incident-triage",
               "openhands/usage/use-cases/cobol-modernization",
               "openhands/usage/use-cases/dependency-upgrades",
@@ -307,6 +308,7 @@
             "pages": [
               "sdk/guides/github-workflows/assign-reviews",
               "sdk/guides/github-workflows/pr-review",
+              "sdk/guides/github-workflows/qa-changes",
               "sdk/guides/github-workflows/todo-management"
             ]
           }
diff --git a/openhands/usage/use-cases/qa-changes.mdx b/openhands/usage/use-cases/qa-changes.mdx
new file mode 100644
index 000000000..9f457a2ef
--- /dev/null
+++ b/openhands/usage/use-cases/qa-changes.mdx
@@ -0,0 +1,239 @@
+---
+title: Automated QA Validation
+description: Set up automated QA testing of PR changes using OpenHands and the Software Agent SDK
+---
+
+  Check out the complete QA Changes plugin with ready-to-use code and configuration.
+
+
+Automated code review catches style, security, and logic issues by reading diffs, but it cannot verify that a change *actually works*. The QA Changes workflow fills this gap by running the code: setting up the environment, executing the test suite, exercising changed behavior, and posting a structured report with evidence.
+
+## Overview
+
+The OpenHands QA Changes workflow is a GitHub Actions workflow that:
+
+- **Triggers automatically** when PRs are opened or when you request QA validation
+- **Sets up the full environment**: installs dependencies, builds the project
+- **Runs the test suite**: detects regressions introduced by the PR
+- **Exercises changed behavior**: manually tests new features, bug fixes, and edge cases
+- **Posts a structured QA report** as a PR comment with commands, outputs, and a verdict
+
+## How It Works
+
+The QA workflow uses the OpenHands Software Agent SDK to validate your code changes:
+
+1. **Trigger**: The workflow runs when:
+   - A new non-draft PR is opened
+   - A draft PR is marked as ready for review
+   - The `qa-this` label is added to a PR
+   - `openhands-agent` is requested as a reviewer
+
+2. **Validation**: The agent follows a five-phase methodology:
+   - **Understand**: Reads the diff and classifies changes (new feature, bug fix, refactor, config)
+   - **Setup**: Bootstraps the repository (installs dependencies, builds, establishes a test baseline)
+   - **Test**: Runs the existing test suite, records pass/fail counts, detects regressions
+   - **Exercise**: Goes beyond tests by executing new features, reproducing fixed bugs, and trying edge cases
+   - **Report**: Posts structured findings with evidence
+
+3. **Output**: A QA report is posted as a PR comment with:
+   - Environment setup status
+   - Test suite results (pass/fail counts, regressions)
+   - Functional verification evidence (commands run, outputs observed)
+   - Issues found (🔴 Blocker, 🟠 Issue, 🟡 Minor)
+   - Verdict (✅ PASS, ⚠️ PASS WITH ISSUES, ❌ FAIL)
+
+### Code Review vs QA Validation
+
+| Aspect | [Code Review](/openhands/usage/use-cases/code-review) | QA Validation |
+|--------|-------------|---------------|
+| **Method** | Reads the diff | Runs the code |
+| **Speed** | 2-3 minutes | 5-15 minutes |
+| **Catches** | Style, security, logic issues | Regressions, broken features, build failures |
+| **Output** | Inline code comments | Structured QA report with evidence |
+| **Best for** | Every PR | Feature PRs, bug fixes, risky changes |
+
+Both workflows complement each other. Use code review for fast feedback on every PR, and QA validation for thorough verification of changes that affect behavior.
+
+## Quick Start
+
+
+
+    Create `.github/workflows/qa-changes-by-openhands.yml` in your repository:
+
+    ```yaml
+    name: QA Changes by OpenHands
+
+    on:
+      pull_request_target:
+        types: [opened, ready_for_review, labeled, review_requested]
+
+    permissions:
+      contents: read
+      pull-requests: write
+      issues: write
+
+    jobs:
+      qa-changes:
+        if: |
+          (github.event.action == 'opened' && github.event.pull_request.draft == false) ||
+          github.event.action == 'ready_for_review' ||
+          github.event.label.name == 'qa-this' ||
+          github.event.requested_reviewer.login == 'openhands-agent'
+        runs-on: ubuntu-latest
+        steps:
+          - name: Run QA Changes
+            uses: OpenHands/extensions/plugins/qa-changes@main
+            with:
+              llm-model: anthropic/claude-sonnet-4-5-20250929
+              llm-api-key: ${{ secrets.LLM_API_KEY }}
+              github-token: ${{ secrets.GITHUB_TOKEN }}
+    ```
+
+
+
+    Go to your repository's **Settings → Secrets and variables → Actions** and add:
+    - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms))
+
+
+
+    Create a `qa-this` label in your repository:
+    1. Go to **Issues → Labels**
+    2. Click **New label**
+    3. Name: `qa-this`
+    4. Description: `Trigger OpenHands QA validation`
+
+
+
+    Open a PR and either:
+    - Add the `qa-this` label, OR
+    - Request `openhands-agent` as a reviewer
+
+
+
+## Composite Action
+
+The workflow uses a reusable composite action from the extensions repository that handles all the setup automatically:
+
+- Checking out the extensions and PR repositories
+- Setting up Python and dependencies
+- Running the QA agent in the PR's workspace
+- Uploading logs as artifacts
+
+### Action Inputs
+
+| Input | Description | Required | Default |
+|-------|-------------|----------|---------|
+| `llm-model` | LLM model to use | No | `anthropic/claude-sonnet-4-5-20250929` |
+| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` |
+| `extensions-version` | Git ref for extensions (tag, branch, or commit SHA) | No | `main` |
+| `extensions-repo` | Extensions repository (owner/repo) | No | `OpenHands/extensions` |
+| `llm-api-key` | LLM API key | Yes | - |
+| `github-token` | GitHub token for API access | Yes | - |
+
+
+Use `extensions-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features.
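As a minimal sketch of the pinning advice above, the step below sets `extensions-version` next to the required inputs. The tag `v1.0.0` is the illustrative value from the tip, not a confirmed release, and the `llm-base-url` value is a placeholder. Whether the `@main` ref in `uses:` should be pinned to the same tag is an assumption left open here.

```yaml
steps:
  - name: Run QA Changes
    uses: OpenHands/extensions/plugins/qa-changes@main
    with:
      llm-model: anthropic/claude-sonnet-4-5-20250929
      llm-api-key: ${{ secrets.LLM_API_KEY }}
      github-token: ${{ secrets.GITHUB_TOKEN }}
      # Pin the extensions checkout; v1.0.0 is illustrative, confirm a real tag first
      extensions-version: v1.0.0
      # Only needed for a custom endpoint; this URL is a placeholder
      llm-base-url: https://llm.example.com/v1
```

Leaving `extensions-version` unset falls back to the documented default, `main`.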
+
+
+## Customization
+
+### Repository-Specific QA Guidelines
+
+Help the QA agent understand your project by adding a skill file at `.agents/skills/qa-guide.md`:
+
+```markdown
+---
+name: qa-guide
+description: Project-specific QA guidelines
+triggers:
+- /qa-changes
+---
+
+# Project QA Guidelines
+
+## Setup Commands
+- `make install` to install dependencies
+- `make build` to build the project
+
+## Test Commands
+- `make test` for unit tests
+- `make test-integration` for integration tests
+- `make test-e2e` for end-to-end tests
+
+## Key Behaviors to Verify
+- User authentication flows
+- API endpoint responses
+- Database migration correctness
+
+## Known Fragile Areas
+- WebSocket connections under load
+- File upload handling for large files
+```
+
+
+The QA agent also reads your repository's `AGENTS.md` file automatically. Adding setup commands, test commands, and project conventions there helps both QA and other OpenHands workflows.
+
+
+### Trigger Customization
+
+Modify when QA runs by editing the workflow conditions:
+
+```yaml
+# Only trigger on label (disable auto-QA on PR open)
+if: github.event.label.name == 'qa-this'
+
+# Trigger whenever a PR is opened or updated (including drafts).
+# Also add `synchronize` to the `types` list under `on:`, or the second clause never fires.
+if: |
+  github.event.action == 'opened' ||
+  github.event.action == 'synchronize'
+```
+
+## Security Considerations
+
+
+**Important**: The QA agent executes code from the PR. Unlike code review (which only reads diffs), QA validation runs commands in the repository.
+
+The [reference workflow](/sdk/guides/github-workflows/qa-changes#reference-workflow) excludes `FIRST_TIME_CONTRIBUTOR` and `NONE` author associations from automatic triggers; the shorter Quick Start workflow on this page omits that check. For untrusted PRs, manually review the changes before adding the `qa-this` label.
+
+API keys are passed as [SDK secrets](/sdk/guides/secrets) to prevent direct credential access during code execution.
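The author-association exclusion mentioned above appears in the SDK guide's reference workflow but not in the Quick Start condition. A sketch of the Quick Start job with the same checks added might look like the following. The expressions are copied from the reference workflow; whether they match a given repository's contributor policy is a judgment call, not something this page settles.

```yaml
jobs:
  qa-changes:
    # Automatic triggers skip first-time and unaffiliated authors;
    # the label and reviewer-request triggers stay as manual opt-ins.
    if: |
      (github.event.action == 'opened' && github.event.pull_request.draft == false && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
      (github.event.action == 'ready_for_review' && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
      github.event.label.name == 'qa-this' ||
      github.event.requested_reviewer.login == 'openhands-agent'
    runs-on: ubuntu-latest
    steps:
      - name: Run QA Changes
        uses: OpenHands/extensions/plugins/qa-changes@main
        with:
          llm-model: anthropic/claude-sonnet-4-5-20250929
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
```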
+
+
+## Troubleshooting
+
+
+
+  - Ensure the `LLM_API_KEY` secret is set correctly
+  - Check that the label name matches exactly (`qa-this`)
+  - Verify the workflow file is in `.github/workflows/`
+  - Check the Actions tab for workflow run errors
+
+
+
+  - Add setup instructions to `AGENTS.md` or a custom QA skill
+  - Ensure the project's dependencies are available in the CI environment
+  - Check if the project requires specific system packages
+
+
+
+  - Large test suites may take longer to run
+  - Consider adding a custom skill that specifies which test subset to run
+  - Check if the LLM API is experiencing delays
+
+
+
+  - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission
+  - Check the workflow logs for API errors
+  - Verify the PR is not from a fork with restricted permissions
+
+
+
+## Related Resources
+
+- [QA Changes Plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) - Complete plugin with scripts and skills
+- [Automated Code Review](/openhands/usage/use-cases/code-review) - Complementary code review workflow
+- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows
+- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills
diff --git a/sdk/guides/github-workflows/qa-changes.mdx b/sdk/guides/github-workflows/qa-changes.mdx
new file mode 100644
index 000000000..ef1bbfd50
--- /dev/null
+++ b/sdk/guides/github-workflows/qa-changes.mdx
@@ -0,0 +1,167 @@
+---
+title: QA Changes
+description: Use OpenHands Agent to automatically QA pull request changes by running the code
+---
+
+> The reference workflow is available [here](#reference-workflow)!
+
+Automatically validate pull request changes by actually running the code: setting up the environment, executing the test suite, exercising changed behavior, and posting a structured QA report as a PR comment. QA can be triggered in two ways:
+- Adding the `qa-this` label to the PR
+- Requesting `openhands-agent` as a reviewer
+
+
+The reference workflow triggers on either the "qa-this" label or when the openhands-agent account is requested as a reviewer. In OpenHands organization repositories, openhands-agent has access, so this works as-is. In your own repositories, requesting openhands-agent will only work if that account is added as a collaborator. If you don't plan to grant access, use the label trigger instead.
+
+
+## Quick Start
+
+```bash
+# 1. Copy workflow to your repository
+#    (the source path is relative to a checkout of the OpenHands/extensions repository)
+cp plugins/qa-changes/workflows/qa-changes-by-openhands.yml \
+  .github/workflows/qa-changes-by-openhands.yml
+
+# 2. Configure secrets in GitHub Settings → Secrets
+#    Add: LLM_API_KEY
+
+# 3. (Optional) Create a "qa-this" label in your repository
+#    Go to Issues → Labels → New label
+```
+
+## Features
+
+- **Runs the actual code**: Sets up the environment, installs dependencies, builds the project
+- **Regression detection**: Runs the test suite and identifies new failures introduced by the PR
+- **Functional verification**: Exercises new features, reproduces fixed bugs, tests edge cases
+- **Structured reports**: Posts QA findings as a PR comment with evidence and a clear verdict
+- **Customizable**: Add project-specific QA guidelines without forking
+
+## How QA Differs from Code Review
+
+The [PR Review](/sdk/guides/github-workflows/pr-review) workflow reads the diff and posts inline code comments. The QA Changes workflow *executes* the code:
+
+| | PR Review | QA Changes |
+|---|-----------|------------|
+| **Reads diff** | ✅ | ✅ |
+| **Installs dependencies** | ❌ | ✅ |
+| **Runs test suite** | ❌ | ✅ |
+| **Executes changed features** | ❌ | ✅ |
+| **Detects regressions** | ❌ | ✅ |
+| **Output** | Inline comments | PR comment report |
+
+Both complement each other. Use PR Review for fast code-level feedback, and QA Changes for behavioral verification.
+
+## Security
+
+
+**The QA agent executes code from the PR.** Unlike code review (which only reads diffs), QA validation runs commands. Only trigger it on PRs you trust.
+
+The workflow excludes `FIRST_TIME_CONTRIBUTOR` and `NONE` author associations from automatic triggers. For untrusted PRs, manually review the code before adding the `qa-this` label.
+
+
+## Customizing QA Behavior
+
+Instead of forking the scripts, add project-specific QA guidelines as a skill file.
+
+### How It Works
+
+The QA agent uses the [`/qa-changes`](https://github.com/OpenHands/extensions/tree/main/skills/qa-changes) skill for its methodology. Add project-specific setup commands, test commands, and verification guidelines by creating a custom skill.
+
+
+**Skill paths**: Place skills in `.agents/skills/` (recommended). See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details.
+
+
+### Example: Custom QA Skill
+
+Create `.agents/skills/qa-guide.md` in your repository:
+
+```markdown
+---
+name: qa-guide
+description: Project-specific QA guidelines for MyProject
+triggers:
+- /qa-changes
+---
+
+# MyProject QA Guidelines
+
+## Setup
+- Run `make install` to install all dependencies
+- Run `make build` to compile the project
+
+## Testing
+- `make test` runs the full test suite (~2 min)
+- `make test-unit` runs only unit tests (~30 sec)
+- For UI changes, also run `make test-snapshots`
+
+## Critical Flows to Verify
+- User login/logout
+- Data export to CSV
+- Webhook delivery
+
+## Known Issues
+- The flaky `test_websocket_reconnect` test sometimes fails; ignore it
+- Integration tests require `REDIS_URL` which is not available in CI
+```
+
+
+**How skill merging works**: Using a name like `qa-guide` (different from `qa-changes`) allows BOTH your custom skill AND the default `qa-changes` skill to be triggered by `/qa-changes`. The agent sees both and follows both sets of guidelines.
+
+If your skill has `name: qa-changes` (matching the public skill's name), it will completely **override** the default methodology.
+
+
+## Reference Workflow
+
+
+This example is available on GitHub: [plugins/qa-changes/workflows/](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes/workflows)
+
+
+```yaml icon="yaml" expandable plugins/qa-changes/workflows/qa-changes-by-openhands.yml
+---
+name: QA Changes by OpenHands
+
+on:
+  pull_request_target:
+    types: [opened, ready_for_review, labeled, review_requested]
+
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
+
+jobs:
+  qa-changes:
+    if: |
+      (github.event.action == 'opened' && github.event.pull_request.draft == false && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+      (github.event.action == 'ready_for_review' && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+      github.event.label.name == 'qa-this' ||
+      github.event.requested_reviewer.login == 'openhands-agent'
+    concurrency:
+      group: qa-changes-${{ github.event.pull_request.number }}
+      cancel-in-progress: true
+    runs-on: ubuntu-24.04
+    steps:
+      - name: Run QA Changes
+        uses: OpenHands/extensions/plugins/qa-changes@main
+        with:
+          llm-model: anthropic/claude-sonnet-4-5-20250929
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+```
+
+### Action Inputs
+
+| Input | Description | Required | Default |
+|-------|-------------|----------|---------|
+| `llm-model` | LLM model to use | No | `anthropic/claude-sonnet-4-5-20250929` |
+| `llm-base-url` | LLM base URL (optional) | No | `''` |
+| `extensions-version` | Git ref for extensions (tag, branch, or commit SHA) | No | `main` |
+| `extensions-repo` | Extensions repository (owner/repo) | No | `OpenHands/extensions` |
+| `llm-api-key` | LLM API key | Yes | - |
+| `github-token` | GitHub token for API access | Yes | - |
+
+## Related Files
+
+- [QA Changes Plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) - Complete plugin with scripts and skills (in extensions repo)
+- [Agent Script](https://github.com/OpenHands/extensions/blob/main/plugins/qa-changes/scripts/agent_script.py) - Main QA agent script
+- [Prompt Template](https://github.com/OpenHands/extensions/blob/main/plugins/qa-changes/scripts/prompt.py) - QA prompt template
+- [QA Skill](https://github.com/OpenHands/extensions/tree/main/skills/qa-changes) - QA methodology skill

From a198efdff3459c38531ca00de98c9fa30541e10c Mon Sep 17 00:00:00 2001
From: openhands
Date: Thu, 2 Apr 2026 18:13:59 +0000
Subject: [PATCH 2/2] docs: update QA docs to match revised four-phase methodology

- CI-aware: check CI first, only run tests CI doesn't cover
- High-bar exercise: browsers, CLI, HTTP requests
- Graceful failure: give up and suggest AGENTS.md guidance
- PARTIAL verdict for incomplete verification

Co-authored-by: openhands
---
 openhands/usage/use-cases/qa-changes.mdx   | 18 +++++++++---------
 sdk/guides/github-workflows/qa-changes.mdx |  7 ++++---
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/openhands/usage/use-cases/qa-changes.mdx b/openhands/usage/use-cases/qa-changes.mdx
index 9f457a2ef..9587635dd 100644
--- a/openhands/usage/use-cases/qa-changes.mdx
+++ b/openhands/usage/use-cases/qa-changes.mdx
@@ -33,19 +33,19 @@ The QA workflow uses the OpenHands Software Agent SDK to validate your code chan
    - The `qa-this` label is added to a PR
    - `openhands-agent` is requested as a reviewer
 
-2. **Validation**: The agent follows a five-phase methodology:
-   - **Understand**: Reads the diff and classifies changes (new feature, bug fix, refactor, config)
-   - **Setup**: Bootstraps the repository (installs dependencies, builds, establishes a test baseline)
-   - **Test**: Runs the existing test suite, records pass/fail counts, detects regressions
-   - **Exercise**: Goes beyond tests by executing new features, reproducing fixed bugs, and trying edge cases
-   - **Report**: Posts structured findings with evidence
+2. **Validation**: The agent follows a four-phase methodology:
+   - **Understand**: Reads the diff, classifies changes, and identifies entry points (CLI commands, API endpoints, UI pages)
+   - **Setup**: Bootstraps the repository (installs dependencies, builds). Checks CI status and only runs tests CI does not cover
+   - **Exercise**: The core phase, where the agent actually uses the software as a real user would. Spins up servers, opens browsers, runs CLI commands, makes HTTP requests. The bar is high: "tests pass" is not enough
+   - **Report**: Posts structured findings with evidence, including what could not be verified
 
 3. **Output**: A QA report is posted as a PR comment with:
    - Environment setup status
-   - Test suite results (pass/fail counts, regressions)
-   - Functional verification evidence (commands run, outputs observed)
+   - CI & test status (what CI covers, any additional tests run)
+   - Functional verification evidence (commands run, outputs observed, screenshots)
+   - Unable to verify (what could not be tested, with suggested `AGENTS.md` guidance)
    - Issues found (🔴 Blocker, 🟠 Issue, 🟡 Minor)
-   - Verdict (✅ PASS, ⚠️ PASS WITH ISSUES, ❌ FAIL)
+   - Verdict (✅ PASS, ⚠️ PASS WITH ISSUES, ❌ FAIL, 🟡 PARTIAL)
 
 ### Code Review vs QA Validation
 
diff --git a/sdk/guides/github-workflows/qa-changes.mdx b/sdk/guides/github-workflows/qa-changes.mdx
index ef1bbfd50..6099075ab 100644
--- a/sdk/guides/github-workflows/qa-changes.mdx
+++ b/sdk/guides/github-workflows/qa-changes.mdx
@@ -30,9 +30,10 @@ cp plugins/qa-changes/workflows/qa-changes-by-openhands.yml \
 ## Features
 
 - **Runs the actual code**: Sets up the environment, installs dependencies, builds the project
-- **Regression detection**: Runs the test suite and identifies new failures introduced by the PR
-- **Functional verification**: Exercises new features, reproduces fixed bugs, tests edge cases
-- **Structured reports**: Posts QA findings as a PR comment with evidence and a clear verdict
+- **CI-aware**: Checks CI status first, only runs tests CI does not cover
+- **High-bar functional verification**: Spins up servers, uses real browsers (Playwright), runs CLI commands, makes HTTP requests. "Tests pass" is not enough
+- **Graceful failure**: If verification approaches fail after multiple attempts, reports honestly what could not be verified and suggests `AGENTS.md` improvements
+- **Structured reports**: Posts QA findings as a PR comment with evidence and a clear verdict (PASS / PASS WITH ISSUES / FAIL / PARTIAL)
 - **Customizable**: Add project-specific QA guidelines without forking
 
 ## How QA Differs from Code Review