From 38f3cdc45a23e9acfb119ef44906e094db14b8b7 Mon Sep 17 00:00:00 2001
From: openhands <openhands@all-hands.dev>
Date: Thu, 2 Apr 2026 18:04:29 +0000
Subject: [PATCH 1/2] docs: add QA Changes use case and SDK workflow guide

Add documentation for the new QA Changes plugin:

- openhands/usage/use-cases/qa-changes.mdx: Use case page with overview,
  quick start, customization, and troubleshooting
- sdk/guides/github-workflows/qa-changes.mdx: SDK guide with reference
  workflow, action inputs, and skill customization
- docs.json: Add both pages to navigation

Co-authored-by: openhands <openhands@all-hands.dev>
---
 docs.json                                  |   2 +
 openhands/usage/use-cases/qa-changes.mdx   | 239 +++++++++++++++++++++
 sdk/guides/github-workflows/qa-changes.mdx | 167 ++++++++++++++
 3 files changed, 408 insertions(+)
 create mode 100644 openhands/usage/use-cases/qa-changes.mdx
 create mode 100644 sdk/guides/github-workflows/qa-changes.mdx
diff --git a/docs.json b/docs.json
index ca8084359..9b4904434 100644
--- a/docs.json
+++ b/docs.json
@@ -213,6 +213,7 @@
             "pages": [
               "openhands/usage/use-cases/vulnerability-remediation",
               "openhands/usage/use-cases/code-review",
+              "openhands/usage/use-cases/qa-changes",
               "openhands/usage/use-cases/incident-triage",
               "openhands/usage/use-cases/cobol-modernization",
               "openhands/usage/use-cases/dependency-upgrades",
@@ -307,6 +308,7 @@
                 "pages": [
                   "sdk/guides/github-workflows/assign-reviews",
                   "sdk/guides/github-workflows/pr-review",
+                  "sdk/guides/github-workflows/qa-changes",
                   "sdk/guides/github-workflows/todo-management"
                 ]
               }
diff --git a/openhands/usage/use-cases/qa-changes.mdx b/openhands/usage/use-cases/qa-changes.mdx
new file mode 100644
index 000000000..9f457a2ef
--- /dev/null
+++ b/openhands/usage/use-cases/qa-changes.mdx
@@ -0,0 +1,239 @@
+---
+title: Automated QA Validation
+description: Set up automated QA testing of PR changes using OpenHands and the Software Agent SDK
+---
+
+<Card
+  title="View Example Plugin"
+  icon="github"
+  href="https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes"
+>
+  Check out the complete QA Changes plugin with ready-to-use code and configuration.
+</Card>
+
+Automated code review catches style, security, and logic issues by reading diffs — but it cannot verify that a change *actually works*. The QA Changes workflow fills this gap by running the code: setting up the environment, executing the test suite, exercising changed behavior, and posting a structured report with evidence.
+
+## Overview
+
+The OpenHands QA Changes workflow is a GitHub Actions workflow that:
+
+- **Triggers automatically** when PRs are opened or when you request QA validation
+- **Sets up the full environment** — installs dependencies, builds the project
+- **Runs the test suite** — detects regressions introduced by the PR
+- **Exercises changed behavior** — manually tests new features, bug fixes, and edge cases
+- **Posts a structured QA report** as a PR comment with commands, outputs, and a verdict
+
+## How It Works
+
+The QA workflow uses the OpenHands Software Agent SDK to validate your code changes:
+
+1. **Trigger**: The workflow runs when:
+   - A new non-draft PR is opened
+   - A draft PR is marked as ready for review
+   - The `qa-this` label is added to a PR
+   - `openhands-agent` is requested as a reviewer
+
+2. **Validation**: The agent follows a five-phase methodology:
+   - **Understand**: Reads the diff and classifies changes (new feature, bug fix, refactor, config)
+   - **Setup**: Bootstraps the repository — installs dependencies, builds, establishes a test baseline
+   - **Test**: Runs the existing test suite, records pass/fail counts, detects regressions
+   - **Exercise**: Goes beyond tests — executes new features, reproduces fixed bugs, tries edge cases
+   - **Report**: Posts structured findings with evidence
+
+3. **Output**: A QA report is posted as a PR comment with:
+   - Environment setup status
+   - Test suite results (pass/fail counts, regressions)
+   - Functional verification evidence (commands run, outputs observed)
+   - Issues found (🔴 Blocker, 🟠 Issue, 🟡 Minor)
+   - Verdict (✅ PASS, ⚠️ PASS WITH ISSUES, ❌ FAIL)
+
+### Code Review vs QA Validation
+
+| Aspect | [Code Review](/openhands/usage/use-cases/code-review) | QA Validation |
+|--------|-------------|---------------|
+| **Method** | Reads the diff | Runs the code |
+| **Speed** | 2-3 minutes | 5-15 minutes |
+| **Catches** | Style, security, logic issues | Regressions, broken features, build failures |
+| **Output** | Inline code comments | Structured QA report with evidence |
+| **Best for** | Every PR | Feature PRs, bug fixes, risky changes |
+
+Both workflows complement each other. Use code review for fast feedback on every PR, and QA validation for thorough verification of changes that affect behavior.
+
+## Quick Start
+
+<Steps>
+  <Step title="Copy the workflow file">
+    Create `.github/workflows/qa-changes-by-openhands.yml` in your repository:
+
+    ```yaml
+    name: QA Changes by OpenHands
+
+    on:
+      pull_request_target:
+        types: [opened, ready_for_review, labeled, review_requested]
+
+    permissions:
+      contents: read
+      pull-requests: write
+      issues: write
+
+    jobs:
+      qa-changes:
+        if: |
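+          (github.event.action == 'opened' && github.event.pull_request.draft == false && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+          (github.event.action == 'ready_for_review' && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||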
+          github.event.label.name == 'qa-this' ||
+          github.event.requested_reviewer.login == 'openhands-agent'
+        runs-on: ubuntu-latest
+        steps:
+          - name: Run QA Changes
+            uses: OpenHands/extensions/plugins/qa-changes@main
+            with:
+              llm-model: anthropic/claude-sonnet-4-5-20250929
+              llm-api-key: ${{ secrets.LLM_API_KEY }}
+              github-token: ${{ secrets.GITHUB_TOKEN }}
+    ```
+  </Step>
+
+  <Step title="Add your LLM API key">
+    Go to your repository's **Settings → Secrets and variables → Actions** and add:
+    - **`LLM_API_KEY`**: Your LLM API key (get one from [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms))
+  </Step>
+
+  <Step title="Create the QA label">
+    Create a `qa-this` label in your repository:
+    1. Go to **Issues → Labels**
+    2. Click **New label**
+    3. Name: `qa-this`
+    4. Description: `Trigger OpenHands QA validation`
+  </Step>
+
+  <Step title="Trigger QA validation">
+    Open a PR and either:
+    - Add the `qa-this` label, OR
+    - Request `openhands-agent` as a reviewer
+  </Step>
+</Steps>
+
+## Composite Action
+
+The workflow uses a reusable composite action from the extensions repository that handles all the setup automatically:
+
+- Checking out the extensions and PR repositories
+- Setting up Python and dependencies
+- Running the QA agent in the PR's workspace
+- Uploading logs as artifacts
+
+### Action Inputs
+
+| Input | Description | Required | Default |
+|-------|-------------|----------|---------|
+| `llm-model` | LLM model to use | No | `anthropic/claude-sonnet-4-5-20250929` |
+| `llm-base-url` | LLM base URL (for custom endpoints) | No | `''` |
+| `extensions-version` | Git ref for extensions (tag, branch, or commit SHA) | No | `main` |
+| `extensions-repo` | Extensions repository (owner/repo) | No | `OpenHands/extensions` |
+| `llm-api-key` | LLM API key | Yes | - |
+| `github-token` | GitHub token for API access | Yes | - |
+
+<Note>
+Use `extensions-version` to pin to a specific version tag (e.g., `v1.0.0`) for production stability, or use `main` to always get the latest features.
+</Note>
+
+## Customization
+
+### Repository-Specific QA Guidelines
+
+Help the QA agent understand your project by adding a skill file at `.agents/skills/qa-guide.md`:
+
+```markdown
+---
+name: qa-guide
+description: Project-specific QA guidelines
+triggers:
+- /qa-changes
+---
+
+# Project QA Guidelines
+
+## Setup Commands
+- `make install` to install dependencies
+- `make build` to build the project
+
+## Test Commands
+- `make test` for unit tests
+- `make test-integration` for integration tests
+- `make test-e2e` for end-to-end tests
+
+## Key Behaviors to Verify
+- User authentication flows
+- API endpoint responses
+- Database migration correctness
+
+## Known Fragile Areas
+- WebSocket connections under load
+- File upload handling for large files
+```
+
+<Tip>
+The QA agent also reads your repository's `AGENTS.md` file automatically. Adding setup commands, test commands, and project conventions there helps both QA and other OpenHands workflows.
+</Tip>
+
+### Trigger Customization
+
+Modify when QA runs by editing the workflow conditions:
+
+```yaml
+# Only trigger on label (disable auto-QA on PR open)
+if: github.event.label.name == 'qa-this'
+
+# Trigger on all PRs, including drafts, and on new commits (also add `synchronize` to the `types` list under `pull_request_target`)
+if: |
+  github.event.action == 'opened' ||
+  github.event.action == 'synchronize'
+```
+
+## Security Considerations
+
+<Warning>
+**Important**: The QA agent executes code from the PR. Unlike code review (which only reads diffs), QA validation runs commands in the repository.
+
+The workflow excludes `FIRST_TIME_CONTRIBUTOR` and `NONE` author associations from automatic triggers. For untrusted PRs, manually review the changes before adding the `qa-this` label.
+
+API keys are passed as [SDK secrets](/sdk/guides/secrets) to prevent direct credential access during code execution.
+</Warning>
+
+## Troubleshooting
+
+<AccordionGroup>
+  <Accordion title="QA not triggering">
+    - Ensure the `LLM_API_KEY` secret is set correctly
+    - Check that the label name matches exactly (`qa-this`)
+    - Verify the workflow file is in `.github/workflows/`
+    - Check the Actions tab for workflow run errors
+  </Accordion>
+
+  <Accordion title="Environment setup failing">
+    - Add setup instructions to `AGENTS.md` or a custom QA skill
+    - Ensure the project's dependencies are available in the CI environment
+    - Check if the project requires specific system packages
+  </Accordion>
+
+  <Accordion title="QA taking too long">
+    - Large test suites may take longer to run
+    - Consider adding a custom skill that specifies which test subset to run
+    - Check if the LLM API is experiencing delays
+  </Accordion>
+
+  <Accordion title="QA report not appearing">
+    - Ensure `GITHUB_TOKEN` has `pull-requests: write` permission
+    - Check the workflow logs for API errors
+    - Verify the PR is not from a fork with restricted permissions
+  </Accordion>
+</AccordionGroup>
+
+## Related Resources
+
+- [QA Changes Plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) - Complete plugin with scripts and skills
+- [Automated Code Review](/openhands/usage/use-cases/code-review) - Complementary code review workflow
+- [Software Agent SDK](/sdk/index) - Build your own AI-powered workflows
+- [Skills Documentation](/overview/skills) - Learn more about OpenHands skills
diff --git a/sdk/guides/github-workflows/qa-changes.mdx b/sdk/guides/github-workflows/qa-changes.mdx
new file mode 100644
index 000000000..ef1bbfd50
--- /dev/null
+++ b/sdk/guides/github-workflows/qa-changes.mdx
@@ -0,0 +1,167 @@
+---
+title: QA Changes
+description: Use OpenHands Agent to automatically QA pull request changes by running the code
+---
+
+> The reference workflow is available [here](#reference-workflow)!
+
+Automatically validate pull request changes by actually running the code: setting up the environment, executing the test suite, exercising changed behavior, and posting a structured QA report as a PR comment. Besides running automatically on newly opened or ready-for-review PRs, QA can be triggered on demand in two ways:
+- Adding the `qa-this` label to the PR
+- Requesting `openhands-agent` as a reviewer
+
+<Note>
+The reference workflow can be triggered manually either by adding the `qa-this` label or by requesting the `openhands-agent` account as a reviewer. In OpenHands organization repositories, `openhands-agent` has access, so this works as-is. In your own repositories, requesting `openhands-agent` only works if that account is added as a collaborator. If you don't plan to grant access, use the label trigger instead.
+</Note>
+
+## Quick Start
+
+```bash
+# 1. Copy workflow to your repository
+cp plugins/qa-changes/workflows/qa-changes-by-openhands.yml \
+   .github/workflows/qa-changes-by-openhands.yml
+
+# 2. Configure secrets in GitHub Settings → Secrets
+# Add: LLM_API_KEY
+
+# 3. (Optional) Create a "qa-this" label in your repository
+# Go to Issues → Labels → New label
+```
+
+## Features
+
+- **Runs the actual code** — Sets up the environment, installs dependencies, builds the project
+- **Regression detection** — Runs the test suite and identifies new failures introduced by the PR
+- **Functional verification** — Exercises new features, reproduces fixed bugs, tests edge cases
+- **Structured reports** — Posts QA findings as a PR comment with evidence and a clear verdict
+- **Customizable** — Add project-specific QA guidelines without forking
+
+## How QA Differs from Code Review
+
+The [PR Review](/sdk/guides/github-workflows/pr-review) workflow reads the diff and posts inline code comments. The QA Changes workflow *executes* the code:
+
+| | PR Review | QA Changes |
+|---|-----------|------------|
+| **Reads diff** | ✅ | ✅ |
+| **Installs dependencies** | ❌ | ✅ |
+| **Runs test suite** | ❌ | ✅ |
+| **Executes changed features** | ❌ | ✅ |
+| **Detects regressions** | ❌ | ✅ |
+| **Output** | Inline comments | PR comment report |
+
+Both complement each other. Use PR Review for fast code-level feedback, and QA Changes for behavioral verification.
+
+## Security
+
+<Warning>
+**The QA agent executes code from the PR.** Unlike code review (which only reads diffs), QA validation runs commands. Only trigger it on PRs you trust.
+
+The workflow excludes `FIRST_TIME_CONTRIBUTOR` and `NONE` author associations from automatic triggers. For untrusted PRs, manually review the code before adding the `qa-this` label.
+</Warning>
+
+## Customizing QA Behavior
+
+Instead of forking the scripts, add project-specific QA guidelines as a skill file.
+
+### How It Works
+
+The QA agent uses the [`/qa-changes`](https://github.com/OpenHands/extensions/tree/main/skills/qa-changes) skill for its methodology. Add project-specific setup commands, test commands, and verification guidelines by creating a custom skill.
+
+<Note>
+**Skill paths**: Place skills in `.agents/skills/` (recommended). See [Skill Loading Precedence](/overview/skills#skill-loading-precedence) for details.
+</Note>
+
+### Example: Custom QA Skill
+
+Create `.agents/skills/qa-guide.md` in your repository:
+
+```markdown
+---
+name: qa-guide
+description: Project-specific QA guidelines for MyProject
+triggers:
+- /qa-changes
+---
+
+# MyProject QA Guidelines
+
+## Setup
+- Run `make install` to install all dependencies
+- Run `make build` to compile the project
+
+## Testing
+- `make test` runs the full test suite (~2 min)
+- `make test-unit` runs only unit tests (~30 sec)
+- For UI changes, also run `make test-snapshots`
+
+## Critical Flows to Verify
+- User login/logout
+- Data export to CSV
+- Webhook delivery
+
+## Known Issues
+- The flaky `test_websocket_reconnect` test sometimes fails; ignore it
+- Integration tests require `REDIS_URL` which is not available in CI
+```
+
+<Tip>
+**How skill merging works**: Using a name like `qa-guide` (different from `qa-changes`) allows BOTH your custom skill AND the default `qa-changes` skill to be triggered by `/qa-changes`. The agent sees both and follows both sets of guidelines.
+
+If your skill has `name: qa-changes` (matching the public skill's name), it will completely **override** the default methodology.
+</Tip>
+
+## Reference Workflow
+
+<Note>
+This example is available on GitHub: [plugins/qa-changes/workflows/](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes/workflows)
+</Note>
+
+```yaml icon="yaml" expandable plugins/qa-changes/workflows/qa-changes-by-openhands.yml
+---
+name: QA Changes by OpenHands
+
+on:
+    pull_request_target:
+        types: [opened, ready_for_review, labeled, review_requested]
+
+permissions:
+    contents: read
+    pull-requests: write
+    issues: write
+
+jobs:
+    qa-changes:
+        if: |
+            (github.event.action == 'opened' && github.event.pull_request.draft == false && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+            (github.event.action == 'ready_for_review' && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+            github.event.label.name == 'qa-this' ||
+            github.event.requested_reviewer.login == 'openhands-agent'
+        concurrency:
+            group: qa-changes-${{ github.event.pull_request.number }}
+            cancel-in-progress: true
+        runs-on: ubuntu-24.04
+        steps:
+            - name: Run QA Changes
+              uses: OpenHands/extensions/plugins/qa-changes@main
+              with:
+                  llm-model: anthropic/claude-sonnet-4-5-20250929
+                  llm-api-key: ${{ secrets.LLM_API_KEY }}
+                  github-token: ${{ secrets.GITHUB_TOKEN }}
+```
+
+### Action Inputs
+
+| Input | Description | Required | Default |
+|-------|-------------|----------|---------|
+| `llm-model` | LLM model to use | No | `anthropic/claude-sonnet-4-5-20250929` |
+| `llm-base-url` | LLM base URL (optional) | No | `''` |
+| `extensions-version` | Git ref for extensions (tag, branch, or commit SHA) | No | `main` |
+| `extensions-repo` | Extensions repository (owner/repo) | No | `OpenHands/extensions` |
+| `llm-api-key` | LLM API key | Yes | - |
+| `github-token` | GitHub token for API access | Yes | - |
+
+## Related Files
+
+- [QA Changes Plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) - Complete plugin with scripts and skills (in extensions repo)
+- [Agent Script](https://github.com/OpenHands/extensions/blob/main/plugins/qa-changes/scripts/agent_script.py) - Main QA agent script
+- [Prompt Template](https://github.com/OpenHands/extensions/blob/main/plugins/qa-changes/scripts/prompt.py) - QA prompt template
+- [QA Skill](https://github.com/OpenHands/extensions/tree/main/skills/qa-changes) - QA methodology skill

From a198efdff3459c38531ca00de98c9fa30541e10c Mon Sep 17 00:00:00 2001
From: openhands <openhands@all-hands.dev>
Date: Thu, 2 Apr 2026 18:13:59 +0000
Subject: [PATCH 2/2] docs: update QA docs to match revised four-phase
 methodology

- CI-aware: check CI first, only run tests CI doesn't cover
- High-bar exercise: browsers, CLI, HTTP requests
- Graceful failure: give up and suggest AGENTS.md guidance
- PARTIAL verdict for incomplete verification

Co-authored-by: openhands <openhands@all-hands.dev>
---
 openhands/usage/use-cases/qa-changes.mdx   | 18 +++++++++---------
 sdk/guides/github-workflows/qa-changes.mdx |  7 ++++---
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/openhands/usage/use-cases/qa-changes.mdx b/openhands/usage/use-cases/qa-changes.mdx
index 9f457a2ef..9587635dd 100644
--- a/openhands/usage/use-cases/qa-changes.mdx
+++ b/openhands/usage/use-cases/qa-changes.mdx
@@ -33,19 +33,19 @@ The QA workflow uses the OpenHands Software Agent SDK to validate your code chan
    - The `qa-this` label is added to a PR
    - `openhands-agent` is requested as a reviewer
 
-2. **Validation**: The agent follows a five-phase methodology:
-   - **Understand**: Reads the diff and classifies changes (new feature, bug fix, refactor, config)
-   - **Setup**: Bootstraps the repository — installs dependencies, builds, establishes a test baseline
-   - **Test**: Runs the existing test suite, records pass/fail counts, detects regressions
-   - **Exercise**: Goes beyond tests — executes new features, reproduces fixed bugs, tries edge cases
-   - **Report**: Posts structured findings with evidence
+2. **Validation**: The agent follows a four-phase methodology:
+   - **Understand**: Reads the diff, classifies changes, and identifies entry points (CLI commands, API endpoints, UI pages)
+   - **Setup**: Bootstraps the repository — installs dependencies, builds. Checks CI status and only runs tests CI does not cover
+   - **Exercise**: The core phase — actually uses the software as a real user would. Spins up servers, opens browsers, runs CLI commands, makes HTTP requests. The bar is high: "tests pass" is not enough
+   - **Report**: Posts structured findings with evidence, including what could not be verified
 
 3. **Output**: A QA report is posted as a PR comment with:
    - Environment setup status
-   - Test suite results (pass/fail counts, regressions)
-   - Functional verification evidence (commands run, outputs observed)
+   - CI & test status (what CI covers, any additional tests run)
+   - Functional verification evidence (commands run, outputs observed, screenshots)
+   - Unable to verify (what could not be tested, with suggested `AGENTS.md` guidance)
    - Issues found (🔴 Blocker, 🟠 Issue, 🟡 Minor)
-   - Verdict (✅ PASS, ⚠️ PASS WITH ISSUES, ❌ FAIL)
+   - Verdict (✅ PASS, ⚠️ PASS WITH ISSUES, ❌ FAIL, 🟡 PARTIAL)
 
 ### Code Review vs QA Validation
 
diff --git a/sdk/guides/github-workflows/qa-changes.mdx b/sdk/guides/github-workflows/qa-changes.mdx
index ef1bbfd50..6099075ab 100644
--- a/sdk/guides/github-workflows/qa-changes.mdx
+++ b/sdk/guides/github-workflows/qa-changes.mdx
@@ -30,9 +30,10 @@ cp plugins/qa-changes/workflows/qa-changes-by-openhands.yml \
 ## Features
 
 - **Runs the actual code** — Sets up the environment, installs dependencies, builds the project
-- **Regression detection** — Runs the test suite and identifies new failures introduced by the PR
-- **Functional verification** — Exercises new features, reproduces fixed bugs, tests edge cases
-- **Structured reports** — Posts QA findings as a PR comment with evidence and a clear verdict
+- **CI-aware** — Checks CI status first, only runs tests CI does not cover
+- **High-bar functional verification** — Spins up servers, uses real browsers (Playwright), runs CLI commands, makes HTTP requests. "Tests pass" is not enough
+- **Graceful failure** — If verification approaches fail after multiple attempts, reports honestly what could not be verified and suggests `AGENTS.md` improvements
+- **Structured reports** — Posts QA findings as a PR comment with evidence and a clear verdict (PASS / PASS WITH ISSUES / FAIL / PARTIAL)
 - **Customizable** — Add project-specific QA guidelines without forking
 
 ## How QA Differs from Code Review