
fix: align standalone GRPO with WAA API format and add retry logic #193

Merged
abrichr merged 1 commit into main from fix/standalone-grpo-waa-compat
Mar 24, 2026


@abrichr commented Mar 24, 2026

Summary

  • Fix screenshot(): WAA's /screenshot returns raw PNG bytes via send_file(), not base64-encoded JSON. Changed from resp.json() to resp.content (matching WAALiveAdapter)
  • Fix execute_action(): WAA's /execute_windows runs the command with exec(command) directly rather than in a subprocess. Removed the python -c "..." wrapper, which caused a SyntaxError inside the VM. Now sends bare Python statements (matching WAALiveAdapter._build_pixel_command)
  • Fix train script import: running scripts/train_grpo_standalone.py executed openadapt_evals/__init__.py, which eagerly imports open_clip (via demo_library) and caused numpy ABI crashes. The script now shims sys.modules to bypass the top-level __init__.py
  • Add retry logic: screenshot() makes up to 3 attempts with a 2s delay between them; the trainer runs a pre-rollout health check via the new probe() method; the training loop handles empty rollout groups gracefully
  • Add missing action types: double_click, right_click, scroll (matching WAALiveAdapter)

The first two items, the API format bugs in screenshot parsing and execute wrapping, were the root cause of the standalone GRPO trainer producing zero rewards.
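
For reviewers, here is a minimal sketch of the screenshot fix combined with the retry behaviour described above. It is not the code in this PR: `fetch_screenshot`, the injected `get` callable, and the PNG signature check are illustrative assumptions. The only details taken from the PR are reading `resp.content` instead of `resp.json()`, 3 attempts, and a 2s delay.

```python
import time
from typing import Callable, Optional

# Every valid PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fetch_screenshot(
    get: Callable[[], object],
    retries: int = 3,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Return raw PNG bytes from a /screenshot-style endpoint, retrying on failure.

    `get` is a hypothetical zero-argument callable returning a response object
    with `.status_code` and `.content`, for example
    `lambda: requests.get(f"{base_url}/screenshot", timeout=30)`.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            resp = get()
            data = resp.content  # raw bytes; the body is not JSON, so no resp.json()
            if resp.status_code == 200 and data.startswith(PNG_SIGNATURE):
                return data
            last_error = ValueError(
                f"unexpected response: status {resp.status_code}, {len(data)} bytes"
            )
        except Exception as exc:  # connection errors, timeouts, ...
            last_error = exc
        if attempt < retries:
            sleep(delay)  # wait before the next attempt, but not after the last one
    raise RuntimeError(f"screenshot failed after {retries} attempts") from last_error
```

Injecting `get` and `sleep` keeps the sketch testable without a running WAA server.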

Test plan

  • With WAA running, verify WAADirect.screenshot() returns valid PNG bytes (len > 100, parseable by PIL)
  • Verify WAADirect.execute_action(SimpleAction(type="click", x=500, y=500)) succeeds (status 200, no SyntaxError)
  • Verify python scripts/train_grpo_standalone.py --help works without importing open_clip
  • Verify WAADirect.probe() returns {"reachable": True, "screenshot_ok": True} when server is up
  • Verify WAADirect.health_check() returns False when server is down (no hang)
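
The last two checks can be pictured with the sketch below. It assumes `probe()` issues a single GET against /screenshot with a short timeout and that `health_check()` is derived from it; the PR does not show either implementation, so the function bodies, the `get` callable, and the 5s timeout are hypothetical. Only the two result keys come from the test plan.

```python
from typing import Callable, Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def probe(get: Callable[[str, float], object], timeout: float = 5.0) -> Dict[str, bool]:
    """Report whether the server answers at all and whether /screenshot is usable.

    `get(path, timeout)` is a hypothetical callable, for example
    `lambda path, t: requests.get(base_url + path, timeout=t)`.
    """
    result = {"reachable": False, "screenshot_ok": False}
    try:
        resp = get("/screenshot", timeout)
    except Exception:
        # Connection refused, timeout, DNS failure: the server is not reachable.
        return result
    result["reachable"] = True
    result["screenshot_ok"] = (
        resp.status_code == 200 and resp.content.startswith(PNG_SIGNATURE)
    )
    return result


def health_check(get: Callable[[str, float], object], timeout: float = 5.0) -> bool:
    """Return False instead of raising or hanging when the server is down."""
    return probe(get, timeout)["reachable"]
```

Passing an explicit timeout to the HTTP call is what keeps the "server is down" case from hanging.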

🤖 Generated with Claude Code

The standalone GRPO trainer produced zero rewards due to two API
format bugs in WAADirect:

1. screenshot() tried resp.json() expecting base64-encoded JSON, but
   WAA's /screenshot returns raw PNG bytes via Flask's send_file().
   Fixed to use resp.content (matching WAALiveAdapter).

2. execute_action() wrapped commands in `python -c "..."`, but WAA's
   /execute_windows uses exec() directly -- the wrapper caused
   SyntaxError inside the VM. Fixed to send bare Python statements
   (matching WAALiveAdapter._build_pixel_command).
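
A sketch of why the wrapper fails and what a bare-statement command might look like. It assumes the VM side drives the mouse and keyboard with pyautogui. The `SimpleAction` stand-in (including its `text` and `amount` fields) and `build_command` are hypothetical and do not mirror WAALiveAdapter._build_pixel_command.

```python
from dataclasses import dataclass


@dataclass
class SimpleAction:
    """Illustrative stand-in; only type, x and y appear in the PR's test plan."""
    type: str
    x: int = 0
    y: int = 0
    text: str = ""
    amount: int = 0


def build_command(action: SimpleAction) -> str:
    """Return bare Python statements for a server that passes them to exec()."""
    prefix = "import pyautogui; "
    if action.type == "click":
        return prefix + f"pyautogui.click({action.x}, {action.y})"
    if action.type == "double_click":
        return prefix + f"pyautogui.doubleClick({action.x}, {action.y})"
    if action.type == "right_click":
        return prefix + f"pyautogui.rightClick({action.x}, {action.y})"
    if action.type == "scroll":
        return prefix + f"pyautogui.scroll({action.amount}, x={action.x}, y={action.y})"
    if action.type == "type":
        # Click the target first so it has focus, then type the text.
        return prefix + (
            f"pyautogui.click({action.x}, {action.y}); "
            f"pyautogui.write({action.text!r})"
        )
    raise ValueError(f"unsupported action type: {action.type!r}")


cmd = build_command(SimpleAction(type="click", x=500, y=500))
compile(cmd, "<command>", "exec")  # parses as Python, so exec() can run it

try:
    # The old shell-style wrapper is not valid Python source.
    compile(f'python -c "{cmd}"', "<command>", "exec")
except SyntaxError:
    print("wrapped command is rejected before anything runs")
```

`compile()` is used here only to show the parse step that `exec()` performs; it does not require pyautogui to be installed.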

Additional improvements:
- Add probe() method for structured health checking
- Add screenshot retry logic (3 attempts with 2s delay)
- Add double_click, right_click, scroll action types
- Fix type action to click target first then type (match WAALiveAdapter)
- Add pre-rollout health check in trainer._collect_group()
- Handle empty rollouts gracefully in training loop
- Fix train script to bypass openadapt_evals/__init__.py eager imports
  (open_clip -> numpy ABI crash in minimal training environments)
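
One common way to bypass a package's eager imports is to register a stub package in sys.modules before importing the submodule. The sketch below shows that technique under the assumption that the train script does something similar; `import_submodule_without_init` and its parameters are hypothetical and not taken from the script.

```python
import importlib
import sys
import types
from pathlib import Path


def import_submodule_without_init(package_dir: Path, package: str, submodule: str):
    """Import `package.submodule` without executing `package/__init__.py`.

    A bare module object with `__path__` set is treated as a package by the
    import system, so submodules are located in `package_dir` while the real
    `__init__.py` (and its heavy imports) never runs.
    """
    if package not in sys.modules:
        stub = types.ModuleType(package)
        stub.__path__ = [str(package_dir)]  # marks the stub as a package
        sys.modules[package] = stub
    return importlib.import_module(f"{package}.{submodule}")
```

The trade-off is that anything the real `__init__.py` would have set up (re-exports, side effects) is absent, so this only suits submodules that do not depend on it.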

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@abrichr abrichr merged commit 43cac1c into main Mar 24, 2026
1 check passed