fix: align standalone GRPO with WAA API format and add retry logic #193
Merged
The standalone GRPO trainer produced zero rewards due to two API format bugs in WAADirect:

1. screenshot() tried resp.json() expecting base64-encoded JSON, but WAA's /screenshot returns raw PNG bytes via Flask's send_file(). Fixed to use resp.content (matching WAALiveAdapter).
2. execute_action() wrapped commands in `python -c "..."`, but WAA's /execute_windows uses exec() directly -- the wrapper caused SyntaxError inside the VM. Fixed to send bare Python statements (matching WAALiveAdapter._build_pixel_command).

Additional improvements:

- Add probe() method for structured health checking
- Add screenshot retry logic (3 attempts with 2s delay)
- Add double_click, right_click, scroll action types
- Fix type action to click target first then type (match WAALiveAdapter)
- Add pre-rollout health check in trainer._collect_group()
- Handle empty rollouts gracefully in training loop
- Fix train script to bypass openadapt_evals/__init__.py eager imports (open_clip -> numpy ABI crash in minimal training environments)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
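The execute-wrapping bug can be reproduced without a VM. The sketch below assumes, as the description states, that the server hands the received string to `exec()`. The helper names `compiles_as_python` and `build_click_command` are hypothetical, and the use of `pyautogui` inside the command is an assumption about what is importable in the VM, not something this PR confirms.

```python
def compiles_as_python(command: str) -> bool:
    """Return True if `command` parses as Python source, which exec() requires."""
    try:
        compile(command, "<command>", "exec")
        return True
    except SyntaxError:
        return False


def build_click_command(x: int, y: int) -> str:
    """Build a bare Python statement for a click (hypothetical helper).

    Assumes pyautogui is importable inside the VM. compile() only parses
    the text, so pyautogui is not needed on the machine running this check.
    """
    return f"import pyautogui; pyautogui.click({int(x)}, {int(y)})"


bare = build_click_command(500, 500)
# Shell-style wrapper, as the old execute_action() is described as sending.
wrapped = f'python -c "{bare}"'

print("bare statement parses:", compiles_as_python(bare))
print("wrapped command parses:", compiles_as_python(wrapped))
```

The wrapped form is a shell command line, not Python source, so parsing fails before any action runs. That would be consistent with every rollout step failing and the trainer seeing zero rewards.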
Summary
- `/screenshot` returns raw PNG bytes via `send_file()`, not base64-encoded JSON. Changed from `resp.json()` to `resp.content` (matching WAALiveAdapter).
- `/execute_windows` uses `exec(command)` directly, not subprocess. Removed the `python -c "..."` wrapper that caused a SyntaxError inside the VM. Now sends bare Python statements (matching WAALiveAdapter._build_pixel_command).
- `scripts/train_grpo_standalone.py` triggered `openadapt_evals/__init__.py`, which eagerly imports open_clip (via demo_library), causing numpy ABI crashes. Now shims sys.modules to bypass the top-level init.
- Added a `probe()` method; the training loop handles empty rollout groups gracefully.

These two API format bugs (screenshot parsing + execute wrapping) are the root cause of the standalone GRPO trainer producing zero rewards.
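For readers unfamiliar with the `sys.modules` shim, here is a minimal sketch of the general technique, not the exact code in the train script. `shim_package` is a hypothetical helper name. It relies on the documented behavior that Python skips a package's `__init__.py` when the package name is already present in `sys.modules`, and locates submodules through the stand-in's `__path__`.

```python
import importlib
import sys
import types
from pathlib import Path


def shim_package(name: str, package_dir: Path) -> types.ModuleType:
    """Register an empty stand-in for package `name` in sys.modules.

    The import system finds `name` already loaded, so it never executes
    the package's __init__.py. Submodules are still found via __path__.
    """
    stub = types.ModuleType(name)
    stub.__path__ = [str(package_dir)]  # search location for submodules
    sys.modules[name] = stub
    return stub
```

After calling the helper with a package name and its on-disk directory, `importlib.import_module("<package>.<submodule>")` loads only that submodule. One caveat of this approach: any submodule that depends on names defined in the skipped `__init__.py` will fail to import, so it suits scripts that need a narrow slice of the package.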
Test plan
- `WAADirect.screenshot()` returns valid PNG bytes (len > 100, parseable by PIL)
- `WAADirect.execute_action(SimpleAction(type="click", x=500, y=500))` succeeds (status 200, no SyntaxError)
- `python scripts/train_grpo_standalone.py --help` works without importing open_clip
- `WAADirect.probe()` returns `{"reachable": True, "screenshot_ok": True}` when server is up
- `WAADirect.health_check()` returns False when server is down (no hang)

🤖 Generated with Claude Code
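The screenshot and probe checks in the test plan can be exercised offline with a fake transport. The sketch below is an illustration under assumptions, not the code in WAADirect: `fetch_screenshot`, `probe`, and the injected `get_bytes` callable are hypothetical. It treats the response body as raw bytes, validates the PNG signature, and uses the retry count and delay from the PR description (3 attempts, 2 s) as defaults. The real `probe()` may report more fields than the two shown in the test plan.

```python
import time
from typing import Callable, Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def fetch_screenshot(
    get_bytes: Callable[[], bytes],
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `get_bytes` until it returns PNG data, up to `attempts` times."""
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(attempts):
        try:
            data = get_bytes()
            if data.startswith(PNG_SIGNATURE):
                return data
            last_error = ValueError("response is not a PNG")
        except OSError as exc:  # connection-level failures, e.g. ConnectionError
            last_error = exc
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next attempt, not after the last
    raise last_error


def probe(get_bytes: Callable[[], bytes]) -> Dict[str, bool]:
    """Return a structured health report instead of a single boolean."""
    try:
        data = get_bytes()
    except OSError:
        return {"reachable": False, "screenshot_ok": False}
    return {"reachable": True, "screenshot_ok": data.startswith(PNG_SIGNATURE)}
```

In live use, `get_bytes` could wrap an HTTP GET on the `/screenshot` endpoint that returns the response body as bytes. Passing an explicit request timeout there is what keeps a health check from hanging when the server is down.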