Unify BitWorldWorker and AmongThemNativeWorker step API#48
Open
Both workers now expose:
step(action_masks: np.ndarray) -> tuple[frames, rewards]
where action_masks has shape (agent_count,) uint8, frames has shape
(agent_count, ...), and rewards has shape (agent_count,) float32.
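As a rough illustration of this contract (not the project's actual code), the sketch below defines a hypothetical `StepWorker` protocol, a toy `EchoWorker`, and a `check_step_contract` helper. All three names, the `FRAME_PIXELS = 16` value, and the uint8 frame dtype are assumptions made only for the example.

```python
from typing import Protocol

import numpy as np

FRAME_PIXELS = 16  # assumed frame size, for illustration only


class StepWorker(Protocol):
    """Hypothetical protocol capturing the shared step contract."""

    agent_count: int

    def step(self, action_masks: np.ndarray) -> tuple[np.ndarray, np.ndarray]: ...


class EchoWorker:
    """Toy worker: each agent's frame is filled with its mask; rewards are zero."""

    def __init__(self, agent_count: int) -> None:
        self.agent_count = agent_count

    def step(self, action_masks: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        if action_masks.shape != (self.agent_count,):
            raise ValueError(f"expected {self.agent_count} action masks")
        # One row per agent, FRAME_PIXELS columns.
        frames = np.repeat(action_masks[:, None], FRAME_PIXELS, axis=1).astype(np.uint8)
        rewards = np.zeros(self.agent_count, dtype=np.float32)
        return frames, rewards


def check_step_contract(worker: StepWorker) -> None:
    """Assert the shapes and dtype described above for a single step."""
    masks = np.zeros(worker.agent_count, dtype=np.uint8)
    frames, rewards = worker.step(masks)
    assert frames.shape[0] == worker.agent_count
    assert rewards.shape == (worker.agent_count,)
    assert rewards.dtype == np.float32
```

The check only needs NumPy, so it can run in an environment without torch.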
BitWorldWorker.step previously took a scalar int and returned
(frame, reward_delta), so the vec env dispatched on
worker.agent_count == 1 to choose between the scalar API
(BitWorldWorker) and the array API (AmongThemNativeWorker). That
condition was a leaky proxy: it conflated "this is the single-agent
worker class" with "there is one agent." It silently broke when
Among Them was configured with players=1, since the worker is still
AmongThemNativeWorker and still expects array-shaped masks.
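The failure mode can be reproduced with toy stand-ins. In the sketch below, `ScalarWorker`, `ArrayWorker`, and `old_dispatch` are hypothetical names, and the guard's error text is illustrative rather than the native worker's actual message. The point is that an array-API worker with one agent is routed into the scalar branch.

```python
import numpy as np


class ScalarWorker:
    """Stand-in for the old scalar API: takes an int, returns (frame, reward_delta)."""

    agent_count = 1

    def step(self, action_mask: int):
        return np.zeros(4, dtype=np.uint8), 0.0


class ArrayWorker:
    """Stand-in for an array-API worker that validates the mask shape."""

    def __init__(self, agent_count: int) -> None:
        self.agent_count = agent_count

    def step(self, action_masks):
        masks = np.asarray(action_masks)
        if masks.shape != (self.agent_count,):
            raise ValueError(f"expected {self.agent_count} action masks")
        frames = np.zeros((self.agent_count, 4), dtype=np.uint8)
        rewards = np.zeros(self.agent_count, dtype=np.float32)
        return frames, rewards


def old_dispatch(worker, action_masks: np.ndarray):
    """Old-style dispatch: treats agent_count == 1 as meaning 'scalar API'."""
    if worker.agent_count == 1:
        # Wrong for ArrayWorker(1): it receives a bare int instead of an array.
        frame, reward = worker.step(int(action_masks[0]))
        return frame[None, :], np.array([reward], dtype=np.float32)
    return worker.step(action_masks)
```

Here `old_dispatch(ArrayWorker(1), np.zeros(1, dtype=np.uint8))` raises `ValueError`, while the same call with `ScalarWorker()` or `ArrayWorker(2)` succeeds.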
Now BitWorldWorker.step internally unwraps masks[0] to a byte for
the connection send and wraps reward_delta into a 1-element float32
array for the return. The vec env's _step_env has a single path:
frames, rewards = worker.step(action_masks)
frames = self._frame_batch(frames, worker)
No behavior change for either env type at agent_count > 1; the
1-player Among Them path is now correct.
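A minimal sketch of the unwrap/wrap adapter described above, under stated assumptions: `FakeConnection`, its `send_and_receive` method, `SingleAgentWorker`, and `step_env` are hypothetical stand-ins, and `FRAME_PIXELS = 16` is an arbitrary value. The real `BitWorldWorker` and `_frame_batch` may differ in detail.

```python
import numpy as np

FRAME_PIXELS = 16  # assumed frame size, for illustration only


class FakeConnection:
    """Hypothetical connection: accepts one action byte, returns (frame, reward_delta)."""

    def __init__(self) -> None:
        self.sent = []

    def send_and_receive(self, action_byte: int):
        self.sent.append(action_byte)
        return np.full(FRAME_PIXELS, action_byte, dtype=np.uint8), 0.5


class SingleAgentWorker:
    """Stand-in for a single-agent worker that exposes the array-shaped step API."""

    agent_count = 1

    def __init__(self, connection) -> None:
        self._connection = connection

    def step(self, action_masks: np.ndarray):
        masks = np.asarray(action_masks, dtype=np.uint8)
        if masks.shape != (1,):
            raise ValueError(f"expected 1 action mask, got shape {masks.shape}")
        # Unwrap the single mask to a plain byte for the connection send.
        frame, reward_delta = self._connection.send_and_receive(int(masks[0]))
        # Wrap the scalar results back into agent-indexed arrays.
        frames = frame.reshape(1, -1)
        rewards = np.array([reward_delta], dtype=np.float32)
        return frames, rewards


def step_env(worker, action_masks: np.ndarray):
    """Single code path: no branching on agent_count or worker class."""
    frames, rewards = worker.step(action_masks)
    return frames.reshape(worker.agent_count, -1), rewards
```

Because the adapter lives inside the worker, the caller never needs to know which worker class it is holding.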
Smoke testing requires a torch-enabled environment, and this branch
was prepared on a machine without torch; end-to-end verification on
Mac/MPS is left to the reviewer.
Summary
Both workers now expose the same step API:
step(action_masks: np.ndarray) -> tuple[frames, rewards]
where action_masks has shape (agent_count,) uint8, frames has shape (agent_count, ...), and rewards has shape (agent_count,) float32.
Why
BitWorldWorker.step previously took a scalar int and returned (frame, reward_delta), so BitWorldVecEnv._step_env dispatched on worker.agent_count == 1 to pick between the scalar API (BitWorldWorker) and the array API (AmongThemNativeWorker).
That condition is a leaky proxy: it conflates "this is the single-agent worker class" with "there is one agent." It silently breaks when Among Them is configured with players=1, since the worker is still AmongThemNativeWorker and still expects array-shaped masks. Hitting the scalar branch trips the "expected N Among Them action masks" guard inside the native worker.
What changed
BitWorldWorker.step now takes action_masks: np.ndarray of shape (1,), internally unwraps masks[0] to a byte for the connection send, and wraps reward_delta into a 1-element float32 array on return.
BitWorldVecEnv._step_env collapses to a single path:
frames, rewards = worker.step(action_masks)
frames = self._frame_batch(frames, worker)
_frame_batch is unchanged: its existing reshape handles both (1, FRAME_PIXELS) from BitWorldWorker and (agent_count, FRAME_PIXELS) from AmongThemNativeWorker.
No behavior change for either env type at agent_count > 1. The players=1 Among Them path is now correct.
Test plan
- among_them with --players 1 and --num-envs 8: runs without the "expected N action masks" ValueError.
- among_them with --players 4 and --num-envs 8: unchanged behavior, no regression in SPS or rewards.
- BitWorldWorker-backed env (e.g., bubble_eats or snake): runs without regression. (Local env had no torch installed, so this was not run on the prep machine.)
https://claude.ai/code/session_01Tb5Fr1Yu8JxTuD5dSRcswa
Generated by Claude Code