
Unify BitWorldWorker and AmongThemNativeWorker step API#48

Open
sasmith wants to merge 1 commit into master from claude/unify-worker-step-api

Conversation

Contributor

sasmith commented Apr 28, 2026

Summary

Both workers now expose the same step API:

step(action_masks: np.ndarray) -> tuple[frames, rewards]

where action_masks has shape (agent_count,) uint8, frames has shape (agent_count, ...), and rewards has shape (agent_count,) float32.
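The shape and dtype contract above can be written down as a small check. This is only a sketch: `check_step_contract`, `DummyWorker`, and the `FRAME_PIXELS = 16` value are hypothetical stand-ins for illustration, not code from this PR.

```python
import numpy as np

FRAME_PIXELS = 16  # placeholder size; the real constant lives in the repo


class DummyWorker:
    """Hypothetical worker that satisfies the unified step API."""

    def __init__(self, agent_count: int) -> None:
        self.agent_count = agent_count

    def step(self, action_masks: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        frames = np.zeros((self.agent_count, FRAME_PIXELS), dtype=np.uint8)
        rewards = np.zeros(self.agent_count, dtype=np.float32)
        return frames, rewards


def check_step_contract(worker, action_masks: np.ndarray):
    """Assert the shapes and dtypes described in the summary."""
    assert action_masks.shape == (worker.agent_count,)
    assert action_masks.dtype == np.uint8
    frames, rewards = worker.step(action_masks)
    assert frames.shape[0] == worker.agent_count  # trailing dims are env-specific
    assert rewards.shape == (worker.agent_count,)
    assert rewards.dtype == np.float32
    return frames, rewards


# The same call works for one agent and for several.
for n in (1, 4):
    check_step_contract(DummyWorker(n), np.zeros(n, dtype=np.uint8))
```

A check like this could run against either worker class, since neither needs special casing under the unified API.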

Why

BitWorldWorker.step previously took a scalar int and returned (frame, reward_delta), so BitWorldVecEnv._step_env dispatched on worker.agent_count == 1 to pick between the scalar API (BitWorldWorker) and the array API (AmongThemNativeWorker):

if worker.agent_count == 1:
    frame, reward = worker.step(int(action_masks[0]))
    ...
else:
    frames, rewards = worker.step(action_masks)
    ...

That condition is a leaky proxy: it conflates "this is the single-agent worker class" with "there is one agent." It silently picks the wrong branch when Among Them is configured with players=1, since the worker is still AmongThemNativeWorker and still expects array-shaped masks. Taking the scalar branch passes it a bare int, which trips the "expected N Among Them action masks" guard inside the native worker.
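The failure mode can be reproduced with toy stand-ins. In this sketch `ToyScalarWorker`, `ToyArrayWorker`, and `old_step_env` are hypothetical names that only mimic the behavior described above. The guard and its message are illustrative and are not the native worker's actual code.

```python
import numpy as np


class ToyScalarWorker:
    """Stand-in for the old BitWorldWorker: scalar in, (frame, reward) out."""

    agent_count = 1

    def step(self, action_mask: int):
        return np.zeros(4, dtype=np.uint8), 0.0


class ToyArrayWorker:
    """Stand-in for AmongThemNativeWorker: one mask per agent, always an array."""

    def __init__(self, agent_count: int) -> None:
        self.agent_count = agent_count

    def step(self, action_masks):
        # Illustrative guard: reject anything that is not a 1-D array of N masks.
        if np.ndim(action_masks) != 1 or len(action_masks) != self.agent_count:
            raise ValueError(f"expected {self.agent_count} action masks")
        frames = np.zeros((self.agent_count, 4), dtype=np.uint8)
        rewards = np.zeros(self.agent_count, dtype=np.float32)
        return frames, rewards


def old_step_env(worker, action_masks):
    """Old dispatch: agent_count == 1 is used as a proxy for the worker class."""
    if worker.agent_count == 1:
        return worker.step(int(action_masks[0]))
    return worker.step(action_masks)


masks = np.zeros(1, dtype=np.uint8)
old_step_env(ToyScalarWorker(), masks)  # scalar worker: the proxy happens to hold

try:
    old_step_env(ToyArrayWorker(agent_count=1), masks)  # array worker, one agent
    guard_tripped = False
except ValueError:
    guard_tripped = True
```

With one agent the array worker is routed down the scalar branch, receives a bare int, and its guard raises.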

What changed

  • BitWorldWorker.step now takes action_masks: np.ndarray of shape (1,), internally unwraps masks[0] to a byte for the connection send, and wraps reward_delta into a 1-element float32 array on return.
  • BitWorldVecEnv._step_env collapses to a single path:
    frames, rewards = worker.step(action_masks)
    frames = self._frame_batch(frames, worker)
  • _frame_batch is unchanged — its existing reshape handles both (1, FRAME_PIXELS) from BitWorldWorker and (agent_count, FRAME_PIXELS) from AmongThemNativeWorker.

No behavior change for either env type at agent_count > 1. The players=1 Among Them path is now correct.
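The unwrap and wrap described in the first bullet might look roughly like the following. This is a minimal sketch, not the PR's implementation: `ToyConnection`, `SingleAgentWorker`, and the 4-pixel frame are assumptions made for illustration.

```python
import numpy as np


class ToyConnection:
    """Hypothetical backend: one action byte in, (frame, reward_delta) out."""

    def step(self, action_byte: int):
        assert 0 <= action_byte <= 255
        return np.full(4, action_byte, dtype=np.uint8), 0.5


class SingleAgentWorker:
    """Sketch of a single-agent worker exposing the array-shaped step API."""

    agent_count = 1

    def __init__(self, conn: ToyConnection) -> None:
        self._conn = conn

    def step(self, action_masks: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        if action_masks.shape != (1,):
            raise ValueError(f"expected shape (1,), got {action_masks.shape}")
        # Unwrap the single mask to a plain int for the connection send.
        frame, reward_delta = self._conn.step(int(action_masks[0]))
        # Wrap the results so callers always see a leading agent axis.
        frames = frame[np.newaxis, :]
        rewards = np.array([reward_delta], dtype=np.float32)
        return frames, rewards


worker = SingleAgentWorker(ToyConnection())
frames, rewards = worker.step(np.array([3], dtype=np.uint8))
```

Keeping the adaptation inside the worker means the caller never needs to know which worker class it holds, which is what lets _step_env drop its branch.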

Test plan

  • Smoke: among_them with --players 1 and --num-envs 8 — runs without the expected N action masks ValueError.
  • Smoke: among_them with --players 4 and --num-envs 8 — unchanged behavior, no regression in SPS or rewards.
  • Smoke: a BitWorldWorker-backed env (e.g., bubble_eats or snake) — runs without regression. (Local env had no torch installed, so this was not run on the prep machine.)

https://claude.ai/code/session_01Tb5Fr1Yu8JxTuD5dSRcswa


Generated by Claude Code

Both workers now expose:

    step(action_masks: np.ndarray) -> tuple[frames, rewards]

where action_masks has shape (agent_count,) uint8, frames has shape
(agent_count, ...), and rewards has shape (agent_count,) float32.

BitWorldWorker.step previously took a scalar int and returned
(frame, reward_delta), so the vec env dispatched on
worker.agent_count == 1 to choose between the scalar API
(BitWorldWorker) and the array API (AmongThemNativeWorker). That
condition was a leaky proxy: it conflated "this is the single-agent
worker class" with "there is one agent." It silently broke when
Among Them was configured with players=1, since the worker is still
AmongThemNativeWorker and still expects array-shaped masks.

Now BitWorldWorker.step internally unwraps masks[0] to a byte for
the connection send and wraps reward_delta into a 1-element float32
array for the return. The vec env's _step_env has a single path:

    frames, rewards = worker.step(action_masks)
    frames = self._frame_batch(frames, worker)

No behavior change for either env type at agent_count > 1; the
1-player Among Them path is now correct.

Smoke testing requires a torch-enabled environment, which was not
available where this branch was prepared; verifying end-to-end on
Mac/MPS is left to the reviewer.