Skip to content

Conversation

@enyst
Copy link
Collaborator

@enyst enyst commented Nov 28, 2025

This PR proposes to make the wsproto test more robust, it can happen that terminate fails in pytest.

—-

(Agent Summary)

  • Record creator_pid at fixture setup and ensure teardown only runs in the original parent process

Why

  • pytest session-scoped fixtures can teardown in forked workers (e.g., pytest-xdist, pytest-forked), causing assertions if a process tries to manage a process it didn’t create

How

  • Capture creator_pid = os.getpid() when starting the child
  • During teardown, guard with if os.getpid() == creator_pid: before terminate/join/kill
  • Keep the simple liveness-based shutdown sequence

Result

  • Clean, explicit, and portable teardown logic
  • Avoids private attribute access
  • Verified locally: pre-commit hooks pass and the wsproto tests pass

Co-authored-by: openhands openhands@all-hands.dev

@enyst can click here to continue refining the PR

…ate attr\n\n- Record creator_pid and only tear down from original parent\n- Avoid using multiprocessing.Process._parent_pid which is private\n- Minor safety: terminate only if alive on startup failure\n\nCo-authored-by: openhands <openhands@all-hands.dev>
@enyst
Copy link
Collaborator Author

enyst commented Nov 29, 2025

@OpenHands Think deeply, run the entire test suite, then run this test in isolation, and come up with a few reasons why was this flaky, which MIGHT NOT be the same reason this PR attempts to fix.

/codereview-roasted

@openhands-ai
Copy link

openhands-ai bot commented Nov 29, 2025

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai
Copy link

openhands-ai bot commented Nov 29, 2025

Summary of work and answer

What I did

  • Built the project: make build
  • Ran the entire test suite: uv run pytest
    • Observed an early failure in tests/agent_server/test_agent_server_wsproto.py during the full run (.F).
    • Saw many “Exception ignored in atexit callback” messages from LocalConversation.close complaining “Agent not initialized,” likely unrelated noise but indicative of lifecycle issues elsewhere.
  • Ran the wsproto tests in isolation: uv run pytest tests/agent_server/test_agent_server_wsproto.py
    • Passed consistently (2 tests).
  • Stress-ran the wsproto file 10 times back-to-back
    • 10/10 green; couldn’t reproduce flakiness in isolation.

Why this was flaky (beyond the PR’s teardown guard)

  • Port selection race (TOCTOU): The test finds a “free” port, closes it, then starts the server. Under load/parallelism, some other test can grab the port first, causing intermittent startup failures.
  • Heavy readiness probe: Using /docs for readiness can be slow under load (OpenAPI UI generation/serving). Combined with a tight per-request timeout (1s), this can sporadically fail the health check loop even though the server is about to be ready.
  • Session-scoped lifetime: The server persists across the entire test session. Other tests can interfere with it (state changes, shutdown paths, CPU contention), making failures order-dependent.
  • Start-method/platform variability: spawn vs fork differences (CI/OS dependent) can change startup timing and behavior, contributing to occasional flakes.
  • Websocket timing sensitivity: Short open_timeout/recv windows (5s/2s) are mostly fine here, but under heavy CI load they can still be borderline.
  • Atexit cleanup noise elsewhere: The LocalConversation.close tracebacks suggest lifecycle inconsistencies. While not directly causing this test to fail, they add nondeterminism during global teardown.

Bottom line

  • The PR’s fix (guard teardown to only run in the creating process) addresses one real flake vector with xdist/forked teardown, but the biggest contributors are the free-port race and the heavy /docs readiness probe. Even after merging, I’d expect occasional flakes until those are tightened.
  • Simple mitigation ideas:
    • Avoid TOCTOU free-port selection (ideally start the server on port 0 and read back the bound port; otherwise reduce the race window).
    • Use a lightweight health endpoint with a slightly higher request timeout (e.g., 3–5s per attempt).
    • Consider narrower fixture scope if feasible to reduce cross-test interference.
    • Clean up the atexit lifecycle errors in LocalConversation to reduce teardown noise.

View full conversation

@enyst enyst marked this pull request as draft November 29, 2025 23:01
@enyst enyst changed the title tests: make wsproto agent_server fixture teardown robust without private attr tests: make wsproto agent_server fixture teardown robust Nov 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant