Currently, when an RL run fails (for whatever reason) or the user hits Control-C during the experiment, there is a high chance of leaked sandboxes. They live until a pre-configured timeout (say ~30 minutes).
While this is usually benign, it can be an issue when:
- A user only has a limited concurrency budget from a sandbox provider (e.g. 1000 sandboxes)
- An experiment needs 1000 sandboxes and the user has a retry script that relaunches the SkyRL experiment from the previous checkpoint. The relaunch needs a new set of 1000 concurrent sandboxes while the leaked ones are still alive, so the user runs out of concurrency budget and the retry fails.
While Harbor has Trial cleanup logic implemented, the intricacies around `uv run` and `ray` make it relatively hard to actually let that cleanup logic run.
This PR, which changes from `tqdm.gather()` to `TaskGroup`, partially solves the issue: #1193
Related change in Harbor required to fully solve the issue: harbor-framework/harbor#819
These two branches (slightly different approaches), claude-coded, can solve the issue but don't look very elegant:
Or, use Harbor's `Orchestrator` construct instead, and implement an API with the semantics of "clean up all the sandboxes created during this session" that we run at the experiment level. This solution might be the cleanest. Will revisit when we migrate `HarborGenerator` to use `Orchestrator`.
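
A rough sketch of what "clean up all the sandboxes during this session" could look like, to make the semantics concrete. Everything here is an assumption: `SandboxSession`, `register`, `unregister`, `cleanup_all` and `delete_fn` are hypothetical names and are not part of Harbor's `Orchestrator` or of any sandbox provider's API.

```python
import asyncio
from typing import Awaitable, Callable


class SandboxSession:
    """Hypothetical session-level registry of live sandboxes (not a Harbor API)."""

    def __init__(self, delete_fn: Callable[[str], Awaitable[None]]) -> None:
        self._delete_fn = delete_fn  # provider-specific delete call, assumed async
        self._live: set[str] = set()

    def register(self, sandbox_id: str) -> None:
        self._live.add(sandbox_id)

    def unregister(self, sandbox_id: str) -> None:
        self._live.discard(sandbox_id)

    async def cleanup_all(self) -> list[str]:
        """Delete every sandbox still registered; return the ids that failed."""
        ids = sorted(self._live)
        results = await asyncio.gather(
            *(self._delete_fn(i) for i in ids), return_exceptions=True
        )
        failed = [i for i, r in zip(ids, results) if isinstance(r, BaseException)]
        self._live = set(failed)  # keep failures so a later call can retry them
        return failed


async def demo() -> tuple[list[str], list[str]]:
    deleted: list[str] = []

    async def fake_delete(sandbox_id: str) -> None:
        deleted.append(sandbox_id)  # stand-in for a real provider call

    session = SandboxSession(fake_delete)
    for i in range(3):
        session.register(f"sb-{i}")
    session.unregister("sb-1")  # this trial cleaned up on its own

    try:
        raise RuntimeError("simulated experiment failure")
    except RuntimeError:
        failed = await session.cleanup_all()
    return sorted(deleted), failed
```

The point of the design is that cleanup no longer depends on each Trial's own teardown running: the experiment-level handler only needs the set of sandbox ids created in the session.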