[Harbor] Proper cleanup of Trials (especially address sandbox leakage) upon RL run fails / KeyboardInterrupt

Currently, when an RL run fails (for whatever reason) or the user presses Control-C during the experiment, there is a high chance that sandboxes will be leaked. They stay alive until a [pre-configured timeout](https://github.com/laude-institute/harbor/pull/651) expires (say ~30 minutes).

While this is usually benign, it becomes a problem when a user has only a limited concurrency budget from a sandbox provider (e.g. 1000 sandboxes):
- Suppose an experiment needs 1000 sandboxes, and the user has a retry script that relaunches the SkyRL experiment from the previous checkpoint.
- The relaunch needs a new set of 1000 concurrent sandboxes while the leaked ones are still alive, so the user runs out of concurrency budget and the retry fails.

While Harbor implements Trial cleanup logic, the intricacies of `uv run` and `ray` make it relatively hard for that cleanup logic to actually run.
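As a general illustration of why cleanup can be skipped (not a description of what `uv run` or `ray` do specifically): Python installs no handler for SIGTERM by default, so a process that receives it exits without running `finally` blocks. A minimal sketch, assuming a POSIX system and a hypothetical `trial()` coroutine standing in for a Harbor Trial, converts the signal into a task cancellation so the cleanup path is reached:

```python
import asyncio
import os
import signal

events: list[str] = []  # records what ran, for illustration only


async def trial() -> None:
    """Hypothetical stand-in for a Trial that owns a sandbox."""
    try:
        await asyncio.sleep(10)  # long-running work
    finally:
        # Stand-in for releasing the sandbox.
        events.append("sandbox released")


async def main() -> None:
    loop = asyncio.get_running_loop()
    task = asyncio.current_task()
    # Turn SIGTERM into a cancellation of the main task (POSIX only).
    loop.add_signal_handler(signal.SIGTERM, task.cancel)
    # Simulate an external termination request shortly after startup.
    loop.call_later(0.05, os.kill, os.getpid(), signal.SIGTERM)
    try:
        await trial()
    except asyncio.CancelledError:
        events.append("cancelled")


asyncio.run(main())
```

Without the `add_signal_handler` call, the simulated SIGTERM would end the process before the `finally` block ran.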

This PR, which switches from `tqdm.gather()` to `TaskGroup`, partially solves the issue: https://github.com/NovaSky-AI/SkyRL/pull/1193

Related change in Harbor required to fully solve the issue: https://github.com/laude-institute/harbor/pull/819

These two branches (slightly different approaches), both Claude-coded, solve the issue but are not very elegant:
- https://github.com/NovaSky-AI/SkyRL/pull/1160
- https://github.com/CharlieFRuan/SkyRL/tree/dev/022126-harbor-signal
  - Runs `trainer.run()` on the head node, making the fix slightly less verbose

Alternatively, use Harbor's `Orchestrator` construct and implement an API with the semantics of "clean up all the sandboxes created during this session" that we run at the experiment level. This solution might be the cleanest. Will revisit when we migrate `HarborGenerator` to use `Orchestrator`.
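A rough sketch of what such session-level semantics could look like. Everything here is hypothetical: `SandboxSession`, `register()`, and `cleanup_all()` are illustrative names, not Harbor's `Orchestrator` API, and `_release()` is a placeholder for a real provider call.

```python
import asyncio


class SandboxSession:
    """Hypothetical session-level registry of live sandboxes (not Harbor's API)."""

    def __init__(self) -> None:
        self._live: set[str] = set()
        self.released: list[str] = []

    @property
    def live_count(self) -> int:
        return len(self._live)

    def register(self, sandbox_id: str) -> None:
        """Record a sandbox as soon as it is created."""
        self._live.add(sandbox_id)

    async def _release(self, sandbox_id: str) -> None:
        await asyncio.sleep(0)  # placeholder for the provider's delete call
        self.released.append(sandbox_id)

    async def cleanup_all(self) -> None:
        """Release every sandbox created during this session."""
        pending = sorted(self._live)
        self._live.clear()
        # return_exceptions=True so one failed release does not skip the rest.
        await asyncio.gather(
            *(self._release(s) for s in pending), return_exceptions=True
        )


async def run_experiment(session: SandboxSession) -> None:
    try:
        for i in range(3):
            session.register(f"sandbox-{i}")
        raise RuntimeError("simulated training failure")
    finally:
        # Experiment-level cleanup: runs whether the run succeeds or fails.
        await session.cleanup_all()


session = SandboxSession()
try:
    asyncio.run(run_experiment(session))
except RuntimeError:
    pass
```

The point of the design is that cleanup no longer depends on each individual Trial reaching its own teardown; a single call at the experiment boundary releases whatever is still registered.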
