Feat: Add batch eval script for public tests only (#87)
* feat: add run_eval_public.sh
* feat: update README
* fix: add solutions_dir support and filter options for batch eval
- Fix BatchEvaluator to use custom solutions_dir for hash computation
- Add --model and --problem filter options to CLI and run_eval_public.sh
- Allow users to test specific models or problems without running everything
* docs: add filter options examples to README
* refactor: improve batch CLI with --track and automatic default paths
- Replace --algorithmic flag with --track research|algorithmic (required)
- Set default solutions/problems paths based on track
- Auto-create track subdir in results_dir (results/{track}/)
- Simplify shell script to rely on CLI defaults
- Update README with CLI usage examples
* docs: clarify batch evaluation with custom solutions directory
* feat: add SkyPilot cleanup and default backend for research track
  - Add signal handler to clean up SkyPilot clusters on Ctrl+C
- Research track defaults to SkyPilot backend (no need for --skypilot flag)
- Add --docker flag to override default for research track
- Simplify shell script since CLI handles backend defaults
* refactor: simplify batch CLI with --backend flag and auto-adjust workers
* fix: update run_eval.sh for new CLI flags
* docs: update batch CLI examples in markdown files
* refactor: use positional track argument for all CLI commands
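The CLI behaviour described above (positional track, track-based default paths, per-track results/{track}/ subdirectory, SkyPilot as the research default with a --backend override, and SkyPilot cleanup on Ctrl+C) can be sketched roughly as follows. This is a hypothetical illustration, not the code in this PR: the function names, the default directory paths, the docker default for the algorithmic track, and the injected `teardown` callable are all assumptions. In a real setup `teardown` might shell out to SkyPilot's `sky down <cluster>`.

```python
import argparse
import signal
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List

# Hypothetical default layout; the real defaults are not stated in the commit.
DEFAULT_DIRS = {
    "research": ("research/solutions", "research/problems"),
    "algorithmic": ("algorithmic/solutions", "algorithmic/problems"),
}
# Research defaults to SkyPilot (per the commit); docker for algorithmic is assumed.
DEFAULT_BACKEND = {"research": "skypilot", "algorithmic": "docker"}


@dataclass
class BatchConfig:
    track: str
    backend: str
    solutions_dir: Path
    problems_dir: Path
    results_dir: Path


def parse_config(argv: List[str]) -> BatchConfig:
    parser = argparse.ArgumentParser(prog="batch")
    # Positional track argument instead of a --track flag.
    parser.add_argument("track", choices=sorted(DEFAULT_DIRS))
    parser.add_argument("--backend", choices=["docker", "skypilot"])
    parser.add_argument("--solutions-dir", type=Path)
    parser.add_argument("--problems-dir", type=Path)
    parser.add_argument("--results-dir", type=Path, default=Path("results"))
    args = parser.parse_args(argv)

    sol_default, prob_default = DEFAULT_DIRS[args.track]
    return BatchConfig(
        track=args.track,
        # An explicit --backend wins; otherwise use the track's default.
        backend=args.backend or DEFAULT_BACKEND[args.track],
        solutions_dir=args.solutions_dir or Path(sol_default),
        problems_dir=args.problems_dir or Path(prob_default),
        # Results are grouped per track: results/{track}/
        results_dir=args.results_dir / args.track,
    )


def install_cleanup_handler(clusters: List[str], teardown: Callable[[str], None]):
    """Register a SIGINT handler that tears down every launched cluster."""

    def handler(signum, frame):
        for name in list(clusters):
            teardown(name)
        clusters.clear()
        # Re-raise so the normal Ctrl+C exit path still runs.
        raise KeyboardInterrupt

    signal.signal(signal.SIGINT, handler)
    return handler
```

Putting the defaults in the CLI instead of the shell script keeps run_eval_public.sh thin. Callers that need something else only pass the override flags they need, such as --backend or a custom solutions directory.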
---------
Co-authored-by: Andy Lee <andylizf@outlook.com>