Releases: facebookresearch/ProgramBench
v1.0.2
v1.0.1
What's Changed
- Fix: stderr messages can corrupt XML coverage report (#5), thanks for the report @darshanmakwana412
New Contributors
- @eltociear made their first contribution in #8
Full Changelog: v1.0.0...v1.0.1
ProgramBench 🦊
How much of SQLite, FFmpeg, or the PHP compiler can Opus 4.7 rebuild from scratch, given just an executable and no starter code or internet access?
Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end.
Read more: https://programbench.com/
