
Releases: facebookresearch/ProgramBench

v1.0.2

11 May 16:58
b33e660


This minor release ignores ~30 tests that caused hangs when evaluating incorrect solutions.

Full Changelog: v1.0.1...v1.0.2

v1.0.1

07 May 12:45
1fe64c8


What's Changed

  • Fix: stderr messages could corrupt the XML coverage report (#5). Thanks for the report, @darshanmakwana412.


Full Changelog: v1.0.0...v1.0.1

ProgramBench 🦊

05 May 14:31
2803dcc


How much of SQLite, FFmpeg, or the PHP compiler can Opus 4.7 rebuild from scratch, given just an executable and no starter code or internet access?

Introducing ProgramBench: 200 rigorous, whole-repo generation tasks where models design, build, and ship a working program end to end.

Read more: https://programbench.com/
