Fix: Fully align WOSAC metric calculation with original #258

daphne-cornelisse · 2026-01-14T20:04:53Z

Description

We observed a discrepancy between the WOSAC meta-scores from the original implementation and those produced by PufferDrive. This PR resolves these discrepancies by fixing the bugs below.

Bugs fixed

Order of operations: We averaged log-likelihood metrics before exponentiating; WOSAC exponentiates first and then averages.
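
To make the order-of-operations difference concrete, here is a minimal sketch in plain Python. The function names and the toy inputs are hypothetical and are not taken from the PufferDrive or WOSAC code; the axis over which the average is taken (time steps, agents, or scenarios) is also an assumption, since the description does not specify it.

```python
import math


def mean_then_exp(log_likelihoods):
    """Old order: average the log-likelihoods, then exponentiate.

    This yields the geometric mean of the underlying likelihoods.
    """
    return math.exp(sum(log_likelihoods) / len(log_likelihoods))


def exp_then_mean(log_likelihoods):
    """Order described above for WOSAC: exponentiate each value, then average.

    This yields the arithmetic mean of the underlying likelihoods.
    """
    return sum(math.exp(v) for v in log_likelihoods) / len(log_likelihoods)


# Toy inputs (hypothetical): one well-explained sample and one poorly explained one.
log_likelihoods = [math.log(0.9), math.log(0.1)]

old_score = mean_then_exp(log_likelihoods)  # geometric mean of 0.9 and 0.1
new_score = exp_then_mean(log_likelihoods)  # arithmetic mean of 0.9 and 0.1
```

Because the geometric mean never exceeds the arithmetic mean, averaging before exponentiating can only lower the score or leave it unchanged, and the gap grows as the individual likelihoods spread apart.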

To do

There is still a discrepancy of 0.03 between the SMART meta-score in PufferDrive and the one on the WOSAC leaderboard.
We need to evaluate a random agent on the leaderboard
Update the baselines (clean small dataset, larger one)

Tests ran and results

TODO


Wael Boumediene Doulazmi and others added 3 commits January 14, 2026 13:58

quick commit so you can read the code

c720519

Merge remote-tracking branch 'origin/2.0' into wbd/wosac_debug

e59cc38

Merge remote-tracking branch 'origin/2.0' into wbd/wosac_debug

0ddfac3

daphne-cornelisse added the bug, benchmarking, and documentation labels Jan 14, 2026

daphne-cornelisse and others added 4 commits January 15, 2026 09:44

Improve naming of sampling argument to better describe its function.

962d8f2

Ensure that initialization works with Carla maps or other, when sdc_index=-1.

57d8e31

Simplify CARLA compatibility

6c9cd18

Replace num_maps by wosac_num_maps in all the eval scripts

b75a99a


3 participants
