Fix flatten_dict crash/wrong key for bare numpy array values #1247

Open
Kymi808 wants to merge 1 commit into huggingface:main from Kymi808:fix/flatten-dict-ndarray-key

Kymi808 commented May 28, 2026

Summary

In flatten_dict (src/lighteval/utils/utils.py), the recursive helper's np.ndarray branch builds its key using i:

elif isinstance(v, np.ndarray):
    into[prefix + k + sep + str(i)] = v.tolist()

But i is the loop variable from the preceding list/tuple branch (for i, vv in enumerate(v)). For a standalone ndarray value it is never bound, so the call raises UnboundLocalError. If a list/tuple key was processed earlier in the same dict, the stale i leaks in and produces a bogus indexed key.

flatten_dict({"a": np.array([1, 2, 3])})
# UnboundLocalError: cannot access local variable 'i'

flatten_dict({"lst": [10, 20, 30], "arr": np.array([7, 8, 9])})
# {'lst/0': 10, 'lst/1': 20, 'lst/2': 30, 'arr/2': [7, 8, 9]}   <- bogus '/2'
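
The scoping behaviour behind both failures can be reproduced without lighteval or numpy. The sketch below is illustrative only: `label_values` is a hypothetical stand-in that copies just the pattern (a loop variable bound in one branch and read in another), not the actual helper.

```python
def label_values(values):
    # Hypothetical stand-in for the buggy pattern, not lighteval code.
    out = {}
    for k, v in values.items():
        if isinstance(v, list):
            # `i` is bound only if this loop body runs at least once.
            for i, vv in enumerate(v):
                out[f"{k}/{i}"] = vv
        else:
            # `i` is a function-local name: unbound if no non-empty list
            # was seen earlier, stale (last index) otherwise.
            out[f"{k}/{i}"] = v
    return out


# Case 1: no list processed first, so reading `i` raises.
try:
    label_values({"a": 1})
except UnboundLocalError as exc:
    print("raised:", type(exc).__name__)

# Case 2: the last index of "lst" leaks into the key for "arr".
print(label_values({"lst": [10, 20, 30], "arr": 7}))
```

The second call yields an `arr/2` key, mirroring the bogus key shown above. An empty list earlier in the dict would not bind `i` either, so the unbound case can also appear after a list key.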

flatten_dict feeds obj_to_markdown, which evaluation_tracker uses to log config/task details to TensorBoard, so an ndarray-valued config entry crashes logging.

Fix

A bare ndarray maps to a single value, so key it as prefix + k, identical to the scalar branch just below it:

elif isinstance(v, np.ndarray):
    into[prefix + k] = v.tolist()

The list/tuple-of-ndarrays path (which legitimately uses i) is unchanged.
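
For reviewers who want to see the corrected keying in isolation, here is a minimal sketch. `flatten_dict_sketch` is a hypothetical, simplified stand-in and not the code in src/lighteval/utils/utils.py: it assumes string keys, iterates in insertion order, and omits anything else the real helper may do. Only the branch structure described in this PR is reproduced.

```python
import numpy as np


def flatten_dict_sketch(nested, sep="/"):
    # Hypothetical simplified stand-in for flatten_dict (see assumptions above).
    def rec(node, prefix, into):
        for k, v in node.items():
            if isinstance(v, dict):
                rec(v, prefix + k + sep, into)
            elif isinstance(v, (list, tuple)):
                # Indexed keys are legitimate here: `i` is bound by this loop.
                for i, vv in enumerate(v):
                    if isinstance(vv, dict):
                        rec(vv, prefix + k + sep + str(i) + sep, into)
                    else:
                        item = vv.tolist() if isinstance(vv, np.ndarray) else vv
                        into[prefix + k + sep + str(i)] = item
            elif isinstance(v, np.ndarray):
                # The fix: a bare ndarray is one value, so no index suffix.
                into[prefix + k] = v.tolist()
            else:
                into[prefix + k] = v

    flat = {}
    rec(nested, "", flat)
    return flat


# The three cases from the test plan, run against the sketch.
print(flatten_dict_sketch({"a": np.array([1, 2, 3])}))
print(flatten_dict_sketch({"lst": [10, 20, 30], "arr": np.array([7, 8, 9])}))
print(flatten_dict_sketch({"xs": [np.array([1]), np.array([2])]}))
```

With this keying, the bare array lands under `a` and `arr` with no index suffix, while arrays inside a list still get `xs/0` and `xs/1`.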

Test plan

  • Added TestFlattenDict covering the unbound case, the stale-index case, and a regression guard for list-of-ndarrays. The first two fail on main (UnboundLocalError / bogus arr/2 key) and pass with this change.
  • pytest tests/unit/utils/test_utils.py → 9 passed.
  • ruff check / ruff format --check clean on both files.

In flatten_dict's recursive helper, the `np.ndarray` branch built its key as
`prefix + k + sep + str(i)`, but `i` is the loop variable from the preceding
list/tuple branch. For a standalone ndarray value `i` is unbound, raising
UnboundLocalError; if a list/tuple key was processed earlier in the same dict,
the stale `i` leaks in and produces a bogus indexed key (e.g. "arr/2").

A bare ndarray maps to a single value, so key it as `prefix + k`, matching the
scalar branch. Adds regression tests for the unbound and stale-index cases.