
Dataset.eval does not support N-dimensional objects with N > 2 #11062

@jacopo-exact

What is your issue?

Using

  • xarray 2025.12.0
  • pandas 2.3.3

Dataset.eval uses pandas under the hood (via pd.eval) to evaluate arbitrary expressions:

xarray/xarray/core/dataset.py

Lines 9538 to 9598 in f8bc4f4

def eval(
    self,
    statement: str,
    *,
    parser: QueryParserOptions = "pandas",
) -> Self | T_DataArray:
    """
    Calculate an expression supplied as a string in the context of the dataset.

    This is currently experimental; the API may change particularly around
    assignments, which currently return a ``Dataset`` with the additional variable.
    Currently only the ``python`` engine is supported, which has the same
    performance as executing in python.

    Parameters
    ----------
    statement : str
        String containing the Python-like expression to evaluate.

    Returns
    -------
    result : Dataset or DataArray, depending on whether ``statement`` contains an
    assignment.

    Examples
    --------
    >>> ds = xr.Dataset(
    ...     {"a": ("x", np.arange(0, 5, 1)), "b": ("x", np.linspace(0, 1, 5))}
    ... )
    >>> ds
    <xarray.Dataset> Size: 80B
    Dimensions:  (x: 5)
    Dimensions without coordinates: x
    Data variables:
        a        (x) int64 40B 0 1 2 3 4
        b        (x) float64 40B 0.0 0.25 0.5 0.75 1.0

    >>> ds.eval("a + b")
    <xarray.DataArray (x: 5)> Size: 40B
    array([0.  , 1.25, 2.5 , 3.75, 5.  ])
    Dimensions without coordinates: x

    >>> ds.eval("c = a + b")
    <xarray.Dataset> Size: 120B
    Dimensions:  (x: 5)
    Dimensions without coordinates: x
    Data variables:
        a        (x) int64 40B 0 1 2 3 4
        b        (x) float64 40B 0.0 0.25 0.5 0.75 1.0
        c        (x) float64 40B 0.0 1.25 2.5 3.75 5.0
    """
    return pd.eval(  # type: ignore[return-value]
        statement,
        resolvers=[self],
        target=self,
        parser=parser,
        # Because numexpr returns a numpy array, using that engine results in
        # different behavior. We'd be very open to a contribution handling this.
        engine="python",
    )

However, pandas explicitly rejects objects with more than two dimensions in eval:

https://github.com/pandas-dev/pandas/blob/9b4461268867c548f575c75733f46fee7c5ad585/pandas/core/computation/ops.py#L118-L120
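A minimal sketch to reproduce the limitation, using made-up 3D data. The variable names `a` and `b`, the dimension names, and the shape are assumptions for illustration only. The exception is caught generically because its exact type and message may depend on the installed pandas version.

```python
import numpy as np
import xarray as xr

# Two 3-D variables on hypothetical dimensions (t, y, x).
shape = (2, 3, 4)
ds = xr.Dataset(
    {
        "a": (("t", "y", "x"), np.ones(shape)),
        "b": (("t", "y", "x"), np.full(shape, 2.0)),
    }
)

# Plain xarray arithmetic works for any number of dimensions.
direct = ds["a"] + ds["b"]

# The same expression through Dataset.eval is delegated to pd.eval,
# which is where the dimensionality check is expected to fail.
error = None
try:
    ds.eval("a + b")
except Exception as exc:
    error = exc

print(direct.shape)
print("no error" if error is None else f"{type(error).__name__}: {error}")
```

With the versions listed above, the second print should report the error raised by the pandas check, while the direct arithmetic succeeds.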

Since I was only interested in simple additions and subtractions among variables, I tried disabling that check. The code then runs on 3D benchmark data, but painfully slowly.

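I came up with this workaround, which validates the expression against a whitelist of AST node types and then evaluates it with the dataset's variables as locals: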

import ast

import xarray as xr

# Only plain arithmetic on variable names and constants is permitted.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div,
)


def ds_eval(ds: xr.Dataset, expr: str) -> xr.DataArray:
    def validate(expr: str) -> ast.Expression:
        # Parse as a single expression and reject any node not in the whitelist.
        tree = ast.parse(expr, mode="eval")
        for node in ast.walk(tree):
            if not isinstance(node, ALLOWED_NODES):
                raise ValueError(f"Disallowed expression: {type(node).__name__}")
        return tree

    tree = validate(expr)
    compiled = compile(tree, "<expr>", "eval")

    return eval(
        compiled,
        {"__builtins__": {}},  # globals: no builtins available
        ds.data_vars,  # locals: names resolve to the dataset's variables
    )

# example usage, assuming ds has variables x and y
z = ds_eval(ds, "x + y")
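Note that the whitelist above also rejects unary minus (`-x`) and powers (`x ** 2`). The sketch below shows the same validation technique on a plain dict of numbers, using only the standard library, with a hypothetical extended whitelist. `EXTENDED_NODES`, `check_expression`, `safe_arith_eval`, and `is_rejected` are illustrative names, not part of xarray or pandas.

```python
import ast

# Hypothetical extension of ALLOWED_NODES: unary minus and power are added.
EXTENDED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)


def check_expression(expr: str, allowed=EXTENDED_NODES) -> ast.Expression:
    """Parse expr and raise ValueError on the first node type not in allowed."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, allowed):
            raise ValueError(f"Disallowed node: {type(node).__name__}")
    return tree


def safe_arith_eval(expr: str, variables: dict):
    """Evaluate a whitelisted arithmetic expression against a plain mapping."""
    code = compile(check_expression(expr), "<expr>", "eval")
    return eval(code, {"__builtins__": {}}, variables)


def is_rejected(expr: str) -> bool:
    """Return True if the whitelist refuses the expression."""
    try:
        check_expression(expr)
    except ValueError:
        return True
    return False


values = {"x": 3.0, "y": 2.0}
print(safe_arith_eval("-x + y ** 2", values))  # arithmetic only
print(is_rejected("__import__('os')"))         # ast.Call is not whitelisted
print(is_rejected("x.real"))                   # ast.Attribute is not whitelisted
```

Function calls and attribute access are refused before anything is compiled, so the empty `__builtins__` is a second line of defence rather than the only one.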

Since ds.eval is read-only, the method cannot be monkeypatched on the Dataset object, which is why the workaround is a standalone function.

I'm posting it here in case someone else needs it, and to open a discussion on possible improvements to the current implementation, which I would be happy to help with.
