What is your issue?
Using
- xarray 2025.12.0
- pandas 2.3.3
Dataset.eval uses pandas under the hood (via pd.eval) to evaluate arbitrary expressions:
Lines 9538 to 9598 in f8bc4f4

    def eval(
        self,
        statement: str,
        *,
        parser: QueryParserOptions = "pandas",
    ) -> Self | T_DataArray:
        """
        Calculate an expression supplied as a string in the context of the dataset.

        This is currently experimental; the API may change particularly around
        assignments, which currently return a ``Dataset`` with the additional variable.
        Currently only the ``python`` engine is supported, which has the same
        performance as executing in python.

        Parameters
        ----------
        statement : str
            String containing the Python-like expression to evaluate.

        Returns
        -------
        result : Dataset or DataArray, depending on whether ``statement`` contains an
        assignment.

        Examples
        --------
        >>> ds = xr.Dataset(
        ...     {"a": ("x", np.arange(0, 5, 1)), "b": ("x", np.linspace(0, 1, 5))}
        ... )
        >>> ds
        <xarray.Dataset> Size: 80B
        Dimensions:  (x: 5)
        Dimensions without coordinates: x
        Data variables:
            a        (x) int64 40B 0 1 2 3 4
            b        (x) float64 40B 0.0 0.25 0.5 0.75 1.0

        >>> ds.eval("a + b")
        <xarray.DataArray (x: 5)> Size: 40B
        array([0.  , 1.25, 2.5 , 3.75, 5.  ])
        Dimensions without coordinates: x

        >>> ds.eval("c = a + b")
        <xarray.Dataset> Size: 120B
        Dimensions:  (x: 5)
        Dimensions without coordinates: x
        Data variables:
            a        (x) int64 40B 0 1 2 3 4
            b        (x) float64 40B 0.0 0.25 0.5 0.75 1.0
            c        (x) float64 40B 0.0 1.25 2.5 3.75 5.0
        """
        return pd.eval(  # type: ignore[return-value]
            statement,
            resolvers=[self],
            target=self,
            parser=parser,
            # Because numexpr returns a numpy array, using that engine results in
            # different behavior. We'd be very open to a contribution handling this.
            engine="python",
        )

However, pandas explicitly does not support data with more than two dimensions.
Since I was only interested in simple additions and subtractions among variables, I tried disabling that check. The code then runs on 3D benchmark data, but painfully slowly.
I came up with this workaround:

    import ast

    import xarray as xr

    def _ds_eval(ds: xr.Dataset, expr: str) -> xr.DataArray:
        # Only plain arithmetic on variable names and constants is allowed.
        ALLOWED_NODES = (
            ast.Expression, ast.BinOp, ast.Name, ast.Load, ast.Constant,
            ast.Add, ast.Sub, ast.Mult, ast.Div,
        )

        def validate(expr):
            # Parse as a single expression and reject any node outside the whitelist.
            tree = ast.parse(expr, mode="eval")
            for node in ast.walk(tree):
                if not isinstance(node, ALLOWED_NODES):
                    raise ValueError(f"Disallowed expression: {node}")
            return tree

        tree = validate(expr)
        compiled = compile(tree, "<expr>", "eval")
        return eval(
            compiled,
            {"__builtins__": {}},  # globals
            ds.data_vars,  # locals
        )

    # example usage, assuming ds has variables x and y
    z = _ds_eval(ds, "x + y")

Since ds.eval is read-only, the Dataset object cannot be monkeypatched, hence the standalone function.
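One possible variation, offered only as a sketch: instead of validating the tree and then handing it to the built-in eval, the same whitelist can be applied while walking the AST and applying the operators directly. The names eval_arithmetic, _BINARY_OPS and _UNARY_OPS below are hypothetical and not part of xarray or pandas. The assumption is that the operands live in any mapping from names to objects supporting +, -, * and /, so ds.data_vars should be usable as that mapping.

```python
import ast
import operator
from collections.abc import Mapping
from typing import Any

# Hypothetical whitelist: AST operator node types mapped to the functions applied.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def eval_arithmetic(expr: str, variables: Mapping[str, Any]) -> Any:
    """Evaluate +, -, *, / over named operands by walking the AST (no eval())."""

    def visit(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        if isinstance(node, ast.Name):
            if node.id not in variables:
                raise KeyError(f"Unknown variable: {node.id!r}")
            return variables[node.id]
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            # Numeric literals only; strings, bytes, None and booleans are rejected.
            return node.value
        raise ValueError(f"Disallowed expression element: {type(node).__name__}")

    return visit(ast.parse(expr, mode="eval"))


# Plain numbers stand in for data variables here; with xarray one would pass
# ds.data_vars as the mapping, e.g. eval_arithmetic("x + y", ds.data_vars).
result = eval_arithmetic("-x + 2 * y", {"x": 1.0, "y": 3.0})
```

Compared with the validate-then-eval version, this also accepts a leading minus sign and restricts constants to numbers, at the cost of a little more code.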
I'm putting it here in case someone else needs it, and to open a discussion on possible improvements to the current implementation, which I would be happy to help with.