@max-sixty
Collaborator

@max-sixty max-sixty commented Dec 31, 2025

This ended up being a much bigger effort than expected:

  • We had to move away from pandas' implementation for Dataset.eval, because it's limited to 2 dims.
  • We keep pandas' implementation for .query: we should be more careful about changing that, it uses numexpr (which is fast), and it has no requirement for > 2 dims.
  • I then added code to keep eval consistent with the .query interface, e.g. supporting the and / or operators.
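
For illustration, here is a minimal sketch of how the and / or / not keywords (and chained comparisons) could be rewritten into element-wise operators with Python's ast module. The class name `LogicalToBitwise` and the helper `rewrite` are hypothetical and are not the code in this PR; the sketch also omits the FutureWarning that the PR emits for these constructs.

```python
import ast


class LogicalToBitwise(ast.NodeTransformer):
    """Hypothetical stand-in for the PR's transformer, not its actual code."""

    def visit_BoolOp(self, node: ast.BoolOp) -> ast.AST:
        self.generic_visit(node)  # rewrite nested expressions first
        op = ast.BitAnd() if isinstance(node.op, ast.And) else ast.BitOr()
        result = node.values[0]
        for value in node.values[1:]:
            # Fold `a and b and c` into `(a & b) & c`.
            result = ast.BinOp(left=result, op=op, right=value)
        return result

    def visit_UnaryOp(self, node: ast.UnaryOp) -> ast.AST:
        self.generic_visit(node)
        if isinstance(node.op, ast.Not):
            return ast.UnaryOp(op=ast.Invert(), operand=node.operand)
        return node

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if len(node.ops) == 1:
            return node
        # Split `a < b < c` into `(a < b) & (b < c)`.
        # Note: the middle operand is evaluated twice after this rewrite.
        operands = [node.left, *node.comparators]
        result = None
        for left, op, right in zip(operands, node.ops, operands[1:]):
            pair = ast.Compare(left=left, ops=[op], comparators=[right])
            if result is None:
                result = pair
            else:
                result = ast.BinOp(left=result, op=ast.BitAnd(), right=pair)
        return result


def rewrite(expr: str) -> str:
    """Return the source of `expr` after the rewrite (needs Python 3.9+)."""
    tree = ast.parse(expr, mode="eval")
    tree = ast.fix_missing_locations(LogicalToBitwise().visit(tree))
    return ast.unparse(tree)


print(rewrite("a > 0 and b < 5"))  # (a > 0) & (b < 5)
print(rewrite("not a"))            # ~a
print(rewrite("0 < a < 10"))       # (0 < a) & (a < 10)
```

The rewrite matters because and / or / not call bool() on their operands, which is ambiguous for arrays, while & / | / ~ apply element-wise.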

I added constraints similar to those pandas has for limiting what eval can do. I'm not that confident they're robust, and I'm not sure how valuable they are.

Most of the added code is tests.

Commentary from Claude below (+ Claude wrote the code, for transparency, albeit with lots of oversight)


This commit removes the dependency on pandas.eval() and implements a native expression evaluator in Dataset.eval() using Python's ast module. The new implementation provides better support for multi-dimensional arrays and maintains backward compatibility with deprecated operators through automatic transformation.

Key changes:

  • Remove pd.eval() call and replace with custom _eval_expression() method
  • Add _LogicalOperatorTransformer to convert deprecated operators (and/or/not) to bitwise operators (&/|/~) that work element-wise on arrays
  • Implement automatic transformation of chained comparisons to explicit bitwise AND operations
  • Add security validation to block lambda expressions and private attributes
  • Emit FutureWarning for deprecated constructs (logical operators, chained comparisons, parser= argument)
  • Support assignment statements (target = expression) in eval()
  • Make data variables and coordinates take priority in namespace resolution
  • Provide safe builtins (abs, min, max, round, len, sum, pow, any, all, type constructors, iteration helpers) while blocking __import__, open, etc.
  • Add comprehensive test coverage including edge cases, error messages, dask compatibility, and security validation
  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst
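
As a rough sketch of how the namespace, builtins, and assignment items above could fit together: the function `eval_expression`, its parameters, and the `SAFE_BUILTINS` allow-list below are hypothetical, and plain dicts stand in for a Dataset. The real `_eval_expression()` may be structured differently and returns xarray objects.

```python
import ast
import builtins

# Illustrative allow-list only; the PR's actual list may differ.
SAFE_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("abs", "min", "max", "round", "len", "sum", "pow",
                 "any", "all", "int", "float", "bool", "range", "zip")
}


def eval_expression(statement, variables, extra=None):
    """Evaluate `expr` or `target = expr` against a mapping of variables.

    Hypothetical helper: returns the value for a bare expression, or a new
    mapping with `target` added for an assignment.
    """
    tree = ast.parse(statement, mode="exec")
    if len(tree.body) != 1:
        raise ValueError("expected a single expression or assignment")
    node = tree.body[0]

    target = None
    if isinstance(node, ast.Assign):
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            raise ValueError("only `name = expression` is supported")
        target = node.targets[0].id
        expr = node.value
    elif isinstance(node, ast.Expr):
        expr = node.value
    else:
        raise ValueError("expected an expression or assignment")

    # Later entries win, so the dataset's own variables shadow both the
    # user-supplied names and the allowed builtins.
    namespace = {**SAFE_BUILTINS, **(extra or {}), **variables}
    code = compile(ast.Expression(body=expr), "<eval>", "eval")
    # An empty __builtins__ means names like __import__ and open are undefined.
    result = eval(code, {"__builtins__": {}}, namespace)

    if target is None:
        return result
    return {**variables, target: result}


print(eval_expression("abs(a - b)", {"a": 1, "b": 5}))
print(eval_expression("c = a + b", {"a": 1, "b": 2}))
```

Emptying `__builtins__` only removes the obvious entry points; it is not a sandbox, since evaluated code can still reach other objects through the values it is given.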

max-sixty and others added 3 commits January 1, 2026 10:57
- Use pd.isna(ds["a"].values) instead of pd.isna(ds["a"]) since pandas
  type stubs don't have overloads for DataArray
- Use abs() instead of np.abs() to get DataArray return type

Co-authored-by: Claude <noreply@anthropic.com>
The lambda and dunder restrictions emulate pd.eval() behavior rather than
providing security guarantees. Pandas explicitly doesn't claim these as
security measures.

Co-authored-by: Claude <noreply@anthropic.com>
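
A minimal sketch of the kind of check described in this commit message, assuming a simple walk over the parsed tree. The function name `validate_expression` and the error messages are hypothetical, not the PR's code. The last lines show why such a check is a restriction rather than a security guarantee.

```python
import ast


def validate_expression(tree: ast.AST) -> None:
    """Reject lambdas and underscore-prefixed attribute access (hypothetical)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Lambda):
            raise ValueError("lambda expressions are not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise ValueError(
                f"access to private attribute {node.attr!r} is not allowed"
            )


def is_allowed(expr: str) -> bool:
    """Return True if `expr` passes the check above."""
    try:
        validate_expression(ast.parse(expr, mode="eval"))
    except ValueError:
        return False
    return True


print(is_allowed("a + b.mean()"))          # True
print(is_allowed("(lambda: 1)()"))         # False
print(is_allowed("a.__class__"))           # False
# The same attribute is reachable via a string argument, so the check
# passes whenever a getattr-like callable is in scope:
print(is_allowed('getattr(a, "__class__")'))  # True
```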
# - pd.eval() with numexpr is faster and well-tested for query's use case
# -------------------------------------------------------------------------

class _LogicalOperatorTransformer(ast.NodeTransformer):
Member

Could this all be moved to a dedicated module?
