Commit afaeccb
feat: enable pickling of most Expr except udaf and udwf (#1544)
* feat: pickle support for Expr via inline scalar UDF encoding
Adds Python-aware encoding to PythonLogicalCodec/PythonPhysicalCodec
so a ScalarUDF defined in Python travels inside the serialized
expression (cloudpickled into fun_definition) instead of needing a
matching registration on the receiver. With that in place, Expr gains
__reduce__ + classmethod from_bytes(buf, ctx=None) so pickle.dumps /
pickle.loads work end-to-end on expressions built from col, lit,
built-in functions, and Python scalar UDFs.
Wire format is framed as <DFPYUDF magic, version byte, cloudpickle
tuple>; the version byte lets a too-new/too-old payload surface a
clean Execution error instead of an opaque cloudpickle unpack
failure. Schema serde is via arrow-rs's native IPC (no pyarrow
round-trip). Cloudpickle module handle is cached per-interpreter
through PyOnceLock.
Worker-side context resolution lives in a new datafusion.ipc module:
set_worker_ctx / get_worker_ctx / clear_worker_ctx plus a private
_resolve_ctx helper consulted by Expr.from_bytes. Priority is
explicit ctx > worker ctx > global SessionContext. FFI UDFs still
travel by name and require the matching registration on the
receiver's context.
Aggregate and window UDF inline encoding, the per-session
with_python_udf_inlining toggle, sender-side context, and the
user-guide docs land in follow-on PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(pickle): add cloudpickle security warnings, docstring examples, edge-case tests
Inline `.. warning::` blocks on `Expr.to_bytes`, `Expr.from_bytes`, and
`Expr.__reduce__` so the cloudpickle / arbitrary-code-execution caveat is
visible at the public API surface in advance of the user-guide page that
lands in PR 4.
Add doctest-style `Examples:` blocks to `datafusion.ipc` functions
(`set_worker_ctx`, `clear_worker_ctx`, `get_worker_ctx`, `_resolve_ctx`),
`ScalarUDF.name`, and the new `Expr` pickle methods, per CLAUDE.md.
Tighten `Expr.__reduce__` return annotation to
`tuple[Callable[[bytes], Expr], tuple[bytes]]`.
Tests: multi-arg UDF round-trip (covers synthetic `arg_{i}` schema-field
loop in the codec) plus malformed-bytes paths through `Expr.from_bytes`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* as_any no longer in api
* feat(pickle): stamp Python (major, minor) in UDF wire header
cloudpickle bytecode is not portable across Python minor versions —
a payload produced on 3.11 fails to load on 3.12 with an opaque
marshal/unpickle error. Embed the sender's (major, minor) in the
DFPYUDF wire header and reject mismatches at decode time with an
actionable error that names both versions, instead of letting the
failure surface from inside cloudpickle.loads.
Header layout becomes:
DFPYUDF (7) | version (1) | py_major (1) | py_minor (1) | cloudpickle
Extend the Security warnings on Expr.to_bytes / from_bytes /
__reduce__ with a Portability section covering the cross-version
constraint and cloudpickle's by-value/by-reference behavior (the
callable inlines bytecode and closure cells, but imported names
travel by reference and must be importable on the receiver). Add
a matching Serialization model note to the datafusion.ipc module
docstring.
New tests:
- codec::wire_header_tests: py-major/minor mismatch, truncated
py-version bytes, round-trip with py-version
- test_pickle_expr::test_cross_version_error_message: patches the
py_minor byte inside an emitted payload and asserts the error
message identifies the version mismatch
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent 8ba06e4 commit afaeccb
10 files changed
Lines changed: 1108 additions & 57 deletions
File tree
- crates/core/src
- python
- datafusion
- tests
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
| 45 | + | |
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| |||
66 | 66 | | |
67 | 67 | | |
68 | 68 | | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
69 | 100 | | |
70 | 101 | | |
71 | 102 | | |
| |||
74 | 105 | | |
75 | 106 | | |
76 | 107 | | |
77 | | - | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
78 | 136 | | |
79 | 137 | | |
80 | 138 | | |
81 | 139 | | |
82 | 140 | | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
83 | 150 | | |
84 | 151 | | |
85 | 152 | | |
86 | | - | |
87 | | - | |
88 | | - | |
89 | | - | |
90 | | - | |
91 | | - | |
92 | 153 | | |
93 | 154 | | |
94 | 155 | | |
| |||
215 | 276 | | |
216 | 277 | | |
217 | 278 | | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
218 | 284 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
44 | 44 | | |
45 | 45 | | |
46 | 46 | | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
47 | 54 | | |
48 | 55 | | |
49 | 56 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
68 | | - | |
| 68 | + | |
69 | 69 | | |
70 | 70 | | |
71 | 71 | | |
| |||
149 | 149 | | |
150 | 150 | | |
151 | 151 | | |
| 152 | + | |
152 | 153 | | |
153 | 154 | | |
154 | 155 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
49 | | - | |
| 49 | + | |
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
| |||
440 | 440 | | |
441 | 441 | | |
442 | 442 | | |
443 | | - | |
| 443 | + | |
444 | 444 | | |
445 | | - | |
446 | | - | |
447 | | - | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
448 | 515 | | |
449 | 516 | | |
450 | 517 | | |
451 | 518 | | |
452 | | - | |
453 | | - | |
454 | | - | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
| 529 | + | |
| 530 | + | |
| 531 | + | |
| 532 | + | |
| 533 | + | |
| 534 | + | |
| 535 | + | |
| 536 | + | |
| 537 | + | |
| 538 | + | |
| 539 | + | |
| 540 | + | |
| 541 | + | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
| 550 | + | |
| 551 | + | |
| 552 | + | |
| 553 | + | |
| 554 | + | |
| 555 | + | |
| 556 | + | |
| 557 | + | |
| 558 | + | |
| 559 | + | |
| 560 | + | |
| 561 | + | |
| 562 | + | |
| 563 | + | |
| 564 | + | |
| 565 | + | |
| 566 | + | |
| 567 | + | |
| 568 | + | |
| 569 | + | |
| 570 | + | |
| 571 | + | |
| 572 | + | |
| 573 | + | |
| 574 | + | |
| 575 | + | |
| 576 | + | |
| 577 | + | |
| 578 | + | |
| 579 | + | |
| 580 | + | |
| 581 | + | |
| 582 | + | |
| 583 | + | |
| 584 | + | |
| 585 | + | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
455 | 594 | | |
456 | | - | |
457 | | - | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
458 | 600 | | |
459 | | - | |
| 601 | + | |
460 | 602 | | |
461 | 603 | | |
462 | 604 | | |
| |||
0 commit comments