Commit dac9ec6
feat: enable pickling for Python aggregate and window UDFs (#1545)
* feat: inline encoding for Python aggregate and window UDFs
Extends the PythonLogicalCodec / PythonPhysicalCodec inline encoding
introduced for scalar UDFs to also cover Python-defined aggregate and
window UDFs. The cloudpickle tuple shape per family is:
DFPYUDA (agg) (name, accumulator_factory, input_schema_bytes,
return_schema_bytes, state_schema_bytes,
volatility_str)
DFPYUDW (window) (name, evaluator_factory, input_schema_bytes,
return_schema_bytes, volatility_str)
Same wire-framing as scalar (family magic + version byte + cloudpickle
blob), same schema serde (arrow-rs native IPC), same cached cloudpickle
handle. The agg state schema is encoded as a full IPC schema so the
post-decode UDF reports the same names + nullability + metadata as the
sender — relevant for accumulators whose StateFieldsArgs consumers key
off names rather than positional DataType.
Required restructuring two existing UDF impls so the codec can grab
the Python callable directly:
* udaf.rs: replaces create_udaf + AccumulatorFactoryFunction closure
with a named PythonFunctionAggregateUDF that stores the Py<PyAny>
accumulator factory. Synthesizes state_{i} field names when the
Python constructor passes only Vec<DataType>; from_parts preserves
the full state schema on the decode side.
* udwf.rs: renames MultiColumnWindowUDF -> PythonFunctionWindowUDF,
drops the PartitionEvaluatorFactory PtrEq wrapper, stores the
Py<PyAny> evaluator directly. PartialEq and Hash get the same
pointer-identity fast path + debug-log exception handling already
on PythonFunctionScalarUDF.
User-facing surface:
* AggregateUDF.name and WindowUDF.name properties (parallel to the
ScalarUDF.name shipped in PR1).
* Existing UDAF/UDWF construction paths are unchanged.
The per-session with_python_udf_inlining toggle, sender-side context,
strict refusal, and user-guide docs land in PRs 3-4 of this series.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: restore pub UDAF/UDWF helpers and document inline encoding
Re-export `to_rust_accumulator`, `to_rust_partition_evaluator`, and
`PythonFunctionWindowUDF` (with a `MultiColumnWindowUDF` alias) by
promoting `udaf` and `udwf` to `pub mod` so prior downstream Rust
consumers keep their API surface after the inline-encoding refactor.
Adds an end-to-end window UDF pickle round-trip test that runs the
decoded evaluator over a real session, mirroring the aggregate test.
Documents the cloudpickle-based shipping behavior of Python aggregate
and window UDFs in the user-guide aggregations and windows pages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address PR #1545 review feedback
- Fix CountAcc.merge in pickle test: sum over states[0] (partition
counts), not over the list of state fields. The prior implementation
only added partition 0's count when merging across partitions.
- Drive test_agg_udf_evaluates_after_roundtrip with a two-batch
DataFrame so merge actually runs and the round-tripped state-field
schema is exercised end-to-end.
- Correct PY_AGG_UDF_FAMILY / PY_WINDOW_UDF_FAMILY doc comments and the
aggregate block comment to reference "return schema bytes" rather
than "return type" / "return_type_bytes" so the docs match the actual
on-wire layout.
- Keep `udaf` and `udwf` modules private (matching `udf`) and
selectively re-export the helpers downstream Rust consumers rely on
(`to_rust_accumulator`, `to_rust_partition_evaluator`,
`PythonFunctionWindowUDF`, `MultiColumnWindowUDF`) instead of
exposing the whole module surface.
- Rename codec helpers `*_agg_udf` -> `*_udaf` and `*_window_udf` ->
`*_udwf` for naming consistency with the Python public aliases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>1 parent afaeccb commit dac9ec6
10 files changed
Lines changed: 787 additions & 87 deletions
File tree
- crates/core/src
- docs/source/user-guide/common-operations
- python
- datafusion
- tests
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
68 | 73 | | |
69 | 74 | | |
70 | 75 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
22 | | - | |
| 22 | + | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| 26 | + | |
26 | 27 | | |
27 | | - | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
| |||
144 | 145 | | |
145 | 146 | | |
146 | 147 | | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
147 | 164 | | |
148 | | - | |
149 | | - | |
150 | | - | |
151 | | - | |
152 | | - | |
153 | | - | |
154 | | - | |
155 | | - | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
156 | 310 | | |
157 | 311 | | |
158 | 312 | | |
| |||
190 | 344 | | |
191 | 345 | | |
192 | 346 | | |
193 | | - | |
194 | | - | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
195 | 350 | | |
196 | | - | |
| 351 | + | |
| 352 | + | |
197 | 353 | | |
198 | | - | |
199 | | - | |
200 | 354 | | |
| 355 | + | |
201 | 356 | | |
202 | 357 | | |
203 | 358 | | |
| |||
231 | 386 | | |
232 | 387 | | |
233 | 388 | | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
234 | 394 | | |
0 commit comments