
Commit a843b37

timsaucer and claude committed
docs: centralize pickle security caveat, link from codec/context docs
The pickle-on-untrusted-input warning was written out in three files: in the `PythonLogicalCodec::with_python_udf_inlining` rustdoc, in the matching Python `SessionContext.with_python_udf_inlining` docstring, and in two places inside `distributing_work.rst` (a free-floating paragraph at the end of the inlining section plus the dedicated Security warning block). Multiple copies of the same load-bearing text would inevitably drift.

Pick the `distributing_work.rst` Security section as canonical:

* Keep the Security warning block intact; it is the single source of truth.
* Trim the redundant "Note that pickle.loads itself remains unsafe..." paragraph above it to a one-line summary plus a Sphinx cross-reference (`Security`_), so the section header still anchors the link.
* Replace the Python docstring's 5-line warning paragraph with a one-sentence summary plus a `:doc:` link to `distributing_work`.
* Replace the rustdoc warning with a one-sentence summary, a relative pointer to `docs/source/user-guide/io/distributing_work.rst`, and a link to the upstream Python pickle module security warning, so the rustdoc remains self-contained for someone reading just the crate.

Behavior is unchanged; the strict-toggle test continues to assert on the substring "inlining is disabled", which lives in `refuse_inline_payload` (separate from these docs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 8aad8da commit a843b37

3 files changed

Lines changed: 17 additions & 21 deletions


crates/core/src/codec.rs

Lines changed: 8 additions & 8 deletions
@@ -212,17 +212,17 @@ impl PythonLogicalCodec {
 /// cross-language wire bytes, or reject `cloudpickle.loads` on
 /// untrusted `from_bytes` input.
 ///
-/// Security scope: strict mode (`false`) protects only the codec
+/// Security scope: strict mode (`false`) narrows only the codec
 /// layer — it stops `Expr::from_bytes` from invoking
 /// `cloudpickle.loads` on the inline `DFPY*` payload. It does
-/// **not** make `pickle.loads(untrusted_bytes)` safe. Python's
-/// pickle protocol permits arbitrary code execution via
-/// `__reduce__` (and `Expr.__reduce__` returns
-/// `Expr._reconstruct(bytes)` — an honest reducer here, but the
-/// outer pickle stream can contain any reducer). Treat every
+/// **not** make `pickle.loads(untrusted_bytes)` safe; treat every
 /// `pickle.loads` on untrusted input as unsafe regardless of this
-/// setting; the toggle only narrows the surface inside
-/// `from_bytes`.
+/// setting. See `docs/source/user-guide/io/distributing_work.rst`
+/// (Security section) for the full threat model, and Python's
+/// [pickle module security warning][1] for why `pickle.loads` is
+/// unsafe in general.
+///
+/// [1]: https://docs.python.org/3/library/pickle.html#module-pickle
 pub fn with_python_udf_inlining(mut self, enabled: bool) -> Self {
 self.python_udf_inlining = enabled;
 self
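The removed rustdoc lines state that the pickle protocol permits arbitrary code execution via `__reduce__`, and that an outer pickle stream can contain any reducer. A minimal, standard-library-only sketch of that hazard follows. It does not touch datafusion; the class name `Envelope` is hypothetical, and `eval` on a harmless arithmetic string stands in for attacker-chosen code.

```python
import pickle


class Envelope:
    """Hypothetical attacker-built object; only its pickled bytes matter."""

    def __reduce__(self):
        # pickle records "call this callable with these args", and
        # pickle.loads performs the call. Any importable callable works;
        # eval on a harmless expression stands in for arbitrary code.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Envelope())

# The receiving side never references Envelope, yet eval still runs
# inside pickle.loads, before any application-level codec sees the data.
result = pickle.loads(untrusted_bytes)
print(result)  # 42
```

Because the call happens while the outer stream is being unpickled, no setting inside a downstream decoder (such as the strict inlining toggle) can intervene.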

docs/source/user-guide/io/distributing_work.rst

Lines changed: 4 additions & 8 deletions
@@ -308,14 +308,10 @@ side also refuses inline payloads. Explicit
 honor the supplied ``ctx`` directly and ignore the sender / worker
 contexts.
 
-Note that :py:func:`pickle.loads` itself remains unsafe on untrusted
-input regardless of this setting — an attacker producing the outer
-pickle envelope can execute arbitrary code before the codec ever
-sees the bytes (see the
-`pickle module security warning
-<https://docs.python.org/3/library/pickle.html#module-pickle>`_ in
-the Python standard library docs). The toggle only protects the
-:py:meth:`Expr.from_bytes` API surface.
+The toggle only narrows the :py:meth:`Expr.from_bytes` surface;
+:py:func:`pickle.loads` on untrusted bytes remains unsafe regardless
+of this setting. See the `Security`_ section below for the full
+threat model.
 
 Security
 ~~~~~~~~
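The revised `distributing_work.rst` paragraph says the toggle only narrows the `Expr.from_bytes` surface. The toy below sketches what narrowing a single decode path means. `toy_from_bytes`, `INLINE_PREFIX`, and the exact error text are hypothetical stand-ins, not datafusion's implementation; only the `DFPY` prefix and the "inlining is disabled" substring are borrowed from this commit.

```python
# Hypothetical stand-in for a strict decode path; not datafusion's code.
INLINE_PREFIX = b"DFPY"  # assumption: inline UDF payloads start with this


def toy_from_bytes(data: bytes, *, inlining: bool) -> bytes:
    """Return the payload, refusing inline payloads when inlining is off."""
    if data.startswith(INLINE_PREFIX) and not inlining:
        # Per the rustdoc, the real codec would otherwise pass inline
        # payloads to cloudpickle.loads; strict mode stops that here.
        raise ValueError(
            "refusing inline Python UDF payload: inlining is disabled"
        )
    return data


# Permissive mode accepts the inline payload.
accepted = toy_from_bytes(b"DFPY-example", inlining=True)

# Strict mode refuses it before any unpickling could happen.
try:
    toy_from_bytes(b"DFPY-example", inlining=False)
except ValueError as exc:
    print(exc)
```

The check guards only this one function. If the same bytes arrive wrapped in an outer `pickle.loads` call, every reducer in that outer stream runs before `toy_from_bytes` is reached, so the toggle offers no protection at that layer.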

python/datafusion/context.py

Lines changed: 5 additions & 5 deletions
@@ -1802,11 +1802,11 @@ def with_python_udf_inlining(self, *, enabled: bool) -> SessionContext:
 :func:`datafusion.ipc.set_worker_ctx` for the corresponding
 :func:`pickle.loads`.
 
-``pickle.loads`` on untrusted bytes remains unsafe regardless of
-this setting (see the `pickle module security warning
-<https://docs.python.org/3/library/pickle.html#module-pickle>`_
-in the Python standard library docs). Only the
-``to_bytes`` / ``from_bytes`` API is affected.
+For the full security model, see
+:doc:`/user-guide/io/distributing_work` (Security section). In
+short: this toggle narrows only the :meth:`Expr.from_bytes`
+surface; :func:`pickle.loads` on untrusted bytes remains
+unsafe regardless of the toggle.
 """
 new_internal = self.ctx.with_python_udf_inlining(enabled)
 new = SessionContext.__new__(SessionContext)
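The context lines suggest that `with_python_udf_inlining` takes a keyword-only `enabled` flag and builds a fresh object via `SessionContext.__new__(SessionContext)` rather than reconfiguring `self`. A toy sketch of that copy-on-configure pattern follows; `ToyContext`, its `inlining` attribute, and the permissive default are hypothetical and only model the toggle, not datafusion's `SessionContext`.

```python
class ToyContext:
    """Hypothetical stand-in for SessionContext; only models the toggle."""

    def __init__(self) -> None:
        self.inlining = True  # assumption: permissive by default in this toy

    def with_python_udf_inlining(self, *, enabled: bool) -> "ToyContext":
        # Build a new instance without running __init__, mirroring the
        # SessionContext.__new__(SessionContext) call in the diff; the
        # original object keeps its own setting.
        new = ToyContext.__new__(ToyContext)
        new.inlining = enabled
        return new


ctx = ToyContext()
strict_ctx = ctx.with_python_udf_inlining(enabled=False)
print(ctx.inlining, strict_ctx.inlining)  # True False
```

The bare `*` in the signature makes the flag keyword-only, so a call like `ctx.with_python_udf_inlining(False)` raises `TypeError` and the intent stays visible at every call site.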
