@@ -116,6 +116,85 @@ What travels with the expression
116116 :py:class:`SessionContext`. Without that registration, evaluation
117117 raises an error.
118118
119+ Session contexts at a glance
120+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
121+
122+ There is only one context type, :py:class:`SessionContext`. It can occupy
123+ up to four *slots* in a running program:
124+
125+ .. list-table::
126+    :header-rows: 1
127+    :widths: 12 18 40 30
128+
129+    * - Slot
130+      - Lifetime
131+      - Purpose
132+      - Set how
133+    * - User-held
134+      - Local variable / attribute
135+      - Build and run queries
136+      - ``ctx = SessionContext(...)``
137+    * - Global
138+      - Process singleton (lazy-init)
139+      - Backs module-level
140+        :py:func:`~datafusion.io.read_parquet`,
141+        :py:func:`~datafusion.io.read_csv`,
142+        :py:func:`~datafusion.io.read_json`,
143+        :py:func:`~datafusion.io.read_avro`; final fallback for
144+        :py:meth:`Expr.from_bytes`
145+      - Implicit; access via
146+        :py:meth:`SessionContext.global_ctx`
147+    * - Sender
148+      - Thread-local on the driver
149+      - Codec settings for outbound :py:func:`pickle.dumps` /
150+        :py:meth:`Expr.to_bytes` without ``ctx``
151+      - :py:func:`~datafusion.ipc.set_sender_ctx`
152+    * - Worker
153+      - Thread-local on the worker
154+      - Function registry for inbound :py:func:`pickle.loads` /
155+        :py:meth:`Expr.from_bytes` without ``ctx``
156+      - :py:func:`~datafusion.ipc.set_worker_ctx`
157+
158+ The same :py:class:`SessionContext` object may occupy more than one
159+ slot simultaneously: installing it into a slot stores a reference, not
160+ a copy.
161+
162+ **Non-distributed user.** One user-held context. The global slot is
163+ invisible unless you call top-level ``read_*`` helpers. Sender and
164+ worker slots are unused.
165+
166+ **Distributed user.** Two questions to answer:
167+
168+ 1. *Driver side: what wire format do I want?* The default (Python UDF
169+    inlining on) is self-contained; you do not need a sender context.
170+    To opt into the strict format, call
171+    :py:func:`~datafusion.ipc.set_sender_ctx`
172+    with a session built via
173+    :py:meth:`SessionContext.with_python_udf_inlining(False)
174+    <datafusion.SessionContext.with_python_udf_inlining>`.
175+
176+ 2. *Worker side: what registrations does decode need?* For built-ins
177+    and inline Python UDFs, nothing. For FFI-capsule UDFs (or
178+    strict-mode round-trips that travel by name), call
179+    :py:func:`~datafusion.ipc.set_worker_ctx` once per worker with a
180+    context that has the relevant registrations.
181+
182+ Resolution order on the worker side is *explicit argument →
183+ worker context → global context*. Explicit ``ctx=`` on
184+ :py:meth:`Expr.from_bytes` always wins; the sender slot is ignored
185+ on decode and the worker slot is ignored on encode.
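
The lookup order described above can be modelled in a few lines of plain Python. This is only a sketch: ``resolve_decode_ctx`` and the string stand-ins for contexts are hypothetical names chosen for illustration and are not part of the ``datafusion`` API.

```python
from typing import Optional


def resolve_decode_ctx(
    explicit: Optional[str],
    worker: Optional[str],
    global_ctx: str,
) -> str:
    """Hypothetical model of decode-side lookup: the first slot that is set wins."""
    if explicit is not None:
        # An explicit ctx= argument always takes priority.
        return explicit
    if worker is not None:
        # Otherwise fall back to the thread's installed worker context.
        return worker
    # Final fallback: the process-wide global context.
    return global_ctx


# Strings stand in for SessionContext objects in this sketch.
with_everything = resolve_decode_ctx("explicit", "worker", "global")
without_explicit = resolve_decode_ctx(None, "worker", "global")
global_only = resolve_decode_ctx(None, None, "global")
```

The sender slot does not appear as a parameter at all, which mirrors the statement that it is ignored on decode.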
186+
187+ Sharp edges:
188+
189+ * Sender and worker slots are **thread-local**. Background threads
190+   on either side see ``None`` until they install their own.
191+ * The global slot persists across ``fork`` workers (inherited through
192+   copy-on-write memory) but not across ``spawn`` / ``forkserver`` workers
193+   (fresh process: register or install a worker context on
194+   start-up).
195+ * The inlining toggle is per-context state, not a global switch.
196+   Two contexts with different toggles can coexist in one process.
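
The thread-local behaviour in the first bullet can be reproduced with the standard library alone. The sketch below assumes the slots behave like a :py:class:`threading.local` attribute; ``set_slot`` and ``get_slot`` are hypothetical stand-ins, not the real ``datafusion.ipc`` functions.

```python
import threading

# Hypothetical stand-in for a thread-local slot (sender or worker).
_slot = threading.local()


def set_slot(ctx: object) -> None:
    """Install a context for the calling thread only."""
    _slot.ctx = ctx


def get_slot() -> object:
    """Return the calling thread's context, or None if none was installed."""
    return getattr(_slot, "ctx", None)


def slot_seen_from_new_thread() -> object:
    """Start a fresh thread and report what it finds in the slot."""
    seen = []
    thread = threading.Thread(target=lambda: seen.append(get_slot()))
    thread.start()
    thread.join()
    return seen[0]


set_slot("main-thread context")
main_view = get_slot()                         # the installing thread sees its value
background_view = slot_seen_from_new_thread()  # a new thread sees None
```

Under this model, a thread pool on the driver or worker would need each thread to install its own context before encoding or decoding.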
197+
119198Registering shared UDFs on workers
120199~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
121200
@@ -143,9 +222,10 @@ as the *worker context*:
143222
144223 Inside a worker, expressions arriving from the driver resolve their
145224by-name references against the installed worker context. If no worker
146- context is installed, a fresh empty :py:class:`SessionContext` is
147- used — fine for expressions that only reference built-ins and Python
148- UDFs, but FFI-capsule-backed registrations will fail to resolve.
225+ context is installed, the global :py:class:`SessionContext` is used. This is
226+ fine for expressions that only reference built-ins and Python UDFs,
227+ but FFI-capsule-backed registrations must be installed on the global
228+ context to resolve.
149229
150230Python 3.14 default change
151231~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -198,6 +278,27 @@ Mismatched configurations raise a descriptive error: an inline blob
198278fed to a strict receiver fails fast rather than silently dropping
199279into ``cloudpickle.loads``.
200280
281+ To make the toggle apply through :py:func:`pickle.dumps` (which
282+ calls :py:meth:`Expr.to_bytes` with no context), install the strict
283+ session as the driver's *sender context*:
284+
285+ .. code-block:: python
286+
287+    from datafusion import SessionContext
288+    from datafusion.ipc import set_sender_ctx
289+
290+    set_sender_ctx(SessionContext().with_python_udf_inlining(False))
291+    # Every subsequent pickle.dumps(expr) on this thread encodes
292+    # without inlining the Python callable.
293+
294+ Pair this with a matching strict worker context
295+ (:py:func:`~datafusion.ipc.set_worker_ctx`) so the ``pickle.loads``
296+ side also refuses inline payloads. Explicit
297+ :py:meth:`Expr.to_bytes(ctx) <Expr.to_bytes>` and
298+ :py:meth:`Expr.from_bytes(blob, ctx=ctx) <Expr.from_bytes>` calls
299+ honor the supplied ``ctx`` directly and ignore the sender / worker
300+ contexts.
301+
201302Note that :py:func:`pickle.loads` itself remains unsafe on untrusted
202303input regardless of this setting — an attacker producing the outer
203304pickle envelope can execute arbitrary code before the codec ever