Skip to content

Mitigate CVE-2025-69872: HMAC-verified pickle envelope#361

Open
yeshsurya wants to merge 6 commits intograntjenks:masterfrom
yeshsurya:fix/cve-2025-69872-pickle-hmac
Open

Mitigate CVE-2025-69872: HMAC-verified pickle envelope#361
yeshsurya wants to merge 6 commits intograntjenks:masterfrom
yeshsurya:fix/cve-2025-69872-pickle-hmac

Conversation

@yeshsurya
Copy link
Copy Markdown

DiskCache previously called pickle.load on bytes read from the cache directory. An attacker with write access to that directory could replace the bytes with a malicious pickle and achieve arbitrary code execution in any victim process that subsequently read the cache (CVE-2025-69872 / GHSA-w8v5-vhqr-4h9v).

This change wraps every pickle blob written by Disk in an HMAC-SHA256 envelope:

b"DCv1" + HMAC_SHA256(key, payload) + payload

and verifies it before pickle.load is allowed to interpret the bytes. Verification failures raise pickle.UnpicklingError instead of executing attacker-controlled code.

Key resolution (lazy, only when the pickle path is actually used):

  1. Explicit Cache(directory, disk_pickle_key=...) / Disk(directory, pickle_key=...) argument.
  2. DISKCACHE_PICKLE_KEY environment variable (hex-encoded).
  3. Auto-generated <directory>/.diskcache_pickle_key (mode 0600). Emits UnsafePickleWarning because an attacker who can read the cache directory can also forge the HMAC -- this fallback gives corruption detection and protection against write-only attackers, not the full advisory threat model.

pickle_key=False is an explicit opt-out that disables HMAC verification (with UnsafePickleWarning). It is required to read caches written by pre-fix versions; users should re-create such caches or pass an explicit key after migrating.

Design constraints honored:

  • disk_pickle_key is constructor-only and never persisted to the Settings table -- avoids leaking secrets into the same compromised directory and prevents cache.reset from making pickle_key=False sticky across restarts.
  • Key resolution is lazy inside the new helpers, so JSONDisk and caches that only use primitive keys/values never trigger key file creation or warnings.
  • Cache.check() whitelists .diskcache_pickle_key so check(fix=True) does not delete it.
  • Strict by default: pre-fix caches without the envelope raise on read with a clear migration message.

A new tests/test_security.py adds 20 tests covering round-trip, envelope presence, explicit/env/file key resolution, non-persistence in Settings, mismatched-key rejection, inline-blob/.val-file/key tampering detection, legacy mode round-trip and forward read, JSONDisk untouched, Cache.check whitelist, and primitive-only caches not creating a key file. All 179 existing tests continue to pass.

Yeshwanth Nagaraj and others added 6 commits May 5, 2026 19:06
DiskCache previously called pickle.load on bytes read from the cache
directory. An attacker with write access to that directory could replace
the bytes with a malicious pickle and achieve arbitrary code execution
in any victim process that subsequently read the cache (CVE-2025-69872 /
GHSA-w8v5-vhqr-4h9v).

This change wraps every pickle blob written by Disk in an HMAC-SHA256
envelope:

    b"DCv1" + HMAC_SHA256(key, payload) + payload

and verifies it before pickle.load is allowed to interpret the bytes.
Verification failures raise pickle.UnpicklingError instead of executing
attacker-controlled code.

Key resolution (lazy, only when the pickle path is actually used):

  1. Explicit ``Cache(directory, disk_pickle_key=...)`` /
     ``Disk(directory, pickle_key=...)`` argument.
  2. ``DISKCACHE_PICKLE_KEY`` environment variable (hex-encoded).
  3. Auto-generated ``<directory>/.diskcache_pickle_key`` (mode 0600).
     Emits ``UnsafePickleWarning`` because an attacker who can read the
     cache directory can also forge the HMAC -- this fallback gives
     corruption detection and protection against write-only attackers,
     not the full advisory threat model.

``pickle_key=False`` is an explicit opt-out that disables HMAC
verification (with ``UnsafePickleWarning``). It is required to read
caches written by pre-fix versions; users should re-create such caches
or pass an explicit key after migrating.

Design constraints honored:

  * ``disk_pickle_key`` is constructor-only and never persisted to the
    Settings table -- avoids leaking secrets into the same compromised
    directory and prevents ``cache.reset`` from making
    ``pickle_key=False`` sticky across restarts.
  * Key resolution is lazy inside the new helpers, so ``JSONDisk`` and
    caches that only use primitive keys/values never trigger key file
    creation or warnings.
  * ``Cache.check()`` whitelists ``.diskcache_pickle_key`` so
    ``check(fix=True)`` does not delete it.
  * Strict by default: pre-fix caches without the envelope raise on
    read with a clear migration message.

A new ``tests/test_security.py`` adds 20 tests covering round-trip,
envelope presence, explicit/env/file key resolution, non-persistence in
Settings, mismatched-key rejection, inline-blob/.val-file/key tampering
detection, legacy mode round-trip and forward read, JSONDisk untouched,
``Cache.check`` whitelist, and primitive-only caches not creating a key
file. All 179 existing tests continue to pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow-up to the CVE-2025-69872 mitigation that addresses findings
from a 32-way parallel code review:

CRITICAL fixes:

* Block path traversal in ``Disk.fetch`` and ``Disk.remove`` (review
  C2 / r01,r16,r32). An attacker with write access to ``cache.db``
  could previously set the ``filename`` column to ``../../../etc/passwd``
  and have ``Disk.fetch`` ``open()`` arbitrary files outside the cache
  directory -- entirely bypassing HMAC verification because
  MODE_BINARY/MODE_TEXT do not unpickle. New helper
  ``Disk._safe_filename_path`` resolves the requested filename via
  ``op.realpath`` and refuses anything that escapes ``self._directory``.
  Confirmed via pen-test PoC by reviewer r32.

* FanoutCache now resolves the pickle HMAC key once at the FanoutCache
  root and forwards it to every shard (review C3 / r09). Previously
  each of N shards (default 8) independently auto-generated its own
  ``.diskcache_pickle_key`` file. ``FanoutCache._hash`` uses shard 0's
  key for routing, but storage shards held different keys -- making
  sharding non-deterministic across processes (two FanoutCache
  instances on the same directory would route the same complex key
  to different shards) and emitting one warning per shard.

HIGH fixes:

* ``Cache.check`` whitelist for ``.diskcache_pickle_key`` is anchored
  to the cache directory root rather than basename-only (review H1 /
  r07). Previously an attacker could plant a file named
  ``.diskcache_pickle_key`` in any subdirectory to evade
  ``check(fix=True)`` cleanup.

* ``Cache.__getstate__`` raises ``TypeError`` when the cache was
  configured with an explicit ``disk_pickle_key`` (or
  ``disk_pickle_key=False``) rather than silently dropping the secret
  on the floor and letting the receiving process regenerate a
  different key (review H2 / r08). Previously, sending a Cache to a
  multiprocessing worker silently corrupted reads.

* Atomic key file creation with retry-on-empty-read (review H3 / r05,
  r20, r21, r22, r31). The previous ``O_CREAT|O_EXCL`` + later
  ``os.write`` left a race window in which a concurrent reader saw a
  zero-byte file and crashed with a false-positive "too short" error.
  The reader now retries with short backoff (5 attempts, ~31 ms total)
  to ride out the writer's ``os.write``. The writer also unlinks the
  half-baked file on ``OSError`` (M8) so disk-full failures don't
  permanently break subsequent processes.

* ``os.makedirs`` in the key-file helper uses ``exist_ok=True`` to
  fix the TOCTOU race that crashed concurrent ``Disk`` initialisation
  (review H4 / r05, r21, r22).

MEDIUM fixes:

* Defensive pop of ``disk_pickle_key`` from rows loaded from the
  ``Settings`` table so a stale row (from an older patched build,
  hand-edit, or attacker injection) cannot flow into the Disk
  constructor (review M3 / r06).

* ``_coerce_pickle_key`` ``TypeError`` no longer advertises ``False``
  or ``None`` -- callers filter those before reaching the function
  (review M4 / r25).

* Standardised message prefix to lowercase ``"diskcache:"`` across
  warnings and exceptions (review M5 / r18).

INTERNAL refactor:

* ``_coerce_pickle_key``, ``_read_or_create_pickle_key_file``,
  ``_resolve_pickle_key_for_directory``, and
  ``_emit_pickle_key_warning`` are now module-level functions so
  ``FanoutCache`` can resolve the key without instantiating a Disk.

NOT FIXED (documented as known limitations):

* Reviewer H5 (HMAC does not bind cache-key context, enabling row-
  swap / replay attacks) requires plumbing the cache key through
  ``Disk.fetch``'s public signature, which is a breaking change for
  custom Disk subclasses. Deferred.

* Reviewer H6 (large MODE_PICKLE values now resident in memory for
  HMAC verification) is a documented tradeoff; a configurable
  ``disk_max_pickle_size`` ceiling is a possible future addition.

* Reviewer M2 (cached resolved key never re-read from disk after key
  rotation) is documented; key rotation requires restarting all
  processes consuming the cache.

* Reviewer C1 (legacy mode ``pickle_key=False`` strips and unpickles
  envelopes without verification) is intentional. The strip-and-load
  behaviour preserves a recovery path for caches written under a key
  the user has lost; in legacy mode the user has explicitly accepted
  unsafe pickle, and an attacker with write access can already RCE
  via raw pickle whether or not we strip the envelope.

Test additions (18 new tests, all passing):

* test_path_traversal_in_filename_raises
* test_path_traversal_blocked_for_text_mode
* test_remove_refuses_traversal
* test_fanoutcache_propagates_explicit_key
* test_fanoutcache_default_emits_one_warning
* test_fanoutcache_two_instances_share_default_key
* test_check_does_not_whitelist_subdirectory_keyfile
* test_cache_pickling_with_explicit_key_raises
* test_cache_pickling_with_legacy_mode_raises
* test_cache_pickling_with_default_succeeds
* test_disk_pickle_key_in_settings_does_not_leak
* test_short_envelope_header_raises
* test_zero_byte_value_raises
* test_valid_hmac_invalid_pickle_raises
* test_env_var_short_hex_raises
* test_env_var_empty_raises
* test_concurrent_key_file_creation_no_crash
* test_typeerror_message_lists_only_real_types

Test results: 179 existing + 38 new security = 217 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow-up to commit bacedee that addresses all the cleanly-fixable
high+medium severity findings from the third 32-reviewer audit.

HIGH (3 fixed, 1 deferred):

* V1: ``FanoutCache.__getstate__`` now mirrors ``Cache.__getstate__``'s
  H2 protection.  Previously, pickling a ``FanoutCache`` configured
  with an explicit ``disk_pickle_key`` (or ``=False``) silently
  succeeded, and the receiving process auto-generated a different key
  -- making every subsequent read fail HMAC verification.  The check
  uses the user's *original* argument (``self._pickle_key_user_arg``)
  rather than the resolved per-shard bytes, so default-mode pickling
  still works (the receiving process can re-resolve from env / file).

* V2: ``Deque.__getstate__`` and ``Index.__getstate__`` likewise
  inherit the H2 protection.  A new private helper
  ``_check_wrapped_cache_pickleable(cache, container_name)`` in
  ``persistent.py`` keeps the duplicated check tidy.

* V3: ``_read_or_create_pickle_key_file`` switched from
  ``try/except OSError/else`` to ``try/finally`` so a
  ``KeyboardInterrupt`` / ``SystemExit`` (Ctrl+C, signal handler
  calling ``sys.exit``) during the key write also unlinks the
  half-baked file.  Previously, an interrupted write left an empty
  ``.diskcache_pickle_key`` that permanently blocked every subsequent
  process with a ``RuntimeError("too short")`` until manually removed.

* V4 (HMAC binding to cache key context, replay/row-swap): DEFERRED.
  Requires a non-trivial public-API change to ``Disk.fetch`` (or a
  parallel ``fetch_with_context`` method) and a DCv2 envelope format
  with backward-compat for DCv1 readers.  Tracked for a follow-up.

MEDIUM (5 fixed, 2 deferred):

* V5: New ``Disk._uses_pickle = True`` class attribute (with
  ``JSONDisk._uses_pickle = False``).  ``FanoutCache.__init__`` skips
  the eager pickle-key resolution when ``disk._uses_pickle`` is
  ``False`` -- so ``FanoutCache(disk=JSONDisk)`` no longer auto-
  generates an unused ``.diskcache_pickle_key`` file or emits a
  spurious ``UnsafePickleWarning``.  The shard-key kwarg is now
  conditionally inserted, preserving old behaviour for pickle-using
  Disk subclasses.

* V6: ``Disk._safe_filename_path`` now caches
  ``op.realpath(self._directory)`` in ``self._directory_realpath`` on
  first use, eliminating one ``realpath`` syscall per fetch (~35%
  reduction in path-validation overhead per file-backed read).

* V9: ``_safe_filename_path`` translates ``OSError`` (e.g. Windows
  UNC ``\\server\\share\\foo`` raising ``FileNotFoundError`` from
  ``op.realpath``) into the documented ``ValueError`` so callers
  don't have to also catch ``OSError`` separately.

* V11: ``Cache.__copy__`` and ``Cache.__deepcopy__`` are added.
  Previously ``copy.copy(cache)`` and ``copy.deepcopy(cache)`` went
  through ``__getstate__`` and were rejected for caches with explicit
  keys, even though the operation never crosses a process boundary.
  Both methods now construct a fresh same-process ``Cache`` instance
  with the original directory, timeout, disk type, AND
  ``disk_pickle_key`` so the security posture is preserved.

* V13: ``Disk.fetch`` no longer raises ``UnboundLocalError`` when a
  tampered row has ``mode = MODE_BINARY|TEXT|PICKLE`` but
  ``filename = NULL``.  It raises a clear ``ValueError`` mentioning
  cache.db tampering instead.

* V7 (sticky legacy mode), V8 (orphan ``.val`` files after row
  delete): DEFERRED.  Both require deeper transaction-layer
  refactoring; documented as known limitations.

V10 (warning wording): the auto-generated-key
``UnsafePickleWarning`` now explicitly mentions the bootstrap-race
risk so operators understand the threat model and reach for
``DISKCACHE_PICKLE_KEY`` in production.

V12 (test coverage): 20 new tests covering all of the above plus
previously-uncovered error paths -- the "too short after retries"
``RuntimeError``, ``_safe_filename_path`` non-string guard,
``bytearray`` key acceptance, ``OSError`` during write cleanup,
and ``KeyboardInterrupt`` cleanup.

Test results: 179 existing + 58 security (38 prior + 20 new) = 237
passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow-up to commit b233190 that addresses the cleanly-fixable
high+medium severity findings from the fourth (23-reviewer GPT-5.5)
audit.  Three findings (V20 memory regression, V23 refactor, V24 docs)
are deferred for separate work.

HIGH (4):

* V14 + V17 (combined fix in ``_read_or_create_pickle_key_file``):
  - V14: ``os.write`` may legally return a short write (signal,
    quota, ENOSPC).  Loop until the full key is on disk; raise
    OSError if a 0-byte return ever happens.  Without this the
    writer holds a 32-byte key in memory while readers see only a
    truncated prefix.
  - V17: set ``write_succeeded = True`` IMMEDIATELY after the write
    loop completes (before fsync), not after fsync.  Once the bytes
    have hit the kernel they are visible to concurrent readers; if
    a KeyboardInterrupt then cancels fsync we must NOT unlink the
    file or those readers will be left using a key that no longer
    exists on disk.

* V15 (``Cache._copy_for_same_process``): if the original Cache had
  already lazily resolved its key from the env var or in-dir file,
  propagate the resolved bytes (and ``_pickle_key_warned`` flag) to
  the copy.  Otherwise an intervening change to
  ``DISKCACHE_PICKLE_KEY`` or ``.diskcache_pickle_key`` between the
  original's first-use and the copy's first-use would silently use a
  different HMAC key.

* V16 (``FanoutCache.cache`` / ``deque`` / ``index``): the named
  child constructors now forward the FanoutCache's
  ``disk_pickle_key`` to the child Cache via a new
  ``_inject_disk_pickle_key`` helper.  Without this, named children
  auto-generated their own ``.diskcache_pickle_key`` files and used
  a different key than the parent (data unreadable on re-open with
  the parent's key).  Caller-supplied ``disk_pickle_key`` in
  ``cache(... **settings)`` still wins.

MEDIUM (4):

* V18 (``Disk._safe_filename_path``): narrow the OSError translation
  to ``FileNotFoundError``, ``NotADirectoryError``, and Windows
  WinError 53/67/87/123/161/1001 (UNC / drive-not-found / invalid
  path).  ``PermissionError`` and ``TimeoutError`` are real
  operational errors, not tampering signals; they propagate
  unchanged so callers can distinguish them.

* V19 (``FanoutCache`` / ``Deque`` / ``Index``): add ``__copy__``
  and ``__deepcopy__`` mirroring ``Cache._copy_for_same_process``.
  Previously ``copy.copy(fanout_with_explicit_key)`` raised
  TypeError via the H2 ``__getstate__`` guard, even though the copy
  never crosses a process boundary.

* V21 (legacy-data error message): the ``UnpicklingError`` raised
  when reading pre-fix raw pickle now references
  ``disk_pickle_key=False`` (the kwarg on Cache / FanoutCache /
  Django CACHES OPTIONS) instead of the Disk-only ``pickle_key``
  name.  Django users following the old message put
  ``pickle_key: False`` in their OPTIONS dict, which is silently
  dropped because Cache only forwards ``disk_*`` settings.

* V22 (test cleanup): ``test_valid_hmac_invalid_pickle_raises`` no
  longer accepts ``KeyError`` (which would mask a regression where
  an HMAC-valid corrupt entry is treated as a cache miss).

DEFERRED (with rationale):

* V20 (large MODE_PICKLE memory regression: 32 MiB pickle =
  80 ms / 96 MiB peak vs master's 24 ms / 32 MiB) -- streaming HMAC
  verify + seek-back-and-pickle-load reintroduces a TOCTOU window
  that the current full-buffer design specifically closes.  Safer
  alternatives (mmap with MAP_PRIVATE, file locking) need careful
  cross-platform work; documented as a known tradeoff for now.

* V23 (Cache.__init__ over Pylint R0915 = 63/60 statements) --
  cosmetic; defer to the doc/refactor pass.

* V24 (README Security section + MIGRATING.rst + api.rst) -- needs
  separate documentation work.

Test additions (12 new tests, 70 total in tests/test_security.py):

* test_v14_short_os_write_does_not_silently_truncate
* test_v14_zero_return_from_os_write_triggers_cleanup
* test_v17_keyboard_interrupt_during_fsync_keeps_durable_key
* test_v18_oserror_on_realpath_propagates_when_not_path_failure
* test_v15_copy_preserves_resolved_key_across_env_change
* test_v16_fanout_child_cache_inherits_pickle_key
* test_v16_fanout_child_deque_inherits_pickle_key
* test_v16_fanout_child_index_inherits_pickle_key
* test_v19_fanoutcache_copy_preserves_explicit_key
* test_v19_deque_copy_preserves_explicit_key
* test_v19_index_copy_preserves_explicit_key
* test_v21_legacy_error_mentions_disk_pickle_key

Test results: 179 existing + 70 security (58 prior + 12 new) = 249
passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes the eight issues raised by the pass-5 reviewer swarm; all tightly
inter-related so they ship in a single commit.  249 pre-existing tests
still pass; 11 new regression tests cover the original bugs.

== O1 + O4 (linked) -- inherited HMAC key channel ==

Pre-fix, default-mode FanoutCache children and copy.copy() targets
re-resolved the pickle key from env / file every time, so a change to
DISKCACHE_PICKLE_KEY between the original and the copy silently produced
a copy with a different HMAC key (O1).  And FanoutCache._inject_disk_pickle_key
forwarded the resolved root key as the public ``disk_pickle_key=`` kwarg,
which made every child Cache look user-explicit and refuse pickling
(O4) -- defeating Cache.__getstate__'s intended exemption for
default-mode caches.

Introduces a private ``_disk_pickle_key_inherited`` kwarg on
``Cache.__init__`` (and FanoutCache.__init__) that:
  * Forwards an already-resolved key to a child / copy WITHOUT
    setting ``_disk._pickle_key_arg`` (so __getstate__ still treats
    the instance as default-mode and pickling remains allowed).
  * Skips env / file resolution and skips the warning (the upstream
    context already warned).
  * Is consumed by Cache._copy_for_same_process,
    FanoutCache._copy_for_same_process, FanoutCache._inject_disk_pickle_key,
    Deque._copy_for_same_process, and Index._copy_for_same_process.

H2 protection preserved: explicit ``disk_pickle_key=`` (bytes / False)
still flows through the public kwarg, sets ``_pickle_key_arg``, and
__getstate__ refuses pickling.

== O2 -- FanoutCache children get-or-create race ==

Two threads racing on ``fc.cache('foo')`` could both miss the dict,
both construct a Cache on the same SQLite file, and the second store
would overwrite the first -- leaving the first caller holding an
orphan with an open connection that nobody else could close.  Adds
``self._children_lock = threading.Lock()`` and wraps ``cache()``,
``deque()``, ``index()`` in double-checked locking.

== O3 -- FanoutCache.close() leaks cached children ==

Pre-fix, ``close()`` only closed the shards and cleared the dicts;
long-lived references via ``fc.cache('foo')`` continued to hold open
SQLite handles.  Iterates ``_caches`` / ``_deques`` / ``_indexes``
under the new lock, calls ``.close()`` (suppressing exceptions),
clears, then closes shards.

== Y1 -- KeyboardInterrupt window in pickle key file write ==

If ``KeyboardInterrupt`` / ``SystemExit`` fires AFTER ``os.write``
deposited the full key in the kernel but BEFORE
``write_succeeded = True`` ran, the finally block would unlink the
file -- but concurrent readers can already see those bytes.  Now the
finally block re-reads the file; if the on-disk content matches the
key we just generated, the write actually completed and we preserve
the file.

== Y2 -- ELOOP not translated to ValueError ==

A symlink loop inside the cache directory surfaced as a generic
OSError from ``_safe_filename_path`` instead of ValueError, masking a
tampered cache.db that pointed at a self-referential chain.  Adds
``errno.ELOOP`` translation alongside the Windows winerror handling.

== Y3 -- Wrong kwarg name in error wording ==

``Cache(td, disk_pickle_key='not-hex')`` raised
``"pickle_key argument must be hex-encoded bytes"`` -- the user-facing
kwarg is ``disk_pickle_key``, not ``pickle_key``.  Cache.__init__ and
FanoutCache.__init__ now pre-validate user-explicit keys with
``source='disk_pickle_key argument'``.  Two adjacent operator-facing
messages also updated to use the public kwarg name (RuntimeError
"too short after retries" guidance and the disabled-mode warning).

== L1 -- pylint W0707 + flake8 E305 ==

* core.py: the "too short after retries" RuntimeError gains
  ``raise ... from None`` to suppress the chained FileExistsError
  bookkeeping context.
* core.py: extra blank line before ``MODE_NONE = 0``.

== Tests added (11) ==

* test_o1_default_mode_cache_copy_survives_env_change
* test_o1_default_mode_fanout_copy_survives_env_change
* test_o1_default_mode_deque_copy_survives_env_change
* test_o1_default_mode_index_copy_survives_env_change
* test_o2_fanout_concurrent_cache_no_duplicate
* test_o3_fanout_close_closes_children
* test_o4_default_mode_child_cache_pickleable
* test_o4_explicit_mode_child_cache_unpickleable
* test_y1_keyboard_interrupt_after_full_write_preserves_file
* test_y2_eloop_translates_to_value_error (skipped on Windows)
* test_y3_error_message_uses_disk_pickle_key_kwarg_name

CVE-2025-69872 / GHSA-w8v5-vhqr-4h9v.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implements the four confirmed/likely findings from the pass-6 21-agent
multi-model review.

L1 (HIGH) — child cache key recoverability after pickle
=======================================================
``FanoutCache.cache(name)`` returns a child Cache that inherits the
parent's HMAC key via the private ``_disk_pickle_key_inherited``
kwarg, but the child directory previously contained no
``.diskcache_pickle_key`` file (the key lived only at the FanoutCache
root).  ``pickle.dumps(child)`` succeeded, but a receiving process
opening ``Cache(child_dir)`` would auto-generate a new random key and
fail HMAC verification on the first read.

Fix: a new module-level helper ``_write_inherited_pickle_key_file``
atomically writes the inherited key to ``<child_dir>/
.diskcache_pickle_key`` so the receiver can recover it via the normal
default-resolution path.  The write is gated on
``isinstance(inherited_key, (bytes, bytearray)) and len(...) >=
PICKLE_KEY_MIN_LEN`` so legacy mode (``False``) and explicit-key mode
do NOT leak a secret to disk -- only the inherited / default-resolved
key path persists.  As a side effect FanoutCache shards now also
mirror the root key file (verified by the updated
``test_fanoutcache_default_emits_one_warning`` test).

C1 (HIGH) — FanoutCache.close() fences subsequent use
=====================================================
After ``fc.close()``, calling ``fc.cache('new')`` previously
succeeded and constructed a fresh child against the just-closed
shards, silently leaking handles.  Now FanoutCache tracks
``self._closed`` (initialized False, set True under
``_children_lock`` inside ``close()``); ``cache()``, ``deque()``,
and ``index()`` raise ``RuntimeError`` if invoked after close.
``Cache.close()`` semantics are unchanged -- existing child
references obtained before close remain usable per the documented
per-thread connection behavior.  ``close()`` remains idempotent.

L2 (MEDIUM) — Y1 finally cleanup uses size+content check
========================================================
The original Y1 finally block re-read the file after a failed write
and unlinked unconditionally on read failure.  This could strand
concurrent readers if the read failed for transient reasons.  The
finally block now uses file size as the primary signal:

  * ``file_size < len(new_key)`` -> definitively partial, unlink.
  * ``file_size == len(new_key)`` -> verify content matches our
    generated key; if so claim success (preserve).
  * ``file_size > len(new_key)`` or unknown -> leave alone (someone
    else owns this file now, or we cannot determine state).

This biases toward false-positive preserve over false-negative
unlink, matching the stated invariant: stranding readers who already
hold a valid key is worse than leaving an empty / oversize file
(subsequent processes ride out the empty-read backoff above).

C2 — Test improvements
======================
* ``test_o2_fanout_concurrent_cache_no_duplicate``: replaced the
  Barrier-only version with a deterministic version that
  monkey-patches ``Cache.__init__`` to sleep 0.2s during child
  construction, forcing concurrent threads to genuinely race the
  get-or-create.  Asserts elapsed < 1.0s (lock serializes; without
  the lock duplicate construction would double).
* ``test_o3_fanout_close_closes_children``: replaced
  ``_local.con``-private-state asserts with a behavioral check that
  opens each child's SQLite database with a fresh
  ``BEGIN EXCLUSIVE`` after close -- if the original handle is still
  held this raises ``OperationalError``.
* New ``test_y1_finally_size_based_cleanup_partial`` and
  ``test_y1_finally_size_based_cleanup_complete`` exercise the L2
  size-based cleanup branches.
* New cross-platform
  ``test_y2_eloop_translates_to_value_error_via_monkeypatch``
  patches ``op.realpath`` to raise ``OSError(ELOOP)`` so the
  translation is verified without requiring admin-only symlinks on
  Windows.
* New ``test_l1_pickled_child_cache_recovers_data_in_receiver``
  pickles a child Cache with a complex tuple key, then verifies the
  data round-trips both via direct ``Cache(child_dir)`` and via
  ``pickle.loads(blob)``.
* New ``test_l1_inherited_key_file_written_atomically_in_child_dir``
  asserts the keyfile exists in the child directory and matches the
  parent's bytes.
* New ``test_l1_explicit_key_does_NOT_write_keyfile_into_child_dir``
  and ``test_l1_legacy_mode_does_NOT_write_keyfile_into_child_dir``
  verify the negative path -- explicit secrets and legacy mode must
  NOT persist anything to disk.
* New ``test_c1_fanoutcache_use_after_close_raises`` and
  ``test_c1_fanoutcache_close_idempotent`` exercise the C1 fence.

Test counts: 268 passed, 1 skipped (was 260 passed, 1 skipped).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant