Cap decompressed size for deflate/zstd/lz4/packbits to block bomb attacks#1533

Open
brendancol wants to merge 1 commit into xarray-contrib:main from brendancol:security/decompression-output-caps

Conversation

@brendancol
Contributor

Summary

Four codecs in xrspatial/geotiff/_compression.py decompressed strip and tile payloads with no output-size cap. A small malicious TIFF (web download, fsspec, third-party catalog, user upload) could declare a 1024x1024 uint8 image (1 MiB expected) while supplying a 1 MiB deflate-compressed strip that decodes to 1 GiB, OOM-killing the reader. The existing size check in _decode_strip_or_tile (_reader.py:565) runs after decompression, so it cannot bound peak RSS. The audit confirmed the impact: with RLIMIT_AS=300MB set, feeding a 1 MiB deflate-bombed TIFF to read_to_array kills the process.

This PR adds an expected_size cap inside each codec, rejecting any decode that would exceed expected_size * 1.05 + 1 bytes with a clean ValueError before peak allocation.

Codecs covered (one fix template, four codecs)

  • deflate (deflate_decompress): zlib.decompressobj().decompress(data, max_length=cap+1) with a drain loop. zlib.decompress has no cap parameter.
  • zstd (zstd_decompress): ZstdDecompressor().stream_reader(data).read(cap+1) plus a one-byte overflow probe. decompress(data, max_output_size=cap) is not enforced when the frame embeds the content size, which is exactly the case in the bomb.
  • lz4 (lz4_decompress): LZ4FrameDecompressor().decompress(data, max_length=cap+1) and treat needs_input == False as overflow.
  • packbits (packbits_decompress): pure-Python loop already; check the running output length after each opcode.
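The deflate variant of the fix template can be sketched as follows. This is a hedged illustration, not the PR's exact code: the function name and error message are mine, and the cap formula mirrors the ``expected_size * 1.05 + 1`` rule described above.

```python
import zlib

def deflate_decompress_capped(data: bytes, expected_size: int) -> bytes:
    """Illustrative sketch: bounded deflate decode (names are hypothetical)."""
    cap = int(expected_size * 1.05) + 1   # the PR's expected_size * 1.05 + 1 rule
    d = zlib.decompressobj()
    # max_length bounds how many bytes this call may emit; asking for cap + 1
    # lets us detect overflow without ever allocating the full bomb output.
    out = bytearray(d.decompress(data, cap + 1))
    # Drain loop: when max_length is hit, zlib parks the undecoded input in
    # unconsumed_tail, so keep pulling until the tail is empty or we overflow.
    while len(out) <= cap and d.unconsumed_tail:
        out += d.decompress(d.unconsumed_tail, cap + 1 - len(out))
    if len(out) > cap:
        raise ValueError(
            f"deflate decode exceeded expected size: more than {cap} bytes"
        )
    return bytes(out)
```

Note that the `ValueError` fires before the output ever grows past `cap + 1` bytes, which is what bounds peak RSS; plain `zlib.decompress` offers no such parameter, hence the `decompressobj` drain loop.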

Margin

The 1.05x margin (5%) lets legitimate codec framing or trailing metadata bytes through while still rejecting the 1000:1 ratios characteristic of bomb attacks. A test (test_cap_includes_metadata_margin) covers the legitimate metadata case, and test_legitimate_high_compression_passes confirms an all-zero array at >50:1 ratio is accepted.

Tests

xrspatial/geotiff/tests/test_decompression_caps.py (14 tests):

  • Codec-direct bomb rejection (deflate, zstd, lz4, packbits) and round-trip.
  • End-to-end TIFF reads: 1 MiB declared image with a strip that decodes to 1 GiB, all four codecs.
  • Negative tests: legitimate >50:1 compression of zeros, plus the 5% metadata margin.

The end-to-end TIFFs are hand-built with struct.pack so no test ever materializes the 1 GiB payload outside the codec's own buffer.
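Hand-building a TIFF with struct.pack looks roughly like this (a minimal sketch of the little-endian header and one IFD entry per the TIFF 6.0 layout; the test files' actual tag sets are not reproduced here):

```python
import struct

def tiff_header(first_ifd_offset: int) -> bytes:
    # Byte-order mark "II" (little-endian), magic 42, offset to first IFD.
    return struct.pack("<2sHI", b"II", 42, first_ifd_offset)

def ifd_entry(tag: int, field_type: int, count: int, value: int) -> bytes:
    # One 12-byte IFD entry: tag, field type, count, value/offset.
    return struct.pack("<HHII", tag, field_type, count, value)

header = tiff_header(8)                   # IFD right after the 8-byte header
width = ifd_entry(256, 3, 1, 1024)        # ImageWidth (tag 256), SHORT, 1024
```

The point of this construction is that the bomb strip's *declared* dimensions stay small while the compressed payload is attached as-is, so the 1 GiB output only ever exists inside the codec under test.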

Out of scope

The project's security policy prefers one fix per PR; bundling these four codec caps here is an explicit exception, since they are a single vulnerability class fixed with a single template. Other audit findings (IFD count, IFD chain) ship in separate PRs.

Test plan

  • pytest xrspatial/geotiff/tests/test_decompression_caps.py -x -q (14 pass)
  • pytest xrspatial/geotiff/tests/test_compression.py xrspatial/geotiff/tests/test_compression_level.py xrspatial/geotiff/tests/test_lz4.py xrspatial/geotiff/tests/test_reader.py xrspatial/geotiff/tests/test_writer.py -x -q (63 pass, no regressions)

…acks

The four codecs in ``xrspatial/geotiff/_compression.py`` decompressed strip
and tile payloads with no output-size cap, so a small malicious TIFF could
expand to multiple gigabytes during decode and OOM-kill the reader.  An
audit confirmed it: a 1 MiB deflate-compressed strip declaring a 1024x1024
image expands to 1 GiB, and a process with ``RLIMIT_AS=300MB`` is killed
before the existing post-decode size check (``_decode_strip_or_tile`` in
``_reader.py:565``) ever runs.  The threat model is untrusted TIFF input
from web downloads, fsspec, third-party catalogs, or user upload.

Each codec now accepts an ``expected_size`` kwarg (the byte count the
caller already computed for the post-check) and refuses to emit more than
``expected_size * 1.05 + 1`` bytes before raising ``ValueError`` with the
codec name and actual vs cap.  The 1.05x margin allows for legitimate
codec metadata that some encoders emit; bomb ratios (1000:1+) are
rejected long before peak RSS spikes.

Per-codec implementation:

- deflate: ``zlib.decompressobj().decompress(data, max_length=cap+1)``
  with a drain loop over ``unconsumed_tail``.  ``zlib.decompress`` had no
  cap.
- zstd: ``ZstdDecompressor().stream_reader(data).read(cap+1)``, then
  probe one more byte to detect overflow.  ``decompress(data,
  max_output_size=cap)`` is not actually enforced when the frame embeds
  the content size, which the bomb does.
- lz4: ``LZ4FrameDecompressor().decompress(data, max_length=cap+1)`` and
  treat ``needs_input == False`` as overflow (decoder has buffered output
  it could not deliver).
- packbits: pure-Python loop already, so just check the running output
  length after each opcode.
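The packbits check is simple enough to sketch in full. A hedged illustration of the per-opcode length check (function name and message are mine, not the PR's; opcode semantics follow the PackBits spec):

```python
def packbits_decompress_capped(data: bytes, expected_size: int) -> bytes:
    """Illustrative sketch: PackBits decode with a running output cap."""
    cap = int(expected_size * 1.05) + 1
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:                          # literal run of n + 1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:                        # replicate next byte 257 - n times
            out += bytes([data[i]]) * (257 - n)
            i += 1
        # n == 128 is a no-op per the PackBits spec
        if len(out) > cap:                   # check after each opcode
            raise ValueError("packbits decode exceeded expected size")
    return bytes(out)
```

Because the check runs after every opcode, a replicate-heavy bomb is rejected as soon as the running length crosses the cap, never after materializing the full output.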

The dispatch ``decompress`` plumbs ``expected_size`` through to all four.

Tests in ``xrspatial/geotiff/tests/test_decompression_caps.py`` cover:
codec-direct bomb rejection, end-to-end TIFF bomb rejection (1 MiB
declared, 1 GiB decoded), legitimate high-ratio compression (all-zero
arrays at >50:1) passing without false rejection, and the 5% metadata
margin not over-rejecting.

Audit reproducer behaviour: the 1 MiB-to-1 GiB TIFF previously triggered
``MemoryError`` (or an OS OOM kill) inside ``zlib.decompress``; it now
raises ``ValueError("deflate decode exceeded expected size: ...")`` with
peak RSS bounded by the cap.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
