Cap decompressed size for deflate/zstd/lz4/packbits to block bomb attacks#1533

Open
brendancol wants to merge 1 commit into xarray-contrib:main from brendancol:security/decompression-output-caps

Conversation

@brendancol
Contributor

Summary

Four codecs in xrspatial/geotiff/_compression.py decompressed strip and tile payloads with no output-size cap. A small malicious TIFF (web download, fsspec, third-party catalog, user upload) could declare a 1024x1024 uint8 image (1 MiB expected) while supplying a 1 MiB deflate-compressed strip that decodes to 1 GiB, OOM-killing the reader. The existing size check in _decode_strip_or_tile (_reader.py:565) runs after decompression, so it cannot bound peak RSS. The audit confirmed the impact: with RLIMIT_AS=300MB set, feeding a 1 MiB deflate-bombed TIFF to read_to_array kills the process.

This PR adds an expected_size cap inside each codec, rejecting any decode that would exceed expected_size * 1.05 + 1 bytes with a clean ValueError before peak allocation.

Codecs covered (one fix template, four codecs)

  • deflate (deflate_decompress): zlib.decompressobj().decompress(data, max_length=cap+1) with a drain loop. zlib.decompress has no cap parameter.
  • zstd (zstd_decompress): ZstdDecompressor().stream_reader(data).read(cap+1) plus a one-byte overflow probe. decompress(data, max_output_size=cap) is not enforced when the frame embeds the content size, which is exactly the case in the bomb.
  • lz4 (lz4_decompress): LZ4FrameDecompressor().decompress(data, max_length=cap+1) and treat needs_input == False as overflow.
  • packbits (packbits_decompress): pure-Python loop already; check the running output length after each opcode.
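The deflate variant of the fix template can be sketched as follows. This is a hedged illustration, not the PR's exact code: the function name and error message are mine, and the cap formula mirrors the ``expected_size * 1.05 + 1`` rule described above.

```python
import zlib

def deflate_decompress_capped(data: bytes, expected_size: int) -> bytes:
    """Illustrative sketch: bounded deflate decode (names are hypothetical)."""
    cap = int(expected_size * 1.05) + 1   # the PR's expected_size * 1.05 + 1 rule
    d = zlib.decompressobj()
    # max_length bounds how many bytes this call may emit; asking for cap + 1
    # lets us detect overflow without ever allocating the full bomb output.
    out = bytearray(d.decompress(data, cap + 1))
    # Drain loop: when max_length is hit, zlib parks the undecoded input in
    # unconsumed_tail, so keep pulling until the tail is empty or we overflow.
    while len(out) <= cap and d.unconsumed_tail:
        out += d.decompress(d.unconsumed_tail, cap + 1 - len(out))
    if len(out) > cap:
        raise ValueError(
            f"deflate decode exceeded expected size: more than {cap} bytes"
        )
    return bytes(out)
```

Note that the `ValueError` fires before the output ever grows past `cap + 1` bytes, which is what bounds peak RSS; plain `zlib.decompress` offers no such parameter, hence the `decompressobj` drain loop.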

Margin

The 1.05x margin (5%) lets legitimate codec framing or trailing metadata bytes through while still rejecting the 1000:1 ratios characteristic of bomb attacks. A test (test_cap_includes_metadata_margin) covers the legitimate metadata case, and test_legitimate_high_compression_passes confirms an all-zero array at >50:1 ratio is accepted.

Tests

xrspatial/geotiff/tests/test_decompression_caps.py (14 tests):

  • Codec-direct bomb rejection (deflate, zstd, lz4, packbits) and round-trip.
  • End-to-end TIFF reads: 1 MiB declared image with a strip that decodes to 1 GiB, all four codecs.
  • Negative tests: legitimate >50:1 compression of zeros, plus the 5% metadata margin.

The end-to-end TIFFs are hand-built with struct.pack so no test ever materializes the 1 GiB payload outside the codec's own buffer.
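Hand-building a TIFF with struct.pack looks roughly like this (a minimal sketch of the little-endian header and one IFD entry per the TIFF 6.0 layout; the test files' actual tag sets are not reproduced here):

```python
import struct

def tiff_header(first_ifd_offset: int) -> bytes:
    # Byte-order mark "II" (little-endian), magic 42, offset to first IFD.
    return struct.pack("<2sHI", b"II", 42, first_ifd_offset)

def ifd_entry(tag: int, field_type: int, count: int, value: int) -> bytes:
    # One 12-byte IFD entry: tag, field type, count, value/offset.
    return struct.pack("<HHII", tag, field_type, count, value)

header = tiff_header(8)                   # IFD right after the 8-byte header
width = ifd_entry(256, 3, 1, 1024)        # ImageWidth (tag 256), SHORT, 1024
```

The point of this construction is that the bomb strip's *declared* dimensions stay small while the compressed payload is attached as-is, so the 1 GiB output only ever exists inside the codec under test.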

Out of scope

The project's security policy prefers one fix per PR; bundling these four codec caps here is an explicit exception, since they are a single vulnerability class fixed with a single template. Other audit findings (IFD count, IFD chain) ship in separate PRs.

Test plan

  • pytest xrspatial/geotiff/tests/test_decompression_caps.py -x -q (14 pass)
  • pytest xrspatial/geotiff/tests/test_compression.py xrspatial/geotiff/tests/test_compression_level.py xrspatial/geotiff/tests/test_lz4.py xrspatial/geotiff/tests/test_reader.py xrspatial/geotiff/tests/test_writer.py -x -q (63 pass, no regressions)

…acks

The four codecs in ``xrspatial/geotiff/_compression.py`` decompressed strip
and tile payloads with no output-size cap, so a small malicious TIFF could
expand to multiple gigabytes during decode and OOM-kill the reader.  An
audit confirmed it: a 1 MiB deflate-compressed strip declaring a 1024x1024
image expands to 1 GiB, and a process with ``RLIMIT_AS=300MB`` is killed
before the existing post-decode size check (``_decode_strip_or_tile`` in
``_reader.py:565``) ever runs.  The threat model is untrusted TIFF input
from web downloads, fsspec, third-party catalogs, or user upload.

Each codec now accepts an ``expected_size`` kwarg (the byte count the
caller already computed for the post-check) and refuses to emit more than
``expected_size * 1.05 + 1`` bytes before raising ``ValueError`` with the
codec name and actual vs cap.  The 1.05x margin allows for legitimate
codec metadata that some encoders emit; bomb ratios (1000:1+) are
rejected long before peak RSS spikes.

Per-codec implementation:

- deflate: ``zlib.decompressobj().decompress(data, max_length=cap+1)``
  with a drain loop over ``unconsumed_tail``.  ``zlib.decompress`` had no
  cap.
- zstd: ``ZstdDecompressor().stream_reader(data).read(cap+1)``, then
  probe one more byte to detect overflow.  ``decompress(data,
  max_output_size=cap)`` is not actually enforced when the frame embeds
  the content size, which the bomb does.
- lz4: ``LZ4FrameDecompressor().decompress(data, max_length=cap+1)`` and
  treat ``needs_input == False`` as overflow (decoder has buffered output
  it could not deliver).
- packbits: pure-Python loop already, so just check the running output
  length after each opcode.
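The packbits check is simple enough to sketch in full. A hedged illustration of the per-opcode length check (function name and message are mine, not the PR's; opcode semantics follow the PackBits spec):

```python
def packbits_decompress_capped(data: bytes, expected_size: int) -> bytes:
    """Illustrative sketch: PackBits decode with a running output cap."""
    cap = int(expected_size * 1.05) + 1
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 128:                          # literal run of n + 1 bytes
            out += data[i:i + n + 1]
            i += n + 1
        elif n > 128:                        # replicate next byte 257 - n times
            out += bytes([data[i]]) * (257 - n)
            i += 1
        # n == 128 is a no-op per the PackBits spec
        if len(out) > cap:                   # check after each opcode
            raise ValueError("packbits decode exceeded expected size")
    return bytes(out)
```

Because the check runs after every opcode, a replicate-heavy bomb is rejected as soon as the running length crosses the cap, never after materializing the full output.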

The dispatch ``decompress`` plumbs ``expected_size`` through to all four.

Tests in ``xrspatial/geotiff/tests/test_decompression_caps.py`` cover:
codec-direct bomb rejection, end-to-end TIFF bomb rejection (1 MiB
declared, 1 GiB decoded), legitimate high-ratio compression (all-zero
arrays at >50:1) passing without false rejection, and the 5% metadata
margin not over-rejecting.

Audit reproducer behaviour: the 1 MiB-to-1 GiB TIFF previously triggered
``MemoryError`` (or an OS OOM kill) inside ``zlib.decompress``; it now
raises ``ValueError("deflate decode exceeded expected size: ...")`` with
peak RSS bounded by the cap.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
