Bounds-check IFD count to prevent DoS on crafted TIFFs by brendancol · Pull Request #1527 · xarray-contrib/xarray-spatial

brendancol · 2026-05-08T20:42:05Z

Summary

Addresses security audit finding S2: parse_ifd in xrspatial/geotiff/_header.py previously trusted an entry's count field unconditionally and passed it straight into struct.unpack_from(f'{bo}{count}{fmt}', ...). A crafted multi-MB TIFF with count=5e7 on a LONG tag and a value pointer aimed at a region of valid file bytes forced the parser to materialize a ~1.4 GB tuple of Python ints before any caller-side validation ran.

Threat model

Untrusted TIFF input: web download, fsspec source, third-party catalog, user upload. An attacker only needs to ship a file large enough to cover the supposed value range (count * type_size bytes) to weaponize the count field.

Fix

Two guards added inside the per-entry loop in parse_ifd:

Hard cap MAX_IFD_ENTRY_COUNT = 10_000_000 on count for tags that are not pixel-data offset/byte-count arrays. STRIP_OFFSETS, STRIP_BYTE_COUNTS, TILE_OFFSETS, TILE_BYTE_COUNTS, and COLORMAP are exempt because their counts legitimately scale with image dimensions (a real high-resolution tiled COG can have millions of tiles).
For non-inline values, verify that value_offset + count * type_size <= len(data) before calling _read_value.

Both checks raise ValueError before any large allocation runs.

Test plan

pytest xrspatial/geotiff/tests/test_parse_ifd_bounds.py -x -q (5 new tests)
pytest xrspatial/geotiff/tests/test_header.py xrspatial/geotiff/tests/test_reader.py -x -q (34 passed)
Full geotiff suite passes apart from three pre-existing palette/deepcopy failures in test_features.py unrelated to this change
Manual reproduction of the audit scenario (count=5e7 LONG on ImageWidth against a 250 MB padded file) now raises ValueError with no large allocation

`parse_ifd` previously trusted an entry's `count` field unconditionally and passed it straight into `struct.unpack_from(f'{bo}{count}{fmt}', ...)`. A crafted TIFF with `count=5e7` on a LONG tag and a value pointer aimed at a multi-MB padding region forced the parser to materialize a ~1.4 GB tuple of Python ints before any caller-side validation ran. This change adds two guards inside the per-entry loop: 1. Reject `count > MAX_IFD_ENTRY_COUNT` (10_000_000) for tags that are not pixel-data offset/byte-count arrays. STRIP_OFFSETS, STRIP_BYTE_COUNTS, TILE_OFFSETS, TILE_BYTE_COUNTS, and COLORMAP are exempt because their counts legitimately scale with image size. 2. For non-inline values, verify that `value_offset + count * type_size <= len(data)` before calling `_read_value`. Both checks raise `ValueError` before any allocation. Threat model is untrusted TIFF input (web download, fsspec source, third-party catalog, user upload). Tests added in test_parse_ifd_bounds.py: - count over the cap on a non-pixel tag is rejected - value range past EOF is rejected - pixel-array tag with a huge count is allowed - legal small IFDs still parse - TileByteCounts SHORT path exercised

Copilot

Pull request overview

This PR hardens the GeoTIFF header/IFD parser against memory-DoS attacks by validating IFD entry count values and referenced value ranges before decoding tag payloads.

Changes:

Add MAX_IFD_ENTRY_COUNT and enforce a hard cap for non–pixel-array tags in parse_ifd.
Add bounds checking for non-inline IFD value pointers to ensure value_offset + count * type_size stays within len(data).
Add a new test module covering count cap enforcement, EOF bounds checks, and pixel-tag exemptions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`xrspatial/geotiff/_header.py`	Adds `count` hard-cap for non-pixel tags and value-range bounds checks before `_read_value` / `struct.unpack_from`.
`xrspatial/geotiff/tests/test_parse_ifd_bounds.py`	Introduces tests for the new guards and intended exemptions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Tighten the IFD-entry caps and clean up test scaffolding: - Lower MAX_IFD_ENTRY_COUNT 10_000_000 -> 100_000. Even at the lower bound a malformed entry can still force ~3 MB of Python tuple allocation for LONG values; 100K is well above any legitimate metadata count and bounds the worst case to ~30 MB. - Add MAX_IFD_ENTRY_BYTES = 256 KiB cap on `count * type_size` for non-pixel tags. Catches large-itemsize abuse (e.g. DOUBLE arrays) where the count alone passes but the bytes are excessive. Default caps are now independently useful: count cap stops integer-array attacks, byte cap stops large-element-size attacks. - Remove the dead `ptr < 0` branch on the value-range check; ptr is unpacked via unsigned `I`/`Q` so it can never be negative. - Reformat _build_tiff signature in the test to one-arg-per-line so the closing paren lines up cleanly (prior layout tripped E128/E124). - Replace test_huge_pixel_array_count_allowed (which allocated 40 MB and forced a 10M-int tuple in CI) with monkeypatch-based tests that exercise the cap logic with tiny payloads. Add a second test that proves the byte cap fires at default cap values, so the production thresholds stay covered.

`parse_all_ifds` walks the IFD chain via `next_ifd_offset` and deduplicates with a `seen` set of offsets. That catches cycles, but not long acyclic chains: a crafted BigTIFF can chain millions of distinct IFD offsets that each point at small valid IFDs scattered through a sparse multi-GB file, forcing O(N) memory in attacker- controlled N. This change adds a `MAX_IFDS = 256` ceiling and raises `ValueError` once the chain hits it. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection `seen` set is preserved untouched. Threat model is untrusted TIFF input (web download, fsspec source, third-party catalog, user upload). Counterpart to S2 (xarray-contrib#1527), which bounded per-IFD entry counts. Tests in test_ifd_chain_cap.py: - chain past the cap is rejected with a clear `ValueError` - chain at MAX_IFDS - 1 still parses - chain at MAX_IFDS hits the cap (off-by-one boundary) - error message names MAX_IFDS and the threat - big-endian chains hit the same cap - a real COG with overview levels parses unaffected

* Cap IFD chain length in parse_all_ifds to prevent DoS `parse_all_ifds` walks the IFD chain via `next_ifd_offset` and deduplicates with a `seen` set of offsets. That catches cycles, but not long acyclic chains: a crafted BigTIFF can chain millions of distinct IFD offsets that each point at small valid IFDs scattered through a sparse multi-GB file, forcing O(N) memory in attacker- controlled N. This change adds a `MAX_IFDS = 256` ceiling and raises `ValueError` once the chain hits it. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection `seen` set is preserved untouched. Threat model is untrusted TIFF input (web download, fsspec source, third-party catalog, user upload). Counterpart to S2 (#1527), which bounded per-IFD entry counts. Tests in test_ifd_chain_cap.py: - chain past the cap is rejected with a clear `ValueError` - chain at MAX_IFDS - 1 still parses - chain at MAX_IFDS hits the cap (off-by-one boundary) - error message names MAX_IFDS and the threat - big-endian chains hit the same cap - a real COG with overview levels parses unaffected * Address Copilot review on #1530 - Remove unused TAG_IMAGE_LENGTH import (F401). - Fix test_chain_at_boundary_passes docstring: the summary said "Exactly MAX_IFDS is allowed" but the body and code show MAX_IFDS - 1 is the largest accepted chain. Reword the summary to match. * Harmonize MAX_IFDS check with MAX_IFD_ENTRY_COUNT semantics Switch parse_all_ifds from `len(ifds) >= MAX_IFDS` to `len(ifds) > MAX_IFDS` so the cap matches the `> MAX_IFD_ENTRY_COUNT` convention used elsewhere in this module: "MAX = N" reads as "up to and including N is allowed". Update the boundary test to assert MAX_IFDS parses cleanly and MAX_IFDS + 1 raises.

github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026

brendancol mentioned this pull request May 8, 2026

Cap IFD chain length in parse_all_ifds to prevent DoS #1530

Merged

5 tasks

brendancol requested a review from Copilot May 8, 2026 22:12

Copilot started reviewing on behalf of brendancol May 8, 2026 22:12 View session

Copilot AI reviewed May 8, 2026

View reviewed changes

Comment thread xrspatial/geotiff/tests/test_parse_ifd_bounds.py Outdated

Comment thread xrspatial/geotiff/_header.py Outdated

Comment thread xrspatial/geotiff/_header.py Outdated

Comment thread xrspatial/geotiff/tests/test_parse_ifd_bounds.py Outdated

brendancol merged commit 8a71679 into xarray-contrib:main May 9, 2026
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bounds-check IFD count to prevent DoS on crafted TIFFs#1527

Bounds-check IFD count to prevent DoS on crafted TIFFs#1527
brendancol merged 2 commits intoxarray-contrib:mainfrom
brendancol:fix/ifd-count-bounds

brendancol commented May 8, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brendancol commented May 8, 2026

Summary

Threat model

Fix

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants