Skip to content

Cap IFD chain length in parse_all_ifds to prevent DoS#1530

Open
brendancol wants to merge 1 commit intoxarray-contrib:mainfrom
brendancol:security/ifd-chain-length-cap
Open

Cap IFD chain length in parse_all_ifds to prevent DoS#1530
brendancol wants to merge 1 commit intoxarray-contrib:mainfrom
brendancol:security/ifd-chain-length-cap

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

parse_all_ifds in xrspatial/geotiff/_header.py walks the IFD chain via next_ifd_offset and deduplicates with a seen set of offsets. That catches cycles but not long acyclic chains. A crafted BigTIFF can chain millions of distinct IFD offsets, each pointing at a small valid IFD scattered through a sparse multi-GB file, forcing O(N) memory in attacker-controlled N. Finding S3 in a recent security audit of xrspatial.geotiff.

This PR caps the chain at MAX_IFDS = 256 and raises ValueError once that ceiling is hit. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection seen set is preserved untouched.

This is the chain-length counterpart to S2 (#1527), which bounded per-IFD entry counts via MAX_IFD_ENTRY_COUNT. Threat model in both cases is untrusted TIFF input: web download, fsspec source, third-party catalog, user upload.

Changes

  • xrspatial/geotiff/_header.py: define module-level MAX_IFDS = 256 near the existing constants, and raise ValueError in parse_all_ifds when the chain reaches that length.
  • xrspatial/geotiff/tests/test_ifd_chain_cap.py: new test module.

Test plan

  • pytest xrspatial/geotiff/tests/test_ifd_chain_cap.py -x -q (6 passed)
  • pytest xrspatial/geotiff/tests/test_header.py xrspatial/geotiff/tests/test_overview_filter.py xrspatial/geotiff/tests/test_polish_1488.py -x -q (63 passed)
  • Real COG with several overview levels still parses (covered by test_legitimate_cog_with_overviews_passes)
  • Boundary check: a chain of MAX_IFDS - 1 parses; MAX_IFDS raises
  • Big-endian path covered

`parse_all_ifds` walks the IFD chain via `next_ifd_offset` and
deduplicates with a `seen` set of offsets. That catches cycles, but
not long acyclic chains: a crafted BigTIFF can chain millions of
distinct IFD offsets that each point at small valid IFDs scattered
through a sparse multi-GB file, forcing O(N) memory in attacker-
controlled N.

This change adds a `MAX_IFDS = 256` ceiling and raises `ValueError`
once the chain hits it. 256 is generous: real-world COGs carry the
full-resolution IFD plus a handful of overview levels and (optionally)
per-band masks, so they sit well under 64 even for deep pyramids. The
cycle-detection `seen` set is preserved untouched.

Threat model is untrusted TIFF input (web download, fsspec source,
third-party catalog, user upload). Counterpart to S2 (xarray-contrib#1527), which
bounded per-IFD entry counts.

Tests in test_ifd_chain_cap.py:
- chain past the cap is rejected with a clear `ValueError`
- chain at MAX_IFDS - 1 still parses
- chain at MAX_IFDS hits the cap (off-by-one boundary)
- error message names MAX_IFDS and the threat
- big-endian chains hit the same cap
- a real COG with overview levels parses unaffected
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 8, 2026
@brendancol brendancol requested a review from Copilot May 9, 2026 01:22
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a hard upper bound on the number of TIFF Image File Directories (IFDs) that parse_all_ifds will traverse, preventing memory/time DoS via extremely long (but acyclic) next_ifd_offset chains in untrusted TIFF/BigTIFF inputs.

Changes:

  • Introduce module-level MAX_IFDS = 256 and raise ValueError when the parsed IFD chain reaches that limit.
  • Add a new test module exercising the cap (over-limit rejection, boundary behavior, message contents, and big-endian coverage).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_header.py Defines MAX_IFDS and enforces it in parse_all_ifds with a ValueError to cap chain traversal.
xrspatial/geotiff/tests/test_ifd_chain_cap.py Adds tests validating the new IFD-chain length cap behavior, including boundary and big-endian scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

from xrspatial.geotiff import to_geotiff
from xrspatial.geotiff._header import (
MAX_IFDS,
TAG_IMAGE_LENGTH,
Comment on lines +75 to +79
"""Exactly MAX_IFDS IFDs is allowed; MAX_IFDS + 1 is rejected.

Convention: we raise once ``len(ifds) >= MAX_IFDS`` after appending,
so a chain of length exactly MAX_IFDS triggers the error and
MAX_IFDS - 1 is the largest accepted chain.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants