Cap IFD chain length in parse_all_ifds to prevent DoS #1530
Open
brendancol wants to merge 1 commit into xarray-contrib:main from
`parse_all_ifds` walks the IFD chain via `next_ifd_offset` and deduplicates with a `seen` set of offsets. That catches cycles, but not long acyclic chains: a crafted BigTIFF can chain millions of distinct IFD offsets that each point at small valid IFDs scattered through a sparse multi-GB file, forcing O(N) memory in attacker-controlled N.

This change adds a `MAX_IFDS = 256` ceiling and raises `ValueError` once the chain hits it. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection `seen` set is preserved untouched.

Threat model is untrusted TIFF input (web download, fsspec source, third-party catalog, user upload). Counterpart to S2 (xarray-contrib#1527), which bounded per-IFD entry counts.

Tests in test_ifd_chain_cap.py:
- chain past the cap is rejected with a clear `ValueError`
- chain at MAX_IFDS - 1 still parses
- chain at MAX_IFDS hits the cap (off-by-one boundary)
- error message names MAX_IFDS and the threat
- big-endian chains hit the same cap
- a real COG with overview levels parses unaffected
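The combination of cycle detection and a length cap can be sketched as follows. This is a minimal illustration, not the code in `_header.py`: the function name `walk_ifd_offsets`, its signature, and the error wording are assumptions, and it handles only classic (32-bit offset) TIFF for brevity. It follows the convention stated in the PR of raising once the number of collected IFDs reaches `MAX_IFDS`.

```python
import struct

MAX_IFDS = 256  # value taken from the PR; everything else here is hypothetical


def walk_ifd_offsets(data: bytes, max_ifds: int = MAX_IFDS) -> list:
    """Return the offset of every IFD in a classic TIFF byte string.

    Hypothetical helper: stops on a zero pointer or a repeated offset
    (cycle), and raises ValueError once the chain reaches max_ifds.
    """
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("not a TIFF: unknown byte-order mark")
    # Header: 2-byte order mark, 2-byte magic (42), 4-byte first-IFD offset.
    magic, offset = struct.unpack_from(byte_order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF: magic number is not 42")

    seen = set()   # catches cycles
    offsets = []   # grows by one per IFD, so its length must be bounded
    while offset != 0 and offset not in seen:
        seen.add(offset)
        (n_entries,) = struct.unpack_from(byte_order + "H", data, offset)
        offsets.append(offset)
        if len(offsets) >= max_ifds:
            # Illustrative message; the real wording may differ.
            raise ValueError(
                f"IFD chain reached MAX_IFDS={max_ifds}; "
                "refusing to continue (possible denial-of-service input)"
            )
        # The next-IFD pointer follows the 2-byte count and the 12-byte entries.
        next_pointer_at = offset + 2 + 12 * n_entries
        (offset,) = struct.unpack_from(byte_order + "I", data, next_pointer_at)
    return offsets
```

With only the `seen` set, memory grows with the number of distinct offsets the file supplies; the length check is what bounds it independently of file contents.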
Pull request overview
Adds a hard upper bound on the number of TIFF Image File Directories (IFDs) that parse_all_ifds will traverse, preventing memory/time DoS via extremely long (but acyclic) next_ifd_offset chains in untrusted TIFF/BigTIFF inputs.
Changes:
- Introduce module-level `MAX_IFDS = 256` and raise `ValueError` when the parsed IFD chain reaches that limit.
- Add a new test module exercising the cap (over-limit rejection, boundary behavior, message contents, and big-endian coverage).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| xrspatial/geotiff/_header.py | Defines MAX_IFDS and enforces it in parse_all_ifds with a ValueError to cap chain traversal. |
| xrspatial/geotiff/tests/test_ifd_chain_cap.py | Adds tests validating the new IFD-chain length cap behavior, including boundary and big-endian scenarios. |
from xrspatial.geotiff import to_geotiff
from xrspatial.geotiff._header import (
    MAX_IFDS,
    TAG_IMAGE_LENGTH,
Comment on lines +75 to +79
    """Exactly MAX_IFDS IFDs is allowed; MAX_IFDS + 1 is rejected.

    Convention: we raise once ``len(ifds) >= MAX_IFDS`` after appending,
    so a chain of length exactly MAX_IFDS triggers the error and
    MAX_IFDS - 1 is the largest accepted chain.
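The boundary convention in this docstring can be checked with a small simulation. The sketch below is an assumption-laden illustration, not the PR's test: `simulate_parse` and `is_accepted` are hypothetical names, and the loop stands in for the real chain walk by appending one placeholder per IFD and then applying the `len(ifds) >= MAX_IFDS` check. Read against that convention, the docstring's first sentence (which says exactly MAX_IFDS is allowed) appears to be off by one.

```python
MAX_IFDS = 256  # value taken from the PR


def simulate_parse(chain_length: int, max_ifds: int = MAX_IFDS) -> list:
    """Hypothetical stand-in for the walk: append, then check the cap."""
    ifds = []
    for index in range(chain_length):
        ifds.append(index)  # placeholder for a parsed IFD
        if len(ifds) >= max_ifds:
            raise ValueError(f"IFD chain reached MAX_IFDS={max_ifds}")
    return ifds


def is_accepted(chain_length: int, max_ifds: int = MAX_IFDS) -> bool:
    """True if a chain of this length parses without hitting the cap."""
    try:
        simulate_parse(chain_length, max_ifds)
    except ValueError:
        return False
    return True


# Largest accepted chain is MAX_IFDS - 1; MAX_IFDS itself is rejected.
print(is_accepted(MAX_IFDS - 1), is_accepted(MAX_IFDS))
```

If the intent were to accept exactly `MAX_IFDS` IFDs, the check would need to be `len(ifds) > max_ifds` instead; which of the two is wanted is a design decision for the PR author.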
Summary
`parse_all_ifds` in `xrspatial/geotiff/_header.py` walks the IFD chain via `next_ifd_offset` and deduplicates with a `seen` set of offsets. That catches cycles but not long acyclic chains. A crafted BigTIFF can chain millions of distinct IFD offsets, each pointing at a small valid IFD scattered through a sparse multi-GB file, forcing O(N) memory in attacker-controlled N. Finding S3 in a recent security audit of `xrspatial.geotiff`.

This PR caps the chain at `MAX_IFDS = 256` and raises `ValueError` once that ceiling is hit. 256 is generous: real-world COGs carry the full-resolution IFD plus a handful of overview levels and (optionally) per-band masks, so they sit well under 64 even for deep pyramids. The cycle-detection `seen` set is preserved untouched.

This is the chain-length counterpart to S2 (#1527), which bounded per-IFD entry counts via `MAX_IFD_ENTRY_COUNT`. Threat model in both cases is untrusted TIFF input: web download, fsspec source, third-party catalog, user upload.

Changes
- `xrspatial/geotiff/_header.py`: define module-level `MAX_IFDS = 256` near the existing constants, and raise `ValueError` in `parse_all_ifds` when the chain reaches that length.
- `xrspatial/geotiff/tests/test_ifd_chain_cap.py`: new test module.

Test plan
- `pytest xrspatial/geotiff/tests/test_ifd_chain_cap.py -x -q` (6 passed)
- `pytest xrspatial/geotiff/tests/test_header.py xrspatial/geotiff/tests/test_overview_filter.py xrspatial/geotiff/tests/test_polish_1488.py -x -q` (63 passed)
- A real COG with overview levels parses unaffected (`test_legitimate_cog_with_overviews_passes`)
- `MAX_IFDS - 1` parses; `MAX_IFDS` raises
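For readers who want to reproduce the boundary and big-endian cases by hand, synthetic input along these lines may help. This is a sketch only: `build_ifd_chain` is a hypothetical helper, not the fixture used in `test_ifd_chain_cap.py`, and it emits classic TIFF (not BigTIFF) with one `ImageWidth` entry per IFD and no image data, which is enough to exercise a chain walker but is not a renderable image.

```python
import struct


def build_ifd_chain(n_ifds: int, big_endian: bool = False) -> bytes:
    """Build a classic-TIFF byte string whose IFD chain has n_ifds links.

    Hypothetical test helper. Each IFD holds a single ImageWidth entry
    (tag 256, type SHORT, count 1, value 1) and occupies 18 bytes:
    2-byte entry count + 12-byte entry + 4-byte next-IFD offset.
    """
    bo = ">" if big_endian else "<"
    header_size, ifd_size = 8, 18
    first_offset = header_size if n_ifds else 0
    parts = [(b"MM" if big_endian else b"II") + struct.pack(bo + "HI", 42, first_offset)]
    for index in range(n_ifds):
        is_last = index == n_ifds - 1
        next_offset = 0 if is_last else header_size + ifd_size * (index + 1)
        parts.append(
            struct.pack(bo + "H", 1)                        # entry count
            + struct.pack(bo + "HHIHH", 256, 3, 1, 1, 0)    # ImageWidth = 1
            + struct.pack(bo + "I", next_offset)            # next-IFD pointer
        )
    return b"".join(parts)


# Chains on either side of the cap, in both byte orders (256 as in the PR).
under_cap = build_ifd_chain(255)
at_cap_big_endian = build_ifd_chain(256, big_endian=True)
```

Feeding `under_cap` and `at_cap_big_endian` to a parser that follows the convention described above should give one success and one `ValueError`, respectively.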