Skip to content

geotiff: route read_vrt(chunks=) XML read through size cap (#1831)#1832

Open
brendancol wants to merge 1 commit into
mainfrom
deep-sweep-security-geotiff-2026-05-13-s4-fix01
Open

geotiff: route read_vrt(chunks=) XML read through size cap (#1831)#1832
brendancol wants to merge 1 commit into
mainfrom
deep-sweep-security-geotiff-2026-05-13-s4-fix01

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

PR #1818 capped VRT XML reads at 64 MiB (override via XRSPATIAL_VRT_MAX_XML_BYTES) in _vrt._read_vrt_xml so an attacker-supplied multi-GB VRT file cannot exhaust host memory at parse time. The lazy chunked dispatcher landed in #1822 (_read_vrt_chunked) parses the VRT independently with a bare open(source, 'r').read(), bypassing that cap; only the per-task worker reuses the capped reader.

This routes the chunked-path XML read through _read_vrt_xml so the same cap and env-var override apply on both eager and chunked entry points. Closes #1831.

Repro (pre-fix)

import os, tempfile, numpy as np
from xrspatial.geotiff import to_geotiff, read_vrt

td = tempfile.mkdtemp()
to_geotiff(np.zeros((10, 10), np.uint8),
           os.path.join(td, 'src.tif'), compression='none')

vrt = os.path.join(td, 'big.vrt')
pad = '<!-- ' + ('x' * (2 * 1024 * 1024)) + ' -->'
open(vrt, 'w').write(
    '<VRTDataset rasterXSize="10" rasterYSize="10">' + pad +
    '<VRTRasterBand dataType="Byte" band="1"><SimpleSource>'
    '<SourceFilename relativeToVRT="1">src.tif</SourceFilename>'
    '<SourceBand>1</SourceBand>'
    '<SrcRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
    '<DstRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
    '</SimpleSource></VRTRasterBand></VRTDataset>')

os.environ['XRSPATIAL_VRT_MAX_XML_BYTES'] = '1024'
read_vrt(vrt)              # eager: ValueError (cap enforced)
read_vrt(vrt, chunks=10)   # chunked: succeeds, cap bypassed

After this PR both raise the same ValueError referencing XRSPATIAL_VRT_MAX_XML_BYTES.

Test plan

  • xrspatial/geotiff/tests/test_vrt_xml_size_cap_chunked_1831.py - new file, 3 tests pinning small-cap rejection, default-cap success, raised-cap success for chunks=.
  • Existing test_vrt_xml_size_cap_1815.py (eager path) still passes.
  • Existing test_vrt_lazy_chunks_1814.py (chunked happy path) still passes.

Found by automated security sweep (deep-sweep-security-geotiff-2026-05-13-s4), Cat 1 (unbounded allocation), severity MEDIUM.

PR #1818 capped VRT XML reads at 64 MiB (configurable via
XRSPATIAL_VRT_MAX_XML_BYTES) in xrspatial/geotiff/_vrt.py::_read_vrt_xml
to keep a multi-GB attacker-supplied VRT file from exhausting memory at
parse time.

The chunked dispatcher merged in #1822 (_read_vrt_chunked in
xrspatial/geotiff/__init__.py) parses the VRT independently with a
bare open().read() before calling parse_vrt, so the cap did not apply
to read_vrt(path, chunks=...). The per-task _vrt_chunk_read worker
calls _read_vrt_internal (which goes through _read_vrt_xml), so only
the parent dispatch path was uncapped.

Route the chunked-path XML read through _read_vrt_xml so the same cap
and env-var override apply on both eager and chunked entry points.

Regression test pins the small-cap rejection, the default-cap success,
and the raised-cap success for chunks=.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 14, 2026
@brendancol brendancol requested a review from Copilot May 14, 2026 04:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR closes a security regression where read_vrt(..., chunks=...) bypassed the VRT XML size cap introduced in #1818 by performing an unbounded open().read(). It routes the chunked dispatcher’s upfront XML load through _vrt._read_vrt_xml so both eager and chunked entry points consistently enforce XRSPATIAL_VRT_MAX_XML_BYTES.

Changes:

  • Update _read_vrt_chunked to read XML via _read_vrt_xml instead of open().read(), applying the same 64 MiB default cap and env-var override.
  • Add regression tests covering cap rejection, default-cap success, and raised-cap success for the chunked (chunks=) path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
xrspatial/geotiff/__init__.py Routes chunked-path VRT XML reading through _read_vrt_xml to enforce the size cap consistently.
xrspatial/geotiff/tests/test_vrt_xml_size_cap_chunked_1831.py Adds regression tests ensuring chunks= honors XRSPATIAL_VRT_MAX_XML_BYTES.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: read_vrt(chunks=) bypasses VRT XML size cap from #1818

2 participants