Summary
PR #1818 capped VRT XML reads at 64 MiB (override via XRSPATIAL_VRT_MAX_XML_BYTES) to prevent a multi-GB attacker-supplied VRT file from exhausting memory at parse time. The cap is enforced in xrspatial/geotiff/_vrt.py::_read_vrt_xml.
The lazy chunked dispatcher merged in #1822 (_read_vrt_chunked in xrspatial/geotiff/__init__.py) parses the VRT independently and uses an unbounded read on the way in:
# xrspatial/geotiff/__init__.py:4437-4440
with open(source, 'r') as f:
    xml_str = f.read()
vrt_dir = _os.path.dirname(_os.path.abspath(source))
vrt = parse_vrt(xml_str, vrt_dir)
So read_vrt(path, chunks=...) reverts to the pre-#1818 behaviour: a 10 GB VRT file is read into memory in full by f.read() before any guard fires. The eager read_vrt(path) path and the per-task _vrt_chunk_read helper (which calls _read_vrt_internal) both go through _read_vrt_xml, so the bypass is confined to the parent dispatch path.
Reproduction
import os, tempfile, numpy as np
from xrspatial.geotiff import to_geotiff, read_vrt
td = tempfile.mkdtemp()
src = os.path.join(td, 'src.tif')
to_geotiff(np.zeros((10,10), np.uint8), src, compression='none')
PAD = 2 * 1024 * 1024 # 2 MiB padding > the 1 KiB cap below
vrt = os.path.join(td, 'big.vrt')
xml = '<VRTDataset rasterXSize="10" rasterYSize="10">\n'
xml += '<!-- ' + 'x' * PAD + ' -->\n'
xml += '<VRTRasterBand dataType="Byte" band="1"><SimpleSource>'
xml += '<SourceFilename relativeToVRT="1">src.tif</SourceFilename>'
xml += '<SourceBand>1</SourceBand>'
xml += '<SrcRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
xml += '<DstRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
xml += '</SimpleSource></VRTRasterBand></VRTDataset>'
with open(vrt, 'w') as f:
    f.write(xml)
os.environ['XRSPATIAL_VRT_MAX_XML_BYTES'] = '1024'
# Eager path: raises ValueError, as documented.
try:
    read_vrt(vrt)
except ValueError as e:
    print('eager path capped:', e)
# Chunked path: succeeds and returns a (10, 10) uint8 array -- cap bypassed.
arr = read_vrt(vrt, chunks=10)
print('REGRESSION:', arr.shape)
Fix
_read_vrt_chunked should reuse _vrt._read_vrt_xml instead of its own unbounded open().read(), so that it inherits both the 64 MiB cap and the XRSPATIAL_VRT_MAX_XML_BYTES override.
Severity
MEDIUM (Cat 1 -- unbounded allocation). The chunked path is a public API surface, but the parent dispatcher only runs once per top-level call. Still, an attacker-supplied VRT file combined with a workflow that uses chunks= (the dask-backed read path for large mosaics) reaches the uncapped f.read() directly, defeating a guard that landed two commits ago.
Found by automated security sweep on deep-sweep-security-geotiff-2026-05-13-s4.