geotiff: read_vrt(chunks=) bypasses VRT XML size cap from #1818 #1831

@brendancol

Summary

PR #1818 capped VRT XML reads at 64 MiB (override via XRSPATIAL_VRT_MAX_XML_BYTES) to prevent a multi-GB attacker-supplied VRT file from exhausting memory at parse time. The cap is enforced in xrspatial/geotiff/_vrt.py::_read_vrt_xml.

The lazy chunked dispatcher merged in #1822 (_read_vrt_chunked in xrspatial/geotiff/__init__.py) parses the VRT independently and uses an unbounded read on the way in:

# xrspatial/geotiff/__init__.py:4437-4440
with open(source, 'r') as f:
    xml_str = f.read()
vrt_dir = _os.path.dirname(_os.path.abspath(source))
vrt = parse_vrt(xml_str, vrt_dir)

So read_vrt(path, chunks=...) reverts to the pre-#1818 behaviour: a 10 GB VRT file is read in full before any guard fires. The eager read_vrt(path) path and the per-task _vrt_chunk_read helper (which calls _read_vrt_internal) both go through _read_vrt_xml, so only the parent dispatch path in _read_vrt_chunked is affected.

Reproduction

import os, tempfile, numpy as np
from xrspatial.geotiff import to_geotiff, read_vrt

td = tempfile.mkdtemp()
src = os.path.join(td, 'src.tif')
to_geotiff(np.zeros((10,10), np.uint8), src, compression='none')

PAD = 2 * 1024 * 1024  # 2 MiB padding > the 1 KiB cap below
vrt = os.path.join(td, 'big.vrt')
xml = '<VRTDataset rasterXSize="10" rasterYSize="10">\n'
xml += '<!-- ' + 'x' * PAD + ' -->\n'
xml += '<VRTRasterBand dataType="Byte" band="1"><SimpleSource>'
xml += '<SourceFilename relativeToVRT="1">src.tif</SourceFilename>'
xml += '<SourceBand>1</SourceBand>'
xml += '<SrcRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
xml += '<DstRect xOff="0" yOff="0" xSize="10" ySize="10"/>'
xml += '</SimpleSource></VRTRasterBand></VRTDataset>'
with open(vrt, 'w') as f:
    f.write(xml)

os.environ['XRSPATIAL_VRT_MAX_XML_BYTES'] = '1024'

# Eager path: raises ValueError as documented.
try:
    read_vrt(vrt)
except ValueError as e:
    print('eager path rejected:', e)

# Chunked path: succeeds, returns (10, 10) uint8 -- cap bypassed.
arr = read_vrt(vrt, chunks=10)
print('REGRESSION:', arr.shape)

Fix

_read_vrt_chunked should reuse _vrt._read_vrt_xml instead of its own unbounded open().read() so it inherits the cap and the XRSPATIAL_VRT_MAX_XML_BYTES override.
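
As an illustration of the kind of guard the chunked path would inherit, here is a minimal sketch of a size-capped XML read. It is not the actual body of _read_vrt_xml, which this issue does not show. The names read_xml_capped, max_xml_bytes, DEFAULT_MAX_XML_BYTES and ENV_VAR are hypothetical, and the error message and override parsing are assumptions. Only the 64 MiB default, the environment variable name and the ValueError come from the description above.

```python
import os

# Hypothetical constants for illustration; the real helper lives in
# xrspatial/geotiff/_vrt.py and may be structured differently.
DEFAULT_MAX_XML_BYTES = 64 * 1024 * 1024  # 64 MiB default cap, per PR #1818
ENV_VAR = "XRSPATIAL_VRT_MAX_XML_BYTES"


def max_xml_bytes():
    """Return the cap in bytes, honouring the environment override if set."""
    raw = os.environ.get(ENV_VAR)
    if raw is None:
        return DEFAULT_MAX_XML_BYTES
    return int(raw)


def read_xml_capped(path):
    """Read a VRT file as text, refusing files larger than the cap.

    Reads at most limit + 1 bytes, so an oversized file is detected
    without ever holding more than limit + 1 bytes in memory.
    """
    limit = max_xml_bytes()
    with open(path, "rb") as f:
        data = f.read(limit + 1)
    if len(data) > limit:
        raise ValueError(
            f"VRT XML exceeds {limit} bytes (set {ENV_VAR} to override)"
        )
    return data.decode("utf-8")
```

The bounded f.read(limit + 1) is the important part: checking the size only after an unbounded read() would leave the allocation problem in place.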

Severity

MEDIUM (Cat 1 -- unbounded allocation). The chunked path is a public API surface, although the parent dispatcher runs only once per top-level call. Even so, an attacker-supplied VRT file combined with a workflow that uses chunks= (the dask-backed read path for large mosaics) reaches the uncapped f.read() directly, defeating a guard that landed two commits ago.

Found by automated security sweep on deep-sweep-security-geotiff-2026-05-13-s4.
