Tiled JPEG TIFFs from GDAL fail because reader does not inject JPEGTables (tag 347) #1502

@brendancol

Problem

GDAL writes tiled JPEG TIFFs (compress=JPEG photometric=YCBCR) where each tile is a JPEG fragment that depends on shared quantization and Huffman tables stored once in the file under tag 347 (JPEGTables). The reader does not inject those tables into each per-tile stream, so reading any GDAL-written tiled JPEG TIFF raises:

OSError: broken data stream when reading image file

xrspatial-written JPEG TIFFs happen to be self-contained per tile, which is why the existing test suite does not catch this.
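One way to see the difference between the two kinds of tile is to list the JPEG markers in a tile's header. The sketch below uses only the standard library. The names `list_markers` and `has_own_tables` are hypothetical helpers and are not part of xrspatial. It assumes baseline Huffman-coded JPEG, where a self-contained tile carries both DQT and DHT segments before its scan data.

```python
import struct

# Marker codes from the JPEG standard (second byte after 0xFF).
SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
DQT, DHT = 0xDB, 0xC4


def list_markers(data: bytes) -> list:
    """Return the marker codes of a JPEG stream in order, stopping at SOS or EOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI")
    markers = [SOI]
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        markers.append(code)
        if code in (EOI, SOS):
            # Entropy-coded data follows SOS; the header ends here.
            break
        if pos + 4 > len(data):
            raise ValueError("truncated marker segment")
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return markers


def has_own_tables(tile: bytes) -> bool:
    """True if the tile header defines both quantization and Huffman tables."""
    codes = list_markers(tile)
    return DQT in codes and DHT in codes
```

Under these assumptions, `has_own_tables` should return `False` for a raw tile from a GDAL-written file and `True` for a tile written by xrspatial, which matches the behaviour described above.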

Where

xrspatial/geotiff/_compression.py:820-829 — JPEG decode path.

Repro

import numpy as np
import rasterio

from xrspatial.geotiff import open_geotiff

arr = (np.random.rand(512, 512, 3) * 255).astype('uint8')
with rasterio.open('/tmp/jpegtiled.tif', 'w', driver='GTiff',
                   height=512, width=512, count=3, dtype='uint8',
                   tiled=True, blockxsize=256, blockysize=256,
                   compress='JPEG', photometric='YCBCR') as dst:
    dst.write(np.transpose(arr, (2, 0, 1)))

open_geotiff('/tmp/jpegtiled.tif')  # OSError
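To confirm that a file such as the one written above carries tag 347, the first IFD can be inspected directly. This is a minimal standard-library sketch and not the xrspatial parser. `read_jpeg_tables` is a hypothetical helper that handles only classic TIFF (not BigTIFF), looks only at the first IFD, and assumes the tag holds byte-sized values.

```python
import struct

JPEG_TABLES_TAG = 347  # TIFF tag id for JPEGTables


def read_jpeg_tables(buf: bytes):
    """Return the raw JPEGTables bytes from the first IFD, or None if absent."""
    if buf[:2] == b"II":
        endian = "<"  # little-endian file
    elif buf[:2] == b"MM":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic 42) is handled here")
    (n_entries,) = struct.unpack(endian + "H", buf[ifd_offset:ifd_offset + 2])
    for i in range(n_entries):
        # Each IFD entry is 12 bytes: tag, type, count, value-or-offset.
        start = ifd_offset + 2 + 12 * i
        tag, _typ, count = struct.unpack(endian + "HHI", buf[start:start + 8])
        if tag != JPEG_TABLES_TAG:
            continue
        # Assumption: one byte per value, so count is the byte length.
        if count <= 4:
            # Short values are stored inline in the entry itself.
            return buf[start + 8:start + 8 + count]
        (value_offset,) = struct.unpack(endian + "I", buf[start + 8:start + 12])
        return buf[value_offset:value_offset + count]
    return None
```

For the small repro file, something like `read_jpeg_tables(open('/tmp/jpegtiled.tif', 'rb').read())` should be enough. A real reader would pick the value up from the IFD it has already parsed rather than re-reading the file.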

Fix sketch

Read tag 347 (JPEGTables) once at IFD parse time. Before passing each tile's JPEG bytes to the decoder, splice the tables into the tile stream: JPEGTables ends with EOI and each tile starts with SOI, so drop the tables' trailing EOI and the tile's leading SOI, then concatenate. A plain prepend without that trimming would leave an EOI and a second SOI in the middle of the stream.
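The splice can be sketched in a few lines. `inject_jpeg_tables` is a hypothetical name, not an existing xrspatial function. The sketch assumes the tag value is exactly an SOI ... EOI stream with no trailing padding, and it passes tiles through unchanged when the file has no JPEGTables, so self-contained tiles keep working.

```python
from typing import Optional

SOI = b"\xff\xd8"  # start of image
EOI = b"\xff\xd9"  # end of image


def inject_jpeg_tables(tile: bytes, jpeg_tables: Optional[bytes]) -> bytes:
    """Merge shared JPEGTables into one tile so it decodes as a full JPEG."""
    if not jpeg_tables:
        # No tag 347 in the file: the tile is assumed to be self-contained.
        return tile
    if not tile.startswith(SOI):
        raise ValueError("tile does not start with SOI")
    if not (jpeg_tables.startswith(SOI) and jpeg_tables.endswith(EOI)):
        raise ValueError("JPEGTables is not an SOI ... EOI stream")
    # Keep the tables' SOI and table segments, drop their EOI,
    # then append the tile without its own SOI.
    return jpeg_tables[:-2] + tile[2:]
```

The merged bytes start with a single SOI, carry the shared DQT and DHT segments, and end with the tile's own EOI, so they can be handed to the existing JPEG decode path in place of the raw tile bytes.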

Severity

Interop break, not numerical drift. Discovered during geotiff accuracy sweep but deferred from that sweep because pixel-value semantics are correct once the JPEG decodes.

Metadata

Labels: bug (Something isn't working)