GPU predictor=2 produces wrong values on big-endian TIFFs #1517

@brendancol

Problem

After #1515 fixes the big-endian (BE) byteswap crash in read_geotiff_gpu, BE multi-byte TIFFs written with predictor=2 no longer raise. They silently return wrong values instead.

Reproducer

import numpy as np, tifffile
from xrspatial.geotiff import read_geotiff_gpu
from xrspatial.geotiff._reader import read_to_array

rng = np.random.RandomState(20260507)
arr = rng.randint(-1_000_000, 1_000_000, size=(32, 48), dtype=np.int64).astype(np.int32)
tifffile.imwrite('be_pred2.tif', arr, byteorder='>', predictor=2,
                 compression='deflate', tile=(16, 16))

cpu, _ = read_to_array('be_pred2.tif')         # correct
gpu = read_geotiff_gpu('be_pred2.tif').data    # ~93% of pixels mismatch

The CPU path is correct (PR #1507 fixed BE predictor=2 on the CPU side).

Root cause sketch

_apply_predictor_and_assemble in _gpu_decode.py runs _gpu_predictor2_decode before the final BE byteswap. The kernel views the byte buffer as native uint16/uint32 and accumulates the stored differences as a prefix sum in that interpretation. BE files store samples MSB-first, so the addition carries propagate across the wrong bytes and the accumulated values come out wrong. The prefix sum has to run on native-endian samples: either the byteswap happens first (at the byte-buffer level, which is awkward across tile rows), or the kernel itself takes an endian flag, as _fp_predictor_decode_kernel already does for predictor=3.
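The ordering problem can be illustrated on the CPU with plain NumPy. This is a minimal sketch, not the actual CUDA kernel: encode_pred2_be, decode_swap_last, and decode_swap_first are hypothetical names made up for this illustration, and the "native" view is assumed to be little-endian, as it is on the GPU.

```python
import numpy as np

def encode_pred2_be(row):
    """Horizontal differencing (predictor=2) on int32 samples, stored big-endian.
    Hypothetical helper for illustration only."""
    samples = np.ascontiguousarray(row, dtype=np.int32).view(np.uint32)
    diffs = samples.copy()
    diffs[1:] = samples[1:] - samples[:-1]      # wraps modulo 2**32
    return diffs.astype('>u4').tobytes()

def decode_swap_last(buf):
    """Order described above: prefix-sum the raw bytes viewed as
    little-endian uint32, then byteswap the result."""
    raw = np.frombuffer(buf, dtype='<u4')       # wrong interpretation of BE data
    acc = np.cumsum(raw, dtype=np.uint32)       # carries cross the wrong bytes
    return acc.byteswap().view(np.int32)

def decode_swap_first(buf):
    """Byteswap first (read as big-endian), then prefix-sum native samples."""
    samples = np.frombuffer(buf, dtype='>u4').astype(np.uint32)
    acc = np.cumsum(samples, dtype=np.uint32)   # wraps modulo 2**32
    return acc.view(np.int32)

row = np.array([255, 256, -1, 70000], dtype=np.int32)
buf = encode_pred2_be(row)

good = decode_swap_first(buf)   # matches row
bad = decode_swap_last(buf)     # only the first sample survives
```

The first sample round-trips in both orderings because no addition has touched it yet. Every later sample depends on a carry chain, which is why the swap has to precede the accumulation, or the kernel has to assemble each sample in the file's byte order before adding.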

Severity

LOW for now: the silent fallback in the wrapper still produces correct output. After #1515 lands, predictor=2 BE files will hit this code path for real and return wrong data, and because the wrapper does not detect that the values disagree, nothing will flag the error.

Related

#1508, #1515, #1507.
