Describe the bug, including details regarding any error messages, version, and platform.
LocalInputFile.readFully(ByteBuffer) and LocalInputFile.read(ByteBuffer) in parquet-common are broken for any ByteBuffer that either (a) does not expose an accessible backing array or (b) has a non-zero position() (or a non-zero arrayOffset(), as slices do) when passed in. In practice, this means any call to ParquetFileReader.readFooter against an InputFile obtained from new LocalInputFile(path) can fail, because Parquet itself passes buffer shapes that trigger the bug.
Root cause
Both methods end with:
buf.put(buffer, buf.position() + buf.arrayOffset(), buf.remaining());
Two independent defects:
- Wrong argument semantics.
ByteBuffer.put(byte[] src, int offset, int length) treats offset as an index into the source array and always writes at the destination's current position. The source here is the freshly allocated local array, whose indices have nothing to do with buf.position() or buf.arrayOffset(). The call happens to work when both are zero; any other state either reads from the wrong offset or throws IndexOutOfBoundsException.
- arrayOffset() is not universally defined.
Direct buffers, memory-mapped buffers, and read-only views all throw UnsupportedOperationException from arrayOffset() (read-only heap buffers throw its subclass ReadOnlyBufferException), so the call fails before the put is even attempted.
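A minimal sketch of a shape-independent readFully, assuming the temporary byte[] approach is kept. FixedReadFully and its readFully(InputStream, ByteBuffer) signature are hypothetical stand-ins (a plain InputStream replaces Parquet's stream), not the Parquet API. The essential change is passing 0 as the source offset, because the relative put already writes at buf.position(); nothing calls array() or arrayOffset(), so direct and sliced buffers behave like heap buffers.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class FixedReadFully {
    // Hypothetical stand-in for LocalInputFile's readFully(ByteBuffer); a plain
    // InputStream replaces Parquet's stream. Fills every remaining byte of buf.
    static void readFully(InputStream in, ByteBuffer buf) throws IOException {
        byte[] buffer = new byte[buf.remaining()];
        // DataInputStream.readFully loops until the array is full, or throws EOFException.
        new DataInputStream(in).readFully(buffer);
        // Offset 0 indexes the source array; the relative put writes at buf.position()
        // and advances it. array() and arrayOffset() are never called.
        buf.put(buffer, 0, buffer.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        readFully(new ByteArrayInputStream(data), direct);
        System.out.println(direct.position()); // prints 8

        ByteBuffer heap = ByteBuffer.allocate(8);
        heap.put(new byte[] {0, 0}); // position=2, remaining=6
        readFully(new ByteArrayInputStream(data), heap);
        System.out.println(heap.get(2)); // prints 1: the first byte lands at the old position
    }
}
```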
read(ByteBuffer) has an additional bug: it copies buf.remaining() bytes into the destination regardless of how many bytes read(byte[]) actually returned (including -1 at end of file), corrupting the buffer with unfilled bytes on short reads and advancing position past the data actually read.
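The read(ByteBuffer) variant can be sketched the same way, under the same assumptions: FixedRead is a hypothetical class, and a plain InputStream stands in for the Parquet stream. It copies only the n bytes that read(byte[]) returned and leaves buf untouched when the read returns -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class FixedRead {
    // Hypothetical stand-in for LocalInputFile's read(ByteBuffer); a plain InputStream
    // replaces Parquet's stream. Returns the byte count, or -1 at end of stream.
    static int read(InputStream in, ByteBuffer buf) throws IOException {
        byte[] buffer = new byte[buf.remaining()];
        int n = in.read(buffer); // may be less than buffer.length, or -1 at EOF
        if (n > 0) {
            buf.put(buffer, 0, n); // advances buf.position() by exactly n
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        ByteBuffer buf = ByteBuffer.allocateDirect(8);
        int n = read(in, buf);
        System.out.println(n + " " + buf.position()); // prints "3 3" (short read)
        n = read(in, buf);
        System.out.println(n + " " + buf.position()); // prints "-1 3" (EOF, buf untouched)
    }
}
```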
Stack trace
java.lang.UnsupportedOperationException
at java.base/java.nio.ByteBuffer.arrayOffset(ByteBuffer.java:1558)
at org.apache.parquet.io.LocalInputFile$1.readFully(LocalInputFile.java:93)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:642)
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:578)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:971)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:961)
Minimal reproducer
import java.nio.ByteBuffer;
import java.nio.file.Path;
import org.apache.parquet.io.LocalInputFile;
import org.apache.parquet.io.SeekableInputStream;

Path path = /* any existing Parquet file */;
try (SeekableInputStream s = new LocalInputFile(path).newStream()) {
    s.readFully(ByteBuffer.allocateDirect(8)); // throws UnsupportedOperationException
}
try (SeekableInputStream s = new LocalInputFile(path).newStream()) {
    ByteBuffer heap = ByteBuffer.allocate(8);
    heap.put(new byte[] {0, 0}); // position=2
    s.readFully(heap); // reads from wrong offset in source array
}
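The two failure modes can also be reproduced with the JDK alone, without a Parquet file. This sketch copies the buggy put expression verbatim but assumes, as an unconfirmed detail, that the temporary array is allocated with buf.remaining() bytes; BuggyPutDemo and its methods are hypothetical. Under that assumption, a non-zero position makes offset + length exceed the array length, so the heap case surfaces as IndexOutOfBoundsException.

```java
import java.nio.ByteBuffer;

final class BuggyPutDemo {
    // Mirrors the expression from LocalInputFile, ASSUMING the temporary array is
    // allocated with buf.remaining() bytes (not confirmed by this report).
    static void buggyCopy(ByteBuffer buf) {
        byte[] buffer = new byte[buf.remaining()];
        buf.put(buffer, buf.position() + buf.arrayOffset(), buf.remaining());
    }

    // Returns "ok" or the simple name of the runtime exception that was thrown.
    static String attempt(ByteBuffer buf) {
        try {
            buggyCopy(buf);
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt(ByteBuffer.allocate(8)));       // ok
        System.out.println(attempt(ByteBuffer.allocateDirect(8))); // UnsupportedOperationException
        ByteBuffer heap = ByteBuffer.allocate(8);
        heap.put(new byte[] {0, 0});                               // position=2
        System.out.println(attempt(heap));                         // IndexOutOfBoundsException
    }
}
```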
Version
Component(s)
Core