refactor: unify ResultSet implementations on Arrow-backed path#175
mkaufmann wants to merge 24 commits into
…ArrowStreamReader
Rework the ResultSet unification to address two reviewer requests on #175:
1. Share the vector-building code with the parameter-encoding path instead of having a dedicated MetadataArrowBuilder. VectorPopulator now exposes a row-indexed primitive (setCell) used by both callers. The existing single-row parameter-binding overload and a new many-row metadata overload both funnel through it, and all the individual vector setters are parameterised by row index.
2. Keep ArrowStreamReaderCursor on its original ArrowStreamReader-only interface. The metadata path now serialises a populated VSR to Arrow IPC bytes and wraps the result in a ByteArrayInputStream-backed ArrowStreamReader, so both streaming and metadata result sets travel through exactly the same reader/cursor plumbing.
Supporting changes:
- typeName overrides (e.g. "TEXT" for JDBC-spec metadata columns) now round-trip through Arrow via a jdbc:type_name field-metadata key rather than a columns-override parameter on StreamingResultSet. HyperTypeToArrow stamps the key on write; ArrowToHyperTypeMapper.toColumnMetadata reads it back.
- StreamingResultSet drops the ofInMemory(...) factory and the columns override; callers construct an ArrowStreamReader + BufferAllocator pair and hand them to of(reader, allocator, queryId, zone). The cursor owns both and closes reader-then-allocator on close.
- QueryResultArrowStream.toArrowStreamReader returns a simple Result holder (reader + allocator) instead of an AutoCloseable bundle.
- MetadataResultSets is the single entry point for Arrow-backed metadata result sets; MetadataArrowBuilder is deleted.
- Empty metadata results skip writeBatch() entirely so ArrowStreamReaderCursor doesn't interpret a zero-row batch as "at least one row available".
- Tests updated to the new API; StreamingResultSetMethodTest builds its in-memory ResultSet the same way as the metadata path (IPC round-trip).
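The "row-indexed primitive shared by both callers" idea can be sketched in plain Java. This is only an illustration under stated assumptions: `RowIndexedPopulatorSketch`, `populateRow`, `populateRows`, and the `Object[][]` column store are hypothetical stand-ins, not the driver's VectorPopulator or Arrow vectors; only the name `setCell` comes from the commit message.

```java
import java.util.List;

/** Hypothetical sketch: plain arrays stand in for Arrow vectors (columns[col][row]). */
final class RowIndexedPopulatorSketch {
    private RowIndexedPopulatorSketch() {}

    /** Row-indexed primitive: writes one value into the given column at the given row. */
    static void setCell(Object[][] columns, int col, int row, Object value) {
        columns[col][row] = value;
    }

    /** Single-row caller (parameter binding in the commit's terms): always writes row 0. */
    static void populateRow(Object[][] columns, List<Object> parameters) {
        for (int col = 0; col < parameters.size(); col++) {
            setCell(columns, col, 0, parameters.get(col));
        }
    }

    /** Many-row caller (metadata rows in the commit's terms): writes every row through the same primitive. */
    static void populateRows(Object[][] columns, List<List<Object>> rows) {
        for (int row = 0; row < rows.size(); row++) {
            List<Object> cells = rows.get(row);
            for (int col = 0; col < cells.size(); col++) {
                setCell(columns, col, row, cells.get(col));
            }
        }
    }
}
```

Both entry points differ only in which row index they pass, which is the property the commit relies on when it parameterises the individual setters by row.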
9ecba8a to 08d62bc
b648940 to 07125b1
Now that QueryJDBCAccessor.getObject(Class) provides the raw + isInstance fallback as its base-class default, StreamingResultSet no longer needs the catch-and-retry path that worked around accessors which threw "Operation not supported." Collapse getObject(int, Class) to direct dispatch and update the regression test's WHY comment to point at the accessor base class as the load-bearing layer. Addresses: review comment on PR #175 line 388.
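A rough sketch of what a "raw + isInstance" base-class default could look like. `AccessorSketch` and its `rawValue` field are hypothetical stand-ins rather than the real QueryJDBCAccessor, and throwing SQLException on a type mismatch is an assumption about the behavior, not something the commit message states.

```java
import java.sql.SQLException;

/** Hypothetical stand-in for an accessor base class; not the driver's QueryJDBCAccessor. */
class AccessorSketch {
    private final Object rawValue;

    AccessorSketch(Object rawValue) {
        this.rawValue = rawValue;
    }

    /** Raw value as a concrete accessor would surface it. */
    Object getObject() {
        return rawValue;
    }

    /** Default typed getter: fetch the raw object and return it only if it already has the requested type. */
    <T> T getObject(Class<T> type) throws SQLException {
        Object raw = getObject();
        if (raw == null) {
            return null; // SQL NULL maps to null for any requested type
        }
        if (type.isInstance(raw)) {
            return type.cast(raw);
        }
        // Assumption: a mismatch is reported as SQLException rather than converted.
        throw new SQLException("Cannot convert " + raw.getClass().getName() + " to " + type.getName());
    }
}
```

With a default like this in the base class, the result set can dispatch straight to the accessor without a catch-and-retry wrapper, which is the simplification the commit describes.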
Three small follow-ups from PR #175 review:
- StreamingResultSet.of: drop the paragraph that pointed at the HyperTypeToArrow.JDBC_TYPE_NAME_METADATA_KEY field-metadata key. The docstring spilled implementation detail of the metadata-stamping path into a generic "create a result set from a reader" entry-point; the type-name override is documented at HyperTypeToArrow / ColumnMetadata where it's relevant.
- ArrowStreamReaderCursor.loadNextNonEmptyBatch: rewrite the rationale to answer "why does the cursor consume empty batches instead of the caller?" directly. Empty IPC batches are valid Arrow and producers emit them; JDBC's next() only knows rows, so this cursor is the seam that translates batch-level signals into row-level advances.
- MetadataResultSetsTest: drop the JDBC ResultSet-shape slice (next / isClosed / getStatement / unwrap / isWrapperFor / getHoldability / getFetchSize / setFetchSize / getWarnings / getConcurrency / getType / getFetchDirection). Those test the StreamingResultSet plumbing shared by every result set on this branch and are already covered by StreamingResultSetMethodTest. Keep the arity-contract slice (short/long/right/null/empty rows); that is the metadata-result-set-specific behavior.
Addresses: review comments on PR #175.
07125b1 to e329860
e329860 to d17f1b0
Per the two review threads, split out the cherry-pickable fixes as their own PRs against
This PR (#175) keeps the same fixes as the first two commits; when #185 / #186 land, those commits will collapse to no-ops at rebase time. For the remaining "should QueryResultArrowStream allocator-ownership move pre-unify too?" thread (#175 review): waiting on your call before I do that split. As I noted there, it's a non-trivial surgery on the unify commit and I'd rather get your sign-off before rewriting ~800 lines of refactor.
d17f1b0 to f4cad29
67ddd24 to b97abe2
StreamingResultSet.of catches IOException and IllegalArgumentException from the Arrow schema decode and rewraps as SQLException. At all four query-path call sites (DataCloudConnection.getRowBasedResultSet, getChunkBasedResultSet, DataCloudStatement.executeQuery, getResultSet) the surrounding try-catch only catches StatusRuntimeException, so a SQLException thrown from of() bypasses it and leaks the 100 MB RootAllocator returned by QueryResultArrowStream.toArrowStreamReader. Introduce StreamingResultSet.ofClosingOnFailure(Result, queryId, sessionZone) that takes the reader+allocator pair and closes both on construction failure (reader first so its buffers release before the allocator's budget check). Switch all four call sites to it. The metadata path in MetadataResultSets.of already had this shape; this fixes the matching gap on the query side. Add a regression test that builds an Arrow IPC stream with an unsupported field type (LargeUtf8) and asserts the helper closes both the reader and the allocator on the resulting SQLException.
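The close-on-failure shape described here can be sketched without Arrow. Everything below is a hypothetical stand-in: `Tracked` plays the role of both reader and allocator, `Decoder` plays the role of the schema decode, and the method name merely mirrors the commit's ofClosingOnFailure. It is a sketch of the pattern, not the driver's code.

```java
import java.io.IOException;
import java.sql.SQLException;

final class CloseOnFailureSketch {
    private CloseOnFailureSketch() {}

    /** Hypothetical resource that records whether it was closed. */
    static final class Tracked implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for the construction step that may fail. */
    interface Decoder<T> {
        T decode(Tracked reader) throws IOException;
    }

    /** Builds a result; on failure closes reader first, then allocator, and rethrows as SQLException. */
    static <T> T ofClosingOnFailure(Tracked reader, Tracked allocator, Decoder<T> decoder)
            throws SQLException {
        try {
            return decoder.decode(reader);
        } catch (IOException | IllegalArgumentException e) {
            SQLException wrapped = new SQLException("Failed to construct result set", e);
            closeAndSuppress(reader, wrapped); // reader first so its buffers are released
            closeAndSuppress(allocator, wrapped); // allocator last
            throw wrapped;
        }
    }

    /** Closes a resource, attaching any close failure to the primary exception. */
    private static void closeAndSuppress(AutoCloseable resource, Exception primary) {
        try {
            resource.close();
        } catch (Exception closeFailure) {
            primary.addSuppressed(closeFailure);
        }
    }
}
```

Because the cleanup lives inside the factory, callers whose surrounding try-catch only handles a different exception type still cannot leak the allocator on a construction failure.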
The Int/SmallInt/TinyInt setters widened from concrete boxed types (Integer/Short/Byte) to Number so metadata rows could pass long values, but lost the implicit "right boxed type" check at the call sites that went through DataCloudPreparedStatement.setObject for parameter binding. A user binding Long.MAX_VALUE to an INT32 parameter would silently get (int) Long.MAX_VALUE = -1 written to the vector. Add an explicit range check on Int/SmallInt/TinyInt setters before narrowing. Both the metadata path and the parameter-binding path go through these setters, so strict checks here mean strict on both paths. BigInt accepts the full long range and is unchanged. Pin the behavior with a focused unit test (IntegerVectorSetterRangeCheckTest).
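A minimal sketch of a range check before narrowing, to make the hazard concrete. `NarrowingCheck` and `toInt32` are hypothetical names, and IllegalArgumentException is an assumed failure type; the driver's setters may report the error differently.

```java
final class NarrowingCheck {
    private NarrowingCheck() {}

    /**
     * Narrows an integral Number to int, rejecting values outside the INT32 range.
     * Assumes integral input; longValue() would truncate a fractional value.
     */
    static int toInt32(Number value) {
        long asLong = value.longValue();
        if (asLong < Integer.MIN_VALUE || asLong > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Value " + asLong + " is out of range for INT32");
        }
        return (int) asLong; // safe: the range was verified above
    }

    public static void main(String[] args) {
        // An unchecked cast keeps only the low 32 bits, so Long.MAX_VALUE becomes -1.
        System.out.println((int) Long.MAX_VALUE);
        System.out.println(toInt32(42L));
    }
}
```

The same pattern would apply to the SmallInt and TinyInt cases with the Short and Byte bounds.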
The driver round-trips JDBC-spec type-name overrides (e.g. "TEXT" for metadata columns) through Arrow field metadata under a custom key. The previous key "jdbc:type_name" used an unprefixed namespace not reserved by the Arrow spec — Hyper, query-federator, or another Arrow producer could emit a same-named key in a future protocol version, in which case ArrowToHyperTypeMapper would silently override its own derived type name with whatever upstream stamped. Rename to "datacloud-jdbc:type_name" so the namespace is unambiguous, and expand the field's javadoc to document the namespace rationale.
The fallback in ArrowToHyperTypeMapper.toColumnMetadata — when a field
has no datacloud-jdbc:type_name override, ColumnMetadata.typeName is
null and the JDBC layer derives the column type-name from the
HyperType — was load-bearing but unasserted. Real Hyper Arrow streams
never stamp the override, so every functional query test exercised the
fallback implicitly; if a future refactor broke it, the regression
would not surface in the existing suite.
Two new pin tests:
- ArrowToHyperTypeMapperTest at the unit boundary: field with override
-> typeName matches; field without override (null metadata, empty
metadata) -> typeName is null.
- StreamingResultSetTest.getColumnTypeNameFallsBackToDerivedNameOnRealHyperStream
end-to-end against local Hyper: executeQuery on a select with INT,
VARCHAR, DECIMAL columns asserts ResultSetMetaData.getColumnTypeName
returns the derived names ("INTEGER", "VARCHAR", "DECIMAL").
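The override-or-fallback rule these tests pin can be sketched with a plain map standing in for Arrow field metadata. `TypeNameOverrideSketch` and both method names are hypothetical; only the key string "datacloud-jdbc:type_name" is taken from the commits above.

```java
import java.util.Map;

final class TypeNameOverrideSketch {
    /** Field-metadata key named in the commit message. */
    static final String TYPE_NAME_KEY = "datacloud-jdbc:type_name";

    private TypeNameOverrideSketch() {}

    /** Returns the stamped override, or null when the metadata is null, empty, or lacks the key. */
    static String overrideOrNull(Map<String, String> fieldMetadata) {
        return fieldMetadata == null ? null : fieldMetadata.get(TYPE_NAME_KEY);
    }

    /** Resolves the reported type name: the override wins, otherwise the derived name is used. */
    static String columnTypeName(Map<String, String> fieldMetadata, String derivedName) {
        String override = overrideOrNull(fieldMetadata);
        return override != null ? override : derivedName;
    }
}
```

A stream that never stamps the key always takes the derived-name branch, which is why the fallback matters for every ordinary query.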
Drive-by pin test: StreamingResultSet.getObject(int, Map<String,Class<?>>) with a null or empty type map should behave like plain getObject(int) per the JDBC spec. Previously not asserted anywhere. The companion getObject(Class) fallback test landed earlier on this branch, bundled into the QueryJDBCAccessor base-class fix commit so the fix and its end-to-end coverage ship as a single cherry-pickable unit.
Previously a row with the wrong number of elements would silently leave the trailing columns as Arrow null (interpreted as missing values). Today every caller routes through MetadataSchemas so the sizes match by construction, but a future caller bug would surface only inside vector population, far from the boundary. Add an explicit arity check at the of(...) entrypoint: each non-null row must have exactly columns.size() elements. Null rows are accepted as the all-nulls row (matching the legacy coerceRows convention of turning null into emptyList). Empty rows are accepted only when the schema is also empty. Pin behavior with MetadataResultSetsTest covering short, long, correct-arity, null-row, and empty-rows cases.
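The arity contract can be sketched as a small boundary check. `ArityCheckSketch.checkArity` is a hypothetical helper, and IllegalArgumentException is an assumed failure type; the real of(...) entrypoint may signal the error differently.

```java
import java.util.List;

final class ArityCheckSketch {
    private ArityCheckSketch() {}

    /**
     * Verifies that every non-null row has exactly columnCount elements.
     * A null row is accepted and treated as the all-nulls row.
     */
    static void checkArity(int columnCount, List<List<Object>> rows) {
        for (int i = 0; i < rows.size(); i++) {
            List<Object> row = rows.get(i);
            if (row == null) {
                continue; // all-nulls row
            }
            if (row.size() != columnCount) {
                throw new IllegalArgumentException(
                        "Row " + i + " has " + row.size() + " elements, expected " + columnCount);
            }
        }
    }
}
```

An empty row passes only when columnCount is zero, which matches the "empty rows only with an empty schema" rule without a separate branch.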
Now that ArrowStreamReaderCursor.loadNextNonEmptyBatch (introduced earlier on this branch as a pre-unify cursor fix) consumes empty batches at the cursor seam, MetadataResultSets.writeArrowStream no longer needs its own "skip writeBatch when rowCount==0" workaround: the cursor handles the empty-only case correctly. Remove the special case and always emit a batch. Tightens the zeroRowOnlyBatchYieldsNoRows test docstring to match.
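How a cursor can absorb empty batches so that next() only ever reports rows is sketched below. `BatchCursorSketch` is a hypothetical stand-in that uses lists of strings as batches; it is not ArrowStreamReaderCursor, and only the idea of a "load next non-empty batch" loop comes from the commit message.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical cursor over batches; each batch is a list of row values. */
final class BatchCursorSketch {
    private final Iterator<List<String>> batches;
    private List<String> current = List.of();
    private int rowIndex = -1;

    BatchCursorSketch(Iterator<List<String>> batches) {
        this.batches = batches;
    }

    /** Advances to the next row, skipping any number of empty batches along the way. */
    boolean next() {
        if (rowIndex + 1 < current.size()) {
            rowIndex++;
            return true;
        }
        // Current batch is exhausted: keep loading until a batch with rows shows up.
        while (batches.hasNext()) {
            current = batches.next();
            if (!current.isEmpty()) {
                rowIndex = 0;
                return true;
            }
        }
        // Stream exhausted: reset so repeated calls keep returning false.
        current = List.of();
        rowIndex = -1;
        return false;
    }

    /** Value of the current row; only valid after next() returned true. */
    String value() {
        return current.get(rowIndex);
    }
}
```

With the skipping done here, a producer that emits a single zero-row batch yields no rows, so the writer side no longer needs a special case.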
DataCloudMetadataResultSet was deleted in this PR, but the test file retained the old name and lived in the wrong package. Merge its empty- result-set JDBC-shape smoke tests into the new MetadataResultSetsTest under the .core.metadata package and delete the legacy file. No behavior change.
StreamingResultSet had two public factories: of(reader, allocator, queryId[, zone]) (4 callers) and ofClosingOnFailure(Result, queryId, zone) (5 callers). Every production caller wanted the close-on-failure behavior; only tests and the metadata helper used the bare of(). Two factories with overlapping responsibilities is one too many: a caller hitting the bare of() and not knowing about ofClosingOnFailure would silently leak the 100 MB RootAllocator on construction failure.
Collapse to one public factory:
- of(QueryResultArrowStream.Result, queryId, sessionZone): the only factory callers see; it always closes both reader and allocator on failure. Name is the unambiguous "of" because there is no other.
- create(reader, allocator, queryId, sessionZone): private; just the construction body the factory wraps.
Production call sites (DataCloudConnection, DataCloudStatement) and MetadataResultSets were already passing a (reader, allocator) pair, so the call shape collapses to passing the Result holder. Tests that were building the pair locally now wrap it in a Result the same way.
…r interface
Pre-unify there were three result-set implementations: StreamingResultSet (streaming Arrow query results), DataCloudMetadataResultSet (metadata), SimpleResultSet (in-memory rows). The DataCloudResultSet interface, a one-method (getQueryId) extension over java.sql.ResultSet, was the common "implements" the public API surfaced; StreamingResultSet was the only one that ever implemented it as a non-trivial impl.
The unify refactor collapsed all three implementations into StreamingResultSet, but kept the interface and the "Streaming" name. Two problems fall out:
- The "Streaming" name now lies. Metadata results flow through the same class but they're a one-shot in-memory IPC blob, with nothing streaming about them. MetadataResultSets.of even passes /*queryId=*/ null because there is no query.
- The DataCloudResultSet interface has one implementer and one method. Layering an interface for one impl is just a reader trap: callers instinctively look for "what other implementations exist" and find none.
Collapse the two:
- Rename the class StreamingResultSet -> DataCloudResultSet.
- Delete the old DataCloudResultSet interface (the public method getQueryId() now lives directly on the class via @Getter).
- Update all production and test references; rename the affected test files to match (StreamingResultSet*Test -> DataCloudResultSet*Test).
The public API surface is unchanged in source for the common cases: DataCloudConnection.getRowBasedResultSet / getChunkBasedResultSet still return DataCloudResultSet, just as a class instead of an interface. This is binary-incompatible for any caller that ever cast to or implemented the old interface; in practice only StreamingResultSet implemented it on the read side, and no code outside the driver implemented it on the write side.
b83e8b7 to 7baf67d
Rebased on Walked the 10 inline findings against the rebased tree; all still apply. PRs #185 (zero-row batch skip) and #186 ( GitHub now flags the inline comments as outdated because of the SHA rewrite. The findings themselves are unchanged: text and line anchors are still correct. Generated by the
…torSetter DatabaseMetaData.getTypeInfo declared CASE_SENSITIVE, UNSIGNED_ATTRIBUTE, FIXED_PREC_SCALE, and AUTO_INCREMENT as VARCHAR while the row producer in HyperTypes.buildTypeInfoRow wrote Boolean values into them. The mismatch worked only because VarCharVectorSetter accepted Object and silently called value.toString(), so the four columns surfaced as "true"/"false" strings instead of the boolean payload JDBC 4.2 (DatabaseMetaData.getTypeInfo) and pgjdbc both define for these positions. Declare the four columns with a new bool(...) helper in MetadataSchemas that produces a HyperType.bool(true) / Constants.BOOL ColumnMetadata. The existing BitVectorSetter already accepts Boolean, so the row producer is unchanged. Tighten VarCharVectorSetter from BaseVectorSetter<VarCharVector, Object> to <VarCharVector, String> so non-String payloads fail fast at the BaseVectorSetter type guard instead of being toString-coerced — the byte[] arm was dead (setBytes / setBinaryStream / setUnicodeStream / setAsciiStream all throw FEATURE_NOT_SUPPORTED in DataCloudPreparedStatement). Both fixes land together because tightening the setter without the schema fix would make getTypeInfo throw IllegalArgumentException on the Boolean payload. Behavior change: getObject on the four columns now returns Boolean (per JDBC spec), not String. Callers that previously cast (String) rs.getObject(...) will get a ClassCastException; rs.getBoolean(...) starts working where it previously threw on the VARCHAR path, and rs.getString(...) keeps returning the same lowercase "true"/"false" via BooleanVectorAccessor.getString. Pin the schema with three new MetadataSchemasTest methods mirroring the existing COLUMNS coverage (names / typeNames / JdbcTypeIds), add a strict- type regression test for VarCharVectorSetter modeled on IntegerVectorSetterRangeCheckTest so a future re-widening trips CI, and exercise rs.getBoolean on the four boolean columns end-to-end in DataCloudDatabaseMetadataTest.testGetTypeInfo.
ArrowStreamReaderCursor.close used a plain try/finally that closed reader first, allocator second. When both threw — the most likely failure mode because the allocator's leak detector fires on close when buffers are still outstanding, which is exactly what an exception during reader.close produces — Java's finally semantics replaced the reader's exception with the allocator's. The reader exception is the diagnostically interesting one (the leak detector firing on allocator.close is usually a symptom); silently dropping it left only the symptom in the stack trace. Switch the cursor to try-with-resources over the (allocator, reader) pair so reader closes first and the allocator's exception attaches as suppressed onto the reader's instead of replacing it. Same fix on the construction-failure cleanup in DataCloudResultSet.of: the reader.close was already wrapped with addSuppressed but the immediately-following allocator.close was bare and could replace the original construction SQLException; wrap it the same way. Pin the new behavior with a Mockito-based test that throws from both reader.close and allocator.close and asserts the reader's exception is primary with the allocator's attached as suppressed.
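The difference between the two close shapes can be reproduced without Arrow. The following is a minimal sketch, assuming a hypothetical `FailingResource` stand-in whose `close()` always throws; it is not the driver's cursor code. It contrasts a plain try/finally (the later exception replaces the earlier one) with try-with-resources (resources close in reverse declaration order, and later close failures attach as suppressed).

```java
final class CloseOrderSketch {
    /** Hypothetical stand-in for a reader or allocator whose close() always fails. */
    static final class FailingResource implements AutoCloseable {
        private final String name;

        FailingResource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            throw new IllegalStateException(name + " close failed");
        }
    }

    /** Old shape: the exception thrown in finally replaces the reader's exception. */
    static RuntimeException closeWithFinally() {
        FailingResource allocator = new FailingResource("allocator");
        FailingResource reader = new FailingResource("reader");
        try {
            try {
                reader.close();
            } finally {
                allocator.close();
            }
        } catch (RuntimeException e) {
            return e;
        }
        return null;
    }

    /** New shape: declared (allocator, reader), so reader closes first and stays primary. */
    static RuntimeException closeWithTryWithResources() {
        try {
            try (FailingResource allocator = new FailingResource("allocator");
                    FailingResource reader = new FailingResource("reader")) {
                // Nothing to do; only the close behavior is of interest here.
            }
        } catch (RuntimeException e) {
            return e;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("finally primary: " + closeWithFinally().getMessage());
        RuntimeException e = closeWithTryWithResources();
        System.out.println("twr primary: " + e.getMessage());
        System.out.println("twr suppressed: " + e.getSuppressed()[0].getMessage());
    }
}
```

Declaring the allocator first looks backwards at a glance, but the reverse-order close is what puts the reader's failure in the primary position.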
DataCloudStatement.executeQuery and getResultSet both fetched iterator.getQueryStatus().getQueryId() twice — once when constructing arrowStream and once when calling DataCloudResultSet.of with it. The second call sat between arrowStream creation (which puts a 100 MB RootAllocator on the field) and the of(...) call that takes ownership of that allocator. If the second getQueryId() throws — e.g. a future refactor makes getQueryStatus async, or a transient gRPC failure surfaces through the cached proto — the allocator escapes both DataCloudResultSet.of's own try/catch (never entered) and the surrounding catch (StatusRuntimeException) (which doesn't close arrowStream). The PR explicitly claims every code path closes its allocator; this hoist closes the window without changing any observable behavior.
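The window described above can be illustrated with stand-ins. This is a rough sketch, assuming a hypothetical `QueryIdSource` lookup and `FakeAllocator`; the real statement code and its gRPC plumbing are not reproduced here. It shows that a fallible lookup placed after allocator creation strands the allocator, while the hoisted version throws before any allocator exists.

```java
import java.util.ArrayList;
import java.util.List;

final class HoistQueryIdSketch {
    /** Hypothetical stand-in for the query-status lookup; it may throw. */
    interface QueryIdSource {
        String getQueryId();
    }

    /** Hypothetical stand-in for an allocator; tracks whether it is still open. */
    static final class FakeAllocator implements AutoCloseable {
        boolean open = true;

        @Override
        public void close() {
            open = false;
        }
    }

    /** Before the hoist: the lookup sits between allocation and ownership transfer. */
    static String lookupAfterAllocation(QueryIdSource source, List<FakeAllocator> created) {
        FakeAllocator allocator = new FakeAllocator();
        created.add(allocator);
        String queryId = source.getQueryId(); // a throw here leaves the allocator open
        allocator.close(); // stands in for handing ownership to the result set
        return queryId;
    }

    /** After the hoist: the lookup runs before any allocator exists. */
    static String lookupBeforeAllocation(QueryIdSource source, List<FakeAllocator> created) {
        String queryId = source.getQueryId(); // a throw here has nothing to strand
        FakeAllocator allocator = new FakeAllocator();
        created.add(allocator);
        allocator.close(); // stands in for handing ownership to the result set
        return queryId;
    }

    public static void main(String[] args) {
        List<FakeAllocator> created = new ArrayList<>();
        System.out.println(lookupBeforeAllocation(() -> "query-1", created));
    }
}
```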
…fy cap Both Result-holder construction sites — QueryResultArrowStream.toArrowStreamReader and MetadataResultSets.of — built the RootAllocator first and then handed it to a new ArrowStreamReader. If the reader's constructor throws (today benign, but a future Arrow upgrade could add constructor-side validation), the allocator escapes both DataCloudResultSet.of's own try/catch (never entered) and the caller's catch. Wrap the construction in try/catch that closes the allocator on the way out and attaches any close failure as suppressed. Unify the per-allocator cap: MetadataResultSets used Long.MAX_VALUE while the gRPC path was capped at 100 MB. The cap exists because Arrow allocators are accounted memory, so hitting the cap throws a clean OutOfMemoryException instead of letting the JVM OOM. A getColumns(...) against a tenant with thousands of tables silently bypassed the cap on the metadata path. Promote the constant to public ROOT_ALLOCATOR_BUDGET_BYTES on QueryResultArrowStream and reuse it from MetadataResultSets. No new test: the failure mode requires ArrowStreamReader's constructor to throw, which doesn't happen with ByteArrayInputStream or the gRPC channel today. Pure code-shape fix; existing suite stays green.
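The construction-failure cleanup follows a common shape. Below is a minimal sketch, assuming hypothetical `FakeAllocator`, `FakeReader`, and `Result` stand-ins rather than the Arrow classes: if building the reader throws, the allocator is closed before the exception propagates, and any failure from that close is attached as suppressed.

```java
final class ConstructionCleanupSketch {
    /** Hypothetical stand-in for an allocator; records whether close() ran. */
    static final class FakeAllocator implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a reader whose constructor can be made to fail. */
    static final class FakeReader {
        FakeReader(FakeAllocator allocator, boolean failConstruction) {
            if (failConstruction) {
                throw new IllegalArgumentException("reader construction failed");
            }
        }
    }

    /** Simple holder pairing the reader with the allocator it depends on. */
    static final class Result {
        final FakeReader reader;
        final FakeAllocator allocator;

        Result(FakeReader reader, FakeAllocator allocator) {
            this.reader = reader;
            this.allocator = allocator;
        }
    }

    static Result open(FakeAllocator allocator, boolean failConstruction) {
        try {
            return new Result(new FakeReader(allocator, failConstruction), allocator);
        } catch (RuntimeException e) {
            // Ownership never transferred, so this method must release the allocator.
            try {
                allocator.close();
            } catch (RuntimeException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        FakeAllocator allocator = new FakeAllocator();
        Result result = open(allocator, false);
        System.out.println("allocator closed after success: " + result.allocator.closed);
    }
}
```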
JDBC 4.2 Table B-6 lists INTEGER → boolean as a recommended conversion: 0
maps to false, non-zero maps to true. pgjdbc and other major drivers do
this. BaseIntVectorAccessor inherited the abstract default from
QueryJDBCAccessor, which throws SQLFeatureNotSupportedException — so
rs.getBoolean("NULLABLE") on a metadata int column failed where every
spec-respecting client expects it to work.
Add the override (one line, getLong() != 0). Restore the
assertThat(getBoolean("NULLABLE")).isFalse() assertion that was deleted in
the metadata-unify rebase, add a positive case for ORDINAL_POSITION, and
correct the comment that previously claimed the coercion already happened.
This is the second half of the JDBC-spec compliance pair started in
4c52910 (TYPE_INFO boolean columns are now declared as BIT, so VARCHAR →
boolean isn't needed there). After this commit, every metadata int column
that BI tools read as boolean (NULLABLE, columnNoNulls/columnNullable
values, ORDINAL_POSITION) returns the spec-correct coercion.
…lures DataCloudResultSet.close set the closed flag *after* delegating to cursor.close. If cursor.close threw — e.g. allocator's leak detector trips an IllegalStateException, or the addSuppressed chain surfaces a reader exception — the flag stayed false. A defensive caller that catches the close failure and retries (JDBC connection pools and driver wrappers sometimes do) re-entered cursor.close, calling allocator.close on an already-closed RootAllocator. Arrow throws on second close. Flip the flag before delegating, matching the standard JDK AutoCloseable idempotence pattern. The caller still gets the cleanup exception on the first close; subsequent closes become no-ops. Pin the new contract with a Mockito-based test that throws from allocator.close on first cursor.close and asserts the second close is a no-op (allocator.close called exactly once across both attempts).
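The flag ordering is the whole fix, so a small sketch may help. It assumes a hypothetical wrapper class around any `AutoCloseable` cursor and is not the driver's `DataCloudResultSet`. The flag flips before delegating, so the first `close()` still surfaces the cleanup failure and later calls return immediately.

```java
final class IdempotentCloseSketch implements AutoCloseable {
    private final AutoCloseable cursor;
    private boolean closed;

    IdempotentCloseSketch(AutoCloseable cursor) {
        this.cursor = cursor;
    }

    @Override
    public void close() throws Exception {
        if (closed) {
            return; // a retry after a failed close is a no-op
        }
        closed = true; // flip first so a throwing delegate cannot leave the flag false
        cursor.close();
    }

    public static void main(String[] args) throws Exception {
        IdempotentCloseSketch rs = new IdempotentCloseSketch(() -> System.out.println("cursor closed"));
        rs.close();
        rs.close(); // the delegate is not invoked a second time
    }
}
```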
DecimalVectorSetter.setValueInternal called value.unscaledValue().longValue()
unconditionally. BigInteger.longValue() silently truncates to the low 64
bits when the magnitude exceeds Long.MAX_VALUE — a BigDecimal("1E40")
wrote garbage and the caller had no way to know. The integer setters in
this same file already throw IllegalArgumentException for analogous
out-of-range narrowing (lines 240, 261, 433); decimal was the asymmetry.
Add the same guard via BigInteger.bitLength() > 63: catches both
directions of overflow and throws IllegalArgumentException with the
unscaled value in the message. Pin the new behavior with three tests
mirroring the integer-setter range-check style — accepts values within
long range, rejects above, rejects below.
The separate scale-mismatch issue (DecimalVectorSetter ignores the
vector's declared scale and writes whatever scale the caller's BigDecimal
happens to have) is out of scope for this finding.
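The `bitLength()` guard can be sketched in isolation. This is an illustrative helper, assuming a hypothetical `unscaledAsLong` method name and message text; it is not the setter's actual code. `BigInteger.bitLength()` excludes the sign bit, so every value in the `long` range has a bit length of at most 63, and one check covers overflow in both directions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class DecimalRangeGuardSketch {
    /** Returns the unscaled value as a long, or throws if it does not fit in 64 bits. */
    static long unscaledAsLong(BigDecimal value) {
        BigInteger unscaled = value.unscaledValue();
        // bitLength() excludes the sign bit: Long.MIN_VALUE and Long.MAX_VALUE both report 63.
        if (unscaled.bitLength() > 63) {
            throw new IllegalArgumentException("Unscaled decimal value out of long range: " + unscaled);
        }
        return unscaled.longValue();
    }

    public static void main(String[] args) {
        System.out.println(unscaledAsLong(new BigDecimal("123.45")));
        try {
            unscaledAsLong(new BigDecimal("9223372036854775808")); // Long.MAX_VALUE + 1
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```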
closesReaderAndAllocator only counted invocations on reader.close and allocator.close — a regression that flipped the order would slip past CI. Add an InOrder assertion so the load-bearing ordering documented on ArrowStreamReaderCursor.close (reader first, then allocator, so the allocator's closing budget check sees no outstanding ArrowBufs) is explicit at the test level rather than inferred indirectly from the throw-during-close test's primary/suppressed invariant.
The driver carried an Arrow field-metadata key (`datacloud-jdbc:type_name`), a `ColumnMetadata.typeName` field, four `Constants` tag strings, and a `bool(...)` schema helper to override `ResultSetMetaData.getColumnTypeName(int)` with Hyper-flavored labels — `"TEXT"` instead of `"VARCHAR"`, `"SHORT"` instead of `"SMALLINT"`, `"BOOL"` instead of `"BOOLEAN"`, `"INTEGER"` instead of `"INTEGER"` (no-op). Worth checking whether this was actually buying anything. JDBC 4.2 spec on `getColumnTypeName`: "Retrieves the database-specific type name for the designated column." No specific strings required. pgjdbc returns lowercase Postgres-native names (`text`, `int4`, `bool`, `varchar`, `bpchar`, `numeric`). The driver's own JDBCReferenceTest already normalizes both forms by mapping pgjdbc's `TEXT` to `JDBCType.VARCHAR.getName()` and `BPCHAR` to `JDBCType.CHAR.getName()` before comparison — so even internally the JDBC names are the canonical form. Spark `TypeMapping.scala` branches only on the int `getColumnType` code, never on the type-name string. Drop the override channel entirely. `getColumnTypeName` now returns `HyperTypes.toJdbcTypeName(col.getType())` for every column. Every metadata result-set column reports its JDBC-spec name. Removes ~125 lines of plumbing across `HyperTypeToArrow` (the metadata key write path), `ArrowToHyperTypeMapper` (read path), `ColumnMetadata` (the `typeName` field and 3-arg constructor), `MetadataSchemas` (the third arg on every helper), `Constants` (TEXT/INTEGER/SHORT/BOOL fields), `DataCloudResultSetMetaData.getColumnTypeName` (the override-or-fallback dispatch), and `MetadataResultSets.writeArrowStream` (the only stamp site). 
Test updates: `ArrowToHyperTypeMapperTest` deleted (it pinned the override read path); 63 assertions in `DataCloudDatabaseMetadataTest` flipped from `"TEXT"` / `"SHORT"` to `"VARCHAR"` / `"SMALLINT"` (the JDBC defaults); `MetadataSchemasTest` re-pins the four BOOLEAN positions in TYPE_INFO at `"BOOLEAN"` / `Types.BOOLEAN`. Behavior on `getColumnType` (the int code), `Types.BOOLEAN` for boolean metadata columns, accessor coercion — all unchanged.
The previous implementation returned `getLong() != 0`, silently coercing any non-zero integer (2, -1, MAX_VALUE) to `true`. ResultSet.getBoolean's own Javadoc only defines the conversion for exactly 0 and 1 — the spec is silent on other values. pgjdbc throws CANNOT_COERCE on anything else (BooleanTypeUtil.fromNumber). Match that strict behavior: 0 → false, 1 → true, anything else throws SQLException with SQLState 22018. The "non-zero → true" extrapolation was the same flavor of silent coercion the rest of this PR sets out to remove (VarCharVectorSetter accepting arbitrary Object via toString, integer setters silently truncating out-of-range Numbers). Catching here too rather than letting a real integer column lose its value when bounced through getBoolean. Update the metadata-test assertion: ORDINAL_POSITION = 500 used to coerce to true under the permissive path; now asserts the SQLException with the expected message.
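The strict coercion described above fits in a few lines. The following is a sketch only, assuming a hypothetical static helper and message wording; the accessor's real signature and error text are not shown in this PR description. It accepts exactly 0 and 1 and raises `SQLException` with SQLState `22018` for anything else.

```java
import java.sql.SQLException;

final class StrictIntToBooleanSketch {
    /** Maps 0 to false and 1 to true; any other value is rejected rather than coerced. */
    static boolean toBoolean(long value) throws SQLException {
        if (value == 0L) {
            return false;
        }
        if (value == 1L) {
            return true;
        }
        // SQLState 22018 is the code the commit message names for this failure.
        throw new SQLException("Cannot coerce integer value " + value + " to boolean", "22018");
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(toBoolean(0));
        System.out.println(toBoolean(1));
        try {
            toBoolean(500);
        } catch (SQLException e) {
            System.out.println(e.getSQLState() + ": " + e.getMessage());
        }
    }
}
```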
Summary
Collapse the two ResultSet families (streaming Arrow + row-based metadata) onto a single Arrow-backed pipeline so there is one accessor implementation, one set of type semantics, and one place to fix bugs. Tighten root-allocator hygiene end-to-end while we are in there, and bring `getTypeInfo()` and integer-column accessor coercion into line with JDBC 4.2.
Why
The driver previously carried two parallel result-set implementations: `StreamingResultSet` for query results (Arrow IPC, columnar accessors) and `SimpleResultSet` / `DataCloudMetadataResultSet` / `ColumnAccessor` for metadata (row-oriented `List<List<Object>>`, hand-rolled per-cell coercion). Same JDBC surface, two divergent code paths. Bugs found in one were rarely fixed in the other; type semantics drifted (e.g. `getBoolean` on an integer column behaved differently between the two), and the metadata path silently `toString()`'d any payload you handed it. The pre-rebase review of this PR also surfaced several allocator leak windows and JDBC-spec compliance gaps that the unification made it natural to fix.
What changed
Unified result set. `StreamingResultSet`, `DataCloudMetadataResultSet`, `SimpleResultSet`, and `ColumnAccessor` are removed. Every JDBC metadata call (`getTables`, `getColumns`, `getSchemas`, `getTypeInfo`, the empty-metadata helpers) now flows through `DataCloudResultSet` via a new `MetadataResultSets` factory. `MetadataResultSets` builds a single-batch Arrow IPC stream by reusing `VectorPopulator` (the same code path the JDBC parameter encoder uses) and `HyperTypeToArrow.toField`, then hands the resulting reader + allocator to `DataCloudResultSet.of`. `DataCloudResultSet` is now a `public class` rather than the prior empty marker interface; the concrete implementation is no longer a sibling type called `StreamingResultSet`. The `of(...)` factory takes a `QueryResultArrowStream.Result` (reader + allocator pair) and owns both their lifecycles.
Root allocator hygiene. Six independent leak windows closed:
- `QueryResultArrowStream.toArrowStreamReader` returns a `Result` holder that pairs reader + allocator and closes both in order (reader first so ArrowBuf accounting clears before the allocator's budget check). The 100 MB cap moves to a public `ROOT_ALLOCATOR_BUDGET_BYTES` constant and is now reused by the metadata path.
- `MetadataResultSets.of` and `QueryResultArrowStream.toArrowStreamReader` both wrap allocator + reader construction in try/catch so the allocator is closed if `new ArrowStreamReader(...)` ever throws before ownership transfers.
- `DataCloudResultSet.of`'s construction-failure cleanup now wraps both reader.close and allocator.close with `addSuppressed` so neither close masks the original `SQLException`.
- `ArrowStreamReaderCursor.close` uses try-with-resources over `(allocator, reader)` so reader closes first and any allocator-close exception attaches as suppressed onto the reader's instead of replacing it.
- `DataCloudStatement.executeQuery` and `getResultSet` hoist `iterator.getQueryStatus().getQueryId()` once before allocator construction, so a throw between allocator creation and `DataCloudResultSet.of` taking ownership can no longer strand the allocator.
- `DataCloudResultSet.close` is idempotent across cursor.close failures: the `closed` flag flips before delegating, so a defensive caller's retry is a no-op rather than a double-close.
JDBC spec compliance.
- `getTypeInfo()` boolean columns (`CASE_SENSITIVE`, `UNSIGNED_ATTRIBUTE`, `FIXED_PREC_SCALE`, `AUTO_INCREMENT`) are now declared as `BOOLEAN` per JDBC 4.2 (DatabaseMetaData.getTypeInfo Javadoc) and pgjdbc, via a new `bool(...)` helper in `MetadataSchemas`. They were previously declared as `text(...)` while the row producer wrote raw `Boolean` values, which only "worked" because `VarCharVectorSetter` silently `toString()`'d everything.
- `BaseIntVectorAccessor.getBoolean` now matches `ResultSet.getBoolean`'s spec text on integer columns: 0 returns false, 1 returns true, anything else throws `SQLException` with SQLState `22018` (matching pgjdbc's strict CANNOT_COERCE behavior in `BooleanTypeUtil.fromNumber`). It previously inherited the abstract default that threw `SQLFeatureNotSupportedException` for everything.
- `VectorPopulator.VarCharVectorSetter` is tightened from `<VarCharVector, Object>` to `<VarCharVector, String>`, so non-String payloads fail fast at the `BaseVectorSetter` type guard. The `byte[]` arm was dead: `setBytes` / `setBinaryStream` / `setUnicodeStream` / `setAsciiStream` all throw FEATURE_NOT_SUPPORTED in `DataCloudPreparedStatement`.
- Integer-family setters (`IntVectorSetter`, `SmallIntVectorSetter`, `TinyIntVectorSetter`) now range-check before narrowing rather than silently truncating; `DecimalVectorSetter` does the same via a `bitLength() > 63` guard on the unscaled value.
- `QueryJDBCAccessor.getObject(Class)` gains an `isInstance` fallback so `getObject(col, String.class)` on a VARCHAR (and analogous identity-class paths on every other accessor) works without each accessor implementing typed `getObject` itself.
Observable behavior changes
- `rs.getBoolean("CASE_SENSITIVE")` on `getTypeInfo()` returns a real Boolean (was `SQLFeatureNotSupportedException` via the broken VARCHAR path).
- `rs.getBoolean("NULLABLE")` on `getColumns()` (and any other integer column) returns `false` for `0` and `true` for `1`, instead of throwing. Other integer values throw `SQLException` (SQLState `22018`).
- `rs.getDate(intCol)` / `getTime(intCol)` / `getTimestamp(intCol)` on metadata rows throw `SQLException` (was `UnsupportedOperationException`).
- `rs.getObject(intCol, Boolean.class)` on metadata rows now throws (the strict `isInstance` path).
- `rs.getMetaData().getColumnType(...)` on the four `getTypeInfo()` boolean columns returns `Types.BOOLEAN`, not `Types.VARCHAR`.
- `rs.getMetaData().getColumnTypeName(...)` on every metadata result set (`getTables`, `getColumns`, `getTypeInfo`, …) returns the JDBC type name derived from the column's `HyperType` (`"VARCHAR"`, `"SMALLINT"`, `"INTEGER"`, `"BOOLEAN"`) rather than the prior Hyper-flavored labels (`"TEXT"`, `"SHORT"`, `"INTEGER"`, `"BOOL"`). The JDBC spec only requires some database-specific type name and does not pin specific strings; this aligns with the names other JDBC consumers in the driver already use.
- `ps.setObject(idx, x, Types.VARCHAR)` with a non-String / non-byte[] argument now throws `IllegalArgumentException` instead of silently `toString()`-ing the argument.
- `ps.setObject(idx, x, Types.INTEGER)` (and INT2/INT8) throws `IllegalArgumentException` for out-of-range Numbers instead of silently narrowing; same for `Types.DECIMAL` when the unscaled value exceeds 64 bits.
Breaking changes
- `com.salesforce.datacloud.jdbc.core.DataCloudResultSet` is now a `public class` rather than a `public interface`. External code that wrote `class MyRs implements DataCloudResultSet` (decorators, wrappers, hand-rolled doubles) no longer compiles; code that consumes the standard `java.sql.ResultSet` / `DataCloudResultSet` API as an opaque type recompiles unchanged.
- The previously-public types `StreamingResultSet`, `DataCloudMetadataResultSet`, `SimpleResultSet`, `ColumnAccessor` are removed. External callers of `StreamingResultSet.of(ArrowStreamReader, ...)` should switch to `DataCloudResultSet.of(QueryResultArrowStream.Result, ...)`.
Test plan
- `./gradlew :jdbc-core:test`: full module suite green.
- `./gradlew :jdbc-core:spotlessCheck`: formatting clean.
- `./gradlew clean build`: full build including `:spark-datasource`, JaCoCo coverage, verification.
- `MetadataSchemasTest` adds three TYPE_INFO position-by-position assertions; `VarCharVectorSetterStrictTypeTest` regresses on Boolean / byte[] / Number payloads; `IntegerVectorSetterRangeCheckTest` extends to `DecimalVectorSetter`; `ArrowStreamReaderCursorTest` pins reader-before-allocator close ordering plus `addSuppressed` chaining when both throw; `DataCloudResultSetMethodTest` pins `close()` idempotence under cursor.close failure; `DataCloudDatabaseMetadataTest.testGetTypeInfo` now exercises `getBoolean` on all four boolean columns end-to-end.
BREAKING CHANGE: `DataCloudResultSet` is now a class instead of an interface; `StreamingResultSet`, `DataCloudMetadataResultSet`, `SimpleResultSet`, `ColumnAccessor` are removed; metadata int-column `getDate` / `getTime` / `getTimestamp` throw `SQLException` (was `UnsupportedOperationException`); `getTypeInfo()` boolean columns are typed `BOOLEAN` instead of `VARCHAR` (`getObject` returns `Boolean`, not `String`); `getColumnTypeName` on metadata result sets returns the JDBC type name (VARCHAR/SMALLINT/INTEGER/BOOLEAN) instead of the prior Hyper-flavored labels (TEXT/SHORT/INTEGER/BOOL); `ps.setObject` with `Types.VARCHAR` rejects non-String/byte[] payloads; integer-family and DECIMAL setters reject out-of-range values instead of silently narrowing.