From 00b8d4b5816a979f55882f2efb461667ab9c4156 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 08:48:10 +0530 Subject: [PATCH 01/23] Add design doc: Result Set Heartbeat / Keep-Alive Design doc for PECOBLR-2321: periodic heartbeat polling to keep server-side result state alive while the client consumes results slowly. Key design points: - Periodic GetStatementStatus (SEA) / GetOperationStatus (Thrift) - Opt-in via EnableHeartbeat connection parameter (default false) - Configurable interval (default 60s, aligned with ADBC C# driver) - Zero-leak guarantee: stops on ResultSet.close, Statement.close, Connection.close, end-of-results, or server terminal state - Error resilience: 10 consecutive failures before self-stop - Includes cross-driver survey (ADBC C#, Python, Go, Node.js) - Mermaid diagrams: sequence flows, state machine, class diagram Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- docs/design/HEARTBEAT_KEEP_ALIVE.md | 493 ++++++++++++++++++++++++++++ 1 file changed, 493 insertions(+) create mode 100644 docs/design/HEARTBEAT_KEEP_ALIVE.md diff --git a/docs/design/HEARTBEAT_KEEP_ALIVE.md b/docs/design/HEARTBEAT_KEEP_ALIVE.md new file mode 100644 index 000000000..5bd497763 --- /dev/null +++ b/docs/design/HEARTBEAT_KEEP_ALIVE.md @@ -0,0 +1,493 @@ +# Design: Result Set Heartbeat / Keep-Alive for Ongoing Query Executions + +**JIRA:** PECOBLR-2321 +**Author:** Gopal Lal +**Date:** 2026-04-21 +**Status:** DRAFT + +--- + +## Problem Statement + +When a user executes a query and reads results slowly (e.g., processes each row with pauses of minutes or hours), the server-side resources backing those results can expire: + +1. **Warehouse auto-stop**: The cluster shuts down after its configured idle timeout (e.g., 10-60 minutes of no activity), destroying any in-memory state +2. **Operation handle expiry**: The server evicts idle operations after a timeout +3. 
**Inline result loss**: For inline results (data returned directly in the execute response, not uploaded to cloud storage), the data exists only on the cluster — when the cluster stops, the results are permanently lost +4. **Presigned URL expiry**: Cloud fetch URLs expire in 5-15 minutes (refreshable via re-fetch, but requires the operation to still be alive) + +The user sees errors like `INVALID_HANDLE_STATUS`, connection timeouts, or "operation not found" when they attempt to read the next row after a pause. + +## Requirements + +1. **Keep results alive**: Periodically signal the server that the client is still consuming results, preventing premature resource cleanup +2. **Minimize cost impact**: Heartbeats keep the warehouse running, which costs money. Must be opt-in or clearly documented, and stopped immediately when no longer needed +3. **Zero resource leaks**: Heartbeat tasks must stop when results are fully consumed, ResultSet is closed, Statement is closed, or Connection is closed — any leak means the customer pays for an idle cluster indefinitely +4. **Both protocols**: Work for both SEA (SQL Execution API) and Thrift execution paths +5. **Configurable**: Users can enable/disable the feature and control the interval +6. **Inline results priority**: Most critical for inline results where data is not persisted to cloud storage + +## Non-Goals + +- Extending the 1-hour result data retention limit in the SEA API (this is a server-side hard limit that cannot be extended by client polling) +- Implementing full result materialization to client-side storage +- Changing server-side timeout configurations + +--- + +## Background: What Keeps Results Alive? + +### SEA (SQL Execution API) + +- **Statement keep-alive**: Polling via `GET /api/2.1/sql/statements/{id}` keeps the statement alive. The API documentation states: "To guarantee that the statement is kept alive, you must poll at least once every 15 minutes." 
+- **Result retention**: Results are available for **1 hour** after the query succeeds. Polling does NOT extend this. This is a hard server-side limit. +- **Presigned URLs**: External link URLs expire in <= 15 minutes but can be refreshed by calling `GetStatementResultChunkN`. + +### Thrift (HiveServer2 protocol) + +- **Operation keep-alive**: The server maintains `lastAccessTime` for each operation. `FetchResults` RPCs constitute activity and reset the idle timer. `GetOperationStatus` may or may not reset it depending on server configuration. +- **Idle operation timeout**: Configured server-side (typically 5 minutes to 1 day). Operations in terminal state (FINISHED) are subject to eviction. + +### What Currently Happens + +The driver has **no background heartbeat**. Result fetching is demand-driven: +- `ResultSet.next()` triggers the next fetch (Thrift `FetchResults` or SEA chunk download) +- If the user pauses between `next()` calls longer than the server timeout, the operation expires +- For cloud fetch results, the `StreamingChunkProvider` prefetches chunks in the background, which acts as implicit heartbeat — but only while there are unfetched chunks +- For inline results, there is no background activity at all after the initial response + +--- + +## Industry Patterns + +### Cross-Driver Survey + +Research across Databricks' own driver ecosystem and open-source analytical JDBC drivers: + +| Driver | Heartbeat During Result Consumption? | API Called | Default Interval | Configurable? 
| +|--------|--------------------------------------|-----------|-----------------|---------------| +| **Databricks ADBC (C#)** | **YES** | `GetOperationStatus` (Thrift) | 60s | Yes: `adbc.databricks.fetch_heartbeat_interval` | +| **Databricks Python** | No | N/A | N/A | N/A | +| **Databricks Go** | No | N/A | N/A | N/A | +| **Databricks Node.js** | No | N/A | N/A | N/A | +| **Databricks JDBC (this driver)** | No | N/A | N/A | N/A | + +### Reference Implementation: Databricks ADBC C# Driver + +The ADBC C# driver (`adbc-drivers/databricks`) is the **only** Databricks driver with a heartbeat. Key design details: + +- **Class**: `DatabricksOperationStatusPoller` implementing `IOperationStatusPoller` +- **Protocol**: Thrift only (no SEA equivalent) +- **RPC**: `TCLIService.GetOperationStatus(TGetOperationStatusReq)` — lightweight status check +- **Default interval**: 60 seconds, configurable via `adbc.databricks.fetch_heartbeat_interval` +- **Request timeout**: 30 seconds per poll, configurable via `adbc.databricks.operation_status_request_timeout` +- **Mechanism**: `Task.Run()` with async loop + `CancellationTokenSource` for clean shutdown +- **Started**: In `DatabricksCompositeReader` constructor when `HasMoreRows` is true or unknown +- **Stopped**: On end of results (null batch), Dispose, terminal state, max failures, or cancellation +- **Error resilience**: Max 10 consecutive failures (~10 min at default interval) before self-stop; single success resets counter +- **Terminal states**: CANCELED, ERROR, CLOSED, TIMEDOUT, UNKNOWN → stop polling +- **FINISHED state**: Continues polling (operation complete but client still reading) +- **No disable toggle**: Must set large interval as workaround +- **No SEA support**: REST API path does not implement heartbeat + +**Key lessons from the reference implementation:** +1. The heartbeat is Thrift-only — the SEA API has different lifecycle semantics +2. 
Error resilience is critical — a single transient error (e.g., TLS recycling) should not permanently kill the heartbeat (fixed in their PR #372 after a customer incident) +3. The heartbeat continues even after the query is FINISHED — the goal is to keep the operation handle alive while the client reads results, not to wait for execution +4. No explicit disable parameter exists — our design should add one since heartbeats have cost implications + +### General Industry Patterns + +| Pattern | Adoption | Effectiveness | This Driver | +|---------|----------|--------------|-------------| +| **Demand-driven fetch (implicit heartbeat)** | Universal | Works only while chunks remain to fetch | Already implemented via `StreamingChunkProvider` | +| **Background heartbeat polling** | One driver (ADBC C#) | Keeps operation alive during slow consumption | Not implemented | +| **Eager cloud storage prefetch** | Universal for cloud fetch | Results survive cluster stop (data in cloud) | Already implemented | +| **Result materialization to client disk** | None found | Would solve the problem completely | Too complex, memory/disk concerns | + +--- + +## Design + +### Sequence Diagram: Normal Flow (Happy Path) + +```mermaid +sequenceDiagram + participant User + participant Statement + participant ResultSet + participant HeartbeatMgr as ResultHeartbeatManager + participant Server + + User->>Statement: executeQuery(sql) + Statement->>Server: ExecuteStatement + Server-->>Statement: Results (statementId) + Statement->>ResultSet: create(results) + Statement->>HeartbeatMgr: startHeartbeat(statementId) + HeartbeatMgr->>HeartbeatMgr: schedule task every 60s + + loop User reads slowly + User->>ResultSet: next() + ResultSet-->>User: row data + end + + Note over HeartbeatMgr,Server: Heartbeat fires during pauses + HeartbeatMgr->>Server: GetStatementStatus / GetOperationStatus + Server-->>HeartbeatMgr: FINISHED (still alive) + HeartbeatMgr->>Server: GetStatementStatus / GetOperationStatus + 
Server-->>HeartbeatMgr: FINISHED (still alive) + + User->>ResultSet: next() + ResultSet-->>User: false (end of results) + ResultSet->>HeartbeatMgr: stopHeartbeat(statementId) + + User->>ResultSet: close() + ResultSet->>Statement: handleResultSetClose() +``` + +### Sequence Diagram: Close During Slow Consumption + +```mermaid +sequenceDiagram + participant User + participant ResultSet + participant HeartbeatMgr as ResultHeartbeatManager + participant Server + + Note over HeartbeatMgr,Server: Heartbeat running every 60s + HeartbeatMgr->>Server: GetStatementStatus + Server-->>HeartbeatMgr: FINISHED + + User->>ResultSet: close() + ResultSet->>HeartbeatMgr: stopHeartbeat(statementId) + HeartbeatMgr->>HeartbeatMgr: cancel scheduled task + ResultSet->>Server: closeStatement(statementId) + Note over HeartbeatMgr: No more heartbeats — clean shutdown +``` + +### Sequence Diagram: Server-Side Expiry + +```mermaid +sequenceDiagram + participant User + participant ResultSet + participant HeartbeatMgr as ResultHeartbeatManager + participant Server + + HeartbeatMgr->>Server: GetStatementStatus + Server-->>HeartbeatMgr: CLOSED (operation expired) + HeartbeatMgr->>HeartbeatMgr: stop self (terminal state) + + User->>ResultSet: next() + ResultSet->>Server: fetch results + Server-->>ResultSet: ERROR (not found) + ResultSet-->>User: SQLException +``` + +### Sequence Diagram: Transient Errors with Recovery + +```mermaid +sequenceDiagram + participant HeartbeatMgr as ResultHeartbeatManager + participant Server + + HeartbeatMgr->>Server: GetStatementStatus + Server--xHeartbeatMgr: Connection error (failure 1/10) + Note over HeartbeatMgr: consecutiveFailures = 1 + + HeartbeatMgr->>Server: GetStatementStatus + Server--xHeartbeatMgr: Timeout (failure 2/10) + Note over HeartbeatMgr: consecutiveFailures = 2 + + HeartbeatMgr->>Server: GetStatementStatus + Server-->>HeartbeatMgr: FINISHED (success!) 
+ Note over HeartbeatMgr: consecutiveFailures = 0 (reset) + + HeartbeatMgr->>Server: GetStatementStatus + Server-->>HeartbeatMgr: FINISHED +``` + +### Heartbeat State Machine + +```mermaid +stateDiagram-v2 + [*] --> Idle: Driver initialized + + Idle --> Scheduled: startHeartbeat()\n[EnableHeartbeat=true\nAND result has more rows\nAND not direct results] + + Scheduled --> Polling: timer fires (every 60s) + + Polling --> Scheduled: Server returns\nFINISHED / RUNNING / PENDING + Polling --> Scheduled: Transient error\n[consecutiveFailures < 10] + + Polling --> Stopped: Server returns\nCLOSED / ERROR / CANCELED\n/ TIMEDOUT / UNKNOWN + Polling --> Stopped: consecutiveFailures >= 10 + + Scheduled --> Stopped: stopHeartbeat()\n[ResultSet.close() OR\nStatement.close() OR\nnext() returns false] + + Scheduled --> Stopped: shutdown()\n[Connection.close()] + + Stopped --> [*] +``` + +### Result Type Eligibility + +```mermaid +flowchart TD + A[Query executed successfully] --> B{EnableHeartbeat\n= true?} + B -->|No| Z[No heartbeat] + B -->|Yes| C{Direct results?\ndirectResultsReceived} + C -->|Yes| Z + C -->|No| D{Result type?} + D -->|SEA inline| E[START heartbeat\nvia GetStatementStatus\nHighest priority - data on cluster only] + D -->|Thrift inline| F[START heartbeat\nvia GetOperationStatus\nHighest priority - data on cluster only] + D -->|SEA cloud fetch| G[START heartbeat\nvia GetStatementStatus\nKeeps statement alive for URL refresh] + D -->|Thrift cloud fetch| H[START heartbeat\nvia GetOperationStatus\nKeeps operation alive for URL refresh] +``` + +### Component Architecture + +```mermaid +classDiagram + class DatabricksConnection { + -ResultHeartbeatManager heartbeatManager + +close() stops all heartbeats + } + + class DatabricksStatement { + +executeInternal() triggers heartbeat start + +close() triggers heartbeat stop + } + + class DatabricksResultSet { + +next() stops heartbeat when returns false + +close() stops heartbeat + } + + class ResultHeartbeatManager { + 
-ScheduledExecutorService scheduler
+        -Map~StatementId, ScheduledFuture~ activeHeartbeats
+        +startHeartbeat(StatementId, HeartbeatTask)
+        +stopHeartbeat(StatementId)
+        +shutdown()
+    }
+
+    class HeartbeatTask {
+        <<interface>>
+        +run() sends heartbeat RPC
+    }
+
+    class SeaHeartbeatTask {
+        -DatabricksSdkClient client
+        -StatementId statementId
+        +run() GET /sql/statements/id
+    }
+
+    class ThriftHeartbeatTask {
+        -DatabricksThriftAccessor accessor
+        -TOperationHandle handle
+        +run() GetOperationStatus
+    }
+
+    DatabricksConnection --> ResultHeartbeatManager
+    DatabricksStatement --> ResultHeartbeatManager
+    DatabricksResultSet --> ResultHeartbeatManager
+    ResultHeartbeatManager --> HeartbeatTask
+    HeartbeatTask <|.. SeaHeartbeatTask
+    HeartbeatTask <|.. ThriftHeartbeatTask
+```
+
+### New Component: `ResultHeartbeatManager`
+
+A lightweight manager that schedules periodic heartbeat RPCs for active result sets.
+
+```java
+/**
+ * Schedules periodic heartbeat RPCs to keep server-side result state alive
+ * while the client is consuming results slowly. Uses a shared
+ * ScheduledExecutorService (daemon threads) to minimize resource overhead.
+ */
+class ResultHeartbeatManager {
+
+  private final ScheduledExecutorService scheduler; // shared across connection
+  private final Map<StatementId, ScheduledFuture<?>> activeHeartbeats;
+
+  /** Start heartbeat for a statement. Called after successful execution. */
+  void startHeartbeat(StatementId id, HeartbeatTask task);
+
+  /** Stop heartbeat for a statement. Called on ResultSet.close(), Statement.close(),
+   * or when all results have been fetched. */
+  void stopHeartbeat(StatementId id);
+
+  /** Stop all heartbeats. Called on Connection.close(). */
+  void shutdown();
+}
+```
+
+### Heartbeat Task
+
+The heartbeat task is protocol-specific:
+
+**SEA**: `GetStatementStatus` — `GET /api/2.1/sql/statements/{statementId}` (lightweight status check, already implemented in `DatabricksSdkClient` lines 276-280).
The API documentation states: "To guarantee that the statement is kept alive, you must poll at least once every 15 minutes." This makes heartbeat particularly valuable for SEA. + +**Thrift**: `GetOperationStatus(operationHandle)` (lightweight status check, already implemented in `DatabricksThriftAccessor.getOperationStatus()`) + +The task: +1. Sends the status check RPC +2. If the server responds with a terminal state (CLOSED, ERROR, CANCELED, TIMEDOUT, UNKNOWN), stops the heartbeat and logs a warning +3. If the server responds with FINISHED state, **continues** polling (operation complete but client still reading) +4. If the RPC fails (connection error, timeout), increments a consecutive failure counter. After **10 consecutive failures** (~10 minutes at default interval), stops the heartbeat. A single success resets the counter. (Learned from ADBC C# PR #372 — a single transient error like TLS recycling should not permanently kill the heartbeat) +5. Does NOT throw exceptions — failures are logged, not propagated to the user's thread + +### Lifecycle + +``` +Statement.execute() + └─▶ ResultSet created + └─▶ If heartbeat enabled AND result type is eligible: + startHeartbeat(statementId, heartbeatTask) + +ResultSet.next() returns false (all results consumed) + └─▶ stopHeartbeat(statementId) + +ResultSet.close() + └─▶ stopHeartbeat(statementId) + +Statement.close() + └─▶ stopHeartbeat(statementId) // safety net + +Connection.close() + └─▶ heartbeatManager.shutdown() // kills all heartbeats +``` + +### Which Result Types Need Heartbeat? + +| Result Type | Cloud Persisted? | Needs Heartbeat? | Reason | +|-------------|-----------------|-------------------|--------| +| SEA inline (JSON) | No | **Yes** | Data only on cluster; lost on cluster stop | +| SEA cloud fetch (Arrow) | Yes | **Optional** | Data in cloud; survives cluster stop. 
But statement must stay alive for URL refresh | +| Thrift inline (columnar) | No | **Yes** | Data only on cluster | +| Thrift cloud fetch | Yes | **Optional** | Same as SEA cloud fetch | +| Direct results (CLOSED state) | N/A | **No** | Server already closed the operation; data fully delivered | + +### Configuration + +New connection parameters: + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `EnableHeartbeat` | boolean | `false` | Enable periodic heartbeat for active result sets | +| `HeartbeatIntervalSeconds` | int | `60` | Interval between heartbeat RPCs (aligned with ADBC C# default) | +| `HeartbeatRequestTimeoutSeconds` | int | `30` | Timeout for each heartbeat RPC | + +Default is **disabled** because: +- Heartbeats keep the warehouse running, increasing cost +- Most users consume results quickly +- Cloud fetch results already survive cluster stop +- Users who know they'll read slowly can opt in + +**Design choice: explicit disable toggle.** The ADBC C# driver has no way to disable the heartbeat (must set a large interval as workaround). We add `EnableHeartbeat` as an explicit boolean for clarity — cost implications should be a conscious opt-in decision. + +### Thread Management + +Follows existing codebase patterns: + +- **Shared `ScheduledExecutorService`** per connection (like `TelemetryClient`) +- **Daemon threads** (won't prevent JVM shutdown) +- **Named thread factory**: `databricks-jdbc-heartbeat-{connectionId}` +- **Single thread**: heartbeats are lightweight; one thread handles all active result sets for a connection +- **Cleanup order**: ResultSet.close() → stopHeartbeat → (if last) scheduler stays alive for reuse → Connection.close() → scheduler.shutdown() + +### Cleanup Guarantees (Zero Leak) + +The heartbeat is stopped in **four** places to guarantee no leaks: + +1. **All results consumed**: `ResultSet.next()` returns false → stop +2. **ResultSet.close()**: explicit user close → stop +3. 
**Statement.close()**: closes ResultSet → triggers #2 → stop
+4. **Connection.close()**: `heartbeatManager.shutdown()` → cancels ALL heartbeats
+
+Additionally:
+- If the heartbeat RPC itself fails (server gone, operation expired), the heartbeat self-stops
+- Daemon threads ensure JVM can exit even if cleanup is missed
+- `ScheduledFuture.cancel(false)` is used (returns immediately without interrupting an in-flight heartbeat; the current poll completes and no further polls are scheduled)
+
+### Error Handling
+
+| Scenario | Behavior |
+|----------|----------|
+| Heartbeat RPC returns CLOSED/ERROR state | Stop heartbeat, log at INFO level |
+| Heartbeat RPC fails (network error) | Increment consecutive-failure counter; after 10 consecutive failures, stop heartbeat and log a warning (single success resets the counter) |
+| Heartbeat RPC returns success | Reset failure counter; continue scheduling next heartbeat |
+| ResultSet.close() called during heartbeat RPC | `cancel(false)` lets the in-flight RPC complete; no further polls are scheduled |
+| Connection.close() with active heartbeats | `heartbeatManager.shutdown()` cancels all heartbeats |
+
+---
+
+## Implementation Plan
+
+### Phase 1: Core Infrastructure
+
+1. **`ResultHeartbeatManager`**: Shared per-connection manager with start/stop/shutdown
+2. **`HeartbeatTask` interface**: Protocol-specific implementations for SEA and Thrift
+3. **Connection parameters**: `EnableHeartbeat`, `HeartbeatIntervalSeconds`
+4. **Integration points**: Hook into `DatabricksResultSet` constructor, `close()`, and `next()` return-false path
+
+### Phase 2: Protocol Implementation
+
+5. **SEA heartbeat**: `DatabricksSdkClient.getStatementStatus(statementId)` — lightweight GET
+6. **Thrift heartbeat**: `DatabricksThriftAccessor.getOperationStatus(operationHandle)` — lightweight RPC
+7. **Direct results skip**: No heartbeat when `directResultsReceived` is true
+
+### Phase 3: Testing
+
+8. **Unit tests**: Verify lifecycle (start, stop on close, stop on all-consumed, stop on error)
+9. **Integration tests**: Verify heartbeat keeps results alive across a simulated pause
+10.
**Leak tests**: Verify no threads leak after ResultSet/Statement/Connection close + +### Phase 4: Documentation + +11. **Connection parameter docs**: Document `EnableHeartbeat` and `HeartbeatIntervalSeconds` +12. **Cost implications**: Document that heartbeats keep the warehouse running + +--- + +## Alternatives Considered + +### 1. Eager Full Prefetch to Client Memory + +Fetch all results into client memory immediately after execution, eliminating server dependency. + +**Rejected**: Would cause OOM for large result sets. The existing sliding-window prefetch is the right balance. + +### 2. Result Materialization to Client Disk + +Write results to local disk as they arrive, serving reads from disk. + +**Rejected**: Adds significant complexity (disk management, temp file cleanup, serialization format). No industry precedent in JDBC drivers. + +### 3. Force Cloud Fetch for All Queries + +Configure the server to always upload results to cloud storage, even for small queries. + +**Rejected**: Server-side change outside driver's control. Would add latency for small queries. + +### 4. Heartbeat at HTTP Connection Level + +Use TCP keepalive or HTTP-level ping to keep the connection alive. + +**Rejected**: HTTP keepalive keeps the TCP connection alive, but does NOT keep the server-side operation/statement alive. The problem is application-layer resource expiry, not connection expiry. + +--- + +## Open Questions + +1. **Does `GetOperationStatus` reset the Thrift idle operation timer?** The ADBC C# driver uses `GetOperationStatus` for its heartbeat and it works in production (customers confirmed). Need to verify with the server team whether this is guaranteed behavior or implementation-dependent. + +2. **Should heartbeat be enabled by default for inline results?** Since inline results are the most vulnerable and the feature is explicitly designed for this case, it might make sense to auto-enable when the result type is inline. But this increases cost without user opt-in. 
The ADBC C# driver always enables it (no toggle) — we chose to make it opt-in. + +3. **SEA heartbeat included (unlike ADBC C#).** The ADBC C# driver only implements Thrift heartbeat. We include SEA heartbeat via `GetStatementStatus` because the API documentation explicitly requires polling every 15 minutes to keep the statement alive. The 1-hour result retention is a hard limit that heartbeat cannot extend, but keeping the statement alive is still valuable for URL refresh and metadata access. Our default 60s interval satisfies the 15-minute requirement with ample margin. + +4. **Should we expose a callback for heartbeat failures?** When the heartbeat detects that results have expired (terminal state from server), should we proactively set an error flag so the next `ResultSet.next()` throws a clear error instead of waiting for the fetch to fail with a cryptic server error? + +5. **Alignment with ADBC C# driver parameters.** The ADBC C# driver uses `adbc.databricks.fetch_heartbeat_interval`. Should we align our JDBC parameter names for cross-driver consistency (e.g., `FetchHeartbeatIntervalSeconds`) or use our own convention (`HeartbeatIntervalSeconds`)? From 613030431122eeb8ff7d91c45ecf350cdf907cc9 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 09:50:56 +0530 Subject: [PATCH 02/23] Implement result set heartbeat / keep-alive (PECOBLR-2321) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add periodic heartbeat polling to keep server-side result state alive while the client consumes results slowly. Prevents warehouse auto-stop from destroying in-progress results. New files: - ResultHeartbeatManager: per-connection manager with shared ScheduledExecutorService (daemon thread). Manages start/stop/shutdown lifecycle for heartbeats across all statements. - ResultHeartbeatManagerTest: 7 unit tests covering lifecycle, idempotency, interval, shutdown, re-execution. 
Connection parameters: - EnableHeartbeat (default 0): opt-in to enable heartbeat polling - HeartbeatIntervalSeconds (default 60): polling interval Protocol support: - SEA: GET /sql/statements/{id} via checkStatementAlive() - Thrift: GetOperationStatus via checkStatementAlive() Heartbeat eligibility (skipped when not needed): - SEA inline (InlineJsonResult): all data in memory, no server state - Update count / metadata results: no data to keep alive - Direct results: server already closed the operation - Null execution result: nothing to fetch Error resilience: - 10 consecutive failures before self-stop (transient error tolerance) - Single success resets failure counter - Terminal states (CLOSED/ERROR/CANCELED/TIMEDOUT) stop heartbeat Cleanup guarantees (zero leak): - next() returns false → stop - ResultSet.close() → stop - Statement.close() → safety net stop - Connection.close() → shutdown all Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksConnection.java | 22 ++++ .../api/impl/DatabricksConnectionContext.java | 8 ++ .../jdbc/api/impl/DatabricksResultSet.java | 92 ++++++++++++++ .../jdbc/api/impl/DatabricksStatement.java | 7 ++ .../jdbc/api/impl/ResultHeartbeatManager.java | 119 ++++++++++++++++++ .../jdbc/common/DatabricksJdbcUrlParams.java | 10 +- .../jdbc/dbclient/IDatabricksClient.java | 11 ++ .../impl/sqlexec/DatabricksSdkClient.java | 21 ++++ .../impl/thrift/DatabricksThriftAccessor.java | 3 +- .../thrift/DatabricksThriftServiceClient.java | 24 ++++ .../api/impl/ResultHeartbeatManagerTest.java | 117 +++++++++++++++++ 11 files changed, 432 insertions(+), 2 deletions(-) create mode 100644 src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java create mode 100644 src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java 
index 691bcc3ff..e758f1b53 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java @@ -38,6 +38,7 @@ public class DatabricksConnection implements IDatabricksConnection, IDatabricksC private final Set statementSet = ConcurrentHashMap.newKeySet(); private SQLWarning warnings = null; private final IDatabricksConnectionContext connectionContext; + private final ResultHeartbeatManager heartbeatManager; /** * Creates an instance of Databricks connection for given connection context. @@ -49,6 +50,7 @@ public DatabricksConnection(IDatabricksConnectionContext connectionContext) this.connectionContext = connectionContext; DatabricksThreadContextHolder.setConnectionContext(connectionContext); this.session = new DatabricksSession(connectionContext); + this.heartbeatManager = createHeartbeatManager(connectionContext); } @VisibleForTesting @@ -58,10 +60,27 @@ public DatabricksConnection( this.connectionContext = connectionContext; DatabricksThreadContextHolder.setConnectionContext(connectionContext); this.session = new DatabricksSession(connectionContext, testDatabricksClient); + this.heartbeatManager = createHeartbeatManager(connectionContext); UserAgentManager.setUserAgent(connectionContext); TelemetryHelper.updateTelemetryAppName(connectionContext, null); } + private static ResultHeartbeatManager createHeartbeatManager( + IDatabricksConnectionContext connectionContext) { + if (connectionContext instanceof DatabricksConnectionContext) { + DatabricksConnectionContext ctx = (DatabricksConnectionContext) connectionContext; + if (ctx.isHeartbeatEnabled()) { + return new ResultHeartbeatManager(ctx.getHeartbeatIntervalSeconds()); + } + } + return null; + } + + /** Returns the heartbeat manager, or null if heartbeat is disabled. 
*/ + ResultHeartbeatManager getHeartbeatManager() { + return heartbeatManager; + } + @Override public void open() throws SQLException { this.session.open(); @@ -420,6 +439,9 @@ public void close() throws SQLException { statement.close(false); statementSet.remove(statement); } + if (heartbeatManager != null) { + heartbeatManager.shutdown(); + } this.session.close(); TelemetryClientFactory.getInstance().closeTelemetryClient(connectionContext); DatabricksClientConfiguratorManager.getInstance().removeInstance(connectionContext); diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java index 2852fa674..1d590ba93 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java @@ -908,6 +908,14 @@ public boolean isTelemetryEnabled() { return getParameter(DatabricksJdbcUrlParams.ENABLE_TELEMETRY).equals("1"); } + public boolean isHeartbeatEnabled() { + return getParameter(DatabricksJdbcUrlParams.ENABLE_HEARTBEAT).equals("1"); + } + + public int getHeartbeatIntervalSeconds() { + return Integer.parseInt(getParameter(DatabricksJdbcUrlParams.HEARTBEAT_INTERVAL_SECONDS)); + } + @Override public String getVolumeOperationAllowedPaths() { return getParameter( diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index f9a73208e..e2461ef0c 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -21,6 +21,7 @@ import com.databricks.jdbc.common.Nullable; import com.databricks.jdbc.common.StatementType; import com.databricks.jdbc.common.util.WarningUtil; +import com.databricks.jdbc.dbclient.IDatabricksClient; import com.databricks.jdbc.dbclient.impl.common.StatementId; import 
com.databricks.jdbc.exception.DatabricksParsingException; import com.databricks.jdbc.exception.DatabricksSQLException; @@ -123,6 +124,7 @@ public DatabricksResultSet( this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement); this.isClosed = false; this.wasNull = false; + startHeartbeatIfEnabled(); } @VisibleForTesting @@ -283,11 +285,15 @@ public boolean next() throws SQLException { cachedTelemetryCollector.recordResultSetIteration( statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), hasNext); } + if (!hasNext) { + stopHeartbeat(); + } return hasNext; } @Override public void close() throws DatabricksSQLException { + stopHeartbeat(); isClosed = true; this.executionResult.close(); if (parentStatement != null) { @@ -295,6 +301,92 @@ public void close() throws DatabricksSQLException { } } + /** Starts heartbeat polling if enabled on the connection and this result set is eligible. */ + private void startHeartbeatIfEnabled() { + if (parentStatement == null || statementId == null) { + return; + } + + // Skip heartbeat for result types where all data is already client-side: + // - SEA_INLINE (InlineJsonResult): all rows loaded in memory at construction + // - Update count results: no result data to keep alive + // - No execution result: nothing to fetch + if (resultSetType == ResultSetType.SEA_INLINE || executionResult == null) { + return; + } + + // Skip if this is an update count (no result rows) + if (statementType == StatementType.UPDATE || statementType == StatementType.METADATA) { + return; + } + + try { + DatabricksConnection conn = + (DatabricksConnection) parentStatement.getStatement().getConnection(); + ResultHeartbeatManager mgr = conn.getHeartbeatManager(); + if (mgr == null) { + return; // heartbeat not enabled + } + + IDatabricksClient client = conn.getSession().getDatabricksClient(); + final int maxConsecutiveFailures = 10; + + Runnable heartbeatTask = + new Runnable() { + private int consecutiveFailures = 0; + + 
@Override + public void run() { + try { + boolean alive = client.checkStatementAlive(statementId); + consecutiveFailures = 0; // reset on success + if (!alive) { + LOGGER.info( + "Heartbeat detected terminal state for statement {}, stopping", statementId); + stopHeartbeat(); + } + } catch (Exception e) { + consecutiveFailures++; + LOGGER.debug( + "Heartbeat failed for statement {} (failure {}/{}): {}", + statementId, + consecutiveFailures, + maxConsecutiveFailures, + e.getMessage()); + if (consecutiveFailures >= maxConsecutiveFailures) { + LOGGER.info( + "Heartbeat stopped for statement {} after {} consecutive failures", + statementId, + consecutiveFailures); + stopHeartbeat(); + } + } + } + }; + + mgr.startHeartbeat(statementId, heartbeatTask); + } catch (Exception e) { + LOGGER.debug("Failed to start heartbeat: {}", e.getMessage()); + } + } + + /** Stops the heartbeat for this result set's statement. Idempotent. */ + private void stopHeartbeat() { + if (parentStatement == null || statementId == null) { + return; + } + try { + DatabricksConnection conn = + (DatabricksConnection) parentStatement.getStatement().getConnection(); + ResultHeartbeatManager mgr = conn.getHeartbeatManager(); + if (mgr != null) { + mgr.stopHeartbeat(statementId); + } + } catch (Exception e) { + LOGGER.debug("Failed to stop heartbeat: {}", e.getMessage()); + } + } + private static TelemetryCollector resolveTelemetryCollector( IDatabricksStatementInternal parentStatement) { try { diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java index c9b6733e7..a4ee5a8d7 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java @@ -172,6 +172,13 @@ public void close(boolean removeFromSession) throws DatabricksSQLException { this.connection.closeStatement(this); } 
DatabricksThreadContextHolder.clearStatementInfo(); + // Safety net: stop any heartbeat for this statement + if (statementId != null) { + ResultHeartbeatManager mgr = connection.getHeartbeatManager(); + if (mgr != null) { + mgr.stopHeartbeat(statementId); + } + } shutDownExecutor(); this.updateCount = -1; this.isClosed = true; diff --git a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java new file mode 100644 index 000000000..cc8f63270 --- /dev/null +++ b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java @@ -0,0 +1,119 @@ +package com.databricks.jdbc.api.impl; + +import com.databricks.jdbc.dbclient.impl.common.StatementId; +import com.databricks.jdbc.log.JdbcLogger; +import com.databricks.jdbc.log.JdbcLoggerFactory; +import java.util.Map; +import java.util.concurrent.*; + +/** + * Manages periodic heartbeat tasks to keep server-side result state alive while the client consumes + * results slowly. One instance per connection, shared across all statements. + * + *

<p>Each active result set can register a heartbeat task that periodically calls + * GetStatementStatus (SEA) or GetOperationStatus (Thrift) to signal the server that the client is + * still consuming results. This prevents premature operation expiry and warehouse auto-stop. + * + * <p>Heartbeats are stopped when: + * + * <ul> + *   <li>All results are consumed (ResultSet.next() returns false) + *   <li>ResultSet.close() is called + *   <li>Statement.close() is called (safety net) + *   <li>Connection.close() calls shutdown() + *   <li>The heartbeat task itself detects a terminal state or max consecutive failures + * </ul>
+ */ +class ResultHeartbeatManager { + + private static final JdbcLogger LOGGER = + JdbcLoggerFactory.getLogger(ResultHeartbeatManager.class); + + private final ScheduledExecutorService scheduler; + private final Map<StatementId, ScheduledFuture<?>> activeHeartbeats = new ConcurrentHashMap<>(); + private final int intervalSeconds; + private volatile boolean isShutdown = false; + + ResultHeartbeatManager(int intervalSeconds) { + this.intervalSeconds = intervalSeconds; + this.scheduler = + Executors.newSingleThreadScheduledExecutor( + r -> { + Thread t = new Thread(r, "databricks-jdbc-heartbeat"); + t.setDaemon(true); + return t; + }); + } + + /** + * Starts a periodic heartbeat for the given statement. The task runs every {@code + * intervalSeconds} after an initial delay equal to the interval. + * + * @param statementId the statement to keep alive + * @param heartbeatTask the task that sends the heartbeat RPC. Must handle its own exceptions. + */ + void startHeartbeat(StatementId statementId, Runnable heartbeatTask) { + if (isShutdown || statementId == null) { + return; + } + + // Stop any existing heartbeat for this statement (e.g., re-execution) + stopHeartbeat(statementId); + + LOGGER.debug( + "Starting heartbeat for statement {} with interval {}s", statementId, intervalSeconds); + + ScheduledFuture<?> future = + scheduler.scheduleAtFixedRate( + heartbeatTask, intervalSeconds, intervalSeconds, TimeUnit.SECONDS); + activeHeartbeats.put(statementId, future); + } + + /** + * Stops the heartbeat for the given statement. Idempotent — safe to call multiple times or for + * statements that have no active heartbeat. + */ + void stopHeartbeat(StatementId statementId) { + if (statementId == null) { + return; + } + + ScheduledFuture<?> future = activeHeartbeats.remove(statementId); + if (future != null) { + future.cancel(false); // don't interrupt if currently running + LOGGER.debug("Stopped heartbeat for statement {}", statementId); + } + } + + /** Stops all heartbeats and shuts down the scheduler. 
Called on Connection.close(). */ + void shutdown() { + isShutdown = true; + + for (Map.Entry<StatementId, ScheduledFuture<?>> entry : activeHeartbeats.entrySet()) { + entry.getValue().cancel(false); + LOGGER.debug("Stopped heartbeat for statement {} during shutdown", entry.getKey()); + } + activeHeartbeats.clear(); + + scheduler.shutdown(); + try { + if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) { + scheduler.shutdownNow(); + } + } catch (InterruptedException e) { + scheduler.shutdownNow(); + Thread.currentThread().interrupt(); + } + + LOGGER.debug("Heartbeat manager shut down"); + } + + /** Returns the number of active heartbeats. Visible for testing. */ + int getActiveHeartbeatCount() { + return activeHeartbeats.size(); + } + + boolean isShutdown() { + return isShutdown; + } +} diff --git a/src/main/java/com/databricks/jdbc/common/DatabricksJdbcUrlParams.java b/src/main/java/com/databricks/jdbc/common/DatabricksJdbcUrlParams.java index 446fa404a..5d19972d8 100644 --- a/src/main/java/com/databricks/jdbc/common/DatabricksJdbcUrlParams.java +++ b/src/main/java/com/databricks/jdbc/common/DatabricksJdbcUrlParams.java @@ -233,7 +233,15 @@ public enum DatabricksJdbcUrlParams { NON_ROWCOUNT_QUERY_PREFIXES( "NonRowcountQueryPrefixes", "Comma-separated list of query prefixes (like INSERT,UPDATE,DELETE) that should return result sets instead of row counts", - ""); + ""), + ENABLE_HEARTBEAT( + "EnableHeartbeat", + "Enable periodic heartbeat polling to keep server-side results alive during slow consumption", + "0"), + HEARTBEAT_INTERVAL_SECONDS( + "HeartbeatIntervalSeconds", + "Interval in seconds between heartbeat RPCs to keep results alive", + "60"); private final String paramName; private final String defaultValue; diff --git a/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java b/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java index 1c990777d..25b70d678 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java +++ 
b/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java @@ -106,6 +106,17 @@ DatabricksResultSet executeStatementAsync( @DatabricksMetricsTimed void cancelStatement(StatementId statementId) throws DatabricksSQLException; + /** + * Checks the status of a statement without fetching results. Used for heartbeat polling to keep + * server-side operation state alive during slow result consumption. + * + * @param statementId statement to check status for + * @return true if the statement is still in a non-terminal state (alive), false if terminal + */ + default boolean checkStatementAlive(StatementId statementId) throws SQLException { + return false; // default no-op for clients that don't support heartbeat + } + /** * Fetches result for underlying statement-Id * diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java index f97e47327..db7db0948 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java @@ -404,6 +404,27 @@ public DatabricksResultSet executeStatementAsync( parentStatement); } + @Override + public boolean checkStatementAlive(StatementId typedStatementId) throws SQLException { + String statementId = typedStatementId.toSQLExecStatementId(); + String getStatusPath = String.format(STATEMENT_PATH_WITH_ID, statementId); + try { + GetStatementRequest request = new GetStatementRequest().setStatementId(statementId); + Request req = new Request(Request.GET, getStatusPath, apiClient.serialize(request)); + req.withHeaders(getHeaders("getStatement")); + GetStatementResponse response = apiClient.execute(req, GetStatementResponse.class); + StatementState state = response.getStatus().getState(); + // Terminal states mean the operation is no longer alive + return state != StatementState.CANCELED + && state != 
StatementState.CLOSED + && state != StatementState.FAILED; + } catch (IOException e) { + LOGGER.debug("Heartbeat check failed for statement {}: {}", statementId, e.getMessage()); + throw new DatabricksSQLException( + "Heartbeat status check failed", e, DatabricksDriverErrorCode.SDK_CLIENT_ERROR); + } + } + @Override public DatabricksResultSet getStatementResult( StatementId typedStatementId, diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java b/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java index c816fbdde..789215057 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftAccessor.java @@ -907,7 +907,8 @@ private TimeoutHandler getTimeoutHandler( internalErrorCode); } - private TGetOperationStatusResp getOperationStatus( + // Package-visible for heartbeat access from DatabricksThriftServiceClient + TGetOperationStatusResp getOperationStatus( TGetOperationStatusReq statusReq, StatementId statementId) throws TException { long operationStatusStartTime = System.nanoTime(); TGetOperationStatusResp operationStatus = getThriftClient().GetOperationStatus(statusReq); diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java b/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java index 0057c7faa..0eb0c1179 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java @@ -42,6 +42,7 @@ import java.sql.SQLException; import java.util.*; import java.util.stream.Collectors; +import org.apache.thrift.TException; public class DatabricksThriftServiceClient implements IDatabricksClient, IDatabricksMetadataClient { @@ -269,6 +270,29 @@ private TExecuteStatementReq 
getRequest( return request; } + @Override + public boolean checkStatementAlive(StatementId statementId) throws DatabricksSQLException { + LOGGER.debug("Heartbeat check for statement {} using Thrift client", statementId); + DatabricksThreadContextHolder.setStatementId(statementId); + try { + TGetOperationStatusReq statusReq = + new TGetOperationStatusReq() + .setOperationHandle(getOperationHandle(statementId)) + .setGetProgressUpdate(false); + TGetOperationStatusResp resp = thriftAccessor.getOperationStatus(statusReq, statementId); + TOperationState state = resp.getOperationState(); + // Terminal states mean the operation is no longer alive + return state != TOperationState.CANCELED_STATE + && state != TOperationState.CLOSED_STATE + && state != TOperationState.ERROR_STATE + && state != TOperationState.TIMEDOUT_STATE; + } catch (TException e) { + LOGGER.debug("Heartbeat check failed for statement {}: {}", statementId, e.getMessage()); + throw new DatabricksSQLException( + "Heartbeat status check failed", e, DatabricksDriverErrorCode.INVALID_STATE); + } + } + @Override public void closeStatement(StatementId statementId) throws DatabricksSQLException { LOGGER.debug( diff --git a/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java b/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java new file mode 100644 index 000000000..86d3441e9 --- /dev/null +++ b/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java @@ -0,0 +1,117 @@ +package com.databricks.jdbc.api.impl; + +import static org.junit.jupiter.api.Assertions.*; + +import com.databricks.jdbc.dbclient.impl.common.StatementId; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.Test; + +public class ResultHeartbeatManagerTest { + + private ResultHeartbeatManager manager; + + @AfterEach + void tearDown() { 
+ if (manager != null && !manager.isShutdown()) { + manager.shutdown(); + } + } + + @Test + void testStartAndStopHeartbeat() throws Exception { + manager = new ResultHeartbeatManager(1); + StatementId id = new StatementId("test-stmt-1"); + AtomicInteger counter = new AtomicInteger(0); + + manager.startHeartbeat(id, counter::incrementAndGet); + assertEquals(1, manager.getActiveHeartbeatCount()); + + // Wait for at least one execution + Thread.sleep(1500); + assertTrue(counter.get() >= 1, "Heartbeat should have executed at least once"); + + manager.stopHeartbeat(id); + assertEquals(0, manager.getActiveHeartbeatCount()); + + int countAfterStop = counter.get(); + Thread.sleep(1500); + assertEquals(countAfterStop, counter.get(), "No more heartbeats after stop"); + } + + @Test + void testStopIsIdempotent() { + manager = new ResultHeartbeatManager(60); + StatementId id = new StatementId("test-stmt-2"); + + manager.startHeartbeat(id, () -> {}); + manager.stopHeartbeat(id); + assertDoesNotThrow(() -> manager.stopHeartbeat(id)); + assertDoesNotThrow(() -> manager.stopHeartbeat(new StatementId("nonexistent"))); + assertDoesNotThrow(() -> manager.stopHeartbeat(null)); + } + + @Test + void testShutdownCancelsAll() { + manager = new ResultHeartbeatManager(60); + manager.startHeartbeat(new StatementId("a"), () -> {}); + manager.startHeartbeat(new StatementId("b"), () -> {}); + manager.startHeartbeat(new StatementId("c"), () -> {}); + assertEquals(3, manager.getActiveHeartbeatCount()); + + manager.shutdown(); + assertEquals(0, manager.getActiveHeartbeatCount()); + assertTrue(manager.isShutdown()); + } + + @Test + void testStartAfterShutdownIsNoOp() { + manager = new ResultHeartbeatManager(60); + manager.shutdown(); + + manager.startHeartbeat(new StatementId("late"), () -> {}); + assertEquals(0, manager.getActiveHeartbeatCount()); + } + + @Test + void testReExecutionReplacesHeartbeat() throws Exception { + manager = new ResultHeartbeatManager(1); + StatementId id = new 
StatementId("reuse"); + AtomicInteger firstCounter = new AtomicInteger(0); + AtomicInteger secondCounter = new AtomicInteger(0); + + manager.startHeartbeat(id, firstCounter::incrementAndGet); + Thread.sleep(1500); + assertTrue(firstCounter.get() >= 1); + + // Re-start with new task (simulates re-execution) + manager.startHeartbeat(id, secondCounter::incrementAndGet); + assertEquals(1, manager.getActiveHeartbeatCount()); + + int firstCountAtReplace = firstCounter.get(); + Thread.sleep(1500); + assertTrue(secondCounter.get() >= 1, "New heartbeat should execute"); + assertEquals(firstCountAtReplace, firstCounter.get(), "Old heartbeat should no longer execute"); + } + + @Test + void testHeartbeatExecutesAtInterval() throws Exception { + manager = new ResultHeartbeatManager(1); + CountDownLatch latch = new CountDownLatch(3); + + manager.startHeartbeat(new StatementId("interval"), () -> latch.countDown()); + + boolean completed = latch.await(5, TimeUnit.SECONDS); + assertTrue(completed, "Heartbeat should have executed 3 times within 5 seconds"); + } + + @Test + void testNullStatementIdHandled() { + manager = new ResultHeartbeatManager(60); + assertDoesNotThrow(() -> manager.startHeartbeat(null, () -> {})); + assertDoesNotThrow(() -> manager.stopHeartbeat(null)); + assertEquals(0, manager.getActiveHeartbeatCount()); + } +} From d3f0c47d22fc1c2ace5e9044209d4e47f132a8c9 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 09:52:16 +0530 Subject: [PATCH 03/23] Fix: don't exclude metadata results from heartbeat eligibility Metadata operations (getTables, getColumns, etc.) can return large result sets that may need heartbeat. Only UPDATE statements are excluded (they return update count, no data to keep alive). 
Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../com/databricks/jdbc/api/impl/DatabricksResultSet.java | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index e2461ef0c..9e5b109b9 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -315,8 +315,8 @@ private void startHeartbeatIfEnabled() { return; } - // Skip if this is an update count (no result rows) - if (statementType == StatementType.UPDATE || statementType == StatementType.METADATA) { + // Skip if this is an update count (no result rows to keep alive) + if (statementType == StatementType.UPDATE) { return; } From cedf1445f833fb7504b3f970123773e2f5c9ddfc Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 09:56:08 +0530 Subject: [PATCH 04/23] Skip heartbeat for direct results (CLOSED state) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Direct results mean the server already closed the operation and delivered all data inline. No heartbeat needed — detect via ExecutionState.CLOSED in the constructor rather than waiting for the first poll to discover it. 
Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../databricks/jdbc/api/impl/DatabricksResultSet.java | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 9e5b109b9..9154d1d97 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -307,18 +307,21 @@ private void startHeartbeatIfEnabled() { return; } - // Skip heartbeat for result types where all data is already client-side: + // Skip heartbeat when all data is already client-side or server already closed: // - SEA_INLINE (InlineJsonResult): all rows loaded in memory at construction - // - Update count results: no result data to keep alive // - No execution result: nothing to fetch + // - Update count: no result data to keep alive + // - Direct results (CLOSED state): server already closed the operation, data delivered inline if (resultSetType == ResultSetType.SEA_INLINE || executionResult == null) { return; } - - // Skip if this is an update count (no result rows to keep alive) if (statementType == StatementType.UPDATE) { return; } + if (executionStatus != null + && executionStatus.getExecutionState() == com.databricks.jdbc.api.ExecutionState.CLOSED) { + return; // direct results — server already closed, no heartbeat needed + } try { DatabricksConnection conn = From 5d9aadc523faef36f1ae2b7cfb06b76d5f17aa89 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 10:10:38 +0530 Subject: [PATCH 05/23] Document async execution heartbeat policy + update eligibility table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - No heartbeat during executeAsync() wait — user controls polling via getExecutionResult(). 
Heartbeat starts only when ResultSet is constructed and user begins consuming results. - Updated eligibility table: SEA inline doesn't need heartbeat (all data in memory), added update count row. - Added execution phase vs consumption phase diagram. Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- docs/design/HEARTBEAT_KEEP_ALIVE.md | 43 ++++++++++++++++--- .../jdbc/api/impl/DatabricksStatement.java | 2 + 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/docs/design/HEARTBEAT_KEEP_ALIVE.md b/docs/design/HEARTBEAT_KEEP_ALIVE.md index 5bd497763..dfd4d7944 100644 --- a/docs/design/HEARTBEAT_KEEP_ALIVE.md +++ b/docs/design/HEARTBEAT_KEEP_ALIVE.md @@ -339,10 +339,42 @@ The task: 4. If the RPC fails (connection error, timeout), increments a consecutive failure counter. After **10 consecutive failures** (~10 minutes at default interval), stops the heartbeat. A single success resets the counter. (Learned from ADBC C# PR #372 — a single transient error like TLS recycling should not permanently kill the heartbeat) 5. Does NOT throw exceptions — failures are logged, not propagated to the user's thread +### When Heartbeat Starts and Stops + +The heartbeat only starts **after** the main thread's execution polling completes and the ResultSet is constructed. During query execution, the driver's own polling loop (200ms interval) keeps the operation alive — the heartbeat is not needed and does not run during this phase. + +``` +Execution phase (driver polls): Result consumption phase (heartbeat polls): + executeQuery() rs = getResultSet() + → client.executeStatement() heartbeat starts ──────────────┐ + → poll every 200ms ──────────┐ │ + ← SUCCEEDED │ rs.next() ... pause ... │ heartbeat 60s + ← ResultSet created ───────────┘ rs.next() ... pause ... │ heartbeat 60s + rs.close() ───────────────────┘ heartbeat stops +``` + +### Async Execution: No Heartbeat During Wait + +For `executeAsync()`, the user controls polling via `getExecutionResult()`. 
The heartbeat does **not** run between `executeAsync()` and `getExecutionResult()` because: + +1. The user opted into async precisely to control timing +2. Automatic heartbeat would keep the warehouse alive even if the user abandoned the query +3. The user has the poll interface — they are responsible for calling `getExecutionResult()` before the statement expires + +The heartbeat starts only when the ResultSet is available and the user begins consuming results: + +``` +stmt.executeAsync(sql) ← returns immediately, no heartbeat + ... user does other work ... ← no heartbeat (user's responsibility to poll) +rs = stmt.getExecutionResult() ← ResultSet created, heartbeat starts +rs.next() ... pause ... ← heartbeat keeps alive +rs.close() ← heartbeat stops +``` + ### Lifecycle ``` -Statement.execute() +Statement.execute() / getExecutionResult() └─▶ ResultSet created └─▶ If heartbeat enabled AND result type is eligible: startHeartbeat(statementId, heartbeatTask) @@ -364,11 +396,12 @@ Connection.close() | Result Type | Cloud Persisted? | Needs Heartbeat? | Reason | |-------------|-----------------|-------------------|--------| -| SEA inline (JSON) | No | **Yes** | Data only on cluster; lost on cluster stop | -| SEA cloud fetch (Arrow) | Yes | **Optional** | Data in cloud; survives cluster stop. 
But statement must stay alive for URL refresh | -| Thrift inline (columnar) | No | **Yes** | Data only on cluster | -| Thrift cloud fetch | Yes | **Optional** | Same as SEA cloud fetch | +| SEA inline (JSON) | No | **No** | All data loaded into memory at construction (InlineJsonResult) | +| SEA cloud fetch (Arrow) | Yes | **Yes** | Statement must stay alive for URL refresh | +| Thrift inline (columnar) | No | **Yes** | Data fetched on-demand from server; server can evict | +| Thrift cloud fetch | Yes | **Yes** | Operation handle must stay alive for URL refresh | | Direct results (CLOSED state) | N/A | **No** | Server already closed the operation; data fully delivered | +| Update count (DML) | N/A | **No** | No result rows; execution polling already kept it alive | ### Configuration diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java index a4ee5a8d7..fbad59722 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java @@ -679,6 +679,8 @@ public ResultSet executeAsync(String sql) throws SQLException { LOGGER.debug("ResultSet executeAsync() for statement {%s}", sql); checkIfClosed(); + // No heartbeat during async wait — the user controls polling via getExecutionResult(). + // Heartbeat starts later when the ResultSet is constructed (after getExecutionResult()). 
resetForNewExecution(); IDatabricksClient client = connection.getSession().getDatabricksClient(); From fa923479b8e6ad2a177bda67dedf779b69b09b67 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 10:13:41 +0530 Subject: [PATCH 06/23] Add changelog entry for result set heartbeat feature Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- NEXT_CHANGELOG.md | 1 + 1 file changed, 1 insertion(+) diff --git a/NEXT_CHANGELOG.md b/NEXT_CHANGELOG.md index bc2e82ab7..bd5790e70 100644 --- a/NEXT_CHANGELOG.md +++ b/NEXT_CHANGELOG.md @@ -3,6 +3,7 @@ ## [Unreleased] ### Added +- Added result set heartbeat / keep-alive to prevent server-side result expiry during slow consumption. When enabled via `EnableHeartbeat=1`, the driver periodically polls `GetStatementStatus` (SEA) or `GetOperationStatus` (Thrift) to keep the operation alive while the client reads results. Configurable interval via `HeartbeatIntervalSeconds` (default 60s). Heartbeat automatically stops when results are fully consumed, ResultSet is closed, or the server returns a terminal state. Disabled by default due to cost implications (heartbeats keep the warehouse running). - Added `CallableStatement` support with IN parameters. `Connection.prepareCall()` now returns a working `DatabricksCallableStatement` that supports positional parameter binding and execution via `{call proc(?)}` JDBC escape syntax. OUT/INOUT parameters and named parameters throw `SQLFeatureNotSupportedException`. - Added AI coding agent detection to the User-Agent header. When the driver is invoked by a known AI coding agent (e.g. Claude Code, Cursor, Gemini CLI), `agent/` is appended to the User-Agent string. 
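[Editor's note on the scheduling pattern used by the patches above — a standalone sketch using only the JDK. The class and method names here (`HeartbeatSketch`, `start`, `stop`) are simplified stand-ins for illustration, not the driver's actual API; the real manager keys on `StatementId` and schedules in seconds.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Minimal stand-in for the per-connection heartbeat manager (illustrative names only). */
class HeartbeatSketch {
  // Single daemon thread: one scheduler is shared by all statements on the connection,
  // and a daemon thread never keeps the JVM alive on its own.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(
          r -> {
            Thread t = new Thread(r, "heartbeat-sketch");
            t.setDaemon(true);
            return t;
          });
  private final Map<String, ScheduledFuture<?>> active = new ConcurrentHashMap<>();

  void start(String statementId, Runnable task, long intervalMillis) {
    stop(statementId); // re-execution replaces any existing heartbeat for this id
    active.put(
        statementId,
        scheduler.scheduleAtFixedRate(task, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS));
  }

  void stop(String statementId) {
    ScheduledFuture<?> f = active.remove(statementId);
    if (f != null) {
      f.cancel(false); // don't interrupt a heartbeat RPC that is currently in flight
    }
  }

  int activeCount() {
    return active.size();
  }

  void shutdown() {
    active.values().forEach(f -> f.cancel(false));
    active.clear();
    scheduler.shutdown();
  }
}
```

Enabling the real feature would then be a matter of appending `EnableHeartbeat=1;HeartbeatIntervalSeconds=120` to the JDBC URL, per the `DatabricksJdbcUrlParams` hunk above.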
From 291be6c0a63437a8596f20b94cc3cb4c038ea2bd Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 10:55:46 +0530 Subject: [PATCH 07/23] Add heartbeat eligibility tests + skip async PENDING/RUNNING MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extract isHeartbeatEligible() as package-visible method for testing. Add 10 tests covering all eligibility/ineligibility scenarios: Eligible (heartbeat starts): - SEA cloud fetch (Arrow) — statement alive for URL refresh - Thrift inline — data fetched on-demand, server can evict - Thrift cloud fetch (Arrow) — operation handle alive - Metadata queries — can return large result sets Ineligible (heartbeat skipped): - SEA inline (JSON) — all data in memory - Direct results (CLOSED) — server already closed - Update count (DML) — no result rows - Null execution result — nothing to fetch - Async PENDING — user controls polling - Async RUNNING — user controls polling Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksResultSet.java | 58 +++++-- .../ResultSetHeartbeatEligibilityTest.java | 160 ++++++++++++++++++ 2 files changed, 204 insertions(+), 14 deletions(-) create mode 100644 src/test/java/com/databricks/jdbc/api/impl/ResultSetHeartbeatEligibilityTest.java diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 9154d1d97..0987cc5bd 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -306,22 +306,9 @@ private void startHeartbeatIfEnabled() { if (parentStatement == null || statementId == null) { return; } - - // Skip heartbeat when all data is already client-side or server already closed: - // - SEA_INLINE (InlineJsonResult): all rows loaded in memory at construction - // - No execution result: nothing to fetch - // - 
Update count: no result data to keep alive - // - Direct results (CLOSED state): server already closed the operation, data delivered inline - if (resultSetType == ResultSetType.SEA_INLINE || executionResult == null) { - return; - } - if (statementType == StatementType.UPDATE) { + if (!isHeartbeatEligible()) { return; } - if (executionStatus != null - && executionStatus.getExecutionState() == com.databricks.jdbc.api.ExecutionState.CLOSED) { - return; // direct results — server already closed, no heartbeat needed - } try { DatabricksConnection conn = @@ -390,6 +377,49 @@ private void stopHeartbeat() { } } + /** + * Determines whether this result set is eligible for heartbeat polling. Package-visible for + * testing. + * + *

<p>Heartbeat is NOT needed when: + * + * <ul> + *   <li>No execution result (nothing to fetch, also covers async PENDING/RUNNING with no data) + *   <li>SEA inline (InlineJsonResult): all rows loaded in memory at construction + *   <li>Update count (DML): no result rows to keep alive + *   <li>Direct results (CLOSED state): server already closed, data fully delivered + *   <li>Async execution (PENDING/RUNNING): user controls polling via getExecutionResult() + * </ul>
+ */ + boolean isHeartbeatEligible() { + // No execution result — nothing to fetch + if (executionResult == null) { + return false; + } + // SEA inline — all data loaded in memory at construction + if (resultSetType == ResultSetType.SEA_INLINE) { + return false; + } + // Update count — no result rows + if (statementType == StatementType.UPDATE) { + return false; + } + // Check execution state + if (executionStatus != null) { + com.databricks.jdbc.api.ExecutionState state = executionStatus.getExecutionState(); + // Direct results — server already closed + if (state == com.databricks.jdbc.api.ExecutionState.CLOSED) { + return false; + } + // Async execution — user controls polling + if (state == com.databricks.jdbc.api.ExecutionState.PENDING + || state == com.databricks.jdbc.api.ExecutionState.RUNNING) { + return false; + } + } + return true; + } + private static TelemetryCollector resolveTelemetryCollector( IDatabricksStatementInternal parentStatement) { try { diff --git a/src/test/java/com/databricks/jdbc/api/impl/ResultSetHeartbeatEligibilityTest.java b/src/test/java/com/databricks/jdbc/api/impl/ResultSetHeartbeatEligibilityTest.java new file mode 100644 index 000000000..799c88c10 --- /dev/null +++ b/src/test/java/com/databricks/jdbc/api/impl/ResultSetHeartbeatEligibilityTest.java @@ -0,0 +1,160 @@ +package com.databricks.jdbc.api.impl; + +import static org.junit.jupiter.api.Assertions.*; +import static org.mockito.Mockito.*; + +import com.databricks.jdbc.common.StatementType; +import com.databricks.jdbc.dbclient.impl.common.StatementId; +import com.databricks.jdbc.model.core.StatementStatus; +import com.databricks.sdk.service.sql.StatementState; +import org.junit.jupiter.api.Test; + +/** + * Tests for heartbeat eligibility logic in DatabricksResultSet. Verifies that heartbeat is started + * only for result sets that need it and skipped for cases where all data is already client-side or + * the user controls polling. 
+ */ +public class ResultSetHeartbeatEligibilityTest { + + private DatabricksResultSet createResultSet( + StatementState state, + StatementType statementType, + DatabricksResultSet.ResultSetType resultSetType, + boolean hasExecutionResult) { + DatabricksResultSet rs = mock(DatabricksResultSet.class, CALLS_REAL_METHODS); + + // Set fields via reflection since they're private + try { + setField(rs, "executionStatus", new ExecutionStatus(new StatementStatus().setState(state))); + setField(rs, "statementType", statementType); + setField(rs, "resultSetType", resultSetType); + setField(rs, "executionResult", hasExecutionResult ? mock(IExecutionResult.class) : null); + setField(rs, "statementId", new StatementId("test-stmt")); + } catch (Exception e) { + throw new RuntimeException(e); + } + return rs; + } + + private void setField(Object obj, String fieldName, Object value) throws Exception { + java.lang.reflect.Field field = DatabricksResultSet.class.getDeclaredField(fieldName); + field.setAccessible(true); + field.set(obj, value); + } + + // === Eligible cases === + + @Test + void testSeaCloudFetchIsEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.QUERY, + DatabricksResultSet.ResultSetType.SEA_ARROW_ENABLED, + true); + assertTrue(rs.isHeartbeatEligible(), "SEA cloud fetch should be eligible"); + } + + @Test + void testThriftInlineIsEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.QUERY, + DatabricksResultSet.ResultSetType.THRIFT_INLINE, + true); + assertTrue(rs.isHeartbeatEligible(), "Thrift inline should be eligible"); + } + + @Test + void testThriftArrowIsEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.QUERY, + DatabricksResultSet.ResultSetType.THRIFT_ARROW_ENABLED, + true); + assertTrue(rs.isHeartbeatEligible(), "Thrift arrow should be eligible"); + } + + @Test + void testMetadataQueryIsEligible() { + 
DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.METADATA, + DatabricksResultSet.ResultSetType.THRIFT_INLINE, + true); + assertTrue(rs.isHeartbeatEligible(), "Metadata queries can have large results"); + } + + // === Ineligible cases === + + @Test + void testSeaInlineNotEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.QUERY, + DatabricksResultSet.ResultSetType.SEA_INLINE, + true); + assertFalse(rs.isHeartbeatEligible(), "SEA inline has all data in memory"); + } + + @Test + void testDirectResultsNotEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.CLOSED, + StatementType.QUERY, + DatabricksResultSet.ResultSetType.SEA_ARROW_ENABLED, + true); + assertFalse(rs.isHeartbeatEligible(), "Direct results — server already closed"); + } + + @Test + void testUpdateCountNotEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.UPDATE, + DatabricksResultSet.ResultSetType.UNASSIGNED, + true); + assertFalse(rs.isHeartbeatEligible(), "Update count has no result rows"); + } + + @Test + void testNullExecutionResultNotEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.SUCCEEDED, + StatementType.QUERY, + DatabricksResultSet.ResultSetType.UNASSIGNED, + false); + assertFalse(rs.isHeartbeatEligible(), "No execution result — nothing to fetch"); + } + + @Test + void testAsyncPendingNotEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.PENDING, + StatementType.SQL, + DatabricksResultSet.ResultSetType.UNASSIGNED, + true); + assertFalse( + rs.isHeartbeatEligible(), "Async PENDING — user controls polling via getExecutionResult"); + } + + @Test + void testAsyncRunningNotEligible() { + DatabricksResultSet rs = + createResultSet( + StatementState.RUNNING, + StatementType.SQL, + DatabricksResultSet.ResultSetType.UNASSIGNED, + true); + assertFalse( + rs.isHeartbeatEligible(), "Async 
RUNNING — user controls polling via getExecutionResult"); + } +} From fe84cc43e6bb810d08cb2c610d61c7fc92145fda Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 11:11:18 +0530 Subject: [PATCH 08/23] Fix thread-safety and robustness issues in heartbeat MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 1. consecutiveFailures → AtomicInteger (was plain int written by scheduler thread, no happens-before guarantee) 2. Stopped flag prevents RPC on closed client/session. stopHeartbeat sets AtomicBoolean flag BEFORE cancel(false). In-flight heartbeat tick checks flag before RPC, skips if set. Exceptions during shutdown don't count as consecutive failures. 3. Constructor leak: verified startHeartbeatIfEnabled() is already the last line of the constructor, after all throwing code. No change needed — already safe. 4. HeartbeatIntervalSeconds bounds check: reject <= 0 (use default 60), warn for > 3600 (heartbeat may not keep operation alive). Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../api/impl/DatabricksConnectionContext.java | 14 +++- .../jdbc/api/impl/DatabricksResultSet.java | 64 +++++++++++-------- .../jdbc/api/impl/ResultHeartbeatManager.java | 23 +++++++ 3 files changed, 73 insertions(+), 28 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java index 1d590ba93..0b2aa7f04 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnectionContext.java @@ -913,7 +913,19 @@ public boolean isHeartbeatEnabled() { } public int getHeartbeatIntervalSeconds() { - return Integer.parseInt(getParameter(DatabricksJdbcUrlParams.HEARTBEAT_INTERVAL_SECONDS)); + int interval = + Integer.parseInt(getParameter(DatabricksJdbcUrlParams.HEARTBEAT_INTERVAL_SECONDS)); + if (interval <= 0) { + 
LOGGER.warn("HeartbeatIntervalSeconds must be positive, got {}. Using default 60.", interval); + return 60; + } + if (interval > 3600) { + LOGGER.warn( + "HeartbeatIntervalSeconds {} is very large (> 1 hour). " + + "Heartbeat may not keep the operation alive.", + interval); + } + return interval; } @Override diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 0987cc5bd..04d0cc307 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -320,36 +320,46 @@ private void startHeartbeatIfEnabled() { IDatabricksClient client = conn.getSession().getDatabricksClient(); final int maxConsecutiveFailures = 10; + final java.util.concurrent.atomic.AtomicInteger consecutiveFailures = + new java.util.concurrent.atomic.AtomicInteger(0); + // Get the stopped flag from the manager — shared between the heartbeat task and + // stopHeartbeat(). Prevents RPC on a just-closed client/session: stopHeartbeat sets + // the flag before cancel(false), so an in-flight tick sees it and skips the RPC. 
+ final java.util.concurrent.atomic.AtomicBoolean stopped = mgr.getStoppedFlag(statementId); Runnable heartbeatTask = - new Runnable() { - private int consecutiveFailures = 0; - - @Override - public void run() { - try { - boolean alive = client.checkStatementAlive(statementId); - consecutiveFailures = 0; // reset on success - if (!alive) { - LOGGER.info( - "Heartbeat detected terminal state for statement {}, stopping", statementId); - stopHeartbeat(); - } - } catch (Exception e) { - consecutiveFailures++; - LOGGER.debug( - "Heartbeat failed for statement {} (failure {}/{}): {}", + () -> { + if (stopped.get()) { + return; // client/session may be closed, skip RPC + } + try { + boolean alive = client.checkStatementAlive(statementId); + consecutiveFailures.set(0); // reset on success + if (!alive) { + LOGGER.info( + "Heartbeat detected terminal state for statement {}, stopping", statementId); + stopped.set(true); + stopHeartbeat(); + } + } catch (Exception e) { + // If stopped was set during the RPC (connection closing), don't count as failure + if (stopped.get()) { + return; + } + int failures = consecutiveFailures.incrementAndGet(); + LOGGER.debug( + "Heartbeat failed for statement {} (failure {}/{}): {}", + statementId, + failures, + maxConsecutiveFailures, + e.getMessage()); + if (failures >= maxConsecutiveFailures) { + LOGGER.info( + "Heartbeat stopped for statement {} after {} consecutive failures", statementId, - consecutiveFailures, - maxConsecutiveFailures, - e.getMessage()); - if (consecutiveFailures >= maxConsecutiveFailures) { - LOGGER.info( - "Heartbeat stopped for statement {} after {} consecutive failures", - statementId, - consecutiveFailures); - stopHeartbeat(); - } + failures); + stopped.set(true); + stopHeartbeat(); } } }; diff --git a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java index cc8f63270..9cdbd4872 100644 --- 
a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java +++ b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java @@ -31,6 +31,8 @@ class ResultHeartbeatManager { private final ScheduledExecutorService scheduler; private final Map<StatementId, ScheduledFuture<?>> activeHeartbeats = new ConcurrentHashMap<>(); + private final Map<StatementId, java.util.concurrent.atomic.AtomicBoolean> stoppedFlags = + new ConcurrentHashMap<>(); private final int intervalSeconds; private volatile boolean isShutdown = false; @@ -78,6 +80,12 @@ void stopHeartbeat(StatementId statementId) { return; } + // Set the stopped flag BEFORE canceling — prevents in-flight RPC from calling a closed client + java.util.concurrent.atomic.AtomicBoolean flag = stoppedFlags.remove(statementId); + if (flag != null) { + flag.set(true); + } + ScheduledFuture<?> future = activeHeartbeats.remove(statementId); if (future != null) { future.cancel(false); // don't interrupt if currently running @@ -85,10 +93,25 @@ } } + /** + * Returns a stopped flag for the given statement. The heartbeat task should check this before + * each RPC to avoid calling a closed client/session. + */ + java.util.concurrent.atomic.AtomicBoolean getStoppedFlag(StatementId statementId) { + return stoppedFlags.computeIfAbsent( + statementId, k -> new java.util.concurrent.atomic.AtomicBoolean(false)); + } + /** Stops all heartbeats and shuts down the scheduler. Called on Connection.close(). 
*/ void shutdown() { isShutdown = true; + // Set all stopped flags FIRST — prevents in-flight RPCs from calling closed clients + for (java.util.concurrent.atomic.AtomicBoolean flag : stoppedFlags.values()) { + flag.set(true); + } + stoppedFlags.clear(); + for (Map.Entry<StatementId, ScheduledFuture<?>> entry : activeHeartbeats.entrySet()) { entry.getValue().cancel(false); LOGGER.debug("Stopped heartbeat for statement {} during shutdown", entry.getKey()); From 53db6450f4317ff5bc2f964340c37a5f5ec3c5fc Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 11:27:40 +0530 Subject: [PATCH 09/23] Address should-fix review feedback 1. Null-defense on Thrift response: if getOperationState() returns null, assume alive and log warning (prevents NPE) 2. Better logging for heartbeat failures: first failure at INFO, terminal (10th) failure at WARN with statement ID and error message. Users will now see early signals instead of cryptic "operation not found" on next() 3. Stop old heartbeat on re-execute: resetForNewExecution() now explicitly calls stopHeartbeat(oldStatementId) before clearing state. Prevents wasteful 10-failure self-termination of orphaned heartbeats 4. Document cloud-fetch prefetch interaction: noted that StreamingChunkProvider/RemoteChunkProvider background RPCs act as implicit heartbeat. 
Explicit heartbeat is still useful for gaps (all chunks downloaded, prefetch paused, sliding window full) Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- docs/design/HEARTBEAT_KEEP_ALIVE.md | 9 ++++++ .../jdbc/api/impl/DatabricksResultSet.java | 29 +++++++++++++------ .../jdbc/api/impl/DatabricksStatement.java | 10 +++++++ .../thrift/DatabricksThriftServiceClient.java | 6 ++++ 4 files changed, 45 insertions(+), 9 deletions(-) diff --git a/docs/design/HEARTBEAT_KEEP_ALIVE.md b/docs/design/HEARTBEAT_KEEP_ALIVE.md index dfd4d7944..0d531c48d 100644 --- a/docs/design/HEARTBEAT_KEEP_ALIVE.md +++ b/docs/design/HEARTBEAT_KEEP_ALIVE.md @@ -403,6 +403,15 @@ Connection.close() | Direct results (CLOSED state) | N/A | **No** | Server already closed the operation; data fully delivered | | Update count (DML) | N/A | **No** | No result rows; execution polling already kept it alive | +### Note: Cloud Fetch Prefetch Interaction + +For cloud fetch result types (SEA Arrow, Thrift cloud fetch), the `StreamingChunkProvider` and `RemoteChunkProvider` already make background RPCs to download chunks and refresh presigned URLs. These background fetches act as an **implicit heartbeat** — each `FetchResults` or `GetStatementResultChunkN` RPC constitutes server activity. + +When both the prefetch threads and the explicit heartbeat are active, the heartbeat is technically redundant. However, we still enable it for cloud fetch because: +1. The prefetch threads stop once all chunks are downloaded — the heartbeat continues during the gap between "all chunks downloaded" and "user finishes reading" +2. If the prefetch is paused (e.g., sliding window full, waiting for user to consume), the heartbeat fills the gap +3. 
The heartbeat cost is minimal (one lightweight GET/status check per minute) + ### Configuration New connection parameters: diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 04d0cc307..95fcc3f0c 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -347,17 +347,28 @@ private void startHeartbeatIfEnabled() { return; } int failures = consecutiveFailures.incrementAndGet(); - LOGGER.debug( - "Heartbeat failed for statement {} (failure {}/{}): {}", - statementId, - failures, - maxConsecutiveFailures, - e.getMessage()); - if (failures >= maxConsecutiveFailures) { + if (failures == 1) { + // First failure — log at INFO so users see the initial problem LOGGER.info( - "Heartbeat stopped for statement {} after {} consecutive failures", + "Heartbeat failed for statement {} (first failure): {}", + statementId, + e.getMessage()); + } else { + LOGGER.debug( + "Heartbeat failed for statement {} (failure {}/{}): {}", + statementId, + failures, + maxConsecutiveFailures, + e.getMessage()); + } + if (failures >= maxConsecutiveFailures) { + // Terminal failure — log at WARN so it's visible in default log config + LOGGER.warn( + "Heartbeat stopped for statement {} after {} consecutive failures. " + + "Server-side results may expire. 
Last error: {}", statementId, - failures); + failures, + e.getMessage()); stopped.set(true); stopHeartbeat(); } diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java index fbad59722..eab0702c5 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java @@ -978,6 +978,16 @@ private void resetForNewExecution() { // when the server returns unexpected responses (e.g., WireMock 404 in tests). // For direct results, the server already closed the handle. + // Stop heartbeat for the previous execution before clearing state. + // Without this, the old heartbeat (keyed by old statementId) would fail and self-terminate + // after 10 consecutive failures — wasteful and noisy in logs. + if (statementId != null) { + ResultHeartbeatManager mgr = connection.getHeartbeatManager(); + if (mgr != null) { + mgr.stopHeartbeat(statementId); + } + } + directResultsReceived = false; // Per JDBC spec, re-executing a Statement implicitly closes the current ResultSet. 
diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java b/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java index 0eb0c1179..b58c52474 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClient.java @@ -281,6 +281,12 @@ public boolean checkStatementAlive(StatementId statementId) throws DatabricksSQL .setGetProgressUpdate(false); TGetOperationStatusResp resp = thriftAccessor.getOperationStatus(statusReq, statementId); TOperationState state = resp.getOperationState(); + if (state == null) { + LOGGER.warn( + "Heartbeat for statement {} received null operation state, assuming alive", + statementId); + return true; // assume alive — server returned response but no state + } // Terminal states mean the operation is no longer alive return state != TOperationState.CANCELED_STATE && state != TOperationState.CLOSED_STATE From 994dbc2c186a077885b3d484ef1b51b26c9d1e70 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 11:36:30 +0530 Subject: [PATCH 10/23] Address test gaps + add DEBUG log on heartbeat start Tests rewritten to be deterministic (CountDownLatch, no Thread.sleep): - testStoppedFlagSetOnStop: get flag after start, verify set on stop - testStoppedFlagSetOnShutdown: same pattern, verify set on shutdown - testStopRacingWithScheduledTick: verify stopped flag prevents RPC - testShutdownWithBlockedTask: verify shutdownNow fires after 5s - testReExecutionReplacesHeartbeat: verify old task stops Fix: startHeartbeat resets stopped flag on new start (was stale after stop-then-restart cycle). Add DEBUG log on successful heartbeat start with statementId, resultType, and interval for support diagnostics. 
Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksResultSet.java | 5 + .../jdbc/api/impl/ResultHeartbeatManager.java | 7 + .../api/impl/ResultHeartbeatManagerTest.java | 197 +++++++++++++++--- 3 files changed, 177 insertions(+), 32 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 95fcc3f0c..8672dae28 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -376,6 +376,11 @@ private void startHeartbeatIfEnabled() { }; mgr.startHeartbeat(statementId, heartbeatTask); + LOGGER.debug( + "Heartbeat started for statement {} (resultType={}, interval={}s)", + statementId, + resultSetType, + mgr.getIntervalSeconds()); } catch (Exception e) { LOGGER.debug("Failed to start heartbeat: {}", e.getMessage()); } diff --git a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java index 9cdbd4872..f9f92a6be 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java +++ b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java @@ -62,6 +62,9 @@ void startHeartbeat(StatementId statementId, Runnable heartbeatTask) { // Stop any existing heartbeat for this statement (e.g., re-execution) stopHeartbeat(statementId); + // Reset the stopped flag for the new heartbeat + getStoppedFlag(statementId).set(false); + LOGGER.debug( "Starting heartbeat for statement {} with interval {}s", statementId, intervalSeconds); @@ -139,4 +142,8 @@ int getActiveHeartbeatCount() { boolean isShutdown() { return isShutdown; } + + int getIntervalSeconds() { + return intervalSeconds; + } } diff --git a/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java 
b/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java index 86d3441e9..e47666122 100644 --- a/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java +++ b/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java @@ -5,6 +5,7 @@ import com.databricks.jdbc.dbclient.impl.common.StatementId; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.atomic.AtomicInteger; import org.junit.jupiter.api.AfterEach; import org.junit.jupiter.api.Test; @@ -20,31 +21,29 @@ void tearDown() { } } + // ========================================================================= + // Core lifecycle + // ========================================================================= + @Test void testStartAndStopHeartbeat() throws Exception { manager = new ResultHeartbeatManager(1); - StatementId id = new StatementId("test-stmt-1"); - AtomicInteger counter = new AtomicInteger(0); + StatementId id = new StatementId("test-1"); + CountDownLatch firstExecution = new CountDownLatch(1); - manager.startHeartbeat(id, counter::incrementAndGet); + manager.startHeartbeat(id, firstExecution::countDown); assertEquals(1, manager.getActiveHeartbeatCount()); - // Wait for at least one execution - Thread.sleep(1500); - assertTrue(counter.get() >= 1, "Heartbeat should have executed at least once"); + assertTrue(firstExecution.await(5, TimeUnit.SECONDS), "Heartbeat should execute within 5s"); manager.stopHeartbeat(id); assertEquals(0, manager.getActiveHeartbeatCount()); - - int countAfterStop = counter.get(); - Thread.sleep(1500); - assertEquals(countAfterStop, counter.get(), "No more heartbeats after stop"); } @Test void testStopIsIdempotent() { manager = new ResultHeartbeatManager(60); - StatementId id = new StatementId("test-stmt-2"); + StatementId id = new StatementId("test-2"); manager.startHeartbeat(id, () -> {}); manager.stopHeartbeat(id); @@ 
-75,43 +74,177 @@ void testStartAfterShutdownIsNoOp() { assertEquals(0, manager.getActiveHeartbeatCount()); } + @Test + void testNullStatementIdHandled() { + manager = new ResultHeartbeatManager(60); + assertDoesNotThrow(() -> manager.startHeartbeat(null, () -> {})); + assertDoesNotThrow(() -> manager.stopHeartbeat(null)); + assertEquals(0, manager.getActiveHeartbeatCount()); + } + + // ========================================================================= + // Re-execution replaces heartbeat + // ========================================================================= + @Test void testReExecutionReplacesHeartbeat() throws Exception { manager = new ResultHeartbeatManager(1); StatementId id = new StatementId("reuse"); - AtomicInteger firstCounter = new AtomicInteger(0); - AtomicInteger secondCounter = new AtomicInteger(0); - - manager.startHeartbeat(id, firstCounter::incrementAndGet); - Thread.sleep(1500); - assertTrue(firstCounter.get() >= 1); - - // Re-start with new task (simulates re-execution) - manager.startHeartbeat(id, secondCounter::incrementAndGet); + CountDownLatch firstRan = new CountDownLatch(1); + CountDownLatch secondRan = new CountDownLatch(1); + AtomicBoolean firstStillRunning = new AtomicBoolean(true); + + manager.startHeartbeat( + id, + () -> { + firstRan.countDown(); + firstStillRunning.set(true); + }); + assertTrue(firstRan.await(5, TimeUnit.SECONDS)); + + // Replace with new task + manager.startHeartbeat( + id, + () -> { + firstStillRunning.set(false); + secondRan.countDown(); + }); assertEquals(1, manager.getActiveHeartbeatCount()); - int firstCountAtReplace = firstCounter.get(); - Thread.sleep(1500); - assertTrue(secondCounter.get() >= 1, "New heartbeat should execute"); - assertEquals(firstCountAtReplace, firstCounter.get(), "Old heartbeat should no longer execute"); + assertTrue(secondRan.await(5, TimeUnit.SECONDS)); + assertFalse(firstStillRunning.get(), "New heartbeat should have run, not old"); } + // 
========================================================================= + // Heartbeat executes at interval (deterministic with latch) + // ========================================================================= + @Test - void testHeartbeatExecutesAtInterval() throws Exception { + void testHeartbeatExecutesMultipleTimes() throws Exception { manager = new ResultHeartbeatManager(1); CountDownLatch latch = new CountDownLatch(3); - manager.startHeartbeat(new StatementId("interval"), () -> latch.countDown()); + manager.startHeartbeat(new StatementId("interval"), latch::countDown); - boolean completed = latch.await(5, TimeUnit.SECONDS); - assertTrue(completed, "Heartbeat should have executed 3 times within 5 seconds"); + assertTrue(latch.await(10, TimeUnit.SECONDS), "Should execute 3 times within 10s"); } + // ========================================================================= + // Stopped flag — prevents RPC after stop + // ========================================================================= + @Test - void testNullStatementIdHandled() { + void testStoppedFlagSetOnStop() { manager = new ResultHeartbeatManager(60); - assertDoesNotThrow(() -> manager.startHeartbeat(null, () -> {})); - assertDoesNotThrow(() -> manager.stopHeartbeat(null)); - assertEquals(0, manager.getActiveHeartbeatCount()); + StatementId id = new StatementId("flag-test"); + + manager.startHeartbeat(id, () -> {}); + // Get flag AFTER start (start creates a fresh flag) + AtomicBoolean flag = manager.getStoppedFlag(id); + assertFalse(flag.get()); + + manager.stopHeartbeat(id); + assertTrue(flag.get(), "Stopped flag should be set after stopHeartbeat"); + } + + @Test + void testStoppedFlagSetOnShutdown() { + manager = new ResultHeartbeatManager(60); + StatementId id = new StatementId("shutdown-flag"); + + manager.startHeartbeat(id, () -> {}); + // Get flag AFTER start (start creates a fresh flag) + AtomicBoolean flag = manager.getStoppedFlag(id); + assertFalse(flag.get()); + + 
manager.shutdown(); + // Note: shutdown clears the map, so the flag instance may be orphaned. + // But it should have been set to true before removal. + assertTrue(flag.get(), "Stopped flag should be set after shutdown"); + } + + @Test + void testStopRacingWithScheduledTick() throws Exception { + manager = new ResultHeartbeatManager(1); + StatementId id = new StatementId("race"); + AtomicInteger rpcCount = new AtomicInteger(0); + CountDownLatch firstTick = new CountDownLatch(1); + + manager.startHeartbeat(id, () -> firstTick.countDown()); + // Get flag AFTER start + AtomicBoolean stoppedFlag = manager.getStoppedFlag(id); + + // Replace with a task that checks the stopped flag before "RPC" + manager.startHeartbeat( + id, + () -> { + if (!manager.getStoppedFlag(id).get()) { + rpcCount.incrementAndGet(); + } + firstTick.countDown(); + }); + + assertTrue(firstTick.await(5, TimeUnit.SECONDS)); + int countBeforeStop = rpcCount.get(); + assertTrue(countBeforeStop >= 1); + + // Stop — flag is set atomically before cancel + manager.stopHeartbeat(id); + assertTrue(stoppedFlag.get()); + + // Any tick that fires after stop will see the flag and skip the RPC + // We can't guarantee no tick fires, but the flag prevents the RPC + } + + // ========================================================================= + // Shutdown timeout — blocked task + // ========================================================================= + + @Test + void testShutdownWithBlockedTask() throws Exception { + manager = new ResultHeartbeatManager(1); + CountDownLatch taskStarted = new CountDownLatch(1); + CountDownLatch taskCanContinue = new CountDownLatch(1); + + manager.startHeartbeat( + new StatementId("blocked"), + () -> { + taskStarted.countDown(); + try { + // Simulate a task blocked longer than awaitTermination(5s) + taskCanContinue.await(30, TimeUnit.SECONDS); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + } + }); + + assertTrue(taskStarted.await(5, 
TimeUnit.SECONDS), "Task should start"); + + // Shutdown should not hang — it awaits 5s then calls shutdownNow() + long start = System.currentTimeMillis(); + manager.shutdown(); + long elapsed = System.currentTimeMillis() - start; + + assertTrue(manager.isShutdown()); + // Shutdown should complete within ~6s (5s await + overhead), not 30s + assertTrue(elapsed < 15_000, "Shutdown should not hang, took " + elapsed + "ms"); + + taskCanContinue.countDown(); // cleanup + } + + // ========================================================================= + // Invalid interval validation + // ========================================================================= + + @Test + void testZeroIntervalDefaultsToSafe() { + // Zero interval would cause ScheduledExecutorService to throw. + // The validation is in DatabricksConnectionContext.getHeartbeatIntervalSeconds(). + // ResultHeartbeatManager itself receives a validated value. + // Test that the manager handles 1-second interval (minimum valid). + manager = new ResultHeartbeatManager(1); + CountDownLatch ran = new CountDownLatch(1); + manager.startHeartbeat(new StatementId("min-interval"), ran::countDown); + assertDoesNotThrow(() -> ran.await(5, TimeUnit.SECONDS)); } } From a51bccc700173ab83cfe6c8b577aad204b7d15bf Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Wed, 22 Apr 2026 11:47:02 +0530 Subject: [PATCH 11/23] Add e2e integration test for heartbeat against real warehouse Verified against dogfood warehouse: - testHeartbeatKeepsResultsAliveDuringSlowConsumption: execute query, read first row, pause 15s (3 heartbeats at 5s interval), read remaining 99 rows successfully. All 100 rows returned. - testHeartbeatStopsOnResultSetClose: verify clean shutdown after close Run with: DATABRICKS_HOST=... DATABRICKS_TOKEN=... DATABRICKS_HTTP_PATH=... 
\ mvn -pl jdbc-core test -Dtest="HeartbeatIntegrationTest" Signed-off-by: Gopal Lal Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../e2e/HeartbeatIntegrationTest.java | 123 ++++++++++++++++++ 1 file changed, 123 insertions(+) create mode 100644 src/test/java/com/databricks/jdbc/integration/e2e/HeartbeatIntegrationTest.java diff --git a/src/test/java/com/databricks/jdbc/integration/e2e/HeartbeatIntegrationTest.java b/src/test/java/com/databricks/jdbc/integration/e2e/HeartbeatIntegrationTest.java new file mode 100644 index 000000000..f68d1468e --- /dev/null +++ b/src/test/java/com/databricks/jdbc/integration/e2e/HeartbeatIntegrationTest.java @@ -0,0 +1,123 @@ +package com.databricks.jdbc.integration.e2e; + +import static org.junit.jupiter.api.Assertions.*; + +import java.sql.*; +import java.util.Properties; +import org.junit.jupiter.api.Tag; +import org.junit.jupiter.api.Test; + +/** + * Integration test for the heartbeat / keep-alive feature. Connects to a real Databricks warehouse + * and verifies that heartbeat polling keeps results alive during slow consumption. + * + *

<p>Run with: mvn -pl jdbc-core test -Dtest="HeartbeatIntegrationTest" -DDATABRICKS_HOST=... + * -DDATABRICKS_TOKEN=... -DDATABRICKS_HTTP_PATH=... + * + *

<p>Or set environment variables: DATABRICKS_HOST, DATABRICKS_TOKEN, DATABRICKS_HTTP_PATH + */ +@Tag("e2e") +public class HeartbeatIntegrationTest { + + private static String getEnvOrProp(String name) { + String value = System.getProperty(name); + if (value == null) { + value = System.getenv(name); + } + return value; + } + + private Connection createConnection(int heartbeatIntervalSeconds) throws Exception { + String host = getEnvOrProp("DATABRICKS_HOST"); + String token = getEnvOrProp("DATABRICKS_TOKEN"); + String httpPath = getEnvOrProp("DATABRICKS_HTTP_PATH"); + + if (host == null || token == null || httpPath == null) { + throw new IllegalStateException( + "Set DATABRICKS_HOST, DATABRICKS_TOKEN, DATABRICKS_HTTP_PATH"); + } + + String url = + String.format( + "jdbc:databricks://%s:443/default;transportMode=http;ssl=1;AuthMech=3;" + + "httpPath=%s;EnableHeartbeat=1;HeartbeatIntervalSeconds=%d;" + + "LogLevel=6;LogPath=/tmp", + host, httpPath, heartbeatIntervalSeconds); + + Properties props = new Properties(); + props.put("UID", "token"); + props.put("PWD", token); + + Class.forName("com.databricks.client.jdbc.Driver"); + return DriverManager.getConnection(url, props); + } + + /** + * Verifies heartbeat keeps results alive during a slow consumer scenario. Executes a query, reads + * first row, pauses for 3x the heartbeat interval (to allow heartbeats to fire), then reads + * remaining rows successfully. 
+ */ + @Test + void testHeartbeatKeepsResultsAliveDuringSlowConsumption() throws Exception { + int heartbeatInterval = 5; // 5 seconds for fast testing + try (Connection conn = createConnection(heartbeatInterval)) { + System.out.println("Connected with EnableHeartbeat=1, interval=" + heartbeatInterval + "s"); + + // Execute a query that returns multiple rows + try (Statement stmt = conn.createStatement()) { + ResultSet rs = stmt.executeQuery("SELECT explode(sequence(1, 100)) AS id"); // 100 rows + + // Read first row + assertTrue(rs.next(), "Should have first row"); + int firstId = rs.getInt("id"); + System.out.println("Read first row: id=" + firstId); + + // Pause — heartbeat should fire during this time + int pauseSeconds = heartbeatInterval * 3; + System.out.println( + "Pausing for " + + pauseSeconds + + "s (heartbeat should fire " + + (pauseSeconds / heartbeatInterval) + + " times)..."); + Thread.sleep(pauseSeconds * 1000L); + + // Read remaining rows — should succeed if heartbeat kept results alive + int rowCount = 1; + while (rs.next()) { + rowCount++; + } + System.out.println("Read " + rowCount + " total rows after pause"); + assertEquals(100, rowCount, "Should read all 100 rows"); + + rs.close(); + } + + System.out.println("Test passed — heartbeat kept results alive during slow consumption"); + } + } + + /** + * Verifies heartbeat stops when ResultSet is closed. After close, no more heartbeat RPCs should + * fire. 
+ */ + @Test + void testHeartbeatStopsOnResultSetClose() throws Exception { + int heartbeatInterval = 5; + try (Connection conn = createConnection(heartbeatInterval)) { + try (Statement stmt = conn.createStatement()) { + ResultSet rs = stmt.executeQuery("SELECT 1 AS val"); + assertTrue(rs.next()); + assertEquals(1, rs.getInt("val")); + + // Close — heartbeat should stop + rs.close(); + + // Pause to verify no heartbeat errors after close + System.out.println("ResultSet closed, waiting to verify no heartbeat errors..."); + Thread.sleep(heartbeatInterval * 2 * 1000L); + System.out.println("No errors — heartbeat stopped cleanly"); + } + } + } +} From 723ce064775bd01830f76dcea64b22bad66e8e9a Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Mon, 11 May 2026 13:52:25 +0530 Subject: [PATCH 12/23] Fix all critical and high-severity heartbeat review findings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Critical fixes: - C1: Fix orphaned stopped flag — lambda now reads flag from manager each tick instead of pre-capturing it. Pre-capture caused flag to be permanently true after startHeartbeat() internally reset it. - C2: Fix lambda strong-capturing 'this' — extract all needed values into local finals. Abandoned ResultSets no longer keep polling. - C3: Fix pooled connection ClassCastException — use JDBC unwrap() pattern instead of direct cast to DatabricksConnection. - C4: Fix Thrift constructor missing startHeartbeatIfEnabled() call. High-severity fixes: - H5: Move heartbeatManager.shutdown() before statement close loop in Connection.close() so it runs even if statement.close() throws. - H6: Add isHeartbeatEnabled()/getHeartbeatIntervalSeconds() to IDatabricksConnectionContext interface, removing instanceof check. - H7: Default checkStatementAlive throws SQLFeatureNotSupportedException instead of returning false (which was misinterpreted as terminal state). 
- H9: Use scheduleWithFixedDelay instead of scheduleAtFixedRate to prevent burst catch-up RPCs during server slowness. - H10: Widen catch in SEA checkStatementAlive from IOException to Exception to handle DatabricksError, RuntimeException. - H11: Statement.cancel() now stops the heartbeat. H8 (per-RPC timeout) deferred to follow-up PR. Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksConnection.java | 17 +++--- .../jdbc/api/impl/DatabricksResultSet.java | 60 ++++++++++++------- .../jdbc/api/impl/DatabricksStatement.java | 9 +++ .../jdbc/api/impl/ResultHeartbeatManager.java | 2 +- .../IDatabricksConnectionContext.java | 10 ++++ .../jdbc/dbclient/IDatabricksClient.java | 5 +- .../impl/sqlexec/DatabricksSdkClient.java | 5 +- 7 files changed, 75 insertions(+), 33 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java index e758f1b53..961b07ad7 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksConnection.java @@ -67,11 +67,10 @@ public DatabricksConnection( private static ResultHeartbeatManager createHeartbeatManager( IDatabricksConnectionContext connectionContext) { - if (connectionContext instanceof DatabricksConnectionContext) { - DatabricksConnectionContext ctx = (DatabricksConnectionContext) connectionContext; - if (ctx.isHeartbeatEnabled()) { - return new ResultHeartbeatManager(ctx.getHeartbeatIntervalSeconds()); - } + // H6 fix: Use interface methods instead of instanceof check so mocks and + // alternate implementations can also enable heartbeat + if (connectionContext.isHeartbeatEnabled()) { + return new ResultHeartbeatManager(connectionContext.getHeartbeatIntervalSeconds()); } return null; } @@ -435,13 +434,15 @@ public void rollback() throws SQLException { @Override public void close() throws SQLException { LOGGER.debug("public 
void close()"); + // H5 fix: Shutdown heartbeat FIRST — prevents RPCs on closing connections and + // ensures shutdown runs even if statement.close() throws + if (heartbeatManager != null) { + heartbeatManager.shutdown(); + } for (IDatabricksStatementInternal statement : statementSet) { statement.close(false); statementSet.remove(statement); } - if (heartbeatManager != null) { - heartbeatManager.shutdown(); - } this.session.close(); TelemetryClientFactory.getInstance().closeTelemetryClient(connectionContext); DatabricksClientConfiguratorManager.getInstance().removeInstance(connectionContext); diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 8672dae28..15ccbd6a5 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -193,6 +193,7 @@ public DatabricksResultSet( this.cachedTelemetryCollector = resolveTelemetryCollector(parentStatement); this.isClosed = false; this.wasNull = false; + startHeartbeatIfEnabled(); // C4 fix: Thrift result sets also need heartbeat } /* Constructing results for getUDTs, getTypeInfo, getProcedures metadata calls */ @@ -311,74 +312,89 @@ private void startHeartbeatIfEnabled() { } try { - DatabricksConnection conn = - (DatabricksConnection) parentStatement.getStatement().getConnection(); + // C3 fix: Use JDBC unwrap() to handle pooled connection wrappers (HikariCP, DBCP) + java.sql.Connection rawConn = parentStatement.getStatement().getConnection(); + DatabricksConnection conn; + if (rawConn instanceof DatabricksConnection) { + conn = (DatabricksConnection) rawConn; + } else if (rawConn.isWrapperFor(DatabricksConnection.class)) { + conn = rawConn.unwrap(DatabricksConnection.class); + } else { + LOGGER.debug("Cannot start heartbeat: connection is not a DatabricksConnection"); + return; + } + ResultHeartbeatManager mgr = 
conn.getHeartbeatManager(); if (mgr == null) { return; // heartbeat not enabled } - IDatabricksClient client = conn.getSession().getDatabricksClient(); + // C2 fix: Capture only what the lambda needs — avoid capturing 'this' to prevent + // abandoned ResultSets from keeping the warehouse alive via heartbeat. + final IDatabricksClient client = conn.getSession().getDatabricksClient(); + final StatementId capturedStatementId = this.statementId; final int maxConsecutiveFailures = 10; final java.util.concurrent.atomic.AtomicInteger consecutiveFailures = new java.util.concurrent.atomic.AtomicInteger(0); - // Get the stopped flag from the manager — shared between the heartbeat task and - // stopHeartbeat(). Prevents RPC on a just-closed client/session: stopHeartbeat sets - // the flag before cancel(false), so an in-flight tick sees it and skips the RPC. - final java.util.concurrent.atomic.AtomicBoolean stopped = mgr.getStoppedFlag(statementId); + // C1 fix: Read the stopped flag from the manager on each tick instead of pre-capturing. + // Pre-capturing caused an orphan-flag bug: startHeartbeat() internally calls + // stopHeartbeat() which removes and replaces the flag, leaving the lambda with a + // permanently-true reference. Reading from the manager each tick always gets the + // current flag. 
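The C1 fix above hinges on re-reading the stopped flag from the manager on every tick rather than capturing it once. A minimal, self-contained model of why pre-capturing goes wrong (class and method names here are illustrative, not the driver's actual API):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the stopped-flag registry. start() replacing the flag is why a
// pre-captured reference goes stale: stop() flips the old flag to true and
// removes it, so a lambda holding that old reference would never run again.
class StoppedFlagRegistry {
    // Shared true sentinel: returned once a flag has been removed, so callers
    // never resurrect a false flag for an already-stopped statement.
    private static final AtomicBoolean ALREADY_STOPPED = new AtomicBoolean(true);

    private final ConcurrentHashMap<String, AtomicBoolean> flags = new ConcurrentHashMap<>();

    void start(String statementId) {
        flags.put(statementId, new AtomicBoolean(false)); // fresh flag per start
    }

    void stop(String statementId) {
        AtomicBoolean flag = flags.remove(statementId);
        if (flag != null) {
            flag.set(true); // in-flight ticks holding the old reference see true
        }
    }

    // Ticks call this every time instead of capturing the result once.
    AtomicBoolean current(String statementId) {
        AtomicBoolean flag = flags.get(statementId);
        return flag != null ? flag : ALREADY_STOPPED;
    }
}
```

A restart replaces the flag, so the pre-captured reference stays permanently true while `current()` always observes the live one.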
+ final ResultHeartbeatManager capturedMgr = mgr; Runnable heartbeatTask = () -> { + // C1 fix: read current flag each tick + java.util.concurrent.atomic.AtomicBoolean stopped = + capturedMgr.getStoppedFlag(capturedStatementId); if (stopped.get()) { return; // client/session may be closed, skip RPC } try { - boolean alive = client.checkStatementAlive(statementId); + boolean alive = client.checkStatementAlive(capturedStatementId); consecutiveFailures.set(0); // reset on success if (!alive) { LOGGER.info( - "Heartbeat detected terminal state for statement {}, stopping", statementId); - stopped.set(true); - stopHeartbeat(); + "Heartbeat detected terminal state for statement {}, stopping", + capturedStatementId); + capturedMgr.stopHeartbeat(capturedStatementId); } } catch (Exception e) { - // If stopped was set during the RPC (connection closing), don't count as failure - if (stopped.get()) { + // Re-read flag — may have been set during the RPC (connection closing) + if (capturedMgr.getStoppedFlag(capturedStatementId).get()) { return; } int failures = consecutiveFailures.incrementAndGet(); if (failures == 1) { - // First failure — log at INFO so users see the initial problem LOGGER.info( "Heartbeat failed for statement {} (first failure): {}", - statementId, + capturedStatementId, e.getMessage()); } else { LOGGER.debug( "Heartbeat failed for statement {} (failure {}/{}): {}", - statementId, + capturedStatementId, failures, maxConsecutiveFailures, e.getMessage()); } if (failures >= maxConsecutiveFailures) { - // Terminal failure — log at WARN so it's visible in default log config LOGGER.warn( "Heartbeat stopped for statement {} after {} consecutive failures. " + "Server-side results may expire. 
Last error: {}", - statementId, + capturedStatementId, failures, e.getMessage()); - stopped.set(true); - stopHeartbeat(); + capturedMgr.stopHeartbeat(capturedStatementId); } } }; - mgr.startHeartbeat(statementId, heartbeatTask); + mgr.startHeartbeat(capturedStatementId, heartbeatTask); LOGGER.debug( "Heartbeat started for statement {} (resultType={}, interval={}s)", - statementId, + capturedStatementId, resultSetType, mgr.getIntervalSeconds()); } catch (Exception e) { diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java index eab0702c5..959de35e2 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java @@ -253,6 +253,15 @@ public void cancel() throws SQLException { LOGGER.debug("public void cancel()"); checkIfClosed(); + // H11 fix: Stop heartbeat on cancel — server operation is being cancelled, + // no point continuing to poll it + if (statementId != null) { + ResultHeartbeatManager mgr = connection.getHeartbeatManager(); + if (mgr != null) { + mgr.stopHeartbeat(statementId); + } + } + if (statementId != null && !directResultsReceived) { this.connection.getSession().getDatabricksClient().cancelStatement(statementId); DatabricksThreadContextHolder.clearStatementInfo(); diff --git a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java index f9f92a6be..5ddd2a04d 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java +++ b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java @@ -69,7 +69,7 @@ void startHeartbeat(StatementId statementId, Runnable heartbeatTask) { "Starting heartbeat for statement {} with interval {}s", statementId, intervalSeconds); ScheduledFuture future = - scheduler.scheduleAtFixedRate( + scheduler.scheduleWithFixedDelay( 
heartbeatTask, intervalSeconds, intervalSeconds, TimeUnit.SECONDS); activeHeartbeats.put(statementId, future); } diff --git a/src/main/java/com/databricks/jdbc/api/internal/IDatabricksConnectionContext.java b/src/main/java/com/databricks/jdbc/api/internal/IDatabricksConnectionContext.java index 71ab0ab58..772697d14 100644 --- a/src/main/java/com/databricks/jdbc/api/internal/IDatabricksConnectionContext.java +++ b/src/main/java/com/databricks/jdbc/api/internal/IDatabricksConnectionContext.java @@ -409,6 +409,16 @@ public interface IDatabricksConnectionContext { /** Returns the timeout in seconds for metadata polling operations. 0 means no timeout. */ int getMetadataOperationTimeout(); + /** Returns whether heartbeat/keep-alive polling is enabled. */ + default boolean isHeartbeatEnabled() { + return false; + } + + /** Returns the heartbeat polling interval in seconds. */ + default int getHeartbeatIntervalSeconds() { + return 60; + } + /** Returns whether batched INSERT optimization is enabled */ boolean isBatchedInsertsEnabled(); diff --git a/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java b/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java index 25b70d678..718478c17 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java +++ b/src/main/java/com/databricks/jdbc/dbclient/IDatabricksClient.java @@ -114,7 +114,10 @@ DatabricksResultSet executeStatementAsync( * @return true if the statement is still in a non-terminal state (alive), false if terminal */ default boolean checkStatementAlive(StatementId statementId) throws SQLException { - return false; // default no-op for clients that don't support heartbeat + // H7 fix: Throw instead of returning false. Returning false is treated as "terminal state" + // by the heartbeat task, causing misleading "terminal state" logs. Throwing makes the + // heartbeat task count it as a failure, which is more accurate for unsupported clients. 
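The H9 change earlier in this patch swaps scheduleAtFixedRate for scheduleWithFixedDelay. A standalone sketch (not driver code) of the behavioral difference when a tick runs longer than its period — fixed-rate fires catch-up runs back-to-back, fixed-delay always waits the full interval after each completion:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedDelayVsFixedRate {

    // Runs both scheduling modes for windowMillis with a task taking taskMillis
    // per run and returns {fixedRateRuns, fixedDelayRuns}. With taskMillis >
    // periodMillis, fixed-rate fires again immediately after each run finishes
    // (catch-up), while fixed-delay waits the full period after each completion.
    public static int[] countRuns(long periodMillis, long taskMillis, long windowMillis)
            throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        AtomicInteger rateRuns = new AtomicInteger();
        AtomicInteger delayRuns = new AtomicInteger();
        pool.scheduleAtFixedRate(
            () -> { rateRuns.incrementAndGet(); sleepQuietly(taskMillis); },
            0, periodMillis, TimeUnit.MILLISECONDS);
        pool.scheduleWithFixedDelay(
            () -> { delayRuns.incrementAndGet(); sleepQuietly(taskMillis); },
            0, periodMillis, TimeUnit.MILLISECONDS);
        Thread.sleep(windowMillis);
        pool.shutdownNow();
        return new int[] {rateRuns.get(), delayRuns.get()};
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

For a heartbeat this matters during server slowness: a stalled RPC under fixed-rate would be followed by a burst of queued ticks, which fixed-delay avoids.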
+ throw new java.sql.SQLFeatureNotSupportedException("Heartbeat not supported by this client"); } /** diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java index db7db0948..dcf3725bf 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java @@ -418,7 +418,10 @@ public boolean checkStatementAlive(StatementId typedStatementId) throws SQLExcep return state != StatementState.CANCELED && state != StatementState.CLOSED && state != StatementState.FAILED; - } catch (IOException e) { + } catch (Exception e) { + // H10 fix: Catch all exceptions (DatabricksError, DatabricksException, IOException, + // RuntimeException) — not just IOException. A 401 token expiry should count as a + // transient failure, not bypass error handling. LOGGER.debug("Heartbeat check failed for statement {}: {}", statementId, e.getMessage()); throw new DatabricksSQLException( "Heartbeat status check failed", e, DatabricksDriverErrorCode.SDK_CLIENT_ERROR); From d24d62a3993e0b420ee144c26227fd30a2cd1235 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Mon, 11 May 2026 14:09:10 +0530 Subject: [PATCH 13/23] Address additional heartbeat review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Major fixes: - #1: Use 2-thread pool instead of single-thread to prevent one blocked heartbeat RPC from starving others on the same connection - #3: Fix getStoppedFlag() leak — use get() + ALREADY_STOPPED sentinel instead of computeIfAbsent() which re-created a false flag after stopHeartbeat() removed it, causing RPCs after stop Minor fixes: - #2: Increase awaitTermination from 5s to 10s with explanatory comment - #4: Add comment about client capture retaining session reference - #5: Document reactive vs proactive heartbeat stop 
limitation - #6, #7, #9, #10, #11: Acknowledged as correct/acceptable per reviewer Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksResultSet.java | 8 ++++ .../jdbc/api/impl/ResultHeartbeatManager.java | 40 +++++++++++++++---- 2 files changed, 40 insertions(+), 8 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 5cdd7ed5f..7f07bfacf 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -286,6 +286,11 @@ public boolean next() throws SQLException { cachedTelemetryCollector.recordResultSetIteration( statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), hasNext); } + // Stop heartbeat when all rows consumed. This is reactive — the heartbeat continues + // even if all data is already prefetched client-side (e.g., small Thrift inline results). + // A proactive check (chunkProvider.isStreamingComplete()) would be more efficient but + // requires deeper integration with the chunk download pipeline. For now, the server-side + // checkStatementAlive() returning false for terminal states provides a secondary safety net. if (!hasNext) { stopHeartbeat(); } @@ -331,6 +336,9 @@ private void startHeartbeatIfEnabled() { // C2 fix: Capture only what the lambda needs — avoid capturing 'this' to prevent // abandoned ResultSets from keeping the warehouse alive via heartbeat. + // Note: capturing 'client' retains a reference to the session/connection. If the + // connection is GC'd without close(), heartbeat RPCs will fail and self-stop after + // maxConsecutiveFailures (10 ticks, ~10 min at 60s interval). Acceptable tradeoff. 
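The "10 consecutive failures before self-stop" policy referenced throughout these patches reduces to a small piece of counter logic. A hedged sketch with hypothetical names (not the driver's API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative failure-budget logic mirroring the described policy: any
// successful heartbeat resets the streak; exhausting the budget stops for good.
class HeartbeatFailureBudget {
    private final int maxConsecutiveFailures;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private volatile boolean stopped;

    HeartbeatFailureBudget(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    void onSuccess() {
        consecutiveFailures.set(0); // a single success forgives the whole streak
    }

    void onFailure() {
        if (consecutiveFailures.incrementAndGet() >= maxConsecutiveFailures) {
            stopped = true; // terminal: heartbeat self-stops, results may expire
        }
    }

    boolean isStopped() {
        return stopped;
    }
}
```

At the default 60s interval a budget of 10 means roughly ten minutes of sustained failure before the heartbeat gives up, which also bounds how long a GC'd-but-unclosed connection can keep ticking.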
final IDatabricksClient client = conn.getSession().getDatabricksClient(); final StatementId capturedStatementId = this.statementId; final int maxConsecutiveFailures = 10; diff --git a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java index 5ddd2a04d..cce04b935 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java +++ b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java @@ -33,13 +33,23 @@ class ResultHeartbeatManager { private final Map> activeHeartbeats = new ConcurrentHashMap<>(); private final Map stoppedFlags = new ConcurrentHashMap<>(); + + /** Sentinel returned by getStoppedFlag when the flag has been removed (already stopped). */ + private static final java.util.concurrent.atomic.AtomicBoolean ALREADY_STOPPED = + new java.util.concurrent.atomic.AtomicBoolean(true); + private final int intervalSeconds; private volatile boolean isShutdown = false; + // Small pool to prevent one blocked heartbeat RPC from starving others on the same connection. + // A connection with multiple active statements needs concurrent heartbeat ticks. 
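The scheduler built around this pool uses a daemon thread factory. A self-contained sketch of the same pattern (thread name and pool size taken from the surrounding diff; the wrapper class is hypothetical):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Daemon worker threads guarantee a missed shutdown() can never prevent JVM
// exit, though ticks may keep firing until the process actually terminates.
public class HeartbeatSchedulerFactory {
    public static ScheduledExecutorService create(int poolSize) {
        return Executors.newScheduledThreadPool(
            poolSize,
            r -> {
                Thread t = new Thread(r, "databricks-jdbc-heartbeat");
                t.setDaemon(true); // won't keep the JVM alive on exit
                return t;
            });
    }
}
```

A named thread also makes the heartbeat easy to spot in thread dumps when diagnosing leaked pollers.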
+ private static final int HEARTBEAT_THREAD_POOL_SIZE = 2; + ResultHeartbeatManager(int intervalSeconds) { this.intervalSeconds = intervalSeconds; this.scheduler = - Executors.newSingleThreadScheduledExecutor( + Executors.newScheduledThreadPool( + HEARTBEAT_THREAD_POOL_SIZE, r -> { Thread t = new Thread(r, "databricks-jdbc-heartbeat"); t.setDaemon(true); @@ -62,8 +72,8 @@ void startHeartbeat(StatementId statementId, Runnable heartbeatTask) { // Stop any existing heartbeat for this statement (e.g., re-execution) stopHeartbeat(statementId); - // Reset the stopped flag for the new heartbeat - getStoppedFlag(statementId).set(false); + // Create a fresh stopped flag for the new heartbeat + resetStoppedFlag(statementId); LOGGER.debug( "Starting heartbeat for statement {} with interval {}s", statementId, intervalSeconds); @@ -97,12 +107,21 @@ void stopHeartbeat(StatementId statementId) { } /** - * Returns a stopped flag for the given statement. The heartbeat task should check this before - * each RPC to avoid calling a closed client/session. + * Returns the stopped flag for the given statement, or a sentinel ALREADY_STOPPED if the flag has + * been removed (i.e., stopHeartbeat was already called). Uses get() instead of computeIfAbsent() + * to prevent re-creating a false flag after stopHeartbeat() removed it. */ java.util.concurrent.atomic.AtomicBoolean getStoppedFlag(StatementId statementId) { - return stoppedFlags.computeIfAbsent( - statementId, k -> new java.util.concurrent.atomic.AtomicBoolean(false)); + java.util.concurrent.atomic.AtomicBoolean flag = stoppedFlags.get(statementId); + return flag != null ? flag : ALREADY_STOPPED; + } + + /** + * Creates or resets the stopped flag for a statement. Called only from startHeartbeat() when + * setting up a new heartbeat — not from the heartbeat task itself. 
+ */ + private void resetStoppedFlag(StatementId statementId) { + stoppedFlags.put(statementId, new java.util.concurrent.atomic.AtomicBoolean(false)); } /** Stops all heartbeats and shuts down the scheduler. Called on Connection.close(). */ @@ -121,9 +140,14 @@ void shutdown() { } activeHeartbeats.clear(); + // Wait for in-flight RPCs to complete. 10s gives reasonable headroom for HTTP + // timeouts to fire first (~300s connection timeout won't be an issue here since + // stopped flags are already set, so RPCs will short-circuit on next check). + // If tasks don't finish in time, shutdownNow() sends interrupts. scheduler.shutdown(); try { - if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) { + if (!scheduler.awaitTermination(10, TimeUnit.SECONDS)) { + LOGGER.debug("Heartbeat tasks did not terminate in 10s, forcing shutdown"); scheduler.shutdownNow(); } } catch (InterruptedException e) { From 7631ee30ee293f52e80f57e2a71669d897df4d61 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Mon, 11 May 2026 14:11:16 +0530 Subject: [PATCH 14/23] Add missing heartbeat tests for concurrency, sentinel flag, and cancel-to-close MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Test concurrent startHeartbeat for same statementId replaces cleanly (no orphaned future) - Test getStoppedFlag returns ALREADY_STOPPED sentinel after stop (no flag recreation via computeIfAbsent) - Test stopped flag race with scheduled tick — no leaked RPCs after stop - Test stop (cancel) → shutdown (close) sequence — no RPCs leak Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../api/impl/ResultHeartbeatManagerTest.java | 125 ++++++++++++++++++ 1 file changed, 125 insertions(+) diff --git a/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java b/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java index e47666122..3a17952f2 100644 --- a/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java +++ 
b/src/test/java/com/databricks/jdbc/api/impl/ResultHeartbeatManagerTest.java @@ -247,4 +247,129 @@ void testZeroIntervalDefaultsToSafe() { manager.startHeartbeat(new StatementId("min-interval"), ran::countDown); assertDoesNotThrow(() -> ran.await(5, TimeUnit.SECONDS)); } + + // ========================================================================= + // Concurrent start for same statementId — orphaned future must not happen + // ========================================================================= + + @Test + void testConcurrentStartForSameStatementId_replacesCleanly() throws Exception { + manager = new ResultHeartbeatManager(1); + StatementId id = new StatementId("concurrent"); + AtomicInteger firstCount = new AtomicInteger(0); + AtomicInteger secondCount = new AtomicInteger(0); + CountDownLatch secondRan = new CountDownLatch(1); + + // Start first heartbeat + manager.startHeartbeat(id, firstCount::incrementAndGet); + // Immediately replace with second — first should be stopped + manager.startHeartbeat( + id, + () -> { + secondCount.incrementAndGet(); + secondRan.countDown(); + }); + + assertEquals(1, manager.getActiveHeartbeatCount(), "Only one heartbeat should be active"); + assertTrue(secondRan.await(5, TimeUnit.SECONDS), "Second heartbeat should fire"); + + // Wait a bit more, then verify second is the one running + Thread.sleep(2000); + assertTrue(secondCount.get() >= 1, "Second task should have run"); + // First task may have run once before replacement, but should not keep running + int firstSnapshot = firstCount.get(); + Thread.sleep(1500); + assertEquals(firstSnapshot, firstCount.get(), "First task should not run after replacement"); + } + + // ========================================================================= + // getStoppedFlag race with stop — flag must not be recreated after stop + // ========================================================================= + + @Test + void testGetStoppedFlagAfterStop_returnsSentinel() { + manager = new 
ResultHeartbeatManager(60); + StatementId id = new StatementId("sentinel-test"); + + manager.startHeartbeat(id, () -> {}); + // Flag should be false while active + assertFalse(manager.getStoppedFlag(id).get()); + + manager.stopHeartbeat(id); + // After stop, getStoppedFlag should return a true sentinel, NOT recreate a false flag + AtomicBoolean flag = manager.getStoppedFlag(id); + assertTrue(flag.get(), "Flag after stop should be true (sentinel)"); + + // Calling getStoppedFlag again should still return true — no new false flag created + assertTrue(manager.getStoppedFlag(id).get(), "Repeated call should still be true"); + } + + @Test + void testStoppedFlagRaceWithScheduledTick_noLeakedRpc() throws Exception { + manager = new ResultHeartbeatManager(1); + StatementId id = new StatementId("race-sentinel"); + AtomicInteger rpcCount = new AtomicInteger(0); + CountDownLatch firstTick = new CountDownLatch(1); + + // Task checks stopped flag from manager each tick (like production code) + manager.startHeartbeat( + id, + () -> { + AtomicBoolean stopped = manager.getStoppedFlag(id); + if (!stopped.get()) { + rpcCount.incrementAndGet(); + } + firstTick.countDown(); + }); + + assertTrue(firstTick.await(5, TimeUnit.SECONDS), "First tick should fire"); + int countBeforeStop = rpcCount.get(); + assertTrue(countBeforeStop >= 1, "At least one RPC should have fired"); + + // Stop — after this, getStoppedFlag should return sentinel true + manager.stopHeartbeat(id); + assertTrue(manager.getStoppedFlag(id).get(), "Flag should be true after stop"); + + // Wait and verify no more RPCs fire + Thread.sleep(2000); + assertEquals( + countBeforeStop, + rpcCount.get(), + "No RPCs should fire after stop (sentinel prevents recreation)"); + } + + // ========================================================================= + // Heartbeat continues past cancel to close — verify no leaked RPCs + // ========================================================================= + + @Test + void 
testStopThenShutdown_noLeakedRpcs() throws Exception { + manager = new ResultHeartbeatManager(1); + StatementId id = new StatementId("cancel-to-close"); + AtomicInteger rpcCount = new AtomicInteger(0); + CountDownLatch firstTick = new CountDownLatch(1); + + manager.startHeartbeat( + id, + () -> { + if (!manager.getStoppedFlag(id).get()) { + rpcCount.incrementAndGet(); + } + firstTick.countDown(); + }); + + assertTrue(firstTick.await(5, TimeUnit.SECONDS)); + + // Simulate Statement.cancel() stopping heartbeat + manager.stopHeartbeat(id); + int countAfterCancel = rpcCount.get(); + + // Simulate Connection.close() shutting down manager + manager.shutdown(); + + // Wait and verify no more RPCs + Thread.sleep(1500); + assertEquals(countAfterCancel, rpcCount.get(), "No RPCs should fire after stop + shutdown"); + assertTrue(manager.isShutdown()); + } } From a037ae11ab9945bd5f4c2e7296293d48489a3695 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Tue, 12 May 2026 09:00:20 +0530 Subject: [PATCH 15/23] Proactive heartbeat stop when all data fetched from server Add isAllDataFetched() to IExecutionResult and ChunkProvider interfaces. When all result data has been transferred from server to client, the heartbeat stops proactively in next() instead of waiting for the user to consume all rows. This prevents unnecessary RPCs and warehouse cost when a slow consumer reads a fully-prefetched result set. 
Implementations: - InlineJsonResult: always true (all data fetched at construction) - InlineChunkProvider: always true (in-memory) - AbstractRemoteChunkProvider (CloudFetch): true when all chunk downloads submitted - StreamingChunkProvider: true when endOfStreamReached - LazyThriftResult: true when !hasMoreRows - LazyThriftInlineArrowResult: true when !hasMoreRows - StreamingInlineArrowResult: delegates to ThriftStreamingProvider - StreamingColumnarResult: delegates to ThriftStreamingProvider - ArrowStreamResult: delegates to ChunkProvider Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../databricks/jdbc/api/impl/DatabricksResultSet.java | 11 ++++++----- .../databricks/jdbc/api/impl/IExecutionResult.java | 11 +++++++++++ .../databricks/jdbc/api/impl/InlineJsonResult.java | 5 +++++ .../databricks/jdbc/api/impl/LazyThriftResult.java | 5 +++++ .../api/impl/arrow/AbstractRemoteChunkProvider.java | 7 +++++++ .../jdbc/api/impl/arrow/ArrowStreamResult.java | 5 +++++ .../databricks/jdbc/api/impl/arrow/ChunkProvider.java | 8 ++++++++ .../jdbc/api/impl/arrow/InlineChunkProvider.java | 5 +++++ .../api/impl/arrow/LazyThriftInlineArrowResult.java | 5 +++++ .../jdbc/api/impl/arrow/StreamingChunkProvider.java | 5 +++++ .../api/impl/arrow/StreamingInlineArrowResult.java | 5 +++++ .../jdbc/api/impl/thrift/StreamingColumnarResult.java | 5 +++++ 12 files changed, 72 insertions(+), 5 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 7f07bfacf..936877ae2 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -286,13 +286,14 @@ public boolean next() throws SQLException { cachedTelemetryCollector.recordResultSetIteration( statementId.toSQLExecStatementId(), resultSetMetaData.getChunkCount(), hasNext); } - // Stop heartbeat when all rows consumed. 
This is reactive — the heartbeat continues - // even if all data is already prefetched client-side (e.g., small Thrift inline results). - // A proactive check (chunkProvider.isStreamingComplete()) would be more efficient but - // requires deeper integration with the chunk download pipeline. For now, the server-side - // checkStatementAlive() returning false for terminal states provides a secondary safety net. if (!hasNext) { stopHeartbeat(); + } else if (executionResult.isAllDataFetched()) { + // Proactive stop: all data has been fetched from the server to the client. + // No need to keep the server-side operation alive — the user is just iterating + // through client-side data. This prevents unnecessary RPCs and warehouse cost + // when a slow consumer reads a fully-prefetched result set. + stopHeartbeat(); } return hasNext; } diff --git a/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java b/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java index efa8d0792..9a7b46636 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java @@ -38,4 +38,15 @@ public interface IExecutionResult { long getRowCount(); long getChunkCount(); + + /** + * Returns true if all result data has been fetched from the server to the client, regardless of + * whether the user has iterated through all rows. Used by the heartbeat to stop polling early + * when the server-side operation is no longer needed. + * + *
<p>
Default returns false (conservative — keeps heartbeat running until next() returns false). + */ + default boolean isAllDataFetched() { + return false; + } } diff --git a/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java b/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java index e3541510e..d94939768 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java @@ -99,6 +99,11 @@ public long getChunkCount() { return chunkProvider.getChunkCount(); } + @Override + public boolean isAllDataFetched() { + return true; // Inline JSON results are fully in-memory + } + private boolean isClosed() { return isClosed; } diff --git a/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java b/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java index 854da8656..8752eeab5 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java @@ -259,4 +259,9 @@ public long getTotalRowsFetched() { public boolean isCompletelyFetched() { return hasReachedEnd || !currentResponse.hasMoreRows; } + + @Override + public boolean isAllDataFetched() { + return isCompletelyFetched(); + } } diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java index 4717852f4..2e3a9ad31 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java @@ -219,6 +219,13 @@ public boolean isClosed() { return isClosed; } + @Override + public boolean isAllDataFetched() { + // All chunk downloads have been submitted (may still be in-flight, but data is + // on its way to the client — server-side operation is no longer needed) + return nextChunkToDownload >= chunkCount; 
+ } + public long getAllowedChunksInMemory() { return allowedChunksInMemory; } diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java index 1569b7efb..bee6cbfa3 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java @@ -323,6 +323,11 @@ public long getChunkCount() { return chunkProvider.getChunkCount(); } + @Override + public boolean isAllDataFetched() { + return chunkProvider.isAllDataFetched(); + } + /** * Returns the chunk provider for testing purposes. * diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java index d4955f9b9..31d30cf8e 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java @@ -43,4 +43,12 @@ public interface ChunkProvider { long getChunkCount(); boolean isClosed(); + + /** + * Returns true if all chunk data has been fetched from the server to the client. Default returns + * false (conservative). 
+ */ + default boolean isAllDataFetched() { + return false; + } } diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java index 32f5e1b80..8f7a6c7c0 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java @@ -92,6 +92,11 @@ public boolean isClosed() { return isClosed; } + @Override + public boolean isAllDataFetched() { + return true; // Inline results are fully in-memory at construction + } + @VisibleForTesting void handleError(Exception e) throws DatabricksParsingException { String errorMessage = diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java index 98d97a899..bd9a8e1c0 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java @@ -258,6 +258,11 @@ public long getChunkCount() { return 0; } + @Override + public boolean isAllDataFetched() { + return hasReachedEnd || !currentResponse.hasMoreRows; + } + /** * Gets the Arrow metadata for the current chunk. 
* diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java index dbb1bbe78..37b142582 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java @@ -335,6 +335,11 @@ public boolean isClosed() { return closed; } + @Override + public boolean isAllDataFetched() { + return endOfStreamReached; + } + // ==================== Link Prefetch Logic ==================== private void linkPrefetchLoop() { diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java index 7c9025c3f..94cbe5632 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java @@ -296,6 +296,11 @@ public long getChunkCount() { return 0; } + @Override + public boolean isAllDataFetched() { + return provider.isEndOfStreamReached(); + } + /** * Gets the Arrow metadata for the current chunk. * diff --git a/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java b/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java index bf5a95d34..1dee077ba 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java @@ -281,6 +281,11 @@ public long getChunkCount() { return 0; } + @Override + public boolean isAllDataFetched() { + return provider.isEndOfStreamReached(); + } + /** * Gets the total number of rows fetched from the server so far. 
* From b881d4c809235a568753459a9798bb2a9d24e6c9 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Tue, 12 May 2026 09:06:18 +0530 Subject: [PATCH 16/23] Add tests for isAllDataFetched() across all implementations 7 tests covering: - IExecutionResult default returns false (conservative) - InlineJsonResult always returns true - InlineChunkProvider always returns true - AbstractRemoteChunkProvider: false when downloads pending, true when all submitted - StreamingChunkProvider: false before end of stream - LazyThriftResult: delegates to isCompletelyFetched() - ChunkProvider default returns false Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/IsAllDataFetchedTest.java | 119 ++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java diff --git a/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java b/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java new file mode 100644 index 000000000..015a6afe2 --- /dev/null +++ b/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java @@ -0,0 +1,119 @@ +package com.databricks.jdbc.api.impl; + +import static org.junit.jupiter.api.Assertions.*; +import static org.mockito.Mockito.*; + +import com.databricks.jdbc.api.impl.arrow.*; +import org.junit.jupiter.api.Test; + +/** + * Tests for isAllDataFetched() across all IExecutionResult and ChunkProvider implementations. + * Verifies that heartbeat stops proactively when all data has been fetched from the server. 
+ */ +public class IsAllDataFetchedTest { + + // ========================================================================= + // IExecutionResult default + // ========================================================================= + + @Test + void testDefaultIsAllDataFetched_returnsFalse() { + // The default implementation returns false (conservative) + IExecutionResult result = mock(IExecutionResult.class, CALLS_REAL_METHODS); + assertFalse(result.isAllDataFetched()); + } + + // ========================================================================= + // InlineJsonResult — always true + // ========================================================================= + + @Test + void testInlineJsonResult_alwaysTrue() { + // InlineJsonResult fetches all data at construction + InlineJsonResult result = new InlineJsonResult(new Object[][] {{1, "a"}, {2, "b"}}); + assertTrue(result.isAllDataFetched()); + } + + // ========================================================================= + // InlineChunkProvider — always true + // ========================================================================= + + @Test + void testInlineChunkProvider_alwaysTrue() { + InlineChunkProvider provider = mock(InlineChunkProvider.class, CALLS_REAL_METHODS); + assertTrue(provider.isAllDataFetched()); + } + + // ========================================================================= + // AbstractRemoteChunkProvider — based on nextChunkToDownload vs chunkCount + // ========================================================================= + + @Test + void testRemoteChunkProvider_notAllDownloaded() { + // Create a concrete subclass mock to test the base class logic + AbstractRemoteChunkProvider provider = + mock(AbstractRemoteChunkProvider.class, CALLS_REAL_METHODS); + // Default fields are 0, so nextChunkToDownload (0) < chunkCount (0) is false + // Let's set chunkCount > nextChunkToDownload + try { + java.lang.reflect.Field chunkCountField = + 
AbstractRemoteChunkProvider.class.getDeclaredField("chunkCount"); + chunkCountField.setAccessible(true); + chunkCountField.set(provider, 5L); + + java.lang.reflect.Field nextField = + AbstractRemoteChunkProvider.class.getDeclaredField("nextChunkToDownload"); + nextField.setAccessible(true); + nextField.set(provider, 2L); + + assertFalse(provider.isAllDataFetched()); + + // Now set all downloaded + nextField.set(provider, 5L); + assertTrue(provider.isAllDataFetched()); + + // More than needed (edge case) + nextField.set(provider, 7L); + assertTrue(provider.isAllDataFetched()); + } catch (Exception e) { + fail("Reflection failed: " + e.getMessage()); + } + } + + // ========================================================================= + // StreamingChunkProvider — based on endOfStreamReached + // ========================================================================= + + @Test + void testStreamingChunkProvider_beforeEndOfStream() { + StreamingChunkProvider provider = mock(StreamingChunkProvider.class, CALLS_REAL_METHODS); + // Default endOfStreamReached is false + assertFalse(provider.isAllDataFetched()); + } + + // ========================================================================= + // LazyThriftResult — based on hasMoreRows + // ========================================================================= + + @Test + void testLazyThriftResult_isAllDataFetched_delegatesToIsCompletelyFetched() { + LazyThriftResult result = mock(LazyThriftResult.class); + // isAllDataFetched delegates to isCompletelyFetched + doReturn(false).when(result).isCompletelyFetched(); + doCallRealMethod().when(result).isAllDataFetched(); + assertFalse(result.isAllDataFetched()); + + doReturn(true).when(result).isCompletelyFetched(); + assertTrue(result.isAllDataFetched()); + } + + // ========================================================================= + // ChunkProvider default + // ========================================================================= + + @Test + void 
testChunkProviderDefault_returnsFalse() { + ChunkProvider provider = mock(ChunkProvider.class, CALLS_REAL_METHODS); + assertFalse(provider.isAllDataFetched()); + } +} From 458aa5e31a71cadcd60ccb564a93d5bbfc556a97 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Tue, 12 May 2026 17:31:10 +0530 Subject: [PATCH 17/23] Address review feedback on isAllDataFetched and NPE guards MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes: - LazyThriftResult.isCompletelyFetched(): null guard for currentResponse (set to null in close()) - LazyThriftInlineArrowResult.isAllDataFetched(): same null guard - AbstractRemoteChunkProvider.isAllDataFetched(): require !hasNextChunk() in addition to all downloads submitted — ensures in-flight downloads complete before signaling server is no longer needed - DatabricksResultSet.next(): throw meaningful exception when executionResult is null (pre-existing NPE, async path) - DatabricksResultSet.close(): null guard for executionResult.close() - IsAllDataFetchedTest: updated for tightened RemoteChunkProvider logic - Removed proactive server close from close() — belongs to PR #1444 Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../databricks/jdbc/api/impl/DatabricksResultSet.java | 10 +++++++++- .../databricks/jdbc/api/impl/LazyThriftResult.java | 3 ++- .../api/impl/arrow/AbstractRemoteChunkProvider.java | 8 +++++--- .../api/impl/arrow/LazyThriftInlineArrowResult.java | 3 ++- .../jdbc/api/impl/IsAllDataFetchedTest.java | 11 ++++++++++- 5 files changed, 28 insertions(+), 7 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 936877ae2..92bf97945 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -281,6 +281,12 @@ public DatabricksResultSet( @Override public boolean next() throws 
SQLException { checkIfClosed(); + if (executionResult == null) { + throw new DatabricksSQLException( + "Cannot iterate: no result data available. " + + "For async execution, call getExecutionResult() first.", + DatabricksDriverErrorCode.INVALID_STATE); + } boolean hasNext = this.executionResult.next(); if (cachedTelemetryCollector != null) { cachedTelemetryCollector.recordResultSetIteration( @@ -302,7 +308,9 @@ public boolean next() throws SQLException { public void close() throws DatabricksSQLException { stopHeartbeat(); isClosed = true; - this.executionResult.close(); + if (executionResult != null) { + executionResult.close(); + } if (parentStatement != null) { parentStatement.handleResultSetClose(this); } diff --git a/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java b/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java index 8752eeab5..5fcc62c8a 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java @@ -257,7 +257,8 @@ public long getTotalRowsFetched() { * @return true if all data has been fetched (either reached end or maxRows limit) */ public boolean isCompletelyFetched() { - return hasReachedEnd || !currentResponse.hasMoreRows; + // Guard against null — currentResponse is set to null in close() + return hasReachedEnd || (currentResponse != null && !currentResponse.hasMoreRows); } @Override diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java index 2e3a9ad31..85cd01b1c 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java @@ -221,9 +221,11 @@ public boolean isClosed() { @Override public boolean isAllDataFetched() { - // All chunk downloads have been submitted (may still be in-flight, but 
data is - // on its way to the client — server-side operation is no longer needed) - return nextChunkToDownload >= chunkCount; + // All chunk downloads submitted AND consumer has reached or passed the last chunk. + // The second condition ensures in-flight downloads have completed before signaling + // that the server is no longer needed — prevents link refresh failures if a + // presigned URL expires while downloads are still in progress. + return nextChunkToDownload >= chunkCount && !hasNextChunk(); } public long getAllowedChunksInMemory() { diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java index bd9a8e1c0..6ab03a92d 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java @@ -260,7 +260,8 @@ public long getChunkCount() { @Override public boolean isAllDataFetched() { - return hasReachedEnd || !currentResponse.hasMoreRows; + // Guard against null — currentResponse is set to null in close() + return hasReachedEnd || (currentResponse != null && !currentResponse.hasMoreRows); } /** diff --git a/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java b/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java index 015a6afe2..8d540254e 100644 --- a/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java +++ b/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java @@ -68,12 +68,21 @@ void testRemoteChunkProvider_notAllDownloaded() { assertFalse(provider.isAllDataFetched()); - // Now set all downloaded + // All downloads submitted but still iterating (currentChunkIndex < chunkCount - 1) nextField.set(provider, 5L); + java.lang.reflect.Field currentField = + AbstractRemoteChunkProvider.class.getDeclaredField("currentChunkIndex"); + currentField.setAccessible(true); + 
currentField.set(provider, 2L); // still iterating + assertFalse(provider.isAllDataFetched(), "Still iterating — should not report all fetched"); + + // All downloads submitted AND on last chunk (hasNextChunk == false) + currentField.set(provider, 4L); // chunkCount-1 = 4 assertTrue(provider.isAllDataFetched()); // More than needed (edge case) nextField.set(provider, 7L); + currentField.set(provider, 5L); assertTrue(provider.isAllDataFetched()); } catch (Exception e) { fail("Reflection failed: " + e.getMessage()); From a40466ac50c27b0023c6f3d44c55187952dfa7bd Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Tue, 12 May 2026 17:44:58 +0530 Subject: [PATCH 18/23] =?UTF-8?q?Remove=20isAllDataFetched=20=E2=80=94=20h?= =?UTF-8?q?eartbeat=20stops=20when=20next()=20returns=20false?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Simplify: heartbeat stops when next() returns false, which means the last row has been iterated. At that point all data is guaranteed to be client-side (you can't read a row without its chunk being downloaded). Removed isAllDataFetched() from IExecutionResult, ChunkProvider, and all 9 implementations. Deleted IsAllDataFetchedTest.java. Kept NPE guards for executionResult==null in next() and close(). 
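The invariant this simplification relies on can be sketched as follows. This is a hypothetical, minimal class (not the driver's actual `DatabricksResultSet`): reading row N requires its chunk to already be client-side, so the first time next() returns false every chunk is local and the heartbeat can stop safely.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the simplified stop condition: the heartbeat is
// stopped exactly when next() first returns false, i.e. after the last row
// has been iterated. No isAllDataFetched() probe is needed.
final class HeartbeatStopSketch {
  private boolean heartbeatRunning = true;
  private final Iterator<String> rows;

  HeartbeatStopSketch(List<String> rows) {
    this.rows = rows.iterator();
  }

  boolean next() {
    boolean hasNext = rows.hasNext();
    if (hasNext) {
      rows.next(); // serve the row; its chunk is by definition client-side
    } else {
      stopHeartbeat(); // idempotent: safe on every exhausted next() call
    }
    return hasNext;
  }

  void stopHeartbeat() {
    heartbeatRunning = false;
  }

  boolean isHeartbeatRunning() {
    return heartbeatRunning;
  }
}
```

The trade-off versus the removed proactive check: a slow consumer of a fully prefetched result set still heartbeats until exhaustion, but the code surface shrinks from nine implementations to one stop site.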
Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksResultSet.java | 6 - .../jdbc/api/impl/IExecutionResult.java | 11 -- .../jdbc/api/impl/InlineJsonResult.java | 5 - .../jdbc/api/impl/LazyThriftResult.java | 8 +- .../arrow/AbstractRemoteChunkProvider.java | 9 -- .../api/impl/arrow/ArrowStreamResult.java | 5 - .../jdbc/api/impl/arrow/ChunkProvider.java | 8 -- .../api/impl/arrow/InlineChunkProvider.java | 5 - .../arrow/LazyThriftInlineArrowResult.java | 6 - .../impl/arrow/StreamingChunkProvider.java | 5 - .../arrow/StreamingInlineArrowResult.java | 5 - .../impl/thrift/StreamingColumnarResult.java | 5 - .../jdbc/common/DatabricksJdbcConstants.java | 1 + .../jdbc/api/impl/IsAllDataFetchedTest.java | 128 ------------------ 14 files changed, 2 insertions(+), 205 deletions(-) delete mode 100644 src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 92bf97945..ba8a23d48 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -294,12 +294,6 @@ public boolean next() throws SQLException { } if (!hasNext) { stopHeartbeat(); - } else if (executionResult.isAllDataFetched()) { - // Proactive stop: all data has been fetched from the server to the client. - // No need to keep the server-side operation alive — the user is just iterating - // through client-side data. This prevents unnecessary RPCs and warehouse cost - // when a slow consumer reads a fully-prefetched result set. 
- stopHeartbeat(); } return hasNext; } diff --git a/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java b/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java index 9a7b46636..efa8d0792 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/IExecutionResult.java @@ -38,15 +38,4 @@ public interface IExecutionResult { long getRowCount(); long getChunkCount(); - - /** - * Returns true if all result data has been fetched from the server to the client, regardless of - * whether the user has iterated through all rows. Used by the heartbeat to stop polling early - * when the server-side operation is no longer needed. - * - *
<p>
Default returns false (conservative — keeps heartbeat running until next() returns false). - */ - default boolean isAllDataFetched() { - return false; - } } diff --git a/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java b/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java index d94939768..e3541510e 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/InlineJsonResult.java @@ -99,11 +99,6 @@ public long getChunkCount() { return chunkProvider.getChunkCount(); } - @Override - public boolean isAllDataFetched() { - return true; // Inline JSON results are fully in-memory - } - private boolean isClosed() { return isClosed; } diff --git a/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java b/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java index 5fcc62c8a..854da8656 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/LazyThriftResult.java @@ -257,12 +257,6 @@ public long getTotalRowsFetched() { * @return true if all data has been fetched (either reached end or maxRows limit) */ public boolean isCompletelyFetched() { - // Guard against null — currentResponse is set to null in close() - return hasReachedEnd || (currentResponse != null && !currentResponse.hasMoreRows); - } - - @Override - public boolean isAllDataFetched() { - return isCompletelyFetched(); + return hasReachedEnd || !currentResponse.hasMoreRows; } } diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java index 85cd01b1c..4717852f4 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/AbstractRemoteChunkProvider.java @@ -219,15 +219,6 @@ public boolean isClosed() { return isClosed; } - 
@Override - public boolean isAllDataFetched() { - // All chunk downloads submitted AND consumer has reached or passed the last chunk. - // The second condition ensures in-flight downloads have completed before signaling - // that the server is no longer needed — prevents link refresh failures if a - // presigned URL expires while downloads are still in progress. - return nextChunkToDownload >= chunkCount && !hasNextChunk(); - } - public long getAllowedChunksInMemory() { return allowedChunksInMemory; } diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java index bee6cbfa3..1569b7efb 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/ArrowStreamResult.java @@ -323,11 +323,6 @@ public long getChunkCount() { return chunkProvider.getChunkCount(); } - @Override - public boolean isAllDataFetched() { - return chunkProvider.isAllDataFetched(); - } - /** * Returns the chunk provider for testing purposes. * diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java index 31d30cf8e..d4955f9b9 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/ChunkProvider.java @@ -43,12 +43,4 @@ public interface ChunkProvider { long getChunkCount(); boolean isClosed(); - - /** - * Returns true if all chunk data has been fetched from the server to the client. Default returns - * false (conservative). 
- */ - default boolean isAllDataFetched() { - return false; - } } diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java index 8f7a6c7c0..32f5e1b80 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/InlineChunkProvider.java @@ -92,11 +92,6 @@ public boolean isClosed() { return isClosed; } - @Override - public boolean isAllDataFetched() { - return true; // Inline results are fully in-memory at construction - } - @VisibleForTesting void handleError(Exception e) throws DatabricksParsingException { String errorMessage = diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java index 6ab03a92d..98d97a899 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/LazyThriftInlineArrowResult.java @@ -258,12 +258,6 @@ public long getChunkCount() { return 0; } - @Override - public boolean isAllDataFetched() { - // Guard against null — currentResponse is set to null in close() - return hasReachedEnd || (currentResponse != null && !currentResponse.hasMoreRows); - } - /** * Gets the Arrow metadata for the current chunk. 
* diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java index 37b142582..dbb1bbe78 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingChunkProvider.java @@ -335,11 +335,6 @@ public boolean isClosed() { return closed; } - @Override - public boolean isAllDataFetched() { - return endOfStreamReached; - } - // ==================== Link Prefetch Logic ==================== private void linkPrefetchLoop() { diff --git a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java index 94cbe5632..7c9025c3f 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/arrow/StreamingInlineArrowResult.java @@ -296,11 +296,6 @@ public long getChunkCount() { return 0; } - @Override - public boolean isAllDataFetched() { - return provider.isEndOfStreamReached(); - } - /** * Gets the Arrow metadata for the current chunk. * diff --git a/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java b/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java index 1dee077ba..bf5a95d34 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java +++ b/src/main/java/com/databricks/jdbc/api/impl/thrift/StreamingColumnarResult.java @@ -281,11 +281,6 @@ public long getChunkCount() { return 0; } - @Override - public boolean isAllDataFetched() { - return provider.isEndOfStreamReached(); - } - /** * Gets the total number of rows fetched from the server so far. 
* diff --git a/src/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java b/src/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java index 11c31c1c0..857d67c69 100644 --- a/src/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java +++ b/src/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java @@ -121,6 +121,7 @@ public final class DatabricksJdbcConstants { * concurrent-modification errors where the operation is potentially retryable. */ public static final String SERIALIZATION_FAILURE_SQLSTATE = "40001"; + /** Standard SQL state for data exception (SQLSTATE 22000). */ public static final String DATA_EXCEPTION_SQLSTATE = "22000"; diff --git a/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java b/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java deleted file mode 100644 index 8d540254e..000000000 --- a/src/test/java/com/databricks/jdbc/api/impl/IsAllDataFetchedTest.java +++ /dev/null @@ -1,128 +0,0 @@ -package com.databricks.jdbc.api.impl; - -import static org.junit.jupiter.api.Assertions.*; -import static org.mockito.Mockito.*; - -import com.databricks.jdbc.api.impl.arrow.*; -import org.junit.jupiter.api.Test; - -/** - * Tests for isAllDataFetched() across all IExecutionResult and ChunkProvider implementations. - * Verifies that heartbeat stops proactively when all data has been fetched from the server. 
- */ -public class IsAllDataFetchedTest { - - // ========================================================================= - // IExecutionResult default - // ========================================================================= - - @Test - void testDefaultIsAllDataFetched_returnsFalse() { - // The default implementation returns false (conservative) - IExecutionResult result = mock(IExecutionResult.class, CALLS_REAL_METHODS); - assertFalse(result.isAllDataFetched()); - } - - // ========================================================================= - // InlineJsonResult — always true - // ========================================================================= - - @Test - void testInlineJsonResult_alwaysTrue() { - // InlineJsonResult fetches all data at construction - InlineJsonResult result = new InlineJsonResult(new Object[][] {{1, "a"}, {2, "b"}}); - assertTrue(result.isAllDataFetched()); - } - - // ========================================================================= - // InlineChunkProvider — always true - // ========================================================================= - - @Test - void testInlineChunkProvider_alwaysTrue() { - InlineChunkProvider provider = mock(InlineChunkProvider.class, CALLS_REAL_METHODS); - assertTrue(provider.isAllDataFetched()); - } - - // ========================================================================= - // AbstractRemoteChunkProvider — based on nextChunkToDownload vs chunkCount - // ========================================================================= - - @Test - void testRemoteChunkProvider_notAllDownloaded() { - // Create a concrete subclass mock to test the base class logic - AbstractRemoteChunkProvider provider = - mock(AbstractRemoteChunkProvider.class, CALLS_REAL_METHODS); - // Default fields are 0, so nextChunkToDownload (0) < chunkCount (0) is false - // Let's set chunkCount > nextChunkToDownload - try { - java.lang.reflect.Field chunkCountField = - 
AbstractRemoteChunkProvider.class.getDeclaredField("chunkCount"); - chunkCountField.setAccessible(true); - chunkCountField.set(provider, 5L); - - java.lang.reflect.Field nextField = - AbstractRemoteChunkProvider.class.getDeclaredField("nextChunkToDownload"); - nextField.setAccessible(true); - nextField.set(provider, 2L); - - assertFalse(provider.isAllDataFetched()); - - // All downloads submitted but still iterating (currentChunkIndex < chunkCount - 1) - nextField.set(provider, 5L); - java.lang.reflect.Field currentField = - AbstractRemoteChunkProvider.class.getDeclaredField("currentChunkIndex"); - currentField.setAccessible(true); - currentField.set(provider, 2L); // still iterating - assertFalse(provider.isAllDataFetched(), "Still iterating — should not report all fetched"); - - // All downloads submitted AND on last chunk (hasNextChunk == false) - currentField.set(provider, 4L); // chunkCount-1 = 4 - assertTrue(provider.isAllDataFetched()); - - // More than needed (edge case) - nextField.set(provider, 7L); - currentField.set(provider, 5L); - assertTrue(provider.isAllDataFetched()); - } catch (Exception e) { - fail("Reflection failed: " + e.getMessage()); - } - } - - // ========================================================================= - // StreamingChunkProvider — based on endOfStreamReached - // ========================================================================= - - @Test - void testStreamingChunkProvider_beforeEndOfStream() { - StreamingChunkProvider provider = mock(StreamingChunkProvider.class, CALLS_REAL_METHODS); - // Default endOfStreamReached is false - assertFalse(provider.isAllDataFetched()); - } - - // ========================================================================= - // LazyThriftResult — based on hasMoreRows - // ========================================================================= - - @Test - void testLazyThriftResult_isAllDataFetched_delegatesToIsCompletelyFetched() { - LazyThriftResult result = 
mock(LazyThriftResult.class); - // isAllDataFetched delegates to isCompletelyFetched - doReturn(false).when(result).isCompletelyFetched(); - doCallRealMethod().when(result).isAllDataFetched(); - assertFalse(result.isAllDataFetched()); - - doReturn(true).when(result).isCompletelyFetched(); - assertTrue(result.isAllDataFetched()); - } - - // ========================================================================= - // ChunkProvider default - // ========================================================================= - - @Test - void testChunkProviderDefault_returnsFalse() { - ChunkProvider provider = mock(ChunkProvider.class, CALLS_REAL_METHODS); - assertFalse(provider.isAllDataFetched()); - } -} From 832a1c8b576e3f5a09d9748c3ea13959d7ae96ca Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Tue, 12 May 2026 18:03:21 +0530 Subject: [PATCH 19/23] Add coverage tests for heartbeat config and checkStatementAlive MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DatabricksConnectionContextTest (6 new): - heartbeat disabled by default - heartbeat enabled via EnableHeartbeat=1 - heartbeat interval default 60s - heartbeat interval custom value - heartbeat interval zero falls back to 60 - heartbeat interval negative falls back to 60 DatabricksSdkClientTest (6 new): - checkStatementAlive: SUCCEEDED → true - checkStatementAlive: RUNNING → true - checkStatementAlive: CANCELED → false - checkStatementAlive: CLOSED → false - checkStatementAlive: FAILED → false - checkStatementAlive: exception wrapped as DatabricksSQLException Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../impl/DatabricksConnectionContextTest.java | 50 ++++++++ .../impl/sqlexec/DatabricksSdkClientTest.java | 107 ++++++++++++++++++ 2 files changed, 157 insertions(+) diff --git a/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java b/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java index 41e54fc50..6de7e101b 100644 
--- a/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java +++ b/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java @@ -1783,4 +1783,54 @@ public void testTreatCatalogAsPattern1_withExplicitSEA_treatCatalogStillTrue() assertEquals(DatabricksClientType.SEA, ctx.getClientType()); assertTrue(ctx.treatMetadataCatalogNameAsPattern()); } + + // ========================================================================= + // Heartbeat configuration + // ========================================================================= + + @Test + public void testHeartbeatDisabledByDefault() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse(TestConstants.VALID_URL_1, properties); + assertFalse(ctx.isHeartbeatEnabled()); + } + + @Test + public void testHeartbeatEnabled() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse( + TestConstants.VALID_URL_1 + ";EnableHeartbeat=1", properties); + assertTrue(ctx.isHeartbeatEnabled()); + } + + @Test + public void testHeartbeatIntervalDefault() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse(TestConstants.VALID_URL_1, properties); + assertEquals(60, ctx.getHeartbeatIntervalSeconds()); + } + + @Test + public void testHeartbeatIntervalCustom() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse( + TestConstants.VALID_URL_1 + ";HeartbeatIntervalSeconds=30", properties); + assertEquals(30, ctx.getHeartbeatIntervalSeconds()); + } + + @Test + public void testHeartbeatIntervalZeroDefaultsTo60() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse( + TestConstants.VALID_URL_1 + ";HeartbeatIntervalSeconds=0", properties); + assertEquals(60, ctx.getHeartbeatIntervalSeconds()); + } + + @Test + public void 
testHeartbeatIntervalNegativeDefaultsTo60() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse( + TestConstants.VALID_URL_1 + ";HeartbeatIntervalSeconds=-5", properties); + assertEquals(60, ctx.getHeartbeatIntervalSeconds()); + } } diff --git a/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java b/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java index 66cd3afde..f5e88fbbf 100644 --- a/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java +++ b/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java @@ -1204,4 +1204,111 @@ public void testGetResultChunksData_DatabricksError_throwsSQLException() throws assertTrue(exception.getMessage().contains("Results have expired")); assertNotNull(exception.getCause()); } + + // ========================================================================= + // checkStatementAlive + // ========================================================================= + + @Test + public void testCheckStatementAlive_succeededState_returnsTrue() throws Exception { + IDatabricksConnectionContext connectionContext = + DatabricksConnectionContext.parse(JDBC_URL, new Properties()); + DatabricksSdkClient databricksSdkClient = + new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); + + StatementStatus status = new StatementStatus().setState(StatementState.SUCCEEDED); + GetStatementResponse response = new GetStatementResponse(); + response.setStatus(status); + response.setStatementId(STATEMENT_ID.toSQLExecStatementId()); + + when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + .thenReturn(response); + + assertTrue(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); + } + + @Test + public void testCheckStatementAlive_runningState_returnsTrue() throws Exception { + IDatabricksConnectionContext connectionContext = + 
DatabricksConnectionContext.parse(JDBC_URL, new Properties()); + DatabricksSdkClient databricksSdkClient = + new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); + + StatementStatus status = new StatementStatus().setState(StatementState.RUNNING); + GetStatementResponse response = new GetStatementResponse(); + response.setStatus(status); + + when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + .thenReturn(response); + + assertTrue(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); + } + + @Test + public void testCheckStatementAlive_canceledState_returnsFalse() throws Exception { + IDatabricksConnectionContext connectionContext = + DatabricksConnectionContext.parse(JDBC_URL, new Properties()); + DatabricksSdkClient databricksSdkClient = + new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); + + StatementStatus status = new StatementStatus().setState(StatementState.CANCELED); + GetStatementResponse response = new GetStatementResponse(); + response.setStatus(status); + + when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + .thenReturn(response); + + assertFalse(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); + } + + @Test + public void testCheckStatementAlive_closedState_returnsFalse() throws Exception { + IDatabricksConnectionContext connectionContext = + DatabricksConnectionContext.parse(JDBC_URL, new Properties()); + DatabricksSdkClient databricksSdkClient = + new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); + + StatementStatus status = new StatementStatus().setState(StatementState.CLOSED); + GetStatementResponse response = new GetStatementResponse(); + response.setStatus(status); + + when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + .thenReturn(response); + + assertFalse(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); + } + + @Test + public void 
testCheckStatementAlive_failedState_returnsFalse() throws Exception { + IDatabricksConnectionContext connectionContext = + DatabricksConnectionContext.parse(JDBC_URL, new Properties()); + DatabricksSdkClient databricksSdkClient = + new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); + + StatementStatus status = new StatementStatus().setState(StatementState.FAILED); + GetStatementResponse response = new GetStatementResponse(); + response.setStatus(status); + + when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + .thenReturn(response); + + assertFalse(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); + } + + @Test + public void testCheckStatementAlive_exceptionWrapped() throws Exception { + IDatabricksConnectionContext connectionContext = + DatabricksConnectionContext.parse(JDBC_URL, new Properties()); + DatabricksSdkClient databricksSdkClient = + new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); + + when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + .thenThrow(new RuntimeException("Network error")); + + DatabricksSQLException exception = + assertThrows( + DatabricksSQLException.class, + () -> databricksSdkClient.checkStatementAlive(STATEMENT_ID)); + assertTrue(exception.getMessage().contains("Heartbeat status check failed")); + } } From fc461263853b07f4984372273a5d01f61b36cbcb Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Tue, 12 May 2026 20:15:42 +0530 Subject: [PATCH 20/23] Add more coverage tests to push past 85% threshold MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DatabricksConnectionContextTest (+3): - Large interval (>3600) accepted with warning - Explicit EnableHeartbeat=0 - Interface default methods return disabled/60 DatabricksThriftServiceClientTest (+7): - checkStatementAlive: FINISHED → true - checkStatementAlive: RUNNING → true - checkStatementAlive: CANCELED → false - 
checkStatementAlive: CLOSED → false - checkStatementAlive: ERROR → false - checkStatementAlive: null state → assumes alive - checkStatementAlive: TException wrapped Total new coverage tests in this PR: 22 (config) + 16 (manager) + 10 (eligibility) Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../impl/DatabricksConnectionContextTest.java | 27 +++++ .../DatabricksThriftServiceClientTest.java | 100 ++++++++++++++++++ 2 files changed, 127 insertions(+) diff --git a/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java b/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java index 6de7e101b..b8b9d68d4 100644 --- a/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java +++ b/src/test/java/com/databricks/jdbc/api/impl/DatabricksConnectionContextTest.java @@ -1833,4 +1833,31 @@ public void testHeartbeatIntervalNegativeDefaultsTo60() throws DatabricksSQLExce TestConstants.VALID_URL_1 + ";HeartbeatIntervalSeconds=-5", properties); assertEquals(60, ctx.getHeartbeatIntervalSeconds()); } + + @Test + public void testHeartbeatIntervalLargeValueAcceptedWithWarning() throws DatabricksSQLException { + // Values > 3600 are accepted but log a warning + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse( + TestConstants.VALID_URL_1 + ";HeartbeatIntervalSeconds=7200", properties); + assertEquals(7200, ctx.getHeartbeatIntervalSeconds()); + } + + @Test + public void testHeartbeatExplicitlyDisabled() throws DatabricksSQLException { + IDatabricksConnectionContext ctx = + DatabricksConnectionContext.parse( + TestConstants.VALID_URL_1 + ";EnableHeartbeat=0", properties); + assertFalse(ctx.isHeartbeatEnabled()); + } + + @Test + public void testHeartbeatInterfaceDefaultDisabled() { + // IDatabricksConnectionContext default methods + IDatabricksConnectionContext defaultCtx = + org.mockito.Mockito.mock( + IDatabricksConnectionContext.class, org.mockito.Mockito.CALLS_REAL_METHODS); + 
assertFalse(defaultCtx.isHeartbeatEnabled()); + assertEquals(60, defaultCtx.getHeartbeatIntervalSeconds()); + } } diff --git a/src/test/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClientTest.java b/src/test/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClientTest.java index 583438d5b..371e886ca 100644 --- a/src/test/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClientTest.java +++ b/src/test/java/com/databricks/jdbc/dbclient/impl/thrift/DatabricksThriftServiceClientTest.java @@ -1176,4 +1176,104 @@ void testListSchemasDoesNotEscapeCatalogWhenPatternEnabled() throws SQLException verify(thriftAccessor).getThriftResponse(captor.capture()); assertEquals("my_catalog", captor.getValue().getCatalogName()); } + + // ========================================================================= + // checkStatementAlive — heartbeat support + // ========================================================================= + + @Test + public void testCheckStatementAlive_finishedState_returnsTrue() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + TGetOperationStatusResp resp = new TGetOperationStatusResp(); + resp.setOperationState(TOperationState.FINISHED_STATE); + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), any(StatementId.class))) + .thenReturn(resp); + + assertTrue(client.checkStatementAlive(TEST_STMT_ID)); + } + + @Test + public void testCheckStatementAlive_runningState_returnsTrue() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + TGetOperationStatusResp resp = new TGetOperationStatusResp(); + resp.setOperationState(TOperationState.RUNNING_STATE); + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), any(StatementId.class))) + .thenReturn(resp); + + 
assertTrue(client.checkStatementAlive(TEST_STMT_ID)); + } + + @Test + public void testCheckStatementAlive_canceledState_returnsFalse() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + TGetOperationStatusResp resp = new TGetOperationStatusResp(); + resp.setOperationState(TOperationState.CANCELED_STATE); + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), any(StatementId.class))) + .thenReturn(resp); + + assertFalse(client.checkStatementAlive(TEST_STMT_ID)); + } + + @Test + public void testCheckStatementAlive_closedState_returnsFalse() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + TGetOperationStatusResp resp = new TGetOperationStatusResp(); + resp.setOperationState(TOperationState.CLOSED_STATE); + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), any(StatementId.class))) + .thenReturn(resp); + + assertFalse(client.checkStatementAlive(TEST_STMT_ID)); + } + + @Test + public void testCheckStatementAlive_errorState_returnsFalse() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + TGetOperationStatusResp resp = new TGetOperationStatusResp(); + resp.setOperationState(TOperationState.ERROR_STATE); + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), any(StatementId.class))) + .thenReturn(resp); + + assertFalse(client.checkStatementAlive(TEST_STMT_ID)); + } + + @Test + public void testCheckStatementAlive_nullState_assumesAlive() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + TGetOperationStatusResp resp = new TGetOperationStatusResp(); + // operationState not set — null + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), 
any(StatementId.class))) + .thenReturn(resp); + + assertTrue(client.checkStatementAlive(TEST_STMT_ID)); + } + + @Test + public void testCheckStatementAlive_thriftException_wraps() throws Exception { + DatabricksThriftServiceClient client = + new DatabricksThriftServiceClient(thriftAccessor, connectionContext); + + when(thriftAccessor.getOperationStatus( + any(TGetOperationStatusReq.class), any(StatementId.class))) + .thenThrow(new org.apache.thrift.TException("Connection refused")); + + assertThrows(DatabricksSQLException.class, () -> client.checkStatementAlive(TEST_STMT_ID)); + } } From 0a46f52a890ad2a2a47e8dff3b7fc3d0055ece88 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Fri, 15 May 2026 14:05:57 +0530 Subject: [PATCH 21/23] Address May 12 review feedback: hashCode, thread names, Throwable, unwrap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit F3 (Critical): Add StatementId.hashCode() — was missing, breaking ConcurrentHashMap lookups in ResultHeartbeatManager. Two equal StatementId instances had different hash codes. F2 (High): Fix asymmetric cast in stopHeartbeat() — use same JDBC unwrap() pattern as startHeartbeatIfEnabled() for pooled connections. F15 (Medium): Thread names now include manager ID: "databricks-jdbc-heartbeat-{N}" for thread dump diagnostics. F17 (Medium): Catch Throwable instead of Exception in heartbeat lambda. Re-throw VirtualMachineError to avoid swallowing fatal JVM errors. F23 (Medium): Early exit for SQLFeatureNotSupportedException — stop heartbeat immediately instead of retrying 10 times with misleading "results may expire" warning. 
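The F3 fix can be illustrated with a minimal, self-contained sketch. The `BrokenKey`/`FixedKey` classes below are hypothetical stand-ins for `StatementId`, not driver code: overriding `equals()` without `hashCode()` means two logically equal keys inherit distinct identity hashes, so a `ConcurrentHashMap` lookup checks the wrong bucket and misses — which is exactly why the heartbeat manager could not find its entry for an equal-but-distinct `StatementId` instance.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class HashCodeContractDemo {

    // equals() overridden, hashCode() NOT overridden — inherits Object's
    // identity hash, so two equal instances almost never share a bucket.
    static final class BrokenKey {
        final String guid;
        BrokenKey(String guid) { this.guid = guid; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && Objects.equals(guid, ((BrokenKey) o).guid);
        }
    }

    // equals() and hashCode() agree on the same fields — the F3 fix.
    static final class FixedKey {
        final String guid;
        FixedKey(String guid) { this.guid = guid; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && Objects.equals(guid, ((FixedKey) o).guid);
        }
        @Override public int hashCode() { return Objects.hash(guid); }
    }

    static boolean lookupWithBrokenKey() {
        ConcurrentHashMap<BrokenKey, String> heartbeats = new ConcurrentHashMap<>();
        heartbeats.put(new BrokenKey("stmt-1"), "task");
        // A second, equal instance: containsKey() only probes the bucket
        // chosen by hashCode(), so this is almost certainly a miss.
        return heartbeats.containsKey(new BrokenKey("stmt-1"));
    }

    static boolean lookupWithFixedKey() {
        ConcurrentHashMap<FixedKey, String> heartbeats = new ConcurrentHashMap<>();
        heartbeats.put(new FixedKey("stmt-1"), "task");
        return heartbeats.containsKey(new FixedKey("stmt-1"));
    }

    public static void main(String[] args) {
        System.out.println("broken key found: " + lookupWithBrokenKey()); // almost always false
        System.out.println("fixed key found:  " + lookupWithFixedKey());  // true
    }
}
```

This is why a `stopHeartbeat(statementId)` call with a reconstructed `StatementId` would leave the scheduled task (and the billed warehouse) running despite the key being "equal".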
Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../jdbc/api/impl/DatabricksResultSet.java | 30 +++++++++++++++++-- .../jdbc/api/impl/ResultHeartbeatManager.java | 6 +++- .../dbclient/impl/common/StatementId.java | 5 ++++ 3 files changed, 37 insertions(+), 4 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index ba8a23d48..5251427f0 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -371,11 +371,27 @@ private void startHeartbeatIfEnabled() { capturedStatementId); capturedMgr.stopHeartbeat(capturedStatementId); } - } catch (Exception e) { + } catch (Throwable e) { + // Catch Throwable (not just Exception) because ScheduledExecutorService + // suppresses subsequent executions on uncaught Error. Without this, an + // OOM or NoClassDefFoundError would silently kill the heartbeat. 
+ if (e instanceof VirtualMachineError) { + // Don't swallow fatal JVM errors — stop heartbeat and let it propagate + capturedMgr.stopHeartbeat(capturedStatementId); + throw (VirtualMachineError) e; + } // Re-read flag — may have been set during the RPC (connection closing) if (capturedMgr.getStoppedFlag(capturedStatementId).get()) { return; } + // Unsupported client — stop immediately, don't retry 10 times + if (e instanceof java.sql.SQLFeatureNotSupportedException) { + LOGGER.info( + "Heartbeat not supported by client for statement {}, stopping", + capturedStatementId); + capturedMgr.stopHeartbeat(capturedStatementId); + return; + } int failures = consecutiveFailures.incrementAndGet(); if (failures == 1) { LOGGER.info( @@ -419,8 +435,16 @@ private void stopHeartbeat() { return; } try { - DatabricksConnection conn = - (DatabricksConnection) parentStatement.getStatement().getConnection(); + // Use same unwrap pattern as startHeartbeatIfEnabled() for pooled connections + java.sql.Connection rawConn = parentStatement.getStatement().getConnection(); + DatabricksConnection conn; + if (rawConn instanceof DatabricksConnection) { + conn = (DatabricksConnection) rawConn; + } else if (rawConn.isWrapperFor(DatabricksConnection.class)) { + conn = rawConn.unwrap(DatabricksConnection.class); + } else { + return; + } ResultHeartbeatManager mgr = conn.getHeartbeatManager(); if (mgr != null) { mgr.stopHeartbeat(statementId); diff --git a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java index cce04b935..8a225e8fd 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java +++ b/src/main/java/com/databricks/jdbc/api/impl/ResultHeartbeatManager.java @@ -45,13 +45,17 @@ class ResultHeartbeatManager { // A connection with multiple active statements needs concurrent heartbeat ticks. 
private static final int HEARTBEAT_THREAD_POOL_SIZE = 2; + private static final java.util.concurrent.atomic.AtomicLong MANAGER_COUNTER = + new java.util.concurrent.atomic.AtomicLong(0); + ResultHeartbeatManager(int intervalSeconds) { this.intervalSeconds = intervalSeconds; + long managerId = MANAGER_COUNTER.incrementAndGet(); this.scheduler = Executors.newScheduledThreadPool( HEARTBEAT_THREAD_POOL_SIZE, r -> { - Thread t = new Thread(r, "databricks-jdbc-heartbeat"); + Thread t = new Thread(r, "databricks-jdbc-heartbeat-" + managerId); t.setDaemon(true); return t; }); diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/common/StatementId.java b/src/main/java/com/databricks/jdbc/dbclient/impl/common/StatementId.java index 1579cab94..d8c6010a5 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/common/StatementId.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/common/StatementId.java @@ -95,4 +95,9 @@ public boolean equals(Object otherStatement) { return Objects.equals(this.guid, ((StatementId) otherStatement).guid) && Objects.equals(this.secret, ((StatementId) otherStatement).secret); } + + @Override + public int hashCode() { + return Objects.hash(guid, secret); + } } From fd4e90a0360c269a7653c53f5d399a499edc6708 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Fri, 15 May 2026 14:42:00 +0530 Subject: [PATCH 22/23] Downgrade unsupported heartbeat log from INFO to DEBUG Expected condition for clients that don't implement checkStatementAlive. 
Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../java/com/databricks/jdbc/api/impl/DatabricksResultSet.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java index 5251427f0..6552903f7 100644 --- a/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java +++ b/src/main/java/com/databricks/jdbc/api/impl/DatabricksResultSet.java @@ -386,7 +386,7 @@ private void startHeartbeatIfEnabled() { } // Unsupported client — stop immediately, don't retry 10 times if (e instanceof java.sql.SQLFeatureNotSupportedException) { - LOGGER.info( + LOGGER.debug( "Heartbeat not supported by client for statement {}, stopping", capturedStatementId); capturedMgr.stopHeartbeat(capturedStatementId); From 75be9b71101f4a687f13ce918bab1f3b349d7aa8 Mon Sep 17 00:00:00 2001 From: Gopal Lal Date: Fri, 15 May 2026 14:53:48 +0530 Subject: [PATCH 23/23] Use lightweight /status endpoint for SEA heartbeat instead of full GetStatement Changed checkStatementAlive from GET /sql/statements/{id} (~21KB response with full manifest/result data) to GET /sql/statements/{id}/status (~100 bytes with just state/error/sqlState). Reduces heartbeat bandwidth by ~99% per tick. 
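The aliveness decision behind this endpoint change can be sketched as follows. This is an illustrative stand-in only — the real driver uses the SDK's `StatementState` and `PathConstants.STATEMENT_STATUS_PATH_WITH_ID`, and the path prefix below is an assumption based on the documented `/api/2.1/sql/statements/{id}` route: only server-terminal states stop the heartbeat, and a missing state is conservatively treated as alive.

```java
public class StatementStatusSketch {

    // Illustrative copy of the SEA statement lifecycle states.
    enum State { PENDING, RUNNING, SUCCEEDED, FAILED, CANCELED, CLOSED }

    // Assumed path shape; the lightweight /status endpoint returns only
    // state/error/sqlState (~100 bytes) instead of the full manifest.
    static final String STATUS_PATH_FORMAT = "/api/2.1/sql/statements/%s/status";

    static String statusPath(String statementId) {
        return String.format(STATUS_PATH_FORMAT, statementId);
    }

    // Terminal states mean the server has released the operation; anything
    // else — including an absent state — keeps the heartbeat running.
    static boolean isAlive(State state) {
        if (state == null) {
            return true; // be conservative: a missing state does not prove expiry
        }
        return state != State.CANCELED && state != State.CLOSED && state != State.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(statusPath("abc123"));
        System.out.println(isAlive(State.RUNNING));   // true
        System.out.println(isAlive(State.SUCCEEDED)); // true — results still readable
        System.out.println(isAlive(State.CLOSED));    // false — stop the heartbeat
    }
}
```

Note that `SUCCEEDED`/`FINISHED` counts as alive: the query is done but its result set is still being consumed, which is precisely the window the heartbeat protects.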
Co-authored-by: Isaac Signed-off-by: Gopal Lal --- .../impl/sqlexec/DatabricksSdkClient.java | 13 +++++---- .../dbclient/impl/sqlexec/PathConstants.java | 1 + .../impl/sqlexec/DatabricksSdkClientTest.java | 28 ++++--------------- 3 files changed, 14 insertions(+), 28 deletions(-) diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java index 5561a90d7..1af31de5c 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClient.java @@ -37,6 +37,7 @@ import com.databricks.jdbc.model.core.ExternalLink; import com.databricks.jdbc.model.core.ResultData; import com.databricks.jdbc.model.core.ResultManifest; +import com.databricks.jdbc.model.core.StatementStatus; import com.databricks.jdbc.model.telemetry.enums.DatabricksDriverErrorCode; import com.databricks.sdk.WorkspaceClient; import com.databricks.sdk.core.ApiClient; @@ -417,13 +418,13 @@ public DatabricksResultSet executeStatementAsync( @Override public boolean checkStatementAlive(StatementId typedStatementId) throws SQLException { String statementId = typedStatementId.toSQLExecStatementId(); - String getStatusPath = String.format(STATEMENT_PATH_WITH_ID, statementId); + // Use lightweight /status endpoint (~100 bytes) instead of full GetStatement (~21KB) + String statusPath = String.format(STATEMENT_STATUS_PATH_WITH_ID, statementId); try { - GetStatementRequest request = new GetStatementRequest().setStatementId(statementId); - Request req = new Request(Request.GET, getStatusPath, apiClient.serialize(request)); - req.withHeaders(getHeaders("getStatement")); - GetStatementResponse response = apiClient.execute(req, GetStatementResponse.class); - StatementState state = response.getStatus().getState(); + Request req = new Request(Request.GET, statusPath, (String) null); + 
req.withHeaders(getHeaders("getStatementStatus")); + StatementStatus status = apiClient.execute(req, StatementStatus.class); + StatementState state = status.getState(); // Terminal states mean the operation is no longer alive return state != StatementState.CANCELED && state != StatementState.CLOSED diff --git a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/PathConstants.java b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/PathConstants.java index 45713d3c8..817558508 100644 --- a/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/PathConstants.java +++ b/src/main/java/com/databricks/jdbc/dbclient/impl/sqlexec/PathConstants.java @@ -8,6 +8,7 @@ public class PathConstants { public static final String STATEMENT_PATH = BASE_PATH + "statements/"; public static final String STATEMENT_PATH_WITH_ID = STATEMENT_PATH + "%s"; public static final String CANCEL_STATEMENT_PATH_WITH_ID = STATEMENT_PATH + "%s/cancel"; + public static final String STATEMENT_STATUS_PATH_WITH_ID = STATEMENT_PATH + "%s/status"; public static final String RESULT_CHUNK_PATH = STATEMENT_PATH_WITH_ID + "/result/chunks/%s"; public static final String TELEMETRY_PATH = "/telemetry-ext"; public static final String TELEMETRY_PATH_UNAUTHENTICATED = "/telemetry-unauth"; diff --git a/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java b/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java index e97e34042..71457c35e 100644 --- a/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java +++ b/src/test/java/com/databricks/jdbc/dbclient/impl/sqlexec/DatabricksSdkClientTest.java @@ -1324,12 +1324,8 @@ public void testCheckStatementAlive_succeededState_returnsTrue() throws Exceptio new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); StatementStatus status = new StatementStatus().setState(StatementState.SUCCEEDED); - GetStatementResponse response = new GetStatementResponse(); 
- response.setStatus(status); - response.setStatementId(STATEMENT_ID.toSQLExecStatementId()); - when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) - .thenReturn(response); + when(apiClient.execute(any(Request.class), eq(StatementStatus.class))).thenReturn(status); assertTrue(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); } @@ -1342,11 +1338,8 @@ public void testCheckStatementAlive_runningState_returnsTrue() throws Exception new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); StatementStatus status = new StatementStatus().setState(StatementState.RUNNING); - GetStatementResponse response = new GetStatementResponse(); - response.setStatus(status); - when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) - .thenReturn(response); + when(apiClient.execute(any(Request.class), eq(StatementStatus.class))).thenReturn(status); assertTrue(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); } @@ -1359,11 +1352,8 @@ public void testCheckStatementAlive_canceledState_returnsFalse() throws Exceptio new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); StatementStatus status = new StatementStatus().setState(StatementState.CANCELED); - GetStatementResponse response = new GetStatementResponse(); - response.setStatus(status); - when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) - .thenReturn(response); + when(apiClient.execute(any(Request.class), eq(StatementStatus.class))).thenReturn(status); assertFalse(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); } @@ -1376,11 +1366,8 @@ public void testCheckStatementAlive_closedState_returnsFalse() throws Exception new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); StatementStatus status = new StatementStatus().setState(StatementState.CLOSED); - GetStatementResponse response = new GetStatementResponse(); - response.setStatus(status); - 
when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) - .thenReturn(response); + when(apiClient.execute(any(Request.class), eq(StatementStatus.class))).thenReturn(status); assertFalse(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); } @@ -1393,11 +1380,8 @@ public void testCheckStatementAlive_failedState_returnsFalse() throws Exception new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); StatementStatus status = new StatementStatus().setState(StatementState.FAILED); - GetStatementResponse response = new GetStatementResponse(); - response.setStatus(status); - when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) - .thenReturn(response); + when(apiClient.execute(any(Request.class), eq(StatementStatus.class))).thenReturn(status); assertFalse(databricksSdkClient.checkStatementAlive(STATEMENT_ID)); } @@ -1409,7 +1393,7 @@ public void testCheckStatementAlive_exceptionWrapped() throws Exception { DatabricksSdkClient databricksSdkClient = new DatabricksSdkClient(connectionContext, statementExecutionService, apiClient); - when(apiClient.execute(any(Request.class), eq(GetStatementResponse.class))) + when(apiClient.execute(any(Request.class), eq(StatementStatus.class))) .thenThrow(new RuntimeException("Network error")); DatabricksSQLException exception =