
fix: extend NodeDiagnostics with node state, distance, datacenter and pool size (DRIVER-540)#887

Merged
dkropachev merged 1 commit into scylladb:scylla-4.x from nikagra:mhradovich/4.x-driver-540-timeout-diagnostics
May 14, 2026

@nikagra commented May 13, 2026

Follows up on #883.

Fixes: DRIVER-540

Problem

The NodeDiagnostics snapshot introduced in #883 captures stream-ID and in-flight counts at timeout time but gives no context about the node's health or topology. Three important diagnostic questions remain unanswered:

  • Is the node actually known to be down at timeout time?
  • Is the pool degraded (fewer connections than expected)?
  • Is this a remote DC node where higher latency is expected?

Changes

Four fields are added to DriverTimeoutException.NodeDiagnostics:

Field        | Source               | What it reveals
nodeState    | Node.getState()      | UP/DOWN/FORCED_DOWN/UNKNOWN: immediately explains timeouts to unavailable nodes
nodeDistance | Node.getDistance()   | LOCAL/REMOTE/IGNORED: contextualizes latency expectations
datacenter   | Node.getDatacenter() | DC name: helps diagnose cross-DC routing
poolSize     | ChannelPool.size()   | Active connection count: reveals degraded pools

All four are available from the Node and ChannelPool objects already in scope at each buildNodeDiagnostics() call site. No new infrastructure required.
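To make the "no new infrastructure" point concrete, here is a minimal sketch of a call site. `NodeView`, `PoolView`, `TopologyDiagnostics` and `DiagnosticsSketch.buildNodeDiagnostics` are hypothetical stand-ins for the driver's internal `Node`, `ChannelPool`, `NodeDiagnostics` and call sites; only the four new fields are modelled (the real snapshot also carries the stream-ID and in-flight counts from #883). The enum constants mirror the driver's `NodeState` and `NodeDistance`.

```java
// Constants mirror the driver's NodeState and NodeDistance enums.
enum NodeState { UNKNOWN, UP, DOWN, FORCED_DOWN }

enum NodeDistance { LOCAL, REMOTE, IGNORED }

// Hypothetical stand-in for Node: only the accessors listed in the table above.
interface NodeView {
  NodeState getState();
  NodeDistance getDistance();
  String getDatacenter();
}

// Hypothetical stand-in for ChannelPool: only size() is modelled.
interface PoolView {
  int size();
}

// Hypothetical holder for just the four new fields.
final class TopologyDiagnostics {
  final NodeState nodeState;
  final NodeDistance nodeDistance;
  final String datacenter;
  final int poolSize;

  TopologyDiagnostics(NodeState nodeState, NodeDistance nodeDistance, String datacenter, int poolSize) {
    this.nodeState = nodeState;
    this.nodeDistance = nodeDistance;
    this.datacenter = datacenter;
    this.poolSize = poolSize;
  }
}

final class DiagnosticsSketch {
  private DiagnosticsSketch() {}

  // Every value comes from objects the call site already holds; nothing new is looked up.
  // Assumption: a missing pool is reported as -1 instead of failing the timeout path.
  static TopologyDiagnostics buildNodeDiagnostics(NodeView node, PoolView pool) {
    return new TopologyDiagnostics(
        node.getState(),
        node.getDistance(),
        node.getDatacenter(),
        pool == null ? -1 : pool.size());
  }
}
```

Reading the values eagerly at timeout time is the point of a snapshot: node state and pool size can change shortly afterwards, so a lazily evaluated view could describe a different situation than the one that produced the timeout.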

Updated message format

Query timed out after PT0.5S — node in flight: /10.0.0.1:9042 [state: UP, distance: LOCAL, dc: dc1, channel in-flight: 5, pool size: 3, pool in-flight: 12, pool available ids: 988, pool orphaned ids: 2]
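A small sketch of how the message above could be assembled. `NodeDiagnosticsView` and `TimeoutMessages.build` are hypothetical names; the bracketed field layout follows the example above, and the fallback to the bare base message when no diagnostics were captured follows the `DriverTimeoutException` snippet in the review thread. Values are plain strings and ints to keep the sketch self-contained.

```java
import java.time.Duration;

// Hypothetical value class reproducing the bracketed toString() layout shown above.
final class NodeDiagnosticsView {
  final String endPoint, state, distance, datacenter;
  final int channelInFlight, poolSize, poolInFlight, poolAvailableIds, poolOrphanedIds;

  NodeDiagnosticsView(String endPoint, String state, String distance, String datacenter,
      int channelInFlight, int poolSize, int poolInFlight, int poolAvailableIds, int poolOrphanedIds) {
    this.endPoint = endPoint;
    this.state = state;
    this.distance = distance;
    this.datacenter = datacenter;
    this.channelInFlight = channelInFlight;
    this.poolSize = poolSize;
    this.poolInFlight = poolInFlight;
    this.poolAvailableIds = poolAvailableIds;
    this.poolOrphanedIds = poolOrphanedIds;
  }

  @Override
  public String toString() {
    // Topology/health fields first, then the channel and pool counters.
    return endPoint
        + " [state: " + state
        + ", distance: " + distance
        + ", dc: " + datacenter
        + ", channel in-flight: " + channelInFlight
        + ", pool size: " + poolSize
        + ", pool in-flight: " + poolInFlight
        + ", pool available ids: " + poolAvailableIds
        + ", pool orphaned ids: " + poolOrphanedIds
        + "]";
  }
}

final class TimeoutMessages {
  private TimeoutMessages() {}

  // Duration.toString() yields ISO-8601 text such as "PT0.5S".
  static String build(Duration timeout, NodeDiagnosticsView diagnostics) {
    String baseMessage = "Query timed out after " + timeout;
    if (diagnostics == null) {
      return baseMessage; // nothing was captured, keep the plain message
    }
    return baseMessage + " \u2014 node in flight: " + diagnostics;
  }
}
```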

Files changed

  • DriverTimeoutException.java — new fields, getters, Javadoc, updated toString()
  • CqlRequestHandler.java, CqlPrepareHandler.java, GraphRequestHandler.java, ContinuousRequestHandlerBase.java — pass new fields to NodeDiagnostics.of()

Notes


Copilot AI left a comment


Pull request overview

This PR extends DriverTimeoutException.NodeDiagnostics to include additional node topology/health context at the time a driver timeout occurs, making timeout messages more actionable for debugging.

Changes:

  • Added nodeState, nodeDistance, datacenter, and poolSize to DriverTimeoutException.NodeDiagnostics, including getters and updated toString().
  • Updated CQL/Graph/Continuous request handlers to populate the new diagnostics fields from Node and ChannelPool.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

File | Description
core/src/main/java/com/datastax/oss/driver/internal/core/cql/CqlRequestHandler.java | Passes node state/distance/DC and pool size into NodeDiagnostics.of() at timeout time.
core/src/main/java/com/datastax/oss/driver/internal/core/cql/CqlPrepareHandler.java | Same as above for prepare timeouts.
core/src/main/java/com/datastax/oss/driver/api/core/DriverTimeoutException.java | Extends the NodeDiagnostics API and updates string rendering/message formatting.
core/src/main/java/com/datastax/dse/driver/internal/core/graph/GraphRequestHandler.java | Same as above for graph timeouts.
core/src/main/java/com/datastax/dse/driver/internal/core/cql/continuous/ContinuousRequestHandlerBase.java | Same as above for continuous paging/global timeouts.


@nikagra marked this pull request as ready for review May 13, 2026 13:46
@nikagra requested a review from dkropachev May 13, 2026 13:46
   return baseMessage;
 }
-return baseMessage + " node in flight: " + nodeDiagnostics;
+return baseMessage + " \u2014 node in flight: " + nodeDiagnostics;

Can you drop this one?

… pool size (DRIVER-540)

Add four additional fields to DriverTimeoutException.NodeDiagnostics captured at
timeout time:

- nodeState: UP/DOWN/FORCED_DOWN/UNKNOWN — immediately explains timeouts to downed nodes
- nodeDistance: LOCAL/REMOTE/IGNORED — contextualizes latency expectations
- datacenter: node DC — helps diagnose cross-DC routing issues
- poolSize: active connection count — reveals degraded pools (fewer connections than expected)

All four are available from Node and ChannelPool already in scope at each
buildNodeDiagnostics() call site. No new infrastructure required.

Updated toString() example:
  /10.0.0.1:9042 [state: UP, distance: LOCAL, dc: dc1, channel in-flight: 5,
   pool size: 3, pool in-flight: 12, pool available ids: 988, pool orphaned ids: 2]
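As a rough illustration of how the four fields answer the questions from the description (node down, degraded pool, remote DC), here is a hypothetical triage helper. `TimeoutTriage.hints` and its `expectedPoolSize` parameter are assumptions, not part of this change; in practice the expected size would come from the configured pool size for the node's distance (for example `advanced.connection.pool.local.size`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the new NodeDiagnostics fields into human-readable hints.
final class TimeoutTriage {
  private TimeoutTriage() {}

  // state and distance are the enum names as strings (e.g. "UP", "REMOTE").
  static List<String> hints(String state, String distance, int poolSize, int expectedPoolSize) {
    List<String> hints = new ArrayList<>();
    if (!"UP".equals(state)) {
      // DOWN, FORCED_DOWN or UNKNOWN: the timeout may simply reflect an unavailable node.
      hints.add("node not UP (" + state + ")");
    }
    if (poolSize < expectedPoolSize) {
      // Fewer live connections than configured: the pool is degraded.
      hints.add("degraded pool: " + poolSize + "/" + expectedPoolSize + " connections");
    }
    if ("REMOTE".equals(distance)) {
      hints.add("remote DC node: higher latency expected");
    }
    return hints;
  }
}
```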
@nikagra force-pushed the mhradovich/4.x-driver-540-timeout-diagnostics branch from 62bf3c1 to 40b83bf May 14, 2026 09:05
@dkropachev merged commit 940259a into scylladb:scylla-4.x May 14, 2026
20 checks passed


3 participants