Skip to content

feat: optimize system table queries with column projection (DRIVER-368)#862

Open
nikagra wants to merge 4 commits intoscylladb:scylla-3.xfrom
nikagra:driver-368-optimize-system-local-query
Open

feat: optimize system table queries with column projection (DRIVER-368)#862
nikagra wants to merge 4 commits intoscylladb:scylla-3.xfrom
nikagra:driver-368-optimize-system-local-query

Conversation

@nikagra
Copy link
Copy Markdown

@nikagra nikagra commented Apr 7, 2026

Closes https://scylladb.atlassian.net/browse/DRIVER-368

Backport of the same optimization done for the 4.x driver (DefaultTopologyMonitor), applied here to ControlConnection in the 3.x driver.

On the first connection to each system table (system.local, system.peers, system.peers_v2) the driver issues SELECT * to discover which columns the server exposes. It caches the intersection of those columns with a driver-internal *_COLUMNS_OF_INTEREST set. Subsequent queries project only those columns, reducing bytes on the wire and deserialization work.

Changes:

  • Three LOCAL/PEERS/PEERS_V2_COLUMNS_OF_INTEREST ImmutableSet<String> constants
  • Three volatile Set<String> cache fields (null = uninitialized sentinel)
  • intersectWithNeeded() and buildProjectedQuery() static helpers (@VisibleForTesting)
  • Cache reset in setNewConnection() and peers_v2 fallback path
  • Projected queries in refreshNodeListAndTokenMap(), selectPeersFuture(), and fetchNodeInfo() (system.local only)
  • fetchNodeInfo() uses SELECT * for peer WHERE-clause lookups (single-row reconnection path) to stay compatible with Scassandra primes on non-restarted nodes
  • New ControlConnectionUnitTest.java: 16 pure unit tests, no cluster required

@dkropachev
Copy link
Copy Markdown

@nikagra please check why cicd is failing

@nikagra nikagra marked this pull request as draft April 7, 2026 14:44
@nikagra nikagra requested review from Copilot and dkropachev April 9, 2026 21:26
@nikagra nikagra marked this pull request as ready for review April 9, 2026 21:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR backports the 4.x “system table column projection” optimization to the 3.x driver’s ControlConnection, caching discovered system table columns (via initial SELECT *) and using projected SELECT col1, col2... for subsequent queries to reduce bytes/deserialization work.

Changes:

  • Add *_COLUMNS_OF_INTEREST constants, projected-column caches, and helper methods (intersectWithNeeded, buildProjectedQuery) in ControlConnection.
  • Update control connection system table queries to use projections once caches are warmed, and reset caches on reconnect / peers_v2 fallback.
  • Extend Scassandra priming to include projected-query primes and add unit tests covering the new helpers/caches.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java Adds column-interest sets, cache fields, cache reset, and switches system table queries to projected form after discovery.
driver-core/src/test/java/com/datastax/driver/core/ScassandraCluster.java Re-primes after restart and primes projected queries to match the driver’s new projected system-table queries.
driver-core/src/test/java/com/datastax/driver/core/ControlConnectionUnitTest.java New pure unit tests for projection helpers/constants and cache-field declarations.
driver-core/src/test/java/com/datastax/driver/core/ControlConnectionTest.java Resets column caches when Scassandra primes are cleared to avoid projected-query misses.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread driver-core/src/test/java/com/datastax/driver/core/ScassandraCluster.java Outdated
Comment thread driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java Outdated
Comment thread driver-core/src/test/java/com/datastax/driver/core/ControlConnectionUnitTest.java Outdated
@nikagra nikagra force-pushed the driver-368-optimize-system-local-query branch from 0bf9092 to 4f9a052 Compare April 9, 2026 22:25
@nikagra
Copy link
Copy Markdown
Author

nikagra commented Apr 9, 2026

All CI failures have been resolved. The root issues were: projected WHERE-clause queries not primed in Scassandra for non-restarted nodes (fixed by using SELECT * for peer WHERE lookups in fetchNodeInfo), inconsistent host_id UUIDs across system.local and system.peers primes (fixed by computing UUIDs once per ScassandraCluster instance), and missing contains()-guarded columns in the PEERS/PEERS_V2_COLUMNS_OF_INTEREST sets. All 11 CI checks are now passing on the latest commit.

@nikagra nikagra force-pushed the driver-368-optimize-system-local-query branch from c37c98f to 4f9a052 Compare April 9, 2026 23:27
Comment thread driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java Outdated
Comment thread driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java Outdated
@nikagra nikagra force-pushed the driver-368-optimize-system-local-query branch from ab6f00c to 367ff74 Compare May 5, 2026 13:48
@nikagra nikagra requested a review from dkropachev May 5, 2026 13:50
Copy link
Copy Markdown

@dkropachev dkropachev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, can I ask you to run one experiment, could you please task ask AI to extract this new logic and variables and store them on a separate class, it should make code cleaner.

Comment on lines +806 to +828
null, cluster.protocolVersion(), new Requests.Query(projection.peersQuery()));
connection.write(rawFuture);
final SettableFuture<ResultSet> peersFuture = SettableFuture.create();
Futures.addCallback(
rawFuture,
new FutureCallback<ResultSet>() {
@Override
public void onSuccess(ResultSet result) {
projection.populatePeers(result);
peersFuture.set(result);
}

@Override
public void onFailure(Throwable t) {
if (t instanceof InvalidQueryException) {
// The projected query referenced a column the server no longer exposes; reset
// caches so the next query re-discovers columns via SELECT *.
projection.reset();
}
peersFuture.setException(t);
}
},
MoreExecutors.directExecutor());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it is not excessive ? Can you just create peersFuture and hook to onSuccess and onFailure ?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also feel like you can have a method on projection that takes DefaultResultSetFuture and hooks to it, and use it here and in at line 536

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added hookLocal(), hookPeers(), and hookPeersV2() to SystemColumnProjection. Each method attaches a FutureCallback to the given DefaultResultSetFuture that populates the corresponding column cache on success and resets all caches on InvalidQueryException failure, then returns the future unchanged.

In selectPeersFuture() the system.peers branch now drops the SettableFuture wrapper entirely — replaced by a single return projection.hookPeers(rawFuture).

fetchNodeInfo() now calls the appropriate hook before future.get() and removes the explicit populate*() calls from the if (row != null) block. The cache is populated as a side-effect of future completion regardless of row count, which is correct since ColumnDefinitions are returned by the server independent of whether any rows match.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction to the previous reply: the claim that "populating from ColumnDefinitions regardless of row count is correct" turned out to be wrong.

ControlConnectionTest.should_fetch_whole_peers_table_if_broadcast_address_changed caught a regression: when fetchNodeInfo() issues a WHERE peer='...' lookup and gets 0 rows back (because the broadcast address changed), the hook still fires — ColumnDefinitions are present in the result metadata even for empty result sets — and warms the peersColumns cache from that empty-result query. The subsequent full-table scan in selectPeersFuture() then sends a projected query, which the Scassandra prime doesn't recognise, and returns an empty result, causing refreshNodeInfo() to return false.

The original if (row != null) guard on populatePeers() was intentional and correct.

The hook API is now limited to the one path where it's unambiguously correct: the system.peers full-table scan in selectPeersFuture() (where the result always reflects the full schema regardless of row count). hookLocal() and hookPeersV2() have been removed. fetchNodeInfo() reverts to calling populateLocal() / populatePeers() / populatePeersV2() directly inside the if (row != null) block.

@nikagra nikagra force-pushed the driver-368-optimize-system-local-query branch from 22ba3af to 927802f Compare May 5, 2026 17:56
nikagra added 3 commits May 5, 2026 20:12
Backport of the 4.x DefaultTopologyMonitor optimization to the 3.x
driver's ControlConnection. On the first query to each system table
(system.local, system.peers, system.peers_v2) the driver sends SELECT *
to discover available columns. It caches the intersection with an
internal *_COLUMNS_OF_INTEREST set; subsequent queries project only those
columns, reducing bytes on the wire and deserialization work.

Changes in ControlConnection:
- LOCAL/PEERS/PEERS_V2_COLUMNS_OF_INTEREST ImmutableSet<String> constants
- Volatile Set<String> cache fields (null = uninitialized sentinel)
- intersectWithNeeded() returns null on empty intersection to prevent
  invalid empty-column projections
- buildProjectedQuery() static helper
- Cache reset in tryConnect() so each new connection rediscovers columns
- Projected queries in refreshNodeListAndTokenMap(), selectPeersFuture(),
  and fetchNodeInfo() (system.local only; peer WHERE lookups use SELECT *
  for Scassandra compatibility)

Changes in ScassandraCluster:
- hostIdByNodeCount computed once per instance for stable host_id values
- primeMetadata() primes both SELECT * and projected-query variants

New unit tests in ControlConnectionUnitTest covering projection helpers,
constants, and cache field declarations (16 tests).
Move the three *_COLUMNS_OF_INTEREST constants, the three volatile cache
fields (localColumns, peersColumns, peersV2Columns), and the two static
helpers (intersectWithNeeded, buildProjectedQuery) out of ControlConnection
and into a new package-private SystemColumnProjection class.

ControlConnection now holds a single 'projection' instance and delegates
all projection state and query-building to it. resetColumnCaches() remains
on ControlConnection as a @VisibleForTesting thin wrapper over
projection.reset(), used by ControlConnectionTest between Scassandra prime
clears.

ControlConnectionUnitTest is updated to test SystemColumnProjection
directly; ScassandraCluster references updated from ControlConnection.*
to SystemColumnProjection.*.
If a projected query fails with InvalidQueryException (e.g. a column was
dropped from a system table between connections), call projection.reset()
before signalling a reconnect. This ensures the next connection starts
with all caches null and re-discovers available columns via SELECT *,
rather than re-sending the now-invalid projected query.

Locations updated:
- selectPeersFuture: system.peers_v2 onFailure (downgrade path) — changed
  resetPeers() to reset() so the stale peersV2Columns is also cleared
- selectPeersFuture: system.peers onFailure — added reset() on
  InvalidQueryException before propagating the failure
- refreshNodeListAndTokenMap: ExecutionException catch — added reset() when
  cause is InvalidQueryException
- refreshNodeInfo: ExecutionException catch — same
@nikagra nikagra force-pushed the driver-368-optimize-system-local-query branch from 927802f to 7894323 Compare May 5, 2026 18:14
…eersV2

Add hookLocal(), hookPeers(), and hookPeersV2() to SystemColumnProjection.
Each method attaches a FutureCallback to the given DefaultResultSetFuture
that populates the corresponding column cache on success and resets all
caches on InvalidQueryException failure, then returns the future unchanged.

In ControlConnection:
- selectPeersFuture(): drop the SettableFuture wrapper in the system.peers
  branch; replace with a single projection.hookPeers(rawFuture) call.
- fetchNodeInfo(): attach the appropriate hook before future.get() instead
  of explicitly calling populate*() inside the if (row != null) guard.
  The cache is populated as a side-effect of future completion regardless
  of whether rows are present, which is correct since ColumnDefinitions
  are returned by the server independent of row count.
@nikagra nikagra force-pushed the driver-368-optimize-system-local-query branch from 4f790df to b4cf4c8 Compare May 6, 2026 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants