TODO before merging: alter the commit message of cdc2d1a
return Map.of(
    ApplicationState.values()[7], VersionedValue.unsafeMakeVersionedValue(vv.value.split(":")[0], vv.version),
    ApplicationState.values()[17], VersionedValue.unsafeMakeVersionedValue(vv.value.split(":")[1], vv.version));
values = vv.value.split(":");
This deserves a comment imo, explaining the relationship between DSE, the underlying protocol versions, the states, the change from IP to IP:Port, etc. I would also extract it into a function named isPortPresent or something like that.
I do agree it could be documented better.
Testing upgrades here: http://10.169.74.112:8081/job/ds-cassandra-build/2125/
boolean known = MessagingService.instance().versions.knows(peer);
logger.debug("Peer {} is known with version {}", peer, known ? MessagingService.instance().versions.getRaw(peer) : "null");
// DSE 6.8/6.9 advertises itself with value higher than VERSION_40, thus we need to compare it with VERSION_DSE_68
boolean detectedDse = known && MessagingService.instance().versions.getRaw(peer) >= MessagingService.VERSION_DSE_68;
(question)
Why do we need the added check on detectedDse?
When would a peer that is not DSE 6.x have entered this condition?
This type of commit is painful for rebases. I would normally ask how the patch could be improved to make rebases less likely to conflict, and to what git commit in BUT, IIUC this will not be committed to
- Add logging to debug the port selection for DSE -> HCD upgrade
- Add logging to debug the messaging version selection during handshake
- Logging in outbound connection at handshake success
- Add the node address to logs
- Log the protocol proposed by DSE peer
- Log the value gossip fails on
- Log the loaded/saved gossip values
- Logs for Initiate, Accept and ConfirmOutboundPre40
- Logs for InboundConnectionInitiator
Adds tests validating the fix for ArrayIndexOutOfBoundsException that occurred when deserializing gossip state from pre-4.0 nodes containing address/port values without expected delimiters.

The bug manifested when filterOutgoingState() attempted to split values like "10.0.0.1" or "NORMAL" and blindly access array indices [1] or [0] that didn't exist, causing crashes during gossip message processing in mixed-version clusters.

The fix adds length checks after splitting to gracefully handle:
- IP addresses without ports (e.g., "10.0.0.1" vs "10.0.0.1:7000")
- Status values without tokens (e.g., "NORMAL" vs "NORMAL,10.0.0.1:7000")
…ivity check

DSE 6.8 and later versions don't support PING_REQ messages, similar to DSE 6.x and Cassandra 3.x. This change extends the existing logic to detect and skip PING requests for DSE 6.8+ peers during the startup cluster connectivity check phase.
When updating keyspace schemas, UDTs from the previous schema are now preserved if they don't exist in the new schema. This prevents issues where inherited tables depend on types that would otherwise be lost during schema transformations.
Rebase preparing for the merge
❌ Build ds-cassandra-pr-gate/PR-2240 rejected by Butler
3 regressions found
Found 3 new test failures
Found 7 known test failures



This PR addresses critical compatibility issues discovered during DSE to HCD (Hyper-Converged Database) upgrade scenarios. The changes focus on ensuring smooth interoperability between DSE 6.8+ nodes and HCD nodes during mixed-version cluster operations.
What Problems Do These Changes Fix?
1. Enhanced Debugging Capabilities for Upgrade Scenarios
Problem: When upgrading from DSE to HCD, troubleshooting connection and gossip issues was difficult due to insufficient logging.
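Solution: Added comprehensive logging throughout the handshake and gossip processes, including port selection, messaging version selection during the handshake, the protocol proposed by the DSE peer, and the gossip values that are loaded, saved, or rejected.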
Why it matters: This gives operators visibility into what's happening during the upgrade process, making it much easier to diagnose and resolve issues in production environments.
2. Gossip Deserialization Crash Prevention
Problem: Nodes would crash with ArrayIndexOutOfBoundsException when processing gossip messages from pre-4.0 nodes. This happened because the code assumed certain delimiters would always be present in address and status values, but older nodes sent data in different formats. Both INTERNAL_ADDRESS_AND_PORT and NATIVE_ADDRESS_AND_PORT could arrive without port delimiters (e.g., "10.0.0.1" instead of "10.0.0.1:7000"), and STATUS_WITH_PORT could arrive without additional port information.
Solution: Added defensive length checks after splitting gossip values to gracefully handle cases where expected delimiters are missing, such as IP addresses without ports and status values without tokens.
Why it matters: Prevents cluster instability and crashes during mixed-version operations.
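The length check described in this section can be sketched roughly as follows. This is an illustration, not the code in the PR: the class `GossipAddressSketch` and its methods are hypothetical names (`isPortPresent` borrows the name suggested in review), and the sketch assumes IPv4-style values where ':' appears only as the port delimiter.

```java
import java.util.Optional;

// Illustrative sketch only: class and method names are hypothetical, not the PR's code.
final class GossipAddressSketch {
    private GossipAddressSketch() {}

    // True when splitting on ':' yields a second element, i.e. "host:port".
    static boolean isPortPresent(String value) {
        return value.split(":").length > 1;
    }

    // Host part, or the empty string if the split produced nothing (e.g. value ":").
    static String hostOf(String value) {
        String[] parts = value.split(":");
        return parts.length > 0 ? parts[0] : "";
    }

    // Port part when present; empty for pre-4.0 style values such as "10.0.0.1".
    static Optional<String> portOf(String value) {
        String[] parts = value.split(":");
        return parts.length > 1 ? Optional.of(parts[1]) : Optional.empty();
    }
}
```

Checking the array length before indexing is what avoids the ArrayIndexOutOfBoundsException; note that String.split drops trailing empty strings, so a value like ":" yields an empty array and even index 0 needs a guard.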
3. DSE 6.8+ PING Request Compatibility
Problem: During startup connectivity checks, HCD nodes were attempting to send PING requests to DSE 6.8+ peers, but these versions don't support PING_REQ messages (similar to DSE 6.x and Cassandra 3.x). This resulted in noisy error logs during cluster startup.
Solution: Extended the existing version detection logic to recognize and skip PING requests for DSE 6.8+ peers during the startup connectivity check phase.
Why it matters: Eliminates unnecessary error logs when HCD nodes join or restart in a mixed DSE/HCD environment.
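A minimal sketch of the version check described in this section, under stated assumptions: `PingSupportSketch` and `shouldSendPing` are hypothetical names, the two constants are placeholder values standing in for the real ones in MessagingService, and treating an unknown peer as ping-capable is an assumption of this sketch rather than something the PR states.

```java
// Illustrative sketch only: names and constant values are placeholders, not the PR's code.
final class PingSupportSketch {
    private PingSupportSketch() {}

    // Placeholder values for illustration; the real constants live in MessagingService.
    static final int VERSION_40 = 12;
    static final int VERSION_DSE_68 = 1000;

    // rawPeerVersion is null when the peer's messaging version is not yet known.
    static boolean shouldSendPing(Integer rawPeerVersion) {
        if (rawPeerVersion == null)
            return true; // assumption: unknown peers keep the default behaviour

        boolean pre40 = rawPeerVersion < VERSION_40;
        // DSE 6.8/6.9 advertises a value higher than VERSION_40, so it needs its own bound.
        boolean detectedDse = rawPeerVersion >= VERSION_DSE_68;
        return !pre40 && !detectedDse;
    }
}
```

A plain "version >= VERSION_40" test would classify DSE 6.8+ peers as ping-capable, which is why the upper bound is compared separately.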
4. User-Defined Type Preservation During Schema Migration
Problem: When keyspace schemas were updated during migration, User-Defined Types (UDTs) from the previous schema could be lost if they weren't explicitly present in the new schema. This caused failures when inherited tables depended on these types.
Solution: Modified the schema transformation logic to preserve UDTs from the previous schema when they don't exist in the new schema, ensuring dependent tables continue to function correctly.
Why it matters: Prevents data model corruption and application failures during schema migrations, particularly important when dealing with complex schemas that use inheritance and UDTs.
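The preservation rule in this section amounts to a merge in which the new schema wins and the previous schema only fills gaps. The sketch below models types as a simple name-to-definition map; `UdtMergeSketch` and `preserveMissingTypes` are hypothetical names, and the real schema classes are not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: UDTs are modelled as name -> definition strings.
final class UdtMergeSketch {
    private UdtMergeSketch() {}

    // Returns the updated types plus any previous types the update does not define.
    static Map<String, String> preserveMissingTypes(Map<String, String> previous,
                                                    Map<String, String> updated) {
        Map<String, String> merged = new LinkedHashMap<>(updated);
        // putIfAbsent keeps the updated definition when both schemas define a type.
        previous.forEach(merged::putIfAbsent);
        return merged;
    }
}
```

Neither input map is modified, so the caller can still compare the previous and updated schemas after the merge.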
Overall Impact
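These changes collectively improve the robustness and reliability of DSE to HCD upgrades by improving upgrade diagnostics, preventing gossip deserialization crashes, skipping unsupported PING requests, and preserving UDTs during schema migration.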
Latest test run: http://10.169.74.112:8081/job/ds-cassandra-build/2117/
Baseline: http://10.169.74.112:8081/job/ds-cassandra-build/10/