Support for getkeysandflags by vaibhavyadav-dev · Pull Request #1777 · dicedb/dicedb

vaibhavyadav-dev · 2025-06-22T07:01:02Z

This PR implements only COMMAND GETKEYSANDFLAGS, with tests.

Changes:

Added a handler for COMMAND GETKEYSANDFLAGS
Added tests to validate correct key/flag mappings
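
For readers unfamiliar with the command, the sketch below illustrates the general idea of mapping a command line to its keys and per-key flags. It is not the handler from this PR: the `key_spec` table, the flag bits, and the function `get_keys_and_flags` are all hypothetical names, the flag values are illustrative only, and a real server would derive them from its command metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    KEY_RO     = 1 << 0, /* key is only read */
    KEY_RW     = 1 << 1, /* key is read and written */
    KEY_ACCESS = 1 << 2, /* the value is returned or copied */
    KEY_UPDATE = 1 << 3  /* the value is modified */
};

typedef struct {
    const char *name; /* command name, upper case */
    int first_key;    /* argv index of the single key */
    int flags;        /* flags reported for that key */
} key_spec;

/* Hypothetical table; entries and flags are assumptions, not real metadata. */
static const key_spec SPECS[] = {
    {"GET", 1, KEY_RO | KEY_ACCESS},
    {"SET", 1, KEY_RW | KEY_UPDATE},
};

typedef struct {
    const char *key;
    int flags;
} key_and_flags;

/* Returns the number of keys found (0 or 1 in this sketch),
 * or -1 when the command is not in the table. */
int get_keys_and_flags(int argc, const char **argv, key_and_flags *out) {
    assert(argv != NULL && out != NULL);
    if (argc < 1) return -1;
    for (size_t i = 0; i < sizeof(SPECS) / sizeof(SPECS[0]); i++) {
        if (strcmp(argv[0], SPECS[i].name) != 0) continue;
        if (SPECS[i].first_key >= argc) return 0; /* key argument missing */
        out->key = argv[SPECS[i].first_key];
        out->flags = SPECS[i].flags;
        return 1;
    }
    return -1;
}
```

Calling `get_keys_and_flags` with the arguments `SET mykey value` fills `out` with the key `mykey` and the flags stored for `SET` in the table.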

closes #642

CLAassistant · 2025-06-22T07:01:08Z

All committers have signed the CLA.

This is somewhat related to #974 and #1777. When the epoch changes, we should save the configuration file and broadcast a PONG as soon as possible. For example, if a primary goes down after bumping the epoch, its replicas may initiate a failover, but the other primaries may refuse to vote because the replica's epoch has not been updated. Or, if we bump the epoch for some other reason and the new epoch is not propagated through the cluster in time, it may affect the judgment of message staleness.

These broadcasts are expensive in large clusters, but none of these cases seems to occur at high frequency, so it should be fine.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
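
A minimal sketch of the behaviour described in this commit message, assuming a deferred "to do" bitmask that a later event-loop step would act on. The names `cluster_state`, `TODO_SAVE_CONFIG`, `TODO_BROADCAST_PONG`, and `bump_epoch` are hypothetical and are not taken from the actual source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deferred-work flags. */
enum {
    TODO_SAVE_CONFIG    = 1 << 0, /* persist the configuration file */
    TODO_BROADCAST_PONG = 1 << 1  /* tell other nodes about the new epoch */
};

typedef struct {
    uint64_t current_epoch;
    int todo; /* bitmask of pending actions */
} cluster_state;

/* Raise the epoch and schedule both follow-up actions together.
 * Returns false (and schedules nothing) if new_epoch is not newer. */
bool bump_epoch(cluster_state *cs, uint64_t new_epoch) {
    assert(cs != NULL);
    if (new_epoch <= cs->current_epoch) return false;
    cs->current_epoch = new_epoch;
    cs->todo |= TODO_SAVE_CONFIG | TODO_BROADCAST_PONG;
    return true;
}
```

Scheduling both actions in the same place is the point of the sketch: an epoch bump never leaves the node with a newer epoch that is neither saved nor announced.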

There is a failure in the daily run:

```
=== ASSERTION FAILED ===
==> cluster_legacy.c:6588 'primary->replicaof == ((void *)0)' is not true
```

These are the logs:

```
- i am fd4318562665b4490ccc86e7f7988017cf960371 and myself become a replica,
- 63c0167232dae95cdcc0a1568cd5368ac3b99f5 is the new primary
27867:M 24 Feb 2025 00:19:11.011 * Failover auth granted to 763c0167232dae95cdcc0a1568cd5368ac3b99f5 () for epoch 9
27867:M 24 Feb 2025 00:19:11.039 * Configuration change detected. Reconfiguring myself as a replica of node 763c0167232dae95cdcc0a1568cd5368ac3b99f5 () in shard c5f6b2a9c74cabd4d1e54d1130dc9cb9419bf76f
27867:S 24 Feb 2025 00:19:11.039 * Before turning into a replica, using my own primary parameters to synthesize a cached primary: I may be able to synchronize with the new primary with just a partial transfer.
27867:S 24 Feb 2025 00:19:11.039 * Connecting to PRIMARY 127.0.0.1:23654
27867:S 24 Feb 2025 00:19:11.039 * PRIMARY <-> REPLICA sync started

- in here myself got an stale message, but we still process the packet and cause this issue
27867:S 24 Feb 2025 00:19:11.040 * Ignore stale message from 763c0167232dae95cdcc0a1568cd5368ac3b99f5 () in shard c5f6b2a9c74cabd4d1e54d1130dc9cb9419bf76f; gossip config epoch: 8, current config epoch: 9
27867:S 24 Feb 2025 00:19:11.040 * Node 763c0167232dae95cdcc0a1568cd5368ac3b99f5 () is now a replica of node fd4318562665b4490ccc86e7f7988017cf960371 () in shard c5f6b2a9c74cabd4d1e54d1130dc9cb9419bf76f
```

We can see that myself received a stale message but still processed it, which changed the role and created a primary-replica chain loop.

The reason is as follows; this text is copied from valkey-io/valkey#651.

In some rare cases, slot config updates (via either PING/PONG or UPDATE) can be delivered out of order, as illustrated below:

```
1. To keep the discussion simple, let's assume we have 2 shards, shard a
   and shard b. Let's also assume there are two slots in total with shard
   a owning slot 1 and shard b owning slot 2.
2. Shard a has two nodes: primary A and replica A*; shard b has primary
   B and replica B*.
3. A manual failover was initiated on A* and A* just wins the election.
4. A* announces to the world that it now owns slot 1 using PING messages.
   These PING messages are queued in the outgoing buffer to every other
   node in the cluster, namely, A, B, and B*.
5. Keep in mind that there is no ordering in the delivery of these PING
   messages. For the stale PING message to appear, we need the following
   events in the exact order as they are laid out.
   a. An old PING message before A* becomes the new primary is still queued
      in A*'s outgoing buffer to A. This later becomes the stale message,
      which says A* is a replica of A. It is followed by A*'s election
      winning announcement PING message.
   b. B or B* processes A*'s election winning announcement PING message
      and sets slots[1]=A*.
   c. A sends a PING message to B (or B*). Since A hasn't learnt that A*
      wins the election, it claims that it owns slot 1 but with a lower
      epoch than B has on slot 1. This leads to B sending an UPDATE to
      A directly saying A* is the new owner of slot 1 with a higher epoch.
   d. A receives the UPDATE from B and executes clusterUpdateSlotsConfigWith.
      A now realizes that it is a replica of A* hence setting myself->replicaof
      to A*.
   e. Finally, the pre-failover PING message queued up in A*'s outgoing
      buffer to A is delivered and processed, out of order though, to A.
   f. This stale PING message creates the replication loop
```

Closes #1015.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
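
The scenario above can be sketched as a guard that refuses to apply role changes carried by a stale message. This is only an illustration under stated assumptions: the `node` struct and the functions `message_is_stale` and `apply_role_claim` are hypothetical, and staleness is reduced here to a single comparison between the config epoch in the message and the newest config epoch already recorded for the sender, mirroring the "gossip config epoch: 8, current config epoch: 9" log line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    uint64_t config_epoch;  /* newest config epoch recorded for this node */
    struct node *replicaof; /* NULL when the node is a primary */
} node;

/* A message is treated as stale when it carries an older config epoch
 * than the one already recorded for its sender. */
bool message_is_stale(const node *sender, uint64_t msg_config_epoch) {
    assert(sender != NULL);
    return msg_config_epoch < sender->config_epoch;
}

/* Apply a "sender is a replica of claimed_primary" claim only if the
 * message is fresh. Returns true when the claim was applied. */
bool apply_role_claim(node *sender, node *claimed_primary,
                      uint64_t msg_config_epoch) {
    if (message_is_stale(sender, msg_config_epoch)) {
        return false; /* skip the role change entirely */
    }
    sender->config_epoch = msg_config_epoch;
    sender->replicaof = claimed_primary;
    return true;
}
```

With A already recorded as a replica of A* at epoch 9, a delayed epoch-8 PING claiming that A* is a replica of A is rejected by `apply_role_claim`, so A* stays a primary and no A -> A* -> A loop is formed.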

vaibhavyadav-dev added 2 commits June 22, 2025 12:19

Add support for COMMAND GETKEYSANDFLAGS (dicedb#642)

7b69ea2

Added test for COMMAND GETKEYSANDFLAGS (dicedb#642)

14d1738

arpitbbhayani deleted the branch dicedb:master February 9, 2026 19:21

arpitbbhayani closed this Feb 9, 2026

vaibhavyadav-dev wants to merge 2 commits into dicedb:master from vaibhavyadav-dev:support-for-GETKEYSANDFLAGS
