
handle create repl_dev failure #807

@Besroy

Description


Creating a replication device (repl dev) can fail and leave orphaned (garbage) repl devs on some members. For example:

  • The leader adds member F2, but the add-member operation times out.
  • The leader therefore assumes F2 is not in the group, but F2 actually joins successfully.
  • When the group is later destroyed, F2 is not included in the cleanup, leaving an orphaned group on F2.

Related Logs

Leader:

  • Sent join request to F2:
    [09/20/25 09:01:12.178923] [I] [76] [handle_join_leave.cxx:149:invite_srv_to_join_cluster] sent join request to peer 984278082, 98f2b032-095d-4d78-a069-d1e21f95603d [group=e3be8382-63f2-4edd-85ed-8851e8e65641]
    
  • Timeout occurred:
    [09/20/25 09:01:14.179209] [I] [61] [raft_server.cxx:1639:handle_ext_resp_err] receive an rpc error response from peer server, Deadline Exceeded 12 [group=e3be8382-63f2-4edd-85ed-8851e8e65641]
    

Follower (F2):

  • Received the join request after the leader timed out:
    [09/20/25 09:01:14.390810] [I] [65] [handle_join_leave.cxx:188:handle_join_cluster_req] got join cluster req from leader 2114978300 [group=e3be8382-63f2-4edd-85ed-8851e8e65641]
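Lining up the logged timestamps shows how narrow the window was. The leader's RPC deadline appears to be about 2 s, and F2 received the join request roughly 212 ms after the leader had already given up. These timestamps come from two different hosts, so clock skew between them could shift the second figure slightly:

```latex
\begin{aligned}
t_{\text{timeout}} - t_{\text{send}} &= 14.179209 - 12.178923 = 2.000286\ \text{s} \\
t_{\text{recv}} - t_{\text{timeout}} &= 14.390810 - 14.179209 = 0.211601\ \text{s}
\end{aligned}
```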
    

For more context, see the related discussion in GitHub PR #136.
