Clean up stale resources when BR destroys a PG

SH tests found two issues. When a member is resynced by BR, all PG resources (blob allocation, logs, status, ...) are expected to be reset or recreated. However, pg_destroy misses some resources, and the stale state left behind can cause corrupted blobs or system crashes. We need to:
  1. Clean up all currently identified stale resources;
  2. Discuss how to ensure future resources are reliably cleaned up when BR destroys a PG, since there might be more resources added in the future.
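As one possible input to the discussion in item 2, a minimal sketch of a cleanup-hook registry is shown here. Everything in it is hypothetical (`PgCleanupRegistry`, `register_hook`, `run_all`, and the `pg_id` type are illustration-only names, not existing HomeObject or HomeStore APIs). The idea is that each component that owns per-PG state registers a cleanup hook when that state is introduced, so the BR destroy path runs every hook instead of relying on a hand-maintained list inside pg_destroy.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry of per-PG cleanup hooks (not an existing API).
class PgCleanupRegistry {
public:
    using Hook = std::function<void(std::uint16_t /*pg_id*/)>;

    // Each owner of per-PG state registers one named hook.
    void register_hook(std::string name, Hook hook) {
        assert(hook && "cleanup hook must be callable");
        hooks_.emplace_back(std::move(name), std::move(hook));
    }

    // Run every hook for the given PG in reverse registration order
    // (teardown mirrors setup) and return the names that ran, which a
    // caller could log or check in a test.
    std::vector<std::string> run_all(std::uint16_t pg_id) const {
        std::vector<std::string> ran;
        for (auto it = hooks_.rbegin(); it != hooks_.rend(); ++it) {
            it->second(pg_id);
            ran.push_back(it->first);
        }
        return ran;
    }

private:
    std::vector<std::pair<std::string, Hook>> hooks_;
};
```

With this shape, adding a new per-PG resource would mean adding one `register_hook` call next to it, and a test could assert that the set of names returned by `run_all` covers every known resource.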

Currently there are two resources that need cleanup during BR PG destroy, related to 2 issues:
1. Stale rreqs ([SH issue#91](https://docs.google.com/document/d/16DMqv9J-JuNs5c25IBuX01QXe5WbxPm4B4hAfz4JNZM/edit?tab=t.0#heading=h.zf3q4kladuic))
   The chunk block index is reset by BR, but the associated rreqs in m_repl_key_req_map are not cleared. After BR completes, when a log entry is appended from Raft, the stale in-memory rreq is reused, which leads to an incorrect block.
2. no_space_left_error_info ([SH issue#95](https://docs.google.com/document/d/16DMqv9J-JuNs5c25IBuX01QXe5WbxPm4B4hAfz4JNZM/edit?tab=t.0#heading=h.21ec6x1qhz28))
   EGC handle_no_space_left waits for commits up to LSN1; BR then occurs and advances commit_lsn to LSN2 (LSN2 > LSN1). Because the stale no_space_left_error_info is not reset and its LSN is less than commit_lsn, this hits the assert in [notify_committed_lsn](https://github.com/eBay/HomeObject/blob/f72bec0770f0358056419e026dc65f7ca8ad40d8/src/lib/homestore_backend/replication_state_machine.cpp#L53-L57), which is called by flush_durable_commit_lsn.
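The two cases above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `PgReplState`, `Rreq`, `invariant_holds`, and `reset_for_baseline_resync` are hypothetical names, the map only mimics the role of m_repl_key_req_map, and the real key and value types in HomeStore/HomeObject differ. The invariant check mirrors the condition described for the assert (a pending no_space_left entry must not have an LSN below commit_lsn).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical stand-in for an in-memory replication request.
struct Rreq {
    std::uint64_t blk_id;  // block the request was allocated against
};

// Hypothetical model of the per-PG in-memory state discussed above.
struct PgReplState {
    // Plays the role of m_repl_key_req_map (key type simplified).
    std::unordered_map<std::uint64_t, Rreq> repl_key_req_map;
    // Plays the role of no_space_left_error_info: the LSN being waited on.
    std::optional<std::int64_t> no_space_left_wait_lsn;
    std::int64_t commit_lsn = 0;

    // Models the asserted condition: a pending wait LSN must not be
    // smaller than the current commit LSN.
    bool invariant_holds() const {
        return !no_space_left_wait_lsn.has_value() ||
               *no_space_left_wait_lsn >= commit_lsn;
    }

    // What the BR destroy path would need to do in this model: drop all
    // stale rreqs and the pending no_space_left entry before the commit
    // LSN jumps forward.
    void reset_for_baseline_resync(std::int64_t new_commit_lsn) {
        repl_key_req_map.clear();
        no_space_left_wait_lsn.reset();
        commit_lsn = new_commit_lsn;
        assert(invariant_holds());
    }
};
```

In this model, advancing `commit_lsn` past a pending wait LSN without calling `reset_for_baseline_resync` makes `invariant_holds()` return false, which corresponds to the assert failure in issue#95; leaving `repl_key_req_map` populated corresponds to the stale rreq reuse in issue#91.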

Clean up stale resources when BR destroy PG #394