Skip to content

[wip] db/state: move refcnt to visibleFiles object#21397

Open
AskAlexSharov wants to merge 14 commits into
mainfrom
alex/think_on_begin_35
Open

[wip] db/state: move refcnt to visibleFiles object#21397
AskAlexSharov wants to merge 14 commits into
mainfrom
alex/think_on_begin_35

Conversation

@AskAlexSharov
Copy link
Copy Markdown
Collaborator

Context

Aggregator.BeginFilesRo() was made lock-free in #20462/#20490, but physical
file deletion stayed gated by two per-FilesItem atomics (refcount + canDelete).
Two atomics guarding one destructive action (closeFilesAndRemove) is the TOCTOU
double-free behind #21384, and a per-file refcount taken after the snapshot
pointer is loaded can't protect the load→pin window by itself.

What this does

Replaces per-file refcount/canDelete with MVCC reclamation gated by a refcount
on the published bundle (aggregatorVisible) — the MDBX freelist model (a page
freed at txnid T is reclaimable once the oldest live reader's txnid > T),
realized in Go by reference-counting the generation object instead of each file.

  • Published bundles form an oldest→newest chain; a reader pins exactly one via
    refcnt. refcnt only grows while a bundle is current, only shrinks once superseded.
  • BeginFilesRo does validate-after-pin (one atomic add + re-check), closing the
    load→pin window. One add instead of dozens of per-file increments.
  • Files removed from dirtyFiles by a merge/prune are attached to the outgoing
    generation's retired set and physically deleted only once that generation
    (and every older one) drains — reclaimed oldest-first, single owner of
    closeFilesAndRemove, no per-file flag, no double-free.
  • DebugBeginDirtyFilesRo (BuildMissedAccessors) pins the generation the same way,
    so its captured dirty files — including unindexed ones absent from the visible
    set — are protected for the duration of the accessor build.

FilesItem.refcount/canDelete are now used only by the forkable subsystem
(out of scope here).

Design + file lifecycle: docs/plans/20260525-lockfree-file-reclamation-spec.md.

Status

WIP. Validated locally: db/state/... under -race (no data races), make lint,
make erigon integration.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors db/state snapshot-file deletion to avoid TOCTOU/double-free hazards by moving from per-FilesItem refcount/canDelete gating to generation-based (bundle) refcounting with oldest-first reclamation of retired files.

Changes:

  • Add generation-chained aggregatorVisible bundles with a single refcnt pin per reader, plus a retired set reclaimed when the oldest drained generation advances.
  • Update merge/cleanup paths to “retire” files (remove from dirtyFiles) and defer physical deletion to the bundle reclaimer.
  • Update debug pinning and GC tests to validate deferred deletion behavior under both normal readers and debug accessor-building readers.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
docs/plans/20260525-lockfree-file-reclamation-spec.md New design spec describing generation-based reclamation and invariants.
db/state/aggregator.go Implements bundle refcnt pinning, generation chain, retired-file reclamation, and updated merge cleanup plumbing.
db/state/dirty_files.go Replaces deleteMergeFile with retireMergeFiles and removes per-visible-file refcount increment/decrement helpers.
db/state/merge.go Threads “retired files” up from domain/history/II cleanup so aggregator can publish+defer deletion.
db/state/domain.go Removes per-file refcount pin/unpin in domain RO tx lifecycle (now covered by bundle pin).
db/state/history.go Removes per-file refcount pin/unpin in history RO tx lifecycle (now covered by bundle pin).
db/state/inverted_index.go Removes per-file refcount pin/unpin in II RO tx lifecycle (now covered by bundle pin).
db/state/aggregator_debug.go Debug dirty-files RO tx now pins the current bundle generation instead of per-file refcounting.
db/state/gc_test.go New tests verifying deferred deletion semantics for normal readers, debug pins, and concurrent reclaim.
db/state/aggregator_debug_test.go Updates test to assert debug Close is not a deleter (reclaimer is sole deleter).
db/state/squeeze.go Wraps recalcVisibleFiles(nil) with dirtyFilesLock where needed and updates signature calls.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/state/aggregator_debug_test.go
Comment thread db/state/gc_test.go Outdated
TestDirtyFilesRoTx_CloseIsNotADeleter leaked the seg.Decompressor FD, so on
Windows t.TempDir cleanup could not unlink the still-open file. Close it via
defer (the Close() under test must not delete the file, only release the FD).
Also refresh a stale "via FilesItem.refcount" comment — the debug RoTx now pins
the bundle generation.
@AskAlexSharov AskAlexSharov changed the title [wip] db/state: bundle-refcount snapshot file reclamation [wip] db/state: move refcnt to visibleFiles object May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants