performance: cherry-pick 5 improvements to main #21374

Open
AskAlexSharov wants to merge 6 commits into main from alex/cp_performance_5prs_35

Cherry-pick of 5 performance improvements from the performance branch to main:

  1. warmuper: cancelable worker (#20941)
  2. db/state, cmd/integration: 4x larger commitment rebuild shard, squeeze flag transparent (#21147)
  3. db/rawdb: increase ChangeSets3 prune loop stride 100→1000, move log inside stride check (#21204, "increase ChangeSets3 prune limit at chain tip")
  4. db/seg: increase bufio pool size from 256KB to 512KB
  5. db/kv/prune: remove dead limit parameter from TableScanningPrune (accepted but never forwarded internally)

AskAlexSharov and others added 5 commits May 23, 2026 12:44
… squeeze flag transparent (#21147)

## Summary

Two tuning/transparency changes for commitment rebuild on `release/3.4`.
Subset of `awskii/r34-inc-shard-def-size` — the `minStepsForReferencing`
and `AggregatorSqueezeCommitmentValues` constant changes from that
branch are **intentionally excluded** here.

- **`db/state/squeeze.go`**: raise `shardStepsSize` cap from 16 → 64
steps during `RebuildCommitmentFiles`. Larger shards cut per-shard
overhead on long rebuilds.
- **`db/state/squeeze.go`**: stop forcing `ReplaceKeysInValues=true`
inside the rebuild's squeeze path. The post-rebuild squeeze now actually
honours the caller's `squeeze` flag — `if !squeeze { return }` (was `if
!squeeze && !statecfg.Schema.CommitmentDomain.ReplaceKeysInValues`), and
`ForTestReplaceKeysInValues(..., squeeze)` (was hardcoded `true`).
- **`cmd/integration/commands/flags.go`**: flip the `--squeeze` flag
default from `true` → `false` so the integration `commitment_rebuild`
command no longer squeezes by default.

Net effect: rebuild is faster (bigger shards) and squeeze is opt-in via
flag, not silently forced.
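
To make the shard-size effect concrete, here is a minimal sketch of how the shard count falls when the cap goes from 16 to 64 steps. It assumes shards are cut by plain ceiling division; `shardCount` and the 2048-step rebuild length are hypothetical names and numbers for illustration, not code from `db/state/squeeze.go`.

```go
package main

import "fmt"

// shardCount returns how many shards a rebuild over totalSteps steps needs
// when each shard covers at most maxShardSteps steps (ceiling division).
// The function and its parameters are hypothetical, not taken from squeeze.go.
func shardCount(totalSteps, maxShardSteps uint64) uint64 {
	if maxShardSteps == 0 {
		return 0
	}
	return (totalSteps + maxShardSteps - 1) / maxShardSteps
}

func main() {
	const totalSteps = 2048 // assumed rebuild length, for illustration only

	fmt.Println("cap 16:", shardCount(totalSteps, 16)) // 128 shards
	fmt.Println("cap 64:", shardCount(totalSteps, 64)) // 32 shards
}
```

Under that assumption a 4x larger cap means 4x fewer shards, so any fixed per-shard cost is paid a quarter as often.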
## Summary

On heavy-state chains (bloatnet), `ChangeSets3` was the dominant
chaindata growth source post-catch-up — file grew unboundedly because
prune couldn't keep up with the per-block changeset write rate.

**Root cause:** the `pruneDiffsLimitOnChainTip = 1000` cap in
`PruneExecutionStage` (active when `initialCycle=false`). On bloatnet:
- per-block changeset entries: ~1000–1500 (each ~5 KB serialized diff
chunks)
- per commit-cycle: ~40 blocks executed → ~40k–60k entries written
- per commit-cycle: ChangeSets3 prune drains at most 1000 (or until 2s
timeout) → drain rate is **roughly 1–2% of write rate**
- net: ChangeSets3 grows ~1–2 GB per minute under heavy load, pushing
chaindata file size up by tens of GB per hour
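
The drain-versus-write imbalance above can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted in the list (1000-entry cap, ~40 blocks per cycle, ~1000–1500 entries per block); `drainRatio` is a hypothetical helper, and it assumes one prune call per commit cycle.

```go
package main

import "fmt"

// drainRatio returns the fraction of the entries written in one commit cycle
// that a single capped prune call can remove, clamped to 1.
// Hypothetical helper: it models one prune call per cycle and ignores the timeout.
func drainRatio(pruneCap, blocksPerCycle, entriesPerBlock int) float64 {
	written := blocksPerCycle * entriesPerBlock
	if written == 0 {
		return 1
	}
	ratio := float64(pruneCap) / float64(written)
	if ratio > 1 {
		ratio = 1
	}
	return ratio
}

func main() {
	const pruneCap = 1000     // old chain-tip cap
	const blocksPerCycle = 40 // approximate figure from the description

	for _, perBlock := range []int{1000, 1500} {
		pct := 100 * drainRatio(pruneCap, blocksPerCycle, perBlock)
		fmt.Printf("%d entries/block: drains about %.1f%% of writes\n", perBlock, pct)
	}
}
```

With these inputs the ratio comes out at a couple of percent, so the backlog grows on almost every cycle.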

Observed on a 12-hour bloatnet run: ChangeSets3 stayed at 0 B during
catch-up (`initialCycle=true` overrides the cap to `math.MaxInt`), then
ballooned from 0 → 40 GB in the ~3 hours after the chain caught up. File
size grew 38 GB → 181 GB over the same window, with ~80% of the new
space attributable to ChangeSets3 + write amplification from a too-small
reclaim pool.

## Changes

1. **execution/stagedsync: bump ChangeSets3 chain-tip prune limit 1000 →
200000.**
The 2s timeout still bounds wall time; raising the cap removes the
artificial ceiling on how many entries one call drains. With a 200k cap
and ~5 KB per entry, a single `PruneExecutionStage` invocation can drain
up to ~1 GB of changesets within the 2s timeout, well above the
per-cycle write rate.

2. **db/rawdb: PruneTable: fold logEvery + ctx + timeout into one
mod-1000 check.**
Per-iteration `select`-on-`logEvery.C` added a channel check on every
row. Moved into the same mod-stride as ctx-done + timeout, and bumped
stride 100 → 1000. For 200k-row prunes this shaves the per-iter overhead
noticeably without affecting timeout responsiveness (1000 iters at
~microseconds each = under 10 ms granularity).

## Notes

- Catch-up path (`initialCycle=true`) is unaffected — the override there
already uses `math.MaxInt` / 1h.
- Mainnet's per-block changeset rate is much lower than bloatnet's, so
the old 1000 cap was rarely binding. The new 200k cap is just as benign
there (the 2s timeout caps actual work).
- The bump pairs with the prune-in-CommitCycle change (#21192) — that
gave us a second prune call per FCU iteration, but both paths shared the
1000 cap. Doubling calls doesn't help if each is throttled.

## Test plan

- [ ] CI on `performance`
- [ ] Mainnet sync still healthy (cap raise + stride change are
non-functional w.r.t. correctness; only affect drain throughput)