peak_ewma: fix segfault from timer thread-safety violation #43526

Open
rroblak wants to merge 1 commit into envoyproxy:main from rroblak:rroblak/fix-peak-ewma-segfault

@rroblak
Contributor

@rroblak rroblak commented Feb 17, 2026

Commit Message: peak_ewma: fix segfault from timer thread-safety violation (#43513)
Additional Description:
Hopefully fixes #43513. The Peak EWMA LB constructor took an Event::Dispatcher& and called createTimer() on it. When the load balancer was instantiated on worker threads (via dynamic config such as Istio EnvoyFilter or Envoy Gateway EnvoyPatchPolicy), this violated Envoy's thread-safety model, which requires timers to be created on the dispatcher's owning thread. The result was an isThreadSafe() assertion failure in debug builds or a segfault in release builds.
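
The ownership rule described above can be illustrated with a minimal sketch. `ToyDispatcher`, `tryCreateTimer()`, and `createTimerFromOtherThread()` are hypothetical names invented for this example and are not Envoy's Event::Dispatcher API. The sketch reports the violation as a boolean instead of asserting or crashing, so both outcomes can be observed.

```cpp
#include <thread>

// Hypothetical stand-in for an event dispatcher; NOT Envoy's Event::Dispatcher.
class ToyDispatcher {
public:
  // The constructing thread becomes the owning thread.
  ToyDispatcher() : owner_(std::this_thread::get_id()) {}

  // True only when called from the owning thread.
  bool isThreadSafe() const { return std::this_thread::get_id() == owner_; }

  // Assumption for illustration: report the violation instead of asserting,
  // so the caller can see which calls would have been rejected.
  bool tryCreateTimer() const { return isThreadSafe(); }

private:
  const std::thread::id owner_;
};

// Calls tryCreateTimer() from a freshly spawned thread and returns the result.
bool createTimerFromOtherThread(const ToyDispatcher& dispatcher) {
  bool ok = true;
  std::thread worker([&ok, &dispatcher] { ok = dispatcher.tryCreateTimer(); });
  worker.join();
  return ok;
}
```

In this toy model, a call from the constructing thread succeeds and a call from any other thread is rejected, which mirrors the debug-build assertion described above.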

This PR:

  • Replaces timer-based aggregation with inline aggregation in chooseHost(), removing the Event::Dispatcher& dependency entirely
  • Removes the destructor that cleared host lbPolicyData (raced with workers still reading)
  • Cleans up all_host_stats_ entries on host removal (shared_ptr leak)
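
A rough, single-threaded sketch of the first and third bullets, under stated assumptions: RTT samples are buffered per host and folded into the EWMA inside chooseHost() rather than by a timer, and a host's stats entry is erased when the host is removed. `ToyPeakEwmaLb`, `HostStats`, the fixed `alpha` smoothing factor, and the simplified "peak" rule are all hypothetical and do not reflect the actual code in this PR.

```cpp
#include <cstddef>
#include <limits>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-host state; names are illustrative, not Envoy's.
struct HostStats {
  double ewma_ms = 0.0;             // smoothed latency estimate
  std::vector<double> pending_rtts; // samples recorded since the last pick
  int active_requests = 0;
};

class ToyPeakEwmaLb {
public:
  explicit ToyPeakEwmaLb(double alpha) : alpha_(alpha) {}

  void addHost(const std::string& name) { stats_[name] = std::make_shared<HostStats>(); }

  // Erasing the entry on removal releases the shared_ptr instead of leaking it.
  void removeHost(const std::string& name) { stats_.erase(name); }

  void recordRtt(const std::string& name, double rtt_ms) {
    auto it = stats_.find(name);
    if (it != stats_.end()) {
      it->second->pending_rtts.push_back(rtt_ms);
    }
  }

  // Folds pending samples into each EWMA on the calling thread (no timer),
  // then returns the lowest-cost host. Returns "" when there are no hosts.
  std::string chooseHost() {
    std::string best;
    double best_cost = std::numeric_limits<double>::infinity();
    for (auto& entry : stats_) {
      HostStats& s = *entry.second;
      for (double rtt : s.pending_rtts) {
        // Simplified "peak" rule: jump up to a slow sample, decay down otherwise.
        s.ewma_ms = rtt > s.ewma_ms ? rtt : alpha_ * rtt + (1.0 - alpha_) * s.ewma_ms;
      }
      s.pending_rtts.clear();
      const double cost = s.ewma_ms * (s.active_requests + 1);
      if (cost < best_cost) {
        best_cost = cost;
        best = entry.first;
      }
    }
    return best;
  }

  std::size_t trackedHosts() const { return stats_.size(); }

private:
  double alpha_;
  std::unordered_map<std::string, std::shared_ptr<HostStats>> stats_;
};
```

Because aggregation happens on whichever thread calls chooseHost(), this design needs no dispatcher reference at construction time. A real implementation would also have to make the sample buffers safe for concurrent writers, which this sketch does not attempt.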

Risk Level: Low — peak_ewma is a contrib extension; changes are isolated to its source and tests.
Testing: New peak_ewma_lb_host_lifecycle_test.cc with regression tests for all 3 bugs. All existing peak_ewma tests pass.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

@frittentheke
Contributor

Thanks @rroblak for writing this fix! CI / build is currently failing.
Could you take another peek? Once it builds, I'll gladly smoke-test it to see whether it now runs crash-free in the cases I've tested so far.

@rroblak rroblak force-pushed the rroblak/fix-peak-ewma-segfault branch from 2831f68 to 5f07411 Compare February 18, 2026 15:07
@rroblak
Contributor Author

rroblak commented Feb 18, 2026

Thanks @frittentheke! I suspect the CI failure was due to an unrelated Go dependency resolution issue that was fixed by #43536. I've rebased and pushed, so let's see if that clears it up.

…y#43513)

- Remove dispatcher/timer dependency; aggregate inline in chooseHost()
- Remove destructor that cleared host lbPolicyData (race with workers)
- Clean up all_host_stats_ on host removal (fix shared_ptr leak)
- Remove dispatcher from config factory
- Add host lifecycle regression tests for all three bugs

Signed-off-by: Ryan Oblak <rroblak@gmail.com>
@rroblak rroblak force-pushed the rroblak/fix-peak-ewma-segfault branch from 5f07411 to a9c0016 Compare February 18, 2026 16:19
@rroblak
Contributor Author

rroblak commented Feb 18, 2026

@frittentheke CI is green! Ready for your smoke test whenever you get a chance.

Development

Successfully merging this pull request may close these issues.

Peak EWMA load balancer (contrib) randomly segfaults -- (Envoy via Istio-Proxy 1.29.0)