Avoid NOTE_ABSOLUTE timers for UPTIME/MONOTONIC on FreeBSD/OpenBSD. #943
Open
max-potapov wants to merge 1 commit into
### Motivation:

libdispatch programs every EVFILT_TIMER with NOTE_ABSOLUTE on every platform, computing an absolute deadline value via `_dispatch_time_now_cached` and passing it to kqueue with `NOTE_NSECONDS | NOTE_ABSOLUTE` (= `NOTE_ABSTIME` on BSD).

On FreeBSD/OpenBSD only the WALL clock works that way: kqueue interprets the `NOTE_NSECONDS|NOTE_ABSTIME` value as CLOCK_REALTIME nanoseconds, which matches what libdispatch computes for `DISPATCH_CLOCK_WALL`.

For UPTIME / MONOTONIC, libdispatch passes a CLOCK_MONOTONIC nanosecond value that BSD kqueue still interprets as realtime (decades in the past after boot). kqueue reports the timer as already-expired on every register, so the dispatch event loop fires the timer block immediately, re-arms the timer, fires it again, and so on. Observed on a long-running HummingbirdCore HTTP server using `Task.sleep` as ~48 000 kevent()/s with one CPU core pinned to ~100 % even with no real work pending.

PRs swiftlang#879 and swiftlang#931 corrected the NOTE_NSECONDS scaling for the relative-time path on FreeBSD but did not touch the absolute-time deadline computation this PR addresses.

### Modifications:

For `os(FreeBSD)` and `os(OpenBSD)`, split the per-clock flag selection so NOTE_ABSOLUTE is kept for DISPATCH_CLOCK_WALL (which works correctly on BSD) but dropped for DISPATCH_CLOCK_UPTIME and DISPATCH_CLOCK_MONOTONIC. Introduce `DISPATCH_NOTE_ABSOLUTE_<kind>` macros next to the existing `DISPATCH_NOTE_CLOCK_<kind>` ones, and have `_dispatch_timer_index_to_fflags` use them.

In `_dispatch_event_loop_timer_arm`, when running on BSD with a non-WALL clock, convert the absolute deadline back to a relative delay by subtracting the cached `now`. This preserves any leeway already folded into `target` by `_dispatch_timers_force_max_leeway` above (a simpler `target = range.delay` would silently drop that addition because `range.leeway` has been zeroed by that point).

The change is guarded behind `#if defined(__FreeBSD__) || defined(__OpenBSD__)` and is a no-op on every other platform.

### Result:

Reproduced on FreeBSD 15.0-RELEASE-p9 with the official Apple Swift FreeBSD preview toolchain (Swift 6.3-dev) by running the same Swift binary (HummingbirdCore HTTP server + a single periodic `Task.sleep(for: .seconds(3600))` loop, no real probes/work) against both upstream and patched libdispatch.so via LD_LIBRARY_PATH:

```
=== upstream libdispatch ===
 %CPU   RSS COMMAND
 37.9 39736 /tmp/netwatch-exporter
syscall                     seconds   calls  errors
_umtx_op                3.999564140      16       0
kevent                  0.716818577  117827       0

=== patched libdispatch ===
 %CPU   RSS COMMAND
  1.0 39820 /tmp/netwatch-exporter
syscall                     seconds   calls  errors
_umtx_op                3.999207012      16       0
kevent                  1.999276981      10       0
```

Same binary, same machine, same Swift toolchain; only `/opt/swift/lib/swift/freebsd/libdispatch.so` swapped. kevent rate drops by ~11 000×; CPU drops from one core pinned to ~idle.

In production (a Hummingbird-based Prometheus exporter on FreeBSD 15) the same swap took the service from 70-98 % CPU down to 0.0 %.

Existing `dispatch_timer*` tests (`dispatch_timer`, `dispatch_timer_short`, `dispatch_timer_timeout`, `dispatch_timer_set_time`, `dispatch_timer_bit31`, `dispatch_timer_bit63`) pass on FreeBSD with both upstream and patched libdispatch. They exercise the user-facing dispatch_after / DispatchSource APIs which complete before the busy loop can settle, so they do not detect the underlying bug.

Two smoke tests documenting the per-clock split also pass cleanly on the patched build:

```
$ ./dispatch_timer_walltime
dispatch_after(walltime + 2.0s) returned after 2.001s PASS
$ ./dispatch_timer_uptime
uptime test: dispatch_after(NOW + 2s) returned after 2.001s PASS
```

These two C-level reproducers are tiny and could be folded into the existing tests/ directory if maintainers want them landed; happy to add as a follow-up commit.
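For reviewers who want a picture of the per-clock flag split described under Modifications, here is a minimal sketch. It is not the patch itself: the `SKETCH_*` names, the flag bit values, and the runtime `is_bsd` parameter are all assumptions made for illustration (the real change selects flags at compile time through the `DISPATCH_NOTE_ABSOLUTE_<kind>` macros and takes its flag values from `<sys/event.h>`).

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag bits for illustration only; real code uses the
 * NOTE_* constants from <sys/event.h>. */
#define SKETCH_NOTE_NSECONDS 0x08u
#define SKETCH_NOTE_ABSTIME  0x10u

/* Hypothetical stand-in for the three dispatch clocks. */
typedef enum {
    SKETCH_CLOCK_UPTIME,
    SKETCH_CLOCK_MONOTONIC,
    SKETCH_CLOCK_WALL
} sketch_clock_t;

/* Pick timer fflags for a clock. On BSD only the wall clock keeps the
 * absolute flag; uptime/monotonic timers are armed as relative delays.
 * Elsewhere every clock keeps the absolute flag, as before. */
static uint32_t sketch_timer_fflags(sketch_clock_t clock, int is_bsd)
{
    assert(clock == SKETCH_CLOCK_UPTIME ||
           clock == SKETCH_CLOCK_MONOTONIC ||
           clock == SKETCH_CLOCK_WALL);

    uint32_t fflags = SKETCH_NOTE_NSECONDS;
    if (!is_bsd || clock == SKETCH_CLOCK_WALL) {
        fflags |= SKETCH_NOTE_ABSTIME;
    }
    return fflags;
}
```

With `is_bsd` set, `sketch_timer_fflags(SKETCH_CLOCK_WALL, 1)` keeps the absolute bit while the uptime and monotonic variants drop it; with `is_bsd` clear, all three keep it, which mirrors the "no-op on every other platform" claim.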
Motivation

libdispatch programs every `EVFILT_TIMER` with `NOTE_ABSOLUTE` on every platform, computing an absolute deadline value via `_dispatch_time_now_cached` and passing it to kqueue with `NOTE_NSECONDS | NOTE_ABSOLUTE` (= `NOTE_ABSTIME` on BSD).

On FreeBSD / OpenBSD only the WALL clock works that way: kqueue with `NOTE_NSECONDS | NOTE_ABSTIME` interprets the data as a `CLOCK_REALTIME` nanosecond absolute deadline (FreeBSD subtracts the boot offset internally; OpenBSD documents the realtime semantics directly). That matches the value libdispatch already computes for `DISPATCH_CLOCK_WALL`, so wall timers behave correctly and continue to track wall-clock adjustments.

For `DISPATCH_CLOCK_UPTIME` / `DISPATCH_CLOCK_MONOTONIC`, libdispatch hands kqueue a `CLOCK_MONOTONIC` nanosecond value that BSD kqueue interprets as realtime, which is decades in the past after boot. kqueue reports the timer as already expired on every register, so the dispatch event loop fires the timer block immediately, re-arms the timer, fires it again, and so on. Observed on a long-running HummingbirdCore HTTP server using `Task.sleep` as ~48 000 kevent()/s with one CPU core pinned to ~100 % even with no real work pending.

PRs #879 and #931 corrected the `NOTE_NSECONDS` scaling for the relative-time path on FreeBSD but did not touch the absolute-time deadline computation this PR addresses.
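The clock mismatch above can be checked without kqueue at all. The following is a minimal sketch using only POSIX `clock_gettime`; the `sketch_*` helpers are hypothetical names, not libdispatch functions. It builds a deadline the way the uptime/monotonic path is described as doing (monotonic now + delay) and then asks whether that number, read as a realtime deadline, is already in the past.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Read a POSIX clock as a nanosecond count. */
static uint64_t sketch_clock_ns(clockid_t clock)
{
    struct timespec ts;
    int rc = clock_gettime(clock, &ts);
    assert(rc == 0);
    (void)rc; /* keep NDEBUG builds warning-free */
    return (uint64_t)ts.tv_sec * SKETCH_NSEC_PER_SEC + (uint64_t)ts.tv_nsec;
}

/* Absolute deadline on the monotonic timeline: monotonic now + delay. */
static uint64_t sketch_monotonic_deadline(uint64_t delay_ns)
{
    return sketch_clock_ns(CLOCK_MONOTONIC) + delay_ns;
}

/* Returns 1 if the value, read as a CLOCK_REALTIME deadline, has
 * already passed; 0 otherwise. */
static int sketch_expired_as_realtime(uint64_t deadline_ns)
{
    return deadline_ns <= sketch_clock_ns(CLOCK_REALTIME);
}
```

On a machine whose wall clock is set to the present, `sketch_expired_as_realtime(sketch_monotonic_deadline(delay))` reports an expired deadline even for a one-hour delay, because the monotonic clock counts from a recent starting point while realtime counts from 1970. That is the "already expired on every register" condition in miniature.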
Modifications

For `os(FreeBSD)` and `os(OpenBSD)`, split the per-clock flag selection so that NOTE_ABSOLUTE is kept for `DISPATCH_CLOCK_WALL` (which works correctly on BSD) but dropped for `DISPATCH_CLOCK_UPTIME` and `DISPATCH_CLOCK_MONOTONIC`. Introduce `DISPATCH_NOTE_ABSOLUTE_<kind>` macros next to the existing `DISPATCH_NOTE_CLOCK_<kind>` ones, and have `_dispatch_timer_index_to_fflags` use them.

In `_dispatch_event_loop_timer_arm`, when running on BSD with a non-WALL clock, convert the absolute deadline back to a relative delay by subtracting the cached `now`. Using `target -= now` rather than the simpler `target = range.delay` preserves any leeway already folded into `target` by `_dispatch_timers_force_max_leeway` above (`range.leeway` has been zeroed by that point, so an overwrite would silently drop the addition and break `LIBDISPATCH_TIMERS_FORCE_MAX_LEEWAY`).

The change is guarded behind `#if defined(__FreeBSD__) || defined(__OpenBSD__)` and is a no-op on every other platform.

Result

Reproduced on FreeBSD 15.0-RELEASE-p9 with the official Apple Swift FreeBSD preview toolchain (Swift 6.3-dev) by running the same Swift binary (HummingbirdCore HTTP server + a single periodic `Task.sleep(for: .seconds(3600))` loop, no real probes/work) against both upstream and patched `libdispatch.so` via `LD_LIBRARY_PATH`.

Same binary, same machine, same Swift toolchain; only `/opt/swift/lib/swift/freebsd/libdispatch.so` swapped. `kevent` rate drops by ~11 000×; CPU drops from one core pinned to ~idle.

In production (a Hummingbird-based Prometheus exporter on FreeBSD 15) the same swap took the service from 70-98 % CPU down to 0.0 %.
Checks

- Existing `dispatch_timer*` tests (`dispatch_timer`, `dispatch_timer_short`, `dispatch_timer_timeout`, `dispatch_timer_set_time`, `dispatch_timer_bit31`, `dispatch_timer_bit63`) pass on FreeBSD with both upstream and patched libdispatch. (They exercise the user-facing dispatch_after / DispatchSource APIs which complete before the busy loop can settle, so they don't independently detect the underlying tight-loop bug.)
- `dispatch_after(dispatch_walltime(NULL, 2s))` returns at 2.001 s on the patched build, identical to upstream.
- `dispatch_after(dispatch_time(DISPATCH_TIME_NOW, 2s))` returns at 2.001 s on the patched build.
- Guarded behind `#if defined(__FreeBSD__) || defined(__OpenBSD__)`; no behaviour change on Apple/Linux.
- `LIBDISPATCH_TIMERS_FORCE_MAX_LEEWAY` semantics preserved: the relative-time conversion (`target -= now`) keeps any leeway already folded into `target` above.

Notes

- The tight loop requires the Swift Concurrency cooperative executor + a concurrent NIO event loop to manifest, which is awkward to express against libdispatch's existing test suite without depending on the Swift runtime. The two smoke tests above (walltime / uptime `dispatch_after`) document the per-clock split and could be folded into `tests/` if maintainers want them landed; happy to add as a follow-up commit.
- A broader fix would handle the `NOTE_NSECONDS|NOTE_ABSTIME` clock semantics in `_dispatch_time_now_cached` itself (so uptime/monotonic values get converted to realtime before being handed to kqueue). This PR intentionally takes the narrower route, sidestepping the absolute path for the two affected clocks, because it keeps the change small, self-contained, and lets users with broken `Task.sleep`-on-FreeBSD ship code today.
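For comparison, the conversion that the broader route in the last note would need can be sketched as plain arithmetic. This is only an illustration of the idea, not something the PR implements: `sketch_monotonic_to_realtime` is a hypothetical helper, and it assumes the caller samples both clocks at (nearly) the same instant.

```c
#include <assert.h>
#include <stdint.h>

/* Re-express a monotonic absolute deadline on the realtime timeline:
 * keep the remaining delay, change only the origin it is measured from.
 * A deadline already behind mono_now maps to "now" on the realtime side. */
static uint64_t sketch_monotonic_to_realtime(uint64_t mono_deadline_ns,
                                             uint64_t mono_now_ns,
                                             uint64_t real_now_ns)
{
    uint64_t remaining_ns =
        mono_deadline_ns > mono_now_ns ? mono_deadline_ns - mono_now_ns : 0;
    /* Sanity check of this sketch: the sum must not wrap around. */
    assert(real_now_ns <= UINT64_MAX - remaining_ns);
    return real_now_ns + remaining_ns;
}
```

One design consequence worth keeping in mind: a deadline expressed on the realtime timeline moves with later wall-clock adjustments, whereas a relative delay (the route this PR takes for uptime/monotonic timers) does not depend on the realtime clock at all.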