Consider max error when determining if time is synced by bnaecker · Pull Request #10318 · oxidecomputer/omicron

bnaecker · 2026-04-23T23:10:10Z

- Parse more data from `chronyc tracking`, including the max offset, root delay, and root dispersion. Compute maximum error from the last two, and use that when determining if time is synced.
- Add new NTP admin server version with extra data.
- Log more details in reconciler so we can see why time isn't synced.
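
To make the first item concrete, here is a minimal sketch of pulling the system time offset, root delay, and root dispersion out of the human-readable `chronyc tracking` report. It is not the code in this PR: the `Tracking` struct, the `seconds` and `parse_tracking` helpers, and the sign convention for the offset are all assumptions made for illustration, and the real implementation may read a different output format.

```rust
/// Values of interest from `chronyc tracking`, all in seconds.
/// Hypothetical type for this sketch, not the one used in omicron.
#[derive(Debug, PartialEq)]
struct Tracking {
    /// Signed offset of the system clock from NTP time. This sketch
    /// assumes negative means "slow of NTP time".
    system_time_offset: f64,
    root_delay: f64,
    root_dispersion: f64,
}

/// Parse the leading number from a value such as " 0.013639022 seconds".
fn seconds(value: &str) -> Option<f64> {
    value.split_whitespace().next()?.parse().ok()
}

/// Extract the three fields from the "Key : value" lines of the report.
/// Returns `None` if any of them is missing or not a number.
fn parse_tracking(output: &str) -> Option<Tracking> {
    let mut offset = None;
    let mut delay = None;
    let mut dispersion = None;
    for line in output.lines() {
        // Split at the first colon only, so values containing colons
        // (such as the reference time) stay intact.
        let Some((key, value)) = line.split_once(':') else { continue };
        match key.trim() {
            "System time" => {
                let magnitude = seconds(value)?;
                let slow = value.contains("slow");
                offset = Some(if slow { -magnitude } else { magnitude });
            }
            "Root delay" => delay = seconds(value),
            "Root dispersion" => dispersion = seconds(value),
            _ => {}
        }
    }
    Some(Tracking {
        system_time_offset: offset?,
        root_delay: delay?,
        root_dispersion: dispersion?,
    })
}
```

Returning `None` when a field is missing, rather than defaulting it to zero, keeps a truncated report from looking like a perfectly synced clock.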


bnaecker · 2026-04-23T23:11:19Z

Intended to address #7668

davepacheco · 2026-04-24T17:04:01Z

+        });
+    };
+
+    let max_error = root_delay / 2.0 + root_dispersion;


Is there a reference for this?

No, this was my guess until I found a better estimate. There does seem to be a reference in the chronyc man pages, under the description of "Root dispersion":

An absolute bound on the computer’s clock accuracy (assuming the stratum-1 computer is correct) is given by:

clock_error <= |system_time_offset| + root_dispersion + (0.5 * root_delay)

I'll update this to use that and include a link.

There's also the definition from RFC 5905 and the full algorithm in Section 11.2.3.

My original guess came from the fact that the RFC defines dispersion as the maximum error accumulated along the path from us to the root server. Since the RTT is assumed to be symmetric, we can divide it by 2 to get the one-way delay from us to the root, and then add the maximum error.

I think chrony's value is better than just delay / 2 + dispersion. The latter is the maximum error on chrony's internal estimate of the true time. But chrony is constantly updating the local tick frequency on the system to slew the clock so that it matches the upstream time. That means that if you just ask the OS "What time is it?", you see both the error due to the clock currently being slewed and the error in chrony's estimate. I think we need both, but I could use some validation on that point.
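
The two estimates discussed in this thread can be compared side by side. This is a minimal sketch, not the code in the PR: the function names are made up for illustration, and `max_allowed_error` is a hypothetical threshold parameter whose actual value is not stated in this conversation. All quantities are in seconds.

```rust
/// The original guess from the diff: half the round-trip delay to the
/// root server plus the accumulated dispersion.
fn estimate_error(root_delay: f64, root_dispersion: f64) -> f64 {
    root_delay / 2.0 + root_dispersion
}

/// The bound quoted above from the chronyc man page:
/// clock_error <= |system_time_offset| + root_dispersion + 0.5 * root_delay
fn clock_error_bound(
    system_time_offset: f64,
    root_delay: f64,
    root_dispersion: f64,
) -> f64 {
    system_time_offset.abs() + root_dispersion + 0.5 * root_delay
}

/// Hypothetical sync check: treat time as synced only when the bound is
/// a real number no larger than the allowed error.
fn is_synced(bound: f64, max_allowed_error: f64) -> bool {
    bound.is_finite() && bound <= max_allowed_error
}
```

The only difference between the two functions is the `|system_time_offset|` term, which accounts for the system clock still being slewed toward chrony's estimate of the true time.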

davepacheco · 2026-04-24T17:08:58Z

    // |  example for the next person.
    // v
    // (next_int, IDENT),
+    (2, ADD_MAX_ERROR_AND_OFFSET),


This API is (or is supposed to be) client-side-versioned: #8769

I see it doesn't have a comment to that effect here. Can we do this without a version bump? If it's just extra data for debugging, maybe we can add the data in this release, use a client for the older version in this release, and then bump the client back to "latest" in the next release?

Thanks, I forgot to check that. I can add a comment here for the future too.

We can do this without a version bump in a few ways. The easiest would be to not change the type at all, only the definition of sync. I added the data so that clients could log why it wasn't synced, but we can also log it locally to help understand.

I'm not sure I understand the multi-release process. Do you mean:

1. Add this new version, which is backwards compatible
2. In the clients (sled-agent only?), pin to the previous version so we ignore the new fields
3. Release that in rN
4. Update the clients to the new version
5. Release that in rN+1

Is that right?

bnaecker marked this pull request as draft April 23, 2026 23:10

davepacheco reviewed Apr 24, 2026

bnaecker added 4 commits April 24, 2026 12:59

references for max-error calc, lower bound to match CRDB docs

4ec3dfb

update tests for new threshold

d544711

Pin clients to v1 of the NTP admin API for now

96c54d2

add note

85f0bc1

bnaecker wants to merge 5 commits into main from ben/include-max-error-in-time-sync
