Consider max error when determining if time is synced#10318
Consider max error when determining if time is synced#10318
Conversation
- Parse more data from `chronyc tracking`, including the max offset, root delay, and root dispersion. Compute maximum error from the last two, and use that when determining if time is synced. - Add new NTP admin server version with extra data - Log more details in reconciler so we can see why time isn't synced.
|
Intended to address #7668 |
| }); | ||
| }; | ||
|
|
||
| let max_error = root_delay / 2.0 + root_dispersion; |
There was a problem hiding this comment.
Is there a reference for this?
There was a problem hiding this comment.
No, this was my guess until I found a better estimate. There does seem to be a reference in the chronyc man pages, under the description of "Root dispersion":
An absolute bound on the computer’s clock accuracy (assuming the stratum-1 computer is correct) is given by:
clock_error <= |system_time_offset| + root_dispersion + (0.5 * root_delay)
I'll update this to use that and include a link.
There was a problem hiding this comment.
There's also the definition from RFC 5905 and the full algorithm in Section 11.2.3.
My original guess came from the fact that the dispersion is defined in the RFC as the maximum error accumulated along the path from us to the root server; since the RTT is assumed to be symmetric, we can divide by 2 to get the time to get just from us to the root, and then add the maximum error.
I think chrony's value is better than just delay / 2 + dispersion. That is the maximum error on chrony's internal estimate of the true time. But it is constantly updating the local tick frequency on the system to slew the clock so that it matches the upstream time. That means that if you just ask the OS "What time is it?", you see both the errors due to the fact that the clock is currently being slewed plus whatever this error is. I think we need both, but I could use some validation on that point.
| // | example for the next person. | ||
| // v | ||
| // (next_int, IDENT), | ||
| (2, ADD_MAX_ERROR_AND_OFFSET), |
There was a problem hiding this comment.
This API is (or is supposed to be) client-side-versioned: #8769
I see it doesn't have a comment to that effect here. Can we do this without a version bump? If it's just extra data for debugging, maybe we can add the data in this release, use a client for the older version in this release, and then bump the client back to "latest" in the next release?
There was a problem hiding this comment.
Thanks, I forgot to check that. I can add a comment here for the future too.
We can do this without a version bump in a few ways. The easiest would be to not change the type at all, only the definition of sync. I added the data so that clients could log why it wasn't synced, but we can also log it locally to help understand.
I'm not sure I understand the multi-release process. Do you mean:
- Add this new version, which is backwards compatible
- In the clients (sled-agent only?), pin to the previous version so we ignore the new fields
- Release that in
rN - Update the clients to the new version
- Release that in
rN+1
Is that right?
chronyc tracking, including the max offset, root delay, and root dispersion. Compute maximum error from the last two, and use that when determining if time is synced.