108 changes: 108 additions & 0 deletions docs/specs/hal/time_driver.md
@@ -0,0 +1,108 @@
# Time Driver (HAL)

## Description

The time driver is a HAL component that manages the system NTP daemon (`sysntpd`) and exposes a time source capability to the rest of the system. It is responsible for all direct OS interaction related to time synchronisation: restarting the daemon, monitoring kernel hotplug NTP events via ubus, and publishing the current sync state and accuracy metadata as a capability on the bus.

The time driver acts as both a driver and a time manager: it owns the full lifecycle of `sysntpd` within the running session.

## Dependencies

- **ubus capability** (`{'cap', 'ubus', '1', ...}`): required to listen for `hotplug.ntp` kernel events. The driver waits for the ubus capability to become available before starting NTP.

## Initialisation

On startup (once the ubus capability is available):

1. Restart `sysntpd` via `exec.command("/etc/init.d/sysntpd", "restart"):run()`.
2. Register a ubus listener for `hotplug.ntp` events.
3. Publish initial capability `meta` and `state` (initially `synced = false`).

## Capability

The driver publishes a single time source capability. The capability id is a UUID generated at startup. In the future, additional time source capabilities may exist alongside this one.

### Meta (retained)

Topic: `{'cap', 'time', <uuid>, 'meta'}`

```lua
{
    provider = 'hal',
    source = 'ntp',                  -- time source type
    version = 1,                     -- interface version
    accuracy_seconds = <number|nil>, -- estimated absolute error in seconds
}
```

`accuracy_seconds` is a coarse estimate of absolute clock error and is derived from NTP stratum. Lower values are better. It is `nil` when unsynced (stratum >= 16) or before first sync-quality data is available.
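The spec does not give the stratum-to-accuracy mapping, so the following is only a sketch of one possible shape. The function name `accuracy_from_stratum`, the rejection of stratum values below 1, and the 10 ms-per-stratum figure are assumptions for illustration; only the `nil` result for stratum >= 16 or missing data comes from the text above.

```lua
-- Stratum 16 (or above) means "unsynchronised" in NTP.
local UNSYNCED_STRATUM = 16

-- Hypothetical mapping from NTP stratum to a coarse error estimate in seconds.
-- The 0.01 s per stratum level is a placeholder, not a measured value.
local function accuracy_from_stratum(stratum)
    -- No data yet, or not a usable stratum: accuracy is unknown.
    if type(stratum) ~= "number" or stratum < 1 or stratum >= UNSYNCED_STRATUM then
        return nil
    end
    -- Assumed: the estimate grows linearly with distance from the reference clock.
    return 0.01 * stratum
end

-- Example: lower strata yield smaller (better) estimates.
local near = accuracy_from_stratum(2)
local far = accuracy_from_stratum(5)
local unknown = accuracy_from_stratum(16)
```

A real driver would more likely use a lookup table tuned to the deployment, but the `nil` handling would stay the same.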

### State (retained)

Topic: `{'cap', 'time', <uuid>, 'state'}`

```lua
{
    synced = true | false,
    stratum = <number|nil>,          -- last reported stratum; nil before the first event
    accuracy_seconds = <number|nil>,
}
```

Published on every sync/unsync transition, and refreshed on other `hotplug.ntp` events so that the `stratum` value stays current. Retained so that new subscribers immediately receive the current state.

### Events (non-retained)

#### synced

Topic: `{'cap', 'time', <uuid>, 'event', 'synced'}`

Fired when the NTP daemon transitions from unsynced to synced (stratum < 16). Payload:

```lua
{ stratum = <number>, accuracy_seconds = <number|nil> }
```

#### unsynced

Topic: `{'cap', 'time', <uuid>, 'event', 'unsynced'}`

Fired when the NTP daemon transitions from synced to unsynced (stratum >= 16, or the daemon restarts). Payload:

```lua
{ stratum = 16, accuracy_seconds = nil }
```

These events are non-retained, so consumers that need the current state should read `{'cap', 'time', <uuid>, 'state'}` first, then subscribe to events for transitions.

## Service Flow

```mermaid
flowchart TD
St[Start] --> A(Install alarm handler)
A --> B(Wait for ubus capability)
B -->|ubus cap available| C(Restart sysntpd)
C --> D{sysntpd restart ok?}
D -->|error| E[Log error and stop]
D -->|ok| F(Register ubus listener for hotplug.ntp)
F --> G{Listener registered ok?}
G -->|error| H[Log error and stop]
G -->|ok| I(Publish meta + initial state: synced=false)
I --> J{Wait for hotplug.ntp event or stream closed or context done}
J -->|hotplug.ntp event| K{stratum < 16?}
K -->|yes, was unsynced| L(Publish state synced=true + event/synced)
K -->|no, was synced| M(Publish state synced=false + event/unsynced)
L --> J
M --> J
K -->|no change| J
J -->|stream closed| N[Log warning, stop]
J -->|context done| O(Send stop_stream to ubus)
```

## Architecture

- The driver runs a single main fiber that handles the full lifecycle. No child scope is needed.
- Sync and unsync events are only published on **transitions**: the driver tracks the previous sync state and only emits an event when it changes.
- The retained `state` topic is always updated on any hotplug event regardless of transition, to refresh the stratum value.
- A `finally` block logs the reason for shutdown and performs cleanup (stop_stream if still active).
- The ubus `stream_id` must be stopped cleanly on context cancellation to avoid leaking listener registrations in the ubus driver.
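
The transition rule described above can be sketched as a pure function. This is a minimal illustration, not the driver's actual code: the name `apply_ntp_event` and the shape of its return values are assumptions, and the bus publishing itself is left out.

```lua
local UNSYNCED_STRATUM = 16

-- Hypothetical helper: given the previous sync flag and the stratum from a
-- hotplug.ntp event, return the state to retain and the event to emit (if any).
local function apply_ntp_event(prev_synced, stratum)
    local synced = type(stratum) == "number" and stratum < UNSYNCED_STRATUM
    local event = nil
    if synced and not prev_synced then
        event = "synced"
    elseif prev_synced and not synced then
        event = "unsynced"
    end
    -- The state is returned on every call so the retained stratum can be
    -- refreshed even when no transition occurred.
    return { synced = synced, stratum = stratum }, event
end

-- Example: a run of events starting from the initial unsynced state.
local state, event = apply_ntp_event(false, 3)   -- transition to synced
local state2, event2 = apply_ntp_event(true, 2)  -- still synced, no event
local state3, event3 = apply_ntp_event(true, 16) -- transition to unsynced
```

Keeping the decision separate from the publishing code makes the "events only on transitions, state on every event" rule easy to unit test.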
76 changes: 76 additions & 0 deletions docs/specs/time.md
@@ -0,0 +1,76 @@
# Time Service

## Description

The time service listens to time capabilities for sync and unsync events. It uses these events to keep the fibers alarm module in step with system time, and it rebroadcasts sync/unsync transitions on the bus for other services.

## Time capability

There is initially only one time capability, provided by the time driver in HAL. In the future we can discover multiple time capabilities and balance multiple sources of time intelligently (e.g. prefer the capability reporting the lowest stratum).

For now, the time service subscribes to `{'cap', 'time', '+', 'meta', 'source'}` and uses the **first** capability announced. Subsequent announcements are ignored.
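
As a rough sketch of the "first capability wins" rule, assuming topics arrive as Lua array tables in the form written above (the bus message shape is not specified here), a small selector might look like this. The name `new_first_capability_selector` is hypothetical.

```lua
-- Hypothetical helper: returns a function that accepts meta topics and yields
-- the capability id for the first valid announcement, then nil for all others.
local function new_first_capability_selector()
    local chosen = nil
    return function(topic)
        if chosen ~= nil then
            return nil -- a capability is already selected; ignore later ones
        end
        if type(topic) ~= "table" or topic[1] ~= "cap" or topic[2] ~= "time" then
            return nil -- not a time capability topic
        end
        local id = topic[3]
        if type(id) ~= "string" or id == "+" then
            return nil -- wildcard or malformed id
        end
        chosen = id
        return chosen
    end
end

-- Example: the second announcement is ignored.
local select_first = new_first_capability_selector()
local first = select_first({ "cap", "time", "a1b2", "meta", "source" })
local second = select_first({ "cap", "time", "c3d4", "meta", "source" })
```

Returning the id only once lets the caller use a non-nil result as the trigger for creating the state and event subscriptions.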

## Bus Outputs

### Service status (retained)

Topic: `{'svc', 'time', 'status'}`

```lua
{
    state = 'starting' | 'running' | 'stopped',
    ts = <number>,
}
```

### Time synced state (retained)

Topic: `{'svc', 'time', 'synced'}`

Payload: `true` or `false`

Published whenever the overall sync state changes. This is the authoritative "is system time trustworthy" signal for other services. It is retained so services that start later immediately receive the current state.

### Time transition events (non-retained)

Topics:
- `{'svc', 'time', 'event', 'synced'}`
- `{'svc', 'time', 'event', 'unsynced'}`

Published on state transitions for consumers that need edge-triggered behaviour.

## Service Flow

```mermaid
flowchart TD
St[Start] --> B(Publish status: starting)
B --> C(Subscribe to cap/time/+/meta/source)
C --> D(Publish status: running)
D --> E{Wait for first time capability meta message or scope done}
E -->|scope done| Z[Publish status: stopped]
E -->|first capability meta received| F(Extract uuid from meta topic)
F --> G(Subscribe to cap/time/uuid/state/synced\ncap/time/uuid/event/synced\ncap/time/uuid/event/unsynced)
G --> H(Read single retained state message and apply sync state)
H --> H2(Unsubscribe from state/synced)
H2 --> K{Wait for synced event, unsynced event, or scope done}
K -->|synced event| L(Apply synced: retain svc/time/synced=true, emit transition event)
L --> K
K -->|unsynced event| M(Apply unsynced: retain svc/time/synced=false, emit transition event)
M --> K
K -->|scope done| Z
```

All three subscriptions are created before any message is read, so no events are lost during initialisation. The retained `{'cap', 'time', <uuid>, 'state', 'synced'}` payload is consumed as a one-shot read to bootstrap sync state, then the state subscription is dropped. Ongoing sync state changes are tracked exclusively through the `event/synced` and `event/unsynced` transition topics.

With the new fibers alarm API, the service calls:
- `alarm.set_time_source(fibers.utils.time.realtime)` on first synced state
- `alarm.time_changed()` on subsequent synced transitions/events

There is no direct equivalent of `clock_desynced` in the new API; unsynced updates still propagate over bus outputs.
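
The call pattern above can be sketched with the alarm module injected as a parameter. The function names `set_time_source` and `time_changed` come from the text; their exact signatures, the helper name `new_sync_applier`, and the use of `os.time` as a stand-in for `fibers.utils.time.realtime` are assumptions.

```lua
-- Hypothetical helper: decides which alarm call a sync update should trigger.
-- `alarm` is any table exposing set_time_source and time_changed.
local function new_sync_applier(alarm, realtime)
    local source_installed = false
    return function(synced)
        if not synced then
            -- No clock_desynced equivalent: unsync is reported on the bus only.
            return nil
        end
        if not source_installed then
            alarm.set_time_source(realtime)
            source_installed = true
            return "set_time_source"
        end
        alarm.time_changed()
        return "time_changed"
    end
end

-- Usage with a recording stub in place of the real alarm module.
local calls = {}
local stub_alarm = {
    set_time_source = function(_) calls[#calls + 1] = "set_time_source" end,
    time_changed = function() calls[#calls + 1] = "time_changed" end,
}
local apply = new_sync_applier(stub_alarm, os.time)
apply(true)  -- first sync installs the time source
apply(false) -- unsync: no alarm call
apply(true)  -- later sync notifies the alarm module
```

Injecting the alarm table keeps the decision logic testable without the fibers runtime.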

## Architecture

- Everything runs in a single fiber; no child fibers are needed. The fiber blocks waiting for the first capability, then transitions directly into the event loop for that capability.
- The service does not interact with the OS directly; all time source information arrives through the capability published by the time driver in HAL.
- Use `finally` to log shutdown reason and publish `stopped` status.

3 changes: 2 additions & 1 deletion src/configs/config.json
@@ -10,7 +10,8 @@
"name": "state",
"root": "/tmp/dc-states/"
}
- ]
+ ],
+ "time": {}
},
"gsm": {
"modems": {
26 changes: 26 additions & 0 deletions src/services/hal/backends/time/contract.lua
@@ -0,0 +1,26 @@
---@class TimeBackend
---@field start_ntp_monitor fun(self: TimeBackend): boolean, string
---@field ntp_event_op fun(self: TimeBackend): Op
---@field stop fun(self: TimeBackend): boolean, string

local BACKEND_FUNCTIONS = {
    "start_ntp_monitor",
    "ntp_event_op",
    "stop",
}

---Check that a time backend provides all required functions.
---@param backend TimeBackend
---@return string error Empty string on success.
local function validate(backend)
    for _, func in ipairs(BACKEND_FUNCTIONS) do
        if type(backend[func]) ~= "function" then
            return "Missing required function: " .. func
        end
    end
    return ""
end

return {
    validate = validate
}
49 changes: 49 additions & 0 deletions src/services/hal/backends/time/provider.lua
@@ -0,0 +1,49 @@
---Time backend provider factory.
---Selects the appropriate TimeBackend implementation based on the runtime platform.

local contract = require "services.hal.backends.time.contract"

local BACKENDS = {
    "openwrt"
}

--- Select and initialize the backend implementation
---@return table backend_impl
local function get_backend_impl()
    local backend_impl = nil
    for _, backend_name in ipairs(BACKENDS) do
        local ok, backend_mod = pcall(require, "services.hal.backends.time.providers." .. backend_name .. ".init")
        if ok and type(backend_mod) == "table" and backend_mod.is_supported and backend_mod.is_supported() then
            backend_impl = backend_mod.backend
            break
        end
    end

    if backend_impl == nil then
        error("No supported time backend found")
    end

    return backend_impl
end

---Create a new TimeBackend instance.
---
---Detects the platform and instantiates the appropriate backend implementation.
---Fails with an error if no supported backend is found.
---
---@return TimeBackend
local function new()
    local backend_impl = get_backend_impl()
    local backend = backend_impl.new()

    local iface_err = contract.validate(backend)
    if iface_err ~= "" then
        error("Time backend does not implement required interface: " .. tostring(iface_err))
    end

    return backend
end

return {
    new = new,
}