Skip to content

hiffy: nicer ringbufs and counters#2531

Draft
hawkw wants to merge 3 commits into
masterfrom
eliza/hiffy-debug-2
Draft

hiffy: nicer ringbufs and counters#2531
hawkw wants to merge 3 commits into
masterfrom
eliza/hiffy-debug-2

Conversation

@hawkw
Copy link
Copy Markdown
Member

@hawkw hawkw commented May 24, 2026

No description provided.

hawkw added 3 commits May 24, 2026 09:57
Issue #2528 describes a bug in Hiffy introduced by the change to make
Hiffy callable over the management network directly, rather than through
`udprpc` (PR #2466). The bug results in Hiffy calls via a debug probe
timing out while the system is connected to a management network.

As I described in [this comment][1], @jamesmunns and I determined that
this occurs due to a combination of two factors: First, the top of the
`main()` loop in `task-hiffy` always begins by unconditionally advancing
the deadline by `sleep_ms` and calling `sys_set_timer()` with the new
deadline. Second, the check for whether `HIFFY_KICK` was set by the
probe is *conditional on whether the timer notification has fired*. If
hiffy was only notified by the timer, this would be fine; the previous
timeout has elapsed so we should advance the timer. However, if Hiffy
has been notified by the `net` task's `SOCKET` notification, we will set
the timer forwards, but we will _not_ check `HIFFY_KICK`, meaning any
kick delivered by the debug probe is missed. While debugging, we noticed
that a large number of seemingly spurious notifications were being
posted to hiffy by the `net` task. When the management network is
connected, these spurious notifications are posted more frequently than
the timer, meaning that the timer *never* completes, and `HIFFY_KICK` is
never checked.

This commit fixes the bug by changing `task-hiffy`'s main loop in two
main ways. First, we _always_ check the value of `HIFFY_KICK`,
regardless of which notification woke us. This change alone is
sufficient to fix the bug, as I demonstrated in [this comment][2]. This
probably also reduces the latency of the `net` Hiffy RPC, as it kicks
execution by _also_ setting `HIFFY_KICK`, and then waiting for the timer
to fire to start execution. This way, we no longer need to wait for the
timer. In addition, I have also changed the `main()` loop so that the
timer is only reset at the end of the loop, if it has actually fired.
This way, no matter how often the `net` task notifies us, the timer will
always fire before being reset. I believe that either of these changes
would be sufficient to fix the bug, but fixing both of them makes the
behavior closer to what we would expect.

To demonstrate that it works, here's me flashing `mb-1-sp` (the system
on which the bug was first encountered) with an image from this branch,
and calling Hiffy both via the probe (using `humility net ip`, which
uses Hiffy), and then over the network. Note that they both work fine
now. Yay!

```console
eliza@jeeves ~ $ export
HUMILITY_ARCHIVE="$HOME/build-cosmo-b-dev-image-default.hiffy-kickme.zip"
eliza@jeeves ~ $ pfexec /staff/eliza/humility -t mb-1-sp flash
humility: WARNING: archive in environment variable overriding archive in
environment file
humility: attached to archive
humility: attaching with chip set to "STM32H753ZITx"
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
humility: flash/archive mismatch; reflashing
humility: skipping measurement token handoff
humility: auxiliary flash data is already loaded in slot 0; skipping
programming
humility: done with auxiliary flash
humility: flashing done
eliza@jeeves ~ $ pfexec /staff/eliza/humility -t mb-1-sp net ip
humility: WARNING: archive in environment variable overriding archive in
environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
MAC address:  a8:40:25:04:11:c5
IPv6 address: fe80::aa40:25ff:fe04:11c5
eliza@jeeves ~ $ /staff/eliza/humility --ip
fe80::aa40:25ff:fe04:11c5%e1000g0 hiffy -c Sequencer.get_state
humility: connecting to fe80::aa40:25ff:fe04:11c5%2
Sequencer.get_state() => A0PlusHP
eliza@jeeves ~ $
```

This fixes #2528. Note that the spurious notifications from the `net`
task are *not* fixed by this change, but Hiffy can now behave directly
despite them. We need to determine why we're getting so many
notifications while the recv queue is empty as well, but this commit
fixes the bug. I'll open a new ticket for investigating the spurious
notifications.

[1]:
#2528 (comment)
[2]:
#2528 (comment)
The root cause of the spurious notifications hasn't been determined yet;
we need to investigate this separately. However, in the meantime
Base automatically changed from eliza/kickme to master May 24, 2026 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant