first-pass fabric protocol for TinyGo peer #37

@lePereT

This is the current working design for the devicecode-lua fabric side, so we can build the corresponding TinyGo side against something concrete.

This is a first pass, not the final mesh protocol. The aim is to get a reliable CM5 <-> MCU link working cleanly, with room to grow later.

The main design choices are:

  • keep raw UART bytes out of the local in-process bus;
  • keep OS and UART ownership on the Lua side inside HAL;
  • make fabric the only service that knows about remote peers;
  • carry a small, explicit protocol over a byte stream;
  • preserve useful bus semantics such as publish and directed call, but do not pretend the remote side is just another in-process connection.

1. Big picture

On the Lua side there is one fabric service. It owns one session per configured link. For the MCU link, that session uses a UART transport.

HAL on the Lua side opens the UART and returns a Stream capability to fabric. fabric then reads and writes protocol messages on that stream.

On the TinyGo side, we should think of our fabric as the matching peer session layer over the UART. It sits above a raw byte stream and below our local service bus.

So the UART link is:

  • Lua HAL owns UART fd / driver
  • Lua fabric speaks fabric protocol over the stream
  • TinyGo fabric speaks the same protocol over the raw UART
  • TinyGo fabric imports/exports messages to the MCU’s internal service world

The protocol itself should be transport-neutral in meaning, even though for now it is carried over a serial byte stream.


2. First-pass scope

Version 1 should support:

  • link bring-up and peer handshake
  • heartbeat
  • publish of ordinary messages
  • retained publish
  • unretain
  • directed call / reply, equivalent to lane B-style RPC proxying

Version 1 does not need to support:

  • distributed subscriptions
  • route advertisements
  • multi-hop mesh forwarding
  • firmware transfer
  • binary packet framing
  • authentication on the wire

Those come later.


3. Wire format in v1

For the first implementation, the wire format is deliberately simple:

  • one JSON object per line
  • UTF-8 text
  • newline (\n) terminates one message

So conceptually:

{"t":"hello",...}\n
{"t":"pub",...}\n
{"t":"call",...}\n

Important framing rule

Treat the UART as a byte stream. Accumulate bytes until newline, then decode that full line as one JSON message.

Important encoding rule

Do not emit pretty-printed JSON. One compact JSON object per line only.

Practical note

JSON strings may contain the escape sequence \n, but never a literal newline byte (0x0A), because that would break framing. Standard encoders escape newlines inside strings automatically.
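
As a rough illustration of the framing rule above, here is a minimal Go sketch of a line accumulator. The names LineFramer and Feed, and the maxLine limit, are my own placeholders and not part of the protocol. It assumes oversize lines are discarded up to the next newline and that empty lines are skipped.

```go
package main

// LineFramer accumulates raw UART bytes and yields complete lines.
// maxLine is a hypothetical local limit, not a protocol constant.
type LineFramer struct {
	buf      []byte
	maxLine  int
	overflow bool // true while an oversize line is being discarded
}

func NewLineFramer(maxLine int) *LineFramer {
	return &LineFramer{buf: make([]byte, 0, maxLine), maxLine: maxLine}
}

// Feed consumes one chunk of bytes and returns every complete line it
// finished, without the trailing '\n'. Partial lines stay buffered.
func (f *LineFramer) Feed(chunk []byte) [][]byte {
	var lines [][]byte
	for _, b := range chunk {
		if b == '\n' {
			if !f.overflow && len(f.buf) > 0 {
				line := make([]byte, len(f.buf))
				copy(line, f.buf)
				lines = append(lines, line)
			}
			f.buf = f.buf[:0]
			f.overflow = false
			continue
		}
		if f.overflow {
			continue // still skipping the rest of an oversize line
		}
		if len(f.buf) >= f.maxLine {
			f.overflow = true // reject the whole line, resync at next '\n'
			f.buf = f.buf[:0]
			continue
		}
		f.buf = append(f.buf, b)
	}
	return lines
}
```

Each returned line can then be handed to the JSON decoder as one message.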


4. Message types

All messages are JSON objects with a required string field t.
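
A small Go sketch of reading only the t field before committing to a full decode. MessageType and ErrMissingType are hypothetical names, and it assumes encoding/json is usable on the target; a smaller MCU build may prefer a hand-written scanner.

```go
package main

import (
	"encoding/json"
	"errors"
)

// envelope decodes only the discriminator; all other fields are ignored.
type envelope struct {
	T *string `json:"t"`
}

// ErrMissingType is a hypothetical local error, not a wire-level value.
var ErrMissingType = errors.New("fabric: missing or empty t")

// MessageType returns the required "t" field of one decoded line.
func MessageType(line []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(line, &env); err != nil {
		return "", err // invalid JSON, or t is not a string
	}
	if env.T == nil || *env.T == "" {
		return "", ErrMissingType
	}
	return *env.T, nil
}
```

The session layer can switch on the returned string and ignore values it does not know.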

4.1 hello

Sent when a session comes up.

Example:

{"t":"hello","node":"cm5-local","peer":"mcu-1","sid":"9e3b...","caps":{"pub":true,"call":true}}

Fields:

  • t: "hello"
  • node: sender node id
  • peer: intended remote peer id
  • sid: session id generated by sender
  • caps: advertised capability flags

Semantics

The sender is saying:

  • this is who I am (node)
  • this is who I think you are (peer)
  • this is the new session id for this side (sid)
  • these are the high-level capabilities I support

Expected TinyGo behaviour

On receiving hello:

  • verify the peer field is acceptable for this device
  • record remote node
  • mark link as seen/alive
  • send back hello_ack

For now, do not overcomplicate session identity. A new hello on an existing UART link can simply be treated as “fresh session, reset peer session state”.

4.2 hello_ack

Example:

{"t":"hello_ack","node":"mcu-1","ok":true}

Fields:

  • t: "hello_ack"
  • node: sender node id
  • ok: boolean, currently always true in first pass

Semantics

Acknowledges handshake. No extra payload required in v1.
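
To make the hello / hello_ack exchange concrete, a hedged Go sketch follows. HandleHello and ErrWrongPeer are names I made up, and returning an error on a peer mismatch is my assumption; the text above only says the peer field should be verified.

```go
package main

import (
	"encoding/json"
	"errors"
)

type Hello struct {
	T    string          `json:"t"`
	Node string          `json:"node"`
	Peer string          `json:"peer"`
	SID  string          `json:"sid"`
	Caps map[string]bool `json:"caps"`
}

type HelloAck struct {
	T    string `json:"t"`
	Node string `json:"node"`
	OK   bool   `json:"ok"`
}

// ErrWrongPeer is a hypothetical local error for a mismatched peer field.
var ErrWrongPeer = errors.New("fabric: hello addressed to a different peer")

// HandleHello decodes one hello line, checks that it is addressed to
// selfNode, and returns the remote node id plus the hello_ack line to send
// (newline included).
func HandleHello(line []byte, selfNode string) (string, []byte, error) {
	var h Hello
	if err := json.Unmarshal(line, &h); err != nil {
		return "", nil, err
	}
	if h.T != "hello" || h.Node == "" {
		return "", nil, errors.New("fabric: malformed hello")
	}
	if h.Peer != selfNode {
		return "", nil, ErrWrongPeer
	}
	ack, err := json.Marshal(HelloAck{T: "hello_ack", Node: selfNode, OK: true})
	if err != nil {
		return "", nil, err
	}
	return h.Node, append(ack, '\n'), nil
}
```

The caller would record the returned node id, reset any previous peer session state, and write the ack line to the UART.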


4.3 ping

Example:

{"t":"ping","ts":1712345678}

Fields:

  • t: "ping"
  • ts: sender timestamp, opaque to receiver in v1

Expected behaviour

Reply with pong.

4.4 pong

Example:

{"t":"pong","ts":1712345678}

Fields:

  • t: "pong"
  • ts: either the ping's ts echoed back or the sender's own timestamp; opaque to the receiver in v1

Semantics

Heartbeat only. No strict clock semantics in v1.
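
A minimal Go sketch of answering a ping. It takes the "opaque echo" option and copies ts back untouched; PongFor is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"errors"
)

// Heartbeat covers both ping and pong. TS is kept as raw JSON so the
// receiver never has to interpret it.
type Heartbeat struct {
	T  string          `json:"t"`
	TS json.RawMessage `json:"ts,omitempty"`
}

// PongFor builds the pong line (newline included) for one ping line.
func PongFor(pingLine []byte) ([]byte, error) {
	var ping Heartbeat
	if err := json.Unmarshal(pingLine, &ping); err != nil {
		return nil, err
	}
	if ping.T != "ping" {
		return nil, errors.New("fabric: not a ping")
	}
	out, err := json.Marshal(Heartbeat{T: "pong", TS: ping.TS})
	if err != nil {
		return nil, err
	}
	return append(out, '\n'), nil
}
```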


4.5 pub

Example:

{"t":"pub","topic":["state","mcu","health"],"payload":{"ok":true},"retain":false}

Fields:

  • t: "pub"
  • topic: array of non-empty strings
  • payload: arbitrary JSON value, typically object
  • retain: boolean

Semantics

Publish one message to the peer. The receiver maps the topic through its import rules before publishing it locally.

If retain is true, the receiver should treat it as retained state and store/publish accordingly.

If retain is false, it is a transient publish.

Constraints

For v1, topic tokens are strings only. Do not use numeric topic tokens on the wire yet.
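
A sketch of pub decoding with the topic constraints enforced. Decoding topic into []string makes numeric tokens fail at decode time. Rejecting an empty topic array is my assumption, and DecodePub and ErrBadTopic are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
)

type Pub struct {
	T       string          `json:"t"`
	Topic   []string        `json:"topic"`
	Payload json.RawMessage `json:"payload"`
	Retain  bool            `json:"retain"`
}

// ErrBadTopic is a hypothetical local error for topic constraint violations.
var ErrBadTopic = errors.New("fabric: topic must be a non-empty array of non-empty strings")

// DecodePub parses one pub line and validates its topic tokens.
func DecodePub(line []byte) (Pub, error) {
	var p Pub
	if err := json.Unmarshal(line, &p); err != nil {
		return Pub{}, err // also catches numeric topic tokens
	}
	if p.T != "pub" {
		return Pub{}, errors.New("fabric: not a pub")
	}
	if len(p.Topic) == 0 {
		return Pub{}, ErrBadTopic
	}
	for _, tok := range p.Topic {
		if tok == "" {
			return Pub{}, ErrBadTopic
		}
	}
	return p, nil
}
```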


4.6 unretain

Example:

{"t":"unretain","topic":["state","mcu","health"]}

Fields:

  • t: "unretain"
  • topic: array of non-empty strings

Semantics

Clear retained state for the mapped local topic.


4.7 call

Example:

{"t":"call","id":"f6a2...","topic":["rpc","hal","read_state"],"payload":{"ns":"config","key":"services"},"timeout_ms":5000}

Fields:

  • t: "call"
  • id: correlation id generated by caller
  • topic: concrete topic array for the remote directed call target
  • payload: arbitrary JSON value, typically object
  • timeout_ms: advisory timeout in milliseconds

Semantics

This is a directed request to the remote peer. The receiver should map topic through its import-call rules, invoke the corresponding local handler, and send exactly one reply.

Important rule

call.topic should always be concrete in v1. No wildcards.


4.8 reply

Success example:

{"t":"reply","corr":"f6a2...","ok":true,"payload":{"found":true,"data":"..."}}

Failure example:

{"t":"reply","corr":"f6a2...","ok":false,"err":"timeout"}

Fields:

  • t: "reply"
  • corr: correlation id matching a previous call.id
  • ok: boolean
  • payload: present if ok=true
  • err: string if ok=false

Semantics

Completes one pending call.

Exactly one reply should be emitted per accepted call.

If the receiver cannot route or execute the call, it should still reply with ok=false.
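
The two reply shapes can be produced by a pair of small helpers. This is only a sketch; ReplyOK and ReplyErr are names I picked for illustration.

```go
package main

import "encoding/json"

type Reply struct {
	T       string          `json:"t"`
	Corr    string          `json:"corr"`
	OK      bool            `json:"ok"`
	Payload json.RawMessage `json:"payload,omitempty"`
	Err     string          `json:"err,omitempty"`
}

// encodeReply marshals a reply as one compact JSON line.
func encodeReply(r Reply) ([]byte, error) {
	out, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	return append(out, '\n'), nil
}

// ReplyOK builds a success reply carrying payload for the given call id.
func ReplyOK(corr string, payload interface{}) ([]byte, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	return encodeReply(Reply{T: "reply", Corr: corr, OK: true, Payload: raw})
}

// ReplyErr builds a failure reply such as "timeout" or "no_route".
func ReplyErr(corr, reason string) ([]byte, error) {
	return encodeReply(Reply{T: "reply", Corr: corr, OK: false, Err: reason})
}
```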


5. Topic model

Topics on the wire are JSON arrays of strings.

Examples:

  • ["state","mcu","health"]
  • ["rpc","hal","dump"]
  • ["config","device"]

Do not encode topics as slash-separated strings on the wire. Keep them as arrays. This avoids quoting ambiguities and keeps remapping simple.


6. Topic remapping

The CM5 fabric side will use static configured remapping rules. The TinyGo side should do the same.

A remapping rule is conceptually:

  • local prefix ↔ remote prefix

with support for MQTT-style wildcards:

  • + means one token
  • # means the remaining tail

Example import rule on Lua side

Remote:

{ "state", "#" }

maps to local:

{ "peer", "mcu-1", "state", "#" }

So a remote publish:

{"t":"pub","topic":["state","net","link","wan0"], ...}

becomes locally:

{"peer","mcu-1","state","net","link","wan0"}

What I recommend for TinyGo

Mirror the same general mechanism:

  • export rules for what the MCU is allowed to send out
  • import rules for what the MCU accepts from CM5
  • optional proxy call rules for directed RPC

Keep these static for v1.
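
Here is one possible Go reading of the wildcard remapping described above. The exact matching semantics on the Lua side are not spelled out here, so this sketch assumes that + captures exactly one token, that # captures zero or more trailing tokens and is terminal, and that captures are substituted into the target pattern in order. Remap is a hypothetical name.

```go
package main

// Remap rewrites topic from pattern `from` to pattern `to`.
// It returns false if topic does not match `from` or if `to` uses a
// wildcard that `from` did not capture.
func Remap(from, to, topic []string) ([]string, bool) {
	var singles []string // tokens captured by '+', in order
	var tail []string    // tokens captured by '#'
	hasTail := false
	i := 0
	for _, p := range from {
		if p == "#" {
			tail = topic[i:]
			hasTail = true
			i = len(topic)
			break // '#' is treated as terminal
		}
		if i >= len(topic) {
			return nil, false
		}
		if p == "+" {
			singles = append(singles, topic[i])
		} else if p != topic[i] {
			return nil, false
		}
		i++
	}
	if i != len(topic) {
		return nil, false // topic is longer than the pattern allows
	}
	out := make([]string, 0, len(to)+len(tail))
	next := 0
	for _, t := range to {
		switch t {
		case "+":
			if next >= len(singles) {
				return nil, false
			}
			out = append(out, singles[next])
			next++
		case "#":
			if !hasTail {
				return nil, false
			}
			out = append(out, tail...)
		default:
			out = append(out, t)
		}
	}
	return out, true
}
```

With the import rule from the example, Remap maps ["state","net","link","wan0"] to ["peer","mcu-1","state","net","link","wan0"]. A static rule table is then just a slice of from/to pairs tried in order.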


7. Directed call mapping

There are two directions.

7.1 Lua local → TinyGo remote

Lua fabric binds a local proxy endpoint, such as:

  • rpc/peer/mcu-1/hal/dump

When that endpoint is called locally, Lua fabric sends:

{"t":"call","id":"...","topic":["rpc","hal","dump"],"payload":{...},"timeout_ms":5000}

TinyGo fabric should:

  • map ["rpc","hal","dump"] to a local MCU service handler
  • invoke it
  • send back reply

7.2 TinyGo local → Lua remote

TinyGo fabric may send a call to a configured remote target, for example:

  • ["rpc","hal","read_state"]

Lua fabric will map that to a local call target and return a reply.

Rule for both sides

If no route matches, send:

{"t":"reply","corr":"...","ok":false,"err":"no_route"}

Do not silently drop a call.
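
A sketch of receiver-side call dispatch that always produces a reply when the call id is readable. CallRouter, Handler, Bind and Dispatch are hypothetical names, and the "bad_request" and "encode_failed" error strings are my placeholders; only "no_route" comes from the text above. Topics are joined with a NUL byte purely as an internal map key, never on the wire.

```go
package main

import (
	"encoding/json"
	"strings"
)

type Call struct {
	T         string          `json:"t"`
	ID        string          `json:"id"`
	Topic     []string        `json:"topic"`
	Payload   json.RawMessage `json:"payload"`
	TimeoutMS int             `json:"timeout_ms"`
}

type Reply struct {
	T       string      `json:"t"`
	Corr    string      `json:"corr"`
	OK      bool        `json:"ok"`
	Payload interface{} `json:"payload,omitempty"`
	Err     string      `json:"err,omitempty"`
}

// Handler is a hypothetical local RPC hook: raw JSON payload in, result out.
type Handler func(payload []byte) (interface{}, error)

// CallRouter maps concrete call topics to local handlers.
type CallRouter map[string]Handler

func routeKey(topic []string) string { return strings.Join(topic, "\x00") }

func (r CallRouter) Bind(topic []string, h Handler) { r[routeKey(topic)] = h }

// Dispatch returns the reply line for one call line, or nil when no id can
// be recovered (the caller should log and drop in that case).
func (r CallRouter) Dispatch(line []byte) []byte {
	var c Call
	err := json.Unmarshal(line, &c)
	if c.ID == "" {
		return nil
	}
	rep := Reply{T: "reply", Corr: c.ID}
	h, found := r[routeKey(c.Topic)]
	switch {
	case err != nil:
		rep.Err = "bad_request" // placeholder error string
	case !found:
		rep.Err = "no_route"
	default:
		if res, herr := h(c.Payload); herr != nil {
			rep.Err = herr.Error()
		} else {
			rep.OK, rep.Payload = true, res
		}
	}
	out, merr := json.Marshal(rep)
	if merr != nil {
		out, _ = json.Marshal(Reply{T: "reply", Corr: c.ID, Err: "encode_failed"})
	}
	return append(out, '\n')
}
```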


8. Retained state semantics

Retained state is simple in v1.

If retain=true on a pub, the receiver should treat that as the current retained value for the mapped topic.

If an unretain arrives, the receiver should clear the retained value for the mapped topic.

That is all.

No retained enumeration or retained ownership protocol is needed yet.

Note on reconnect

On the Lua side, reconnect behaviour is helped by the fact that local bus subscriptions replay retained state on subscribe. That means fabric can resubscribe and forward retained messages again after reconnect.

On the TinyGo side, we should probably do the same conceptually: when the link comes up, emit the current retained exported state again.
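
A small bounded retained store with replay, as one way the TinyGo side could hold exported retained state. RetainedStore and its methods are hypothetical, and the capacity limit is a local choice, not part of the protocol. It uses a linear scan on the assumption that the number of retained topics on the MCU stays small.

```go
package main

type retained struct {
	topic   []string
	payload []byte
}

// RetainedStore keeps the current retained value per topic, up to limit.
type RetainedStore struct {
	limit int
	items []retained
}

func NewRetainedStore(limit int) *RetainedStore {
	return &RetainedStore{limit: limit}
}

func sameTopic(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func (s *RetainedStore) find(topic []string) int {
	for i := range s.items {
		if sameTopic(s.items[i].topic, topic) {
			return i
		}
	}
	return -1
}

// Set stores or replaces the retained value. It reports false when the
// store is full and the topic is new.
func (s *RetainedStore) Set(topic []string, payload []byte) bool {
	if i := s.find(topic); i >= 0 {
		s.items[i].payload = payload
		return true
	}
	if len(s.items) >= s.limit {
		return false
	}
	s.items = append(s.items, retained{topic: topic, payload: payload})
	return true
}

// Clear implements unretain for one topic.
func (s *RetainedStore) Clear(topic []string) {
	if i := s.find(topic); i >= 0 {
		s.items = append(s.items[:i], s.items[i+1:]...)
	}
}

// Replay emits every retained value, for example when the link comes up.
func (s *RetainedStore) Replay(emit func(topic []string, payload []byte)) {
	for _, it := range s.items {
		emit(it.topic, it.payload)
	}
}
```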


9. Session state on the TinyGo side

I would recommend keeping the session state very small.

At minimum:

  • link status: down / handshaking / up
  • remote node id
  • last hello seen
  • last heartbeat seen
  • pending outgoing calls by correlation id
  • import/export rule tables

That is enough for v1.

Suggested state machine

  1. UART up, no peer state yet
  2. Send hello
  3. On hello from remote, record peer and send hello_ack
  4. On hello_ack, mark link usable
  5. Exchange ping/pong periodically
  6. If decode fails repeatedly or heartbeat expires, mark link down and reset pending calls

We do not need a very elaborate session FSM in v1.
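
The suggested state machine could be sketched in Go roughly as below. This is one possible reading, not a settled design: the event and action names are hypothetical, the threshold of 5 consecutive bad lines is my placeholder for "repeatedly", and a hello received at any time simply triggers a hello_ack while the caller resets peer state.

```go
package main

type LinkState int

const (
	LinkDown LinkState = iota
	LinkHandshaking
	LinkUp
)

type Event int

const (
	EvStart            Event = iota // UART is open; start or restart the handshake
	EvHello                         // hello received from the peer
	EvHelloAck                      // hello_ack received from the peer
	EvMessage                       // any other well-formed message
	EvDecodeError                   // a line failed to decode
	EvHeartbeatExpired              // no useful traffic within the stale window
)

type Action int

const (
	ActNone Action = iota
	ActSendHello
	ActSendHelloAck
	ActResetPending // fail all pending outgoing calls
)

// maxBadLines is a hypothetical threshold for consecutive decode failures.
const maxBadLines = 5

type Session struct {
	State    LinkState
	badLines int
}

// Handle applies one event and returns the action the caller should take.
func (s *Session) Handle(e Event) Action {
	if e != EvDecodeError {
		s.badLines = 0 // only consecutive failures count
	}
	switch e {
	case EvStart:
		s.State = LinkHandshaking
		return ActSendHello
	case EvHello:
		return ActSendHelloAck
	case EvHelloAck:
		if s.State == LinkHandshaking {
			s.State = LinkUp
		}
	case EvDecodeError:
		s.badLines++
		if s.badLines >= maxBadLines {
			s.State = LinkDown
			return ActResetPending
		}
	case EvHeartbeatExpired:
		s.State = LinkDown
		return ActResetPending
	}
	return ActNone
}
```

Recovering from LinkDown is left to the caller here, for example by feeding EvStart again after a back-off.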


10. Error handling rules

These should be the same on both sides.

Invalid JSON line

  • log it
  • discard it
  • do not bring the whole session down immediately unless it keeps happening

Unknown t

  • log it
  • ignore it

Malformed message of known type

  • log it
  • ignore it, or reply with ok=false if it was a call and its id can still be read

Call with no route

  • reply with ok=false, err="no_route"

Local handler failure

  • reply with ok=false, err="<reason>"

Timeout waiting for reply

  • local caller times out
  • clear pending entry
  • treat late reply as unknown and drop it
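
For the caller side, a bounded pending-call table might look like the sketch below. PendingCalls and its methods are hypothetical. Time is passed in as a millisecond tick so the code does not depend on any particular clock source on the MCU.

```go
package main

// PendingCalls tracks outgoing calls by correlation id with a deadline.
type PendingCalls struct {
	limit     int
	deadlines map[string]int64 // id -> absolute deadline in milliseconds
}

func NewPendingCalls(limit int) *PendingCalls {
	return &PendingCalls{limit: limit, deadlines: make(map[string]int64)}
}

// Add registers a call. It reports false if the table is full or the id is
// already in use, in which case the call should not be sent.
func (p *PendingCalls) Add(id string, nowMS, timeoutMS int64) bool {
	if _, dup := p.deadlines[id]; dup || len(p.deadlines) >= p.limit {
		return false
	}
	p.deadlines[id] = nowMS + timeoutMS
	return true
}

// Resolve is called for an incoming reply. It reports false for unknown or
// late replies, which should simply be dropped.
func (p *PendingCalls) Resolve(corr string) bool {
	if _, ok := p.deadlines[corr]; !ok {
		return false
	}
	delete(p.deadlines, corr)
	return true
}

// Expire removes every call whose deadline has passed and returns their ids
// so the local callers can be failed with a timeout.
func (p *PendingCalls) Expire(nowMS int64) []string {
	var expired []string
	for id, deadline := range p.deadlines {
		if nowMS >= deadline {
			expired = append(expired, id)
			delete(p.deadlines, id)
		}
	}
	return expired
}
```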

11. Timeouts

For v1 I would keep timeout handling simple.

Suggested defaults

  • hello / hello_ack expectation: a few seconds
  • ping interval: around 15 seconds
  • link considered stale: perhaps 45 seconds without useful traffic
  • call timeout: use timeout_ms if present, otherwise local default such as 5 seconds

These are not wire-level guarantees. They are local policy.
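
The suggested defaults could live in one small policy struct, as in this sketch. Timeouts and its methods are hypothetical names, and the 3 second hello timeout is my own stand-in for "a few seconds".

```go
package main

import "time"

// Timeouts holds local policy values; none of them are sent on the wire.
type Timeouts struct {
	Hello        time.Duration // how long to wait for hello / hello_ack
	PingInterval time.Duration
	Stale        time.Duration // idle time after which the link is stale
	DefaultCall  time.Duration // used when a call carries no timeout_ms
}

func DefaultTimeouts() Timeouts {
	return Timeouts{
		Hello:        3 * time.Second, // assumption: "a few seconds"
		PingInterval: 15 * time.Second,
		Stale:        45 * time.Second,
		DefaultCall:  5 * time.Second,
	}
}

// CallTimeout returns timeout_ms as a duration, or the local default when
// the field is absent or not positive.
func (t Timeouts) CallTimeout(timeoutMS int) time.Duration {
	if timeoutMS > 0 {
		return time.Duration(timeoutMS) * time.Millisecond
	}
	return t.DefaultCall
}

// IsStale reports whether the link has been idle for too long.
func (t Timeouts) IsStale(idle time.Duration) bool {
	return idle >= t.Stale
}
```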


12. UTF-8 and payload shape

Payloads are JSON values. In practice, use JSON objects for protocol-facing application messages.

Do not try to put binary blobs directly into this v1 control-plane protocol. Firmware transfer is a later subprotocol.

For now, everything over this protocol should be JSON-safe.


13. What the TinyGo side should build first

I would build the TinyGo fabric in these pieces.

UART transport

Responsible for:

  • reading bytes until newline
  • writing one JSON line plus newline
  • surfacing decoded messages upward
  • exposing a send queue downward

Session layer

Responsible for:

  • hello / hello_ack
  • ping / pong
  • pending call map
  • dispatch by t

Router

Responsible for:

  • applying import/export rules
  • mapping incoming pub to local topics
  • mapping incoming call to local RPC handlers
  • forwarding local exported publishes onto the wire

Local integration

Responsible for:

  • publishing imported state into the TinyGo local bus
  • exposing selected local endpoints to the remote side
  • collecting exported retained state on link-up

14. Expectations about local MCU-side capabilities

The MCU side does not need to mirror the Lua service tree exactly.

It just needs to provide equivalent local hooks for:

  • local publish into its internal bus
  • local retained update and clear
  • local directed call handling
  • local exported retained state replay on reconnect

So the TinyGo side fabric should be an adapter between:

  • UART session protocol
  • and the MCU’s own local microservice/runtime environment

15. Important thing that is deliberately not in v1

Firmware transfer is not part of this first control protocol.

We are intentionally keeping v1 to normal control-plane traffic first. Once the control path is working and stable, we will add a bulk-transfer subprotocol.

That later protocol will probably still ride over the same UART, but it will not just be “a big JSON message”.

It will need:

  • begin
  • ready
  • need
  • chunk
  • done
  • abort

and a different framing story.

So let's keep our session design open for adding another message class later.


16. Practical examples

16.1 CM5 announces retained config to MCU

Lua sends:

{"t":"pub","topic":["config","device"],"payload":{"schema":"devicecode.mcu/1","rev":3,"data":{"mode":"normal"}},"retain":true}

TinyGo maps this to its local config topic and treats it as retained current desired config.

16.2 MCU publishes health to CM5

TinyGo sends:

{"t":"pub","topic":["state","mcu","health"],"payload":{"ok":true,"temp_c":41.2},"retain":true}

Lua imports it under:

  • peer/mcu-1/state/mcu/health

or whatever mapping is configured.

16.3 CM5 calls remote MCU method

Lua sends:

{"t":"call","id":"1234","topic":["rpc","mcu","reboot_to_bootloader"],"payload":{"reason":"update"},"timeout_ms":5000}

TinyGo executes local handler, then sends:

{"t":"reply","corr":"1234","ok":true,"payload":{"accepted":true}}

17. Suggested TinyGo implementation constraints

Given MCU constraints, I would suggest:

  • fixed maximum line length for v1 control messages
  • bounded pending-call table
  • bounded outgoing queue
  • clear rejection on oversize input
  • no dynamic remapping expression language beyond simple prefix wildcards
  • no attempt to buffer unlimited retained state

Keep it small and predictable.

For the first UART MCU link, that is more important than generality.
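
A bounded outgoing queue with explicit rejection can be sketched with a buffered channel. SendQueue, Enqueue and the two error values are hypothetical names, and the depth and maximum line length are local choices.

```go
package main

import "errors"

var (
	ErrOversize  = errors.New("fabric: line exceeds maximum length")
	ErrQueueFull = errors.New("fabric: outgoing queue full")
)

// SendQueue is a fixed-depth queue of encoded lines awaiting the UART writer.
type SendQueue struct {
	ch      chan []byte
	maxLine int
}

func NewSendQueue(depth, maxLine int) *SendQueue {
	return &SendQueue{ch: make(chan []byte, depth), maxLine: maxLine}
}

// Enqueue never blocks: it either accepts the line or rejects it clearly.
func (q *SendQueue) Enqueue(line []byte) error {
	if len(line) > q.maxLine {
		return ErrOversize
	}
	select {
	case q.ch <- line:
		return nil
	default:
		return ErrQueueFull
	}
}

// Lines is drained by whatever goroutine writes to the UART.
func (q *SendQueue) Lines() <-chan []byte { return q.ch }
```

A call that cannot be queued can be failed locally straight away instead of waiting for its timeout.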


18. One final point on compatibility

The Lua side may later switch the transport framing from JSON-lines to a binary packet format, but the message meanings should remain the same.

So if we structure our TinyGo side as:

  • transport/framing
  • message decode/encode
  • session/router logic

then the later migration should be manageable. Only the transport/framing and serialisation layers change; the session semantics do not.
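
One way to keep that seam explicit in Go is a small codec interface that the session and router code depend on. Codec and JSONLines are hypothetical names, and a later binary format would be another implementation of the same interface.

```go
package main

import "encoding/json"

// Codec turns one message into one frame and back. Session and router code
// would use only this interface, never JSON or newlines directly.
type Codec interface {
	Encode(msg map[string]interface{}) ([]byte, error)
	Decode(frame []byte) (map[string]interface{}, error)
}

// JSONLines is the v1 codec: one compact JSON object terminated by '\n'.
type JSONLines struct{}

func (JSONLines) Encode(msg map[string]interface{}) ([]byte, error) {
	out, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	return append(out, '\n'), nil
}

func (JSONLines) Decode(frame []byte) (map[string]interface{}, error) {
	var msg map[string]interface{}
	// json.Unmarshal tolerates the trailing newline as whitespace.
	if err := json.Unmarshal(frame, &msg); err != nil {
		return nil, err
	}
	return msg, nil
}

// Compile-time check that JSONLines satisfies Codec.
var _ Codec = JSONLines{}
```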


19. Minimal implementation checklist

For our first milestone, I would aim for this exact behaviour:

  • UART open
  • send hello
  • respond to hello with hello_ack
  • respond to ping with pong
  • accept incoming pub
  • send outgoing pub
  • accept incoming call
  • return reply
  • send outgoing call
  • match incoming reply to pending map
  • apply static import/export rules
  • log and drop malformed messages safely

Once that works reliably, we can add firmware transfer.
