Hi MLCommons Tiny folks,
I wanted to share a small but unusual MCU language-runtime experiment and ask whether systems like this suggest a benchmark gap in the current Tiny landscape.
We built a public demo line called Engram and deployed it on a commodity ESP32-C3.
Current public numbers:

Host-side benchmark capability:
- LogiQA = 0.392523
- IFEval = 0.780037

Published board proof:
- LogiQA 642 = 249 / 642 = 0.3878504672897196
- host_full_match = 642 / 642
- 1,380,771 bytes

Important scope note:
This is not presented as unrestricted, open-input, native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum-style execution over precompiled structures
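To make the shape of such a runtime concrete, here is a minimal sketch of a packed, hashed, fixed-probe lookup table with a streaming checksum over the artifact. Every name, format, and constant here is hypothetical; none of it is taken from the Engram repo.

```python
import struct
import zlib

TABLE_SLOTS = 8   # hypothetical flash table size (power of two)
PROBES = 4        # fixed compiled probe batch: always examine 4 slots

def hash_u16(token_id: int) -> int:
    # Cheap multiplicative hash; MCU-friendly, no division.
    return (token_id * 40503) & 0xFFFF

def pack_entry(token_id: int, weight_q8: int) -> bytes:
    # Pack a token id and an 8-bit quantized weight into 4 bytes.
    return struct.pack("<HBx", token_id, weight_q8)

def build_table(entries):
    # Lay entries out in a fixed-size open-addressed table,
    # standing in for a precompiled flash-resident image.
    table = bytearray(TABLE_SLOTS * 4)
    for token_id, weight in entries:
        for probe in range(PROBES):
            off = ((hash_u16(token_id) + probe) % TABLE_SLOTS) * 4
            if table[off:off + 4] == b"\x00\x00\x00\x00":
                table[off:off + 4] = pack_entry(token_id, weight)
                break
    return bytes(table)

def lookup(table: bytes, token_id: int):
    # Fixed probe batch: examine exactly PROBES slots with no early
    # exit, so the access pattern is data-independent.
    found = None
    for probe in range(PROBES):
        slot = (hash_u16(token_id) + probe) % TABLE_SLOTS
        tid, weight = struct.unpack_from("<HBx", table, slot * 4)
        if tid == token_id:
            found = weight
    return found

def fold_checksum(table: bytes) -> int:
    # Streaming fold over the precompiled structure, e.g. as a board
    # proof that the deployed artifact matches the published bytes.
    return zlib.crc32(table)
```

The fixed probe count is the detail that matters for benchmarking: it makes lookup latency independent of the query, which is what would make board-side measurements reproducible.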
So this is not a standard vision/KWS/anomaly-detection micro model. It is closer to a task-specialized language runtime whose behavior has been pushed into a very compact executable form.
Repo:
https://github.com/Alpha-Guardian/Engram
What I’m genuinely curious about is whether systems like this point to a missing benchmark category in the TinyML / MCU benchmark ecosystem.
Would something like the following make sense as a future benchmark direction?
- constrained language-task execution
- auditable board-measured language behavior
- fixed-memory / fixed-artifact board deployment
- explicit separation between host benchmark capability and board execution mode
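On the host/board separation point, one concrete form a harness could take is a transcript comparison: run the item set on the host reference, run it on the board, and report an exact-match fraction plus a digest for auditability. A minimal sketch, with hypothetical names that are not from any existing MLCommons harness:

```python
import hashlib

def transcript_digest(outputs):
    # Canonical digest over an ordered list of per-item outputs,
    # so a board run can be audited against a host reference run.
    h = hashlib.sha256()
    for item in outputs:
        h.update(item.encode("utf-8"))
        h.update(b"\x00")  # unambiguous item separator
    return h.hexdigest()

def full_match(host_outputs, board_outputs):
    # host_full_match-style metric: how many items the board
    # reproduced exactly, out of the total item count.
    matches = sum(1 for h_out, b_out in zip(host_outputs, board_outputs)
                  if h_out == b_out)
    return matches, len(host_outputs)
```

A board that publishes its transcript digest alongside the artifact bytes would let third parties re-derive a figure like the 642 / 642 match above without rerunning the hardware.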
If people here think this is out of scope for MLCommons Tiny, that would also be useful to know.