Updated: 2026-03-02 — v0.19.5 (post register allocator overhaul)
Full pipeline: MinZ source -> parser -> semantic -> MIR optimizer -> fusion optimizer -> Z80 codegen -> asm peephole -> MZA -> MZX emulator. All 11 tests PASS with hex-verified output.
| Test | Chain | Expected | Hex | Status |
|---|---|---|---|---|
iter_foreach |
arr.forEach(console_log) |
ABCDE\n |
41424344450a |
PASS |
iter_take |
arr.iter().take(3).forEach(console_log) |
ABC\n |
4142430a |
PASS |
iter_skip |
arr.iter().skip(2).forEach(console_log) |
CDE\n |
4344450a |
PASS |
iter_map_foreach |
arr.iter().map(double).forEach(console_log) |
doubled bytes | 020406080a0a |
PASS |
iter_filter_foreach |
arr.filter(is_big).forEach(console_log) |
DE\n |
44450a |
PASS |
iter_inline_filter |
arr.filter(|x| x > 67).forEach(console_log) |
DE\n |
44450a |
PASS |
iter_lambda_map |
arr.iter().map(|x| x+1).forEach(console_log) |
BCDEF\n |
42434445460a |
PASS |
iter_map_filter_foreach |
arr.map(double).filter(|x|x>5).forEach(log) |
6,8,10,\n |
06080a0a |
PASS |
iter_filter_map_foreach |
arr.filter(|x|x>=67).map(add_one).forEach(log) |
DEF\n |
4445460a |
PASS |
iter_take_map_foreach |
arr.take(3).map(add_one).forEach(log) |
BCD\n |
4243440a |
PASS |
iter_bare_djnz |
arr.filter(|x| x > 3).forEach(console_log) |
\x04\x05\n |
04050a |
PASS |
Run: bash examples/e2e_iterators/run_e2e.sh
Measured via exact T-state counting in the Z80 emulator (E2E harness, regalloc_quality_test.go).
| Pattern | Before (v0.19.4) | After (v0.19.5) | Ideal | Speedup |
|---|---|---|---|---|
forEach(call) |
~207T/elem | 26T/elem | ~43T | 7.8x |
map(fn).forEach(call) |
~280T/elem | 36T/elem | ~60T | ~7.6x |
filter(|x|x>N).forEach(call) |
~280T/elem | 26T/elem | ~30T | ~10x |
The per-element costs are now below the ideal hand-written target from the original plan (64T).
- Hint-aware allocation — allocator respects
RegHintB(counter),RegHintHL(pointer),RegHintA(element) - Dynamic A/B tracking —
loadToA()/loadToB()skip loads when the register already has the value - Memory round-trip elimination — POP restores tracking, void returns don't store, INC HL stays tracked
- DJNZ with BC save/restore — PUSH BC/POP BC around calls, enabling bare DJNZ
- Call-clobber-aware invalidation — per-function
ModifiedRegistersbitmask; only clear what callee actually clobbers - Cycle-count regression tests — 3 deterministic tests with thresholds
See Register Allocator Overhaul Report for full before/after assembly and pipeline walkthrough.
All listings below are real compiler output — generated by mz --dump-mir and mz -o file.a80.
MinZ source:
fun console_log(c: u8) -> void {
asm {
OUT ($23), A
}
}
fun main() -> void {
let arr: [u8; 5] = [65, 66, 67, 68, 69];
arr.forEach(console_log);
asm {
DI
HALT
}
}
MIR (--dump-mir):
Function listing_foreach.console_log$u8(c: u8) -> void
@smc
Instructions:
0: asm { OUT ($23), A }
1: return
Function listing_foreach.main() -> void
@smc
Locals:
r1 = arr: [5]u8
Instructions:
0: r2 = array_literal ; Array literal [[65 66 67 68 69]] -> DB directive
1: r8 = load arr
2: nop ; DJNZ OPTIMIZED LOOP for array[5]
3: r9 = 5 ; DJNZ counter = 5
4: r10 = r8 ; Pointer to array start
5: djnz_loop_1:
6: r11 = *r10 ; Load element via pointer
7: push r10 ; Save array pointer before operations
8: push r9 ; Save DJNZ counter before operations (BC preserved across CALL)
9: r12 = call listing_foreach.console_log$u8
10: r9 = pop ; Restore DJNZ counter after operations
11: r10 = pop ; Restore array pointer after operations
12: r10++ ; Advance to next element
13: djnz r9, djnz_loop_1 ; DJNZ - decrement and loop
14: asm { DI / HALT }
15: return
Z80 assembly (loop body only, -o file.a80):
listing_foreach_main_djnz_loop_1_i1:
; Load element via pointer
LD HL, ($F014) ; Virtual register 10 from memory
LD A, (HL)
LD ($F016), A ; Virtual register 11 to memory
; Save array pointer before operations
; Register 10 already in HL (tracked)
PUSH HL ; Save array pointer before operations
; Save DJNZ counter before operations (BC preserved across CALL)
LD HL, ($F012) ; Virtual register 9 from memory
PUSH HL ; Save DJNZ counter before operations (BC preserved across CALL)
; Call listing_foreach.console_log_u8
; Register-based parameter passing
LD A, ($F016) ; Virtual register 11 from memory
; Parameter c in A
CALL listing_foreach_console_log_u8
LD ($F018), HL ; Virtual register 12 to memory
; Restore DJNZ counter after operations (BC preserved across CALL)
POP HL ; Restore DJNZ counter
LD ($F012), HL ; Virtual register 9 to memory
; Restore array pointer after operations
POP HL ; Restore array pointer
LD ($F014), HL ; Virtual register 10 to memory
; Advance to next element
LD HL, ($F014) ; Virtual register 10 from memory
INC HL
LD ($F014), HL ; Virtual register 10 to memory
; DJNZ - decrement and loop
LD A, ($F012) ; Virtual register 9 from memory
LD B, A
DEC B
LD A, B
LD ($F012), A ; Virtual register 9 to memory
JR NZ, listing_foreach_main_djnz_loop_1_i1
; console_log — 2 instructions, 0 memory round-trips:
listing_foreach_console_log_u8:
OUT ($23), A
RETMinZ source:
fun add_one(x: u8) -> u8 {
return x + 1;
}
fun console_log(c: u8) -> void {
asm {
OUT ($23), A
}
}
fun main() -> void {
let arr: [u8; 5] = [65, 66, 67, 68, 69];
arr.map(add_one).forEach(console_log);
asm {
DI
HALT
}
}
MIR (--dump-mir) — note add_one is fused inline, no OpCall for it:
Function listing_map.add_one$u8(x: u8) -> u8
@smc
Instructions:
0: r2 = param x
1: r3 = 1
2: r4 = r2 + r3
3: return r4
Function listing_map.main() -> void
@smc
Locals:
r1 = arr: [5]u8
Instructions:
0: r2 = array_literal ; Array literal [[65 66 67 68 69]] -> DB directive
1: r8 = load arr
2: nop ; DJNZ OPTIMIZED LOOP for array[5]
3: r9 = 5 ; DJNZ counter = 5
4: r10 = r8 ; Pointer to array start
5: djnz_loop_1:
6: r11 = *r10 ; Load element via pointer
7: push r10 ; Save array pointer before operations
8: push r9 ; Save DJNZ counter before operations
9: r1 = r11 ; Fused: param x <- r11
10: r2 = 1 ; Fused from listing_map.add_one$u8
11: r3 = r1 + r2 ; Fused from listing_map.add_one$u8
12: r12 = r3 ; Fused: return value
13: r13 = call listing_map.console_log$u8
14: r9 = pop ; Restore DJNZ counter
15: r10 = pop ; Restore array pointer
16: r10++ ; Advance to next element
17: djnz r9, djnz_loop_1
18: asm { DI / HALT }
19: return
Z80 assembly (loop body only):
listing_map_main_djnz_loop_1_i1:
; Load element via pointer
LD HL, ($F014) ; Virtual register 10 from memory
LD A, (HL)
LD ($F016), A ; Virtual register 11 to memory
; Save array pointer before operations
; Register 10 already in HL (tracked)
PUSH HL ; Save array pointer before operations
; Save DJNZ counter before operations (BC preserved across CALL)
LD HL, ($F012) ; Virtual register 9 from memory
PUSH HL ; Save DJNZ counter before operations
; Fused from listing_map.add_one_u8
LD A, 1
LD ($F004), A ; Virtual register 2 to memory
; Fused from listing_map.add_one_u8
LD HL, ($F016) ; Virtual register 11 from memory
LD D, H
LD E, L
LD HL, 1 ; Constant
ADD HL, DE
LD ($F006), HL ; Virtual register 3 to memory
; Fused: return value
LD HL, ($F006) ; Virtual register 3 from memory
LD ($F018), HL ; Virtual register 12 to memory
; Call listing_map.console_log_u8
; Register-based parameter passing
LD A, ($F018) ; Virtual register 12 from memory
; Parameter c in A
CALL listing_map_console_log_u8
LD ($F01A), HL ; Virtual register 13 to memory
; Restore DJNZ counter after operations
POP HL
LD ($F012), HL
; Restore array pointer after operations
POP HL
LD ($F014), HL
; Advance to next element
LD HL, ($F014) ; Virtual register 10 from memory
INC HL
LD ($F014), HL ; Virtual register 10 to memory
; DJNZ - decrement and loop
LD A, ($F012) ; Virtual register 9 from memory
LD B, A
DEC B
LD A, B
LD ($F012), A ; Virtual register 9 to memory
JR NZ, listing_map_main_djnz_loop_1_i1
; add_one — dead code (fused inline), still emitted:
listing_map_add_one_u8:
LD A, #00 ; Parameter x (gets patched)
...
RETNote: add_one is fused into the loop (lines 9-12 in MIR, "Fused from" comments in asm). But the original function definition is still emitted as dead code (lines 89-109 in full output). The fusion uses 16-bit ADD HL, DE for what should be INC A — a known u8-as-u16 codegen issue.
MinZ source:
fun console_log(c: u8) -> void {
asm {
OUT ($23), A
}
}
fun main() -> void {
let arr: [u8; 5] = [65, 66, 67, 68, 69];
arr.filter(|x| x > 66).forEach(console_log);
asm {
DI
HALT
}
}
MIR (--dump-mir) — note inline filter at instruction 10:
Function listing_filter.main() -> void
@smc
Locals:
r1 = arr: [5]u8
Instructions:
0: r2 = array_literal ; Array literal [[65 66 67 68 69]] -> DB directive
1: r8 = load arr
2: nop ; DJNZ OPTIMIZED LOOP for array[5]
3: r9 = 5 ; DJNZ counter = 5
4: r10 = r8 ; Pointer to array start
5: djnz_loop_1:
6: r11 = *r10 ; Load element via pointer
7: push r10 ; Save array pointer before operations
8: push r9 ; Save DJNZ counter before operations
9: r12 = 67 ; Inline filter constant = 67
10: jump_if_flag CY -> filter_continue_2 ; Inline filter: CP 67 + JR CY
11: r13 = call listing_filter.console_log$u8
12: filter_continue_2:
13: r9 = pop ; Restore DJNZ counter
14: r10 = pop ; Restore array pointer
15: r10++ ; Advance to next element
16: djnz r9, djnz_loop_1
17: asm { DI / HALT }
18: return
Z80 assembly (loop body only):
listing_filter_main_djnz_loop_1_i1:
; Load element via pointer
LD HL, ($F014) ; Virtual register 10 from memory
LD A, (HL)
LD ($F016), A ; Virtual register 11 to memory
; Save array pointer before operations
; Register 10 already in HL (tracked)
PUSH HL ; Save array pointer before operations
; Save DJNZ counter before operations (BC preserved across CALL)
LD HL, ($F012) ; Virtual register 9 from memory
PUSH HL ; Save DJNZ counter before operations
; Inline filter constant = 67
LD A, 67
LD ($F018), A ; Virtual register 12 to memory
; Inline filter: CP 67 + JR CY
LD A, ($F016) ; Virtual register 11 from memory
CP 67
JR C, listing_filter_main_filter_continue_2_i1
; Call listing_filter.console_log_u8
; Register 11 already in A (tracked) <-- dynamic tracking!
; Parameter c in A
CALL listing_filter_console_log_u8
LD ($F01A), HL ; Virtual register 13 to memory
; filter_continue_2:
listing_filter_main_filter_continue_2_i1:
; Restore DJNZ counter after operations
POP HL
LD ($F012), HL
; Restore array pointer after operations
POP HL
LD ($F014), HL
; Advance to next element
LD HL, ($F014) ; Virtual register 10 from memory
INC HL
LD ($F014), HL ; Virtual register 10 to memory
; DJNZ - decrement and loop
LD A, ($F012) ; Virtual register 9 from memory
LD B, A
DEC B
LD A, B
LD ($F012), A ; Virtual register 9 to memory
JR NZ, listing_filter_main_djnz_loop_1_i1Key: |x| x > 66 becomes CP 67; JR C, continue (no function call). The CP instruction sets carry if A < 67, which is the inverse of x > 66. After CP, A still holds the element, so the dynamic tracking comment "Register 11 already in A" means no reload is needed for the console_log CALL argument.
take(n)counter folding —LD B, ninstead ofLD B, array.lengthskip(n)pointer offset —ADD HL, DEat init, no skip loop- Inline filter lambda —
CP N+JR Cinstead of CALL + OR A + JR Z - Fusion inlining — eliminates CALL/RET for simple callbacks
ADD A, Astrength reduction —x * 2is 4T instead of 8T (SLA A)
| Operation | Parser | Semantic | Z80 | E2E | Notes |
|---|---|---|---|---|---|
forEach(fn) |
pass | pass | correct | PASS | PUSH/POP HL+BC around CALL |
take(n) |
pass | pass | correct | PASS | Counter folded to LD B, n |
skip(n) |
pass | pass | correct | PASS | Pointer offset + counter adjust |
map(fn) |
pass | pass | correct | PASS | Fusion inlines simple fns |
filter(fn) |
pass | pass | correct | PASS | Named predicate + forEach |
filter(|x| x>N) |
pass | pass | correct | PASS | Inline CP N+1 + JR C |
map(|x| x+N) |
pass | pass | correct | PASS | Lambda inlined |
map+filter+forEach |
pass | pass | correct | PASS | Multi-stage chain |
filter+map+forEach |
pass | pass | correct | PASS | Multi-stage chain |
take+map+forEach |
pass | pass | correct | PASS | Multi-stage chain |
peek(fn) |
pass | pass | compiles | untested | Same path as forEach |
inspect(fn) |
pass | pass | compiles | untested | Alias for peek |
takeWhile(fn) |
pass | pass | compiles | untested | Stack hazard — early exit after PUSH |
enumerate() |
pass | pass | compiles | BROKEN | B used for both counter and index |
reduce(fn) |
pass | pass | compiles | BROKEN | Two LD A — second overwrites first |
skipWhile(fn) |
pass | pass | N/A | N/A | Not in DJNZ mode |
LD B, 5 ; B = loop counter
.loop:
LD A, B
INC A
LD B, A ; B = index (now 6?!)
; ...
DEC B ; decrement... what? counter? index?
JR NZ, .loop ; near-infinite loopB cannot serve two purposes. Needs separate index register (C or memory-backed).
sum_u8_u8:
LD A, 0 ; SMC: acc (will be patched)
LD A, 0 ; SMC: x — OVERWRITES acc!
ADD HL, DE ; 16-bit add on unloaded registers
RETTwo TRUE SMC loads both target A. Second wipes first.
If predicate fails and loop exits early, PUSH HL has no matching POP. Stack
corrupted by 2 bytes per iteration until exit. Needs POP HL before exit jump.
When fusion optimizer inlines a callback (e.g., double), the function definition
is still emitted in the output. Wastes bytes but doesn't affect correctness.
Semantic layer emits PUSH/POP HL based on hasCallOps at IR time. Fusion optimizer
removes CALLs after PUSH/POP are already in the instruction stream. Result: PUSH/POP
with no CALL between them — pure waste (21T/iteration).
- Separate loop counter from enumerate index (different registers)
- Fix reduce: two SMC params can't both target A
- Add E2E tests for both
- Fusion optimizer should remove PUSH/POP when it eliminates all CALLs
- Enables true bare DJNZ path (currently unreachable in E2E tests)
- Emit
POP HLbefore early-exit jump
- The allocator assigns registers but codegen still stores to
$F0xxdefensively - When pointer IS in HL, don't store to
$F014after every use - When counter IS in B, don't round-trip through
$F012
- Fused
add_onecurrently uses 16-bitADD HL, DEinstead ofINC A - Type-aware codegen should use 8-bit paths for u8
- After fusion inlines a function, mark it unused if no other callers
- Remove from output
Source -> Parser -> Semantic -> MIR -> Optimizer -> Z80 Codegen -> Asm Peephole -> MZA
| | | | |
| | | | +-- OpDJNZ -> DEC B + JR NZ
| | | | (or bare DJNZ if no CALL)
| | | +-- Fusion: inline callbacks in DJNZ loops
| | | +-- DCE, copy prop, constant fold
| | +-- OpLoad, OpCall, OpDJNZ, OpJumpIfFlag
| +-- generateDJNZIteration(): counter + pointer + PUSH/POP
+-- tryConvertIteratorChain(): method chain -> IteratorChainExpr
| File | Role |
|---|---|
pkg/ast/iterator.go |
IteratorChainExpr, IteratorOp (Function + Argument fields) |
pkg/parser/participle/convert.go |
tryConvertIteratorChain() — CallExpr -> IteratorChainExpr |
pkg/semantic/iterator.go |
Type checking, DJNZ/indexed dispatch, lambda extraction |
pkg/semantic/iterator_enhanced.go |
Enhanced DJNZ with take/skip/enumerate/reduce |
pkg/codegen/z80.go |
Z80 codegen + register tracking (~5500 lines) |
pkg/codegen/register_allocator.go |
Hint-aware linear scan allocation (~500 lines) |
pkg/codegen/register_allocator_test.go |
Allocation unit tests |
pkg/optimizer/fusion.go |
Fusion optimizer — DJNZ loop detection + callback inlining |
pkg/optimizer/assembly_peephole.go |
Asm peephole (pattern 48: DEC B + JR NZ -> DJNZ) |
pkg/optimizer/dead_code_elimination.go |
DCE — fixed OpJumpIfFlag/OpPush marking |
pkg/z80testing/regalloc_quality_test.go |
Cycle-count regression tests |
| Package | Tests | What |
|---|---|---|
examples/e2e_iterators/ |
11 | Full pipeline E2E: compile -> assemble -> emulate -> hex verify |
tests/iterator_corpus/ |
18 | Full pipeline regression — all 18 compile to Z80 |
pkg/optimizer/ |
19 | Peephole, DCE, superoptimizer, normalization, fusion (7) |
pkg/parser/participle/ |
20 | Chain conversion, argument routing |
pkg/semantic/ |
17 | DJNZ eligibility, filter/map/lambda IR |
pkg/mirvm/ |
8 | DJNZ execution at MIR level |
pkg/codegen/ |
5+ | Z80 assembly patterns + register allocator tests |
pkg/z80testing/ |
3 | Cycle-count regression tests (deterministic) |
| Version | Bug | Fix |
|---|---|---|
| v0.19.4 | HL clobbered by CALL return value | PUSH/POP HL around CALL blocks |
| v0.19.4 | TRUE SMC anchor: LD A,(nn) instead of LD A,N |
Changed to opcode 3E (patchable immediate) |
| v0.19.4 | MIR peephole read inst.Value not inst.Imm |
Fixed trackConstants() to use inst.Imm |
| v0.19.4 | prepareCallArguments() skipped all SMC |
Changed to only skip TRUE SMC (UsesTrueSMC) |
| v0.19.4 | Lambda NumParams=0 after extraction |
Use AddParamWithRegister() |
| v0.19.4 | Inliner assumed params at regs 1,2,... | Use fn.Params[i].Reg |
| v0.19.4 | Copy propagation across labels | Clear copy map at every OpLabel |
| v0.19.5 | DCE removed filter constant (OpJumpIfFlag) |
Added OpJumpIfFlag to Phase 1 marking |
| v0.19.5 | BareDJNZ hint + peephole pattern 48 | DEC B; JR NZ -> DJNZ when no CALL |
| v0.19.5 | findSymbol non-determinism | Prefer shortest match (function entry, not sub-labels) |
| v0.19.5 | Register allocator overhaul (6 phases) | Hint-aware, dynamic tracking, call-clobber-aware |
- Register Allocator Overhaul Report — full pipeline walkthrough with assembly
- Iterator Reality Check — pre-overhaul honest assessment
- ADR-0008: Flag-Based Boolean ABI
- ADR-0009: Superoptimizer Peephole Rules
- GenPlan — canonical development roadmap
- Archived docs — old iterator design docs and reports