Context
When Meld lowers P3 async components, it generates callback trampolines and stream buffer access code. These generated patterns are optimization targets for Loom — they're mechanical, repetitive, and have known structure.
Optimization Opportunities
1. Callback Dispatch Optimization
Meld generates switch-on-event-type dispatch in callback trampolines:
(func $__callback (param $event_type i32) (param $payload i32)
  (if (i32.eq (local.get $event_type) (i32.const 2)) ;; STREAM_READ
    (then (call $handle_stream_read (local.get $payload)))
  )
  (if (i32.eq (local.get $event_type) (i32.const 4)) ;; FUTURE_READ
    (then (call $handle_future_read (local.get $payload)))
  )
)
Loom can rewrite this as a branch table (br_table in Wasm), replacing up to n sequential comparisons with a single indexed branch, so dispatch cost drops from O(n) to O(1).
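As a rough sketch, the branch-table form of the trampoline above could look like the following. The empty handler bodies are placeholders, and the block layout is only one possible encoding, not necessarily what Loom emits.

```wat
(module
  ;; Placeholder handlers so the module is self-contained.
  (func $handle_stream_read (param $payload i32))
  (func $handle_future_read (param $payload i32))

  (func $__callback (param $event_type i32) (param $payload i32)
    (block $done
      (block $future_read
        (block $stream_read
          (br_table
            $done         ;; 0: no handler
            $done         ;; 1: no handler
            $stream_read  ;; 2: STREAM_READ
            $done         ;; 3: no handler
            $future_read  ;; 4: FUTURE_READ
            $done         ;; default: any other value
            (local.get $event_type)))
        ;; Reached only when event_type == 2.
        (call $handle_stream_read (local.get $payload))
        (br $done))
      ;; Reached only when event_type == 4.
      (call $handle_future_read (local.get $payload)))))
```

Event types without a handler branch straight to $done, which matches the original if-chain, where unmatched values fall through without calling anything.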
2. Stream Buffer Access Inlining
Stream read/write patterns always follow:
- Call stream_read intrinsic → get pointer + length
- Process elements in a loop
- Optionally call stream_write to produce output
Loom can:
- Inline the bounds check when buffer size is statically known
- Vectorize element processing loops (SIMD where applicable)
- Eliminate redundant bounds checks in read-then-write patterns
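To make the redundant-check case concrete, here is a minimal sketch of an element loop that contains a check Loom could prove unnecessary. The function $sum_bytes and its ($ptr, $len) parameters are hypothetical stand-ins for the pointer and length returned by a stream read. The actual intrinsic signatures are defined by the runtime and are not shown here.

```wat
(module
  (memory 1)

  ;; Hypothetical consumer: sums $len bytes starting at $ptr.
  (func $sum_bytes (param $ptr i32) (param $len i32) (result i32)
    (local $i i32)
    (local $acc i32)
    (block $exit
      (loop $next
        ;; Loop guard: exit once i >= len.
        (br_if $exit (i32.ge_u (local.get $i) (local.get $len)))

        ;; Redundant per-element check: the guard above already proves
        ;; i < len at this point, so this trap can never fire and is a
        ;; candidate for removal.
        (if (i32.ge_u (local.get $i) (local.get $len))
          (then (unreachable)))

        ;; acc += mem[ptr + i]
        (local.set $acc
          (i32.add
            (local.get $acc)
            (i32.load8_u (i32.add (local.get $ptr) (local.get $i)))))

        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $next)))
    (local.get $acc)))
```

Removing the inner check is safe only because neither $i nor $len changes between the guard and the check. That is the kind of fact the verification step would need to establish.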
3. Dead Event Handler Elimination
If a component only subscribes to STREAM_READ events, the FUTURE_READ and SUBTASK handlers are dead code. Loom's DCE pass can eliminate them, but needs to understand that the callback is only invoked with subscribed event types.
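For illustration, if Loom knew that only STREAM_READ (event type 2) can reach the callback, the trampoline from section 1 might reduce to something like this. How that subscription fact reaches Loom is not specified in this issue and is treated as an assumption here.

```wat
(module
  ;; Placeholder handler so the module is self-contained.
  (func $handle_stream_read (param $payload i32))

  ;; $handle_future_read and its dispatch arm are gone: under the
  ;; assumed invariant, event_type is never 4 when this is called.
  (func $__callback (param $event_type i32) (param $payload i32)
    (if (i32.eq (local.get $event_type) (i32.const 2)) ;; STREAM_READ
      (then (call $handle_stream_read (local.get $payload))))))
```

The remaining comparison is kept in this sketch as a conservative choice. Under the same invariant it could also be folded away, leaving an unconditional call.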
4. Backpressure Fast Path
When a component sets task.backpressure(false) (common case — accepting work), the backpressure check in the host call path can be eliminated. Loom can propagate this constant.
Z3 Verification
All optimizations on callback trampolines must be Z3-verified:
- Dispatch correctness: optimized dispatch reaches the same handler as original
- Stream buffer safety: bounds checks are preserved or provably unnecessary
- DCE safety: eliminated handlers are genuinely unreachable
Connects to
- loom#71: Island-model optimization (try different callback optimization strategies)
- meld#94: P3 lowering (generates the code Loom optimizes)
- kiln#230: P3 runtime (defines the intrinsic semantics)
Priority
Medium — follows P3 lowering in Meld. Optimization of generated code.