Optimize P3 async callback trampolines and stream buffer access patterns #75

## Context

When Meld lowers P3 async components, it generates callback trampolines and stream buffer access code. These generated patterns are optimization targets for Loom — they're mechanical, repetitive, and have known structure.

## Optimization Opportunities

### 1. Callback Dispatch Optimization
Meld generates switch-on-event-type dispatch in callback trampolines:
```wasm
(func $__callback (param $event_type i32) (param $payload i32)
    (if (i32.eq (local.get $event_type) (i32.const 2))  ;; STREAM_READ
        (then (call $handle_stream_read (local.get $payload)))
    )
    (if (i32.eq (local.get $event_type) (i32.const 4))  ;; FUTURE_READ
        (then (call $handle_future_read (local.get $payload)))
    )
)
```

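Loom can lower this chain to a single `br_table` indexed by `$event_type`, replacing up to n sequential comparisons with one indexed branch. WebAssembly has no computed goto, so `br_table` is the available jump-table construct. Unmatched and out-of-range event types must route to a default target that does nothing, matching the fall-through behavior of the original chain.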
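As a rough model of this rewrite, the Rust sketch below contrasts the comparison chain with a table lookup that has an out-of-range default, which is the behavior `br_table` provides. The `Handler` enum, the function names, and the `TABLE` layout are illustrative assumptions, not Meld or Loom output; only the event codes 2 and 4 come from the trampoline above.

```rust
/// Handlers the trampoline can reach. `None` at a call site models
/// "no handler ran", i.e. falling through every `if`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Handler {
    StreamRead,
    FutureRead,
}

const STREAM_READ: u32 = 2;
const FUTURE_READ: u32 = 4;

/// Original shape: one comparison per event type, evaluated in sequence.
fn dispatch_chain(event_type: u32) -> Option<Handler> {
    let mut hit = None;
    if event_type == STREAM_READ {
        hit = Some(Handler::StreamRead);
    }
    if event_type == FUTURE_READ {
        hit = Some(Handler::FutureRead);
    }
    hit
}

/// Optimized shape: a dense table indexed by event type.
/// Gaps (0, 1, 3) hold `None`, standing in for the default branch target.
const TABLE: [Option<Handler>; 5] = [
    None,
    None,
    Some(Handler::StreamRead),
    None,
    Some(Handler::FutureRead),
];

/// Out-of-range indices fall back to `None`, like `br_table`'s default label.
fn dispatch_table(event_type: u32) -> Option<Handler> {
    TABLE.get(event_type as usize).copied().flatten()
}
```

The two functions should agree on every input, including event types that have no handler. That agreement is the dispatch-correctness property the verification step would need to establish for the real WebAssembly code.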

### 2. Stream Buffer Access Inlining
Stream read/write patterns always follow:
1. Call `stream_read` intrinsic → get pointer + length
2. Process elements in a loop
3. Optionally call `stream_write` to produce output

Loom can:
- Inline the bounds check and fold it to a constant when the buffer size is statically known
- Vectorize element processing loops (SIMD where applicable)
- Eliminate redundant bounds checks in read-then-write patterns
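To make the bounds-check point concrete, here is a minimal Rust sketch that models linear memory as a byte slice and compares a per-element check against a single check hoisted out of the loop. The function names and the summing workload are hypothetical stand-ins; the real `stream_read` intrinsic and its buffer layout are defined by the runtime and are not modeled here.

```rust
/// Per-element form: every access re-checks the bounds of linear memory.
fn sum_per_element(memory: &[u8], ptr: usize, len: usize) -> Option<u32> {
    let mut total: u32 = 0;
    for i in 0..len {
        let addr = ptr.checked_add(i)?;
        let byte = *memory.get(addr)?; // bounds check on each iteration
        total = total.wrapping_add(u32::from(byte));
    }
    Some(total)
}

/// Hoisted form: one range check up front, then a loop with no checks.
fn sum_hoisted(memory: &[u8], ptr: usize, len: usize) -> Option<u32> {
    let end = ptr.checked_add(len)?;
    let window = memory.get(ptr..end)?; // single bounds check for the whole range
    Some(
        window
            .iter()
            .fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b))),
    )
}
```

The two forms agree whenever `len > 0` or `ptr` is in range. They differ on a zero-length read at an out-of-range `ptr`: the per-element loop never touches memory and succeeds, while the hoisted check rejects the range. This is the kind of edge case a solver-backed equivalence check is meant to surface before a transformation is accepted.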

### 3. Dead Event Handler Elimination
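If a component only subscribes to STREAM_READ events, the FUTURE_READ and SUBTASK handlers are dead code. Loom's DCE pass can eliminate them, but only if it knows the callback is never invoked with an unsubscribed event type. That invariant comes from the runtime rather than from the module's own code, so it has to be supplied to Loom as an explicit assumption.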
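One way to picture the pruning step is the Rust sketch below, which filters trampoline arms against a set of subscribed event types. Everything here is an assumption made for illustration: the `Arm` struct, the bitmask encoding of the subscribed set, the `handle_subtask` name, and the value `1` for SUBTASK. Only the codes 2 and 4 appear in the trampoline example.

```rust
// Event codes; SUBTASK = 1 is an assumption, the others match the example above.
const SUBTASK: u32 = 1;
const STREAM_READ: u32 = 2;
const FUTURE_READ: u32 = 4;

/// One arm of the trampoline: the event code it matches and the handler it calls.
struct Arm {
    event_type: u32,
    handler: &'static str,
}

/// Keep only the arms whose event type is in the subscribed set.
/// The set is encoded as a bitmask: bit `n` set means event type `n`
/// may be delivered to the callback.
fn prune_dead_arms(arms: &[Arm], subscribed_mask: u32) -> Vec<&'static str> {
    arms.iter()
        .filter(|arm| {
            // Guard the shift so codes >= 32 are treated as unsubscribed.
            arm.event_type < 32 && (subscribed_mask & (1u32 << arm.event_type)) != 0
        })
        .map(|arm| arm.handler)
        .collect()
}
```

With a mask containing only `STREAM_READ`, just the stream-read arm survives. The pruning is sound only under the invariant that the runtime never delivers an event type outside the mask.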

### 4. Backpressure Fast Path
When a component only ever sets `task.backpressure(false)` (the common case, where it is accepting work), Loom can propagate that constant and eliminate the backpressure check in the host call path.

## Z3 Verification

All optimizations on callback trampolines must be Z3-verified:
- Dispatch correctness: optimized dispatch reaches the same handler as original
- Stream buffer safety: bounds checks are preserved or provably unnecessary
- DCE safety: eliminated handlers are genuinely unreachable

## Connects to

- loom#71: Island-model optimization (try different callback optimization strategies)
- meld#94: P3 lowering (generates the code Loom optimizes)
- kiln#230: P3 runtime (defines the intrinsic semantics)

## Priority

Medium. This work follows P3 lowering in Meld (meld#94), since it optimizes the code that lowering generates.
