Optimize P3 async callback trampolines and stream buffer access patterns #75

@avrabe

Context

When Meld lowers P3 async components, it generates callback trampolines and stream buffer access code. These generated patterns are optimization targets for Loom — they're mechanical, repetitive, and have known structure.

Optimization Opportunities

1. Callback Dispatch Optimization

Meld generates event-type dispatch in callback trampolines as a chain of independent if comparisons, one per handled event type:

(func $__callback (param $event_type i32) (param $payload i32)
    (if (i32.eq (local.get $event_type) (i32.const 2))  ;; STREAM_READ
        (then (call $handle_stream_read (local.get $payload)))
    )
    (if (i32.eq (local.get $event_type) (i32.const 4))  ;; FUTURE_READ
        (then (call $handle_future_read (local.get $payload)))
    )
)

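Loom can rewrite this chain as a branch table (br_table, WebAssembly's counterpart to a computed goto), reducing dispatch from O(n) sequential comparisons to a single O(1) indexed branch, where n is the number of handled event types.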
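As a sketch of what the branch-table form of the trampoline above could look like (not Meld's or Loom's actual output): the import module and field names are hypothetical, and the table is padded with the default label for event types 0, 1 and 3, which the original chain ignores.

```wat
(module
  ;; Hypothetical imports standing in for the two handlers.
  (import "handlers" "stream_read" (func $handle_stream_read (param i32)))
  (import "handlers" "future_read" (func $handle_future_read (param i32)))

  (func $__callback (export "callback")
        (param $event_type i32) (param $payload i32)
    (block $default
      (block $future_read
        (block $stream_read
          (br_table
            $default      ;; 0
            $default      ;; 1
            $stream_read  ;; 2 = STREAM_READ
            $default      ;; 3
            $future_read  ;; 4 = FUTURE_READ
            $default      ;; anything >= 5
            (local.get $event_type)))
        ;; Reached only via $stream_read.
        (call $handle_stream_read (local.get $payload))
        (return))
      ;; Reached only via $future_read.
      (call $handle_future_read (local.get $payload))
      (return))
    ;; $default: no handler runs, matching the original fall-through.
  )
)
```

With only two handled event types the gain is small; the indexed branch pays off as more event types get handlers.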

2. Stream Buffer Access Inlining

Stream read/write patterns always follow:

  1. Call stream_read intrinsic → get pointer + length
  2. Process elements in a loop
  3. Optionally call stream_write to produce output

Loom can:

  • Inline the bounds check when buffer size is statically known
  • Vectorize element processing loops (SIMD where applicable)
  • Eliminate redundant bounds checks in read-then-write patterns
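A minimal sketch of step 2 after redundant-check elimination, assuming the read has already produced a pointer and an element count. The function name, the i32 element type and the summing body are illustrative, not Meld's actual output. The loop condition is the only comparison against the length, so no separate per-element bounds check remains in the body.

```wat
(module
  (memory 1)

  ;; Hypothetical element-processing function: sums $len i32 elements at $ptr.
  (func $sum_elements (export "sum_elements")
        (param $ptr i32) (param $len i32) (result i32)
    (local $i i32)
    (local $acc i32)
    (block $done
      (loop $next
        ;; Single check per iteration: exit once $i reaches $len.
        (br_if $done (i32.ge_u (local.get $i) (local.get $len)))
        ;; acc += memory[ptr + i * 4]
        (local.set $acc
          (i32.add
            (local.get $acc)
            (i32.load
              (i32.add (local.get $ptr)
                       (i32.shl (local.get $i) (i32.const 2))))))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $next)))
    (local.get $acc))
)
```

The engine's own linear-memory bounds check on i32.load still applies; only a generated, explicit check against the stream length would be redundant here.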

3. Dead Event Handler Elimination

If a component only subscribes to STREAM_READ events, the FUTURE_READ and SUBTASK handlers are dead code. Loom's DCE pass can eliminate them, but needs to understand that the callback is only invoked with subscribed event types.
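For a component that subscribes only to STREAM_READ, the trampoline after DCE might reduce to the sketch below. This assumes Loom has been given the subscription information by some means not specified here, and the import name is hypothetical.

```wat
(module
  ;; Hypothetical import standing in for the one live handler.
  (import "handlers" "stream_read" (func $handle_stream_read (param i32)))

  (func $__callback (export "callback")
        (param $event_type i32) (param $payload i32)
    (if (i32.eq (local.get $event_type) (i32.const 2))  ;; STREAM_READ
      (then (call $handle_stream_read (local.get $payload))))
    ;; The FUTURE_READ and SUBTASK arms are gone, and their handler
    ;; functions become unreferenced and removable as well.
  )
)
```

Keeping the STREAM_READ comparison is the conservative choice: removing it too would require proving the callback is never invoked with any other event type.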

4. Backpressure Fast Path

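When a component sets task.backpressure(false) (the common case, meaning it is accepting work), the backpressure check in the host call path can be eliminated: Loom can propagate the constant false into the check and fold the branch away, provided no other call sets it to true on a path that reaches the check.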

Z3 Verification

All optimizations on callback trampolines must be Z3-verified:

  • Dispatch correctness: optimized dispatch reaches the same handler as original
  • Stream buffer safety: bounds checks are preserved or provably unnecessary
  • DCE safety: eliminated handlers are genuinely unreachable
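One way to phrase the dispatch-correctness obligation for the two-handler trampoline in section 1, as a sketch rather than Loom's actual encoding. Here h_2 and h_4 stand for the STREAM_READ and FUTURE_READ handlers, ⊥ means no handler is called, and D_orig and D_opt are the dispatch functions before and after optimization (all of these symbols are introduced only for illustration).

```math
D_{\mathrm{orig}}(e) =
\begin{cases}
h_2 & \text{if } e = 2 \\
h_4 & \text{if } e = 4 \\
\bot & \text{otherwise}
\end{cases}
\qquad
\forall e \in [0, 2^{32}) :\; D_{\mathrm{opt}}(e) = D_{\mathrm{orig}}(e)
```

A solver query would typically assert the negation, that some e exists with D_opt(e) ≠ D_orig(e), and treat an unsat result as the proof.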

Connects to

  • loom#71: Island-model optimization (try different callback optimization strategies)
  • meld#94: P3 lowering (generates the code Loom optimizes)
  • kiln#230: P3 runtime (defines the intrinsic semantics)

Priority

Medium. This work follows P3 lowering in Meld (meld#94), since it optimizes the code that lowering generates.
