Consolidate accel boundary #128
Conversation
@bardo84 is attempting to deploy a commit to the Dystr Team on Vercel. A member of the Team first needs to authorize it.
This PR is being reviewed by Cursor Bugbot
```rust
        "Resolver could not find body for '{}'",
        function_def.name
    ))
})?;
```
JIT user function calls fail without resolver fallback
High Severity
The `execute_user_function_isolated` function now requires the `function_resolver` to provide the function body, but when no resolver is supplied via the existing public APIs (`execute_compiled`, `execute_compiled_with_functions`, `execute_or_compile`), `EmptyFunctionResolver` is used, which always returns `None`. This causes every user function call to fail with the "Resolver could not find body" error. The `UserFunction` struct still contains the `body` field that was previously used, but the new code does not fall back to `function_def.body` when the resolver returns `None`. This regression breaks JIT execution of any bytecode containing user function calls through the existing APIs.
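The fallback the review suggests can be sketched as follows. All types and names here (`FunctionDef`, `FunctionResolver`, `lookup_body`) are illustrative stand-ins, not RunMat's actual API: try the resolver first, then fall back to the body stored on the definition so the existing public APIs keep working.

```rust
// Illustrative stand-ins for the real RunMat types.
struct FunctionDef {
    name: String,
    body: Option<Vec<u8>>, // bytecode body captured at definition time
}

trait FunctionResolver {
    fn resolve(&self, name: &str) -> Option<Vec<u8>>;
}

/// Resolver used when the caller supplies none: always returns None.
struct EmptyFunctionResolver;
impl FunctionResolver for EmptyFunctionResolver {
    fn resolve(&self, _name: &str) -> Option<Vec<u8>> {
        None
    }
}

/// Prefer the resolver, but fall back to the body stored on the
/// definition, erroring only when both sources come up empty.
fn lookup_body(
    resolver: &dyn FunctionResolver,
    def: &FunctionDef,
) -> Result<Vec<u8>, String> {
    resolver
        .resolve(&def.name)
        .or_else(|| def.body.clone())
        .ok_or_else(|| format!("Resolver could not find body for '{}'", def.name))
}

fn main() {
    let def = FunctionDef { name: "f".into(), body: Some(vec![1, 2, 3]) };
    // EmptyFunctionResolver returns None, but the stored body saves the call.
    assert_eq!(lookup_body(&EmptyFunctionResolver, &def), Ok(vec![1, 2, 3]));
}
```

With this shape, `EmptyFunctionResolver` only produces an error for functions that genuinely have no stored body.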
Boundary leakage in the pipeline
- Provider decisions belong at the interpreter boundary (`crates/runmat-ignition/src/vm.rs:181-2000`), so letting `runmat_accelerate_api::provider()` escape into builtins scattered across the tree makes telemetry, gatekeeping, and fallback behavior hard to keep consistent.
- The GPU dispatch paths (`crates/runmat-ignition/src/vm.rs:5921-6566`, `crates/runmat-runtime/src/accel_provider.rs`) expose seams that leak into the interpreter boundary when the runtime falls back to host tensors.

Chosen approach
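The chosen approach, in miniature: route every provider lookup through one crate-local helper so the trace counter and telemetry stay centralized. A self-contained sketch, where the `AccelProvider` trait, the counter, and `maybe_provider` are stand-ins rather than RunMat's real API:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical provider trait standing in for runmat_accelerate_api's.
pub trait AccelProvider {
    fn name(&self) -> &'static str;
}

/// Central trace counter: because every lookup in the runtime goes
/// through maybe_provider, this count stays accurate.
static PROVIDER_LOOKUPS: AtomicU64 = AtomicU64::new(0);

/// The single seam builtins are allowed to use. In the real crate this
/// would delegate to runmat_accelerate_api::provider(); here it returns
/// None to keep the sketch self-contained.
pub fn maybe_provider() -> Option<&'static dyn AccelProvider> {
    PROVIDER_LOOKUPS.fetch_add(1, Ordering::Relaxed);
    None // no GPU provider registered: fall back to host tensors
}

pub fn lookups() -> u64 {
    PROVIDER_LOOKUPS.load(Ordering::Relaxed)
}

fn main() {
    // A builtin checks for acceleration; telemetry is recorded centrally.
    if maybe_provider().is_none() {
        println!("host fallback");
    }
    assert_eq!(lookups(), 1);
}
```

The payoff is that gatekeeping and fallback behavior live in one place instead of being re-implemented per builtin.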
Code-level mission (individual fixes)
- `runmat_runtime::call_builtin` now routes GPU work through `accel_provider::maybe_provider` so the runtime trace counter and telemetry stay centralized (`crates/runmat-runtime/src/dispatcher.rs:23-70` and `crates/runmat-runtime/src/elementwise.rs:1-280`).
- Imports of `crate::accel_provider` are gated with `#[cfg(feature = "wgpu")]` to avoid unused-import lints while keeping the helper accessible when wgpu is enabled (e.g., `crates/runmat-runtime/src/builtins/array/shape/ipermute.rs:8-14`).

Builtin mission (systematic macro work)
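As a rough illustration of the macro work in this section, the guard an extended `runtime_builtin` macro could emit around a builtin body might look like the hand-written sketch below. The `BuiltinEntry` struct, the `uses_accel_provider` flag, and the `maybe_provider` stub are all assumptions, not the macro's actual expansion:

```rust
/// Stand-in for the catalog entry the macro generates per builtin.
struct BuiltinEntry {
    name: &'static str,
    /// Flag the extended macro would set from builtin metadata.
    uses_accel_provider: bool,
}

/// Stub for the crate-local helper; the real one records telemetry
/// and may return a live GPU provider.
fn maybe_provider() -> Option<()> {
    None // sketch: no GPU provider registered
}

/// Guarded wrapper the macro could emit around the user's implementation:
/// consult the provider first, then fall back to the untouched host path.
fn ipermute_guarded(host_impl: fn(&[f64]) -> Vec<f64>, data: &[f64]) -> Vec<f64> {
    if let Some(_provider) = maybe_provider() {
        // GPU path would go here; telemetry is recorded by maybe_provider.
    }
    host_impl(data)
}

fn main() {
    let entry = BuiltinEntry { name: "ipermute", uses_accel_provider: true };
    let reversed = ipermute_guarded(|d| d.iter().rev().cloned().collect(), &[1.0, 2.0, 3.0]);
    assert_eq!(reversed, vec![3.0, 2.0, 1.0]);
    println!("{} guarded ok", entry.name);
}
```

The point of generating this in the macro is that implementation bodies stay untouched while every builtin picks up the same guard and telemetry contract.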
- Extend `runmat_macros::runtime_builtin` so every generated catalog entry includes an `accel_provider` flag/guard setter; the macro can then import `runmat_runtime::accel_provider::{maybe_provider, maybe_provider_for_handle, provider_for_handle}` and emit the telemetry-friendly guard without touching the implementation body.
- Rebuild the builtin inventory (`tools/builtin_inventory/`) once the macro is extended so the generated list of helpers reflects the new contracts instead of referencing raw `runmat_accelerate_api::provider()` calls scattered across `crates/runmat-runtime/src/builtins`.
- Run the conversion script (`tools/convert_accel_provider.py`) whenever a builtin or test still imports `runmat_accelerate_api::provider()` to keep the helper-based pattern in sync.

Tools & documentation
- https://github.com/bardo84/pycomby now presents a generic "way forward" (pattern matching → helper injection → verify workflow) that matches our macro-driven approach without naming RunMat specifics.
- `docs/runmat_HIR_VM.md` explains why we did the migration and how.

Validation
- `cargo test -p runmat-runtime workspace` (run with an extended timeout after the first invocation hit the 124 s limit) confirms that the helper changes compile and the workspace/introspection tests still pass.
- `tests/functions/closure_resolver_script.m` became the regression guard for nested closures and script-defined handles, ensuring the shared resolver still serves HIR bodies when the JIT and interpreter cross the workspace boundary.

Next steps before closing the migration
- Extend the macro with `accel_provider` metadata and rebuild the builtin inventory so new helpers automatically get the shared guard.
- Re-run the targeted suites (`cargo test -p runmat-core async_stdin -- --test-threads=2`, `cargo test -p runmat-runtime workspace`) after each macro/builtin change to verify TLS handler isolation remains stable.