Dispatching currently takes a bit more than 1 µs for me. The vast majority of that time is spent just extracting the arguments that we want to dispatch on.
By moving the main `Dispatchable` to C, it should be possible to achieve <300 ns, maybe well below that, for almost all cases. (Even compiling Python code on the fly could come close.)
NumPy's `__array_function__`, for example, does this in part (instead of parameter names, NumPy always uses a function to extract the arguments to dispatch on) and is thus much faster.
This is probably not a high priority, but it is too bad that so much time is spent on very simple things (finding the unique types in a few keywords and forwarding the parameters).
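For reference, the "very simple things" amount to roughly the following. This is a sketch under stated assumptions, not the current implementation: `dispatch_on` and `seen_types` are hypothetical names, and the wrapped function is assumed to have no `*args` or keyword-only parameters. The name-to-position lookup is done once at decoration time, so each call only has to find the values, collect the unique types, and forward the arguments.

```python
import functools
import inspect


def dispatch_on(*names):
    """Hypothetical decorator: dispatch on the parameters listed in `names`."""
    def decorator(func):
        params = list(inspect.signature(func).parameters)
        # Resolve each name to its positional index once, at decoration time.
        positions = [(name, params.index(name)) for name in names]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            types = []
            for name, pos in positions:
                if pos < len(args):
                    value = args[pos]
                elif name in kwargs:
                    value = kwargs[name]
                else:
                    continue  # left at its default: nothing to dispatch on
                if type(value) not in types:
                    types.append(type(value))
            # A real dispatcher would pick a backend from `types` here;
            # this sketch only records them and forwards the call unchanged.
            wrapper.seen_types = tuple(types)
            return func(*args, **kwargs)

        return wrapper
    return decorator


@dispatch_on("x", "y")
def add_scaled(x, y, scale=1.0):
    return (x + y) * scale
```

Even with the positions precomputed, the per-call loop runs in the interpreter, which is the part that a C implementation would be expected to speed up.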