Specialize self conversion for descriptor slots#5930
Conversation
|
Thanks, see also #4026 which is a similar (draft) PR I started a long time ago and never quite managed to finish off. Generally this optimization applies to more than just descriptors; would you be willing to extend this optimization to the rest of the methods? (That PR has some discussion.) |
|
I can try to give your PR a look, yes. But I may not be very fast in doing so. |
|
No worries I haven't made progress on that PR for 2 years, all help is welcome at whatever pace 😂 |
|
@davidhewitt I took a stab at integrating #4026 in this branch. It looks good to me but I must say I am a bit lost when it comes to the UI tests failures. Could you have a look ? |
…es ensure a bad type cannot be passed from Python
732cf48 to
c36a08d
Compare
|
@davidhewitt I am really at a loss with the UI tests I think. I tried blessing locally but many currently passing test in CI were modified so I assume I got it wrong (I did install rsut-src, the modification were largely about stripping pyo3::_imp prefix in names). I am happy to take any pointer. |
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks for moving this forward and sorry for the slow review. I will be honest; it felt like the new push is just an AI rewrite of #4026 at ~3x verbosity, which left me a bit unenthusiastic to review.
I have added a load of comments here which are generally the reasons why I stalled out on #4026. I will point out that this seems to me a micro-optimization with a lot of scope to introduce problems, so while it'd be nice to get this landed, I don't think we have justification to land anything which isn't carefully validated at all touch points.
Regarding the UI tests, nox -s update-ui-tests should just do it if you have merged latest main and running on latest stable Rust.
| /// - Number-protocol binary operator fragments (`__add__`, `__radd__`, …, | ||
| /// `__pow__`, `__rpow__`): CPython combines the forward and reflected | ||
| /// fragments into a single `nb_add`/`nb_power` slot, and the runtime helper | ||
| /// may call the reflected fragment with the operands swapped, meaning `_slf` | ||
| /// can arrive with a non-class type. The existing | ||
| /// `ExtractErrorMode::NotImplemented` behaviour on type mismatch is preserved | ||
| /// by using `Checked` for those fragments. |
There was a problem hiding this comment.
Perhaps, but I wonder if we are making a mistake by calling the runtime helper with the arguments swapped (we might not match CPython's behavior for __add__ / __radd__, for example). Maybe CPython always calls the slot with self on the LHS? I cannot remember, worth checking.
There was a problem hiding this comment.
https://github.com/python/cpython/blob/main/Include/cpython/object.h#L62-L64
mentions explicitly that nb_add must check both args.
The proper fix may be to alter define_pyclass_binary_operator_slot in pyo3 to check the argument and dispatch to __add__ and __radd__ accordingly.
This is technically what the code does since a failed extraction is turned into NotImplemented. It is somewhat convoluted which is why it took me a couple of iterations of this comment to get it right.
To me it looks wrong to call __radd__ on the same object if __add__ fails. I believe it should be called on the other object and this is the responsibility of CPython to make the call.
|
I addressed all point of the review and I apologize for not doing a better job of proof-reading the first iteration. My local setup is still a bit unstable but I will deal with fixing the broken tests in the coming days. |
davidhewitt
left a comment
There was a problem hiding this comment.
Thanks, this is looking much better. A few minor comments / suggestions.
Regarding the binary operators, having investigated based on your reply above I ran into at least #6024 as a possible pre-existing issue with them. I think best to leave them as "checked" in this PR and then we can follow up to improve the implementation of them in their own PR.
| const unsafe fn trusted_self(mut self) -> Self { | ||
| self.self_conversion = unsafe { SelfConversionPolicy::trusted() }; | ||
| self | ||
| } |
There was a problem hiding this comment.
I think having unsafe on this function causes the whole constants below to be wrapped in unsafe. This might accidentally swallow other unsafe invocations in the future.
I think I'd prefer the following:
| const unsafe fn trusted_self(mut self) -> Self { | |
| self.self_conversion = unsafe { SelfConversionPolicy::trusted() }; | |
| self | |
| } | |
| const fn self_conversion_policy(mut self, policy: SelfConversionPolicy) -> Self { | |
| self.self_conversion = policy; | |
| self | |
| } |
and leave it to the callers to call unsafe { SelfConversionPolicy::trusted() } on a single line.
| // Ensure that passing a wrong self type from Python does not cause UB | ||
| py_expect_exception!(py, c, "type(c).__richcmp__(object(), 1)", PyTypeError); |
There was a problem hiding this comment.
This is almost certainly wrong; __richcmp__ doesn't get exposed as a real Python method but as the six comparison operators (__lt__ , __eq__, etc.)
For descriptor slots, CPython does ensure a bad type cannot be passed from Python and the type check can hence be bypassed. This allow to make PyO3 written descriptor closer in performance to equivalent descriptors written in C or C++