Add BatchAdapter to simplify using PhysicalExprAdapter / Projector to map RecordBatch between schemas #19716
Conversation
(should we undeprecate the schema adapter?)
we can also do a 52.1.0 release too
I actually considered that and decided against it for a couple of reasons:
So given that we don't really want the trait, that we'd be deprecating half of the methods anyway, that the other half could use a refactor / breaking changes to simplify the APIs, and that it wouldn't really help most of the use cases, I don't think un-deprecating it is worth it.
I see no reason why this can't be included in a 52.1.0 release.
Speaking of which, here it is:
alamb left a comment
Thanks for this @adriangb -- I think it is a useful API for sure. However I wonder if we can make it easier for downstream users (see comment inline)
/// Factory for creating [`BatchAdapter`] instances to adapt record batches
This looks like a useful API for sure (and we hit the same thing when upgrading to 52.0.0 internally)
It looks almost, but not quite, the same as SchemaAdapterFactory / Mapper. In other words, the API is different from SchemaAdapterFactory, but it looks to me like we could have it do the same thing.
To assist people upgrading, is there any way we can un-deprecate SchemaAdapterFactory? For example, maybe we could move SchemaAdapterFactory into physical-expr-adapter and leave a reference back to the old location?
That way people who have code that uses the old interface could continue to use it with minimal disruption.
The deprecation was not a fully equivalent replacement. The mapper provided map_schema, which can be replaced by PhysicalExprAdapter, and also map_batch, which is challenging (if even possible) to express with the new API.
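To make the two halves concrete, here is a simplified sketch of the shape of the deprecated interface (abbreviated from memory, not the exact signatures in datafusion):

```rust
// Simplified sketch of the deprecated interface; the real traits have
// additional methods and slightly different signatures.
use std::sync::Arc;

use arrow::datatypes::Schema;
use arrow::record_batch::RecordBatch;
use datafusion_common::Result;

pub trait SchemaAdapter: Send + Sync {
    /// Plan how to get from the file schema to the table schema.
    /// This half is what a `PhysicalExprAdapter` can replace.
    fn map_schema(&self, file_schema: &Schema) -> Result<Arc<dyn SchemaMapper>>;
}

pub trait SchemaMapper: Send + Sync {
    /// Eagerly rewrite an already-read `RecordBatch`.
    /// This half is the part that is hard to express with the new API alone.
    fn map_batch(&self, batch: RecordBatch) -> Result<RecordBatch>;
}
```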
Yes, agreed. I give my reasoning in #19716 (comment), but the TL;DR is that it would be hard / impossible to replicate the exact semantics and APIs of SchemaAdapter, so any sort of un-deprecated version would only be half functional and probably cause more headaches than it's worth.
@alamb I'm curious why you need this / why implementing a custom PhysicalExprAdapter isn't enough?
@comphead FYI for Comet stuff.
Thanks @adriangb, I missed this PR somehow. Actually, in Comet we were experimenting with 2 main directions:
I'll check this thoroughly today, thanks for thinking of it ahead of us ))
I'm a bit confused, isn't it pretty much the same thing? What we do in the Parquet opener is essentially the same. It also doesn't really seem like something you should have to do if you provide the casting rules you want via a custom PhysicalExprAdapter.
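For context, the per-batch pattern being discussed looks roughly like the sketch below. The `rewrite` closure stands in for whatever the `PhysicalExprAdapter` implementation does (its exact API is not reproduced here), and `adapt_batch` is a made-up helper name:

```rust
use std::sync::Arc;

use arrow::array::ArrayRef;
use arrow::datatypes::SchemaRef;
use arrow::record_batch::RecordBatch;
use datafusion_common::Result;
use datafusion_physical_expr::expressions::col;
use datafusion_physical_expr::PhysicalExpr;

/// Hypothetical helper: project `batch` (read with the file schema) into
/// `target_schema` by evaluating one rewritten expression per target column.
fn adapt_batch(
    target_schema: &SchemaRef,
    batch: &RecordBatch,
    rewrite: impl Fn(Arc<dyn PhysicalExpr>) -> Result<Arc<dyn PhysicalExpr>>,
) -> Result<RecordBatch> {
    let columns: Vec<ArrayRef> = target_schema
        .fields()
        .iter()
        .map(|field| -> Result<ArrayRef> {
            // A plain column reference in terms of the target schema...
            let expr = col(field.name(), target_schema)?;
            // ...rewritten into something evaluable against the file schema,
            // e.g. a cast of the physical column or a default-value literal.
            let expr = rewrite(expr)?;
            expr.evaluate(batch)?.into_array(batch.num_rows())
        })
        .collect::<Result<_>>()?;
    Ok(RecordBatch::try_new(Arc::clone(target_schema), columns)?)
}
```

The point is that casts and default values show up as rewritten expressions, so the per-batch work is just evaluating those expressions, which is roughly what this PR packages up.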
Force-pushed from 76aec5b to d39a948.
Unfortunately it is slightly more than just casting (applying default values, unifying schemas, etc.); we do some RecordBatch -> RecordBatch modification just after the scan. Hopefully we can do this better in the future, as this part is expensive.
From my investigation in apache/datafusion-comet#3047 it seemed that something like this would work.
Correct, I'll try to do a migration today based on this PR; with the adapter it looks more promising. And you are right, we should also do more at the expression level than at the RecordBatch level, but that will require quite some investigation.
I may be missing something but I think the
Sounds good, I just started a discussion in apache/datafusion-comet#3047 (comment).
I've now seen this pattern a couple of times: in our own codebase and while working on apache/datafusion-comet#3047.
I was going to add an example but I think adding an API to handle it for users is a better experience.
This should also make it a bit easier to migrate from SchemaAdapter. In fact, I think it's possible to implement a SchemaAdapter using this as the foundation plus some shim code. This won't be available in DF 51 to ease migration, but it's easy enough to backport (just copy the code in this PR) for users who would find that helpful.
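For illustration, a shim along those lines could look roughly like this. The `BatchAdapter` surface shown here is an assumption (a simple `adapt(RecordBatch) -> Result<RecordBatch>` entry point), not the exact API added in this PR, and `LegacySchemaMapper` is a made-up name:

```rust
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use datafusion_common::Result;

/// Assumed minimal surface of the new batch-mapping API (not the real one).
pub trait BatchAdapter: Send + Sync {
    fn adapt(&self, batch: RecordBatch) -> Result<RecordBatch>;
}

/// Shim exposing the old `SchemaMapper::map_batch`-style entry point
/// on top of a `BatchAdapter`.
pub struct LegacySchemaMapper {
    inner: Arc<dyn BatchAdapter>,
}

impl LegacySchemaMapper {
    pub fn new(inner: Arc<dyn BatchAdapter>) -> Self {
        Self { inner }
    }

    /// Equivalent of the deprecated `map_batch`.
    pub fn map_batch(&self, batch: RecordBatch) -> Result<RecordBatch> {
        self.inner.adapt(batch)
    }
}
```

Whatever the final API looks like, the idea is the same: the old `map_batch` call becomes a thin wrapper around the new batch-mapping entry point.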