-
Notifications
You must be signed in to change notification settings - Fork 107
feature: TableStrategy #5640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
feature: TableStrategy #5640
Conversation
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
- add integration test for vortex-file - fix bug in PathStrategy descend - return Writer back from the write pathway Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
518a2e6 to
157ee48
Compare
gatesn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah really nice idea. Definitely should just immediately deprecate the StructStrategy.
vortex-layout/src/layouts/path.rs
Outdated
| field_path: impl Into<FieldPath>, | ||
| writer: Arc<dyn LayoutStrategy>, | ||
| ) -> Self { | ||
| self.leaf_writers.insert(field_path.into(), writer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should check that we don't have any overlapping prefixes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we panic on this or just ignore/maybe log warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my vote would be panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i went with panic
| /// A set of leaf field overrides, e.g. to force one column to be compact-compressed. | ||
| leaf_writers: HashMap<FieldPath, Arc<dyn LayoutStrategy>>, | ||
| /// The writer for any validity arrays that may be present | ||
| validity: Arc<dyn LayoutStrategy>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this only for non-leaf struct validity? Or all validity? Do we just need a special type of FieldPath that allows us to index into the nulls etc?
e.g. enum PathComponent { Field(FieldName), ListElements, Validity } maybe? Not sure... might be gross.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
structs are the only case right now where validity is extracted out and compressed into its own child layout, all of the other strategies just store it alongside the array data
| /// | ||
| /// let strategy = PathStrategy::new(flat.clone(), flat.clone()); | ||
| /// ``` | ||
| pub fn new(validity: Arc<dyn LayoutStrategy>, fallback: Arc<dyn LayoutStrategy>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think validity should just default to flat? Or maybe default everything and use builder pattern to override?
| ) -> VortexResult<LayoutRef> { | ||
| let dtype = stream.dtype().clone(); | ||
|
|
||
| // Fallback: if the array is not a struct, fallback to writing a single array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you actually want to check the leaf_writers here for a root FieldPath, that would allow overriding intermediate struct strategies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually to maintain compatibility with behavior in StructStrategy, which lets you serialize non-struct flat arrays. Not sure why we let you do that but it's the current behavior and we have unit tests for it 🤷
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
c6dc373 to
15b8c9e
Compare
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Extracting this out of my hack week project. This adds a new
TableStrategythat is like the StructStrategy, except that it allows users to override the write strategy at particular field paths in the schema.For example, if you have a schema like this:
Before you'd be stuck with choosing a single strategy for all fields of the Struct.
TableStrategy allows you to override particular field paths anywhere in the field tree. For example, if you wanted to allow uncompressed struct validity, default BtrBlocks compression but ZSTD compression just for the
request.body.bytesfield, you'd do:I've replaced StructStrategy with this in the default WriteBuilder.
If we like this, we might want to consider marking
StructStrategyas deprecated.