Contributor

@a10y a10y commented Dec 6, 2025

Extracting this out of my hack week project. This adds a new TableStrategy that is like the StructStrategy, except that it allows users to override the write strategy at particular field paths in the schema.

For example, if you have a schema like this:

request: struct?
|__ id: utf8
|__ duration_ms: i64
|__ body: struct?
    |__ bytes: binary?
    |__ json: utf8?

Previously, you were stuck choosing a single strategy for all fields of the struct.

TableStrategy allows you to override particular field paths anywhere in the field tree. For example, if you wanted uncompressed struct validity, BtrBlocks compression by default, and ZSTD compression just for the request.body.bytes field, you'd do:

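// Default for every leaf: BtrBlocks compression over flat layouts.
let btrblocks = Arc::new(CompressingStrategy::new_btrblocks(FlatLayoutStrategy::default()));
// Override: the compact compressor, used here for ZSTD.
let zstd = Arc::new(CompressingStrategy::new_compact(FlatLayoutStrategy::default()));

let table_strategy = TableStrategy::new(
    // Struct validity is written uncompressed. Wrapped in Arc because the
    // constructor under review takes Arc<dyn LayoutStrategy> for both arguments.
    Arc::new(FlatLayoutStrategy::default()),
    btrblocks,
)
.with_leaf_strategy(field_path!(request.body.bytes), zstd);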
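
To illustrate the lookup semantics only: a minimal, standard-library sketch of per-path overrides with a fallback. `Strategy`, `Overrides`, `with_leaf`, and `resolve` are hypothetical names made up for this example and are not the Vortex API; the real TableStrategy works on layout strategies and array streams rather than an enum.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a layout strategy; not a Vortex type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    BtrBlocks,
    Zstd,
}

/// Hypothetical override table keyed by field path (one String per path segment).
struct Overrides {
    fallback: Strategy,
    leaves: HashMap<Vec<String>, Strategy>,
}

impl Overrides {
    fn new(fallback: Strategy) -> Self {
        Self { fallback, leaves: HashMap::new() }
    }

    /// Register an override for a dotted path such as "request.body.bytes".
    fn with_leaf(mut self, path: &str, strategy: Strategy) -> Self {
        let key = path.split('.').map(str::to_owned).collect();
        self.leaves.insert(key, strategy);
        self
    }

    /// Exact-path lookup; any path without an override gets the fallback.
    fn resolve(&self, path: &str) -> Strategy {
        let key: Vec<String> = path.split('.').map(str::to_owned).collect();
        self.leaves.get(&key).copied().unwrap_or(self.fallback)
    }
}

fn main() {
    let table = Overrides::new(Strategy::BtrBlocks)
        .with_leaf("request.body.bytes", Strategy::Zstd);
    println!("{:?}", table.resolve("request.body.bytes"));
    println!("{:?}", table.resolve("request.id"));
}
```

Only the overridden leaf resolves to the override; every other path, including siblings under the same parent struct, falls through to the default.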

I've replaced StructStrategy with this in the default WriteBuilder.

If we like this, we might want to consider marking StructStrategy as deprecated.

a10y added 4 commits December 6, 2025 14:59
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
codecov bot commented Dec 6, 2025

Codecov Report

❌ Patch coverage is 89.16667% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.39%. Comparing base (374882d) to head (11a8904).
⚠️ Report is 22 commits behind head on develop.

Files with missing lines              Patch %   Lines
vortex-layout/src/layouts/table.rs    87.23%    18 Missing ⚠️
vortex-file/src/strategy.rs           38.46%    8 Missing ⚠️


a10y added 4 commits December 9, 2025 16:12
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
- add integration test for vortex-file
- fix bug in PathStrategy descend
- return Writer back from the write pathway

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y requested review from AdamGS and gatesn December 9, 2025 22:15
a10y added 2 commits December 9, 2025 17:19
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/path-strategy branch from 518a2e6 to 157ee48 Compare December 9, 2025 22:34
@a10y a10y added the feature label (Release label indicating a new feature or request) Dec 9, 2025
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Contributor

@gatesn gatesn left a comment

Yeah really nice idea. Definitely should just immediately deprecate the StructStrategy.

field_path: impl Into<FieldPath>,
writer: Arc<dyn LayoutStrategy>,
) -> Self {
self.leaf_writers.insert(field_path.into(), writer);

Contributor

I think we should check that we don't have any overlapping prefixes?

Contributor Author

should we panic on this or just ignore/maybe log warning?

Contributor Author

my vote would be panic

Contributor Author

i went with panic
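
For reference, a minimal sketch of what a panicking overlap check could look like. This is not the code from this PR: `overlaps`, `split`, and `insert_checked` are hypothetical helpers, and paths are modeled as plain `Vec<String>` rather than `FieldPath`. Two paths overlap when one is a prefix of the other, which also covers equal paths and the empty (root) path.

```rust
/// True when one path is a prefix of the other; equal paths also overlap.
fn overlaps(a: &[String], b: &[String]) -> bool {
    let n = a.len().min(b.len());
    a[..n] == b[..n]
}

/// Split a dotted path such as "request.body.bytes" into its segments.
fn split(path: &str) -> Vec<String> {
    path.split('.').map(str::to_owned).collect()
}

/// Register `path`, panicking if it overlaps any previously registered path.
fn insert_checked(paths: &mut Vec<Vec<String>>, path: &str) {
    let new_path = split(path);
    if let Some(existing) = paths.iter().find(|p| overlaps(p.as_slice(), &new_path)) {
        panic!(
            "override {path} overlaps existing override {}",
            existing.join(".")
        );
    }
    paths.push(new_path);
}

fn main() {
    let mut paths = Vec::new();
    insert_checked(&mut paths, "request.body.bytes");
    insert_checked(&mut paths, "request.id");
    println!("registered {} overrides", paths.len());
    // insert_checked(&mut paths, "request.body") would panic here,
    // because it is a prefix of "request.body.bytes".
}
```

The linear scan is quadratic in the number of overrides, which seems acceptable for a handful of paths set at builder time; a trie would be the alternative if that assumption stopped holding.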

/// A set of leaf field overrides, e.g. to force one column to be compact-compressed.
leaf_writers: HashMap<FieldPath, Arc<dyn LayoutStrategy>>,
/// The writer for any validity arrays that may be present
validity: Arc<dyn LayoutStrategy>,

Contributor

Is this only for non-leaf struct validity? Or all validity? Do we just need a special type of FieldPath that allows us to index into the nulls etc?

e.g. enum PathComponent { Field(FieldName), ListElements, Validity } maybe? Not sure... might be gross.

Contributor Author

Structs are the only case right now where validity is extracted out and compressed into its own child layout; all of the other strategies just store it alongside the array data.
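
To make the path-component idea from this thread concrete, here is a hypothetical sketch. It only mirrors the enum shape floated above, with `String` standing in for `FieldName`; the `render` helper and its output format are invented for illustration, and nothing like this is part of the PR.

```rust
/// Hypothetical path component, following the shape suggested in review.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum PathComponent {
    Field(String),
    ListElements,
    Validity,
}

/// Render a path for display; the "[]" and "<validity>" markers are arbitrary.
fn render(path: &[PathComponent]) -> String {
    path.iter()
        .map(|component| match component {
            PathComponent::Field(name) => name.clone(),
            PathComponent::ListElements => "[]".to_owned(),
            PathComponent::Validity => "<validity>".to_owned(),
        })
        .collect::<Vec<_>>()
        .join(".")
}

fn main() {
    // Addresses the validity of the nullable `request.body` struct.
    let body_validity = vec![
        PathComponent::Field("request".to_owned()),
        PathComponent::Field("body".to_owned()),
        PathComponent::Validity,
    ];
    println!("{}", render(&body_validity));

    // Addresses the elements of a hypothetical list-typed field named `tags`.
    let elements = vec![
        PathComponent::Field("tags".to_owned()),
        PathComponent::ListElements,
    ];
    println!("{}", render(&elements));
}
```

With such a type, the separate `validity` field could in principle become one more keyed override, at the cost of a wider path type than the current field-name-only paths.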

///
/// let strategy = PathStrategy::new(flat.clone(), flat.clone());
/// ```
pub fn new(validity: Arc<dyn LayoutStrategy>, fallback: Arc<dyn LayoutStrategy>) -> Self {

Contributor

I think validity should just default to flat? Or maybe default everything and use builder pattern to override?
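
A rough sketch of the "default everything, override with a builder" option mentioned here. `Strategy` and `TableConfig` are hypothetical stand-ins for illustration, not the types in this PR, and the choice of flat for both defaults is an assumption taken from this comment.

```rust
/// Hypothetical stand-in for a layout strategy; not a Vortex type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Flat,
    BtrBlocks,
}

/// Hypothetical configuration with a default for every slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TableConfig {
    validity: Strategy,
    fallback: Strategy,
}

impl Default for TableConfig {
    /// Everything defaults to flat; callers override only what they need.
    fn default() -> Self {
        Self {
            validity: Strategy::Flat,
            fallback: Strategy::Flat,
        }
    }
}

impl TableConfig {
    fn with_validity(mut self, strategy: Strategy) -> Self {
        self.validity = strategy;
        self
    }

    fn with_fallback(mut self, strategy: Strategy) -> Self {
        self.fallback = strategy;
        self
    }
}

fn main() {
    // Validity stays at its flat default; only the fallback is overridden.
    let config = TableConfig::default().with_fallback(Strategy::BtrBlocks);
    println!("{config:?}");
    // Both slots can still be set explicitly when needed.
    let explicit = TableConfig::default().with_validity(Strategy::BtrBlocks);
    println!("{explicit:?}");
}
```

Compared with a two-argument constructor, this keeps call sites from having to name a validity strategy they do not care about.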

) -> VortexResult<LayoutRef> {
let dtype = stream.dtype().clone();

// Fallback: if the array is not a struct, fallback to writing a single array.

Contributor

Yeah, I think you actually want to check the leaf_writers here for a root FieldPath, that would allow overriding intermediate struct strategies.

Contributor Author

This is actually to maintain compatibility with behavior in StructStrategy, which lets you serialize non-struct flat arrays. Not sure why we let you do that but it's the current behavior and we have unit tests for it 🤷

a10y added 3 commits December 10, 2025 08:52
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/path-strategy branch from c6dc373 to 15b8c9e Compare December 10, 2025 15:52
a10y added 2 commits December 10, 2025 11:02
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y changed the title from "feature: PathStrategy" to "feature: TableStrategy" Dec 10, 2025
@a10y a10y requested a review from gatesn December 10, 2025 16:04
a10y added 2 commits December 10, 2025 11:08
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
