@robert3005 robert3005 commented Apr 24, 2025

In a followup we will add a stat for nan_count of the array

fixes #1375

There's an annoying side problem here: because NaNs are skipped, min == max
might still mean that the array is non-constant, but I think that's better
than min/max being useless. We might need a function to ask whether there
are any indefinite values in the array.
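
To make the side problem concrete, here is a minimal sketch of NaN-excluding min/max with a separate `nan_count`. `FloatStats`, `compute_stats`, and `is_provably_constant` are hypothetical names for illustration, not this repository's API, and the sketch ignores nulls.

```rust
/// Hypothetical stats holder for illustration; not the project's actual type.
#[derive(Debug, PartialEq)]
pub struct FloatStats {
    pub min: Option<f64>,
    pub max: Option<f64>,
    pub nan_count: usize,
}

/// Computes min/max over the non-NaN values only and counts NaNs separately.
/// Infinities are ordinary values here and can still be the min or max.
pub fn compute_stats(values: &[f64]) -> FloatStats {
    let mut stats = FloatStats { min: None, max: None, nan_count: 0 };
    for &v in values {
        if v.is_nan() {
            stats.nan_count += 1;
            continue;
        }
        stats.min = Some(stats.min.map_or(v, |m| m.min(v)));
        stats.max = Some(stats.max.map_or(v, |m| m.max(v)));
    }
    stats
}

/// min == max only proves the array is constant when no NaNs were skipped.
/// "Constant" is judged with `==`, so 0.0 and -0.0 count as equal.
/// An all-NaN array conservatively returns false.
pub fn is_provably_constant(stats: &FloatStats) -> bool {
    stats.nan_count == 0
        && matches!((stats.min, stats.max), (Some(lo), Some(hi)) if lo == hi)
}
```

For `[1.0, NaN, 1.0]` this gives min == max == 1.0 with `nan_count == 1`, so the array is not constant even though min equals max.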
@robert3005
Contributor Author

I think there's an interesting balance here, inasmuch as most operations are not interested in NaNs, just as we don't include nulls. Arrow will include NaN and Infinity in the min/max calculation and Parquet will not. After reading apache/parquet-format#185 and apache/parquet-format#196, I think we should exclude NaN from min/max stats BUT also add a nan_count stat.
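
As a rough illustration of the trade-off, the sketch below compares a max that lets NaN participate with one that filters NaN out first. It uses the IEEE 754 total order (`f64::total_cmp`) as one possible way NaN can take part in an ordering. That choice is an assumption made for the example and is not a claim about how Arrow or Parquet implement their statistics. The function names are hypothetical.

```rust
/// Max under the IEEE 754 total order, where a positive NaN sorts above +Infinity.
/// Returns None only for an empty slice.
pub fn max_total_order(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}

/// Max over the non-NaN values only. Returns None if there are no such values.
pub fn max_excluding_nan(values: &[f64]) -> Option<f64> {
    values
        .iter()
        .copied()
        .filter(|v| !v.is_nan())
        .max_by(|a, b| a.total_cmp(b))
}
```

With a positive NaN in the input, `max_total_order` returns NaN, which says nothing useful about the range of the remaining values, while `max_excluding_nan` still returns the largest ordinary value (Infinity included).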

lwwmanning commented Apr 24, 2025

persuasive as to why we'd want to exclude NaNs from stats and have a nan_count (tldr: pruning) -- duckdb/duckdb#16962
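
A minimal sketch of the pruning argument, assuming IEEE comparison semantics (any comparison with NaN is false) and a hypothetical `ChunkStats` in which `max == None` means the chunk has no non-NaN values. None of these names come from this repository or from DuckDB.

```rust
/// Hypothetical per-chunk statistics; min/max exclude NaN.
#[derive(Debug, Clone, Copy)]
pub struct ChunkStats {
    pub min: Option<f64>,
    pub max: Option<f64>,
    pub nan_count: usize,
}

/// For the predicate `x > threshold`: NaN never satisfies it under IEEE
/// comparison, so the chunk can be skipped when the non-NaN max is not above
/// the threshold, or when there are no non-NaN values at all.
pub fn can_skip_greater_than(stats: &ChunkStats, threshold: f64) -> bool {
    match stats.max {
        Some(max) => max <= threshold,
        None => true,
    }
}

/// For the predicate `x.is_nan()`: only nan_count can answer this.
pub fn can_skip_is_nan(stats: &ChunkStats) -> bool {
    stats.nan_count == 0
}
```

If NaN were allowed to become the max, `can_skip_greater_than` could never prune a chunk containing a single NaN. An engine that instead orders NaN as the largest value would need to consult `nan_count` in the `>` check as well.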

@robert3005 robert3005 changed the title NAN, -NAN, -Inf and Inf cannot be min/max values of a primitive array NAN cannot be min/max values of a primitive array Apr 25, 2025
@robert3005 robert3005 enabled auto-merge (squash) April 25, 2025 09:21
@robert3005 robert3005 changed the title NAN cannot be min/max values of a primitive array NAN cannot be a min/max value of a primitive array Apr 25, 2025
@robert3005 robert3005 disabled auto-merge April 25, 2025 10:54
@robert3005 robert3005 enabled auto-merge (squash) April 25, 2025 10:54
@robert3005 robert3005 changed the title NAN cannot be a min/max value of a primitive array NaN cannot be a min/max value of a primitive array Apr 25, 2025
@robert3005 robert3005 disabled auto-merge April 25, 2025 10:55
@robert3005 robert3005 enabled auto-merge (squash) April 25, 2025 10:55
@robert3005 robert3005 merged commit 1034887 into develop Apr 29, 2025
32 checks passed
@robert3005 robert3005 deleted the rk/primitivestats branch April 29, 2025 15:27
AdamGS pushed a commit that referenced this pull request Apr 30, 2025
robert3005 added a commit that referenced this pull request Apr 30, 2025

Development

Successfully merging this pull request may close these issues.

PrimitiveArray min/max stats should exclude NaNs