-
Notifications
You must be signed in to change notification settings - Fork 16
Description
Is your feature request related to a problem? Please describe.
We have been using the ReplacingMergeTree engine in production often, have used insert as update motifs. In my RMT pipelines, we frequently need to materialize most recent state and have done so under aggregation, using
WITH base_table_final AS (
SELECT
a.order_by_col_1,
a.order_by_col_2,
argMax(a.col_3, a.version_col) AS col_3
FROM base_table
GROUP BY a.order_by_col_1, a.order_by_col_2
)
instead of using the FINAL keyword. It was not clear that the default behavior, when multiple unmerged parts exist, is to skip NULLs (NULLs are a returnable value for argMax when and only when all part values are NULL), and took some time to discover this effect (silent bug) in production.
The argMax approach is written about frequently and has been our primary tool for materializing most recent state, and while it is an aggregation function (and its handling of NULLs is hence logically consistent with other aggregate functions), would imagine in most production use cases it is used for this 'materialize most recent state' purpose.
We have discovered there is a workaround motif where argMax value is wrapped as a tuple and unwrapped after the aggregation function, as
WITH base_table_final AS (
SELECT
a.order_by_col_1,
a.order_by_col_2,
argMax(tuple(a.col_3), a.version_col).1 AS col_3
FROM base_table
GROUP BY a.order_by_col_1, a.order_by_col_2
)
Describe the solution you'd like
We would propose to make the default behavior of argMax to respect NULLs and not skip NULLs.
Describe alternatives you've considered
Alternatively, we think an additional aggregate function, argMaxOrNull would suffice