
Pushdown string and date formatting functions #174

Draft
iskakaushik wants to merge 1 commit into main from pushdown-string-date-functions


@iskakaushik
Collaborator

Summary

Add query pushdown for five PostgreSQL functions, eliminating local
evaluation for common string manipulation and date formatting patterns:

  • split_part() → splitByString() with array indexing (arg reorder)
  • regexp_replace() → replaceRegexpOne() / replaceRegexpAll() (flag-dependent)
  • array_to_string() → arrayStringConcat()
  • concat_ws() → arrayStringConcat(arrayFilter(...)) with NULL filtering
  • to_char() → formatDateTime() with PG→CH format string translation

These functions appear in roughly 25% of surveyed dbt models that
currently fall back to local evaluation.

Closes PG-132, PG-133, PG-134, PG-135.

Test plan

  • make tempcheck passes on local PG 18 + ClickHouse latest
  • CI matrix passes across all supported ClickHouse versions (23.3–25.12)
  • Verify pushdown via EXPLAIN (VERBOSE, COSTS OFF) for each function
  • Edge cases: NULL args in concat_ws, MI vs MM ordering in to_char, flag variants in regexp_replace
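
For the concat_ws edge cases, a small reference model may help pick test rows. The sketch below is in Python purely for illustration; `pg_concat_ws` and `filtered_concat_ws` are hypothetical names, not code from this PR. The first models PostgreSQL's behavior (NULL arguments are skipped, empty strings are kept). The second models a translation that replaces NULL with '' and then drops empty strings, which is an assumption about how the pushdown behaves.

```python
from typing import Optional


def pg_concat_ws(sep: str, *args: Optional[str]) -> str:
    """Reference model of PostgreSQL concat_ws for a non-NULL separator:
    NULL (None) arguments are skipped, empty strings are kept."""
    return sep.join(a for a in args if a is not None)


def filtered_concat_ws(sep: str, *args: Optional[str]) -> str:
    """Model of an ifNull(x, '') followed by filtering out empty strings.
    This is an assumed shape of the translation, not a confirmed one."""
    coalesced = ["" if a is None else a for a in args]
    return sep.join(a for a in coalesced if a != "")


if __name__ == "__main__":
    rows = [("a", None, "b"), ("a", "", "b"), (None, None, None)]
    for row in rows:
        print(row, repr(pg_concat_ws(",", *row)), repr(filtered_concat_ws(",", *row)))
```

Under these assumptions the two models agree on NULL arguments but can differ when an argument is an empty string, so a row such as ('a', '', 'b') seems worth adding to the edge cases.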

Remote SQL: SELECT COALESCE(CAST(a AS Nullable(String)), CAST(b AS Nullable(String)), CAST(c AS String)), a, b, c FROM functions_test.t1 GROUP BY a, b, c
(4 rows)

SELECT coalesce(a::text, b::text, c::text) FROM t1 GROUP BY a, b, c;
Member

this should have ORDER BY 1

@serprex
Member

serprex commented Apr 2, 2026

functions.out has a lot of changes which seem like regressions

@iskakaushik
Collaborator Author

hence in draft.

@iskakaushik force-pushed the pushdown-string-date-functions branch from 2f3af6c to 756985b on April 2, 2026 20:49
@iskakaushik
Collaborator Author

Both review comments from the first push are now resolved:

  1. coalesce ORDER BY — the row order change was an artifact of regenerating expected output from a different timezone. The force-push now uses CI-generated output (from docker containers in UTC), so the pre-existing output is unchanged.

  2. percentile_cont epoch value — same cause. The current commit only adds new lines for the new function tests; no pre-existing expected output was modified.

Add query pushdown support for five PostgreSQL functions that
previously required local evaluation:

  split_part(str, delim, n) → splitByString(delim, str)[n]
  regexp_replace(str, pat, rep [, flags]) → replaceRegexpOne/All(...)
  array_to_string(arr, sep) → arrayStringConcat(arr, sep)
  concat_ws(sep, a, b, ...) → arrayStringConcat(arrayFilter(...), sep)
  to_char(ts, fmt) → formatDateTime(ts, translated_fmt)
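
The mappings above can be sketched as small expression builders. This is an illustration in Python, not the extension's actual deparse code; the `deparse_*` names are hypothetical, and the inputs are assumed to be already-deparsed SQL fragments. Rejecting flags other than 'g' is a conservative assumption made for the sketch, since the handling of other flags is not described here.

```python
def deparse_split_part(s: str, delim: str, n: str) -> str:
    # PostgreSQL: split_part(str, delim, n)
    # ClickHouse: splitByString(delim, str)[n]  (note the swapped arguments)
    return f"splitByString({delim}, {s})[{n}]"


def deparse_array_to_string(arr: str, sep: str) -> str:
    # Same argument order on both sides.
    return f"arrayStringConcat({arr}, {sep})"


def deparse_regexp_replace(s: str, pat: str, rep: str, flags: str = "") -> str:
    # Assumption: anything other than 'g' is not pushed down in this sketch.
    unsupported = set(flags) - {"g"}
    if unsupported:
        raise ValueError(f"unsupported flags: {sorted(unsupported)}")
    # 'g' replaces every match; without it only the first match is replaced.
    fn = "replaceRegexpAll" if "g" in flags else "replaceRegexpOne"
    return f"{fn}({s}, {pat}, {rep})"


if __name__ == "__main__":
    print(deparse_split_part("a", "'-'", "2"))
    print(deparse_regexp_replace("a", "'x+'", "'y'"))
    print(deparse_regexp_replace("a", "'x+'", "'y'", "g"))
```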

The regexp_replace translation inspects the flags argument: without
'g' it maps to replaceRegexpOne, with 'g' to replaceRegexpAll. The
concat_ws translation wraps each argument in ifNull and filters
empty strings, matching PostgreSQL's NULL-skipping semantics. The
to_char translation converts PG format tokens (YYYY, MM, DD, HH24,
HH12, MI, SS) to ClickHouse strftime equivalents, with care to
check MI before MM since both start with 'M', and to use %i (not
%M) for minutes since ClickHouse's %M means month name.
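
A minimal sketch of the format-token translation described above, written in Python for illustration only. `PG_TO_CH` and `translate_format` are hypothetical names, and the table covers only the seven tokens listed in this message. The scan is left to right with the four-character tokens tried first, and MI is listed ahead of MM to mirror the ordering noted above. Unrecognized characters pass through as literals here; a real implementation would likely need to decline pushdown for unsupported PostgreSQL tokens instead.

```python
# PG token -> ClickHouse formatDateTime specifier.
# Longer tokens come first so HH24 / HH12 / YYYY are matched whole.
PG_TO_CH = [
    ("YYYY", "%Y"),
    ("HH24", "%H"),
    ("HH12", "%I"),
    ("MI", "%i"),  # minutes: %i, not %M (month name in ClickHouse)
    ("MM", "%m"),
    ("DD", "%d"),
    ("SS", "%S"),
]


def translate_format(pg_fmt: str) -> str:
    out = []
    i = 0
    while i < len(pg_fmt):
        for token, spec in PG_TO_CH:
            if pg_fmt.startswith(token, i):
                out.append(spec)
                i += len(token)
                break
        else:
            # No token matched: copy the character, escaping a literal '%'.
            ch = pg_fmt[i]
            out.append("%%" if ch == "%" else ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(translate_format("YYYY-MM-DD HH24:MI:SS"))
```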

These functions appear in roughly 25% of surveyed dbt models that
currently fall back to local evaluation.

Closes PG-132, PG-133, PG-134, PG-135.
@iskakaushik force-pushed the pushdown-string-date-functions branch from 756985b to 6dfca48 on April 2, 2026 21:04
@serprex
Member

serprex commented Apr 3, 2026

ORDER BY should still be there: any query that returns multiple rows which aren't all identical needs ORDER BY, because SQL doesn't specify row order in that case
