Skip to content

SQL: NULL propagation with || string concatenation #6946

@philrz

Description

@philrz

When doing string concatenation via ||, super currently ignores nulls, but the typical SQL behavior is to propagate the nulls, e.g.,

$ super -c "SELECT NULL || 'x';"
{"||\"x\"":"x"}

vs. Postgres or DuckDB, respectively:

$ psql postgres -c "\pset null 'NULL'" -c "SELECT NULL || 'x';"
Null display is "NULL".
 ?column? 
----------
 NULL
(1 row)

$ duckdb -c "SELECT NULL || 'x';"
┌───────────────┐
│ (NULL || 'x') │
│     int32     │
├───────────────┤
│          NULL │
└───────────────┘

Details

Repro is with super commit a67ca49.

It looks like historically SuperDB's implementation of || and the CONCAT function have relied on the same underlying code, but it seems this is not typical for most SQL implementations. In the Postgres docs for instance, it's disclosed that "NULL arguments are ignored" in their CONCAT function, but nothing specific is said about NULL handling as relates to ||, which implies that the typical SQL behavior of "NULL propagation" should be expected, i.e., the presence of a single NULL in the inputs causes the entire result to be a NULL.

Tracing the history of this with the SQL standard is tricky since the necessary docs are not all freely available, but Claude Opus 4.7 offers a plausible interpretation of how we ended up here. When || was introduced in SQL-92, the NULL propagation behavior was implied for consistency (e.g., with other operations that evaluate to null/unknown as in sections 6.12, 8.2, etc.) Allegedly CONCAT was introduced in SQL:2003 (albeit in a limited format that only accepted two arguments) while specifying that it should ignore NULL arguments. The assumption is that this was a pragmatic concession because in the early 2000's SQL implementations like MySQL and Oracle were already treating NULLs as empty strings for usability reasons (a "I just want to print what's available" use case) so specifying CONCAT with the NULL-ignoring semantics was a way to bless the more ergonomic behavior without breaking backward compatibility of what had already been established with ||.

SuperDB's behavior here has actually oscillated. In the initial attempt at SQL-compatible string concatenation, NULL was propagated as expected with SQL ||, but also with CONCAT which goes against the standard behavior.

$ super -version
Version: v0.2.0-44-g8b2454722

$ super -c "SELECT NULL || 'x';"
{"||\"x\"":null}

$ super -c "SELECT CONCAT(NULL, 'x');"
{CONCAT:null}

Then once #6730 merged, these both switched over to ignoring NULL, so now CONCAT is in line with the standard behavior while || is going against the standard.

$ super -version
Version: v0.2.0-45-ga67ca49fa

$ super -c "SELECT NULL || 'x';"
{"||\"x\"":"x"}

$ super -c "SELECT CONCAT(NULL, 'x');"
{CONCAT:"x"}

All that said, while Postgres and its close peers seem to track the standard, this is one of those areas where it's a bit of a mess across all SQL implementations. For instance, ClickHouse propagates NULL with both || and CONCAT, while Oracle ignores NULL in both. Based on this, perhaps we could defensibly go our own way and stick with what we're doing now. But given that users may have come to expect variation here, we should update our docs to reflect what we ultimately decide and frame it relative to the SQL standard and/or Postgres.

f-strings

As noted in #6730, the f-string implementation is currently dependent on this same implementation of string concatenation. Therefore if we make a change in behavior that changes f-string behavior yet again, we'll want to update those docs as well.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions