Skip to content

Fix DML misclassification for statements containing UNION/INTERSECT/EXCEPT#1420

Open
msrathore-db wants to merge 2 commits intodatabricks:mainfrom
msrathore-db:fix/issue-1418-dml-union-classification
Open

Fix DML misclassification for statements containing UNION/INTERSECT/EXCEPT#1420
msrathore-db wants to merge 2 commits intodatabricks:mainfrom
msrathore-db:fix/issue-1418-dml-union-classification

Conversation

@msrathore-db
Copy link
Copy Markdown
Collaborator

@msrathore-db msrathore-db commented Apr 22, 2026

Summary

Fixes #1418.

Statement.execute() was incorrectly returning true (and getUpdateCount() returning -1) for INSERT / UPDATE / DELETE / MERGE statements whose subqueries or CTEs contained UNION, INTERSECT, or EXCEPT. The UNION_PATTERN, INTERSECT_PATTERN, and EXCEPT_PATTERN regexes in DatabricksJdbcConstants are non-anchored (\s+UNION\s+ etc.) and were matched via find() inside shouldReturnResultSet, so the keyword was picked up anywhere in the SQL — even deep inside a subquery of an outer DML, or inside the Databricks column-exclusion form SELECT * EXCEPT (col).

Downstream:

  • executeUpdate() threw DatabricksSQLException because shouldReturnResultSet==true triggers the post-execution guard at DatabricksStatement.java:103-108.
  • getUpdateCount() returned -1 because executeInternal forces updateCount = -1 when shouldReturnResultSet==true.
  • Frameworks that use !execute() as the DML detector (e.g. Slick's sqlu) crashed on perfectly normal DML.
  • This was a regression from Simba 2.7.5, which returned false for these inputs.

Fix

Short-circuit shouldReturnResultSet to false when the trimmed query starts with a DML keyword, so the non-anchored set-operator patterns can't fire on subquery content:

if (DML_PREFIX_PATTERN.matcher(trimmedQuery).find()) {
    return false;
}

A separate DML_PREFIX_PATTERN (matching ^(\s*\()*\s*(INSERT|UPDATE|DELETE|MERGE)\s+) is added so the existing INSERT_PATTERN — shared with InsertStatementParser and requiring INSERT INTO — is untouched. This also lets the new guard cover INSERT OVERWRITE ... (called out in the issue).

The existing NonRowcountQueryPrefixes opt-in still wins: it is evaluated before the new short-circuit, so a user who has explicitly set e.g. NonRowcountQueryPrefixes=INSERT still gets ResultSet mode for INSERTs.

Why this is safe

  • Every existing top-level UNION / INTERSECT / EXCEPT test case continues to classify as ResultSet via the already-anchored SELECT_PATTERN / WITH_PATTERN / VALUES_PATTERN / FROM_PATTERN. A regression test for (SELECT ...) UNION (SELECT ...) is included.
  • Plain DML (INSERT INTO t VALUES (1), UPDATE t SET c=1, DELETE FROM t, MERGE INTO ...) previously returned false because none of the OR-chain patterns matched; it continues to return false now, via the short-circuit.
  • INSERT_PATTERN is not modified, so InsertStatementParser / EnableBatchedInserts are unaffected.
  • All 3305 tests in jdbc-core pass.

Test plan

  • New unit tests in DatabricksStatementTest covering:
    • INSERT ... SELECT ... FROM (... UNION ALL ...)
    • INSERT ... FROM (... INTERSECT ...)
    • INSERT ... FROM (... EXCEPT ...)
    • INSERT ... SELECT * EXCEPT (col) FROM src (Databricks column-exclusion)
    • INSERT OVERWRITE DIRECTORY ... INTERSECT ...
    • UPDATE ... (... UNION ...)
    • DELETE ... (... EXCEPT ...)
    • MERGE INTO ... USING (... UNION ...) ...
    • NonRowcountQueryPrefixes=INSERT opt-in still forces ResultSet mode
    • (SELECT ...) UNION (SELECT ...) still classified as ResultSet (regression guard)
  • All 102 existing DatabricksStatementTest cases pass (including top-level UnionQuery / IntersectQuery / ExceptQuery).
  • All 52 DatabricksPreparedStatementTest cases pass.
  • All 24 InsertStatementParserTest cases pass (batching parser untouched).
  • Full mvn test -pl jdbc-core clean: 3305 tests, 0 failures, 0 errors.

NO_CHANGELOG=true

…XCEPT

`Statement.execute()` was incorrectly returning `true` (and
`getUpdateCount()` returning `-1`) for `INSERT` / `UPDATE` / `DELETE` /
`MERGE` statements whose subqueries or CTEs contained `UNION`,
`INTERSECT`, or `EXCEPT`. The `UNION_PATTERN`, `INTERSECT_PATTERN`, and
`EXCEPT_PATTERN` regexes in `DatabricksJdbcConstants` are non-anchored
(`\s+UNION\s+` etc.) and were matched via `find()` inside
`shouldReturnResultSet`, so the keyword was picked up anywhere in the
SQL — even deep inside a subquery of an outer DML, or inside the
Databricks column-exclusion form `SELECT * EXCEPT (col)`.

`executeUpdate()` then threw `DatabricksSQLException`, `getUpdateCount()`
lost the affected-row count, and frameworks like Slick that use
`!execute()` as the DML detector crashed. This also regressed behavior
from the Simba `2.7.5` driver, which returned `false` for these inputs.

Short-circuit `shouldReturnResultSet` to `false` when the trimmed query
starts with a DML keyword, so the non-anchored set-operator patterns
can't fire on subquery content. A separate `DML_PREFIX_PATTERN` is added
so `INSERT_PATTERN` (shared with the batching parser and requiring
`INSERT INTO`) is untouched — this lets the guard also cover
`INSERT OVERWRITE ...`. The existing `NonRowcountQueryPrefixes` opt-in
still wins: it is evaluated before the new short-circuit.

Fixes databricks#1418

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Will rely on the PR's NO_CHANGELOG=true marker until a changelog entry
is ready.

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Statement.execute() incorrectly returns true for DML containing UNION/INTERSECT/EXCEPT, breaking frameworks that rely on !execute() to detect DML

2 participants