Fix DML misclassification for statements containing UNION/INTERSECT/EXCEPT #1420
Open
msrathore-db wants to merge 2 commits into databricks:main
…XCEPT

`Statement.execute()` was incorrectly returning `true` (and `getUpdateCount()` returning `-1`) for `INSERT` / `UPDATE` / `DELETE` / `MERGE` statements whose subqueries or CTEs contained `UNION`, `INTERSECT`, or `EXCEPT`. The `UNION_PATTERN`, `INTERSECT_PATTERN`, and `EXCEPT_PATTERN` regexes in `DatabricksJdbcConstants` are non-anchored (`\s+UNION\s+` etc.) and were matched via `find()` inside `shouldReturnResultSet`, so the keyword was picked up anywhere in the SQL, even deep inside a subquery of an outer DML, or inside the Databricks column-exclusion form `SELECT * EXCEPT (col)`. `executeUpdate()` then threw `DatabricksSQLException`, `getUpdateCount()` lost the affected-row count, and frameworks like Slick that use `!execute()` as the DML detector crashed. This also regressed behavior from the Simba `2.7.5` driver, which returned `false` for these inputs.

Short-circuit `shouldReturnResultSet` to `false` when the trimmed query starts with a DML keyword, so the non-anchored set-operator patterns can't fire on subquery content. A separate `DML_PREFIX_PATTERN` is added so `INSERT_PATTERN` (shared with the batching parser and requiring `INSERT INTO`) is untouched; this lets the guard also cover `INSERT OVERWRITE ...`. The existing `NonRowcountQueryPrefixes` opt-in still wins: it is evaluated before the new short-circuit.

Fixes databricks#1418

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Will rely on the PR's NO_CHANGELOG=true marker until a changelog entry is ready.

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
gopalldb approved these changes on Apr 23, 2026
Summary
Fixes #1418.
`Statement.execute()` was incorrectly returning `true` (and `getUpdateCount()` returning `-1`) for `INSERT` / `UPDATE` / `DELETE` / `MERGE` statements whose subqueries or CTEs contained `UNION`, `INTERSECT`, or `EXCEPT`. The `UNION_PATTERN`, `INTERSECT_PATTERN`, and `EXCEPT_PATTERN` regexes in `DatabricksJdbcConstants` are non-anchored (`\s+UNION\s+` etc.) and were matched via `find()` inside `shouldReturnResultSet`, so the keyword was picked up anywhere in the SQL, even deep inside a subquery of an outer DML, or inside the Databricks column-exclusion form `SELECT * EXCEPT (col)`.

Downstream:

- `executeUpdate()` threw `DatabricksSQLException` because `shouldReturnResultSet==true` triggers the post-execution guard at `DatabricksStatement.java:103-108`.
- `getUpdateCount()` returned `-1` because `executeInternal` forces `updateCount = -1` when `shouldReturnResultSet==true`.
- Frameworks that use `!execute()` as the DML detector (e.g. Slick's `sqlu`) crashed on perfectly normal DML.
- This regressed behavior from the Simba `2.7.5` driver, which returned `false` for these inputs.

Fix
Short-circuit `shouldReturnResultSet` to `false` when the trimmed query starts with a DML keyword, so the non-anchored set-operator patterns can't fire on subquery content.

A separate `DML_PREFIX_PATTERN` (matching `^(\s*\()*\s*(INSERT|UPDATE|DELETE|MERGE)\s+`) is added so the existing `INSERT_PATTERN`, which is shared with `InsertStatementParser` and requires `INSERT INTO`, is untouched. This also lets the new guard cover `INSERT OVERWRITE ...` (called out in the issue).

The existing `NonRowcountQueryPrefixes` opt-in still wins: it is evaluated before the new short-circuit, so a user who has explicitly set e.g. `NonRowcountQueryPrefixes=INSERT` still gets `ResultSet` mode for INSERTs.

Why this is safe
- Each existing top-level `UNION` / `INTERSECT` / `EXCEPT` test case continues to classify as ResultSet via the already-anchored `SELECT_PATTERN` / `WITH_PATTERN` / `VALUES_PATTERN` / `FROM_PATTERN`. A regression test for `(SELECT ...) UNION (SELECT ...)` is included.
- Plain DML (`INSERT INTO t VALUES (1)`, `UPDATE t SET c=1`, `DELETE FROM t`, `MERGE INTO ...`) previously returned `false` because none of the OR-chain patterns matched; it continues to return `false` now, via the short-circuit.
- `INSERT_PATTERN` is not modified, so `InsertStatementParser` / `EnableBatchedInserts` are unaffected.
- All tests in `jdbc-core` pass.

Test plan
- Tests in `DatabricksStatementTest` covering:
  - `INSERT ... SELECT ... FROM (... UNION ALL ...)`
  - `INSERT ... FROM (... INTERSECT ...)`
  - `INSERT ... FROM (... EXCEPT ...)`
  - `INSERT ... SELECT * EXCEPT (col) FROM src` (Databricks column-exclusion)
  - `INSERT OVERWRITE DIRECTORY ... INTERSECT ...`
  - `UPDATE ... (... UNION ...)`
  - `DELETE ... (... EXCEPT ...)`
  - `MERGE INTO ... USING (... UNION ...) ...`
  - `NonRowcountQueryPrefixes=INSERT` opt-in still forces ResultSet mode
  - `(SELECT ...) UNION (SELECT ...)` still classified as ResultSet (regression guard)
- `DatabricksStatementTest` cases pass (including top-level `UnionQuery` / `IntersectQuery` / `ExceptQuery`).
- `DatabricksPreparedStatementTest` cases pass.
- `InsertStatementParserTest` cases pass (batching parser untouched).
- `mvn test -pl jdbc-core` clean: 3305 tests, 0 failures, 0 errors.

NO_CHANGELOG=true
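
For readers who want to see the misclassification and the short-circuit side by side, here is a minimal standalone sketch. It is not the driver's code: the class `DmlClassificationSketch`, the methods `classifyBefore` and `classifyAfter`, the anchored `SELECT` stand-in, the case-insensitive flag, and the `startsWith` handling of `NonRowcountQueryPrefixes` are all assumptions made for illustration. Only the `\s+UNION\s+` style set-operator patterns and the `DML_PREFIX_PATTERN` regex are taken from the description above.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Illustrative sketch only; names and structure are hypothetical, not the driver's. */
public class DmlClassificationSketch {

  private static final int CI = Pattern.CASE_INSENSITIVE; // assumed flag

  // Non-anchored set-operator patterns, in the form quoted in the description.
  static final Pattern UNION = Pattern.compile("\\s+UNION\\s+", CI);
  static final Pattern INTERSECT = Pattern.compile("\\s+INTERSECT\\s+", CI);
  static final Pattern EXCEPT = Pattern.compile("\\s+EXCEPT\\s+", CI);

  // Hypothetical stand-in for the already-anchored SELECT_PATTERN.
  static final Pattern SELECT = Pattern.compile("^(\\s*\\()*\\s*SELECT\\s+", CI);

  // DML prefix regex as given in the description.
  static final Pattern DML_PREFIX =
      Pattern.compile("^(\\s*\\()*\\s*(INSERT|UPDATE|DELETE|MERGE)\\s+", CI);

  /** Old behavior: find() lets a set operator match anywhere in the SQL. */
  static boolean classifyBefore(String sql) {
    String q = sql.trim();
    return SELECT.matcher(q).find()
        || UNION.matcher(q).find()
        || INTERSECT.matcher(q).find()
        || EXCEPT.matcher(q).find();
  }

  /** New behavior: opt-in prefixes first, then the DML short-circuit, then the old chain. */
  static boolean classifyAfter(String sql, List<String> nonRowcountPrefixes) {
    String q = sql.trim();
    String upper = q.toUpperCase(Locale.ROOT);
    // Assumed semantics: a configured prefix forces ResultSet mode.
    for (String prefix : nonRowcountPrefixes) {
      if (upper.startsWith(prefix.toUpperCase(Locale.ROOT))) {
        return true;
      }
    }
    // Outer statement is DML, so subquery keywords must not be consulted.
    if (DML_PREFIX.matcher(q).find()) {
      return false;
    }
    return classifyBefore(q);
  }

  public static void main(String[] args) {
    String insertUnion =
        "INSERT INTO t SELECT id FROM (SELECT id FROM a UNION ALL SELECT id FROM b)";
    String insertExcept = "INSERT INTO t SELECT * EXCEPT (col) FROM src";
    String topLevelUnion = "(SELECT 1) UNION (SELECT 2)";

    System.out.println(classifyBefore(insertUnion));                    // true (the bug)
    System.out.println(classifyAfter(insertUnion, List.of()));          // false
    System.out.println(classifyAfter(insertExcept, List.of()));         // false
    System.out.println(classifyAfter(topLevelUnion, List.of()));        // true
    System.out.println(classifyAfter(insertUnion, List.of("INSERT")));  // true (opt-in wins)
  }
}
```

The ordering inside `classifyAfter` mirrors the precedence described under Fix: the opt-in check runs before the DML short-circuit, and the set-operator patterns are reached only when the outer statement is not DML.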