Add support for Oracle databases & dialect in PyDough by john-sanchez31 · Pull Request #484 · bodo-ai/PyDough

john-sanchez31 · 2026-02-03T15:42:58Z

Resolves #479

review-notebook-app · 2026-02-03T19:04:05Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

hadia206

Good work John.
Please see my comments below

pydough/database_connectors/builtin_databases.py

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

hadia206 · 2026-03-10T19:46:12Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+        self,
+        args: list[SQLGlotExpression],
+        types: list[PyDoughType],
+    ) -> SQLGlotExpression:


Add a comment explaining why we needed to override the base implementation, i.e. how this differs from the base.

hadia206 · 2026-03-10T19:55:29Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+            )
+        match operator:
+            case pydop.DEFAULT_TO:
+                # sqlglot convert COALESCE in NVL for Oracle, which is fine for


Suggested change

# sqlglot convert COALESCE in NVL for Oracle, which is fine for

# sqlglot convert COALESCE to NVL for Oracle, which is fine for

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

hadia206 · 2026-03-10T20:57:26Z

pydough/sqlglot/sqlglot_relational_expression_visitor.py

+                        sqlglot_expressions.Literal.string("YYYY-MM-DD"),
+                    ],
+                )
+            if isinstance(literal_expression.value, datetime.datetime):


datetime.datetime is a subclass of datetime.date, so a datetime value matches both checks. That means a datetime goes through both branches, and the second silently overwrites the first.

So two things:

1. This should be if/elif, and the check for datetime.datetime should come first:

    if isinstance(literal_expression.value, datetime.datetime):
        ...
    elif isinstance(literal_expression.value, datetime.date):
        ...

2. The same should be applied to the ANSI portion.
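The subclass relationship behind this comment can be reproduced with the standard library alone. The sketch below is only an illustration: `branches_taken_two_ifs` and `branches_taken_if_elif` are hypothetical helpers, not PyDough functions, and they record which branches run rather than building SQLGlot literals.

```python
import datetime


def branches_taken_two_ifs(value: object) -> list[str]:
    """Mimics two independent `if` checks with the date check first."""
    taken = []
    if isinstance(value, datetime.date):
        taken.append("date")
    if isinstance(value, datetime.datetime):
        taken.append("datetime")
    return taken


def branches_taken_if_elif(value: object) -> list[str]:
    """Checks the more specific subclass first, then falls back to date."""
    taken = []
    if isinstance(value, datetime.datetime):
        taken.append("datetime")
    elif isinstance(value, datetime.date):
        taken.append("date")
    return taken


moment = datetime.datetime(2024, 1, 2, 3, 4, 5)
day = datetime.date(2024, 1, 2)

print(branches_taken_two_ifs(moment))  # ['date', 'datetime']
print(branches_taken_if_elif(moment))  # ['datetime']
print(branches_taken_if_elif(day))     # ['date']
```

With two plain `if` statements a datetime value runs both branches, so whichever assignment comes last wins; the if/elif form runs exactly one.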

hadia206 · 2026-03-10T21:00:47Z

pydough/sqlglot/sqlglot_relational_expression_visitor.py

+                        "PyDough does not yet support datetime values with a timezone"
+                    )
+                literal = sqlglot_expressions.Anonymous(
+                    this="TO_DATE",


Why not TO_TIMESTAMP like oracle_transform_bindings?

At first I was using TO_TIMESTAMP, but that caused problems with some operations because of the datatypes. TO_DATE is more consistent.

As long as we've confirmed this is working as expected for timestamps, I'm ok with this.

In Oracle, it seems that the DATE type stores year, month, day, hour, minute, and second. It does not store fractional seconds or time zone information, which TIMESTAMP does. The issue is that TIMESTAMP has to be handled differently when adding and subtracting.

Ok. This means sub-second precision is silently truncated, and comparisons against TIMESTAMP columns will also drop those sub-seconds. So we need to document that in the code (why TO_DATE was chosen instead of TO_TIMESTAMP) and also for the user.
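To make the truncation concrete, here is a minimal sketch of rendering a Python datetime as a TO_DATE call. The helper name `oracle_to_date_sql` and the `'YYYY-MM-DD HH24:MI:SS'` format mask are assumptions for illustration; the thread does not show the format the PR actually uses for datetime values.

```python
import datetime


def oracle_to_date_sql(value: datetime.datetime) -> str:
    """Hypothetical helper: render a naive datetime as an Oracle TO_DATE call."""
    if value.tzinfo is not None:
        raise ValueError("timezone-aware datetimes are not handled in this sketch")
    # The format stops at whole seconds, so any microseconds are discarded here.
    text = value.strftime("%Y-%m-%d %H:%M:%S")
    return f"TO_DATE('{text}', 'YYYY-MM-DD HH24:MI:SS')"


stamp = datetime.datetime(2024, 3, 10, 21, 0, 47, 123456)
print(oracle_to_date_sql(stamp))
# TO_DATE('2024-03-10 21:00:47', 'YYYY-MM-DD HH24:MI:SS')
```

The `.123456` fraction never reaches the SQL text, which is the behavior the reviewers ask to have documented.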

hadia206 · 2026-03-10T21:09:32Z

pydough/sqlglot/override_unnest_subqueries.py

+                # PYDOUGH CHANGE: ignore SYSTIMESTAMP since it is a special case
+                # of a column that should not be considered external
+                if isinstance(c.this, exp.Identifier)
+                and c.this.this != "SYSTIMESTAMP"


Is it special for other dialects too? Let's confirm that.
If it's not, we need to figure out how to make this change only for Oracle,

or clearly document that SYSTIMESTAMP is not recommended as a column name because ...

hadia206 · 2026-03-10T21:10:43Z

pydough/sqlglot/override_unnest_subqueries.py

+        external_columns = get_scope_external_columns(scope)
        if not parent:
            continue
        if scope.external_columns:


Shouldn't this be updated too?

Suggested change

if scope.external_columns:

if external_columns:

Don't forget this

hadia206 · 2026-03-10T21:17:18Z

tests/conftest.py

        pytest.param(DatabaseDialect.SNOWFLAKE, id="snowflake"),
        pytest.param(DatabaseDialect.MYSQL, id="mysql"),
        pytest.param(DatabaseDialect.POSTGRES, id="postgres"),
+        pytest.param(DatabaseDialect.ORACLE, id="oracle"),


Add Oracle to all_dialects_tpch_db_context as well.
See how it's used for division by zero test.
We need to start moving our tests to use that so that it's one function used for all. Perhaps you can do that for one test.

Pick your poison: test_pipeline_e2e_oracle_tpch or test_pipeline_e2e_oracle_tpch_custom, and I'll do the other one in my PR 🥲

Yes, we should be able to start combining a few things across the PyDough pytest testing infrastructure, since we have a lot of duplicated code across our common test sets (tpch, tpch custom, defog, defog custom).

knassre-bodo

Almost everything looks good! A few of Hadia's comments still need to be addressed, and I'll withhold approval until CI is passing stably after the changes (at which point I'll do a final once-over in case we've missed anything), but great job @john-sanchez31!

knassre-bodo · 2026-03-13T03:22:37Z

pydough/sqlglot/sqlglot_relational_expression_visitor.py

+                        "PyDough does not yet support datetime values with a timezone"
+                    )
+                literal = sqlglot_expressions.Anonymous(
+                    this="TO_DATE",


As long as we've confirmed this is working as expected for timestamps, I'm ok with this.

knassre-bodo · 2026-03-13T03:23:56Z

pydough/sqlglot/sqlglot_relational_visitor.py

@@ -420,6 +421,9 @@ def visit_join(self, join: Join) -> None:
        if join_type == "SEMI" and join.cardinality.singular:
            join_type == "INNER"


Let's also fix the == to an = here, not sure how that one slipped through.
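For readers unfamiliar with why this typo matters: a bare comparison used as a statement is evaluated and thrown away, so the join type is never rewritten. The sketch below isolates that behavior with two hypothetical functions; it is not the PyDough `visit_join` code.

```python
def normalize_join_type_buggy(join_type: str, singular: bool) -> str:
    if join_type == "SEMI" and singular:
        join_type == "INNER"  # comparison: evaluates to False and is discarded
    return join_type


def normalize_join_type_fixed(join_type: str, singular: bool) -> str:
    if join_type == "SEMI" and singular:
        join_type = "INNER"  # assignment: actually rewrites the join type
    return join_type


print(normalize_join_type_buggy("SEMI", True))  # SEMI
print(normalize_join_type_fixed("SEMI", True))  # INNER
```

Because the fix changes which join type is emitted for singular semi joins, generated reference SQL would be expected to change as well.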

knassre-bodo · 2026-03-13T03:26:46Z

pydough/database_connectors/builtin_databases.py

+    connection: oracledb.connection
+    if connection := kwargs.pop("connection", None):
+        # If a connection object is provided, return it wrapped in
+        # DatabaseConnection


Instead of pre-declaring the type, I'd assert that the type is oracledb.connection before returning.
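A rough sketch of the suggested pattern follows. It uses `sqlite3.Connection` from the standard library as a stand-in for the Oracle connection type so that it runs without extra dependencies, and `load_connection` is a hypothetical name, not the function in `builtin_databases.py`.

```python
import sqlite3


def load_connection(**kwargs) -> sqlite3.Connection:
    """Return a caller-supplied connection if given, otherwise open a new one."""
    if connection := kwargs.pop("connection", None):
        # Validate at runtime instead of only annotating the name beforehand.
        assert isinstance(connection, sqlite3.Connection), (
            f"expected sqlite3.Connection, got {type(connection).__name__}"
        )
        return connection
    return sqlite3.connect(kwargs.pop("database", ":memory:"))


existing = sqlite3.connect(":memory:")
print(load_connection(connection=existing) is existing)  # True
```

A bare annotation such as `connection: SomeType` performs no check at runtime, whereas the assertion rejects a wrongly typed argument before it is wrapped and returned.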

knassre-bodo · 2026-03-13T03:28:28Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+        return False
+
+    @property
+    def oracle_strftime_mapping(self) -> dict[str, str]:


Don't forget about this

knassre-bodo · 2026-03-13T03:29:54Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+        }
+
+    PYDOP_TO_ORACLE_FUNC: dict[pydop.PyDoughExpressionOperator, str] = {
+        pydop.ABS: "ABS",


What about LPAD and RPAD?

knassre-bodo · 2026-03-13T04:09:52Z

tests/test_pydough_functions/simple_pydough_functions.py

-        QUARTER("2023-10-01"),  # Q4
-        QUARTER("2023-11-15"),  # Q4
-        QUARTER("2023-12-31"),  # Q4
+        QUARTER(DATETIME("2023-01-15")),  # Q1


These changes should not be necessary; functions like QUARTER should already generate such a cast if needed with make_datetime_arg

knassre-bodo · 2026-03-13T04:10:23Z

tests/test_pydough_functions/simple_pydough_functions.py

+            quarters_since_1995=DATEDIFF("quarter", DATETIME("1995-01-01"), order_date),
+            quarters_until_2000=DATEDIFF("quarter", order_date, DATETIME("2000-01-01")),


knassre-bodo · 2026-03-13T04:11:55Z

tests/test_sql_refsols/defog_academic_gen14_sqlite.sql

I think you may need to redo the sql generation because of this test causing it to fail.

knassre-bodo · 2026-03-13T04:12:47Z

tests/test_sql_refsols/defog_academic_gen23_ansi.sql

+WITH _u_0 AS (
+  SELECT
+    oid AS _u_1
+  FROM main.organization
+  GROUP BY
+    1
+)


Yeah, make sure to change that == to an = then redo the sql generation because of stuff like this

knassre-bodo · 2026-03-13T04:27:32Z

pydough/database_connectors/builtin_databases.py

-    supported_databases = {"postgres", "mysql", "sqlite", "snowflake"}
+    supported_databases = {"postgres", "mysql", "sqlite", "snowflake", "oracle"}


Whoops, forgot to add BodoSQL here. Do you mind adding that as well?

hadia206

Great work John. Almost there.
I still have a couple of comments

hadia206 · 2026-03-17T20:20:45Z

pydough/sqlglot/transform_bindings/base_transform_bindings.py

+                        this="CHAR",
+                        expressions=[sqlglot_expressions.Literal.number(13)],
+                    ),  # carriage return
+                    sqlglot_expressions.Literal.string(" "),


nit: add comment like others

hadia206 · 2026-03-17T20:24:40Z

pydough/sqlglot/transform_bindings/mysql_transform_bindings.py


        replacement_expr: SQLGlotExpression = (
-            sqlglot_expressions.Literal.string("\\s") if len(args) == 1 else args[1]
+            sqlglot_expressions.Literal.string("\\s\\t\\r\\n")


I believe in MySQL \\s already covers all of them, so you don't need to specify each separately.
The regex engine natively understands \\s as a shorthand for the entire set of whitespace characters, including tabs and newlines.
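As a quick sanity check of the shorthand, the sketch below uses Python's `re` module, where `\s` also covers tabs, carriage returns and newlines. This is only an analogy: it demonstrates Python's engine, not MySQL's, so the MySQL behavior should still be confirmed against a real database.

```python
import re

text = " \t\r\nhello world\n\t "

# Character class built from the whitespace shorthand alone.
stripped_short = re.sub(r"^[\s]+|[\s]+$", "", text)
# Character class that also lists tab, carriage return and newline explicitly.
stripped_long = re.sub(r"^[\s\t\r\n]+|[\s\t\r\n]+$", "", text)

print(repr(stripped_short))             # 'hello world'
print(stripped_short == stripped_long)  # True
```

In this engine the extra `\t\r\n` entries are redundant, which is the point the comment makes about MySQL.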

hadia206 · 2026-03-17T20:31:11Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+                    if step_idx != 1:
+                        raise ValueError(
+                            "SLICE function currently only supports the step being integer literal 1 or absent."
+                        )


This is what I mean and in this case you can be more specific with error message

if not isinstance(step, sqlglot_expressions.Null):
    if isinstance(step, sqlglot_expressions.Literal):
        try:
            step_idx = int(step.this)
        except ValueError:
            raise ValueError(
                "SLICE function currently only supports the step being integer literal 1 or absent, got non-integer literal."
            )
        if step_idx != 1:
            raise ValueError(
                "SLICE function currently only supports the step being integer literal 1 or absent, got value > 1."
            )
    else:
        raise ValueError(...)

hadia206 · 2026-03-17T20:35:46Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+        if isinstance(types[0], StringType):
+            args = [
+                sqlglot_expressions.Coalesce(
+                    this=arg if arg.this != "" else sqlglot_expressions.Null(),


Apologies, I had it the other way around. The isinstance guard was suggested to avoid accessing .this on a non-Literal expression without type checking.

Suggested change

this=arg if arg.this != "" else sqlglot_expressions.Null(),

this=arg if not (isinstance(arg, sqlglot_expressions.Literal) and arg.this == "") else sqlglot_expressions.Null(),

hadia206 · 2026-03-17T20:39:08Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+        uses TO_CHAR with Oracle-specific date format semantics.
+        """
+        if len(args) == 1:
+            # Length defaults to 4000 which is the max length of a VARCHAR2 in


What happens if arg has more than 4000? Would it truncate, error,...? We need to document that to the user

Changed the implementation. It now uses TO_CHAR, so the limit is no longer needed.

hadia206 · 2026-03-17T20:58:17Z

tests/test_pipeline_bodosql.py

        bodo.user_logging.restore_default_bodo_verbose_logger()


-@pytest.mark.bodosql


bodosql has not been added to all_dialects_tpch_db_context. Add it; if there are problems, revert that deletion.

hadia206 · 2026-03-17T21:00:34Z

tests/test_session.py

    )()

-    # DATABASE mode: Snowflake/Postgres throw errors, SQLite/MySQL return NULL
+    # DATABASE mode: Snowflake/Postgres/Oracle throw errors, SQLite/MySQL return NULL


Need to check it for BodoSQL too. It was missed because BodoSQL was not added to all_dialects_tpch_db_context.

hadia206 · 2026-03-17T21:01:27Z

pyproject.toml

@@ -46,6 +47,7 @@ mysql = ["mysql-connector-python==9.5.0"]
 postgres = ["psycopg2-binary"]
 server = ["fastapi", "httpx", "uvicorn"]
 bodo = ["bodo>=2026.2", "bodosql>=2026.2", "pyiceberg[pyiceberg-core]==0.10.0"]


Not related to the PR but I think this should be bodosql for consistency. Could you do that change too? Thanks!

hadia206 · 2026-03-17T21:07:26Z

pydough/sqlglot/transform_bindings/oracle_transform_bindings.py

+        """
+        String joining must be implemented manually using the `||` operator with
+        explicit NULL handling via NVL to match CONCAT_WS semantics
+        for Oracle.


NOTE:
Other dialects use CONCAT_WS which skips NULLs ('a,b'), but this implementation uses NVL(arg, '') which replaces NULLs with empty string ('a,,b').
Since the other dialects agree on skip-NULL semantics, Oracle should match.

So let's update that and add a test case covering NULL arguments in JOIN_STRINGS.

Also, document that in the user docs.

Not sure about this one. I tried JOIN_STRINGS(",", "a", None, "b") and in all dialects the SQL result was NULL, so Oracle does match for NULL values. The same happens when '' (empty string) is used. Added a column to string_functions with a None value.
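The two semantics the reviewer contrasts can be modeled in a few lines of plain Python. This sketch only mirrors the descriptions in the comment (skip NULLs versus replace NULLs with an empty string); it does not execute SQL, and the author's reply reports a different observed result (NULL) from the generated queries, so treat it as a model of the concern rather than of actual database output.

```python
from typing import Optional


def join_skip_nulls(sep: str, *parts: Optional[str]) -> str:
    """CONCAT_WS-style behavior as described above: NULL arguments are skipped."""
    return sep.join(p for p in parts if p is not None)


def join_nvl_empty(sep: str, *parts: Optional[str]) -> str:
    """NVL(arg, '') behavior: NULLs become empty strings but keep their separator."""
    return sep.join("" if p is None else p for p in parts)


print(join_skip_nulls(",", "a", None, "b"))  # a,b
print(join_nvl_empty(",", "a", None, "b"))   # a,,b
```

A test case with a NULL argument in the middle, as requested, is exactly what would distinguish these two behaviors.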

hadia206 · 2026-03-17T21:16:07Z

pydough/sqlglot/execute_relational.py

+    for arg in expr.iter_expressions():
+        quote_oracle_identifiers(arg, dialect)


A thought

Is this correct for replaced identifiers? After expr.replace(new_identifier), the subsequent for arg in expr.iter_expressions() iterates over the detached expr's children rather than new_identifier's children.
This is OK since Identifier nodes have no sub-expressions, but is it safe to rely on this?

knassre-bodo

Nothing bad jumps out to me; once everything is passing, and you address that one comment I left, feel free to merge.

knassre-bodo · 2026-03-19T17:16:13Z

pydough/sqlglot/sqlglot_relational_expression_visitor.py

+        date: datetime.date
+        dt: datetime.datetime
        if self._dialect == DatabaseDialect.ANSI:
-            if isinstance(literal_expression.value, datetime.date):
-                date: datetime.date = literal_expression.value
+            if isinstance(literal_expression.value, datetime.datetime):
+                dt = literal_expression.value


We may want to move a lot of this function's core details into the bindings (e.g. create a self.bindings.convert_datetime_literal(...) so we can control it at a dialect level)
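One possible shape for that refactor is sketched below. Everything here is hypothetical: the class names, the string-based output, and the format masks are assumptions for illustration, whereas the real bindings would return SQLGlot expressions.

```python
import datetime


class BaseBindingsSketch:
    """Hypothetical stand-in for a base transform-bindings class (ANSI-style output)."""

    def convert_datetime_literal(self, value: datetime.date) -> str:
        # Check datetime.datetime first because it subclasses datetime.date.
        if isinstance(value, datetime.datetime):
            text = value.strftime("%Y-%m-%d %H:%M:%S")
            return f"CAST('{text}' AS TIMESTAMP)"
        return f"CAST('{value.isoformat()}' AS DATE)"


class OracleBindingsSketch(BaseBindingsSketch):
    """Hypothetical Oracle override that emits TO_DATE for both value kinds."""

    def convert_datetime_literal(self, value: datetime.date) -> str:
        if isinstance(value, datetime.datetime):
            text = value.strftime("%Y-%m-%d %H:%M:%S")
            return f"TO_DATE('{text}', 'YYYY-MM-DD HH24:MI:SS')"
        return f"TO_DATE('{value.isoformat()}', 'YYYY-MM-DD')"


def render_literal(bindings: BaseBindingsSketch, value: datetime.date) -> str:
    """The visitor only delegates; dialect differences live in the bindings."""
    return bindings.convert_datetime_literal(value)


print(render_literal(BaseBindingsSketch(), datetime.date(1995, 1, 1)))
print(render_literal(OracleBindingsSketch(), datetime.date(1995, 1, 1)))
```

With this layout the visitor no longer branches on the dialect, and adding a new dialect means overriding one method.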

john-sanchez31 added 2 commits February 3, 2026 09:40

Initial commit

89a3425

Oracle database connector

85de6c7

john-sanchez31 added 27 commits February 3, 2026 16:13

tpch 1 demo added

a52e3eb

example modified

6957a69

oracle transform bindings base

f3a0feb

oracle test infrastructure

68b8a5f

refsol oracle files added

af42e17

WIP: slice and get_part

69478a6

Merge branch 'main' into John/oracle

064fa84

adding oracle refsol files

17d4a95

quote identifiers starting with underscore, quote alias of quoted columns

17334ad

coalesce fixed and metadata fixed

ce5227e

Merge branch 'main' into John/oracle

8165fd0

datediff implementation, dayofweek and daysfromstart

cefdbcc

dates fixed, refsol files updated

23017f7

Conflicts solved

847401a

adding oracle defog fixture

c36871b

fixing quoted alias, adding dataframe collection oracle refsol

a179a9e

fix for anti joins in oracle and refsol updated

e6b912a

fixing cast datetime, and trunc

b8ef6dd

quarter truncation fixed, add_months added

2995451

dataframe and range collection support

524fd30

week_offset fixed, df quoted name

2fb20ac

get part implementation

b6acba4

STRCOUNT oracle implementation

0b63c20

Strip, smallest, largest, variance, std, diff quarters, diff weeks, diff days, trunc week, trunc day, join_strings, extract quarters, cast int

f0ea15f

[run ci] [run dialects]

fc17d41

main merged and conflicts solved

90df7d0

updating refsol files, test metadata fixed [run ci][run dialects]

d15086c

john-sanchez31 requested review from hadia206, juankx-bodo and knassre-bodo and removed request for a team March 9, 2026 15:50

hadia206 reviewed Mar 10, 2026

john-sanchez31 added 6 commits March 12, 2026 08:30

addressing comments, updating refsols

a28fb1f

adding division by zero tests for oracle [run ci][run dialects]

d3c09ba

fixing wrong refsol [run ci][run dialects]

8b6aabc

tpch fixture refactorization [run ci][run dialects]

2a8ad9b

conflicts solved [run ci][run dialects]

74b0818

updating refsol oracle academic_gen14 [run ci]

022e200

knassre-bodo reviewed Mar 13, 2026

john-sanchez31 added 3 commits March 16, 2026 16:39

fixing last details PR [run ci][run dialects]

c839a6d

one more try [run ci][run dialects]

6e8a561

testing [run ci][run postgres][run bodosql][run mysql][run oracle][run s3]

24f3b04

john-sanchez31 requested review from hadia206 and knassre-bodo March 16, 2026 23:36

john-sanchez31 added 5 commits March 16, 2026 17:57

fixing strip [run ci][run postgres][run bodosql][run mysql][run oracle][run s3]

b38f0ec

refsol files updated [run ci][run mysql][run postgres][run oracle][run s3]

830aa8f

refsol files updated again [run ci][run mysql][run postgres][run oracle][run s3]

8907168

strip default chars added [run ci][run mysql][run postgres][run oracle][run s3]

5a8fe2f

testing [run ci][run dialects]

63990d3

hadia206 reviewed Mar 17, 2026

john-sanchez31 added 3 commits March 19, 2026 08:26

Addressing review comments [run ci][run dialects]

ce873ac

adding fixture [run ci][run dialects]

2e20048

adding required import [run ci][run dialects]

2405283

knassre-bodo approved these changes Mar 19, 2026

convert_literal_expression per dialect implementation [run ci][run dialects]

856d05a

john-sanchez31 requested a review from hadia206 March 20, 2026 14:00

adding note about quotes on oracle column/table names [run ci][run oracle]

86054ab
