Iceberg TIME type unit changes depending on partition spec in DataLakeCatalog #1535

@alsugiliazova

Description

Describe what's wrong

The issue is present on the 25.8 and 26.1 Altinity Antalya builds, but not on 25.8 and 26.1 upstream.

Reading an Iceberg table with a TIME column via ClickHouse returns values in different units depending on the partition spec, even though the schema and data are identical.
The same values are returned as seconds for the non-partitioned table and as microseconds for the partitioned one.

Does it reproduce on the most recent release?

Yes

How to reproduce

  1. Create two Iceberg tables with the same schema and data

  2. Schema includes a TIME column (TimeType)

  3. The only difference is the partition spec: empty vs. an identity partition on the string column. (Partitioning by the time column did not work for me; see "Can not read from Iceberg table that was partitioned by Time column", ClickHouse/ClickHouse#94685.)

  4. Read both tables via DataLakeCatalog

Non partitioned table:

SELECT time_column FROM iceberg_database.`namespace.table` ORDER BY tuple(*) FORMAT TabSeparated
43200
46800
50400

Partitioned table:

SELECT time_column FROM iceberg_database.`namespace.table` ORDER BY tuple(*) FORMAT TabSeparated
43200000000
46800000000
50400000000
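To make the mismatch concrete, here is a minimal sketch (plain Python, independent of ClickHouse) that converts the three test times into seconds and into microseconds since midnight. The helper names are hypothetical and exist only for illustration. Iceberg's `time` type has microsecond precision, so the two outputs above look like the same values expressed in different units.

```python
from datetime import time

MICROS_PER_SECOND = 1_000_000


def seconds_since_midnight(t: time) -> int:
    """Whole seconds elapsed since 00:00:00 (hypothetical helper)."""
    return t.hour * 3600 + t.minute * 60 + t.second


def micros_since_midnight(t: time) -> int:
    """Microseconds elapsed since 00:00:00 (hypothetical helper)."""
    return seconds_since_midnight(t) * MICROS_PER_SECOND + t.microsecond


# The same three values that the reproduction script writes to the table.
test_times = [time(12, 0, 0), time(13, 0, 0), time(14, 0, 0)]

as_seconds = [seconds_since_midnight(t) for t in test_times]
as_micros = [micros_since_midnight(t) for t in test_times]

print(as_seconds)  # [43200, 46800, 50400]
print(as_micros)   # [43200000000, 46800000000, 50400000000]
```

The first list matches the non-partitioned output and the second matches the partitioned output.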

Expected behavior

Partition spec differences must not change the semantic representation of Iceberg TIME values.
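One way to express this expectation as a check is sketched below. It assumes the two result sets have already been fetched into Python lists of integers; `uniform_scale_factor` is a hypothetical helper, not part of any ClickHouse or Iceberg API. It reports whether one result set is a constant multiple of the other, which is the signature of a unit mismatch rather than a data difference.

```python
from typing import Optional, Sequence


def uniform_scale_factor(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return k if b[i] == k * a[i] for every row, otherwise None."""
    if len(a) != len(b) or not a:
        return None
    if a[0] == 0 or b[0] % a[0] != 0:
        return None
    k = b[0] // a[0]
    return k if all(y == k * x for x, y in zip(a, b)) else None


# Values copied from the two query outputs in this report.
unpartitioned = [43200, 46800, 50400]
partitioned = [43200000000, 46800000000, 50400000000]

# Expected behavior: both reads return identical values.
same_representation = unpartitioned == partitioned
print(same_representation)  # False

# Observed behavior: the reads differ by a constant factor.
factor = uniform_scale_factor(unpartitioned, partitioned)
print(factor)  # 1000000
```

A factor of 1000000 corresponds to seconds vs. microseconds; with the bug fixed, the equality check should pass for both tables.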

Error message and/or stacktrace

No response

Additional context

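from datetime import time

import pyarrow as pa
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.table.sorting import SortOrder
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType, TimeType

# column_name and table are defined elsewhere in the test (not shown here).

# Partitioned variant (identity partition on the string column):
# partition_spec = PartitionSpec(
#     PartitionField(source_id=2, field_id=1001, transform=IdentityTransform(), name="name")
# )
partition_spec = PartitionSpec()  # non-partitioned variant
sort_order = SortOrder()
schema = Schema(
    NestedField(field_id=1, name=column_name, field_type=TimeType(), required=False),
    NestedField(field_id=2, name="name", field_type=StringType(), required=False),
)

test_data = [
    {column_name: time(12, 0, 0), "name": "test"},
    {column_name: time(13, 0, 0), "name": "test2"},
    {column_name: time(14, 0, 0), "name": "test3"},
]
df = pa.Table.from_pylist(
    test_data,
    schema=pa.schema([(column_name, pa.time64("us")), ("name", pa.string())]),
)
table.append(df)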

Metadata

    Labels

    antalya, bug (Something isn't working)