Description
Describe what's wrong
The issue is present on Altinity Antalya 25.8 and 26.1, but not on upstream 25.8 and 26.1.
Reading an Iceberg table with a TIME column via ClickHouse returns different units depending on the partition spec, even though the schema and data are identical.
The same values are returned as microseconds in one case and seconds in another.
Does it reproduce on the most recent release?
Yes
How to reproduce
- Create two Iceberg tables with the same schema and data.
- The schema includes a TIME column (TimeType).
- The only difference is the partition spec: empty vs. an identity partition on a string column. (Partitioning by the time column did not work for me; see "Can not read from Iceberg table that was partitioned by Time column", ClickHouse/ClickHouse#94685.)
- Read both tables via DataLakeCatalog.
Non-partitioned table:
SELECT time_column FROM iceberg_database.`namespace.table` ORDER BY tuple(*) FORMAT TabSeparated
43200
46800
50400
Partitioned table:
SELECT time_column FROM iceberg_database.`namespace.table` ORDER BY tuple(*) FORMAT TabSeparated
43200000000
46800000000
50400000000
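To make the unit mismatch concrete, here is a minimal sketch that decodes the two outputs above. It assumes the non-partitioned result is in seconds since midnight and the partitioned result is in microseconds since midnight; the helper `to_time` is hypothetical and only serves this illustration.

```python
from datetime import time

# Values copied from the two query outputs in this report.
non_partitioned = [43200, 46800, 50400]
partitioned = [43200000000, 46800000000, 50400000000]


def to_time(value: int, unit: str) -> time:
    """Interpret an integer offset from midnight as a time of day.

    `unit` is "s" (seconds) or "us" (microseconds); both are assumptions
    about how each query result should be read.
    """
    if unit == "s":
        micros = value * 1_000_000
    elif unit == "us":
        micros = value
    else:
        raise ValueError(f"unsupported unit: {unit}")
    seconds, micro = divmod(micros, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micro)


as_seconds = [to_time(v, "s") for v in non_partitioned]
as_micros = [to_time(v, "us") for v in partitioned]

# Both lists decode to the same wall-clock times only when different
# units are assumed for each table.
print([t.isoformat() for t in as_seconds])
print(as_seconds == as_micros)
```

Under those assumptions the two results differ by exactly a factor of 1,000,000 and describe the same three times of day.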
Expected behavior
Partition spec differences must not change the semantic representation of Iceberg TIME values.
Error message and/or stacktrace
No response
Additional context
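from datetime import time

import pyarrow as pa
from pyiceberg.partitioning import PartitionField, PartitionSpec
from pyiceberg.schema import Schema
from pyiceberg.table.sorting import SortOrder
from pyiceberg.transforms import IdentityTransform
from pyiceberg.types import NestedField, StringType, TimeType

# Partitioned variant: identity partition on the "name" string column.
# partition_spec=PartitionSpec(
#     PartitionField(source_id=2, field_id=1001, transform=IdentityTransform(), name="name")
# )
# Non-partitioned variant: empty partition spec.
partition_spec=PartitionSpec()
sort_order=SortOrder()
schema=Schema(
    NestedField(field_id=1, name=column_name, field_type=TimeType(), required=False),
    NestedField(field_id=2, name="name", field_type=StringType(), required=False),
)
test_data = [
    {column_name: time(12, 0, 0), "name": "test"},
    {column_name: time(13, 0, 0), "name": "test2"},
    {column_name: time(14, 0, 0), "name": "test3"},
]
df = pa.Table.from_pylist(test_data, schema=pa.schema([(column_name, pa.time64("us")), ("name", pa.string())]))
table.append(df)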
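For reference, the test data is written with microsecond precision (`pa.time64("us")`), and the Iceberg `time` type is likewise defined with microsecond precision. The following sketch computes the microseconds-since-midnight value for each test row using only the standard library; `micros_since_midnight` is a hypothetical helper, not part of PyIceberg or PyArrow.

```python
from datetime import time


def micros_since_midnight(t: time) -> int:
    """Return the number of microseconds between midnight and `t`."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


# Same times as in test_data above.
test_times = [time(12, 0, 0), time(13, 0, 0), time(14, 0, 0)]
raw_micros = [micros_since_midnight(t) for t in test_times]

print(raw_micros)  # [43200000000, 46800000000, 50400000000]
```

These values match the output of the partitioned table, which suggests that the non-partitioned read path is the one applying a conversion to seconds.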