
feat(table): make Parquet root schema repetition configurable #896

Draft
cassio-paesleme wants to merge 2 commits into apache:main from cassio-paesleme:feat/parquet-root-repetition


@cassio-paesleme
Contributor

Problem

arrow-go defaults the Parquet root schema element repetition to Repeated. Snowflake (and some other readers) interpret Repeated at the root as one-level list encoding and reject files that contain list columns. This causes write failures when targeting Snowflake-managed Iceberg tables.

Fix

Add a write.parquet.root-repetition table property (values: required / optional / repeated, default: required). The default required aligns with the Parquet spec and matches the behaviour of arrow-rs, pyarrow, and parquet-java.

The property is applied in parquetFormat.GetWriteProperties via parquet.WithRootRepetition.
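A minimal sketch of how the property value could be resolved before it is handed to parquet.WithRootRepetition. This is not the code in this PR: the `Repetition` type is a local stand-in for arrow-go's `parquet.Repetition` so the example compiles on its own, the helper name `rootRepetitionFromProps` is hypothetical, and the case-insensitive matching and the error on unknown values are assumptions rather than confirmed behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// Repetition is a local stand-in for arrow-go's parquet.Repetition,
// used so this sketch has no external dependencies.
type Repetition int

const (
	Required Repetition = iota
	Optional
	Repeated
)

// String returns the property spelling of the repetition value.
func (r Repetition) String() string {
	switch r {
	case Required:
		return "required"
	case Optional:
		return "optional"
	case Repeated:
		return "repeated"
	}
	return "unknown"
}

// rootRepetitionKey is the table property named in this PR.
const rootRepetitionKey = "write.parquet.root-repetition"

// rootRepetitionFromProps (hypothetical helper) maps the table property to a
// repetition value. A missing property falls back to Required, the default
// proposed in this PR. Trimming, lowercasing, and rejecting unknown values
// are assumptions made for this sketch.
func rootRepetitionFromProps(props map[string]string) (Repetition, error) {
	raw, ok := props[rootRepetitionKey]
	if !ok {
		return Required, nil
	}
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "required":
		return Required, nil
	case "optional":
		return Optional, nil
	case "repeated":
		return Repeated, nil
	}
	return Required, fmt.Errorf("invalid %s value %q: want required, optional, or repeated",
		rootRepetitionKey, raw)
}

func main() {
	props := map[string]string{rootRepetitionKey: "repeated"}
	rep, err := rootRepetitionFromProps(props)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// In the real writer the resolved value would be passed to
	// parquet.WithRootRepetition when building the writer properties.
	fmt.Println("root repetition:", rep)
}
```

Setting the property to repeated would keep the previous arrow-go behaviour for tables that depend on it, while leaving it unset selects required.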

Why this default is correct

The Parquet spec defines the root message element as a container, not a repeated field. Defaulting to Required is the most interoperable choice and is what arrow-rs, pyarrow, and parquet-java already do. arrow-go's Repeated default is an outlier.

Testing

The existing ./table/... test suite passes. The property is exercised end-to-end in the Docker data platform against Snowflake-managed Iceberg tables (docker/data-platform#406).


Original implementation by @hcrosse.

hcrosse and others added 2 commits April 14, 2026 10:44
Add write.parquet.root-repetition property (required/optional/repeated,
default: required) to control the Parquet root schema element's
repetition type. arrow-go defaults to Repeated, which Snowflake
interprets as one-level list encoding and rejects files with list
columns. Defaulting to Required aligns with the Parquet spec and
matches arrow-rs, pyarrow, and parquet-java behavior.
@hcrosse
Contributor

hcrosse commented Apr 14, 2026

I think this is resolved by apache/arrow-go#723, so it should be fixed when arrow-go cuts a new release & iceberg-go bumps the dependency.

@zeroshade
Member

@cassio-paesleme can you confirm that the update to arrow-go and the bump here solved this issue?

3 participants