
fix(table): Athena/Trino compatibility — fast-append data loss and legacy manifest field names #4

Open

cassio-paesleme wants to merge 5 commits into docker:main from cassio-paesleme:fix/fast-append-v3

Conversation


cassio-paesleme commented Apr 10, 2026

Summary

Four changes required for correct interoperability between iceberg-go and Athena/Trino-written tables.


1. feat: make Parquet root schema repetition configurable (Harrison Crosse)

arrow-go defaults the Parquet root schema element to Repeated. Snowflake interprets this as one-level list encoding and rejects files with list columns.

Adds write.parquet.root-repetition table property (default: required), aligning iceberg-go with arrow-rs, pyarrow, and parquet-java.

Submitted upstream: apache#896
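For concreteness, a minimal sketch of how the property could be resolved on the write path. The property key and default come from this PR; the package, helper, and constant names are illustrative, not the upstream implementation.

```go
package props

import "strings"

// Table property key introduced by this change.
const rootRepetitionKey = "write.parquet.root-repetition"

// Accepted values; "required" is the default and matches arrow-rs, pyarrow,
// and parquet-java.
const (
	rootRequired = "required"
	rootOptional = "optional"
	rootRepeated = "repeated"
)

// rootRepetitionFromProps resolves the property, falling back to "required"
// when it is unset or unrecognized.
func rootRepetitionFromProps(tableProps map[string]string) string {
	switch strings.ToLower(tableProps[rootRepetitionKey]) {
	case rootOptional:
		return rootOptional
	case rootRepeated:
		return rootRepeated
	default:
		return rootRequired
	}
}
```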


2. fix: goroutine leak in positionDeleteRecordsToDataFiles (Harrison Crosse)

iter.Pull was called unconditionally, but on the partitioned path its stop function was never called, leaking one goroutine per write.

Already merged upstream: apache#825
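A minimal sketch of the leak pattern and its fix, assuming Go 1.23's iter package. The function shape below is illustrative and is not the actual positionDeleteRecordsToDataFiles code.

```go
package writer

import "iter"

// process illustrates why the placement of iter.Pull matters: Pull starts an
// iterator goroutine that is only released once the sequence is drained or
// stop() is called.
func process(counter iter.Seq[int64], partitioned bool) {
	// Buggy shape: calling iter.Pull here, before the branch, leaks one
	// goroutine per call on the partitioned path, where stop is never invoked:
	//
	//   next, stop := iter.Pull(counter)
	//
	if partitioned {
		// The partitioned writer factory creates its own pull iterator.
		return
	}

	// Fixed shape: only pull on the path that consumes it, and always release it.
	next, stop := iter.Pull(counter)
	defer stop()
	for {
		if _, ok := next(); !ok {
			return
		}
	}
}
```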


3. fix(table): fast-append must inherit all parent manifests unconditionally

fastAppendFiles.existingManifests() filtered parent manifests using HasAddedFiles() || HasExistingFiles(). Both return false for inherited manifests written by Athena/Trino (counts = 0 per spec). Any data written by an external writer was silently dropped on the next iceberg-go append.

Fix: remove the filter; a fast-append never removes data files, so all parent manifests are always inherited.

Open upstream: apache#869
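A self-contained illustration of the behavior change, using simplified stand-in types rather than iceberg-go's real API:

```go
package main

import "fmt"

type manifest struct {
	path       string
	addedFiles int32 // added_files_count, field-id 504
	existing   int32 // existing_files_count, field-id 505
}

func (m manifest) hasAddedFiles() bool    { return m.addedFiles > 0 }
func (m manifest) hasExistingFiles() bool { return m.existing > 0 }

type snapshot struct{ manifests []manifest }

// Before the fix: parent manifests were filtered by count-based predicates.
func existingManifestsOld(parent snapshot) []manifest {
	var keep []manifest
	for _, m := range parent.manifests {
		if m.hasAddedFiles() || m.hasExistingFiles() {
			keep = append(keep, m)
		}
	}
	return keep
}

// After the fix: a fast-append never removes data files, so every parent
// manifest is inherited unconditionally.
func existingManifestsNew(parent snapshot) []manifest {
	return parent.manifests
}

func main() {
	// Inherited manifest as written by Athena/Trino: both counts are zero.
	parent := snapshot{manifests: []manifest{{path: "s3://warehouse/meta/m0.avro"}}}
	fmt.Println("old:", len(existingManifestsOld(parent))) // 0: data silently dropped
	fmt.Println("new:", len(existingManifestsNew(parent))) // 1: data preserved
}
```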


4. refactor: replace per-getter coalescing with post-decode normalization

Pre-1.4 Java Iceberg and Athena use added_data_files_count / existing_data_files_count / deleted_data_files_count as field names (IDs 504/505/506). hamba/avro silently left these at zero when it couldn't match the writer-schema names to struct tags — causing HasAddedFiles() to return false and triggering the data loss in fix #3 above.

Fix: parallel struct fields with legacy avro tags, normalized into canonical fields by a single normalizeLegacyCounts() call after decode. All count getters remain one-liners.

Open upstream: apache#897
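A hedged sketch of the post-decode normalization described above. Field names and Avro tags follow the PR description; the real struct carries many more fields, and the exact types may differ.

```go
package manifest

// manifestFile mirrors a manifest-list entry. The Legacy* fields exist only so
// hamba/avro can bind writer schemas that use the pre-1.4 Java / Athena names
// (added_data_files_count etc., field-ids 504/505/506).
type manifestFile struct {
	AddedFilesCount    int32 `avro:"added_files_count"`
	ExistingFilesCount int32 `avro:"existing_files_count"`
	DeletedFilesCount  int32 `avro:"deleted_files_count"`

	LegacyAddedFilesCount    int32 `avro:"added_data_files_count"`
	LegacyExistingFilesCount int32 `avro:"existing_data_files_count"`
	LegacyDeletedFilesCount  int32 `avro:"deleted_data_files_count"`
}

// normalizeLegacyCounts copies legacy counts into the canonical fields once,
// immediately after decode. The <= 0 check covers both the V2 zero value and
// the V1 -1 (unknown) sentinel, so the count getters stay one-liners.
func (m *manifestFile) normalizeLegacyCounts() {
	if m.AddedFilesCount <= 0 && m.LegacyAddedFilesCount > 0 {
		m.AddedFilesCount = m.LegacyAddedFilesCount
	}
	if m.ExistingFilesCount <= 0 && m.LegacyExistingFilesCount > 0 {
		m.ExistingFilesCount = m.LegacyExistingFilesCount
	}
	if m.DeletedFilesCount <= 0 && m.LegacyDeletedFilesCount > 0 {
		m.DeletedFilesCount = m.LegacyDeletedFilesCount
	}
}
```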


Verification

End-to-end compatibility confirmed against real Athena-written staging tables: docker/data-platform#406

hcrosse added 2 commits March 30, 2026 14:12
Add write.parquet.root-repetition property (required/optional/repeated,
default: required) to control the Parquet root schema element's
repetition type. arrow-go defaults to Repeated, which Snowflake
interprets as one-level list encoding and rejects files with list
columns. Defaulting to Required aligns with the Parquet spec and
matches arrow-rs, pyarrow, and parquet-java behavior.

iter.Pull(args.counter) was called unconditionally, but in the
partitioned path newWriterFactory creates its own iter.Pull and the
original stopCount was never called, leaking one goroutine per write.

Move iter.Pull into the unpartitioned branch where it is actually used.
Add a regression test confirming goroutine count stays stable.
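A hedged sketch of a goroutine-stability check in the spirit of that regression test; the helper name, iteration count, and tolerance are illustrative rather than taken from the actual test.

```go
package table_test

import (
	"runtime"
	"testing"
	"time"
)

// requireStableGoroutines runs op repeatedly and fails if goroutines accumulate.
func requireStableGoroutines(t *testing.T, op func()) {
	t.Helper()
	before := runtime.NumGoroutine()
	for i := 0; i < 50; i++ {
		op()
	}
	runtime.GC()
	time.Sleep(100 * time.Millisecond) // give exiting goroutines time to finish
	if after := runtime.NumGoroutine(); after > before+2 {
		t.Fatalf("goroutine leak: %d before, %d after", before, after)
	}
}
```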
cassio-paesleme force-pushed the fix/fast-append-v3 branch 5 times, most recently from 3b42d06 to b33dcb2 on April 10, 2026 at 15:12
}
previous, err := fa.base.txn.meta.SnapshotByID(fa.base.parentSnapshotID)
if err != nil {
	return nil, fmt.Errorf("could not find parent snapshot %d: %w", fa.base.parentSnapshotID, err)
Nice

fix(table): fast-append must inherit all parent manifests unconditionally

fastAppendFiles.existingManifests() filtered parent manifests using
HasAddedFiles() || HasExistingFiles(). Both methods return false when the
manifest list entry has added_files_count=0 and existing_files_count=0,
which is the standard Iceberg v2 representation for inherited manifests
written by external writers such as Athena, Spark, and Trino.

As a result, any data written by an external writer was silently dropped
from the snapshot on the next iceberg-go fast-append. Queries against
the table after the append returned only the iceberg-go-written rows;
all previously existing data became invisible.

A fast-append never removes or overwrites data files, so the correct
behaviour is to inherit all manifests from the parent snapshot
unconditionally. Remove the filter and return previous.Manifests()
directly.

Fixes: data loss when appending to an Iceberg table that was previously
written by Athena or other external writers.

Tested: new TestFastAppendInheritsZeroCountManifests reproduces the bug
(FAIL before patch, PASS after) and the full ./table/... suite passes
with no regressions.

The previous approach added parallel struct fields (AddedDataFilesCount,
ExistingDataFilesCount, DeletedDataFilesCount) and coalesced them in every
getter. The upstream reviewer correctly flagged this as fragile — every
accessor had to remember the fallback, and any future legacy name would
require N getter changes.

Replace with a single normalizeLegacyCounts() call, invoked once immediately
after Avro decode in both decodeManifests and decodeManifestsWithFallback.
The legacy struct fields are renamed (LegacyAdded/Existing/DeletedFilesCount)
to make their decode-only purpose explicit. All six count getters and both
Has* methods are reverted to true one-liners.

The <= 0 check in normalizeLegacyCounts handles both the V2 zero-value case
and the V1 -1 (unknown/null) sentinel, so no changes to toFile() are needed.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
cassio-paesleme changed the title from "fix(table): fast-append must inherit all parent manifests unconditionally" to "fix(table): Athena/Trino compatibility — fast-append data loss and legacy manifest field names" on Apr 14, 2026

The LegacyAdded/Existing/DeletedFilesCount struct fields remain for
the read path (decoding pre-1.4 Java / Athena files), but the
avro_schemas.go write schemas no longer declare added_data_files_count,
existing_data_files_count, or deleted_data_files_count.

Emitting two fields with the same Iceberg field-id (504/505/506) is a
spec violation. The legacy names are only needed for backward-compatible
reading, not writing.
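For illustration, an abridged sketch of how the write schema's count fields look once only the canonical names carry the field-ids; this is not the literal avro_schemas.go content.

```go
package manifest

// Only the canonical count fields carry field-ids 504-506 in the write schema;
// the legacy *_data_files_count names are accepted on read but never written.
const manifestListCountFields = `
    {"name": "added_files_count",    "type": "int", "field-id": 504},
    {"name": "existing_files_count", "type": "int", "field-id": 505},
    {"name": "deleted_files_count",  "type": "int", "field-id": 506}
`
```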