fix(table): Athena/Trino compatibility — fast-append data loss and legacy manifest field names #4
Open
cassio-paesleme wants to merge 5 commits into docker:main from
Conversation
Add write.parquet.root-repetition property (required/optional/repeated, default: required) to control the Parquet root schema element's repetition type. arrow-go defaults to Repeated, which Snowflake interprets as one-level list encoding and rejects files with list columns. Defaulting to Required aligns with the Parquet spec and matches arrow-rs, pyarrow, and parquet-java behavior.
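The property-handling behavior described above can be sketched as follows. This is a minimal illustration, not iceberg-go's actual API: the helper name `rootRepetition` and the key constant are assumptions; only the property name, allowed values, and `required` default come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// rootRepetitionKey is the table property from the commit above.
const rootRepetitionKey = "write.parquet.root-repetition"

// rootRepetition resolves the property to one of required/optional/repeated,
// defaulting to "required" per the Parquet spec (matching arrow-rs,
// pyarrow, and parquet-java) when the property is unset.
func rootRepetition(props map[string]string) (string, error) {
	v, ok := props[rootRepetitionKey]
	if !ok || v == "" {
		return "required", nil // spec-aligned default
	}
	switch strings.ToLower(v) {
	case "required", "optional", "repeated":
		return strings.ToLower(v), nil
	}
	return "", fmt.Errorf("invalid %s: %q", rootRepetitionKey, v)
}

func main() {
	r, _ := rootRepetition(map[string]string{}) // unset → default
	fmt.Println(r)
	r, _ = rootRepetition(map[string]string{rootRepetitionKey: "Optional"})
	fmt.Println(r)
}
```

Defaulting to `required` rather than passing through arrow-go's `Repeated` is what keeps Snowflake from mis-reading list columns as one-level encoded.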
iter.Pull(args.counter) was called unconditionally, but in the partitioned path newWriterFactory creates its own iter.Pull and the original stopCount was never called, leaking one goroutine per write. Move iter.Pull into the unpartitioned branch where it is actually used. Add a regression test confirming goroutine count stays stable.
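The leak pattern is easy to reproduce with the standard `iter` package: `iter.Pull` spawns a coroutine that is only reclaimed when `stop` is called or the sequence is exhausted, so creating the pull outside the branch that consumes it strands one goroutine per call. A minimal sketch of the corrected shape (names here are illustrative, not the iceberg-go code):

```go
package main

import (
	"fmt"
	"iter"
)

// countSeq yields 0..n-1.
func countSeq(n int) iter.Seq[int] {
	return func(yield func(int) bool) {
		for i := 0; i < n; i++ {
			if !yield(i) {
				return
			}
		}
	}
}

// sumFirst consumes at most k values. The pull is created only in the
// path that uses it, and stop is always deferred; abandoning a pull
// without calling stop leaks its goroutine.
func sumFirst(seq iter.Seq[int], k int) int {
	next, stop := iter.Pull(seq)
	defer stop()
	total := 0
	for i := 0; i < k; i++ {
		v, ok := next()
		if !ok {
			break
		}
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumFirst(countSeq(10), 3)) // 0+1+2
}
```

This mirrors the fix: move `iter.Pull` into the unpartitioned branch that actually consumes it, with `stop` guaranteed to run.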
brefsdal reviewed on Apr 10, 2026
```go
}
previous, err := fa.base.txn.meta.SnapshotByID(fa.base.parentSnapshotID)
if err != nil {
	return nil, fmt.Errorf("could not find parent snapshot %d: %w", fa.base.parentSnapshotID, err)
```
Originally `fastAppendFiles.existingManifests()` filtered parent manifests using `HasAddedFiles() || HasExistingFiles()`. Both methods return false when the manifest list entry has `added_files_count=0` and `existing_files_count=0`, which is the standard Iceberg v2 representation for inherited manifests written by external writers such as Athena, Spark, and Trino.

As a result, any data written by an external writer was silently dropped from the snapshot on the next iceberg-go fast-append. Queries against the table after the append returned only the iceberg-go-written rows; all previously existing data became invisible.

A fast-append never removes or overwrites data files, so the correct behaviour is to inherit all manifests from the parent snapshot unconditionally. Remove the filter and return `previous.Manifests()` directly.

Fixes: data loss when appending to an Iceberg table that was previously written by Athena or other external writers.

Tested: new `TestFastAppendInheritsZeroCountManifests` reproduces the bug (FAIL before patch, PASS after) and the full `./table/...` suite passes with no regressions.
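The effect of the filter can be shown with a simplified model (types and method names here are illustrative, not the real iceberg-go structs): a v2 manifest-list entry for an inherited manifest legitimately carries added=0 and existing=0, so any count-based filter drops it.

```go
package main

import "fmt"

// manifest is a toy stand-in for a manifest-list entry.
type manifest struct {
	path     string
	added    int
	existing int
}

func (m manifest) hasAddedFiles() bool    { return m.added > 0 }
func (m manifest) hasExistingFiles() bool { return m.existing > 0 }

// buggyExistingManifests mirrors the old filter: it silently drops
// inherited manifests whose counts are zero.
func buggyExistingManifests(parent []manifest) []manifest {
	var out []manifest
	for _, m := range parent {
		if m.hasAddedFiles() || m.hasExistingFiles() {
			out = append(out, m)
		}
	}
	return out
}

// fixedExistingManifests mirrors the fix: a fast-append never removes
// data files, so every parent manifest is inherited unconditionally.
func fixedExistingManifests(parent []manifest) []manifest {
	return parent
}

func main() {
	parent := []manifest{
		{path: "athena-m0.avro", added: 0, existing: 0}, // external writer, counts inherited
		{path: "go-m1.avro", added: 5, existing: 0},
	}
	fmt.Println(len(buggyExistingManifests(parent)), len(fixedExistingManifests(parent)))
}
```

With the buggy filter, the Athena-written manifest (and all its data files) vanishes from the new snapshot; the fix keeps both.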
The previous approach added parallel struct fields (`AddedDataFilesCount`, `ExistingDataFilesCount`, `DeletedDataFilesCount`) and coalesced them in every getter. The upstream reviewer correctly flagged this as fragile: every accessor had to remember the fallback, and any future legacy name would require N getter changes.

Replace with a single `normalizeLegacyCounts()` call, invoked once immediately after Avro decode in both `decodeManifests` and `decodeManifestsWithFallback`. The legacy struct fields are renamed (`LegacyAdded/Existing/DeletedFilesCount`) to make their decode-only purpose explicit. All six count getters and both `Has*` methods are reverted to true one-liners. The `<= 0` check in `normalizeLegacyCounts` handles both the V2 zero-value case and the V1 `-1` (unknown/null) sentinel, so no changes to `toFile()` are needed.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
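The normalization pattern can be sketched as follows. Field and method names are modeled on those mentioned in the commit but the struct is simplified, not the actual iceberg-go type:

```go
package main

import "fmt"

// manifestFile is a simplified stand-in: canonical count fields plus
// decode-only legacy fields populated when the writer schema used the
// pre-1.4 Java / Athena names (field IDs 504/505/506).
type manifestFile struct {
	AddedFilesCount    int32
	ExistingFilesCount int32
	DeletedFilesCount  int32

	LegacyAddedFilesCount    int32 // decode-only: added_data_files_count
	LegacyExistingFilesCount int32 // decode-only: existing_data_files_count
	LegacyDeletedFilesCount  int32 // decode-only: deleted_data_files_count
}

// normalizeLegacyCounts folds legacy values into the canonical fields
// once, right after Avro decode. The <= 0 check covers both the V2
// zero-value case and the V1 -1 (unknown/null) sentinel.
func (m *manifestFile) normalizeLegacyCounts() {
	if m.AddedFilesCount <= 0 && m.LegacyAddedFilesCount > 0 {
		m.AddedFilesCount = m.LegacyAddedFilesCount
	}
	if m.ExistingFilesCount <= 0 && m.LegacyExistingFilesCount > 0 {
		m.ExistingFilesCount = m.LegacyExistingFilesCount
	}
	if m.DeletedFilesCount <= 0 && m.LegacyDeletedFilesCount > 0 {
		m.DeletedFilesCount = m.LegacyDeletedFilesCount
	}
}

// HasAddedFiles stays a true one-liner; no per-getter coalescing.
func (m *manifestFile) HasAddedFiles() bool { return m.AddedFilesCount > 0 }

func main() {
	// Decoded from a legacy-named manifest list: only the legacy field matched.
	m := &manifestFile{LegacyAddedFilesCount: 7}
	m.normalizeLegacyCounts()
	fmt.Println(m.AddedFilesCount, m.HasAddedFiles())
}
```

Concentrating the fallback in one post-decode pass means a future legacy name costs one new field and one new branch, instead of N getter changes.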
The `LegacyAdded/Existing/DeletedFilesCount` struct fields remain for the read path (decoding pre-1.4 Java / Athena files), but the `avro_schemas.go` write schemas no longer declare `added_data_files_count`, `existing_data_files_count`, or `deleted_data_files_count`. Emitting two fields with the same Iceberg field-id (504/505/506) is a spec violation. The legacy names are only needed for backward-compatible reading, not writing.
Summary
Four changes required for correct interoperability between iceberg-go and Athena/Trino-written tables.
1. **feat: make Parquet root schema repetition configurable** (Harrison Crosse)
   arrow-go defaults the Parquet root schema element to `Repeated`. Snowflake interprets this as one-level list encoding and rejects files with list columns. Adds the `write.parquet.root-repetition` table property (default: `required`), aligning iceberg-go with arrow-rs, pyarrow, and parquet-java.
   Submitted upstream: apache#896
2. **fix: goroutine leak in positionDeleteRecordsToDataFiles** (Harrison Crosse)
   `iter.Pull` was called unconditionally in the partitioned path, leaking one goroutine per write.
   Already merged upstream: apache#825
3. **fix(table): fast-append must inherit all parent manifests unconditionally**
   `fastAppendFiles.existingManifests()` filtered parent manifests using `HasAddedFiles() || HasExistingFiles()`. Both return false for inherited manifests written by Athena/Trino (counts = 0 per spec), so any data written by an external writer was silently dropped on the next iceberg-go append. Fix: remove the filter; a fast-append never removes data files, so all parent manifests are always inherited.
   Open upstream: apache#869
4. **refactor: replace per-getter coalescing with post-decode normalization**
   Pre-1.4 Java Iceberg and Athena use `added_data_files_count`/`existing_data_files_count`/`deleted_data_files_count` as field names (IDs 504/505/506). hamba/avro silently left these at zero when it couldn't match the writer-schema names to struct tags, causing `HasAddedFiles()` to return false and triggering the data loss in fix #3 above. Fix: parallel struct fields with legacy avro tags, normalized into canonical fields by a single `normalizeLegacyCounts()` call after decode. All count getters remain one-liners.
   Open upstream: apache#897
Verification
End-to-end compatibility confirmed against real Athena-written staging tables: docker/data-platform#406