
fix(table): Athena/Trino compatibility — fast-append data loss and legacy manifest field names #4

Open

cassio-paesleme wants to merge 5 commits into docker:main from cassio-paesleme:fix/fast-append-v3

Conversation


cassio-paesleme commented Apr 10, 2026

Summary

Four changes required for correct interoperability between iceberg-go and Athena/Trino-written tables.


1. feat: make Parquet root schema repetition configurable (Harrison Crosse)

arrow-go defaults the Parquet root schema element to Repeated. Snowflake interprets this as one-level list encoding and rejects files with list columns.

Adds write.parquet.root-repetition table property (default: required), aligning iceberg-go with arrow-rs, pyarrow, and parquet-java.

Submitted upstream: apache#896
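For concreteness, a minimal sketch of how the property could be resolved on the write path. The property key and default come from this PR; the package, helper, and constant names are illustrative, not the upstream implementation.

```go
package props

import "strings"

// Table property key introduced by this change.
const rootRepetitionKey = "write.parquet.root-repetition"

// Accepted values; "required" is the default and matches arrow-rs, pyarrow,
// and parquet-java.
const (
	rootRequired = "required"
	rootOptional = "optional"
	rootRepeated = "repeated"
)

// rootRepetitionFromProps resolves the property, falling back to "required"
// when it is unset or unrecognized.
func rootRepetitionFromProps(tableProps map[string]string) string {
	switch strings.ToLower(tableProps[rootRepetitionKey]) {
	case rootOptional:
		return rootOptional
	case rootRepeated:
		return rootRepeated
	default:
		return rootRequired
	}
}
```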


2. fix: goroutine leak in positionDeleteRecordsToDataFiles (Harrison Crosse)

iter.Pull was called unconditionally, but on the partitioned path its stop function was never called, leaking one goroutine per write.

Already merged upstream: apache#825
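A minimal sketch of the leak pattern and its fix, assuming Go 1.23's iter package. The function shape below is illustrative and is not the actual positionDeleteRecordsToDataFiles code.

```go
package writer

import "iter"

// process illustrates why the placement of iter.Pull matters: Pull starts an
// iterator goroutine that is only released once the sequence is drained or
// stop() is called.
func process(counter iter.Seq[int64], partitioned bool) {
	// Buggy shape: calling iter.Pull here, before the branch, leaks one
	// goroutine per call on the partitioned path, where stop is never invoked:
	//
	//   next, stop := iter.Pull(counter)
	//
	if partitioned {
		// The partitioned writer factory creates its own pull iterator.
		return
	}

	// Fixed shape: only pull on the path that consumes it, and always release it.
	next, stop := iter.Pull(counter)
	defer stop()
	for {
		if _, ok := next(); !ok {
			return
		}
	}
}
```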


3. fix(table): fast-append must inherit all parent manifests unconditionally

fastAppendFiles.existingManifests() filtered parent manifests using HasAddedFiles() || HasExistingFiles(). Both return false for inherited manifests written by Athena/Trino (counts = 0 per spec). Any data written by an external writer was silently dropped on the next iceberg-go append.

Fix: remove the filter; a fast-append never removes data files, so all parent manifests are always inherited.

Open upstream: apache#869
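A self-contained illustration of the behavior change, using simplified stand-in types rather than iceberg-go's real API:

```go
package main

import "fmt"

type manifest struct {
	path       string
	addedFiles int32 // added_files_count, field-id 504
	existing   int32 // existing_files_count, field-id 505
}

func (m manifest) hasAddedFiles() bool    { return m.addedFiles > 0 }
func (m manifest) hasExistingFiles() bool { return m.existing > 0 }

type snapshot struct{ manifests []manifest }

// Before the fix: parent manifests were filtered by count-based predicates.
func existingManifestsOld(parent snapshot) []manifest {
	var keep []manifest
	for _, m := range parent.manifests {
		if m.hasAddedFiles() || m.hasExistingFiles() {
			keep = append(keep, m)
		}
	}
	return keep
}

// After the fix: a fast-append never removes data files, so every parent
// manifest is inherited unconditionally.
func existingManifestsNew(parent snapshot) []manifest {
	return parent.manifests
}

func main() {
	// Inherited manifest as written by Athena/Trino: both counts are zero.
	parent := snapshot{manifests: []manifest{{path: "s3://warehouse/meta/m0.avro"}}}
	fmt.Println("old:", len(existingManifestsOld(parent))) // 0: data silently dropped
	fmt.Println("new:", len(existingManifestsNew(parent))) // 1: data preserved
}
```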


4. refactor: replace per-getter coalescing with post-decode normalization

Pre-1.4 Java Iceberg and Athena use added_data_files_count / existing_data_files_count / deleted_data_files_count as field names (IDs 504/505/506). hamba/avro silently left these at zero when it couldn't match the writer-schema names to struct tags — causing HasAddedFiles() to return false and triggering the data loss in fix #3 above.

Fix: parallel struct fields with legacy avro tags, normalized into canonical fields by a single normalizeLegacyCounts() call after decode. All count getters remain one-liners.

Open upstream: apache#897
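A hedged sketch of the post-decode normalization described above. Field names and Avro tags follow the PR description; the real struct carries many more fields, and the exact types may differ.

```go
package manifest

// manifestFile mirrors a manifest-list entry. The Legacy* fields exist only so
// hamba/avro can bind writer schemas that use the pre-1.4 Java / Athena names
// (added_data_files_count etc., field-ids 504/505/506).
type manifestFile struct {
	AddedFilesCount    int32 `avro:"added_files_count"`
	ExistingFilesCount int32 `avro:"existing_files_count"`
	DeletedFilesCount  int32 `avro:"deleted_files_count"`

	LegacyAddedFilesCount    int32 `avro:"added_data_files_count"`
	LegacyExistingFilesCount int32 `avro:"existing_data_files_count"`
	LegacyDeletedFilesCount  int32 `avro:"deleted_data_files_count"`
}

// normalizeLegacyCounts copies legacy counts into the canonical fields once,
// immediately after decode. The <= 0 check covers both the V2 zero value and
// the V1 -1 (unknown) sentinel, so the count getters stay one-liners.
func (m *manifestFile) normalizeLegacyCounts() {
	if m.AddedFilesCount <= 0 && m.LegacyAddedFilesCount > 0 {
		m.AddedFilesCount = m.LegacyAddedFilesCount
	}
	if m.ExistingFilesCount <= 0 && m.LegacyExistingFilesCount > 0 {
		m.ExistingFilesCount = m.LegacyExistingFilesCount
	}
	if m.DeletedFilesCount <= 0 && m.LegacyDeletedFilesCount > 0 {
		m.DeletedFilesCount = m.LegacyDeletedFilesCount
	}
}
```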


Verification

End-to-end compatibility confirmed against real Athena-written staging tables: docker/data-platform#406

hcrosse added 2 commits March 30, 2026 14:12
Add write.parquet.root-repetition property (required/optional/repeated,
default: required) to control the Parquet root schema element's
repetition type. arrow-go defaults to Repeated, which Snowflake
interprets as one-level list encoding and rejects files with list
columns. Defaulting to Required aligns with the Parquet spec and
matches arrow-rs, pyarrow, and parquet-java behavior.

iter.Pull(args.counter) was called unconditionally, but in the
partitioned path newWriterFactory creates its own iter.Pull and the
original stopCount was never called, leaking one goroutine per write.

Move iter.Pull into the unpartitioned branch where it is actually used.
Add a regression test confirming goroutine count stays stable.
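A hedged sketch of a goroutine-stability check in the spirit of that regression test; the helper name, iteration count, and tolerance are illustrative rather than taken from the actual test.

```go
package table_test

import (
	"runtime"
	"testing"
	"time"
)

// requireStableGoroutines runs op repeatedly and fails if goroutines accumulate.
func requireStableGoroutines(t *testing.T, op func()) {
	t.Helper()
	before := runtime.NumGoroutine()
	for i := 0; i < 50; i++ {
		op()
	}
	runtime.GC()
	time.Sleep(100 * time.Millisecond) // give exiting goroutines time to finish
	if after := runtime.NumGoroutine(); after > before+2 {
		t.Fatalf("goroutine leak: %d before, %d after", before, after)
	}
}
```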
cassio-paesleme force-pushed the fix/fast-append-v3 branch 5 times, most recently from 3b42d06 to b33dcb2 on April 10, 2026 at 15:12
}
previous, err := fa.base.txn.meta.SnapshotByID(fa.base.parentSnapshotID)
if err != nil {
	return nil, fmt.Errorf("could not find parent snapshot %d: %w", fa.base.parentSnapshotID, err)
Nice

fix(table): fast-append must inherit all parent manifests unconditionally

fastAppendFiles.existingManifests() filtered parent manifests using
HasAddedFiles() || HasExistingFiles(). Both methods return false when the
manifest list entry has added_files_count=0 and existing_files_count=0,
which is the standard Iceberg v2 representation for inherited manifests
written by external writers such as Athena, Spark, and Trino.

As a result, any data written by an external writer was silently dropped
from the snapshot on the next iceberg-go fast-append. Queries against
the table after the append returned only the iceberg-go-written rows;
all previously existing data became invisible.

A fast-append never removes or overwrites data files, so the correct
behaviour is to inherit all manifests from the parent snapshot
unconditionally. Remove the filter and return previous.Manifests()
directly.

Fixes: data loss when appending to an Iceberg table that was previously
written by Athena or other external writers.

Tested: new TestFastAppendInheritsZeroCountManifests reproduces the bug
(FAIL before patch, PASS after) and the full ./table/... suite passes
with no regressions.

The previous approach added parallel struct fields (AddedDataFilesCount,
ExistingDataFilesCount, DeletedDataFilesCount) and coalesced them in every
getter. The upstream reviewer correctly flagged this as fragile — every
accessor had to remember the fallback, and any future legacy name would
require N getter changes.

Replace with a single normalizeLegacyCounts() call, invoked once immediately
after Avro decode in both decodeManifests and decodeManifestsWithFallback.
The legacy struct fields are renamed (LegacyAdded/Existing/DeletedFilesCount)
to make their decode-only purpose explicit. All six count getters and both
Has* methods are reverted to true one-liners.

The <= 0 check in normalizeLegacyCounts handles both the V2 zero-value case
and the V1 -1 (unknown/null) sentinel, so no changes to toFile() are needed.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
cassio-paesleme changed the title from "fix(table): fast-append must inherit all parent manifests unconditionally" to "fix(table): Athena/Trino compatibility — fast-append data loss and legacy manifest field names" on Apr 14, 2026

The LegacyAdded/Existing/DeletedFilesCount struct fields remain for
the read path (decoding pre-1.4 Java / Athena files), but the
avro_schemas.go write schemas no longer declare added_data_files_count,
existing_data_files_count, or deleted_data_files_count.

Emitting two fields with the same Iceberg field-id (504/505/506) is a
spec violation. The legacy names are only needed for backward-compatible
reading, not writing.
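For illustration, an abridged sketch of how the write schema's count fields look once only the canonical names carry the field-ids; this is not the literal avro_schemas.go content.

```go
package manifest

// Only the canonical count fields carry field-ids 504-506 in the write schema;
// the legacy *_data_files_count names are accepted on read but never written.
const manifestListCountFields = `
    {"name": "added_files_count",    "type": "int", "field-id": 504},
    {"name": "existing_files_count", "type": "int", "field-id": 505},
    {"name": "deleted_files_count",  "type": "int", "field-id": 506}
`
```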