fix(occ): RowDelta uses partition-scoped conflict check instead of AlwaysTrue#983

Merged
laskoviymishka merged 5 commits into apache:main from mzzz-zzm:fix/rowdelta-partition-conflict-filter
May 6, 2026

Conversation

@mzzz-zzm mzzz-zzm commented May 4, 2026

Fixes #978

Problem

RowDelta.validate passes iceberg.AlwaysTrue{} to validateNoConflictingDataFiles whenever equality-delete files are present. This means any concurrent append to any partition is treated as a conflict, even when it lands in a completely different partition from the equality-deletes. Under serializable isolation this causes spurious ErrConflictingDataFiles errors for workloads that write to multiple independent partitions concurrently.

Fix

RowDelta now collects the partition tuples of all equality-delete files it adds (eqDeletePartitions). A new validator, validateNoConflictingDataFilesInPartitions, checks concurrent data files only in those specific partition tuples:

  • If the partition set is empty (no eq-deletes), the check is skipped.
  • If any eq-delete is unpartitioned (empty tuple), it falls back to the conservative AlwaysTrue check, preserving existing safety.
  • Otherwise, only concurrent files in a matching partition are flagged.
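The three rules above can be sketched as follows. This is a simplified illustration of the described tuple-scoped check, not the actual iceberg-go code: `PartitionTuple`, `DataFile`, and `conflicts` are stand-in names, and real partition values would need type-aware comparison (a point raised in review below).

```go
package main

import "fmt"

// PartitionTuple and DataFile are simplified stand-ins for the real
// iceberg-go types; partition values are compared with plain interface
// equality here, which is only safe for simple comparable values.
type PartitionTuple map[int]any

type DataFile struct {
	Path      string
	Partition PartitionTuple
}

// sameTuple reports whether two partition tuples carry identical
// (field id, value) pairs.
func sameTuple(a, b PartitionTuple) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// conflicts applies the three rules from the description: skip when there
// are no eq-delete partitions, fall back to flagging everything when any
// tuple is empty (unpartitioned eq-delete), otherwise flag only concurrent
// files that land in a matching partition tuple.
func conflicts(eqDeletePartitions []PartitionTuple, concurrent []DataFile) []string {
	if len(eqDeletePartitions) == 0 {
		return nil // no eq-deletes: nothing to check
	}
	for _, p := range eqDeletePartitions {
		if len(p) == 0 { // conservative AlwaysTrue fallback
			out := make([]string, 0, len(concurrent))
			for _, f := range concurrent {
				out = append(out, f.Path)
			}
			return out
		}
	}
	var out []string
	for _, f := range concurrent {
		for _, p := range eqDeletePartitions {
			if sameTuple(p, f.Partition) {
				out = append(out, f.Path)
				break
			}
		}
	}
	return out
}

func main() {
	eq := []PartitionTuple{{1: "us-east-1"}}
	files := []DataFile{
		{"a.parquet", PartitionTuple{1: "us-east-1"}},
		{"b.parquet", PartitionTuple{1: "eu-west-1"}},
	}
	fmt.Println(conflicts(eq, files)) // only a.parquet is flagged
}
```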

Files changed

  • table/row_delta.go: collect eqDeletePartitions instead of hasEqDeletes
  • table/conflict_validation.go: validateNoConflictingDataFilesInPartitions + partitionTupleKey
  • table/partition_conflict_test.go: unit tests for both new functions

@mzzz-zzm mzzz-zzm requested a review from zeroshade as a code owner May 4, 2026 22:18
@laskoviymishka laskoviymishka left a comment

Thanks for picking this up — the diagnosis on #978 looks right, and partition-scoped validation is definitely the right direction.

I’d like to hold this version before merge though, mostly because the new validator may be a bit too narrow today. The old AlwaysTrue path was conservative, but safe. This version can miss conflicts under partition-spec evolution, and possibly for partition values like UUID, decimal, binary, or fixed.

The good news is I think this can be fixed without changing the overall direction. The codebase already has validateAddedDataFilesMatchingFilter, which handles per-spec projection, manifest pruning, and type-aware partition evaluation. Could we express the equality-delete partitions as an OR-of-equalities filter and route through that helper? That should avoid the string-key comparison and reuse the existing validation path.

A few things I’d love to see before merge:

  • add the #978 reproducer as a regression test
  • add same-partition reject / different-partition allow tests
  • add one UUID or decimal partition test
  • add a partition-spec evolution test
  • update the RowDelta.validate comments, since they still describe the old conservative behavior
  • split out the unrelated REST/S3/ancestry changes, so this PR stays focused

The empty-base ancestry fix looks useful too, just probably deserves its own PR.

Overall, I think this is the right line of work — I’d just rather make the new scoped validator as safe as the conservative one before we land it.

Comment thread: table/conflict_validation.go (outdated)
sort.Ints(keys)
var buf []byte
for _, k := range keys {
buf = fmt.Appendf(buf, "%d=%v;", k, p[k])
laskoviymishka (Contributor):

fmt.Appendf("%d=%v;", k, p[k]) is pretty fragile as an equality oracle here. map[int]any partition values can land as several Go types depending on construction path — uuid.UUID (= [16]byte) vs raw [16]byte, iceberg.DecimalLiteral vs *big.Rat, time.Time with monotonic-clock suffix, etc., see convertAvroValueToIcebergType in manifest.go:1756. Same logical value, different %v, missed match. Also = and ; aren't escaped, so a string value "1=a;2=b" collides with a different two-field tuple.

A silent miss here is worse than the bug being fixed — we'd accept a commit Java would reject. I'd either build an Or(And(...)) partition expression and call the existing validateAddedDataFilesMatchingFilter (which uses iceberg.Literal comparison and handles all of these), or normalize each known type to a canonical hashable form before keying.

mzzz-zzm (Author):

partitionTupleKey has been removed entirely. The new anyToLiteral helper uses a type switch over all iceberg.LiteralType values — bool, int32, int64, float32, float64, string, []byte, Date, Time, Timestamp, TimestampNano, Decimal, uuid.UUID — to produce typed iceberg.Literal values with stable equality semantics. No more fmt.Sprintf("%v") instability for UUID/decimal/binary. The =/; injection issue is gone — string-keyed maps are no longer used at all.

Comment thread: table/conflict_validation.go (outdated)
if e.Status() != iceberg.EntryStatusADDED || e.SnapshotID() != snap.SnapshotID {
continue
}
if _, ok := partSet[partitionTupleKey(e.DataFile().Partition())]; ok {
laskoviymishka (Contributor):

The lookup compares only (field_id, value) and never consults mf.PartitionSpecID(). After a partition-spec evolution (renamed field, identity → bucket, day → hour), the same logical row gets written under a different tuple — eq-delete bound to spec A might be {1: "hot"} while concurrent data under spec B is {1: 3} (bucketed) or {2: "hot"} (renamed). Tuples never match, real conflict missed.

The sibling helper at lines 343-434 handles this via buildPartitionProjection(specID, ...) keyed per spec id. I'd at minimum key on (specID, tuple) and use mf.PartitionSpecID() plus the eq-delete file's SpecID(). Routing through validateAddedDataFilesMatchingFilter with an OR-of-equalities filter gets it for free.

mzzz-zzm (Author):

Resolved automatically by routing through validateAddedDataFilesMatchingFilter. That helper calls buildPartitionProjection(specID, ...) keyed per each concurrent manifest's PartitionSpecID, so a concurrent file written under spec B is projected against spec B's fields — not spec A's. The eq-delete filter is expressed in row (source) space using Reference(sourceFieldName), so it projects correctly regardless of which spec the concurrent file was written under.

// If any eq-delete file is unpartitioned (empty tuple), the delete
// could affect any row — fall back to the conservative AlwaysTrue check.
for _, p := range eqDeletePartitions {
if len(p) == 0 {
laskoviymishka (Contributor):

I'd hoist this fallback to the caller. The function name says "InPartitions", but a single empty input element silently turns it into AlwaysTrue across the whole table — leaky. Also "empty per-file partition tuple" is a noisy proxy for "unpartitioned table": an eq-delete file with an unset partition map on a partitioned table (easy to do via NewDataFileBuilder without partition data) silently nullifies the optimization. The right signal is the spec itself.

Move the decision into RowDelta.validate: if the table's only spec is unpartitioned, call the AlwaysTrue version directly; otherwise the partition-scoped one. Both paths become explicit.

mzzz-zzm (Author):

Moved exactly as suggested. validateNoConflictingDataFilesInPartitions no longer contains any len(p) == 0 fallback. Instead, RowDelta.validate checks currentSpec.NumFields() == 0 before calling anything: if the table is unpartitioned, it calls validateNoConflictingDataFiles(ctx, iceberg.AlwaysTrue{}, level) directly; otherwise it calls validateNoConflictingDataFilesInPartitions. Both paths are now explicit at the caller.

Comment thread: table/conflict_validation.go (outdated)
continue
}
if _, ok := partSet[partitionTupleKey(e.DataFile().Partition())]; ok {
return fmt.Errorf("%w: snapshot %d added data file %s in eq-delete partition",
laskoviymishka (Contributor):

The sibling on line 404 is "snapshot %d added data file %s matching filter %s" — includes the filter that triggered the match. This one says "in eq-delete partition" without saying which one. With multiple eq-delete files spanning multiple partitions, an operator triaging this can't tell which one conflicted without reading manifests by hand.

return fmt.Errorf("%w: snapshot %d added data file %s in partition %v overlapping eq-delete",
    ErrConflictingDataFiles, snap.SnapshotID, e.DataFile().FilePath(), e.DataFile().Partition())

mzzz-zzm (Author) commented May 6, 2026:

Moot — the custom error site no longer exists. By routing through validateAddedDataFilesMatchingFilter, the existing error message "snapshot %d added data file %s matching filter %s" is produced, which includes the full filter expression. No separate fix needed.

Comment thread: table/partition_conflict_test.go (outdated)
// Empty partition tuple → unpartitioned eq-delete → must fall back to AlwaysTrue.
// With no concurrent snapshots in ctx the AlwaysTrue path returns nil (no-op).
emptyPartition := map[int]any{}
require.NoError(t, validateNoConflictingDataFilesInPartitions(ctx, []map[int]any{emptyPartition}, IsolationSerializable))
laskoviymishka (Contributor):

This test doesn't actually exercise the fallback. newConflictContext(meta, meta, ...) produces ctx.concurrent = [] because base and current point at the same head, so the validator short-circuits at len(ctx.concurrent) == 0 before reaching the empty-tuple loop. The comment in the test admits it: "With no concurrent snapshots in ctx the AlwaysTrue path returns nil (no-op)" — it'd still pass if we deleted the fallback entirely.

I'd build a context with a real concurrent snapshot (mirroring TestNewConflictContext_WriterHasNoBranchView: base = newConflictTestMetadata(t, nil), current = newConflictTestMetadata(t, &head)), then assert the fallback is reached, either by injecting a fake iceio.IO and observing the manifest fetch, or by an end-to-end fixture where AlwaysTrue would correctly flag a conflict the partition-scoped check would miss.

mzzz-zzm (Author):

The old trivial-pass test is removed. It is replaced by TestRowDeltaValidate_UnpartitionedTableFallsBackToAlwaysTrue, which builds a real table with an empty partition spec, writes a concurrent snapshot with a real data file manifest, and asserts the commit is rejected with ErrConflictingDataFiles. This directly proves the AlwaysTrue path is active and wired — not just that it compiles.

// With no concurrent snapshots in ctx the AlwaysTrue path returns nil (no-op).
emptyPartition := map[int]any{}
require.NoError(t, validateNoConflictingDataFilesInPartitions(ctx, []map[int]any{emptyPartition}, IsolationSerializable))
}
laskoviymishka (Contributor):

The #978 reproducer (TestBugRepro_RowDeltaFalseConflictDifferentPartition from the issue body) is the test that would fail on main and pass on this branch — it's the regression guard for the actual contract of this PR, and it isn't here.

A few I'd want at minimum:

  • the #978 reproducer as-is — different-partition allowed
  • same-partition rejected — without this, a future change to partitionTupleKey could silently degrade the validator into a no-op with no signal
  • UUID or decimal partition column, same-partition rejected — would fail today against %v keying, that's the point
  • spec evolution — eq-delete bound to spec A, concurrent data under spec B (renamed field or transform change), will fail today
  • a genuinely-unpartitioned table replacing the trivial-pass test above

The harness exists — conflict_validation_test.go builds metadata with concurrent snapshots and row_delta_test.go builds eq-delete files. They just need to be wired together.

mzzz-zzm (Author):

All five requested cases are now present in table/partition_conflict_test.go:

  • TestRowDeltaValidate_DifferentPartitionAllowed — the #978 reproducer: eu-west-1 concurrent + us-east-1 eq-delete → no conflict
  • TestRowDeltaValidate_SamePartitionRejected — us-east-1 concurrent + us-east-1 eq-delete → rejected
  • TestRowDeltaValidate_UUIDPartitionSameRejected / TestRowDeltaValidate_UUIDPartitionDifferentAllowed — UUID partition type safety (these would fail against %v keying)
  • TestRowDeltaValidate_UnpartitionedTableFallsBackToAlwaysTrue — replaces the trivial-pass test
  • TestRowDeltaValidate_SpecEvolutionConflictDetected — eq-delete written under spec A (identity(region), partitionFieldID=1000), concurrent data written under spec B (renamed to region_v2, partitionFieldID=1001, same source field ID=2), both with value "us-east-1". eqDeletePartitionsToFilter builds Reference("region") == "us-east-1" in row space; validateAddedDataFilesMatchingFilter projects it against spec B via buildPartitionProjection → conflict detected.

Comment thread: table/conflict_validation.go (outdated)
// falls back to AlwaysTrue — the equality delete could affect any row.
//
// Under IsolationSnapshot this validator is a no-op.
func validateNoConflictingDataFilesInPartitions(ctx *conflictContext, eqDeletePartitions []map[int]any, level IsolationLevel) error {
laskoviymishka (Contributor):

Bigger-picture: I'd consider folding this into validateAddedDataFilesMatchingFilter rather than adding a sibling. Derive a BooleanExpression from eqDeletePartitions like Or(And(field_a == v_a, field_b == v_b), ...) and pass it to validateNoConflictingDataFiles(ctx, filter, level). That helper already does per-spec projection (spec-evolution correct), manifest-summary pruning (no full-manifest scan when summaries can't match), and type-aware partition evaluation via iceberg.Literal (no UUID/decimal/binary issues).

Resolves the comments on partitionTupleKey and spec-id awareness at the same time, and matches the pattern documented in this file's preamble ("there is one code path that pruning semantics flow through") and Java's MergingSnapshotProducer.validateNoNewDataFiles.

mzzz-zzm (Author):

Done — validateNoConflictingDataFilesInPartitions is not a sibling that walks manifests independently. It calls eqDeletePartitionsToFilter to build an OR(AND(EqualTo(...))) expression, then passes it to validateNoConflictingDataFiles(ctx, filter, level), which internally calls validateAddedDataFilesMatchingFilter. Per-spec projection, manifest-summary pruning, and type-aware partition evaluation all come for free from the existing path.
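The OR(AND(EqualTo(...))) shape can be sketched with a miniature expression tree. This is an assumption-laden stand-in: the real code builds iceberg.BooleanExpression values via iceberg.LiteralPredicate / iceberg.Reference (as shown in the diff below), while `Expr`, `eq`, `junction`, and `partitionsToFilter` here are invented names for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Expr is a minimal stand-in for a boolean expression node.
type Expr interface{ String() string }

type eq struct {
	field string
	value any
}

func (e eq) String() string { return fmt.Sprintf("%s == %v", e.field, e.value) }

// junction is an n-ary AND or OR.
type junction struct {
	op    string // "AND" or "OR"
	exprs []Expr
}

func (j junction) String() string {
	parts := make([]string, len(j.exprs))
	for i, e := range j.exprs {
		parts[i] = e.String()
	}
	if len(parts) == 1 {
		return parts[0]
	}
	return "(" + strings.Join(parts, " "+j.op+" ") + ")"
}

// entry is one (partition field, value) pair of a tuple, in spec order.
type entry struct {
	name  string
	value any
}

// partitionsToFilter mirrors the described shape: one EqualTo conjunct per
// partition field, one AND disjunct per eq-delete file, OR-ed together.
func partitionsToFilter(tuples [][]entry) Expr {
	ors := make([]Expr, 0, len(tuples))
	for _, tuple := range tuples {
		ands := make([]Expr, 0, len(tuple))
		for _, e := range tuple {
			ands = append(ands, eq{e.name, e.value})
		}
		ors = append(ors, junction{"AND", ands})
	}
	return junction{"OR", ors}
}

func main() {
	f := partitionsToFilter([][]entry{
		{{"region", "us-east-1"}, {"day", 100}},
		{{"region", "eu-west-1"}, {"day", 100}},
	})
	fmt.Println(f) // ((region == us-east-1 AND day == 100) OR (region == eu-west-1 AND day == 100))
}
```

Handing one such composite filter to the existing validator is what lets per-spec projection and manifest-summary pruning apply unchanged.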

@mzzz-zzm mzzz-zzm force-pushed the fix/rowdelta-partition-conflict-filter branch 4 times, most recently from 2af6a47 to 8b617c6 on May 6, 2026 06:45
…idate

Replace partitionTupleKey + validateNoConflictingDataFilesInPartitions (old) with
eqDeletePartitionsToFilter which builds an OR(AND(EqualTo(...))) BooleanExpression
from the eq-delete DataFiles and routes it through validateNoConflictingDataFiles
-> validateAddedDataFilesMatchingFilter.

Benefits over the removed partitionTupleKey approach:
- Type-safe: anyToLiteral uses a type switch over all iceberg.LiteralType values
  (bool, int32/64, float32/64, string, []byte, Date, Time, Timestamp,
  TimestampNano, Decimal, uuid.UUID) -- no fmt.Sprintf instability for
  UUID/decimal/binary partitions.
- Spec-evolution correct: the filter is projected per each concurrent manifest's
  PartitionSpec via buildPartitionProjection, so renamed fields and transform
  changes do not produce false conflicts.
- No key-injection: avoids the =;-separator collision risk in string-keyed maps.
- Unpartitioned tables: RowDelta.validate checks PartitionSpec.NumFields() == 0
  and passes AlwaysTrue directly; eqDeletePartitionsToFilter is only called for
  partitioned tables.

Updates RowDelta.validate doc comment to describe the partition-scoped behavior.

Adds 9 regression and validation tests:
- anyToLiteral for all supported types and an unsupported type
- Short-circuit under SnapshotIsolation and empty inputs
- apache#978 reproducer: different-partition concurrent append is not rejected
- Same-partition concurrent append IS rejected
- UUID partition: same rejected, different allowed
- Unpartitioned table AlwaysTrue fallback detects any concurrent append

Fixes apache#978
@mzzz-zzm mzzz-zzm force-pushed the fix/rowdelta-partition-conflict-filter branch from 8b617c6 to ed11891 on May 6, 2026 06:56

mzzz-zzm commented May 6, 2026

All points addressed. Summary of what changed:

  • partitionTupleKey removed — replaced by anyToLiteral (type switch) + eqDeletePartitionsToFilter (builds typed OR(AND(EqualTo(...))) filter)
  • Routes through validateAddedDataFilesMatchingFilter — spec evolution, manifest pruning, and type-aware evaluation all handled by the existing path
  • Unpartitioned fallback hoisted to caller — RowDelta.validate checks NumFields() == 0 and passes AlwaysTrue directly
  • Unrelated changes removed — catalog/rest/rest.go and the newConflictContext empty-base change are both reverted to match origin/main
  • 10 tests added — 6 end-to-end RowDelta.validate regression tests (#978 reproducer, same/different partition, UUID type safety, unpartitioned fallback, spec-evolution cross-spec conflict detection) + 4 unit tests for anyToLiteral and validateNoConflictingDataFilesInPartitions helpers
  • Both RowDelta struct doc comment and validate function doc updated to describe the partition-scoped behavior (struct-level comment previously still said "AlwaysTrue")
  • All 19 test packages pass


mzzz-zzm commented May 6, 2026

Attached image below

The zizmor failure looks pre-existing. A quick look at the run history:

Run        PR/Branch                                        Result
#219       PR #983 (latest push)                            ❌ failed
#218       PR #983                                          ❌ failed
#217       PR #984 (feat/geo-type, different contributor)   ❌ failed
#216       PR #1022                                         ✅ passed
#215       PR #983 (earlier today)                          ✅ passed
#211–#214  PR #983                                          ✅ passed

PR #983 was passing fine earlier today. PR #984 (completely unrelated, different author) started failing around the same time at run #217. Looks like something changed in the zizmor action around that point, not anything in this diff.

(screenshot attached: zizmor CI run history, 2026-05-06)

@laskoviymishka laskoviymishka left a comment

Great job! My previous concerns look cleanly addressed.

Manifest pruning is back through the existing helper, and the test coverage is much stronger. Direction looks good.

I caught three silent failure modes while tracing the transform/literal paths:

  • Non-identity transforms like bucket, day, hour, truncate: the row-space filter seems to double-transform the literal during projection. I confirmed an eq-delete in bucket=1 can match bucket=4 after BucketTransform.Project. The current spec-evolution test only uses identity, so it misses this.
  • Decimal partitions: anyToLiteral matches Decimal, but manifest reads return the named type DecimalLiteral, so reusing a manifest-read DataFile through AddDeletes fails for decimal-partitioned tables.
  • Timestamp nanos: the TimestampNano branch looks unreachable because manifest reading has no nanos case, so raw int64 matches first.

The transform issue is the one I’d fix before merge, since bucket(...) and day(ts) are very common Iceberg partition transforms and the failure is silent. The decimal and timestamp cases feel smaller and could be follow-ups.

Once the transform path is fixed, ideally with a bucket[N] or day(ts) regression test, I’m happy to take another pass and approve.

return nil, fmt.Errorf("partition field %q: %w", sourceField.Name, err)
}

conjuncts = append(conjuncts, iceberg.LiteralPredicate(iceberg.OpEQ, iceberg.Reference(sourceField.Name), lit))
laskoviymishka (Contributor):

I think there's a subtle issue here for non-identity transforms — wanted to flag it because the test suite only exercises IdentityTransform so it might not surface naturally.

The literal we pass in (lit) is the post-transform partition value (a bucket index from BucketTransform, days-from-epoch from DayTransform, etc.), but the predicate is anchored against the source column via Reference(sourceField.Name). Downstream, validateAddedDataFilesMatchingFilter calls BucketTransform.Project (transforms.go:366), which does transformLiteral(transformer, p.Literal()) — re-bucketing what's already a bucket index.

Quick repro on BucketTransform{NumBuckets: 16} over int64:

user_id=12345 → bucket=1   (what DataFile.Partition() stores)
user_id=1     → bucket=4   (what BucketTransform.Project produces)

Eq-delete is in bucket 1, projected predicate matches bucket 4 — a concurrent file in bucket 1 silently passes. Same shape for day(ts), hour(ts), truncate[K], year/month. IdentityTransform happens to work because identity(identity(x)) == x, which is why the existing spec-evolution test is green.
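The non-idempotence can be demonstrated with any hash-based bucketing. The sketch below uses FNV-1a as the hash (an assumption for illustration — Iceberg's bucket transform uses murmur3_x86_32, so the indices differ from real Iceberg buckets), but the shape of the failure is the same: re-applying the transform to an already-transformed value generally lands in a different bucket, while identity survives re-projection.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketOf mimics the shape of a bucket transform: hash the encoded value,
// then mod the bucket count. Assumption: FNV-1a stands in for Iceberg's
// murmur3_x86_32, so these indices are illustrative only.
func bucketOf(v int64, n uint32) int64 {
	h := fnv.New32a()
	var b [8]byte
	for i := 0; i < 8; i++ {
		b[i] = byte(v >> (8 * i))
	}
	h.Write(b[:])
	return int64(h.Sum32() % n)
}

func main() {
	const buckets = 16
	userID := int64(12345)

	once := bucketOf(userID, buckets) // what DataFile.Partition() stores
	twice := bucketOf(once, buckets)  // what re-projecting the stored value yields

	// identity(identity(x)) == x, so identity transforms survive double
	// application; bucket generally does not — the silent-miss scenario.
	fmt.Println("stored bucket:", once, "re-bucketed:", twice, "equal:", once == twice)
}
```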

A couple of ways to handle, wdyt?

  • branch on pf.Transform: identity → keep this row-space predicate, non-identity → build a partition-space predicate (Reference(pf.Name) == lit keyed by (specID, pf.Name)) evaluated only against same-spec concurrent manifests, with cross-spec as a conservative fallback for that file — close to Java's PartitionSet.
  • narrower scope: explicitly reject non-identity transforms here with a clear error and document this as identity-only for now.

A couple of regression tests under bucket[N] and day(ts) would lock the contract either way.

mzzz-zzm (Author):

Fixed. eqDeletePartitionsToFilter now detects non-identity transforms per partition field before building any row-space predicate. If any field in a file's partition spec uses a non-identity transform (bucket, day, hour, truncate, year, month), the function falls back to AlwaysTrue{} for that file — treating the eq-delete as table-wide, which is conservative but always correct. IdentityTransform continues to produce the scoped row-space predicate as before.

The full PartitionSet-style fix (partition-space predicate keyed by (specID, partitionFieldName) evaluated only against same-spec concurrent manifests) is deferred to a follow-up PR.

Two regression tests lock the contract:

  • TestRowDeltaValidate_BucketTransformFallsBackToAlwaysTrue: bucket[16](user_id) partitioned table — concurrent data in bucket 1, eq-delete in bucket 1 → rejected.
  • TestRowDeltaValidate_DayTransformFallsBackToAlwaysTrue: day(event_ts) partitioned table — concurrent data in day 100, eq-delete in day 200 → still rejected (conservative; AlwaysTrue cannot distinguish days).

return iceberg.NewLiteral(val), nil
case iceberg.TimestampNano:
return iceberg.NewLiteral(val), nil
case iceberg.Decimal:
laskoviymishka (Contributor):

Tiny type-asymmetry I noticed while tracing the round-trip:

convertAvroValueToIcebergType at manifest.go:1800 returns iceberg.DecimalLiteral{Scale, Val}, which is type DecimalLiteral Decimal — a named type derived from Decimal, not Decimal itself. Go type switches match exact dynamic type, so this arm doesn't fire on a round-tripped value — it falls through to default: with "unsupported partition value type DecimalLiteral".

Since AddDeletes takes iceberg.DataFile, any flow that re-uses a manifest-read file as input would hit this on decimal-partitioned tables. Two options I can think of:

  • add case iceberg.DecimalLiteral: here
  • normalize convertAvroValueToIcebergType to return Decimal consistently with the writer side

Fwiw TestAnyToLiteral_SupportedTypes currently skips Decimal — adding a round-trip case once this is settled would catch any future drift.
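The exact-dynamic-type behavior of Go type switches is worth seeing in isolation. The sketch below uses local stand-ins with the same shape as the named-type pair described above (field names are assumptions): a `DecimalLiteral` value does not match a `case Decimal:` arm, and the fix's bridge is a plain conversion, which is always valid because the underlying type is identical.

```go
package main

import "fmt"

// Local stand-ins mirroring the shape of iceberg.Decimal and
// iceberg.DecimalLiteral (assumption: simplified fields).
type Decimal struct {
	Scale int
	Val   int64
}

type DecimalLiteral Decimal // named type derived from Decimal

// classify has only a Decimal arm, like the pre-fix anyToLiteral: a type
// switch matches the exact dynamic type, so DecimalLiteral falls through.
func classify(v any) string {
	switch v.(type) {
	case Decimal:
		return "Decimal"
	default:
		return "unsupported"
	}
}

func main() {
	d := Decimal{Scale: 2, Val: 100}
	dl := DecimalLiteral{Scale: 2, Val: 100}

	fmt.Println(classify(d))  // Decimal
	fmt.Println(classify(dl)) // unsupported — the named type misses the arm

	// The bridge: converting is always legal since the underlying type
	// of DecimalLiteral is Decimal.
	fmt.Println(Decimal(dl) == d) // true
}
```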

mzzz-zzm (Author):

Fixed. Added case iceberg.DecimalLiteral: to anyToLiteral alongside the existing case iceberg.Decimal:. The new arm casts to iceberg.Decimal before wrapping in a Literal, bridging the named-type gap: iceberg.Decimal(val) is a valid conversion because type DecimalLiteral Decimal.

The TestAnyToLiteral_SupportedTypes table now includes a DecimalLiteral subtest (iceberg.DecimalLiteral{Scale: 2}) that would have failed before this fix — it passes now.

Normalizing convertAvroValueToIcebergType to return Decimal consistently is deferred; it touches callers beyond anyToLiteral and deserves its own PR.

return iceberg.NewLiteral(val), nil
case iceberg.Timestamp:
return iceberg.NewLiteral(val), nil
case iceberg.TimestampNano:
laskoviymishka (Contributor):

Related to the Decimal one above — convertAvroValueToIcebergType (manifest.go:1756-1817) has cases for TimestampMillis and TimestampMicros but none for nanos, so a nanosecond-timestamp partition value arrives here as a raw int64 and matches case int64: first. This case iceberg.TimestampNano: arm ends up unreachable from the read path.

Probably worth either adding a TimestampNanos case to convertAvroValueToIcebergType so reads return iceberg.TimestampNano and this arm fires, or dropping this arm and noting that TimestampNano partitions aren't supported here yet. Same as Decimal, the unit table also skips this case so it's not caught today.

mzzz-zzm (Author):

Fixed. Added case atype.TimestampNanos: to convertAvroValueToIcebergType in manifest.go, following the same pattern as the existing TimestampMillis and TimestampMicros cases. A timestamp-nanos logical type field now returns iceberg.TimestampNano from the manifest reader, making the anyToLiteral arm reachable from the read path.

Added a TimestampNano subtest to TestAnyToLiteral_SupportedTypes to catch future drift. The test passes.

mzzz-zzm added 3 commits May 6, 2026 20:40
…DeletePartitionsToFilter

For non-identity transforms (bucket, day, hour, truncate, year, month),
DataFile.Partition() stores post-transform values (e.g. bucket indices,
days-since-epoch). Building a row-space predicate from those values and
passing it to validateAddedDataFilesMatchingFilter causes the transform
to be re-applied downstream, producing wrong matches (double-transformation).

For now, detect non-identity transforms and fall back to AlwaysTrue (treat
the eq-delete as table-wide), which is conservative but correct. A full
PartitionSet-style approach — building a partition-space predicate keyed by
(specID, partitionFieldName) evaluated only against same-spec manifests — is
deferred to a follow-up PR.

Add two regression tests:
- TestRowDeltaValidate_BucketTransformFallsBackToAlwaysTrue
- TestRowDeltaValidate_DayTransformFallsBackToAlwaysTrue
convertAvroValueToIcebergType returns iceberg.DecimalLiteral{} (a named
type — type DecimalLiteral Decimal), not iceberg.Decimal. Go type switches
match exact dynamic type, so the existing 'case iceberg.Decimal:' arm did
not fire on manifest-read partition values, causing a fallthrough to the
default error branch for decimal-partitioned tables.

Add a 'case iceberg.DecimalLiteral:' arm that casts to Decimal before
wrapping in a Literal. Also add a DecimalLiteral subtest to
TestAnyToLiteral_SupportedTypes to catch future drift.
…eToIcebergType

The manifest reader already handled timestamp-millis and timestamp-micros
but had no case for timestamp-nanos (atype.TimestampNanos). A partition
field with that logical type arrived in anyToLiteral as a raw int64, matching
'case int64:' first and making the 'case iceberg.TimestampNano:' arm
unreachable from the read path.

Add the missing case following the same pattern as the existing timestamp
cases. Also add a TimestampNano subtest to TestAnyToLiteral_SupportedTypes
to cover the round-trip.

mzzz-zzm commented May 6, 2026

During my integration tests against a real S3-backed Iceberg catalog, I found two pre-existing bugs in manifest.go and literals.go that surface when any table has a decimal partition column. Neither appears to be tracked in the issue tracker yet, flagging here since they're adjacent to the changes in this PR.


convertDecimalValue passes the wrong argument to DecimalRequiredBytes (manifest.go)

fixedSize := internal.DecimalRequiredBytes(len(dec.String()))

DecimalRequiredBytes expects the column's declared precision (e.g. 10 for decimal(10, 2)), but receives the character length of the printed value instead. These are unrelated: "10.00" has length 5, which happens to match DecimalRequiredBytes(10) = 5 by coincidence, but "1.00" has length 4 → DecimalRequiredBytes(4) = 2 bytes instead of the correct 5. For decimal(18, 0), the value "1" gives 1 byte instead of 8.

The resulting slice length doesn't match the Avro schema's fixed[N] field, and the encoder rejects it:

avro: field data_file.partition.<field>: cannot use []uint8 with Avro type fixed

Repro: write a manifest with a decimal(10, 2) partition column and value 1.00. The fix is to thread the column's declared precision from the partition field's DecimalType into convertDecimalValue.
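The precision-to-bytes relationship can be checked directly: the fixed width must be the smallest signed two's-complement size that holds any unscaled value of the declared precision. The function below reproduces the values quoted above (assumption: this mirrors what internal.DecimalRequiredBytes computes; the helper's real implementation may differ).

```go
package main

import (
	"fmt"
	"math/big"
)

// decimalRequiredBytes returns the smallest n such that a signed n-byte
// integer (range up to 2^(8n-1) - 1) can hold 10^precision - 1, the
// largest unscaled value of a decimal with that precision.
func decimalRequiredBytes(precision int) int {
	maxUnscaled := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(precision)), nil)
	maxUnscaled.Sub(maxUnscaled, big.NewInt(1)) // 10^precision - 1
	for n := 1; ; n++ {
		limit := new(big.Int).Lsh(big.NewInt(1), uint(8*n-1)) // 2^(8n-1)
		limit.Sub(limit, big.NewInt(1))
		if maxUnscaled.Cmp(limit) <= 0 {
			return n
		}
	}
}

func main() {
	// The mismatches described above: precision 10 needs 5 bytes, but
	// len("1.00") == 4 fed in as "precision" yields 2; precision 18
	// needs 8 bytes, but len("1") == 1 yields 1.
	fmt.Println(decimalRequiredBytes(10)) // 5
	fmt.Println(decimalRequiredBytes(4))  // 2
	fmt.Println(decimalRequiredBytes(18)) // 8
	fmt.Println(decimalRequiredBytes(1))  // 1
}
```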


DecimalLiteral.Type() hardcodes precision 9 (literals.go)

func (d DecimalLiteral) Type() Type { return DecimalTypeOf(9, d.Scale) }

DecimalLiteral carries Val and Scale but not the declared precision, so Type() silently returns decimal(9, scale) regardless of the actual column type. Any path that calls .Type() on a manifest-read decimal literal — bounds checks, cast decisions, predicate projection — will operate against the wrong precision.


Both bugs exist in main (v0.5.0) and are not introduced by this PR. Happy to open separate issues and/or PRs for them if that's preferred.

@laskoviymishka (Contributor):

@mzzz-zzm --> for those new bugs, can you file separate issues? Thanks for this! It's very useful

@laskoviymishka laskoviymishka left a comment

LGTM!

Clean iteration: all three flagged cases addressed with regression tests. The conservative AlwaysTrue fallback for non-identity transforms is the right interim call; worth a follow-up issue to track the full PartitionSet-style partition-space check so bucket/day-partitioned tables eventually get the same scoping benefit.

@laskoviymishka laskoviymishka merged commit 158255c into apache:main May 6, 2026
13 of 14 checks passed
@mzzz-zzm mzzz-zzm deleted the fix/rowdelta-partition-conflict-filter branch May 6, 2026 23:50

Development

Successfully merging this pull request may close these issues.

Bug(table): RowDelta.validate uses AlwaysTrue filter — false conflicts for concurrent appends to different partitions

2 participants