
Add table maintenance primitives #1

Open
mattefunnel wants to merge 9 commits into main from add-table-maintenance-primitives

@mattefunnel
Owner

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

mattefunnel and others added 9 commits March 27, 2026 21:51
Covers iceberg-actions crate, DataFusionFileRewriter, required iceberg-rust
core additions (FileIO::list, ManifestWriter, RewriteFilesAction), and test
strategy (~65 tests ported from Java).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
10 tasks, ~65 ported Java tests, DataFusionFileRewriter, end-to-end
integration test mirroring MaintenanceSingle.scala.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review assessment (2 invalid, 14 valid findings):

INVALID:
- A§3.1 (DataContentType enum): the report was wrong; EqualityDeletes is correct
- A§3.4 (file_size_in_bytes): the report was wrong; it is u64

KEY FIXES (design doc + implementation plan):
- Add Task 0: expose pub(crate) APIs (refs() accessor, Manifest::try_from_avro_bytes)
- Fix Transaction::new pattern (not self.table.new_transaction())
- Fix MemoryCatalogBuilder::load() pattern (not .build())
- Fix Catalog::create_table(namespace, TableCreation) signature
- RemoveOrphanFiles: add metadata_log, statistics, partition_statistics to referenced set
- ExpireSnapshots: add statistics file deletion, improve retention docs
- Add shared test helpers (tests/common/mod.rs)
- Add Polaris smoke test (#[ignore])
- Unify test placement: actions tests in iceberg-rust, e2e in datafusion
- Document path deps as local-only prototype
- Flesh out append_batch() helper with real code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
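
The RemoveOrphanFiles fix above widens the referenced set so that metadata_log, statistics, and partition_statistics paths are never reported as orphans. A minimal sketch of that set-difference idea follows, using only the standard library. The `ReferencedPaths` struct, its field names, and `find_orphans` are hypothetical illustrations, not the iceberg-actions API.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of the paths a table's metadata points at.
/// The real action would collect these from snapshots, manifests, and
/// table metadata; the field names here are assumptions for illustration.
struct ReferencedPaths {
    data_and_manifest_files: Vec<String>,
    metadata_log: Vec<String>,
    statistics: Vec<String>,
    partition_statistics: Vec<String>,
}

impl ReferencedPaths {
    /// Union of every category into one lookup set.
    fn to_set(&self) -> HashSet<&str> {
        self.data_and_manifest_files
            .iter()
            .chain(&self.metadata_log)
            .chain(&self.statistics)
            .chain(&self.partition_statistics)
            .map(String::as_str)
            .collect()
    }
}

/// Returns the listed paths that nothing references, sorted for stable output.
fn find_orphans(listed: &[String], referenced: &ReferencedPaths) -> Vec<String> {
    let keep = referenced.to_set();
    let mut orphans: Vec<String> = listed
        .iter()
        .filter(|p| !keep.contains(p.as_str()))
        .cloned()
        .collect();
    orphans.sort();
    orphans
}
```

If the three extra categories were left out of the union, a statistics file or an older metadata JSON would show up in the orphan list even though the table still points at it.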
Standalone document covering: no mtime filtering in RemoveOrphanFiles,
simplified branch retention in ExpireSnapshots, no dangling delete
cleanup, bin-pack only, no scheme mapping, local path deps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ivalent)

End-to-end test that creates an Iceberg table with MemoryCatalog backed by
LocalFsStorageFactory, writes real parquet data across 5 snapshots, then
runs all four maintenance operations in sequence: RewriteDataFiles (with an
inline ParquetFileRewriter), RewriteManifests, ExpireSnapshots (retain_last=2),
and RemoveOrphanFiles (dry_run=true). Uses iceberg and iceberg-actions path
deps from the sibling iceberg-rust repo, with separate arrow/parquet v57 deps
to match iceberg-rust's version requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
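
To make the ExpireSnapshots step of that test concrete, here is a rough sketch of what retain_last=2 means for a table with 5 snapshots. The `Snapshot` struct and `split_by_retain_last` are hypothetical and standard-library only. The sketch ignores age thresholds and branch or tag references, which a real implementation would also have to honor.

```rust
/// Hypothetical minimal snapshot record; not the iceberg-rust type.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    id: i64,
    timestamp_ms: i64,
}

/// Splits snapshots into (retained, expired), keeping the `retain_last`
/// newest snapshots by timestamp. Both halves are ordered newest first.
fn split_by_retain_last(
    mut snapshots: Vec<Snapshot>,
    retain_last: usize,
) -> (Vec<Snapshot>, Vec<Snapshot>) {
    // Sort newest first so the retained snapshots form the prefix.
    snapshots.sort_by(|a, b| b.timestamp_ms.cmp(&a.timestamp_ms));
    // Clamp so that asking for more than exists retains everything.
    let keep = retain_last.min(snapshots.len());
    let expired = snapshots.split_off(keep);
    (snapshots, expired)
}
```

With five snapshots and `retain_last` set to 2, the two newest are retained and the three oldest become candidates for expiration.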
- User guide with API examples for all 4 operations
- Known limitations doc updated with DataFusion version mismatch finding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expand the user guide with a full quick-start example showing all four
maintenance operations, a FileRewriter trait section, and detailed API
reference tables for each operation's builder methods and result fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>