@jeffhiltz jeffhiltz commented Dec 19, 2025

Manual Configuration

Optimizer Configuration

The Table Optimizer API has a number of configuration options that are not exposed in CloudFormation.

CompactionConfiguration

CompactionConfiguration can only be set via API calls (for example, with the CLI) after the resource has been created. This shortcut can enable compaction, but it cannot configure it. In many cases the default configuration may be sufficient. The following options require manual configuration after creation:

  • strategy: the default is binpack. Note that using sort or z-order requires the table's sort order to be set manually via Spark SQL.
  • minInputFiles: minimum number of files required to initiate a compaction, default is 100
  • deleteFileThreshold: minimum number of deletes that must be present in a data file to make it eligible for compaction, default is 1
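
As a rough illustration of the post-creation step, the sketch below builds the arguments for the Glue UpdateTableOptimizer call with the three options listed above. The helper name `compaction_update_request` and all sample values are hypothetical. It assumes the update call expects roleArn and enabled to be sent alongside the compaction settings, which should be checked against the API reference before relying on it.

```python
VALID_STRATEGIES = ("binpack", "sort", "z-order")


def compaction_update_request(catalog_id, database, table, role_arn,
                              strategy="binpack", min_input_files=100,
                              delete_file_threshold=1):
    """Build keyword arguments for glue.update_table_optimizer (compaction).

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}")
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        "TableOptimizerConfiguration": {
            # Assumption: role and enabled flag are resent on every update.
            "roleArn": role_arn,
            "enabled": True,
            "compactionConfiguration": {
                "icebergConfiguration": {
                    "strategy": strategy,
                    "minInputFiles": min_input_files,
                    "deleteFileThreshold": delete_file_threshold,
                }
            },
        },
    }


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# request = compaction_update_request(
#     "123456789012", "example_db", "example_table",
#     "arn:aws:iam::123456789012:role/example-optimizer-role",
#     min_input_files=50,
# )
# boto3.client("glue").update_table_optimizer(**request)
```

Keeping the request construction separate from the AWS call makes it possible to check the payload without credentials.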

OrphanFileDeletionConfiguration

CloudFormation includes support for setting the OrphanFileRetentionPeriodInDays property, but the following must be set using the API/CLI:

  • location: a sub-directory in which to look for files, default is the table location
  • runRateInHours: interval in hours between orphan file deletion job runs, default is 24
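
A sketch of how the two API-only options might be supplied, again by building the UpdateTableOptimizer arguments. The helper `orphan_file_deletion_update_request` and the sample values are hypothetical. It has not been verified here whether an update replaces the whole configuration, so the sketch lets the caller repeat the retention period that was set through CloudFormation.

```python
def orphan_file_deletion_update_request(catalog_id, database, table, role_arn,
                                        run_rate_in_hours=24, location=None,
                                        retention_period_in_days=None):
    """Build keyword arguments for glue.update_table_optimizer
    (orphan file deletion). Hypothetical helper; it does not call AWS."""
    iceberg = {"runRateInHours": run_rate_in_hours}
    if location is not None:
        # When omitted, the service default (the table location) applies.
        iceberg["location"] = location
    if retention_period_in_days is not None:
        # Repeat the CloudFormation-managed value in case updates overwrite it.
        iceberg["orphanFileRetentionPeriodInDays"] = retention_period_in_days
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "orphan_file_deletion",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,  # assumption: resent on every update
            "enabled": True,
            "orphanFileDeletionConfiguration": {"icebergConfiguration": iceberg},
        },
    }


# Usage (requires boto3 and AWS credentials):
# import boto3
# request = orphan_file_deletion_update_request(
#     "123456789012", "example_db", "example_table",
#     "arn:aws:iam::123456789012:role/example-optimizer-role",
#     run_rate_in_hours=12,
# )
# boto3.client("glue").update_table_optimizer(**request)
```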

RetentionConfiguration

CloudFormation includes support for setting the CleanExpiredFiles, NumberOfSnapshotsToRetain, and SnapshotRetentionPeriodInDays properties, but the following must be set using the API/CLI:

  • runRateInHours: interval in hours between retention job runs, default is 24
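
The retention run rate could be set the same way. The sketch below is a hypothetical helper (`retention_update_request`) that assembles the UpdateTableOptimizer arguments. The optional parameters mirror the CloudFormation-managed settings so they can be repeated if an update turns out to overwrite them, which is an unverified assumption.

```python
def retention_update_request(catalog_id, database, table, role_arn,
                             run_rate_in_hours=24,
                             snapshot_retention_period_in_days=None,
                             number_of_snapshots_to_retain=None,
                             clean_expired_files=None):
    """Build keyword arguments for glue.update_table_optimizer (retention).

    Hypothetical helper; it only assembles the request.
    """
    optional = {
        "snapshotRetentionPeriodInDays": snapshot_retention_period_in_days,
        "numberOfSnapshotsToRetain": number_of_snapshots_to_retain,
        "cleanExpiredFiles": clean_expired_files,
    }
    iceberg = {"runRateInHours": run_rate_in_hours}
    # Only send the settings the caller explicitly provided.
    iceberg.update({k: v for k, v in optional.items() if v is not None})
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "retention",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,  # assumption: resent on every update
            "enabled": True,
            "retentionConfiguration": {"icebergConfiguration": iceberg},
        },
    }


# Usage (requires boto3 and AWS credentials):
# import boto3
# request = retention_update_request(
#     "123456789012", "example_db", "example_table",
#     "arn:aws:iam::123456789012:role/example-optimizer-role",
#     run_rate_in_hours=6, clean_expired_files=True,
# )
# boto3.client("glue").update_table_optimizer(**request)
```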

Sort Order

Sort order can only be set using Spark SQL.
TODO: add details
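
Pending those details, here is a tentative sketch of what setting the order might look like. It assumes the Spark session has the Iceberg SQL extensions enabled, which provide `ALTER TABLE ... WRITE ORDERED BY`. The helper name, table name, and column names are hypothetical, and the statement has not yet been run against a table created by this shortcut.

```python
def write_ordered_by_statement(table, columns):
    """Build an Iceberg Spark SQL statement that sets a table's sort order.

    `columns` is a list of column expressions such as "event_time DESC".
    Hypothetical helper; it returns the SQL text and does not execute it.
    """
    if not columns:
        raise ValueError("at least one sort column is required")
    return f"ALTER TABLE {table} WRITE ORDERED BY {', '.join(columns)}"


# Usage inside a Spark job with the Iceberg extensions configured:
# spark.sql(write_ordered_by_statement(
#     "glue_catalog.example_db.example_table", ["customer_id", "event_time DESC"]))
```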

Testing

TODO:

  • use the shortcut to create some tables and use them
  • make sure that example Spark SQL code works for setting order (and that the table keeps working)
  • try making a table that uses bucketing (we don't need to do anything extra to support that, right? it's in partition definition? or?)

@jeffhiltz jeffhiltz requested a review from a team December 19, 2025 16:51
@jeffhiltz jeffhiltz added the ai AI coding agents co-authored the code label Dec 19, 2025