DATAPLT-1268 Add shortcut for Iceberg-backed Glue Table #161
Manual Configuration
Optimizer Configuration
The Table Optimizer API has a number of configuration options that are not exposed in CloudFormation.
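As a rough illustration of what such a post-creation API call could look like, the sketch below builds an UpdateTableOptimizer request for the compaction settings described under CompactionConfiguration and sends it with boto3. It assumes a boto3 release recent enough to accept these fields. The function names and all identifiers passed in (account ID, database, table, role ARN) are hypothetical placeholders, not part of this shortcut.

```python
from typing import Any


def compaction_update_request(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    strategy: str = "binpack",
    min_input_files: int = 100,
    delete_file_threshold: int = 1,
) -> dict[str, Any]:
    """Build keyword arguments for Glue's UpdateTableOptimizer call.

    The defaults mirror the documented defaults (binpack, 100, 1).
    """
    if strategy not in ("binpack", "sort", "z-order"):
        raise ValueError(f"unsupported compaction strategy: {strategy}")
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,
            "enabled": True,
            "compactionConfiguration": {
                "icebergConfiguration": {
                    "strategy": strategy,
                    "minInputFiles": min_input_files,
                    "deleteFileThreshold": delete_file_threshold,
                }
            },
        },
    }


def apply_compaction_config(**kwargs: Any) -> None:
    """Send the request to Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    glue.update_table_optimizer(**compaction_update_request(**kwargs))
```

Keeping the request builder separate from the network call makes it possible to review or unit-test the payload before anything is sent. The same pattern should carry over to the orphan file deletion and retention optimizers by changing Type and the nested configuration key, though that is not shown here.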
CompactionConfiguration
CompactionConfiguration can only be set through API calls (for example, via the CLI) after the resource has been created. This shortcut can enable compaction, but it cannot configure it. In many cases the default configuration is sufficient. The following options require manual configuration after creation:
- strategy: the default is binpack. Note that using sort or z-order requires the table's sort order to be set manually via Spark SQL.
- minInputFiles: minimum number of files required to initiate a compaction; default is 100
- deleteFileThreshold: minimum number of deletes that must be present in a data file to make it eligible for compaction; default is 1

OrphanFileDeletionConfiguration
CloudFormation includes support for setting the OrphanFileRetentionPeriodInDays property, but the following must be set using the API/CLI:
- location: a sub-directory in which to look for files; default is the table location
- runRateInHours: interval in hours between orphan file deletion job runs; default is 24

RetentionConfiguration
CloudFormation includes support for setting the cleanExpiredFiles, numberOfSnapshotsToRetain, and snapshotRetentionPeriodInDays properties, but the following must be set using the API/CLI:
- runRateInHours: interval in hours between retention job runs; default is 24

Sort Order
Sort order can only be set using Spark SQL.
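Until the details are filled in, here is a minimal sketch of one way this might be done from PySpark. It assumes the Spark session has the Iceberg SQL extensions enabled and a catalog pointing at Glue. The catalog, table, and column names, as well as both function names, are hypothetical.

```python
def write_ordered_by_sql(table: str, columns: list[str]) -> str:
    """Build an Iceberg ALTER TABLE ... WRITE ORDERED BY statement."""
    if not columns:
        raise ValueError("at least one sort column is required")
    return f"ALTER TABLE {table} WRITE ORDERED BY {', '.join(columns)}"


def set_sort_order(spark, table: str, columns: list[str]) -> None:
    """Run the statement on an existing SparkSession.

    Assumes the session was created with the Iceberg SQL extensions.
    """
    spark.sql(write_ordered_by_sql(table, columns))


# Hypothetical table and columns, for illustration only.
print(write_ordered_by_sql("glue_catalog.analytics.events", ["event_date", "user_id"]))
# prints: ALTER TABLE glue_catalog.analytics.events WRITE ORDERED BY event_date, user_id
```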
TODO: add details
Testing
TODO: