Support Iceberg _spec_id metadata column in native scan


**Is your feature request related to a problem? Please describe.**

The native Iceberg scan currently supports only the `_file` metadata column. Queries that project any other Iceberg metadata column fall back to Spark, even when that column is file-level and could be materialized as a constant per scanned data file.

For example, `_spec_id` is a file-level Iceberg metadata column. It represents the partition spec ID of the data file containing the row. Unlike `_pos`, it does not require row-level materialization, so it can be supported in the same way as `_file`.

Related code:
- `IcebergScanSupport.isSupportedMetadataColumn` only allows `MetadataColumns.FILE_PATH`.
- `NativeIcebergTableScanExec.metadataPartitionValues` only materializes `_file`.
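
As a rough illustration of the gate described above, the support check can be pictured as an allow-list over metadata column names. This is only a sketch under assumptions: the class `MetadataColumnGate`, its fields, and `canScanNatively` are hypothetical stand-ins, and the real signature of `IcebergScanSupport.isSupportedMetadataColumn` is not shown in this issue.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the metadata-column gate; not the project's actual code.
class MetadataColumnGate {
    // Today only `_file` is accepted (per the issue description).
    static final Set<String> CURRENT = Set.of("_file");
    // With this request, the file-level `_spec_id` column would be accepted too.
    static final Set<String> PROPOSED = Set.of("_file", "_spec_id");

    // The scan stays native only if every projected metadata column is supported.
    static boolean canScanNatively(Set<String> supported, List<String> projected) {
        return supported.containsAll(projected);
    }

    public static void main(String[] args) {
        List<String> query = List.of("_file", "_spec_id");
        System.out.println(canScanNatively(CURRENT, query));   // false: falls back to Spark
        System.out.println(canScanNatively(PROPOSED, query));  // true: stays native
        // Row-level columns such as `_pos` still force a fallback.
        System.out.println(canScanNatively(PROPOSED, List.of("_spec_id", "_pos"))); // false
    }
}
```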

**Describe the solution you'd like**

Add native Iceberg scan support for the `_spec_id` metadata column.

The implementation can treat `_spec_id` as a per-file partition value:

1. Mark `MetadataColumns.SPEC_ID` as a supported metadata column in `IcebergScanSupport`.
2. Build a mapping from data file path to `FileScanTask.file().specId()`.
3. Extend `NativeIcebergTableScanExec.metadataPartitionValues` to materialize `_spec_id` as an integer literal.
4. Add integration tests for queries such as:
   - `select _spec_id from iceberg_table`
   - `select id, _file, _spec_id from iceberg_table`

The native scan should continue to fall back for row-level metadata columns such as `_pos`.
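
Steps 2 and 3 could look roughly like the sketch below. It is a self-contained model, not the actual implementation: `Task` is a hypothetical stand-in for the path and spec ID obtained from a `FileScanTask`, and the shape of `metadataPartitionValues` here (a map from column name to constant value) is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of per-file metadata values; not the project's actual code.
class SpecIdPartitionValues {
    // Stand-in for the two facts needed from each scan task.
    static final class Task {
        final String path;
        final int specId;
        Task(String path, int specId) {
            this.path = path;
            this.specId = specId;
        }
    }

    // Step 2: map each data file path to its partition spec ID.
    static Map<String, Integer> specIdByPath(List<Task> tasks) {
        Map<String, Integer> byPath = new LinkedHashMap<>();
        for (Task task : tasks) {
            byPath.put(task.path, task.specId);
        }
        return byPath;
    }

    // Step 3: produce one constant per requested metadata column for a given file.
    static Map<String, Object> metadataPartitionValues(
            String path, List<String> requested, Map<String, Integer> specIds) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (String column : requested) {
            switch (column) {
                case "_file": {
                    values.put(column, path);
                    break;
                }
                case "_spec_id": {
                    Integer specId = specIds.get(path);
                    if (specId == null) {
                        throw new IllegalStateException("No spec ID recorded for " + path);
                    }
                    values.put(column, specId); // integer constant for the whole file
                    break;
                }
                default:
                    // Row-level columns such as `_pos` are not per-file constants.
                    throw new UnsupportedOperationException("Unsupported metadata column: " + column);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(new Task("data/a.parquet", 0), new Task("data/b.parquet", 1));
        Map<String, Integer> specIds = specIdByPath(tasks);
        System.out.println(metadataPartitionValues("data/b.parquet", List.of("_file", "_spec_id"), specIds));
    }
}
```

Failing loudly when a path has no recorded spec ID is a design choice in this sketch. Because Iceberg treats `_spec_id` as a required column, silently emitting a null would be the wrong default.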


**Additional context**

Iceberg defines `_spec_id` as a required integer metadata column. Since the value is constant for all rows in a data file, it can be passed through the existing native scan partition-value mechanism already used for `_file`.

This is a focused native Iceberg scan coverage improvement and should not require an AIP.



Support Iceberg _spec_id metadata column in native scan #2217
