What would you like to happen?
Summary
Today, Beam supports dynamic destination routing for Iceberg, but the input PCollection still requires a single Beam schema. This makes it difficult to build generic ingestion pipelines where:
- new event types can appear dynamically
- each event type maps to a different Iceberg table
- each event type has its own schema
Problem
A common ingestion pattern is:
- Read generic events from a streaming pipeline like Pub/Sub
- Determine destination table from an attribute like event_type
- Fetch the schema for that event type dynamically from a central schema store
- Write each record to its corresponding Iceberg table
This works conceptually for table routing, but breaks down because PCollection has one schema for the entire collection. In practice, that means:
- dynamic table name is possible
- dynamic schema per destination is not
- newly introduced event types require pipeline changes
What is needed?
We need per-destination schema, similar to BigQuery DynamicDestinations.
Example Idea:
IcebergIO.write()
.to(new DynamicDestinations<InputT, DestinationT>() {
@Override
public DestinationT getDestination(ValueInSingleWindow<InputT> element) { ... }
@Override
public String getTableIdentifier(DestinationT destination) { ... }
@Override
public org.apache.iceberg.Schema getSchema(DestinationT destination) { ... }
})
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
What would you like to happen?
Summary
Today, Beam supports dynamic destination routing for Iceberg, but the input PCollection still requires a single Beam schema. This makes it difficult to build generic ingestion pipelines where:
Problem
A common ingestion pattern is:
This works conceptually for table routing, but breaks down because PCollection has one schema for the entire collection. In practice, that means:
What is needed?
We need per-destination schema, similar to BigQuery DynamicDestinations.
Example Idea:
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components