feat: add Iceberg integration with Parquet writer and commit pipeline #55

Open
zer0stars wants to merge 3 commits into main from iceberg

@zer0stars

Summary

  • Add dimo_parquet_writer output: batches CloudEvents into Parquet files on S3, sets metadata (path/size/count) for downstream Iceberg commits
  • Add dimo_iceberg_commit processor: commits Parquet files to Iceberg tables via REST catalog API (Lakekeeper)
  • Add kafka_franz output step in valid/partial-cloudevents-2 streams to publish commit messages between parquet write and ClickHouse insert
  • Add iceberg-committer stream consuming from commit topic
  • Update Helm values with Iceberg and object-storage env vars
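
For reviewers, here is a minimal sketch of what a commit message passed from the Parquet write to the Iceberg commit step could look like. Only the path/size/count trio comes from the summary above; the `CommitMessage` type, its JSON field names, and the `EncodeCommit`/`DecodeCommit` helpers are hypothetical and not taken from this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CommitMessage is a hypothetical payload describing one written Parquet file.
// The field names are assumptions; the PR only states that path, size, and
// count are set as metadata for the downstream Iceberg commit.
type CommitMessage struct {
	Path        string `json:"path"`
	SizeBytes   int64  `json:"size_bytes"`
	RecordCount int64  `json:"record_count"`
}

// Validate rejects messages that cannot describe a real data file.
func (m CommitMessage) Validate() error {
	switch {
	case m.Path == "":
		return errors.New("commit message: empty path")
	case m.SizeBytes <= 0:
		return errors.New("commit message: size must be positive")
	case m.RecordCount <= 0:
		return errors.New("commit message: record count must be positive")
	}
	return nil
}

// EncodeCommit validates the message and serializes it to JSON,
// e.g. for publishing to the commit topic.
func EncodeCommit(m CommitMessage) ([]byte, error) {
	if err := m.Validate(); err != nil {
		return nil, err
	}
	return json.Marshal(m)
}

// DecodeCommit parses and validates a message on the consuming side.
func DecodeCommit(b []byte) (CommitMessage, error) {
	var m CommitMessage
	if err := json.Unmarshal(b, &m); err != nil {
		return CommitMessage{}, fmt.Errorf("decode commit message: %w", err)
	}
	if err := m.Validate(); err != nil {
		return CommitMessage{}, err
	}
	return m, nil
}

func main() {
	// The bucket and object key below are placeholders.
	raw, err := EncodeCommit(CommitMessage{
		Path:        "s3://example-bucket/data/part-0001.parquet",
		SizeBytes:   1048576,
		RecordCount: 5000,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

Validating on both the producing and consuming side is a design choice in this sketch: it keeps an empty or zero-sized file from ever reaching the catalog commit.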

Test plan

  • go build ./... passes
  • go vet ./... passes
  • go test ./internal/outputs/parquetwriter/ passes
  • All plugins register (list outputs/processors)
  • kafka_franz output available for commit publish
  • Stream YAML lints clean (except pre-existing env var warnings)
  • Integration test with Kafka + S3 + ClickHouse + Lakekeeper in dev

Made with Cursor

- Add dimo_parquet_writer output: batches CloudEvents into Parquet files
  on S3, sets metadata (path/size/count) for downstream Iceberg commits
- Add dimo_iceberg_commit processor: commits Parquet files to Iceberg
  tables via REST catalog API (Lakekeeper)
- Add kafka_franz output step in valid/partial-cloudevents-2 streams to
  publish commit messages between parquet write and ClickHouse insert
- Add iceberg-committer stream consuming from commit topic
- Update values with Iceberg/object-storage env vars

Co-authored-by: Cursor <cursoragent@cursor.com>
zer0stars and others added 2 commits February 15, 2026 19:05
…ates

- Add dimo_cloudevent_to_parquet processor and internal/encoders (CloudEvent→Parquet)
- Add icebergcommit processor and tests; remove iceberg-committer stream, ICEBERG_PLAN
- Update valid/partial cloudevent streams, values; remove parquetwriter impl (tests kept)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename ICEBERG_WAREHOUSE bucket from dimo-iceberg-* to dimo-storage-*
  across stream configs and Helm values (dev + prod)
- Fix import path: cloudeventparquet/encoders -> encoders
- Move parseBucket tests into cloudeventtoparquet package and
  RawEvent_DataBase64 test into encoders; delete parquetwriter_test.go
- Update go.mod: add aws-sdk-go-v2 core + s3, drop segmentio/encoding,
  pin tidwall/match v1.1.1
- Fix struct field alignment in icebergcommit_test.go

Co-authored-by: Cursor <cursoragent@cursor.com>
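
As context for the `parseBucket` tests mentioned in the last commit, here is a rough sketch of how a bucket name might be extracted from an `ICEBERG_WAREHOUSE` value. The signature, the accepted schemes, and the example bucket name are assumptions; the actual `parseBucket` in the `cloudeventtoparquet` package is not shown in this conversation and may behave differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseBucket is a hypothetical reimplementation: it splits a warehouse
// location such as "s3://bucket/prefix" into bucket and key prefix.
// Accepting only the s3 and s3a schemes is an assumption of this sketch.
func parseBucket(warehouse string) (bucket, prefix string, err error) {
	u, err := url.Parse(warehouse)
	if err != nil {
		return "", "", fmt.Errorf("parse warehouse %q: %w", warehouse, err)
	}
	if u.Scheme != "s3" && u.Scheme != "s3a" {
		return "", "", fmt.Errorf("warehouse %q: unsupported scheme %q", warehouse, u.Scheme)
	}
	if u.Host == "" {
		return "", "", fmt.Errorf("warehouse %q: missing bucket", warehouse)
	}
	// Trim slashes so the prefix can be joined with object keys directly.
	return u.Host, strings.Trim(u.Path, "/"), nil
}

func main() {
	// "dimo-storage-example" is a placeholder, not a real bucket from the PR.
	bucket, prefix, err := parseBucket("s3://dimo-storage-example/warehouse/")
	if err != nil {
		panic(err)
	}
	fmt.Println(bucket, prefix)
}
```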