feat(clickhouse): support configurable storage engine for events table#4271
feat(clickhouse): support configurable storage engine for events table#4271pdeaudney wants to merge 4 commits into
Conversation
Adds aggregation.clickhouse.eventsTableEngine config block so self-hosted
multi-node ClickHouse deployments can create the events table with
ReplicatedMergeTree (with optional ON CLUSTER) instead of the hardcoded
MergeTree. Default behavior is unchanged: ClickHouse Cloud users continue
to rely on Cloud's transparent rewrite to SharedMergeTree.
The connector now plumbs an EventsTableEngine value through to the SQL
builder. Cluster names are backtick-quoted in the emitted DDL so any
identifier ClickHouse permits in <remote_servers> is accepted (including
hyphens) without re-implementing identifier validation. ZooKeeper path
and replica name are escaped against single-quote injection and validated
for non-whitespace content.
Coverage:
- Unit tests on rendered SQL for legacy default, explicit MergeTree,
ReplicatedMergeTree with macros, ON CLUSTER rendering, hyphenated cluster
names, embedded backtick escaping, and single-quote escaping in zk path
- Validation tests for required fields, whitespace-only inputs, and
unknown engine types in both the config and connector layers
- YAML unmarshal coverage in TestComplete pinning the camelCase Viper key
naming for all four engine fields
- Live single-node integration tests (gated by TEST_CLICKHOUSE_DSN)
exercising explicit MergeTree, default-zero-value, and invalid-config
paths through the connector constructor
- Live 2-node replicated integration test (gated by
TEST_CLICKHOUSE_REPLICATED=1) that inserts on node-01 and reads back
from node-02 to prove Keeper substituted {shard}/{replica} and
replication is wired end-to-end
Adds docker-compose.replicated.yaml + make up-replicated/down-replicated
targets bringing up a Keeper-backed 2-replica cluster on ports
38123/39000 and 38124/39001 (namespaced clear of the default stack), with
server logs streaming to stdout via docker logs.
Adds docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md
with the config reference, an offline RENAME + INSERT SELECT recipe for
converting an existing MergeTree table, and the
system.distributed_ddl_queue / system.replicas checks operators should
run between steps.
Refs openmeterio#3050
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughAdds configurable ClickHouse events table engine support (MergeTree or ReplicatedMergeTree) with validation and SQL/ON CLUSTER rendering, wires the engine through the streaming connector, adds unit and integration tests (gated replicated E2E), provides a 2-node replicated Docker Compose stack plus Make targets, and publishes a migration guide. ChangesClickHouse ReplicatedMergeTree Engine Support
Sequence Diagram(s)sequenceDiagram
participant OM as OpenMeter
participant Keeper as ClickHouse Keeper
participant CH1 as ClickHouse Node 1
participant CH2 as ClickHouse Node 2
OM->>OM: Validate EventsTableEngine config
OM->>Keeper: Connect / coordinate (ZooKeeper path)
OM->>CH1: CREATE TABLE (ENGINE clause / ON CLUSTER if set)
CH1->>CH2: Distributed DDL replication
CH1->>Keeper: ReplicatedMergeTree metadata registration
CH2->>Keeper: Replica registration
OM->>CH1: Insert event (streaming.BatchInsert)
CH1->>CH2: Replicate inserted row
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docker-compose.replicated.yaml (1)
1-220: 💤 Low valueWell-designed test cluster!
A few nice touches: the
imokhealthcheck comment explaining BusyBoxncquirks, stdout logging sodocker logsis actually useful, andinternal_replication: truewhich is the right setting for this topology. Thedistributed_ddlpath is also correctly wired soCREATE TABLE ... ON CLUSTERpropagates properly.One small note: the Keeper image on line 27 (
clickhouse/clickhouse-keeper:25.12.3-alpine) isn't SHA-pinned, while both server images are. Totally fine for a dev stack, but pinning it would make the setup fully reproducible.📌 Pin keeper image for full reproducibility
- image: clickhouse/clickhouse-keeper:25.12.3-alpine + image: clickhouse/clickhouse-keeper:25.12.3-alpine@sha256:<keeper-image-sha>Run
docker pull clickhouse/clickhouse-keeper:25.12.3-alpineand grab the digest fromdocker inspectto fill in the SHA.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker-compose.replicated.yaml` around lines 1 - 220, The clickhouse-keeper image in the clickhouse-keeper-01 service is not SHA-pinned (image: clickhouse/clickhouse-keeper:25.12.3-alpine); update that image field to a digest-pinned form (e.g., clickhouse/clickhouse-keeper:25.12.3-alpine@sha256:<digest>) to make the stack fully reproducible—obtain the digest by running docker pull clickhouse/clickhouse-keeper:25.12.3-alpine and docker inspect to copy the sha256 digest, then replace the image value under the clickhouse-keeper-01 service.openmeter/streaming/clickhouse/events_table_engine_test.go (1)
100-196: 💤 Low valueClever defer trick and thorough end-to-end coverage. A couple of optional things to consider:
Hardcoded node-02 address (line 176): The DSN
127.0.0.1:39001is fine given the skip gate, but you could read it from an env var (e.g.,TEST_CLICKHOUSE_REPLICA2_DSN) to make the test stack more flexible without breaking anything.Polling loop ceiling (lines 184–194): 3 s max wait is tight on a slow VM. A
TEST_CLICKHOUSE_REPLICATION_TIMEOUTenv var or bumping to 50 iterations wouldn't hurt, though it's probably fine for local use.Neither is blocking — just flagging for future ergonomics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@openmeter/streaming/clickhouse/events_table_engine_test.go` around lines 100 - 196, The test TestReplicatedMergeTree hardcodes the secondary ClickHouse DSN and uses a fixed 30-iteration poll loop; change it to read node-02 DSN from an env var (e.g., TEST_CLICKHOUSE_REPLICA2_DSN) when creating node02Opts and fallback to the current "clickhouse://default:default@127.0.0.1:39001/openmeter" if unset, and make the replication polling ceiling configurable via an env var (e.g., TEST_CLICKHOUSE_REPLICATION_TIMEOUT or TEST_CLICKHOUSE_REPLICATION_ATTEMPTS) used in the for-loop that scans into seen (the loop around row := node02.QueryRow(...)); ensure default behavior remains the same when env vars are absent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md`:
- Around line 86-146: Update the migration steps that use the ClickHouse ON
CLUSTER macro (specifically the statements "RENAME TABLE openmeter.om_events TO
openmeter.om_events_legacy ON CLUSTER {cluster};", the "CREATE TABLE ... ON
CLUSTER {cluster} (...)" statement, and "DROP TABLE openmeter.om_events_legacy
ON CLUSTER {cluster} SYNC;") to include a short clarifying note that {cluster}
is a ClickHouse server-side <macros> value (not an auto-substituted placeholder)
and advise operators to either define the cluster macro in their ClickHouse
config or replace {cluster} with their literal cluster name (e.g.
openmeter_cluster) when running the DDL; add the note inline where each ON
CLUSTER usage appears so readers see the guidance for steps 2, 3 and 7.
---
Nitpick comments:
In `@docker-compose.replicated.yaml`:
- Around line 1-220: The clickhouse-keeper image in the clickhouse-keeper-01
service is not SHA-pinned (image: clickhouse/clickhouse-keeper:25.12.3-alpine);
update that image field to a digest-pinned form (e.g.,
clickhouse/clickhouse-keeper:25.12.3-alpine@sha256:<digest>) to make the stack
fully reproducible—obtain the digest by running docker pull
clickhouse/clickhouse-keeper:25.12.3-alpine and docker inspect to copy the
sha256 digest, then replace the image value under the clickhouse-keeper-01
service.
In `@openmeter/streaming/clickhouse/events_table_engine_test.go`:
- Around line 100-196: The test TestReplicatedMergeTree hardcodes the secondary
ClickHouse DSN and uses a fixed 30-iteration poll loop; change it to read
node-02 DSN from an env var (e.g., TEST_CLICKHOUSE_REPLICA2_DSN) when creating
node02Opts and fallback to the current
"clickhouse://default:default@127.0.0.1:39001/openmeter" if unset, and make the
replication polling ceiling configurable via an env var (e.g.,
TEST_CLICKHOUSE_REPLICATION_TIMEOUT or TEST_CLICKHOUSE_REPLICATION_ATTEMPTS)
used in the for-loop that scans into seen (the loop around row :=
node02.QueryRow(...)); ensure default behavior remains the same when env vars
are absent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f5546f08-5193-4e71-84f2-156111b962a8
📒 Files selected for processing (12)
Makefileapp/common/streaming.goapp/config/aggregation.goapp/config/aggregation_test.goapp/config/config_test.goapp/config/testdata/complete.yamldocker-compose.replicated.yamldocs/migration-guides/2026-05-02-clickhouse-replicated-engine.mdopenmeter/streaming/clickhouse/connector.goopenmeter/streaming/clickhouse/event_query.goopenmeter/streaming/clickhouse/event_query_test.goopenmeter/streaming/clickhouse/events_table_engine_test.go
Note added at the first ON CLUSTER {cluster} occurrence in the migration
guide, telling operators that {cluster} is a ClickHouse <macros> value
they may need to define (or replace with the literal cluster name).
The note extends to every subsequent ON CLUSTER {cluster} in the steps,
so we don't repeat it for each.
Addresses CodeRabbit feedback on PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The replicated compose stack was pinning every image to tag@sha256:... except clickhouse-keeper, which was tag-only. Match the convention used for clickhouse-server, kafka, postgres, redis, and svix in docker-compose.base.yaml so this stack is reproducible across hosts and not silently affected if the upstream :25.12.3-alpine tag is rebuilt. Digest is the multi-platform manifest index, so the same line works on both arm64 (local Mac dev) and amd64 (CI / Linux), matching the existing clickhouse-server pin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GAlexIHU
left a comment
There was a problem hiding this comment.
Thanks @pdeaudney for your contrib, it's likely something that will be well received by users!
I only have a few comments, once those are resolved I'd be happy to merge this :)
| { | ||
| name: "cluster name with embedded backtick is escaped", | ||
| data: createEventsTable{ | ||
| Database: "openmeter", | ||
| EventsTableName: "om_events", | ||
| Engine: EventsTableEngine{ | ||
| Type: EventsTableEngineMergeTree, | ||
| Cluster: "weird`name", | ||
| }, | ||
| }, | ||
| want: "CREATE TABLE IF NOT EXISTS openmeter.om_events ON CLUSTER `weird``name` (namespace String, id String, type LowCardinality(String), subject String, source String, time DateTime, data String, ingested_at DateTime, stored_at DateTime, INDEX om_events_stored_at stored_at TYPE minmax GRANULARITY 4, store_row_id String) ENGINE = MergeTree PARTITION BY toYYYYMM(time) ORDER BY (namespace, type, subject, toStartOfHour(time))", | ||
| }, |
There was a problem hiding this comment.
string escaping test is good, but I think this exact case (ON CLUSTER + ENGINE = MergeTree) should not be a valid configuration.
There was a problem hiding this comment.
Addressed in 180150a. Both event_query.go and app/config/aggregation.go now reject Cluster set when the resolved engine is MergeTree (returns: "cluster requires ReplicatedMergeTree engine; MergeTree with ON CLUSTER produces independent (non-replicated) tables per node"). The two render tests that previously combined MergeTree + Cluster were reworked to use ReplicatedMergeTree, and the "valid" cluster-name cases in the validate suite (hyphen, leading digit, embedded backtick, whitespace-only) all now sit under a ReplicatedMergeTree config.
| StoreRowID: "1", | ||
| }})) | ||
|
|
||
| node02Opts, err := clickhouse.ParseDSN("clickhouse://default:default@127.0.0.1:39001/openmeter") |
There was a problem hiding this comment.
could you please make all replicated CH configuration used in the test ENV based (similar to TEST_CLICKHOUSE_REPLICATED flag)? it would be nice to decouple + would make it easier to run this not just locally
There was a problem hiding this comment.
Done in 180150a. The replicated test now reads its full topology from env vars, following the same skip-on-missing pattern as the existing TEST_CLICKHOUSE_DSN gate in suite_test.go:
TEST_CLICKHOUSE_REPLICATED(=1, top-level gate)TEST_CLICKHOUSE_REPLICATED_CLUSTERTEST_CLICKHOUSE_REPLICATED_ZK_PATHTEST_CLICKHOUSE_REPLICATED_REPLICA_NAMETEST_CLICKHOUSE_REPLICATED_NODE2_DSNTEST_CLICKHOUSE_REPLICATED_REPLICA_COUNT
Each is read inline at the top of TestReplicatedMergeTreeEngine with a named s.T().Skip(...) if missing. No defaults — fully decoupled from make up-replicated, runnable against any externally-managed cluster (or CI) by setting the env vars.
|
|
||
| // escapeSingleQuotes doubles single quotes so the value can be embedded in a | ||
| // single-quoted SQL string literal without breaking out. | ||
| func escapeSingleQuotes(s string) string { |
There was a problem hiding this comment.
its good you added escaping but
- i don't think it matches the CH spec
- i think it would be a safer bet if you didn't handroll it
There was a problem hiding this comment.
Fixed in 180150a. Replaced the handrolled strings.ReplaceAll(s, "'", "''") with a package-level strings.NewReplacer that matches the ClickHouse string-literal grammar — backslash as the escape character, with \\ and \' as the canonical escapes:
var stringLiteralEscaper = strings.NewReplacer(`\`, `\\`, `'`, `\'`)This mirrors clickhouse-go's internal stringQuoteReplacer (see bind.go:275), so we're using the same escape policy as the driver rather than reinventing it. Test coverage extended to also exercise an embedded backslash in the ZooKeeper path.
… env-drive replicated tests - Reject `Cluster` set with non-Replicated engine in both event_query.go and app/config/aggregation.go validation. MergeTree + ON CLUSTER produces independent (non-replicated) tables per node, which is almost never intended for the events table. - Replace handrolled single-quote doubling with a NewReplacer that follows the ClickHouse string-literal grammar (mirrors clickhouse-go's stringQuoteReplacer): `\` -> `\\`, `'` -> `\'`. - Drive the ReplicatedMergeTree integration test entirely from env vars (cluster, zk path, replica name, node-02 DSN, expected replica count) using the same skip-on-missing pattern as the existing TEST_CLICKHOUSE_DSN gate, so the test decouples from the local docker-compose stack. Addresses PR review comments from @GAlexIHU. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds aggregation.clickhouse.eventsTableEngine config block so self-hosted multi-node ClickHouse deployments can create the events table with ReplicatedMergeTree (with optional ON CLUSTER) instead of the hardcoded MergeTree. Default behavior is unchanged: ClickHouse Cloud users continue to rely on Cloud's transparent rewrite to SharedMergeTree.
The connector now plumbs an EventsTableEngine value through to the SQL builder. Cluster names are backtick-quoted in the emitted DDL so any identifier ClickHouse permits in <remote_servers> is accepted (including hyphens) without re-implementing identifier validation. ZooKeeper path and replica name are escaped against single-quote injection and validated for non-whitespace content.
Coverage:
Adds docker-compose.replicated.yaml + make up-replicated/down-replicated targets bringing up a Keeper-backed 2-replica cluster on ports 38123/39000 and 38124/39001 (namespaced clear of the default stack), with server logs streaming to stdout via docker logs.
Adds docs/migration-guides/2026-05-02-clickhouse-replicated-engine.md with the config reference, an offline RENAME + INSERT SELECT recipe for converting an existing MergeTree table, and the
system.distributed_ddl_queue / system.replicas checks operators should run between steps.
Refs #3050
Summary by CodeRabbit
New Features
Documentation
Tests