From ee1a2e06c077d84895a19af21518c8d83856af4b Mon Sep 17 00:00:00 2001
From: Josh Carp
Date: Wed, 13 May 2026 16:09:50 -0400
Subject: [PATCH] Abbreviate verbose clickhouse logs.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

As of this writing, `system.query_log` is one of the largest clickhouse
tables. From dogfood:

```
┌─database─┬─table──────────────────────────┬─compressed─┬─uncompressed─┬───────rows─┐
│ system   │ query_log                      │ 17.11 GiB  │ 39.49 GiB    │   24942352 │
│ oximeter │ measurements_cumulativeu64     │ 15.52 GiB  │ 91.91 GiB    │ 1576299396 │
│ oximeter │ measurements_f32               │ 14.52 GiB  │ 85.23 GiB    │ 1849126423 │
│ oximeter │ measurements_histogramu64      │ 2.76 GiB   │ 117.37 GiB   │   78425393 │
│ system   │ metric_log                     │ 742.29 MiB │ 12.84 GiB    │    2543307 │
└──────────┴────────────────────────────────┴────────────┴──────────────┴────────────┘
```

Most of this table's disk use is attributable to the `query` column, and
most of that usage comes from the very long measurement queries run by
oximeter, which take the form of:

```
SELECT * FROM oximeter.measurements_* WHERE timeseries_key IN (...)
```

The `IN` clause can include thousands of keys, and a single oximeter
query can run multiple clickhouse queries of this form. This takes up a
lot of space in the query log.

This patch truncates long `IN (...)` clauses. These aren't operationally
useful, and shortening them shrinks `system.query_log` by about 80% in
testing.
---
 smf/clickhouse/config.xml | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/smf/clickhouse/config.xml b/smf/clickhouse/config.xml
index 352023300a5..ed69518db69 100644
--- a/smf/clickhouse/config.xml
+++ b/smf/clickhouse/config.xml
@@ -13,6 +13,28 @@
 10000
+
+
+
+ truncate large timeseries_key IN clauses
+ (\btimeseries_key\s+IN\s*\()[^)]{120,}\)
+ \1...)
+
+
 system metric_log
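
The masking rule in the diff above can be sanity-checked outside clickhouse by running the same pattern and replacement through Python's `re` module. This is only a sketch: clickhouse evaluates masking rules with RE2 rather than Python's engine (this particular pattern uses syntax that both accept), and the `mask` helper and the sample queries are hypothetical, not part of the patch.

```python
import re

# Pattern and replacement copied from the patch. ClickHouse applies them with
# RE2; using Python's `re` here is an assumption made for illustration.
PATTERN = re.compile(r"(\btimeseries_key\s+IN\s*\()[^)]{120,}\)")
REPLACEMENT = r"\1...)"


def mask(query: str) -> str:
    """Hypothetical helper: apply the masking rule to one query string."""
    return PATTERN.sub(REPLACEMENT, query)


# Hypothetical queries: one with a long key list, one with a short list.
long_keys = ", ".join(str(10**18 + i) for i in range(1000))
long_query = (
    "SELECT * FROM oximeter.measurements_f32 "
    f"WHERE timeseries_key IN ({long_keys})"
)
short_query = (
    "SELECT * FROM oximeter.measurements_f32 "
    "WHERE timeseries_key IN (1, 2, 3)"
)

masked = mask(long_query)
print(len(long_query), "->", len(masked))  # size before and after masking
print(masked)
print(mask(short_query) == short_query)  # short lists are left untouched
```

Only key lists of at least 120 characters are rewritten, so short `IN` clauses stay readable in the log. Because `[^)]` cannot cross a closing parenthesis, the rule fires only on flat lists that contain no `)` before the clause ends.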