.. Copyright 2026-present Alibaba Inc.

.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.

Compaction
==========
Compaction is the process of merging multiple small data files into fewer, larger
files. It is a resource-intensive procedure that consumes CPU time and disk IO,
so overly frequent compaction may slow down writes. However, without compaction,
the accumulation of small files degrades query performance. Tuning compaction is
therefore a trade-off between write throughput and read efficiency.

.. note::
   - Only one job may compact a given partition at a time; concurrent
     compaction of the same partition causes conflicts.
   - C++ Paimon does not currently support producing changelogs.
   - Compaction is disabled when ``write-only`` is set to ``true``, or when the
     table uses dynamic bucketing (``bucket = -1``) for append-only tables.
   - For a complete list of compaction-related configurations, see the
     :ref:`Options API Reference <cpp-api-options>`.

Append-Only Table Compaction
----------------------------
In an append-only table, data files are simply appended in sequence order. Over
time, many small files accumulate, which degrades read performance due to the
overhead of opening and scanning numerous files.

Append-only table compaction merges multiple small files into fewer, larger files
to improve read efficiency. The compaction is performed asynchronously and does
not block writes.

.. note::
   Append-only table compaction is only available for fixed-bucket mode
   (``bucket > 0``). Dynamic bucketing (``bucket = -1``) does not support
   compaction. Tables with blob columns also skip compaction.

Auto Compaction
~~~~~~~~~~~~~~~
During each flush, the writer triggers a best-effort auto compaction. The
compaction picker scans the file queue, ordered by sequence number, and selects a
contiguous window of files for merging when the number of candidate files reaches
the ``compaction.min.file-num`` threshold.
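
The picking rule can be sketched as follows. This is an illustrative Python
sketch, not C++ Paimon's actual code: the function name, the window-breaking
rule, and the 128 MB target size are assumptions made for the example.

```python
# Illustrative sketch: pick a contiguous window of small files for auto
# compaction once the candidate count reaches `compaction.min.file-num`.
# (Hypothetical; not the actual C++ Paimon implementation.)

def pick_auto_compaction(file_sizes, min_file_num=5,
                         target_size=128 * 1024 * 1024):
    """Scan files in sequence order; return (start, end) indices of a
    contiguous window of small files to merge, or None if none qualifies."""
    window = []
    for i, size in enumerate(file_sizes):
        if size < target_size:
            window.append(i)
            if len(window) >= min_file_num:
                return (window[0], window[-1])
        else:
            window = []  # a large file breaks the contiguous window
    return None

# Five contiguous small files (indices 1..5) qualify; the large file at
# index 0 is skipped.
sizes = [256 * 1024 * 1024] + [4 * 1024 * 1024] * 5
print(pick_auto_compaction(sizes, min_file_num=5))  # → (1, 5)
```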

Full Compaction
~~~~~~~~~~~~~~~
Full compaction rewrites all eligible files in the bucket. During full
compaction:

- Files whose size is already at or above ``compaction.file-size`` (and that
  have no associated deletion vectors) are skipped to avoid unnecessary
  rewrites.
- When deletion vectors are enabled, all files are eligible for compaction
  regardless of size, because deletion vectors must be applied.
- When ``compaction.force-rewrite-all-files`` is ``true``, all files are
  rewritten unconditionally.
- Without deletion vectors, full compaction only proceeds when the number of
  small files exceeds the number of large files and the total file count is at
  least 3.

After compaction, if the last output file is still smaller than
``compaction.file-size``, it is placed back into the compaction queue for future
merging.
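
The eligibility rules above can be sketched like this. The sketch is a
hypothetical Python illustration (function name, tuple layout, and the 128 MB
target size are assumptions), not C++ Paimon's actual implementation.

```python
# Illustrative sketch of the full-compaction eligibility rules.
# (Hypothetical; not the actual C++ Paimon implementation.)

TARGET = 128 * 1024 * 1024  # stands in for `compaction.file-size`

def pick_full_compaction(files, dv_enabled=False, force_rewrite=False):
    """files: list of (size, has_deletion_vector). Returns files to rewrite."""
    if force_rewrite or dv_enabled:
        # Force-rewrite, or deletion vectors must be applied: rewrite all.
        return list(files)
    # Skip files already at/above target size that carry no deletion vectors.
    candidates = [(s, dv) for (s, dv) in files if s < TARGET or dv]
    small = sum(1 for s, _ in files if s < TARGET)
    large = len(files) - small
    # Only proceed when small files outnumber large ones and there are >= 3.
    if small > large and len(files) >= 3:
        return candidates
    return []

small_f = (1 * 1024 * 1024, False)
large_f = (200 * 1024 * 1024, False)
print(pick_full_compaction([small_f, small_f, large_f]))  # the two small files
```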

Append-Only Table Compaction Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 10 10 10 40

   * - Option
     - Required
     - Default
     - Type
     - Description
   * - ``compaction.min.file-num``
     - No
     - 5
     - Integer
     - The minimum number of files to trigger an auto compaction for
       append-only tables.


Primary Key Table Compaction
----------------------------
Primary key tables use an LSM tree (log-structured merge-tree) for file storage.
As more and more records are written, the number of sorted runs increases.
Because querying an LSM tree requires all sorted runs to be combined, too many
sorted runs result in poor query performance, or even out-of-memory errors.

To limit the number of sorted runs, Paimon periodically merges several sorted
runs into one large sorted run. Paimon currently adopts a compaction strategy
similar to RocksDB's `universal compaction
<https://github.com/facebook/rocksdb/wiki/Universal-Compaction>`_.

Primary key table compaction serves two purposes:

- Reducing the number of Level 0 files to avoid poor query performance.
- Producing deletion vectors for MOW (merge-on-write) mode.
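
To see why the number of sorted runs matters for reads, the following Python
sketch combines sorted runs into a complete view, with newer runs winning for
duplicate primary keys (a simplified deduplicate merge; illustrative only, not
C++ Paimon's actual read path):

```python
# Illustrative sketch: querying an LSM tree combines all sorted runs, so
# query cost grows with the number of runs. (Hypothetical simplification.)

def read_lsm(sorted_runs):
    """sorted_runs: newest first; each run is a list of (key, value) pairs
    sorted by key. Returns the merged, deduplicated view."""
    merged = {}
    # Walk runs from oldest to newest so newer values overwrite older ones.
    for run in reversed(sorted_runs):
        for key, value in run:
            merged[key] = value
    return sorted(merged.items())

runs = [
    [("a", "new")],               # newest sorted run (e.g. a level 0 file)
    [("a", "old"), ("b", "x")],   # older sorted run
]
print(read_lsm(runs))  # → [('a', 'new'), ('b', 'x')]
```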

Full Compaction
~~~~~~~~~~~~~~~
Paimon uses universal compaction. By default, when there is too much incremental
data, a full compaction is performed automatically, so you usually do not have
to worry about it.

Paimon also provides configurations that allow full compaction to be executed
regularly:

- ``compaction.optimization-interval``: How often to perform an optimization
  full compaction. This configuration is used to ensure the query timeliness of
  the read-optimized system table.
- ``compaction.total-size-threshold``: Full compaction is continuously triggered
  while the total size is smaller than this threshold.
- ``compaction.incremental-size-threshold``: Full compaction is continuously
  triggered while the incremental size is bigger than this threshold.
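
The two size thresholds combine as sketched below (a hypothetical Python
illustration of the rule as described above; the function and parameter names
are assumptions):

```python
# Illustrative sketch of the size-based full compaction triggers.
# (Hypothetical; not the actual C++ Paimon implementation.)

def should_full_compact(total_size, incremental_size,
                        total_size_threshold, incremental_size_threshold):
    """Trigger full compaction while the table is still small, or when enough
    incremental data has accumulated since the last full compaction."""
    return (total_size < total_size_threshold
            or incremental_size > incremental_size_threshold)

# A small table is compacted continuously; a large one only once enough
# incremental data piles up.
print(should_full_compact(10, 0, total_size_threshold=100,
                          incremental_size_threshold=50))   # → True
print(should_full_compact(200, 10, total_size_threshold=100,
                          incremental_size_threshold=50))   # → False
```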

Lookup Compaction
~~~~~~~~~~~~~~~~~
When a primary key table is configured with the ``lookup`` changelog producer or
the ``first-row`` merge engine, or has deletion vectors enabled for MOW mode,
Paimon uses a radical compaction strategy that forces level 0 files to be
compacted to higher levels on every compaction trigger.

Paimon also provides configurations to tune the frequency of this compaction:

- ``lookup-compact``: The compact mode used for lookup compaction. Possible
  values:

  * ``radical``: uses the ``ForceUpLevel0Compaction`` strategy to radically
    compact new files.
  * ``gentle``: uses the ``UniversalCompaction`` strategy to gently compact new
    files.

- ``lookup-compact.max-interval``: The maximum interval before a forced L0
  lookup compaction is triggered in ``gentle`` mode. This option is only valid
  when ``lookup-compact`` is ``gentle``.

By configuring ``lookup-compact`` as ``gentle``, new files in L0 are not
compacted immediately. This may greatly reduce overall resource usage at the
expense of worse data freshness in certain cases.
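
The mode dispatch can be sketched as follows. The strategy class names come
from the text above; the dispatch function itself, its parameters, and the
timestamp bookkeeping are illustrative assumptions, not C++ Paimon's code:

```python
# Illustrative sketch of choosing a lookup compaction strategy.
# (Hypothetical dispatch; strategy names are from the documentation.)
import time

def choose_strategy(mode, last_forced_l0_ts, max_interval_s):
    if mode == "radical":
        return "ForceUpLevel0Compaction"
    # gentle: normally universal compaction, but still force an L0
    # compaction once `lookup-compact.max-interval` has elapsed.
    if time.time() - last_forced_l0_ts >= max_interval_s:
        return "ForceUpLevel0Compaction"
    return "UniversalCompaction"

# Gentle mode stays on universal compaction until the interval expires.
print(choose_strategy("gentle", time.time(), 3600))  # → UniversalCompaction
```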

Primary Key Table Compaction Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Number of Sorted Runs to Pause Writing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When the number of sorted runs is small, Paimon writers perform compaction
asynchronously in separate threads, so records can be continuously written into
the table. However, to avoid unbounded growth of sorted runs, writers pause
writing when the number of sorted runs hits the threshold.

.. list-table::
   :header-rows: 1
   :widths: 30 10 10 10 40

   * - Option
     - Required
     - Default
     - Type
     - Description
   * - ``num-sorted-run.stop-trigger``
     - No
     - (none)
     - Integer
     - The number of sorted runs that triggers the stopping of writes. The
       default value is ``num-sorted-run.compaction-trigger + 3``.

Write stalls become less frequent as ``num-sorted-run.stop-trigger`` grows,
improving write performance. However, if this value becomes too large, more
memory and CPU time will be needed when querying the table.
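
The stall rule and its default can be sketched as follows (a hypothetical
Python illustration; the function names are assumptions, while the ``+ 3``
default mirrors the table above):

```python
# Illustrative sketch of the write-stall rule: writes pause once the number
# of sorted runs reaches the stop trigger. (Hypothetical implementation.)

def resolve_stop_trigger(stop_trigger=None, compaction_trigger=5):
    # Default: num-sorted-run.compaction-trigger + 3
    return stop_trigger if stop_trigger is not None else compaction_trigger + 3

def should_pause_writing(num_sorted_runs, stop_trigger=None,
                         compaction_trigger=5):
    return num_sorted_runs >= resolve_stop_trigger(stop_trigger,
                                                   compaction_trigger)

# With the defaults (compaction trigger 5), writes pause at 8 sorted runs.
print(resolve_stop_trigger())       # → 8
print(should_pause_writing(7))      # → False
```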

Number of Sorted Runs to Trigger Compaction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Paimon uses an LSM tree, which supports a large number of updates. The LSM tree
organizes files into several sorted runs. When querying records from an LSM
tree, all sorted runs must be combined to produce a complete view of all
records.

Too many sorted runs therefore result in poor query performance. To keep the
number of sorted runs in a reasonable range, Paimon writers automatically
perform compactions. The following table property determines the minimum number
of sorted runs that triggers a compaction.

.. list-table::
   :header-rows: 1
   :widths: 30 10 10 10 40

   * - Option
     - Required
     - Default
     - Type
     - Description
   * - ``num-sorted-run.compaction-trigger``
     - No
     - 5
     - Integer
     - The number of sorted runs that triggers compaction. This counts level 0
       files (each file is one sorted run) and high-level runs (each level is
       one sorted run).

Compaction becomes less frequent as ``num-sorted-run.compaction-trigger`` grows,
improving write performance. However, if this value becomes too large, more
memory and CPU time will be needed when querying the table. This is a trade-off
between write and query performance.
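
The counting rule behind the trigger can be sketched as follows (a hypothetical
Python illustration of "one file, one sorted run" at level 0 versus "one level,
one sorted run" above it; the function names are assumptions):

```python
# Illustrative sketch of counting sorted runs against the compaction
# trigger. (Hypothetical; not the actual C++ Paimon implementation.)

def count_sorted_runs(level0_file_count, higher_level_file_counts):
    """Each level 0 file is its own sorted run; every non-empty higher
    level counts as exactly one sorted run."""
    return level0_file_count + sum(
        1 for n in higher_level_file_counts if n > 0)

def should_trigger_compaction(level0_files, higher_levels, trigger=5):
    return count_sorted_runs(level0_files, higher_levels) >= trigger

# 3 level-0 files + 2 non-empty higher levels = 5 sorted runs → compact.
print(count_sorted_runs(3, [4, 0, 2]))            # → 5
print(should_trigger_compaction(3, [4, 0, 2]))    # → True
```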