Commit 948324c

chore(compaction): add docs for append/pk compaction (#230)
1 parent ae7a66a commit 948324c

5 files changed

Lines changed: 230 additions & 17 deletions

ci/scripts/setup_ccache.sh

Lines changed: 1 addition & 1 deletion

@@ -18,7 +18,7 @@ echo "PAIMON_USE_CCACHE=ON" >> $GITHUB_ENV
 echo "CCACHE_COMPILERCHECK=content" >> $GITHUB_ENV
 echo "CCACHE_DIR=${HOME}/.ccache" >> $GITHUB_ENV
-echo "CCACHE_MAXSIZE=1500M" >> $GITHUB_ENV
+echo "CCACHE_MAXSIZE=1G" >> $GITHUB_ENV
 echo "CCACHE_COMPRESS=true" >> $GITHUB_ENV
 echo "CCACHE_COMPRESSLEVEL=6" >> $GITHUB_ENV

docs/source/user_guide.rst

Lines changed: 1 addition & 0 deletions

@@ -29,6 +29,7 @@ User Guide
    user_guide/append_only_table
    user_guide/write
    user_guide/commit
+   user_guide/compaction
    user_guide/read
    user_guide/clean
    user_guide/prefetch
Lines changed: 213 additions & 0 deletions (new file)

@@ -0,0 +1,213 @@

.. Copyright 2026-present Alibaba Inc.

.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at

..     http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.

Compaction
==========

Compaction is the process of merging multiple small data files into fewer, larger
files. It is a resource-intensive procedure that consumes CPU time and disk IO,
so overly frequent compaction can slow down writes. Without compaction, however,
the accumulation of small files degrades query performance. Tuning compaction is
therefore a trade-off between write throughput and read efficiency.

.. note::

   - Only one job may compact a given partition at a time; concurrent
     compaction on the same partition causes conflicts.
   - C++ Paimon does not currently support producing changelog.
   - Compaction is disabled when ``write-only`` is set to ``true``, or when an
     append-only table uses dynamic bucketing (``bucket = -1``).
   - For a complete list of compaction-related configurations, see the
     :ref:`Options API Reference <cpp-api-options>`.
31+
32+
Append-Only Table Compaction
33+
----------------------------
34+
In append-only table, data files are simply appended in sequence order.
35+
Over time, many small files accumulate, which degrades read performance due to the
36+
overhead of opening and scanning numerous files.
37+
38+
Append-only table compaction merges multiple small files into fewer, larger files
39+
to improve read efficiency. The compaction is performed asynchronously and does
40+
not block writes.
41+
42+
.. note::
43+
Append-only table compaction is only available for fixed-bucket mode
44+
(``bucket > 0``). Dynamic bucketing (``bucket = -1``) does not support
45+
compaction. Tables with blob columns also skip compaction.
46+
47+
Auto Compaction
~~~~~~~~~~~~~~~

During each flush, the writer triggers a best-effort auto compaction. The
compaction picker scans the file queue, ordered by sequence number, and selects a
contiguous window of files for merging once the number of candidate files reaches
the ``compaction.min.file-num`` threshold.
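The selection rule above can be sketched as follows. This is an illustrative
Python sketch of the described behavior, not the actual C++ implementation: the
function name, the file representation, and the use of a target size to classify
"small" files are all assumptions.

.. code-block:: python

   def pick_compaction(files, min_file_num, target_size):
       """Pick a contiguous window of small files, ordered by sequence number,
       once at least min_file_num candidates accumulate (illustrative sketch)."""
       # files: (sequence_number, size_bytes) pairs, one per data file
       files = sorted(files)                  # order by sequence number
       window = []
       for seq, size in files:
           if size < target_size:
               window.append((seq, size))
               if len(window) >= min_file_num:
                   return window              # enough candidates: compact them
           else:
               window = []                    # a large file breaks the window
       return None                            # not enough small files yet

With the default ``compaction.min.file-num`` of 5, five consecutive small files
in the queue would be merged into one larger file.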
53+
54+
Full Compaction
55+
~~~~~~~~~~~~~~~
56+
Full compaction rewrites all eligible files in the bucket. During full
57+
compaction:
58+
59+
- Files whose size is already at or above ``compaction.file-size`` (and have no
60+
associated deletion vectors) are skipped to avoid unnecessary rewrites.
61+
- When deletion vectors are enabled, all files are always eligible for
62+
compaction regardless of size, because deletion vectors must be applied.
63+
- When ``compaction.force-rewrite-all-files`` is ``true``, all files are
64+
rewritten unconditionally.
65+
- Without deletion vectors, full compaction only proceeds when the number of
66+
small files exceeds the number of large files and the total file count is at
67+
least 3.
68+
69+
After compaction, if the last output file is still smaller than
70+
``compaction.file-size``, it is placed back into the compaction queue for future
71+
merging.
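The eligibility rules above can be distilled into a short sketch. This is a
simplified illustration, not the real implementation: deletion vectors are
reduced to a single table-level flag, and the default target size is an assumed
value.

.. code-block:: python

   def full_compaction_targets(sizes, dv_enabled, force_rewrite,
                               target_size=128 * 1024 * 1024):
       """Return the file sizes a full compaction would rewrite (sketch)."""
       if force_rewrite or dv_enabled:
           return list(sizes)                 # rewrite everything
       small = [s for s in sizes if s < target_size]
       large = [s for s in sizes if s >= target_size]
       # proceed only when small files outnumber large ones
       # and there are at least 3 files in total
       if len(small) > len(large) and len(sizes) >= 3:
           return small                       # large files are skipped
       return []                              # skip full compaction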
Append-Only Table Compaction Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 30 10 10 10 40

   * - Option
     - Required
     - Default
     - Type
     - Description
   * - ``compaction.min.file-num``
     - No
     - 5
     - Integer
     - The minimum number of files to trigger an auto compaction for
       append-only tables.
Primary Key Table Compaction
----------------------------

Primary key tables use an LSM tree (log-structured merge-tree) for file storage.
As more and more records are written, the number of sorted runs increases.
Because querying an LSM tree requires combining all sorted runs, too many sorted
runs result in poor query performance, or even out-of-memory errors.

To limit the number of sorted runs, several sorted runs are merged into one big
sorted run once in a while. Paimon currently adopts a compaction strategy similar
to RocksDB's `universal compaction
<https://github.com/facebook/rocksdb/wiki/Universal-Compaction>`_.

Primary key table compaction serves to:

- Reduce Level 0 files to avoid poor query performance.
- Produce deletion vectors for MOW mode.

Full Compaction
~~~~~~~~~~~~~~~

Paimon uses Universal Compaction. By default, Full Compaction is performed
automatically when there is too much incremental data, so you usually do not
have to worry about it.

Paimon also provides configurations that allow Full Compaction to run on a
regular schedule:

- ``compaction.optimization-interval``: How often to perform an optimization
  full compaction. This configuration ensures the query timeliness of the
  read-optimized system table.
- ``compaction.total-size-threshold``: Full compaction is constantly triggered
  when the total size is smaller than this threshold.
- ``compaction.incremental-size-threshold``: Full compaction is constantly
  triggered when the incremental size is bigger than this threshold.

Lookup Compaction
~~~~~~~~~~~~~~~~~

When a primary key table is configured with the ``lookup`` changelog producer or
the ``first-row`` merge engine, or has deletion vectors enabled for MOW mode,
Paimon uses a radical compaction strategy that force-compacts level 0 files to
higher levels on every compaction trigger.

Paimon also provides configurations to tune the frequency of this compaction:

- ``lookup-compact``: Compact mode used for lookup compaction. Possible values:

  * ``radical``: uses the ``ForceUpLevel0Compaction`` strategy to radically
    compact new files.
  * ``gentle``: uses the ``UniversalCompaction`` strategy to gently compact new
    files.

- ``lookup-compact.max-interval``: The maximum interval at which a forced L0
  lookup compaction is triggered in ``gentle`` mode. This option is only valid
  when ``lookup-compact`` mode is ``gentle``.

By configuring ``lookup-compact`` as ``gentle``, new files in L0 are not
compacted immediately. This can greatly reduce overall resource usage at the
expense of worse data freshness in certain cases.
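For example, a table that trades some data freshness for lower compaction cost
might carry options like the following. This map is illustrative only: how
options are supplied depends on your Paimon API, and the ``max-interval`` value
is a made-up example.

.. code-block:: python

   # Illustrative option map for a primary-key table in MOW mode
   # with gentle lookup compaction (values are example strings).
   options = {
       "deletion-vectors.enabled": "true",   # MOW mode: lookup compaction applies
       "lookup-compact": "gentle",           # UniversalCompaction instead of radical
       "lookup-compact.max-interval": "10",  # hypothetical forced-L0 interval
   }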
Primary Key Table Compaction Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Number of Sorted Runs to Pause Writing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the number of sorted runs is small, Paimon writers perform compaction
asynchronously in separate threads, so records can be continuously written into
the table. However, to avoid unbounded growth of sorted runs, writers pause
writing when the number of sorted runs hits the threshold.

.. list-table::
   :header-rows: 1
   :widths: 30 10 10 10 40

   * - Option
     - Required
     - Default
     - Type
     - Description
   * - ``num-sorted-run.stop-trigger``
     - No
     - (none)
     - Integer
     - The number of sorted runs that triggers a write stop. The default
       value is ``num-sorted-run.compaction-trigger + 3``.

Write stalls become less frequent as ``num-sorted-run.stop-trigger`` grows,
improving write performance. However, if this value becomes too large, more
memory and CPU time are needed when querying the table.
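The default wiring of the two triggers can be sketched as follows, assuming the
pause kicks in once the sorted-run count reaches the threshold; the function is
illustrative, not part of the Paimon API.

.. code-block:: python

   def should_pause_writes(num_sorted_runs, compaction_trigger=5,
                           stop_trigger=None):
       """Sketch: decide whether a writer must stall (illustrative)."""
       if stop_trigger is None:
           # default per the table above: compaction-trigger + 3
           stop_trigger = compaction_trigger + 3
       return num_sorted_runs >= stop_trigger

With the default ``num-sorted-run.compaction-trigger`` of 5, writes would stall
once the bucket accumulates 8 sorted runs.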
Number of Sorted Runs to Trigger Compaction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Paimon uses an LSM tree, which supports a large number of updates. An LSM tree
organizes files into several sorted runs. When querying records from an LSM
tree, all sorted runs must be combined to produce a complete view of all
records.

Too many sorted runs result in poor query performance. To keep the number of
sorted runs in a reasonable range, Paimon writers automatically perform
compactions. The following table property determines the minimum number of
sorted runs that triggers a compaction.

.. list-table::
   :header-rows: 1
   :widths: 30 10 10 10 40

   * - Option
     - Required
     - Default
     - Type
     - Description
   * - ``num-sorted-run.compaction-trigger``
     - No
     - 5
     - Integer
     - The number of sorted runs that triggers compaction. Includes level 0
       files (one file per sorted run) and high-level runs (one level per
       sorted run).

Compaction becomes less frequent as ``num-sorted-run.compaction-trigger`` grows,
improving write performance. However, if this value becomes too large, more
memory and CPU time are needed when querying the table. This is a trade-off
between write and query performance.

docs/source/user_guide/read.rst

Lines changed: 9 additions & 11 deletions

@@ -12,8 +12,8 @@
 .. See the License for the specific language governing permissions and
 .. limitations under the License.
 
-Read and Data Evolution
-===============================================
+Read
+====
 Paimon by functionality can be divided into two layers:
 
 - Control Plane: Responsible for accessing and managing Meta (snapshot, manifest, etc.), including:
@@ -25,7 +25,7 @@ Paimon by functionality can be divided into two layers:
 - Readers for various file formats
 - Coordinated reading of file collections
 
-The control plane and data plane interact primarily via DataSplit (the query plan). Java currently supports a standard
+The control plane and data plane interact primarily via DataSplit (the query plan). C++ Paimon currently supports a standard
 DataSplit protocol which includes the necessary meta information to access data files. With DataSplit, a high-performance
 data access path can be integrated.
 
@@ -39,10 +39,9 @@ across the two language ecosystems.
 
 
 Schema Evolution
-================
-
-Scope and Compatibility
 -----------------------
+Scope and Compatibility
+~~~~~~~~~~~~~~~~~~~~~~~~
 
 C++ Paimon supports all evolution kinds available in Java Paimon for non-nested types:
 
@@ -62,12 +61,12 @@ C++ Paimon supports all evolution kinds available in Java Paimon for non-nested
 - Other operations are supported (consistent with Java Paimon).
 
 Per-File Schema via Field IDs
------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 In DataSplit, each file may have a completely different data schema. Paimon uses field IDs to uniquely identify fields.
 
 Overflow Behavior Disclaimer
-----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Overflow behavior is undefined for C++ and Java Paimon. Results in overflow scenarios may:
 
@@ -79,8 +78,7 @@ C++ Paimon does not guarantee identical results to Java Paimon in overflow scena
 return values between implementations.
 
 Type Change Support Matrix
---------------------------
-
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 The table below indicates support for changing a column type from ``source`` to ``target``. Refer to the numbered notes below the table
 for caveats.
 
@@ -292,7 +290,7 @@ for caveats.
 - Example input: ``1111111111111111111111111111111111111.15``, Java returns: ``1111111111111111111111111111111111111.2``, C++ returns: ``null``
 
 Implementation Guidance
------------------------
+~~~~~~~~~~~~~~~~~~~~~~~
 
 - Use DataSplit as the sole interface between control and data planes. Treat it as the canonical query plan contract.
 - Resolve field types and IDs per file; prefer inline data file metadata, fallback to table schema files when necessary.

docs/source/user_guide/write.rst

Lines changed: 6 additions & 5 deletions

@@ -12,8 +12,8 @@
 .. See the License for the specific language governing permissions and
 .. limitations under the License.
 
-Write And Prepare Commit
-========================
+Write
+=====
 Batch writing requires the compute engine to pre-bucket data (bucket), using the
 same bucketing strategy as Paimon to ensure correct ``Scan`` behavior, and to
 specify the target ``partition``. Data should be accumulated into ``RecordBatch``
@@ -42,6 +42,7 @@ Bucketing Modes
 
 - PK tables:
 
+  * Support ``bucket = -2`` (postpone bucket mode)
   * Support ``bucket > 0`` (fixed bucket mode)
 
 .. note::
@@ -111,15 +112,15 @@ The ``CommitMessage`` must encode all information required by the coordinator to
 produce a correct ``Snapshot``, which commonly includes (but is not limited to):
 
 - Partition and bucket identifiers associated with written data.
-- New data files, delete files, or changelog artifacts (as applicable to the table type).
+- New data files, delete files (as applicable to the table type).
 - File-level metadata required for manifest and index updates (e.g., row counts, min/max statistics where applicable).
 - Transactional markers and sequence numbers as required by table semantics.
 - Any per-writer state necessary for deduplication or idempotent commits.
 
 .. note::
 
-   Current C++ scope supports Append and PK tables. Changelog and index
-   artifacts are out of scope and should not be emitted in ``CommitMessage`` until
+   Current C++ scope supports Append and PK tables. Changelog is out of
+   scope and should not be emitted in ``CommitMessage`` until
    explicitly supported.
 
 Serialization and Deserialization
