Implement table & tree disk usage statistics #17169
Conversation
Pull request overview
This PR implements comprehensive disk usage statistics collection for both Tree Model (device-based) and Table Model databases in IoTDB. The implementation provides monitoring capabilities at the table/device level and time partition level.
Changes:
- Adds SHOW DISK_USAGE SQL statement for Tree Model with on-demand calculation
- Implements table_disk_usage information schema table for Table Model with persistent cache
- Introduces background task infrastructure for cache maintenance with periodic compaction
- Adds predicate push-down and limit/offset optimization support for information schema tables
Reviewed changes
Copilot reviewed 95 out of 95 changed files in this pull request and generated 33 comments.
| File | Description |
|---|---|
| pom.xml | Updates tsfile version to 2.2.1-260205-SNAPSHOT |
| TsFileID.java | Adds SHALLOW_SIZE constant (contains bug) |
| InformationSchema.java | Adds table_disk_usage schema and push-down support (contains bug) |
| TableDiskUsageCache*.java | Core cache implementation with writer/reader classes |
| ShowDiskUsageNode.java | Plan node for tree model disk usage queries |
| TableDiskUsageInformationSchemaTableScanNode.java | Plan node for table model information schema scans |
| ShowDiskUsageOperator.java | Execution operator for tree model |
| DiskUsageStatisticUtil.java | Base utility class for disk usage calculation |
| IoTDBDescriptor.java | Configuration support (contains bug) |
| Integration tests | Comprehensive tests for both tree and table models |
    public static boolean supportsPushDownLimitOffset(String tableName) {
      return columnsThatSupportPushDownPredicate.containsKey(tableName);
The method supportsPushDownLimitOffset checks if a key exists in columnsThatSupportPushDownPredicate instead of checking tablesThatSupportPushDownLimitOffset. This will cause incorrect behavior for tables that support limit/offset push-down but are not in the predicate push-down map. The condition should be tablesThatSupportPushDownLimitOffset.contains(tableName).
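A minimal sketch of the fix described above, using the field names from the snippet under review; how the set is populated is not shown here and happens elsewhere in InformationSchema:

```java
// Sketch only: field name taken from the snippet above; registration of table
// names into this set is done elsewhere in InformationSchema.
private static final Set<String> tablesThatSupportPushDownLimitOffset = new HashSet<>();

public static boolean supportsPushDownLimitOffset(String tableName) {
  // Consult the limit/offset set, not the predicate push-down map.
  return tablesThatSupportPushDownLimitOffset.contains(tableName);
}
```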
| "max_sub_task_num_for_information_table_scan", | ||
| Integer.toString(conf.getMaxSubTaskNumForInformationTableScan()))); | ||
| if (maxSubTaskNumForInformationTableScan > 0) { | ||
| conf.setMaxRowsInCteBuffer(maxSubTaskNumForInformationTableScan); |
Configuration property mismatch: The method sets maxRowsInCteBuffer instead of maxSubTaskNumForInformationTableScan. This will cause the wrong configuration property to be updated when loading hot-modified properties for max_sub_task_num_for_information_table_scan. The correct call should be conf.setMaxSubTaskNumForInformationTableScan(maxSubTaskNumForInformationTableScan).
Suggested change:
    -      conf.setMaxRowsInCteBuffer(maxSubTaskNumForInformationTableScan);
    +      conf.setMaxSubTaskNumForInformationTableScan(maxSubTaskNumForInformationTableScan);
              + ramBytesUsedOfTsFileIDOffsetMap();
    }

    // tsFileIDOffsetInValueFileMap should be null af first
Typo in comment: "af" should be "at". The comment should read "tsFileIDOffsetInValueFileMap should be null at first".
    public class TsFileID {

      public static final long SHALLOW_SIZE = TsFileID.SHALLOW_SIZE;
Circular reference detected: TsFileID.SHALLOW_SIZE is defined as TsFileID.SHALLOW_SIZE. This leaves the constant at its default value (likely 0) and produces incorrect memory estimation. The field should reference RamUsageEstimator.shallowSizeOfInstance(TsFileID.class) instead.
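A hedged sketch of the non-circular initialization the comment asks for; the RamUsageEstimator import shown is an assumption about which copy of that utility the project uses:

```java
import org.apache.lucene.util.RamUsageEstimator; // assumption: the project may ship its own copy of this utility

public class TsFileID {
  // Computed once from the class layout, instead of reading the constant
  // that is still being initialized (which would yield 0).
  public static final long SHALLOW_SIZE =
      RamUsageEstimator.shallowSizeOfInstance(TsFileID.class);
}
```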
    private static final Map<String, TsTable> schemaTables = new HashMap<>();
    private static final Map<String, Set<String>> columnsThatSupportPushDownPredicate =
        new HashMap<>();
    private static final Set<String> tablesThatSupportPushDownLimitOffset = new HashSet<>();
The contents of this container are never accessed.
    }

    @After
    public void tearDown() throws IOException, StorageEngineException {
This method overrides AbstractCompactionTest.tearDown; it is advisable to add an Override annotation.
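For illustration, the annotated form the comment asks for (the body is elided; only the annotation changes):

```java
@Override
@After
public void tearDown() throws IOException, StorageEngineException {
  // existing cleanup logic is unchanged; only the @Override annotation is added
}
```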
    private TsFileManager mockTsFileManager;

    @Before
    public void setUp()
This method overrides AbstractCompactionTest.setUp; it is advisable to add an Override annotation.
    }

    @After
    public void tearDown() throws IOException, StorageEngineException {
This method overrides AbstractCompactionTest.tearDown; it is advisable to add an Override annotation.
    private TsFileManager mockTsFileManager;

    @Before
    public void setUp()
This method overrides AbstractCompactionTest.setUp; it is advisable to add an Override annotation.
        DataRegionTableSizeQueryContext dataRegionContext, long startTime, long maxRunTime)
        throws IOException;

    void close();
This method overrides AutoCloseable.close; it is advisable to add an Override annotation.
Description
This PR implements disk usage statistics collection at the table level (table model) and at the device level (tree model). It adds the necessary data structures, background tasks, and read APIs to compute and expose disk usage metrics used by monitoring, admission control, and operational tooling.
Tree Model (No Cache)
Implements ShowDiskUsageNode and ShowDiskUsageOperator.
• Disk usage is calculated by scanning the relevant TsFiles at query time.
• Supports:
  • Path pattern matching
  • Time partition filtering
  • Existing SQL semantics (SHOW DISK_USAGE)
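As a hedged illustration of the on-demand tree-model path, the sketch below issues the statement over JDBC. The driver class, connection URL, credentials, and the root.** path-pattern clause are assumptions; this description only confirms the SHOW DISK_USAGE statement itself.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowDiskUsageExample {
  public static void main(String[] args) throws Exception {
    // Assumed driver/URL/credentials for a local IoTDB instance.
    Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
    try (Connection conn =
            DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
        Statement stmt = conn.createStatement();
        // The path-pattern clause is hypothetical; disk usage is computed by
        // scanning the relevant TsFiles at query time (no cache for the tree model).
        ResultSet rs = stmt.executeQuery("SHOW DISK_USAGE root.**")) {
      while (rs.next()) {
        // The column layout of the result set is not specified in this description.
        System.out.println(rs.getString(1) + "\t" + rs.getString(2));
      }
    }
  }
}
```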
Table Model (With Cache)
The Table Model introduces a dedicated disk usage cache:
• TableDiskUsageCache manages all cache operations.
• A single-threaded background worker processes write, read, and maintenance tasks via an operation queue (see the sketch after the Cached Data list).
• Cache state persists across restarts.
Cached Data
• TsFile-level table size statistics
• Object file size deltas, recorded incrementally
• Periodic snapshot + delta compaction
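The single-threaded worker called out above could be organized roughly as sketched below; the class, interface, and thread name are illustrative and not the actual TableDiskUsageCache API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of one worker thread draining an operation queue;
// not the actual TableDiskUsageCache implementation.
public class DiskUsageCacheWorkerSketch {
  /** One queued cache operation: a write, a read, or a maintenance step. */
  interface CacheOperation {
    void run() throws Exception;
  }

  private final BlockingQueue<CacheOperation> queue = new LinkedBlockingQueue<>();
  private volatile boolean running = true;

  public void submit(CacheOperation op) {
    queue.add(op);
  }

  public void start() {
    Thread worker =
        new Thread(
            () -> {
              while (running) {
                try {
                  // Writes, reads, and compaction/maintenance all run on this one
                  // thread, so cache state never needs fine-grained locking.
                  queue.take().run();
                } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  return;
                } catch (Exception e) {
                  // A real implementation would log and keep the worker alive.
                }
              }
            },
            "table-disk-usage-cache-worker");
    worker.setDaemon(true);
    worker.start();
  }

  public void stop() {
    running = false;
  }
}
```

Serializing all cache operations onto one thread is a plausible way to let writes, reads, incremental delta recording, and the periodic snapshot + delta compaction share state without locking, at the cost of some queueing latency.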
Query Integration
• Exposes statistics via information_schema.table_disk_usage.
• Supports:
  • Predicate push-down (except on aggregated size columns)
  • Limit / offset
  • Parallel region-level scanning
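A hedged sketch of reading the new information schema table over JDBC; the filter column name in the WHERE clause is hypothetical, and the connection details repeat the assumptions of the earlier sketch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TableDiskUsageQueryExample {
  public static void main(String[] args) throws Exception {
    // Assumed URL/credentials; selecting a table-model dialect on the connection
    // may be required, which is not covered by this description.
    try (Connection conn =
            DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
        Statement stmt = conn.createStatement();
        ResultSet rs =
            stmt.executeQuery(
                // "database" is a hypothetical filter column; the PR states that
                // predicates (except on aggregated size columns) and LIMIT/OFFSET
                // are pushed down into the information schema scan.
                "SELECT * FROM information_schema.table_disk_usage "
                    + "WHERE database = 'db1' LIMIT 10")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```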