Implement table & tree disk usage statistics #17169

Open

shuwenwei wants to merge 62 commits into master from table_disk_usage_statistics_with_cache

Conversation

@shuwenwei
Member

@shuwenwei shuwenwei commented Feb 5, 2026

Description

This PR implements disk usage statistics collection at the table level (table model) and at the device level (tree model). It adds the necessary data structures, background tasks, and read APIs to compute and expose the disk usage metrics used by monitoring, admission control, and operational tooling.

Tree Model (No Cache)

IoTDB> show disk_usage from root.test.**
+---------+----------+--------+-------------+-----------+
| Database|DataNodeId|RegionId|TimePartition|SizeInBytes|
+---------+----------+--------+-------------+-----------+
|root.test|         1|       3|            0|         70|
+---------+----------+--------+-------------+-----------+
Total line number = 1
It costs 0.959s

• Implements ShowDiskUsageNode and ShowDiskUsageOperator.
• Disk usage is calculated by scanning the relevant TsFiles at query time (see the sketch after this list).
• Supports:
  • Path pattern matching
  • Time partition filtering
  • Existing SQL semantics (SHOW DISK_USAGE)
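
A minimal sketch of the query-time calculation, assuming a directory layout in which each data region directory contains one subdirectory per time partition; the class and method names are illustrative, not the PR's actual ShowDiskUsageOperator code:

import java.io.File;
import java.util.HashMap;
import java.util.Map;

class DiskUsageScanSketch {
  // Sums TsFile sizes grouped by time partition for one region directory.
  // Assumes partition subdirectories are named by their numeric partition id.
  static Map<Long, Long> sizePerTimePartition(File regionDir) {
    Map<Long, Long> result = new HashMap<>();
    File[] partitionDirs = regionDir.listFiles(File::isDirectory);
    if (partitionDirs == null) {
      return result;
    }
    for (File partitionDir : partitionDirs) {
      long partition = Long.parseLong(partitionDir.getName());
      long total = 0;
      File[] tsFiles = partitionDir.listFiles((dir, name) -> name.endsWith(".tsfile"));
      if (tsFiles != null) {
        for (File tsFile : tsFiles) {
          total += tsFile.length();
        }
      }
      result.put(partition, total);
    }
    return result;
  }
}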

Table Model (With Cache)

IoTDB:information_schema> select * from table_disk_usage
+--------+----------+-----------+---------+--------------+-------------+
|database|table_name|datanode_id|region_id|time_partition|size_in_bytes|
+--------+----------+-----------+---------+--------------+-------------+
|   test1|        t1|          1|        5|             0|          142|
|   test1|        t2|          1|        5|             0|            0|
|   test1|        t1|          1|        6|             0|            0|
|   test1|        t2|          1|        6|             0|           82|
+--------+----------+-----------+---------+--------------+-------------+
Total line number = 4
It costs 2.821s

Table Model introduces a dedicated disk usage cache:
• TableDiskUsageCache manages all cache operations.
• A single-threaded background worker processes write, read, and maintenance tasks via an operation queue (see the sketch after this list).
• Cache state is persisted across restarts.
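
The single-threaded worker pattern described above could look roughly like this; the operation interface and class names are assumptions for illustration, not the PR's actual TableDiskUsageCache internals:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DiskUsageCacheWorkerSketch {
  interface CacheOperation {
    void apply() throws Exception;
  }

  private final BlockingQueue<CacheOperation> queue = new LinkedBlockingQueue<>();
  private final Thread worker = new Thread(this::drainLoop, "disk-usage-cache-worker");
  private volatile boolean running = true;

  void start() {
    worker.start();
  }

  // Write, read, and maintenance tasks all go through this queue,
  // so they are serialized by the single worker thread.
  void submit(CacheOperation op) {
    queue.add(op);
  }

  private void drainLoop() {
    while (running || !queue.isEmpty()) {
      try {
        queue.take().apply();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      } catch (Exception e) {
        // a real implementation would log and decide whether the operation is retryable
      }
    }
  }

  void stop() {
    running = false;
    worker.interrupt();
  }
}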

Cached Data
• TsFile-level table size statistics
• Object file size deltas, recorded incrementally
• Periodic snapshot + delta compaction (see the sketch after this list)
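
One way to realize the snapshot + delta scheme, sketched with assumed in-memory types (per-table byte totals) rather than the PR's actual on-disk format:

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SnapshotDeltaSketch {
  // snapshot: table name -> total bytes; deltas: incremental size changes appended over time
  private final Map<String, Long> snapshot = new HashMap<>();
  private final List<Map.Entry<String, Long>> deltas = new ArrayList<>();

  void recordDelta(String table, long bytesChanged) {
    deltas.add(new AbstractMap.SimpleEntry<>(table, bytesChanged));
  }

  // Periodic compaction folds the accumulated deltas into the snapshot
  // and truncates the delta log.
  void compact() {
    for (Map.Entry<String, Long> delta : deltas) {
      snapshot.merge(delta.getKey(), delta.getValue(), Long::sum);
    }
    deltas.clear();
  }

  // Reads combine the snapshot with any deltas not yet compacted.
  long sizeOf(String table) {
    long base = snapshot.getOrDefault(table, 0L);
    for (Map.Entry<String, Long> delta : deltas) {
      if (delta.getKey().equals(table)) {
        base += delta.getValue();
      }
    }
    return base;
  }
}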

Query Integration
• Exposes statistics via information_schema.table_disk_usage.
• Supports:
  • Predicate push-down (except on aggregated size columns)
  • Limit / offset
  • Parallel region-level scanning (see the sketch after this list)
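
As an illustration of how region-level work might be split across a bounded number of parallel sub-tasks (the actual scheduling lives in the PR's scan node and operator code; the names here are hypothetical):

import java.util.ArrayList;
import java.util.List;

class RegionScanSplitSketch {
  // Splits region ids into at most maxSubTasks round-robin groups,
  // each of which could be scanned by one sub-task.
  static List<List<Integer>> split(List<Integer> regionIds, int maxSubTasks) {
    int groups = Math.max(1, Math.min(maxSubTasks, regionIds.size()));
    List<List<Integer>> result = new ArrayList<>();
    for (int i = 0; i < groups; i++) {
      result.add(new ArrayList<>());
    }
    for (int i = 0; i < regionIds.size(); i++) {
      result.get(i % groups).add(regionIds.get(i));
    }
    return result;
  }
}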

Contributor

Copilot AI left a comment

Pull request overview

This PR implements comprehensive disk usage statistics collection for both Tree Model (device-based) and Table Model databases in IoTDB. The implementation provides monitoring capabilities at the table/device level and time partition level.

Changes:

  • Adds SHOW DISK_USAGE SQL statement for Tree Model with on-demand calculation
  • Implements table_disk_usage information schema table for Table Model with persistent cache
  • Introduces background task infrastructure for cache maintenance with periodic compaction
  • Adds predicate push-down and limit/offset optimization support for information schema tables

Reviewed changes

Copilot reviewed 95 out of 95 changed files in this pull request and generated 33 comments.

Show a summary per file

  • pom.xml: Updates tsfile version to 2.2.1-260205-SNAPSHOT
  • TsFileID.java: Adds SHALLOW_SIZE constant (contains bug)
  • InformationSchema.java: Adds table_disk_usage schema and push-down support (contains bug)
  • TableDiskUsageCache*.java: Core cache implementation with writer/reader classes
  • ShowDiskUsageNode.java: Plan node for tree model disk usage queries
  • TableDiskUsageInformationSchemaTableScanNode.java: Plan node for table model information schema scans
  • ShowDiskUsageOperator.java: Execution operator for tree model
  • DiskUsageStatisticUtil.java: Base utility class for disk usage calculation
  • IoTDBDescriptor.java: Configuration support (contains bug)
  • Integration tests: Comprehensive tests for both tree and table models


Comment on lines 438 to 439
public static boolean supportsPushDownLimitOffset(String tableName) {
return columnsThatSupportPushDownPredicate.containsKey(tableName);

Copilot AI Feb 5, 2026


The method supportsPushDownLimitOffset checks if a key exists in columnsThatSupportPushDownPredicate instead of checking tablesThatSupportPushDownLimitOffset. This will cause incorrect behavior for tables that support limit/offset push-down but are not in the predicate push-down map. The condition should be tablesThatSupportPushDownLimitOffset.contains(tableName).
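
Applied to the quoted method, the fix the comment describes would look like this (a sketch; it assumes tablesThatSupportPushDownLimitOffset is the set declared in InformationSchema for exactly this purpose):

public static boolean supportsPushDownLimitOffset(String tableName) {
  // consult the limit/offset set rather than the predicate push-down map
  return tablesThatSupportPushDownLimitOffset.contains(tableName);
}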

"max_sub_task_num_for_information_table_scan",
Integer.toString(conf.getMaxSubTaskNumForInformationTableScan())));
if (maxSubTaskNumForInformationTableScan > 0) {
conf.setMaxRowsInCteBuffer(maxSubTaskNumForInformationTableScan);

Copilot AI Feb 5, 2026


Configuration property mismatch: The method sets maxRowsInCteBuffer instead of maxSubTaskNumForInformationTableScan. This will cause the wrong configuration property to be updated when loading hot-modified properties for max_sub_task_num_for_information_table_scan. The correct call should be conf.setMaxSubTaskNumForInformationTableScan(maxSubTaskNumForInformationTableScan).

Suggested change
conf.setMaxRowsInCteBuffer(maxSubTaskNumForInformationTableScan);
conf.setMaxSubTaskNumForInformationTableScan(maxSubTaskNumForInformationTableScan);

+ ramBytesUsedOfTsFileIDOffsetMap();
}

// tsFileIDOffsetInValueFileMap should be null af first

Copilot AI Feb 5, 2026


Typo in comment: "af" should be "at". The comment should read "tsFileIDOffsetInValueFileMap should be null at first".


public class TsFileID {

public static final long SHALLOW_SIZE = TsFileID.SHALLOW_SIZE;

Copilot AI Feb 5, 2026


Circular reference detected: TsFileID.SHALLOW_SIZE is defined as TsFileID.SHALLOW_SIZE. This will result in uninitialized constant value (likely 0) and incorrect memory estimation. The field should reference RamUsageEstimator.shallowSizeOfInstance(TsFileID.class) instead.
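
The corrected constant the comment points to would look roughly like this (assuming a RamUsageEstimator utility with shallowSizeOfInstance is available in this module, as the comment implies):

public static final long SHALLOW_SIZE =
    RamUsageEstimator.shallowSizeOfInstance(TsFileID.class);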

private static final Map<String, TsTable> schemaTables = new HashMap<>();
private static final Map<String, Set<String>> columnsThatSupportPushDownPredicate =
new HashMap<>();
private static final Set<String> tablesThatSupportPushDownLimitOffset = new HashSet<>();

Copilot AI Feb 5, 2026


The contents of this container are never accessed.

}

@After
public void tearDown() throws IOException, StorageEngineException {

Copilot AI Feb 5, 2026


This method overrides AbstractCompactionTest.tearDown; it is advisable to add an Override annotation.

private TsFileManager mockTsFileManager;

@Before
public void setUp()

Copilot AI Feb 5, 2026


This method overrides AbstractCompactionTest.setUp; it is advisable to add an Override annotation.

}

@After
public void tearDown() throws IOException, StorageEngineException {

Copilot AI Feb 5, 2026


This method overrides AbstractCompactionTest.tearDown; it is advisable to add an Override annotation.

private TsFileManager mockTsFileManager;

@Before
public void setUp()

Copilot AI Feb 5, 2026


This method overrides AbstractCompactionTest.setUp; it is advisable to add an Override annotation.

DataRegionTableSizeQueryContext dataRegionContext, long startTime, long maxRunTime)
throws IOException;

void close();

Copilot AI Feb 5, 2026


This method overrides AutoCloseable.close; it is advisable to add an Override annotation.

@sonarqubecloud

sonarqubecloud bot commented Feb 5, 2026

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

