[CASSANDRA-21083][trunk] Optimize memtable flush logic#4536

Closed
netudima wants to merge 1 commit into apache:trunk from netudima:CASSANDRA-21083-trunk

Conversation

@netudima
Contributor

No description provided.

@netudima netudima force-pushed the CASSANDRA-21083-trunk branch from 7fa2bf3 to 716d6be Compare December 20, 2025 18:22
Comment thread src/java/org/apache/cassandra/db/marshal/AbstractType.java Outdated
Comment thread src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java Outdated
Comment thread src/java/org/apache/cassandra/io/sstable/format/SortedTableWriter.java Outdated
Comment thread src/java/org/apache/cassandra/db/rows/Cell.java Outdated
Comment thread src/java/org/apache/cassandra/db/ClusteringPrefix.java Outdated
Comment thread src/java/org/apache/cassandra/db/memtable/Flushing.java Outdated
Comment thread src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java Outdated
Comment thread src/java/org/apache/cassandra/db/memtable/Flushing.java Outdated
@netudima
Contributor Author

I've retested the current version and found that

            if (cell.getClass() == NativeCell.class)
            {
                valueSize = cell.valueSize();
                hasValue = valueSize > 0;
                isDeleted = cell.isTombstone();
                isExpiring = cell.isExpiring();
                cellTimestamp = cell.timestamp();
                localDeletionTime = cell.localDeletionTime();
                ttl = cell.ttl();
                value = cell.value();
                accessor = cell.accessor();
            }
            else
            {
                valueSize = cell.valueSize();
                hasValue = valueSize > 0;
                isDeleted = cell.isTombstone();
                isExpiring = cell.isExpiring();
                cellTimestamp = cell.timestamp();
                localDeletionTime = cell.localDeletionTime();
                ttl = cell.ttl();
                value = cell.value();
                accessor = cell.accessor();
            }

is not equivalent to

            if (cell.getClass() == ArrayCell.class)
            {
                valueSize = cell.valueSize();
                hasValue = valueSize > 0;
                isDeleted = cell.isTombstone();
                isExpiring = cell.isExpiring();
                cellTimestamp = cell.timestamp();
                localDeletionTime = cell.localDeletionTime();
                ttl = cell.ttl();
                value = cell.value();
                accessor = cell.accessor();
            }
            else
            {
                valueSize = cell.valueSize();
                hasValue = valueSize > 0;
                isDeleted = cell.isTombstone();
                isExpiring = cell.isExpiring();
                cellTimestamp = cell.timestamp();
                localDeletionTime = cell.localDeletionTime();
                ttl = cell.ttl();
                value = cell.value();
                accessor = cell.accessor();
            }

Direct calls such as cell.timestamp() are inlined in both cases, as expected:
image

There is a difference, however, in second-level inlining for call chains such as cell.localDeletionTime() -> cell.localDeletionTimeAsUnsignedInt(). It looks like inlining does not work at the second level for bimorphic calls:
image

I'll try and compare possible options, such as two if blocks (if (cell.getClass() == ArrayCell.class) followed by if (cell.getClass() == NativeCell.class)), and switch to that if it shows better results.
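For readers following along, here is a minimal sketch of what the "two if blocks" variant could look like. The classes below are hypothetical stand-ins, not the real Cassandra Cell, ArrayCell, and NativeCell types, and only one accessor is shown. The idea is that each branch has its own call site where the receiver class is known exactly. Whether the nested call is actually inlined depends on the JVM and has to be confirmed by profiling.

```java
// Hypothetical stand-ins for the Cell hierarchy; names and fields are illustrative only.
class CallSiteSplitSketch {
    static abstract class Cell {
        abstract int localDeletionTimeAsUnsignedInt();

        // Second-level call chain: this method delegates to another virtual method.
        long localDeletionTime() {
            return Integer.toUnsignedLong(localDeletionTimeAsUnsignedInt());
        }
    }

    static final class ArrayCell extends Cell {
        private final int ldt;
        ArrayCell(int ldt) { this.ldt = ldt; }
        @Override int localDeletionTimeAsUnsignedInt() { return ldt; }
    }

    static final class NativeCell extends Cell {
        private final int ldt;
        NativeCell(int ldt) { this.ldt = ldt; }
        @Override int localDeletionTimeAsUnsignedInt() { return ldt; }
    }

    static final class OtherCell extends Cell {
        private final int ldt;
        OtherCell(int ldt) { this.ldt = ldt; }
        @Override int localDeletionTimeAsUnsignedInt() { return ldt; }
    }

    // Two explicit exact-class checks plus a generic fallback.
    // The three calls are textually identical but are separate call sites.
    static long readLocalDeletionTime(Cell cell) {
        if (cell.getClass() == NativeCell.class) {
            return cell.localDeletionTime();   // call site that only sees NativeCell
        }
        if (cell.getClass() == ArrayCell.class) {
            return cell.localDeletionTime();   // call site that only sees ArrayCell
        }
        return cell.localDeletionTime();       // everything else
    }
}
```

All three branches return the same value for the same input, so the split changes only how the JIT can profile the call sites, not the behaviour.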

@netudima
Contributor Author

netudima commented Jan 19, 2026

CPU flush time per partition (get_flush_cpu_time.sh), in microseconds. A per-partition metric is used to normalize the data, because heap and offheap flushing trigger at different thresholds, so the SSTable sizes differ:

| Option | offheap_objects | heap_buffers |
|---|---|---|
| 1 if NativeCell | 10.433 | 12.451, no inlining for transitive Cell calls |
| 1 if ArrayCell | 11.112, no inlining for transitive Cell calls | 12.044 |
| 2 ifs ArrayCell+NativeCell | 10.369 | 12.277 |
| 2 ifs NativeCell+ArrayCell | 10.147 | 12.394 |

Note: for the "2 ifs NativeCell+ArrayCell" option, I've enforced inlining in MinMax(Int/Long)Tracker.

So it looks like the "2 ifs NativeCell+ArrayCell" option is the safest one (in most cases the results are close or within an error margin, except for some combinations).
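As an illustration of the per-partition normalization used for the numbers above, here is a minimal sketch that measures the CPU time of the current thread and divides it by a partition count. This is not the get_flush_cpu_time.sh script, whose contents are not shown here. The class and method names are hypothetical, and it assumes the flush work runs on the calling thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; not part of Cassandra.
class FlushCpuPerPartitionSketch {
    /** Converts a total CPU time in nanoseconds into microseconds per partition. */
    static double microsPerPartition(long cpuNanos, long partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        return (cpuNanos / 1_000.0) / partitions;
    }

    /**
     * Runs the work on the current thread and returns the CPU nanoseconds it consumed,
     * or -1 if the JVM does not support or has disabled thread CPU time measurement.
     */
    static long measureCpuNanos(Runnable work) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            work.run();
            return -1;
        }
        long start = threads.getCurrentThreadCpuTime();
        work.run();
        long end = threads.getCurrentThreadCpuTime();
        return (start < 0 || end < 0) ? -1 : end - start;
    }
}
```

Dividing by the partition count makes runs comparable even when the flushed SSTables contain different numbers of partitions.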

add more flushing stats: partitions/rows, bytes rate, CPU and heap allocation for the flushing thread
avoid columns filtering overheads for unfilteredIterator
do not re-map columns in serializeRowBody if they haven't changed
reduce allocations during serialization of NativeClustering
add fast return for BTreeRow.hasComplexDeletion, avoid column.name.bytes.hashCode if not needed, avoid capturing lambda allocation in UnfilteredSerializer.serializeRowBody
check if Guardrails enabled at the beginning of writing, avoid hidden auto-boxing for logging of primitive parameters
split call sites in Cell serialize logic, make isCounterCell cheaper (avoid megamorphic call + cache isCounterColumn)
invoke metadataCollector.updateClusteringValues only for first and last clustering key in a partition
enforce inlining for MinMaxIntTracker/MinMaxLongTracker

Patch by Dmitry Konstantinov; reviewed by Branimir Lambov for CASSANDRA-21083
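One item in the commit message above, invoking metadataCollector.updateClusteringValues only for the first and last clustering key in a partition, presumably relies on rows within a partition being written in clustering order, so the minimum and maximum can only be at the two ends. A minimal sketch of that idea follows. The MinMax tracker and the list-based partition are hypothetical simplifications, not the MetadataCollector API.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; real clusterings and the real collector are more involved.
class FirstLastClusteringSketch {
    static final class MinMax<T> {
        private final Comparator<? super T> cmp;
        T min;
        T max;
        int updates; // counts comparator-based updates, to show the saving

        MinMax(Comparator<? super T> cmp) { this.cmp = cmp; }

        void update(T value) {
            updates++;
            if (min == null || cmp.compare(value, min) < 0) min = value;
            if (max == null || cmp.compare(value, max) > 0) max = value;
        }
    }

    /**
     * Assumes sortedClusterings is ordered by the same comparator as the tracker,
     * so only the first and the last element can change the min or the max.
     */
    static <T> void updateFromSortedPartition(MinMax<T> tracker, List<T> sortedClusterings) {
        if (sortedClusterings.isEmpty()) return;
        tracker.update(sortedClusterings.get(0));
        if (sortedClusterings.size() > 1) {
            tracker.update(sortedClusterings.get(sortedClusterings.size() - 1));
        }
    }
}
```

With this shape the tracker is updated at most twice per partition, regardless of how many rows the partition holds.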
@netudima netudima force-pushed the CASSANDRA-21083-trunk branch from de0346a to ef49f50 Compare January 19, 2026 13:35
4 participants