
Transient workloads with large object alloc & dealloc pairs thrash slabs back and forth from the kernel #811

@akrieger

Description


I ran our test suite in 'benchmark' mode with a fixed seed, averaging repeated runs over many hours. Overall, snmalloc has the edge over mimalloc in single-threaded runs. However, a few notable exceptions stood out. One is a test suite that repeatedly allocates and deallocates a large object (in particular, a cache struct of roughly 800 kB) and spends a measurable amount of time in dealloc. Tests that exercise this code path are 2-4x slower overall (admittedly, the differences are measured in microseconds).
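A minimal sketch of the access pattern described above, for anyone who wants to reproduce it outside our test suite. `LargeCache`, `g_escape`, and `mean_pair_time_us` are hypothetical names introduced here; the struct only mimics the size of the real ~800 kB cache struct, not its contents. It times repeated allocate/free pairs so the same code can be built once against snmalloc and once against mimalloc.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the ~800 kB cache struct described in the issue.
struct LargeCache {
    unsigned char bytes[800 * 1024];
};

// Storing the pointer in a volatile global makes it escape, so the compiler
// cannot elide the new/delete pair as an optimization.
LargeCache* volatile g_escape = nullptr;

// Allocate, zero-fill, and free one LargeCache `iterations` times. Returns the
// mean wall-clock time per alloc/dealloc pair in microseconds (0.0 if
// iterations == 0).
double mean_pair_time_us(std::size_t iterations) {
    if (iterations == 0) {
        return 0.0;
    }
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        // make_unique value-initializes the struct, so all 800 kB are written
        // and the backing pages are actually touched.
        auto cache = std::make_unique<LargeCache>();
        g_escape = cache.get();
        // `cache` is destroyed at the end of each iteration, freeing the object
        // immediately: the transient alloc/dealloc pattern from the issue.
    }
    g_escape = nullptr;
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling something like `mean_pair_time_us(10000)` from a small driver, linked once with each allocator, gives a per-pair figure to compare; how each allocator is linked in or overrides `operator new` depends on the platform and is not shown here.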

The absolute values are not large, on the order of hundreds of microseconds, and ultimately I think the edge goes to snmalloc because large allocations like these are not typical at runtime. Still, I wanted to call it out and maybe have at least a little discussion about it.

I wonder whether decommit on Windows is significantly slower than commit. Some tests that end up allocating large vectors trigger the same code path.

In the image below, you can compare the RSS of the test suite under mimalloc (top) and under snmalloc (bottom). The snmalloc graph shows pronounced 'bumps' where this is occurring.
[Image: RSS over the test suite run, mimalloc (top) vs. snmalloc (bottom)]
