Object stats #616

Draft

mjp41 wants to merge 19 commits into microsoft:main from mjp41:object_stats

Conversation

@mjp41 (Member) commented Jun 8, 2023

This adds some statistics for tracking:

  • How many deallocations are in message queues.
  • How many allocators have been created.
  • Per sizeclass statistics
    • Number of objects allocated
    • Number of objects deallocated
    • Number of slabs allocated
    • Number of slabs deallocated

The per sizeclass statistics are tracked per allocator, and a racy read is done to combine the results for display.

These statistics were used to debug #615 to calculate the fragmentation.

The displayed statistics are intended for post processing to calculate the fragmentation/utilisation.

The interface just prints the results using message. This could be improved with a better logging infrastructure.
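As a rough illustration of that approach (a sketch with illustrative names, not the PR's actual code), a per-allocator counter combined by a racy read could look like:

#include <atomic>
#include <cstddef>
#include <vector>

// Sketch: one relaxed atomic counter per allocator.
class Stat
{
  std::atomic<std::size_t> value{0};

public:
  void add(std::size_t n = 1)
  {
    // Relaxed ordering: we only need an eventually consistent count.
    value.fetch_add(n, std::memory_order_relaxed);
  }

  std::size_t get_curr() const
  {
    return value.load(std::memory_order_relaxed);
  }
};

// Racy combine for display: walk every allocator's counter with relaxed
// loads while other threads may still be updating them, so the total is
// only approximate.
inline std::size_t combine(const std::vector<const Stat*>& per_allocator_stats)
{
  std::size_t total = 0;
  for (const Stat* s : per_allocator_stats)
    total += s->get_curr();
  return total;
}

Relaxed atomics keep the fast-path cost to a single uncontended add; the trade-off is that the combined total is only an approximation, matching the racy read described above.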

@mjp41 mjp41 requested a review from nwf-msr June 8, 2023 11:35
@nwf-msr (Contributor) left a comment

Comments so far; posting before switching threads.

@nwf-msr (Contributor) left a comment

Generally looks quite nice. ISTR snmalloc of old had the ability to conditionally keep stats or not; perhaps it would be worth having an empty implementation of the Stat and MonotoneStat interfaces and either templating or having a namespace snmalloc-scoped using to pick between them?
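(As a concrete sketch of that suggestion, with hypothetical names; SNMALLOC_TRACK_OBJECT_STATS and NoOpStat are illustrative, not existing snmalloc definitions:)

#include <cstddef>

// No-op drop-in with the same interface as the counting Stat; everything
// compiles away when statistics are disabled.
class NoOpStat
{
public:
  void add(std::size_t = 1) {}

  std::size_t get_curr() const
  {
    return 0;
  }
};

#ifdef SNMALLOC_TRACK_OBJECT_STATS
using StatImpl = Stat; // the real counting implementation
#else
using StatImpl = NoOpStat;
#endif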

}

if (
result == nullptr && RemoteDeallocCache::remote_inflight.get_curr() != 0)

Something has happened (TM) with the syntax there. Can this be a SNMALLOC_ASSERT_MSG?

@mjp41 (Member, Author) commented Jun 8, 2023

> Generally looks quite nice. ISTR snmalloc of old had the ability to conditionally keep stats or not; perhaps it would be worth having an empty implementation of the Stat and MonotoneStat interfaces and either templating or having a namespace snmalloc-scoped using to pick between them?

I was going to profile to see how much the operations cost. If they are noticeable, then I will macro it away as you suggest.

@mjp41 mjp41 force-pushed the object_stats branch 4 times, most recently from bfc415a to 5bc8fd8 on March 22, 2025 at 22:38
@mjp41 (Member, Author) commented Mar 25, 2025

So I have benchmarked this, and it has a perf regression. The worst case seems to be 3% (glibc-thread), but most tests are below 1%. I am going to investigate moving more of the statistics off the fast path.

This will basically be:

  1. assuming everything in the current fast free list has been allocated up front, so individual allocations don't need to do any accounting.
  2. on frees, only updating the count for a slab when we hit a slow path, or when that free list is taken to be used as a fast free list.

This will over-approximate the current user allocations quite a bit, but should make the overhead practically zero.
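A rough sketch of that batching scheme (hypothetical names, not the PR's actual code):

#include <atomic>
#include <cstddef>

struct SizeclassStats
{
  std::atomic<std::size_t> objects_allocated{0};
  std::atomic<std::size_t> objects_deallocated{0};

  // (1) When a free list of `count` objects becomes the fast free list,
  // treat all of them as allocated up front; the per-allocation fast path
  // then does no accounting at all.
  void on_fast_list_refill(std::size_t count)
  {
    objects_allocated.fetch_add(count, std::memory_order_relaxed);
  }

  // (2) Deallocations are only counted when a slow path is hit, or when a
  // batch of `count` frees is handed back to a slab in one go.
  void on_slow_free_batch(std::size_t count)
  {
    objects_deallocated.fetch_add(count, std::memory_order_relaxed);
  }
};

The over-approximation comes from step (1): everything placed on the fast list is counted as allocated even if it is never handed out to the user.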

Alternatively, we could look at making this a compile time option.

@mjp41 mjp41 mentioned this pull request Jul 1, 2025
@akrieger commented Feb 13, 2026

It would be great if this could be rebased onto main (even if not landed). I gave it a shot, but some of the conflicts were too weird for me to figure out. I don't care about landing it or about a perf regression; I just want to get some better stats around utilization for some local testing/comparisons.

@mjp41 mjp41 marked this pull request as draft February 19, 2026 15:21
@mjp41 (Member, Author) commented Feb 19, 2026

@akrieger I have rebased and it seems to pass tests, but it is currently only minimally tested. I'll kick off a perf run to check what the regression is.

Please let me know what kind of API you would like for accessing the stats. Also, how accurate do you want the statistics to be? The current code tracks individual allocations and deallocations for each sizeclass, but we might want to track them at a coarser granularity to reduce the performance impact.

I would either make the reasonably accurate statistics available under a compile flag, or provide an over-approximating system which is always on. Or possibly both.

@mjp41 (Member, Author) commented Feb 19, 2026

The perf results look similar to before, but now we have prettier results.

https://bencher.dev/console/projects/snmalloc/plots

The regression in redis might be noise, as one run was about the expected amount, and two were much larger. I've sent a second run to get a bit more data.

Currently, I don't think the performance is good enough for an always-on feature. So either we need to add compile flags and some more CI targets, or we need to reduce the accuracy and make it always on.

@akrieger commented

I personally would like an accurate system over a performant one, but that's because I'm doing offline evaluation of various allocator options to decide which to use :)

There are two main questions I have not gotten good answers for when comparing/evaluating the various allocators: what is my fragmentation/utilization like, and can I tune my size classes to get better results for my specific workloads (and to what specific sizes)? Right now all I can see are very high-level patterns like 'on this test suite, snmalloc is consistently 0-100MB higher in RSS than mimalloc v3, but also seems to more aggressively return memory to the kernel'. But that extra 100MB might come at a bad time on an old Android device and cause it to OOM instead, so I want to know whether that is usable memory that will buffer incoming allocations or memory that is more or less permanently fragmented.

The API doesn't have to be particularly fast to answer any question either. Something like a function call which returns or prints a list of stats (say, the amount of used/fragmented/free space per slab or bucket or whatever the internal unit of allocation is; apologies, I haven't dug into it that deeply), which I can then print out at my convenience and post-process in a spreadsheet app. It can take however long it needs to walk the internal structures in that case.

I wrote up this entire comment, by the way, without having reminded myself what the original PR summary was, and I see now that what I'm asking for is exactly what this PR was originally intended for :)

(For comparison, until now my memory debugging tool has been the Visual Studio memory profiler, which is... great for debugging specific allocations but not good for high-level statistics.)

@mjp41 (Member, Author) commented Feb 19, 2026

@akrieger thanks. I think what is there should be fairly usable based on your description. It dumps to stderr, so hopefully it is not too interleaved with your output. We could move to a file, but that would be a reasonable amount of work to add for all platforms (we have a lot of them).

$ ./perf-batchblitz-fast  2> output
...............................................................

You can then grep the output file to extract the two interleaved tables.

$ grep output -e "snmalloc_allocs"
0x1: snmalloc_allocs,dumpid,sizeclass,size,allocated,deallocated,in_use,bytes,slabs allocated,slabs deallocated,slabs in_use,slabs bytes
0x1: snmalloc_allocs,0x0,0x5c,0x1400,0x18d314,0x0,0x18d314,0x1f07d9000,0x241bc,0x0,0x241bc,0x120de0000
0x1: snmalloc_allocs,0x1,0x5c,0x1400,0x753fd1,0x3236a0,0x430931,0x53cb7d400,0xa6b12,0x81ccc,0x24e46,0x127230000
0x1: snmalloc_allocs,0x2,0x5c,0x1400,0x133c6d2,0x9f3b80,0x948b52,0xb9ae26800,0x1b378f,0x19ba09,0x17d86,0xbec30000
0x1: snmalloc_allocs,0x3,0x5c,0x1400,0x1c62752,0xee0fc0,0xd81792,0x10e1d76800,0x282401,0x2676a8,0x1ad59,0xd6ac8000
0x1: snmalloc_allocs,0x4,0x5c,0x1400,0x2a18f93,0x15fe9e0,0x141a5b3,0x1920f1fc00,0x3b854b,0x38dac3,0x2aa88,0x155440000
0x1: snmalloc_allocs,0x5,0x5c,0x1400,0x3714762,0x1dbc7d0,0x1957f92,0x1fadf76800,0x4dcef3,0x4cde0c,0xf0e7,0x78738000
0x1: snmalloc_allocs,0x6,0x5c,0x1400,0x430bee2,0x23fdbe0,0x1f0e302,0x26d1bc2800,0x5ebb71,0x5d0bd3,0x1af9e,0xd7cf0000
0x1: snmalloc_allocs,0x7,0x5c,0x1400,0x4f2e4ee,0x2ac16fe,0x246cdf0,0x2d8816c000,0x6fdccc,0x6e8854,0x15478,0xaa3c0000
0x1: snmalloc_allocs,0x8,0x5c,0x1400,0x5ac00f7,0x30fa8dc,0x29c581b,0x3436e21c00,0x803472,0x7e9e5a,0x19618,0xcb0c0000
0x1: snmalloc_allocs,0x9,0x5c,0x1400,0x6853d44,0x3812214,0x3041b30,0x3c521fc000,0x936571,0x90f2f1,0x27280,0x139400000
0x1: snmalloc_allocs,0xa,0x5c,0x1400,0x76fd921,0x40aa4c0,0x3653461,0x43e8179400,0xa81165,0xa729df,0xe786,0x73c30000
0x1: snmalloc_allocs,0xb,0x5c,0x1400,0x8259c2f,0x46c293a,0x3b972f5,0x4a7cfb2400,0xb81a4c,0xb6e9ee,0x1305e,0x982f0000
0x1: snmalloc_allocs,0xc,0x5c,0x1400,0x8ee3632,0x4de455a,0x40ff0d8,0x513ed0e000,0xc9cbd2,0xc95a50,0x7182,0x38c10000
0x1: snmalloc_allocs,0xd,0x5c,0x1400,0x9b91cc6,0x53fabe0,0x47970e6,0x597cd1f800,0xdbbf2b,0xd91507,0x2aa24,0x155120000
0x1: snmalloc_allocs,0xe,0x5c,0x1400,0xa7beed8,0x5af995e,0x4cc557a,0x5ff6ad8800,0xece835,0xeb27dc,0x1c059,0xe02c8000
0x1: snmalloc_allocs,0xf,0x5c,0x1400,0xb535aa4,0x626fc7e,0x52c5e26,0x67775af800,0xffeb67,0xfe71d4,0x17993,0xbcc98000
0x1: snmalloc_allocs,0x10,0x5c,0x1400,0xc1b6afc,0x68f96e0,0x58bd41c,0x6eec923000,0x111962c,0x10f5825,0x23e07,0x11f038000
0x1: snmalloc_allocs,0x11,0x5c,0x1400,0xcf8faa8,0x70f24f6,0x5e9d5b2,0x7644b1e800,0x1251d46,0x123f281,0x12ac5,0x95628000
0x1: snmalloc_allocs,0x12,0x5c,0x1400,0xdba0c0c,0x76f88e0,0x64a832c,0x7dd23f7000,0x1362dd3,0x133834b,0x2aa88,0x155440000
0x1: snmalloc_allocs,0x13,0x5c,0x1400,0xe5b4b51,0x7ccd5de,0x68e7573,0x83212cfc00,0x144627a,0x142942a,0x1ce50,0xe7280000
0x1: snmalloc_allocs,0x14,0x5c,0x1400,0xf1b4339,0x83c6c3e,0x6ded6fb,0x8968cb9c00,0x1554b1f,0x1549a85,0xb09a,0x584d0000
0x1: snmalloc_allocs,0x15,0x5c,0x1400,0xfe9963c,0x8aa9502,0x73f013a,0x90ec188800,0x167820f,0x1666602,0x11c0d,0x8e068000
0x1: snmalloc_allocs,0x16,0x5c,0x1400,0x10ca3c0e,0x91f6de0,0x7aace2e,0x99581b9800,0x17b5988,0x179457b,0x2140d,0x10a068000
0x1: snmalloc_allocs,0x17,0x5c,0x1400,0x11806728,0x97f722e,0x800f4fa,0xa013238800,0x18b7066,0x188c6ae,0x2a9b8,0x154dc0000
0x1: snmalloc_allocs,0x18,0x5c,0x1400,0x122f08c7,0x9df61e0,0x84fa6e7,0xa6390a0c00,0x19ada35,0x1984889,0x291ac,0x148d60000
0x1: snmalloc_allocs,0x19,0x5c,0x1400,0x12c2c88b,0xa36e7be,0x88be0cd,0xaaed900400,0x1a7def5,0x1a66ccf,0x17226,0xb9130000
0x1: snmalloc_allocs,0x1a,0x5c,0x1400,0x13467792,0xa7c2810,0x8ca4f82,0xafce362800,0x1b37f07,0x1b19b94,0x1e373,0xf1b98000
0x1: snmalloc_allocs,0x1b,0x5c,0x1400,0x13e75ac5,0xacf52e0,0x91807e5,0xb5e09de400,0x1c1b774,0x1bf0cec,0x2aa88,0x155440000
0x1: snmalloc_allocs,0x1c,0x5c,0x1400,0x145f9e02,0xb1a7f50,0x9451eb2,0xb96665e800,0x1cc4a2b,0x1cb324c,0x117df,0x8bef8000
0x1: snmalloc_allocs,0x1d,0x5c,0x1400,0x14ef354f,0xb67efb6,0x9874599,0xbe916ff400,0x1d8f939,0x1d7b4d3,0x14466,0xa2330000

and

$ grep output -e "snmalloc_totals"
0x1: snmalloc_totals,dumpid,backend bytes,peak backend bytes,requested,slabs requested bytes,remote inflight bytes,allocator count
0x1: snmalloc_totals,0x0,0x122820000,0x122820000,0x1f07d9000,0x120de0000,0x0,0x8
0x1: snmalloc_totals,0x1,0x155620000,0x156020000,0x53cb7d400,0x127230000,0xd8f90000,0x8
0x1: snmalloc_totals,0x2,0x151620000,0x156020000,0xb9ae26800,0xbec30000,0x28000,0x8
0x1: snmalloc_totals,0x3,0x151a20000,0x156020000,0x10e1d76800,0xd6ac8000,0x11a58000,0x8
0x1: snmalloc_totals,0x4,0x156020000,0x156020000,0x1920f1fc00,0x155440000,0x108cb0800,0x8
0x1: snmalloc_totals,0x5,0x152420000,0x156020000,0x1fadf76800,0x78738000,0x33941000,0x8
0x1: snmalloc_totals,0x6,0x154c20000,0x156020000,0x26d1bc2800,0xd7cf0000,0x28000,0x8
0x1: snmalloc_totals,0x7,0x151020000,0x156020000,0x2d8816c000,0xaa3c0000,0x28000,0x8
0x1: snmalloc_totals,0x8,0x156020000,0x156020000,0x3436e21c00,0xcb0c0000,0xf3fcf000,0x8
0x1: snmalloc_totals,0x9,0x150420000,0x156020000,0x3c521fc000,0x139400000,0x4e1d3000,0x8
0x1: snmalloc_totals,0xa,0x152e20000,0x156020000,0x43e8179400,0x73c30000,0xbe48800,0x8
0x1: snmalloc_totals,0xb,0x14ca20000,0x156020000,0x4a7cfb2400,0x982f0000,0x2b8dd800,0x8
0x1: snmalloc_totals,0xc,0x14be20000,0x156020000,0x513ed0e000,0x38c10000,0x5426800,0x8
0x1: snmalloc_totals,0xd,0x156020000,0x156020000,0x597cd1f800,0x155120000,0x10fbcb000,0x8
0x1: snmalloc_totals,0xe,0x153c20000,0x156020000,0x5ff6ad8800,0xe02c8000,0x28000,0x8
0x1: snmalloc_totals,0xf,0x151220000,0x156020000,0x67775af800,0xbcc98000,0x2a07b000,0x8
0x1: snmalloc_totals,0x10,0x156020000,0x156020000,0x6eec923000,0x11f038000,0x12a47b000,0x8
0x1: snmalloc_totals,0x11,0x151220000,0x156020000,0x7644b1e800,0x95628000,0x1676000,0x8
0x1: snmalloc_totals,0x12,0x156020000,0x156020000,0x7dd23f7000,0x155440000,0x1095bb800,0x8
0x1: snmalloc_totals,0x13,0x152420000,0x156020000,0x83212cfc00,0xe7280000,0x2f977000,0x8
0x1: snmalloc_totals,0x14,0x14dc20000,0x156020000,0x8968cb9c00,0x584d0000,0x18b98800,0x8
0x1: snmalloc_totals,0x15,0x151620000,0x156020000,0x90ec188800,0x8e068000,0x11a3c800,0x8
0x1: snmalloc_totals,0x16,0x156020000,0x156020000,0x99581b9800,0x10a068000,0x7080000,0x8
0x1: snmalloc_totals,0x17,0x153620000,0x156020000,0xa013238800,0x154dc0000,0x3518a800,0x8
0x1: snmalloc_totals,0x18,0x152c20000,0x156020000,0xa6390a0c00,0x148d60000,0x7b88d800,0x8
0x1: snmalloc_totals,0x19,0x150620000,0x156020000,0xaaed900400,0xb9130000,0x39c5b000,0x8
0x1: snmalloc_totals,0x1a,0x151220000,0x156020000,0xafce362800,0xf1b98000,0x1987000,0x8
0x1: snmalloc_totals,0x1b,0x156020000,0x156020000,0xb5e09de400,0x155440000,0x13f303000,0x8
0x1: snmalloc_totals,0x1c,0x153a20000,0x156020000,0xb96665e800,0x8bef8000,0x2b5ca000,0x8
0x1: snmalloc_totals,0x1d,0x150620000,0x156020000,0xbe916ff400,0xa2330000,0x5cf4e000,0x8

The 0x1: prefix comes from the logging, which gives each output line a thread id. You will want to strip this, and then you have two CSV files.

This doesn't stop the threads and uses relaxed reads and writes, so it isn't a technically correct snapshot, but for the kind of analysis you want (and that I used it for) it should be accurate enough.

For the 100MB you mentioned, what is the overall footprint? I am interested to know how much overhead we have relative to mimalloc for your application.

@akrieger commented Feb 19, 2026

Up to 100MB out of 1-2GB, so 5-10%, but it's rounded to 0.1GB in the UI and I haven't dug deeper into that aspect yet (still mostly spending my time analyzing CPU/wall time).
