
metrics of snmalloc #409

Open
SchrodingerZhu opened this issue Oct 24, 2021 · 12 comments

@SchrodingerZhu
Contributor

SchrodingerZhu commented Oct 24, 2021

Hi,
I am implementing snmalloc support for an analytical database engine. Everything works fine and the performance is really cool, but I have a problem creating proper statistics for snmalloc:

Basically, I want something like resident memory and (de)commit information. Details like the allocation size distribution can also be helpful, but they are not essential.

So I mimicked the way snmalloc prints out its stats and wrote some code:

    {
        // Aggregate the per-allocator statistics; set(name, value) below is the
        // engine's own metric-recording helper, not part of snmalloc.
        snmalloc::Stats stats;
        snmalloc::current_alloc_pool()->aggregate_stats(stats);

        using namespace snmalloc;

        size_t current = 0;
        size_t total = 0;
        size_t max = 0;
        // Per-size-class high-water mark for large allocations, kept across calls.
        static size_t large_alloc_max[NUM_LARGE_CLASSES]{0};

        // Per-sizeclass statistics for small/medium allocations.
        for (sizeclass_t i = 0; i < NUM_SIZECLASSES; i++)
        {
            if (stats.sizeclass[i].count.is_unused())
                continue;

            stats.sizeclass[i].addToRunningAverage();

            auto size = sizeclass_to_size(i);
            set(fmt::format("snmalloc.bucketed_stat_size_{}_current", size), stats.sizeclass[i].count.current);
            set(fmt::format("snmalloc.bucketed_stat_size_{}_max", size), stats.sizeclass[i].count.max);
            set(fmt::format("snmalloc.bucketed_stat_size_{}_total", size), stats.sizeclass[i].count.used);
            set(fmt::format("snmalloc.bucketed_stat_size_{}_average_slab_usage", size), stats.sizeclass[i].online_average);
            set(fmt::format("snmalloc.bucketed_stat_size_{}_average_wasted_space", size),
                (1.0 - stats.sizeclass[i].online_average) * stats.sizeclass[i].slab_count.max);
            current += stats.sizeclass[i].count.current * size;
            total += stats.sizeclass[i].count.used * size;
            max += stats.sizeclass[i].count.max * size;
        }

        // Statistics for large allocations.
        for (uint8_t i = 0; i < NUM_LARGE_CLASSES; i++)
        {
            if ((stats.large_push_count[i] == 0) && (stats.large_pop_count[i] == 0))
                continue;

            auto size = large_sizeclass_to_size(i);
            set(fmt::format("snmalloc.large_bucketed_stat_size_{}_push_count", size), stats.large_push_count[i]);
            set(fmt::format("snmalloc.large_bucketed_stat_size_{}_pop_count", size), stats.large_pop_count[i]);
            auto large_alloc = (stats.large_pop_count[i] - stats.large_push_count[i]) * size;
            large_alloc_max[i] = std::max(large_alloc_max[i], large_alloc);
            current += large_alloc;
            total += stats.large_push_count[i] * size;
            max += large_alloc_max[i];
        }

        set("snmalloc.global_stat_remote_freed", stats.remote_freed);
        set("snmalloc.global_stat_remote_posted", stats.remote_posted);
        set("snmalloc.global_stat_remote_received", stats.remote_received);
        set("snmalloc.global_stat_superslab_pop_count", stats.superslab_pop_count);
        set("snmalloc.global_stat_superslab_push_count", stats.superslab_push_count);
        set("snmalloc.global_stat_segment_count", stats.segment_count);
        set("snmalloc.global_stat_current_size", current);
        set("snmalloc.global_stat_total_size", total);
        set("snmalloc.global_stat_max_size", max);
    }

I don't know, but maybe the above approach would create too many entries in the summary?

Also, any suggestions on creating more concise asynchronous metrics for the allocator?
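
For reference, by "async metrics" I mean sampling the allocator from a background thread on a fixed interval instead of on the hot (de)allocation path. A minimal sketch, assuming a hypothetical collect_snmalloc_metrics() helper that wraps the sampling code above:

    #include <atomic>
    #include <chrono>
    #include <thread>

    // Hypothetical wrapper around the per-sizeclass sampling code above.
    void collect_snmalloc_metrics();

    // Polls allocator statistics on a background thread so the cost stays off
    // the allocation/deallocation path.
    class MetricsPoller
    {
    public:
      explicit MetricsPoller(std::chrono::seconds interval)
      : interval_(interval), worker_([this] { run(); })
      {}

      ~MetricsPoller()
      {
        stop_.store(true, std::memory_order_relaxed);
        worker_.join();  // may wait up to one interval for the sleep to finish
      }

    private:
      void run()
      {
        while (!stop_.load(std::memory_order_relaxed))
        {
          collect_snmalloc_metrics();
          std::this_thread::sleep_for(interval_);
        }
      }

      std::chrono::seconds interval_;
      std::atomic<bool> stop_{false};
      std::thread worker_;
    };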

@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Oct 24, 2021

Another thing is that, in this situation, it is probably not a good idea to print out the statistics only after a thread exits.

@SchrodingerZhu
Contributor Author

[image]

Another interesting observation: if I enable stats, the dealloc routine gets dominated by the costly running-average calculation. (I cannot provide further stack traces since the product has not been released as open source yet; sorry about that.)

@mjp41
Member

mjp41 commented Oct 25, 2021

Those statistics are pretty heavyweight and were not designed for production; they are more for working out what snmalloc is doing wrong, and they have not really been maintained. There are very coarse statistics available from:

    void get_malloc_info_v1(malloc_info_v1* stats)
    {
      auto next_memory_usage = default_memory_provider().memory_usage();
      stats->current_memory_usage = next_memory_usage.first;
      stats->peak_memory_usage = next_memory_usage.second;
    }

This might be sufficient for what you are after. This is tracked all the time and is very cheap. It was considered the bare minimum for some other services.
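
For example, wiring these counters into the kind of set(name, value) metric helper used in the first snippet might look roughly like this (a sketch only; the header name is assumed):

    // Sketch: assumes the malloc_info_v1 / get_malloc_info_v1 API quoted above.
    #include "malloc-extensions.h"  // header providing get_malloc_info_v1 (name assumed)
    #include <cstddef>

    // Engine-side metric-recording helper from the first snippet (hypothetical).
    void set(const char* name, size_t value);

    void export_coarse_snmalloc_metrics()
    {
      malloc_info_v1 info;
      get_malloc_info_v1(&info);

      // Both counters are maintained continuously by the allocator, so polling
      // them is cheap compared to the per-sizeclass statistics.
      set("snmalloc.current_memory_usage", info.current_memory_usage);
      set("snmalloc.peak_memory_usage", info.peak_memory_usage);
    }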

With the rewrite on the snmalloc2 branch, I am about to investigate statistics tracking. So if you have requirements, I will try to work them into what I build.

@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Oct 25, 2021

[image]

Let me provide some records from my side. This is from an analytical database engine (a single node in this case). It took up almost all the system memory on Linux (as snmalloc won't madvise it back).
The problem is that the server itself uses mmap/mremap for large allocations to get a potential speedup from OS paging, so I am very concerned about running with this decommit behaviour in a production environment.

@mjp41
Member

mjp41 commented Oct 25, 2021

@SchrodingerZhu are you able to try #404 for your use case? It is getting pretty stable now, and should address your concern about holding on to OS memory.

What is the green line showing in the graph? RSS or Virtual memory usage?

@SchrodingerZhu
Contributor Author

According to the name of the metric, it should be RSS. htop also shows my program's memory usage as similar to the green line.

@SchrodingerZhu changed the title from "Implement async metric for snmalloc" to "metrics of snmalloc" on Oct 25, 2021
@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Oct 25, 2021

Since all of my work is currently experimental, I would like to give snmalloc 2 and #404 a try. I can also report back the changes in performance and in the metrics.

Thanks for the suggestions! The results above were still on snmalloc 1, and there was an improvement of tens of seconds on some TPC-H workloads when switching from jemalloc to snmalloc, which really astonished me. Let's see what we can get with snmalloc 2.

@SchrodingerZhu
Contributor Author

I believe #404 is working, since we can now see drops in the RSS curve.

However, in this case:
[image]

As you can see, after some peaks in the memory curve (it tried to acquire more than 169 GiB!), the stats suddenly went to zero with snmalloc 2. This means the engine was killed for OOM. Ouch, this is bad; with snmalloc 1, even though the space is not decommitted, I didn't experience OOMs.

The performance degradation of snmalloc 2 was still there: for the successful trials, I could see a 100% slowdown (from 30s to 1min) for some particular queries. I may provide some flamegraphs of the snmalloc stacks when they are ready.

@SchrodingerZhu
Contributor Author

SchrodingerZhu commented Oct 26, 2021

[image]
[image]

Oops, since I was running this on kernel 3.10, I guess madvise with MADV_DONTNEED was much heavier than I ever expected.

I think it is madvise that took all the extra running time (up to 30s for that query) in this case.
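
As a sanity check, a standalone measurement of one large madvise(MADV_DONTNEED) on the running kernel (not snmalloc code, just an illustration) could look like this:

    #include <chrono>
    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>

    int main()
    {
      constexpr size_t size = size_t(1) << 30;  // 1 GiB
      void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return 1;

      // Touch every page so the range is actually committed.
      std::memset(p, 1, size);

      auto start = std::chrono::steady_clock::now();
      madvise(p, size, MADV_DONTNEED);
      auto end = std::chrono::steady_clock::now();

      std::printf(
        "madvise(MADV_DONTNEED) on 1 GiB took %lld us\n",
        static_cast<long long>(
          std::chrono::duration_cast<std::chrono::microseconds>(end - start)
            .count()));

      munmap(p, size);
      return 0;
    }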

@mjp41
Member

mjp41 commented Oct 26, 2021

I am going to look into consolidating calls to madvise, which will hopefully reduce this cost.
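
For illustration, the general idea of consolidation (a sketch, not snmalloc's actual implementation) is to merge adjacent or overlapping pending ranges and issue one madvise per merged range:

    #include <algorithm>
    #include <cstdint>
    #include <sys/mman.h>
    #include <vector>

    struct Range
    {
      uintptr_t start;
      size_t length;
    };

    void decommit_consolidated(std::vector<Range> ranges)
    {
      std::sort(ranges.begin(), ranges.end(),
                [](const Range& a, const Range& b) { return a.start < b.start; });

      size_t i = 0;
      while (i < ranges.size())
      {
        uintptr_t start = ranges[i].start;
        uintptr_t end = start + ranges[i].length;

        // Fold in any ranges that touch or overlap the current one.
        while (i + 1 < ranges.size() && ranges[i + 1].start <= end)
        {
          end = std::max(end, ranges[i + 1].start + ranges[i + 1].length);
          ++i;
        }

        // One syscall for the whole merged range instead of one per fragment.
        madvise(reinterpret_cast<void*>(start), end - start, MADV_DONTNEED);
        ++i;
      }
    }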

So did it work in terms of reducing the memory usage, or did it regress the memory usage and hit OOM? It wasn't clear from your message.

@SchrodingerZhu
Contributor Author

  • I can see the memory usage decreasing now, so madvise is working.
  • Even with that decrease, I got an OOM regression compared with snmalloc 1.

@mjp41
Member

mjp41 commented Mar 16, 2022

@SchrodingerZhu would you be able to run this experiment again with the latest main branch? I have done a lot of work on bringing down the footprint; most examples are very close to snmalloc 1 now, so I would be interested to know whether I have fixed this.
