linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: jaypatel@linux.ibm.com, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
	tsahu@linux.ibm.com, piyushs@linux.ibm.com
Subject: Re: [RFC PATCH v4] mm/slub: Optimize slub memory usage
Date: Thu, 7 Sep 2023 15:42:05 +0200
Message-ID: <fc2752e5-0e9f-3106-f3bd-0e7631f9d23c@suse.cz>
In-Reply-To: <7fdf3f5dfd9fa1b5e210cc4176cac58a9992ecc0.camel@linux.ibm.com>

On 8/24/23 12:52, Jay Patel wrote:
> On Fri, 2023-08-11 at 17:43 +0200, Vlastimil Babka wrote:
>> On 8/10/23 19:54, Hyeonggon Yoo wrote:
>> > >                         order = calc_slab_order(size, min_objects,
>> > >                                         slub_max_order, fraction);
>> > > @@ -4159,14 +4164,6 @@ static inline int calculate_order(unsigned int size)
>> > >                 min_objects--;
>> > >         }
>> > > -       /*
>> > > -        * We were unable to place multiple objects in a slab. Now
>> > > -        * lets see if we can place a single object there.
>> > > -        */
>> > > -       order = calc_slab_order(size, 1, slub_max_order, 1);
>> > > -       if (order <= slub_max_order)
>> > > -               return order;
>> > 
>> > I'm not sure if it's okay to remove this?
>> > It was fine in v2 because the least wasteful order was chosen
>> > regardless of fraction but that's not true anymore.
>> > 
>> > Otherwise, everything looks fine to me. I'm too dumb to anticipate
>> > the outcome of increasing the slab order :P but this patch does not
>> > sound crazy to me.
>> 
>> I wanted to have a better idea how the orders change so I hacked up a patch
>> to print them for all sizes up to 1MB (unnecessarily large I guess) and also
>> for various page sizes and nr_cpus (that's however rather invasive and prone
>> to me missing some helper being used that still relies on real PAGE_SHIFT),
>> then I applied v4 (needed some conflict fixups with my hack) on top:
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-orders
>> 
>> As expected, things didn't change with 4k PAGE_SIZE. With 64k PAGE_SIZE, I
>> thought the patch in v4 form would result in lower orders, but it seems
>> not always?
>> 
>> I.e. I can see before the patch:
>> 
>>  Calculated slab orders for page_shift 16 nr_cpus 1:
>>           8       0
>>        4376       1
>> 
>> (so until 4368 bytes it keeps order at 0)
>> 
>> And after:
>>           8       0
>>        2264       1
>>        2272       0
>>        2344       1
>>        2352       0
>>        2432       1
>> 
>> Not sure this kind of "oscillation" is helpful with a small machine (1 CPU)
>> and 64kB pages, where the unused part of the page is quite small.
>> 
> Hi Vlastimil,
>  
> With this patch, fraction_size rises to 32 when a 64k page size is used.
> As a result, the maximum wastage cap for each slab cache will be 2k
> (64k divided by 32). Any object size whose per-slab leftover exceeds this
> cap will be moved to order 1 or beyond, which is why this oscillation is
> seen.

Hi, sorry for the late reply.

>> With 16 cpus, AFAICS the orders are also larger for some sizes.
>> Hm, but you reported a reduction of total slab memory, which suggests lower
>> orders were selected somewhere, so maybe I made some mistake.
> 
> AFAIK total slab memory is reduced for two reasons (with this
> patch, for larger page sizes):
> 1) the order for some slab caches is reduced (by increasing fraction_size)

How can increased fraction_size ever result in a lower order? I think it can
only result in an increased (or the same) order, and the simulations with my
hack patch don't seem to provide a counterexample to that. Note that
previously I expected the order to be lower (or the same) and was surprised
by my results, but now I realize I misunderstood the v4 patch.
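
To make that concrete, here is a minimal model of the waste check I'm talking
about (a sketch only - model_slab_order() is not the kernel function, and I'm
assuming the check has the "leftover <= slab_size / fraction" form used by
calc_slab_order()):

	/*
	 * Accept the first order whose leftover is at most
	 * slab_size / fraction. A larger fraction shrinks the allowed
	 * leftover, so the loop can only break at the same or a higher
	 * order - never a lower one.
	 */
	unsigned int model_slab_order(unsigned int size, unsigned int page_size,
				      unsigned int max_order, unsigned int fraction)
	{
		unsigned int order;

		for (order = 0; order <= max_order; order++) {
			unsigned int slab_size = page_size << order;
			unsigned int rem = slab_size % size;

			if (rem <= slab_size / fraction)
				break;
		}
		return order;
	}

E.g. with 64kB pages and fraction 32 the cap is 2048 bytes: size 2264 leaves
65536 % 2264 = 2144 > 2048 and goes to order 1, while size 2272 leaves only
1920 <= 2048 and stays at order 0 - exactly the oscillation above. With
fraction 16 (cap 4096) both stay at order 0.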

> 2) I have also seen a reduction in the overall number of slabs because of
> the increased page order

I think your results might just be due to randomness and could turn out
different if you repeat the test, or converge to the same if you average
multiple runs. You posted them for "160 CPUs with 64K Page size", and if I
add that combination to my hack print, I see the same result before and
after your patch:

Calculated slab orders for page_shift 16 nr_cpus 160:
         8       0
      1824       1
      3648       2
      7288       3
    174768       2
    196608       3
    524296       4

Still, I might have a bug there. Can you confirm there are actual
differences in /proc/slabinfo before/after your patch? If there are
none, any differences observed have to be due to randomness, not differences
in order.

Going back to the idea behind your patch, I don't think it makes sense to
try to increase the fraction only for higher orders. Yes, with a 1/16
fraction, the waste with a 64kB page can be 4kB, while with 1/32 it will be
just 2kB, and with 4kB pages this is only 256 vs 128 bytes. However, the
object sizes and counts don't differ with page size, so with 4kB pages we'll
have more slabs to host the same number of objects, and the waste will
accumulate accordingly - i.e. the fraction metric should be independent of
page size wrt the resulting total kilobytes of waste.
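
As a back-of-the-envelope illustration (made-up numbers, not measurements):

	#include <stdio.h>

	/*
	 * Rough model: hosting the same total amount of objects needs
	 * proportionally more slabs with smaller pages, so a per-slab
	 * waste cap of slab_size / fraction adds up to the same total
	 * regardless of page size.
	 */
	int main(void)
	{
		unsigned long total = 64UL << 20;	/* 64 MB worth of objects */
		unsigned long fraction = 16;
		unsigned long page_sizes[] = { 4096, 65536 };

		for (int i = 0; i < 2; i++) {
			unsigned long slab = page_sizes[i];
			unsigned long nr_slabs = total / slab;
			unsigned long waste_cap = nr_slabs * (slab / fraction);

			printf("page %2lukB: %5lu slabs, waste cap %lu MB\n",
			       slab >> 10, nr_slabs, waste_cap >> 20);
		}
		return 0;
	}

Both cases end up with the same 4 MB cap (total / fraction), so the fraction
already expresses the total waste independently of page size.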

So maybe the only thing we need to do is to try setting its initial value to
32 instead of 16 regardless of page size. That should hopefully again show a
good tradeoff for 4kB, as one of the earlier versions did, while on 64kB it
shouldn't cause much difference (again, none at all with 160 cpus, some
difference with fewer than 128 cpus, if my simulations were correct).
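
Roughly, I mean something like this in calculate_order() (sketching the loop
from memory, so the exact context lines may differ):

	while (min_objects > 1) {
		unsigned int fraction;

		fraction = 32;	/* was 16: cap waste at 1/32 of the slab first */
		while (fraction >= 4) {
			order = calc_slab_order(size, min_objects,
						slub_max_order, fraction);
			if (order <= slub_max_order)
				return order;
			fraction /= 2;
		}
		min_objects--;
	}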

>> Anyway my point here is that this evaluation approach might be useful, even
>> if it's a non-upstreamable hack, and some postprocessing of the output is
>> needed for easier comparison of before/after, so feel free to try that out.
> 
> Thank you for this detailed test :)
>> 
>> BTW I'll be away for 2 weeks from now, so further feedback will have to come
>> from others in that time...
>> 
> Do we have any additional feedback from others on the same matter?
> 
> Thanks,
> 
> Jay Patel
>> > Thanks!
>> > --
>> > Hyeonggon
> 
> 




Thread overview: 11+ messages
2023-07-20 10:23 [RFC PATCH v4] mm/slub: Optimize slub memory usage Jay Patel
2023-08-10 17:54 ` Hyeonggon Yoo
2023-08-11  6:52   ` Jay Patel
2023-08-18  5:11     ` Hyeonggon Yoo
2023-08-18  6:41       ` Jay Patel
2023-08-11 15:43   ` Vlastimil Babka
2023-08-24 10:52     ` Jay Patel
2023-09-07 13:42       ` Vlastimil Babka [this message]
2023-09-14  5:40         ` Jay Patel
2023-09-14  6:38           ` Vlastimil Babka
2023-09-14 12:43             ` Jay Patel
