linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: jaypatel@linux.ibm.com, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
	tsahu@linux.ibm.com, piyushs@linux.ibm.com
Subject: Re: [RFC PATCH v4] mm/slub: Optimize slub memory usage
Date: Thu, 14 Sep 2023 08:38:57 +0200
Message-ID: <d2a2d3a5-aa8a-0c14-a364-b7084cf10d38@suse.cz>
In-Reply-To: <2e257eb4b3cc76f78619f5b8c9f95462421762d4.camel@linux.ibm.com>

On 9/14/23 07:40, Jay Patel wrote:
> On Thu, 2023-09-07 at 15:42 +0200, Vlastimil Babka wrote:
>> On 8/24/23 12:52, Jay Patel wrote:
>> How can increased fraction_size ever result in a lower order? I think it
>> can only result in an increased order (or the same order). And the
>> simulations with my hack patch don't seem to provide a counterexample to
>> that. Note I previously expected the order to be lower (or the same) and
>> was surprised by my results, but now I realize I misunderstood the v4
>> patch.
> 
> Hi, sorry for the late reply as I was on vacation :)
> 
> You're absolutely right. Increasing the fraction size won't reduce the
> order, and I apologize for any confusion in my previous response.

No problem, glad that it's cleared :)
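
In case it helps to see why: below is a minimal userspace sketch of the core
check, loosely modelled on the order-fitting test in calc_slab_order() (the
real code also enforces min_objects and order limits, which this ignores, and
the object sizes are made up purely for illustration). Because a larger
fract_leftover shrinks the allowed remainder, the first order that passes the
check can only stay the same or grow:

/* order-sim.c - build with: gcc -O2 -o order-sim order-sim.c */
#include <stdio.h>

#define PAGE_SIZE	65536UL		/* assuming 64K pages as on your system */
#define MAX_ORDER	4

/*
 * Simplified stand-in for the order-fitting check: pick the first order
 * whose leftover space is no more than slab_size / fract_leftover.
 */
static int calc_order(unsigned long size, unsigned int fract_leftover)
{
	for (int order = 0; order <= MAX_ORDER; order++) {
		unsigned long slab_size = PAGE_SIZE << order;
		unsigned long rem = slab_size % size;

		if (rem <= slab_size / fract_leftover)
			return order;
	}
	return MAX_ORDER;
}

int main(void)
{
	unsigned long sizes[] = { 3000, 5000 };	/* made-up object sizes */

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("size %5lu: order %d with 1/16, order %d with 1/32\n",
		       sizes[i], calc_order(sizes[i], 16),
		       calc_order(sizes[i], 32));
	return 0;
}

With these inputs, the 3000-byte object fits order 0 at 1/16 but needs order 1
at 1/32, while the 5000-byte object stays at order 0 for both - the order
never goes down when the fraction grows.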

>> 
>> > 2) Have also seen a reduction in overall slab cache numbers because
>> > of the increased page order
>> 
>> I think your results might be just due to randomness and could turn out
>> different with repeating the test, or converge to be the same if you
>> average multiple runs. You posted them for "160 CPUs with 64K Page size"
>> and if I add that combination to my hack print, I see the same result
>> before and after your patch:
>> 
>> Calculated slab orders for page_shift 16 nr_cpus 160:
>>          8       0
>>       1824       1
>>       3648       2
>>       7288       3
>>     174768       2
>>     196608       3
>>     524296       4
>> 
>> Still, I might have a bug there. Can you confirm there are actual
>> differences with a /proc/slabinfo before/after your patch? If there are
>> none, any differences observed have to be due to randomness, not
>> differences in order.
> 
> Indeed, to eliminate randomness, I've consistently gathered data from
> /proc/slabinfo, and I can confirm a decrease in the total number of
> slab caches. 
> 
> Values on a 160 CPU system with 64K page size:
> Without patch: 24892 slab caches
> With patch:    23891 slab caches

I would like to see why exactly they decreased; given what the patch does, it
has to be due to getting higher-order slab pages. So the values of the
"<objperslab> <pagesperslab>" columns should increase for some caches -
which ones, and what is their <objsize>?
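
If it helps, a throwaway helper along these lines (just a sketch, not part of
any patch) can dump exactly those columns from /proc/slabinfo; running it
before and after the patch and diffing the two outputs should show which
caches actually changed geometry:

/* slabcols.c - build with: gcc -O2 -o slabcols slabcols.c
 * usage: ./slabcols < /proc/slabinfo > before.txt   (repeat for after.txt)
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[512], name[64];
	unsigned long active, total;
	unsigned int objsize, objperslab, pagesperslab;

	while (fgets(line, sizeof(line), stdin)) {
		/* skip the version line and the "# name ..." header */
		if (line[0] == '#' || !strncmp(line, "slabinfo", 8))
			continue;
		if (sscanf(line, "%63s %lu %lu %u %u %u", name, &active,
			   &total, &objsize, &objperslab, &pagesperslab) == 6)
			printf("%-24s objsize=%u objperslab=%u pagesperslab=%u\n",
			       name, objsize, objperslab, pagesperslab);
	}
	return 0;
}

"diff before.txt after.txt" should then list only the caches whose
<objperslab>/<pagesperslab> moved, together with their <objsize>.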

>> 
>> Going back to the idea behind your patch, I don't think it makes sense
>> to try to increase the fraction only for higher orders. Yes, with a 1/16
>> fraction the waste with a 64kB page can be 4kB, while with 1/32 it will
>> be just 2kB, and with 4kB pages this is only 256 vs 128 bytes. However,
>> the object sizes and counts don't differ with page size, so with 4kB
>> pages we'll have more slabs to host the same number of objects, and the
>> waste will accumulate accordingly - i.e. the fraction metric should be
>> independent of page size wrt the resulting total kilobytes of waste.
>> 
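
To put rough numbers on that (just an illustration, assuming order-0 slabs):
with a 64kB page a 1/16 fraction allows up to 4kB of waste per slab, while
with 4kB pages it allows only 256 bytes per slab - but you need roughly 16
times as many 4kB slabs to hold the same objects, so the total waste ceiling
is again about 4kB per 64kB of slab memory. Either way the cap is the same
~6% of the memory used, which is why the fraction can be chosen independently
of page size.
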
>> So maybe the only thing we need to do is to try setting the initial
>> value to 32 instead of 16 regardless of page size. That should hopefully
>> again show a good tradeoff for 4kB as one of the earlier versions did,
>> while on 64kB it shouldn't cause much difference (again, none at all
>> with 160 cpus, some difference with less than 128 cpus, if my
>> simulations were correct).
>> 
> Yes, we can modify the default fraction size to 32 for all page sizes.
> I've noticed that on a 160 CPU system with a 64K page size, the total
> memory allocated for slabs noticeably decreases.
> 
> Alright, I'll make the necessary changes to the patch, setting the
> fraction size default to 32, and I'll post v5 along with some
> performance metrics.

Could you please also check my cleanup series at

https://lore.kernel.org/all/20230908145302.30320-6-vbabka@suse.cz/

(I did Cc you there). If it makes sense, I'd like to apply the further
optimization on top of those cleanups, not the other way around.

Thanks!

>>  
>> > > Anyway my point here is that this evaluation approach might be
>> > > useful, even if it's a non-upstreamable hack, and some postprocessing
>> > > of the output is needed for easier comparison of before/after, so
>> > > feel free to try that out.
>> > 
>> > Thank you for this detailed test :)
>> > > BTW I'll be away for 2 weeks from now, so further feedback will
>> > > have to come from others in that time...
>> > > 
>> > Do we have any additional feedback from others on the same matter?
>> > 
>> > Thanks
>> > 
>> > Jay Patel
>> > > > Thanks!
>> > > > --
>> > > > Hyeonggon
> 




Thread overview: 11+ messages
2023-07-20 10:23 [RFC PATCH v4] mm/slub: Optimize slub memory usage Jay Patel
2023-08-10 17:54 ` Hyeonggon Yoo
2023-08-11  6:52   ` Jay Patel
2023-08-18  5:11     ` Hyeonggon Yoo
2023-08-18  6:41       ` Jay Patel
2023-08-11 15:43   ` Vlastimil Babka
2023-08-24 10:52     ` Jay Patel
2023-09-07 13:42       ` Vlastimil Babka
2023-09-14  5:40         ` Jay Patel
2023-09-14  6:38           ` Vlastimil Babka [this message]
2023-09-14 12:43             ` Jay Patel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d2a2d3a5-aa8a-0c14-a364-b7084cf10d38@suse.cz \
    --to=vbabka@suse.cz \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=jaypatel@linux.ibm.com \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=piyushs@linux.ibm.com \
    --cc=rientjes@google.com \
    --cc=tsahu@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.