From: Jay Patel <jaypatel@linux.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org,
rientjes@google.com, iamjoonsoo.kim@lge.com,
akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
tsahu@linux.ibm.com, piyushs@linux.ibm.com
Subject: Re: [RFC PATCH v4] mm/slub: Optimize slub memory usage
Date: Thu, 14 Sep 2023 11:10:10 +0530 [thread overview]
Message-ID: <2e257eb4b3cc76f78619f5b8c9f95462421762d4.camel@linux.ibm.com> (raw)
In-Reply-To: <fc2752e5-0e9f-3106-f3bd-0e7631f9d23c@suse.cz>
On Thu, 2023-09-07 at 15:42 +0200, Vlastimil Babka wrote:
> On 8/24/23 12:52, Jay Patel wrote:
> > On Fri, 2023-08-11 at 17:43 +0200, Vlastimil Babka wrote:
> > > On 8/10/23 19:54, Hyeonggon Yoo wrote:
> > > > >  		order = calc_slab_order(size, min_objects,
> > > > >  				slub_max_order, fraction);
> > > > > @@ -4159,14 +4164,6 @@ static inline int calculate_order(unsigned int size)
> > > > >  		min_objects--;
> > > > >  	}
> > > > > -	/*
> > > > > -	 * We were unable to place multiple objects in a slab. Now
> > > > > -	 * lets see if we can place a single object there.
> > > > > -	 */
> > > > > -	order = calc_slab_order(size, 1, slub_max_order, 1);
> > > > > -	if (order <= slub_max_order)
> > > > > -		return order;
> > > >
> > > > I'm not sure if it's okay to remove this?
> > > > It was fine in v2 because the least wasteful order was chosen
> > > > regardless of fraction but that's not true anymore.
> > > >
> > > > Otherwise, everything looks fine to me. I'm too dumb to
> > > > anticipate the outcome of increasing the slab order :P but this
> > > > patch does not sound crazy to me.
> > >
> > > I wanted to have a better idea how the orders change so I hacked
> > > up a patch to print them for all sizes up to 1MB (unnecessarily
> > > large I guess) and also for various page sizes and nr_cpus (that's
> > > however rather invasive and prone to me missing some helper being
> > > used that still relies on real PAGE_SHIFT), then I applied v4
> > > (needed some conflict fixups with my hack) on top:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-orders
> > >
> > > As expected, things didn't change with 4k PAGE_SIZE. With 64k
> > > PAGE_SIZE, I thought the patch in v4 form would result in lower
> > > orders, but seems not always?
> > >
> > > I.e. I can see before the patch:
> > >
> > > Calculated slab orders for page_shift 16 nr_cpus 1:
> > > 8 0
> > > 4376 1
> > >
> > > (so until 4368 bytes it keeps order at 0)
> > >
> > > And after:
> > > 8 0
> > > 2264 1
> > > 2272 0
> > > 2344 1
> > > 2352 0
> > > 2432 1
> > >
> > > Not sure this kind of "oscillation" is helpful with a small
> > > machine (1 CPU) and 64kB pages, so the unused part of the page
> > > is quite small.
> > >
> > Hi Vlastimil,
> >
With the patch, fraction_size rises to 32 when a 64k page size is
used. As a result, the maximum wastage cap for each slab cache is 2k
(64k divided by 32). Any object size whose per-slab remainder exceeds
this cap is moved to order 1 or beyond, which is why this oscillation
is seen.
>
> Hi, sorry for the late reply.
>
> > > With 16 cpus, AFAICS the orders are also larger for some sizes.
> > > Hm but you reported reduction of total slab memory which suggests
> > > lower orders were selected somewhere, so maybe I did some mistake.
> >
> > AFAIK total slab memory is reduced because of two reasons (with
> > this patch, for larger page sizes):
> > 1) the order for some slab caches is reduced (by increasing
> > fraction_size)
>
> How can increased fraction_size ever result in a lower order? I
> think it can only result in an increased order (or the same order).
> And the simulations with my hack patch don't seem to provide a
> counterexample to that. Note previously I did expect the order to be
> lower (or same) and was surprised by my results, but now I realized
> I had misunderstood the v4 patch.
Hi, sorry for the late reply, I was on vacation :)
You're absolutely right: increasing the fraction size won't reduce
the order. I apologize for the confusion in my previous response.
>
> > 2) I have also seen a reduction in the overall number of slabs
> > because of increased page order
>
> I think your results might be just due to randomness and could turn
> out differently with repeated testing, or converge to be the same if
> you average multiple runs. You posted them for "160 CPUs with 64K
> Page size" and if I add that combination to my hack print, I see the
> same result before and after your patch:
>
> Calculated slab orders for page_shift 16 nr_cpus 160:
> 8 0
> 1824 1
> 3648 2
> 7288 3
> 174768 2
> 196608 3
> 524296 4
>
> Still, I might have a bug there. Can you confirm there are actual
> differences in /proc/slabinfo before/after your patch? If there are
> none, any differences observed have to be due to randomness, not to
> differences in order.
Indeed, to rule out randomness, I have consistently gathered data
from /proc/slabinfo, and I can confirm a decrease in the total number
of slabs.

Values on a 160 CPU system with 64k page size:
Without patch: 24892 slabs
With patch:    23891 slabs
>
> Going back to the idea behind your patch, I don't think it makes
> sense to try to increase the fraction only for higher orders. Yes,
> with a 1/16 fraction, the waste with a 64kB page can be 4kB, while
> with 1/32 it will be just 2kB, and with 4kB pages it is only 256 vs
> 128 bytes. However, the object sizes and counts don't differ with
> page size, so with 4kB pages we'll have more slabs to host the same
> number of objects, and the waste will accumulate accordingly, i.e.
> the fraction metric should be independent of page size wrt the
> resulting total kilobytes of waste.
>
> So maybe the only thing we need to do is to try setting the initial
> value to 32 instead of 16 regardless of page size. That should
> hopefully again show a good tradeoff for 4kB, as one of the earlier
> versions did, while on 64kB it shouldn't cause much difference
> (again, none at all with 160 cpus, some difference with fewer than
> 128 cpus, if my simulations were correct).
>
Yes, we can modify the default fraction size to 32 for all page
sizes. I've noticed that on a 160 CPU system with a 64K page size,
the total memory allocated for slabs decreases noticeably. I'll make
the necessary changes to the patch, setting the default fraction size
to 32, and post v5 along with some performance metrics.
>
> > > Anyway my point here is that this evaluation approach might be
> > > useful, even if it's a non-upstreamable hack, and some
> > > postprocessing of the output is needed for easier comparison of
> > > before/after, so feel free to try that out.
> >
> > Thank you for this detailed test :)
> > > BTW I'll be away for 2 weeks from now, so further feedback will
> > > have
> > > to come
> > > from others in that time...
> > >
> > Do we have any additional feedback from others on the same matter?
> >
> > Thanks,
> >
> > Jay Patel
> > > > Thanks!
> > > > --
> > > > Hyeonggon