From: Jay Patel <jaypatel@linux.ibm.com>
To: Vlastimil Babka <vbabka@suse.cz>, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, cl@linux.com, penberg@kernel.org,
rientjes@google.com, iamjoonsoo.kim@lge.com,
akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
tsahu@linux.ibm.com, piyushs@linux.ibm.com
Subject: Re: [RFC PATCH v4] mm/slub: Optimize slub memory usage
Date: Thu, 14 Sep 2023 11:10:10 +0530 [thread overview]
Message-ID: <2e257eb4b3cc76f78619f5b8c9f95462421762d4.camel@linux.ibm.com> (raw)
In-Reply-To: <fc2752e5-0e9f-3106-f3bd-0e7631f9d23c@suse.cz>
On Thu, 2023-09-07 at 15:42 +0200, Vlastimil Babka wrote:
> On 8/24/23 12:52, Jay Patel wrote:
> > On Fri, 2023-08-11 at 17:43 +0200, Vlastimil Babka wrote:
> > > On 8/10/23 19:54, Hyeonggon Yoo wrote:
> > > > >  		order = calc_slab_order(size, min_objects,
> > > > >  				slub_max_order, fraction);
> > > > > @@ -4159,14 +4164,6 @@ static inline int calculate_order(unsigned int size)
> > > > >  		min_objects--;
> > > > >  	}
> > > > > -	/*
> > > > > -	 * We were unable to place multiple objects in a slab. Now
> > > > > -	 * lets see if we can place a single object there.
> > > > > -	 */
> > > > > -	order = calc_slab_order(size, 1, slub_max_order, 1);
> > > > > -	if (order <= slub_max_order)
> > > > > -		return order;
> > > >
> > > > I'm not sure if it's okay to remove this?
> > > > It was fine in v2 because the least wasteful order was chosen
> > > > regardless of fraction but that's not true anymore.
> > > >
> > > > Otherwise, everything looks fine to me. I'm too dumb to
> > > > anticipate the outcome of increasing the slab order :P but this
> > > > patch does not sound crazy to me.
> > >
> > > I wanted to have a better idea how the orders change so I hacked
> > > up a patch to print them for all sizes up to 1MB (unnecessarily
> > > large I guess) and also for various page sizes and nr_cpus (that's
> > > however rather invasive and prone to me missing some helper being
> > > used that still relies on real PAGE_SHIFT), then I applied v4
> > > (needed some conflict fixups with my hack) on top:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-orders
> > >
> > > As expected, things didn't change with 4k PAGE_SIZE. With 64k
> > > PAGE_SIZE, I thought the patch in v4 form would result in lower
> > > orders, but seems not always?
> > >
> > > I.e. I can see before the patch:
> > >
> > > Calculated slab orders for page_shift 16 nr_cpus 1:
> > > 8 0
> > > 4376 1
> > >
> > > (so until 4368 bytes it keeps order at 0)
> > >
> > > And after:
> > > 8 0
> > > 2264 1
> > > 2272 0
> > > 2344 1
> > > 2352 0
> > > 2432 1
> > >
> > > Not sure this kind of "oscillation" is helpful with a small
> > > machine (1 CPU) and 64kB pages, so the unused part of the page
> > > is quite small.
> > >
> > Hi Vlastimil,
> >
With the patch, fraction_size rises to 32 when a 64k page size is
used. As a result, the maximum wastage cap for each slab cache is 2k
(64k divided by 32). Any object size whose per-slab remainder exceeds
this cap is moved to order 1 or beyond, which is why this oscillation
is seen.
>
> Hi, sorry for the late reply.
>
> > > With 16 cpus, AFAICS the orders are also larger for some sizes.
> > > Hm but you reported reduction of total slab memory which suggests
> > > lower orders were selected somewhere, so maybe I did some mistake.
> >
> > AFAIK total slab memory is reduced because of two reasons (with
> > this patch, for larger page sizes):
> > 1) the order for some slab caches is reduced (by increasing
> > fraction_size)
>
> How can increased fraction_size ever result in a lower order? I
> think it can only result in an increased order (or the same order).
> And the simulations with my hack patch don't seem to provide a
> counterexample to that. Note previously I did expect the order to be
> lower (or same) and was surprised by my results, but now I realized
> I had misunderstood the v4 patch.
Hi, sorry for the late reply, I was on vacation :)
You're absolutely right: increasing the fraction size won't reduce
the order. I apologize for the confusion in my previous response.
>
> > 2) I have also seen a reduction in the overall number of slabs
> > because of increased page order
>
> I think your results might be just due to randomness and could turn
> out differently with repeated testing, or converge to be the same if
> you average multiple runs. You posted them for "160 CPUs with 64K
> Page size" and if I add that combination to my hack print, I see the
> same result before and after your patch:
>
> Calculated slab orders for page_shift 16 nr_cpus 160:
> 8 0
> 1824 1
> 3648 2
> 7288 3
> 174768 2
> 196608 3
> 524296 4
>
> Still, I might have a bug there. Can you confirm there are actual
> differences in /proc/slabinfo before/after your patch? If there are
> none, any differences observed have to be due to randomness, not to
> differences in order.
Indeed, to rule out randomness, I have consistently gathered data
from /proc/slabinfo, and I can confirm a decrease in the total number
of slabs.

Values on a 160 CPU system with 64k page size:
Without patch: 24892 slabs
With patch:    23891 slabs
>
> Going back to the idea behind your patch, I don't think it makes
> sense to try to increase the fraction only for higher orders. Yes,
> with a 1/16 fraction, the waste with a 64kB page can be 4kB, while
> with 1/32 it will be just 2kB, and with 4kB pages it is only 256 vs
> 128 bytes. However, the object sizes and counts don't differ with
> page size, so with 4kB pages we'll have more slabs to host the same
> number of objects, and the waste will accumulate accordingly, i.e.
> the fraction metric should be independent of page size wrt the
> resulting total kilobytes of waste.
>
> So maybe the only thing we need to do is to try setting the initial
> value to 32 instead of 16 regardless of page size. That should
> hopefully again show a good tradeoff for 4kB, as one of the earlier
> versions did, while on 64kB it shouldn't cause much difference
> (again, none at all with 160 cpus, some difference with fewer than
> 128 cpus, if my simulations were correct).
>
Yes, we can modify the default fraction size to 32 for all page
sizes. I've noticed that on a 160 CPU system with a 64K page size,
the total memory allocated for slabs decreases noticeably. I'll make
the necessary changes to the patch, setting the default fraction size
to 32, and post v5 along with some performance metrics.
>
> > > Anyway my point here is that this evaluation approach might be
> > > useful, even if it's a non-upstreamable hack, and some
> > > postprocessing of the output is needed for easier comparison of
> > > before/after, so feel free to try that out.
> >
> > Thank you for this detailed test :)
> > > BTW I'll be away for 2 weeks from now, so further feedback will
> > > have
> > > to come
> > > from others in that time...
> > >
> > Do we have any additional feedback from others on the same matter?
> >
> > Thanks,
> >
> > Jay Patel
> > > > Thanks!
> > > > --
> > > > Hyeonggon