* [PATCH] mm/slub: embed __slab_alloc to its caller
From: Abel Wu @ 2021-02-02 8:05 UTC
To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
Andrew Morton, Vlastimil Babka
Cc: hewenliang4, wuyun.wu, Abel Wu, linux-mm, linux-kernel
Since slab_alloc_node() is the only caller of __slab_alloc(), embed
__slab_alloc() into its caller to save the function call overhead. This
slightly increases the caller's code size, but hackbench tests on both
host and guest showed no difference with or without this patch.

Also rename ___slab_alloc() to __slab_alloc().
Signed-off-by: Abel Wu <abel.w@icloud.com>
---
mm/slub.c | 46 ++++++++++++++++------------------------------
1 file changed, 16 insertions(+), 30 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 7ecbbbe5bc0c..0f69d2d0471a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2654,10 +2654,9 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
  * we need to allocate a new slab. This is the slowest path since it involves
  * a call to the page allocator and the setup of a new slab.
  *
- * Version of __slab_alloc to use when we know that interrupts are
- * already disabled (which is the case for bulk allocation).
+ * Must be called with interrupts disabled.
  */
-static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
+static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
 	void *freelist;
@@ -2758,31 +2757,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	return freelist;
 }
 
-/*
- * Another one that disabled interrupt and compensates for possible
- * cpu changes by refetching the per cpu area pointer.
- */
-static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			  unsigned long addr, struct kmem_cache_cpu *c)
-{
-	void *p;
-	unsigned long flags;
-
-	local_irq_save(flags);
-#ifdef CONFIG_PREEMPTION
-	/*
-	 * We may have been preempted and rescheduled on a different
-	 * cpu before disabling interrupts. Need to reload cpu area
-	 * pointer.
-	 */
-	c = this_cpu_ptr(s->cpu_slab);
-#endif
-
-	p = ___slab_alloc(s, gfpflags, node, addr, c);
-	local_irq_restore(flags);
-	return p;
-}
-
 /*
  * If the object has been wiped upon free, make sure it's fully initialized by
  * zeroing out freelist pointer.
@@ -2854,7 +2828,19 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 	object = c->freelist;
 	page = c->page;
 	if (unlikely(!object || !page || !node_match(page, node))) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+#ifdef CONFIG_PREEMPTION
+		/*
+		 * We may have been preempted and rescheduled on a different
+		 * cpu before disabling interrupts. Need to reload cpu area
+		 * pointer.
+		 */
+		c = this_cpu_ptr(s->cpu_slab);
+#endif
 		object = __slab_alloc(s, gfpflags, node, addr, c);
+		local_irq_restore(flags);
 	} else {
 		void *next_object = get_freepointer_safe(s, object);
 
@@ -3299,7 +3285,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			 * We may have removed an object from c->freelist using
 			 * the fastpath in the previous iteration; in that case,
 			 * c->tid has not been bumped yet.
-			 * Since ___slab_alloc() may reenable interrupts while
+			 * Since __slab_alloc() may reenable interrupts while
 			 * allocating memory, we should bump c->tid now.
 			 */
 			c->tid = next_tid(c->tid);
@@ -3308,7 +3294,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			 * Invoking slow path likely have side-effect
 			 * of re-populating per CPU c->freelist
 			 */
-			p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
+			p[i] = __slab_alloc(s, flags, NUMA_NO_NODE,
 					    _RET_IP_, c);
 			if (unlikely(!p[i]))
 				goto error;
--
2.27.0
* Re: [PATCH] mm/slub: embed __slab_alloc to its caller
From: Christoph Lameter @ 2021-02-02 10:11 UTC
To: Abel Wu
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Vlastimil Babka, hewenliang4, wuyun.wu, linux-mm, linux-kernel
On Tue, 2 Feb 2021, Abel Wu wrote:
> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
> __slab_alloc() to its caller to save function call overhead. This
> will also expand the caller's code block size a bit, but hackbench
> tests on both host and guest didn't show a difference w/ or w/o
> this patch.
slab_alloc_node is an always_inline function. It is intentional that only
the fast path was inlined and not the slow path.
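
The split described here can be sketched in userspace C. This is a toy illustration under stated assumptions, not the SLUB code: `toy_cache`, `toy_alloc()`, `toy_alloc_slow()` and the two entry points are hypothetical names, and the attributes are the GCC/Clang spellings rather than the kernel's own macros. The short fast path is forced inline into every entry point, while the larger slow path stays a single out-of-line function that each entry point merely calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy per-CPU cache; not the kernel's struct kmem_cache_cpu. */
struct toy_cache {
	void *freelist[4];		/* objects ready to hand out */
	size_t nr_free;			/* valid entries in freelist */
	unsigned long slow_calls;	/* how often the slow path ran */
};

/* Hypothetical backing store standing in for the page allocator. */
static char toy_backing[16][64];
static size_t toy_next;

/*
 * Slow path: kept out of line on purpose, so its code exists once
 * no matter how many entry points inline the fast path.
 */
static __attribute__((noinline)) void *toy_alloc_slow(struct toy_cache *c)
{
	assert(c->nr_free == 0);	/* only reached when the cache is empty */
	c->slow_calls++;

	/* Refill the freelist from the backing store. */
	while (c->nr_free < 4 && toy_next < 16)
		c->freelist[c->nr_free++] = toy_backing[toy_next++];

	if (c->nr_free == 0)
		return NULL;		/* backing store exhausted */
	return c->freelist[--c->nr_free];
}

/* Fast path: small enough that duplicating it per caller is cheap. */
static inline __attribute__((always_inline)) void *toy_alloc(struct toy_cache *c)
{
	if (__builtin_expect(c->nr_free == 0, 0))
		return toy_alloc_slow(c);
	return c->freelist[--c->nr_free];
}

/* Two entry points; each receives its own copy of the fast path only. */
void *toy_entry_a(struct toy_cache *c) { return toy_alloc(c); }
void *toy_entry_b(struct toy_cache *c) { return toy_alloc(c); }
```

Moving the body of `toy_alloc_slow()` into `toy_alloc()` would copy it into both `toy_entry_a()` and `toy_entry_b()`, which is the effect the patch would have on the real allocation entry points.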
* Re: [PATCH] mm/slub: embed __slab_alloc to its caller
From: Abel Wu @ 2021-02-03 1:41 UTC
To: Christoph Lameter
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
Vlastimil Babka, hewenliang4, wuyun.wu, linux-mm, linux-kernel
> On Feb 2, 2021, at 6:11 PM, Christoph Lameter <cl@linux.com> wrote:
>
> On Tue, 2 Feb 2021, Abel Wu wrote:
>
>> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
>> __slab_alloc() to its caller to save function call overhead. This
>> will also expand the caller's code block size a bit, but hackbench
>> tests on both host and guest didn't show a difference w/ or w/o
>> this patch.
>
> slab_alloc_node is an always_inline function. It is intentional that only
> the fast path was inlined and not the slow path.
Oh I got it. Thanks for your excellent explanation.
Best Regards,
Abel
* Re: [PATCH] mm/slub: embed __slab_alloc to its caller
From: Vlastimil Babka @ 2021-02-05 13:03 UTC
To: Abel Wu, Christoph Lameter
Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
hewenliang4, wuyun.wu, linux-mm, linux-kernel
On 2/3/21 2:41 AM, Abel Wu wrote:
>> On Feb 2, 2021, at 6:11 PM, Christoph Lameter <cl@linux.com> wrote:
>>
>> On Tue, 2 Feb 2021, Abel Wu wrote:
>>
>>> Since slab_alloc_node() is the only caller of __slab_alloc(), embed
>>> __slab_alloc() to its caller to save function call overhead. This
>>> will also expand the caller's code block size a bit, but hackbench
>>> tests on both host and guest didn't show a difference w/ or w/o
>>> this patch.
>>
>> slab_alloc_node is an always_inline function. It is intentional that only
>> the fast path was inlined and not the slow path.
>
> Oh I got it. Thanks for your excellent explanation.
BTW, there's a script in the Linux source to nicely see the effect of such changes:
./scripts/bloat-o-meter slub.o.before mm/slub.o
add/remove: 0/1 grow/shrink: 9/0 up/down: 1660/-1130 (530)
Function                                     old     new   delta
__slab_alloc                                 127    1130   +1003
__kmalloc_track_caller                       877     965     +88
__kmalloc                                    878     966     +88
kmem_cache_alloc                             778     862     +84
__kmalloc_node_track_caller                  996    1080     +84
kmem_cache_alloc_node_trace                  813     896     +83
kmem_cache_alloc_node                        800     881     +81
kmem_cache_alloc_trace                       786     862     +76
__kmalloc_node                               998    1071     +73
___slab_alloc                               1130       -   -1130
Total: Before=57782, After=58312, chg +0.92%
And yeah, bloating all the entry points wouldn't be nice.
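
The arithmetic behind the table can be checked with a small C sketch. The sizes are copied from the bloat-o-meter output above; the struct and function names (`sym_delta`, `total_up()`, `total_down()`, `entry_point_growth()`) are hypothetical helpers made up for this illustration and are not part of the kernel or of the script.

```c
#include <stddef.h>
#include <string.h>

/* One row of the bloat-o-meter table above; sizes are in bytes. */
struct sym_delta {
	const char *name;
	long old_size;	/* 0 if the symbol did not exist before */
	long new_size;	/* 0 if the symbol was removed */
};

static const struct sym_delta slub_rows[] = {
	{ "__slab_alloc",                 127, 1130 },
	{ "__kmalloc_track_caller",       877,  965 },
	{ "__kmalloc",                    878,  966 },
	{ "kmem_cache_alloc",             778,  862 },
	{ "__kmalloc_node_track_caller",  996, 1080 },
	{ "kmem_cache_alloc_node_trace",  813,  896 },
	{ "kmem_cache_alloc_node",        800,  881 },
	{ "kmem_cache_alloc_trace",       786,  862 },
	{ "__kmalloc_node",               998, 1071 },
	{ "___slab_alloc",               1130,    0 },
};

#define NR_ROWS (sizeof(slub_rows) / sizeof(slub_rows[0]))

/* Sum of all positive deltas: the "up" figure in the summary line. */
long total_up(void)
{
	long sum = 0;

	for (size_t i = 0; i < NR_ROWS; i++) {
		long d = slub_rows[i].new_size - slub_rows[i].old_size;

		if (d > 0)
			sum += d;
	}
	return sum;
}

/* Sum of all negative deltas: the "down" figure in the summary line. */
long total_down(void)
{
	long sum = 0;

	for (size_t i = 0; i < NR_ROWS; i++) {
		long d = slub_rows[i].new_size - slub_rows[i].old_size;

		if (d < 0)
			sum += d;
	}
	return sum;
}

/*
 * Growth of the allocation entry points alone: every grown symbol
 * except __slab_alloc, whose change is only the rename.
 */
long entry_point_growth(void)
{
	long sum = 0;

	for (size_t i = 0; i < NR_ROWS; i++) {
		long d = slub_rows[i].new_size - slub_rows[i].old_size;

		if (d > 0 && strcmp(slub_rows[i].name, "__slab_alloc") != 0)
			sum += d;
	}
	return sum;
}
```

`total_up() + total_down()` reproduces the net 530 bytes. `entry_point_growth()` gives 657 bytes spread over eight entry points, about 82 bytes each on average, while the rename pair (`__slab_alloc` +1003, `___slab_alloc` -1130) shows that removing the old wrapper saves only 127 bytes.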
Thanks,
Vlastimil