* [PATCH 0/3] mm/slub: some minor optimization and cleanup
@ 2024-01-17 11:45 Chengming Zhou
  2024-01-17 11:45 ` [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case Chengming Zhou
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-01-17 11:45 UTC (permalink / raw)
  To: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Christoph Lameter,
	Pekka Enberg, Andrew Morton, Roman Gushchin, David Rientjes
  Cc: linux-mm, Chengming Zhou, linux-kernel

Hi,

This series includes a minor optimization of the cpu partial slab fastpath,
which directly loads the freelist from the cpu partial slab in the likely case.

It shows a small performance improvement in testing with:
perf bench sched messaging -g 5 -t -l 100000

            mm-stable   slub-optimize
Total time      7.473    7.209

The other two patches are cleanups, which are included for convenience.

Thanks for your review and comments!

To: Christoph Lameter <cl@linux.com>
To: Pekka Enberg <penberg@kernel.org>
To: David Rientjes <rientjes@google.com>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Vlastimil Babka <vbabka@suse.cz>
To: Roman Gushchin <roman.gushchin@linux.dev>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

---
Chengming Zhou (3):
      mm/slub: directly load freelist from cpu partial slab in the likely case
      mm/slub: remove full list manipulation for non-debug slab
      mm/slub: remove unused parameter in next_freelist_entry()

 mm/slub.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)
---
base-commit: ab27740f76654ed58dd32ac0ba0031c18a6dea3b
change-id: 20240117-slab-misc-5a5f37a51257

Best regards,
-- 
Chengming Zhou <zhouchengming@bytedance.com>


* [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-17 11:45 [PATCH 0/3] mm/slub: some minor optimization and cleanup Chengming Zhou
@ 2024-01-17 11:45 ` Chengming Zhou
  2024-01-17 22:41   ` Christoph Lameter (Ampere)
  2024-01-17 11:45 ` [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab Chengming Zhou
  2024-01-17 11:46 ` [PATCH 3/3] mm/slub: remove unused parameter in next_freelist_entry() Chengming Zhou
  2 siblings, 1 reply; 16+ messages in thread
From: Chengming Zhou @ 2024-01-17 11:45 UTC (permalink / raw)
  To: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Christoph Lameter,
	Pekka Enberg, Andrew Morton, Roman Gushchin, David Rientjes
  Cc: linux-mm, Chengming Zhou, linux-kernel

The likely case is that we get a usable slab from the cpu partial list,
so we can directly load its freelist and return, instead of taking the
other path that needs more work, like re-enabling interrupts and rechecking.

But we need to remove the "VM_BUG_ON(!new.frozen)" from get_freelist()
in order to reuse it, since a cpu partial slab is not frozen. That seems
acceptable since the check is only for debugging purposes.

There is a small performance improvement too, as shown by:
perf bench sched messaging -g 5 -t -l 100000

            mm-stable   slub-optimize
Total time      7.473    7.209

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..20c03555c97b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3326,7 +3326,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
 		counters = slab->counters;
 
 		new.counters = counters;
-		VM_BUG_ON(!new.frozen);
 
 		new.inuse = slab->objects;
 		new.frozen = freelist != NULL;
@@ -3498,18 +3497,19 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 		slab = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, slab);
-		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
-		stat(s, CPU_PARTIAL_ALLOC);
 
-		if (unlikely(!node_match(slab, node) ||
-			     !pfmemalloc_match(slab, gfpflags))) {
-			slab->next = NULL;
-			__put_partials(s, slab);
-			continue;
+		if (likely(node_match(slab, node) &&
+			   pfmemalloc_match(slab, gfpflags))) {
+			c->slab = slab;
+			freelist = get_freelist(s, slab);
+			stat(s, CPU_PARTIAL_ALLOC);
+			goto load_freelist;
 		}
 
-		freelist = freeze_slab(s, slab);
-		goto retry_load_slab;
+		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
+
+		slab->next = NULL;
+		__put_partials(s, slab);
 	}
 #endif
 

-- 
b4 0.10.1


* [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab
  2024-01-17 11:45 [PATCH 0/3] mm/slub: some minor optimization and cleanup Chengming Zhou
  2024-01-17 11:45 ` [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case Chengming Zhou
@ 2024-01-17 11:45 ` Chengming Zhou
  2024-01-17 22:44   ` Christoph Lameter (Ampere)
  2024-01-23  8:45   ` Vlastimil Babka
  2024-01-17 11:46 ` [PATCH 3/3] mm/slub: remove unused parameter in next_freelist_entry() Chengming Zhou
  2 siblings, 2 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-01-17 11:45 UTC (permalink / raw)
  To: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Christoph Lameter,
	Pekka Enberg, Andrew Morton, Roman Gushchin, David Rientjes
  Cc: linux-mm, Chengming Zhou, linux-kernel

Since debug slabs are processed by free_to_partial_list(), and only debug
slabs with the SLAB_STORE_USER flag care about the full list, we can
remove these unrelated full list manipulations from __slab_free().

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 20c03555c97b..f0307e8b4cd2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4187,7 +4187,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 	 * then add it.
 	 */
 	if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) {
-		remove_full(s, n, slab);
 		add_partial(n, slab, DEACTIVATE_TO_TAIL);
 		stat(s, FREE_ADD_PARTIAL);
 	}
@@ -4201,9 +4200,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 		 */
 		remove_partial(n, slab);
 		stat(s, FREE_REMOVE_PARTIAL);
-	} else {
-		/* Slab must be on the full list */
-		remove_full(s, n, slab);
 	}
 
 	spin_unlock_irqrestore(&n->list_lock, flags);

-- 
b4 0.10.1


* [PATCH 3/3] mm/slub: remove unused parameter in next_freelist_entry()
  2024-01-17 11:45 [PATCH 0/3] mm/slub: some minor optimization and cleanup Chengming Zhou
  2024-01-17 11:45 ` [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case Chengming Zhou
  2024-01-17 11:45 ` [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab Chengming Zhou
@ 2024-01-17 11:46 ` Chengming Zhou
  2024-01-23  8:47   ` Vlastimil Babka
  2 siblings, 1 reply; 16+ messages in thread
From: Chengming Zhou @ 2024-01-17 11:46 UTC (permalink / raw)
  To: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Christoph Lameter,
	Pekka Enberg, Andrew Morton, Roman Gushchin, David Rientjes
  Cc: linux-mm, Chengming Zhou, linux-kernel

The parameter "struct slab *slab" is unused in next_freelist_entry(),
so just remove it.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f0307e8b4cd2..3a4e2f8d341c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2243,7 +2243,7 @@ static void __init init_freelist_randomization(void)
 }
 
 /* Get the next entry on the pre-computed freelist randomized */
-static void *next_freelist_entry(struct kmem_cache *s, struct slab *slab,
+static void *next_freelist_entry(struct kmem_cache *s,
 				unsigned long *pos, void *start,
 				unsigned long page_limit,
 				unsigned long freelist_count)
@@ -2282,13 +2282,12 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
 	start = fixup_red_left(s, slab_address(slab));
 
 	/* First entry is used as the base of the freelist */
-	cur = next_freelist_entry(s, slab, &pos, start, page_limit,
-				freelist_count);
+	cur = next_freelist_entry(s, &pos, start, page_limit, freelist_count);
 	cur = setup_object(s, cur);
 	slab->freelist = cur;
 
 	for (idx = 1; idx < slab->objects; idx++) {
-		next = next_freelist_entry(s, slab, &pos, start, page_limit,
+		next = next_freelist_entry(s, &pos, start, page_limit,
 			freelist_count);
 		next = setup_object(s, next);
 		set_freepointer(s, cur, next);

-- 
b4 0.10.1


* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-17 11:45 ` [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case Chengming Zhou
@ 2024-01-17 22:41   ` Christoph Lameter (Ampere)
  2024-01-18 11:37     ` Chengming Zhou
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-01-17 22:41 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Pekka Enberg,
	Andrew Morton, Roman Gushchin, David Rientjes, linux-mm,
	linux-kernel

On Wed, 17 Jan 2024, Chengming Zhou wrote:

> The likely case is that we get a usable slab from the cpu partial list,
> we can directly load freelist from it and return back, instead of going
> the other way that need more work, like reenable interrupt and recheck.

Ok I see that it could be useful to avoid the unlock_irq/lock_irq sequence 
in the partial cpu handling.

> But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist()
> for reusing it, since cpu partial slab is not frozen. It seems
> acceptable since it's only for debug purpose.

This is a test to verify that the newly acquired slab is actually in
frozen status. If that test is no longer necessary then this is a bug that
may need to be fixed independently. Maybe this test now needs to be
different depending on where the partial slab originated from? Is the check
only necessary when the slab is taken from the per node partials?

> There is some small performance improvement too, which shows by:
> perf bench sched messaging -g 5 -t -l 100000
>
>            mm-stable   slub-optimize
> Total time      7.473    7.209

Hmm... Good, avoiding the lock/relock sequence helps.


* Re: [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab
  2024-01-17 11:45 ` [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab Chengming Zhou
@ 2024-01-17 22:44   ` Christoph Lameter (Ampere)
  2024-01-23  8:45   ` Vlastimil Babka
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-01-17 22:44 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Pekka Enberg,
	Andrew Morton, Roman Gushchin, David Rientjes, linux-mm,
	linux-kernel

On Wed, 17 Jan 2024, Chengming Zhou wrote:

> Since debug slab is processed by free_to_partial_list(), and only debug
> slab which has SLAB_STORE_USER flag would care about the full list, we
> can remove these unrelated full list manipulations from __slab_free().

Acked-by: Christoph Lameter (Ampere) <cl@linux.com>


* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-17 22:41   ` Christoph Lameter (Ampere)
@ 2024-01-18 11:37     ` Chengming Zhou
  2024-01-18 22:14       ` Christoph Lameter (Ampere)
  0 siblings, 1 reply; 16+ messages in thread
From: Chengming Zhou @ 2024-01-18 11:37 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Pekka Enberg,
	Andrew Morton, Roman Gushchin, David Rientjes, linux-mm,
	linux-kernel

On 2024/1/18 06:41, Christoph Lameter (Ampere) wrote:
> On Wed, 17 Jan 2024, Chengming Zhou wrote:
> 
>> The likely case is that we get a usable slab from the cpu partial list,
>> we can directly load freelist from it and return back, instead of going
>> the other way that need more work, like reenable interrupt and recheck.
> 
> Ok I see that it could be useful to avoid the unlock_irq/lock_irq sequence in the partial cpu handling.

Right.

> 
>> But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist()
>> for reusing it, since cpu partial slab is not frozen. It seems
>> acceptable since it's only for debug purpose.
> 
> This is test for verification that the newly acquired slab is actually in frozen status. If that test is no longer necessary then this is a bug that may need to be fixed independently. Maybe this test is now required to be different depending on where the partial slab originated from? Check only necessary when taken from the per node partials?

Now there are two similar functions: get_freelist() and freeze_slab().

get_freelist() is used for the cpu slab and transfers the freelist to
the cpu freelist, so it contains "VM_BUG_ON(!new.frozen)", since the
cpu slab must already be frozen.

freeze_slab() is used for a slab taken from the node partial list, which
is frozen and has its freelist taken before being used as the cpu slab.
So it contains "VM_BUG_ON(new.frozen)", since that partial slab must NOT
be frozen. And it doesn't need the cpu_slab lock.

This patch handles the third case: a slab on the cpu partial list, for
which the cpu_slab lock is already held, so it is changed to reuse
get_freelist() instead of freeze_slab().

So get_freelist() has two cases to handle: the cpu slab and a cpu partial
list slab. The latter is NOT frozen, so "VM_BUG_ON(!new.frozen)" needs to
be removed from it.

And "VM_BUG_ON(new.frozen)" in freeze_slab() is unchanged, so the per node
partials are still covered.
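
If it helps to see the shared core of both helpers outside of mm/slub.c:
the essential step is "detach the whole per-slab freelist in one atomic
operation". Below is a minimal userspace sketch of just that idea
(illustrative only; the toy_slab/grab_freelist names are made up for this
example, and the real code also updates the inuse/frozen counters in the
same cmpxchg):

#include <stdatomic.h>
#include <stdio.h>

struct object { struct object *next; };

struct toy_slab {
	_Atomic(struct object *) freelist;	/* objects freed back to this slab */
};

/* like get_freelist()/freeze_slab(): take every free object at once */
static struct object *grab_freelist(struct toy_slab *slab)
{
	return atomic_exchange(&slab->freelist, NULL);
}

int main(void)
{
	struct toy_slab slab = { .freelist = NULL };
	struct object objs[4];

	/* push a few freed objects onto the slab's freelist */
	for (int i = 0; i < 4; i++) {
		objs[i].next = atomic_load(&slab.freelist);
		atomic_store(&slab.freelist, &objs[i]);
	}

	/* the allocation slowpath grabs them all in one shot */
	int n = 0;
	for (struct object *o = grab_freelist(&slab); o; o = o->next)
		n++;
	printf("grabbed %d objects, slab freelist is now empty\n", n);
	return 0;
}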

Thanks!

> 
>> There is some small performance improvement too, which shows by:
>> perf bench sched messaging -g 5 -t -l 100000
>>
>>            mm-stable   slub-optimize
>> Total time      7.473    7.209
> 
> Hmm... Good avoiding the lock/relock sequence helps.


* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-18 11:37     ` Chengming Zhou
@ 2024-01-18 22:14       ` Christoph Lameter (Ampere)
  2024-01-19  3:53         ` Chengming Zhou
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-01-18 22:14 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Pekka Enberg,
	Andrew Morton, Roman Gushchin, David Rientjes, linux-mm,
	linux-kernel

On Thu, 18 Jan 2024, Chengming Zhou wrote:

> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.

Right, so keep the check if it is the former?



* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-18 22:14       ` Christoph Lameter (Ampere)
@ 2024-01-19  3:53         ` Chengming Zhou
  2024-01-22 17:13           ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Chengming Zhou @ 2024-01-19  3:53 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Vlastimil Babka, Pekka Enberg,
	Andrew Morton, Roman Gushchin, David Rientjes, linux-mm,
	linux-kernel

On 2024/1/19 06:14, Christoph Lameter (Ampere) wrote:
> On Thu, 18 Jan 2024, Chengming Zhou wrote:
> 
>> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
>> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.
> 
> Right so keep the check if it is the former?
> 

Ok, I get it. Maybe like this:

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..7fa9dbc2e938 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3313,7 +3313,7 @@ __update_cpu_freelist_fast(struct kmem_cache *s,
  *
  * If this function returns NULL then the slab has been unfrozen.
  */
-static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
+static inline void *get_freelist(struct kmem_cache *s, struct slab *slab, int frozen)
 {
        struct slab new;
        unsigned long counters;
@@ -3326,7 +3326,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
                counters = slab->counters;

                new.counters = counters;
-               VM_BUG_ON(!new.frozen);
+               VM_BUG_ON(frozen && !new.frozen);

                new.inuse = slab->objects;
                new.frozen = freelist != NULL;
@@ -3440,7 +3440,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
        if (freelist)
                goto load_freelist;

-       freelist = get_freelist(s, slab);
+       freelist = get_freelist(s, slab, 1);

        if (!freelist) {
                c->slab = NULL;
@@ -3498,18 +3498,19 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,

                slab = slub_percpu_partial(c);
                slub_set_percpu_partial(c, slab);
-               local_unlock_irqrestore(&s->cpu_slab->lock, flags);
-               stat(s, CPU_PARTIAL_ALLOC);

-               if (unlikely(!node_match(slab, node) ||
-                            !pfmemalloc_match(slab, gfpflags))) {
-                       slab->next = NULL;
-                       __put_partials(s, slab);
-                       continue;
+               if (likely(node_match(slab, node) &&
+                          pfmemalloc_match(slab, gfpflags))) {
+                       c->slab = slab;
+                       freelist = get_freelist(s, slab, 0);
+                       stat(s, CPU_PARTIAL_ALLOC);
+                       goto load_freelist;
                }

-               freelist = freeze_slab(s, slab);
-               goto retry_load_slab;
+               local_unlock_irqrestore(&s->cpu_slab->lock, flags);
+
+               slab->next = NULL;
+               __put_partials(s, slab);
        }
 #endif


* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-19  3:53         ` Chengming Zhou
@ 2024-01-22 17:13           ` Vlastimil Babka
  2024-01-23  2:51             ` Chengming Zhou
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2024-01-22 17:13 UTC (permalink / raw)
  To: Chengming Zhou, Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Pekka Enberg, Andrew Morton,
	Roman Gushchin, David Rientjes, linux-mm, linux-kernel

On 1/19/24 04:53, Chengming Zhou wrote:
> On 2024/1/19 06:14, Christoph Lameter (Ampere) wrote:
>> On Thu, 18 Jan 2024, Chengming Zhou wrote:
>> 
>>> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
>>> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.
>> 
>> Right so keep the check if it is the former?
>> 
> 
> Ok, I get it. Maybe like this:

I think that's just too ugly for a VM_BUG_ON(). I'd just remove the check
and be done with that.

I have a somewhat different point. You reused get_freelist(), but in fact
this is more like freeze_slab(); however, that one uses slab_update_freelist()
and we are under the local_lock, so we want the cheaper
__slab_update_freelist(), which get_freelist() uses, and I guess that's why
you reused it.

However, get_freelist() also assumes it can return NULL if the freelist is
empty. If that can happen for a slab on the percpu partial list, we should
not "goto load_freelist;" but rather create a new label above that, above
the "if (!freelist) {" block, to handle that case.

If that cannot happen (this needs a careful audit) and we have a guarantee
that slabs on the percpu partial list must have a non-empty freelist, then we
probably instead want a new __freeze_slab() variant that is like
freeze_slab(), but uses __slab_update_freelist() and probably also has
VM_BUG_ON(!freelist) before returning it?
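
Something roughly like this, perhaps (a completely untested sketch modeled
on freeze_slab(), only to illustrate the idea; the exact argument order and
whether both VM_BUG_ON()s are worth keeping would need checking):

static inline void *__freeze_slab(struct kmem_cache *s, struct slab *slab)
{
	struct slab new;
	unsigned long counters;
	void *freelist;

	do {
		freelist = slab->freelist;
		counters = slab->counters;

		new.counters = counters;
		VM_BUG_ON(new.frozen);

		new.inuse = slab->objects;
		new.frozen = 1;

		/*
		 * We hold the local_lock (irqs off), so the cheaper
		 * __slab_update_freelist() is sufficient here.
		 */
	} while (!__slab_update_freelist(s, slab,
		freelist, counters,
		NULL, new.counters,
		"__freeze_slab"));

	VM_BUG_ON(!freelist);
	return freelist;
}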

> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 2ef88bbf56a3..7fa9dbc2e938 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3313,7 +3313,7 @@ __update_cpu_freelist_fast(struct kmem_cache *s,
>   *
>   * If this function returns NULL then the slab has been unfrozen.
>   */
> -static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
> +static inline void *get_freelist(struct kmem_cache *s, struct slab *slab, int frozen)
>  {
>         struct slab new;
>         unsigned long counters;
> @@ -3326,7 +3326,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
>                 counters = slab->counters;
> 
>                 new.counters = counters;
> -               VM_BUG_ON(!new.frozen);
> +               VM_BUG_ON(frozen && !new.frozen);
> 
>                 new.inuse = slab->objects;
>                 new.frozen = freelist != NULL;
> @@ -3440,7 +3440,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>         if (freelist)
>                 goto load_freelist;
> 
> -       freelist = get_freelist(s, slab);
> +       freelist = get_freelist(s, slab, 1);
> 
>         if (!freelist) {
>                 c->slab = NULL;
> @@ -3498,18 +3498,19 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> 
>                 slab = slub_percpu_partial(c);
>                 slub_set_percpu_partial(c, slab);
> -               local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> -               stat(s, CPU_PARTIAL_ALLOC);
> 
> -               if (unlikely(!node_match(slab, node) ||
> -                            !pfmemalloc_match(slab, gfpflags))) {
> -                       slab->next = NULL;
> -                       __put_partials(s, slab);
> -                       continue;
> +               if (likely(node_match(slab, node) &&
> +                          pfmemalloc_match(slab, gfpflags))) {
> +                       c->slab = slab;
> +                       freelist = get_freelist(s, slab, 0);
> +                       stat(s, CPU_PARTIAL_ALLOC);
> +                       goto load_freelist;
>                 }
> 
> -               freelist = freeze_slab(s, slab);
> -               goto retry_load_slab;
> +               local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> +
> +               slab->next = NULL;
> +               __put_partials(s, slab);
>         }
>  #endif



* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-22 17:13           ` Vlastimil Babka
@ 2024-01-23  2:51             ` Chengming Zhou
  2024-01-23  7:42               ` Chengming Zhou
  2024-01-23  8:24               ` Vlastimil Babka
  0 siblings, 2 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-01-23  2:51 UTC (permalink / raw)
  To: Vlastimil Babka, Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Pekka Enberg, Andrew Morton,
	Roman Gushchin, David Rientjes, linux-mm, linux-kernel

On 2024/1/23 01:13, Vlastimil Babka wrote:
> On 1/19/24 04:53, Chengming Zhou wrote:
>> On 2024/1/19 06:14, Christoph Lameter (Ampere) wrote:
>>> On Thu, 18 Jan 2024, Chengming Zhou wrote:
>>>
>>>> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
>>>> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.
>>>
>>> Right so keep the check if it is the former?
>>>
>>
>> Ok, I get it. Maybe like this:
> 
> I think that's just too ugly for a VM_BUG_ON(). I'd just remove the check
> and be done with that.

Ok with me.

> 
> I have a somewhat different point. You reused get_freelist() but in fact
> it's more like freeze_slab(), but that one uses slab_update_freelist() and
> we are under the local_lock so we want the cheaper __slab_update_freelist(),
> which get_freelist() has and I guess that's why you reused that one.

Right, we already have the local_lock, so reuse get_freelist().

> 
> However get_freelist() also assumes it can return NULL if the freelist is
> empty. If that's possible to happen on the percpu partial list, we should
> not "goto load_freelist;" but rather create a new label above that, above
> the "if (!freelist) {" block that handles the case.
> 
> If that's not possible to happen (needs careful audit) and we have guarantee

Yes, it's not possible for now.

> that slabs on percpu partial list must have non-empty freelist, then we
> probably instead want a new __freeze_slab() variant that is like
> freeze_slab(), but uses __slab_update_freelist() and probably also has
> VM_BUG_ON(!freelist) before returning it?
> 

Instead of introducing another new function, how about still reusing
get_freelist() and adding VM_BUG_ON(!freelist) after calling it? I feel this
is simpler.

Thanks!


* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-23  2:51             ` Chengming Zhou
@ 2024-01-23  7:42               ` Chengming Zhou
  2024-01-23  8:24               ` Vlastimil Babka
  1 sibling, 0 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-01-23  7:42 UTC (permalink / raw)
  To: Vlastimil Babka, Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Pekka Enberg, Andrew Morton,
	Roman Gushchin, David Rientjes, linux-mm, linux-kernel

On 2024/1/23 10:51, Chengming Zhou wrote:
> On 2024/1/23 01:13, Vlastimil Babka wrote:
>> On 1/19/24 04:53, Chengming Zhou wrote:
>>> On 2024/1/19 06:14, Christoph Lameter (Ampere) wrote:
>>>> On Thu, 18 Jan 2024, Chengming Zhou wrote:
>>>>
>>>>> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
>>>>> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.
>>>>
>>>> Right so keep the check if it is the former?
>>>>
>>>
>>> Ok, I get it. Maybe like this:
>>
>> I think that's just too ugly for a VM_BUG_ON(). I'd just remove the check
>> and be done with that.
> 
> Ok with me.
> 
>>
>> I have a somewhat different point. You reused get_freelist() but in fact
>> it's more like freeze_slab(), but that one uses slab_update_freelist() and
>> we are under the local_lock so we want the cheaper __slab_update_freelist(),
>> which get_freelist() has and I guess that's why you reused that one.
> 
> Right, we already have the lock_lock, so reuse get_freelist().
> 
>>
>> However get_freelist() also assumes it can return NULL if the freelist is
>> empty. If that's possible to happen on the percpu partial list, we should
>> not "goto load_freelist;" but rather create a new label above that, above
>> the "if (!freelist) {" block that handles the case.
>>
>> If that's not possible to happen (needs careful audit) and we have guarantee
> 
> Yes, it's not possible for now.
> 
>> that slabs on percpu partial list must have non-empty freelist, then we
>> probably instead want a new __freeze_slab() variant that is like
>> freeze_slab(), but uses __slab_update_freelist() and probably also has
>> VM_BUG_ON(!freelist) before returning it?
>>
> 
> Instead of introducing another new function, how about still reusing get_freelist()
> and VM_BUG_ON(!freelist) after calling it? I feel this is simpler.
> 
> Thanks!

Does this look fine?

diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..fda402b2d649 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3326,7 +3326,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
                counters = slab->counters;

                new.counters = counters;
-               VM_BUG_ON(!new.frozen);

                new.inuse = slab->objects;
                new.frozen = freelist != NULL;
@@ -3498,18 +3497,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,

                slab = slub_percpu_partial(c);
                slub_set_percpu_partial(c, slab);
-               local_unlock_irqrestore(&s->cpu_slab->lock, flags);
-               stat(s, CPU_PARTIAL_ALLOC);

-               if (unlikely(!node_match(slab, node) ||
-                            !pfmemalloc_match(slab, gfpflags))) {
-                       slab->next = NULL;
-                       __put_partials(s, slab);
-                       continue;
+               if (likely(node_match(slab, node) &&
+                          pfmemalloc_match(slab, gfpflags))) {
+                       c->slab = slab;
+                       freelist = get_freelist(s, slab);
+                       VM_BUG_ON(!freelist);
+                       stat(s, CPU_PARTIAL_ALLOC);
+                       goto load_freelist;
                }

-               freelist = freeze_slab(s, slab);
-               goto retry_load_slab;
+               local_unlock_irqrestore(&s->cpu_slab->lock, flags);
+
+               slab->next = NULL;
+               __put_partials(s, slab);
        }
 #endif



* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-23  2:51             ` Chengming Zhou
  2024-01-23  7:42               ` Chengming Zhou
@ 2024-01-23  8:24               ` Vlastimil Babka
  2024-01-23  9:17                 ` Chengming Zhou
  1 sibling, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2024-01-23  8:24 UTC (permalink / raw)
  To: Chengming Zhou, Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Pekka Enberg, Andrew Morton,
	Roman Gushchin, David Rientjes, linux-mm, linux-kernel

On 1/23/24 03:51, Chengming Zhou wrote:
> On 2024/1/23 01:13, Vlastimil Babka wrote:
>> On 1/19/24 04:53, Chengming Zhou wrote:
>>> On 2024/1/19 06:14, Christoph Lameter (Ampere) wrote:
>>>> On Thu, 18 Jan 2024, Chengming Zhou wrote:
>>>>
>>>>> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
>>>>> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.
>>>>
>>>> Right so keep the check if it is the former?
>>>>
>>>
>>> Ok, I get it. Maybe like this:
>> 
>> I think that's just too ugly for a VM_BUG_ON(). I'd just remove the check
>> and be done with that.
> 
> Ok with me.
> 
>> 
>> I have a somewhat different point. You reused get_freelist() but in fact
>> it's more like freeze_slab(), but that one uses slab_update_freelist() and
>> we are under the local_lock so we want the cheaper __slab_update_freelist(),
>> which get_freelist() has and I guess that's why you reused that one.
> 
> Right, we already have the lock_lock, so reuse get_freelist().
> 
>> 
>> However get_freelist() also assumes it can return NULL if the freelist is
>> empty. If that's possible to happen on the percpu partial list, we should
>> not "goto load_freelist;" but rather create a new label above that, above
>> the "if (!freelist) {" block that handles the case.
>> 
>> If that's not possible to happen (needs careful audit) and we have guarantee
> 
> Yes, it's not possible for now.
> 
>> that slabs on percpu partial list must have non-empty freelist, then we
>> probably instead want a new __freeze_slab() variant that is like
>> freeze_slab(), but uses __slab_update_freelist() and probably also has
>> VM_BUG_ON(!freelist) before returning it?
>> 
> 
> Instead of introducing another new function, how about still reusing get_freelist()
> and VM_BUG_ON(!freelist) after calling it? I feel this is simpler.

Could you measure whether introducing a new function that sets new.frozen = 1
has any performance benefit? If not, we can reuse get_freelist() as you say.
Thanks!

> Thanks!



* Re: [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab
  2024-01-17 11:45 ` [PATCH 2/3] mm/slub: remove full list manipulation for non-debug slab Chengming Zhou
  2024-01-17 22:44   ` Christoph Lameter (Ampere)
@ 2024-01-23  8:45   ` Vlastimil Babka
  1 sibling, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2024-01-23  8:45 UTC (permalink / raw)
  To: Chengming Zhou, Hyeonggon Yoo, Joonsoo Kim, Christoph Lameter,
	Pekka Enberg, Andrew Morton, Roman Gushchin, David Rientjes
  Cc: linux-mm, linux-kernel

On 1/17/24 12:45, Chengming Zhou wrote:
> Since debug slab is processed by free_to_partial_list(), and only debug
> slab which has SLAB_STORE_USER flag would care about the full list, we
> can remove these unrelated full list manipulations from __slab_free().

Well spotted.

> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 20c03555c97b..f0307e8b4cd2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4187,7 +4187,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  	 * then add it.
>  	 */
>  	if (!kmem_cache_has_cpu_partial(s) && unlikely(!prior)) {
> -		remove_full(s, n, slab);
>  		add_partial(n, slab, DEACTIVATE_TO_TAIL);
>  		stat(s, FREE_ADD_PARTIAL);
>  	}
> @@ -4201,9 +4200,6 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  		 */
>  		remove_partial(n, slab);
>  		stat(s, FREE_REMOVE_PARTIAL);
> -	} else {
> -		/* Slab must be on the full list */
> -		remove_full(s, n, slab);
>  	}
>  
>  	spin_unlock_irqrestore(&n->list_lock, flags);
> 



* Re: [PATCH 3/3] mm/slub: remove unused parameter in next_freelist_entry()
  2024-01-17 11:46 ` [PATCH 3/3] mm/slub: remove unused parameter in next_freelist_entry() Chengming Zhou
@ 2024-01-23  8:47   ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2024-01-23  8:47 UTC (permalink / raw)
  To: Chengming Zhou, Hyeonggon Yoo, Joonsoo Kim, Christoph Lameter,
	Pekka Enberg, Andrew Morton, Roman Gushchin, David Rientjes
  Cc: linux-mm, linux-kernel

On 1/17/24 12:46, Chengming Zhou wrote:
> The parameter "struct slab *slab" is unused in next_freelist_entry(),
> so just remove it.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index f0307e8b4cd2..3a4e2f8d341c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2243,7 +2243,7 @@ static void __init init_freelist_randomization(void)
>  }
>  
>  /* Get the next entry on the pre-computed freelist randomized */
> -static void *next_freelist_entry(struct kmem_cache *s, struct slab *slab,
> +static void *next_freelist_entry(struct kmem_cache *s,
>  				unsigned long *pos, void *start,
>  				unsigned long page_limit,
>  				unsigned long freelist_count)
> @@ -2282,13 +2282,12 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
>  	start = fixup_red_left(s, slab_address(slab));
>  
>  	/* First entry is used as the base of the freelist */
> -	cur = next_freelist_entry(s, slab, &pos, start, page_limit,
> -				freelist_count);
> +	cur = next_freelist_entry(s, &pos, start, page_limit, freelist_count);
>  	cur = setup_object(s, cur);
>  	slab->freelist = cur;
>  
>  	for (idx = 1; idx < slab->objects; idx++) {
> -		next = next_freelist_entry(s, slab, &pos, start, page_limit,
> +		next = next_freelist_entry(s, &pos, start, page_limit,
>  			freelist_count);
>  		next = setup_object(s, next);
>  		set_freepointer(s, cur, next);
> 



* Re: [PATCH 1/3] mm/slub: directly load freelist from cpu partial slab in the likely case
  2024-01-23  8:24               ` Vlastimil Babka
@ 2024-01-23  9:17                 ` Chengming Zhou
  0 siblings, 0 replies; 16+ messages in thread
From: Chengming Zhou @ 2024-01-23  9:17 UTC (permalink / raw)
  To: Vlastimil Babka, Christoph Lameter (Ampere)
  Cc: Hyeonggon Yoo, Joonsoo Kim, Pekka Enberg, Andrew Morton,
	Roman Gushchin, David Rientjes, linux-mm, linux-kernel

On 2024/1/23 16:24, Vlastimil Babka wrote:
> On 1/23/24 03:51, Chengming Zhou wrote:
>> On 2024/1/23 01:13, Vlastimil Babka wrote:
>>> On 1/19/24 04:53, Chengming Zhou wrote:
>>>> On 2024/1/19 06:14, Christoph Lameter (Ampere) wrote:
>>>>> On Thu, 18 Jan 2024, Chengming Zhou wrote:
>>>>>
>>>>>> So get_freelist() has two cases to handle: cpu slab and cpu partial list slab.
>>>>>> The latter is NOT frozen, so need to remove "VM_BUG_ON(!new.frozen)" from it.
>>>>>
>>>>> Right so keep the check if it is the former?
>>>>>
>>>>
>>>> Ok, I get it. Maybe like this:
>>>
>>> I think that's just too ugly for a VM_BUG_ON(). I'd just remove the check
>>> and be done with that.
>>
>> Ok with me.
>>
>>>
>>> I have a somewhat different point. You reused get_freelist() but in fact
>>> it's more like freeze_slab(), but that one uses slab_update_freelist() and
>>> we are under the local_lock so we want the cheaper __slab_update_freelist(),
>>> which get_freelist() has and I guess that's why you reused that one.
>>
>> Right, we already have the lock_lock, so reuse get_freelist().
>>
>>>
>>> However get_freelist() also assumes it can return NULL if the freelist is
>>> empty. If that's possible to happen on the percpu partial list, we should
>>> not "goto load_freelist;" but rather create a new label above that, above
>>> the "if (!freelist) {" block that handles the case.
>>>
>>> If that's not possible to happen (needs careful audit) and we have guarantee
>>
>> Yes, it's not possible for now.
>>
>>> that slabs on percpu partial list must have non-empty freelist, then we
>>> probably instead want a new __freeze_slab() variant that is like
>>> freeze_slab(), but uses __slab_update_freelist() and probably also has
>>> VM_BUG_ON(!freelist) before returning it?
>>>
>>
>> Instead of introducing another new function, how about still reusing get_freelist()
>> and VM_BUG_ON(!freelist) after calling it? I feel this is simpler.
> 
> Could you measure if introducing new function that sets new.frozen = 1; has
> any performance benefit? If not, we can reuse get_freelist() as you say.
> Thanks!
> 

I just tested a new function __freeze_slab() that uses __slab_update_freelist()
and sets new.frozen = 1, but found the performance is a little worse than
reusing get_freelist().

The reason may be the larger code memory footprint? I haven't looked deeply
into that.

Anyway, it looks better to reuse get_freelist(), so I will send an updated
version later.

Thanks!

