* [RFC PATCH v3 1/7] slub: Keep track of whether slub is on the per-node partial list
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
[not found] ` <6d054dbe-c90d-591d-11ca-b9ad3787683d@suse.cz>
2023-10-24 9:33 ` [RFC PATCH v3 2/7] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
` (6 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
We currently rely on the "frozen" bit to decide whether we should manipulate
slab->slab_list, which will be changed in the following patch.
Instead, introduce another way to keep track of whether a slab is on
the per-node partial list: reuse the PG_workingset bit.
We use __set_bit and __clear_bit directly instead of the atomic versions
for better performance; this is safe because the bit is only manipulated
while holding the slub node's list_lock.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slab.h | 19 +++++++++++++++++++
mm/slub.c | 3 +++
2 files changed, 22 insertions(+)
diff --git a/mm/slab.h b/mm/slab.h
index 8cd3294fedf5..50522b688cfb 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -193,6 +193,25 @@ static inline void __slab_clear_pfmemalloc(struct slab *slab)
__folio_clear_active(slab_folio(slab));
}
+/*
+ * Slub reuses the PG_workingset bit to keep track of whether it's on
+ * the per-node partial list.
+ */
+static inline bool slab_test_node_partial(const struct slab *slab)
+{
+ return folio_test_workingset((struct folio *)slab_folio(slab));
+}
+
+static inline void slab_set_node_partial(struct slab *slab)
+{
+ __set_bit(PG_workingset, folio_flags(slab_folio(slab), 0));
+}
+
+static inline void slab_clear_node_partial(struct slab *slab)
+{
+ __clear_bit(PG_workingset, folio_flags(slab_folio(slab), 0));
+}
+
static inline void *slab_address(const struct slab *slab)
{
return folio_address(slab_folio(slab));
diff --git a/mm/slub.c b/mm/slub.c
index 63d281dfacdb..3fad4edca34b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2127,6 +2127,7 @@ __add_partial(struct kmem_cache_node *n, struct slab *slab, int tail)
list_add_tail(&slab->slab_list, &n->partial);
else
list_add(&slab->slab_list, &n->partial);
+ slab_set_node_partial(slab);
}
static inline void add_partial(struct kmem_cache_node *n,
@@ -2141,6 +2142,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
{
lockdep_assert_held(&n->list_lock);
list_del(&slab->slab_list);
+ slab_clear_node_partial(slab);
n->nr_partial--;
}
@@ -4831,6 +4833,7 @@ static int __kmem_cache_do_shrink(struct kmem_cache *s)
if (free == slab->objects) {
list_move(&slab->slab_list, &discard);
+ slab_clear_node_partial(slab);
n->nr_partial--;
dec_slabs_node(s, node, slab->objects);
} else if (free <= SHRINK_PROMOTE_MAX)
--
2.40.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [RFC PATCH v3 2/7] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
2023-10-24 9:33 ` [RFC PATCH v3 1/7] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
[not found] ` <43da5c9a-aeff-1bff-81a8-4611470c2514@suse.cz>
2023-10-24 9:33 ` [RFC PATCH v3 3/7] slub: Reflow ___slab_alloc() chengming.zhou
` (5 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
Currently a partial slab is frozen when it is taken off the node partial
list, so __slab_free() knows from "was_frozen" that the partial slab
is not on the node partial list and is in use by one kmem_cache_cpu.
But we are about to change this: partial slabs will leave the node partial
list in an unfrozen state, so __slab_free() needs to use the newly
introduced slab_test_node_partial() instead.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slub.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c
index 3fad4edca34b..f568a32d7332 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3610,6 +3610,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
unsigned long counters;
struct kmem_cache_node *n = NULL;
unsigned long flags;
+ bool on_node_partial;
stat(s, FREE_SLOWPATH);
@@ -3657,6 +3658,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
*/
spin_lock_irqsave(&n->list_lock, flags);
+ on_node_partial = slab_test_node_partial(slab);
}
}
@@ -3685,6 +3687,15 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
return;
}
+ /*
+ * This slab was partial but not on the per-node partial list,
+ * in which case we shouldn't manipulate its list, just return.
+ */
+ if (prior && !on_node_partial) {
+ spin_unlock_irqrestore(&n->list_lock, flags);
+ return;
+ }
+
if (unlikely(!new.inuse && n->nr_partial >= s->min_partial))
goto slab_empty;
--
2.40.1
* [RFC PATCH v3 3/7] slub: Reflow ___slab_alloc()
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
2023-10-24 9:33 ` [RFC PATCH v3 1/7] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
2023-10-24 9:33 ` [RFC PATCH v3 2/7] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
2023-10-24 9:33 ` [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab chengming.zhou
` (4 subsequent siblings)
7 siblings, 0 replies; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
The get_partial() interface used in ___slab_alloc() may return a single
object in the kmem_cache_debug(s) case, in which case we just return
the "freelist" object.
Move this handling up to prepare for later changes.
The pfmemalloc_match() check is also not needed for a node partial slab,
since get_partial_node() already performs it.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slub.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index f568a32d7332..cd8aa68c156e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3218,8 +3218,21 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
pc.slab = &slab;
pc.orig_size = orig_size;
freelist = get_partial(s, node, &pc);
- if (freelist)
- goto check_new_slab;
+ if (freelist) {
+ if (kmem_cache_debug(s)) {
+ /*
+ * For debug caches here we had to go through
+ * alloc_single_from_partial() so just store the
+ * tracking info and return the object.
+ */
+ if (s->flags & SLAB_STORE_USER)
+ set_track(s, freelist, TRACK_ALLOC, addr);
+
+ return freelist;
+ }
+
+ goto retry_load_slab;
+ }
slub_put_cpu_ptr(s->cpu_slab);
slab = new_slab(s, gfpflags, node);
@@ -3255,20 +3268,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
inc_slabs_node(s, slab_nid(slab), slab->objects);
-check_new_slab:
-
- if (kmem_cache_debug(s)) {
- /*
- * For debug caches here we had to go through
- * alloc_single_from_partial() so just store the tracking info
- * and return the object
- */
- if (s->flags & SLAB_STORE_USER)
- set_track(s, freelist, TRACK_ALLOC, addr);
-
- return freelist;
- }
-
if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
/*
* For !pfmemalloc_match() case we don't load freelist so that
--
2.40.1
* [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
` (2 preceding siblings ...)
2023-10-24 9:33 ` [RFC PATCH v3 3/7] slub: Reflow ___slab_alloc() chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
2023-10-30 16:55 ` Vlastimil Babka
2023-10-24 9:33 ` [RFC PATCH v3 5/7] slub: Introduce freeze_slab() chengming.zhou
` (3 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
We need all get_partial() related interfaces to return a slab instead
of the freelist (or object).
For now, use partial_context.object to pass the freelist or object back.
This patch should have no functional change.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slub.c | 63 +++++++++++++++++++++++++++++--------------------------
1 file changed, 33 insertions(+), 30 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index cd8aa68c156e..7d0234bffad3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -204,9 +204,9 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
/* Structure holding parameters for get_partial() call chain */
struct partial_context {
- struct slab **slab;
gfp_t flags;
unsigned int orig_size;
+ void *object;
};
static inline bool kmem_cache_debug(struct kmem_cache *s)
@@ -2271,10 +2271,11 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags);
/*
* Try to allocate a partial slab from a specific node.
*/
-static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
- struct partial_context *pc)
+static struct slab *get_partial_node(struct kmem_cache *s,
+ struct kmem_cache_node *n,
+ struct partial_context *pc)
{
- struct slab *slab, *slab2;
+ struct slab *slab, *slab2, *partial = NULL;
void *object = NULL;
unsigned long flags;
unsigned int partial_slabs = 0;
@@ -2290,27 +2291,28 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
- void *t;
-
if (!pfmemalloc_match(slab, pc->flags))
continue;
if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
object = alloc_single_from_partial(s, n, slab,
pc->orig_size);
- if (object)
+ if (object) {
+ partial = slab;
+ pc->object = object;
break;
+ }
continue;
}
- t = acquire_slab(s, n, slab, object == NULL);
- if (!t)
+ object = acquire_slab(s, n, slab, object == NULL);
+ if (!object)
break;
- if (!object) {
- *pc->slab = slab;
+ if (!partial) {
+ partial = slab;
+ pc->object = object;
stat(s, ALLOC_FROM_PARTIAL);
- object = t;
} else {
put_cpu_partial(s, slab, 0);
stat(s, CPU_PARTIAL_NODE);
@@ -2326,20 +2328,21 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
}
spin_unlock_irqrestore(&n->list_lock, flags);
- return object;
+ return partial;
}
/*
* Get a slab from somewhere. Search in increasing NUMA distances.
*/
-static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
+static struct slab *get_any_partial(struct kmem_cache *s,
+ struct partial_context *pc)
{
#ifdef CONFIG_NUMA
struct zonelist *zonelist;
struct zoneref *z;
struct zone *zone;
enum zone_type highest_zoneidx = gfp_zone(pc->flags);
- void *object;
+ struct slab *slab;
unsigned int cpuset_mems_cookie;
/*
@@ -2374,8 +2377,8 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
if (n && cpuset_zone_allowed(zone, pc->flags) &&
n->nr_partial > s->min_partial) {
- object = get_partial_node(s, n, pc);
- if (object) {
+ slab = get_partial_node(s, n, pc);
+ if (slab) {
/*
* Don't check read_mems_allowed_retry()
* here - if mems_allowed was updated in
@@ -2383,7 +2386,7 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
* between allocation and the cpuset
* update
*/
- return object;
+ return slab;
}
}
}
@@ -2395,17 +2398,18 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
/*
* Get a partial slab, lock it and return it.
*/
-static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc)
+static struct slab *get_partial(struct kmem_cache *s, int node,
+ struct partial_context *pc)
{
- void *object;
+ struct slab *slab;
int searchnode = node;
if (node == NUMA_NO_NODE)
searchnode = numa_mem_id();
- object = get_partial_node(s, get_node(s, searchnode), pc);
- if (object || node != NUMA_NO_NODE)
- return object;
+ slab = get_partial_node(s, get_node(s, searchnode), pc);
+ if (slab || node != NUMA_NO_NODE)
+ return slab;
return get_any_partial(s, pc);
}
@@ -3215,10 +3219,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
new_objects:
pc.flags = gfpflags;
- pc.slab = &slab;
pc.orig_size = orig_size;
- freelist = get_partial(s, node, &pc);
- if (freelist) {
+ slab = get_partial(s, node, &pc);
+ if (slab) {
+ freelist = pc.object;
if (kmem_cache_debug(s)) {
/*
* For debug caches here we had to go through
@@ -3410,12 +3414,11 @@ static void *__slab_alloc_node(struct kmem_cache *s,
void *object;
pc.flags = gfpflags;
- pc.slab = &slab;
pc.orig_size = orig_size;
- object = get_partial(s, node, &pc);
+ slab = get_partial(s, node, &pc);
- if (object)
- return object;
+ if (slab)
+ return pc.object;
slab = new_slab(s, gfpflags, node);
if (unlikely(!slab)) {
--
2.40.1
* Re: [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab
2023-10-24 9:33 ` [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab chengming.zhou
@ 2023-10-30 16:55 ` Vlastimil Babka
2023-10-31 2:22 ` Chengming Zhou
0 siblings, 1 reply; 30+ messages in thread
From: Vlastimil Babka @ 2023-10-30 16:55 UTC (permalink / raw)
To: chengming.zhou, cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
linux-mm, linux-kernel, Chengming Zhou
On 10/24/23 11:33, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> We need all get_partial() related interfaces to return a slab, instead
> of returning the freelist (or object).
>
> Use the partial_context.object to return back freelist or object for
> now. This patch shouldn't have any functional changes.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
I think you could even move patches 3/7 and 4/7 to the front of the series,
as cleanups that are useful on their own.
Thanks!
* Re: [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab
2023-10-30 16:55 ` Vlastimil Babka
@ 2023-10-31 2:22 ` Chengming Zhou
0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-31 2:22 UTC (permalink / raw)
To: Vlastimil Babka, cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
linux-mm, linux-kernel, Chengming Zhou
On 2023/10/31 00:55, Vlastimil Babka wrote:
> On 10/24/23 11:33, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> We need all get_partial() related interfaces to return a slab, instead
>> of returning the freelist (or object).
>>
>> Use the partial_context.object to return back freelist or object for
>> now. This patch shouldn't have any functional changes.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> I think you could even move patches 3/7 and 4/7 to the front of the series,
> as cleanups that are useful on their own.
>
Right, I will move these two to the front in the next version.
Thanks!
* [RFC PATCH v3 5/7] slub: Introduce freeze_slab()
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
` (3 preceding siblings ...)
2023-10-24 9:33 ` [RFC PATCH v3 4/7] slub: Change get_partial() interfaces to return slab chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
2023-10-30 18:11 ` Vlastimil Babka
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
` (2 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
Later we will take slabs off the node partial list without freezing them,
so we need a freeze_slab() function that freezes a partial slab and
returns its freelist.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slub.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c
index 7d0234bffad3..5b428648021f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3079,6 +3079,33 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
return freelist;
}
+/*
+ * Freeze the partial slab and return the pointer to the freelist.
+ */
+static inline void *freeze_slab(struct kmem_cache *s, struct slab *slab)
+{
+ struct slab new;
+ unsigned long counters;
+ void *freelist;
+
+ do {
+ freelist = slab->freelist;
+ counters = slab->counters;
+
+ new.counters = counters;
+ VM_BUG_ON(new.frozen);
+
+ new.inuse = slab->objects;
+ new.frozen = 1;
+
+ } while (!__slab_update_freelist(s, slab,
+ freelist, counters,
+ NULL, new.counters,
+ "freeze_slab"));
+
+ return freelist;
+}
+
/*
* Slow path. The lockless freelist is empty or we need to perform
* debugging duties.
--
2.40.1
* Re: [RFC PATCH v3 5/7] slub: Introduce freeze_slab()
2023-10-24 9:33 ` [RFC PATCH v3 5/7] slub: Introduce freeze_slab() chengming.zhou
@ 2023-10-30 18:11 ` Vlastimil Babka
0 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2023-10-30 18:11 UTC (permalink / raw)
To: chengming.zhou, cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
linux-mm, linux-kernel, Chengming Zhou
On 10/24/23 11:33, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> We will have unfrozen slabs out of the node partial list later, so we
> need a freeze_slab() function to freeze the partial slab and get its
> freelist.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
As you noted, we'll need slab_update_freelist().
Otherwise,
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/slub.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 7d0234bffad3..5b428648021f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3079,6 +3079,33 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
> return freelist;
> }
>
> +/*
> + * Freeze the partial slab and return the pointer to the freelist.
> + */
> +static inline void *freeze_slab(struct kmem_cache *s, struct slab *slab)
> +{
> + struct slab new;
> + unsigned long counters;
> + void *freelist;
> +
> + do {
> + freelist = slab->freelist;
> + counters = slab->counters;
> +
> + new.counters = counters;
> + VM_BUG_ON(new.frozen);
> +
> + new.inuse = slab->objects;
> + new.frozen = 1;
> +
> + } while (!__slab_update_freelist(s, slab,
> + freelist, counters,
> + NULL, new.counters,
> + "freeze_slab"));
> +
> + return freelist;
> +}
> +
> /*
> * Slow path. The lockless freelist is empty or we need to perform
> * debugging duties.
* [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
` (4 preceding siblings ...)
2023-10-24 9:33 ` [RFC PATCH v3 5/7] slub: Introduce freeze_slab() chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
2023-10-24 14:30 ` kernel test robot
` (4 more replies)
2023-10-24 9:33 ` [RFC PATCH v3 7/7] slub: Optimize deactivate_slab() chengming.zhou
2023-10-27 17:57 ` [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs Christoph Lameter
7 siblings, 5 replies; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
Currently we freeze slabs when moving them from the node partial list to
the cpu partial list; this method needs two cmpxchg_double operations:
1. freeze the slab (acquire_slab()) under the node list_lock
2. get_freelist() when the slab is picked for use in ___slab_alloc()
Actually we don't need to freeze when moving slabs out of the node partial
list; we can delay freezing until the slab's freelist is used in
___slab_alloc(), saving one cmpxchg_double().
And there are other good points:
- Moving slabs between the node partial list and the cpu partial list
becomes simpler, since we don't need to freeze or unfreeze at all.
- Contention on the node list_lock is reduced, since we no longer
freeze any slab under it.
We can do this because no concurrent path manipulates the partial slab
list except __slab_free(), which is now serialized.
Since the slab returned by the get_partial() interfaces is no longer frozen
and there is no freelist in the partial_context, we need to use the newly
introduced freeze_slab() to freeze it and get its freelist.
Similarly, slabs on the CPU partial list are no longer frozen, so we
need to freeze_slab() them before use.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slub.c | 111 +++++++++++-------------------------------------------
1 file changed, 21 insertions(+), 90 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 5b428648021f..486d44421432 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2215,51 +2215,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
return object;
}
-/*
- * Remove slab from the partial list, freeze it and
- * return the pointer to the freelist.
- *
- * Returns a list of objects or NULL if it fails.
- */
-static inline void *acquire_slab(struct kmem_cache *s,
- struct kmem_cache_node *n, struct slab *slab,
- int mode)
-{
- void *freelist;
- unsigned long counters;
- struct slab new;
-
- lockdep_assert_held(&n->list_lock);
-
- /*
- * Zap the freelist and set the frozen bit.
- * The old freelist is the list of objects for the
- * per cpu allocation list.
- */
- freelist = slab->freelist;
- counters = slab->counters;
- new.counters = counters;
- if (mode) {
- new.inuse = slab->objects;
- new.freelist = NULL;
- } else {
- new.freelist = freelist;
- }
-
- VM_BUG_ON(new.frozen);
- new.frozen = 1;
-
- if (!__slab_update_freelist(s, slab,
- freelist, counters,
- new.freelist, new.counters,
- "acquire_slab"))
- return NULL;
-
- remove_partial(n, slab);
- WARN_ON(!freelist);
- return freelist;
-}
-
#ifdef CONFIG_SLUB_CPU_PARTIAL
static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain);
#else
@@ -2276,7 +2231,6 @@ static struct slab *get_partial_node(struct kmem_cache *s,
struct partial_context *pc)
{
struct slab *slab, *slab2, *partial = NULL;
- void *object = NULL;
unsigned long flags;
unsigned int partial_slabs = 0;
@@ -2295,7 +2249,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
continue;
if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
- object = alloc_single_from_partial(s, n, slab,
+ void *object = alloc_single_from_partial(s, n, slab,
pc->orig_size);
if (object) {
partial = slab;
@@ -2305,13 +2259,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
continue;
}
- object = acquire_slab(s, n, slab, object == NULL);
- if (!object)
- break;
+ remove_partial(n, slab);
if (!partial) {
partial = slab;
- pc->object = object;
stat(s, ALLOC_FROM_PARTIAL);
} else {
put_cpu_partial(s, slab, 0);
@@ -2610,9 +2561,6 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
unsigned long flags = 0;
while (partial_slab) {
- struct slab new;
- struct slab old;
-
slab = partial_slab;
partial_slab = slab->next;
@@ -2625,23 +2573,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
spin_lock_irqsave(&n->list_lock, flags);
}
- do {
-
- old.freelist = slab->freelist;
- old.counters = slab->counters;
- VM_BUG_ON(!old.frozen);
-
- new.counters = old.counters;
- new.freelist = old.freelist;
-
- new.frozen = 0;
-
- } while (!__slab_update_freelist(s, slab,
- old.freelist, old.counters,
- new.freelist, new.counters,
- "unfreezing slab"));
-
- if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
+ if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
slab->next = slab_to_discard;
slab_to_discard = slab;
} else {
@@ -3148,7 +3080,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
node = NUMA_NO_NODE;
goto new_slab;
}
-redo:
if (unlikely(!node_match(slab, node))) {
/*
@@ -3224,7 +3155,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
new_slab:
- if (slub_percpu_partial(c)) {
+ while (slub_percpu_partial(c)) {
local_lock_irqsave(&s->cpu_slab->lock, flags);
if (unlikely(c->slab)) {
local_unlock_irqrestore(&s->cpu_slab->lock, flags);
@@ -3236,11 +3167,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
goto new_objects;
}
- slab = c->slab = slub_percpu_partial(c);
+ slab = slub_percpu_partial(c);
slub_set_percpu_partial(c, slab);
local_unlock_irqrestore(&s->cpu_slab->lock, flags);
stat(s, CPU_PARTIAL_ALLOC);
- goto redo;
+
+ if (unlikely(!node_match(slab, node) ||
+ !pfmemalloc_match(slab, gfpflags))) {
+ slab->next = NULL;
+ __unfreeze_partials(s, slab);
+ continue;
+ }
+
+ freelist = freeze_slab(s, slab);
+ goto retry_load_slab;
}
new_objects:
@@ -3249,8 +3189,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
pc.orig_size = orig_size;
slab = get_partial(s, node, &pc);
if (slab) {
- freelist = pc.object;
if (kmem_cache_debug(s)) {
+ freelist = pc.object;
/*
* For debug caches here we had to go through
* alloc_single_from_partial() so just store the
@@ -3262,6 +3202,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
return freelist;
}
+ freelist = freeze_slab(s, slab);
goto retry_load_slab;
}
@@ -3663,18 +3604,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
was_frozen = new.frozen;
new.inuse -= cnt;
if ((!new.inuse || !prior) && !was_frozen) {
-
- if (kmem_cache_has_cpu_partial(s) && !prior) {
-
- /*
- * Slab was on no list before and will be
- * partially empty
- * We can defer the list move and instead
- * freeze it.
- */
- new.frozen = 1;
-
- } else { /* Needs to be taken off a list */
+ /* Needs to be taken off a list */
+ if (!kmem_cache_has_cpu_partial(s) || prior) {
n = get_node(s, slab_nid(slab));
/*
@@ -3704,9 +3635,9 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
* activity can be necessary.
*/
stat(s, FREE_FROZEN);
- } else if (new.frozen) {
+ } else if (kmem_cache_has_cpu_partial(s) && !prior) {
/*
- * If we just froze the slab then put it onto the
+ * If we started with a full slab then put it onto the
* per cpu partial list.
*/
put_cpu_partial(s, slab, 1);
--
2.40.1
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
@ 2023-10-24 14:30 ` kernel test robot
2023-10-24 14:42 ` Chengming Zhou
2023-10-24 15:22 ` kernel test robot
` (3 subsequent siblings)
4 siblings, 1 reply; 30+ messages in thread
From: kernel test robot @ 2023-10-24 14:30 UTC (permalink / raw)
To: chengming.zhou; +Cc: oe-kbuild-all
Hi,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on vbabka-slab/for-next]
[also build test ERROR on linus/master v6.6-rc7 next-20231024]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/slub-Keep-track-of-whether-slub-is-on-the-per-node-partial-list/20231024-173519
base: git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git for-next
patch link: https://lore.kernel.org/r/20231024093345.3676493-7-chengming.zhou%40linux.dev
patch subject: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
config: sh-allnoconfig (https://download.01.org/0day-ci/archive/20231024/202310242209.VBW1Fewa-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231024/202310242209.VBW1Fewa-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310242209.VBW1Fewa-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/slub.c: In function '___slab_alloc':
>> mm/slub.c:3177:29: error: 'struct slab' has no member named 'next'
3177 | slab->next = NULL;
| ^~
>> mm/slub.c:3178:25: error: implicit declaration of function '__unfreeze_partials'; did you mean 'unfreeze_partials'? [-Werror=implicit-function-declaration]
3178 | __unfreeze_partials(s, slab);
| ^~~~~~~~~~~~~~~~~~~
| unfreeze_partials
cc1: some warnings being treated as errors
vim +3177 mm/slub.c
3040
3041 /*
3042 * Slow path. The lockless freelist is empty or we need to perform
3043 * debugging duties.
3044 *
3045 * Processing is still very fast if new objects have been freed to the
3046 * regular freelist. In that case we simply take over the regular freelist
3047 * as the lockless freelist and zap the regular freelist.
3048 *
3049 * If that is not working then we fall back to the partial lists. We take the
3050 * first element of the freelist as the object to allocate now and move the
3051 * rest of the freelist to the lockless freelist.
3052 *
3053 * And if we were unable to get a new slab from the partial slab lists then
3054 * we need to allocate a new slab. This is the slowest path since it involves
3055 * a call to the page allocator and the setup of a new slab.
3056 *
3057 * Version of __slab_alloc to use when we know that preemption is
3058 * already disabled (which is the case for bulk allocation).
3059 */
3060 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
3061 unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
3062 {
3063 void *freelist;
3064 struct slab *slab;
3065 unsigned long flags;
3066 struct partial_context pc;
3067
3068 stat(s, ALLOC_SLOWPATH);
3069
3070 reread_slab:
3071
3072 slab = READ_ONCE(c->slab);
3073 if (!slab) {
3074 /*
3075 * if the node is not online or has no normal memory, just
3076 * ignore the node constraint
3077 */
3078 if (unlikely(node != NUMA_NO_NODE &&
3079 !node_isset(node, slab_nodes)))
3080 node = NUMA_NO_NODE;
3081 goto new_slab;
3082 }
3083
3084 if (unlikely(!node_match(slab, node))) {
3085 /*
3086 * same as above but node_match() being false already
3087 * implies node != NUMA_NO_NODE
3088 */
3089 if (!node_isset(node, slab_nodes)) {
3090 node = NUMA_NO_NODE;
3091 } else {
3092 stat(s, ALLOC_NODE_MISMATCH);
3093 goto deactivate_slab;
3094 }
3095 }
3096
3097 /*
3098 * By rights, we should be searching for a slab page that was
3099 * PFMEMALLOC but right now, we are losing the pfmemalloc
3100 * information when the page leaves the per-cpu allocator
3101 */
3102 if (unlikely(!pfmemalloc_match(slab, gfpflags)))
3103 goto deactivate_slab;
3104
3105 /* must check again c->slab in case we got preempted and it changed */
3106 local_lock_irqsave(&s->cpu_slab->lock, flags);
3107 if (unlikely(slab != c->slab)) {
3108 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3109 goto reread_slab;
3110 }
3111 freelist = c->freelist;
3112 if (freelist)
3113 goto load_freelist;
3114
3115 freelist = get_freelist(s, slab);
3116
3117 if (!freelist) {
3118 c->slab = NULL;
3119 c->tid = next_tid(c->tid);
3120 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3121 stat(s, DEACTIVATE_BYPASS);
3122 goto new_slab;
3123 }
3124
3125 stat(s, ALLOC_REFILL);
3126
3127 load_freelist:
3128
3129 lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock));
3130
3131 /*
3132 * freelist is pointing to the list of objects to be used.
3133 * slab is pointing to the slab from which the objects are obtained.
3134 * That slab must be frozen for per cpu allocations to work.
3135 */
3136 VM_BUG_ON(!c->slab->frozen);
3137 c->freelist = get_freepointer(s, freelist);
3138 c->tid = next_tid(c->tid);
3139 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3140 return freelist;
3141
3142 deactivate_slab:
3143
3144 local_lock_irqsave(&s->cpu_slab->lock, flags);
3145 if (slab != c->slab) {
3146 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3147 goto reread_slab;
3148 }
3149 freelist = c->freelist;
3150 c->slab = NULL;
3151 c->freelist = NULL;
3152 c->tid = next_tid(c->tid);
3153 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3154 deactivate_slab(s, slab, freelist);
3155
3156 new_slab:
3157
3158 while (slub_percpu_partial(c)) {
3159 local_lock_irqsave(&s->cpu_slab->lock, flags);
3160 if (unlikely(c->slab)) {
3161 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3162 goto reread_slab;
3163 }
3164 if (unlikely(!slub_percpu_partial(c))) {
3165 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3166 /* we were preempted and partial list got empty */
3167 goto new_objects;
3168 }
3169
3170 slab = slub_percpu_partial(c);
3171 slub_set_percpu_partial(c, slab);
3172 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3173 stat(s, CPU_PARTIAL_ALLOC);
3174
3175 if (unlikely(!node_match(slab, node) ||
3176 !pfmemalloc_match(slab, gfpflags))) {
> 3177 slab->next = NULL;
> 3178 __unfreeze_partials(s, slab);
3179 continue;
3180 }
3181
3182 freelist = freeze_slab(s, slab);
3183 goto retry_load_slab;
3184 }
3185
3186 new_objects:
3187
3188 pc.flags = gfpflags;
3189 pc.orig_size = orig_size;
3190 slab = get_partial(s, node, &pc);
3191 if (slab) {
3192 if (kmem_cache_debug(s)) {
3193 freelist = pc.object;
3194 /*
3195 * For debug caches here we had to go through
3196 * alloc_single_from_partial() so just store the
3197 * tracking info and return the object.
3198 */
3199 if (s->flags & SLAB_STORE_USER)
3200 set_track(s, freelist, TRACK_ALLOC, addr);
3201
3202 return freelist;
3203 }
3204
3205 freelist = freeze_slab(s, slab);
3206 goto retry_load_slab;
3207 }
3208
3209 slub_put_cpu_ptr(s->cpu_slab);
3210 slab = new_slab(s, gfpflags, node);
3211 c = slub_get_cpu_ptr(s->cpu_slab);
3212
3213 if (unlikely(!slab)) {
3214 slab_out_of_memory(s, gfpflags, node);
3215 return NULL;
3216 }
3217
3218 stat(s, ALLOC_SLAB);
3219
3220 if (kmem_cache_debug(s)) {
3221 freelist = alloc_single_from_new_slab(s, slab, orig_size);
3222
3223 if (unlikely(!freelist))
3224 goto new_objects;
3225
3226 if (s->flags & SLAB_STORE_USER)
3227 set_track(s, freelist, TRACK_ALLOC, addr);
3228
3229 return freelist;
3230 }
3231
3232 /*
3233 * No other reference to the slab yet so we can
3234 * muck around with it freely without cmpxchg
3235 */
3236 freelist = slab->freelist;
3237 slab->freelist = NULL;
3238 slab->inuse = slab->objects;
3239 slab->frozen = 1;
3240
3241 inc_slabs_node(s, slab_nid(slab), slab->objects);
3242
3243 if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
3244 /*
3245 * For !pfmemalloc_match() case we don't load freelist so that
3246 * we don't make further mismatched allocations easier.
3247 */
3248 deactivate_slab(s, slab, get_freepointer(s, freelist));
3249 return freelist;
3250 }
3251
3252 retry_load_slab:
3253
3254 local_lock_irqsave(&s->cpu_slab->lock, flags);
3255 if (unlikely(c->slab)) {
3256 void *flush_freelist = c->freelist;
3257 struct slab *flush_slab = c->slab;
3258
3259 c->slab = NULL;
3260 c->freelist = NULL;
3261 c->tid = next_tid(c->tid);
3262
3263 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
3264
3265 deactivate_slab(s, flush_slab, flush_freelist);
3266
3267 stat(s, CPUSLAB_FLUSH);
3268
3269 goto retry_load_slab;
3270 }
3271 c->slab = slab;
3272
3273 goto load_freelist;
3274 }
3275
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 14:30 ` kernel test robot
@ 2023-10-24 14:42 ` Chengming Zhou
0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-24 14:42 UTC (permalink / raw)
To: kernel test robot; +Cc: oe-kbuild-all
On 2023/10/24 22:30, kernel test robot wrote:
> Hi,
>
> [This is a private test report for your RFC patch.]
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on vbabka-slab/for-next]
> [also build test ERROR on linus/master v6.6-rc7 next-20231024]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/slub-Keep-track-of-whether-slub-is-on-the-per-node-partial-list/20231024-173519
> base: git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git for-next
> patch link: https://lore.kernel.org/r/20231024093345.3676493-7-chengming.zhou%40linux.dev
> patch subject: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
> config: sh-allnoconfig (https://download.01.org/0day-ci/archive/20231024/202310242209.VBW1Fewa-lkp@intel.com/config)
> compiler: sh4-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231024/202310242209.VBW1Fewa-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202310242209.VBW1Fewa-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> mm/slub.c: In function '___slab_alloc':
>>> mm/slub.c:3177:29: error: 'struct slab' has no member named 'next'
> 3177 | slab->next = NULL;
> | ^~
>>> mm/slub.c:3178:25: error: implicit declaration of function '__unfreeze_partials'; did you mean 'unfreeze_partials'? [-Werror=implicit-function-declaration]
> 3178 | __unfreeze_partials(s, slab);
> | ^~~~~~~~~~~~~~~~~~~
> | unfreeze_partials
> cc1: some warnings being treated as errors
Sorry, oops. I forgot to test without CONFIG_SLUB_CPU_PARTIAL.
Will fix it.
Thanks.
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
2023-10-24 14:30 ` kernel test robot
@ 2023-10-24 15:22 ` kernel test robot
2023-10-25 2:18 ` Chengming Zhou
` (2 subsequent siblings)
4 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2023-10-24 15:22 UTC (permalink / raw)
To: chengming.zhou; +Cc: llvm, oe-kbuild-all
Hi,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on vbabka-slab/for-next]
[also build test ERROR on linus/master v6.6-rc7 next-20231024]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/slub-Keep-track-of-whether-slub-is-on-the-per-node-partial-list/20231024-173519
base: git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git for-next
patch link: https://lore.kernel.org/r/20231024093345.3676493-7-chengming.zhou%40linux.dev
patch subject: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
config: um-allnoconfig (https://download.01.org/0day-ci/archive/20231024/202310242306.AvWAQtIn-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231024/202310242306.AvWAQtIn-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202310242306.AvWAQtIn-lkp@intel.com/
All errors (new ones prefixed by >>):
mm/slub.c:2235:15: warning: variable 'partial_slabs' set but not used [-Wunused-but-set-variable]
2235 | unsigned int partial_slabs = 0;
| ^
>> mm/slub.c:3177:10: error: no member named 'next' in 'struct slab'
3177 | slab->next = NULL;
| ~~~~ ^
>> mm/slub.c:3178:4: error: call to undeclared function '__unfreeze_partials'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
3178 | __unfreeze_partials(s, slab);
| ^
mm/slub.c:3178:4: note: did you mean 'unfreeze_partials'?
mm/slub.c:2676:20: note: 'unfreeze_partials' declared here
2676 | static inline void unfreeze_partials(struct kmem_cache *s) { }
| ^
13 warnings and 2 errors generated.
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
2023-10-24 14:30 ` kernel test robot
2023-10-24 15:22 ` kernel test robot
@ 2023-10-25 2:18 ` Chengming Zhou
2023-10-26 5:49 ` kernel test robot
2023-10-31 9:50 ` Vlastimil Babka
4 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-25 2:18 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On 2023/10/24 17:33, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Currently we freeze slabs when moving them from the node partial list
> to the cpu partial list, a method that needs two cmpxchg_double
> operations:
>
> 1. freeze the slab (acquire_slab()) under the node list_lock
> 2. get_freelist() when the slab is picked for use in ___slab_alloc()
>
> Actually we don't need to freeze when moving slabs out of the node
> partial list; we can delay freezing until the slab freelist is used in
> ___slab_alloc(), saving one cmpxchg_double().
>
> There are other benefits as well:
> - Moving slabs between the node partial list and the cpu partial list
> becomes simpler, since we don't need to freeze or unfreeze at all.
>
> - Node list_lock contention is reduced, since we no longer freeze any
> slab under the node list_lock.
>
> We can achieve this because no concurrent path manipulates the partial
> slab list except __slab_free(), which is now serialized.
>
> Since the slab returned by the get_partial() interfaces is no longer
> frozen and there is no freelist in the partial_context, we need the
> newly introduced freeze_slab() to freeze it and get its freelist.
>
> Similarly, slabs on the CPU partial list are no longer frozen, so we
> need to call freeze_slab() on them before use.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
> mm/slub.c | 111 +++++++++++-------------------------------------------
> 1 file changed, 21 insertions(+), 90 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 5b428648021f..486d44421432 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2215,51 +2215,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
> return object;
> }
>
> -/*
> - * Remove slab from the partial list, freeze it and
> - * return the pointer to the freelist.
> - *
> - * Returns a list of objects or NULL if it fails.
> - */
> -static inline void *acquire_slab(struct kmem_cache *s,
> - struct kmem_cache_node *n, struct slab *slab,
> - int mode)
> -{
> - void *freelist;
> - unsigned long counters;
> - struct slab new;
> -
> - lockdep_assert_held(&n->list_lock);
> -
> - /*
> - * Zap the freelist and set the frozen bit.
> - * The old freelist is the list of objects for the
> - * per cpu allocation list.
> - */
> - freelist = slab->freelist;
> - counters = slab->counters;
> - new.counters = counters;
> - if (mode) {
> - new.inuse = slab->objects;
> - new.freelist = NULL;
> - } else {
> - new.freelist = freelist;
> - }
> -
> - VM_BUG_ON(new.frozen);
> - new.frozen = 1;
> -
> - if (!__slab_update_freelist(s, slab,
> - freelist, counters,
> - new.freelist, new.counters,
> - "acquire_slab"))
> - return NULL;
> -
> - remove_partial(n, slab);
> - WARN_ON(!freelist);
> - return freelist;
> -}
> -
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain);
> #else
> @@ -2276,7 +2231,6 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> struct partial_context *pc)
> {
> struct slab *slab, *slab2, *partial = NULL;
> - void *object = NULL;
> unsigned long flags;
> unsigned int partial_slabs = 0;
>
> @@ -2295,7 +2249,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> continue;
>
> if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
> - object = alloc_single_from_partial(s, n, slab,
> + void *object = alloc_single_from_partial(s, n, slab,
> pc->orig_size);
> if (object) {
> partial = slab;
> @@ -2305,13 +2259,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
> continue;
> }
>
> - object = acquire_slab(s, n, slab, object == NULL);
> - if (!object)
> - break;
> + remove_partial(n, slab);
>
> if (!partial) {
> partial = slab;
> - pc->object = object;
> stat(s, ALLOC_FROM_PARTIAL);
> } else {
> put_cpu_partial(s, slab, 0);
> @@ -2610,9 +2561,6 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
> unsigned long flags = 0;
>
> while (partial_slab) {
> - struct slab new;
> - struct slab old;
> -
> slab = partial_slab;
> partial_slab = slab->next;
>
> @@ -2625,23 +2573,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
> spin_lock_irqsave(&n->list_lock, flags);
> }
>
> - do {
> -
> - old.freelist = slab->freelist;
> - old.counters = slab->counters;
> - VM_BUG_ON(!old.frozen);
> -
> - new.counters = old.counters;
> - new.freelist = old.freelist;
> -
> - new.frozen = 0;
> -
> - } while (!__slab_update_freelist(s, slab,
> - old.freelist, old.counters,
> - new.freelist, new.counters,
> - "unfreezing slab"));
> -
> - if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
> + if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
> slab->next = slab_to_discard;
> slab_to_discard = slab;
> } else {
> @@ -3148,7 +3080,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> node = NUMA_NO_NODE;
> goto new_slab;
> }
> -redo:
>
> if (unlikely(!node_match(slab, node))) {
> /*
> @@ -3224,7 +3155,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>
> new_slab:
>
> - if (slub_percpu_partial(c)) {
> + while (slub_percpu_partial(c)) {
> local_lock_irqsave(&s->cpu_slab->lock, flags);
> if (unlikely(c->slab)) {
> local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> @@ -3236,11 +3167,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> goto new_objects;
> }
>
> - slab = c->slab = slub_percpu_partial(c);
> + slab = slub_percpu_partial(c);
> slub_set_percpu_partial(c, slab);
> local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> stat(s, CPU_PARTIAL_ALLOC);
> - goto redo;
> +
> + if (unlikely(!node_match(slab, node) ||
> + !pfmemalloc_match(slab, gfpflags))) {
> + slab->next = NULL;
> + __unfreeze_partials(s, slab);
> + continue;
> + }
> +
> + freelist = freeze_slab(s, slab);
> + goto retry_load_slab;
> }
Oops, this while (slub_percpu_partial(c)) loop block should be put inside #ifdef CONFIG_SLUB_CPU_PARTIAL,
since slab->next and __unfreeze_partials() are only defined when CONFIG_SLUB_CPU_PARTIAL is enabled.
I should also append a cleanup patch renaming all the *unfreeze_partials* functions to *put_partials*,
since these functions don't do any unfreezing anymore.
Will do in the next version.
Thanks.
>
> new_objects:
> @@ -3249,8 +3189,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> pc.orig_size = orig_size;
> slab = get_partial(s, node, &pc);
> if (slab) {
> - freelist = pc.object;
> if (kmem_cache_debug(s)) {
> + freelist = pc.object;
> /*
> * For debug caches here we had to go through
> * alloc_single_from_partial() so just store the
> @@ -3262,6 +3202,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> return freelist;
> }
>
> + freelist = freeze_slab(s, slab);
> goto retry_load_slab;
> }
>
> @@ -3663,18 +3604,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> was_frozen = new.frozen;
> new.inuse -= cnt;
> if ((!new.inuse || !prior) && !was_frozen) {
> -
> - if (kmem_cache_has_cpu_partial(s) && !prior) {
> -
> - /*
> - * Slab was on no list before and will be
> - * partially empty
> - * We can defer the list move and instead
> - * freeze it.
> - */
> - new.frozen = 1;
> -
> - } else { /* Needs to be taken off a list */
> + /* Needs to be taken off a list */
> + if (!kmem_cache_has_cpu_partial(s) || prior) {
>
> n = get_node(s, slab_nid(slab));
> /*
> @@ -3704,9 +3635,9 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
> * activity can be necessary.
> */
> stat(s, FREE_FROZEN);
> - } else if (new.frozen) {
> + } else if (kmem_cache_has_cpu_partial(s) && !prior) {
> /*
> - * If we just froze the slab then put it onto the
> + * If we started with a full slab then put it onto the
> * per cpu partial list.
> */
> put_cpu_partial(s, slab, 1);
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
` (2 preceding siblings ...)
2023-10-25 2:18 ` Chengming Zhou
@ 2023-10-26 5:49 ` kernel test robot
2023-10-26 7:41 ` Chengming Zhou
2023-10-31 9:50 ` Vlastimil Babka
4 siblings, 1 reply; 30+ messages in thread
From: kernel test robot @ 2023-10-26 5:49 UTC (permalink / raw)
To: chengming.zhou
Cc: oe-lkp, lkp, linux-mm, cl, penberg, rientjes, iamjoonsoo.kim,
akpm, vbabka, roman.gushchin, 42.hyeyoo, linux-kernel,
chengming.zhou, Chengming Zhou, oliver.sang
Hello,
kernel test robot noticed "WARNING:at_mm/slub.c:#___slab_alloc" on:
commit: b34342e2732b0dc92b29d6807c5314e2e5e0c27c ("[RFC PATCH v3 6/7] slub: Delay freezing of partial slabs")
url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/slub-Keep-track-of-whether-slub-is-on-the-per-node-partial-list/20231024-173519
base: git://git.kernel.org/cgit/linux/kernel/git/vbabka/slab.git for-next
patch link: https://lore.kernel.org/all/20231024093345.3676493-7-chengming.zhou@linux.dev/
patch subject: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
in testcase: rcutorture
version:
with following parameters:
runtime: 300s
test: default
torture_type: rcu
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+------------------------------------------------+------------+------------+
| | e87d1d9973 | b34342e273 |
+------------------------------------------------+------------+------------+
| boot_successes | 15 | 0 |
| boot_failures | 0 | 15 |
| WARNING:at_mm/slub.c:#___slab_alloc | 0 | 15 |
| RIP:___slab_alloc | 0 | 15 |
+------------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310261032.b4590eb0-oliver.sang@intel.com
[ 4.907500][ T1] ------------[ cut here ]------------
[ 4.908232][ T1] WARNING: CPU: 0 PID: 1 at mm/slub.c:577 ___slab_alloc (mm/slub.c:577 mm/slub.c:3033 mm/slub.c:3205)
[ 4.909242][ T1] Modules linked in:
[ 4.909739][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc5-00013-gb34342e2732b #1
[ 4.910746][ T1] RIP: 0010:___slab_alloc (mm/slub.c:577 mm/slub.c:3033 mm/slub.c:3205)
[ 4.911433][ T1] Code: 9f 05 66 25 ff 7f 66 89 45 b8 48 8b 4d b8 45 85 e4 74 1a 65 8b 05 dd c4 a3 7e 85 c0 75 0f 65 8b 05 de c1 87 7e 85 c0 74 04 90 <0f> 0b 90 81 e6 00 00 00 40 74 31 4c 89 f8 f0 49 0f c7 48 20 0f 84
All code
========
0: 9f lahf
1: 05 66 25 ff 7f add $0x7fff2566,%eax
6: 66 89 45 b8 mov %ax,-0x48(%rbp)
a: 48 8b 4d b8 mov -0x48(%rbp),%rcx
e: 45 85 e4 test %r12d,%r12d
11: 74 1a je 0x2d
13: 65 8b 05 dd c4 a3 7e mov %gs:0x7ea3c4dd(%rip),%eax # 0x7ea3c4f7
1a: 85 c0 test %eax,%eax
1c: 75 0f jne 0x2d
1e: 65 8b 05 de c1 87 7e mov %gs:0x7e87c1de(%rip),%eax # 0x7e87c203
25: 85 c0 test %eax,%eax
27: 74 04 je 0x2d
29: 90 nop
2a:* 0f 0b ud2 <-- trapping instruction
2c: 90 nop
2d: 81 e6 00 00 00 40 and $0x40000000,%esi
33: 74 31 je 0x66
35: 4c 89 f8 mov %r15,%rax
38: f0 49 0f c7 48 20 lock cmpxchg16b 0x20(%r8)
3e: 0f .byte 0xf
3f: 84 .byte 0x84
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 90 nop
3: 81 e6 00 00 00 40 and $0x40000000,%esi
9: 74 31 je 0x3c
b: 4c 89 f8 mov %r15,%rax
e: f0 49 0f c7 48 20 lock cmpxchg16b 0x20(%r8)
14: 0f .byte 0xf
15: 84 .byte 0x84
[ 4.913822][ T1] RSP: 0000:ffffc9000001f830 EFLAGS: 00010202
[ 4.914602][ T1] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000080270027
[ 4.914743][ T1] RDX: 0000000000270021 RSI: 0000000048000000 RDI: ffffffff842e066b
[ 4.915745][ T1] RBP: ffffc9000001f8e0 R08: ffffea0005931d40 R09: fffffbfff0e33a5a
[ 4.916743][ T1] R10: ffffffff8719d2d7 R11: 0000000000000000 R12: 0000000000000001
[ 4.917696][ T1] R13: ffff8883ae809430 R14: ffff88810033e3c0 R15: ffff888164c75dd0
[ 4.918688][ T1] FS: 0000000000000000(0000) GS:ffff8883ae600000(0000) knlGS:0000000000000000
[ 4.918754][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.919522][ T1] CR2: ffff88843ffff000 CR3: 0000000005c89000 CR4: 00000000000406f0
[ 4.920500][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.921496][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4.922461][ T1] Call Trace:
[ 4.922742][ T1] <TASK>
[ 4.923107][ T1] ? __warn (kernel/panic.c:673)
[ 4.923632][ T1] ? ___slab_alloc (mm/slub.c:577 mm/slub.c:3033 mm/slub.c:3205)
[ 4.924250][ T1] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 4.924831][ T1] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 4.925376][ T1] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 4.925962][ T1] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 4.926595][ T1] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 4.926741][ T1] ? ___slab_alloc (mm/slub.c:577 mm/slub.c:3033 mm/slub.c:3205)
[ 4.927370][ T1] ? acpi_ut_allocate_object_desc_dbg (include/linux/slab.h:710 include/acpi/platform/aclinuxex.h:67 drivers/acpi/acpica/utobject.c:359)
[ 4.928186][ T1] ? kmem_cache_alloc (mm/slub.c:3295 mm/slub.c:3348 mm/slub.c:3440 mm/slub.c:3458 mm/slub.c:3465 mm/slub.c:3474)
[ 4.928824][ T1] kmem_cache_alloc (mm/slub.c:3295 mm/slub.c:3348 mm/slub.c:3440 mm/slub.c:3458 mm/slub.c:3465 mm/slub.c:3474)
[ 4.929442][ T1] ? acpi_ut_allocate_object_desc_dbg (include/linux/slab.h:710 include/acpi/platform/aclinuxex.h:67 drivers/acpi/acpica/utobject.c:359)
[ 4.930221][ T1] acpi_ut_allocate_object_desc_dbg (include/linux/slab.h:710 include/acpi/platform/aclinuxex.h:67 drivers/acpi/acpica/utobject.c:359)
[ 4.930743][ T1] acpi_ut_create_internal_object_dbg (drivers/acpi/acpica/utobject.c:71)
[ 4.931576][ T1] acpi_ut_copy_esimple_to_isimple (drivers/acpi/acpica/utcopy.c:434)
[ 4.932394][ T1] acpi_ut_copy_eobject_to_iobject (drivers/acpi/acpica/utcopy.c:618)
[ 4.933185][ T1] ? __kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
[ 4.933816][ T1] acpi_evaluate_object (drivers/acpi/acpica/nsxfeval.c:259)
[ 4.934486][ T1] ? acpi_get_data_full (drivers/acpi/acpica/nsxfeval.c:167)
[ 4.934756][ T1] ? hlock_class (arch/x86/include/asm/bitops.h:228 arch/x86/include/asm/bitops.h:240 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228)
[ 4.935345][ T1] ? __uuid_parse+0xd0/0x1b0
[ 4.936024][ T1] acpi_run_osc (drivers/acpi/bus.c:217)
[ 4.936609][ T1] ? acpi_bus_detach_private_data (drivers/acpi/bus.c:187)
[ 4.937363][ T1] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 4.938127][ T1] ? acpi_get_data (drivers/acpi/acpica/nsxfname.c:48)
[ 4.938710][ T1] acpi_bus_init (drivers/acpi/bus.c:352 drivers/acpi/bus.c:1329)
[ 4.938742][ T1] ? up (include/linux/list.h:373 kernel/locking/semaphore.c:188)
[ 4.939205][ T1] ? acpi_sleep_proc_init (drivers/acpi/bus.c:1289)
[ 4.939862][ T1] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 4.940581][ T1] ? acpi_os_signal_semaphore (drivers/acpi/osl.c:1294 (discriminator 5))
[ 4.941313][ T1] ? acpi_ut_release_mutex (drivers/acpi/acpica/utmutex.c:329)
[ 4.942007][ T1] ? acpi_install_address_space_handler_internal (drivers/acpi/acpica/evxfregn.c:94)
[ 4.942744][ T1] ? acpi_bus_init (drivers/acpi/bus.c:1390)
[ 4.943325][ T1] acpi_init (drivers/acpi/bus.c:1404)
[ 4.943832][ T1] ? acpi_bus_init (drivers/acpi/bus.c:1390)
[ 4.944420][ T1] ? kb3886_init (drivers/video/fbdev/core/fbmem.c:1111)
[ 4.944976][ T1] ? acpi_bus_init (drivers/acpi/bus.c:1390)
[ 4.945579][ T1] do_one_initcall (init/main.c:1232)
[ 4.946157][ T1] ? trace_event_raw_event_initcall_level (init/main.c:1223)
[ 4.946747][ T1] ? kasan_set_track (mm/kasan/common.c:52)
[ 4.947373][ T1] ? __kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
[ 4.948010][ T1] do_initcalls (init/main.c:1293 init/main.c:1310)
[ 4.948614][ T1] kernel_init_freeable (init/main.c:1549)
[ 4.949315][ T1] ? rest_init (init/main.c:1429)
[ 4.949894][ T1] kernel_init (init/main.c:1439)
[ 4.950743][ T1] ? _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
[ 4.951386][ T1] ret_from_fork (arch/x86/kernel/process.c:153)
[ 4.951957][ T1] ? rest_init (init/main.c:1429)
[ 4.952520][ T1] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
[ 4.953143][ T1] </TASK>
[ 4.953514][ T1] irq event stamp: 85121
[ 4.954040][ T1] hardirqs last enabled at (85129): console_unlock (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 arch/x86/include/asm/irqflags.h:135 kernel/printk/printk.c:347 kernel/printk/printk.c:2718 kernel/printk/printk.c:3037)
[ 4.954736][ T1] hardirqs last disabled at (85138): console_unlock (kernel/printk/printk.c:345 kernel/printk/printk.c:2718 kernel/printk/printk.c:3037)
[ 4.955884][ T1] softirqs last enabled at (84862): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:400 kernel/softirq.c:582)
[ 4.957017][ T1] softirqs last disabled at (84857): irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644)
[ 4.958128][ T1] ---[ end trace 0000000000000000 ]---
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231026/202310261032.b4590eb0-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-26 5:49 ` kernel test robot
@ 2023-10-26 7:41 ` Chengming Zhou
0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-26 7:41 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-mm, cl, penberg, rientjes, iamjoonsoo.kim,
akpm, vbabka, roman.gushchin, 42.hyeyoo, linux-kernel,
Chengming Zhou
On 2023/10/26 13:49, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "WARNING:at_mm/slub.c:#___slab_alloc" on:
>
> commit: b34342e2732b0dc92b29d6807c5314e2e5e0c27c ("[RFC PATCH v3 6/7] slub: Delay freezing of partial slabs")
> url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/slub-Keep-track-of-whether-slub-is-on-the-per-node-partial-list/20231024-173519
> base: git://git.kernel.org/cgit/linux/kernel/git/vbabka/slab.git for-next
> patch link: https://lore.kernel.org/all/20231024093345.3676493-7-chengming.zhou@linux.dev/
> patch subject: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
>
> in testcase: rcutorture
> version:
> with following parameters:
>
> runtime: 300s
> test: default
> torture_type: rcu
>
>
>
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> +------------------------------------------------+------------+------------+
> | | e87d1d9973 | b34342e273 |
> +------------------------------------------------+------------+------------+
> | boot_successes | 15 | 0 |
> | boot_failures | 0 | 15 |
> | WARNING:at_mm/slub.c:#___slab_alloc | 0 | 15 |
> | RIP:___slab_alloc | 0 | 15 |
> +------------------------------------------------+------------+------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202310261032.b4590eb0-oliver.sang@intel.com
>
>
> [ 4.907500][ T1] ------------[ cut here ]------------
> [ 4.908232][ T1] WARNING: CPU: 0 PID: 1 at mm/slub.c:577 ___slab_alloc (mm/slub.c:577 mm/slub.c:3033 mm/slub.c:3205)
> [ 4.909242][ T1] Modules linked in:
> [ 4.909739][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc5-00013-gb34342e2732b #1
> [ 4.910746][ T1] RIP: 0010:___slab_alloc (mm/slub.c:577 mm/slub.c:3033 mm/slub.c:3205)
The warning is triggered by the lockdep_assert_irqs_disabled() in __slab_update_freelist(),
which is called from the newly introduced freeze_slab().
We should fix it by using slab_update_freelist() in freeze_slab() instead, which
disables interrupts correctly.
Thanks!
* Re: [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
` (3 preceding siblings ...)
2023-10-26 5:49 ` kernel test robot
@ 2023-10-31 9:50 ` Vlastimil Babka
4 siblings, 0 replies; 30+ messages in thread
From: Vlastimil Babka @ 2023-10-31 9:50 UTC (permalink / raw)
To: chengming.zhou, cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
linux-mm, linux-kernel, Chengming Zhou
On 10/24/23 11:33, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Now we will freeze slabs when moving them out of node partial list to
> cpu partial list, this method needs two cmpxchg_double operations:
>
> 1. freeze slab (acquire_slab()) under the node list_lock
> 2. get_freelist() when pick used in ___slab_alloc()
>
> Actually we don't need to freeze when moving slabs out of node partial
> list, we can delay freezing to when use slab freelist in ___slab_alloc(),
> so we can save one cmpxchg_double().
>
> And there are other good points:
> - The moving of slabs between node partial list and cpu partial list
> becomes simpler, since we don't need to freeze or unfreeze at all.
>
> - The node list_lock contention would be less, since we don't need to
> freeze any slab under the node list_lock.
>
> We can achieve this because there is no concurrent path would manipulate
> the partial slab list except the __slab_free() path, which is serialized
> now.
"which is now serialized by slab_test_node_partial() under the list_lock." ?
> Since the slab returned by get_partial() interfaces is not frozen anymore
> and no freelist in the partial_context, so we need to use the introduced
^ is returned in
> freeze_slab() to freeze it and get its freelist.
>
> Similarly, the slabs on the CPU partial list are not frozen anymore,
> we need to freeze_slab() on it before use.
We can now delete acquire_slab() as it became unused.
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
With the fixup for CONFIG_SLUB_CPU_PARTIAL you mentioned,
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Also agreed with followup patch to rename unfreeze_partials().
Thanks!
* [RFC PATCH v3 7/7] slub: Optimize deactivate_slab()
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
` (5 preceding siblings ...)
2023-10-24 9:33 ` [RFC PATCH v3 6/7] slub: Delay freezing of partial slabs chengming.zhou
@ 2023-10-24 9:33 ` chengming.zhou
2023-10-31 11:15 ` Vlastimil Babka
2023-10-27 17:57 ` [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs Christoph Lameter
7 siblings, 1 reply; 30+ messages in thread
From: chengming.zhou @ 2023-10-24 9:33 UTC (permalink / raw)
To: cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, chengming.zhou,
Chengming Zhou
From: Chengming Zhou <zhouchengming@bytedance.com>
Since the introduction of unfrozen slabs on the cpu partial list, we don't
need to synchronize the slab frozen state under the node list_lock.
The caller of deactivate_slab() and the caller of __slab_free() won't
manipulate the slab list concurrently.
So we can take the node list_lock in the last stage, and only if we really
need to manipulate the slab list in this path.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
mm/slub.c | 70 ++++++++++++++++++++-----------------------------------
1 file changed, 25 insertions(+), 45 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 486d44421432..64d550e415eb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2449,10 +2449,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
void *freelist)
{
- enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
struct kmem_cache_node *n = get_node(s, slab_nid(slab));
int free_delta = 0;
- enum slab_modes mode = M_NONE;
void *nextfree, *freelist_iter, *freelist_tail;
int tail = DEACTIVATE_TO_HEAD;
unsigned long flags = 0;
@@ -2499,58 +2497,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
* unfrozen and number of objects in the slab may have changed.
* Then release lock and retry cmpxchg again.
*/
-redo:
-
- old.freelist = READ_ONCE(slab->freelist);
- old.counters = READ_ONCE(slab->counters);
- VM_BUG_ON(!old.frozen);
-
- /* Determine target state of the slab */
- new.counters = old.counters;
- if (freelist_tail) {
- new.inuse -= free_delta;
- set_freepointer(s, freelist_tail, old.freelist);
- new.freelist = freelist;
- } else
- new.freelist = old.freelist;
+ do {
+ old.freelist = READ_ONCE(slab->freelist);
+ old.counters = READ_ONCE(slab->counters);
+ VM_BUG_ON(!old.frozen);
+
+ /* Determine target state of the slab */
+ new.counters = old.counters;
+ new.frozen = 0;
+ if (freelist_tail) {
+ new.inuse -= free_delta;
+ set_freepointer(s, freelist_tail, old.freelist);
+ new.freelist = freelist;
+ } else
+ new.freelist = old.freelist;
- new.frozen = 0;
+ } while (!slab_update_freelist(s, slab,
+ old.freelist, old.counters,
+ new.freelist, new.counters,
+ "unfreezing slab"));
+ /*
+ * Stage three: Manipulate the slab list based on the updated state.
+ */
if (!new.inuse && n->nr_partial >= s->min_partial) {
- mode = M_FREE;
+ stat(s, DEACTIVATE_EMPTY);
+ discard_slab(s, slab);
+ stat(s, FREE_SLAB);
} else if (new.freelist) {
- mode = M_PARTIAL;
- /*
- * Taking the spinlock removes the possibility that
- * acquire_slab() will see a slab that is frozen
- */
spin_lock_irqsave(&n->list_lock, flags);
- } else {
- mode = M_FULL_NOLIST;
- }
-
-
- if (!slab_update_freelist(s, slab,
- old.freelist, old.counters,
- new.freelist, new.counters,
- "unfreezing slab")) {
- if (mode == M_PARTIAL)
- spin_unlock_irqrestore(&n->list_lock, flags);
- goto redo;
- }
-
-
- if (mode == M_PARTIAL) {
add_partial(n, slab, tail);
spin_unlock_irqrestore(&n->list_lock, flags);
stat(s, tail);
- } else if (mode == M_FREE) {
- stat(s, DEACTIVATE_EMPTY);
- discard_slab(s, slab);
- stat(s, FREE_SLAB);
- } else if (mode == M_FULL_NOLIST) {
+ } else
stat(s, DEACTIVATE_FULL);
- }
}
#ifdef CONFIG_SLUB_CPU_PARTIAL
--
2.40.1
* Re: [RFC PATCH v3 7/7] slub: Optimize deactivate_slab()
2023-10-24 9:33 ` [RFC PATCH v3 7/7] slub: Optimize deactivate_slab() chengming.zhou
@ 2023-10-31 11:15 ` Vlastimil Babka
2023-10-31 11:41 ` Chengming Zhou
0 siblings, 1 reply; 30+ messages in thread
From: Vlastimil Babka @ 2023-10-31 11:15 UTC (permalink / raw)
To: chengming.zhou, cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
linux-mm, linux-kernel, Chengming Zhou
On 10/24/23 11:33, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> Since the introduce of unfrozen slabs on cpu partial list, we don't
> need to synchronize the slab frozen state under the node list_lock.
>
> The caller of deactivate_slab() and the caller of __slab_free() won't
> manipulate the slab list concurrently.
>
> So we can get node list_lock in the last stage if we really need to
> manipulate the slab list in this path.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Nice simplification!
> ---
> mm/slub.c | 70 ++++++++++++++++++++-----------------------------------
> 1 file changed, 25 insertions(+), 45 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 486d44421432..64d550e415eb 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2449,10 +2449,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
> static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> void *freelist)
> {
> - enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
> struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> int free_delta = 0;
> - enum slab_modes mode = M_NONE;
> void *nextfree, *freelist_iter, *freelist_tail;
> int tail = DEACTIVATE_TO_HEAD;
> unsigned long flags = 0;
> @@ -2499,58 +2497,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> * unfrozen and number of objects in the slab may have changed.
> * Then release lock and retry cmpxchg again.
> */
This comment above (including parts not visible here) should be updated as
there is no more list manipulation during unfreeze.
> -redo:
> -
> - old.freelist = READ_ONCE(slab->freelist);
> - old.counters = READ_ONCE(slab->counters);
> - VM_BUG_ON(!old.frozen);
> -
> - /* Determine target state of the slab */
> - new.counters = old.counters;
> - if (freelist_tail) {
> - new.inuse -= free_delta;
> - set_freepointer(s, freelist_tail, old.freelist);
> - new.freelist = freelist;
> - } else
> - new.freelist = old.freelist;
> + do {
> + old.freelist = READ_ONCE(slab->freelist);
> + old.counters = READ_ONCE(slab->counters);
> + VM_BUG_ON(!old.frozen);
> +
> + /* Determine target state of the slab */
> + new.counters = old.counters;
> + new.frozen = 0;
> + if (freelist_tail) {
> + new.inuse -= free_delta;
> + set_freepointer(s, freelist_tail, old.freelist);
> + new.freelist = freelist;
> + } else
> + new.freelist = old.freelist;
Per coding style we should have the else with { } even if it's one line, to
match the if branch. Since we touch the code that was previously violating
the style, we can fix up.
>
> - new.frozen = 0;
> + } while (!slab_update_freelist(s, slab,
> + old.freelist, old.counters,
> + new.freelist, new.counters,
> + "unfreezing slab"));
>
> + /*
> + * Stage three: Manipulate the slab list based on the updated state.
> + */
> if (!new.inuse && n->nr_partial >= s->min_partial) {
> - mode = M_FREE;
> + stat(s, DEACTIVATE_EMPTY);
> + discard_slab(s, slab);
> + stat(s, FREE_SLAB);
> } else if (new.freelist) {
> - mode = M_PARTIAL;
> - /*
> - * Taking the spinlock removes the possibility that
> - * acquire_slab() will see a slab that is frozen
> - */
> spin_lock_irqsave(&n->list_lock, flags);
> - } else {
> - mode = M_FULL_NOLIST;
> - }
> -
> -
> - if (!slab_update_freelist(s, slab,
> - old.freelist, old.counters,
> - new.freelist, new.counters,
> - "unfreezing slab")) {
> - if (mode == M_PARTIAL)
> - spin_unlock_irqrestore(&n->list_lock, flags);
> - goto redo;
> - }
> -
> -
> - if (mode == M_PARTIAL) {
> add_partial(n, slab, tail);
> spin_unlock_irqrestore(&n->list_lock, flags);
> stat(s, tail);
> - } else if (mode == M_FREE) {
> - stat(s, DEACTIVATE_EMPTY);
> - discard_slab(s, slab);
> - stat(s, FREE_SLAB);
> - } else if (mode == M_FULL_NOLIST) {
> + } else
> stat(s, DEACTIVATE_FULL);
> - }
Same here.
Thanks!
> }
>
> #ifdef CONFIG_SLUB_CPU_PARTIAL
* Re: [RFC PATCH v3 7/7] slub: Optimize deactivate_slab()
2023-10-31 11:15 ` Vlastimil Babka
@ 2023-10-31 11:41 ` Chengming Zhou
0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-31 11:41 UTC (permalink / raw)
To: Vlastimil Babka, cl, penberg
Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
linux-mm, linux-kernel, Chengming Zhou
On 2023/10/31 19:15, Vlastimil Babka wrote:
> On 10/24/23 11:33, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Since the introduce of unfrozen slabs on cpu partial list, we don't
>> need to synchronize the slab frozen state under the node list_lock.
>>
>> The caller of deactivate_slab() and the caller of __slab_free() won't
>> manipulate the slab list concurrently.
>>
>> So we can get node list_lock in the last stage if we really need to
>> manipulate the slab list in this path.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> Nice simplification!
>
>> ---
>> mm/slub.c | 70 ++++++++++++++++++++-----------------------------------
>> 1 file changed, 25 insertions(+), 45 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 486d44421432..64d550e415eb 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2449,10 +2449,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>> static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>> void *freelist)
>> {
>> - enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>> struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>> int free_delta = 0;
>> - enum slab_modes mode = M_NONE;
>> void *nextfree, *freelist_iter, *freelist_tail;
>> int tail = DEACTIVATE_TO_HEAD;
>> unsigned long flags = 0;
>> @@ -2499,58 +2497,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>> * unfrozen and number of objects in the slab may have changed.
>> * Then release lock and retry cmpxchg again.
>> */
>
> This comment above (including parts not visible here) should be updated as
> there is no more list manipulation during unfreeze.
Right!
>
>> -redo:
>> -
>> - old.freelist = READ_ONCE(slab->freelist);
>> - old.counters = READ_ONCE(slab->counters);
>> - VM_BUG_ON(!old.frozen);
>> -
>> - /* Determine target state of the slab */
>> - new.counters = old.counters;
>> - if (freelist_tail) {
>> - new.inuse -= free_delta;
>> - set_freepointer(s, freelist_tail, old.freelist);
>> - new.freelist = freelist;
>> - } else
>> - new.freelist = old.freelist;
>> + do {
>> + old.freelist = READ_ONCE(slab->freelist);
>> + old.counters = READ_ONCE(slab->counters);
>> + VM_BUG_ON(!old.frozen);
>> +
>> + /* Determine target state of the slab */
>> + new.counters = old.counters;
>> + new.frozen = 0;
>> + if (freelist_tail) {
>> + new.inuse -= free_delta;
>> + set_freepointer(s, freelist_tail, old.freelist);
>> + new.freelist = freelist;
>> + } else
>> + new.freelist = old.freelist;
>
> Per coding style we should have the else with { } even if it's one line, to
> match the if branch. Since we touch the code that was previously violating
> the style, we can fix up.
Ok, I will fix all these.
Big thanks for your review!
>
>>
>> - new.frozen = 0;
>> + } while (!slab_update_freelist(s, slab,
>> + old.freelist, old.counters,
>> + new.freelist, new.counters,
>> + "unfreezing slab"));
>>
>> + /*
>> + * Stage three: Manipulate the slab list based on the updated state.
>> + */
>> if (!new.inuse && n->nr_partial >= s->min_partial) {
>> - mode = M_FREE;
>> + stat(s, DEACTIVATE_EMPTY);
>> + discard_slab(s, slab);
>> + stat(s, FREE_SLAB);
>> } else if (new.freelist) {
>> - mode = M_PARTIAL;
>> - /*
>> - * Taking the spinlock removes the possibility that
>> - * acquire_slab() will see a slab that is frozen
>> - */
>> spin_lock_irqsave(&n->list_lock, flags);
>> - } else {
>> - mode = M_FULL_NOLIST;
>> - }
>> -
>> -
>> - if (!slab_update_freelist(s, slab,
>> - old.freelist, old.counters,
>> - new.freelist, new.counters,
>> - "unfreezing slab")) {
>> - if (mode == M_PARTIAL)
>> - spin_unlock_irqrestore(&n->list_lock, flags);
>> - goto redo;
>> - }
>> -
>> -
>> - if (mode == M_PARTIAL) {
>> add_partial(n, slab, tail);
>> spin_unlock_irqrestore(&n->list_lock, flags);
>> stat(s, tail);
>> - } else if (mode == M_FREE) {
>> - stat(s, DEACTIVATE_EMPTY);
>> - discard_slab(s, slab);
>> - stat(s, FREE_SLAB);
>> - } else if (mode == M_FULL_NOLIST) {
>> + } else
>> stat(s, DEACTIVATE_FULL);
>> - }
>
> Same here.
>
> Thanks!
>
>> }
>>
>> #ifdef CONFIG_SLUB_CPU_PARTIAL
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-24 9:33 [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs chengming.zhou
` (6 preceding siblings ...)
2023-10-24 9:33 ` [RFC PATCH v3 7/7] slub: Optimize deactivate_slab() chengming.zhou
@ 2023-10-27 17:57 ` Christoph Lameter
2023-10-28 2:36 ` Chengming Zhou
7 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2023-10-27 17:57 UTC (permalink / raw)
To: chengming.zhou
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On Tue, 24 Oct 2023, chengming.zhou@linux.dev wrote:
> 2. Solution
> ===========
> We solve these problems by leaving slabs unfrozen when they move out of
> the node partial list onto a CPU partial list, so the "frozen" bit is 0.
>
> These partial slabs won't be manipulated concurrently by the alloc path;
> the only racer is the free path, which may manipulate the list when !inuse.
> So we need to introduce another way to synchronize against it: we reuse
> PG_workingset to keep track of whether the slab is on the node partial
> list or not, and only in that case may we manipulate the slab list.
>
> The slab will be delay-frozen when it's picked for active use by the
> CPU; it becomes full at the same time, in which case we still need to
> rely on the "frozen" bit to avoid manipulating its list. So the slab is
> frozen only when activated for use and unfrozen only when deactivated.
I think we have to clear up our terminology a bit about what a "frozen"
slab is.
Before this patch a frozen slab is not on the node partial list and
therefore its state on the list does not have to be considered during
freeing and other operations. The frozen slab could be actively allocated
from.
From the source:
* Frozen slabs
*
* If a slab is frozen then it is exempt from list management. It is not
* on any list except per cpu partial list. The processor that froze the
* slab is the one who can perform list operations on the slab. Other
* processors may put objects onto the freelist but the processor that
* froze the slab is the only one that can retrieve the objects from the
* slab's freelist.
*
After this patch the PG_workingset indicates the state of being on
the partial lists.
What does "frozen slab" then mean? The slab is being allocated from? Is
that information useful or can we drop the frozen flag?
Update the definition?
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-27 17:57 ` [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs Christoph Lameter
@ 2023-10-28 2:36 ` Chengming Zhou
2023-10-30 16:19 ` Vlastimil Babka
2023-10-30 19:25 ` Christoph Lameter
0 siblings, 2 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-28 2:36 UTC (permalink / raw)
To: Christoph Lameter
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On 2023/10/28 01:57, Christoph Lameter wrote:
> On Tue, 24 Oct 2023, chengming.zhou@linux.dev wrote:
>
>> 2. Solution
>> ===========
>> We solve these problems by leaving slabs unfrozen when they move out of
>> the node partial list onto a CPU partial list, so the "frozen" bit is 0.
>>
>> These partial slabs won't be manipulated concurrently by the alloc path;
>> the only racer is the free path, which may manipulate the list when !inuse.
>> So we need to introduce another way to synchronize against it: we reuse
>> PG_workingset to keep track of whether the slab is on the node partial
>> list or not, and only in that case may we manipulate the slab list.
>>
>> The slab will be delay-frozen when it's picked for active use by the
>> CPU; it becomes full at the same time, in which case we still need to
>> rely on the "frozen" bit to avoid manipulating its list. So the slab is
>> frozen only when activated for use and unfrozen only when deactivated.
>
> I think we have to clear our terminology a bit about what a "frozen" slab is.
Yes, we need to clean up this inconsistent documentation in the source.
>
> Before this patch a frozen slab is not on the node partial list and therefore its state on the list does not have to be considered during freeing and other operations. The frozen slab could be actively allocated from.
>
> From the source:
>
> * Frozen slabs
> *
> * If a slab is frozen then it is exempt from list management. It is not
> * on any list except per cpu partial list. The processor that froze the
~~ except per cpu partial list ~~
A frozen slab is not on any list; it's actively allocated from by the processor
that froze it. IOW, the frozen slab is the cpu slab.
> * slab is the one who can perform list operations on the slab. Other
> * processors may put objects onto the freelist but the processor that
> * froze the slab is the only one that can retrieve the objects from the
> * slab's freelist.
> *
This part I think is unchanged.
>
>
> After this patch the PG_workingset indicates the state of being on the partial lists.
>
> What does "frozen slab" then mean? The slab is being allocated from? Is that information useful or can we drop the frozen flag?
Right, the frozen slab is the cpu slab, which is being allocated from by the cpu that froze it.
IMHO, the "frozen" bit is useful because:
1. PG_workingset is only useful on a partial slab, where it indicates the slab is on the node
partial list, so we can manipulate its list in the __slab_free() path.
2. But for a full slab (slab->freelist == NULL), PG_workingset is not of much use: we can't
safely know whether it's being used as the cpu slab or not from this flag alone. So __slab_free()
still relies on the "frozen" bit to know it.
3. And maintaining "frozen" has no extra cost now, since it's changed together with "freelist"
and the other counters using cmpxchg; we already have that cmpxchg when we start using a slab as the cpu slab.
Maybe I missed something, but I don't know how to drop the frozen flag.
>
> Update the definition?
>
Ok, will add a cleanup patch to update.
Thanks!
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-28 2:36 ` Chengming Zhou
@ 2023-10-30 16:19 ` Vlastimil Babka
2023-10-31 2:29 ` Chengming Zhou
2023-10-30 19:25 ` Christoph Lameter
1 sibling, 1 reply; 30+ messages in thread
From: Vlastimil Babka @ 2023-10-30 16:19 UTC (permalink / raw)
To: Chengming Zhou, Christoph Lameter
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On 10/28/23 04:36, Chengming Zhou wrote:
>>
>>
>> After this patch the PG_workingset indicates the state of being on the partial lists.
>>
>> What does "frozen slab" then mean? The slab is being allocated from? Is that information useful or can we drop the frozen flag?
>
> Right, frozen slab is the cpu slab, which is being allocated from by the cpu that froze it.
>
> IMHO, the "frozen" bit is useful because:
>
> 1. PG_workingset is only useful on partial slab, which indicates the slab is on the node
> partial list, so we can manipulate its list in the __slab_free() path.
>
> 2. But for full slab (slab->freelist == NULL), PG_workingset is not much useful, we don't
> safely know whether it's used as the cpu slab or not just from this flag. So __slab_free()
> still rely on the "frozen" bit to know it.
Well, we could extend the meaning of PG_workingset to mean "not a cpu slab
or percpu partial slab", i.e. both on the node partial list and full. However it
would increase the number of cases where __slab_free() has to take the
list_lock and check PG_workingset. "slab->freelist == NULL" might
happen often exactly because the freelist became the cpu freelist.
> 3. And the maintaining of "frozen" has no extra cost now, since it's changed together with "freelist"
> and other counter using cmpxchg, we already have the cmpxchg when start to use a slab as the cpu slab.
And together with this point, I don't see a reason to drop the frozen bit.
It's still useful for cpu slabs. It just wasn't the best possible solution
for percpu partial slabs.
> Maybe I missed something, I don't know how to drop the frozen flag.
Should be possible, but not worth it IMHO.
>>
>> Update the definition?
>>
>
> Ok, will add a cleanup patch to update.
>
> Thanks!
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-30 16:19 ` Vlastimil Babka
@ 2023-10-31 2:29 ` Chengming Zhou
0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-31 2:29 UTC (permalink / raw)
To: Vlastimil Babka, Christoph Lameter
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On 2023/10/31 00:19, Vlastimil Babka wrote:
> On 10/28/23 04:36, Chengming Zhou wrote:
>>>
>>>
>>> After this patch the PG_workingset indicates the state of being on the partial lists.
>>>
>>> What does "frozen slab" then mean? The slab is being allocated from? Is that information useful or can we drop the frozen flag?
>>
>> Right, frozen slab is the cpu slab, which is being allocated from by the cpu that froze it.
>>
>> IMHO, the "frozen" bit is useful because:
>>
>> 1. PG_workingset is only useful on partial slab, which indicates the slab is on the node
>> partial list, so we can manipulate its list in the __slab_free() path.
>>
>> 2. But for full slab (slab->freelist == NULL), PG_workingset is not much useful, we don't
>> safely know whether it's used as the cpu slab or not just from this flag. So __slab_free()
>> still rely on the "frozen" bit to know it.
>
> Well, we could extend the meaning of PG_workingset to mean "not a cpu slab
> or pecpu partial slab" i.e. both on node partial list and full. However it
> would increase the number of cases where __slab_free() has to lock the
> list_lock and check the PG_working set. "slab->freelist == NULL" might
> happen often exactly because the freelist became cpu freelist.
Ah, right, it would be possible to do it like this.
>
>> 3. And the maintaining of "frozen" has no extra cost now, since it's changed together with "freelist"
>> and other counter using cmpxchg, we already have the cmpxchg when start to use a slab as the cpu slab.
>
> And together with this point, I don't see a reason to drop the frozen bit.
> It's still useful for cpu slabs. It just wasn't the best possible solution
> for percpu partial slabs.
>
>> Maybe I missed something, I don't know how to drop the frozen flag.
>
> Should be possible, but not worth it IMHO.
Agree, we'll just keep "frozen" for the cpu slabs.
Thanks!
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-28 2:36 ` Chengming Zhou
2023-10-30 16:19 ` Vlastimil Babka
@ 2023-10-30 19:25 ` Christoph Lameter
2023-10-31 2:50 ` Chengming Zhou
1 sibling, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2023-10-30 19:25 UTC (permalink / raw)
To: Chengming Zhou
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, vbabka, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On Sat, 28 Oct 2023, Chengming Zhou wrote:
> 2. But for full slab (slab->freelist == NULL), PG_workingset is not much useful, we don't
> safely know whether it's used as the cpu slab or not just from this flag. So __slab_free()
> still rely on the "frozen" bit to know it.
>
> 3. And the maintaining of "frozen" has no extra cost now, since it's changed together with "freelist"
> and other counter using cmpxchg, we already have the cmpxchg when start to use a slab as the cpu slab.
>
> Maybe I missed something, I don't know how to drop the frozen flag.
Maybe frozen is now = PG_Workingset | cmpxchg-frozen?
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-30 19:25 ` Christoph Lameter
@ 2023-10-31 2:50 ` Chengming Zhou
2023-10-31 3:47 ` Christoph Lameter
0 siblings, 1 reply; 30+ messages in thread
From: Chengming Zhou @ 2023-10-31 2:50 UTC (permalink / raw)
To: Christoph Lameter, vbabka
Cc: penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On 2023/10/31 03:25, Christoph Lameter wrote:
> On Sat, 28 Oct 2023, Chengming Zhou wrote:
>
>> 2. But for full slab (slab->freelist == NULL), PG_workingset is not much useful, we don't
>> safely know whether it's used as the cpu slab or not just from this flag. So __slab_free()
>> still rely on the "frozen" bit to know it.
>>
>> 3. And the maintaining of "frozen" has no extra cost now, since it's changed together with "freelist"
>> and other counter using cmpxchg, we already have the cmpxchg when start to use a slab as the cpu slab.
>>
>> Maybe I missed something, I don't know how to drop the frozen flag.
>
>
> Maybe frozen is now = PG_Workingset | cmpxchg-frozen?
>
>
The current scheme (which this series implements) is:
- node partial slabs: PG_Workingset (set or cleared with per-node list_lock protection)
- cpu partial slabs: !PG_Workingset
- cpu slabs: !PG_Workingset && frozen (set or cleared using cmpxchg together with freelist)
- full slabs: !PG_Workingset
As Vlastimil noted, it's possible to drop the "frozen" bit for cpu slabs, but
we keep it for performance: if the "frozen" bit is set, we don't need to grab
the node list_lock to check whether PG_Workingset is set or not.
Thanks!
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-31 2:50 ` Chengming Zhou
@ 2023-10-31 3:47 ` Christoph Lameter
2023-10-31 4:57 ` Chengming Zhou
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Lameter @ 2023-10-31 3:47 UTC (permalink / raw)
To: Chengming Zhou
Cc: vbabka, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On Tue, 31 Oct 2023, Chengming Zhou wrote:
> The current scheme (which this series implemented) is:
>
> - node partial slabs: PG_Workingset (set or clear with per-node list_lock protection)
> - cpu partial slabs: !PG_Workingset
And then the frozen flag needs to be set. Otherwise slab_free() would
conclude it is on a partial list?
> - cpu slabs: !PG_Workingset && frozen (set or clear using cmpxchg together with freelist)
> - full slabs: !PG_Workingset
And frozen is clear? Otherwise it is the same as a cpu partial slab.
> As Vlastimil noted, it's possible to drop "frozen" bit for cpu slabs, but
> we keep it for performance, since we don't need to grab node list_lock to
> check whether PG_Workingset is set or not if the "frozen" bit is set.
>
> Thanks!
>
* Re: [RFC PATCH v3 0/7] slub: Delay freezing of CPU partial slabs
2023-10-31 3:47 ` Christoph Lameter
@ 2023-10-31 4:57 ` Chengming Zhou
0 siblings, 0 replies; 30+ messages in thread
From: Chengming Zhou @ 2023-10-31 4:57 UTC (permalink / raw)
To: Christoph Lameter
Cc: vbabka, penberg, rientjes, iamjoonsoo.kim, akpm, roman.gushchin,
42.hyeyoo, linux-mm, linux-kernel, Chengming Zhou
On 2023/10/31 11:47, Christoph Lameter wrote:
> On Tue, 31 Oct 2023, Chengming Zhou wrote:
>
>> The current scheme (which this series implemented) is:
>>
>> - node partial slabs: PG_Workingset (set or clear with per-node list_lock protection)
>> - cpu partial slabs: !PG_Workingset
>
> And then the frozen flag needs to be set. Otherwise slab_free() would conclude it is on a partial list?
>
- cpu partial slabs: !PG_Workingset && !frozen
Here comes the optimization: "frozen" is not set for cpu partial slabs, so
slab_free() will grab the node list_lock and then see from !PG_Workingset that
the slab is not on a node partial list.
>> - cpu slabs: !PG_Workingset && frozen (set or clear using cmpxchg together with freelist)
>
>
>
>> - full slabs: !PG_Workingset
>
> And frozen is clear? Otherwise it is the same as a cpu partial slab.
>
Right, - full slabs: !PG_Workingset && !frozen
Yes, it looks the same as a cpu partial slab from these two flags alone, but
slab_free() also knows whether the slab was full or not.
>> As Vlastimil noted, it's possible to drop "frozen" bit for cpu slabs, but
>> we keep it for performance, since we don't need to grab node list_lock to
>> check whether PG_Workingset is set or not if the "frozen" bit is set.
>>
>> Thanks!
>>