linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs
@ 2023-10-31 14:07 chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 1/9] slub: Reflow ___slab_alloc() chengming.zhou
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

Changes in RFC v4:
 - Reorder patches to put the two cleanup patches to the front.
 - Move slab_node_partial flag functions to mm/slub.c.
 - Fix freeze_slab() by using slab_update_freelist().
 - Fix build error when !CONFIG_SLUB_CPU_PARTIAL.
 - Add a patch to rename all *unfreeze_partials* functions.
 - Add a patch to update inconsistent documentations in the source.
 - Some comments and changelog improvements.
 - Add Reviewed-by and Suggested-by tags. Many thanks!
 - RFC v3: https://lore.kernel.org/all/20231024093345.3676493-1-chengming.zhou@linux.dev/

Changes in RFC v3:
 - Directly use __set_bit() and __clear_bit() for the slab_node_partial
   flag operations to avoid exporting non-atomic "workingset" interfaces.
 - Change get_partial() related functions to return a slab instead of
   returning the freelist or single object.
 - Don't freeze any slab under the node list_lock to further reduce
   list_lock holding times, as suggested by Vlastimil Babka.
 - Introduce freeze_slab() to do the delay freezing and return freelist.
 - Reorder patches.
 - RFC v2: https://lore.kernel.org/all/20231021144317.3400916-1-chengming.zhou@linux.dev/

Changes in RFC v2:
 - Reuse PG_workingset bit to keep track of whether slub is on the
   per-node partial list, as suggested by Matthew Wilcox.
 - Fix OOM problem on kernel without CONFIG_SLUB_CPU_PARTIAL, which
   is caused by leak of partial slabs when get_partial_node().
 - Add a patch to simplify acquire_slab().
 - Reorder patches a little.
 - RFC v1: https://lore.kernel.org/all/20231017154439.3036608-1-chengming.zhou@linux.dev/

1. Problem
==========
Now we have to freeze the slab when get from the node partial list, and
unfreeze the slab when put to the node partial list. Because we need to
rely on the node list_lock to synchronize the "frozen" bit changes.

This implementation has some drawbacks:

 - Alloc path: twice cmpxchg_double.
   It has to get some partial slabs from node when the allocator has used
   up the CPU partial slabs. So it freeze the slab (one cmpxchg_double)
   with node list_lock held, put those frozen slabs on its CPU partial
   list. Later ___slab_alloc() will cmpxchg_double try-loop again if that
   slab is picked to use.

 - Alloc path: amplified contention on node list_lock.
   Since we have to synchronize the "frozen" bit changes under the node
   list_lock, the contention of slab (struct page) can be transferred
   to the node list_lock. On machine with many CPUs in one node, the
   contention of list_lock will be amplified by all CPUs' alloc path.

   The current code has to workaround this problem by avoiding using
   cmpxchg_double try-loop, which will just break and return when
   contention of page encountered and the first cmpxchg_double failed.
   But this workaround has its own problem. For more context, see
   9b1ea29bc0d7 ("Revert "mm, slub: consider rest of partial list if
   acquire_slab() fails"").

 - Free path: redundant unfreeze.
   __slab_free() will freeze and cache some slabs on its partial list,
   and flush them to the node partial list when exceed, which has to
   unfreeze those slabs again under the node list_lock. Actually we
   don't need to freeze slab on CPU partial list, in which case we
   can save the unfreeze cmpxchg_double operations in flush path.

2. Solution
===========
We solve these problems by leaving slabs unfrozen when moving out of
the node partial list and on CPU partial list, so "frozen" bit is 0.

These partial slabs won't be manipulate concurrently by alloc path,
the only racer is free path, which may manipulate its list when !inuse.
So we need to introduce another synchronization way to avoid it, we
reuse PG_workingset to keep track of whether the slab is on node partial
list or not, only in that case we can manipulate the slab list.

The slab will be delay frozen when it's picked to actively use by the
CPU, it becomes full at the same time, in which case we still need to
rely on "frozen" bit to avoid manipulating its list. So the slab will
be frozen only when activate use and be unfrozen only when deactivate.

3. Testing
==========
We just did some simple testing on a server with 128 CPUs (2 nodes) to
compare performance for now.

 - perf bench sched messaging -g 5 -t -l 100000
   baseline	RFC
   7.042s	6.966s
   7.022s	7.045s
   7.054s	6.985s

 - stress-ng --rawpkt 128 --rawpkt-ops 100000000
   baseline	RFC
   2.42s	2.15s
   2.45s	2.16s
   2.44s	2.17s

It shows above there is about 10% improvement on stress-ng rawpkt
testcase, although no much improvement on perf sched bench testcase.

Thanks for any comment and code review!

Chengming Zhou (9):
  slub: Reflow ___slab_alloc()
  slub: Change get_partial() interfaces to return slab
  slub: Keep track of whether slub is on the per-node partial list
  slub: Prepare __slab_free() for unfrozen partial slab out of node
    partial list
  slub: Introduce freeze_slab()
  slub: Delay freezing of partial slabs
  slub: Optimize deactivate_slab()
  slub: Rename all *unfreeze_partials* functions to *put_partials*
  slub: Update frozen slabs documentations in the source

 mm/slub.c | 381 ++++++++++++++++++++++++++----------------------------
 1 file changed, 180 insertions(+), 201 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 1/9] slub: Reflow ___slab_alloc()
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 2/9] slub: Change get_partial() interfaces to return slab chengming.zhou
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

The get_partial() interface used in ___slab_alloc() may return a single
object in the "kmem_cache_debug(s)" case, in which we will just return
the "freelist" object.

Move this handling up to prepare for later changes.

And the "pfmemalloc_match()" part is not needed for node partial slab,
since we already check this in the get_partial_node().

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 63d281dfacdb..0b0fdc8c189f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3216,8 +3216,21 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	pc.slab = &slab;
 	pc.orig_size = orig_size;
 	freelist = get_partial(s, node, &pc);
-	if (freelist)
-		goto check_new_slab;
+	if (freelist) {
+		if (kmem_cache_debug(s)) {
+			/*
+			 * For debug caches here we had to go through
+			 * alloc_single_from_partial() so just store the
+			 * tracking info and return the object.
+			 */
+			if (s->flags & SLAB_STORE_USER)
+				set_track(s, freelist, TRACK_ALLOC, addr);
+
+			return freelist;
+		}
+
+		goto retry_load_slab;
+	}
 
 	slub_put_cpu_ptr(s->cpu_slab);
 	slab = new_slab(s, gfpflags, node);
@@ -3253,20 +3266,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 	inc_slabs_node(s, slab_nid(slab), slab->objects);
 
-check_new_slab:
-
-	if (kmem_cache_debug(s)) {
-		/*
-		 * For debug caches here we had to go through
-		 * alloc_single_from_partial() so just store the tracking info
-		 * and return the object
-		 */
-		if (s->flags & SLAB_STORE_USER)
-			set_track(s, freelist, TRACK_ALLOC, addr);
-
-		return freelist;
-	}
-
 	if (unlikely(!pfmemalloc_match(slab, gfpflags))) {
 		/*
 		 * For !pfmemalloc_match() case we don't load freelist so that
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 2/9] slub: Change get_partial() interfaces to return slab
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 1/9] slub: Reflow ___slab_alloc() chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 3/9] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

We need all get_partial() related interfaces to return a slab, instead
of returning the freelist (or object).

Use the partial_context.object to return back freelist or object for
now. This patch shouldn't have any functional changes.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 63 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0b0fdc8c189f..03384cd965c5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -204,9 +204,9 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
 
 /* Structure holding parameters for get_partial() call chain */
 struct partial_context {
-	struct slab **slab;
 	gfp_t flags;
 	unsigned int orig_size;
+	void *object;
 };
 
 static inline bool kmem_cache_debug(struct kmem_cache *s)
@@ -2269,10 +2269,11 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags);
 /*
  * Try to allocate a partial slab from a specific node.
  */
-static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
-			      struct partial_context *pc)
+static struct slab *get_partial_node(struct kmem_cache *s,
+				     struct kmem_cache_node *n,
+				     struct partial_context *pc)
 {
-	struct slab *slab, *slab2;
+	struct slab *slab, *slab2, *partial = NULL;
 	void *object = NULL;
 	unsigned long flags;
 	unsigned int partial_slabs = 0;
@@ -2288,27 +2289,28 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 
 	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
-		void *t;
-
 		if (!pfmemalloc_match(slab, pc->flags))
 			continue;
 
 		if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
 			object = alloc_single_from_partial(s, n, slab,
 							pc->orig_size);
-			if (object)
+			if (object) {
+				partial = slab;
+				pc->object = object;
 				break;
+			}
 			continue;
 		}
 
-		t = acquire_slab(s, n, slab, object == NULL);
-		if (!t)
+		object = acquire_slab(s, n, slab, object == NULL);
+		if (!object)
 			break;
 
-		if (!object) {
-			*pc->slab = slab;
+		if (!partial) {
+			partial = slab;
+			pc->object = object;
 			stat(s, ALLOC_FROM_PARTIAL);
-			object = t;
 		} else {
 			put_cpu_partial(s, slab, 0);
 			stat(s, CPU_PARTIAL_NODE);
@@ -2324,20 +2326,21 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 
 	}
 	spin_unlock_irqrestore(&n->list_lock, flags);
-	return object;
+	return partial;
 }
 
 /*
  * Get a slab from somewhere. Search in increasing NUMA distances.
  */
-static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
+static struct slab *get_any_partial(struct kmem_cache *s,
+				    struct partial_context *pc)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
 	struct zoneref *z;
 	struct zone *zone;
 	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
-	void *object;
+	struct slab *slab;
 	unsigned int cpuset_mems_cookie;
 
 	/*
@@ -2372,8 +2375,8 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
 
 			if (n && cpuset_zone_allowed(zone, pc->flags) &&
 					n->nr_partial > s->min_partial) {
-				object = get_partial_node(s, n, pc);
-				if (object) {
+				slab = get_partial_node(s, n, pc);
+				if (slab) {
 					/*
 					 * Don't check read_mems_allowed_retry()
 					 * here - if mems_allowed was updated in
@@ -2381,7 +2384,7 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
 					 * between allocation and the cpuset
 					 * update
 					 */
-					return object;
+					return slab;
 				}
 			}
 		}
@@ -2393,17 +2396,18 @@ static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
 /*
  * Get a partial slab, lock it and return it.
  */
-static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc)
+static struct slab *get_partial(struct kmem_cache *s, int node,
+				struct partial_context *pc)
 {
-	void *object;
+	struct slab *slab;
 	int searchnode = node;
 
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
 
-	object = get_partial_node(s, get_node(s, searchnode), pc);
-	if (object || node != NUMA_NO_NODE)
-		return object;
+	slab = get_partial_node(s, get_node(s, searchnode), pc);
+	if (slab || node != NUMA_NO_NODE)
+		return slab;
 
 	return get_any_partial(s, pc);
 }
@@ -3213,10 +3217,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 new_objects:
 
 	pc.flags = gfpflags;
-	pc.slab = &slab;
 	pc.orig_size = orig_size;
-	freelist = get_partial(s, node, &pc);
-	if (freelist) {
+	slab = get_partial(s, node, &pc);
+	if (slab) {
+		freelist = pc.object;
 		if (kmem_cache_debug(s)) {
 			/*
 			 * For debug caches here we had to go through
@@ -3408,12 +3412,11 @@ static void *__slab_alloc_node(struct kmem_cache *s,
 	void *object;
 
 	pc.flags = gfpflags;
-	pc.slab = &slab;
 	pc.orig_size = orig_size;
-	object = get_partial(s, node, &pc);
+	slab = get_partial(s, node, &pc);
 
-	if (object)
-		return object;
+	if (slab)
+		return pc.object;
 
 	slab = new_slab(s, gfpflags, node);
 	if (unlikely(!slab)) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 3/9] slub: Keep track of whether slub is on the per-node partial list
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 1/9] slub: Reflow ___slab_alloc() chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 2/9] slub: Change get_partial() interfaces to return slab chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-11-01 12:23   ` Vlastimil Babka
  2023-10-31 14:07 ` [RFC PATCH v4 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

Now we rely on the "frozen" bit to see if we should manipulate the
slab->slab_list, which will be changed in the following patch.

Instead we introduce another way to keep track of whether slub is on
the per-node partial list, here we reuse the PG_workingset bit.

We use __set_bit and __clear_bit directly instead of the atomic version
for better performance and it's safe since it's protected by the slub
node list_lock.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 03384cd965c5..eed8ae0dbaf9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2116,6 +2116,25 @@ static void discard_slab(struct kmem_cache *s, struct slab *slab)
 	free_slab(s, slab);
 }
 
+/*
+ * SLUB reuses PG_workingset bit to keep track of whether it's on
+ * the per-node partial list.
+ */
+static inline bool slab_test_node_partial(const struct slab *slab)
+{
+	return folio_test_workingset((struct folio *)slab_folio(slab));
+}
+
+static inline void slab_set_node_partial(struct slab *slab)
+{
+	__set_bit(PG_workingset, folio_flags(slab_folio(slab), 0));
+}
+
+static inline void slab_clear_node_partial(struct slab *slab)
+{
+	__clear_bit(PG_workingset, folio_flags(slab_folio(slab), 0));
+}
+
 /*
  * Management of partially allocated slabs.
  */
@@ -2127,6 +2146,7 @@ __add_partial(struct kmem_cache_node *n, struct slab *slab, int tail)
 		list_add_tail(&slab->slab_list, &n->partial);
 	else
 		list_add(&slab->slab_list, &n->partial);
+	slab_set_node_partial(slab);
 }
 
 static inline void add_partial(struct kmem_cache_node *n,
@@ -2141,6 +2161,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
 {
 	lockdep_assert_held(&n->list_lock);
 	list_del(&slab->slab_list);
+	slab_clear_node_partial(slab);
 	n->nr_partial--;
 }
 
@@ -4833,6 +4854,7 @@ static int __kmem_cache_do_shrink(struct kmem_cache *s)
 
 			if (free == slab->objects) {
 				list_move(&slab->slab_list, &discard);
+				slab_clear_node_partial(slab);
 				n->nr_partial--;
 				dec_slabs_node(s, node, slab->objects);
 			} else if (free <= SHRINK_PROMOTE_MAX)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (2 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 3/9] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-11-01 12:26   ` Vlastimil Babka
  2023-10-31 14:07 ` [RFC PATCH v4 5/9] slub: Introduce freeze_slab() chengming.zhou
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

Now the partially empty slub will be frozen when taken out of node partial
list, so the __slab_free() will know from "was_frozen" that the partially
empty slab is not on node partial list and is a cpu or cpu partial slab
of some cpu.

But we will change this, make partial slabs leave the node partial list
with unfrozen state, so we need to change __slab_free() to use the new
slab_test_node_partial() we just introduced.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index eed8ae0dbaf9..1880b483350e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3631,6 +3631,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 	unsigned long counters;
 	struct kmem_cache_node *n = NULL;
 	unsigned long flags;
+	bool on_node_partial;
 
 	stat(s, FREE_SLOWPATH);
 
@@ -3678,6 +3679,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 				 */
 				spin_lock_irqsave(&n->list_lock, flags);
 
+				on_node_partial = slab_test_node_partial(slab);
 			}
 		}
 
@@ -3706,6 +3708,15 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 		return;
 	}
 
+	/*
+	 * This slab was partially empty but not on the per-node partial list,
+	 * in which case we shouldn't manipulate its list, just return.
+	 */
+	if (prior && !on_node_partial) {
+		spin_unlock_irqrestore(&n->list_lock, flags);
+		return;
+	}
+
 	if (unlikely(!new.inuse && n->nr_partial >= s->min_partial))
 		goto slab_empty;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 5/9] slub: Introduce freeze_slab()
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (3 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-12-03  6:08   ` Hyeonggon Yoo
  2023-10-31 14:07 ` [RFC PATCH v4 6/9] slub: Delay freezing of partial slabs chengming.zhou
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

We will have unfrozen slabs out of the node partial list later, so we
need a freeze_slab() function to freeze the partial slab and get its
freelist.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 1880b483350e..edf567971679 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3098,6 +3098,33 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
 	return freelist;
 }
 
+/*
+ * Freeze the partial slab and return the pointer to the freelist.
+ */
+static inline void *freeze_slab(struct kmem_cache *s, struct slab *slab)
+{
+	struct slab new;
+	unsigned long counters;
+	void *freelist;
+
+	do {
+		freelist = slab->freelist;
+		counters = slab->counters;
+
+		new.counters = counters;
+		VM_BUG_ON(new.frozen);
+
+		new.inuse = slab->objects;
+		new.frozen = 1;
+
+	} while (!slab_update_freelist(s, slab,
+		freelist, counters,
+		NULL, new.counters,
+		"freeze_slab"));
+
+	return freelist;
+}
+
 /*
  * Slow path. The lockless freelist is empty or we need to perform
  * debugging duties.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 6/9] slub: Delay freezing of partial slabs
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (4 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 5/9] slub: Introduce freeze_slab() chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-10-31 14:07 ` [RFC PATCH v4 7/9] slub: Optimize deactivate_slab() chengming.zhou
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

Now we will freeze slabs when moving them out of node partial list to
cpu partial list, this method needs two cmpxchg_double operations:

1. freeze slab (acquire_slab()) under the node list_lock
2. get_freelist() when pick used in ___slab_alloc()

Actually we don't need to freeze when moving slabs out of node partial
list, we can delay freezing to when use slab freelist in ___slab_alloc(),
so we can save one cmpxchg_double().

And there are other good points:
 - The moving of slabs between node partial list and cpu partial list
   becomes simpler, since we don't need to freeze or unfreeze at all.

 - The node list_lock contention would be less, since we don't need to
   freeze any slab under the node list_lock.

We can achieve this because there is no concurrent path would manipulate
the partial slab list except the __slab_free() path, which is now
serialized by slab_test_node_partial() under the list_lock.

Since the slab returned by get_partial() interfaces is not frozen anymore
and no freelist is returned in the partial_context, so we need to use the
introduced freeze_slab() to freeze it and get its freelist.

Similarly, the slabs on the CPU partial list are not frozen anymore,
we need to freeze_slab() on it before use.

We can now delete acquire_slab() as it became unused.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 113 +++++++++++-------------------------------------------
 1 file changed, 23 insertions(+), 90 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index edf567971679..bcb5b2c4e213 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2234,51 +2234,6 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
 	return object;
 }
 
-/*
- * Remove slab from the partial list, freeze it and
- * return the pointer to the freelist.
- *
- * Returns a list of objects or NULL if it fails.
- */
-static inline void *acquire_slab(struct kmem_cache *s,
-		struct kmem_cache_node *n, struct slab *slab,
-		int mode)
-{
-	void *freelist;
-	unsigned long counters;
-	struct slab new;
-
-	lockdep_assert_held(&n->list_lock);
-
-	/*
-	 * Zap the freelist and set the frozen bit.
-	 * The old freelist is the list of objects for the
-	 * per cpu allocation list.
-	 */
-	freelist = slab->freelist;
-	counters = slab->counters;
-	new.counters = counters;
-	if (mode) {
-		new.inuse = slab->objects;
-		new.freelist = NULL;
-	} else {
-		new.freelist = freelist;
-	}
-
-	VM_BUG_ON(new.frozen);
-	new.frozen = 1;
-
-	if (!__slab_update_freelist(s, slab,
-			freelist, counters,
-			new.freelist, new.counters,
-			"acquire_slab"))
-		return NULL;
-
-	remove_partial(n, slab);
-	WARN_ON(!freelist);
-	return freelist;
-}
-
 #ifdef CONFIG_SLUB_CPU_PARTIAL
 static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain);
 #else
@@ -2295,7 +2250,6 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 				     struct partial_context *pc)
 {
 	struct slab *slab, *slab2, *partial = NULL;
-	void *object = NULL;
 	unsigned long flags;
 	unsigned int partial_slabs = 0;
 
@@ -2314,7 +2268,7 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 			continue;
 
 		if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
-			object = alloc_single_from_partial(s, n, slab,
+			void *object = alloc_single_from_partial(s, n, slab,
 							pc->orig_size);
 			if (object) {
 				partial = slab;
@@ -2324,13 +2278,10 @@ static struct slab *get_partial_node(struct kmem_cache *s,
 			continue;
 		}
 
-		object = acquire_slab(s, n, slab, object == NULL);
-		if (!object)
-			break;
+		remove_partial(n, slab);
 
 		if (!partial) {
 			partial = slab;
-			pc->object = object;
 			stat(s, ALLOC_FROM_PARTIAL);
 		} else {
 			put_cpu_partial(s, slab, 0);
@@ -2629,9 +2580,6 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
 	unsigned long flags = 0;
 
 	while (partial_slab) {
-		struct slab new;
-		struct slab old;
-
 		slab = partial_slab;
 		partial_slab = slab->next;
 
@@ -2644,23 +2592,7 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
 			spin_lock_irqsave(&n->list_lock, flags);
 		}
 
-		do {
-
-			old.freelist = slab->freelist;
-			old.counters = slab->counters;
-			VM_BUG_ON(!old.frozen);
-
-			new.counters = old.counters;
-			new.freelist = old.freelist;
-
-			new.frozen = 0;
-
-		} while (!__slab_update_freelist(s, slab,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
-				"unfreezing slab"));
-
-		if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) {
+		if (unlikely(!slab->inuse && n->nr_partial >= s->min_partial)) {
 			slab->next = slab_to_discard;
 			slab_to_discard = slab;
 		} else {
@@ -3167,7 +3099,6 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			node = NUMA_NO_NODE;
 		goto new_slab;
 	}
-redo:
 
 	if (unlikely(!node_match(slab, node))) {
 		/*
@@ -3243,7 +3174,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 new_slab:
 
-	if (slub_percpu_partial(c)) {
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	while (slub_percpu_partial(c)) {
 		local_lock_irqsave(&s->cpu_slab->lock, flags);
 		if (unlikely(c->slab)) {
 			local_unlock_irqrestore(&s->cpu_slab->lock, flags);
@@ -3255,12 +3187,22 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			goto new_objects;
 		}
 
-		slab = c->slab = slub_percpu_partial(c);
+		slab = slub_percpu_partial(c);
 		slub_set_percpu_partial(c, slab);
 		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 		stat(s, CPU_PARTIAL_ALLOC);
-		goto redo;
+
+		if (unlikely(!node_match(slab, node) ||
+			     !pfmemalloc_match(slab, gfpflags))) {
+			slab->next = NULL;
+			__unfreeze_partials(s, slab);
+			continue;
+		}
+
+		freelist = freeze_slab(s, slab);
+		goto retry_load_slab;
 	}
+#endif
 
 new_objects:
 
@@ -3268,8 +3210,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	pc.orig_size = orig_size;
 	slab = get_partial(s, node, &pc);
 	if (slab) {
-		freelist = pc.object;
 		if (kmem_cache_debug(s)) {
+			freelist = pc.object;
 			/*
 			 * For debug caches here we had to go through
 			 * alloc_single_from_partial() so just store the
@@ -3281,6 +3223,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			return freelist;
 		}
 
+		freelist = freeze_slab(s, slab);
 		goto retry_load_slab;
 	}
 
@@ -3682,18 +3625,8 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 		was_frozen = new.frozen;
 		new.inuse -= cnt;
 		if ((!new.inuse || !prior) && !was_frozen) {
-
-			if (kmem_cache_has_cpu_partial(s) && !prior) {
-
-				/*
-				 * Slab was on no list before and will be
-				 * partially empty
-				 * We can defer the list move and instead
-				 * freeze it.
-				 */
-				new.frozen = 1;
-
-			} else { /* Needs to be taken off a list */
+			/* Needs to be taken off a list */
+			if (!kmem_cache_has_cpu_partial(s) || prior) {
 
 				n = get_node(s, slab_nid(slab));
 				/*
@@ -3723,9 +3656,9 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
 			 * activity can be necessary.
 			 */
 			stat(s, FREE_FROZEN);
-		} else if (new.frozen) {
+		} else if (kmem_cache_has_cpu_partial(s) && !prior) {
 			/*
-			 * If we just froze the slab then put it onto the
+			 * If we started with a full slab then put it onto the
 			 * per cpu partial list.
 			 */
 			put_cpu_partial(s, slab, 1);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 7/9] slub: Optimize deactivate_slab()
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (5 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 6/9] slub: Delay freezing of partial slabs chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-11-01 13:21   ` Vlastimil Babka
  2023-10-31 14:07 ` [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials* chengming.zhou
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

Since the introduce of unfrozen slabs on cpu partial list, we don't
need to synchronize the slab frozen state under the node list_lock.

The caller of deactivate_slab() and the caller of __slab_free() won't
manipulate the slab list concurrently.

So we can get node list_lock in the last stage if we really need to
manipulate the slab list in this path.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 76 +++++++++++++++++++------------------------------------
 1 file changed, 26 insertions(+), 50 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index bcb5b2c4e213..c429f8baba5f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
 static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 			    void *freelist)
 {
-	enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
 	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
 	int free_delta = 0;
-	enum slab_modes mode = M_NONE;
 	void *nextfree, *freelist_iter, *freelist_tail;
 	int tail = DEACTIVATE_TO_HEAD;
 	unsigned long flags = 0;
@@ -2512,62 +2510,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 	 *
 	 * Ensure that the slab is unfrozen while the list presence
 	 * reflects the actual number of objects during unfreeze.
-	 *
-	 * We first perform cmpxchg holding lock and insert to list
-	 * when it succeed. If there is mismatch then the slab is not
-	 * unfrozen and number of objects in the slab may have changed.
-	 * Then release lock and retry cmpxchg again.
 	 */
-redo:
-
-	old.freelist = READ_ONCE(slab->freelist);
-	old.counters = READ_ONCE(slab->counters);
-	VM_BUG_ON(!old.frozen);
-
-	/* Determine target state of the slab */
-	new.counters = old.counters;
-	if (freelist_tail) {
-		new.inuse -= free_delta;
-		set_freepointer(s, freelist_tail, old.freelist);
-		new.freelist = freelist;
-	} else
-		new.freelist = old.freelist;
-
-	new.frozen = 0;
+	do {
+		old.freelist = READ_ONCE(slab->freelist);
+		old.counters = READ_ONCE(slab->counters);
+		VM_BUG_ON(!old.frozen);
+
+		/* Determine target state of the slab */
+		new.counters = old.counters;
+		new.frozen = 0;
+		if (freelist_tail) {
+			new.inuse -= free_delta;
+			set_freepointer(s, freelist_tail, old.freelist);
+			new.freelist = freelist;
+		} else {
+			new.freelist = old.freelist;
+		}
+	} while (!slab_update_freelist(s, slab,
+		old.freelist, old.counters,
+		new.freelist, new.counters,
+		"unfreezing slab"));
 
+	/*
+	 * Stage three: Manipulate the slab list based on the updated state.
+	 */
 	if (!new.inuse && n->nr_partial >= s->min_partial) {
-		mode = M_FREE;
+		stat(s, DEACTIVATE_EMPTY);
+		discard_slab(s, slab);
+		stat(s, FREE_SLAB);
 	} else if (new.freelist) {
-		mode = M_PARTIAL;
-		/*
-		 * Taking the spinlock removes the possibility that
-		 * acquire_slab() will see a slab that is frozen
-		 */
 		spin_lock_irqsave(&n->list_lock, flags);
-	} else {
-		mode = M_FULL_NOLIST;
-	}
-
-
-	if (!slab_update_freelist(s, slab,
-				old.freelist, old.counters,
-				new.freelist, new.counters,
-				"unfreezing slab")) {
-		if (mode == M_PARTIAL)
-			spin_unlock_irqrestore(&n->list_lock, flags);
-		goto redo;
-	}
-
-
-	if (mode == M_PARTIAL) {
 		add_partial(n, slab, tail);
 		spin_unlock_irqrestore(&n->list_lock, flags);
 		stat(s, tail);
-	} else if (mode == M_FREE) {
-		stat(s, DEACTIVATE_EMPTY);
-		discard_slab(s, slab);
-		stat(s, FREE_SLAB);
-	} else if (mode == M_FULL_NOLIST) {
+	} else {
 		stat(s, DEACTIVATE_FULL);
 	}
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials*
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (6 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 7/9] slub: Optimize deactivate_slab() chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-11-01 13:40   ` Vlastimil Babka
  2023-10-31 14:07 ` [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source chengming.zhou
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

Since all partial slabs on the CPU partial list are not frozen anymore,
we don't unfreeze when moving cpu partial slabs to node partial list,
it's better to rename these functions.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c429f8baba5f..bb7368047103 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2549,7 +2549,7 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 }
 
 #ifdef CONFIG_SLUB_CPU_PARTIAL
-static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
+static void put_partials_node(struct kmem_cache *s, struct slab *partial_slab)
 {
 	struct kmem_cache_node *n = NULL, *n2 = NULL;
 	struct slab *slab, *slab_to_discard = NULL;
@@ -2591,9 +2591,9 @@ static void __unfreeze_partials(struct kmem_cache *s, struct slab *partial_slab)
 }
 
 /*
- * Unfreeze all the cpu partial slabs.
+ * Put all the cpu partial slabs to the node partial list.
  */
-static void unfreeze_partials(struct kmem_cache *s)
+static void put_partials(struct kmem_cache *s)
 {
 	struct slab *partial_slab;
 	unsigned long flags;
@@ -2604,11 +2604,11 @@ static void unfreeze_partials(struct kmem_cache *s)
 	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 
 	if (partial_slab)
-		__unfreeze_partials(s, partial_slab);
+		put_partials_node(s, partial_slab);
 }
 
-static void unfreeze_partials_cpu(struct kmem_cache *s,
-				  struct kmem_cache_cpu *c)
+static void put_partials_cpu(struct kmem_cache *s,
+			     struct kmem_cache_cpu *c)
 {
 	struct slab *partial_slab;
 
@@ -2616,7 +2616,7 @@ static void unfreeze_partials_cpu(struct kmem_cache *s,
 	c->partial = NULL;
 
 	if (partial_slab)
-		__unfreeze_partials(s, partial_slab);
+		put_partials_node(s, partial_slab);
 }
 
 /*
@@ -2629,7 +2629,7 @@ static void unfreeze_partials_cpu(struct kmem_cache *s,
 static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
 {
 	struct slab *oldslab;
-	struct slab *slab_to_unfreeze = NULL;
+	struct slab *slab_to_put = NULL;
 	unsigned long flags;
 	int slabs = 0;
 
@@ -2644,7 +2644,7 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
 			 * per node partial list. Postpone the actual unfreezing
 			 * outside of the critical section.
 			 */
-			slab_to_unfreeze = oldslab;
+			slab_to_put = oldslab;
 			oldslab = NULL;
 		} else {
 			slabs = oldslab->slabs;
@@ -2660,17 +2660,17 @@ static void put_cpu_partial(struct kmem_cache *s, struct slab *slab, int drain)
 
 	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
 
-	if (slab_to_unfreeze) {
-		__unfreeze_partials(s, slab_to_unfreeze);
+	if (slab_to_put) {
+		put_partials_node(s, slab_to_put);
 		stat(s, CPU_PARTIAL_DRAIN);
 	}
 }
 
 #else	/* CONFIG_SLUB_CPU_PARTIAL */
 
-static inline void unfreeze_partials(struct kmem_cache *s) { }
-static inline void unfreeze_partials_cpu(struct kmem_cache *s,
-				  struct kmem_cache_cpu *c) { }
+static inline void put_partials(struct kmem_cache *s) { }
+static inline void put_partials_cpu(struct kmem_cache *s,
+				    struct kmem_cache_cpu *c) { }
 
 #endif	/* CONFIG_SLUB_CPU_PARTIAL */
 
@@ -2712,7 +2712,7 @@ static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
 		stat(s, CPUSLAB_FLUSH);
 	}
 
-	unfreeze_partials_cpu(s, c);
+	put_partials_cpu(s, c);
 }
 
 struct slub_flush_work {
@@ -2740,7 +2740,7 @@ static void flush_cpu_slab(struct work_struct *w)
 	if (c->slab)
 		flush_slab(s, c);
 
-	unfreeze_partials(s);
+	put_partials(s);
 }
 
 static bool has_cpu_slab(int cpu, struct kmem_cache *s)
@@ -3171,7 +3171,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		if (unlikely(!node_match(slab, node) ||
 			     !pfmemalloc_match(slab, gfpflags))) {
 			slab->next = NULL;
-			__unfreeze_partials(s, slab);
+			put_partials_node(s, slab);
 			continue;
 		}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (7 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials* chengming.zhou
@ 2023-10-31 14:07 ` chengming.zhou
  2023-11-01 13:51   ` Vlastimil Babka
  2023-11-01 13:33 ` [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs Hyeonggon Yoo
  2023-11-01 13:59 ` Vlastimil Babka
  10 siblings, 1 reply; 23+ messages in thread
From: chengming.zhou @ 2023-10-31 14:07 UTC (permalink / raw)
  To: vbabka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, chengming.zhou, Chengming Zhou

From: Chengming Zhou <zhouchengming@bytedance.com>

The current updated scheme (which this series implemented) is:
 - node partial slabs: PG_Workingset && !frozen
 - cpu partial slabs: !PG_Workingset && !frozen
 - cpu slabs: !PG_Workingset && frozen
 - full slabs: !PG_Workingset && !frozen

The most important change is that "frozen" bit is not set for the
cpu partial slabs anymore, __slab_free() will grab node list_lock
then check by !PG_Workingset that it's not on a node partial list.

And the "frozen" bit is still kept for the cpu slabs for performance,
since we don't need to grab node list_lock to check whether the
PG_Workingset is set or not if the "frozen" bit is set in __slab_free().

Update related documentations and comments in the source.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/slub.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index bb7368047103..89d3f7a18a73 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -76,13 +76,22 @@
  *
  *   Frozen slabs
  *
- *   If a slab is frozen then it is exempt from list management. It is not
- *   on any list except per cpu partial list. The processor that froze the
+ *   If a slab is frozen then it is exempt from list management. It is
+ *   the cpu slab which is actively allocated from by the processor that
+ *   froze it and it is not on any list. The processor that froze the
  *   slab is the one who can perform list operations on the slab. Other
  *   processors may put objects onto the freelist but the processor that
  *   froze the slab is the only one that can retrieve the objects from the
  *   slab's freelist.
  *
+ *   CPU partial slabs
+ *
+ *   The partially empty slabs cached on the CPU partial list are used
+ *   for performance reasons, which speeds up the allocation process.
+ *   These slabs are not frozen, but also exempt from list management,
+ *   by clearing the PG_workingset flag when moving out of the node
+ *   partial list. Please see __slab_free() for more details.
+ *
  *   list_lock
  *
  *   The list_lock protects the partial and full list on each node and
@@ -2620,8 +2629,7 @@ static void put_partials_cpu(struct kmem_cache *s,
 }
 
 /*
- * Put a slab that was just frozen (in __slab_free|get_partial_node) into a
- * partial slab slot if available.
+ * Put a slab into a partial slab slot if available.
  *
  * If we did not find a slot then simply move all the partials to the
  * per node partial list.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 3/9] slub: Keep track of whether slub is on the per-node partial list
  2023-10-31 14:07 ` [RFC PATCH v4 3/9] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
@ 2023-11-01 12:23   ` Vlastimil Babka
  0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2023-11-01 12:23 UTC (permalink / raw)
  To: chengming.zhou, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou



On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Now we rely on the "frozen" bit to see if we should manipulate the
> slab->slab_list, which will be changed in the following patch.
> 
> Instead we introduce another way to keep track of whether slub is on
> the per-node partial list, here we reuse the PG_workingset bit.
> 
> We use __set_bit and __clear_bit directly instead of the atomic version
> for better performance and it's safe since it's protected by the slub
> node list_lock.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
  2023-10-31 14:07 ` [RFC PATCH v4 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
@ 2023-11-01 12:26   ` Vlastimil Babka
  0 siblings, 0 replies; 23+ messages in thread
From: Vlastimil Babka @ 2023-11-01 12:26 UTC (permalink / raw)
  To: chengming.zhou, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Now the partially empty slub will be frozen when taken out of node partial
> list, so the __slab_free() will know from "was_frozen" that the partially
> empty slab is not on node partial list and is a cpu or cpu partial slab
> of some cpu.
> 
> But we will change this, make partial slabs leave the node partial list
> with unfrozen state, so we need to change __slab_free() to use the new
> slab_test_node_partial() we just introduced.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index eed8ae0dbaf9..1880b483350e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3631,6 +3631,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  	unsigned long counters;
>  	struct kmem_cache_node *n = NULL;
>  	unsigned long flags;
> +	bool on_node_partial;
>  
>  	stat(s, FREE_SLOWPATH);
>  
> @@ -3678,6 +3679,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  				 */
>  				spin_lock_irqsave(&n->list_lock, flags);
>  
> +				on_node_partial = slab_test_node_partial(slab);
>  			}
>  		}
>  
> @@ -3706,6 +3708,15 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>  		return;
>  	}
>  
> +	/*
> +	 * This slab was partially empty but not on the per-node partial list,
> +	 * in which case we shouldn't manipulate its list, just return.
> +	 */
> +	if (prior && !on_node_partial) {
> +		spin_unlock_irqrestore(&n->list_lock, flags);
> +		return;
> +	}
> +
>  	if (unlikely(!new.inuse && n->nr_partial >= s->min_partial))
>  		goto slab_empty;
>  

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 7/9] slub: Optimize deactivate_slab()
  2023-10-31 14:07 ` [RFC PATCH v4 7/9] slub: Optimize deactivate_slab() chengming.zhou
@ 2023-11-01 13:21   ` Vlastimil Babka
  2023-11-02  2:10     ` Chengming Zhou
  0 siblings, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2023-11-01 13:21 UTC (permalink / raw)
  To: chengming.zhou, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou



On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Since the introduce of unfrozen slabs on cpu partial list, we don't
> need to synchronize the slab frozen state under the node list_lock.
> 
> The caller of deactivate_slab() and the caller of __slab_free() won't
> manipulate the slab list concurrently.
> 
> So we can get node list_lock in the last stage if we really need to
> manipulate the slab list in this path.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/slub.c | 76 +++++++++++++++++++------------------------------------
>  1 file changed, 26 insertions(+), 50 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index bcb5b2c4e213..c429f8baba5f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>  			    void *freelist)
>  {
> -	enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>  	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>  	int free_delta = 0;
> -	enum slab_modes mode = M_NONE;
>  	void *nextfree, *freelist_iter, *freelist_tail;
>  	int tail = DEACTIVATE_TO_HEAD;
>  	unsigned long flags = 0;
> @@ -2512,62 +2510,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>  	 *
>  	 * Ensure that the slab is unfrozen while the list presence
>  	 * reflects the actual number of objects during unfreeze.

I think this we can delete also these two lines. If there's no other
reason for v5, I can do it when merging the series.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (8 preceding siblings ...)
  2023-10-31 14:07 ` [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source chengming.zhou
@ 2023-11-01 13:33 ` Hyeonggon Yoo
  2023-11-02  2:17   ` Chengming Zhou
  2023-11-01 13:59 ` Vlastimil Babka
  10 siblings, 1 reply; 23+ messages in thread
From: Hyeonggon Yoo @ 2023-11-01 13:33 UTC (permalink / raw)
  To: chengming.zhou
  Cc: vbabka, cl, penberg, willy, rientjes, iamjoonsoo.kim, akpm,
	roman.gushchin, linux-mm, linux-kernel, Chengming Zhou

On Tue, Oct 31, 2023 at 02:07:32PM +0000, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Changes in RFC v4:
>  - Reorder patches to put the two cleanup patches to the front.
>  - Move slab_node_partial flag functions to mm/slub.c.
>  - Fix freeze_slab() by using slab_update_freelist().
>  - Fix build error when !CONFIG_SLUB_CPU_PARTIAL.
>  - Add a patch to rename all *unfreeze_partials* functions.
>  - Add a patch to update inconsistent documentations in the source.
>  - Some comments and changelog improvements.
>  - Add Reviewed-by and Suggested-by tags. Many thanks!
>  - RFC v3: https://lore.kernel.org/all/20231024093345.3676493-1-chengming.zhou@linux.dev/

Hi,
This series passed (yeah, v3 was broken as reported by the bot)
hackbench + a set of MM tests on 30 different SLUB configs [1] [2]

For the series, feel free to add:
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Thanks!

[1] https://jenkins.kerneltesting.org/job/slab-experimental/21/
[2] https://lava.kerneltesting.org/scheduler/alljobs

-- 
Hyeonggon

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials*
  2023-10-31 14:07 ` [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials* chengming.zhou
@ 2023-11-01 13:40   ` Vlastimil Babka
  2023-11-02  2:12     ` Chengming Zhou
  0 siblings, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2023-11-01 13:40 UTC (permalink / raw)
  To: chengming.zhou, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Since all partial slabs on the CPU partial list are not frozen anymore,
> we don't unfreeze when moving cpu partial slabs to node partial list,
> it's better to rename these functions.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

However I think put_partials_node() is not the best name as it's not
something specific to a single node. I think __put_partials() would be
better.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source
  2023-10-31 14:07 ` [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source chengming.zhou
@ 2023-11-01 13:51   ` Vlastimil Babka
  2023-11-02  2:48     ` Chengming Zhou
  0 siblings, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2023-11-01 13:51 UTC (permalink / raw)
  To: chengming.zhou, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
> From: Chengming Zhou <zhouchengming@bytedance.com>
> 
> The current updated scheme (which this series implemented) is:
>  - node partial slabs: PG_Workingset && !frozen
>  - cpu partial slabs: !PG_Workingset && !frozen
>  - cpu slabs: !PG_Workingset && frozen
>  - full slabs: !PG_Workingset && !frozen

It could be useful to put this also to the initial comment description.
Towards the end of the comment, there's a block explaining
"slab->frozen". It could be extended to cover all 4 combination (but not
all of them need such long explanation).

> 
> The most important change is that "frozen" bit is not set for the
> cpu partial slabs anymore, __slab_free() will grab node list_lock
> then check by !PG_Workingset that it's not on a node partial list.
> 
> And the "frozen" bit is still kept for the cpu slabs for performance,
> since we don't need to grab node list_lock to check whether the
> PG_Workingset is set or not if the "frozen" bit is set in __slab_free().
> 
> Update related documentations and comments in the source.
> 
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/slub.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index bb7368047103..89d3f7a18a73 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -76,13 +76,22 @@
>   *
>   *   Frozen slabs
>   *
> - *   If a slab is frozen then it is exempt from list management. It is not
> - *   on any list except per cpu partial list. The processor that froze the
> + *   If a slab is frozen then it is exempt from list management. It is
> + *   the cpu slab which is actively allocated from by the processor that
> + *   froze it and it is not on any list. The processor that froze the
>   *   slab is the one who can perform list operations on the slab. Other
>   *   processors may put objects onto the freelist but the processor that
>   *   froze the slab is the only one that can retrieve the objects from the
>   *   slab's freelist.
>   *
> + *   CPU partial slabs
> + *
> + *   The partially empty slabs cached on the CPU partial list are used
> + *   for performance reasons, which speeds up the allocation process.
> + *   These slabs are not frozen, but also exempt from list management,

					^ are also

(otherwise somebody could read it as "also are not")

> + *   by clearing the PG_workingset flag when moving out of the node
> + *   partial list. Please see __slab_free() for more details.
> + *
>   *   list_lock
>   *
>   *   The list_lock protects the partial and full list on each node and
> @@ -2620,8 +2629,7 @@ static void put_partials_cpu(struct kmem_cache *s,
>  }
>  
>  /*
> - * Put a slab that was just frozen (in __slab_free|get_partial_node) into a
> - * partial slab slot if available.
> + * Put a slab into a partial slab slot if available.
>   *
>   * If we did not find a slot then simply move all the partials to the
>   * per node partial list.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs
  2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
                   ` (9 preceding siblings ...)
  2023-11-01 13:33 ` [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs Hyeonggon Yoo
@ 2023-11-01 13:59 ` Vlastimil Babka
  2023-11-02  2:19   ` Chengming Zhou
  10 siblings, 1 reply; 23+ messages in thread
From: Vlastimil Babka @ 2023-11-01 13:59 UTC (permalink / raw)
  To: chengming.zhou, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

> 3. Testing
> ==========
> We just did some simple testing on a server with 128 CPUs (2 nodes) to
> compare performance for now.
> 
>  - perf bench sched messaging -g 5 -t -l 100000
>    baseline	RFC
>    7.042s	6.966s
>    7.022s	7.045s
>    7.054s	6.985s
> 
>  - stress-ng --rawpkt 128 --rawpkt-ops 100000000
>    baseline	RFC
>    2.42s	2.15s
>    2.45s	2.16s
>    2.44s	2.17s

Looks like these numbers are carried over from the first RFC. Could you
please retest with v4 as there were some bigger changes (i.e. getting
rid of acquire_slab()).

Otherwise I think v5 can drop "RFC" and will add it to slab tree after
the merge window and 6.7-rc1. Thanks!

> It shows above there is about 10% improvement on stress-ng rawpkt
> testcase, although no much improvement on perf sched bench testcase.
> 
> Thanks for any comment and code review!
> 
> Chengming Zhou (9):
>   slub: Reflow ___slab_alloc()
>   slub: Change get_partial() interfaces to return slab
>   slub: Keep track of whether slub is on the per-node partial list
>   slub: Prepare __slab_free() for unfrozen partial slab out of node
>     partial list
>   slub: Introduce freeze_slab()
>   slub: Delay freezing of partial slabs
>   slub: Optimize deactivate_slab()
>   slub: Rename all *unfreeze_partials* functions to *put_partials*
>   slub: Update frozen slabs documentations in the source
> 
>  mm/slub.c | 381 ++++++++++++++++++++++++++----------------------------
>  1 file changed, 180 insertions(+), 201 deletions(-)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 7/9] slub: Optimize deactivate_slab()
  2023-11-01 13:21   ` Vlastimil Babka
@ 2023-11-02  2:10     ` Chengming Zhou
  0 siblings, 0 replies; 23+ messages in thread
From: Chengming Zhou @ 2023-11-02  2:10 UTC (permalink / raw)
  To: Vlastimil Babka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 2023/11/1 21:21, Vlastimil Babka wrote:
> 
> 
> On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Since the introduce of unfrozen slabs on cpu partial list, we don't
>> need to synchronize the slab frozen state under the node list_lock.
>>
>> The caller of deactivate_slab() and the caller of __slab_free() won't
>> manipulate the slab list concurrently.
>>
>> So we can get node list_lock in the last stage if we really need to
>> manipulate the slab list in this path.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
>> ---
>>  mm/slub.c | 76 +++++++++++++++++++------------------------------------
>>  1 file changed, 26 insertions(+), 50 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index bcb5b2c4e213..c429f8baba5f 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2468,10 +2468,8 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
>>  static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>  			    void *freelist)
>>  {
>> -	enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST };
>>  	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
>>  	int free_delta = 0;
>> -	enum slab_modes mode = M_NONE;
>>  	void *nextfree, *freelist_iter, *freelist_tail;
>>  	int tail = DEACTIVATE_TO_HEAD;
>>  	unsigned long flags = 0;
>> @@ -2512,62 +2510,40 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
>>  	 *
>>  	 * Ensure that the slab is unfrozen while the list presence
>>  	 * reflects the actual number of objects during unfreeze.
> 
> I think this we can delete also these two lines. If there's no other
> reason for v5, I can do it when merging the series.

Ok, I will delete it in v5.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials*
  2023-11-01 13:40   ` Vlastimil Babka
@ 2023-11-02  2:12     ` Chengming Zhou
  0 siblings, 0 replies; 23+ messages in thread
From: Chengming Zhou @ 2023-11-02  2:12 UTC (permalink / raw)
  To: Vlastimil Babka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 2023/11/1 21:40, Vlastimil Babka wrote:
> On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Since all partial slabs on the CPU partial list are not frozen anymore,
>> we don't unfreeze when moving cpu partial slabs to node partial list,
>> it's better to rename these functions.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
> However I think put_partials_node() is not the best name as it's not
> something specific to a single node. I think __put_partials() would be
> better.

Right, I will change to __put_partials() in v5.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs
  2023-11-01 13:33 ` [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs Hyeonggon Yoo
@ 2023-11-02  2:17   ` Chengming Zhou
  0 siblings, 0 replies; 23+ messages in thread
From: Chengming Zhou @ 2023-11-02  2:17 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: vbabka, cl, penberg, willy, rientjes, iamjoonsoo.kim, akpm,
	roman.gushchin, linux-mm, linux-kernel, Chengming Zhou

On 2023/11/1 21:33, Hyeonggon Yoo wrote:
> On Tue, Oct 31, 2023 at 02:07:32PM +0000, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> Changes in RFC v4:
>>  - Reorder patches to put the two cleanup patches to the front.
>>  - Move slab_node_partial flag functions to mm/slub.c.
>>  - Fix freeze_slab() by using slab_update_freelist().
>>  - Fix build error when !CONFIG_SLUB_CPU_PARTIAL.
>>  - Add a patch to rename all *unfreeze_partials* functions.
>>  - Add a patch to update inconsistent documentations in the source.
>>  - Some comments and changelog improvements.
>>  - Add Reviewed-by and Suggested-by tags. Many thanks!
>>  - RFC v3: https://lore.kernel.org/all/20231024093345.3676493-1-chengming.zhou@linux.dev/
> 
> Hi,
> This series passed (yeah, v3 was broken as reported by the bot)
> hackbench + a set of MM tests on 30 different SLUB configs [1] [2]
> 
> For the series, feel free to add:
> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> 
> Thanks!
> 
> [1] https://jenkins.kerneltesting.org/job/slab-experimental/21/
> [2] https://lava.kerneltesting.org/scheduler/alljobs
> 

Great! This is very helpful! I will add it in v5.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs
  2023-11-01 13:59 ` Vlastimil Babka
@ 2023-11-02  2:19   ` Chengming Zhou
  0 siblings, 0 replies; 23+ messages in thread
From: Chengming Zhou @ 2023-11-02  2:19 UTC (permalink / raw)
  To: Vlastimil Babka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 2023/11/1 21:59, Vlastimil Babka wrote:
>> 3. Testing
>> ==========
>> We just did some simple testing on a server with 128 CPUs (2 nodes) to
>> compare performance for now.
>>
>>  - perf bench sched messaging -g 5 -t -l 100000
>>    baseline	RFC
>>    7.042s	6.966s
>>    7.022s	7.045s
>>    7.054s	6.985s
>>
>>  - stress-ng --rawpkt 128 --rawpkt-ops 100000000
>>    baseline	RFC
>>    2.42s	2.15s
>>    2.45s	2.16s
>>    2.44s	2.17s
> 
> Looks like these numbers are carried over from the first RFC. Could you
> please retest with v4 as there were some bigger changes (i.e. getting
> rid of acquire_slab()).
> 
> Otherwise I think v5 can drop "RFC" and will add it to slab tree after
> the merge window and 6.7-rc1. Thanks!

Ah, yes, I will retest v5 and update the numbers today.

Thanks!

> 
>> It shows above there is about 10% improvement on stress-ng rawpkt
>> testcase, although no much improvement on perf sched bench testcase.
>>
>> Thanks for any comment and code review!
>>
>> Chengming Zhou (9):
>>   slub: Reflow ___slab_alloc()
>>   slub: Change get_partial() interfaces to return slab
>>   slub: Keep track of whether slub is on the per-node partial list
>>   slub: Prepare __slab_free() for unfrozen partial slab out of node
>>     partial list
>>   slub: Introduce freeze_slab()
>>   slub: Delay freezing of partial slabs
>>   slub: Optimize deactivate_slab()
>>   slub: Rename all *unfreeze_partials* functions to *put_partials*
>>   slub: Update frozen slabs documentations in the source
>>
>>  mm/slub.c | 381 ++++++++++++++++++++++++++----------------------------
>>  1 file changed, 180 insertions(+), 201 deletions(-)
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source
  2023-11-01 13:51   ` Vlastimil Babka
@ 2023-11-02  2:48     ` Chengming Zhou
  0 siblings, 0 replies; 23+ messages in thread
From: Chengming Zhou @ 2023-11-02  2:48 UTC (permalink / raw)
  To: Vlastimil Babka, cl, penberg, willy
  Cc: rientjes, iamjoonsoo.kim, akpm, roman.gushchin, 42.hyeyoo,
	linux-mm, linux-kernel, Chengming Zhou

On 2023/11/1 21:51, Vlastimil Babka wrote:
> On 10/31/23 15:07, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> The current updated scheme (which this series implemented) is:
>>  - node partial slabs: PG_Workingset && !frozen
>>  - cpu partial slabs: !PG_Workingset && !frozen
>>  - cpu slabs: !PG_Workingset && frozen
>>  - full slabs: !PG_Workingset && !frozen
> 
> It could be useful to put this also to the initial comment description.
> Towards the end of the comment, there's a block explaining
> "slab->frozen". It could be extended to cover all 4 combination (but not
> all of them need such long explanation).
> 

Ok, I will extend it and put in the cover letter of v5.

>>
>> The most important change is that "frozen" bit is not set for the
>> cpu partial slabs anymore, __slab_free() will grab node list_lock
>> then check by !PG_Workingset that it's not on a node partial list.
>>
>> And the "frozen" bit is still kept for the cpu slabs for performance,
>> since we don't need to grab node list_lock to check whether the
>> PG_Workingset is set or not if the "frozen" bit is set in __slab_free().
>>
>> Update related documentations and comments in the source.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>> ---
>>  mm/slub.c | 16 ++++++++++++----
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index bb7368047103..89d3f7a18a73 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -76,13 +76,22 @@
>>   *
>>   *   Frozen slabs
>>   *
>> - *   If a slab is frozen then it is exempt from list management. It is not
>> - *   on any list except per cpu partial list. The processor that froze the
>> + *   If a slab is frozen then it is exempt from list management. It is
>> + *   the cpu slab which is actively allocated from by the processor that
>> + *   froze it and it is not on any list. The processor that froze the
>>   *   slab is the one who can perform list operations on the slab. Other
>>   *   processors may put objects onto the freelist but the processor that
>>   *   froze the slab is the only one that can retrieve the objects from the
>>   *   slab's freelist.
>>   *
>> + *   CPU partial slabs
>> + *
>> + *   The partially empty slabs cached on the CPU partial list are used
>> + *   for performance reasons, which speeds up the allocation process.
>> + *   These slabs are not frozen, but also exempt from list management,
> 
> 					^ are also
> 
> (otherwise somebody could read it as "also are not")
> 

Ah, will fix.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH v4 5/9] slub: Introduce freeze_slab()
  2023-10-31 14:07 ` [RFC PATCH v4 5/9] slub: Introduce freeze_slab() chengming.zhou
@ 2023-12-03  6:08   ` Hyeonggon Yoo
  0 siblings, 0 replies; 23+ messages in thread
From: Hyeonggon Yoo @ 2023-12-03  6:08 UTC (permalink / raw)
  To: chengming.zhou
  Cc: vbabka, cl, penberg, willy, rientjes, iamjoonsoo.kim, akpm,
	roman.gushchin, linux-mm, linux-kernel, Chengming Zhou

On Tue, Oct 31, 2023 at 11:09 PM <chengming.zhou@linux.dev> wrote:
>
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> We will have unfrozen slabs out of the node partial list later, so we
> need a freeze_slab() function to freeze the partial slab and get its
> freelist.
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/slub.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 1880b483350e..edf567971679 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3098,6 +3098,33 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
>         return freelist;
>  }
>
> +/*
> + * Freeze the partial slab and return the pointer to the freelist.
> + */
> +static inline void *freeze_slab(struct kmem_cache *s, struct slab *slab)
> +{
> +       struct slab new;
> +       unsigned long counters;
> +       void *freelist;
> +
> +       do {
> +               freelist = slab->freelist;
> +               counters = slab->counters;
> +
> +               new.counters = counters;
> +               VM_BUG_ON(new.frozen);
> +
> +               new.inuse = slab->objects;
> +               new.frozen = 1;
> +
> +       } while (!slab_update_freelist(s, slab,
> +               freelist, counters,
> +               NULL, new.counters,
> +               "freeze_slab"));
> +
> +       return freelist;
> +}
> +

Looks good to me,
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Thanks!

>  /*
>   * Slow path. The lockless freelist is empty or we need to perform
>   * debugging duties.
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-12-03  6:09 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-31 14:07 [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs chengming.zhou
2023-10-31 14:07 ` [RFC PATCH v4 1/9] slub: Reflow ___slab_alloc() chengming.zhou
2023-10-31 14:07 ` [RFC PATCH v4 2/9] slub: Change get_partial() interfaces to return slab chengming.zhou
2023-10-31 14:07 ` [RFC PATCH v4 3/9] slub: Keep track of whether slub is on the per-node partial list chengming.zhou
2023-11-01 12:23   ` Vlastimil Babka
2023-10-31 14:07 ` [RFC PATCH v4 4/9] slub: Prepare __slab_free() for unfrozen partial slab out of node " chengming.zhou
2023-11-01 12:26   ` Vlastimil Babka
2023-10-31 14:07 ` [RFC PATCH v4 5/9] slub: Introduce freeze_slab() chengming.zhou
2023-12-03  6:08   ` Hyeonggon Yoo
2023-10-31 14:07 ` [RFC PATCH v4 6/9] slub: Delay freezing of partial slabs chengming.zhou
2023-10-31 14:07 ` [RFC PATCH v4 7/9] slub: Optimize deactivate_slab() chengming.zhou
2023-11-01 13:21   ` Vlastimil Babka
2023-11-02  2:10     ` Chengming Zhou
2023-10-31 14:07 ` [RFC PATCH v4 8/9] slub: Rename all *unfreeze_partials* functions to *put_partials* chengming.zhou
2023-11-01 13:40   ` Vlastimil Babka
2023-11-02  2:12     ` Chengming Zhou
2023-10-31 14:07 ` [RFC PATCH v4 9/9] slub: Update frozen slabs documentations in the source chengming.zhou
2023-11-01 13:51   ` Vlastimil Babka
2023-11-02  2:48     ` Chengming Zhou
2023-11-01 13:33 ` [RFC PATCH v4 0/9] slub: Delay freezing of CPU partial slabs Hyeonggon Yoo
2023-11-02  2:17   ` Chengming Zhou
2023-11-01 13:59 ` Vlastimil Babka
2023-11-02  2:19   ` Chengming Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).