linux-mm.kvack.org archive mirror
* [PATCH 0/3] mm/slub: Fix slabs_node return value
@ 2020-06-14 12:39 Muchun Song
  2020-06-14 12:39 ` [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled Muchun Song
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Muchun Song @ 2020-06-14 12:39 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm
  Cc: linux-mm, linux-kernel, Muchun Song

slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled.
But some code decides whether a slab cache is empty by checking the
return value of slabs_node(), so the result is wrong in that
configuration. Move nr_slabs of kmem_cache_node out of
CONFIG_SLUB_DEBUG so that slabs_node() returns the correct value.

Muchun Song (3):
  mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled
  mm/slub: Use node_nr_slabs() instead of slabs_node()
  mm/slub: Fix release all resources used by a slab cache

 mm/slab.h |  2 +-
 mm/slub.c | 93 +++++++++++++++++++++++++++++++++------------------------------
 2 files changed, 50 insertions(+), 45 deletions(-)

-- 
2.11.0



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled
  2020-06-14 12:39 [PATCH 0/3] mm/slub: Fix slabs_node return value Muchun Song
@ 2020-06-14 12:39 ` Muchun Song
  2020-06-15  6:11   ` Joonsoo Kim
  2020-06-15 16:46   ` Vlastimil Babka
  2020-06-14 12:39 ` [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node() Muchun Song
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Muchun Song @ 2020-06-14 12:39 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm
  Cc: linux-mm, linux-kernel, Muchun Song

slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled.
But some code decides whether a slab cache is empty by checking the
return value of slabs_node(), so the result is wrong in that
configuration. The problem can be reproduced with the following code
(booting the system with "slub_nomerge" on the kernel command line):

  void *objs[32];
  struct kmem_cache *cache = kmem_cache_create("kmem-test", 128, 0,
			0, 0);

  if (cache) {
  	int i;

	/* Make a full slab */
  	for (i = 0; i < ARRAY_SIZE(objs); i++)
		objs[i] = kmem_cache_alloc(cache, GFP_KERNEL_ACCOUNT);

	/*
  	 * This really should fail because the slab cache still has
  	 * objects. But the @cache is destroyed anyway because
  	 * slabs_node() returns zero.
  	 */
  	kmem_cache_destroy(cache);
  }

To fix it, move nr_slabs of kmem_cache_node out of CONFIG_SLUB_DEBUG
so that slabs_node() returns the correct value.

With this patch applied, the example above produces a warning message
and a stack trace in dmesg.
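
For example, a message like the following is expected (assuming the
existing pr_err() path in kmem_cache_destroy(); the exact text may
differ):

  kmem_cache_destroy kmem-test: Slab cache still has objects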

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/slab.h |  2 +-
 mm/slub.c | 80 +++++++++++++++++++++++++++++++++------------------------------
 2 files changed, 43 insertions(+), 39 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 0b91f2a7b033..062d4542b7e2 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -619,8 +619,8 @@ struct kmem_cache_node {
 #ifdef CONFIG_SLUB
 	unsigned long nr_partial;
 	struct list_head partial;
-#ifdef CONFIG_SLUB_DEBUG
 	atomic_long_t nr_slabs;
+#ifdef CONFIG_SLUB_DEBUG
 	atomic_long_t total_objects;
 	struct list_head full;
 #endif
diff --git a/mm/slub.c b/mm/slub.c
index 49b5cb7da318..1a3e6a5b7287 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1070,39 +1070,14 @@ static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct
 	list_del(&page->slab_list);
 }
 
-/* Tracking of the number of slabs for debugging purposes */
-static inline unsigned long slabs_node(struct kmem_cache *s, int node)
+/* Tracking of the number of objects for debugging purposes */
+static inline void inc_objects_node(struct kmem_cache_node *n, int objects)
 {
-	struct kmem_cache_node *n = get_node(s, node);
-
-	return atomic_long_read(&n->nr_slabs);
+	atomic_long_add(objects, &n->total_objects);
 }
 
-static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
+static inline void dec_objects_node(struct kmem_cache_node *n, int objects)
 {
-	return atomic_long_read(&n->nr_slabs);
-}
-
-static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
-{
-	struct kmem_cache_node *n = get_node(s, node);
-
-	/*
-	 * May be called early in order to allocate a slab for the
-	 * kmem_cache_node structure. Solve the chicken-egg
-	 * dilemma by deferring the increment of the count during
-	 * bootstrap (see early_kmem_cache_node_alloc).
-	 */
-	if (likely(n)) {
-		atomic_long_inc(&n->nr_slabs);
-		atomic_long_add(objects, &n->total_objects);
-	}
-}
-static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
-{
-	struct kmem_cache_node *n = get_node(s, node);
-
-	atomic_long_dec(&n->nr_slabs);
 	atomic_long_sub(objects, &n->total_objects);
 }
 
@@ -1413,15 +1388,8 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
 
 #define disable_higher_order_debug 0
 
-static inline unsigned long slabs_node(struct kmem_cache *s, int node)
-							{ return 0; }
-static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
-							{ return 0; }
-static inline void inc_slabs_node(struct kmem_cache *s, int node,
-							int objects) {}
-static inline void dec_slabs_node(struct kmem_cache *s, int node,
-							int objects) {}
-
+static inline void inc_objects_node(struct kmem_cache_node *n, int objects) {}
+static inline void dec_objects_node(struct kmem_cache_node *n, int objects) {}
 static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
 			       void *freelist, void *nextfree)
 {
@@ -1429,6 +1397,42 @@ static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
 }
 #endif /* CONFIG_SLUB_DEBUG */
 
+static inline unsigned long slabs_node(struct kmem_cache *s, int node)
+{
+	struct kmem_cache_node *n = get_node(s, node);
+
+	return atomic_long_read(&n->nr_slabs);
+}
+
+static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
+{
+	return atomic_long_read(&n->nr_slabs);
+}
+
+static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
+{
+	struct kmem_cache_node *n = get_node(s, node);
+
+	/*
+	 * May be called early in order to allocate a slab for the
+	 * kmem_cache_node structure. Solve the chicken-egg
+	 * dilemma by deferring the increment of the count during
+	 * bootstrap (see early_kmem_cache_node_alloc).
+	 */
+	if (likely(n)) {
+		atomic_long_inc(&n->nr_slabs);
+		inc_objects_node(n, objects);
+	}
+}
+
+static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
+{
+	struct kmem_cache_node *n = get_node(s, node);
+
+	atomic_long_dec(&n->nr_slabs);
+	dec_objects_node(n, objects);
+}
+
 /*
  * Hooks for other subsystems that check memory allocations. In a typical
  * production configuration these hooks all should produce no code at all.
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node()
  2020-06-14 12:39 [PATCH 0/3] mm/slub: Fix slabs_node return value Muchun Song
  2020-06-14 12:39 ` [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled Muchun Song
@ 2020-06-14 12:39 ` Muchun Song
  2020-06-15  6:15   ` Joonsoo Kim
  2020-06-14 12:39 ` [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache Muchun Song
  2020-06-17 16:19 ` [PATCH 0/3] mm/slub: Fix slabs_node return value Christopher Lameter
  3 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2020-06-14 12:39 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm
  Cc: linux-mm, linux-kernel, Muchun Song

In some places we already have the kmem_cache_node at hand, so we can
use node_nr_slabs() directly instead of slabs_node(). The check on
n->nr_partial can also be removed, because node_nr_slabs() alone
gives the correct result.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/slub.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1a3e6a5b7287..b73505df3de2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3829,7 +3829,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
 	struct kmem_cache_node *n;
 
 	for_each_kmem_cache_node(s, node, n)
-		if (n->nr_partial || slabs_node(s, node))
+		if (node_nr_slabs(n))
 			return false;
 	return true;
 }
@@ -3846,7 +3846,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
 		free_partial(s, n);
-		if (n->nr_partial || slabs_node(s, node))
+		if (node_nr_slabs(n))
 			return 1;
 	}
 	sysfs_slab_remove(s);
@@ -4126,7 +4126,7 @@ int __kmem_cache_shrink(struct kmem_cache *s)
 		list_for_each_entry_safe(page, t, &discard, slab_list)
 			discard_slab(s, page);
 
-		if (slabs_node(s, node))
+		if (node_nr_slabs(n))
 			ret = 1;
 	}
 
@@ -4201,7 +4201,7 @@ static void slab_mem_offline_callback(void *arg)
 			 * and offline_pages() function shouldn't call this
 			 * callback. So, we must fail.
 			 */
-			BUG_ON(slabs_node(s, offline_node));
+			BUG_ON(node_nr_slabs(n));
 
 			s->node[offline_node] = NULL;
 			kmem_cache_free(kmem_cache_node, n);
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache
  2020-06-14 12:39 [PATCH 0/3] mm/slub: Fix slabs_node return value Muchun Song
  2020-06-14 12:39 ` [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled Muchun Song
  2020-06-14 12:39 ` [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node() Muchun Song
@ 2020-06-14 12:39 ` Muchun Song
  2020-06-15  6:23   ` Joonsoo Kim
  2020-06-17 16:19 ` [PATCH 0/3] mm/slub: Fix slabs_node return value Christopher Lameter
  3 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2020-06-14 12:39 UTC (permalink / raw)
  To: cl, penberg, rientjes, iamjoonsoo.kim, akpm
  Cc: linux-mm, linux-kernel, Muchun Song

__kmem_cache_shutdown() is supposed to release all resources used by
a slab cache, but currently it stops releasing resources as soon as
it finds a non-empty node.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 mm/slub.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b73505df3de2..4e477ef0f2b9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3839,6 +3839,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
  */
 int __kmem_cache_shutdown(struct kmem_cache *s)
 {
+	int ret = 0;
 	int node;
 	struct kmem_cache_node *n;
 
@@ -3846,11 +3847,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
 		free_partial(s, n);
-		if (node_nr_slabs(n))
-			return 1;
+		if (!ret && node_nr_slabs(n))
+			ret = 1;
 	}
 	sysfs_slab_remove(s);
-	return 0;
+	return ret;
 }
 
 /********************************************************************
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled
  2020-06-14 12:39 ` [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled Muchun Song
@ 2020-06-15  6:11   ` Joonsoo Kim
  2020-06-15  6:26     ` [External] " Muchun Song
  2020-06-15 16:46   ` Vlastimil Babka
  1 sibling, 1 reply; 14+ messages in thread
From: Joonsoo Kim @ 2020-06-15  6:11 UTC (permalink / raw)
  To: Muchun Song
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled.
> But some code decides whether a slab cache is empty by checking the
> return value of slabs_node(), so the result is wrong in that
> configuration. The problem can be reproduced with the following code
> (booting the system with "slub_nomerge" on the kernel command line):
>
>   void *objs[32];
>   struct kmem_cache *cache = kmem_cache_create("kmem-test", 128, 0,
>                         0, 0);
>
>   if (cache) {
>         int i;
>
>         /* Make a full slab */
>         for (i = 0; i < ARRAY_SIZE(objs); i++)
>                 objs[i] = kmem_cache_alloc(cache, GFP_KERNEL_ACCOUNT);
>
>         /*
>          * This really should fail because the slab cache still has
>          * objects. But the @cache is destroyed anyway because
>          * slabs_node() returns zero.
>          */
>         kmem_cache_destroy(cache);
>   }
>
> To fix it, move nr_slabs of kmem_cache_node out of CONFIG_SLUB_DEBUG
> so that slabs_node() returns the correct value.
>
> With this patch applied, the example above produces a warning message
> and a stack trace in dmesg.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/slab.h |  2 +-
>  mm/slub.c | 80 +++++++++++++++++++++++++++++++++------------------------------
>  2 files changed, 43 insertions(+), 39 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 0b91f2a7b033..062d4542b7e2 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -619,8 +619,8 @@ struct kmem_cache_node {
>  #ifdef CONFIG_SLUB
>         unsigned long nr_partial;
>         struct list_head partial;
> -#ifdef CONFIG_SLUB_DEBUG
>         atomic_long_t nr_slabs;
> +#ifdef CONFIG_SLUB_DEBUG
>         atomic_long_t total_objects;
>         struct list_head full;
>  #endif
> diff --git a/mm/slub.c b/mm/slub.c
> index 49b5cb7da318..1a3e6a5b7287 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c

Hello,

You also need to initialize nr_slabs in init_kmem_cache_node()
on !CONFIG_SLUB_DEBUG.
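
A minimal sketch of that change, assuming the usual layout of
init_kmem_cache_node() (illustrative only, not the exact diff):

  static void
  init_kmem_cache_node(struct kmem_cache_node *n)
  {
  	n->nr_partial = 0;
  	spin_lock_init(&n->list_lock);
  	INIT_LIST_HEAD(&n->partial);
  	/* nr_slabs now lives outside CONFIG_SLUB_DEBUG; always reset it */
  	atomic_long_set(&n->nr_slabs, 0);
  #ifdef CONFIG_SLUB_DEBUG
  	atomic_long_set(&n->total_objects, 0);
  	INIT_LIST_HEAD(&n->full);
  #endif
  }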

Otherwise, looks good to me.

Thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node()
  2020-06-14 12:39 ` [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node() Muchun Song
@ 2020-06-15  6:15   ` Joonsoo Kim
  2020-06-15  6:20     ` [External] " Muchun Song
  0 siblings, 1 reply; 14+ messages in thread
From: Joonsoo Kim @ 2020-06-15  6:15 UTC (permalink / raw)
  To: Muchun Song
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> In some places we already have the kmem_cache_node at hand, so we can
> use node_nr_slabs() directly instead of slabs_node(). The check on
> n->nr_partial can also be removed, because node_nr_slabs() alone
> gives the correct result.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

If this patch is applied, there is no reference to slabs_node() left. Please remove it.

Thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node()
  2020-06-15  6:15   ` Joonsoo Kim
@ 2020-06-15  6:20     ` Muchun Song
  0 siblings, 0 replies; 14+ messages in thread
From: Muchun Song @ 2020-06-15  6:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Mon, Jun 15, 2020 at 2:15 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > In some places we already have the kmem_cache_node at hand, so we can
> > use node_nr_slabs() directly instead of slabs_node(). The check on
> > n->nr_partial can also be removed, because node_nr_slabs() alone
> > gives the correct result.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>
> If this patch is applied, there is no reference to slabs_node() left. Please remove it.
>
> Thanks.

OK, thanks. I will remove it in the next version.

-- 
Yours,
Muchun


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache
  2020-06-14 12:39 ` [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache Muchun Song
@ 2020-06-15  6:23   ` Joonsoo Kim
  2020-06-15  6:41     ` [External] " Muchun Song
  0 siblings, 1 reply; 14+ messages in thread
From: Joonsoo Kim @ 2020-06-15  6:23 UTC (permalink / raw)
  To: Muchun Song
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> __kmem_cache_shutdown() is supposed to release all resources used by
> a slab cache, but currently it stops releasing resources as soon as
> it finds a non-empty node.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/slub.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b73505df3de2..4e477ef0f2b9 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3839,6 +3839,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
>   */
>  int __kmem_cache_shutdown(struct kmem_cache *s)
>  {
> +       int ret = 0;
>         int node;
>         struct kmem_cache_node *n;
>
> @@ -3846,11 +3847,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>         /* Attempt to free all objects */
>         for_each_kmem_cache_node(s, node, n) {
>                 free_partial(s, n);
> -               if (node_nr_slabs(n))
> -                       return 1;
> +               if (!ret && node_nr_slabs(n))
> +                       ret = 1;
>         }

I don't think that this is an improvement.

If the shutdown condition isn't met, we don't need to process further.
Just 'return 1' looks okay to me.

And, with this change, sysfs_slab_remove() is called even if the
shutdown fails.
It's better not to have side effects when failing.

Thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled
  2020-06-15  6:11   ` Joonsoo Kim
@ 2020-06-15  6:26     ` Muchun Song
  0 siblings, 0 replies; 14+ messages in thread
From: Muchun Song @ 2020-06-15  6:26 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Mon, Jun 15, 2020 at 2:11 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled.
> > But some code decides whether a slab cache is empty by checking the
> > return value of slabs_node(), so the result is wrong in that
> > configuration. The problem can be reproduced with the following code
> > (booting the system with "slub_nomerge" on the kernel command line):
> >
> >   void *objs[32];
> >   struct kmem_cache *cache = kmem_cache_create("kmem-test", 128, 0,
> >                         0, 0);
> >
> >   if (cache) {
> >         int i;
> >
> >         /* Make a full slab */
> >         for (i = 0; i < ARRAY_SIZE(objs); i++)
> >                 objs[i] = kmem_cache_alloc(cache, GFP_KERNEL_ACCOUNT);
> >
> >         /*
> >          * This really should fail because the slab cache still has
> >          * objects. But the @cache is destroyed anyway because
> >          * slabs_node() returns zero.
> >          */
> >         kmem_cache_destroy(cache);
> >   }
> >
> > To fix it, move nr_slabs of kmem_cache_node out of CONFIG_SLUB_DEBUG
> > so that slabs_node() returns the correct value.
> >
> > With this patch applied, the example above produces a warning message
> > and a stack trace in dmesg.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/slab.h |  2 +-
> >  mm/slub.c | 80 +++++++++++++++++++++++++++++++++------------------------------
> >  2 files changed, 43 insertions(+), 39 deletions(-)
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 0b91f2a7b033..062d4542b7e2 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -619,8 +619,8 @@ struct kmem_cache_node {
> >  #ifdef CONFIG_SLUB
> >         unsigned long nr_partial;
> >         struct list_head partial;
> > -#ifdef CONFIG_SLUB_DEBUG
> >         atomic_long_t nr_slabs;
> > +#ifdef CONFIG_SLUB_DEBUG
> >         atomic_long_t total_objects;
> >         struct list_head full;
> >  #endif
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 49b5cb7da318..1a3e6a5b7287 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
>
> Hello,
>
> You also need to initialize nr_slabs in init_kmem_cache_node()
> on !CONFIG_SLUB_DEBUG.

Good catch, thanks! I will fix it in the next version.

>
> Otherwise, looks good to me.
>
> Thanks.



-- 
Yours,
Muchun


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache
  2020-06-15  6:23   ` Joonsoo Kim
@ 2020-06-15  6:41     ` Muchun Song
  2020-06-15  7:25       ` Joonsoo Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Muchun Song @ 2020-06-15  6:41 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Mon, Jun 15, 2020 at 2:23 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > __kmem_cache_shutdown() is supposed to release all resources used by
> > a slab cache, but currently it stops releasing resources as soon as
> > it finds a non-empty node.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  mm/slub.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b73505df3de2..4e477ef0f2b9 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3839,6 +3839,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
> >   */
> >  int __kmem_cache_shutdown(struct kmem_cache *s)
> >  {
> > +       int ret = 0;
> >         int node;
> >         struct kmem_cache_node *n;
> >
> > @@ -3846,11 +3847,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >         /* Attempt to free all objects */
> >         for_each_kmem_cache_node(s, node, n) {
> >                 free_partial(s, n);
> > -               if (node_nr_slabs(n))
> > -                       return 1;
> > +               if (!ret && node_nr_slabs(n))
> > +                       ret = 1;
> >         }
>
> I don't think that this is an improvement.
>
> If the shutdown condition isn't met, we don't need to process further.
> Just 'return 1' looks okay to me.
>
> And, with this change, sysfs_slab_remove() is called even if the
> shutdown fails.
> It's better not to have side effects when failing.

If someone calls __kmem_cache_shutdown(), they may want to release as
many of the resources used by the slab cache as possible. If we
continue, we may release more pages. From this point of view, isn't
it an improvement?

-- 
Yours,
Muchun


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache
  2020-06-15  6:41     ` [External] " Muchun Song
@ 2020-06-15  7:25       ` Joonsoo Kim
  2020-06-15  7:50         ` Muchun Song
  0 siblings, 1 reply; 14+ messages in thread
From: Joonsoo Kim @ 2020-06-15  7:25 UTC (permalink / raw)
  To: Muchun Song
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Mon, Jun 15, 2020 at 3:41 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Mon, Jun 15, 2020 at 2:23 PM Joonsoo Kim <js1304@gmail.com> wrote:
> >
> > On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > __kmem_cache_shutdown() is supposed to release all resources used by
> > > a slab cache, but currently it stops releasing resources as soon as
> > > it finds a non-empty node.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  mm/slub.c | 7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index b73505df3de2..4e477ef0f2b9 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3839,6 +3839,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
> > >   */
> > >  int __kmem_cache_shutdown(struct kmem_cache *s)
> > >  {
> > > +       int ret = 0;
> > >         int node;
> > >         struct kmem_cache_node *n;
> > >
> > > @@ -3846,11 +3847,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> > >         /* Attempt to free all objects */
> > >         for_each_kmem_cache_node(s, node, n) {
> > >                 free_partial(s, n);
> > > -               if (node_nr_slabs(n))
> > > -                       return 1;
> > > +               if (!ret && node_nr_slabs(n))
> > > +                       ret = 1;
> > >         }
> >
> > I don't think that this is an improvement.
> >
> > If the shutdown condition isn't met, we don't need to process further.
> > Just 'return 1' looks okay to me.
> >
> > And, with this change, sysfs_slab_remove() is called even if the
> > shutdown fails.
> > It's better not to have side effects when failing.
>
> If someone calls __kmem_cache_shutdown(), they may want to release as
> many of the resources used by the slab cache as possible. If we
> continue, we may release more pages. From this point of view, isn't
> it an improvement?

My opinion is not strong, but I still think that it's not useful
enough to justify complicating the code.

If shutdown fails, it implies there is a bug and someone should fix it.
Releasing more resources would mitigate the resource problem, but it
doesn't change the fact that someone should fix the bug soon.

Anyway, I won't object further if you don't agree with my opinion. In
that case, please fix it so that sysfs_slab_remove() is not called
when ret is 1.
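
A rough sketch of the resulting tail of __kmem_cache_shutdown()
(untested, just to show the intended shape):

  	/* Attempt to free all objects */
  	for_each_kmem_cache_node(s, node, n) {
  		free_partial(s, n);
  		if (node_nr_slabs(n))
  			ret = 1;
  	}
  	/* avoid the sysfs side effect when the shutdown failed */
  	if (!ret)
  		sysfs_slab_remove(s);
  	return ret;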

Thanks.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [External] Re: [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache
  2020-06-15  7:25       ` Joonsoo Kim
@ 2020-06-15  7:50         ` Muchun Song
  0 siblings, 0 replies; 14+ messages in thread
From: Muchun Song @ 2020-06-15  7:50 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux Memory Management List, LKML

On Mon, Jun 15, 2020 at 3:25 PM Joonsoo Kim <js1304@gmail.com> wrote:
>
> On Mon, Jun 15, 2020 at 3:41 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Mon, Jun 15, 2020 at 2:23 PM Joonsoo Kim <js1304@gmail.com> wrote:
> > >
> > > On Sun, Jun 14, 2020 at 9:39 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > __kmem_cache_shutdown() is supposed to release all resources used by
> > > > a slab cache, but currently it stops releasing resources as soon as
> > > > it finds a non-empty node.
> > > >
> > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > > ---
> > > >  mm/slub.c | 7 ++++---
> > > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index b73505df3de2..4e477ef0f2b9 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -3839,6 +3839,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
> > > >   */
> > > >  int __kmem_cache_shutdown(struct kmem_cache *s)
> > > >  {
> > > > +       int ret = 0;
> > > >         int node;
> > > >         struct kmem_cache_node *n;
> > > >
> > > > @@ -3846,11 +3847,11 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> > > >         /* Attempt to free all objects */
> > > >         for_each_kmem_cache_node(s, node, n) {
> > > >                 free_partial(s, n);
> > > > -               if (node_nr_slabs(n))
> > > > -                       return 1;
> > > > +               if (!ret && node_nr_slabs(n))
> > > > +                       ret = 1;
> > > >         }
> > >
> > > I don't think that this is an improvement.
> > >
> > > If the shutdown condition isn't met, we don't need to process further.
> > > Just 'return 1' looks okay to me.
> > >
> > > And, with this change, sysfs_slab_remove() is called even if the
> > > shutdown fails.
> > > It's better not to have side effects when failing.
> >
> > If someone calls __kmem_cache_shutdown(), they may want to release as
> > many of the resources used by the slab cache as possible. If we
> > continue, we may release more pages. From this point of view, isn't
> > it an improvement?
>
> My opinion is not strong, but I still think that it's not useful
> enough to justify complicating the code.
>
> If shutdown fails, it implies there is a bug and someone should fix it.

Yeah, I agree with you.

> Releasing more resources would mitigate the resource problem, but it
> doesn't change the fact that someone should fix the bug soon.
>
> Anyway, I won't object further if you don't agree with my opinion. In
> that case, please fix it so that sysfs_slab_remove() is not called
> when ret is 1.
>

Yeah, we should call sysfs_slab_remove() only when ret is zero. Thanks
very much.

-- 
Yours,
Muchun


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled
  2020-06-14 12:39 ` [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled Muchun Song
  2020-06-15  6:11   ` Joonsoo Kim
@ 2020-06-15 16:46   ` Vlastimil Babka
  1 sibling, 0 replies; 14+ messages in thread
From: Vlastimil Babka @ 2020-06-15 16:46 UTC (permalink / raw)
  To: Muchun Song, cl, penberg, rientjes, iamjoonsoo.kim, akpm
  Cc: linux-mm, linux-kernel, Roman Gushchin

On 6/14/20 2:39 PM, Muchun Song wrote:
> slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled.
> But some code decides whether a slab cache is empty by checking the
> return value of slabs_node(), so the result is wrong in that
> configuration. The problem can be reproduced with the following code
> (booting the system with "slub_nomerge" on the kernel command line):
> 
>   void *objs[32];
>   struct kmem_cache *cache = kmem_cache_create("kmem-test", 128, 0,
> 			0, 0);
> 
>   if (cache) {
>   	int i;
> 
> 	/* Make a full slab */
>   	for (i = 0; i < ARRAY_SIZE(objs); i++)
> 		objs[i] = kmem_cache_alloc(cache, GFP_KERNEL_ACCOUNT);
> 
> 	/*
>   	 * This really should fail because the slab cache still has
>   	 * objects. But the @cache is destroyed anyway because
>   	 * slabs_node() returns zero.
>   	 */
>   	kmem_cache_destroy(cache);
>   }

Hmm, the whole SLUB subsystem (or even kernel) has some trust in the other
individual subsystems doing the right thing. I.e. if you allocate an object once and
free it twice, bad things will happen. If you are debugging such a problem, you
enable SLUB_DEBUG (and the boot parameter).

This particular situation is the same - SLUB trusts its users to free their
objects before calling kmem_cache_destroy(). If they don't and just leak them,
then they stay leaked and nothing worse will happen (I'd expect). If they
destroy the cache and later try to e.g. free the objects, then there will most
likely be a crash.

And yeah, if you are debugging such a thing, you will compile with SLUB_DEBUG and
make sure there's slub_nomerge (or you won't catch it anyway).

Hmm, but there's one exception in __kmem_cache_shrink() where the caller wants
to know if all slabs were discarded. kmem_cache_shrink() passes that to its
callers, but it seems nobody cares about the return value.

The only caller that seems to care is __kmemcg_cache_deactivate_after_rcu():

        if (!__kmem_cache_shrink(s))
                sysfs_slab_remove(s);

Perhaps that will go away with memcg rework? (CC Roman). If yes, we also have
the option to stop returning the value in kmem_cache_shrink() (as nobody seems
to care), and then slabs_node() remains a debug-only thing.

> To fix it, move nr_slabs of kmem_cache_node out of CONFIG_SLUB_DEBUG
> so that slabs_node() returns the correct value.
> 
> With this patch applied, the example above produces a warning message
> and a stack trace in dmesg.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  mm/slab.h |  2 +-
>  mm/slub.c | 80 +++++++++++++++++++++++++++++++++------------------------------
>  2 files changed, 43 insertions(+), 39 deletions(-)
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 0b91f2a7b033..062d4542b7e2 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -619,8 +619,8 @@ struct kmem_cache_node {
>  #ifdef CONFIG_SLUB
>  	unsigned long nr_partial;
>  	struct list_head partial;
> -#ifdef CONFIG_SLUB_DEBUG
>  	atomic_long_t nr_slabs;
> +#ifdef CONFIG_SLUB_DEBUG
>  	atomic_long_t total_objects;
>  	struct list_head full;
>  #endif
> diff --git a/mm/slub.c b/mm/slub.c
> index 49b5cb7da318..1a3e6a5b7287 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1070,39 +1070,14 @@ static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct
>  	list_del(&page->slab_list);
>  }
>  
> -/* Tracking of the number of slabs for debugging purposes */
> -static inline unsigned long slabs_node(struct kmem_cache *s, int node)
> +/* Tracking of the number of objects for debugging purposes */
> +static inline void inc_objects_node(struct kmem_cache_node *n, int objects)
>  {
> -	struct kmem_cache_node *n = get_node(s, node);
> -
> -	return atomic_long_read(&n->nr_slabs);
> +	atomic_long_add(objects, &n->total_objects);
>  }
>  
> -static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
> +static inline void dec_objects_node(struct kmem_cache_node *n, int objects)
>  {
> -	return atomic_long_read(&n->nr_slabs);
> -}
> -
> -static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
> -{
> -	struct kmem_cache_node *n = get_node(s, node);
> -
> -	/*
> -	 * May be called early in order to allocate a slab for the
> -	 * kmem_cache_node structure. Solve the chicken-egg
> -	 * dilemma by deferring the increment of the count during
> -	 * bootstrap (see early_kmem_cache_node_alloc).
> -	 */
> -	if (likely(n)) {
> -		atomic_long_inc(&n->nr_slabs);
> -		atomic_long_add(objects, &n->total_objects);
> -	}
> -}
> -static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
> -{
> -	struct kmem_cache_node *n = get_node(s, node);
> -
> -	atomic_long_dec(&n->nr_slabs);
>  	atomic_long_sub(objects, &n->total_objects);
>  }
>  
> @@ -1413,15 +1388,8 @@ slab_flags_t kmem_cache_flags(unsigned int object_size,
>  
>  #define disable_higher_order_debug 0
>  
> -static inline unsigned long slabs_node(struct kmem_cache *s, int node)
> -							{ return 0; }
> -static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
> -							{ return 0; }
> -static inline void inc_slabs_node(struct kmem_cache *s, int node,
> -							int objects) {}
> -static inline void dec_slabs_node(struct kmem_cache *s, int node,
> -							int objects) {}
> -
> +static inline void inc_objects_node(struct kmem_cache_node *n, int objects) {}
> +static inline void dec_objects_node(struct kmem_cache_node *n, int objects) {}
>  static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
>  			       void *freelist, void *nextfree)
>  {
> @@ -1429,6 +1397,42 @@ static bool freelist_corrupted(struct kmem_cache *s, struct page *page,
>  }
>  #endif /* CONFIG_SLUB_DEBUG */
>  
> +static inline unsigned long slabs_node(struct kmem_cache *s, int node)
> +{
> +	struct kmem_cache_node *n = get_node(s, node);
> +
> +	return atomic_long_read(&n->nr_slabs);
> +}
> +
> +static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
> +{
> +	return atomic_long_read(&n->nr_slabs);
> +}
> +
> +static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
> +{
> +	struct kmem_cache_node *n = get_node(s, node);
> +
> +	/*
> +	 * May be called early in order to allocate a slab for the
> +	 * kmem_cache_node structure. Solve the chicken-egg
> +	 * dilemma by deferring the increment of the count during
> +	 * bootstrap (see early_kmem_cache_node_alloc).
> +	 */
> +	if (likely(n)) {
> +		atomic_long_inc(&n->nr_slabs);
> +		inc_objects_node(n, objects);
> +	}
> +}
> +
> +static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
> +{
> +	struct kmem_cache_node *n = get_node(s, node);
> +
> +	atomic_long_dec(&n->nr_slabs);
> +	dec_objects_node(n, objects);
> +}
> +
>  /*
>   * Hooks for other subsystems that check memory allocations. In a typical
>   * production configuration these hooks all should produce no code at all.
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] mm/slub: Fix slabs_node return value
  2020-06-14 12:39 [PATCH 0/3] mm/slub: Fix slabs_node return value Muchun Song
                   ` (2 preceding siblings ...)
  2020-06-14 12:39 ` [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache Muchun Song
@ 2020-06-17 16:19 ` Christopher Lameter
  3 siblings, 0 replies; 14+ messages in thread
From: Christopher Lameter @ 2020-06-17 16:19 UTC (permalink / raw)
  To: Muchun Song
  Cc: penberg, rientjes, iamjoonsoo.kim, akpm, linux-mm, linux-kernel

On Sun, 14 Jun 2020, Muchun Song wrote:

> slabs_node() always returns zero when CONFIG_SLUB_DEBUG is disabled.
> But some code decides whether a slab cache is empty by checking the
> return value of slabs_node(), so the result is wrong in that
> configuration. Move nr_slabs of kmem_cache_node out of
> CONFIG_SLUB_DEBUG so that slabs_node() returns the correct value.

Note that all distribution kernels have CONFIG_SLUB_DEBUG enabled. This
does not enable runtime debugging but only compiles in the debug code,
which can then be enabled at runtime.

Users of !CONFIG_SLUB_DEBUG do not want to have the debug code included
because they have extreme requirements on memory use.

This patch increases use of memory by enabling fields that were excluded
under !CONFIG_SLUB_DEBUG before!

There is nothing wrong with slabs_node's return value if one wants to
sacrifice debugging and consistency checks for a small code build.




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-17 16:19 UTC | newest]

Thread overview: 14+ messages
-- links below jump to the message on this page --
2020-06-14 12:39 [PATCH 0/3] mm/slub: Fix slabs_node return value Muchun Song
2020-06-14 12:39 ` [PATCH 1/3] mm/slub: Fix slabs_node return value when CONFIG_SLUB_DEBUG disabled Muchun Song
2020-06-15  6:11   ` Joonsoo Kim
2020-06-15  6:26     ` [External] " Muchun Song
2020-06-15 16:46   ` Vlastimil Babka
2020-06-14 12:39 ` [PATCH 2/3] mm/slub: Use node_nr_slabs() instead of slabs_node() Muchun Song
2020-06-15  6:15   ` Joonsoo Kim
2020-06-15  6:20     ` [External] " Muchun Song
2020-06-14 12:39 ` [PATCH 3/3] mm/slub: Fix release all resources used by a slab cache Muchun Song
2020-06-15  6:23   ` Joonsoo Kim
2020-06-15  6:41     ` [External] " Muchun Song
2020-06-15  7:25       ` Joonsoo Kim
2020-06-15  7:50         ` Muchun Song
2020-06-17 16:19 ` [PATCH 0/3] mm/slub: Fix slabs_node return value Christopher Lameter
