* [PATCH v4.19.y,v5.4.y] mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
From: Nanyong Sun @ 2021-07-20  8:20 UTC
  To: songmuchun, cl, penberg, rientjes, iamjoonsoo.kim, akpm
  Cc: linux-mm, linux-kernel, stable

Commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: if one thread destroys a
kmem_cache A while another thread concurrently creates a kmem_cache B
that is mergeable with A and has the same size, the creation of B may
fail because of a duplicate sysfs node.
The scenario in detail:
1) Thread 1 calls kmem_cache_destroy() to destroy the mergeable
kmem_cache A. It decrements A's refcount and, if the refcount reaches 0,
calls memcg_set_kmem_cache_dying(), which sets
A->memcg_params.dying = true. It then releases slab_mutex and calls
flush_memcg_workqueue(), which may take a while.
Note: at this point A's sysfs node (e.g. '/kernel/slab/:0000248') is
still present. It is only deleted in shutdown_cache(), which runs after
flush_memcg_workqueue() has finished and slab_mutex has been taken again.
2) Meanwhile, thread 2 calls kmem_cache_create() to create B, which is
mergeable with A (both have the same size). It takes slab_mutex and
calls __kmem_cache_alias() to look for a cache to merge with. Because of
the following code, added by commit d38a2b7a9c93 ("mm: memcg/slab: fix
memory leak at non-root kmem_cache destroy"), B cannot be merged with A,
whose memcg_params.dying is true:

int slab_unmergeable(struct kmem_cache *s)
{
	...
	if (s->refcount < 0)
		return 1;

#ifdef CONFIG_MEMCG_KMEM
	/*
	 * Skip the dying kmem_cache.
	 */
	if (s->memcg_params.dying)
		return 1;
#endif

	return 0;
}

So B has to create its own sysfs node, via:
 create_cache->
	__kmem_cache_create->
		sysfs_slab_add->
			kobject_init_and_add
Because B is itself mergeable, the name of its sysfs node is derived from
its size, e.g. '/kernel/slab/:0000248', which duplicates A's. Since A's
sysfs node still exists, kobject_init_and_add() fails, and therefore
kmem_cache_create() fails too.
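
The name collision can be illustrated with a minimal userspace sketch. This is not kernel code: model_unique_id(), struct model_sysfs and model_sysfs_add() are hypothetical names. The naming rule is a simplification, assuming it follows SLUB's create_unique_id() in mm/slub.c, which to our understanding builds ':' plus the object size as 7 zero-padded digits and also encodes some cache flags; the flags are left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_NODES 8
#define MODEL_NAME_LEN 32

/*
 * Hypothetical, simplified model of the sysfs name of a mergeable cache:
 * ':' followed by the object size as 7 zero-padded digits.
 * Flag letters that the real SLUB code may add are omitted.
 */
static void model_unique_id(char *buf, size_t len, unsigned int size)
{
	snprintf(buf, len, ":%07u", size);
}

/* Hypothetical stand-in for the /sys/kernel/slab directory. */
struct model_sysfs {
	char names[MODEL_MAX_NODES][MODEL_NAME_LEN];
	int count;
};

/*
 * Add a node. Returns 0 on success, -EEXIST if a node with the same
 * name already exists (the failure kobject_init_and_add() reports in
 * the log), or -ENOSPC if the model directory is full.
 */
static int model_sysfs_add(struct model_sysfs *fs, const char *name)
{
	for (int i = 0; i < fs->count; i++)
		if (strcmp(fs->names[i], name) == 0)
			return -EEXIST;
	if (fs->count == MODEL_MAX_NODES)
		return -ENOSPC;
	snprintf(fs->names[fs->count], MODEL_NAME_LEN, "%s", name);
	fs->count++;
	return 0;
}
```

Registering A's name and then B's (both 248 bytes, no flags) makes the second model_sysfs_add() call return -EEXIST, which is -17 on Linux, matching the "failed with error -17" line in the log.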

Running modprobe and rmmod concurrently on the modules that create the
nf_conntrack_expect and se_sess_cache caches reproduces the issue
quickly. The resulting call trace is included at the end.

The LTS branches v4.19.y and v5.4.y have this problem, whereas mainline
kernels v5.9 and later do not, because the patch set "The new cgroup slab
memory controller" largely reworked the memcg slab code.

A potential solution (the one this patch implements): simply let a dying
kmem_cache be mergeable. slab_mutex already serializes the thread that
creates an alias kmem_cache against the thread that destroys the root
kmem_cache. In the destroying thread, once flush_memcg_workqueue() has
finished, check the refcount again: if someone took a new reference
while slab_mutex was released, the kmem_cache does not need to be
destroyed at all and can be reused.
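
The interleaving and the fix can be replayed deterministically in a small userspace model. This is a sketch under stated assumptions, not the patch itself: struct model_cache and the model_* functions are hypothetical, each function stands for one slab_mutex critical section, and the real locking (slab_mutex, memcg_kmem_wq_lock) and the workqueue flush are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a root kmem_cache; not the kernel structure. */
struct model_cache {
	int refcount;
	bool dying;
	bool shut_down;	/* true once the cache and its sysfs node are gone */
};

/*
 * Thread 1, first critical section: drop a reference; dropping the last
 * one marks the cache dying. Returns true if destruction continues (the
 * real code would now release slab_mutex and flush the workqueue).
 */
static bool model_destroy_begin(struct model_cache *s)
{
	if (--s->refcount)
		return false;
	s->dying = true;
	return true;
}

/*
 * Thread 2: try to merge B into s. skip_dying = true models the behaviour
 * before this patch, false the behaviour after it.
 * Returns true if B became an alias of s.
 */
static bool model_try_alias(struct model_cache *s, bool skip_dying)
{
	if (s->shut_down || (skip_dying && s->dying))
		return false;	/* B must create its own cache and sysfs node */
	s->refcount++;
	return true;
}

/*
 * Thread 1, second critical section (slab_mutex taken again).
 * recheck = true models the refcount re-check this patch adds.
 */
static void model_destroy_finish(struct model_cache *s, bool recheck)
{
	if (recheck && s->refcount) {
		s->dying = false;	/* re-referenced meanwhile: keep and reuse */
		return;
	}
	s->shut_down = true;
}
```

In this model, making dying caches mergeable without the re-check would be unsafe: calling model_destroy_finish(s, false) after a successful alias shuts down a cache whose refcount is still 1. That is why the patch changes slab_unmergeable() and kmem_cache_destroy() together.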

Another potential solution: revert commit d38a2b7a9c93 ("mm: memcg/slab:
fix memory leak at non-root kmem_cache destroy"). Compared with
kmem_cache_create() failing, the memory leak in that corner case seems
less harmful.

Call trace:
 sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  sysfs_warn_dup+0x6c/0x88
  sysfs_create_dir_ns+0x104/0x120
  kobject_add_internal+0xd0/0x378
  kobject_init_and_add+0x90/0xd8
  sysfs_slab_add+0x16c/0x2d0
  __kmem_cache_create+0x16c/0x1d8
  create_cache+0xbc/0x1f8
  kmem_cache_create_usercopy+0x1a0/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260
 kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
 kmem_cache_create(se_sess_cache) failed with error -17
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  kmem_cache_create_usercopy+0xa8/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260

Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
---
 mm/slab_common.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index d208b47e01a8..acc743315bb5 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -326,14 +326,6 @@ int slab_unmergeable(struct kmem_cache *s)
 	if (s->refcount < 0)
 		return 1;
 
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * Skip the dying kmem_cache.
-	 */
-	if (s->memcg_params.dying)
-		return 1;
-#endif
-
 	return 0;
 }
 
@@ -947,6 +939,16 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	get_online_mems();
 
 	mutex_lock(&slab_mutex);
+
+	/*
+	 * Another thread referenced it again
+	 */
+	if (READ_ONCE(s->refcount)) {
+		spin_lock_irq(&memcg_kmem_wq_lock);
+		s->memcg_params.dying = false;
+		spin_unlock_irq(&memcg_kmem_wq_lock);
+		goto out_unlock;
+	}
 #endif
 
 	err = shutdown_memcg_caches(s);
-- 
2.18.0.huawei.25



* Re: [PATCH v4.19.y,v5.4.y] mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
From: Greg KH @ 2021-07-22 14:44 UTC
  To: Nanyong Sun
  Cc: songmuchun, cl, penberg, rientjes, iamjoonsoo.kim, akpm,
	linux-mm, linux-kernel, stable

On Tue, Jul 20, 2021 at 04:20:48PM +0800, Nanyong Sun wrote:

Thanks, I've queued this up now.

greg k-h


* Patch "mm: slab: fix kmem_cache_create failed when sysfs node not destroyed" has been added to the 4.19-stable tree
From: gregkh @ 2021-07-22 14:51 UTC
  To: akpm, cl, gregkh, iamjoonsoo.kim, linux-mm, penberg, rientjes,
	songmuchun, sunnanyong
  Cc: stable-commits


This is a note to let you know that I've just added the patch titled

    mm: slab: fix kmem_cache_create failed when sysfs node not destroyed

to the 4.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch
and it can be found in the queue-4.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From sunnanyong@huawei.com  Thu Jul 22 16:42:41 2021
From: Nanyong Sun <sunnanyong@huawei.com>
Date: Tue, 20 Jul 2021 16:20:48 +0800
Subject: mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
To: <songmuchun@bytedance.com>, <cl@linux.com>, <penberg@kernel.org>, <rientjes@google.com>, <iamjoonsoo.kim@lge.com>, <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <stable@vger.kernel.org>
Message-ID: <20210720082048.2797315-1-sunnanyong@huawei.com>

From: Nanyong Sun <sunnanyong@huawei.com>

Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/slab_common.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -311,14 +311,6 @@ int slab_unmergeable(struct kmem_cache *
 	if (s->refcount < 0)
 		return 1;
 
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * Skip the dying kmem_cache.
-	 */
-	if (s->memcg_params.dying)
-		return 1;
-#endif
-
 	return 0;
 }
 
@@ -918,6 +910,16 @@ void kmem_cache_destroy(struct kmem_cach
 	get_online_mems();
 
 	mutex_lock(&slab_mutex);
+
+	/*
+	 * Another thread referenced it again
+	 */
+	if (READ_ONCE(s->refcount)) {
+		spin_lock_irq(&memcg_kmem_wq_lock);
+		s->memcg_params.dying = false;
+		spin_unlock_irq(&memcg_kmem_wq_lock);
+		goto out_unlock;
+	}
 #endif
 
 	err = shutdown_memcg_caches(s);


Patches currently in stable-queue which might be from sunnanyong@huawei.com are

queue-4.19/mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch



* Patch "mm: slab: fix kmem_cache_create failed when sysfs node not destroyed" has been added to the 5.4-stable tree
From: gregkh @ 2021-07-22 14:52 UTC
  To: akpm, cl, gregkh, iamjoonsoo.kim, linux-mm, penberg, rientjes,
	songmuchun, sunnanyong
  Cc: stable-commits


This is a note to let you know that I've just added the patch titled

    mm: slab: fix kmem_cache_create failed when sysfs node not destroyed

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From sunnanyong@huawei.com  Thu Jul 22 16:42:41 2021
From: Nanyong Sun <sunnanyong@huawei.com>
Date: Tue, 20 Jul 2021 16:20:48 +0800
Subject: mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
To: <songmuchun@bytedance.com>, <cl@linux.com>, <penberg@kernel.org>, <rientjes@google.com>, <iamjoonsoo.kim@lge.com>, <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <stable@vger.kernel.org>
Message-ID: <20210720082048.2797315-1-sunnanyong@huawei.com>

From: Nanyong Sun <sunnanyong@huawei.com>

Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/slab_common.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -325,14 +325,6 @@ int slab_unmergeable(struct kmem_cache *
 	if (s->refcount < 0)
 		return 1;
 
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * Skip the dying kmem_cache.
-	 */
-	if (s->memcg_params.dying)
-		return 1;
-#endif
-
 	return 0;
 }
 
@@ -973,6 +965,16 @@ void kmem_cache_destroy(struct kmem_cach
 	get_online_mems();
 
 	mutex_lock(&slab_mutex);
+
+	/*
+	 * Another thread referenced it again
+	 */
+	if (READ_ONCE(s->refcount)) {
+		spin_lock_irq(&memcg_kmem_wq_lock);
+		s->memcg_params.dying = false;
+		spin_unlock_irq(&memcg_kmem_wq_lock);
+		goto out_unlock;
+	}
 #endif
 
 	err = shutdown_memcg_caches(s);


Patches currently in stable-queue which might be from sunnanyong@huawei.com are

queue-5.4/mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch

