linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches"
@ 2020-04-24 15:12 Waiman Long
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
  2020-04-24 16:41 ` [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Qian Cai
  0 siblings, 2 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-24 15:12 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: linux-mm, linux-kernel, Juri Lelli, Waiman Long

When the slub shrink sysfs file is written to, the function call
sequence is as follows:

  kernfs_fop_write
    => slab_attr_store
      => shrink_store
        => kmem_cache_shrink_all

It turns out that the memcg cache scan in kmem_cache_shrink_all() is
redundant, as the same scan is already done in slab_attr_store(). So
revert commit 04f768a39d55 ("mm, slab: extend slab/shrink to shrink
all memcg caches"), except for the documentation change, which is
still valid.
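
For reference, the propagation code already in slab_attr_store()
(condensed here; the max_attr_size bookkeeping is omitted) calls the
store method, and thus shrink_store(), on every child memcg cache:

	err = attribute->store(s, buf, len);
  #ifdef CONFIG_MEMCG
	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
		struct kmem_cache *c;

		mutex_lock(&slab_mutex);
		/* best-effort propagation to every child memcg cache */
		for_each_memcg_cache(c, s)
			attribute->store(c, buf, len);
		mutex_unlock(&slab_mutex);
	}
  #endif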

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/slab.h        |  1 -
 mm/slab_common.c | 37 -------------------------------------
 mm/slub.c        |  2 +-
 3 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 207c83ef6e06..0937cb2ae8aa 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -237,7 +237,6 @@ int __kmem_cache_shrink(struct kmem_cache *);
 void __kmemcg_cache_deactivate(struct kmem_cache *s);
 void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
 void slab_kmem_cache_release(struct kmem_cache *);
-void kmem_cache_shrink_all(struct kmem_cache *s);
 
 struct seq_file;
 struct file;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23c7500eea7d..2e367ab8c15c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -995,43 +995,6 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
 }
 EXPORT_SYMBOL(kmem_cache_shrink);
 
-/**
- * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache
- * @s: The cache pointer
- */
-void kmem_cache_shrink_all(struct kmem_cache *s)
-{
-	struct kmem_cache *c;
-
-	if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !is_root_cache(s)) {
-		kmem_cache_shrink(s);
-		return;
-	}
-
-	get_online_cpus();
-	get_online_mems();
-	kasan_cache_shrink(s);
-	__kmem_cache_shrink(s);
-
-	/*
-	 * We have to take the slab_mutex to protect from the memcg list
-	 * modification.
-	 */
-	mutex_lock(&slab_mutex);
-	for_each_memcg_cache(c, s) {
-		/*
-		 * Don't need to shrink deactivated memcg caches.
-		 */
-		if (s->flags & SLAB_DEACTIVATED)
-			continue;
-		kasan_cache_shrink(c);
-		__kmem_cache_shrink(c);
-	}
-	mutex_unlock(&slab_mutex);
-	put_online_mems();
-	put_online_cpus();
-}
-
 bool slab_is_available(void)
 {
 	return slab_state >= UP;
diff --git a/mm/slub.c b/mm/slub.c
index 9bf44955c4f1..183ccc364ccf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5343,7 +5343,7 @@ static ssize_t shrink_store(struct kmem_cache *s,
 			const char *buf, size_t length)
 {
 	if (buf[0] == '1')
-		kmem_cache_shrink_all(s);
+		kmem_cache_shrink(s);
 	else
 		return -EINVAL;
 	return length;
-- 
2.18.1



* [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 15:12 [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Waiman Long
@ 2020-04-24 15:12 ` Waiman Long
  2020-04-24 16:16   ` Qian Cai
  2020-04-24 19:44   ` Qian Cai
  2020-04-24 16:41 ` [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Qian Cai
  1 sibling, 2 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-24 15:12 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: linux-mm, linux-kernel, Juri Lelli, Waiman Long

The following lockdep splat was reported:

  [  176.241923] ======================================================
  [  176.241924] WARNING: possible circular locking dependency detected
  [  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
  [  176.241927] ------------------------------------------------------
  [  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
  [  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
  [  176.241941]
                 but task is already holding lock:
  [  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
  [  176.241947]
                 which lock already depends on the new lock.

  [  176.241949]
                 the existing dependency chain (in reverse order) is:
  [  176.241949]
                 -> #1 (kn->count#103){++++}:
  [  176.241955]        __kernfs_remove+0x616/0x800
  [  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
  [  176.241959]        sysfs_slab_add+0x1c6/0x330
  [  176.241961]        __kmem_cache_create+0x15f/0x1b0
  [  176.241964]        create_cache+0xe1/0x220
  [  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
  [  176.241967]        kmem_cache_create+0x12/0x20
  [  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
  [  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
  [  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
  [  176.242127]        local_pci_probe+0xd4/0x180
  [  176.242131]        work_for_cpu_fn+0x51/0xa0
  [  176.242133]        process_one_work+0x91a/0x1ac0
  [  176.242134]        worker_thread+0x536/0xb40
  [  176.242136]        kthread+0x30c/0x3d0
  [  176.242140]        ret_from_fork+0x27/0x50
  [  176.242140]
                 -> #0 (slab_mutex){+.+.}:
  [  176.242145]        __lock_acquire+0x22cb/0x48c0
  [  176.242146]        lock_acquire+0x134/0x4c0
  [  176.242148]        _mutex_lock+0x28/0x40
  [  176.242150]        slab_attr_store+0x6b/0xe0
  [  176.242151]        kernfs_fop_write+0x251/0x400
  [  176.242154]        vfs_write+0x157/0x460
  [  176.242155]        ksys_write+0xb8/0x170
  [  176.242158]        do_syscall_64+0x13c/0x710
  [  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
  [  176.242161]
                 other info that might help us debug this:

  [  176.242161]  Possible unsafe locking scenario:

  [  176.242162]        CPU0                    CPU1
  [  176.242163]        ----                    ----
  [  176.242163]   lock(kn->count#103);
  [  176.242165]                                lock(slab_mutex);
  [  176.242166]                                lock(kn->count#103);
  [  176.242167]   lock(slab_mutex);
  [  176.242169]
                  *** DEADLOCK ***

  [  176.242170] 3 locks held by slub_cpu_partia/5371:
  [  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
  [  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
  [  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
  [  176.242180]
                 stack backtrace:
  [  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
  [  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
  [  176.242185] Call Trace:
  [  176.242190]  dump_stack+0x9a/0xf0
  [  176.242193]  check_noncircular+0x317/0x3c0
  [  176.242195]  ? print_circular_bug+0x1e0/0x1e0
  [  176.242199]  ? native_sched_clock+0x32/0x1e0
  [  176.242202]  ? sched_clock+0x5/0x10
  [  176.242205]  ? sched_clock_cpu+0x238/0x340
  [  176.242208]  __lock_acquire+0x22cb/0x48c0
  [  176.242213]  ? trace_hardirqs_on+0x10/0x10
  [  176.242215]  ? trace_hardirqs_on+0x10/0x10
  [  176.242218]  lock_acquire+0x134/0x4c0
  [  176.242220]  ? slab_attr_store+0x6b/0xe0
  [  176.242223]  _mutex_lock+0x28/0x40
  [  176.242225]  ? slab_attr_store+0x6b/0xe0
  [  176.242227]  slab_attr_store+0x6b/0xe0
  [  176.242229]  ? sysfs_file_ops+0x160/0x160
  [  176.242230]  kernfs_fop_write+0x251/0x400
  [  176.242232]  ? __sb_start_write+0x26a/0x3f0
  [  176.242234]  vfs_write+0x157/0x460
  [  176.242237]  ksys_write+0xb8/0x170
  [  176.242239]  ? __ia32_sys_read+0xb0/0xb0
  [  176.242242]  ? do_syscall_64+0xb9/0x710
  [  176.242245]  do_syscall_64+0x13c/0x710
  [  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf

There was another lockdep splat generated by echoing "1" to
"/sys/kernel/slab/fs_cache/shrink":

[  445.231443] Chain exists of:
                 cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex

[  445.242025]  Possible unsafe locking scenario:

[  445.247977]        CPU0                    CPU1
[  445.252529]        ----                    ----
[  445.257082]   lock(slab_mutex);
[  445.260239]                                lock(mem_hotplug_lock);
[  445.266452]                                lock(slab_mutex);
[  445.272141]   lock(cpu_hotplug_lock);

So it is problematic to use slab_mutex to iterate the list of child
memcg caches with for_each_memcg_cache(). Fortunately, there is
another way to iterate the child caches: walking the array entries in
memcg_params.memcg_caches while holding a read lock on
memcg_cache_ids_sem.

To avoid other possible circular locking problems, we only take a
reference to each child memcg cache and store its address while holding
memcg_cache_ids_sem. The actual store method is then called for each of
the child caches after releasing the lock.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/slub.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 183ccc364ccf..255981180489 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		return -EIO;
 
 	err = attribute->store(s, buf, len);
-#ifdef CONFIG_MEMCG
-	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
-		struct kmem_cache *c;
+#ifdef CONFIG_MEMCG_KMEM
+	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
+	    !list_empty(&s->memcg_params.children)) {
+		struct kmem_cache *c, **pcaches;
+		int idx, max, cnt = 0;
+		size_t size = s->max_attr_size;
+		struct memcg_cache_array *arr;
+
+		/*
+		 * Make atomic update to s->max_attr_size.
+		 */
+		do {
+			if (len <= size)
+				break;
+		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
 
-		mutex_lock(&slab_mutex);
-		if (s->max_attr_size < len)
-			s->max_attr_size = len;
+		memcg_get_cache_ids();
+		max = memcg_nr_cache_ids;
+
+		pcaches = kmalloc_array(max, sizeof(void *), GFP_KERNEL);
+		if (!pcaches) {
+			memcg_put_cache_ids();
+			return -ENOMEM;
+		}
 
 		/*
 		 * This is a best effort propagation, so this function's return
@@ -5591,10 +5608,33 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		 * has well defined semantics. The cache being written to
 		 * directly either failed or succeeded, in which case we loop
 		 * through the descendants with best-effort propagation.
+		 *
+		 * To avoid potential circular lock dependency problems, we
+		 * just get a reference and store child cache pointers while
+		 * holding the memcg_cache_ids_sem read lock. The store
+		 * method is then called for each child cache after releasing
+		 * the lock. Code sequence partly borrowed from
+		 * memcg_kmem_get_cache().
 		 */
-		for_each_memcg_cache(c, s)
+		rcu_read_lock();
+		arr = rcu_dereference(s->memcg_params.memcg_caches);
+		for (idx = 0; idx < max; idx++) {
+			c = READ_ONCE(arr->entries[idx]);
+			if (!c)
+				continue;
+			if (!percpu_ref_tryget(&c->memcg_params.refcnt))
+				continue;
+			pcaches[cnt++] = c;
+		}
+		rcu_read_unlock();
+		memcg_put_cache_ids();
+
+		for (idx = 0; idx < cnt; idx++) {
+			c = pcaches[idx];
 			attribute->store(c, buf, len);
-		mutex_unlock(&slab_mutex);
+			percpu_ref_put(&c->memcg_params.refcnt);
+		}
+		kfree(pcaches);
 	}
 #endif
 	return err;
-- 
2.18.1



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
@ 2020-04-24 16:16   ` Qian Cai
  2020-04-27 18:16     ` Waiman Long
  2020-04-24 19:44   ` Qian Cai
  1 sibling, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-04-24 16:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli



> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 183ccc364ccf..255981180489 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
> 		return -EIO;
> 
> 	err = attribute->store(s, buf, len);
> -#ifdef CONFIG_MEMCG
> -	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
> -		struct kmem_cache *c;
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
> +	    !list_empty(&s->memcg_params.children)) {
> +		struct kmem_cache *c, **pcaches;
> +		int idx, max, cnt = 0;
> +		size_t size = s->max_attr_size;
> +		struct memcg_cache_array *arr;
> +
> +		/*
> +		 * Make atomic update to s->max_attr_size.
> +		 */
> +		do {
> +			if (len <= size)
> +				break;
> +		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
> 

try_cmpxchg() is only available on x86.

mm/slub.c: In function ‘slab_attr_store’:
mm/slub.c:5613:13: error: implicit declaration of function ‘try_cmpxchg’; did you mean ‘xa_cmpxchg’? [-Werror=implicit-function-declaration]
   } while (!try_cmpxchg(&s->max_attr_size, &size, len));
             ^~~~~~~~~~~
             xa_cmpxchg



* Re: [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches"
  2020-04-24 15:12 [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Waiman Long
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
@ 2020-04-24 16:41 ` Qian Cai
  1 sibling, 0 replies; 7+ messages in thread
From: Qian Cai @ 2020-04-24 16:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel, Juri Lelli



> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
> 
> When the slub shrink sysfs file is written into, the function call
> sequence is as follows:
> 
>  kernfs_fop_write
>    => slab_attr_store
>      => shrink_store
>        => kmem_cache_shrink_all
> 
> It turns out that doing a memcg cache scan in kmem_cache_shrink_all()
> is redundant as the same memcg cache scan is being done in
> slab_attr_store(). So revert the commit 04f768a39d55 ("mm, slab: extend
> slab/shrink to shrink all memcg caches") except the documentation change
> which is still valid.

BTW, currently, doing

# echo 1 > /sys/kernel/slab/fs_cache/shrink

corrupts the kernel stack and panics, probably due to the large number
of memcg caches.

I am still figuring out if the above commit 04f768a39d55 is the culprit.

[ 7938.979589][T106403] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x7f8/0x800
[ 7938.979640][T106403] CPU: 80 PID: 106403 Comm: kworker/80:2 Not tainted 5.7.0-rc2-next-20200424 #5
[ 7938.979670][T106403] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
[ 7938.979708][T106403] Call Trace:
[ 7938.979745][T106403] [c000200012e0f880] [c000000000716498] dump_stack+0xfc/0x174 (unreliable)
[ 7938.979789][T106403] [c000200012e0f8d0] [c00000000010d7d0] panic+0x224/0x4d4
[ 7938.979816][T106403] [c000200012e0f970] [c00000000010d05c] __stack_chk_fail+0x2c/0x30
[ 7938.979865][T106403] [c000200012e0f9d0] [c0000000004b1fb8] __kmem_cache_create+0x7f8/0x800
[ 7938.979914][T106403] [c000200012e0faf0] [4320383d35334320] 0x4320383d35334320

> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> mm/slab.h        |  1 -
> mm/slab_common.c | 37 -------------------------------------
> mm/slub.c        |  2 +-
> 3 files changed, 1 insertion(+), 39 deletions(-)
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 207c83ef6e06..0937cb2ae8aa 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -237,7 +237,6 @@ int __kmem_cache_shrink(struct kmem_cache *);
> void __kmemcg_cache_deactivate(struct kmem_cache *s);
> void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
> void slab_kmem_cache_release(struct kmem_cache *);
> -void kmem_cache_shrink_all(struct kmem_cache *s);
> 
> struct seq_file;
> struct file;
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 23c7500eea7d..2e367ab8c15c 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -995,43 +995,6 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
> }
> EXPORT_SYMBOL(kmem_cache_shrink);
> 
> -/**
> - * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache
> - * @s: The cache pointer
> - */
> -void kmem_cache_shrink_all(struct kmem_cache *s)
> -{
> -	struct kmem_cache *c;
> -
> -	if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !is_root_cache(s)) {
> -		kmem_cache_shrink(s);
> -		return;
> -	}
> -
> -	get_online_cpus();
> -	get_online_mems();
> -	kasan_cache_shrink(s);
> -	__kmem_cache_shrink(s);
> -
> -	/*
> -	 * We have to take the slab_mutex to protect from the memcg list
> -	 * modification.
> -	 */
> -	mutex_lock(&slab_mutex);
> -	for_each_memcg_cache(c, s) {
> -		/*
> -		 * Don't need to shrink deactivated memcg caches.
> -		 */
> -		if (s->flags & SLAB_DEACTIVATED)
> -			continue;
> -		kasan_cache_shrink(c);
> -		__kmem_cache_shrink(c);
> -	}
> -	mutex_unlock(&slab_mutex);
> -	put_online_mems();
> -	put_online_cpus();
> -}
> -
> bool slab_is_available(void)
> {
> 	return slab_state >= UP;
> diff --git a/mm/slub.c b/mm/slub.c
> index 9bf44955c4f1..183ccc364ccf 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5343,7 +5343,7 @@ static ssize_t shrink_store(struct kmem_cache *s,
> 			const char *buf, size_t length)
> {
> 	if (buf[0] == '1')
> -		kmem_cache_shrink_all(s);
> +		kmem_cache_shrink(s);
> 	else
> 		return -EINVAL;
> 	return length;
> -- 
> 2.18.1
> 
> 



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
  2020-04-24 16:16   ` Qian Cai
@ 2020-04-24 19:44   ` Qian Cai
  2020-04-27 19:00     ` Waiman Long
  1 sibling, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-04-24 19:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli



> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
> 
> The following lockdep splat was reported:
> 
>  [  176.241923] ======================================================
>  [  176.241924] WARNING: possible circular locking dependency detected
>  [  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
>  [  176.241927] ------------------------------------------------------
>  [  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
>  [  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
>  [  176.241941]
>                 but task is already holding lock:
>  [  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>  [  176.241947]
>                 which lock already depends on the new lock.
> 
>  [  176.241949]
>                 the existing dependency chain (in reverse order) is:
>  [  176.241949]
>                 -> #1 (kn->count#103){++++}:
>  [  176.241955]        __kernfs_remove+0x616/0x800
>  [  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
>  [  176.241959]        sysfs_slab_add+0x1c6/0x330
>  [  176.241961]        __kmem_cache_create+0x15f/0x1b0
>  [  176.241964]        create_cache+0xe1/0x220
>  [  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
>  [  176.241967]        kmem_cache_create+0x12/0x20
>  [  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
>  [  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
>  [  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
>  [  176.242127]        local_pci_probe+0xd4/0x180
>  [  176.242131]        work_for_cpu_fn+0x51/0xa0
>  [  176.242133]        process_one_work+0x91a/0x1ac0
>  [  176.242134]        worker_thread+0x536/0xb40
>  [  176.242136]        kthread+0x30c/0x3d0
>  [  176.242140]        ret_from_fork+0x27/0x50
>  [  176.242140]
>                 -> #0 (slab_mutex){+.+.}:
>  [  176.242145]        __lock_acquire+0x22cb/0x48c0
>  [  176.242146]        lock_acquire+0x134/0x4c0
>  [  176.242148]        _mutex_lock+0x28/0x40
>  [  176.242150]        slab_attr_store+0x6b/0xe0
>  [  176.242151]        kernfs_fop_write+0x251/0x400
>  [  176.242154]        vfs_write+0x157/0x460
>  [  176.242155]        ksys_write+0xb8/0x170
>  [  176.242158]        do_syscall_64+0x13c/0x710
>  [  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>  [  176.242161]
>                 other info that might help us debug this:
> 
>  [  176.242161]  Possible unsafe locking scenario:
> 
>  [  176.242162]        CPU0                    CPU1
>  [  176.242163]        ----                    ----
>  [  176.242163]   lock(kn->count#103);
>  [  176.242165]                                lock(slab_mutex);
>  [  176.242166]                                lock(kn->count#103);
>  [  176.242167]   lock(slab_mutex);
>  [  176.242169]
>                  *** DEADLOCK ***
> 
>  [  176.242170] 3 locks held by slub_cpu_partia/5371:
>  [  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
>  [  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
>  [  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>  [  176.242180]
>                 stack backtrace:
>  [  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
>  [  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
>  [  176.242185] Call Trace:
>  [  176.242190]  dump_stack+0x9a/0xf0
>  [  176.242193]  check_noncircular+0x317/0x3c0
>  [  176.242195]  ? print_circular_bug+0x1e0/0x1e0
>  [  176.242199]  ? native_sched_clock+0x32/0x1e0
>  [  176.242202]  ? sched_clock+0x5/0x10
>  [  176.242205]  ? sched_clock_cpu+0x238/0x340
>  [  176.242208]  __lock_acquire+0x22cb/0x48c0
>  [  176.242213]  ? trace_hardirqs_on+0x10/0x10
>  [  176.242215]  ? trace_hardirqs_on+0x10/0x10
>  [  176.242218]  lock_acquire+0x134/0x4c0
>  [  176.242220]  ? slab_attr_store+0x6b/0xe0
>  [  176.242223]  _mutex_lock+0x28/0x40
>  [  176.242225]  ? slab_attr_store+0x6b/0xe0
>  [  176.242227]  slab_attr_store+0x6b/0xe0
>  [  176.242229]  ? sysfs_file_ops+0x160/0x160
>  [  176.242230]  kernfs_fop_write+0x251/0x400
>  [  176.242232]  ? __sb_start_write+0x26a/0x3f0
>  [  176.242234]  vfs_write+0x157/0x460
>  [  176.242237]  ksys_write+0xb8/0x170
>  [  176.242239]  ? __ia32_sys_read+0xb0/0xb0
>  [  176.242242]  ? do_syscall_64+0xb9/0x710
>  [  176.242245]  do_syscall_64+0x13c/0x710
>  [  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> 
> There was another lockdep splat generated by echoing "1" to
> "/sys/kernel/slab/fs_cache/shrink":
> 
> [  445.231443] Chain exists of:
>                 cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex
> 
> [  445.242025]  Possible unsafe locking scenario:
> 
> [  445.247977]        CPU0                    CPU1
> [  445.252529]        ----                    ----
> [  445.257082]   lock(slab_mutex);
> [  445.260239]                                lock(mem_hotplug_lock);
> [  445.266452]                                lock(slab_mutex);
> [  445.272141]   lock(cpu_hotplug_lock);
> 
> So it is problematic to use slab_mutex to iterate the list of
> child memcgs with for_each_memcg_cache(). Fortunately, there is
> another way to do child memcg iteration by going through the array
> entries in memcg_params.memcg_caches while holding a read lock on
> memcg_cache_ids_sem.
> 
> To avoid other possible circular locking problems, we only take a
> reference to the child memcgs and store their addresses while holding
> memcg_cache_ids_sem. The actual store method is called for each of the
> child memcgs after releasing the lock.

Even on x86, where it compiles, this patch is insufficient to prevent a lockdep splat,
because cpu_hotplug_lock and mem_hotplug_lock are still in the dependency chain.

[  290.738690] ======================================================
[  290.744908] WARNING: possible circular locking dependency detected
[  290.751134] 5.7.0-rc2-next-20200424+ #12 Not tainted
[  290.756132] ------------------------------------------------------
[  290.762611] bash/1798 is trying to acquire lock:
[  290.767262] ffffffff89917890 (cpu_hotplug_lock){++++}-{0:0}, at: kmem_cache_shrink+0x12/0x40
[  290.775770] 
[  290.775770] but task is already holding lock:
[  290.781641] ffff8887f2fbc830 (kn->count#86){++++}-{0:0}, at: kernfs_fop_write+0x105/0x250
[  290.790061] 
[  290.790061] which lock already depends on the new lock.
[  290.790061] 
[  290.798289] 
[  290.798289] the existing dependency chain (in reverse order) is:
[  290.805818] 
[  290.805818] -> #3 (kn->count#86){++++}-{0:0}:
[  290.811954]        __kernfs_remove+0x455/0x4c0
[  290.816428]        kernfs_remove+0x23/0x40
[  290.820554]        sysfs_remove_dir+0x74/0x80
[  290.824947]        kobject_del+0x57/0xa0
[  290.828905]        sysfs_slab_unlink+0x1c/0x20
[  290.833377]        shutdown_cache+0x15d/0x1c0
[  290.837964]        kmemcg_cache_shutdown_fn+0xe/0x20
[  290.842963]        kmemcg_workfn+0x35/0x50
[  290.847095]        process_one_work+0x57e/0xb90
[  290.851658]        worker_thread+0x63/0x5b0
[  290.855872]        kthread+0x1f7/0x220
[  290.859653]        ret_from_fork+0x27/0x50
[  290.864042] 
[  290.864042] -> #2 (slab_mutex){+.+.}-{3:3}:
[  290.869747]        __mutex_lock+0x12f/0xca0
[  290.873960]        mutex_lock_nested+0x1b/0x20
[  290.878434]        memcg_create_kmem_cache+0x2e/0x190
[  290.883523]        memcg_kmem_cache_create_func+0x38/0x190
[  290.889274]        process_one_work+0x57e/0xb90
[  290.893838]        worker_thread+0x63/0x5b0
[  290.898050]        kthread+0x1f7/0x220
[  290.901828]        ret_from_fork+0x27/0x50
[  290.905953] 
[  290.905953] -> #1 (mem_hotplug_lock){++++}-{0:0}:
[  290.912384]        get_online_mems+0x3e/0xc0
[  290.916682]        kmem_cache_create_usercopy+0x32/0x270
[  290.922030]        kmem_cache_create+0x16/0x20
[  290.926507]        kmemleak_init+0x4a/0xf4
[  290.930640]        start_kernel+0x52a/0x8bb
[  290.935109]        x86_64_start_reservations+0x24/0x26
[  290.940291]        x86_64_start_kernel+0xf4/0xfb
[  290.944942]        secondary_startup_64+0xb6/0xc0
[  290.949677] 
[  290.949677] -> #0 (cpu_hotplug_lock){++++}-{0:0}:
[  290.955906]        __lock_acquire+0x21f8/0x3260
[  290.960751]        lock_acquire+0x1a2/0x680
[  290.964966]        cpus_read_lock+0x3e/0xc0
[  290.969176]        kmem_cache_shrink+0x12/0x40
[  290.973650]        shrink_store+0x17/0x30
[  290.977687]        slab_attr_store+0x4d/0x3c0
[  290.982076]        sysfs_kf_write+0x89/0xb0
[  290.986473]        kernfs_fop_write+0x155/0x250
[  290.991041]        __vfs_write+0x50/0xa0
[  290.994990]        vfs_write+0x10b/0x290
[  290.998942]        ksys_write+0xcc/0x170
[  291.002893]        __x64_sys_write+0x43/0x50
[  291.007193]        do_syscall_64+0xcc/0xaf0
[  291.011600]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  291.017208] 
[  291.017208] other info that might help us debug this:
[  291.017208] 
[  291.025262] Chain exists of:
[  291.025262]   cpu_hotplug_lock --> slab_mutex --> kn->count#86
[  291.025262] 
[  291.035683]  Possible unsafe locking scenario:
[  291.035683] 
[  291.041639]        CPU0                    CPU1
[  291.046200]        ----                    ----
[  291.050762]   lock(kn->count#86);
[  291.054101]                                lock(slab_mutex);
[  291.060132]                                lock(kn->count#86);
[  291.066018]   lock(cpu_hotplug_lock);
[  291.069707] 
[  291.069707]  *** DEADLOCK ***
[  291.069707] 
[  291.075666] 3 locks held by bash/1798:
[  291.079443]  #0: ffff88832e091430 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x262/0x290
[  291.087711]  #1: ffff888784670288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xf5/0x250
[  291.096119]  #2: ffff8887f2fbc830 (kn->count#86){++++}-{0:0}, at: kernfs_fop_write+0x105/0x250
[  291.104788] 
[  291.104788] stack backtrace:
[  291.109181] CPU: 14 PID: 1798 Comm: bash Not tainted 5.7.0-rc2-next-20200424+ #12
[  291.116940] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[  291.125517] Call Trace:
[  291.127990]  dump_stack+0xa7/0xea
[  291.131328]  print_circular_bug.cold.54+0x147/0x14c
[  291.136463]  check_noncircular+0x295/0x2d0
[  291.140586]  ? print_circular_bug+0x1d0/0x1d0
[  291.144974]  ? __kasan_check_read+0x11/0x20
[  291.149188]  ? mark_lock+0x160/0xfe0
[  291.152792]  __lock_acquire+0x21f8/0x3260
[  291.156830]  ? register_lock_class+0xb90/0xb90
[  291.161534]  ? register_lock_class+0xb90/0xb90
[  291.166012]  ? check_flags.part.28+0x220/0x220
[  291.170489]  lock_acquire+0x1a2/0x680
[  291.174178]  ? kmem_cache_shrink+0x12/0x40
[  291.178306]  ? check_flags.part.28+0x220/0x220
[  291.182784]  ? ___might_sleep+0x178/0x210
[  291.187021]  cpus_read_lock+0x3e/0xc0
[  291.190709]  ? kmem_cache_shrink+0x12/0x40
[  291.194832]  kmem_cache_shrink+0x12/0x40
[  291.198781]  shrink_store+0x17/0x30
[  291.202294]  slab_attr_store+0x4d/0x3c0
[  291.206159]  ? kernfs_get_active+0x136/0x150
[  291.210663]  ? objects_show+0x20/0x20
[  291.214349]  sysfs_kf_write+0x89/0xb0
[  291.218038]  ? sysfs_file_ops+0xa0/0xa0
[  291.221898]  kernfs_fop_write+0x155/0x250
[  291.225936]  __vfs_write+0x50/0xa0
[  291.229364]  vfs_write+0x10b/0x290
[  291.232790]  ksys_write+0xcc/0x170
[  291.236405]  ? __x64_sys_read+0x50/0x50
[  291.240270]  ? do_syscall_64+0x79/0xaf0
[  291.244132]  ? do_syscall_64+0x79/0xaf0
[  291.247998]  __x64_sys_write+0x43/0x50
[  291.251776]  do_syscall_64+0xcc/0xaf0
[  291.255464]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  291.260396]  ? syscall_return_slowpath+0x580/0x580
[  291.265223]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
[  291.270491]  ? trace_hardirqs_off_caller+0x3a/0x150
[  291.275400]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  291.280137]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  291.285443] RIP: 0033:0x7f2c0d9adb28
[  291.289046] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 35 4b 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  291.307931] RSP: 002b:00007ffccc6e3898 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  291.315804] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2c0d9adb28
[  291.322985] RDX: 0000000000000002 RSI: 000055d97c790de0 RDI: 0000000000000001
[  291.330164] RBP: 000055d97c790de0 R08: 000000000000000a R09: 00007f2c0da3ec80
[  291.337537] R10: 000000000000000a R11: 0000000000000246 R12: 00007f2c0dc7e6c0
[  291.344718] R13: 0000000000000002 R14: 00007f2c0dc79880 R15: 0000000000000002


> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> mm/slub.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 183ccc364ccf..255981180489 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
> 		return -EIO;
> 
> 	err = attribute->store(s, buf, len);
> -#ifdef CONFIG_MEMCG
> -	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
> -		struct kmem_cache *c;
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
> +	    !list_empty(&s->memcg_params.children)) {
> +		struct kmem_cache *c, **pcaches;
> +		int idx, max, cnt = 0;
> +		size_t size = s->max_attr_size;
> +		struct memcg_cache_array *arr;
> +
> +		/*
> +		 * Make atomic update to s->max_attr_size.
> +		 */
> +		do {
> +			if (len <= size)
> +				break;
> +		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
> 
> -		mutex_lock(&slab_mutex);
> -		if (s->max_attr_size < len)
> -			s->max_attr_size = len;
> +		memcg_get_cache_ids();
> +		max = memcg_nr_cache_ids;
> +
> +		pcaches = kmalloc_array(max, sizeof(void *), GFP_KERNEL);
> +		if (!pcaches) {
> +			memcg_put_cache_ids();
> +			return -ENOMEM;
> +		}
> 
> 		/*
> 		 * This is a best effort propagation, so this function's return
> @@ -5591,10 +5608,33 @@ static ssize_t slab_attr_store(struct kobject *kobj,
> 		 * has well defined semantics. The cache being written to
> 		 * directly either failed or succeeded, in which case we loop
> 		 * through the descendants with best-effort propagation.
> +		 *
> +		 * To avoid potential circular lock dependency problems, we
> +		 * just get a reference and store child cache pointers while
> +		 * holding the memcg_cache_ids_sem read lock. The store
> +		 * method is then called for each child cache after releasing
> +		 * the lock. Code sequence partly borrowed from
> +		 * memcg_kmem_get_cache().
> 		 */
> -		for_each_memcg_cache(c, s)
> +		rcu_read_lock();
> +		arr = rcu_dereference(s->memcg_params.memcg_caches);
> +		for (idx = 0; idx < max; idx++) {
> +			c = READ_ONCE(arr->entries[idx]);
> +			if (!c)
> +				continue;
> +			if (!percpu_ref_tryget(&c->memcg_params.refcnt))
> +				continue;
> +			pcaches[cnt++] = c;
> +		}
> +		rcu_read_unlock();
> +		memcg_put_cache_ids();
> +
> +		for (idx = 0; idx < cnt; idx++) {
> +			c = pcaches[idx];
> 			attribute->store(c, buf, len);
> -		mutex_unlock(&slab_mutex);
> +			percpu_ref_put(&c->memcg_params.refcnt);
> +		}
> +		kfree(pcaches);
> 	}
> #endif
> 	return err;
> -- 
> 2.18.1
> 
> 



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 16:16   ` Qian Cai
@ 2020-04-27 18:16     ` Waiman Long
  0 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-27 18:16 UTC (permalink / raw)
  To: Qian Cai
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli

On 4/24/20 12:16 PM, Qian Cai wrote:
>
>> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 183ccc364ccf..255981180489 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
>> 		return -EIO;
>>
>> 	err = attribute->store(s, buf, len);
>> -#ifdef CONFIG_MEMCG
>> -	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
>> -		struct kmem_cache *c;
>> +#ifdef CONFIG_MEMCG_KMEM
>> +	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
>> +	    !list_empty(&s->memcg_params.children)) {
>> +		struct kmem_cache *c, **pcaches;
>> +		int idx, max, cnt = 0;
>> +		size_t size = s->max_attr_size;
>> +		struct memcg_cache_array *arr;
>> +
>> +		/*
>> +		 * Make atomic update to s->max_attr_size.
>> +		 */
>> +		do {
>> +			if (len <= size)
>> +				break;
>> +		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
>>
> try_cmpxchg() is only available on x86.
>
> mm/slub.c: In function ‘slab_attr_store’:
> mm/slub.c:5613:13: error: implicit declaration of function ‘try_cmpxchg’; did you mean ‘xa_cmpxchg’? [-Werror=implicit-function-declaration]
>     } while (!try_cmpxchg(&s->max_attr_size, &size, len));
>               ^~~~~~~~~~~
>               xa_cmpxchg
>
Oh, sorry about that. Will fix it and send out v2.
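
For v2, I will probably just switch to the generic cmpxchg(), which is
available on all architectures. Something along these lines (untested
sketch, the exact form may differ):

	/*
	 * Atomically raise s->max_attr_size to len without relying on
	 * the x86-only try_cmpxchg().
	 */
	size_t old = READ_ONCE(s->max_attr_size);

	while (len > old) {
		size_t prev = cmpxchg(&s->max_attr_size, old, len);

		if (prev == old)
			break;		/* update succeeded */
		old = prev;		/* lost a race, retry against the new value */
	}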

Cheers,
Longman



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 19:44   ` Qian Cai
@ 2020-04-27 19:00     ` Waiman Long
  0 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-27 19:00 UTC (permalink / raw)
  To: Qian Cai
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli

On 4/24/20 3:44 PM, Qian Cai wrote:
>
>> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
>>
>> The following lockdep splat was reported:
>>
>>   [  176.241923] ======================================================
>>   [  176.241924] WARNING: possible circular locking dependency detected
>>   [  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
>>   [  176.241927] ------------------------------------------------------
>>   [  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
>>   [  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
>>   [  176.241941]
>>                  but task is already holding lock:
>>   [  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>>   [  176.241947]
>>                  which lock already depends on the new lock.
>>
>>   [  176.241949]
>>                  the existing dependency chain (in reverse order) is:
>>   [  176.241949]
>>                  -> #1 (kn->count#103){++++}:
>>   [  176.241955]        __kernfs_remove+0x616/0x800
>>   [  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
>>   [  176.241959]        sysfs_slab_add+0x1c6/0x330
>>   [  176.241961]        __kmem_cache_create+0x15f/0x1b0
>>   [  176.241964]        create_cache+0xe1/0x220
>>   [  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
>>   [  176.241967]        kmem_cache_create+0x12/0x20
>>   [  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
>>   [  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
>>   [  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
>>   [  176.242127]        local_pci_probe+0xd4/0x180
>>   [  176.242131]        work_for_cpu_fn+0x51/0xa0
>>   [  176.242133]        process_one_work+0x91a/0x1ac0
>>   [  176.242134]        worker_thread+0x536/0xb40
>>   [  176.242136]        kthread+0x30c/0x3d0
>>   [  176.242140]        ret_from_fork+0x27/0x50
>>   [  176.242140]
>>                  -> #0 (slab_mutex){+.+.}:
>>   [  176.242145]        __lock_acquire+0x22cb/0x48c0
>>   [  176.242146]        lock_acquire+0x134/0x4c0
>>   [  176.242148]        _mutex_lock+0x28/0x40
>>   [  176.242150]        slab_attr_store+0x6b/0xe0
>>   [  176.242151]        kernfs_fop_write+0x251/0x400
>>   [  176.242154]        vfs_write+0x157/0x460
>>   [  176.242155]        ksys_write+0xb8/0x170
>>   [  176.242158]        do_syscall_64+0x13c/0x710
>>   [  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>   [  176.242161]
>>                  other info that might help us debug this:
>>
>>   [  176.242161]  Possible unsafe locking scenario:
>>
>>   [  176.242162]        CPU0                    CPU1
>>   [  176.242163]        ----                    ----
>>   [  176.242163]   lock(kn->count#103);
>>   [  176.242165]                                lock(slab_mutex);
>>   [  176.242166]                                lock(kn->count#103);
>>   [  176.242167]   lock(slab_mutex);
>>   [  176.242169]
>>                   *** DEADLOCK ***
>>
>>   [  176.242170] 3 locks held by slub_cpu_partia/5371:
>>   [  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
>>   [  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
>>   [  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>>   [  176.242180]
>>                  stack backtrace:
>>   [  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
>>   [  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
>>   [  176.242185] Call Trace:
>>   [  176.242190]  dump_stack+0x9a/0xf0
>>   [  176.242193]  check_noncircular+0x317/0x3c0
>>   [  176.242195]  ? print_circular_bug+0x1e0/0x1e0
>>   [  176.242199]  ? native_sched_clock+0x32/0x1e0
>>   [  176.242202]  ? sched_clock+0x5/0x10
>>   [  176.242205]  ? sched_clock_cpu+0x238/0x340
>>   [  176.242208]  __lock_acquire+0x22cb/0x48c0
>>   [  176.242213]  ? trace_hardirqs_on+0x10/0x10
>>   [  176.242215]  ? trace_hardirqs_on+0x10/0x10
>>   [  176.242218]  lock_acquire+0x134/0x4c0
>>   [  176.242220]  ? slab_attr_store+0x6b/0xe0
>>   [  176.242223]  _mutex_lock+0x28/0x40
>>   [  176.242225]  ? slab_attr_store+0x6b/0xe0
>>   [  176.242227]  slab_attr_store+0x6b/0xe0
>>   [  176.242229]  ? sysfs_file_ops+0x160/0x160
>>   [  176.242230]  kernfs_fop_write+0x251/0x400
>>   [  176.242232]  ? __sb_start_write+0x26a/0x3f0
>>   [  176.242234]  vfs_write+0x157/0x460
>>   [  176.242237]  ksys_write+0xb8/0x170
>>   [  176.242239]  ? __ia32_sys_read+0xb0/0xb0
>>   [  176.242242]  ? do_syscall_64+0xb9/0x710
>>   [  176.242245]  do_syscall_64+0x13c/0x710
>>   [  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>
>> There was another lockdep splat generated by echoing "1" to
>> "/sys/kernel/slab/fs_cache/shrink":
>>
>> [  445.231443] Chain exists of:
>>                  cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex
>>
>> [  445.242025]  Possible unsafe locking scenario:
>>
>> [  445.247977]        CPU0                    CPU1
>> [  445.252529]        ----                    ----
>> [  445.257082]   lock(slab_mutex);
>> [  445.260239]                                lock(mem_hotplug_lock);
>> [  445.266452]                                lock(slab_mutex);
>> [  445.272141]   lock(cpu_hotplug_lock);
>>
>> So it is problematic to use slab_mutex to iterate the list of
>> child memcgs with for_each_memcg_cache(). Fortunately, there is
>> another way to do child memcg iteration by going through the array
>> entries in memcg_params.memcg_caches while holding a read lock on
>> memcg_cache_ids_sem.
>>
>> To avoid other possible circular locking problems, we only take a
>> reference to the child memcgs and store their addresses while holding
>> memcg_cache_ids_sem. The actual store method is called for each of the
>> child memcgs after releasing the lock.
> Even on x86 where it compiles, this patch is insufficient to prevent a lockdep splat,
> because there are still cpu_hotplug_lock and mem_hotplug_lock in the way.
>
Yes, I was aware of that. Will include an additional patch in v2 to address that.

Thanks,
Longman



