* [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches"
@ 2020-04-24 15:12 Waiman Long
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
  2020-04-24 16:41 ` [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Qian Cai
  0 siblings, 2 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-24 15:12 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: linux-mm, linux-kernel, Juri Lelli, Waiman Long

When the slub shrink sysfs file is written to, the function call
sequence is as follows:

  kernfs_fop_write
    => slab_attr_store
      => shrink_store
        => kmem_cache_shrink_all

It turns out that the memcg cache scan in kmem_cache_shrink_all()
is redundant, as the same scan is already done in slab_attr_store().
So revert commit 04f768a39d55 ("mm, slab: extend slab/shrink to shrink
all memcg caches"), except for the documentation change, which is
still valid.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/slab.h        |  1 -
 mm/slab_common.c | 37 -------------------------------------
 mm/slub.c        |  2 +-
 3 files changed, 1 insertion(+), 39 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 207c83ef6e06..0937cb2ae8aa 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -237,7 +237,6 @@ int __kmem_cache_shrink(struct kmem_cache *);
 void __kmemcg_cache_deactivate(struct kmem_cache *s);
 void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
 void slab_kmem_cache_release(struct kmem_cache *);
-void kmem_cache_shrink_all(struct kmem_cache *s);
 
 struct seq_file;
 struct file;
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 23c7500eea7d..2e367ab8c15c 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -995,43 +995,6 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
 }
 EXPORT_SYMBOL(kmem_cache_shrink);
 
-/**
- * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache
- * @s: The cache pointer
- */
-void kmem_cache_shrink_all(struct kmem_cache *s)
-{
-	struct kmem_cache *c;
-
-	if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !is_root_cache(s)) {
-		kmem_cache_shrink(s);
-		return;
-	}
-
-	get_online_cpus();
-	get_online_mems();
-	kasan_cache_shrink(s);
-	__kmem_cache_shrink(s);
-
-	/*
-	 * We have to take the slab_mutex to protect from the memcg list
-	 * modification.
-	 */
-	mutex_lock(&slab_mutex);
-	for_each_memcg_cache(c, s) {
-		/*
-		 * Don't need to shrink deactivated memcg caches.
-		 */
-		if (s->flags & SLAB_DEACTIVATED)
-			continue;
-		kasan_cache_shrink(c);
-		__kmem_cache_shrink(c);
-	}
-	mutex_unlock(&slab_mutex);
-	put_online_mems();
-	put_online_cpus();
-}
-
 bool slab_is_available(void)
 {
 	return slab_state >= UP;
diff --git a/mm/slub.c b/mm/slub.c
index 9bf44955c4f1..183ccc364ccf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5343,7 +5343,7 @@ static ssize_t shrink_store(struct kmem_cache *s,
 			const char *buf, size_t length)
 {
 	if (buf[0] == '1')
-		kmem_cache_shrink_all(s);
+		kmem_cache_shrink(s);
 	else
 		return -EINVAL;
 	return length;
-- 
2.18.1



* [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 15:12 [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Waiman Long
@ 2020-04-24 15:12 ` Waiman Long
  2020-04-24 16:16   ` Qian Cai
  2020-04-24 19:44   ` Qian Cai
  2020-04-24 16:41 ` [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Qian Cai
  1 sibling, 2 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-24 15:12 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: linux-mm, linux-kernel, Juri Lelli, Waiman Long

The following lockdep splat was reported:

  [  176.241923] ======================================================
  [  176.241924] WARNING: possible circular locking dependency detected
  [  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
  [  176.241927] ------------------------------------------------------
  [  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
  [  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
  [  176.241941]
                 but task is already holding lock:
  [  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
  [  176.241947]
                 which lock already depends on the new lock.

  [  176.241949]
                 the existing dependency chain (in reverse order) is:
  [  176.241949]
                 -> #1 (kn->count#103){++++}:
  [  176.241955]        __kernfs_remove+0x616/0x800
  [  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
  [  176.241959]        sysfs_slab_add+0x1c6/0x330
  [  176.241961]        __kmem_cache_create+0x15f/0x1b0
  [  176.241964]        create_cache+0xe1/0x220
  [  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
  [  176.241967]        kmem_cache_create+0x12/0x20
  [  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
  [  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
  [  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
  [  176.242127]        local_pci_probe+0xd4/0x180
  [  176.242131]        work_for_cpu_fn+0x51/0xa0
  [  176.242133]        process_one_work+0x91a/0x1ac0
  [  176.242134]        worker_thread+0x536/0xb40
  [  176.242136]        kthread+0x30c/0x3d0
  [  176.242140]        ret_from_fork+0x27/0x50
  [  176.242140]
                 -> #0 (slab_mutex){+.+.}:
  [  176.242145]        __lock_acquire+0x22cb/0x48c0
  [  176.242146]        lock_acquire+0x134/0x4c0
  [  176.242148]        _mutex_lock+0x28/0x40
  [  176.242150]        slab_attr_store+0x6b/0xe0
  [  176.242151]        kernfs_fop_write+0x251/0x400
  [  176.242154]        vfs_write+0x157/0x460
  [  176.242155]        ksys_write+0xb8/0x170
  [  176.242158]        do_syscall_64+0x13c/0x710
  [  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
  [  176.242161]
                 other info that might help us debug this:

  [  176.242161]  Possible unsafe locking scenario:

  [  176.242162]        CPU0                    CPU1
  [  176.242163]        ----                    ----
  [  176.242163]   lock(kn->count#103);
  [  176.242165]                                lock(slab_mutex);
  [  176.242166]                                lock(kn->count#103);
  [  176.242167]   lock(slab_mutex);
  [  176.242169]
                  *** DEADLOCK ***

  [  176.242170] 3 locks held by slub_cpu_partia/5371:
  [  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
  [  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
  [  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
  [  176.242180]
                 stack backtrace:
  [  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
  [  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
  [  176.242185] Call Trace:
  [  176.242190]  dump_stack+0x9a/0xf0
  [  176.242193]  check_noncircular+0x317/0x3c0
  [  176.242195]  ? print_circular_bug+0x1e0/0x1e0
  [  176.242199]  ? native_sched_clock+0x32/0x1e0
  [  176.242202]  ? sched_clock+0x5/0x10
  [  176.242205]  ? sched_clock_cpu+0x238/0x340
  [  176.242208]  __lock_acquire+0x22cb/0x48c0
  [  176.242213]  ? trace_hardirqs_on+0x10/0x10
  [  176.242215]  ? trace_hardirqs_on+0x10/0x10
  [  176.242218]  lock_acquire+0x134/0x4c0
  [  176.242220]  ? slab_attr_store+0x6b/0xe0
  [  176.242223]  _mutex_lock+0x28/0x40
  [  176.242225]  ? slab_attr_store+0x6b/0xe0
  [  176.242227]  slab_attr_store+0x6b/0xe0
  [  176.242229]  ? sysfs_file_ops+0x160/0x160
  [  176.242230]  kernfs_fop_write+0x251/0x400
  [  176.242232]  ? __sb_start_write+0x26a/0x3f0
  [  176.242234]  vfs_write+0x157/0x460
  [  176.242237]  ksys_write+0xb8/0x170
  [  176.242239]  ? __ia32_sys_read+0xb0/0xb0
  [  176.242242]  ? do_syscall_64+0xb9/0x710
  [  176.242245]  do_syscall_64+0x13c/0x710
  [  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf

There was another lockdep splat generated by echoing "1" to
"/sys/kernel/slab/fs_cache/shrink":

[  445.231443] Chain exists of:
                 cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex

[  445.242025]  Possible unsafe locking scenario:

[  445.247977]        CPU0                    CPU1
[  445.252529]        ----                    ----
[  445.257082]   lock(slab_mutex);
[  445.260239]                                lock(mem_hotplug_lock);
[  445.266452]                                lock(slab_mutex);
[  445.272141]   lock(cpu_hotplug_lock);

So it is problematic to take slab_mutex in order to iterate the list
of child memcg caches with for_each_memcg_cache(). Fortunately, the
child caches can also be iterated by walking the array entries in
memcg_params.memcg_caches while holding a read lock on
memcg_cache_ids_sem.

To avoid other possible circular locking problems, we only take a
reference to each child cache and store its address while holding
memcg_cache_ids_sem. The actual store method is called for each of
the child caches after releasing the lock.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/slub.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 8 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 183ccc364ccf..255981180489 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		return -EIO;
 
 	err = attribute->store(s, buf, len);
-#ifdef CONFIG_MEMCG
-	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
-		struct kmem_cache *c;
+#ifdef CONFIG_MEMCG_KMEM
+	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
+	    !list_empty(&s->memcg_params.children)) {
+		struct kmem_cache *c, **pcaches;
+		int idx, max, cnt = 0;
+		size_t size = s->max_attr_size;
+		struct memcg_cache_array *arr;
+
+		/*
+		 * Make atomic update to s->max_attr_size.
+		 */
+		do {
+			if (len <= size)
+				break;
+		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
 
-		mutex_lock(&slab_mutex);
-		if (s->max_attr_size < len)
-			s->max_attr_size = len;
+		memcg_get_cache_ids();
+		max = memcg_nr_cache_ids;
+
+		pcaches = kmalloc_array(max, sizeof(void *), GFP_KERNEL);
+		if (!pcaches) {
+			memcg_put_cache_ids();
+			return -ENOMEM;
+		}
 
 		/*
 		 * This is a best effort propagation, so this function's return
@@ -5591,10 +5608,33 @@ static ssize_t slab_attr_store(struct kobject *kobj,
 		 * has well defined semantics. The cache being written to
 		 * directly either failed or succeeded, in which case we loop
 		 * through the descendants with best-effort propagation.
+		 *
+		 * To avoid potential circular lock dependency problems, we
+		 * just get a reference and store child cache pointers while
+		 * holding the memcg_cache_ids_sem read lock. The store
+		 * method is then called for each child cache after releasing
+		 * the lock. Code sequence partly borrowed from
+		 * memcg_kmem_get_cache().
 		 */
-		for_each_memcg_cache(c, s)
+		rcu_read_lock();
+		arr = rcu_dereference(s->memcg_params.memcg_caches);
+		for (idx = 0; idx < max; idx++) {
+			c = READ_ONCE(arr->entries[idx]);
+			if (!c)
+				continue;
+			if (!percpu_ref_tryget(&c->memcg_params.refcnt))
+				continue;
+			pcaches[cnt++] = c;
+		}
+		rcu_read_unlock();
+		memcg_put_cache_ids();
+
+		for (idx = 0; idx < cnt; idx++) {
+			c = pcaches[idx];
 			attribute->store(c, buf, len);
-		mutex_unlock(&slab_mutex);
+			percpu_ref_put(&c->memcg_params.refcnt);
+		}
+		kfree(pcaches);
 	}
 #endif
 	return err;
-- 
2.18.1



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
@ 2020-04-24 16:16   ` Qian Cai
  2020-04-27 18:16     ` Waiman Long
  2020-04-24 19:44   ` Qian Cai
  1 sibling, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-04-24 16:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli



> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 183ccc364ccf..255981180489 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
> 		return -EIO;
> 
> 	err = attribute->store(s, buf, len);
> -#ifdef CONFIG_MEMCG
> -	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
> -		struct kmem_cache *c;
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
> +	    !list_empty(&s->memcg_params.children)) {
> +		struct kmem_cache *c, **pcaches;
> +		int idx, max, cnt = 0;
> +		size_t size = s->max_attr_size;
> +		struct memcg_cache_array *arr;
> +
> +		/*
> +		 * Make atomic update to s->max_attr_size.
> +		 */
> +		do {
> +			if (len <= size)
> +				break;
> +		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
> 

try_cmpxchg() is only available on x86.

mm/slub.c: In function ‘slab_attr_store’:
mm/slub.c:5613:13: error: implicit declaration of function ‘try_cmpxchg’; did you mean ‘xa_cmpxchg’? [-Werror=implicit-function-declaration]
   } while (!try_cmpxchg(&s->max_attr_size, &size, len));
             ^~~~~~~~~~~
             xa_cmpxchg
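
For reference, the same "raise to maximum" update can be expressed with the
generic cmpxchg() helper, which is available on all architectures. A minimal
sketch only (the helper name below is made up for illustration; this is not
the posted v2 fix):

  /* Atomically raise s->max_attr_size to at least len. */
  static void slab_attr_size_update(struct kmem_cache *s, size_t len)
  {
  	size_t old = READ_ONCE(s->max_attr_size);

  	while (len > old) {
  		size_t prev = cmpxchg(&s->max_attr_size, old, len);

  		if (prev == old)	/* our update won the race */
  			break;
  		old = prev;		/* lost the race; recheck against the winner */
  	}
  }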



* Re: [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches"
  2020-04-24 15:12 [PATCH 1/2] mm, slab: Revert "extend slab/shrink to shrink all memcg caches" Waiman Long
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
@ 2020-04-24 16:41 ` Qian Cai
  1 sibling, 0 replies; 7+ messages in thread
From: Qian Cai @ 2020-04-24 16:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, linux-mm, linux-kernel, Juri Lelli



> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
> 
> When the slub shrink sysfs file is written to, the function call
> sequence is as follows:
> 
>  kernfs_fop_write
>    => slab_attr_store
>      => shrink_store
>        => kmem_cache_shrink_all
> 
> It turns out that the memcg cache scan in kmem_cache_shrink_all()
> is redundant, as the same scan is already done in slab_attr_store().
> So revert commit 04f768a39d55 ("mm, slab: extend slab/shrink to shrink
> all memcg caches"), except for the documentation change, which is
> still valid.

BTW, currently, running

# echo 1 > /sys/kernel/slab/fs_cache/shrink

corrupts the kernel stack, probably due to the large number of memcg caches.

I am still figuring out if the above commit 04f768a39d55 is the culprit.

[ 7938.979589][T106403] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x7f8/0x800
[ 7938.979640][T106403] CPU: 80 PID: 106403 Comm: kworker/80:2 Not tainted 5.7.0-rc2-next-20200424 #5
[ 7938.979670][T106403] Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func
[ 7938.979708][T106403] Call Trace:
[ 7938.979745][T106403] [c000200012e0f880] [c000000000716498] dump_stack+0xfc/0x174 (unreliable)
[ 7938.979789][T106403] [c000200012e0f8d0] [c00000000010d7d0] panic+0x224/0x4d4
[ 7938.979816][T106403] [c000200012e0f970] [c00000000010d05c] __stack_chk_fail+0x2c/0x30
[ 7938.979865][T106403] [c000200012e0f9d0] [c0000000004b1fb8] __kmem_cache_create+0x7f8/0x800
[ 7938.979914][T106403] [c000200012e0faf0] [4320383d35334320] 0x4320383d35334320

> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> mm/slab.h        |  1 -
> mm/slab_common.c | 37 -------------------------------------
> mm/slub.c        |  2 +-
> 3 files changed, 1 insertion(+), 39 deletions(-)
> 
> diff --git a/mm/slab.h b/mm/slab.h
> index 207c83ef6e06..0937cb2ae8aa 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -237,7 +237,6 @@ int __kmem_cache_shrink(struct kmem_cache *);
> void __kmemcg_cache_deactivate(struct kmem_cache *s);
> void __kmemcg_cache_deactivate_after_rcu(struct kmem_cache *s);
> void slab_kmem_cache_release(struct kmem_cache *);
> -void kmem_cache_shrink_all(struct kmem_cache *s);
> 
> struct seq_file;
> struct file;
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 23c7500eea7d..2e367ab8c15c 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -995,43 +995,6 @@ int kmem_cache_shrink(struct kmem_cache *cachep)
> }
> EXPORT_SYMBOL(kmem_cache_shrink);
> 
> -/**
> - * kmem_cache_shrink_all - shrink a cache and all memcg caches for root cache
> - * @s: The cache pointer
> - */
> -void kmem_cache_shrink_all(struct kmem_cache *s)
> -{
> -	struct kmem_cache *c;
> -
> -	if (!IS_ENABLED(CONFIG_MEMCG_KMEM) || !is_root_cache(s)) {
> -		kmem_cache_shrink(s);
> -		return;
> -	}
> -
> -	get_online_cpus();
> -	get_online_mems();
> -	kasan_cache_shrink(s);
> -	__kmem_cache_shrink(s);
> -
> -	/*
> -	 * We have to take the slab_mutex to protect from the memcg list
> -	 * modification.
> -	 */
> -	mutex_lock(&slab_mutex);
> -	for_each_memcg_cache(c, s) {
> -		/*
> -		 * Don't need to shrink deactivated memcg caches.
> -		 */
> -		if (s->flags & SLAB_DEACTIVATED)
> -			continue;
> -		kasan_cache_shrink(c);
> -		__kmem_cache_shrink(c);
> -	}
> -	mutex_unlock(&slab_mutex);
> -	put_online_mems();
> -	put_online_cpus();
> -}
> -
> bool slab_is_available(void)
> {
> 	return slab_state >= UP;
> diff --git a/mm/slub.c b/mm/slub.c
> index 9bf44955c4f1..183ccc364ccf 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5343,7 +5343,7 @@ static ssize_t shrink_store(struct kmem_cache *s,
> 			const char *buf, size_t length)
> {
> 	if (buf[0] == '1')
> -		kmem_cache_shrink_all(s);
> +		kmem_cache_shrink(s);
> 	else
> 		return -EINVAL;
> 	return length;
> -- 
> 2.18.1
> 
> 



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 15:12 ` [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store() Waiman Long
  2020-04-24 16:16   ` Qian Cai
@ 2020-04-24 19:44   ` Qian Cai
  2020-04-27 19:00     ` Waiman Long
  1 sibling, 1 reply; 7+ messages in thread
From: Qian Cai @ 2020-04-24 19:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli



> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
> 
> The following lockdep splat was reported:
> 
>  [  176.241923] ======================================================
>  [  176.241924] WARNING: possible circular locking dependency detected
>  [  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
>  [  176.241927] ------------------------------------------------------
>  [  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
>  [  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
>  [  176.241941]
>                 but task is already holding lock:
>  [  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>  [  176.241947]
>                 which lock already depends on the new lock.
> 
>  [  176.241949]
>                 the existing dependency chain (in reverse order) is:
>  [  176.241949]
>                 -> #1 (kn->count#103){++++}:
>  [  176.241955]        __kernfs_remove+0x616/0x800
>  [  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
>  [  176.241959]        sysfs_slab_add+0x1c6/0x330
>  [  176.241961]        __kmem_cache_create+0x15f/0x1b0
>  [  176.241964]        create_cache+0xe1/0x220
>  [  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
>  [  176.241967]        kmem_cache_create+0x12/0x20
>  [  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
>  [  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
>  [  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
>  [  176.242127]        local_pci_probe+0xd4/0x180
>  [  176.242131]        work_for_cpu_fn+0x51/0xa0
>  [  176.242133]        process_one_work+0x91a/0x1ac0
>  [  176.242134]        worker_thread+0x536/0xb40
>  [  176.242136]        kthread+0x30c/0x3d0
>  [  176.242140]        ret_from_fork+0x27/0x50
>  [  176.242140]
>                 -> #0 (slab_mutex){+.+.}:
>  [  176.242145]        __lock_acquire+0x22cb/0x48c0
>  [  176.242146]        lock_acquire+0x134/0x4c0
>  [  176.242148]        _mutex_lock+0x28/0x40
>  [  176.242150]        slab_attr_store+0x6b/0xe0
>  [  176.242151]        kernfs_fop_write+0x251/0x400
>  [  176.242154]        vfs_write+0x157/0x460
>  [  176.242155]        ksys_write+0xb8/0x170
>  [  176.242158]        do_syscall_64+0x13c/0x710
>  [  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>  [  176.242161]
>                 other info that might help us debug this:
> 
>  [  176.242161]  Possible unsafe locking scenario:
> 
>  [  176.242162]        CPU0                    CPU1
>  [  176.242163]        ----                    ----
>  [  176.242163]   lock(kn->count#103);
>  [  176.242165]                                lock(slab_mutex);
>  [  176.242166]                                lock(kn->count#103);
>  [  176.242167]   lock(slab_mutex);
>  [  176.242169]
>                  *** DEADLOCK ***
> 
>  [  176.242170] 3 locks held by slub_cpu_partia/5371:
>  [  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
>  [  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
>  [  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>  [  176.242180]
>                 stack backtrace:
>  [  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
>  [  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
>  [  176.242185] Call Trace:
>  [  176.242190]  dump_stack+0x9a/0xf0
>  [  176.242193]  check_noncircular+0x317/0x3c0
>  [  176.242195]  ? print_circular_bug+0x1e0/0x1e0
>  [  176.242199]  ? native_sched_clock+0x32/0x1e0
>  [  176.242202]  ? sched_clock+0x5/0x10
>  [  176.242205]  ? sched_clock_cpu+0x238/0x340
>  [  176.242208]  __lock_acquire+0x22cb/0x48c0
>  [  176.242213]  ? trace_hardirqs_on+0x10/0x10
>  [  176.242215]  ? trace_hardirqs_on+0x10/0x10
>  [  176.242218]  lock_acquire+0x134/0x4c0
>  [  176.242220]  ? slab_attr_store+0x6b/0xe0
>  [  176.242223]  _mutex_lock+0x28/0x40
>  [  176.242225]  ? slab_attr_store+0x6b/0xe0
>  [  176.242227]  slab_attr_store+0x6b/0xe0
>  [  176.242229]  ? sysfs_file_ops+0x160/0x160
>  [  176.242230]  kernfs_fop_write+0x251/0x400
>  [  176.242232]  ? __sb_start_write+0x26a/0x3f0
>  [  176.242234]  vfs_write+0x157/0x460
>  [  176.242237]  ksys_write+0xb8/0x170
>  [  176.242239]  ? __ia32_sys_read+0xb0/0xb0
>  [  176.242242]  ? do_syscall_64+0xb9/0x710
>  [  176.242245]  do_syscall_64+0x13c/0x710
>  [  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> 
> There was another lockdep splat generated by echoing "1" to
> "/sys/kernel/slab/fs_cache/shrink":
> 
> [  445.231443] Chain exists of:
>                 cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex
> 
> [  445.242025]  Possible unsafe locking scenario:
> 
> [  445.247977]        CPU0                    CPU1
> [  445.252529]        ----                    ----
> [  445.257082]   lock(slab_mutex);
> [  445.260239]                                lock(mem_hotplug_lock);
> [  445.266452]                                lock(slab_mutex);
> [  445.272141]   lock(cpu_hotplug_lock);
> 
> So it is problematic to take slab_mutex in order to iterate the list
> of child memcg caches with for_each_memcg_cache(). Fortunately, the
> child caches can also be iterated by walking the array entries in
> memcg_params.memcg_caches while holding a read lock on
> memcg_cache_ids_sem.
> 
> To avoid other possible circular locking problems, we only take a
> reference to each child cache and store its address while holding
> memcg_cache_ids_sem. The actual store method is called for each of
> the child caches after releasing the lock.

Even on x86 where it compiles, this patch is insufficient to prevent a lockdep splat,
because there are still cpu_hotplug_lock and mem_hotplug_lock in the way.

[  290.738690] ======================================================
[  290.744908] WARNING: possible circular locking dependency detected
[  290.751134] 5.7.0-rc2-next-20200424+ #12 Not tainted
[  290.756132] ------------------------------------------------------
[  290.762611] bash/1798 is trying to acquire lock:
[  290.767262] ffffffff89917890 (cpu_hotplug_lock){++++}-{0:0}, at: kmem_cache_shrink+0x12/0x40
[  290.775770] 
[  290.775770] but task is already holding lock:
[  290.781641] ffff8887f2fbc830 (kn->count#86){++++}-{0:0}, at: kernfs_fop_write+0x105/0x250
[  290.790061] 
[  290.790061] which lock already depends on the new lock.
[  290.790061] 
[  290.798289] 
[  290.798289] the existing dependency chain (in reverse order) is:
[  290.805818] 
[  290.805818] -> #3 (kn->count#86){++++}-{0:0}:
[  290.811954]        __kernfs_remove+0x455/0x4c0
[  290.816428]        kernfs_remove+0x23/0x40
[  290.820554]        sysfs_remove_dir+0x74/0x80
[  290.824947]        kobject_del+0x57/0xa0
[  290.828905]        sysfs_slab_unlink+0x1c/0x20
[  290.833377]        shutdown_cache+0x15d/0x1c0
[  290.837964]        kmemcg_cache_shutdown_fn+0xe/0x20
[  290.842963]        kmemcg_workfn+0x35/0x50
[  290.847095]        process_one_work+0x57e/0xb90
[  290.851658]        worker_thread+0x63/0x5b0
[  290.855872]        kthread+0x1f7/0x220
[  290.859653]        ret_from_fork+0x27/0x50
[  290.864042] 
[  290.864042] -> #2 (slab_mutex){+.+.}-{3:3}:
[  290.869747]        __mutex_lock+0x12f/0xca0
[  290.873960]        mutex_lock_nested+0x1b/0x20
[  290.878434]        memcg_create_kmem_cache+0x2e/0x190
[  290.883523]        memcg_kmem_cache_create_func+0x38/0x190
[  290.889274]        process_one_work+0x57e/0xb90
[  290.893838]        worker_thread+0x63/0x5b0
[  290.898050]        kthread+0x1f7/0x220
[  290.901828]        ret_from_fork+0x27/0x50
[  290.905953] 
[  290.905953] -> #1 (mem_hotplug_lock){++++}-{0:0}:
[  290.912384]        get_online_mems+0x3e/0xc0
[  290.916682]        kmem_cache_create_usercopy+0x32/0x270
[  290.922030]        kmem_cache_create+0x16/0x20
[  290.926507]        kmemleak_init+0x4a/0xf4
[  290.930640]        start_kernel+0x52a/0x8bb
[  290.935109]        x86_64_start_reservations+0x24/0x26
[  290.940291]        x86_64_start_kernel+0xf4/0xfb
[  290.944942]        secondary_startup_64+0xb6/0xc0
[  290.949677] 
[  290.949677] -> #0 (cpu_hotplug_lock){++++}-{0:0}:
[  290.955906]        __lock_acquire+0x21f8/0x3260
[  290.960751]        lock_acquire+0x1a2/0x680
[  290.964966]        cpus_read_lock+0x3e/0xc0
[  290.969176]        kmem_cache_shrink+0x12/0x40
[  290.973650]        shrink_store+0x17/0x30
[  290.977687]        slab_attr_store+0x4d/0x3c0
[  290.982076]        sysfs_kf_write+0x89/0xb0
[  290.986473]        kernfs_fop_write+0x155/0x250
[  290.991041]        __vfs_write+0x50/0xa0
[  290.994990]        vfs_write+0x10b/0x290
[  290.998942]        ksys_write+0xcc/0x170
[  291.002893]        __x64_sys_write+0x43/0x50
[  291.007193]        do_syscall_64+0xcc/0xaf0
[  291.011600]        entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  291.017208] 
[  291.017208] other info that might help us debug this:
[  291.017208] 
[  291.025262] Chain exists of:
[  291.025262]   cpu_hotplug_lock --> slab_mutex --> kn->count#86
[  291.025262] 
[  291.035683]  Possible unsafe locking scenario:
[  291.035683] 
[  291.041639]        CPU0                    CPU1
[  291.046200]        ----                    ----
[  291.050762]   lock(kn->count#86);
[  291.054101]                                lock(slab_mutex);
[  291.060132]                                lock(kn->count#86);
[  291.066018]   lock(cpu_hotplug_lock);
[  291.069707] 
[  291.069707]  *** DEADLOCK ***
[  291.069707] 
[  291.075666] 3 locks held by bash/1798:
[  291.079443]  #0: ffff88832e091430 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x262/0x290
[  291.087711]  #1: ffff888784670288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xf5/0x250
[  291.096119]  #2: ffff8887f2fbc830 (kn->count#86){++++}-{0:0}, at: kernfs_fop_write+0x105/0x250
[  291.104788] 
[  291.104788] stack backtrace:
[  291.109181] CPU: 14 PID: 1798 Comm: bash Not tainted 5.7.0-rc2-next-20200424+ #12
[  291.116940] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
[  291.125517] Call Trace:
[  291.127990]  dump_stack+0xa7/0xea
[  291.131328]  print_circular_bug.cold.54+0x147/0x14c
[  291.136463]  check_noncircular+0x295/0x2d0
[  291.140586]  ? print_circular_bug+0x1d0/0x1d0
[  291.144974]  ? __kasan_check_read+0x11/0x20
[  291.149188]  ? mark_lock+0x160/0xfe0
[  291.152792]  __lock_acquire+0x21f8/0x3260
[  291.156830]  ? register_lock_class+0xb90/0xb90
[  291.161534]  ? register_lock_class+0xb90/0xb90
[  291.166012]  ? check_flags.part.28+0x220/0x220
[  291.170489]  lock_acquire+0x1a2/0x680
[  291.174178]  ? kmem_cache_shrink+0x12/0x40
[  291.178306]  ? check_flags.part.28+0x220/0x220
[  291.182784]  ? ___might_sleep+0x178/0x210
[  291.187021]  cpus_read_lock+0x3e/0xc0
[  291.190709]  ? kmem_cache_shrink+0x12/0x40
[  291.194832]  kmem_cache_shrink+0x12/0x40
[  291.198781]  shrink_store+0x17/0x30
[  291.202294]  slab_attr_store+0x4d/0x3c0
[  291.206159]  ? kernfs_get_active+0x136/0x150
[  291.210663]  ? objects_show+0x20/0x20
[  291.214349]  sysfs_kf_write+0x89/0xb0
[  291.218038]  ? sysfs_file_ops+0xa0/0xa0
[  291.221898]  kernfs_fop_write+0x155/0x250
[  291.225936]  __vfs_write+0x50/0xa0
[  291.229364]  vfs_write+0x10b/0x290
[  291.232790]  ksys_write+0xcc/0x170
[  291.236405]  ? __x64_sys_read+0x50/0x50
[  291.240270]  ? do_syscall_64+0x79/0xaf0
[  291.244132]  ? do_syscall_64+0x79/0xaf0
[  291.247998]  __x64_sys_write+0x43/0x50
[  291.251776]  do_syscall_64+0xcc/0xaf0
[  291.255464]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  291.260396]  ? syscall_return_slowpath+0x580/0x580
[  291.265223]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xb3
[  291.270491]  ? trace_hardirqs_off_caller+0x3a/0x150
[  291.275400]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  291.280137]  entry_SYSCALL_64_after_hwframe+0x49/0xb3
[  291.285443] RIP: 0033:0x7f2c0d9adb28
[  291.289046] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 35 4b 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  291.307931] RSP: 002b:00007ffccc6e3898 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  291.315804] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2c0d9adb28
[  291.322985] RDX: 0000000000000002 RSI: 000055d97c790de0 RDI: 0000000000000001
[  291.330164] RBP: 000055d97c790de0 R08: 000000000000000a R09: 00007f2c0da3ec80
[  291.337537] R10: 000000000000000a R11: 0000000000000246 R12: 00007f2c0dc7e6c0
[  291.344718] R13: 0000000000000002 R14: 00007f2c0dc79880 R15: 0000000000000002


> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> mm/slub.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 183ccc364ccf..255981180489 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
> 		return -EIO;
> 
> 	err = attribute->store(s, buf, len);
> -#ifdef CONFIG_MEMCG
> -	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
> -		struct kmem_cache *c;
> +#ifdef CONFIG_MEMCG_KMEM
> +	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
> +	    !list_empty(&s->memcg_params.children)) {
> +		struct kmem_cache *c, **pcaches;
> +		int idx, max, cnt = 0;
> +		size_t size = s->max_attr_size;
> +		struct memcg_cache_array *arr;
> +
> +		/*
> +		 * Make atomic update to s->max_attr_size.
> +		 */
> +		do {
> +			if (len <= size)
> +				break;
> +		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
> 
> -		mutex_lock(&slab_mutex);
> -		if (s->max_attr_size < len)
> -			s->max_attr_size = len;
> +		memcg_get_cache_ids();
> +		max = memcg_nr_cache_ids;
> +
> +		pcaches = kmalloc_array(max, sizeof(void *), GFP_KERNEL);
> +		if (!pcaches) {
> +			memcg_put_cache_ids();
> +			return -ENOMEM;
> +		}
> 
> 		/*
> 		 * This is a best effort propagation, so this function's return
> @@ -5591,10 +5608,33 @@ static ssize_t slab_attr_store(struct kobject *kobj,
> 		 * has well defined semantics. The cache being written to
> 		 * directly either failed or succeeded, in which case we loop
> 		 * through the descendants with best-effort propagation.
> +		 *
> +		 * To avoid potential circular lock dependency problems, we
> +		 * just get a reference and store child cache pointers while
> +		 * holding the memcg_cache_ids_sem read lock. The store
> +		 * method is then called for each child cache after releasing
> +		 * the lock. Code sequence partly borrowed from
> +		 * memcg_kmem_get_cache().
> 		 */
> -		for_each_memcg_cache(c, s)
> +		rcu_read_lock();
> +		arr = rcu_dereference(s->memcg_params.memcg_caches);
> +		for (idx = 0; idx < max; idx++) {
> +			c = READ_ONCE(arr->entries[idx]);
> +			if (!c)
> +				continue;
> +			if (!percpu_ref_tryget(&c->memcg_params.refcnt))
> +				continue;
> +			pcaches[cnt++] = c;
> +		}
> +		rcu_read_unlock();
> +		memcg_put_cache_ids();
> +
> +		for (idx = 0; idx < cnt; idx++) {
> +			c = pcaches[idx];
> 			attribute->store(c, buf, len);
> -		mutex_unlock(&slab_mutex);
> +			percpu_ref_put(&c->memcg_params.refcnt);
> +		}
> +		kfree(pcaches);
> 	}
> #endif
> 	return err;
> -- 
> 2.18.1
> 
> 



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 16:16   ` Qian Cai
@ 2020-04-27 18:16     ` Waiman Long
  0 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-27 18:16 UTC (permalink / raw)
  To: Qian Cai
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli

On 4/24/20 12:16 PM, Qian Cai wrote:
>
>> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 183ccc364ccf..255981180489 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -5567,13 +5567,30 @@ static ssize_t slab_attr_store(struct kobject *kobj,
>> 		return -EIO;
>>
>> 	err = attribute->store(s, buf, len);
>> -#ifdef CONFIG_MEMCG
>> -	if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
>> -		struct kmem_cache *c;
>> +#ifdef CONFIG_MEMCG_KMEM
>> +	if (slab_state >= FULL && err >= 0 && is_root_cache(s) &&
>> +	    !list_empty(&s->memcg_params.children)) {
>> +		struct kmem_cache *c, **pcaches;
>> +		int idx, max, cnt = 0;
>> +		size_t size = s->max_attr_size;
>> +		struct memcg_cache_array *arr;
>> +
>> +		/*
>> +		 * Make atomic update to s->max_attr_size.
>> +		 */
>> +		do {
>> +			if (len <= size)
>> +				break;
>> +		} while (!try_cmpxchg(&s->max_attr_size, &size, len));
>>
> try_cmpxchg() is only available on x86.
>
> mm/slub.c: In function ‘slab_attr_store’:
> mm/slub.c:5613:13: error: implicit declaration of function ‘try_cmpxchg’; did you mean ‘xa_cmpxchg’? [-Werror=implicit-function-declaration]
>     } while (!try_cmpxchg(&s->max_attr_size, &size, len));
>               ^~~~~~~~~~~
>               xa_cmpxchg
>
Oh, sorry about that. Will fix it and send out v2.

Cheers,
Longman



* Re: [PATCH 2/2] mm/slub: Fix slab_mutex circular locking problem in slab_attr_store()
  2020-04-24 19:44   ` Qian Cai
@ 2020-04-27 19:00     ` Waiman Long
  0 siblings, 0 replies; 7+ messages in thread
From: Waiman Long @ 2020-04-27 19:00 UTC (permalink / raw)
  To: Qian Cai
  Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Linux-MM, LKML, Juri Lelli

On 4/24/20 3:44 PM, Qian Cai wrote:
>
>> On Apr 24, 2020, at 11:12 AM, Waiman Long <longman@redhat.com> wrote:
>>
>> The following lockdep splat was reported:
>>
>>   [  176.241923] ======================================================
>>   [  176.241924] WARNING: possible circular locking dependency detected
>>   [  176.241926] 4.18.0-172.rt13.29.el8.x86_64+debug #1 Not tainted
>>   [  176.241927] ------------------------------------------------------
>>   [  176.241929] slub_cpu_partia/5371 is trying to acquire lock:
>>   [  176.241930] ffffffffa0b83718 (slab_mutex){+.+.}, at: slab_attr_store+0x6b/0xe0
>>   [  176.241941]
>>                  but task is already holding lock:
>>   [  176.241942] ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>>   [  176.241947]
>>                  which lock already depends on the new lock.
>>
>>   [  176.241949]
>>                  the existing dependency chain (in reverse order) is:
>>   [  176.241949]
>>                  -> #1 (kn->count#103){++++}:
>>   [  176.241955]        __kernfs_remove+0x616/0x800
>>   [  176.241957]        kernfs_remove_by_name_ns+0x3e/0x80
>>   [  176.241959]        sysfs_slab_add+0x1c6/0x330
>>   [  176.241961]        __kmem_cache_create+0x15f/0x1b0
>>   [  176.241964]        create_cache+0xe1/0x220
>>   [  176.241966]        kmem_cache_create_usercopy+0x1a3/0x260
>>   [  176.241967]        kmem_cache_create+0x12/0x20
>>   [  176.242076]        mlx5_init_fs+0x18d/0x1a00 [mlx5_core]
>>   [  176.242100]        mlx5_load_one+0x3b4/0x1730 [mlx5_core]
>>   [  176.242124]        init_one+0x901/0x11b0 [mlx5_core]
>>   [  176.242127]        local_pci_probe+0xd4/0x180
>>   [  176.242131]        work_for_cpu_fn+0x51/0xa0
>>   [  176.242133]        process_one_work+0x91a/0x1ac0
>>   [  176.242134]        worker_thread+0x536/0xb40
>>   [  176.242136]        kthread+0x30c/0x3d0
>>   [  176.242140]        ret_from_fork+0x27/0x50
>>   [  176.242140]
>>                  -> #0 (slab_mutex){+.+.}:
>>   [  176.242145]        __lock_acquire+0x22cb/0x48c0
>>   [  176.242146]        lock_acquire+0x134/0x4c0
>>   [  176.242148]        _mutex_lock+0x28/0x40
>>   [  176.242150]        slab_attr_store+0x6b/0xe0
>>   [  176.242151]        kernfs_fop_write+0x251/0x400
>>   [  176.242154]        vfs_write+0x157/0x460
>>   [  176.242155]        ksys_write+0xb8/0x170
>>   [  176.242158]        do_syscall_64+0x13c/0x710
>>   [  176.242160]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>   [  176.242161]
>>                  other info that might help us debug this:
>>
>>   [  176.242161]  Possible unsafe locking scenario:
>>
>>   [  176.242162]        CPU0                    CPU1
>>   [  176.242163]        ----                    ----
>>   [  176.242163]   lock(kn->count#103);
>>   [  176.242165]                                lock(slab_mutex);
>>   [  176.242166]                                lock(kn->count#103);
>>   [  176.242167]   lock(slab_mutex);
>>   [  176.242169]
>>                   *** DEADLOCK ***
>>
>>   [  176.242170] 3 locks held by slub_cpu_partia/5371:
>>   [  176.242170]  #0: ffff888705e3a800 (sb_writers#4){.+.+}, at: vfs_write+0x31c/0x460
>>   [  176.242174]  #1: ffff889aeec4d658 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1a9/0x400
>>   [  176.242177]  #2: ffff88bb6d8b83c8 (kn->count#103){++++}, at: kernfs_fop_write+0x1cc/0x400
>>   [  176.242180]
>>                  stack backtrace:
>>   [  176.242183] CPU: 36 PID: 5371 Comm: slub_cpu_partia Not tainted 4.18.0-172.rt13.29.el8.x86_64+debug #1
>>   [  176.242184] Hardware name: AMD Corporation DAYTONA_X/DAYTONA_X, BIOS RDY1005C 11/22/2019
>>   [  176.242185] Call Trace:
>>   [  176.242190]  dump_stack+0x9a/0xf0
>>   [  176.242193]  check_noncircular+0x317/0x3c0
>>   [  176.242195]  ? print_circular_bug+0x1e0/0x1e0
>>   [  176.242199]  ? native_sched_clock+0x32/0x1e0
>>   [  176.242202]  ? sched_clock+0x5/0x10
>>   [  176.242205]  ? sched_clock_cpu+0x238/0x340
>>   [  176.242208]  __lock_acquire+0x22cb/0x48c0
>>   [  176.242213]  ? trace_hardirqs_on+0x10/0x10
>>   [  176.242215]  ? trace_hardirqs_on+0x10/0x10
>>   [  176.242218]  lock_acquire+0x134/0x4c0
>>   [  176.242220]  ? slab_attr_store+0x6b/0xe0
>>   [  176.242223]  _mutex_lock+0x28/0x40
>>   [  176.242225]  ? slab_attr_store+0x6b/0xe0
>>   [  176.242227]  slab_attr_store+0x6b/0xe0
>>   [  176.242229]  ? sysfs_file_ops+0x160/0x160
>>   [  176.242230]  kernfs_fop_write+0x251/0x400
>>   [  176.242232]  ? __sb_start_write+0x26a/0x3f0
>>   [  176.242234]  vfs_write+0x157/0x460
>>   [  176.242237]  ksys_write+0xb8/0x170
>>   [  176.242239]  ? __ia32_sys_read+0xb0/0xb0
>>   [  176.242242]  ? do_syscall_64+0xb9/0x710
>>   [  176.242245]  do_syscall_64+0x13c/0x710
>>   [  176.242247]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>>
>> There was another lockdep splat generated by echoing "1" to
>> "/sys/kernel/slab/fs_cache/shrink":
>>
>> [  445.231443] Chain exists of:
>>                  cpu_hotplug_lock --> mem_hotplug_lock --> slab_mutex
>>
>> [  445.242025]  Possible unsafe locking scenario:
>>
>> [  445.247977]        CPU0                    CPU1
>> [  445.252529]        ----                    ----
>> [  445.257082]   lock(slab_mutex);
>> [  445.260239]                                lock(mem_hotplug_lock);
>> [  445.266452]                                lock(slab_mutex);
>> [  445.272141]   lock(cpu_hotplug_lock);
>>
>> So it is problematic to take slab_mutex in order to iterate the list
>> of child memcg caches with for_each_memcg_cache(). Fortunately, the
>> child caches can also be iterated by walking the array entries in
>> memcg_params.memcg_caches while holding a read lock on
>> memcg_cache_ids_sem.
>>
>> To avoid other possible circular locking problems, we only take a
>> reference to each child cache and store its address while holding
>> memcg_cache_ids_sem. The actual store method is called for each of
>> the child caches after releasing the lock.
> Even on x86 where it compiles, this patch is insufficient to prevent a lockdep splat,
> because there are still cpu_hotplug_lock and mem_hotplug_lock in the way.
>
Yes, I was aware of that. Will include an additional patch in v2 to
address that.

Thanks,
Longman


