* [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
@ 2022-01-14 14:23 Xin Long
  2022-01-16  6:35 ` Hyeonggon Yoo
  2022-06-01 14:36 ` Maurizio Lombardi
  0 siblings, 2 replies; 10+ messages in thread
From: Xin Long @ 2022-01-14 14:23 UTC (permalink / raw)
  To: linux-kernel, linux-mm, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka
  Cc: Sebastian Andrzej Siewior, Antoine Tenart

cpus_read_lock() was introduced into kmem_cache_destroy() by
commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
__free_slab() invocations out of IRQ context"), and it can cause
a deadlock.

As Antoine pointed out, when one thread calls kmem_cache_destroy(), it
blocks until kn->active becomes 0 in kernfs_drain(), after taking
cpu_hotplug_lock. Meanwhile, another thread calling kernfs_fop_write()
may try to take cpu_hotplug_lock after incrementing kn->active via
kernfs_get_active():

        CPU0                        CPU1
        ----                        ----
  cpus_read_lock()
                                   kn->active++
                                   cpus_read_lock() [a]
  wait until kn->active == 0

Although cpu_hotplug_lock is an RWSEM, [a] will not block there. But since
lockdep annotations were added for cpu_hotplug_lock, a deadlock warning
is detected:

  ======================================================
  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  dmsetup/1832 is trying to acquire lock:
  ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30

  but task is already holding lock:
  ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (slab_mutex){+.+.}-{3:3}:
         lock_acquire+0xe8/0x470
         mutex_lock_nested+0x47/0x80
         kmem_cache_destroy+0x2a/0x120
         bioset_exit+0xb5/0x100
         cleanup_mapped_device+0x26/0xf0 [dm_mod]
         free_dev+0x43/0xb0 [dm_mod]
         __dm_destroy+0x153/0x1b0 [dm_mod]
         dev_remove+0xe4/0x1a0 [dm_mod]
         ctl_ioctl+0x1af/0x3f0 [dm_mod]
         dm_ctl_ioctl+0xa/0x10 [dm_mod]
         do_vfs_ioctl+0xa5/0x760
         ksys_ioctl+0x60/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x8c/0x240
         entry_SYSCALL_64_after_hwframe+0x6a/0xdf

  -> #1 (cpu_hotplug_lock){++++}-{0:0}:
         lock_acquire+0xe8/0x470
         cpus_read_lock+0x39/0x100
         cpu_partial_store+0x44/0x80
         slab_attr_store+0x20/0x30
         kernfs_fop_write+0x101/0x1b0
         vfs_write+0xd4/0x1e0
         ksys_write+0x52/0xc0
         do_syscall_64+0x8c/0x240
         entry_SYSCALL_64_after_hwframe+0x6a/0xdf

  -> #0 (kn->count#144){++++}-{0:0}:
         check_prevs_add+0x185/0xb80
         __lock_acquire+0xd8f/0xe90
         lock_acquire+0xe8/0x470
         __kernfs_remove+0x25e/0x320
         kernfs_remove+0x1d/0x30
         kobject_del+0x28/0x60
         kmem_cache_destroy+0xf1/0x120
         bioset_exit+0xb5/0x100
         cleanup_mapped_device+0x26/0xf0 [dm_mod]
         free_dev+0x43/0xb0 [dm_mod]
         __dm_destroy+0x153/0x1b0 [dm_mod]
         dev_remove+0xe4/0x1a0 [dm_mod]
         ctl_ioctl+0x1af/0x3f0 [dm_mod]
         dm_ctl_ioctl+0xa/0x10 [dm_mod]
         do_vfs_ioctl+0xa5/0x760
         ksys_ioctl+0x60/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x8c/0x240
         entry_SYSCALL_64_after_hwframe+0x6a/0xdf

  other info that might help us debug this:

  Chain exists of:
    kn->count#144 --> cpu_hotplug_lock --> slab_mutex

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(slab_mutex);
                                 lock(cpu_hotplug_lock);
                                 lock(slab_mutex);
    lock(kn->count#144);

   *** DEADLOCK ***

  3 locks held by dmsetup/1832:
   #0: ffffffffa43fe5c0 (bio_slab_lock){+.+.}-{3:3}, at: bioset_exit+0x62/0x100
   #1: ffffffffa3e87c20 (cpu_hotplug_lock){++++}-{0:0}, at: kmem_cache_destroy+0x1c/0x120
   #2: ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120

  stack backtrace:
  Call Trace:
   dump_stack+0x5c/0x80
   check_noncircular+0xff/0x120
   check_prevs_add+0x185/0xb80
   __lock_acquire+0xd8f/0xe90
   lock_acquire+0xe8/0x470
   __kernfs_remove+0x25e/0x320
   kernfs_remove+0x1d/0x30
   kobject_del+0x28/0x60
   kmem_cache_destroy+0xf1/0x120
   bioset_exit+0xb5/0x100
   cleanup_mapped_device+0x26/0xf0 [dm_mod]
   free_dev+0x43/0xb0 [dm_mod]
   __dm_destroy+0x153/0x1b0 [dm_mod]
   dev_remove+0xe4/0x1a0 [dm_mod]
   ctl_ioctl+0x1af/0x3f0 [dm_mod]
   dm_ctl_ioctl+0xa/0x10 [dm_mod]
   do_vfs_ioctl+0xa5/0x760
   ksys_ioctl+0x60/0x90
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x8c/0x240
   entry_SYSCALL_64_after_hwframe+0x6a/0xdf

Since cpus_read_lock() is supposed to protect CPU-related data, it
makes sense to fix this issue by moving cpus_read_lock() from
kmem_cache_destroy() to __kmem_cache_shutdown(). While at it,
add the missing cpus_read_lock() in slab_mem_going_offline_callback().

Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 mm/slab_common.c | 2 --
 mm/slub.c        | 4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index e5d080a93009..06ec3fa585e6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -494,7 +494,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	if (unlikely(!s))
 		return;
 
-	cpus_read_lock();
 	mutex_lock(&slab_mutex);
 
 	s->refcount--;
@@ -509,7 +508,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	}
 out_unlock:
 	mutex_unlock(&slab_mutex);
-	cpus_read_unlock();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
diff --git a/mm/slub.c b/mm/slub.c
index abe7db581d68..754f020235ee 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4311,7 +4311,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 	int node;
 	struct kmem_cache_node *n;
 
-	flush_all_cpus_locked(s);
+	flush_all(s);
 	/* Attempt to free all objects */
 	for_each_kmem_cache_node(s, node, n) {
 		free_partial(s, n);
@@ -4646,7 +4646,7 @@ static int slab_mem_going_offline_callback(void *arg)
 
 	mutex_lock(&slab_mutex);
 	list_for_each_entry(s, &slab_caches, list) {
-		flush_all_cpus_locked(s);
+		flush_all(s);
 		__kmem_cache_do_shrink(s);
 	}
 	mutex_unlock(&slab_mutex);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-14 14:23 [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy Xin Long
@ 2022-01-16  6:35 ` Hyeonggon Yoo
  2022-01-17  8:32   ` Xin Long
  2022-06-01 14:36 ` Maurizio Lombardi
  1 sibling, 1 reply; 10+ messages in thread
From: Hyeonggon Yoo @ 2022-01-16  6:35 UTC (permalink / raw)
  To: Xin Long
  Cc: linux-kernel, linux-mm, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Sebastian Andrzej Siewior, Antoine Tenart

On Fri, Jan 14, 2022 at 09:23:16AM -0500, Xin Long wrote:
> cpus_read_lock() is introduced into kmem_cache_destroy() by
> commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
> __free_slab() invocations out of IRQ context"), and it could cause
> a deadlock.
> 
> As Antoine pointed out, when one thread calls kmem_cache_destroy(), it is
> blocking until kn->active becomes 0 in kernfs_drain() after holding
> cpu_hotplug_lock. While in another thread, when calling kernfs_fop_write(),
> it may try to hold cpu_hotplug_lock after incrementing kn->active by
> calling kernfs_get_active():
>

Hello. Can you give me a link to the related thread?

>         CPU0                        CPU1
>         ----                        ----
>   cpus_read_lock()
>                                    kn->active++
>                                    cpus_read_lock() [a]
>   wait until kn->active == 0
> 
> Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> would be detected:
> 
>   ======================================================
>   WARNING: possible circular locking dependency detected
>   ------------------------------------------------------
>   dmsetup/1832 is trying to acquire lock:
>   ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30
> 
>   but task is already holding lock:
>   ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> 
>   which lock already depends on the new lock.
> 
>   the existing dependency chain (in reverse order) is:
> 
>   -> #2 (slab_mutex){+.+.}-{3:3}:
>          lock_acquire+0xe8/0x470
>          mutex_lock_nested+0x47/0x80
>          kmem_cache_destroy+0x2a/0x120
>          bioset_exit+0xb5/0x100
>          cleanup_mapped_device+0x26/0xf0 [dm_mod]
>          free_dev+0x43/0xb0 [dm_mod]
>          __dm_destroy+0x153/0x1b0 [dm_mod]
>          dev_remove+0xe4/0x1a0 [dm_mod]
>          ctl_ioctl+0x1af/0x3f0 [dm_mod]
>          dm_ctl_ioctl+0xa/0x10 [dm_mod]
>          do_vfs_ioctl+0xa5/0x760
>          ksys_ioctl+0x60/0x90
>          __x64_sys_ioctl+0x16/0x20
>          do_syscall_64+0x8c/0x240
>          entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> 
>   -> #1 (cpu_hotplug_lock){++++}-{0:0}:
>          lock_acquire+0xe8/0x470
>          cpus_read_lock+0x39/0x100
>          cpu_partial_store+0x44/0x80
>          slab_attr_store+0x20/0x30
>          kernfs_fop_write+0x101/0x1b0
>          vfs_write+0xd4/0x1e0
>          ksys_write+0x52/0xc0
>          do_syscall_64+0x8c/0x240
>          entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> 
>   -> #0 (kn->count#144){++++}-{0:0}:
>          check_prevs_add+0x185/0xb80
>          __lock_acquire+0xd8f/0xe90
>          lock_acquire+0xe8/0x470
>          __kernfs_remove+0x25e/0x320
>          kernfs_remove+0x1d/0x30
>          kobject_del+0x28/0x60
>          kmem_cache_destroy+0xf1/0x120
>          bioset_exit+0xb5/0x100
>          cleanup_mapped_device+0x26/0xf0 [dm_mod]
>          free_dev+0x43/0xb0 [dm_mod]
>          __dm_destroy+0x153/0x1b0 [dm_mod]
>          dev_remove+0xe4/0x1a0 [dm_mod]
>          ctl_ioctl+0x1af/0x3f0 [dm_mod]
>          dm_ctl_ioctl+0xa/0x10 [dm_mod]
>          do_vfs_ioctl+0xa5/0x760
>          ksys_ioctl+0x60/0x90
>          __x64_sys_ioctl+0x16/0x20
>          do_syscall_64+0x8c/0x240
>          entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> 
>   other info that might help us debug this:
> 
>   Chain exists of:
>     kn->count#144 --> cpu_hotplug_lock --> slab_mutex
> 
>    Possible unsafe locking scenario:
> 
>          CPU0                    CPU1
>          ----                    ----
>     lock(slab_mutex);
>                                  lock(cpu_hotplug_lock);
>                                  lock(slab_mutex);
>     lock(kn->count#144);
> 
>    *** DEADLOCK ***
> 
>   3 locks held by dmsetup/1832:
>    #0: ffffffffa43fe5c0 (bio_slab_lock){+.+.}-{3:3}, at: bioset_exit+0x62/0x100
>    #1: ffffffffa3e87c20 (cpu_hotplug_lock){++++}-{0:0}, at: kmem_cache_destroy+0x1c/0x120
>    #2: ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> 
>   stack backtrace:
>   Call Trace:
>    dump_stack+0x5c/0x80
>    check_noncircular+0xff/0x120
>    check_prevs_add+0x185/0xb80
>    __lock_acquire+0xd8f/0xe90
>    lock_acquire+0xe8/0x470
>    __kernfs_remove+0x25e/0x320
>    kernfs_remove+0x1d/0x30
>    kobject_del+0x28/0x60
>    kmem_cache_destroy+0xf1/0x120
>    bioset_exit+0xb5/0x100
>    cleanup_mapped_device+0x26/0xf0 [dm_mod]
>    free_dev+0x43/0xb0 [dm_mod]
>    __dm_destroy+0x153/0x1b0 [dm_mod]
>    dev_remove+0xe4/0x1a0 [dm_mod]
>    ctl_ioctl+0x1af/0x3f0 [dm_mod]
>    dm_ctl_ioctl+0xa/0x10 [dm_mod]
>    do_vfs_ioctl+0xa5/0x760
>    ksys_ioctl+0x60/0x90
>    __x64_sys_ioctl+0x16/0x20
>    do_syscall_64+0x8c/0x240
>    entry_SYSCALL_64_after_hwframe+0x6a/0xdf
>

To summarize, the possible scenario is:

- when cache is destroyed:
	cpu_hotplug_lock
	-> slab_mutex 
	-> wait until kn->count == 0 (because it removes sysfs objects.)

- when someone writes to the cpu_partial attribute:
	increase kn->count (incremented in kernfs_fop_write_iter()
	via kernfs_get_active())
	-> cpu_hotplug_lock
	-> slab_mutex

... So there is a circular dependency when using kernfs, because
removing the sysfs objects via kernfs creates an unexpected dependency. Right?

I think it's quite unlikely but yeah, seems possible.

> Since cpus_read_lock() is supposed to protect the cpu related data, it
> makes sense to fix this issue by moving cpus_read_lock() from
> kmem_cache_destroy() to __kmem_cache_shutdown(). While at it,
> add the missing cpus_read_lock() in slab_mem_going_offline_callback().
> 
> Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  mm/slab_common.c | 2 --
>  mm/slub.c        | 4 ++--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index e5d080a93009..06ec3fa585e6 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -494,7 +494,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  	if (unlikely(!s))
>  		return;
>  
> -	cpus_read_lock();
>  	mutex_lock(&slab_mutex);
>  
>  	s->refcount--;
> @@ -509,7 +508,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  	}
>  out_unlock:
>  	mutex_unlock(&slab_mutex);
> -	cpus_read_unlock();
>  }

This code is changing the lock order
from cpu_hotplug_lock -> slab_mutex
to slab_mutex -> cpu_hotplug_lock.

>  EXPORT_SYMBOL(kmem_cache_destroy);
>  
> diff --git a/mm/slub.c b/mm/slub.c
> index abe7db581d68..754f020235ee 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4311,7 +4311,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>  	int node;
>  	struct kmem_cache_node *n;
>  
> -	flush_all_cpus_locked(s);
> +	flush_all(s);
>  	/* Attempt to free all objects */
>  	for_each_kmem_cache_node(s, node, n) {
>  		free_partial(s, n);
> @@ -4646,7 +4646,7 @@ static int slab_mem_going_offline_callback(void *arg)
>  
>  	mutex_lock(&slab_mutex);
>  	list_for_each_entry(s, &slab_caches, list) {
> -		flush_all_cpus_locked(s);
> +		flush_all(s);

In my opinion, this code is wrong, because it's called during memory
offlining with cpu_hotplug_lock held. See function offline_pages()
in mm/memory_hotplug.c for details.

It first takes cpu_hotplug_lock by calling mem_hotplug_begin(),
and then notifies memory_chain (so slab_mem_going_offline_callback()
is called).

I think this patch will make another possible deadlock scenario.

in memory hotplugging: cpu_hotplug_lock -> slab_mutex -> cpu_hotplug_lock
in slab cache destroying: slab_mutex -> cpu_hotplug_lock

Thanks,
Hyeonggon.

>  		__kmem_cache_do_shrink(s);
>  	}
>  	mutex_unlock(&slab_mutex);
> -- 
> 2.27.0
> 
> 


* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-16  6:35 ` Hyeonggon Yoo
@ 2022-01-17  8:32   ` Xin Long
  2022-01-17  9:33     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 10+ messages in thread
From: Xin Long @ 2022-01-17  8:32 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: LKML, linux-mm, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Sebastian Andrzej Siewior, Antoine Tenart

On Sun, Jan 16, 2022 at 2:35 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Fri, Jan 14, 2022 at 09:23:16AM -0500, Xin Long wrote:
> > cpus_read_lock() is introduced into kmem_cache_destroy() by
> > commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
> > __free_slab() invocations out of IRQ context"), and it could cause
> > a deadlock.
> >
> > As Antoine pointed out, when one thread calls kmem_cache_destroy(), it is
> > blocking until kn->active becomes 0 in kernfs_drain() after holding
> > cpu_hotplug_lock. While in another thread, when calling kernfs_fop_write(),
> > it may try to hold cpu_hotplug_lock after incrementing kn->active by
> > calling kernfs_get_active():
> >
>
> Hello. can you give me a link of related the thread?
Sorry, there is no related thread on the internet, but I think the changelog
provides all the information we have.

Just note that it was reproduced on the RHEL-8 RT kernel after we fixed
another issue. From code analysis, this issue does exist in the
upstream kernel, though I couldn't build an upstream RT kernel for
testing.

>
> >         CPU0                        CPU1
> >         ----                        ----
> >   cpus_read_lock()
> >                                    kn->active++
> >                                    cpus_read_lock() [a]
> >   wait until kn->active == 0
> >
> > Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> > lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> > would be detected:
> >
> >   ======================================================
> >   WARNING: possible circular locking dependency detected
> >   ------------------------------------------------------
> >   dmsetup/1832 is trying to acquire lock:
> >   ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30
> >
> >   but task is already holding lock:
> >   ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> >
> >   which lock already depends on the new lock.
> >
> >   the existing dependency chain (in reverse order) is:
> >
> >   -> #2 (slab_mutex){+.+.}-{3:3}:
> >          lock_acquire+0xe8/0x470
> >          mutex_lock_nested+0x47/0x80
> >          kmem_cache_destroy+0x2a/0x120
> >          bioset_exit+0xb5/0x100
> >          cleanup_mapped_device+0x26/0xf0 [dm_mod]
> >          free_dev+0x43/0xb0 [dm_mod]
> >          __dm_destroy+0x153/0x1b0 [dm_mod]
> >          dev_remove+0xe4/0x1a0 [dm_mod]
> >          ctl_ioctl+0x1af/0x3f0 [dm_mod]
> >          dm_ctl_ioctl+0xa/0x10 [dm_mod]
> >          do_vfs_ioctl+0xa5/0x760
> >          ksys_ioctl+0x60/0x90
> >          __x64_sys_ioctl+0x16/0x20
> >          do_syscall_64+0x8c/0x240
> >          entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> >
> >   -> #1 (cpu_hotplug_lock){++++}-{0:0}:
> >          lock_acquire+0xe8/0x470
> >          cpus_read_lock+0x39/0x100
> >          cpu_partial_store+0x44/0x80
> >          slab_attr_store+0x20/0x30
> >          kernfs_fop_write+0x101/0x1b0
> >          vfs_write+0xd4/0x1e0
> >          ksys_write+0x52/0xc0
> >          do_syscall_64+0x8c/0x240
> >          entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> >
> >   -> #0 (kn->count#144){++++}-{0:0}:
> >          check_prevs_add+0x185/0xb80
> >          __lock_acquire+0xd8f/0xe90
> >          lock_acquire+0xe8/0x470
> >          __kernfs_remove+0x25e/0x320
> >          kernfs_remove+0x1d/0x30
> >          kobject_del+0x28/0x60
> >          kmem_cache_destroy+0xf1/0x120
> >          bioset_exit+0xb5/0x100
> >          cleanup_mapped_device+0x26/0xf0 [dm_mod]
> >          free_dev+0x43/0xb0 [dm_mod]
> >          __dm_destroy+0x153/0x1b0 [dm_mod]
> >          dev_remove+0xe4/0x1a0 [dm_mod]
> >          ctl_ioctl+0x1af/0x3f0 [dm_mod]
> >          dm_ctl_ioctl+0xa/0x10 [dm_mod]
> >          do_vfs_ioctl+0xa5/0x760
> >          ksys_ioctl+0x60/0x90
> >          __x64_sys_ioctl+0x16/0x20
> >          do_syscall_64+0x8c/0x240
> >          entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> >
> >   other info that might help us debug this:
> >
> >   Chain exists of:
> >     kn->count#144 --> cpu_hotplug_lock --> slab_mutex
> >
> >    Possible unsafe locking scenario:
> >
> >          CPU0                    CPU1
> >          ----                    ----
> >     lock(slab_mutex);
> >                                  lock(cpu_hotplug_lock);
> >                                  lock(slab_mutex);
> >     lock(kn->count#144);
> >
> >    *** DEADLOCK ***
> >
> >   3 locks held by dmsetup/1832:
> >    #0: ffffffffa43fe5c0 (bio_slab_lock){+.+.}-{3:3}, at: bioset_exit+0x62/0x100
> >    #1: ffffffffa3e87c20 (cpu_hotplug_lock){++++}-{0:0}, at: kmem_cache_destroy+0x1c/0x120
> >    #2: ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> >
> >   stack backtrace:
> >   Call Trace:
> >    dump_stack+0x5c/0x80
> >    check_noncircular+0xff/0x120
> >    check_prevs_add+0x185/0xb80
> >    __lock_acquire+0xd8f/0xe90
> >    lock_acquire+0xe8/0x470
> >    __kernfs_remove+0x25e/0x320
> >    kernfs_remove+0x1d/0x30
> >    kobject_del+0x28/0x60
> >    kmem_cache_destroy+0xf1/0x120
> >    bioset_exit+0xb5/0x100
> >    cleanup_mapped_device+0x26/0xf0 [dm_mod]
> >    free_dev+0x43/0xb0 [dm_mod]
> >    __dm_destroy+0x153/0x1b0 [dm_mod]
> >    dev_remove+0xe4/0x1a0 [dm_mod]
> >    ctl_ioctl+0x1af/0x3f0 [dm_mod]
> >    dm_ctl_ioctl+0xa/0x10 [dm_mod]
> >    do_vfs_ioctl+0xa5/0x760
> >    ksys_ioctl+0x60/0x90
> >    __x64_sys_ioctl+0x16/0x20
> >    do_syscall_64+0x8c/0x240
> >    entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> >
>
> To summarize, the possible scenario is:
>
> - when cache is destroyed:
>         cpu_hotplug_lock
>         -> slab_mutex
>         -> wait until kn->count == 0 (because it removes sysfs objects.)
It does not really wait for kn->count == 0, but for kn->active in kernfs_drain():

        /* but everyone should wait for draining */
        wait_event(root->deactivate_waitq,
                   atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);

>
> - when someone writes to cpu_partial attribute:
>         increase kn->count (increased in kernfs_fop_write_iter(),
>         using kernfs_get_active() )
And yes, it calls kernfs_get_active() to hold/increment kn->active.

>         -> cpu_hotplug_lock
>         -> slab_mutex
>
> ... So there is a circular dependency when using kernfs because
> clearing sysfs stuff in kernfs  makes unexpected dependency. Right?
So it is:

        CPU0                                      CPU1
        ----                                      ----
  cpus_read_lock() in kmem_cache_destroy()
                                                  kn->active++ in kernfs_get_active()
                                                  cpus_read_lock() in cpu_partial_store() -> flush_all()
  wait until kn->active == 0 in kernfs_drain()


>
> I think it's quite unlikely but yeah, seems possible.
Interestingly, it could be easily reproduced on the RT kernel.

>
> > Since cpus_read_lock() is supposed to protect the cpu related data, it
> > makes sense to fix this issue by moving cpus_read_lock() from
> > kmem_cache_destroy() to __kmem_cache_shutdown(). While at it,
> > add the missing cpus_read_lock() in slab_mem_going_offline_callback().
> >
> > Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
> >  mm/slab_common.c | 2 --
> >  mm/slub.c        | 4 ++--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index e5d080a93009..06ec3fa585e6 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -494,7 +494,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >       if (unlikely(!s))
> >               return;
> >
> > -     cpus_read_lock();
> >       mutex_lock(&slab_mutex);
> >
> >       s->refcount--;
> > @@ -509,7 +508,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >       }
> >  out_unlock:
> >       mutex_unlock(&slab_mutex);
> > -     cpus_read_unlock();
> >  }
>
> This code is changing lock order
> from cpu_hotplug_lock -> slab_mutex
> to slab_mutex -> cpu_hotplug_lock.
Right.

>
> >  EXPORT_SYMBOL(kmem_cache_destroy);
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index abe7db581d68..754f020235ee 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -4311,7 +4311,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >       int node;
> >       struct kmem_cache_node *n;
> >
> > -     flush_all_cpus_locked(s);
> > +     flush_all(s);
> >       /* Attempt to free all objects */
> >       for_each_kmem_cache_node(s, node, n) {
> >               free_partial(s, n);
> > @@ -4646,7 +4646,7 @@ static int slab_mem_going_offline_callback(void *arg)
> >
> >       mutex_lock(&slab_mutex);
> >       list_for_each_entry(s, &slab_caches, list) {
> > -             flush_all_cpus_locked(s);
> > +             flush_all(s);
>
> In my opinion, this code is wrong, because it's called when memory
> offlining with cpu_hotplug_lock held. See function offline_pages()
> in mm/memory_hotplug.c for details.
>
> it first holds cpu_hotplug_lock by calling mem_hotplug_begin(),
> and notifies memory_chain. (so slab_mem_going_offline_callback is
> called.)
>
> I think this patch will make another possible deadlock scenario.
>
> in memory hotplugging: cpu_hotplug_lock -> slab_mutex -> cpu_hotplug_lock
> in slab cache destroying: slab_mutex -> cpu_hotplug_lock
Now I understand why cpus_read_lock() was called in kmem_cache_destroy().
I have to think about fixing it in a better way.

Thanks for the review.

>
> Thanks!,
> Hyeonggon.
>
> >               __kmem_cache_do_shrink(s);
> >       }
> >       mutex_unlock(&slab_mutex);
> > --
> > 2.27.0
> >
> >


* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-17  8:32   ` Xin Long
@ 2022-01-17  9:33     ` Sebastian Andrzej Siewior
  2022-01-17 12:40       ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-01-17  9:33 UTC (permalink / raw)
  To: Xin Long
  Cc: Hyeonggon Yoo, LKML, linux-mm, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Antoine Tenart

On 2022-01-17 16:32:46 [+0800], Xin Long wrote:
> another issue. From the code analysis, this issue does exist on the
> upstream kernel, though I couldn't build an upstream RT kernel for the
> testing.

This should also reproduce in v5.16 since the commit in question is
there.

> > >         CPU0                        CPU1
> > >         ----                        ----
> > >   cpus_read_lock()
> > >                                    kn->active++
> > >                                    cpus_read_lock() [a]
> > >   wait until kn->active == 0
> > >
> > > Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> > > lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> > > would be detected:

The cpu_hotplug_lock is a per-CPU RWSEM. The lock in [a] will block if
there is a writer pending.

> > >   ======================================================
> > >   WARNING: possible circular locking dependency detected
> > >   ------------------------------------------------------
> > >   dmsetup/1832 is trying to acquire lock:
> > >   ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30
> > >
> > >   but task is already holding lock:
> > >   ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> > >

I tried to create & destroy a dm-crypt target, which creates/destroys a
cache via bio_put_slab(). Either the callchain is different or something
else is, but I didn't see a lockdep warning.

Sebastian


* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-17  9:33     ` Sebastian Andrzej Siewior
@ 2022-01-17 12:40       ` Vlastimil Babka
  2022-01-17 13:13         ` Juri Lelli
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2022-01-17 12:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Xin Long
  Cc: Hyeonggon Yoo, LKML, linux-mm, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Antoine Tenart,
	Clark Williams

+CC Clark

On 1/17/22 10:33, Sebastian Andrzej Siewior wrote:
> On 2022-01-17 16:32:46 [+0800], Xin Long wrote:
>> another issue. From the code analysis, this issue does exist on the
>> upstream kernel, though I couldn't build an upstream RT kernel for the
>> testing.
> 
> This should also reproduce in v5.16 since the commit in question is
> there.

Yeah. I remember we had some issues with the commit during development, but
I'd hope those were resolved and the commit that's ultimately merged got the
fixes, see this subthread:

https://lore.kernel.org/all/0b36128c-3e12-77df-85fe-a153a714569b@quicinc.com/

>> > >         CPU0                        CPU1
>> > >         ----                        ----
>> > >   cpus_read_lock()
>> > >                                    kn->active++
>> > >                                    cpus_read_lock() [a]
>> > >   wait until kn->active == 0
>> > >
>> > > Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
>> > > lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
>> > > would be detected:
> 
> The cpu_hotplug_lock is a per-CPU RWSEM. The lock in [a] will block if
> there is a writer pending.
> 
>> > >   ======================================================
>> > >   WARNING: possible circular locking dependency detected
>> > >   ------------------------------------------------------
>> > >   dmsetup/1832 is trying to acquire lock:
>> > >   ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30
>> > >
>> > >   but task is already holding lock:
>> > >   ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
>> > >
> 
> I tried to create & destroy a cryptarget which creates/destroy a cache
> via bio_put_slab(). Either the callchain is different or something else
> is but I didn't see a lockdep warning.

RHEL-8 kernel seems to be 4.18, unless RT uses a newer one. Could some
relevant backport be silently missing? How about e.g. 59450bbc12be ("mm,
slab, slub: stop taking cpu hotplug lock")?

> Sebastian



* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-17 12:40       ` Vlastimil Babka
@ 2022-01-17 13:13         ` Juri Lelli
  2022-01-18  8:00           ` Xin Long
  0 siblings, 1 reply; 10+ messages in thread
From: Juri Lelli @ 2022-01-17 13:13 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sebastian Andrzej Siewior, Xin Long, Hyeonggon Yoo, LKML,
	linux-mm, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Antoine Tenart, Clark Williams

Hi,

On 17/01/22 13:40, Vlastimil Babka wrote:
> +CC Clark
> 
> On 1/17/22 10:33, Sebastian Andrzej Siewior wrote:
> > On 2022-01-17 16:32:46 [+0800], Xin Long wrote:
> >> another issue. From the code analysis, this issue does exist on the
> >> upstream kernel, though I couldn't build an upstream RT kernel for the
> >> testing.
> > 
> > This should also reproduce in v5.16 since the commit in question is
> > there.
> 
> Yeah. I remember we had some issues with the commit during development, but
> I'd hope those were resolved and the commit that's ultimately merged got the
> fixes, see this subthread:
> 
> https://lore.kernel.org/all/0b36128c-3e12-77df-85fe-a153a714569b@quicinc.com/
> 
> >> > >         CPU0                        CPU1
> >> > >         ----                        ----
> >> > >   cpus_read_lock()
> >> > >                                    kn->active++
> >> > >                                    cpus_read_lock() [a]
> >> > >   wait until kn->active == 0
> >> > >
> >> > > Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> >> > > lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> >> > > would be detected:
> > 
> > The cpu_hotplug_lock is a per-CPU RWSEM. The lock in [a] will block if
> > there is a writer pending.
> > 
> >> > >   ======================================================
> >> > >   WARNING: possible circular locking dependency detected
> >> > >   ------------------------------------------------------
> >> > >   dmsetup/1832 is trying to acquire lock:
> >> > >   ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30
> >> > >
> >> > >   but task is already holding lock:
> >> > >   ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> >> > >
> > 
> > I tried to create & destroy a cryptarget which creates/destroy a cache
> > via bio_put_slab(). Either the callchain is different or something else
> > is but I didn't see a lockdep warning.
> 
> RHEL-8 kernel seems to be 4.18, unless RT uses a newer one. Could be some
> silently relevant backport is missing? How about e.g. 59450bbc12be ("mm,
> slab, slub: stop taking cpu hotplug lock") ?

Hummm, looks like we have backported commit 59450bbc12be in RHEL-8.

Xin Long, would you be able to check if you still see the lockdep splat
with latest upstream RT?

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-5.16.y-rt

Thanks!
Juri


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-17 13:13         ` Juri Lelli
@ 2022-01-18  8:00           ` Xin Long
  2022-01-18 11:07             ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Xin Long @ 2022-01-18  8:00 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Vlastimil Babka, Sebastian Andrzej Siewior, Hyeonggon Yoo, LKML,
	linux-mm, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Antoine Tenart, Clark Williams

On Mon, Jan 17, 2022 at 9:13 PM Juri Lelli <juri.lelli@redhat.com> wrote:
>
> Hi,
>
> On 17/01/22 13:40, Vlastimil Babka wrote:
> > +CC Clark
> >
> > On 1/17/22 10:33, Sebastian Andrzej Siewior wrote:
> > > On 2022-01-17 16:32:46 [+0800], Xin Long wrote:
> > >> another issue. From the code analysis, this issue does exist on the
> > >> upstream kernel, though I couldn't build an upstream RT kernel for the
> > >> testing.
> > >
> > > This should also reproduce in v5.16 since the commit in question is
> > > there.
> >
> > Yeah. I remember we had some issues with the commit during development, but
> > I'd hope those were resolved and the commit that's ultimately merged got the
> > fixes, see this subthread:
> >
> > https://lore.kernel.org/all/0b36128c-3e12-77df-85fe-a153a714569b@quicinc.com/
> >
> > >> > >         CPU0                        CPU1
> > >> > >         ----                        ----
> > >> > >   cpus_read_lock()
> > >> > >                                    kn->active++
> > >> > >                                    cpus_read_lock() [a]
> > >> > >   wait until kn->active == 0
> > >> > >
> > >> > > Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> > >> > > lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> > >> > > would be detected:
> > >
> > > The cpu_hotplug_lock is a per-CPU RWSEM. The lock in [a] will block if
> > > there is a writer pending.
> > >
> > >> > >   ======================================================
> > >> > >   WARNING: possible circular locking dependency detected
> > >> > >   ------------------------------------------------------
> > >> > >   dmsetup/1832 is trying to acquire lock:
> > >> > >   ffff986f5a0f9f20 (kn->count#144){++++}-{0:0}, at: kernfs_remove+0x1d/0x30
> > >> > >
> > >> > >   but task is already holding lock:
> > >> > >   ffffffffa43817c0 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_destroy+0x2a/0x120
> > >> > >
> > >
> > > I tried to create & destroy a cryptarget which creates/destroy a cache
> > > via bio_put_slab(). Either the callchain is different or something else
> > > is but I didn't see a lockdep warning.
> >
> > RHEL-8 kernel seems to be 4.18, unless RT uses a newer one. Could be some
> > silently relevant backport is missing? How about e.g. 59450bbc12be ("mm,
> > slab, slub: stop taking cpu hotplug lock") ?
>
> Hummm, looks like we have backported commit 59450bbc12be in RHEL-8.
>
> Xin Long, would you be able to check if you still see the lockdep splat
> with latest upstream RT?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-5.16.y-rt
Hi, Juri,

Thanks for sharing the RT kernel repo.

I just tried this kernel, and I couldn't reproduce it in my environment.
But I don't see how the upstream RT kernel could avoid the call trace.

As the warning was triggered while the system was shutting down, it might
simply not reproduce there due to some timing change.

>
> Thanks!
> Juri
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-18  8:00           ` Xin Long
@ 2022-01-18 11:07             ` Vlastimil Babka
  2022-01-19  8:38               ` Xin Long
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2022-01-18 11:07 UTC (permalink / raw)
  To: Xin Long, Juri Lelli
  Cc: Sebastian Andrzej Siewior, Hyeonggon Yoo, LKML, linux-mm,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Antoine Tenart, Clark Williams


On 1/18/22 09:00, Xin Long wrote:
> On Mon, Jan 17, 2022 at 9:13 PM Juri Lelli <juri.lelli@redhat.com> wrote:
>> >
>> > RHEL-8 kernel seems to be 4.18, unless RT uses a newer one. Could be some
>> > silently relevant backport is missing? How about e.g. 59450bbc12be ("mm,
>> > slab, slub: stop taking cpu hotplug lock") ?
>>
>> Hummm, looks like we have backported commit 59450bbc12be in RHEL-8.
>>
>> Xin Long, would you be able to check if you still see the lockdep splat
>> with latest upstream RT?
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-5.16.y-rt
> Hi, Juri,
> 
> Thanks for sharing the RT kernel repo.
> 
> I just tried with this kernel, and I couldn't reproduce it on my env.
> But I don't see how the upstream RT kernel can avoid the call trace.
> 
> As this warning was triggered when the system was shutting down, it might
> not be reproduced on it due to some timing change.

As it was caught by lockdep and is not a real deadlock, I think it should be
independent of any timing change. AFAIK, lockdep correlates potential deadlock
scenarios even if they never actually occur at the same time.
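The timing-independence Vlastimil describes can be illustrated with a toy model (hypothetical Python, not kernel code; `MiniLockdep` and its methods are invented names): lockdep records, per lock *class*, which classes were acquired while it was held, and warns on any cycle in that dependency graph, so the two tasks never need to overlap in real time:

```python
from collections import defaultdict

class MiniLockdep:
    def __init__(self):
        self.edges = defaultdict(set)  # lock class -> classes acquired while it was held
        self.held = []                 # lock classes held by the "current task"

    def acquire(self, lock_class):
        """Record dependencies; return True if a potential deadlock is found."""
        for outer in self.held:
            self.edges[outer].add(lock_class)
        cycle = self._reaches(lock_class, set(self.held))
        self.held.append(lock_class)
        return cycle

    def release(self, lock_class):
        self.held.remove(lock_class)

    def _reaches(self, start, targets):
        # DFS over recorded edges: acquiring `start` closes a cycle if some
        # dependency path leads back to a class the task already holds.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in targets:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False

ld = MiniLockdep()
# Task 1 (e.g. kernfs_fop_write): kn->active held, then cpus_read_lock()
assert ld.acquire("kn->active") is False
assert ld.acquire("cpu_hotplug_lock") is False
ld.release("cpu_hotplug_lock"); ld.release("kn->active")
# Task 2, at any later time (kmem_cache_destroy -> kernfs_remove):
assert ld.acquire("cpu_hotplug_lock") is False
warned = ld.acquire("kn->active")   # inverse order -> cycle detected
```

The two acquisition sequences happen at completely different times, yet the second one still trips the cycle check, which matches why the splat appears even without a real hang.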

But let's go back to:

> Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> would be detected:

Is it possible that upstream lockdep handles this RWSEM scenario properly
and doesn't report it, but the RHEL kernel is missing some relevant lockdep fix?

>>
>> Thanks!
>> Juri
>>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-18 11:07             ` Vlastimil Babka
@ 2022-01-19  8:38               ` Xin Long
  0 siblings, 0 replies; 10+ messages in thread
From: Xin Long @ 2022-01-19  8:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Juri Lelli, Sebastian Andrzej Siewior, Hyeonggon Yoo, LKML,
	linux-mm, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Antoine Tenart, Clark Williams

On Tue, Jan 18, 2022 at 7:07 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
>
> On 1/18/22 09:00, Xin Long wrote:
> > On Mon, Jan 17, 2022 at 9:13 PM Juri Lelli <juri.lelli@redhat.com> wrote:
> >> >
> >> > RHEL-8 kernel seems to be 4.18, unless RT uses a newer one. Could be some
> >> > silently relevant backport is missing? How about e.g. 59450bbc12be ("mm,
> >> > slab, slub: stop taking cpu hotplug lock") ?
> >>
> >> Hummm, looks like we have backported commit 59450bbc12be in RHEL-8.
> >>
> >> Xin Long, would you be able to check if you still see the lockdep splat
> >> with latest upstream RT?
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git linux-5.16.y-rt
> > Hi, Juri,
> >
> > Thanks for sharing the RT kernel repo.
> >
> > I just tried with this kernel, and I couldn't reproduce it on my env.
> > But I don't see how the upstream RT kernel can avoid the call trace.
> >
> > As this warning was triggered when the system was shutting down, it might
> > not be reproduced on it due to some timing change.
>
> As it was caught by lockdep and not as a real deadlock, I think it should be
> indepenedent of a timing change. Lockdep will correlate potentially deadlock
> scenarios even if they don't really occur in the same time, AFAIK.
>
> But let's go back to:
>
> > Although cpu_hotplug_lock is a RWSEM, [a] will not block in there. But as
> > lockdep annotations are added for cpu_hotplug_lock, a deadlock warning
> > would be detected:
>
> Is it possible that upstream lockdep handles this RWSEM scenario properly
> and doesn't report it, but the RHEL kernel is missing some relevant lockdep fix?
That's a good point.

I actually think:
  cpus_read_lock()
  cpus_read_lock()
shouldn't be considered a deadlock.
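That said, Sebastian's earlier remark is why lockdep flags it: cpu_hotplug_lock is a per-CPU RWSEM with writer fairness, so the nested read in [a] does block once a writer is pending. A minimal sketch of that fairness rule (illustrative Python; `FairRwsem` and `try_read_lock` are invented names, not kernel API):

```python
class FairRwsem:
    """Toy model of a writer-fair reader-writer lock."""
    def __init__(self):
        self.readers = 0
        self.writer_pending = False

    def try_read_lock(self):
        # A new read acquisition must wait while a writer is queued,
        # otherwise the writer could be starved forever.
        if self.writer_pending:
            return False            # the caller would block here
        self.readers += 1
        return True

sem = FairRwsem()
assert sem.try_read_lock() is True  # outer cpus_read_lock() succeeds
sem.writer_pending = True           # a CPU hotplug operation queues as writer
blocked = not sem.try_read_lock()   # the nested read in [a] would block
```

So pure read-in-read recursion is harmless only as long as no writer can queue up in between; with a pending writer plus the kn->active wait, the cycle becomes real.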

I will check the lockdep changes, and it may take some time.

Thanks.

>
> >>
> >> Thanks!
> >> Juri
> >>
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy
  2022-01-14 14:23 [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy Xin Long
  2022-01-16  6:35 ` Hyeonggon Yoo
@ 2022-06-01 14:36 ` Maurizio Lombardi
  1 sibling, 0 replies; 10+ messages in thread
From: Maurizio Lombardi @ 2022-06-01 14:36 UTC (permalink / raw)
  To: Xin Long
  Cc: linux-kernel, linux-mm, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Sebastian Andrzej Siewior, Antoine Tenart

pá 14. 1. 2022 v 15:23 odesílatel Xin Long <lucien.xin@gmail.com> napsal:
>
> cpus_read_lock() is introduced into kmem_cache_destroy() by
> commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
> __free_slab() invocations out of IRQ context"), and it could cause
> a deadlock.

FYI,

I received a bug report from one of our customers: his system (with
NVMe/FC boot from SAN) hangs when rebooting. He runs a RHEL-9 kernel
based on version 5.14.0.

What is interesting is that, according to him, after reverting commit
5a836bf6b09f
("mm: slub: move flush_cpu_slab() invocations __free_slab()
invocations out of IRQ context")
the reboot operation doesn't hang anymore.

The call trace seems to point to a problem caused by the nvme-delete-wq
workqueue being allocated with the WQ_MEM_RECLAIM flag set: its
nvme_delete_ctrl_work ends up flushing flush_cpu_slab, which runs on the
!WQ_MEM_RECLAIM events workqueue.

[  453.012078] ------------[ cut here ]------------
[  453.016744] workqueue: WQ_MEM_RECLAIM
nvme-delete-wq:nvme_delete_ctrl_work [nvme_core] is flushing
!WQ_MEM_RECLAIM events:flush_cpu_slab
[  453.016789] WARNING: CPU: 37 PID: 410 at kernel/workqueue.c:2637
check_flush_dependency+0x10a/0x120
[...]
[  453.262125] Call Trace:
[  453.264582]  __flush_work.isra.0+0xbf/0x220
[  453.268775]  ? __queue_work+0x1dc/0x420
[  453.272623]  flush_all_cpus_locked+0xfb/0x120
[  453.276992]  __kmem_cache_shutdown+0x2b/0x320
[  453.281361]  kmem_cache_destroy+0x49/0x100
[  453.285465]  bioset_exit+0x143/0x190
[  453.289052]  blk_release_queue+0xb9/0x100
[  453.293075]  kobject_cleanup+0x37/0x130
[  453.296922]  nvme_fc_ctrl_free+0xc6/0x150 [nvme_fc]
[  453.302397]  nvme_free_ctrl+0x1ac/0x2b0 [nvme_core]
[  453.307818]  device_release+0x31/0x90
[  453.312005]  kobject_cleanup+0x37/0x130
[  453.316369]  process_one_work+0x1e5/0x3c0
[  453.320895]  worker_thread+0x50/0x3b0
[  453.325074]  ? rescuer_thread+0x370/0x370
[  453.329592]  kthread+0x146/0x170
[  453.333322]  ? set_kthread_struct+0x40/0x40
[  453.338027]  ret_from_fork+0x1f/0x30
[  453.342082] ---[ end trace 8c9cdd85adbbfc4f ]---
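For context, the check_flush_dependency() rule that fires at kernel/workqueue.c:2637 above can be sketched roughly as follows (illustrative Python; the flag value and function shape are assumptions, see include/linux/workqueue.h for the real definitions). A WQ_MEM_RECLAIM workqueue has a rescuer thread and is guaranteed forward progress under memory pressure, so it must never wait on work from a non-reclaim workqueue that may itself be stuck allocating memory:

```python
WQ_MEM_RECLAIM = 1 << 3  # illustrative flag value, mirroring workqueue.h

def check_flush_dependency(current_wq_flags, target_wq_flags):
    """Return a warning string if the flush is unsafe, else None."""
    if (current_wq_flags & WQ_MEM_RECLAIM) and not (target_wq_flags & WQ_MEM_RECLAIM):
        # Reclaim-capable work waiting on non-reclaim work can deadlock
        # under memory pressure, exactly the dependency warned about above.
        return "workqueue: WQ_MEM_RECLAIM wq is flushing !WQ_MEM_RECLAIM wq"
    return None

# nvme-delete-wq (WQ_MEM_RECLAIM) flushing flush_cpu_slab on events (no flag):
warning = check_flush_dependency(WQ_MEM_RECLAIM, 0)
```

In the trace above, flush_all_cpus_locked() queues flush_cpu_slab on the plain events workqueue, which is exactly the unsafe direction this check rejects.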

Maurizio Lombardi


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-06-01 14:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-14 14:23 [PATCH] mm: slub: fix a deadlock warning in kmem_cache_destroy Xin Long
2022-01-16  6:35 ` Hyeonggon Yoo
2022-01-17  8:32   ` Xin Long
2022-01-17  9:33     ` Sebastian Andrzej Siewior
2022-01-17 12:40       ` Vlastimil Babka
2022-01-17 13:13         ` Juri Lelli
2022-01-18  8:00           ` Xin Long
2022-01-18 11:07             ` Vlastimil Babka
2022-01-19  8:38               ` Xin Long
2022-06-01 14:36 ` Maurizio Lombardi
