All of lore.kernel.org
 help / color / mirror / Atom feed
* a deadlock bug in the kernel-side device mapper code
@ 2009-11-05 13:21 guy keren
  2009-11-05 14:24 ` Alasdair G Kergon
  2009-11-06  0:24 ` Kiyoshi Ueda
  0 siblings, 2 replies; 13+ messages in thread
From: guy keren @ 2009-11-05 13:21 UTC (permalink / raw)
  To: device-mapper development


Hi,

we encountered a deadlock inside the kernel part of the device-mapper 
code. it was found in a CentOS 5.3 system's kernel - but from looking at 
the code of kernel 2.6.31 - the same bug is still in there.

below is the stack trace of the self-deadlocking code. this is one of 
the threads of multipathd, that attempts to remove a dm device using a 
ioctl to the dm driver:

crash> bt 22619
PID: 22619  TASK: ffff8106521247e0  CPU: 3   COMMAND: "multipathd"
  #0 [ffff8106298dfb48] schedule at ffffffff80063035
  #1 [ffff8106298dfc20] __down_read at ffffffff8006475d
  #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740
  #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685
  #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678
  #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01
  #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad
  #7 [ffff8106298dfd30] dev_remove at ffffffff88250865
  #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80
  #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4
#10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9
#11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf
#12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call)
     RIP: 00000039deecbb47  RSP: 0000000041e35bb8  RFLAGS: 00000246
     RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
     RDX: 000000001b9a7ac0  RSI: 00000000c138fd04  RDI: 0000000000000007
     RBP: 0000000000000000   R8: 00000039df211e45   R9: 000000001b9a7af0
     R10: 00000039df211d59  R11: 0000000000000246  R12: 00000039df211e23
     R13: 0000000000000000  R14: 00000039df211d59  R15: 0000000000000000
     ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

(note: the crash was taken using kdump).

the problem appears to be that the function dm_remove in file 
drivers/md/dm-ioctl.c is locking the _hash_lock rw semaphore for write 
(down_write(&_hash_lock);), and then later in the call chain, the 
function dm_copy_name_and_uuid (in the same source file) attempts to 
lock the same semaphore for read. since the semaphore is not recursive - 
there is a deadlock. naturally, when this happens, any command trying to 
access those data structures (dmsetup, multipath, etc) block as well.

if my analysis is correct - is there any idea on how to go about fixing 
this? i can see several diffeernt paths - one is to store the data 
required by dm_copy_name_and_uuid in a location that won't require 
locking - or altenatively, have a dual version of the relevant functions 
- one to be invoked when there lock is not held, and one to be invoked 
when the lock is held.

note: we've encountered this deadlock twice in the past week - no idea 
if we saw it in the past or not.

thanks,
--guy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: a deadlock bug in the kernel-side device mapper code
  2009-11-05 13:21 a deadlock bug in the kernel-side device mapper code guy keren
@ 2009-11-05 14:24 ` Alasdair G Kergon
  2009-11-06  2:58   ` [PATCH] " Mikulas Patocka
  2009-11-06  0:24 ` Kiyoshi Ueda
  1 sibling, 1 reply; 13+ messages in thread
From: Alasdair G Kergon @ 2009-11-05 14:24 UTC (permalink / raw)
  To: guy keren; +Cc: device-mapper development

On Thu, Nov 05, 2009 at 03:21:58PM +0200, guy keren wrote:
> below is the stack trace of the self-deadlocking code. this is one of  
> the threads of multipathd, that attempts to remove a dm device using a  
> ioctl to the dm driver:

Yes, thanks for reporting this.

This only happens if there are some uevents still waiting to be sent
when the device is being destroyed, but it does need to be fixed.

Alasdair

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: a deadlock bug in the kernel-side device mapper code
  2009-11-05 13:21 a deadlock bug in the kernel-side device mapper code guy keren
  2009-11-05 14:24 ` Alasdair G Kergon
@ 2009-11-06  0:24 ` Kiyoshi Ueda
  1 sibling, 0 replies; 13+ messages in thread
From: Kiyoshi Ueda @ 2009-11-06  0:24 UTC (permalink / raw)
  To: device-mapper development

Hi,

On 11/05/2009 10:21 PM +0900, guy keren wrote:
> 
> Hi,
> 
> we encountered a deadlock inside the kernel part of the device-mapper
> code. it was found in a CentOS 5.3 system's kernel - but from looking at
> the code of kernel 2.6.31 - the same bug is still in there.
> 
> below is the stack trace of the self-deadlocking code. this is one of
> the threads of multipathd, that attempts to remove a dm device using a
> ioctl to the dm driver:
> 
> crash> bt 22619
> PID: 22619  TASK: ffff8106521247e0  CPU: 3   COMMAND: "multipathd"
>  #0 [ffff8106298dfb48] schedule at ffffffff80063035
>  #1 [ffff8106298dfc20] __down_read at ffffffff8006475d
>  #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740
>  #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685
>  #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678
>  #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01
>  #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad
>  #7 [ffff8106298dfd30] dev_remove at ffffffff88250865
>  #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80
>  #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4
> #10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9
> #11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf
> #12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call)
>     RIP: 00000039deecbb47  RSP: 0000000041e35bb8  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
>     RDX: 000000001b9a7ac0  RSI: 00000000c138fd04  RDI: 0000000000000007
>     RBP: 0000000000000000   R8: 00000039df211e45   R9: 000000001b9a7af0
>     R10: 00000039df211d59  R11: 0000000000000246  R12: 00000039df211e23
>     R13: 0000000000000000  R14: 00000039df211d59  R15: 0000000000000000
>     ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
> 
> (note: the crash was taken using kdump).
> 
> the problem appears to be that the function dm_remove in file
> drivers/md/dm-ioctl.c is locking the _hash_lock rw semaphore for write
> (down_write(&_hash_lock);), and then later in the call chain, the
> function dm_copy_name_and_uuid (in the same source file) attempts to
> lock the same semaphore for read. since the semaphore is not recursive -
> there is a deadlock. naturally, when this happens, any command trying to
> access those data structures (dmsetup, multipath, etc) block as well.

Right, it's a known problem, and it has not been fixed yet.


> note: we've encountered this deadlock twice in the past week - no idea
> if we saw it in the past or not.

This one has been there since the commit below:
---------------------------------------------------------------------
    commit 7a8c3d3b92883798e4ead21dd48c16db0ec0ff6f
    Author: Mike Anderson <andmike@linux.vnet.ibm.com>
    Date:   Fri Oct 19 22:48:01 2007 +0100

        dm: uevent generate events

        This patch adds support for the dm_path_event dm_send_event
        functions which create and send udev events.
---------------------------------------------------------------------

See below for details:
    http://marc.info/?l=dm-devel&m=125412382315993&w=2

Thanks,
Kiyoshi Ueda

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-05 14:24 ` Alasdair G Kergon
@ 2009-11-06  2:58   ` Mikulas Patocka
       [not found]     ` <20091106151504.GS13375@agk-dp.fab.redhat.com>
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Mikulas Patocka @ 2009-11-06  2:58 UTC (permalink / raw)
  To: device-mapper development; +Cc: Alasdair G Kergon



On Thu, 5 Nov 2009, Alasdair G Kergon wrote:

> On Thu, Nov 05, 2009 at 03:21:58PM +0200, guy keren wrote:
> > below is the stack trace of the self-deadlocking code. this is one of  
> > the threads of multipathd, that attempts to remove a dm device using a  
> > ioctl to the dm driver:
> 
> Yes, thanks for reporting this.
> 
> This only happens if there are some uevents still waiting to be sent
> when the device is being destroyed, but it does need to be fixed.
> 
> Alasdair
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

Hi

This is the patch that uses two locks to avoid the deadlock.

Mikulas

---
 
Fix deadlock due to _hash_lock recursion

This patch fixes the following deadlock:
 #0 [ffff8106298dfb48] schedule at ffffffff80063035
 #1 [ffff8106298dfc20] __down_read at ffffffff8006475d
 #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740
 #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685
 #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678
 #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01
 #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad
 #7 [ffff8106298dfd30] dev_remove at ffffffff88250865
 #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80
 #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4
#10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9
#11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf
#12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call)

_hash_lock is taken in dev_remove and then again in dm_copy_name_and_uuid.

This patch introduces a new lock, _name_read_lock, that is placed around
regions that modify pointer to the hash with dm_set_mdptr or that modify
hc->name or hc->uuid. When this lock is taken, the caller can safely read
the name and uuid.

This lock has much smaller span than _hash_lock and thus lock recursion
can't happen.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-ioctl.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Index: linux-2.6.31.5-fast/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.31.5-fast.orig/drivers/md/dm-ioctl.c	2009-10-28 19:49:13.000000000 +0100
+++ linux-2.6.31.5-fast/drivers/md/dm-ioctl.c	2009-11-06 03:46:04.000000000 +0100
@@ -56,6 +56,13 @@ static void dm_hash_remove_all(int keep_
  */
 static DECLARE_RWSEM(_hash_lock);
 
+/*
+ * Enables calling dm_get_mdptr and reading name and uuid from the hash table.
+ * This may be called from dm events when _hash_lock is held, so a separate
+ * lock is needed to avoid deadlock.
+ */
+static DEFINE_MUTEX(_name_read_lock);
+
 static void init_buckets(struct list_head *buckets)
 {
 	unsigned int i;
@@ -206,7 +213,9 @@ static int dm_hash_insert(const char *na
 		list_add(&cell->uuid_list, _uuid_buckets + hash_str(uuid));
 	}
 	dm_get(md);
+	mutex_lock(&_name_read_lock);
 	dm_set_mdptr(md, cell);
+	mutex_unlock(&_name_read_lock);
 	up_write(&_hash_lock);
 
 	return 0;
@@ -224,7 +233,9 @@ static void __hash_remove(struct hash_ce
 	/* remove from the dev hash */
 	list_del(&hc->uuid_list);
 	list_del(&hc->name_list);
+	mutex_lock(&_name_read_lock);
 	dm_set_mdptr(hc->md, NULL);
+	mutex_unlock(&_name_read_lock);
 
 	table = dm_get_table(hc->md);
 	if (table) {
@@ -321,7 +332,9 @@ static int dm_hash_rename(uint32_t cooki
 	 */
 	list_del(&hc->name_list);
 	old_name = hc->name;
+	mutex_lock(&_name_read_lock);
 	hc->name = new_name;
+	mutex_unlock(&_name_read_lock);
 	list_add(&hc->name_list, _name_buckets + hash_str(new_name));
 
 	/*
@@ -1583,7 +1596,7 @@ int dm_copy_name_and_uuid(struct mapped_
 		return -ENXIO;
 
 	dm_get(md);
-	down_read(&_hash_lock);
+	mutex_lock(&_name_read_lock);
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
 		r = -ENXIO;
@@ -1596,7 +1609,7 @@ int dm_copy_name_and_uuid(struct mapped_
 		strcpy(uuid, hc->uuid ? : "");
 
 out:
-	up_read(&_hash_lock);
+	mutex_unlock(&_name_read_lock);
 	dm_put(md);
 
 	return r;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
       [not found]     ` <20091106151504.GS13375@agk-dp.fab.redhat.com>
@ 2009-11-06 16:30       ` Mikulas Patocka
  0 siblings, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2009-11-06 16:30 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

On Fri, 6 Nov 2009, Alasdair G Kergon wrote:

> On Thu, Nov 05, 2009 at 09:58:26PM -0500, Mikulas Patocka wrote:
> > +static DEFINE_MUTEX(_name_read_lock);
> 
> Any reason for a mutex rather than a spinlock?

There can be both. I basically use the rule "if there can be either 
spinlock or mutex, use mutex". Because mutexes don't create scheduling 
latency --- i.e. you don't have to check them "how much time can it spend 
inside a mutex and how to break a long lock into few smaller locks".

Performance of mutex and spinlock in non-contended case is the same.

Mikulas

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-06  2:58   ` [PATCH] " Mikulas Patocka
       [not found]     ` <20091106151504.GS13375@agk-dp.fab.redhat.com>
@ 2009-11-06 17:27     ` malahal
  2009-11-09  8:51     ` Mike Anderson
  2 siblings, 0 replies; 13+ messages in thread
From: malahal @ 2009-11-06 17:27 UTC (permalink / raw)
  To: dm-devel

Mikulas Patocka [mpatocka@redhat.com] wrote:
> This patch introduces a new lock, _name_read_lock, that is placed around
> regions that modify pointer to the hash with dm_set_mdptr or that modify
> hc->name or hc->uuid. When this lock is taken, the caller can safely read
> the name and uuid.
> 
> This lock has much smaller span than _hash_lock and thus lock recursion
> can't happen.

This lock actually guarantees that hash cell, name (and uuid) don't
disappear when someone holds this lock. How about naming it to something
like _hashcell_name_uuid_lock()???

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-06  2:58   ` [PATCH] " Mikulas Patocka
       [not found]     ` <20091106151504.GS13375@agk-dp.fab.redhat.com>
  2009-11-06 17:27     ` malahal
@ 2009-11-09  8:51     ` Mike Anderson
  2009-11-09 17:57       ` malahal
  2009-11-10  6:13       ` Mikulas Patocka
  2 siblings, 2 replies; 13+ messages in thread
From: Mike Anderson @ 2009-11-09  8:51 UTC (permalink / raw)
  To: device-mapper development; +Cc: Alasdair G Kergon

Mikulas Patocka <mpatocka@redhat.com> wrote:
> Hi
> 
> This is the patch that uses two locks to avoid the deadlock.

Thanks for doing the patch. 

I had previously started trying to address this issue using rcu and moving
dm_copy_name_and_uuid back to being called during dm_build_path_uevent, but
that patch still had a couple of cases to be addressed.

In testing your patch without moving where dm_copy_name_and_uuid is called
I run into a issue during test runs where I receive a BUG_ON for the
dm_put in dm_copy_name_and_uuid as DMF_FREEING was able to progress (Note:
this failure case occurs without your path). If the proper dm_get / dm_put
is added to the dm_uevent functions then there are cases where
dm_uevent_free becomes the last dm_put resulting in recursion.

It would be good since we are adding this synchronization if we selected a
synchronization type that could be called from dm_build_path_uevent (i.e.,
SOFTIRQ-safe) allowing the movement of the call to dm_copy_name_and_uuid
back to dm_build_path_uevent.

The test case below normally fails in about 5-10 minutes.

I am running the test case using a spinlock instead of the mutex and
moving dm_copy_name_and_uuid to being called from dm_build_path_uevent. It
has been running for a few hours now. I will continue to let it run.

Should we look to use a spinlock for this read access?

My test case just uses scsi debug to create a two path dm mpath device.

1.) modprobe scsi_debug vpd_use_hostno=0 add_host=2
2.) Then in one shell do a loop of "dmsetup remove" and multipath
3.) In another window do a loop of "dmsetup message ... fail_path"
followed by "dmsetup message ... reinstate_path" on the two paths of the
same dm device that is being removed / added.

Note: If someone tries to repeat this testing, occasionally I would hit an
issue in scsi_debug so for longer test runs I needed to add a patch for
handling ensuring that reacquiring queued_arr_lock did not occur.

Thanks,

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-09  8:51     ` Mike Anderson
@ 2009-11-09 17:57       ` malahal
  2009-11-09 22:24         ` malahal
  2009-11-10  0:24         ` Alasdair G Kergon
  2009-11-10  6:13       ` Mikulas Patocka
  1 sibling, 2 replies; 13+ messages in thread
From: malahal @ 2009-11-09 17:57 UTC (permalink / raw)
  To: dm-devel

Mike Anderson [andmike@linux.vnet.ibm.com] wrote:
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> > Hi
> > 
> > This is the patch that uses two locks to avoid the deadlock.
> 
> Thanks for doing the patch. 
> 
> I had previously started trying to address this issue using rcu and moving
> dm_copy_name_and_uuid back to being called during dm_build_path_uevent, but
> that patch still had a couple of cases to be addressed.
> 
> In testing your patch without moving where dm_copy_name_and_uuid is called
> I run into a issue during test runs where I receive a BUG_ON for the
> dm_put in dm_copy_name_and_uuid as DMF_FREEING was able to progress (Note:
> this failure case occurs without your path). If the proper dm_get / dm_put
> is added to the dm_uevent functions then there are cases where
> dm_uevent_free becomes the last dm_put resulting in recursion.

Why do we need dm_get() and dm_put() in dm_copy_name_and_uuid()?
dm_copy_name_and_uuid() already has access to md and there shouldn't be
any need to hold a reference. The caller of dm_copy_name_and_uuid()
should have placed a hold. This is just some unnecessary code and should
not cause a BUG_ON though.

Can you send the BUG_ON stack?

dm_get_from_kobject() seems to be a culprit though. It checks
DMF_FREEING without a lock and then calls dm_get. Here is an untested
patch. Made on top of _name_read_lock patch.


Signed-off-by: malahal@us.ibm.com

diff -r ce6e81a22554 -r 1106e5e3dabc drivers/md/dm-ioctl.c
--- a/drivers/md/dm-ioctl.c	Thu Nov 05 21:35:30 2009 -0800
+++ b/drivers/md/dm-ioctl.c	Mon Nov 09 09:32:03 2009 -0800
@@ -1595,7 +1595,6 @@
 	if (!md)
 		return -ENXIO;
 
-	dm_get(md);
 	mutex_lock(&_name_read_lock);
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
@@ -1610,7 +1609,6 @@
 
 out:
 	mutex_unlock(&_name_read_lock);
-	dm_put(md);
 
 	return r;
 }
diff -r ce6e81a22554 -r 1106e5e3dabc drivers/md/dm.c
--- a/drivers/md/dm.c	Thu Nov 05 21:35:30 2009 -0800
+++ b/drivers/md/dm.c	Mon Nov 09 09:32:03 2009 -0800
@@ -2584,11 +2584,18 @@
 	if (&md->kobj != kobj)
 		return NULL;
 
+	spin_lock(&_minor_lock);
 	if (test_bit(DMF_FREEING, &md->flags) ||
-	    test_bit(DMF_DELETING, &md->flags))
-		return NULL;
+	    test_bit(DMF_DELETING, &md->flags)) {
+		md = NULL;
+		goto out;
+	}
 
 	dm_get(md);
+
+out:
+	spin_unlock(&_minor_lock);
+
 	return md;
 }
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-09 17:57       ` malahal
@ 2009-11-09 22:24         ` malahal
  2009-11-10  0:24         ` Alasdair G Kergon
  1 sibling, 0 replies; 13+ messages in thread
From: malahal @ 2009-11-09 22:24 UTC (permalink / raw)
  To: dm-devel

malahal@us.ibm.com [malahal@us.ibm.com] wrote:
> dm_get_from_kobject() seems to be a culprit though. It checks
> DMF_FREEING without a lock and then calls dm_get. Here is an untested
> patch. Made on top of _name_read_lock patch.

Found another place where the lock is not held correctly. This is an
update to my previous patch [dmf_freeing_lock.patch]

Description:

There are places where the code checks for DMF_FREEING bit flag and
conditionally calls dm_get(). This should be done under _minor_lock,
otherwise there is no point in checking for the bit flag.

Signed-off-by: malahal@us.ibm.com

diff -r e4c5c66b9a17 drivers/md/dm-ioctl.c
--- a/drivers/md/dm-ioctl.c	Mon Nov 09 13:38:38 2009 -0800
+++ b/drivers/md/dm-ioctl.c	Mon Nov 09 14:21:13 2009 -0800
@@ -1595,7 +1595,6 @@
 	if (!md)
 		return -ENXIO;
 
-	dm_get(md);
 	mutex_lock(&_name_read_lock);
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
@@ -1610,7 +1609,6 @@
 
 out:
 	mutex_unlock(&_name_read_lock);
-	dm_put(md);
 
 	return r;
 }
diff -r e4c5c66b9a17 drivers/md/dm.c
--- a/drivers/md/dm.c	Mon Nov 09 13:38:38 2009 -0800
+++ b/drivers/md/dm.c	Mon Nov 09 14:21:13 2009 -0800
@@ -1998,28 +1998,24 @@
 	if (MAJOR(dev) != _major || minor >= (1 << MINORBITS))
 		return NULL;
 
-	spin_lock(&_minor_lock);
-
 	md = idr_find(&_minor_idr, minor);
 	if (md && (md == MINOR_ALLOCED ||
 		   (MINOR(disk_devt(dm_disk(md))) != minor) ||
-		   test_bit(DMF_FREEING, &md->flags))) {
-		md = NULL;
-		goto out;
-	}
-
-out:
-	spin_unlock(&_minor_lock);
+		   test_bit(DMF_FREEING, &md->flags)))
+		return NULL;
 
 	return md;
 }
 
 struct mapped_device *dm_get_md(dev_t dev)
 {
-	struct mapped_device *md = dm_find_md(dev);
+	struct mapped_device *md;
 
+	spin_lock(&_minor_lock);
+	md = dm_find_md(dev);
 	if (md)
 		dm_get(md);
+	spin_unlock(&_minor_lock);
 
 	return md;
 }
@@ -2584,11 +2580,17 @@
 	if (&md->kobj != kobj)
 		return NULL;
 
+	spin_lock(&_minor_lock);
+
 	if (test_bit(DMF_FREEING, &md->flags) ||
 	    test_bit(DMF_DELETING, &md->flags))
-		return NULL;
+		md = NULL;
 
-	dm_get(md);
+	if (md)
+		dm_get(md);
+
+	spin_unlock(&_minor_lock);
+
 	return md;
 }
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-09 17:57       ` malahal
  2009-11-09 22:24         ` malahal
@ 2009-11-10  0:24         ` Alasdair G Kergon
  2009-11-10  1:50           ` malahal
  1 sibling, 1 reply; 13+ messages in thread
From: Alasdair G Kergon @ 2009-11-10  0:24 UTC (permalink / raw)
  To: dm-devel

On Mon, Nov 09, 2009 at 09:57:30AM -0800, malahal@us.ibm.com wrote:
> Why do we need dm_get() and dm_put() in dm_copy_name_and_uuid()?
> dm_copy_name_and_uuid() already has access to md and there shouldn't be
> any need to hold a reference. 

As Mike points out, struct dm_uevent holds a reference without incrementing the
ref count.

dm_path_uevent() already takes a reference - can everything get simplified
if we move this code there (and replace the new mutex with a spinlock of
course)?  Then dm_send_uevents won't need to use 'md'.

Alasdair

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-10  0:24         ` Alasdair G Kergon
@ 2009-11-10  1:50           ` malahal
  2009-11-10  5:24             ` Mike Anderson
  0 siblings, 1 reply; 13+ messages in thread
From: malahal @ 2009-11-10  1:50 UTC (permalink / raw)
  To: dm-devel

Alasdair G Kergon [agk@redhat.com] wrote:
> On Mon, Nov 09, 2009 at 09:57:30AM -0800, malahal@us.ibm.com wrote:
> > Why do we need dm_get() and dm_put() in dm_copy_name_and_uuid()?
> > dm_copy_name_and_uuid() already has access to md and there shouldn't be
> > any need to hold a reference. 
> 
> As Mike points out, struct dm_uevent holds a reference without
> incrementing the ref count.
> 
> dm_path_uevent() already takes a reference - can everything get
> simplified if we move this code there (and replace the new mutex with
> a spinlock of course)?  Then dm_send_uevents won't need to use 'md'.

I think the last dm_put() is calling the multipath destructor which
eventually calls (through a work struct) dm_send_uevents. At that point,
dm is guaranteed to exist but you *must* not place a hold on it. As the
current code in dm_copy_name_and_uuid() places a hold on a FREEING md,
it will result in a BUG_ON while calling the dm_put in the same
function. So, removing the dm_get/dm_put from that function should also
solve the BUG_ON issue.

I agree, removing the "md" field from the dm_event struct probably
simplifies other things.

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-10  1:50           ` malahal
@ 2009-11-10  5:24             ` Mike Anderson
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Anderson @ 2009-11-10  5:24 UTC (permalink / raw)
  To: dm-devel

malahal@us.ibm.com <malahal@us.ibm.com> wrote:
> Alasdair G Kergon [agk@redhat.com] wrote:
> > On Mon, Nov 09, 2009 at 09:57:30AM -0800, malahal@us.ibm.com wrote:
> > > Why do we need dm_get() and dm_put() in dm_copy_name_and_uuid()?
> > > dm_copy_name_and_uuid() already has access to md and there shouldn't be
> > > any need to hold a reference. 
> > 
> > As Mike points out, struct dm_uevent holds a reference without
> > incrementing the ref count.
> > 
> > dm_path_uevent() already takes a reference - can everything get
> > simplified if we move this code there (and replace the new mutex with
> > a spinlock of course)?  Then dm_send_uevents won't need to use 'md'.
> 
> I think the last dm_put() is calling the multipath destructor which
> eventually calls (through a work struct) dm_send_uevents. At that point,
> dm is guaranteed to exist but you *must* not place a hold on it. As the
> current code in dm_copy_name_and_uuid() places a hold on a FREEING md,
> it will result in a BUG_ON while calling the dm_put in the same
> function. So, removing the dm_get/dm_put from that function should also
> solve the BUG_ON issue.
> 

I have been running Mikulas's patch and your last dmf_freeing_lock.patch.
The system system is running for 5 hrs with no BUG_ON or lock dep
warnings.

> I agree, removing the "md" field from the dm_event struct probably
> simplifies other things.
> 
I will capture similar data to what is provided below for a run using the
dmf_freeing_lock.patch, moving dm_copy_name_and_uuid to dm_path_uevent,
and a patch similar to Mikulas's except using a spinlock.

I attached the previous BUG_ON output you asked for. I also supplied two
systemtap snips of output captured during the previous mentioned run. The
first systemtap you can see that while we are not hitting the BUG_ON and
the dm_copy_name_and_uuid function will fail with ENXIO the sequence is
probably something we would like to avoid. The second systemtap just shows
a case where dev_remove is the last put (Note the dm_get and dm_put are
return probes and the ref is the value at the end of the function).

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

1.) BUG ON 

[  260.342806] ------------[ cut here ]------------
[  260.365067] kernel BUG at drivers/md/dm.c:2059!
[  260.381069] invalid opcode: 0000 [#1] SMP
[  260.413071] last sysfs file: /sys/kernel/uevent_seqnum
[  260.413071] CPU 3
[  260.413071] Modules linked in: scsi_debug crc_t10dif xt_tcpudp af_packet nf_conntrack_ipv6 ip6t_REJECT xt_state ipt_REJECT iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 fuse loop sr_mod cdrom ata_piix ahci i2c_i801 shpchp libata i2c_core tg3 sg pci_hotplug rng_core libphy floppy mpt2sas button mptctl joydev usbhid scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh uhci_hcd ehci_hcd usbcore sd_mod dm_snapshot dm_mod edd ext3 jbd fan mptsas mptscsih mptbase scsi_transport_sas aic79xx scsi_transport_spi scsi_mod thermal processor [last unloaded: freq_table]
[  260.766701] Pid: 18, comm: events/3 Not tainted 2.6.32-rc6-Mikulas_name_read_lock #1 eserver xSeries 346 -[884015U]-
[  260.861109] RIP: 0010:[<ffffffffa012915f>]  [<ffffffffa012915f>] dm_put+0x19/0x1b3 [dm_mod]
[  260.861109] RSP: 0018:ffff88003fac9c80  EFLAGS: 00010202
[  260.861109] RAX: 000000000000529b RBX: ffff88003c9cd000 RCX: ffff88003fac6506
[  260.861109] RDX: ffff88003fac9be0 RSI: ffff88003fac6c40 RDI: ffff88003c9cd000
[  260.861109] RBP: ffff88003fac9ca0 R08: 0000000000000003 R09: 0000000000000000
[  260.861109] R10: ffff88003e304140 R11: 0000000000000046 R12: 0000000000000000
[  260.861109] R13: ffff88003c9cd000 R14: ffff88003cc53dc8 R15: ffff88003cc53e48
[  260.861109] FS:  0000000000000000(0000) GS:ffff880004800000(0000) knlGS:0000000000000000
[  260.861109] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  260.861109] CR2: 00007f9bcc13b000 CR3: 000000003e2e9000 CR4: 00000000000006e0
[  260.861109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  260.861109] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  261.373150] Process events/3 (pid: 18, threadinfo ffff88003fac8000, task ffff88003fac6540)
[  261.373150] Stack:
[  261.373150]  ffff88003c9cd000 00000000fffffffa 0000000000000000 ffff88003c9cd000
[  261.373150] <0> ffff88003fac9ce0 ffffffffa012c529 ffff88003fac9ce0 ffff88003cc534a0
[  261.526731] <0> ffff88003fac9d40 ffff88003cc53dc8 ffff88003cc53e48 ffff88003fac9428
[  261.526731] Call Trace:
[  261.602734]  [<ffffffffa012c529>] dm_copy_name_and_uuid+0x9d/0xab [dm_mod]
[  261.602734]  [<ffffffffa012728e>] dm_send_uevents+0x6e/0x138 [dm_mod]
[  261.602734]  [<ffffffffa0127ba3>] event_callback+0x85/0xb6 [dm_mod]
[  261.602734]  [<ffffffffa012ad00>] dm_table_event+0x4a/0x5a [dm_mod]
[  261.750880]  [<ffffffffa01b1588>] trigger_event+0x13/0x15 [dm_multipath]
[  261.750880]  [<ffffffff810535aa>] worker_thread+0x232/0x33b
[  261.750880]  [<ffffffff81053550>] ? worker_thread+0x1d8/0x33b
[  261.750880]  [<ffffffff812b05c6>] ? thread_return+0x3e/0x108
[  261.885195]  [<ffffffffa01b1575>] ? trigger_event+0x0/0x15 [dm_multipath]
[  261.885195]  [<ffffffff810574e0>] ? autoremove_wake_function+0x0/0x38
[  261.885195]  [<ffffffff81053378>] ? worker_thread+0x0/0x33b
[  261.885195]  [<ffffffff81057194>] kthread+0x7d/0x85
[  261.885195]  [<ffffffff8100caba>] child_rip+0xa/0x20
[  261.885195]  [<ffffffff8100c47c>] ? restore_args+0x0/0x30
[  261.885195]  [<ffffffff81057117>] ? kthread+0x0/0x85
[  261.885195]  [<ffffffff8100cab0>] ? child_rip+0x0/0x20
[  261.885195] Code: 8b 45 d4 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 08 f6 87 60 01 00 00 08 74 04 <0f> 0b eb fe 48 8d bf 58 01 00 00 48 c7 c6 10 16 13 a0 e8 ce 53
[  261.885195] RIP  [<ffffffffa012915f>] dm_put+0x19/0x1b3 [dm_mod]
[  261.885195]  RSP <ffff88003fac9c80>
[  262.343081] ---[ end trace 0d93a89c3d6227e3 ]---


2.) System tap output dev_remove NOT last dm_put.

1257824260814259355, dm_get, md (253:5), ref 3, caller dm_table_get_md 0xffffffffa012a63b
1257824260814291496, dm_put, md (253:5), ref 2, caller dm_path_uevent 0xffffffffa0127211
1257824260814348359, dm_put, md (253:5), ref 1, caller target_message 0xffffffffa012caaa
1257824260814390748, dm_copy_name_and_uuid, md (253:5), ref 1
1257824260814404049, dm_copy_name_and_uuid, md (253:5), ref 1 ret 0
1257824260818867530, dm_put, md (253:5), ref 1, caller table_status 0xffffffffa012ce4f
1257824260819356132, dm_get, md (253:5), ref 2, caller __get_name_cell 0xffffffffa012c637
1257824260819383953, dm_put, md (253:5), ref 1, caller dev_status 0xffffffffa012c96e
1257824260819835016, dm_get, md (253:5), ref 2, caller __get_name_cell 0xffffffffa012c637
1257824260819868820, dm_put, md (253:5), ref 1, caller table_status 0xffffffffa012ce4f
1257824261030684451, dm_get, md (253:5), ref 2, caller __get_name_cell 0xffffffffa012c637
1257824261036110432, dm_get, md (253:5), ref 3, caller __get_name_cell 0xffffffffa012c637
1257824261036157879, dm_put, md (253:5), ref 2, caller __hash_remove 0xffffffffa012d874
1257824261036248683, dm_put, md (253:5), ref 1, caller dev_remove 0xffffffffa012d935
1257824261063081028, dm_get, md (253:5), ref 2, caller dm_table_get_md 0xffffffffa012a63b
1257824261063120742, dm_put, md (253:5), ref 1, caller dm_path_uevent 0xffffffffa0127211
1257824261063263612, dm_put(DMF_FREEING), md (), ref 0
1257824261063393477, dm_copy_name_and_uuid, md (253:5), ref 0
1257824261063408504, dm_copy_name_and_uuid, md (253:5), ref 0 ret -6
1257824261065012335, __unbind, md (253:5), ref 0
1257824261109351080, free_dev, md (°¡×/), ref 140
1257824261121781827, dm_put, md (253:5), ref 0, caller target_message 0xffffffffa012caaa

3.) System tap output dev_remove last dm_put.

1257824262109220909, dm_get, md (253:5), ref 3, caller dm_table_get_md 0xffffffffa012a63b
1257824262109252657, dm_put, md (253:5), ref 2, caller dm_path_uevent 0xffffffffa0127211
1257824262109375197, dm_put, md (253:5), ref 1, caller target_message 0xffffffffa012caaa
1257824262109427927, dm_copy_name_and_uuid, md (253:5), ref 1
1257824262109443336, dm_copy_name_and_uuid, md (253:5), ref 1 ret 0
1257824262114407438, dm_put, md (253:5), ref 1, caller table_status 0xffffffffa012ce4f
1257824262114910281, dm_get, md (253:5), ref 2, caller __get_name_cell 0xffffffffa012c637
1257824262114937691, dm_put, md (253:5), ref 1, caller dev_status 0xffffffffa012c96e
1257824262115397074, dm_get, md (253:5), ref 2, caller __get_name_cell 0xffffffffa012c637
1257824262115446916, dm_put, md (253:5), ref 1, caller table_status 0xffffffffa012ce4f
1257824262316985895, dm_get, md (253:5), ref 2, caller __get_name_cell 0xffffffffa012c637
1257824262317031425, dm_put, md (253:5), ref 1, caller __hash_remove 0xffffffffa012d874
1257824262317123286, dm_put(DMF_FREEING), md (), ref 0
1257824262318357193, __unbind, md (253:5), ref 0
1257824262318635815, free_dev, md (ðië/), ref 140
1257824262336344411, dm_put, md (253:5), ref 0, caller dev_remove 0xffffffffa012d935

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] a deadlock bug in the kernel-side device mapper code
  2009-11-09  8:51     ` Mike Anderson
  2009-11-09 17:57       ` malahal
@ 2009-11-10  6:13       ` Mikulas Patocka
  1 sibling, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2009-11-10  6:13 UTC (permalink / raw)
  To: Mike Anderson; +Cc: device-mapper development, Alasdair G Kergon



On Mon, 9 Nov 2009, Mike Anderson wrote:

> Mikulas Patocka <mpatocka@redhat.com> wrote:
> > Hi
> > 
> > This is the patch that uses two locks to avoid the deadlock.
> 
> Thanks for doing the patch. 
> 
> I had previously started trying to address this issue using rcu and moving
> dm_copy_name_and_uuid back to being called during dm_build_path_uevent, but
> that patch still had a couple of cases to be addressed.
> 
> In testing your patch without moving where dm_copy_name_and_uuid is called
> I run into a issue during test runs where I receive a BUG_ON for the
> dm_put in dm_copy_name_and_uuid as DMF_FREEING was able to progress (Note:
> this failure case occurs without your path). If the proper dm_get / dm_put
> is added to the dm_uevent functions then there are cases where
> dm_uevent_free becomes the last dm_put resulting in recursion.

Hi

Try this patch, it removes dm_get/dm_put that is not needed anyway.

Mikulas

---

Fix deadlock due to _hash_lock recursion

This patch fixes the following deadlock:
 #0 [ffff8106298dfb48] schedule at ffffffff80063035
 #1 [ffff8106298dfc20] __down_read at ffffffff8006475d
 #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740
 #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685
 #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678
 #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01
 #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad
 #7 [ffff8106298dfd30] dev_remove at ffffffff88250865
 #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80
 #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4
#10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9
#11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf
#12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call)

_hash_lock is taken in dev_remove and then again in dm_copy_name_and_uuid.

This patch introduces a new lock, _name_read_lock, that is placed around
regions that modify pointer to the hash with dm_set_mdptr or that modify
hc->name or hc->uuid. When this lock is taken, the caller can safely read
the name and uuid.

This lock has much smaller span than _hash_lock and thus lock recursion
can't happen.

There's another problem: calling dm_get+dm_put while "md" is being freed
causes BUG(). In the function dm_copy_name_and_uuid we are sure that
the "md" exists and is freed under us, so drop this dm_get+dm_put.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-ioctl.c |   19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

Index: linux-2.6.31.5-fast/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.31.5-fast.orig/drivers/md/dm-ioctl.c	2009-11-09 02:30:20.000000000 +0100
+++ linux-2.6.31.5-fast/drivers/md/dm-ioctl.c	2009-11-10 07:08:29.000000000 +0100
@@ -56,6 +56,13 @@ static void dm_hash_remove_all(int keep_
  */
 static DECLARE_RWSEM(_hash_lock);
 
+/*
+ * Enables calling dm_get_mdptr and reading name and uuid from the hash table.
+ * This may be called from dm events when _hash_lock is held, so a separate
+ * lock is needed to avoid deadlock.
+ */
+static DEFINE_MUTEX(_name_read_lock);
+
 static void init_buckets(struct list_head *buckets)
 {
 	unsigned int i;
@@ -206,7 +213,9 @@ static int dm_hash_insert(const char *na
 		list_add(&cell->uuid_list, _uuid_buckets + hash_str(uuid));
 	}
 	dm_get(md);
+	mutex_lock(&_name_read_lock);
 	dm_set_mdptr(md, cell);
+	mutex_unlock(&_name_read_lock);
 	up_write(&_hash_lock);
 
 	return 0;
@@ -224,7 +233,9 @@ static void __hash_remove(struct hash_ce
 	/* remove from the dev hash */
 	list_del(&hc->uuid_list);
 	list_del(&hc->name_list);
+	mutex_lock(&_name_read_lock);
 	dm_set_mdptr(hc->md, NULL);
+	mutex_unlock(&_name_read_lock);
 
 	table = dm_get_table(hc->md);
 	if (table) {
@@ -321,7 +332,9 @@ static int dm_hash_rename(uint32_t cooki
 	 */
 	list_del(&hc->name_list);
 	old_name = hc->name;
+	mutex_lock(&_name_read_lock);
 	hc->name = new_name;
+	mutex_unlock(&_name_read_lock);
 	list_add(&hc->name_list, _name_buckets + hash_str(new_name));
 
 	/*
@@ -1582,8 +1595,7 @@ int dm_copy_name_and_uuid(struct mapped_
 	if (!md)
 		return -ENXIO;
 
-	dm_get(md);
-	down_read(&_hash_lock);
+	mutex_lock(&_name_read_lock);
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
 		r = -ENXIO;
@@ -1596,8 +1608,7 @@ int dm_copy_name_and_uuid(struct mapped_
 		strcpy(uuid, hc->uuid ? : "");
 
 out:
-	up_read(&_hash_lock);
-	dm_put(md);
+	mutex_unlock(&_name_read_lock);
 
 	return r;
 }

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-11-10  6:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-05 13:21 a deadlock bug in the kernel-side device mapper code guy keren
2009-11-05 14:24 ` Alasdair G Kergon
2009-11-06  2:58   ` [PATCH] " Mikulas Patocka
     [not found]     ` <20091106151504.GS13375@agk-dp.fab.redhat.com>
2009-11-06 16:30       ` Mikulas Patocka
2009-11-06 17:27     ` malahal
2009-11-09  8:51     ` Mike Anderson
2009-11-09 17:57       ` malahal
2009-11-09 22:24         ` malahal
2009-11-10  0:24         ` Alasdair G Kergon
2009-11-10  1:50           ` malahal
2009-11-10  5:24             ` Mike Anderson
2009-11-10  6:13       ` Mikulas Patocka
2009-11-06  0:24 ` Kiyoshi Ueda

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.