* 3.15-rc4: circular locking dependency triggered by dm-multipath
@ 2014-05-06  9:31 Bart Van Assche
  2014-05-06 13:22 ` Mike Snitzer
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Bart Van Assche @ 2014-05-06  9:31 UTC (permalink / raw)
  To: device-mapper development

Hello,

Has anyone else perhaps already run into this?

Thanks,

Bart.

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc4-debug+ #1 Not tainted
-------------------------------------------------------
multipathd/10364 is trying to acquire lock:
 (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]

but task is already holding lock:
 (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&(&m->lock)->rlock){..-...}:
       [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
       [<ffffffff814ae8eb>] _raw_spin_lock+0x3b/0x50
       [<ffffffffa043a3e9>] dm_blk_open+0x19/0x80 [dm_mod]
       [<ffffffff811cbc41>] __blkdev_get+0xd1/0x4c0
       [<ffffffff811cc215>] blkdev_get+0x1e5/0x380
       [<ffffffff811cc45b>] blkdev_open+0x5b/0x80
       [<ffffffff8118a12e>] do_dentry_open.isra.15+0x1de/0x2a0
       [<ffffffff8118a300>] finish_open+0x30/0x40
       [<ffffffff8119c44d>] do_last.isra.61+0xa5d/0x1200
       [<ffffffff8119cca7>] path_openat+0xb7/0x620
       [<ffffffff8119d88a>] do_filp_open+0x3a/0x90
       [<ffffffff8118bb4e>] do_sys_open+0x12e/0x210
       [<ffffffff8118bc4e>] SyS_open+0x1e/0x20
       [<ffffffff814b84e2>] tracesys+0xd0/0xd5

-> #0 (&(&q->__queue_lock)->rlock){-.-...}:
       [<ffffffff810a4c46>] __lock_acquire+0x1716/0x1a00
       [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
       [<ffffffff814aea56>] _raw_spin_lock_irqsave+0x46/0x60
       [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
       [<ffffffffa077a692>] queue_if_no_path+0x72/0xc0 [dm_multipath]
       [<ffffffffa077a6f9>] multipath_presuspend+0x19/0x20 [dm_multipath]
       [<ffffffffa043d34a>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
       [<ffffffffa043ad5d>] dm_suspend+0x6d/0x1f0 [dm_mod]
       [<ffffffffa043ff63>] dev_suspend+0x1c3/0x220 [dm_mod]
       [<ffffffffa0440759>] ctl_ioctl+0x269/0x500 [dm_mod]
       [<ffffffffa0440a03>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
       [<ffffffff811a04c0>] do_vfs_ioctl+0x300/0x520
       [<ffffffff811a0721>] SyS_ioctl+0x41/0x80
       [<ffffffff814b84e2>] tracesys+0xd0/0xd5

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&m->lock)->rlock);
                               lock(&(&q->__queue_lock)->rlock);
                               lock(&(&m->lock)->rlock);
  lock(&(&q->__queue_lock)->rlock);

 *** DEADLOCK ***

2 locks held by multipathd/10364:
 #0:  (&md->suspend_lock){+.+...}, at: [<ffffffffa043ad28>] dm_suspend+0x38/0x1f0 [dm_mod]
 #1:  (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]

stack backtrace:
CPU: 10 PID: 10364 Comm: multipathd Not tainted 3.15.0-rc4-debug+ #1
Hardware name: MSI MS-7737/Big Bang-XPower II (MS-7737), BIOS V1.5 10/16/2012
 ffffffff81fea150 ffff8807c194fa98 ffffffff814a6780 ffffffff81fea150
 ffff8807c194fad8 ffffffff814a36db ffff8807c194fb30 ffff8807c1954c88
 0000000000000001 0000000000000002 ffff8807c1954c88 ffff8807c1954440
Call Trace:
 [<ffffffff814a6780>] dump_stack+0x4e/0x7a
 [<ffffffff814a36db>] print_circular_bug+0x200/0x20f
 [<ffffffff810a4c46>] __lock_acquire+0x1716/0x1a00
 [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
 [<ffffffffa043bff3>] ? dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
 [<ffffffff814aea56>] _raw_spin_lock_irqsave+0x46/0x60
 [<ffffffffa043bff3>] ? dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
 [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
 [<ffffffffa077a692>] queue_if_no_path+0x72/0xc0 [dm_multipath]
 [<ffffffffa077a6f9>] multipath_presuspend+0x19/0x20 [dm_multipath]
 [<ffffffffa043d34a>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
 [<ffffffffa043ad5d>] dm_suspend+0x6d/0x1f0 [dm_mod]
 [<ffffffffa043ff63>] dev_suspend+0x1c3/0x220 [dm_mod]
 [<ffffffffa043fda0>] ? table_load+0x350/0x350 [dm_mod]
 [<ffffffffa0440759>] ctl_ioctl+0x269/0x500 [dm_mod]
 [<ffffffffa0440a03>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff811a04c0>] do_vfs_ioctl+0x300/0x520
 [<ffffffff811ac089>] ? __fget+0x129/0x300
 [<ffffffff811abf65>] ? __fget+0x5/0x300
 [<ffffffff811ac2d0>] ? __fget_light+0x30/0x160
 [<ffffffff811a0721>] SyS_ioctl+0x41/0x80
 [<ffffffff814b84e2>] tracesys+0xd0/0xd5

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-06  9:31 3.15-rc4: circular locking dependency triggered by dm-multipath Bart Van Assche
@ 2014-05-06 13:22 ` Mike Snitzer
  2014-05-06 13:46   ` Bart Van Assche
  2014-05-07 20:53 ` Mike Snitzer
  2014-05-08  5:57 ` Hannes Reinecke
  2 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-05-06 13:22 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

On Tue, May 06 2014 at  5:31am -0400,
Bart Van Assche <bvanassche@acm.org> wrote:

> Hello,
> 
> Has anyone else perhaps already run into this ?

I've not seen this; how easily is it reproduced?

These recent commits are clearly the issue:

e809917735e dm mpath: push back requests instead of queueing
3e9f1be1b40 dm mpath: remove process_queued_ios()

Need to take a closer look at the lockdep splat you provided.


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-06 13:22 ` Mike Snitzer
@ 2014-05-06 13:46   ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2014-05-06 13:46 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development

On 05/06/14 15:22, Mike Snitzer wrote:
> On Tue, May 06 2014 at  5:31am -0400,
> Bart Van Assche <bvanassche@acm.org> wrote:
>> Has anyone else perhaps already run into this ?
> 
> Not seen this, how easily is it reproduced?
> 
> These recent commits are clearly the issue:
> 
> e809917735e dm mpath: push back requests instead of queueing
> 3e9f1be1b40 dm mpath: remove process_queued_ios()
> 
> Need to take a closer look at the lockdep splat you provided.

All I did was simulate a cable pull while running a fio test on a
dm-multipath device backed by a few SCSI devices. This lockdep report was
generated the first time I ran the cable pull simulation. I have not yet
tried to reproduce it, but it may well be 100% reproducible. The same
test runs fine with kernel 3.14.

Bart.


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-06  9:31 3.15-rc4: circular locking dependency triggered by dm-multipath Bart Van Assche
  2014-05-06 13:22 ` Mike Snitzer
@ 2014-05-07 20:53 ` Mike Snitzer
  2014-05-08  5:57 ` Hannes Reinecke
  2 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2014-05-07 20:53 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development

Hi Bart,

On Tue, May 06 2014 at  5:31am -0400,
Bart Van Assche <bvanassche@acm.org> wrote:

> Hello,
> 
> Has anyone else perhaps already run into this ?
> 
> Thanks,
> 
> Bart.
> 
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.15.0-rc4-debug+ #1 Not tainted
> -------------------------------------------------------
> multipathd/10364 is trying to acquire lock:
>  (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
> 
> but task is already holding lock:
>  (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&(&m->lock)->rlock){..-...}:
>        [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
>        [<ffffffff814ae8eb>] _raw_spin_lock+0x3b/0x50
>        [<ffffffffa043a3e9>] dm_blk_open+0x19/0x80 [dm_mod]
>        [<ffffffff811cbc41>] __blkdev_get+0xd1/0x4c0
>        [<ffffffff811cc215>] blkdev_get+0x1e5/0x380
>        [<ffffffff811cc45b>] blkdev_open+0x5b/0x80
>        [<ffffffff8118a12e>] do_dentry_open.isra.15+0x1de/0x2a0
>        [<ffffffff8118a300>] finish_open+0x30/0x40
>        [<ffffffff8119c44d>] do_last.isra.61+0xa5d/0x1200
>        [<ffffffff8119cca7>] path_openat+0xb7/0x620
>        [<ffffffff8119d88a>] do_filp_open+0x3a/0x90
>        [<ffffffff8118bb4e>] do_sys_open+0x12e/0x210
>        [<ffffffff8118bc4e>] SyS_open+0x1e/0x20
>        [<ffffffff814b84e2>] tracesys+0xd0/0xd5

I'm not sure what lockdep is trying to communicate for #1 here.  I'm not
seeing where m->lock is taken in the above stack, nor does that path
appear to take q->queue_lock.

> -> #0 (&(&q->__queue_lock)->rlock){-.-...}:
>        [<ffffffff810a4c46>] __lock_acquire+0x1716/0x1a00
>        [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
>        [<ffffffff814aea56>] _raw_spin_lock_irqsave+0x46/0x60
>        [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
>        [<ffffffffa077a692>] queue_if_no_path+0x72/0xc0 [dm_multipath]
>        [<ffffffffa077a6f9>] multipath_presuspend+0x19/0x20 [dm_multipath]
>        [<ffffffffa043d34a>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
>        [<ffffffffa043ad5d>] dm_suspend+0x6d/0x1f0 [dm_mod]
>        [<ffffffffa043ff63>] dev_suspend+0x1c3/0x220 [dm_mod]
>        [<ffffffffa0440759>] ctl_ioctl+0x269/0x500 [dm_mod]
>        [<ffffffffa0440a03>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
>        [<ffffffff811a04c0>] do_vfs_ioctl+0x300/0x520
>        [<ffffffff811a0721>] SyS_ioctl+0x41/0x80
>        [<ffffffff814b84e2>] tracesys+0xd0/0xd5
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&m->lock)->rlock);
>                                lock(&(&q->__queue_lock)->rlock);
>                                lock(&(&m->lock)->rlock);
>   lock(&(&q->__queue_lock)->rlock);
> 
>  *** DEADLOCK ***

I'm not seeing where q->queue_lock is held before m->lock is taken.

I did find that multipath_ioctl() does _not_ hold m->lock when calling
dm_table_run_md_queue_async() -- the other callers hold m->lock.

Any chance you can see if this patch silences lockdep?

--
 drivers/md/dm-mpath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index aa009e8..fa0f6cb 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1566,8 +1566,8 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 		}
 		if (m->pg_init_required)
 			__pg_init_all_paths(m);
-		spin_unlock_irqrestore(&m->lock, flags);
 		dm_table_run_md_queue_async(m->ti->table);
+		spin_unlock_irqrestore(&m->lock, flags);
 	}
 
 	return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-06  9:31 3.15-rc4: circular locking dependency triggered by dm-multipath Bart Van Assche
  2014-05-06 13:22 ` Mike Snitzer
  2014-05-07 20:53 ` Mike Snitzer
@ 2014-05-08  5:57 ` Hannes Reinecke
  2014-05-26 11:44   ` Bart Van Assche
  2 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-05-08  5:57 UTC (permalink / raw)
  To: dm-devel

On 05/06/2014 11:31 AM, Bart Van Assche wrote:
> Hello,
>
> Has anyone else perhaps already run into this ?
>
> Thanks,
>
> Bart.
>
> [full lockdep report snipped]
>
Hmm. No, I haven't seen it (yet).
Looking into it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-08  5:57 ` Hannes Reinecke
@ 2014-05-26 11:44   ` Bart Van Assche
  2014-05-26 12:10     ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2014-05-26 11:44 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: device-mapper development

On 05/08/14 07:57, Hannes Reinecke wrote:
> On 05/06/2014 11:31 AM, Bart Van Assche wrote:
>> Hello,
>>
>> Has anyone else perhaps already run into this ?
>>
>> Thanks,
>>
>> Bart.
>>
>> [full lockdep report snipped]
>>
> Hmm. No, I haven't seen it (yet).
> Looking into it.

Hello Hannes,

Not sure if this information helps, but apparently the above warning is
not a false positive: it indicates a real deadlock. I have been able to
reproduce the warning with 3.15-rc7. Shortly after it appeared, the
following was reported:

Watchdog detected hard LOCKUP on cpu 11
Call Trace:
 <NMI>  [<ffffffff814a7bd0>] dump_stack+0x4e/0x7a
 [<ffffffff8104bb8d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8104bbfc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff810e89b5>] watchdog_overflow_callback+0xd5/0x120
 [<ffffffff8112268c>] __perf_event_overflow+0x9c/0x320
 [<ffffffff81017a50>] ? x86_perf_event_set_period+0xe0/0x150
 [<ffffffff811236b4>] perf_event_overflow+0x14/0x20
 [<ffffffff8101e5de>] intel_pmu_handle_irq+0x1ce/0x3c0
 [<ffffffff814affd1>] ? _raw_spin_unlock+0x31/0x50
 [<ffffffff812e6c3f>] ? ghes_copy_tofrom_phys+0x11f/0x210
 [<ffffffff814b28ab>] perf_event_nmi_handler+0x2b/0x50
 [<ffffffff814b1da2>] nmi_handle.isra.5+0xc2/0x380
 [<ffffffff814b1ce5>] ? nmi_handle.isra.5+0x5/0x380
 [<ffffffff814b21e5>] do_nmi+0x185/0x3e0
 [<ffffffff814b1287>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff8126548d>] ? delay_tsc+0x2d/0xc0
 [<ffffffff8126548d>] ? delay_tsc+0x2d/0xc0
 [<ffffffff8126548d>] ? delay_tsc+0x2d/0xc0
 <<EOE>>  [<ffffffff812653bf>] __delay+0xf/0x20
 [<ffffffff810a86b1>] do_raw_spin_lock+0xe1/0x140
 [<ffffffff814afece>] _raw_spin_lock_irqsave+0x4e/0x60
 [<ffffffffa0782500>] ? multipath_busy+0x20/0xf0 [dm_multipath]
 [<ffffffffa0782500>] multipath_busy+0x20/0xf0 [dm_multipath]
 [<ffffffffa024e56c>] dm_request_fn+0xcc/0x320 [dm_mod]
 [<ffffffff8123b013>] __blk_run_queue+0x33/0x40
 [<ffffffff8123b205>] blk_delay_work+0x25/0x40
 [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0
 [<ffffffff8106d748>] ? process_one_work+0x188/0x6c0
 [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0
 [<ffffffff8106dc80>] ? process_one_work+0x6c0/0x6c0
 [<ffffffff810758bd>] kthread+0xed/0x110
 [<ffffffff810757d0>] ? insert_kthread_work+0x80/0x80
 [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810757d0>] ? insert_kthread_work+0x80/0x80

Bart.


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 11:44   ` Bart Van Assche
@ 2014-05-26 12:10     ` Hannes Reinecke
  2014-05-26 12:20       ` Bart Van Assche
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-05-26 12:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, Mike Snitzer

On 05/26/2014 01:44 PM, Bart Van Assche wrote:
> On 05/08/14 07:57, Hannes Reinecke wrote:
>> On 05/06/2014 11:31 AM, Bart Van Assche wrote:
>>> Hello,
>>>
>>> Has anyone else perhaps already run into this ?
>>>
>>> Thanks,
>>>
>>> Bart.
>>>
>>> [full lockdep report snipped]
>>>
>> Hmm. No, I haven't seen it (yet).
>> Looking into it.
>
> Hello Hannes,
>
> Not sure if this information helps but apparently the above warning is
> not a false positive but indicates a real deadlock. I have been able to
> reproduce the above warning with 3.15-rc7. Shortly after the above
> message appeared the following was reported:
>
> Watchdog detected hard LOCKUP on cpu 11
> [watchdog hard-lockup backtrace snipped]
>
Mike Snitzer had a patch in his device-mapper tree:

dm mpath: fix lock order inconsistency in multipath_ioctl 
(2014-05-14 16:12:17 -0400)

I was sort of hoping that would address this issue.
Can you check?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 12:10     ` Hannes Reinecke
@ 2014-05-26 12:20       ` Bart Van Assche
  2014-05-26 12:29         ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2014-05-26 12:20 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: device-mapper development, Mike Snitzer

On 05/26/14 14:10, Hannes Reinecke wrote:
> Mike Snitzer had a patch in his device-mapper tree:
> 
> dm mpath: fix lock order inconsistency in multipath_ioctl (2014-05-14
> 16:12:17 -0400)
> 
> I was sort of hoping that would address this issue.
> Can you check?

Hello Hannes,

Is it possible that that patch was already included in v3.15-rc6, and
hence already part of my test?

$ git log v3.15-rc5..v3.15-rc6 | grep 'dm mpath: fix lock order
inconsistency in multipath_ioctl'
      dm mpath: fix lock order inconsistency in multipath_ioctl
    dm mpath: fix lock order inconsistency in multipath_ioctl

Bart.


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 12:20       ` Bart Van Assche
@ 2014-05-26 12:29         ` Hannes Reinecke
  2014-05-26 12:44           ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-05-26 12:29 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, Mike Snitzer

On 05/26/2014 02:20 PM, Bart Van Assche wrote:
> On 05/26/14 14:10, Hannes Reinecke wrote:
>> Mike Snitzer had a patch in his device-mapper tree:
>>
>> dm mpath: fix lock order inconsistency in multipath_ioctl (2014-05-14
>> 16:12:17 -0400)
>>
>> I was sort of hoping that would address this issue.
>> Can you check?
>
> Hello Hannes,
>
> Is it possible that that patch already got included in v3.15-rc6 and
> hence that that patch was included in my test ?
>
> $ git log v3.15-rc5..v3.15-rc6 | grep 'dm mpath: fix lock order
> inconsistency in multipath_ioctl'
>        dm mpath: fix lock order inconsistency in multipath_ioctl
>      dm mpath: fix lock order inconsistency in multipath_ioctl
>
Could be.

Okay, I'll be cross-checking.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 12:29         ` Hannes Reinecke
@ 2014-05-26 12:44           ` Hannes Reinecke
  2014-05-26 12:51             ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-05-26 12:44 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel

On 05/26/2014 02:29 PM, Hannes Reinecke wrote:
> On 05/26/2014 02:20 PM, Bart Van Assche wrote:
>> On 05/26/14 14:10, Hannes Reinecke wrote:
>>> Mike Snitzer had a patch in his device-mapper tree:
>>>
>>> dm mpath: fix lock order inconsistency in multipath_ioctl
>>> (2014-05-14
>>> 16:12:17 -0400)
>>>
>>> I was sort of hoping that would address this issue.
>>> Can you check?
>>
>> Hello Hannes,
>>
>> Is it possible that that patch already got included in v3.15-rc6 and
>> hence that that patch was included in my test ?
>>
>> $ git log v3.15-rc5..v3.15-rc6 | grep 'dm mpath: fix lock order
>> inconsistency in multipath_ioctl'
>>        dm mpath: fix lock order inconsistency in multipath_ioctl
>>      dm mpath: fix lock order inconsistency in multipath_ioctl
>>
> Could be.
>
> Okay, I'll be cross-checking.
>
Can you check if this makes lockdep happy?

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index aa009e8..40b3036 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -445,11 +445,11 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	else
 		m->saved_queue_if_no_path = queue_if_no_path;
 	m->queue_if_no_path = queue_if_no_path;
-	if (!m->queue_if_no_path)
-		dm_table_run_md_queue_async(m->ti->table);
-
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	if (!queue_if_no_path)
+		dm_table_run_md_queue_async(m->ti->table);
+
 	return 0;
 }

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 12:44           ` Hannes Reinecke
@ 2014-05-26 12:51             ` Hannes Reinecke
  2014-05-26 14:54               ` Bart Van Assche
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2014-05-26 12:51 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel

[-- Attachment #1: Type: text/plain, Size: 1991 bytes --]

On 05/26/2014 02:44 PM, Hannes Reinecke wrote:
> On 05/26/2014 02:29 PM, Hannes Reinecke wrote:
>> On 05/26/2014 02:20 PM, Bart Van Assche wrote:
>>> On 05/26/14 14:10, Hannes Reinecke wrote:
>>>> Mike Snitzer had a patch in his device-mapper tree:
>>>>
>>>> dm mpath: fix lock order inconsistency in multipath_ioctl
>>>> (2014-05-14
>>>> 16:12:17 -0400)
>>>>
>>>> I was sort of hoping that would address this issue.
>>>> Can you check?
>>>
>>> Hello Hannes,
>>>
>>> Is it possible that that patch already got included in v3.15-rc6 and
>>> hence that that patch was included in my test ?
>>>
>>> $ git log v3.15-rc5..v3.15-rc6 | grep 'dm mpath: fix lock order
>>> inconsistency in multipath_ioctl'
>>>        dm mpath: fix lock order inconsistency in multipath_ioctl
>>>      dm mpath: fix lock order inconsistency in multipath_ioctl
>>>
>> Could be.
>>
>> Okay, I'll be cross-checking.
>>
> Can you check if this makes lockdep happy?
>
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index aa009e8..40b3036 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -445,11 +445,11 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
>  	else
>  		m->saved_queue_if_no_path = queue_if_no_path;
>  	m->queue_if_no_path = queue_if_no_path;
> -	if (!m->queue_if_no_path)
> -		dm_table_run_md_queue_async(m->ti->table);
> -
>  	spin_unlock_irqrestore(&m->lock, flags);
>
> +	if (!queue_if_no_path)
> +		dm_table_run_md_queue_async(m->ti->table);
> +
>  	return 0;
>  }
>
Or, better still, try the attached patch.
There's one more instance where lockdep might complain.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

[-- Attachment #2: 0001-dm-multipath-Fixup-lockdep-warning.patch --]
[-- Type: text/x-patch, Size: 1833 bytes --]

From 44bd0601e47f5cc5d26b679550f0a5a2f2c3f487 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Mon, 26 May 2014 14:45:39 +0200
Subject: [PATCH] dm-multipath: Fixup lockdep warning

lockdep complains about a circular locking dependency.
And indeed, we need to release the lock before calling
dm_table_run_md_queue_async().

Reported-by: Bart van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-mpath.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index aa009e8..ebfa411 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -445,11 +445,11 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	else
 		m->saved_queue_if_no_path = queue_if_no_path;
 	m->queue_if_no_path = queue_if_no_path;
-	if (!m->queue_if_no_path)
-		dm_table_run_md_queue_async(m->ti->table);
-
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	if (!queue_if_no_path)
+		dm_table_run_md_queue_async(m->ti->table);
+
 	return 0;
 }
 
@@ -954,7 +954,7 @@ out:
  */
 static int reinstate_path(struct pgpath *pgpath)
 {
-	int r = 0;
+	int r = 0, run_queue = 0;
 	unsigned long flags;
 	struct multipath *m = pgpath->pg->m;
 
@@ -978,7 +978,7 @@ static int reinstate_path(struct pgpath *pgpath)
 
 	if (!m->nr_valid_paths++) {
 		m->current_pgpath = NULL;
-		dm_table_run_md_queue_async(m->ti->table);
+		run_queue = 1;
 	} else if (m->hw_handler_name && (m->current_pg == pgpath->pg)) {
 		if (queue_work(kmpath_handlerd, &pgpath->activate_path.work))
 			m->pg_init_in_progress++;
@@ -991,6 +991,8 @@ static int reinstate_path(struct pgpath *pgpath)
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
+	if (run_queue)
+		dm_table_run_md_queue_async(m->ti->table);
 
 	return r;
 }
-- 
1.7.12.4






* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 12:51             ` Hannes Reinecke
@ 2014-05-26 14:54               ` Bart Van Assche
  2014-05-27 13:22                 ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2014-05-26 14:54 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

On 05/26/14 14:51, Hannes Reinecke wrote:
> Or, better still, try the attached patch.
> There's one more instance where lockdep might complain.

Does anyone know whether it's safe to revert commit 4cdd2ad780? Anyway,
I ran a test with v3.15-rc7 + your patch + 4cdd2ad780 reverted, and that
test ran fine. I didn't encounter any lockups or lockdep complaints. The
test consisted of running a fio I/O verification workload on top of a
dm-multipath device. That multipath device was created on top of several
SCSI device nodes created by the ib_srp driver. About once every 30
seconds a cable removal / reinsertion was simulated via the ibportstate
command.
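
For reference, a failover loop like the one described above could be sketched
as a dry-run script (it only prints the commands it would execute; the device
node, fio options, switch LID, and port number are placeholders I made up,
not details from this thread):

```shell
#!/bin/sh
# Dry-run sketch of the multipath stress test. run() only echoes each
# command; replace it with run() { "$@"; } to execute for real.
run() { echo "+ $*"; }

DEV=/dev/mapper/mpatha   # hypothetical dm-multipath device node
LID=7                    # hypothetical LID of the switch port to flap
PORT=12                  # hypothetical switch port number

# fio I/O verification workload on top of the multipath device
run fio --name=verify --filename="$DEV" --rw=randwrite --verify=crc32c \
        --direct=1 --time_based --runtime=600

# simulate cable removal / reinsertion roughly every 30 seconds
for i in 1 2 3; do
    run sleep 30
    run ibportstate "$LID" "$PORT" disable
    run sleep 5
    run ibportstate "$LID" "$PORT" enable
done
```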

Bart.


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-26 14:54               ` Bart Van Assche
@ 2014-05-27 13:22                 ` Mike Snitzer
  2014-05-27 14:49                   ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-05-27 13:22 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel

On Mon, May 26 2014 at 10:54am -0400,
Bart Van Assche <bvanassche@acm.org> wrote:

> On 05/26/14 14:51, Hannes Reinecke wrote:
> > Or, better still, try the attached patch.
> > There's one more instance where lockdep might complain.
> 
> Does anyone know whether it's safe to revert commit 4cdd2ad780 ?

Yes, it should be reverted, otherwise multipath_ioctl is using
inconsistent lock order.  I should've waited for test confirmation
before sending the previous "fix" upstream.  It made the lock order
consistent (but consistently wrong).

> Anyway,
> I ran a test with v3.15-rc7 + your patch + 4cdd2ad780 reverted and that
> test ran fine. I didn't encounter any lockups nor any lockdep
> complaints. The test I ran was to run a fio I/O verification test on top
> of a dm-multipath device. That multipath device was created on top of
> several SCSI device nodes created by the ib_srp driver. About once every
> 30 seconds a cable removal / reinsertion was simulated via the
> ibportstate command.

Great, thanks.  I'll try to get this queued for Linus this week.


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-27 13:22                 ` Mike Snitzer
@ 2014-05-27 14:49                   ` Mike Snitzer
  2014-05-27 16:09                     ` Bart Van Assche
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2014-05-27 14:49 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel

On Tue, May 27 2014 at  9:22am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, May 26 2014 at 10:54am -0400,
> Bart Van Assche <bvanassche@acm.org> wrote:
> 
> > Anyway,
> > I ran a test with v3.15-rc7 + your patch + 4cdd2ad780 reverted and that
> > test ran fine. I didn't encounter any lockups nor any lockdep
> > complaints. The test I ran was to run a fio I/O verification test on top
> > of a dm-multipath device. That multipath device was created on top of
> > several SCSI device nodes created by the ib_srp driver. About once every
> > 30 seconds a cable removal / reinsertion was simulated via the
> > ibportstate command.
> 
> Great, thanks.  I'll try to get this queued for Linus this week.

Done, please see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-linus&id=63d832c30142cdceb478b1cac7d943d83b95b2dc


* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
  2014-05-27 14:49                   ` Mike Snitzer
@ 2014-05-27 16:09                     ` Bart Van Assche
  0 siblings, 0 replies; 15+ messages in thread
From: Bart Van Assche @ 2014-05-27 16:09 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: dm-devel

On 05/27/14 16:49, Mike Snitzer wrote:
> On Tue, May 27 2014 at  9:22am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>> On Mon, May 26 2014 at 10:54am -0400,
>> Bart Van Assche <bvanassche@acm.org> wrote:
>>> Anyway,
>>> I ran a test with v3.15-rc7 + your patch + 4cdd2ad780 reverted and that
>>> test ran fine. I didn't encounter any lockups nor any lockdep
>>> complaints. The test I ran was to run a fio I/O verification test on top
>>> of a dm-multipath device. That multipath device was created on top of
>>> several SCSI device nodes created by the ib_srp driver. About once every
>>> 30 seconds a cable removal / reinsertion was simulated via the
>>> ibportstate command.
>>
>> Great, thanks.  I'll try to get this queued for Linus this week.
> 
> Done, please see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-linus&id=63d832c30142cdceb478b1cac7d943d83b95b2dc

Hello Mike,

My test passes with that tree.

Thanks,

Bart.


end of thread, other threads:[~2014-05-27 16:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-06  9:31 3.15-rc4: circular locking dependency triggered by dm-multipath Bart Van Assche
2014-05-06 13:22 ` Mike Snitzer
2014-05-06 13:46   ` Bart Van Assche
2014-05-07 20:53 ` Mike Snitzer
2014-05-08  5:57 ` Hannes Reinecke
2014-05-26 11:44   ` Bart Van Assche
2014-05-26 12:10     ` Hannes Reinecke
2014-05-26 12:20       ` Bart Van Assche
2014-05-26 12:29         ` Hannes Reinecke
2014-05-26 12:44           ` Hannes Reinecke
2014-05-26 12:51             ` Hannes Reinecke
2014-05-26 14:54               ` Bart Van Assche
2014-05-27 13:22                 ` Mike Snitzer
2014-05-27 14:49                   ` Mike Snitzer
2014-05-27 16:09                     ` Bart Van Assche
