* 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Bart Van Assche @ 2014-05-06  9:31 UTC (permalink / raw)
To: device-mapper development

Hello,

Has anyone else perhaps already run into this?

Thanks,

Bart.

======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc4-debug+ #1 Not tainted
-------------------------------------------------------
multipathd/10364 is trying to acquire lock:
 (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]

but task is already holding lock:
 (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&m->lock)->rlock){..-...}:
       [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
       [<ffffffff814ae8eb>] _raw_spin_lock+0x3b/0x50
       [<ffffffffa043a3e9>] dm_blk_open+0x19/0x80 [dm_mod]
       [<ffffffff811cbc41>] __blkdev_get+0xd1/0x4c0
       [<ffffffff811cc215>] blkdev_get+0x1e5/0x380
       [<ffffffff811cc45b>] blkdev_open+0x5b/0x80
       [<ffffffff8118a12e>] do_dentry_open.isra.15+0x1de/0x2a0
       [<ffffffff8118a300>] finish_open+0x30/0x40
       [<ffffffff8119c44d>] do_last.isra.61+0xa5d/0x1200
       [<ffffffff8119cca7>] path_openat+0xb7/0x620
       [<ffffffff8119d88a>] do_filp_open+0x3a/0x90
       [<ffffffff8118bb4e>] do_sys_open+0x12e/0x210
       [<ffffffff8118bc4e>] SyS_open+0x1e/0x20
       [<ffffffff814b84e2>] tracesys+0xd0/0xd5

-> #0 (&(&q->__queue_lock)->rlock){-.-...}:
       [<ffffffff810a4c46>] __lock_acquire+0x1716/0x1a00
       [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
       [<ffffffff814aea56>] _raw_spin_lock_irqsave+0x46/0x60
       [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
       [<ffffffffa077a692>] queue_if_no_path+0x72/0xc0 [dm_multipath]
       [<ffffffffa077a6f9>] multipath_presuspend+0x19/0x20 [dm_multipath]
       [<ffffffffa043d34a>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
       [<ffffffffa043ad5d>] dm_suspend+0x6d/0x1f0 [dm_mod]
       [<ffffffffa043ff63>] dev_suspend+0x1c3/0x220 [dm_mod]
       [<ffffffffa0440759>] ctl_ioctl+0x269/0x500 [dm_mod]
       [<ffffffffa0440a03>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
       [<ffffffff811a04c0>] do_vfs_ioctl+0x300/0x520
       [<ffffffff811a0721>] SyS_ioctl+0x41/0x80
       [<ffffffff814b84e2>] tracesys+0xd0/0xd5

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&m->lock)->rlock);
                               lock(&(&q->__queue_lock)->rlock);
                               lock(&(&m->lock)->rlock);
  lock(&(&q->__queue_lock)->rlock);

 *** DEADLOCK ***

2 locks held by multipathd/10364:
 #0:  (&md->suspend_lock){+.+...}, at: [<ffffffffa043ad28>] dm_suspend+0x38/0x1f0 [dm_mod]
 #1:  (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]

stack backtrace:
CPU: 10 PID: 10364 Comm: multipathd Not tainted 3.15.0-rc4-debug+ #1
Hardware name: MSI MS-7737/Big Bang-XPower II (MS-7737), BIOS V1.5 10/16/2012
 ffffffff81fea150 ffff8807c194fa98 ffffffff814a6780 ffffffff81fea150
 ffff8807c194fad8 ffffffff814a36db ffff8807c194fb30 ffff8807c1954c88
 0000000000000001 0000000000000002 ffff8807c1954c88 ffff8807c1954440
Call Trace:
 [<ffffffff814a6780>] dump_stack+0x4e/0x7a
 [<ffffffff814a36db>] print_circular_bug+0x200/0x20f
 [<ffffffff810a4c46>] __lock_acquire+0x1716/0x1a00
 [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
 [<ffffffffa043bff3>] ? dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
 [<ffffffff814aea56>] _raw_spin_lock_irqsave+0x46/0x60
 [<ffffffffa043bff3>] ? dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
 [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
 [<ffffffffa077a692>] queue_if_no_path+0x72/0xc0 [dm_multipath]
 [<ffffffffa077a6f9>] multipath_presuspend+0x19/0x20 [dm_multipath]
 [<ffffffffa043d34a>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
 [<ffffffffa043ad5d>] dm_suspend+0x6d/0x1f0 [dm_mod]
 [<ffffffffa043ff63>] dev_suspend+0x1c3/0x220 [dm_mod]
 [<ffffffffa043fda0>] ? table_load+0x350/0x350 [dm_mod]
 [<ffffffffa0440759>] ctl_ioctl+0x269/0x500 [dm_mod]
 [<ffffffffa0440a03>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
 [<ffffffff811a04c0>] do_vfs_ioctl+0x300/0x520
 [<ffffffff811ac089>] ? __fget+0x129/0x300
 [<ffffffff811abf65>] ? __fget+0x5/0x300
 [<ffffffff811ac2d0>] ? __fget_light+0x30/0x160
 [<ffffffff811a0721>] SyS_ioctl+0x41/0x80
 [<ffffffff814b84e2>] tracesys+0xd0/0xd5
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Mike Snitzer @ 2014-05-06 13:22 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development

On Tue, May 06 2014 at  5:31am -0400,
Bart Van Assche <bvanassche@acm.org> wrote:

> Hello,
>
> Has anyone else perhaps already run into this?

Not seen this, how easily is it reproduced?

These recent commits are clearly the issue:

e809917735e dm mpath: push back requests instead of queueing
3e9f1be1b40 dm mpath: remove process_queued_ios()

Need to take a closer look at the lockdep splat you provided.
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Bart Van Assche @ 2014-05-06 13:46 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development

On 05/06/14 15:22, Mike Snitzer wrote:
> On Tue, May 06 2014 at  5:31am -0400,
> Bart Van Assche <bvanassche@acm.org> wrote:
>> Has anyone else perhaps already run into this?
>
> Not seen this, how easily is it reproduced?
>
> These recent commits are clearly the issue:
>
> e809917735e dm mpath: push back requests instead of queueing
> 3e9f1be1b40 dm mpath: remove process_queued_ios()
>
> Need to take a closer look at the lockdep splat you provided.

All I did was to simulate a cable pull while running a fio test on top
of a dm-multipath device on top of a few SCSI devices. This lockdep
report was generated the first time I ran the cable pull simulation. I
have not yet tried to reproduce this, but it's not impossible that this
is 100% reproducible. The same test runs fine with kernel 3.14.

Bart.
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Mike Snitzer @ 2014-05-07 20:53 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development

Hi Bart,

On Tue, May 06 2014 at  5:31am -0400,
Bart Van Assche <bvanassche@acm.org> wrote:

> Hello,
>
> Has anyone else perhaps already run into this?
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.15.0-rc4-debug+ #1 Not tainted
> -------------------------------------------------------
> multipathd/10364 is trying to acquire lock:
>  (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
>
> but task is already holding lock:
>  (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&m->lock)->rlock){..-...}:
>        [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
>        [<ffffffff814ae8eb>] _raw_spin_lock+0x3b/0x50
>        [<ffffffffa043a3e9>] dm_blk_open+0x19/0x80 [dm_mod]
>        [<ffffffff811cbc41>] __blkdev_get+0xd1/0x4c0
>        [<ffffffff811cc215>] blkdev_get+0x1e5/0x380
>        [<ffffffff811cc45b>] blkdev_open+0x5b/0x80
>        [<ffffffff8118a12e>] do_dentry_open.isra.15+0x1de/0x2a0
>        [<ffffffff8118a300>] finish_open+0x30/0x40
>        [<ffffffff8119c44d>] do_last.isra.61+0xa5d/0x1200
>        [<ffffffff8119cca7>] path_openat+0xb7/0x620
>        [<ffffffff8119d88a>] do_filp_open+0x3a/0x90
>        [<ffffffff8118bb4e>] do_sys_open+0x12e/0x210
>        [<ffffffff8118bc4e>] SyS_open+0x1e/0x20
>        [<ffffffff814b84e2>] tracesys+0xd0/0xd5

I'm not sure what lockdep is trying to communicate for #1 here.
Not seeing where m->lock is taken in the above stack.  Nor does it
appear to take q->queue_lock.

> -> #0 (&(&q->__queue_lock)->rlock){-.-...}:
>        [<ffffffff810a4c46>] __lock_acquire+0x1716/0x1a00
>        [<ffffffff810a56c3>] lock_acquire+0x93/0x1c0
>        [<ffffffff814aea56>] _raw_spin_lock_irqsave+0x46/0x60
>        [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
>        [<ffffffffa077a692>] queue_if_no_path+0x72/0xc0 [dm_multipath]
>        [<ffffffffa077a6f9>] multipath_presuspend+0x19/0x20 [dm_multipath]
>        [<ffffffffa043d34a>] dm_table_presuspend_targets+0x4a/0x60 [dm_mod]
>        [<ffffffffa043ad5d>] dm_suspend+0x6d/0x1f0 [dm_mod]
>        [<ffffffffa043ff63>] dev_suspend+0x1c3/0x220 [dm_mod]
>        [<ffffffffa0440759>] ctl_ioctl+0x269/0x500 [dm_mod]
>        [<ffffffffa0440a03>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
>        [<ffffffff811a04c0>] do_vfs_ioctl+0x300/0x520
>        [<ffffffff811a0721>] SyS_ioctl+0x41/0x80
>        [<ffffffff814b84e2>] tracesys+0xd0/0xd5
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&m->lock)->rlock);
>                                lock(&(&q->__queue_lock)->rlock);
>                                lock(&(&m->lock)->rlock);
>   lock(&(&q->__queue_lock)->rlock);
>
>  *** DEADLOCK ***

I'm not seeing where q->queue_lock is held before m->lock is taken.

I did find that multipath_ioctl() does _not_ hold m->lock when calling
dm_table_run_md_queue_async() -- the other callers hold m->lock.

Any chance you can see if this patch silences lockdep?

---
 drivers/md/dm-mpath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index aa009e8..fa0f6cb 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1566,8 +1566,8 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 		}
 		if (m->pg_init_required)
 			__pg_init_all_paths(m);
-		spin_unlock_irqrestore(&m->lock, flags);
 		dm_table_run_md_queue_async(m->ti->table);
+		spin_unlock_irqrestore(&m->lock, flags);
 	}

 	return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Hannes Reinecke @ 2014-05-08  5:57 UTC (permalink / raw)
To: dm-devel

On 05/06/2014 11:31 AM, Bart Van Assche wrote:
> Hello,
>
> Has anyone else perhaps already run into this?
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.15.0-rc4-debug+ #1 Not tainted
> -------------------------------------------------------
> multipathd/10364 is trying to acquire lock:
>  (&(&q->__queue_lock)->rlock){-.-...}, at: [<ffffffffa043bff3>] dm_table_run_md_queue_async+0x33/0x60 [dm_mod]
>
> but task is already holding lock:
>  (&(&m->lock)->rlock){..-...}, at: [<ffffffffa077a647>] queue_if_no_path+0x27/0xc0 [dm_multipath]
>
> which lock already depends on the new lock.
>
> [... remainder of the lockdep report quoted in full; see the first message in this thread ...]

Hmm. No, I haven't seen it (yet).
Looking into it.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Bart Van Assche @ 2014-05-26 11:44 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: device-mapper development

On 05/08/14 07:57, Hannes Reinecke wrote:
> On 05/06/2014 11:31 AM, Bart Van Assche wrote:
>> Hello,
>>
>> Has anyone else perhaps already run into this?
>>
>> ======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 3.15.0-rc4-debug+ #1 Not tainted
>> [... remainder of the lockdep report quoted in full; see the first message in this thread ...]
>
> Hmm. No, I haven't seen it (yet).
> Looking into it.

Hello Hannes,

Not sure if this information helps, but apparently the above warning is
not a false positive but indicates a real deadlock. I have been able to
reproduce the above warning with 3.15-rc7. Shortly after the above
message appeared, the following was reported:

Watchdog detected hard LOCKUP on cpu 11
Call Trace:
 <NMI>  [<ffffffff814a7bd0>] dump_stack+0x4e/0x7a
 [<ffffffff8104bb8d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8104bbfc>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff810e89b5>] watchdog_overflow_callback+0xd5/0x120
 [<ffffffff8112268c>] __perf_event_overflow+0x9c/0x320
 [<ffffffff81017a50>] ? x86_perf_event_set_period+0xe0/0x150
 [<ffffffff811236b4>] perf_event_overflow+0x14/0x20
 [<ffffffff8101e5de>] intel_pmu_handle_irq+0x1ce/0x3c0
 [<ffffffff814affd1>] ? _raw_spin_unlock+0x31/0x50
 [<ffffffff812e6c3f>] ? ghes_copy_tofrom_phys+0x11f/0x210
 [<ffffffff814b28ab>] perf_event_nmi_handler+0x2b/0x50
 [<ffffffff814b1da2>] nmi_handle.isra.5+0xc2/0x380
 [<ffffffff814b1ce5>] ? nmi_handle.isra.5+0x5/0x380
 [<ffffffff814b21e5>] do_nmi+0x185/0x3e0
 [<ffffffff814b1287>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff8126548d>] ? delay_tsc+0x2d/0xc0
 [<ffffffff8126548d>] ? delay_tsc+0x2d/0xc0
 [<ffffffff8126548d>] ? delay_tsc+0x2d/0xc0
 <<EOE>>  [<ffffffff812653bf>] __delay+0xf/0x20
 [<ffffffff810a86b1>] do_raw_spin_lock+0xe1/0x140
 [<ffffffff814afece>] _raw_spin_lock_irqsave+0x4e/0x60
 [<ffffffffa0782500>] ? multipath_busy+0x20/0xf0 [dm_multipath]
 [<ffffffffa0782500>] multipath_busy+0x20/0xf0 [dm_multipath]
 [<ffffffffa024e56c>] dm_request_fn+0xcc/0x320 [dm_mod]
 [<ffffffff8123b013>] __blk_run_queue+0x33/0x40
 [<ffffffff8123b205>] blk_delay_work+0x25/0x40
 [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0
 [<ffffffff8106d748>] ? process_one_work+0x188/0x6c0
 [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0
 [<ffffffff8106dc80>] ? process_one_work+0x6c0/0x6c0
 [<ffffffff810758bd>] kthread+0xed/0x110
 [<ffffffff810757d0>] ? insert_kthread_work+0x80/0x80
 [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810757d0>] ? insert_kthread_work+0x80/0x80

Bart.
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Hannes Reinecke @ 2014-05-26 12:10 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development, Mike Snitzer

On 05/26/2014 01:44 PM, Bart Van Assche wrote:
> Hello Hannes,
>
> Not sure if this information helps, but apparently the above warning is
> not a false positive but indicates a real deadlock. I have been able to
> reproduce the above warning with 3.15-rc7. Shortly after the above
> message appeared, the following was reported:
>
> Watchdog detected hard LOCKUP on cpu 11
> [... watchdog call trace quoted in full; see the previous message ...]

Mike Snitzer had a patch in his device-mapper tree:

dm mpath: fix lock order inconsistency in multipath_ioctl (2014-05-14 16:12:17 -0400)

I was sort of hoping that would address this issue.
Can you check?

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Bart Van Assche @ 2014-05-26 12:20 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: device-mapper development, Mike Snitzer

On 05/26/14 14:10, Hannes Reinecke wrote:
> Mike Snitzer had a patch in his device-mapper tree:
>
> dm mpath: fix lock order inconsistency in multipath_ioctl (2014-05-14 12:17 -0400)
>
> I was sort of hoping that would address this issue.
> Can you check?

Hello Hannes,

Is it possible that that patch already got included in v3.15-rc6, and
hence that that patch was included in my test?

$ git log v3.15-rc5..v3.15-rc6 | grep 'dm mpath: fix lock order inconsistency in multipath_ioctl'
    dm mpath: fix lock order inconsistency in multipath_ioctl
    dm mpath: fix lock order inconsistency in multipath_ioctl

Bart.
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Hannes Reinecke @ 2014-05-26 12:29 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development, Mike Snitzer

On 05/26/2014 02:20 PM, Bart Van Assche wrote:
> Hello Hannes,
>
> Is it possible that that patch already got included in v3.15-rc6, and
> hence that that patch was included in my test?
>
> $ git log v3.15-rc5..v3.15-rc6 | grep 'dm mpath: fix lock order inconsistency in multipath_ioctl'
>     dm mpath: fix lock order inconsistency in multipath_ioctl
>     dm mpath: fix lock order inconsistency in multipath_ioctl

Could be.

Okay, I'll be cross-checking.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
hare@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Hannes Reinecke @ 2014-05-26 12:44 UTC
To: Bart Van Assche; +Cc: dm-devel

On 05/26/2014 02:29 PM, Hannes Reinecke wrote:
> [...]
> Okay, I'll be cross-checking.

Can you check if this makes lockdep happy?

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index aa009e8..40b3036 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -445,11 +445,11 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	else
 		m->saved_queue_if_no_path = queue_if_no_path;
 	m->queue_if_no_path = queue_if_no_path;
-	if (!m->queue_if_no_path)
-		dm_table_run_md_queue_async(m->ti->table);
-
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	if (!queue_if_no_path)
+		dm_table_run_md_queue_async(m->ti->table);
+
 	return 0;
 }

Cheers,
Hannes
--
Dr. Hannes Reinecke		zSeries & Storage
hare@suse.de			+49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Hannes Reinecke @ 2014-05-26 12:51 UTC
To: Bart Van Assche; +Cc: dm-devel

On 05/26/2014 02:44 PM, Hannes Reinecke wrote:
> Can you check if this makes lockdep happy?
>
> [inline diff snipped]

Or, better still, try the attached patch. There's one more instance
where lockdep might complain.

Cheers,
Hannes
--
Dr. Hannes Reinecke		zSeries & Storage
hare@suse.de			+49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

[-- Attachment #2: 0001-dm-multipath-Fixup-lockdep-warning.patch --]

From 44bd0601e47f5cc5d26b679550f0a5a2f2c3f487 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Mon, 26 May 2014 14:45:39 +0200
Subject: [PATCH] dm-multipath: Fixup lockdep warning

lockdep complains about a circular locking dependency. And indeed, we
need to release the lock before calling dm_table_run_md_queue_async().

Reported-by: Bart van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/md/dm-mpath.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index aa009e8..ebfa411 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -445,11 +445,11 @@ static int queue_if_no_path(struct multipath *m, unsigned queue_if_no_path,
 	else
 		m->saved_queue_if_no_path = queue_if_no_path;
 	m->queue_if_no_path = queue_if_no_path;
-	if (!m->queue_if_no_path)
-		dm_table_run_md_queue_async(m->ti->table);
-
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	if (!queue_if_no_path)
+		dm_table_run_md_queue_async(m->ti->table);
+
 	return 0;
 }
 
@@ -954,7 +954,7 @@ out:
  */
 static int reinstate_path(struct pgpath *pgpath)
 {
-	int r = 0;
+	int r = 0, run_queue = 0;
 	unsigned long flags;
 	struct multipath *m = pgpath->pg->m;
 
@@ -978,7 +978,7 @@ static int reinstate_path(struct pgpath *pgpath)
 
 	if (!m->nr_valid_paths++) {
 		m->current_pgpath = NULL;
-		dm_table_run_md_queue_async(m->ti->table);
+		run_queue = 1;
 	} else if (m->hw_handler_name && (m->current_pg == pgpath->pg)) {
 		if (queue_work(kmpath_handlerd, &pgpath->activate_path.work))
 			m->pg_init_in_progress++;
@@ -991,6 +991,8 @@ static int reinstate_path(struct pgpath *pgpath)
 
 out:
 	spin_unlock_irqrestore(&m->lock, flags);
 
+	if (run_queue)
+		dm_table_run_md_queue_async(m->ti->table);
 	return r;
 }
-- 
1.7.12.4
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Bart Van Assche @ 2014-05-26 14:54 UTC
To: Hannes Reinecke; +Cc: dm-devel

On 05/26/14 14:51, Hannes Reinecke wrote:
> Or, better still, try the attached patch.
> There's one more instance where lockdep might complain.

Does anyone know whether it's safe to revert commit 4cdd2ad780?

Anyway, I ran a test with v3.15-rc7 + your patch + 4cdd2ad780 reverted,
and that test ran fine. I didn't encounter any lockups nor any lockdep
complaints. The test was a fio I/O verification run on top of a
dm-multipath device. That multipath device was created on top of several
SCSI device nodes created by the ib_srp driver. About once every 30
seconds a cable removal / reinsertion was simulated via the ibportstate
command.

Bart.
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Mike Snitzer @ 2014-05-27 13:22 UTC
To: Bart Van Assche; +Cc: dm-devel

On Mon, May 26 2014 at 10:54am -0400, Bart Van Assche <bvanassche@acm.org> wrote:
> On 05/26/14 14:51, Hannes Reinecke wrote:
>> Or, better still, try the attached patch.
>> There's one more instance where lockdep might complain.
>
> Does anyone know whether it's safe to revert commit 4cdd2ad780?

Yes, it should be reverted, otherwise multipath_ioctl is using an
inconsistent lock order. I should've waited for test confirmation before
sending the previous "fix" upstream. It made the lock order consistent
(but consistently wrong).

> Anyway, I ran a test with v3.15-rc7 + your patch + 4cdd2ad780 reverted,
> and that test ran fine. I didn't encounter any lockups nor any lockdep
> complaints. The test was a fio I/O verification run on top of a
> dm-multipath device. That multipath device was created on top of several
> SCSI device nodes created by the ib_srp driver. About once every 30
> seconds a cable removal / reinsertion was simulated via the ibportstate
> command.

Great, thanks. I'll try to get this queued for Linus this week.
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Mike Snitzer @ 2014-05-27 14:49 UTC
To: Bart Van Assche; +Cc: dm-devel

On Tue, May 27 2014 at 9:22am -0400, Mike Snitzer <snitzer@redhat.com> wrote:
> [...]
> Great, thanks. I'll try to get this queued for Linus this week.

Done, please see:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-linus&id=63d832c30142cdceb478b1cac7d943d83b95b2dc
* Re: 3.15-rc4: circular locking dependency triggered by dm-multipath
From: Bart Van Assche @ 2014-05-27 16:09 UTC
To: Mike Snitzer; +Cc: dm-devel

On 05/27/14 16:49, Mike Snitzer wrote:
> Done, please see:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-linus&id=63d832c30142cdceb478b1cac7d943d83b95b2dc

Hello Mike,

My test passes with that tree.

Thanks,
Bart.
end of thread, other threads: [~2014-05-27 16:09 UTC | newest]

Thread overview: 15+ messages
2014-05-06  9:31 3.15-rc4: circular locking dependency triggered by dm-multipath Bart Van Assche
2014-05-06 13:22 ` Mike Snitzer
2014-05-06 13:46   ` Bart Van Assche
2014-05-07 20:53     ` Mike Snitzer
2014-05-08  5:57       ` Hannes Reinecke
2014-05-26 11:44         ` Bart Van Assche
2014-05-26 12:10           ` Hannes Reinecke
2014-05-26 12:20             ` Bart Van Assche
2014-05-26 12:29               ` Hannes Reinecke
2014-05-26 12:44                 ` Hannes Reinecke
2014-05-26 12:51                   ` Hannes Reinecke
2014-05-26 14:54                     ` Bart Van Assche
2014-05-27 13:22                       ` Mike Snitzer
2014-05-27 14:49                         ` Mike Snitzer
2014-05-27 16:09                           ` Bart Van Assche