From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ewan D. Milne"
Subject: Re: [PATCH 2/2] scsi_dh_alua: Fix a recently introduced deadlock
Date: Tue, 29 Mar 2016 10:55:55 -0400
Message-ID: <1459263355.30035.16.camel@localhost.localdomain>
References: <56F9740A.4060100@sandisk.com> <56F9746C.40101@sandisk.com>
Reply-To: emilne@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:60128 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752622AbcC2Oz5 (ORCPT ); Tue, 29 Mar 2016 10:55:57 -0400
In-Reply-To: <56F9746C.40101@sandisk.com>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Bart Van Assche
Cc: James Bottomley, "Martin K. Petersen", Hannes Reinecke,
	Christoph Hellwig, "linux-scsi@vger.kernel.org"

On Mon, 2016-03-28 at 11:14 -0700, Bart Van Assche wrote:
> While retesting the SRP initiator I ran the command "rmmod mlx4_ib"
> while I/O was in progress. That command triggers SCSI device removal
> indirectly. Avoid that this action triggers the following deadlock:
>
> =================================
> [ INFO: inconsistent lock state ]
> 4.6.0-rc0-dbg+ #2 Tainted: G O
> ---------------------------------
> inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> multipathd/484 [HC0[0]:SC0[0]:HE1:SE1] takes:
>  (&(&pg->lock)->rlock){+.?...}, at: [] alua_bus_detach+0x52/0xa0 [scsi_dh_alua]
> {IN-SOFTIRQ-W} state was registered at:
>   [] __lock_acquire+0x7e9/0x1ad0
>   [] lock_acquire+0x60/0x80
>   [] _raw_spin_lock_irqsave+0x3e/0x60
>   [] alua_rtpg_queue+0x41/0x1d0 [scsi_dh_alua]
>   [] alua_check+0xe1/0x220 [scsi_dh_alua]
>   [] alua_check_sense+0x99/0xb0 [scsi_dh_alua]
>   [] scsi_check_sense+0x71/0x3f0
>   [] scsi_decide_disposition+0x18b/0x1d0
>   [] scsi_softirq_done+0x52/0x140
>   [] blk_done_softirq+0x52/0x90
>   [] __do_softirq+0x10f/0x230
>   [] irq_exit+0xa8/0xb0
>   [] do_IRQ+0x65/0x110
>   [] ret_from_intr+0x0/0x19
>   [] kmem_cache_alloc+0x151/0x190
>   [] create_object+0x34/0x2d0
>   [] kmemleak_alloc_percpu+0x56/0xd0
>   [] pcpu_alloc+0x38d/0x660
>   [] __alloc_percpu_gfp+0xd/0x10
>   [] __percpu_counter_init+0x55/0xb0
>   [] blkg_alloc+0x79/0x230
>   [] blkcg_init_queue+0x26/0x1d0
>   [] blk_alloc_queue_node+0x27d/0x2e0
>   [] dm_create+0x20c/0x570 [dm_mod]
>   [] dev_create+0x56/0x2c0 [dm_mod]
>   [] ctl_ioctl+0x26e/0x520 [dm_mod]
>   [] dm_ctl_ioctl+0xe/0x20 [dm_mod]
>   [] do_vfs_ioctl+0x8e/0x660
>   [] SyS_ioctl+0x3c/0x70
>   [] entry_SYSCALL_64_fastpath+0x1c/0xac
> irq event stamp: 4290931
> hardirqs last enabled at (4290931): [ 1662.892772]
> [] _raw_spin_unlock_irqrestore+0x31/0x50
> hardirqs last disabled at (4290930): [] _raw_spin_lock_irqsave+0x17/0x60
> softirqs last enabled at (4290774): [] __do_softirq+0x1cb/0x230
> softirqs last disabled at (4289831): [] irq_exit+0xa8/0xb0
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&pg->lock)->rlock);
>
>     lock(&(&pg->lock)->rlock);
>
>  *** DEADLOCK ***
>
> 2 locks held by multipathd/484:
>  #0: (&bdev->bd_mutex){+.+.+.}, at: [] __blkdev_put+0x33/0x360
>  #1: (sd_ref_mutex){+.+...}, at: [] scsi_disk_put+0x1c/0x40
>
> stack backtrace:
> CPU: 6 PID: 484 Comm: multipathd Tainted: G O 4.6.0-rc0-dbg+ #2
> Call Trace:
>  [] dump_stack+0x67/0x92
>  [] print_usage_bug+0x215/0x240
>  [] mark_lock+0x54a/0x610
>  [] __lock_acquire+0x845/0x1ad0
>  [] lock_acquire+0x60/0x80
>  [] _raw_spin_lock+0x33/0x50
>  [] alua_bus_detach+0x52/0xa0 [scsi_dh_alua]
>  [] scsi_dh_release_device+0x17/0x50
>  [] scsi_device_dev_release_usercontext+0x2a/0x120
>  [] execute_in_process_context+0x80/0x90
>  [] scsi_device_dev_release+0x17/0x20
>  [] device_release+0x2d/0x90
>  [] kobject_release+0x7a/0x190
>  [] kobject_put+0x26/0x50
>  [] put_device+0x12/0x20
>  [] scsi_device_put+0x26/0x30
>  [] scsi_disk_put+0x2d/0x40
>  [] sd_release+0x48/0xb0
>  [] __blkdev_put+0x29e/0x360
>  [] blkdev_put+0x49/0x170
>  [] blkdev_close+0x20/0x30
>  [] __fput+0xe8/0x1f0
>  [] ____fput+0x9/0x10
>  [] task_work_run+0x6e/0xa0
>  [] exit_to_usermode_loop+0xa9/0xb0
>  [] syscall_return_slowpath+0xb0/0xc0
>  [] entry_SYSCALL_64_fastpath+0xaa/0xac
>
> Fixes: cb0a168cb6b8 (scsi_dh_alua: update 'access_state' field)
> Signed-off-by: Bart Van Assche
> Cc: Hannes Reinecke
> Signed-off-by: Bart Van Assche
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index a404a41..8eaed05 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -1112,9 +1112,9 @@ static void alua_bus_detach(struct scsi_device *sdev)
>  	h->sdev = NULL;
>  	spin_unlock(&h->pg_lock);
>  	if (pg) {
> -		spin_lock(&pg->lock);
> +		spin_lock_irq(&pg->lock);
>  		list_del_rcu(&h->node);
> -		spin_unlock(&pg->lock);
> +		spin_unlock_irq(&pg->lock);
>  		kref_put(&pg->kref, release_port_group);
>  	}
>  	sdev->handler_data = NULL;

Reviewed-by: Ewan D. Milne