All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] Oops when SCSI device under multipath is removed
@ 2011-08-10  4:29 Jun'ichi Nomura
  2011-08-10 19:52 ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Jun'ichi Nomura @ 2011-08-10  4:29 UTC (permalink / raw)
  To: James.Bottomley, jaxboe
  Cc: roland, stern, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

Hi James,

With the attached shell script, blk_insert_cloned_request will cause
oops with 3.1-rc1. (This is a regression introduced in 2.6.39)

The regression was introduced by 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b,
"[SCSI] put stricter guards on queue dead checks".
Part of the commit has moved scsi_free_queue(), which calls
blk_cleanup_queue(), to scsi_remove_device(), which is called while
the device is still open.
The oops occurs because blk_insert_cloned_request() is called
after blk_cleanup_queue() is called (which frees elevator
and turns on QUEUE_FLAG_DEAD).

2 patches have been proposed but neither of them included:
  1) Add QUEUE_FLAG_DEAD check in blk_insert_cloned_request()
     https://lkml.org/lkml/2011/7/8/457
  2) SCSI to call blk_cleanup_queue() from device's ->release() callback
     (before 2.6.39, it used to work like this)
     https://lkml.org/lkml/2011/7/2/106

Both work fine for this test case
but it seems it's not possible to make the patch 1) safe because
QUEUE_FLAG_DEAD check is racy. There is a window between the check
and the use of elevator.
So I think the patch 2) is better. Could you please consider to include it?

   BUG: unable to handle kernel NULL pointer dereference at           (null)
   IP: [<          (null)>]           (null)
   PGD 29c015067 PUD 29855a067 PMD 0 
   Oops: 0010 [#1] SMP 
   CPU 2 
   ..
   Pid: 6125, comm: dd Not tainted 3.1.0-rc1 #1
   RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
   RSP: 0018:ffff880288377990  EFLAGS: 00010092
   RAX: ffff88029c3e4e00 RBX: ffff88029a324cc8 RCX: 0000000000000240
   RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff88029a324cc8
   RBP: ffff8802883779a8 R08: 0000000000000000 R09: 0000000000000000
   R10: 0000000000000002 R11: 0000000000000000 R12: ffff88029a324cc8
   R13: 0000000000000002 R14: 0000000000000002 R15: ffff88029a46c180
   FS:  00007fc4daf3d700(0000) GS:ffff8802afa00000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 0000000000000000 CR3: 000000029abe4000 CR4: 00000000000006e0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
   Process dd (pid: 6125, threadinfo ffff880288376000, task ffff8802879ac260)
   Stack:
   ffffffff81223d8a ffff8802883779d8 ffff88029b5a6018 ffff8802883779d8
   ffffffff81224bdf ffff88029b5a6018 ffff88029a324cc8 0000000000000002
   ffff88029a46c000 ffff880288377a08 ffffffff8122974d ffff880288377a08
   Call Trace:
   [<ffffffff81223d8a>] ? elv_drain_elevator+0x2a/0x80
   [<ffffffff81224bdf>] __elv_add_request+0x9f/0x200
   [<ffffffff8122974d>] add_acct_request+0x3d/0x50
   [<ffffffff812297c5>] blk_insert_cloned_request+0x65/0x90
   [<ffffffffa01870ee>] dm_dispatch_request+0x3e/0x70 [dm_mod]
   [<ffffffffa0187e71>] dm_request_fn+0x191/0x290 [dm_mod]
   [<ffffffff8122571e>] __blk_run_queue+0x1e/0x20
   [<ffffffff81228a4e>] queue_unplugged+0x4e/0xc0
   [<ffffffff81228c76>] blk_flush_plug_list+0x1b6/0x210
   [<ffffffff814c62a5>] io_schedule+0x75/0xd0
   [<ffffffff811a03b4>] __blockdev_direct_IO+0x924/0xb20
   [<ffffffff8119dc77>] blkdev_direct_IO+0x57/0x60
   [<ffffffff8119cc60>] ? blkdev_get_block+0x70/0x70
   [<ffffffff8110ce35>] generic_file_aio_read+0x6f5/0x770
   [<ffffffff814c971b>] ? _raw_spin_unlock+0x2b/0x40
   [<ffffffff81131cda>] ? handle_pte_fault+0x40a/0x9c0
   [<ffffffff8101aa53>] ? native_sched_clock+0x13/0x60
   [<ffffffff81019e99>] ? sched_clock+0x9/0x10
   [<ffffffff8116746a>] do_sync_read+0xda/0x120
   [<ffffffff811f943e>] ? security_file_permission+0x8e/0x90
   [<ffffffff81167bed>] vfs_read+0xcd/0x190
   [<ffffffff81167db4>] sys_read+0x54/0xa0
   [<ffffffff814d1582>] system_call_fastpath+0x16/0x1b
   Code:  Bad RIP value.
   RIP  [<          (null)>]           (null)
   RSP <ffff880288377990>
   CR2: 0000000000000000
   ---[ end trace 4abd28efc271910a ]---

-- 
Jun'ichi Nomura, NEC Corporation



#!/bin/bash

dev=$1
if [ -z "$dev" ]; then
	echo "usage: $0 <sdX>    where <sdX> is unused SCSI device"
	exit 1
fi
if [ ! -e /dev/$dev -o ! -d /sys/block/$dev ] ; then
	echo "device $dev not found"
	exit 1
fi
if [ $dev = ${dev#sd} ]; then
	echo "device $dev is not SCSI device"
	exit 1
fi
echo "Use SCSI device $dev"

mapname=deadmp
maptab="0 $(blockdev --getsz /dev/$dev) multipath 0 0 1 1 round-robin 1 1 1 1 /dev/$dev 1"
modprobe dm-multipath
modprobe dm-round-robin
echo "Creating multipath device: $mapname"
echo $maptab | dmsetup create $mapname
if [ $? -ne 0 ]; then
	echo "failed to create mpath device"
	exit 1
fi

echo "Deleting $dev"
echo 1 > /sys/block/$dev/device/delete
sleep 1

echo "Try I/O on $dev"
dd if=/dev/mapper/${mapname} of=/dev/null bs=4k count=1 iflag=direct

sleep 10
dmsetup remove $mapname

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-10  4:29 [BUG] Oops when SCSI device under multipath is removed Jun'ichi Nomura
@ 2011-08-10 19:52 ` James Bottomley
  2011-08-11  0:24   ` Jun'ichi Nomura
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2011-08-10 19:52 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: jaxboe, roland, stern, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
> Hi James,
> 
> With the attached shell script, blk_insert_cloned_request will cause
> oops with 3.1-rc1. (This is a regression introduced in 2.6.39)
> 
> The regression was introduced by 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b,
> "[SCSI] put stricter guards on queue dead checks".
> Part of the commit has moved scsi_free_queue(), which calls
> blk_cleanup_queue(), to scsi_remove_device(), which is called while
> the device is still open.
> The oops occurs because blk_insert_cloned_request() is called
> after blk_cleanup_queue() is called (which frees elevator
> and turns on QUEUE_FLAG_DEAD).
> 
> 2 patches have been proposed but neither of them included:
>   1) Add QUEUE_FLAG_DEAD check in blk_insert_cloned_request()
>      https://lkml.org/lkml/2011/7/8/457
>   2) SCSI to call blk_cleanup_queue() from device's ->release() callback
>      (before 2.6.39, it used to work like this)
>      https://lkml.org/lkml/2011/7/2/106

Well, they both have documented objections.  I asked why we destroy the
elevator in the del case and didn't get any traction, so let me show the
actual patch which should fix all of these issues.

Is there a good reason for not doing this as a bug fix now?

James

---
diff --git a/block/blk-core.c b/block/blk-core.c
index b627558..cc72b7d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,11 +367,6 @@ void blk_cleanup_queue(struct request_queue *q)
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
-		elevator_exit(q->elevator);
-
-	blk_throtl_exit(q);
-
 	blk_put_queue(q);
 }
 EXPORT_SYMBOL(blk_cleanup_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_sync_queue(q);
 
+	if (q->elevator)
+		elevator_exit(q->elevator);
+
+	blk_throtl_exit(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);
 



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-10 19:52 ` James Bottomley
@ 2011-08-11  0:24   ` Jun'ichi Nomura
  2011-08-11  3:01     ` Jun'ichi Nomura
  0 siblings, 1 reply; 14+ messages in thread
From: Jun'ichi Nomura @ 2011-08-11  0:24 UTC (permalink / raw)
  To: James Bottomley
  Cc: jaxboe, roland, stern, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

Hi James,

On 08/11/11 04:52, James Bottomley wrote:
> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
>> 2 patches have been proposed but neither of them included:
>>   1) Add QUEUE_FLAG_DEAD check in blk_insert_cloned_request()
>>      https://lkml.org/lkml/2011/7/8/457
>>   2) SCSI to call blk_cleanup_queue() from device's ->release() callback
>>      (before 2.6.39, it used to work like this)
>>      https://lkml.org/lkml/2011/7/2/106
> 
> Well, they both have documented objections.  I asked why we destroy the
> elevator in the del case and didn't get any traction, so let me show the
> actual patch which should fix all of these issues.
> 
> Is there a good reason for not doing this as a bug fix now?
> 
> James
> 
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b627558..cc72b7d 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,11 +367,6 @@ void blk_cleanup_queue(struct request_queue *q)
>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>  	mutex_unlock(&q->sysfs_lock);
>  
> -	if (q->elevator)
> -		elevator_exit(q->elevator);
> -
> -	blk_throtl_exit(q);
> -
>  	blk_put_queue(q);
>  }
>  EXPORT_SYMBOL(blk_cleanup_queue);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
>  
>  	blk_sync_queue(q);
>  
> +	if (q->elevator)
> +		elevator_exit(q->elevator);
> +
> +	blk_throtl_exit(q);
> +
>  	if (rl->rq_pool)
>  		mempool_destroy(rl->rq_pool);

I think it doesn't work because elevator_exit() and
blk_throtl_exit() take &q->queue_lock, which may be freed
by LLD after blk_cleanup_queue, before blk_release_queue.

Vivek's comment here describes it in detail:
https://lkml.org/lkml/2011/7/12/279

Vivek Goyal wrote:
> I thought a driver could either rely on spin lock provided by request
> queue or override that by providing its own spinlock.
> 
> blk_init_allocated_queue_node()
>         /* Override internal queue lock with supplied lock pointer */
>         if (lock)
>                 q->queue_lock           = lock;
> So if driver calls blk_cleanup_queue() and drops its reference on queue, then
> it should be free to release any memory it has allocated for spinlock.
> So though queue is around there are no gurantees that q->queue_lock is
> still around. That memory might have been freed by driver and reused.
> 
> I see many drivers are providing their own locks. Some samples from
> drivers/block.
> 
> /virtio_blk.c:	q = vblk->disk->queue = blk_init_queue(do_virtblk_request,
> &vblk->lock);
> ./xd.c:	xd_queue = blk_init_queue(do_xd_request, &xd_lock);
> ./cpqarray.c:	q = blk_init_queue(do_ida_request, &hba[i]->lock);
> ./sx8.c:	q = blk_init_queue(carm_rq_fn, &host->lock);
> ./sx8.c:	q = blk_init_queue(carm_oob_rq_fn, &host->lock);
> ./floppy.c:	disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);
> ./viodasd.c:	q = blk_init_queue(do_viodasd_request, &d->q_lock);
> ./cciss.c:	disk->queue = blk_init_queue(do_cciss_request, &h->lock);
> ./hd.c:	hd_queue = blk_init_queue(do_hd_request, &hd_lock);
> ./DAC960.c:  	RequestQueue = blk_init_queue(DAC960_RequestFunction,&Controller->queue_lock);
> ./z2ram.c:    z2_queue = blk_init_queue(do_z2_request, &z2ram_lock);
> ./amiflop.c:	disk->queue = blk_init_queue(do_fd_request, &amiflop_lock);
> ./xen-blkfront.c:	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
> ./paride/pd.c:	pd_queue = blk_init_queue(do_pd_request, &pd_lock);
> ./paride/pf.c:	pf_queue = blk_init_queue(do_pf_request, &pf_spin_lock);
> ./paride/pcd.c:	pcd_queue = blk_init_queue(do_pcd_request, &pcd_lock);
> ./mg_disk.c:	host->breq = blk_init_queue(mg_request_poll, &host->lock);
> ./mg_disk.c:	host->breq = blk_init_queue(mg_request, &host->lock);
> ./rbd.c:	q = blk_init_queue(rbd_rq_fn, &rbd_dev->lock);
> ./sunvdc.c:	q = blk_init_queue(do_vdc_request, &port->vio.lock);
> ./swim.c:	swd->queue = blk_init_queue(do_fd_request, &swd->lock);
> ./xsysace.c:	ace->queue = blk_init_queue(ace_request, &ace->lock);
> ./osdblk.c:	q = blk_init_queue(osdblk_rq_fn, &osdev->lock);
> ./ps3disk.c:	queue = blk_init_queue(ps3disk_request, &priv->lock);
> ./swim3.c:	swim3_queue = blk_init_queue(do_fd_request, &swim3_lock);
> ./ub.c:	if ((q = blk_init_queue(ub_request_fn, sc->lock)) == NULL)
> ./nbd.c:	disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);

# SCSI doesn't hit this problem because it uses
# queue_lock embedded in struct request_queue.


Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-11  0:24   ` Jun'ichi Nomura
@ 2011-08-11  3:01     ` Jun'ichi Nomura
  2011-08-11 14:33       ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Jun'ichi Nomura @ 2011-08-11  3:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: jaxboe, roland, stern, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

Hi James,

On 08/11/11 09:24, Jun'ichi Nomura wrote:
> On 08/11/11 04:52, James Bottomley wrote:
>> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
>>>   2) SCSI to call blk_cleanup_queue() from device's ->release() callback
>>>      (before 2.6.39, it used to work like this)
>>>      https://lkml.org/lkml/2011/7/2/106
>>
>> Well, they both have documented objections.  I asked why we destroy the
>> elevator in the del case and didn't get any traction, so let me show the
>> actual patch which should fix all of these issues.
>>
>> Is there a good reason for not doing this as a bug fix now?
...
> I think it doesn't work because elevator_exit() and
> blk_throtl_exit() take &q->queue_lock, which may be freed
> by LLD after blk_cleanup_queue, before blk_release_queue.

If the reason you moved scsi_free_queue into scsi_remove_device
is marking the queue dead, how about the following patch?
Do you think it's acceptable?

Jun'ichi Nomura, NEC Corporation


Add blk_kill_queue() for drivers which want to mark the queue dead early.

blk_cleanup_queue() is an interface for LLD to notify block layer
that LLD no longer needs the queue.
Since q->queue_lock may point to a structure in LLD which is freed
after blk_cleanup_queue, blk_cleanup_queue() frees subordinate structures
like elevator, which uses q->queue_lock, to avoid invalid reference.

OTOH, LLD like SCSI wants to just mark the queue dead earlier in tear
down phase.

So this patch factors out the early part of blk_cleanup_queue into
blk_kill_queue for such drivers.

--- linux-3.1-rc1/include/linux/blkdev.h.orig	2011-08-11 11:19:52.585280223 +0900
+++ linux-3.1-rc1/include/linux/blkdev.h	2011-08-11 11:20:09.482279763 +0900
@@ -804,6 +804,7 @@ extern struct request_queue *blk_init_al
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
 extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
 						      request_fn_proc *, spinlock_t *);
+extern void blk_kill_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
--- linux-3.1-rc1/block/blk-core.c.orig	2011-08-10 09:46:06.014043123 +0900
+++ linux-3.1-rc1/block/blk-core.c	2011-08-11 11:19:34.551280697 +0900
@@ -347,6 +347,17 @@ void blk_put_queue(struct request_queue 
 }
 EXPORT_SYMBOL(blk_put_queue);
 
+void blk_kill_queue(struct request_queue *q)
+{
+	blk_sync_queue(q);
+
+	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
+	mutex_lock(&q->sysfs_lock);
+	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
+	mutex_unlock(&q->sysfs_lock);
+}
+EXPORT_SYMBOL(blk_kill_queue);
+
 /*
  * Note: If a driver supplied the queue lock, it should not zap that lock
  * unexpectedly as some queue cleanup components like elevator_exit() and
@@ -360,12 +371,7 @@ void blk_cleanup_queue(struct request_qu
 	 * are done before moving on. Going into this function, we should
 	 * not have processes doing IO to this device.
 	 */
-	blk_sync_queue(q);
-
-	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
-	mutex_lock(&q->sysfs_lock);
-	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
-	mutex_unlock(&q->sysfs_lock);
+	blk_kill_queue(q);
 
 	if (q->elevator)
 		elevator_exit(q->elevator);
--- linux-3.1-rc1/drivers/scsi/scsi_sysfs.c.orig	2011-08-09 18:48:13.676485115 +0900
+++ linux-3.1-rc1/drivers/scsi/scsi_sysfs.c	2011-08-11 11:21:07.923277456 +0900
@@ -322,6 +322,7 @@ static void scsi_device_dev_release_user
 		kfree(evt);
 	}
 
+	scsi_free_queue(sdev->request_queue);
 	blk_put_queue(sdev->request_queue);
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
@@ -937,7 +938,7 @@ void __scsi_remove_device(struct scsi_de
 	sdev->request_queue->queuedata = NULL;
 
 	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
+	blk_kill_queue(sdev->request_queue);
 	put_device(dev);
 }
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-11  3:01     ` Jun'ichi Nomura
@ 2011-08-11 14:33       ` James Bottomley
  2011-08-11 14:59           ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2011-08-11 14:33 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: jaxboe, roland, stern, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Thu, 2011-08-11 at 12:01 +0900, Jun'ichi Nomura wrote:
> Hi James,
> 
> On 08/11/11 09:24, Jun'ichi Nomura wrote:
> > On 08/11/11 04:52, James Bottomley wrote:
> >> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
> >>>   2) SCSI to call blk_cleanup_queue() from device's ->release() callback
> >>>      (before 2.6.39, it used to work like this)
> >>>      https://lkml.org/lkml/2011/7/2/106
> >>
> >> Well, they both have documented objections.  I asked why we destroy the
> >> elevator in the del case and didn't get any traction, so let me show the
> >> actual patch which should fix all of these issues.
> >>
> >> Is there a good reason for not doing this as a bug fix now?
> ...
> > I think it doesn't work because elevator_exit() and
> > blk_throtl_exit() take &q->queue_lock, which may be freed
> > by LLD after blk_cleanup_queue, before blk_release_queue.
> 
> If the reason you moved scsi_free_queue into scsi_remove_device
> is marking the queue dead, how about the following patch?
> Do you think it's acceptable?

Well, it's just hiding the problem.  The essential problem is that only
block has the correctly refcounted knowledge to know the last release of
the queue reference.  Until that time, the holder of the reference can
use the queue regardless of whether blk_cleanup_queue() has been called.
This is the race you complain about since use of the queue involves the
lock which should be guarded by QUEUE_DEAD checks.

This is essentially unfixable with function calls.  The only way to fix
it is to have a callback model for freeing the external lock.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-11 14:33       ` James Bottomley
@ 2011-08-11 14:59           ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2011-08-11 14:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jun'ichi Nomura, jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Thu, 11 Aug 2011, James Bottomley wrote:

> > If the reason you moved scsi_free_queue into scsi_remove_device
> > is marking the queue dead, how about the following patch?
> > Do you think it's acceptable?
> 
> Well, it's just hiding the problem.  The essential problem is that only
> block has the correctly refcounted knowledge to know the last release of
> the queue reference.  Until that time, the holder of the reference can
> use the queue regardless of whether blk_cleanup_queue() has been called.
> This is the race you complain about since use of the queue involves the
> lock which should be guarded by QUEUE_DEAD checks.
> 
> This is essentially unfixable with function calls.  The only way to fix
> it is to have a callback model for freeing the external lock.

Assuming the queue is associated with a device, the queue could take a
reference to the device, dropping that reference when the queue is
freed.  Then the lock could safely be freed at the same time as the 
device.

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
@ 2011-08-11 14:59           ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2011-08-11 14:59 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jun'ichi Nomura, jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Thu, 11 Aug 2011, James Bottomley wrote:

> > If the reason you moved scsi_free_queue into scsi_remove_device
> > is marking the queue dead, how about the following patch?
> > Do you think it's acceptable?
> 
> Well, it's just hiding the problem.  The essential problem is that only
> block has the correctly refcounted knowledge to know the last release of
> the queue reference.  Until that time, the holder of the reference can
> use the queue regardless of whether blk_cleanup_queue() has been called.
> This is the race you complain about since use of the queue involves the
> lock which should be guarded by QUEUE_DEAD checks.
> 
> This is essentially unfixable with function calls.  The only way to fix
> it is to have a callback model for freeing the external lock.

Assuming the queue is associated with a device, the queue could take a
reference to the device, dropping that reference when the queue is
freed.  Then the lock could safely be freed at the same time as the 
device.

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-11 14:59           ` Alan Stern
  (?)
@ 2011-08-11 15:05           ` James Bottomley
  2011-08-11 15:16               ` Alan Stern
  -1 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2011-08-11 15:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jun'ichi Nomura, jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Thu, 2011-08-11 at 10:59 -0400, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
> 
> > > If the reason you moved scsi_free_queue into scsi_remove_device
> > > is marking the queue dead, how about the following patch?
> > > Do you think it's acceptable?
> > 
> > Well, it's just hiding the problem.  The essential problem is that only
> > block has the correctly refcounted knowledge to know the last release of
> > the queue reference.  Until that time, the holder of the reference can
> > use the queue regardless of whether blk_cleanup_queue() has been called.
> > This is the race you complain about since use of the queue involves the
> > lock which should be guarded by QUEUE_DEAD checks.
> > 
> > This is essentially unfixable with function calls.  The only way to fix
> > it is to have a callback model for freeing the external lock.
> 
> Assuming the queue is associated with a device, the queue could take a
> reference to the device, dropping that reference when the queue is
> freed.  Then the lock could safely be freed at the same time as the 
> device.

If that assumption is correct, there's no point refcounting the queue at
all because its use is entirely subordinated to the lifecycle of the
associated device.  Plus all the wittering about my previous patch is
pointless, because blk_cleanup_queue() has to do the final put of the
queue in the lock free path (otherwise the assumption is violated).

However, much as I'd like to accept this rosy view, the original oops
that started all of this in 2.6.38 was someone caught something with a
reference to a SCSI queue after the device release function had been
called.

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-11 15:05           ` James Bottomley
@ 2011-08-11 15:16               ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2011-08-11 15:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jun'ichi Nomura, jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem.  The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference.  Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > > 
> > > This is essentially unfixable with function calls.  The only way to fix
> > > it is to have a callback model for freeing the external lock.
> > 
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed.  Then the lock could safely be freed at the same time as the 
> > device.
> 
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true.  Why wasn't it done that way originally?  Are there queues 
that aren't associated with devices?

>  Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
> 
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 was someone caught something with a
> reference to a SCSI queue after the device release function had been
> called.

Not according to your commit log.  You wrote that the reference was
taken after scsi_remove_device() had been called -- but the device
release function is scsi_device_dev_release_usercontext().

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
@ 2011-08-11 15:16               ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2011-08-11 15:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jun'ichi Nomura, jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem.  The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference.  Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > > 
> > > This is essentially unfixable with function calls.  The only way to fix
> > > it is to have a callback model for freeing the external lock.
> > 
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed.  Then the lock could safely be freed at the same time as the 
> > device.
> 
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true.  Why wasn't it done that way originally?  Are there queues 
that aren't associated with devices?

>  Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
> 
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 was someone caught something with a
> reference to a SCSI queue after the device release function had been
> called.

Not according to your commit log.  You wrote that the reference was
taken after scsi_remove_device() had been called -- but the device
release function is scsi_device_dev_release_usercontext().

Alan Stern


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-11 15:16               ` Alan Stern
  (?)
@ 2011-08-16 11:26               ` Jun'ichi Nomura
  2011-08-18  9:11                 ` Jun'ichi Nomura
  -1 siblings, 1 reply; 14+ messages in thread
From: Jun'ichi Nomura @ 2011-08-16 11:26 UTC (permalink / raw)
  To: Alan Stern, James Bottomley, Tejun Heo
  Cc: jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

Hi,

On 08/12/11 00:16, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
>> However, much as I'd like to accept this rosy view, the original oops
>> that started all of this in 2.6.38 was someone caught something with a
>> reference to a SCSI queue after the device release function had been
>> called.
> 
> Not according to your commit log.  You wrote that the reference was
> taken after scsi_remove_device() had been called -- but the device
> release function is scsi_device_dev_release_usercontext().

The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
("[SCSI] put stricter guards on queue dead checks") does not
explain about the move of scsi_free_queue().

But according to the discussion below, it seems
the move was motivated to solve the following self-deadlock:
https://lkml.org/lkml/2011/4/12/9

  [in the context of kblockd_workqueue]
  blk_delay_work
    __blk_run_queue
      scsi_request_fn
        put_device
          (puts final sdev refcount)
             scsi_device_dev_release
               execute_in_process_context(scsi_device_dev_release_usercontext)
                 [execute immediately because it's in process context]
                    scsi_device_dev_release_usercontext
                      scsi_free_queue
                        blk_cleanup_queue
                          blk_sync_queue
                            (wait for blk_delay_work to complete...)

James, is my understanding correct?

If so, isn't it possible to move the scsi_free_queue back to
the original place and solve the deadlock instead by
avoiding the wait in the same context?

@@ -338,8 +339,8 @@ static void scsi_device_dev_release_user
 static void scsi_device_dev_release(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
-	execute_in_process_context(scsi_device_dev_release_usercontext,
-				   &sdp->ew);
+	INIT_WORK(&sdp->ew.work, scsi_device_dev_release_usercontext);
+	schedule_work(&sdp->ew.work);
 }
 
 static struct class sdev_class = {

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-16 11:26               ` Jun'ichi Nomura
@ 2011-08-18  9:11                 ` Jun'ichi Nomura
  2011-08-31 19:50                   ` Thadeu Lima de Souza Cascardo
  0 siblings, 1 reply; 14+ messages in thread
From: Jun'ichi Nomura @ 2011-08-18  9:11 UTC (permalink / raw)
  To: James Bottomley, Tejun Heo
  Cc: Alan Stern, jaxboe, roland, linux-scsi, linux-kernel,
	device-mapper development, Kiyoshi Ueda

Hi James,

On 08/16/11 20:26, Jun'ichi Nomura wrote:
> The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> ("[SCSI] put stricter guards on queue dead checks") does not
> explain about the move of scsi_free_queue().
> 
> But according to the discussion below, it seems
> the move was motivated to solve the following self-deadlock:
> https://lkml.org/lkml/2011/4/12/9
> 
>   [in the context of kblockd_workqueue]
>   blk_delay_work
>     __blk_run_queue
>       scsi_request_fn
>         put_device
>           (puts final sdev refcount)
>              scsi_device_dev_release
>                execute_in_process_context(scsi_device_dev_release_usercontext)
>                  [execute immediately because it's in process context]
>                     scsi_device_dev_release_usercontext
>                       scsi_free_queue
>                         blk_cleanup_queue
>                           blk_sync_queue
>                             (wait for blk_delay_work to complete...)
> 
> James, is my understanding correct?
> 
> If so, isn't it possible to move the scsi_free_queue back to
> the original place and solve the deadlock instead by
> avoiding the wait in the same context?

Actually, Tejun has posted a patch to replace
execute_in_process_context() with queue_work()
and asking your review:

  [PATCH RESEND] scsi: don't use execute_in_process_context()
  https://lkml.org/lkml/2011/4/30/87

Do you think you can take the patch and revert the move
of scsi_free_queue()?

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-18  9:11                 ` Jun'ichi Nomura
@ 2011-08-31 19:50                   ` Thadeu Lima de Souza Cascardo
  2011-09-08  0:00                     ` Jun'ichi Nomura
  0 siblings, 1 reply; 14+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2011-08-31 19:50 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: James Bottomley, Tejun Heo, Alan Stern, jaxboe, roland,
	linux-scsi, linux-kernel, device-mapper development,
	Kiyoshi Ueda

On Thu, Aug 18, 2011 at 06:11:19PM +0900, Jun'ichi Nomura wrote:
> Hi James,
> 
> On 08/16/11 20:26, Jun'ichi Nomura wrote:
> > The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> > ("[SCSI] put stricter guards on queue dead checks") does not
> > explain about the move of scsi_free_queue().
> > 
> > But according to the discussion below, it seems
> > the move was motivated to solve the following self-deadlock:
> > https://lkml.org/lkml/2011/4/12/9
> > 
> >   [in the context of kblockd_workqueue]
> >   blk_delay_work
> >     __blk_run_queue
> >       scsi_request_fn
> >         put_device
> >           (puts final sdev refcount)
> >              scsi_device_dev_release
> >                execute_in_process_context(scsi_device_dev_release_usercontext)
> >                  [execute immediately because it's in process context]
> >                     scsi_device_dev_release_usercontext
> >                       scsi_free_queue
> >                         blk_cleanup_queue
> >                           blk_sync_queue
> >                             (wait for blk_delay_work to complete...)
> > 
> > James, is my understanding correct?
> > 
> > If so, isn't it possible to move the scsi_free_queue back to
> > the original place and solve the deadlock instead by
> > avoiding the wait in the same context?
> 
> Actually, Tejun has posted a patch to replace
> execute_in_process_context() with queue_work()
> and asking your review:
> 
>   [PATCH RESEND] scsi: don't use execute_in_process_context()
>   https://lkml.org/lkml/2011/4/30/87
> 
> Do you think you can take the patch and revert the move
> of scsi_free_queue()?
> 
> Thanks,
> -- 
> Jun'ichi Nomura, NEC Corporation
> --

I've tested with your suggestion (reverting the move of scsi_free_queue)
and it works like a charm. I did not get any oops after that. I tested
with a multipath setup on top of two iscsi targets. Using dd after
logging out of some of one of the iscsi targets would trigger the oops.
With this patch, it could not be triggered anymore.

Best regards,
Cascardo.

--

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e0bd3f7..a6eb6f1 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -322,7 +322,11 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 		kfree(evt);
 	}
 
+	/* Freeing the queue signals to block that we're done */
+	scsi_free_queue(sdev->request_queue);
+
 	blk_put_queue(sdev->request_queue);
+
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
 
@@ -936,8 +940,6 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	/* cause the request function to reject all I/O requests */
 	sdev->request_queue->queuedata = NULL;
 
-	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
 	put_device(dev);
 }
 
---


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [BUG] Oops when SCSI device under multipath is removed
  2011-08-31 19:50                   ` Thadeu Lima de Souza Cascardo
@ 2011-09-08  0:00                     ` Jun'ichi Nomura
  0 siblings, 0 replies; 14+ messages in thread
From: Jun'ichi Nomura @ 2011-09-08  0:00 UTC (permalink / raw)
  To: Thadeu Lima de Souza Cascardo
  Cc: James Bottomley, Tejun Heo, Alan Stern, jaxboe, roland,
	linux-scsi, linux-kernel, device-mapper development,
	Kiyoshi Ueda

Hello,

On 09/01/11 04:50, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Aug 18, 2011 at 06:11:19PM +0900, Jun'ichi Nomura wrote:
>> Actually, Tejun has posted a patch to replace
>> execute_in_process_context() with queue_work()
>> and asking your review:
>>
>>   [PATCH RESEND] scsi: don't use execute_in_process_context()
>>   https://lkml.org/lkml/2011/4/30/87
>>
>> Do you think you can take the patch and revert the move
>> of scsi_free_queue()?
> 
> I've tested with your suggestion (reverting the move of scsi_free_queue)
> and it works like a charm. I did not get any oops after that. I tested
> with a multipath setup on top of two iscsi targets. Using dd after
> logging out of some of one of the iscsi targets would trigger the oops.
> With this patch, it could not be triggered anymore.

Thank you for testing and the report.

Since scsi_free_queue() frees elevator, calling it while there
still is a user of the elevator has no way to work.
Either we should call it later (like the above suggestion)
or change scsi_free_queue not to free the elevator
(James posted a patch early in this thread).

I think the latter approach could be nice if it worked.
But if not, the former approach should be taken.
Without fix, a path failure can cause a panic. This is bad...

Best regards,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-09-08  0:04 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-10  4:29 [BUG] Oops when SCSI device under multipath is removed Jun'ichi Nomura
2011-08-10 19:52 ` James Bottomley
2011-08-11  0:24   ` Jun'ichi Nomura
2011-08-11  3:01     ` Jun'ichi Nomura
2011-08-11 14:33       ` James Bottomley
2011-08-11 14:59         ` Alan Stern
2011-08-11 14:59           ` Alan Stern
2011-08-11 15:05           ` James Bottomley
2011-08-11 15:16             ` Alan Stern
2011-08-11 15:16               ` Alan Stern
2011-08-16 11:26               ` Jun'ichi Nomura
2011-08-18  9:11                 ` Jun'ichi Nomura
2011-08-31 19:50                   ` Thadeu Lima de Souza Cascardo
2011-09-08  0:00                     ` Jun'ichi Nomura

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.