All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device
@ 2013-11-27 10:32 Heinz Graalfs
  2013-11-27 10:32 ` [PATCH v3 RFC 1/4] virtio: add surprize_removal " Heinz Graalfs
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 10:32 UTC (permalink / raw)
  To: mst, virtualization; +Cc: borntraeger

Hi, here is an updated patch-set to my v2 RFC
   virtio: add new notify() callback to virtio_driver
This RFC introduces a new virtio_device entry 'surprize_removal' instead
of a new 'notify' callback in struct virtio_driver.

When an active virtio block device is hot-unplugged from a KVM guest,
affected guest user applications are not aware of any errors that occur
due to the lost device. This patch-set adds code to avoid further request
queueing when a lost block device is detected, resulting in appropriate
error info. Additionally a potential hang situation can be avoided by not
waiting for requests (e.g. in-flight requests) in blk_cleanup_queue() that
will never complete.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and passing on device loss information via the surprize_removal
flag to the remove callback of the backend driver, can solve this task.

v2->v3 changes:
 - remove virtio_driver's notify callback (and appropriate code) introduced
   in my v1 RFC
 - introduce 'surprize_removal' in struct virtio_device
 - change virtio_blk's remove callback to perform special actions when the
   surprize_removal flag is set
   - avoid final I/O by preventing further request queueing
   - avoid hangs in blk_cleanup_queue() due to waits on 'in-flight' requests
 - set surprize_removal in virtio_ccw's notify callback when a device is lost

v1->v2 changes:
 - add include of linux/notifier.h (I also added it to the 3rd patch)
 - get queue lock in order to be able to use safe queue_flag_set() functions
   in virtblk_notify() handler


Heinz Graalfs (4):
  virtio: add surprize_removal to virtio_device
  virtio_blk: avoid further request queueing on device loss
  virtio_blk: avoid calling blk_cleanup_queue() on device loss
  virtio_ccw: set surprize_removal in virtio_device if a device was lost

 drivers/block/virtio_blk.c    | 12 +++++++++++-
 drivers/s390/kvm/virtio_ccw.c | 16 ++++++++++++++--
 drivers/virtio/virtio.c       |  1 +
 include/linux/virtio.h        |  1 +
 4 files changed, 27 insertions(+), 3 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 RFC 1/4] virtio: add surprize_removal to virtio_device
  2013-11-27 10:32 [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device Heinz Graalfs
@ 2013-11-27 10:32 ` Heinz Graalfs
  2013-11-27 10:32 ` [PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss Heinz Graalfs
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 10:32 UTC (permalink / raw)
  To: mst, virtualization; +Cc: borntraeger

Add new entry surprize_removal to struct virtio_device.

When a virtio transport driver is notified about a lost device
it should set surprize_removal to true.

A backend driver can test this flag in order to perform specific
actions that might be appropriate wrt the device loss.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
---
 drivers/virtio/virtio.c | 1 +
 include/linux/virtio.h  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..290d1e2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -115,6 +115,7 @@ static int virtio_dev_probe(struct device *_d)
 
 	/* We have a driver! */
 	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	dev->surprize_removal = false;
 
 	/* Figure out what features the device supports. */
 	device_features = dev->config->get_features(dev);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..131404a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -98,6 +98,7 @@ struct virtio_device {
 	struct list_head vqs;
 	/* Note that this is a Linux set_bit-style bitmap. */
 	unsigned long features[1];
+	bool surprize_removal;
 	void *priv;
 };
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss
  2013-11-27 10:32 [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device Heinz Graalfs
  2013-11-27 10:32 ` [PATCH v3 RFC 1/4] virtio: add surprize_removal " Heinz Graalfs
@ 2013-11-27 10:32 ` Heinz Graalfs
  2013-12-04  4:04   ` Rusty Russell
  2013-11-27 10:32 ` [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() " Heinz Graalfs
  2013-11-27 10:32 ` [PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost Heinz Graalfs
  3 siblings, 1 reply; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 10:32 UTC (permalink / raw)
  To: mst, virtualization; +Cc: borntraeger

Code is added to the remove callback to verify if a device was lost.

In case of a device loss further request queueing should be prevented
by setting appropriate queue flags prior to invoking del_gendisk().
Blocking of request queueing leads to appropriate I/O errors when data
are tried to be synched. Trying to synch data to a lost block device
doesn't make too much sense.

If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), del_gendisk() would be triggered
resulting in a hang. This is a weird situation, and most likely not
'serializable'.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
---
 drivers/block/virtio_blk.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..0f64282 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -875,6 +875,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk = vdev->priv;
 	int index = vblk->index;
+	unsigned long flags;
 	int refc;
 
 	/* Prevent config work handler from accessing the device. */
@@ -882,6 +883,14 @@ static void virtblk_remove(struct virtio_device *vdev)
 	vblk->config_enable = false;
 	mutex_unlock(&vblk->config_lock);
 
+	if (vdev->surprize_removal) {
+		spin_lock_irqsave(vblk->disk->queue->queue_lock, flags);
+		queue_flag_set(QUEUE_FLAG_DYING, vblk->disk->queue);
+		queue_flag_set(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+		queue_flag_set(QUEUE_FLAG_NOXMERGES, vblk->disk->queue);
+		spin_unlock_irqrestore(vblk->disk->queue->queue_lock, flags);
+	}
+
 	del_gendisk(vblk->disk);
 	blk_cleanup_queue(vblk->disk->queue);
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 10:32 [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device Heinz Graalfs
  2013-11-27 10:32 ` [PATCH v3 RFC 1/4] virtio: add surprize_removal " Heinz Graalfs
  2013-11-27 10:32 ` [PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss Heinz Graalfs
@ 2013-11-27 10:32 ` Heinz Graalfs
  2013-11-27 10:47   ` Michael S. Tsirkin
  2013-11-27 10:32 ` [PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost Heinz Graalfs
  3 siblings, 1 reply; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 10:32 UTC (permalink / raw)
  To: mst, virtualization; +Cc: borntraeger

Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
flag is set due to a disappeared device. It avoid hangs due to incomplete
requests (e.g. in-flight requests). Such requests must be considered as lost.

If the current remove callback was triggered due to an unregister driver,
and the surprize_removal is not already set (although the actual device
is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
that will never complete. This is a weird situation, and most likely not
'serializable'.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
---
 drivers/block/virtio_blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 0f64282..8c05001 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
 	}
 
 	del_gendisk(vblk->disk);
-	blk_cleanup_queue(vblk->disk->queue);
+	if (!vdev->surprize_removal)
+		blk_cleanup_queue(vblk->disk->queue);
 
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost
  2013-11-27 10:32 [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device Heinz Graalfs
                   ` (2 preceding siblings ...)
  2013-11-27 10:32 ` [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() " Heinz Graalfs
@ 2013-11-27 10:32 ` Heinz Graalfs
  2013-11-27 10:49   ` Michael S. Tsirkin
  3 siblings, 1 reply; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 10:32 UTC (permalink / raw)
  To: mst, virtualization; +Cc: borntraeger

Code is added to the notify handler to set the 'surprize_removal' flag
in virtio_device in case a CIO_GONE notification occurs. The remove
callback of the backend driver must check this flag in order to perform
special processing for a lost device.

Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
---
 drivers/s390/kvm/virtio_ccw.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..8f6c74a 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/io.h>
 #include <linux/kvm_para.h>
+#include <linux/notifier.h>
 #include <asm/setup.h>
 #include <asm/irq.h>
 #include <asm/cio.h>
@@ -1064,8 +1065,19 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-	/* TODO: Check whether we need special handling here. */
-	return 0;
+	int rc;
+	struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+	switch (event) {
+	case CIO_GONE:
+		vcdev->vdev.surprize_removal = true;
+		rc = NOTIFY_DONE;
+		break;
+	default:
+		rc = NOTIFY_DONE;
+		break;
+	}
+	return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 10:32 ` [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() " Heinz Graalfs
@ 2013-11-27 10:47   ` Michael S. Tsirkin
  2013-11-27 11:37     ` Heinz Graalfs
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 10:47 UTC (permalink / raw)
  To: Heinz Graalfs; +Cc: borntraeger, virtualization

On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> flag is set due to a disappeared device. It avoid hangs due to incomplete
> requests (e.g. in-flight requests). Such requests must be considered as lost.

Ugh. Can't we complete these immediately using detach_unused_buf? If not why?

> If the current remove callback was triggered due to an unregister driver,
> and the surprize_removal is not already set (although the actual device
> is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> that will never complete. This is a weird situation, and most likely not
> 'serializable'.

Hmm interesting. Implement some timeout and probe device to make sure
it's still alive?

> Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 0f64282..8c05001 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
>  	}
>  
>  	del_gendisk(vblk->disk);
> -	blk_cleanup_queue(vblk->disk->queue);
> +	if (!vdev->surprize_removal)
> +		blk_cleanup_queue(vblk->disk->queue);
>  
>  	/* Stop all the virtqueues. */
>  	vdev->config->reset(vdev);
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost
  2013-11-27 10:32 ` [PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost Heinz Graalfs
@ 2013-11-27 10:49   ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 10:49 UTC (permalink / raw)
  To: Heinz Graalfs; +Cc: borntraeger, virtualization

On Wed, Nov 27, 2013 at 11:32:40AM +0100, Heinz Graalfs wrote:
> Code is added to the notify handler to set the 'surprize_removal' flag
> in virtio_device in case a CIO_GONE notification occurs. The remove
> callback of the backend driver must check this flag in order to perform
> special processing for a lost device.
> 
> Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
> ---
>  drivers/s390/kvm/virtio_ccw.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
> index 35b9aaa..8f6c74a 100644
> --- a/drivers/s390/kvm/virtio_ccw.c
> +++ b/drivers/s390/kvm/virtio_ccw.c
> @@ -27,6 +27,7 @@
>  #include <linux/module.h>
>  #include <linux/io.h>
>  #include <linux/kvm_para.h>
> +#include <linux/notifier.h>
>  #include <asm/setup.h>
>  #include <asm/irq.h>
>  #include <asm/cio.h>
> @@ -1064,8 +1065,19 @@ out_free:
>  
>  static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
>  {
> -	/* TODO: Check whether we need special handling here. */
> -	return 0;
> +	int rc;
> +	struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
> +
> +	switch (event) {
> +	case CIO_GONE:
> +		vcdev->vdev.surprize_removal = true;
> +		rc = NOTIFY_DONE;
> +		break;
> +	default:
> +		rc = NOTIFY_DONE;
> +		break;
> +	}
> +	return rc;
>  }
>  

So again, this seems to mean there is some serialization with
device unregistration?
Otherwise how do we know using dev_get_drvdata is safe?


>  static struct ccw_device_id virtio_ids[] = {
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 10:47   ` Michael S. Tsirkin
@ 2013-11-27 11:37     ` Heinz Graalfs
  2013-11-27 12:28       ` Michael S. Tsirkin
  2013-11-27 12:49       ` Michael S. Tsirkin
  0 siblings, 2 replies; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 11:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: borntraeger, virtualization

On 27/11/13 11:47, Michael S. Tsirkin wrote:
> On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
>> Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
>> flag is set due to a disappeared device. It avoid hangs due to incomplete
>> requests (e.g. in-flight requests). Such requests must be considered as lost.
>
> Ugh. Can't we complete these immediately using detach_unused_buf? If not why?

OK, I will try

>
>> If the current remove callback was triggered due to an unregister driver,
>> and the surprize_removal is not already set (although the actual device
>> is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
>> resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
>> that will never complete. This is a weird situation, and most likely not
>> 'serializable'.
>
> Hmm interesting. Implement some timeout and probe device to make sure
> it's still alive?

but there is always some race, isn't it?

>
>> Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
>> ---
>>   drivers/block/virtio_blk.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>> index 0f64282..8c05001 100644
>> --- a/drivers/block/virtio_blk.c
>> +++ b/drivers/block/virtio_blk.c
>> @@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
>>   	}
>>
>>   	del_gendisk(vblk->disk);
>> -	blk_cleanup_queue(vblk->disk->queue);
>> +	if (!vdev->surprize_removal)
>> +		blk_cleanup_queue(vblk->disk->queue);
>>
>>   	/* Stop all the virtqueues. */
>>   	vdev->config->reset(vdev);
>> --
>> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 11:37     ` Heinz Graalfs
@ 2013-11-27 12:28       ` Michael S. Tsirkin
  2013-11-27 12:49       ` Michael S. Tsirkin
  1 sibling, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 12:28 UTC (permalink / raw)
  To: Heinz Graalfs; +Cc: borntraeger, virtualization

On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:
> On 27/11/13 11:47, Michael S. Tsirkin wrote:
> >On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> >>Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> >>flag is set due to a disappeared device. It avoid hangs due to incomplete
> >>requests (e.g. in-flight requests). Such requests must be considered as lost.
> >
> >Ugh. Can't we complete these immediately using detach_unused_buf? If not why?
> 
> OK, I will try
> 
> >
> >>If the current remove callback was triggered due to an unregister driver,
> >>and the surprize_removal is not already set (although the actual device
> >>is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> >>resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> >>that will never complete. This is a weird situation, and most likely not
> >>'serializable'.
> >
> >Hmm interesting. Implement some timeout and probe device to make sure
> >it's still alive?
> 
> but there is always some race, isn't it?

Then we retry after a second timeout?


> >
> >>Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
> >>---
> >>  drivers/block/virtio_blk.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> >>index 0f64282..8c05001 100644
> >>--- a/drivers/block/virtio_blk.c
> >>+++ b/drivers/block/virtio_blk.c
> >>@@ -892,7 +892,8 @@ static void virtblk_remove(struct virtio_device *vdev)
> >>  	}
> >>
> >>  	del_gendisk(vblk->disk);
> >>-	blk_cleanup_queue(vblk->disk->queue);
> >>+	if (!vdev->surprize_removal)
> >>+		blk_cleanup_queue(vblk->disk->queue);
> >>
> >>  	/* Stop all the virtqueues. */
> >>  	vdev->config->reset(vdev);
> >>--
> >>1.8.3.1
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 11:37     ` Heinz Graalfs
  2013-11-27 12:28       ` Michael S. Tsirkin
@ 2013-11-27 12:49       ` Michael S. Tsirkin
  2013-11-27 14:15         ` Heinz Graalfs
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 12:49 UTC (permalink / raw)
  To: Heinz Graalfs; +Cc: borntraeger, virtualization

On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:
> On 27/11/13 11:47, Michael S. Tsirkin wrote:
> >On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> >>Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> >>flag is set due to a disappeared device. It avoid hangs due to incomplete
> >>requests (e.g. in-flight requests). Such requests must be considered as lost.
> >
> >Ugh. Can't we complete these immediately using detach_unused_buf? If not why?
> 
> OK, I will try
> 
> >
> >>If the current remove callback was triggered due to an unregister driver,
> >>and the surprize_removal is not already set (although the actual device
> >>is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> >>resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> >>that will never complete. This is a weird situation, and most likely not
> >>'serializable'.
> >
> >Hmm interesting. Implement some timeout and probe device to make sure
> >it's still alive?
> 
> but there is always some race, isn't it?

To clarify, why this might not be very elegant, a timer-based
solution for surprise removal during driver cleanup
might be easier than trying to build robust interfaces
to address this esoteric case.

But what worries me is that it's not clear to me that ccw won't
invoke notify in parallel with remove callback.
If this happens there will be use after free.


-- 
MST

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 12:49       ` Michael S. Tsirkin
@ 2013-11-27 14:15         ` Heinz Graalfs
  2013-11-27 14:37           ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Heinz Graalfs @ 2013-11-27 14:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: borntraeger, virtualization

On 27/11/13 13:49, Michael S. Tsirkin wrote:
> On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:
>> On 27/11/13 11:47, Michael S. Tsirkin wrote:
>>> On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
>>>> Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
>>>> flag is set due to a disappeared device. It avoid hangs due to incomplete
>>>> requests (e.g. in-flight requests). Such requests must be considered as lost.
>>>
>>> Ugh. Can't we complete these immediately using detach_unused_buf? If not why?
>>
>> OK, I will try

I tried virtqueue_detach_unused_buf(). It doesn't seem to solve the 
problem. Would that affect block layer in-flight requests anyway? The 
function comment also says it should not be used on an active queue.

Isn't there a mechanism to end vring requests for which a 
vring_interrupt() is missing? (simulate virtblk_done() with an error)?
At least that's it what would help, I suppose.

>>
>>>
>>>> If the current remove callback was triggered due to an unregister driver,
>>>> and the surprize_removal is not already set (although the actual device
>>>> is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
>>>> resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
>>>> that will never complete. This is a weird situation, and most likely not
>>>> 'serializable'.
>>>
>>> Hmm interesting. Implement some timeout and probe device to make sure
>>> it's still alive?

This patch doesn't try to solve any weird races.
It avoids triggering the block queue cleanup, with potential for a hang, 
IFF a device is gone.

>>
>> but there is always some race, isn't it?
>
> To clarify, why this might not be very elegant, a timer-based
> solution for surprise removal during driver cleanup
> might be easier than trying to build robust interfaces
> to address this esoteric case.
>
> But what worries me is that it's not clear to me that ccw won't
> invoke notify in parallel with remove callback.
> If this happens there will be use after free.

OK, I agree, calling remove twice or working on freed stuff must not happen.

>


>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() on device loss
  2013-11-27 14:15         ` Heinz Graalfs
@ 2013-11-27 14:37           ` Michael S. Tsirkin
  0 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2013-11-27 14:37 UTC (permalink / raw)
  To: Heinz Graalfs; +Cc: borntraeger, virtualization

On Wed, Nov 27, 2013 at 03:15:45PM +0100, Heinz Graalfs wrote:
> On 27/11/13 13:49, Michael S. Tsirkin wrote:
> >On Wed, Nov 27, 2013 at 12:37:02PM +0100, Heinz Graalfs wrote:
> >>On 27/11/13 11:47, Michael S. Tsirkin wrote:
> >>>On Wed, Nov 27, 2013 at 11:32:39AM +0100, Heinz Graalfs wrote:
> >>>>Code is added to avoid calling blk_cleanup_queue() when the surprize_removal
> >>>>flag is set due to a disappeared device. It avoid hangs due to incomplete
> >>>>requests (e.g. in-flight requests). Such requests must be considered as lost.
> >>>
> >>>Ugh. Can't we complete these immediately using detach_unused_buf? If not why?
> >>
> >>OK, I will try
> 
> I tried virtqueue_detach_unused_buf(). It doesn't seem to solve the
> problem. Would that affect block layer in-flight requests anyway?
> The function comment also says it should not be used on an active
> queue.

Yes, this must be done after reset normally
so we know device is not consuming buffers.

But if you know device is gone, just make sure
no one will add more requests, that's enough.



> Isn't there a mechanism to end vring requests for which a
> vring_interrupt() is missing? (simulate virtblk_done() with an
> error)?
> At least that's it what would help, I suppose.
> 
> >>
> >>>
> >>>>If the current remove callback was triggered due to an unregister driver,
> >>>>and the surprize_removal is not already set (although the actual device
> >>>>is already gone, e.g. virsh detach), blk_cleanup_queue() would be triggered
> >>>>resulting in a possible hang. This hang is caused by e.g. 'in-flight' requests
> >>>>that will never complete. This is a weird situation, and most likely not
> >>>>'serializable'.
> >>>
> >>>Hmm interesting. Implement some timeout and probe device to make sure
> >>>it's still alive?
> 
> This patch doesn't try to solve any weird races.
> It avoids triggering the block queue cleanup, with potential for a
> hang, IFF a device is gone.
> 
> >>
> >>but there is always some race, isn't it?
> >
> >To clarify, why this might not be very elegant, a timer-based
> >solution for surprise removal during driver cleanup
> >might be easier than trying to build robust interfaces
> >to address this esoteric case.
> >
> >But what worries me is that it's not clear to me that ccw won't
> >invoke notify in parallel with remove callback.
> >If this happens there will be use after free.
> 
> OK, I agree, calling remove twice or working on freed stuff must not happen.
> 
> >
> 
> 
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss
  2013-11-27 10:32 ` [PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss Heinz Graalfs
@ 2013-12-04  4:04   ` Rusty Russell
  0 siblings, 0 replies; 13+ messages in thread
From: Rusty Russell @ 2013-12-04  4:04 UTC (permalink / raw)
  To: Heinz Graalfs, mst, virtualization; +Cc: borntraeger

Heinz Graalfs <graalfs@linux.vnet.ibm.com> writes:
> Code is added to the remove callback to verify if a device was lost.
>
> In case of a device loss further request queueing should be prevented
> by setting appropriate queue flags prior to invoking del_gendisk().
> Blocking of request queueing leads to appropriate I/O errors when data
> are tried to be synched. Trying to synch data to a lost block device
> doesn't make too much sense.

Sure, but this isn't the only case where we should set these flags,
right?  I would think if any virtqueue is reported broken, we should
mark the queue dying too.

> If the current remove callback was triggered due to an unregister driver,
> and the surprize_removal is not already set (although the actual device
> is already gone, e.g. virsh detach), del_gendisk() would be triggered
> resulting in a hang. This is a weird situation, and most likely not
> 'serializable'.

This seems like abusing the vdev struct to pass an argument to
virtblk_remove().

How about you mark all virtqueues broken in the "unexpected removal"
case?  Then the driver should correctly fail to deliver requests.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-12-04  4:04 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-27 10:32 [PATCH v3 RFC 0/4] virtio: add 'surprize_removal' to virtio_device Heinz Graalfs
2013-11-27 10:32 ` [PATCH v3 RFC 1/4] virtio: add surprize_removal " Heinz Graalfs
2013-11-27 10:32 ` [PATCH v3 RFC 2/4] virtio_blk: avoid further request queueing on device loss Heinz Graalfs
2013-12-04  4:04   ` Rusty Russell
2013-11-27 10:32 ` [PATCH v3 RFC 3/4] virtio_blk: avoid calling blk_cleanup_queue() " Heinz Graalfs
2013-11-27 10:47   ` Michael S. Tsirkin
2013-11-27 11:37     ` Heinz Graalfs
2013-11-27 12:28       ` Michael S. Tsirkin
2013-11-27 12:49       ` Michael S. Tsirkin
2013-11-27 14:15         ` Heinz Graalfs
2013-11-27 14:37           ` Michael S. Tsirkin
2013-11-27 10:32 ` [PATCH v3 RFC 4/4] virtio_ccw: set surprize_removal in virtio_device if a device was lost Heinz Graalfs
2013-11-27 10:49   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.