[Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
@ 2010-10-25 18:22 Ryan Harper
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
                   ` (3 more replies)
  0 siblings, 4 replies; 60+ messages in thread
From: Ryan Harper @ 2010-10-25 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, Kevin Wolf

This patch series decouples the detachment of a block device from the removal
of the backing pci-device.  Removal of a hotplugged pci device requires the
guest to respond before qemu tears down the block device. In some cases, the
guest may not respond leaving the guest with continued access to the block
device.  

The new monitor command, drive_unplug, will revoke a guests access to the
block device independently of the removal of the pci device.

The first patch adds a new drive find method, the second patch implements the
monitor command and block layer changes.

Changes since v3:
- Moved QMP command for drive_unplug() to separate patch

Changes since v2:
- Added QMP command for drive_unplug()

Changes since v1:
- CodingStyle fixes
- Added qemu_aio_flush() to bdrv_unplug()

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id
  2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
@ 2010-10-25 18:22 ` Ryan Harper
  2010-10-29 13:18   ` Markus Armbruster
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug() Ryan Harper
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-10-25 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, Kevin Wolf

Add a function to find a drive by id string.

Changes since v1:
-Coding Style fix

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
 blockdev.c |   13 +++++++++++++
 blockdev.h |    1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index ff7602b..5fc3b9b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -75,6 +75,19 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit)
     return NULL;
 }
 
+DriveInfo *drive_get_by_id(const char *id)
+{
+    DriveInfo *dinfo;
+
+    QTAILQ_FOREACH(dinfo, &drives, next) {
+        if (strcmp(id, dinfo->id)) {
+            continue;
+        }
+        return dinfo;
+    }
+    return NULL;
+}
+
 int drive_get_max_bus(BlockInterfaceType type)
 {
     int max_bus;
diff --git a/blockdev.h b/blockdev.h
index 653affc..19c6915 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -38,6 +38,7 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit);
 int drive_get_max_bus(BlockInterfaceType type);
 void drive_uninit(DriveInfo *dinfo);
 DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
+DriveInfo *drive_get_by_id(const char *id);
 
 QemuOpts *drive_add(const char *file, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
 DriveInfo *drive_init(QemuOpts *arg, int default_to_scsi, int *fatal_error);
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
@ 2010-10-25 18:22 ` Ryan Harper
  2010-10-29 14:01   ` Markus Armbruster
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug Ryan Harper
  2010-10-29 14:12 ` [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Markus Armbruster
  3 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-10-25 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, Kevin Wolf

Block hot unplug is racy since the guest is required to acknowlege the ACPI
unplug event; this may not happen synchronously with the device removal command

This series aims to close a gap where by mgmt applications that assume the
block resource has been removed without confirming that the guest has
acknowledged the removal may re-assign the underlying device to a second guest
leading to data leakage.

This series introduces a new montor command to decouple asynchornous device
removal from restricting guest access to a block device.  We do this by creating
a new monitor command drive_unplug which maps to a bdrv_unplug() command which
does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
IO is rejected from the device and the guest will get IO errors but continue to
function.

A subsequent device removal command can be issued to remove the device, to which
the guest may or maynot respond, but as long as the unplugged bit is set, no IO
will be sumbitted.

Changes since v1:
- Added qemu_aio_flush() before bdrv_flush() to wait on pending io

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
 block.c         |    7 +++++++
 block.h         |    1 +
 blockdev.c      |   26 ++++++++++++++++++++++++++
 blockdev.h      |    1 +
 hmp-commands.hx |   15 +++++++++++++++
 5 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index a19374d..be47655 100644
--- a/block.c
+++ b/block.c
@@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
     }
 }
 
+void bdrv_unplug(BlockDriverState *bs)
+{
+    qemu_aio_flush();
+    bdrv_flush(bs);
+    bdrv_close(bs);
+}
+
 int bdrv_is_removable(BlockDriverState *bs)
 {
     return bs->removable;
diff --git a/block.h b/block.h
index 5f64380..732f63e 100644
--- a/block.h
+++ b/block.h
@@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
                        BlockErrorAction on_write_error);
 BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
 void bdrv_set_removable(BlockDriverState *bs, int removable);
+void bdrv_unplug(BlockDriverState *bs);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 5fc3b9b..68eb329 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
     }
     return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
 }
+
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    DriveInfo *dinfo;
+    BlockDriverState *bs;
+    const char *id;
+
+    if (!qdict_haskey(qdict, "id")) {
+        qerror_report(QERR_MISSING_PARAMETER, "id");
+        return -1;
+    }
+
+    id = qdict_get_str(qdict, "id");
+    dinfo = drive_get_by_id(id);
+    if (!dinfo) {
+        qerror_report(QERR_DEVICE_NOT_FOUND, id);
+        return -1;
+    }
+
+    /* mark block device unplugged */
+    bs = dinfo->bdrv;
+    bdrv_unplug(bs);
+
+    return 0;
+}
+ 
diff --git a/blockdev.h b/blockdev.h
index 19c6915..ecb9ac8 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_change_block(Monitor *mon, const char *device,
                     const char *filename, const char *fmt);
+int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 81999aa..7a32a2e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
 ETEXI
 
     {
+        .name       = "drive_unplug",
+        .args_type  = "id:s",
+        .params     = "device",
+        .help       = "unplug block device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_drive_unplug,
+    },
+
+STEXI
+@item unplug @var{device}
+@findex unplug
+Unplug block device.
+ETEXI
+
+    {
         .name       = "change",
         .args_type  = "device:B,target:F,arg:s?",
         .params     = "device filename [format]",
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug
  2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug() Ryan Harper
@ 2010-10-25 18:22 ` Ryan Harper
  2010-10-29 14:12 ` [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Markus Armbruster
  3 siblings, 0 replies; 60+ messages in thread
From: Ryan Harper @ 2010-10-25 18:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, Kevin Wolf

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
 qmp-commands.hx |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/qmp-commands.hx b/qmp-commands.hx
index 793cf1c..e8f3d4a 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -338,6 +338,32 @@ Example:
 EQMP
 
     {
+        .name       = "drive_unplug",
+        .args_type  = "id:s",
+        .params     = "device",
+        .help       = "unplug block device",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_drive_unplug,
+    },
+
+SQMP
+drive unplug
+----------
+
+Unplug a block device.
+
+Arguments:
+
+- "id": the device's ID (json-string)
+
+Example:
+
+-> { "execute": "drive_unplug", "arguments": { "id": "drive-virtio-blk1" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "cpu",
         .args_type  = "index:i",
         .params     = "index",
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
@ 2010-10-29 13:18   ` Markus Armbruster
  0 siblings, 0 replies; 60+ messages in thread
From: Markus Armbruster @ 2010-10-29 13:18 UTC (permalink / raw)
  To: Ryan Harper; +Cc: Stefan Hajnoczi, Anthony Liguori, qemu-devel, Kevin Wolf

Ryan Harper <ryanh@us.ibm.com> writes:

> Add a function to find a drive by id string.
>
> Changes since v1:
> -Coding Style fix

Recommend to put patch history below the --- line, so it doesn't get
included in the commit message.

> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> ---
>  blockdev.c |   13 +++++++++++++
>  blockdev.h |    1 +
>  2 files changed, 14 insertions(+), 0 deletions(-)
[...]

This effectively reverts commit dfb0acd8, which cleans up after commit
f8b6cc00:

    qdev: Decouple qdev_prop_drive from DriveInfo
    
    Make the property point to BlockDriverState, cutting out the DriveInfo
    middleman.  This prepares the ground for block devices that don't have
    a DriveInfo.

    Currently all user-defined ones have a DriveInfo, because the only way
    to define one is -drive & friends (they go through drive_init()).
    DriveInfo is closely tied to -drive, and like -drive, it mixes
    information about host and guest part of the block device.  I'm
    working towards a new way to define block devices, with clean
    host/guest separation, and I need to get DriveInfo out of the way for
    that.

    Fortunately, the device models are perfectly happy with
    BlockDriverState, except for two places: ide_drive_initfn() and
    scsi_disk_initfn() need to check the DriveInfo for a serial number set
    with legacy -drive serial=...  Use drive_get_by_blockdev() there.
    
    Device model code should now use DriveInfo only when explicitly
    dealing with drives defined the old way, i.e. without -device.

I think your do_drive_unplug() could use bdrv_find() instead.  More on
that in my review of your PATCH 2/3.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug() Ryan Harper
@ 2010-10-29 14:01   ` Markus Armbruster
  2010-10-29 14:15     ` Anthony Liguori
  2010-11-01 21:06     ` Ryan Harper
  0 siblings, 2 replies; 60+ messages in thread
From: Markus Armbruster @ 2010-10-29 14:01 UTC (permalink / raw)
  To: Ryan Harper; +Cc: Stefan Hajnoczi, Anthony Liguori, qemu-devel, Kevin Wolf

Ryan Harper <ryanh@us.ibm.com> writes:

> Block hot unplug is racy since the guest is required to acknowlege the ACPI
> unplug event; this may not happen synchronously with the device removal command
>
> This series aims to close a gap where by mgmt applications that assume the
> block resource has been removed without confirming that the guest has
> acknowledged the removal may re-assign the underlying device to a second guest
> leading to data leakage.
>
> This series introduces a new montor command to decouple asynchornous device
> removal from restricting guest access to a block device.  We do this by creating
> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> IO is rejected from the device and the guest will get IO errors but continue to
> function.
>
> A subsequent device removal command can be issued to remove the device, to which
> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> will be sumbitted.
>
> Changes since v1:
> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> ---
>  block.c         |    7 +++++++
>  block.h         |    1 +
>  blockdev.c      |   26 ++++++++++++++++++++++++++
>  blockdev.h      |    1 +
>  hmp-commands.hx |   15 +++++++++++++++
>  5 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/block.c b/block.c
> index a19374d..be47655 100644
> --- a/block.c
> +++ b/block.c
> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>      }
>  }
>  
> +void bdrv_unplug(BlockDriverState *bs)
> +{
> +    qemu_aio_flush();
> +    bdrv_flush(bs);
> +    bdrv_close(bs);
> +}

Stupid question: why doesn't bdrv_close() flush automatically?

And why do we have to flush here, but not before other uses of
bdrv_close(), such as eject_device()?

> +
>  int bdrv_is_removable(BlockDriverState *bs)
>  {
>      return bs->removable;
> diff --git a/block.h b/block.h
> index 5f64380..732f63e 100644
> --- a/block.h
> +++ b/block.h
> @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
>                         BlockErrorAction on_write_error);
>  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
>  void bdrv_set_removable(BlockDriverState *bs, int removable);
> +void bdrv_unplug(BlockDriverState *bs);
>  int bdrv_is_removable(BlockDriverState *bs);
>  int bdrv_is_read_only(BlockDriverState *bs);
>  int bdrv_is_sg(BlockDriverState *bs);
> diff --git a/blockdev.c b/blockdev.c
> index 5fc3b9b..68eb329 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
>      }
>      return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
>  }
> +
> +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +    DriveInfo *dinfo;
> +    BlockDriverState *bs;
> +    const char *id;
> +
> +    if (!qdict_haskey(qdict, "id")) {
> +        qerror_report(QERR_MISSING_PARAMETER, "id");
> +        return -1;
> +    }

As Luiz pointed out, this check is redundant.

> +
> +    id = qdict_get_str(qdict, "id");
> +    dinfo = drive_get_by_id(id);
> +    if (!dinfo) {
> +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
> +        return -1;
> +    }
> +
> +    /* mark block device unplugged */
> +    bs = dinfo->bdrv;
> +    bdrv_unplug(bs);
> +
> +    return 0;
> +}
> + 

What about:

    const char *id = qdict_get_str(qdict, "id");
    BlockDriverState *bs;

    bs = bdrv_find(id);
    if (!bs) {
        qerror_report(QERR_DEVICE_NOT_FOUND, id);
        return -1;
    }

    bdrv_unplug(bs);

    return 0;

Precedence: commit f8b6cc00 replaced uses of drive_get_by_id() by
bdrv_find().

> diff --git a/blockdev.h b/blockdev.h
> index 19c6915..ecb9ac8 100644
> --- a/blockdev.h
> +++ b/blockdev.h
> @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  int do_change_block(Monitor *mon, const char *device,
>                      const char *filename, const char *fmt);
> +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
>  
>  #endif
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 81999aa..7a32a2e 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
>  ETEXI
>  
>      {
> +        .name       = "drive_unplug",
> +        .args_type  = "id:s",
> +        .params     = "device",
> +        .help       = "unplug block device",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_drive_unplug,
> +    },
> +
> +STEXI
> +@item unplug @var{device}
> +@findex unplug
> +Unplug block device.

A bit terse, isn't it?  What does it mean to unplug a block device?
What's its observable effect on the guest?  Does it look like disk gone
completely south, perhaps?

> +ETEXI
> +
> +    {
>          .name       = "change",
>          .args_type  = "device:B,target:F,arg:s?",
>          .params     = "device filename [format]",

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
                   ` (2 preceding siblings ...)
  2010-10-25 18:22 ` [Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug Ryan Harper
@ 2010-10-29 14:12 ` Markus Armbruster
  2010-10-29 15:03   ` Ryan Harper
  3 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-10-29 14:12 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Stefan Hajnoczi, Anthony Liguori, qemu-devel, Kevin Wolf,
	Michael S. Tsirkin

[Note cc: Michael]

Ryan Harper <ryanh@us.ibm.com> writes:

> This patch series decouples the detachment of a block device from the removal
> of the backing pci-device.  Removal of a hotplugged pci device requires the
> guest to respond before qemu tears down the block device. In some cases, the
> guest may not respond leaving the guest with continued access to the block
> device.  
>
> The new monitor command, drive_unplug, will revoke a guests access to the
> block device independently of the removal of the pci device.
>
> The first patch adds a new drive find method, the second patch implements the
> monitor command and block layer changes.
>
> Changes since v3:
> - Moved QMP command for drive_unplug() to separate patch
>
> Changes since v2:
> - Added QMP command for drive_unplug()
>
> Changes since v1:
> - CodingStyle fixes
> - Added qemu_aio_flush() to bdrv_unplug()
>
> Signed-off-by: Ryan Harper <ryanh@us.ibm.com>

If I understand your patch correctly, the difference between your
drive_unplug and my blockdev_del is as follows:

* drive_unplug forcefully severs the connection between the host part of
  the block device and its BlockDriverState.  A shell of the host part
  remains, to be cleaned up later.  You need forceful disconnect
  operation to be able to revoke access to an image whether the guest
  cooperates or not.  Fair enough.

* blockdev_del deletes a host part.  My current version fails when the
  host part is in use.  I patterned that after netdev_del, which used to
  work that way, until commit 2ffcb18d:

    Make netdev_del delete the netdev even when it's in use

    To hot-unplug guest and host part of a network device, you do:

        device_del NIC-ID
        netdev_del NETDEV-ID

    For PCI devices, device_del merely tells ACPI to unplug the device.
    The device goes away for real only after the guest processed the ACPI
    unplug event.

    You have to wait until then (e.g. by polling info pci) before you can
    unplug the netdev.  Not good.

    Fix by removing the "in use" check from do_netdev_del().  Deleting a
    netdev while it's in use is safe; packets simply get routed to the bit
    bucket.

  Isn't this the very same problem that's behind your drive_unplug?

I'd like to have some consistency among net, block and char device
commands, i.e. a common set of operations that work the same for all of
them.  Can we agree on such a set?

Even if your drive_unplug shouldn't fit in that set, we might want it as
a stop-gap.  Depends on how urgent the need for it is.  Yet another
special-purpose command to be deprecated later.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:01   ` Markus Armbruster
@ 2010-10-29 14:15     ` Anthony Liguori
  2010-10-29 14:29       ` Kevin Wolf
  2010-10-29 15:28       ` Markus Armbruster
  2010-11-01 21:06     ` Ryan Harper
  1 sibling, 2 replies; 60+ messages in thread
From: Anthony Liguori @ 2010-10-29 14:15 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: Stefan Hajnoczi, Kevin Wolf, Ryan Harper, qemu-devel

On 10/29/2010 09:01 AM, Markus Armbruster wrote:
> Ryan Harper<ryanh@us.ibm.com>  writes:
>
>    
>> Block hot unplug is racy since the guest is required to acknowlege the ACPI
>> unplug event; this may not happen synchronously with the device removal command
>>
>> This series aims to close a gap where by mgmt applications that assume the
>> block resource has been removed without confirming that the guest has
>> acknowledged the removal may re-assign the underlying device to a second guest
>> leading to data leakage.
>>
>> This series introduces a new montor command to decouple asynchornous device
>> removal from restricting guest access to a block device.  We do this by creating
>> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
>> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
>> IO is rejected from the device and the guest will get IO errors but continue to
>> function.
>>
>> A subsequent device removal command can be issued to remove the device, to which
>> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
>> will be sumbitted.
>>
>> Changes since v1:
>> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>>
>> Signed-off-by: Ryan Harper<ryanh@us.ibm.com>
>> ---
>>   block.c         |    7 +++++++
>>   block.h         |    1 +
>>   blockdev.c      |   26 ++++++++++++++++++++++++++
>>   blockdev.h      |    1 +
>>   hmp-commands.hx |   15 +++++++++++++++
>>   5 files changed, 50 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index a19374d..be47655 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>>       }
>>   }
>>
>> +void bdrv_unplug(BlockDriverState *bs)
>> +{
>> +    qemu_aio_flush();
>> +    bdrv_flush(bs);
>> +    bdrv_close(bs);
>> +}
>>      
> Stupid question: why doesn't bdrv_close() flush automatically?
>    

I don't think it's a bad idea to do that but to the extent that the 
block API is designed after posix file I/O, close does not usually imply 
flush.

> And why do we have to flush here, but not before other uses of
> bdrv_close(), such as eject_device()?
>    

Good question.  Kevin should also confirm, but looking at the code, I 
think flush() is needed before close.  If there's a pending I/O event 
and you close before the I/O event is completed, you'll get a callback 
for completion against a bogus BlockDriverState.

I can't find anything in either raw-posix or the generic block layer 
that would mitigate this.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:15     ` Anthony Liguori
@ 2010-10-29 14:29       ` Kevin Wolf
  2010-10-29 14:40         ` Anthony Liguori
  2010-10-29 15:28       ` Markus Armbruster
  1 sibling, 1 reply; 60+ messages in thread
From: Kevin Wolf @ 2010-10-29 14:29 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Ryan Harper, Markus Armbruster, qemu-devel

Am 29.10.2010 16:15, schrieb Anthony Liguori:
> On 10/29/2010 09:01 AM, Markus Armbruster wrote:
>> Ryan Harper<ryanh@us.ibm.com>  writes:
>>> diff --git a/block.c b/block.c
>>> index a19374d..be47655 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>>>       }
>>>   }
>>>
>>> +void bdrv_unplug(BlockDriverState *bs)
>>> +{
>>> +    qemu_aio_flush();
>>> +    bdrv_flush(bs);
>>> +    bdrv_close(bs);
>>> +}
>>>      
>> Stupid question: why doesn't bdrv_close() flush automatically?
>>    
> 
> I don't think it's a bad idea to do that but to the extent that the 
> block API is designed after posix file I/O, close does not usually imply 
> flush.

I don't think it really resembles POSIX. More or less the only thing
they have in common is that both provide open, read, write and close,
which is something that probably any API for file accesses provides.

The operation you're talking about here is bdrv_flush/fsync that is not
implied by a POSIX close?

>> And why do we have to flush here, but not before other uses of
>> bdrv_close(), such as eject_device()?
>>    
> 
> Good question.  Kevin should also confirm, but looking at the code, I 
> think flush() is needed before close.  If there's a pending I/O event 
> and you close before the I/O event is completed, you'll get a callback 
> for completion against a bogus BlockDriverState.
> 
> I can't find anything in either raw-posix or the generic block layer 
> that would mitigate this.

I'm not aware of anything either. This is what qemu_aio_flush would do.

It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
bdrv_close. We probably don't really need to call bdrv_flush to operate
correctly, but it can't hurt and bdrv_close shouldn't happen that often
anyway.

Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:29       ` Kevin Wolf
@ 2010-10-29 14:40         ` Anthony Liguori
  2010-10-29 14:57           ` Kevin Wolf
  0 siblings, 1 reply; 60+ messages in thread
From: Anthony Liguori @ 2010-10-29 14:40 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, Ryan Harper, Markus Armbruster, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2541 bytes --]

On 10/29/2010 09:29 AM, Kevin Wolf wrote:
> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>    
>> On 10/29/2010 09:01 AM, Markus Armbruster wrote:
>>      
>>> Ryan Harper<ryanh@us.ibm.com>   writes:
>>>        
>>>> diff --git a/block.c b/block.c
>>>> index a19374d..be47655 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>>>>        }
>>>>    }
>>>>
>>>> +void bdrv_unplug(BlockDriverState *bs)
>>>> +{
>>>> +    qemu_aio_flush();
>>>> +    bdrv_flush(bs);
>>>> +    bdrv_close(bs);
>>>> +}
>>>>
>>>>          
>>> Stupid question: why doesn't bdrv_close() flush automatically?
>>>
>>>        
>> I don't think it's a bad idea to do that but to the extent that the
>> block API is designed after posix file I/O, close does not usually imply
>> flush.
>>      
> I don't think it really resembles POSIX. More or less the only thing
> they have in common is that both provide open, read, write and close,
> which is something that probably any API for file accesses provides.
>
> The operation you're talking about here is bdrv_flush/fsync that is not
> implied by a POSIX close?
>    

Yes.  But I think for the purposes of this patch, a bdrv_cancel_all() 
would be just as good.  The intention is to eliminate pending I/O 
requests, the fsync is just a side effect.

>>> And why do we have to flush here, but not before other uses of
>>> bdrv_close(), such as eject_device()?
>>>
>>>        
>> Good question.  Kevin should also confirm, but looking at the code, I
>> think flush() is needed before close.  If there's a pending I/O event
>> and you close before the I/O event is completed, you'll get a callback
>> for completion against a bogus BlockDriverState.
>>
>> I can't find anything in either raw-posix or the generic block layer
>> that would mitigate this.
>>      
> I'm not aware of anything either. This is what qemu_aio_flush would do.
>
> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
> bdrv_close. We probably don't really need to call bdrv_flush to operate
> correctly, but it can't hurt and bdrv_close shouldn't happen that often
> anyway.
>    

I agree.  Re: qemu_aio_flush, we have to wait for it to complete which 
gets a little complicated in bdrv_close().  I think it would be better 
to make bdrv_flush() call bdrv_aio_flush() if an explicit bdrv_flush 
method isn't provided.  Something like the attached (still need to test).

Does that seem reasonable?

Regards,

Anthony Liguori

> Kevin
>    


[-- Attachment #2: 0001-block-make-bdrv_flush-fall-back-to-bdrv_aio_flush.patch --]
[-- Type: text/x-patch, Size: 1366 bytes --]

>From 86bf3c9eb5ce43280224f9271a4ad016b0dd3fb1 Mon Sep 17 00:00:00 2001
From: Anthony Liguori <aliguori@us.ibm.com>
Date: Fri, 29 Oct 2010 09:36:53 -0500
Subject: [PATCH 1/2] block: make bdrv_flush() fall back to bdrv_aio_flush

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/block.c b/block.c
index 985d0b7..fc8defd 100644
--- a/block.c
+++ b/block.c
@@ -1453,14 +1453,51 @@ const char *bdrv_get_device_name(BlockDriverState *bs)
     return bs->device_name;
 }
 
+static void bdrv_flush_em_cb(void *opaque, int ret)
+{
+    int *pcomplete = opaque;
+    *pcomplete = 1;
+}
+
+static void bdrv_flush_em(BlockDriverState *bs)
+{
+    int complete = 0;
+    BlockDriverAIOCB *acb;
+
+    if (!bs->drv->bdrv_aio_flush) {
+        return;
+    }
+
+    async_context_push();
+
+    acb = bs->drv->bdrv_aio_flush(bs, bdrv_flush_em_cb, &complete);
+    if (!acb) {
+        goto out;
+    }
+
+    while (!complete) {
+        qemu_aio_wait();
+    }
+
+out:
+    async_context_pop();
+}
+
 void bdrv_flush(BlockDriverState *bs)
 {
     if (bs->open_flags & BDRV_O_NO_FLUSH) {
         return;
     }
 
-    if (bs->drv && bs->drv->bdrv_flush)
+    if (!bs->drv) {
+        return;
+    }
+
+    if (bs->drv->bdrv_flush) {
         bs->drv->bdrv_flush(bs);
+    } else {
+        bdrv_flush_em(bs);
+    }
 }
 
 void bdrv_flush_all(void)
-- 
1.7.0.4


[-- Attachment #3: 0002-block-add-bdrv_flush-to-bdrv_close.patch --]
[-- Type: text/x-patch, Size: 657 bytes --]

>From 094049974796ddf78ee2f1541bffa40fe1176a1a Mon Sep 17 00:00:00 2001
From: Anthony Liguori <aliguori@us.ibm.com>
Date: Fri, 29 Oct 2010 09:37:25 -0500
Subject: [PATCH 2/2] block: add bdrv_flush to bdrv_close

To ensure that there are no pending completions before destroying a block
device.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/block.c b/block.c
index fc8defd..d2aed1b 100644
--- a/block.c
+++ b/block.c
@@ -644,6 +644,8 @@ unlink_and_fail:
 void bdrv_close(BlockDriverState *bs)
 {
     if (bs->drv) {
+        bdrv_flush(bs);
+
         if (bs == bs_snapshots) {
             bs_snapshots = NULL;
         }
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:40         ` Anthony Liguori
@ 2010-10-29 14:57           ` Kevin Wolf
  2010-10-29 15:28             ` Anthony Liguori
  0 siblings, 1 reply; 60+ messages in thread
From: Kevin Wolf @ 2010-10-29 14:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Ryan Harper, Markus Armbruster, qemu-devel

Am 29.10.2010 16:40, schrieb Anthony Liguori:
> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>>> I don't think it's a bad idea to do that but to the extent that the
>>> block API is designed after posix file I/O, close does not usually imply
>>> flush.
>>>      
>> I don't think it really resembles POSIX. More or less the only thing
>> they have in common is that both provide open, read, write and close,
>> which is something that probably any API for file accesses provides.
>>
>> The operation you're talking about here is bdrv_flush/fsync that is not
>> implied by a POSIX close?
>>    
> 
> Yes.  But I think for the purposes of this patch, a bdrv_cancel_all() 
> would be just as good.  The intention is to eliminate pending I/O 
> requests, the fsync is just a side effect.

Well, if I'm not mistaken, bdrv_flush would provide only this side
effect and not the semantics that you're really looking for. This is why
I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
introduce a qemu_aio_flush variant that flushes only one
BlockDriverState - this is what you really want.

>>>> And why do we have to flush here, but not before other uses of
>>>> bdrv_close(), such as eject_device()?
>>>>
>>>>        
>>> Good question.  Kevin should also confirm, but looking at the code, I
>>> think flush() is needed before close.  If there's a pending I/O event
>>> and you close before the I/O event is completed, you'll get a callback
>>> for completion against a bogus BlockDriverState.
>>>
>>> I can't find anything in either raw-posix or the generic block layer
>>> that would mitigate this.
>>>      
>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>
>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>> anyway.
>>    
> 
> I agree.  Re: qemu_aio_flush, we have to wait for it to complete which 
> gets a little complicated in bdrv_close().  

qemu_aio_flush is the function that waits for requests to complete.

> I think it would be better 
> to make bdrv_flush() call bdrv_aio_flush() if an explicit bdrv_flush 
> method isn't provided.  Something like the attached (still need to test).
> 
> Does that seem reasonable?

I'm not sure why you want to introduce this emulation. Are there any
drivers that implement bdrv_aio_flush, but not bdrv_flush? They are
definitely broken.

Today, bdrv_aio_flush is emulated using bdrv_flush if the driver doesn't
provide it explicitly.

I think this also means that your first patch would kill any drivers
implementing neither bdrv_flush nor bdrv_aio_flush because they'd try to
emulate the other function in an endless recursion.

And apart from that, as said above, bdrv_flush doesn't do the right
thing anyway. ;-)

Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-10-29 14:12 ` [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Markus Armbruster
@ 2010-10-29 15:03   ` Ryan Harper
  2010-10-29 16:10     ` Markus Armbruster
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-10-29 15:03 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, Anthony Liguori,
	Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
> [Note cc: Michael]
> 
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > This patch series decouples the detachment of a block device from the removal
> > of the backing pci-device.  Removal of a hotplugged pci device requires the
> > guest to respond before qemu tears down the block device. In some cases, the
> > guest may not respond leaving the guest with continued access to the block
> > device.  
> >
> > The new monitor command, drive_unplug, will revoke a guests access to the
> > block device independently of the removal of the pci device.
> >
> > The first patch adds a new drive find method, the second patch implements the
> > monitor command and block layer changes.
> >
> > Changes since v3:
> > - Moved QMP command for drive_unplug() to separate patch
> >
> > Changes since v2:
> > - Added QMP command for drive_unplug()
> >
> > Changes since v1:
> > - CodingStyle fixes
> > - Added qemu_aio_flush() to bdrv_unplug()
> >
> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> 
> If I understand your patch correctly, the difference between your
> drive_unplug and my blockdev_del is as follows:
> 
> * drive_unplug forcefully severs the connection between the host part of
>   the block device and its BlockDriverState.  A shell of the host part
>   remains, to be cleaned up later.  You need forceful disconnect
>   operation to be able to revoke access to an image whether the guest
>   cooperates or not.  Fair enough.
> 
> * blockdev_del deletes a host part.  My current version fails when the
>   host part is in use.  I patterned that after netdev_del, which used to
>   work that way, until commit 2ffcb18d:
> 
>     Make netdev_del delete the netdev even when it's in use
>     
>     To hot-unplug guest and host part of a network device, you do:
>     
>         device_del NIC-ID
>         netdev_del NETDEV-ID
>     
>     For PCI devices, device_del merely tells ACPI to unplug the device.
>     The device goes away for real only after the guest processed the ACPI
>     unplug event.
>     
>     You have to wait until then (e.g. by polling info pci) before you can
>     unplug the netdev.  Not good.
>     
>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
>     netdev while it's in use is safe; packets simply get routed to the bit
>     bucket.
> 
>   Isn't this the very same problem that's behind your drive_unplug?

Yes it is.

> 
> I'd like to have some consistency among net, block and char device
> commands, i.e. a common set of operations that work the same for all of
> them.  Can we agree on such a set?

Yeah; the current trouble (or at least what I perceive to be trouble) is
that in the case where the guest responds to device_del induced ACPI
removal event; the current qdev code already does the host-side device
tear down.  Not sure if it is OK to do a blockdev_del() immediately
after the device_del.  What happens when we do:

device_del
ACPI to guest
blockdev_del /* removes host-side device */
guest responds to ACPI
qdev calls pci device removal code
qemu attempts to destroy the associated host-side block

That may just work today; and if not, it shouldn't be hard to fix up the
code to check for NULLs

> 
> Even if your drive_unplug shouldn't fit in that set, we might want it as
> a stop-gap.  Depends on how urgent the need for it is.  Yet another
> special-purpose command to be deprecated later.

The fix is urgent; but I'm willing to spin a couple patches if it helps
get this into better shape.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:57           ` Kevin Wolf
@ 2010-10-29 15:28             ` Anthony Liguori
  2010-10-29 16:08               ` Kevin Wolf
  0 siblings, 1 reply; 60+ messages in thread
From: Anthony Liguori @ 2010-10-29 15:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, Ryan Harper, Christoph Hellwig,
	Markus Armbruster, qemu-devel

On 10/29/2010 09:57 AM, Kevin Wolf wrote:
> Am 29.10.2010 16:40, schrieb Anthony Liguori:
>    
>> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>>      
>>> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>>>        
>>>> I don't think it's a bad idea to do that but to the extent that the
>>>> block API is designed after posix file I/O, close does not usually imply
>>>> flush.
>>>>
>>>>          
>>> I don't think it really resembles POSIX. More or less the only thing
>>> they have in common is that both provide open, read, write and close,
>>> which is something that probably any API for file accesses provides.
>>>
>>> The operation you're talking about here is bdrv_flush/fsync that is not
>>> implied by a POSIX close?
>>>
>>>        
>> Yes.  But I think for the purposes of this patch, a bdrv_cancel_all()
>> would be just as good.  The intention is to eliminate pending I/O
>> requests, the fsync is just a side effect.
>>      
> Well, if I'm not mistaken, bdrv_flush would provide only this side
> effect and not the semantics that you're really looking for. This is why
> I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
> introduce a qemu_aio_flush variant that flushes only one
> BlockDriverState - this is what you really want.
>
>    
>>>>> And why do we have to flush here, but not before other uses of
>>>>> bdrv_close(), such as eject_device()?
>>>>>
>>>>>
>>>>>            
>>>> Good question.  Kevin should also confirm, but looking at the code, I
>>>> think flush() is needed before close.  If there's a pending I/O event
>>>> and you close before the I/O event is completed, you'll get a callback
>>>> for completion against a bogus BlockDriverState.
>>>>
>>>> I can't find anything in either raw-posix or the generic block layer
>>>> that would mitigate this.
>>>>
>>>>          
>>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>>
>>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>>> anyway.
>>>
>>>        
>> I agree.  Re: qemu_aio_flush, we have to wait for it to complete which
>> gets a little complicated in bdrv_close().
>>      
> qemu_aio_flush is the function that waits for requests to complete.
>    

Please excuse me while my head explodes ;-)

I think we've got a bit of a problem.

We have:

1) bdrv_flush() - sends an fdatasync

2) bdrv_aio_flush() - sends an fdatasync using the thread pool

3) qemu_aio_flush() - waits for all pending aio requests to complete

But we use bdrv_aio_flush() to implement a barrier and we don't actually 
preserve those barrier semantics in the thread pool.

That is:

If I do:

bdrv_aio_write() -> A
bdrv_aio_write() -> B
bdrv_aio_flush() -> C

This will get queued as three requests on the thread pool.  (A) is a 
write, (B) is a write, and (C) is a fdatasync.

But if this gets picked up by three separate threads, the ordering isn't 
guaranteed.  It might be C, B, A.  So semantically, is bdrv_aio_flush() 
supposed to flush any *pending* writes or any *completed* writes?  If 
it's the later, we're okay, but if it's the former, we're broken.

If it's supposed to flush any pending writes, then my patch series is 
correct in theory.

Regards,

Anthony Liguori

>> I think it would be better
>> to make bdrv_flush() call bdrv_aio_flush() if an explicit bdrv_flush
>> method isn't provided.  Something like the attached (still need to test).
>>
>> Does that seem reasonable?
>>      
> I'm not sure why you want to introduce this emulation. Are there any
> drivers that implement bdrv_aio_flush, but not bdrv_flush? They are
> definitely broken.
>
> Today, bdrv_aio_flush is emulated using bdrv_flush if the driver doesn't
> provide it explicitly.
>
> I think this also means that your first patch would kill any drivers
> implementing neither bdrv_flush nor bdrv_aio_flush because they'd try to
> emulate the other function in an endless recursion.
>
> And apart from that, as said above, bdrv_flush doesn't do the right
> thing anyway. ;-)
>
> Kevin
>    

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:15     ` Anthony Liguori
  2010-10-29 14:29       ` Kevin Wolf
@ 2010-10-29 15:28       ` Markus Armbruster
  1 sibling, 0 replies; 60+ messages in thread
From: Markus Armbruster @ 2010-10-29 15:28 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Stefan Hajnoczi, Kevin Wolf, Ryan Harper, qemu-devel

Anthony Liguori <aliguori@linux.vnet.ibm.com> writes:

> On 10/29/2010 09:01 AM, Markus Armbruster wrote:
>> Ryan Harper<ryanh@us.ibm.com>  writes:
>>
>>    
>>> Block hot unplug is racy since the guest is required to acknowlege the ACPI
>>> unplug event; this may not happen synchronously with the device removal command
>>>
>>> This series aims to close a gap where by mgmt applications that assume the
>>> block resource has been removed without confirming that the guest has
>>> acknowledged the removal may re-assign the underlying device to a second guest
>>> leading to data leakage.
>>>
>>> This series introduces a new montor command to decouple asynchornous device
>>> removal from restricting guest access to a block device.  We do this by creating
>>> a new monitor command drive_unplug which maps to a bdrv_unplug() command which
>>> does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
>>> IO is rejected from the device and the guest will get IO errors but continue to
>>> function.
>>>
>>> A subsequent device removal command can be issued to remove the device, to which
>>> the guest may or maynot respond, but as long as the unplugged bit is set, no IO
>>> will be sumbitted.
>>>
>>> Changes since v1:
>>> - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
>>>
>>> Signed-off-by: Ryan Harper<ryanh@us.ibm.com>
>>> ---
>>>   block.c         |    7 +++++++
>>>   block.h         |    1 +
>>>   blockdev.c      |   26 ++++++++++++++++++++++++++
>>>   blockdev.h      |    1 +
>>>   hmp-commands.hx |   15 +++++++++++++++
>>>   5 files changed, 50 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index a19374d..be47655 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
>>>       }
>>>   }
>>>
>>> +void bdrv_unplug(BlockDriverState *bs)
>>> +{
>>> +    qemu_aio_flush();
>>> +    bdrv_flush(bs);
>>> +    bdrv_close(bs);
>>> +}
>>>      
>> Stupid question: why doesn't bdrv_close() flush automatically?
>>    
>
> I don't think it's a bad idea to do that but to the extent that the
> block API is designed after posix file I/O, close does not usually
> imply flush.

There is no flush() in POSIX file I/O.  There is fsync().

There is fflush() in stdio.  fclose() flushes automatically.  Flushing
only affects stdio buffers, it doesn't imply fsync().

Based on that, a reasonable programmer could be led to believe that
bdrv_close() flushes automatically, and flushing doesn't fsync().

>> And why do we have to flush here, but not before other uses of
>> bdrv_close(), such as eject_device()?
>>    
>
> Good question.  Kevin should also confirm, but looking at the code, I
> think flush() is needed before close.  If there's a pending I/O event
> and you close before the I/O event is completed, you'll get a callback
> for completion against a bogus BlockDriverState.
>
> I can't find anything in either raw-posix or the generic block layer
> that would mitigate this.

Then bdrv_close() is too hard to use.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 15:28             ` Anthony Liguori
@ 2010-10-29 16:08               ` Kevin Wolf
  2010-10-30 13:25                 ` Christoph Hellwig
  0 siblings, 1 reply; 60+ messages in thread
From: Kevin Wolf @ 2010-10-29 16:08 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Ryan Harper, Christoph Hellwig,
	Markus Armbruster, qemu-devel

Am 29.10.2010 17:28, schrieb Anthony Liguori:
> On 10/29/2010 09:57 AM, Kevin Wolf wrote:
>> Am 29.10.2010 16:40, schrieb Anthony Liguori:
>>    
>>> On 10/29/2010 09:29 AM, Kevin Wolf wrote:
>>>      
>>>> Am 29.10.2010 16:15, schrieb Anthony Liguori:
>>>>        
>>>>> I don't think it's a bad idea to do that but to the extent that the
>>>>> block API is designed after posix file I/O, close does not usually imply
>>>>> flush.
>>>>>
>>>>>          
>>>> I don't think it really resembles POSIX. More or less the only thing
>>>> they have in common is that both provide open, read, write and close,
>>>> which is something that probably any API for file accesses provides.
>>>>
>>>> The operation you're talking about here is bdrv_flush/fsync that is not
>>>> implied by a POSIX close?
>>>>
>>>>        
>>> Yes.  But I think for the purposes of this patch, a bdrv_cancel_all()
>>> would be just as good.  The intention is to eliminate pending I/O
>>> requests, the fsync is just a side effect.
>>>      
>> Well, if I'm not mistaken, bdrv_flush would provide only this side
>> effect and not the semantics that you're really looking for. This is why
>> I suggested adding both bdrv_flush and qemu_aio_flush. We could probably
>> introduce a qemu_aio_flush variant that flushes only one
>> BlockDriverState - this is what you really want.
>>
>>    
>>>>>> And why do we have to flush here, but not before other uses of
>>>>>> bdrv_close(), such as eject_device()?
>>>>>>
>>>>>>
>>>>>>            
>>>>> Good question.  Kevin should also confirm, but looking at the code, I
>>>>> think flush() is needed before close.  If there's a pending I/O event
>>>>> and you close before the I/O event is completed, you'll get a callback
>>>>> for completion against a bogus BlockDriverState.
>>>>>
>>>>> I can't find anything in either raw-posix or the generic block layer
>>>>> that would mitigate this.
>>>>>
>>>>>          
>>>> I'm not aware of anything either. This is what qemu_aio_flush would do.
>>>>
>>>> It seems reasonable to me to call both qemu_aio_flush and bdrv_flush in
>>>> bdrv_close. We probably don't really need to call bdrv_flush to operate
>>>> correctly, but it can't hurt and bdrv_close shouldn't happen that often
>>>> anyway.
>>>>
>>>>        
>>> I agree.  Re: qemu_aio_flush, we have to wait for it to complete which
>>> gets a little complicated in bdrv_close().
>>>      
>> qemu_aio_flush is the function that waits for requests to complete.
>>    
> 
> Please excuse me while my head explodes ;-)
> 
> I think we've got a bit of a problem.
> 
> We have:
> 
> 1) bdrv_flush() - sends an fdatasync
> 
> 2) bdrv_aio_flush() - sends an fdatasync using the thread pool
> 
> 3) qemu_aio_flush() - waits for all pending aio requests to complete
> 
> But we use bdrv_aio_flush() to implement a barrier and we don't actually 
> preserve those barrier semantics in the thread pool.

Not really. We use it to implement flush commands, which I think don't
necessarily constitute a barrier by themselves.

> That is:
> 
> If I do:
> 
> bdrv_aio_write() -> A
> bdrv_aio_write() -> B
> bdrv_aio_flush() -> C
> 
> This will get queued as three requests on the thread pool.  (A) is a 
> write, (B) is a write, and (C) is a fdatasync.
> 
> But if this gets picked up by three separate threads, the ordering isn't 
> guaranteed.  It might be C, B, A.  So semantically, is bdrv_aio_flush() 
> supposed to flush any *pending* writes or any *completed* writes?  If 
> it's the later, we're okay, but if it's the former, we're broken.

Right, so don't do that. ;-)

bdrv_aio_flush, as I understand it, is meant to flush only completed
writes. We've had this discussion before and if I understood right, this
is also how real hardware works generally. So to get barrier semantics
you as an OS need to flush your queue, i.e. you wait for A and B to
complete before you issue C.

Christoph should be able to detail on this.

Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-10-29 15:03   ` Ryan Harper
@ 2010-10-29 16:10     ` Markus Armbruster
  2010-10-29 16:50       ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-10-29 16:10 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Stefan Hajnoczi, Anthony Liguori, qemu-devel, Kevin Wolf,
	Michael S. Tsirkin

Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
>> [Note cc: Michael]
>> 
>> Ryan Harper <ryanh@us.ibm.com> writes:
>> 
>> > This patch series decouples the detachment of a block device from the removal
>> > of the backing pci-device.  Removal of a hotplugged pci device requires the
>> > guest to respond before qemu tears down the block device. In some cases, the
>> > guest may not respond leaving the guest with continued access to the block
>> > device.  
>> >
>> > The new monitor command, drive_unplug, will revoke a guests access to the
>> > block device independently of the removal of the pci device.
>> >
>> > The first patch adds a new drive find method, the second patch implements the
>> > monitor command and block layer changes.
>> >
>> > Changes since v3:
>> > - Moved QMP command for drive_unplug() to separate patch
>> >
>> > Changes since v2:
>> > - Added QMP command for drive_unplug()
>> >
>> > Changes since v1:
>> > - CodingStyle fixes
>> > - Added qemu_aio_flush() to bdrv_unplug()
>> >
>> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
>> 
>> If I understand your patch correctly, the difference between your
>> drive_unplug and my blockdev_del is as follows:
>> 
>> * drive_unplug forcefully severs the connection between the host part of
>>   the block device and its BlockDriverState.  A shell of the host part
>>   remains, to be cleaned up later.  You need forceful disconnect
>>   operation to be able to revoke access to an image whether the guest
>>   cooperates or not.  Fair enough.
>> 
>> * blockdev_del deletes a host part.  My current version fails when the
>>   host part is in use.  I patterned that after netdev_del, which used to
>>   work that way, until commit 2ffcb18d:
>> 
>>     Make netdev_del delete the netdev even when it's in use
>>     
>>     To hot-unplug guest and host part of a network device, you do:
>>     
>>         device_del NIC-ID
>>         netdev_del NETDEV-ID
>>     
>>     For PCI devices, device_del merely tells ACPI to unplug the device.
>>     The device goes away for real only after the guest processed the ACPI
>>     unplug event.
>>     
>>     You have to wait until then (e.g. by polling info pci) before you can
>>     unplug the netdev.  Not good.
>>     
>>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
>>     netdev while it's in use is safe; packets simply get routed to the bit
>>     bucket.
>> 
>>   Isn't this the very same problem that's behind your drive_unplug?
>
> Yes it is.
>
>> 
>> I'd like to have some consistency among net, block and char device
>> commands, i.e. a common set of operations that work the same for all of
>> them.  Can we agree on such a set?
>
> Yeah; the current trouble (or at least what I perceive to be trouble) is
> that in the case where the guest responds to device_del induced ACPI
> removal event; the current qdev code already does the host-side device
> tear down.  Not sure if it is OK to do a blockdev_del() immediately
> after the device_del.  What happens when we do:
>
> device_del
> ACPI to guest
> blockdev_del /* removes host-side device */

Fails in my tree, because the blockdev's still in use.  See below.

> guest responds to ACPI
> qdev calls pci device removal code
> qemu attempts to destroy the associated host-side block
>
> That may just work today; and if not, it shouldn't be hard to fix up the
> code to check for NULLs

I hate the automatic deletion of host part along with the guest part.
device_del should undo device_add.  {block,net,char}dev_{add,del} should
be similarly paired.

In my blockdev branch, I keep the automatic delete only for backwards
compatibility: if you create the drive with drive_add, it gets
auto-deleted, but if you use blockdev_add, it stays around.

>> Even if your drive_unplug shouldn't fit in that set, we might want it as
>> a stop-gap.  Depends on how urgent the need for it is.  Yet another
>> special-purpose command to be deprecated later.
>
> The fix is urgent; but I'm willing to spin a couple patches if it helps
> get this into better shape.

Can we agree on a common solution for block and net?  That's why I cc'ed
Michael.

Currently, we have two different ways:

* The netdev way: "del" always succeeds

  How can it succeed if the host part is in use?

  If all device models are prepared to deal with a missing host part, we
  can delete it right away.

  Else, we need to replace it with a suitable zombie, which is
  auto-deleted when it goes out of use.  Such zombies are not be visible
  elsewhere, in particular, the ID becomes available immediately.

* The unplug way: "del" fails while in use, "unplug" always succeeds

  Feels a bit cleaner to me.  But changing netdev_del might not be
  acceptable.

Either way works for me as an user interface.  But I'd rather not have
both.

Next, we need to consider how to integrate this with the automatic
deletion of drives on qdev destruction.  That's too late for unplug, we
want that right in device_del.  I'd leave the stupid automatic delete
where it is now, in qdev destruction.  The C API need unplug and delete
separate for that.


Regardless of the way we choose, we need to think very clearly on how
exactly device models should behave when their host part is missing or a
zombie, and how that behavior appears in the guest.

For net, making it look exactly like a yanked out network cable would
make sense to me.

What about block?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-10-29 16:10     ` Markus Armbruster
@ 2010-10-29 16:50       ` Ryan Harper
  2010-11-02  9:40         ` Markus Armbruster
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-10-29 16:50 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, Anthony Liguori,
	Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-10-29 11:11]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
> >> [Note cc: Michael]
> >> 
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> 
> >> If I understand your patch correctly, the difference between your
> >> drive_unplug and my blockdev_del is as follows:
> >> 
> >> * drive_unplug forcefully severs the connection between the host part of
> >>   the block device and its BlockDriverState.  A shell of the host part
> >>   remains, to be cleaned up later.  You need forceful disconnect
> >>   operation to be able to revoke access to an image whether the guest
> >>   cooperates or not.  Fair enough.
> >> 
> >> * blockdev_del deletes a host part.  My current version fails when the
> >>   host part is in use.  I patterned that after netdev_del, which used to
> >>   work that way, until commit 2ffcb18d:
> >> 
> >>     Make netdev_del delete the netdev even when it's in use
> >>     
> >>     To hot-unplug guest and host part of a network device, you do:
> >>     
> >>         device_del NIC-ID
> >>         netdev_del NETDEV-ID
> >>     
> >>     For PCI devices, device_del merely tells ACPI to unplug the device.
> >>     The device goes away for real only after the guest processed the ACPI
> >>     unplug event.
> >>     
> >>     You have to wait until then (e.g. by polling info pci) before you can
> >>     unplug the netdev.  Not good.
> >>     
> >>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
> >>     netdev while it's in use is safe; packets simply get routed to the bit
> >>     bucket.
> >> 
> >>   Isn't this the very same problem that's behind your drive_unplug?
> >
> > Yes it is.
> >
> >> 
> >> I'd like to have some consistency among net, block and char device
> >> commands, i.e. a common set of operations that work the same for all of
> >> them.  Can we agree on such a set?
> >
> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > that in the case where the guest responds to device_del induced ACPI
> > removal event; the current qdev code already does the host-side device
> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > after the device_del.  What happens when we do:
> >
> > device_del
> > ACPI to guest
> > blockdev_del /* removes host-side device */
> 
> Fails in my tree, because the blockdev's still in use.  See below.
> 
> > guest responds to ACPI
> > qdev calls pci device removal code
> > qemu attempts to destroy the associated host-side block
> >
> > That may just work today; and if not, it shouldn't be hard to fix up the
> > code to check for NULLs
> 
> I hate the automatic deletion of host part along with the guest part.
> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> be similarly paired.

Agreed.
> 
> In my blockdev branch, I keep the automatic delete only for backwards
> compatibility: if you create the drive with drive_add, it gets
> auto-deleted, but if you use blockdev_add, it stays around.

But what to do about the case where we're doing drive_add and then a
device_del()  That's the urgent situation that needs to be resolved.

> 
> >> Even if your drive_unplug shouldn't fit in that set, we might want it as
> >> a stop-gap.  Depends on how urgent the need for it is.  Yet another
> >> special-purpose command to be deprecated later.
> >
> > The fix is urgent; but I'm willing to spin a couple patches if it helps
> > get this into better shape.
> 
> Can we agree on a common solution for block and net?  That's why I cc'ed
> Michael.

I didn't see a good way to have block behave the same as net; though I
do agree that it would be good to have this be common, long term.

> 
> Currently, we have two different ways:
> 
> * The netdev way: "del" always succeeds
> 
>   How can it succeed if the host part is in use?
> 
>   If all device models are prepared to deal with a missing host part, we
>   can delete it right away.
> 
>   Else, we need to replace it with a suitable zombie, which is
>   auto-deleted when it goes out of use.  Such zombies are not be visible
>   elsewhere, in particular, the ID becomes available immediately.
> 
> * The unplug way: "del" fails while in use, "unplug" always succeeds
> 
>   Feels a bit cleaner to me.  But changing netdev_del might not be
>   acceptable.
> 
> Either way works for me as an user interface.  But I'd rather not have
> both.
> 
> Next, we need to consider how to integrate this with the automatic
> deletion of drives on qdev destruction.  That's too late for unplug, we
> want that right in device_del.  I'd leave the stupid automatic delete
> where it is now, in qdev destruction.  The C API need unplug and delete
> separate for that.
> 
> 
> Regardless of the way we choose, we need to think very clearly on how
> exactly device models should behave when their host part is missing or a
> zombie, and how that behavior appears in the guest.
> 
> For net, making it look exactly like a yanked out network cable would
> make sense to me.
> 
> What about block?

It seems to me that for block it's like cdrom with no disk, floppy with
no media, hard disk that's gone bad.  I think we we throw EIO back; it's
handled gracefully enough.  This is what happens when you do a
drive_unplug with my patch; the application using the device gets IO
errors.  That's expected if a drive were to suddently fail (which is
what this looks like).  And certainly there is some responsibility
at the mgmt console to ensure you're not unplugging a drive that you are
currently using.




-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 16:08               ` Kevin Wolf
@ 2010-10-30 13:25                 ` Christoph Hellwig
  0 siblings, 0 replies; 60+ messages in thread
From: Christoph Hellwig @ 2010-10-30 13:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Markus Armbruster, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi, Christoph Hellwig

On Fri, Oct 29, 2010 at 06:08:03PM +0200, Kevin Wolf wrote:
> > I think we've got a bit of a problem.
> > 
> > We have:
> > 
> > 1) bdrv_flush() - sends an fdatasync
> > 
> > 2) bdrv_aio_flush() - sends an fdatasync using the thread pool
> > 
> > 3) qemu_aio_flush() - waits for all pending aio requests to complete
> > 
> > But we use bdrv_aio_flush() to implement a barrier and we don't actually 
> > preserve those barrier semantics in the thread pool.
> 
> Not really. We use it to implement flush commands, which I think don't
> necessarily constitute a barrier by themselves.

Yes.  Just as with normal disks qemu has absolutely no concept of I/O
barriers.  I/O barriers is an abstraction inside the Linux kernel that
we fortunately finally got rid of.

Qemu just gets a cache flush command from the guest and executes it.
Usuaully asynchronously as synchronous block I/O with a single
outstanding request is not very performant.  The filesystem in the guest
handles the ordering around it.

> bdrv_aio_flush, as I understand it, is meant to flush only completed
> writes.

Exactly.  The guest OS tracks writes and only issues a cache flush if
all I/Os it wants to see flushes have been ACKed by the storage hardware
/ qemu.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug()
  2010-10-29 14:01   ` Markus Armbruster
  2010-10-29 14:15     ` Anthony Liguori
@ 2010-11-01 21:06     ` Ryan Harper
  1 sibling, 0 replies; 60+ messages in thread
From: Ryan Harper @ 2010-11-01 21:06 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, qemu-devel, Kevin Wolf

* Markus Armbruster <armbru@redhat.com> [2010-10-29 09:08]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > Block hot unplug is racy since the guest is required to acknowlege the ACPI
> > unplug event; this may not happen synchronously with the device removal command
> >
> > This series aims to close a gap where by mgmt applications that assume the
> > block resource has been removed without confirming that the guest has
> > acknowledged the removal may re-assign the underlying device to a second guest
> > leading to data leakage.
> >
> > This series introduces a new montor command to decouple asynchornous device
> > removal from restricting guest access to a block device.  We do this by creating
> > a new monitor command drive_unplug which maps to a bdrv_unplug() command which
> > does a qemu_aio_flush; bdrv_flush() and bdrv_close().  Once complete, subsequent
> > IO is rejected from the device and the guest will get IO errors but continue to
> > function.
> >
> > A subsequent device removal command can be issued to remove the device, to which
> > the guest may or maynot respond, but as long as the unplugged bit is set, no IO
> > will be sumbitted.
> >
> > Changes since v1:
> > - Added qemu_aio_flush() before bdrv_flush() to wait on pending io
> >
> > Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
> > ---
> >  block.c         |    7 +++++++
> >  block.h         |    1 +
> >  blockdev.c      |   26 ++++++++++++++++++++++++++
> >  blockdev.h      |    1 +
> >  hmp-commands.hx |   15 +++++++++++++++
> >  5 files changed, 50 insertions(+), 0 deletions(-)
> >
> > diff --git a/block.c b/block.c
> > index a19374d..be47655 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -1328,6 +1328,13 @@ void bdrv_set_removable(BlockDriverState *bs, int removable)
> >      }
> >  }
> >  
> > +void bdrv_unplug(BlockDriverState *bs)
> > +{
> > +    qemu_aio_flush();
> > +    bdrv_flush(bs);
> > +    bdrv_close(bs);
> > +}
> 
> Stupid question: why doesn't bdrv_close() flush automatically?
> 
> And why do we have to flush here, but not before other uses of
> bdrv_close(), such as eject_device()?
> 
> > +
> >  int bdrv_is_removable(BlockDriverState *bs)
> >  {
> >      return bs->removable;
> > diff --git a/block.h b/block.h
> > index 5f64380..732f63e 100644
> > --- a/block.h
> > +++ b/block.h
> > @@ -171,6 +171,7 @@ void bdrv_set_on_error(BlockDriverState *bs, BlockErrorAction on_read_error,
> >                         BlockErrorAction on_write_error);
> >  BlockErrorAction bdrv_get_on_error(BlockDriverState *bs, int is_read);
> >  void bdrv_set_removable(BlockDriverState *bs, int removable);
> > +void bdrv_unplug(BlockDriverState *bs);
> >  int bdrv_is_removable(BlockDriverState *bs);
> >  int bdrv_is_read_only(BlockDriverState *bs);
> >  int bdrv_is_sg(BlockDriverState *bs);
> > diff --git a/blockdev.c b/blockdev.c
> > index 5fc3b9b..68eb329 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -610,3 +610,29 @@ int do_change_block(Monitor *mon, const char *device,
> >      }
> >      return monitor_read_bdrv_key_start(mon, bs, NULL, NULL);
> >  }
> > +
> > +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > +{
> > +    DriveInfo *dinfo;
> > +    BlockDriverState *bs;
> > +    const char *id;
> > +
> > +    if (!qdict_haskey(qdict, "id")) {
> > +        qerror_report(QERR_MISSING_PARAMETER, "id");
> > +        return -1;
> > +    }
> 
> As Luiz pointed out, this check is redundant.
> 
> > +
> > +    id = qdict_get_str(qdict, "id");
> > +    dinfo = drive_get_by_id(id);
> > +    if (!dinfo) {
> > +        qerror_report(QERR_DEVICE_NOT_FOUND, id);
> > +        return -1;
> > +    }
> > +
> > +    /* mark block device unplugged */
> > +    bs = dinfo->bdrv;
> > +    bdrv_unplug(bs);
> > +
> > +    return 0;
> > +}
> > + 
> 
> What about:
> 
>     const char *id = qdict_get_str(qdict, "id");
>     BlockDriverState *bs;
> 
>     bs = bdrv_find(id);
>     if (!bs) {
>         qerror_report(QERR_DEVICE_NOT_FOUND, id);
>         return -1;
>     }
> 
>     bdrv_unplug(bs);
> 
>     return 0;
> 
> Precedence: commit f8b6cc00 replaced uses of drive_get_by_id() by
> bdrv_find().

That works out nicely; and I can drop the drive_get_by_id() patch as
well.  Thanks.

> 
> > diff --git a/blockdev.h b/blockdev.h
> > index 19c6915..ecb9ac8 100644
> > --- a/blockdev.h
> > +++ b/blockdev.h
> > @@ -52,5 +52,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  int do_change_block(Monitor *mon, const char *device,
> >                      const char *filename, const char *fmt);
> > +int do_drive_unplug(Monitor *mon, const QDict *qdict, QObject **ret_data);
> >  
> >  #endif
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index 81999aa..7a32a2e 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -68,6 +68,21 @@ Eject a removable medium (use -f to force it).
> >  ETEXI
> >  
> >      {
> > +        .name       = "drive_unplug",
> > +        .args_type  = "id:s",
> > +        .params     = "device",
> > +        .help       = "unplug block device",
> > +        .user_print = monitor_user_noop,
> > +        .mhandler.cmd_new = do_drive_unplug,
> > +    },
> > +
> > +STEXI
> > +@item unplug @var{device}
> > +@findex unplug
> > +Unplug block device.
> 
> A bit terse, isn't it?  What does it mean to unplug a block device?
> What's its observable effect on the guest?  Does it look like disk gone
> completely south, perhaps?

Well, most of the info in here is rather sparse as well, so there is
clear precedence for it's terseness; I'll be a bit more verbose in the
next version.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-10-29 16:50       ` Ryan Harper
@ 2010-11-02  9:40         ` Markus Armbruster
  2010-11-02 13:22           ` Michael S. Tsirkin
                             ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Markus Armbruster @ 2010-11-02  9:40 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Stefan Hajnoczi, Anthony Liguori, qemu-devel, Kevin Wolf,
	Michael S. Tsirkin

Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2010-10-29 11:11]:
>> Ryan Harper <ryanh@us.ibm.com> writes:
>> 
>> > * Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
>> >> [Note cc: Michael]
>> >> 
>> >> Ryan Harper <ryanh@us.ibm.com> writes:
>> >> 
>> >> 
>> >> If I understand your patch correctly, the difference between your
>> >> drive_unplug and my blockdev_del is as follows:
>> >> 
>> >> * drive_unplug forcefully severs the connection between the host part of
>> >>   the block device and its BlockDriverState.  A shell of the host part
>> >>   remains, to be cleaned up later.  You need forceful disconnect
>> >>   operation to be able to revoke access to an image whether the guest
>> >>   cooperates or not.  Fair enough.
>> >> 
>> >> * blockdev_del deletes a host part.  My current version fails when the
>> >>   host part is in use.  I patterned that after netdev_del, which used to
>> >>   work that way, until commit 2ffcb18d:
>> >> 
>> >>     Make netdev_del delete the netdev even when it's in use
>> >>     
>> >>     To hot-unplug guest and host part of a network device, you do:
>> >>     
>> >>         device_del NIC-ID
>> >>         netdev_del NETDEV-ID
>> >>     
>> >>     For PCI devices, device_del merely tells ACPI to unplug the device.
>> >>     The device goes away for real only after the guest processed the ACPI
>> >>     unplug event.
>> >>     
>> >>     You have to wait until then (e.g. by polling info pci) before you can
>> >>     unplug the netdev.  Not good.
>> >>     
>> >>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
>> >>     netdev while it's in use is safe; packets simply get routed to the bit
>> >>     bucket.
>> >> 
>> >>   Isn't this the very same problem that's behind your drive_unplug?
>> >
>> > Yes it is.
>> >
>> >> 
>> >> I'd like to have some consistency among net, block and char device
>> >> commands, i.e. a common set of operations that work the same for all of
>> >> them.  Can we agree on such a set?
>> >
>> > Yeah; the current trouble (or at least what I perceive to be trouble) is
>> > that in the case where the guest responds to device_del induced ACPI
>> > removal event; the current qdev code already does the host-side device
>> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
>> > after the device_del.  What happens when we do:
>> >
>> > device_del
>> > ACPI to guest
>> > blockdev_del /* removes host-side device */
>> 
>> Fails in my tree, because the blockdev's still in use.  See below.
>> 
>> > guest responds to ACPI
>> > qdev calls pci device removal code
>> > qemu attempts to destroy the associated host-side block
>> >
>> > That may just work today; and if not, it shouldn't be hard to fix up the
>> > code to check for NULLs
>> 
>> I hate the automatic deletion of host part along with the guest part.
>> device_del should undo device_add.  {block,net,char}dev_{add,del} should
>> be similarly paired.
>
> Agreed.
>> 
>> In my blockdev branch, I keep the automatic delete only for backwards
>> compatibility: if you create the drive with drive_add, it gets
>> auto-deleted, but if you use blockdev_add, it stays around.
>
> But what to do about the case where we're doing drive_add and then a
> device_del()  That's the urgent situation that needs to be resolved.

What's the exact problem we need to solve urgently?

Is it "provide means to cut the connection to the host part immediately,
even with an uncooperative guest"?

Does this need to be separate from device_del?

>> >> Even if your drive_unplug shouldn't fit in that set, we might want it as
>> >> a stop-gap.  Depends on how urgent the need for it is.  Yet another
>> >> special-purpose command to be deprecated later.
>> >
>> > The fix is urgent; but I'm willing to spin a couple patches if it helps
>> > get this into better shape.
>> 
>> Can we agree on a common solution for block and net?  That's why I cc'ed
>> Michael.
>
> I didn't see a good way to have block behave the same as net; though I
> do agree that it would be good to have this be common, long term.

If we can't make them behave 100% the same, then the next best thing is
to offer a preferred way to do things that works similarly enough to let
users ignore the differences.

Possible preferred ways to revoke access to a host part:

A. device_del

   Need to make device_del cut the connection right away instead of when
   the guest completes unplug.

   device_del changes behavior.  Any problems with that?

   Not an option if we need "cut the connection" to be separate from
   device_del.

B. FOO_del

   Got netdev_del.

   Need drive_del.  If drive is in use, replace it by a special "dead
   drive" without a device name (so the ID becomes available for new
   drives), then delete the original.

   Wart: drive_del doesn't work reliably after device_del, because the
   drive is auto-deleted when the guest completes unplug.  Not a problem
   for my blockdev_add/blockdev_del work-in-progress, because host parts
   created with blockdev_add don't auto-delete.

C. FOO_unplug

   You got a patch for drive_unplug.

   Need netdev_unplug.

   By the way, I hate "unplug", because it suggests relation to hot
   unplug.  What about "disconnect"?

Any preferences?

>> Currently, we have two different ways:
>> 
>> * The netdev way: "del" always succeeds
>> 
>>   How can it succeed if the host part is in use?
>> 
>>   If all device models are prepared to deal with a missing host part, we
>>   can delete it right away.
>> 
>>   Else, we need to replace it with a suitable zombie, which is
>>   auto-deleted when it goes out of use.  Such zombies are not be visible
>>   elsewhere, in particular, the ID becomes available immediately.
>> 
>> * The unplug way: "del" fails while in use, "unplug" always succeeds
>> 
>>   Feels a bit cleaner to me.  But changing netdev_del might not be
>>   acceptable.
>> 
>> Either way works for me as an user interface.  But I'd rather not have
>> both.
>> 
>> Next, we need to consider how to integrate this with the automatic
>> deletion of drives on qdev destruction.  That's too late for unplug, we
>> want that right in device_del.  I'd leave the stupid automatic delete
>> where it is now, in qdev destruction.  The C API need unplug and delete
>> separate for that.
>> 
>> 
>> Regardless of the way we choose, we need to think very clearly on how
>> exactly device models should behave when their host part is missing or a
>> zombie, and how that behavior appears in the guest.
>> 
>> For net, making it look exactly like a yanked out network cable would
>> make sense to me.
>> 
>> What about block?
>
> It seems to me that for block it's like cdrom with no disk, floppy with
> no media, hard disk that's gone bad.  I think we we throw EIO back; it's
> handled gracefully enough.  This is what happens when you do a
> drive_unplug with my patch; the application using the device gets IO
> errors.  That's expected if a drive were to suddently fail (which is
> what this looks like).  And certainly there is some responsibility
> at the mgmt console to ensure you're not unplugging a drive that you are
> currently using.

Total drive failure works for me.

"No media" is cute, but it's possible only for drives with removable
media.  I'd rather have all drives behave the same, whether their media
is removable or not.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02  9:40         ` Markus Armbruster
@ 2010-11-02 13:22           ` Michael S. Tsirkin
  2010-11-02 13:41           ` Kevin Wolf
  2010-11-02 13:46           ` Ryan Harper
  2 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-02 13:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, qemu-devel, Kevin Wolf

On Tue, Nov 02, 2010 at 10:40:32AM +0100, Markus Armbruster wrote:
> C. FOO_unplug
> 
>    You got a patch for drive_unplug.
> 
>    Need netdev_unplug.
> 
>    By the way, I hate "unplug", because it suggests relation to hot
>    unplug.  What about "disconnect"?

> Any preferences?

This implies that both parts stay on, just disconnected.
This is really surprise removal.  While we are at it, can we handle this
generically as removal of devices?

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02  9:40         ` Markus Armbruster
  2010-11-02 13:22           ` Michael S. Tsirkin
@ 2010-11-02 13:41           ` Kevin Wolf
  2010-11-02 13:46           ` Ryan Harper
  2 siblings, 0 replies; 60+ messages in thread
From: Kevin Wolf @ 2010-11-02 13:41 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Stefan Hajnoczi, Anthony Liguori, Ryan Harper, qemu-devel,
	Michael S. Tsirkin

Am 02.11.2010 10:40, schrieb Markus Armbruster:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
>> * Markus Armbruster <armbru@redhat.com> [2010-10-29 11:11]:
>>> Ryan Harper <ryanh@us.ibm.com> writes:
>>>
>>> Regardless of the way we choose, we need to think very clearly on how
>>> exactly device models should behave when their host part is missing or a
>>> zombie, and how that behavior appears in the guest.
>>>
>>> For net, making it look exactly like a yanked out network cable would
>>> make sense to me.
>>>
>>> What about block?
>>
>> It seems to me that for block it's like cdrom with no disk, floppy with
>> no media, hard disk that's gone bad.  I think we we throw EIO back; it's
>> handled gracefully enough.  This is what happens when you do a
>> drive_unplug with my patch; the application using the device gets IO
>> errors.  That's expected if a drive were to suddently fail (which is
>> what this looks like).  And certainly there is some responsibility
>> at the mgmt console to ensure you're not unplugging a drive that you are
>> currently using.
> 
> Total drive failure works for me.
> 
> "No media" is cute, but it's possible only for drives with removable
> media.  I'd rather have all drives behave the same, whether their media
> is removable or not.

But I think we need some way to eject the medium without destroying the
whole device. If you handle it as "device broken", you need another
command for keeping the device, but ejecting the medium, unplugging the
network cable, etc. We have "eject" for block today, but it needs a
drive and not a blockdev (and I'm not even sure it really does what I'm
thinking of, once again)

Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02  9:40         ` Markus Armbruster
  2010-11-02 13:22           ` Michael S. Tsirkin
  2010-11-02 13:41           ` Kevin Wolf
@ 2010-11-02 13:46           ` Ryan Harper
  2010-11-02 13:58             ` Michael S. Tsirkin
  2 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-02 13:46 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, Anthony Liguori,
	Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2010-10-29 11:11]:
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > * Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
> >> >> [Note cc: Michael]
> >> >> 
> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> >> 
> >> >> 
> >> >> If I understand your patch correctly, the difference between your
> >> >> drive_unplug and my blockdev_del is as follows:
> >> >> 
> >> >> * drive_unplug forcefully severs the connection between the host part of
> >> >>   the block device and its BlockDriverState.  A shell of the host part
> >> >>   remains, to be cleaned up later.  You need forceful disconnect
> >> >>   operation to be able to revoke access to an image whether the guest
> >> >>   cooperates or not.  Fair enough.
> >> >> 
> >> >> * blockdev_del deletes a host part.  My current version fails when the
> >> >>   host part is in use.  I patterned that after netdev_del, which used to
> >> >>   work that way, until commit 2ffcb18d:
> >> >> 
> >> >>     Make netdev_del delete the netdev even when it's in use
> >> >>     
> >> >>     To hot-unplug guest and host part of a network device, you do:
> >> >>     
> >> >>         device_del NIC-ID
> >> >>         netdev_del NETDEV-ID
> >> >>     
> >> >>     For PCI devices, device_del merely tells ACPI to unplug the device.
> >> >>     The device goes away for real only after the guest processed the ACPI
> >> >>     unplug event.
> >> >>     
> >> >>     You have to wait until then (e.g. by polling info pci) before you can
> >> >>     unplug the netdev.  Not good.
> >> >>     
> >> >>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
> >> >>     netdev while it's in use is safe; packets simply get routed to the bit
> >> >>     bucket.
> >> >> 
> >> >>   Isn't this the very same problem that's behind your drive_unplug?
> >> >
> >> > Yes it is.
> >> >
> >> >> 
> >> >> I'd like to have some consistency among net, block and char device
> >> >> commands, i.e. a common set of operations that work the same for all of
> >> >> them.  Can we agree on such a set?
> >> >
> >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> >> > that in the case where the guest responds to device_del induced ACPI
> >> > removal event; the current qdev code already does the host-side device
> >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> >> > after the device_del.  What happens when we do:
> >> >
> >> > device_del
> >> > ACPI to guest
> >> > blockdev_del /* removes host-side device */
> >> 
> >> Fails in my tree, because the blockdev's still in use.  See below.
> >> 
> >> > guest responds to ACPI
> >> > qdev calls pci device removal code
> >> > qemu attempts to destroy the associated host-side block
> >> >
> >> > That may just work today; and if not, it shouldn't be hard to fix up the
> >> > code to check for NULLs
> >> 
> >> I hate the automatic deletion of host part along with the guest part.
> >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> >> be similarly paired.
> >
> > Agreed.
> >> 
> >> In my blockdev branch, I keep the automatic delete only for backwards
> >> compatibility: if you create the drive with drive_add, it gets
> >> auto-deleted, but if you use blockdev_add, it stays around.
> >
> > But what to do about the case where we're doing drive_add and then a
> > device_del()  That's the urgent situation that needs to be resolved.
> 
> What's the exact problem we need to solve urgently?
> 
> Is it "provide means to cut the connection to the host part immediately,
> even with an uncooperative guest"?

Yes, need to ensure that if the mgmt layer (libvirt) has done what it
believes should have disassociated the host block device from the guest,
we want to ensure that the host block device is no longer accessible
from the guest.

> 
> Does this need to be separate from device_del?

no, it doesn't have to be.  Honestly, I didn't see a clear way to do
something like unplug early in the device_del because that's all pci
device code which has no knowledge of host block devices; having it
disconnect seemed like a layering violation.


> 
> >> >> Even if your drive_unplug shouldn't fit in that set, we might want it as
> >> >> a stop-gap.  Depends on how urgent the need for it is.  Yet another
> >> >> special-purpose command to be deprecated later.
> >> >
> >> > The fix is urgent; but I'm willing to spin a couple patches if it helps
> >> > get this into better shape.
> >> 
> >> Can we agree on a common solution for block and net?  That's why I cc'ed
> >> Michael.
> >
> > I didn't see a good way to have block behave the same as net; though I
> > do agree that it would be good to have this be common, long term.
> 
> If we can't make them behave 100% the same, then the next best thing is
> to offer a preferred way to do things that works similarly enough to let
> users ignore the differences.
> 
> Possible preferred ways to revoke access to a host part:
> 
> A. device_del
> 
>    Need to make device_del cut the connection right away instead of when
>    the guest completes unplug.
> 
>    device_del changes behavior.  Any problems with that?

I don't think so; current mgmt consumers assume a cooperative guest and
don't handle an uncooperative one right now in the case where Selinux is
disabled; so modifying this path to do a disconnect shouldn't be a
problem.

> 
>    Not an option if we need "cut the connection" to be separate from
>    device_del.
> 
> B. FOO_del
> 
>    Got netdev_del.
> 
>    Need drive_del.  If drive is in use, replace it by a special "dead
>    drive" without a device name (so the ID becomes available for new
>    drives), then delete the original.
> 
>    Wart: drive_del doesn't work reliably after device_del, because the
>    drive is auto-deleted when the guest completes unplug.  Not a problem
>    for my blockdev_add/blockdev_del work-in-progress, because host parts
>    created with blockdev_add don't auto-delete.
> 
> C. FOO_unplug
> 
>    You got a patch for drive_unplug.
> 
>    Need netdev_unplug.
> 
>    By the way, I hate "unplug", because it suggests relation to hot
>    unplug.  What about "disconnect"?
> 
> Any preferences?

disconnect is fine.

> 
> >> Currently, we have two different ways:
> >> 
> >> * The netdev way: "del" always succeeds
> >> 
> >>   How can it succeed if the host part is in use?
> >> 
> >>   If all device models are prepared to deal with a missing host part, we
> >>   can delete it right away.
> >> 
> >>   Else, we need to replace it with a suitable zombie, which is
> >>   auto-deleted when it goes out of use.  Such zombies are not be visible
> >>   elsewhere, in particular, the ID becomes available immediately.
> >> 
> >> * The unplug way: "del" fails while in use, "unplug" always succeeds
> >> 
> >>   Feels a bit cleaner to me.  But changing netdev_del might not be
> >>   acceptable.
> >> 
> >> Either way works for me as an user interface.  But I'd rather not have
> >> both.
> >> 
> >> Next, we need to consider how to integrate this with the automatic
> >> deletion of drives on qdev destruction.  That's too late for unplug, we
> >> want that right in device_del.  I'd leave the stupid automatic delete
> >> where it is now, in qdev destruction.  The C API need unplug and delete
> >> separate for that.
> >> 
> >> 
> >> Regardless of the way we choose, we need to think very clearly on how
> >> exactly device models should behave when their host part is missing or a
> >> zombie, and how that behavior appears in the guest.
> >> 
> >> For net, making it look exactly like a yanked out network cable would
> >> make sense to me.
> >> 
> >> What about block?
> >
> > It seems to me that for block it's like cdrom with no disk, floppy with
> > no media, hard disk that's gone bad.  I think we we throw EIO back; it's
> > handled gracefully enough.  This is what happens when you do a
> > drive_unplug with my patch; the application using the device gets IO
> > errors.  That's expected if a drive were to suddently fail (which is
> > what this looks like).  And certainly there is some responsibility
> > at the mgmt console to ensure you're not unplugging a drive that you are
> > currently using.
> 
> Total drive failure works for me.

OK

> 
> "No media" is cute, but it's possible only for drives with removable
> media.  I'd rather have all drives behave the same, whether their media
> is removable or not.

right, I don't think there is any "no media" for hard disks.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 13:46           ` Ryan Harper
@ 2010-11-02 13:58             ` Michael S. Tsirkin
  2010-11-02 14:22               ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-02 13:58 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Stefan Hajnoczi, Anthony Liguori, Markus Armbruster, Kevin Wolf,
	qemu-devel

On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
> * Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:
> > Ryan Harper <ryanh@us.ibm.com> writes:
> > 
> > > * Markus Armbruster <armbru@redhat.com> [2010-10-29 11:11]:
> > >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> 
> > >> > * Markus Armbruster <armbru@redhat.com> [2010-10-29 09:13]:
> > >> >> [Note cc: Michael]
> > >> >> 
> > >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> >> 
> > >> >> 
> > >> >> If I understand your patch correctly, the difference between your
> > >> >> drive_unplug and my blockdev_del is as follows:
> > >> >> 
> > >> >> * drive_unplug forcefully severs the connection between the host part of
> > >> >>   the block device and its BlockDriverState.  A shell of the host part
> > >> >>   remains, to be cleaned up later.  You need forceful disconnect
> > >> >>   operation to be able to revoke access to an image whether the guest
> > >> >>   cooperates or not.  Fair enough.
> > >> >> 
> > >> >> * blockdev_del deletes a host part.  My current version fails when the
> > >> >>   host part is in use.  I patterned that after netdev_del, which used to
> > >> >>   work that way, until commit 2ffcb18d:
> > >> >> 
> > >> >>     Make netdev_del delete the netdev even when it's in use
> > >> >>     
> > >> >>     To hot-unplug guest and host part of a network device, you do:
> > >> >>     
> > >> >>         device_del NIC-ID
> > >> >>         netdev_del NETDEV-ID
> > >> >>     
> > >> >>     For PCI devices, device_del merely tells ACPI to unplug the device.
> > >> >>     The device goes away for real only after the guest processed the ACPI
> > >> >>     unplug event.
> > >> >>     
> > >> >>     You have to wait until then (e.g. by polling info pci) before you can
> > >> >>     unplug the netdev.  Not good.
> > >> >>     
> > >> >>     Fix by removing the "in use" check from do_netdev_del().  Deleting a
> > >> >>     netdev while it's in use is safe; packets simply get routed to the bit
> > >> >>     bucket.
> > >> >> 
> > >> >>   Isn't this the very same problem that's behind your drive_unplug?
> > >> >
> > >> > Yes it is.
> > >> >
> > >> >> 
> > >> >> I'd like to have some consistency among net, block and char device
> > >> >> commands, i.e. a common set of operations that work the same for all of
> > >> >> them.  Can we agree on such a set?
> > >> >
> > >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > >> > that in the case where the guest responds to device_del induced ACPI
> > >> > removal event; the current qdev code already does the host-side device
> > >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > >> > after the device_del.  What happens when we do:
> > >> >
> > >> > device_del
> > >> > ACPI to guest
> > >> > blockdev_del /* removes host-side device */
> > >> 
> > >> Fails in my tree, because the blockdev's still in use.  See below.
> > >> 
> > >> > guest responds to ACPI
> > >> > qdev calls pci device removal code
> > >> > qemu attempts to destroy the associated host-side block
> > >> >
> > >> > That may just work today; and if not, it shouldn't be hard to fix up the
> > >> > code to check for NULLs
> > >> 
> > >> I hate the automatic deletion of host part along with the guest part.
> > >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> > >> be similarly paired.
> > >
> > > Agreed.
> > >> 
> > >> In my blockdev branch, I keep the automatic delete only for backwards
> > >> compatibility: if you create the drive with drive_add, it gets
> > >> auto-deleted, but if you use blockdev_add, it stays around.
> > >
> > > But what to do about the case where we're doing drive_add and then a
> > > device_del()  That's the urgent situation that needs to be resolved.
> > 
> > What's the exact problem we need to solve urgently?
> > 
> > Is it "provide means to cut the connection to the host part immediately,
> > even with an uncooperative guest"?
> 
> Yes, need to ensure that if the mgmt layer (libvirt) has done what it
> believes should have disassociated the host block device from the guest,
> we want to ensure that the host block device is no longer accessible
> from the guest.
> 
> > 
> > Does this need to be separate from device_del?
> 
> no, it doesn't have to be.  Honestly, I didn't see a clear way to do
> something like unplug early in the device_del because that's all pci
> device code which has no knowledge of host block devices; having it
> disconnect seemed like a layering violation.

We invoke the cleanup callback, isn't that enough?

> > 
> > >> >> Even if your drive_unplug shouldn't fit in that set, we might want it as
> > >> >> a stop-gap.  Depends on how urgent the need for it is.  Yet another
> > >> >> special-purpose command to be deprecated later.
> > >> >
> > >> > The fix is urgent; but I'm willing to spin a couple patches if it helps
> > >> > get this into better shape.
> > >> 
> > >> Can we agree on a common solution for block and net?  That's why I cc'ed
> > >> Michael.
> > >
> > > I didn't see a good way to have block behave the same as net; though I
> > > do agree that it would be good to have this be common, long term.
> > 
> > If we can't make them behave 100% the same, then the next best thing is
> > to offer a preferred way to do things that works similarly enough to let
> > users ignore the differences.
> > 
> > Possible preferred ways to revoke access to a host part:
> > 
> > A. device_del
> > 
> >    Need to make device_del cut the connection right away instead of when
> >    the guest completes unplug.
> > 
> >    device_del changes behavior.  Any problems with that?
> 
> I don't think so; current mgmt consumers assume a cooperative guest and
> don't handle an uncooperative one right now in the case where Selinux is
> disabled; so modifying this path to do a disconnect shouldn't be a
> problem.
> 
> > 
> >    Not an option if we need "cut the connection" to be separate from
> >    device_del.
> > 
> > B. FOO_del
> > 
> >    Got netdev_del.
> > 
> >    Need drive_del.  If drive is in use, replace it by a special "dead
> >    drive" without a device name (so the ID becomes available for new
> >    drives), then delete the original.
> > 
> >    Wart: drive_del doesn't work reliably after device_del, because the
> >    drive is auto-deleted when the guest completes unplug.  Not a problem
> >    for my blockdev_add/blockdev_del work-in-progress, because host parts
> >    created with blockdev_add don't auto-delete.
> > 
> > C. FOO_unplug
> > 
> >    You got a patch for drive_unplug.
> > 
> >    Need netdev_unplug.
> > 
> >    By the way, I hate "unplug", because it suggests relation to hot
> >    unplug.  What about "disconnect"?
> > 
> > Any preferences?
> 
> disconnect is fine.
> 
> > 
> > >> Currently, we have two different ways:
> > >> 
> > >> * The netdev way: "del" always succeeds
> > >> 
> > >>   How can it succeed if the host part is in use?
> > >> 
> > >>   If all device models are prepared to deal with a missing host part, we
> > >>   can delete it right away.
> > >> 
> > >>   Else, we need to replace it with a suitable zombie, which is
> > >>   auto-deleted when it goes out of use.  Such zombies are not be visible
> > >>   elsewhere, in particular, the ID becomes available immediately.
> > >> 
> > >> * The unplug way: "del" fails while in use, "unplug" always succeeds
> > >> 
> > >>   Feels a bit cleaner to me.  But changing netdev_del might not be
> > >>   acceptable.
> > >> 
> > >> Either way works for me as an user interface.  But I'd rather not have
> > >> both.
> > >> 
> > >> Next, we need to consider how to integrate this with the automatic
> > >> deletion of drives on qdev destruction.  That's too late for unplug, we
> > >> want that right in device_del.  I'd leave the stupid automatic delete
> > >> where it is now, in qdev destruction.  The C API need unplug and delete
> > >> separate for that.
> > >> 
> > >> 
> > >> Regardless of the way we choose, we need to think very clearly on how
> > >> exactly device models should behave when their host part is missing or a
> > >> zombie, and how that behavior appears in the guest.
> > >> 
> > >> For net, making it look exactly like a yanked out network cable would
> > >> make sense to me.
> > >> 
> > >> What about block?
> > >
> > > It seems to me that for block it's like cdrom with no disk, floppy with
> > > no media, hard disk that's gone bad.  I think we we throw EIO back; it's
> > > handled gracefully enough.  This is what happens when you do a
> > > drive_unplug with my patch; the application using the device gets IO
> > > errors.  That's expected if a drive were to suddently fail (which is
> > > what this looks like).  And certainly there is some responsibility
> > > at the mgmt console to ensure you're not unplugging a drive that you are
> > > currently using.
> > 
> > Total drive failure works for me.
> 
> OK
> 
> > 
> > "No media" is cute, but it's possible only for drives with removable
> > media.  I'd rather have all drives behave the same, whether their media
> > is removable or not.
> 
> right, I don't think there is any "no media" for hard disks.
> 
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 13:58             ` Michael S. Tsirkin
@ 2010-11-02 14:22               ` Ryan Harper
  2010-11-02 15:46                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-02 14:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Markus Armbruster, qemu-devel, Anthony Liguori,
	Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-02 08:59]:
> On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
> > * Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:

> > > >> >> I'd like to have some consistency among net, block and char device
> > > >> >> commands, i.e. a common set of operations that work the same for all of
> > > >> >> them.  Can we agree on such a set?
> > > >> >
> > > >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > > >> > that in the case where the guest responds to device_del induced ACPI
> > > >> > removal event; the current qdev code already does the host-side device
> > > >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > > >> > after the device_del.  What happens when we do:
> > > >> >
> > > >> > device_del
> > > >> > ACPI to guest
> > > >> > blockdev_del /* removes host-side device */
> > > >> 
> > > >> Fails in my tree, because the blockdev's still in use.  See below.
> > > >> 
> > > >> > guest responds to ACPI
> > > >> > qdev calls pci device removal code
> > > >> > qemu attempts to destroy the associated host-side block
> > > >> >
> > > >> > That may just work today; and if not, it shouldn't be hard to fix up the
> > > >> > code to check for NULLs
> > > >> 
> > > >> I hate the automatic deletion of host part along with the guest part.
> > > >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> > > >> be similarly paired.
> > > >
> > > > Agreed.
> > > >> 
> > > >> In my blockdev branch, I keep the automatic delete only for backwards
> > > >> compatibility: if you create the drive with drive_add, it gets
> > > >> auto-deleted, but if you use blockdev_add, it stays around.
> > > >
> > > > But what to do about the case where we're doing drive_add and then a
> > > > device_del()  That's the urgent situation that needs to be resolved.
> > > 
> > > What's the exact problem we need to solve urgently?
> > > 
> > > Is it "provide means to cut the connection to the host part immediately,
> > > even with an uncooperative guest"?
> > 
> > Yes, need to ensure that if the mgmt layer (libvirt) has done what it
> > believes should have disassociated the host block device from the guest,
> > we want to ensure that the host block device is no longer accessible
> > from the guest.
> > 
> > > 
> > > Does this need to be separate from device_del?
> > 
> > no, it doesn't have to be.  Honestly, I didn't see a clear way to do
> > something like unplug early in the device_del because that's all pci
> > device code which has no knowledge of host block devices; having it
> > disconnect seemed like a layering violation.
> 
> We invoke the cleanup callback, isn't that enough?

Won't that look a bit strange?  on device_del, call the cleanup callback
first;, then notify the guest, if the guest responds, I suppose as long
as the cleanup callback can handle being called a second time that'd
work.

I like the idea of disconnect; if part of the device_del method was to
invoke a disconnect method, we could implement that for block, net, etc;

I'd think we'd want to send the notification, then disconnect.
Struggling with whether it's worth having some reasonable timeout
between notification and disconnect.  




-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 14:22               ` Ryan Harper
@ 2010-11-02 15:46                 ` Michael S. Tsirkin
  2010-11-02 16:53                   ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-02 15:46 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Stefan Hajnoczi, Anthony Liguori, Markus Armbruster, Kevin Wolf,
	qemu-devel

On Tue, Nov 02, 2010 at 09:22:01AM -0500, Ryan Harper wrote:
> * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 08:59]:
> > On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
> > > * Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:
> 
> > > > >> >> I'd like to have some consistency among net, block and char device
> > > > >> >> commands, i.e. a common set of operations that work the same for all of
> > > > >> >> them.  Can we agree on such a set?
> > > > >> >
> > > > >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > > > >> > that in the case where the guest responds to device_del induced ACPI
> > > > >> > removal event; the current qdev code already does the host-side device
> > > > >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > > > >> > after the device_del.  What happens when we do:
> > > > >> >
> > > > >> > device_del
> > > > >> > ACPI to guest
> > > > >> > blockdev_del /* removes host-side device */
> > > > >> 
> > > > >> Fails in my tree, because the blockdev's still in use.  See below.
> > > > >> 
> > > > >> > guest responds to ACPI
> > > > >> > qdev calls pci device removal code
> > > > >> > qemu attempts to destroy the associated host-side block
> > > > >> >
> > > > >> > That may just work today; and if not, it shouldn't be hard to fix up the
> > > > >> > code to check for NULLs
> > > > >> 
> > > > >> I hate the automatic deletion of host part along with the guest part.
> > > > >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> > > > >> be similarly paired.
> > > > >
> > > > > Agreed.
> > > > >> 
> > > > >> In my blockdev branch, I keep the automatic delete only for backwards
> > > > >> compatibility: if you create the drive with drive_add, it gets
> > > > >> auto-deleted, but if you use blockdev_add, it stays around.
> > > > >
> > > > > But what to do about the case where we're doing drive_add and then a
> > > > > device_del()  That's the urgent situation that needs to be resolved.
> > > > 
> > > > What's the exact problem we need to solve urgently?
> > > > 
> > > > Is it "provide means to cut the connection to the host part immediately,
> > > > even with an uncooperative guest"?
> > > 
> > > Yes, need to ensure that if the mgmt layer (libvirt) has done what it
> > > believes should have disassociated the host block device from the guest,
> > > we want to ensure that the host block device is no longer accessible
> > > from the guest.
> > > 
> > > > 
> > > > Does this need to be separate from device_del?
> > > 
> > > no, it doesn't have to be.  Honestly, I didn't see a clear way to do
> > > something like unplug early in the device_del because that's all pci
> > > device code which has no knowledge of host block devices; having it
> > > disconnect seemed like a layering violation.
> > 
> > We invoke the cleanup callback, isn't that enough?
> 
> Won't that look a bit strange?  on device_del, call the cleanup callback
> first;, then notify the guest, if the guest responds, I suppose as long
> as the cleanup callback can handle being called a second time that'd
> work.

Well this is exactly what happens with surpise removal.
If you yank a card out the slot, guest only gets notification
afterwards.

> I like the idea of disconnect; if part of the device_del method was to
> invoke a disconnect method, we could implement that for block, net, etc;
> 
> I'd think we'd want to send the notification, then disconnect.
> Struggling with whether it's worth having some reasonable timeout
> between notification and disconnect.  

The problem with this is that it has no analog in real world.
In real world, you can send some notifications to the guest, and you can
remove the card.  Tying them together is what created the problem in the
first place.

Timeouts can be implemented by management, maybe with a nice dialog
being shown to the user.

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 15:46                 ` Michael S. Tsirkin
@ 2010-11-02 16:53                   ` Ryan Harper
  2010-11-02 17:59                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-02 16:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Markus Armbruster, qemu-devel, Anthony Liguori,
	Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-02 10:56]:
> On Tue, Nov 02, 2010 at 09:22:01AM -0500, Ryan Harper wrote:
> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 08:59]:
> > > On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
> > > > * Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:
> > 
> > > > > >> >> I'd like to have some consistency among net, block and char device
> > > > > >> >> commands, i.e. a common set of operations that work the same for all of
> > > > > >> >> them.  Can we agree on such a set?
> > > > > >> >
> > > > > >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > > > > >> > that in the case where the guest responds to device_del induced ACPI
> > > > > >> > removal event; the current qdev code already does the host-side device
> > > > > >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > > > > >> > after the device_del.  What happens when we do:
> > > > > >> >
> > > > > >> > device_del
> > > > > >> > ACPI to guest
> > > > > >> > blockdev_del /* removes host-side device */
> > > > > >> 
> > > > > >> Fails in my tree, because the blockdev's still in use.  See below.
> > > > > >> 
> > > > > >> > guest responds to ACPI
> > > > > >> > qdev calls pci device removal code
> > > > > >> > qemu attempts to destroy the associated host-side block
> > > > > >> >
> > > > > >> > That may just work today; and if not, it shouldn't be hard to fix up the
> > > > > >> > code to check for NULLs
> > > > > >> 
> > > > > >> I hate the automatic deletion of host part along with the guest part.
> > > > > >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> > > > > >> be similarly paired.
> > > > > >
> > > > > > Agreed.
> > > > > >> 
> > > > > >> In my blockdev branch, I keep the automatic delete only for backwards
> > > > > >> compatibility: if you create the drive with drive_add, it gets
> > > > > >> auto-deleted, but if you use blockdev_add, it stays around.
> > > > > >
> > > > > > But what to do about the case where we're doing drive_add and then a
> > > > > > device_del()  That's the urgent situation that needs to be resolved.
> > > > > 
> > > > > What's the exact problem we need to solve urgently?
> > > > > 
> > > > > Is it "provide means to cut the connection to the host part immediately,
> > > > > even with an uncooperative guest"?
> > > > 
> > > > Yes, need to ensure that if the mgmt layer (libvirt) has done what it
> > > > believes should have disassociated the host block device from the guest,
> > > > we want to ensure that the host block device is no longer accessible
> > > > from the guest.
> > > > 
> > > > > 
> > > > > Does this need to be separate from device_del?
> > > > 
> > > > no, it doesn't have to be.  Honestly, I didn't see a clear way to do
> > > > something like unplug early in the device_del because that's all pci
> > > > device code which has no knowledge of host block devices; having it
> > > > disconnect seemed like a layering violation.
> > > 
> > > We invoke the cleanup callback, isn't that enough?
> > 
> > Won't that look a bit strange?  on device_del, call the cleanup callback
> > first;, then notify the guest, if the guest responds, I suppose as long
> > as the cleanup callback can handle being called a second time that'd
> > work.
> 
> Well this is exactly what happens with surpise removal.
> If you yank a card out the slot, guest only gets notification
> afterwards.

Right, though the card ripper can (in some systems) press the removal
button which would send notification.  I think I'm fine with not
bothering to notify; this was mgmt interface driven anyhow so who ever
is doing it should have already ensured they weren't using the device.

> 
> > I like the idea of disconnect; if part of the device_del method was to
> > invoke a disconnect method, we could implement that for block, net, etc;
> > 
> > I'd think we'd want to send the notification, then disconnect.
> > Struggling with whether it's worth having some reasonable timeout
> > between notification and disconnect.  
> 
> The problem with this is that it has no analog in real world.
> In real world, you can send some notifications to the guest, and you can
> remove the card.  Tying them together is what created the problem in the
> first place.
> 
> Timeouts can be implemented by management, maybe with a nice dialog
> being shown to the user.

Very true.  I'm fine with forcing a disconnect during the removal path
prior to notification.  Do we want a new disconnect method at the device
level (pci)? or just use the existing removal callback and call that
during the initial hotremov event?


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 16:53                   ` Ryan Harper
@ 2010-11-02 17:59                     ` Michael S. Tsirkin
  2010-11-02 19:01                       ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-02 17:59 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Cc yamahata@valinux.co.jp, he is working on hotplug for pci
express.

On Tue, Nov 02, 2010 at 11:53:39AM -0500, Ryan Harper wrote:
> * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 10:56]:
> > On Tue, Nov 02, 2010 at 09:22:01AM -0500, Ryan Harper wrote:
> > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 08:59]:
> > > > On Tue, Nov 02, 2010 at 08:46:22AM -0500, Ryan Harper wrote:
> > > > > * Markus Armbruster <armbru@redhat.com> [2010-11-02 04:40]:
> > > 
> > > > > > >> >> I'd like to have some consistency among net, block and char device
> > > > > > >> >> commands, i.e. a common set of operations that work the same for all of
> > > > > > >> >> them.  Can we agree on such a set?
> > > > > > >> >
> > > > > > >> > Yeah; the current trouble (or at least what I perceive to be trouble) is
> > > > > > >> > that in the case where the guest responds to device_del induced ACPI
> > > > > > >> > removal event; the current qdev code already does the host-side device
> > > > > > >> > tear down.  Not sure if it is OK to do a blockdev_del() immediately
> > > > > > >> > after the device_del.  What happens when we do:
> > > > > > >> >
> > > > > > >> > device_del
> > > > > > >> > ACPI to guest
> > > > > > >> > blockdev_del /* removes host-side device */
> > > > > > >> 
> > > > > > >> Fails in my tree, because the blockdev's still in use.  See below.
> > > > > > >> 
> > > > > > >> > guest responds to ACPI
> > > > > > >> > qdev calls pci device removal code
> > > > > > >> > qemu attempts to destroy the associated host-side block
> > > > > > >> >
> > > > > > >> > That may just work today; and if not, it shouldn't be hard to fix up the
> > > > > > >> > code to check for NULLs
> > > > > > >> 
> > > > > > >> I hate the automatic deletion of host part along with the guest part.
> > > > > > >> device_del should undo device_add.  {block,net,char}dev_{add,del} should
> > > > > > >> be similarly paired.
> > > > > > >
> > > > > > > Agreed.
> > > > > > >> 
> > > > > > >> In my blockdev branch, I keep the automatic delete only for backwards
> > > > > > >> compatibility: if you create the drive with drive_add, it gets
> > > > > > >> auto-deleted, but if you use blockdev_add, it stays around.
> > > > > > >
> > > > > > > But what to do about the case where we're doing drive_add and then a
> > > > > > > device_del()  That's the urgent situation that needs to be resolved.
> > > > > > 
> > > > > > What's the exact problem we need to solve urgently?
> > > > > > 
> > > > > > Is it "provide means to cut the connection to the host part immediately,
> > > > > > even with an uncooperative guest"?
> > > > > 
> > > > > Yes, need to ensure that if the mgmt layer (libvirt) has done what it
> > > > > believes should have disassociated the host block device from the guest,
> > > > > we want to ensure that the host block device is no longer accessible
> > > > > from the guest.
> > > > > 
> > > > > > 
> > > > > > Does this need to be separate from device_del?
> > > > > 
> > > > > no, it doesn't have to be.  Honestly, I didn't see a clear way to do
> > > > > something like unplug early in the device_del because that's all pci
> > > > > device code which has no knowledge of host block devices; having it
> > > > > disconnect seemed like a layering violation.
> > > > 
> > > > We invoke the cleanup callback, isn't that enough?
> > > 
> > > Won't that look a bit strange?  on device_del, call the cleanup callback
> > > first;, then notify the guest, if the guest responds, I suppose as long
> > > as the cleanup callback can handle being called a second time that'd
> > > work.
> > 
> > Well this is exactly what happens with surpise removal.
> > If you yank a card out the slot, guest only gets notification
> > afterwards.
> 
> Right, though the card ripper can (in some systems) press the removal
> button which would send notification.  I think I'm fine with not
> bothering to notify;

I think at least for express the port would notice the event
and notify guest anyway, it just happens after the fact,
by necessity.

> this was mgmt interface driven anyhow so who ever
> is doing it should have already ensured they weren't using the device.

Right. However, I think two additional new interfaces that
1. just send the notification
2. report event on guest eject
would be a good idea. This might tie in well with pci express work where
there's a standard interface for sending these events and for getting
notified that guest is ready for device to be removed.

Guests also might have ways to lock the card so you can not yank it out,
mechanically. This could translate to a failure to do surpise removal,
management can then decide whether killing the guest makes sense.

> > 
> > > I like the idea of disconnect; if part of the device_del method was to
> > > invoke a disconnect method, we could implement that for block, net, etc;
> > > 
> > > I'd think we'd want to send the notification, then disconnect.
> > > Struggling with whether it's worth having some reasonable timeout
> > > between notification and disconnect.  
> > 
> > The problem with this is that it has no analog in real world.
> > In real world, you can send some notifications to the guest, and you can
> > remove the card.  Tying them together is what created the problem in the
> > first place.
> > 
> > Timeouts can be implemented by management, maybe with a nice dialog
> > being shown to the user.
> 
> Very true.  I'm fine with forcing a disconnect during the removal path
> prior to notification.  Do we want a new disconnect method at the device
> level (pci)? or just use the existing removal callback and call that
> during the initial hotremov event?

Not sure what you mean by that, but I don't see a device doing anything
differently wrt surprise or ordered removal. So probably the existing
callback should do. I don't think we need to talk about disconnect:
since we decided we are emulating device removal, let's call it
just that.

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 17:59                     ` Michael S. Tsirkin
@ 2010-11-02 19:01                       ` Ryan Harper
  2010-11-02 19:17                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-02 19:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

> > > > I like the idea of disconnect; if part of the device_del method was to
> > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > 
> > > > I'd think we'd want to send the notification, then disconnect.
> > > > Struggling with whether it's worth having some reasonable timeout
> > > > between notification and disconnect.  
> > > 
> > > The problem with this is that it has no analog in real world.
> > > In real world, you can send some notifications to the guest, and you can
> > > remove the card.  Tying them together is what created the problem in the
> > > first place.
> > > 
> > > Timeouts can be implemented by management, maybe with a nice dialog
> > > being shown to the user.
> > 
> > Very true.  I'm fine with forcing a disconnect during the removal path
> > prior to notification.  Do we want a new disconnect method at the device
> > level (pci)? or just use the existing removal callback and call that
> > during the initial hotremov event?
> 
> Not sure what you mean by that, but I don't see a device doing anything
> differently wrt surprise or ordered removal. So probably the existing
> callback should do. I don't think we need to talk about disconnect:
> since we decided we are emulating device removal, let's call it
> just that.

Because current the "removal" process depends on the guest actually
responding.  What I'm suggesting is that, in Marcus's term, and what
drive_unplug() implements, is to disconnect the host block device from
the guest device to prevent any further access to it in the case the
guest doesn't respond to the removal request made via ACPI.

Very specifically, what we're suggesting instead of the drive_unplug()
command so to complete the device removal operation without waiting for
the guest to respond; that's what's going to happen if we invoke the
response callback; it will appear as if the guest responded whether it
did or not.

What I was suggesting above was to instead of calling the callback for
handing the guest response was to add a device function called
disconnect which would remove any association of host resources from
guest resources before we notified the guest.  Thinking about it again
I'm not sure this is useful, but if we're going to remove the device
without the guests knowledge, I'm not sure how useful sending the
removal requests via ACPI is in the first place.

My feeling is that I'd like to have explicit control over the disconnect
from host resources separate from the device removal *if* we're going to
retain the guest notification.  If we don't care to notify the guest,
then we can just do device removal without notifying the guest
and be done with it.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 19:01                       ` Ryan Harper
@ 2010-11-02 19:17                         ` Michael S. Tsirkin
  2010-11-02 20:23                           ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-02 19:17 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > > > I like the idea of disconnect; if part of the device_del method was to
> > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > > 
> > > > > I'd think we'd want to send the notification, then disconnect.
> > > > > Struggling with whether it's worth having some reasonable timeout
> > > > > between notification and disconnect.  
> > > > 
> > > > The problem with this is that it has no analog in real world.
> > > > In real world, you can send some notifications to the guest, and you can
> > > > remove the card.  Tying them together is what created the problem in the
> > > > first place.
> > > > 
> > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > > being shown to the user.
> > > 
> > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > prior to notification.  Do we want a new disconnect method at the device
> > > level (pci)? or just use the existing removal callback and call that
> > > during the initial hotremov event?
> > 
> > Not sure what you mean by that, but I don't see a device doing anything
> > differently wrt surprise or ordered removal. So probably the existing
> > callback should do. I don't think we need to talk about disconnect:
> > since we decided we are emulating device removal, let's call it
> > just that.
> 
> Because current the "removal" process depends on the guest actually
> responding.  What I'm suggesting is that, in Marcus's term, and what
> drive_unplug() implements, is to disconnect the host block device from
> the guest device to prevent any further access to it in the case the
> guest doesn't respond to the removal request made via ACPI.
> 
> Very specifically, what we're suggesting instead of the drive_unplug()
> command so to complete the device removal operation without waiting for
> the guest to respond; that's what's going to happen if we invoke the
> response callback; it will appear as if the guest responded whether it
> did or not.
> 
> What I was suggesting above was to instead of calling the callback for
> handing the guest response was to add a device function called
> disconnect which would remove any association of host resources from
> guest resources before we notified the guest.  Thinking about it again
> I'm not sure this is useful, but if we're going to remove the device
> without the guests knowledge, I'm not sure how useful sending the
> removal requests via ACPI is in the first place.
> 
> My feeling is that I'd like to have explicit control over the disconnect
> from host resources separate from the device removal *if* we're going to
> retain the guest notification.  If we don't care to notify the guest,
> then we can just do device removal without notifying the guest
> and be done with it.

I imagine management would typically want to do this:
1. notify guest
2. wait a bit
3. remove device

A twist is when guest disabled the device already.
Then it would just want to remove the device.

> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 19:17                         ` Michael S. Tsirkin
@ 2010-11-02 20:23                           ` Ryan Harper
  2010-11-03  7:21                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-02 20:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > > > > I like the idea of disconnect; if part of the device_del method was to
> > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > > > 
> > > > > > I'd think we'd want to send the notification, then disconnect.
> > > > > > Struggling with whether it's worth having some reasonable timeout
> > > > > > between notification and disconnect.  
> > > > > 
> > > > > The problem with this is that it has no analog in real world.
> > > > > In real world, you can send some notifications to the guest, and you can
> > > > > remove the card.  Tying them together is what created the problem in the
> > > > > first place.
> > > > > 
> > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > > > being shown to the user.
> > > > 
> > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > > prior to notification.  Do we want a new disconnect method at the device
> > > > level (pci)? or just use the existing removal callback and call that
> > > > during the initial hotremov event?
> > > 
> > > Not sure what you mean by that, but I don't see a device doing anything
> > > differently wrt surprise or ordered removal. So probably the existing
> > > callback should do. I don't think we need to talk about disconnect:
> > > since we decided we are emulating device removal, let's call it
> > > just that.
> > 
> > Because current the "removal" process depends on the guest actually
> > responding.  What I'm suggesting is that, in Marcus's term, and what
> > drive_unplug() implements, is to disconnect the host block device from
> > the guest device to prevent any further access to it in the case the
> > guest doesn't respond to the removal request made via ACPI.
> > 
> > Very specifically, what we're suggesting instead of the drive_unplug()
> > command so to complete the device removal operation without waiting for
> > the guest to respond; that's what's going to happen if we invoke the
> > response callback; it will appear as if the guest responded whether it
> > did or not.
> > 
> > What I was suggesting above was to instead of calling the callback for
> > handing the guest response was to add a device function called
> > disconnect which would remove any association of host resources from
> > guest resources before we notified the guest.  Thinking about it again
> > I'm not sure this is useful, but if we're going to remove the device
> > without the guests knowledge, I'm not sure how useful sending the
> > removal requests via ACPI is in the first place.
> > 
> > My feeling is that I'd like to have explicit control over the disconnect
> > from host resources separate from the device removal *if* we're going to
> > retain the guest notification.  If we don't care to notify the guest,
> > then we can just do device removal without notifying the guest
> > and be done with it.
> 
> I imagine management would typically want to do this:
> 1. notify guest
> 2. wait a bit
> 3. remove device

Yes; but this argues for (1) being a separate command from (3) unless we
require (3) to include (1) and (2) in the qemu implementation.

Currently we implement:

1. device_del (attempt to remove device)
2. notify guest
3. if guest responds, remove device
4. disconnect host resource from device on destruction

With my drive_unplug patch we do:

1. disconnect host resource from device
2. device_del (attempt to remove device)
3. notify guest
4. if guest responds, remove device

I think we're suggesting to instead do (if we keep disconnect as part of
device_del)

1. device_del (attemp to remove device)
2. notify guest
3. invoke device destruction callback resulting in disconnect host resource from device
4. if guest responds, invoke device destruction path a second time.



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-02 20:23                           ` Ryan Harper
@ 2010-11-03  7:21                             ` Michael S. Tsirkin
  2010-11-03 12:04                               ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-03  7:21 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > > > > 
> > > > > > > I'd think we'd want to send the notification, then disconnect.
> > > > > > > Struggling with whether it's worth having some reasonable timeout
> > > > > > > between notification and disconnect.  
> > > > > > 
> > > > > > The problem with this is that it has no analog in real world.
> > > > > > In real world, you can send some notifications to the guest, and you can
> > > > > > remove the card.  Tying them together is what created the problem in the
> > > > > > first place.
> > > > > > 
> > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > > > > being shown to the user.
> > > > > 
> > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > > > prior to notification.  Do we want a new disconnect method at the device
> > > > > level (pci)? or just use the existing removal callback and call that
> > > > > during the initial hotremov event?
> > > > 
> > > > Not sure what you mean by that, but I don't see a device doing anything
> > > > differently wrt surprise or ordered removal. So probably the existing
> > > > callback should do. I don't think we need to talk about disconnect:
> > > > since we decided we are emulating device removal, let's call it
> > > > just that.
> > > 
> > > Because current the "removal" process depends on the guest actually
> > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > > drive_unplug() implements, is to disconnect the host block device from
> > > the guest device to prevent any further access to it in the case the
> > > guest doesn't respond to the removal request made via ACPI.
> > > 
> > > Very specifically, what we're suggesting instead of the drive_unplug()
> > > command so to complete the device removal operation without waiting for
> > > the guest to respond; that's what's going to happen if we invoke the
> > > response callback; it will appear as if the guest responded whether it
> > > did or not.
> > > 
> > > What I was suggesting above was to instead of calling the callback for
> > > handing the guest response was to add a device function called
> > > disconnect which would remove any association of host resources from
> > > guest resources before we notified the guest.  Thinking about it again
> > > I'm not sure this is useful, but if we're going to remove the device
> > > without the guests knowledge, I'm not sure how useful sending the
> > > removal requests via ACPI is in the first place.
> > > 
> > > My feeling is that I'd like to have explicit control over the disconnect
> > > from host resources separate from the device removal *if* we're going to
> > > retain the guest notification.  If we don't care to notify the guest,
> > > then we can just do device removal without notifying the guest
> > > and be done with it.
> > 
> > I imagine management would typically want to do this:
> > 1. notify guest
> > 2. wait a bit
> > 3. remove device
> 
> Yes; but this argues for (1) being a separate command from (3)

Yes. Long term I think we will want a way to do that.

> unless we
> require (3) to include (1) and (2) in the qemu implementation.
> 
> Currently we implement:
> 
> 1. device_del (attempt to remove device)
> 2. notify guest
> 3. if guest responds, remove device
> 4. disconnect host resource from device on destruction
> 
> With my drive_unplug patch we do:
> 
> 1. disconnect host resource from device

This is what drive_unplug does, right?

> 2. device_del (attempt to remove device)
> 3. notify guest
> 4. if guest responds, remove device
> 
> I think we're suggesting to instead do (if we keep disconnect as part of
> device_del)
> 
> 1. device_del (attemp to remove device)
> 2. notify guest
> 3. invoke device destruction callback resulting in disconnect host resource from device
> 4. if guest responds, invoke device destruction path a second time.

By response you mean eject?  No, this is not what I was suggesting.
I was really suggesting that your patch is fine :)
Sorry about confusion.

I was also saying that from what I hear, the pci express support
will at some point need interfaces to
- notify guest about device removal/addition
- get eject from guest
- remove device without talking to guest
- add device without talking to guest
- suppress device deletion on eject

All this can be generic and can work through express
configuration mechanisms or through acpi for pci.
But this is completely separate from unplugging
the host backend, which should be possible at any point.

> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03  7:21                             ` Michael S. Tsirkin
@ 2010-11-03 12:04                               ` Ryan Harper
  2010-11-03 16:41                                 ` Markus Armbruster
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-03 12:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > > > > > 
> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > > > > > > > between notification and disconnect.  
> > > > > > > 
> > > > > > > The problem with this is that it has no analog in real world.
> > > > > > > In real world, you can send some notifications to the guest, and you can
> > > > > > > remove the card.  Tying them together is what created the problem in the
> > > > > > > first place.
> > > > > > > 
> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > > > > > being shown to the user.
> > > > > > 
> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > > > > prior to notification.  Do we want a new disconnect method at the device
> > > > > > level (pci)? or just use the existing removal callback and call that
> > > > > > during the initial hotremov event?
> > > > > 
> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > > > > differently wrt surprise or ordered removal. So probably the existing
> > > > > callback should do. I don't think we need to talk about disconnect:
> > > > > since we decided we are emulating device removal, let's call it
> > > > > just that.
> > > > 
> > > > Because current the "removal" process depends on the guest actually
> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > > > drive_unplug() implements, is to disconnect the host block device from
> > > > the guest device to prevent any further access to it in the case the
> > > > guest doesn't respond to the removal request made via ACPI.
> > > > 
> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > > > command so to complete the device removal operation without waiting for
> > > > the guest to respond; that's what's going to happen if we invoke the
> > > > response callback; it will appear as if the guest responded whether it
> > > > did or not.
> > > > 
> > > > What I was suggesting above was to instead of calling the callback for
> > > > handing the guest response was to add a device function called
> > > > disconnect which would remove any association of host resources from
> > > > guest resources before we notified the guest.  Thinking about it again
> > > > I'm not sure this is useful, but if we're going to remove the device
> > > > without the guests knowledge, I'm not sure how useful sending the
> > > > removal requests via ACPI is in the first place.
> > > > 
> > > > My feeling is that I'd like to have explicit control over the disconnect
> > > > from host resources separate from the device removal *if* we're going to
> > > > retain the guest notification.  If we don't care to notify the guest,
> > > > then we can just do device removal without notifying the guest
> > > > and be done with it.
> > > 
> > > I imagine management would typically want to do this:
> > > 1. notify guest
> > > 2. wait a bit
> > > 3. remove device
> > 
> > Yes; but this argues for (1) being a separate command from (3)
> 
> Yes. Long term I think we will want a way to do that.
> 
> > unless we
> > require (3) to include (1) and (2) in the qemu implementation.
> > 
> > Currently we implement:
> > 
> > 1. device_del (attempt to remove device)
> > 2. notify guest
> > 3. if guest responds, remove device
> > 4. disconnect host resource from device on destruction
> > 
> > With my drive_unplug patch we do:
> > 
> > 1. disconnect host resource from device
> 
> This is what drive_unplug does, right?

Correct.

> 
> > 2. device_del (attempt to remove device)
> > 3. notify guest
> > 4. if guest responds, remove device
> > 
> > I think we're suggesting to instead do (if we keep disconnect as part of
> > device_del)
> > 
> > 1. device_del (attemp to remove device)
> > 2. notify guest
> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > 4. if guest responds, invoke device destruction path a second time.
> 
> By response you mean eject?  No, this is not what I was suggesting.
> I was really suggesting that your patch is fine :)
> Sorry about confusion.

I don't mean eject; I mean responding to the ACPI event by writing a
response to the PCI chipset which QEMU then in turn will invoke the
qdev_unplug() path which ultimately kills the device and the Drive and
BlockState objects.

> 
> I was also saying that from what I hear, the pci express support
> will at some point need interfaces to
> - notify guest about device removal/addition
> - get eject from guest
> - remove device without talking to guest
> - add device without talking to guest
> - suppress device deletion on eject
> 
> All this can be generic and can work through express
> configuration mechanisms or through acpi for pci.
> But this is completely separate from unplugging
> the host backend, which should be possible at any point.

Yes.  I think we've worked out that we do want an independent
unplug/disconnect mechanism rather than tying it to device_del.

Marcus, it sounds like then you wanted to see a net_unplug/disconnect
and that instead of having device_del always succeed and replacing it
with a shell, we'd need to provide an explicit command to do the
disconnect in a similar fashion to how we're doing drive_unplug?

With at least two of these device types needing an explicit disconnect
to sever the bond between host/guest makes me want a device-level
interface for doing the disconnect that each device can implement
differently.



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03 12:04                               ` Ryan Harper
@ 2010-11-03 16:41                                 ` Markus Armbruster
  2010-11-03 17:29                                   ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-11-03 16:41 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Ryan Harper <ryanh@us.ibm.com> writes:

> * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
>> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
>> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
>> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
>> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
>> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
>> > > > > > > > 
>> > > > > > > > I'd think we'd want to send the notification, then disconnect.
>> > > > > > > > Struggling with whether it's worth having some reasonable timeout
>> > > > > > > > between notification and disconnect.  
>> > > > > > > 
>> > > > > > > The problem with this is that it has no analog in real world.
>> > > > > > > In real world, you can send some notifications to the guest, and you can
>> > > > > > > remove the card.  Tying them together is what created the problem in the
>> > > > > > > first place.
>> > > > > > > 
>> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
>> > > > > > > being shown to the user.
>> > > > > > 
>> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
>> > > > > > prior to notification.  Do we want a new disconnect method at the device
>> > > > > > level (pci)? or just use the existing removal callback and call that
>> > > > > > during the initial hotremov event?
>> > > > > 
>> > > > > Not sure what you mean by that, but I don't see a device doing anything
>> > > > > differently wrt surprise or ordered removal. So probably the existing
>> > > > > callback should do. I don't think we need to talk about disconnect:
>> > > > > since we decided we are emulating device removal, let's call it
>> > > > > just that.
>> > > > 
>> > > > Because current the "removal" process depends on the guest actually
>> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
>> > > > drive_unplug() implements, is to disconnect the host block device from
>> > > > the guest device to prevent any further access to it in the case the
>> > > > guest doesn't respond to the removal request made via ACPI.
>> > > > 
>> > > > Very specifically, what we're suggesting instead of the drive_unplug()
>> > > > command so to complete the device removal operation without waiting for
>> > > > the guest to respond; that's what's going to happen if we invoke the
>> > > > response callback; it will appear as if the guest responded whether it
>> > > > did or not.
>> > > > 
>> > > > What I was suggesting above was to instead of calling the callback for
>> > > > handing the guest response was to add a device function called
>> > > > disconnect which would remove any association of host resources from
>> > > > guest resources before we notified the guest.  Thinking about it again
>> > > > I'm not sure this is useful, but if we're going to remove the device
>> > > > without the guests knowledge, I'm not sure how useful sending the
>> > > > removal requests via ACPI is in the first place.
>> > > > 
>> > > > My feeling is that I'd like to have explicit control over the disconnect
>> > > > from host resources separate from the device removal *if* we're going to
>> > > > retain the guest notification.  If we don't care to notify the guest,
>> > > > then we can just do device removal without notifying the guest
>> > > > and be done with it.
>> > > 
>> > > I imagine management would typically want to do this:
>> > > 1. notify guest
>> > > 2. wait a bit
>> > > 3. remove device
>> > 
>> > Yes; but this argues for (1) being a separate command from (3)
>> 
>> Yes. Long term I think we will want a way to do that.
>> 
>> > unless we
>> > require (3) to include (1) and (2) in the qemu implementation.
>> > 
>> > Currently we implement:
>> > 
>> > 1. device_del (attempt to remove device)
>> > 2. notify guest
>> > 3. if guest responds, remove device
>> > 4. disconnect host resource from device on destruction
>> > 
>> > With my drive_unplug patch we do:
>> > 
>> > 1. disconnect host resource from device
>> 
>> This is what drive_unplug does, right?
>
> Correct.
>
>> 
>> > 2. device_del (attempt to remove device)
>> > 3. notify guest
>> > 4. if guest responds, remove device
>> > 
>> > I think we're suggesting to instead do (if we keep disconnect as part of
>> > device_del)
>> > 
>> > 1. device_del (attemp to remove device)
>> > 2. notify guest
>> > 3. invoke device destruction callback resulting in disconnect host resource from device
>> > 4. if guest responds, invoke device destruction path a second time.
>> 
>> By response you mean eject?  No, this is not what I was suggesting.
>> I was really suggesting that your patch is fine :)
>> Sorry about confusion.
>
> I don't mean eject; I mean responding to the ACPI event by writing a
> response to the PCI chipset which QEMU then in turn will invoke the
> qdev_unplug() path which ultimately kills the device and the Drive and
> BlockState objects.
>
>> 
>> I was also saying that from what I hear, the pci express support
>> will at some point need interfaces to
>> - notify guest about device removal/addition
>> - get eject from guest
>> - remove device without talking to guest
>> - add device without talking to guest
>> - suppress device deletion on eject
>> 
>> All this can be generic and can work through express
>> configuration mechanisms or through acpi for pci.
>> But this is completely separate from unplugging
>> the host backend, which should be possible at any point.
>
> Yes.  I think we've worked out that we do want an independent
> unplug/disconnect mechanism rather than tying it to device_del.
>
> Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> and that instead of having device_del always succeed and replacing it
> with a shell, we'd need to provide an explicit command to do the
> disconnect in a similar fashion to how we're doing drive_unplug?

I'm not sure I parse this.

> With at least two of these device types needing an explicit disconnect
> to sever the bond between host/guest makes me want a device-level
> interface for doing the disconnect that each device can implement
> differently.

I'm fine with having a separate command to forcibly disconnect a device
from its host resources.

Typical use:

1. device_del
   ask guest to give up device, via ACPI

2a. guest replies "done", delete device, free host resources

2b. timeout, device_disconnect (or however we call that)

Is this what you have in mind?


With qdev, device models are connected to host resources with special
properties such as qdev_prop_netdev and qdev_prop_drive.  Thus, generic
qdev code can already find and disconnect them.

How can we make sure device models survive such a disconnect?

* Ask the device to disconnect itself (new DeviceInfo method).
  Drawback: duplicates common functionality in every device model.
  More code, more bugs.

* Let qdev core disconnect and free host resources

  - and replace them with dummies.  I guess we'd need a dummy
    constructor method for that, in PropertyInfo.  Done right, device
    models should be able to carry on unawares.

  - and leave them null.  Device models need to cope with that.  NICs
    do for netdev.

  We might need to notify the device model (new DeviceInfo method).
  Dunno.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03 16:41                                 ` Markus Armbruster
@ 2010-11-03 17:29                                   ` Ryan Harper
  2010-11-03 18:02                                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-03 17:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, Anthony Liguori,
	Ryan Harper, Stefan Hajnoczi, yamahata

* Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> >> > > > > > > > 
> >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> >> > > > > > > > between notification and disconnect.  
> >> > > > > > > 
> >> > > > > > > The problem with this is that it has no analog in real world.
> >> > > > > > > In real world, you can send some notifications to the guest, and you can
> >> > > > > > > remove the card.  Tying them together is what created the problem in the
> >> > > > > > > first place.
> >> > > > > > > 
> >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> >> > > > > > > being shown to the user.
> >> > > > > > 
> >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> >> > > > > > level (pci)? or just use the existing removal callback and call that
> >> > > > > > during the initial hotremov event?
> >> > > > > 
> >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> >> > > > > differently wrt surprise or ordered removal. So probably the existing
> >> > > > > callback should do. I don't think we need to talk about disconnect:
> >> > > > > since we decided we are emulating device removal, let's call it
> >> > > > > just that.
> >> > > > 
> >> > > > Because current the "removal" process depends on the guest actually
> >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> >> > > > drive_unplug() implements, is to disconnect the host block device from
> >> > > > the guest device to prevent any further access to it in the case the
> >> > > > guest doesn't respond to the removal request made via ACPI.
> >> > > > 
> >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> >> > > > command so to complete the device removal operation without waiting for
> >> > > > the guest to respond; that's what's going to happen if we invoke the
> >> > > > response callback; it will appear as if the guest responded whether it
> >> > > > did or not.
> >> > > > 
> >> > > > What I was suggesting above was to instead of calling the callback for
> >> > > > handing the guest response was to add a device function called
> >> > > > disconnect which would remove any association of host resources from
> >> > > > guest resources before we notified the guest.  Thinking about it again
> >> > > > I'm not sure this is useful, but if we're going to remove the device
> >> > > > without the guests knowledge, I'm not sure how useful sending the
> >> > > > removal requests via ACPI is in the first place.
> >> > > > 
> >> > > > My feeling is that I'd like to have explicit control over the disconnect
> >> > > > from host resources separate from the device removal *if* we're going to
> >> > > > retain the guest notification.  If we don't care to notify the guest,
> >> > > > then we can just do device removal without notifying the guest
> >> > > > and be done with it.
> >> > > 
> >> > > I imagine management would typically want to do this:
> >> > > 1. notify guest
> >> > > 2. wait a bit
> >> > > 3. remove device
> >> > 
> >> > Yes; but this argues for (1) being a separate command from (3)
> >> 
> >> Yes. Long term I think we will want a way to do that.
> >> 
> >> > unless we
> >> > require (3) to include (1) and (2) in the qemu implementation.
> >> > 
> >> > Currently we implement:
> >> > 
> >> > 1. device_del (attempt to remove device)
> >> > 2. notify guest
> >> > 3. if guest responds, remove device
> >> > 4. disconnect host resource from device on destruction
> >> > 
> >> > With my drive_unplug patch we do:
> >> > 
> >> > 1. disconnect host resource from device
> >> 
> >> This is what drive_unplug does, right?
> >
> > Correct.
> >
> >> 
> >> > 2. device_del (attempt to remove device)
> >> > 3. notify guest
> >> > 4. if guest responds, remove device
> >> > 
> >> > I think we're suggesting to instead do (if we keep disconnect as part of
> >> > device_del)
> >> > 
> >> > 1. device_del (attemp to remove device)
> >> > 2. notify guest
> >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> >> > 4. if guest responds, invoke device destruction path a second time.
> >> 
> >> By response you mean eject?  No, this is not what I was suggesting.
> >> I was really suggesting that your patch is fine :)
> >> Sorry about confusion.
> >
> > I don't mean eject; I mean responding to the ACPI event by writing a
> > response to the PCI chipset which QEMU then in turn will invoke the
> > qdev_unplug() path which ultimately kills the device and the Drive and
> > BlockState objects.
> >
> >> 
> >> I was also saying that from what I hear, the pci express support
> >> will at some point need interfaces to
> >> - notify guest about device removal/addition
> >> - get eject from guest
> >> - remove device without talking to guest
> >> - add device without talking to guest
> >> - suppress device deletion on eject
> >> 
> >> All this can be generic and can work through express
> >> configuration mechanisms or through acpi for pci.
> >> But this is completely separate from unplugging
> >> the host backend, which should be possible at any point.
> >
> > Yes.  I think we've worked out that we do want an independent
> > unplug/disconnect mechanism rather than tying it to device_del.
> >
> > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > and that instead of having device_del always succeed and replacing it
> > with a shell, we'd need to provide an explicit command to do the
> > disconnect in a similar fashion to how we're doing drive_unplug?
> 
> I'm not sure I parse this.

You were asking for net and block disconnect to have similar mechanisms.
You mentioned the net fix for suprise removal was to have device_del()
always succeed by replacing the device with a shell/zombie.  The
drive_unplug() patch doesn't do the same thing; it doesn't affect the
device_del() path at all, rather it provides mgmt apps a hook to
directly disconnect host resource from guest resource.

> 
> > With at least two of these device types needing an explicit disconnect
> > to sever the bond between host/guest makes me want a device-level
> > interface for doing the disconnect that each device can implement
> > differently.
> 
> I'm fine with having a separate command to forcibly disconnect a device
> from its host resources.
> 
> Typical use:
> 
> 1. device_del
>    ask guest to give up device, via ACPI
> 
> 2a. guest replies "done", delete device, free host resources
> 
> 2b. timeout, device_disconnect (or however we call that)
> 
> Is this what you have in mind?

Yeah, aboslutely.  I think Michael was saying we should implement 2b in
the mgmt stack.  The current libvirt does the following 

1. mgmt invokes detach-device
2. device_del
3. update mgmt view of resources, assumes guest has done it's part; does
not confirm with qemu that device has been deleted.

With drive_unplug in qemu and a patch to libvirt, it looks like:

1. mgmt invokes detach-device
2a. call drive_unplug, log warning if drive_unplug isn't available
2b. device_del
3. update mgmt view of resources, assumes guest has done it's part; does
not confirm with qemu that device has been deleted.

I can look at implementing the timeout before invoking the unplug
(that's a bit tricky) in libvirt; but given the fact that the mgmt is
invoking the removal I think it's reasonable to do forced disconnect
(even if the guest hasn't responded).

> 
> 
> With qdev, device models are connected to host resources with special
> properties such as qdev_prop_netdev and qdev_prop_drive.  Thus, generic
> qdev code can already find and disconnect them.
> 
> How can we make sure device models survive such a disconnect?
> 
> * Ask the device to disconnect itself (new DeviceInfo method).
>   Drawback: duplicates common functionality in every device model.
>   More code, more bugs.
> 
> * Let qdev core disconnect and free host resources
> 
>   - and replace them with dummies.  I guess we'd need a dummy
>     constructor method for that, in PropertyInfo.  Done right, device
>     models should be able to carry on unawares.
> 
>   - and leave them null.  Device models need to cope with that.  NICs
>     do for netdev.
> 

I like the latter here; the BlockDriverState handles nulls.  I think
netdev should be able to as well though I haven't looked very closely
though so maybe Michael can confirm if that's a true statement.

>   We might need to notify the device model (new DeviceInfo method).
>   Dunno.



-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03 17:29                                   ` Ryan Harper
@ 2010-11-03 18:02                                     ` Michael S. Tsirkin
  2010-11-03 20:59                                       ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-03 18:02 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> > Ryan Harper <ryanh@us.ibm.com> writes:
> > 
> > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > >> > > > > > > > 
> > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > >> > > > > > > > between notification and disconnect.  
> > >> > > > > > > 
> > >> > > > > > > The problem with this is that it has no analog in real world.
> > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> > >> > > > > > > first place.
> > >> > > > > > > 
> > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > >> > > > > > > being shown to the user.
> > >> > > > > > 
> > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> > >> > > > > > level (pci)? or just use the existing removal callback and call that
> > >> > > > > > during the initial hotremov event?
> > >> > > > > 
> > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> > >> > > > > callback should do. I don't think we need to talk about disconnect:
> > >> > > > > since we decided we are emulating device removal, let's call it
> > >> > > > > just that.
> > >> > > > 
> > >> > > > Because current the "removal" process depends on the guest actually
> > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > >> > > > drive_unplug() implements, is to disconnect the host block device from
> > >> > > > the guest device to prevent any further access to it in the case the
> > >> > > > guest doesn't respond to the removal request made via ACPI.
> > >> > > > 
> > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > >> > > > command so to complete the device removal operation without waiting for
> > >> > > > the guest to respond; that's what's going to happen if we invoke the
> > >> > > > response callback; it will appear as if the guest responded whether it
> > >> > > > did or not.
> > >> > > > 
> > >> > > > What I was suggesting above was to instead of calling the callback for
> > >> > > > handing the guest response was to add a device function called
> > >> > > > disconnect which would remove any association of host resources from
> > >> > > > guest resources before we notified the guest.  Thinking about it again
> > >> > > > I'm not sure this is useful, but if we're going to remove the device
> > >> > > > without the guests knowledge, I'm not sure how useful sending the
> > >> > > > removal requests via ACPI is in the first place.
> > >> > > > 
> > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> > >> > > > from host resources separate from the device removal *if* we're going to
> > >> > > > retain the guest notification.  If we don't care to notify the guest,
> > >> > > > then we can just do device removal without notifying the guest
> > >> > > > and be done with it.
> > >> > > 
> > >> > > I imagine management would typically want to do this:
> > >> > > 1. notify guest
> > >> > > 2. wait a bit
> > >> > > 3. remove device
> > >> > 
> > >> > Yes; but this argues for (1) being a separate command from (3)
> > >> 
> > >> Yes. Long term I think we will want a way to do that.
> > >> 
> > >> > unless we
> > >> > require (3) to include (1) and (2) in the qemu implementation.
> > >> > 
> > >> > Currently we implement:
> > >> > 
> > >> > 1. device_del (attempt to remove device)
> > >> > 2. notify guest
> > >> > 3. if guest responds, remove device
> > >> > 4. disconnect host resource from device on destruction
> > >> > 
> > >> > With my drive_unplug patch we do:
> > >> > 
> > >> > 1. disconnect host resource from device
> > >> 
> > >> This is what drive_unplug does, right?
> > >
> > > Correct.
> > >
> > >> 
> > >> > 2. device_del (attempt to remove device)
> > >> > 3. notify guest
> > >> > 4. if guest responds, remove device
> > >> > 
> > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> > >> > device_del)
> > >> > 
> > >> > 1. device_del (attemp to remove device)
> > >> > 2. notify guest
> > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > >> > 4. if guest responds, invoke device destruction path a second time.
> > >> 
> > >> By response you mean eject?  No, this is not what I was suggesting.
> > >> I was really suggesting that your patch is fine :)
> > >> Sorry about confusion.
> > >
> > > I don't mean eject; I mean responding to the ACPI event by writing a
> > > response to the PCI chipset which QEMU then in turn will invoke the
> > > qdev_unplug() path which ultimately kills the device and the Drive and
> > > BlockState objects.
> > >
> > >> 
> > >> I was also saying that from what I hear, the pci express support
> > >> will at some point need interfaces to
> > >> - notify guest about device removal/addition
> > >> - get eject from guest
> > >> - remove device without talking to guest
> > >> - add device without talking to guest
> > >> - suppress device deletion on eject
> > >> 
> > >> All this can be generic and can work through express
> > >> configuration mechanisms or through acpi for pci.
> > >> But this is completely separate from unplugging
> > >> the host backend, which should be possible at any point.
> > >
> > > Yes.  I think we've worked out that we do want an independent
> > > unplug/disconnect mechanism rather than tying it to device_del.
> > >
> > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > > and that instead of having device_del always succeed and replacing it
> > > with a shell, we'd need to provide an explicit command to do the
> > > disconnect in a similar fashion to how we're doing drive_unplug?
> > 
> > I'm not sure I parse this.
> 
> You were asking for net and block disconnect to have similar mechanisms.
> You mentioned the net fix for suprise removal was to have device_del()
> always succeed by replacing the device with a shell/zombie.  The
> drive_unplug() patch doesn't do the same thing; it doesn't affect the
> device_del() path at all, rather it provides mgmt apps a hook to
> directly disconnect host resource from guest resource.

Yes, the shell thing is just an implementation detail.

> > 
> > > With at least two of these device types needing an explicit disconnect
> > > to sever the bond between host/guest makes me want a device-level
> > > interface for doing the disconnect that each device can implement
> > > differently.
> > 
> > I'm fine with having a separate command to forcibly disconnect a device
> > from its host resources.
> > 
> > Typical use:
> > 
> > 1. device_del
> >    ask guest to give up device, via ACPI
> > 
> > 2a. guest replies "done", delete device, free host resources
> > 
> > 2b. timeout, device_disconnect (or however we call that)
> > 
> > Is this what you have in mind?
> 
> Yeah, aboslutely.  I think Michael was saying we should implement 2b in
> the mgmt stack.  The current libvirt does the following 
> 
> 1. mgmt invokes detach-device
> 2. device_del
> 3. update mgmt view of resources, assumes guest has done it's part; does
> not confirm with qemu that device has been deleted.
> 
> With drive_unplug in qemu and a patch to libvirt, it looks like:
> 
> 1. mgmt invokes detach-device
> 2a. call drive_unplug, log warning if drive_unplug isn't available
> 2b. device_del
> 3. update mgmt view of resources, assumes guest has done it's part; does
> not confirm with qemu that device has been deleted.
> 
> I can look at implementing the timeout before invoking the unplug
> (that's a bit tricky) in libvirt;


So we'd 
1. reorder 2a and 2b, and add a small timeout
2. teach libvirt not to reuse the PCI slot and device id
   until it is really free

Sounds good.

> but given the fact that the mgmt is
> invoking the removal I think it's reasonable to do forced disconnect
> (even if the guest hasn't responded).

This is really making an assumption about the user.
Giving the guest a bit of time to respond with eject seems prudent.
For disk we risk losing data otherwise.
The only reason we are pushing this out to management is so
it can track state implement timeouts and interact with the user.
If it doesn't, what's the point? Let's keep it all in qemu...

> > 
> > 
> > With qdev, device models are connected to host resources with special
> > properties such as qdev_prop_netdev and qdev_prop_drive.  Thus, generic
> > qdev code can already find and disconnect them.
> > 
> > How can we make sure device models survive such a disconnect?
> > 
> > * Ask the device to disconnect itself (new DeviceInfo method).
> >   Drawback: duplicates common functionality in every device model.
> >   More code, more bugs.
> > 
> > * Let qdev core disconnect and free host resources
> > 
> >   - and replace them with dummies.  I guess we'd need a dummy
> >     constructor method for that, in PropertyInfo.  Done right, device
> >     models should be able to carry on unawares.
> > 
> >   - and leave them null.  Device models need to cope with that.  NICs
> >     do for netdev.
> > 
> 
> I like the latter here; the BlockDriverState handles nulls.  I think
> netdev should be able to as well though I haven't looked very closely
> though so maybe Michael can confirm if that's a true statement.

Not at the moment: the issue is that NULL means legacy vlan setup there.
We can rework the code to avoid that assumption but it's not on my
priority list.

> >   We might need to notify the device model (new DeviceInfo method).
> >   Dunno.
> 
> 
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03 18:02                                     ` Michael S. Tsirkin
@ 2010-11-03 20:59                                       ` Ryan Harper
  2010-11-03 21:26                                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-03 20:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, Anthony Liguori, Markus Armbruster, qemu-devel,
	yamahata, Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
> On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> > > Ryan Harper <ryanh@us.ibm.com> writes:
> > > 
> > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > >> > > > > > > > 
> > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > > >> > > > > > > > between notification and disconnect.  
> > > >> > > > > > > 
> > > >> > > > > > > The problem with this is that it has no analog in real world.
> > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> > > >> > > > > > > first place.
> > > >> > > > > > > 
> > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > >> > > > > > > being shown to the user.
> > > >> > > > > > 
> > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> > > >> > > > > > level (pci)? or just use the existing removal callback and call that
> > > >> > > > > > during the initial hotremov event?
> > > >> > > > > 
> > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> > > >> > > > > callback should do. I don't think we need to talk about disconnect:
> > > >> > > > > since we decided we are emulating device removal, let's call it
> > > >> > > > > just that.
> > > >> > > > 
> > > >> > > > Because current the "removal" process depends on the guest actually
> > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > > >> > > > drive_unplug() implements, is to disconnect the host block device from
> > > >> > > > the guest device to prevent any further access to it in the case the
> > > >> > > > guest doesn't respond to the removal request made via ACPI.
> > > >> > > > 
> > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > > >> > > > command so to complete the device removal operation without waiting for
> > > >> > > > the guest to respond; that's what's going to happen if we invoke the
> > > >> > > > response callback; it will appear as if the guest responded whether it
> > > >> > > > did or not.
> > > >> > > > 
> > > >> > > > What I was suggesting above was to instead of calling the callback for
> > > >> > > > handing the guest response was to add a device function called
> > > >> > > > disconnect which would remove any association of host resources from
> > > >> > > > guest resources before we notified the guest.  Thinking about it again
> > > >> > > > I'm not sure this is useful, but if we're going to remove the device
> > > >> > > > without the guests knowledge, I'm not sure how useful sending the
> > > >> > > > removal requests via ACPI is in the first place.
> > > >> > > > 
> > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> > > >> > > > from host resources separate from the device removal *if* we're going to
> > > >> > > > retain the guest notification.  If we don't care to notify the guest,
> > > >> > > > then we can just do device removal without notifying the guest
> > > >> > > > and be done with it.
> > > >> > > 
> > > >> > > I imagine management would typically want to do this:
> > > >> > > 1. notify guest
> > > >> > > 2. wait a bit
> > > >> > > 3. remove device
> > > >> > 
> > > >> > Yes; but this argues for (1) being a separate command from (3)
> > > >> 
> > > >> Yes. Long term I think we will want a way to do that.
> > > >> 
> > > >> > unless we
> > > >> > require (3) to include (1) and (2) in the qemu implementation.
> > > >> > 
> > > >> > Currently we implement:
> > > >> > 
> > > >> > 1. device_del (attempt to remove device)
> > > >> > 2. notify guest
> > > >> > 3. if guest responds, remove device
> > > >> > 4. disconnect host resource from device on destruction
> > > >> > 
> > > >> > With my drive_unplug patch we do:
> > > >> > 
> > > >> > 1. disconnect host resource from device
> > > >> 
> > > >> This is what drive_unplug does, right?
> > > >
> > > > Correct.
> > > >
> > > >> 
> > > >> > 2. device_del (attempt to remove device)
> > > >> > 3. notify guest
> > > >> > 4. if guest responds, remove device
> > > >> > 
> > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> > > >> > device_del)
> > > >> > 
> > > >> > 1. device_del (attemp to remove device)
> > > >> > 2. notify guest
> > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > > >> > 4. if guest responds, invoke device destruction path a second time.
> > > >> 
> > > >> By response you mean eject?  No, this is not what I was suggesting.
> > > >> I was really suggesting that your patch is fine :)
> > > >> Sorry about confusion.
> > > >
> > > > I don't mean eject; I mean responding to the ACPI event by writing a
> > > > response to the PCI chipset which QEMU then in turn will invoke the
> > > > qdev_unplug() path which ultimately kills the device and the Drive and
> > > > BlockState objects.
> > > >
> > > >> 
> > > >> I was also saying that from what I hear, the pci express support
> > > >> will at some point need interfaces to
> > > >> - notify guest about device removal/addition
> > > >> - get eject from guest
> > > >> - remove device without talking to guest
> > > >> - add device without talking to guest
> > > >> - suppress device deletion on eject
> > > >> 
> > > >> All this can be generic and can work through express
> > > >> configuration mechanisms or through acpi for pci.
> > > >> But this is completely separate from unplugging
> > > >> the host backend, which should be possible at any point.
> > > >
> > > > Yes.  I think we've worked out that we do want an independent
> > > > unplug/disconnect mechanism rather than tying it to device_del.
> > > >
> > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > > > and that instead of having device_del always succeed and replacing it
> > > > with a shell, we'd need to provide an explicit command to do the
> > > > disconnect in a similar fashion to how we're doing drive_unplug?
> > > 
> > > I'm not sure I parse this.
> > 
> > You were asking for net and block disconnect to have similar mechanisms.
> > You mentioned the net fix for suprise removal was to have device_del()
> > always succeed by replacing the device with a shell/zombie.  The
> > drive_unplug() patch doesn't do the same thing; it doesn't affect the
> > device_del() path at all, rather it provides mgmt apps a hook to
> > directly disconnect host resource from guest resource.
> 
> Yes, the shell thing is just an implementation detail.

ok.  What qemu monitor command do I call for net delete to do the
"disconnect/unplug"?

> 
> > > 
> > > > With at least two of these device types needing an explicit disconnect
> > > > to sever the bond between host/guest makes me want a device-level
> > > > interface for doing the disconnect that each device can implement
> > > > differently.
> > > 
> > > I'm fine with having a separate command to forcibly disconnect a device
> > > from its host resources.
> > > 
> > > Typical use:
> > > 
> > > 1. device_del
> > >    ask guest to give up device, via ACPI
> > > 
> > > 2a. guest replies "done", delete device, free host resources
> > > 
> > > 2b. timeout, device_disconnect (or however we call that)
> > > 
> > > Is this what you have in mind?
> > 
> > Yeah, aboslutely.  I think Michael was saying we should implement 2b in
> > the mgmt stack.  The current libvirt does the following 
> > 
> > 1. mgmt invokes detach-device
> > 2. device_del
> > 3. update mgmt view of resources, assumes guest has done it's part; does
> > not confirm with qemu that device has been deleted.
> > 
> > With drive_unplug in qemu and a patch to libvirt, it looks like:
> > 
> > 1. mgmt invokes detach-device
> > 2a. call drive_unplug, log warning if drive_unplug isn't available
> > 2b. device_del
> > 3. update mgmt view of resources, assumes guest has done it's part; does
> > not confirm with qemu that device has been deleted.
> > 
> > I can look at implementing the timeout before invoking the unplug
> > (that's a bit tricky) in libvirt;
> 
> 
> So we'd 
> 1. reorder 2a and 2b, and add a small timeout
> 2. teach libvirt not to reuse the PCI slot and device id
>    until it is really free
> 
> Sounds good.

We're talking libvirt code here; so we'll need to start up that thread
there.  (1) is probably reasonable.  (2) is the harder part.  We'll need
some help in figuring out how to do that one.  Maybe it can be done on
the attach path (check if the slot is available in qemu).  I know there
is some code to allocte slots in a structure that libvirt maintains.

I'll start a thread over there.

> 
> > but given the fact that the mgmt is
> > invoking the removal I think it's reasonable to do forced disconnect
> > (even if the guest hasn't responded).
> 
> This is really making an assumption about the user.
> Giving the guest a bit of time to respond with eject seems prudent.

I don't disagree that the notification is nice; but I'm not sure I see
it as a requirement for correctness of behavior.  The device is being
deleted *explicitly* at the user's request.  If the user invokes removal
and it still using the device; the kernel doesn't do anything special
here; it just responds to the interrupt and destroys the resource; this
will result in the user app being hung on pending IO.  The same thing
happens if we disconnect the host side device.  

Now, if we want to talk about nice; we'd need to do some improvements on
the Linux acpi removal code where by we flush all pending io in the
device before we respond to the device removal; 

> For disk we risk losing data otherwise.

I don't think that's true at all.  Even on suprise removal we complete
pending io the device and return -EIO back to the guest.  The app may or
maynot be robust enough to handle it the errors but we're definitely not
losing data.

> The only reason we are pushing this out to management is so
> it can track state implement timeouts and interact with the user.
> If it doesn't, what's the point? Let's keep it all in qemu...
> 
> > > 
> > > 
> > > With qdev, device models are connected to host resources with special
> > > properties such as qdev_prop_netdev and qdev_prop_drive.  Thus, generic
> > > qdev code can already find and disconnect them.
> > > 
> > > How can we make sure device models survive such a disconnect?
> > > 
> > > * Ask the device to disconnect itself (new DeviceInfo method).
> > >   Drawback: duplicates common functionality in every device model.
> > >   More code, more bugs.
> > > 
> > > * Let qdev core disconnect and free host resources
> > > 
> > >   - and replace them with dummies.  I guess we'd need a dummy
> > >     constructor method for that, in PropertyInfo.  Done right, device
> > >     models should be able to carry on unawares.
> > > 
> > >   - and leave them null.  Device models need to cope with that.  NICs
> > >     do for netdev.
> > > 
> > 
> > I like the latter here; the BlockDriverState handles nulls.  I think
> > netdev should be able to as well though I haven't looked very closely
> > though so maybe Michael can confirm if that's a true statement.
> 
> Not at the moment: the issue is that NULL means legacy vlan setup there.
> We can rework the code to avoid that assumption but it's not on my
> priority list.

So, to move the ball forward so to speak we need:

1. have qdev code disconnect host resources and leave them null (or some
   that indicates that the device is disconnected from host resource)
2. update netdev code to have some way to distinquish between legacy
   vlan and disconnected
3. implement a device_disconnect monitor command which can disconnect 
   at least block and net devices? 

Does that look right?

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03 20:59                                       ` Ryan Harper
@ 2010-11-03 21:26                                         ` Michael S. Tsirkin
  2010-11-04 16:45                                           ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-03 21:26 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, Anthony Liguori, Markus Armbruster, qemu-devel,
	yamahata, Stefan Hajnoczi

On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
> * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
> > On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> > > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> > > > Ryan Harper <ryanh@us.ibm.com> writes:
> > > > 
> > > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> > > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > >> > > > > > > > 
> > > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > > > >> > > > > > > > between notification and disconnect.  
> > > > >> > > > > > > 
> > > > >> > > > > > > The problem with this is that it has no analog in real world.
> > > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> > > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> > > > >> > > > > > > first place.
> > > > >> > > > > > > 
> > > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > > >> > > > > > > being shown to the user.
> > > > >> > > > > > 
> > > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> > > > >> > > > > > level (pci)? or just use the existing removal callback and call that
> > > > >> > > > > > during the initial hotremov event?
> > > > >> > > > > 
> > > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> > > > >> > > > > callback should do. I don't think we need to talk about disconnect:
> > > > >> > > > > since we decided we are emulating device removal, let's call it
> > > > >> > > > > just that.
> > > > >> > > > 
> > > > >> > > > Because current the "removal" process depends on the guest actually
> > > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > > > >> > > > drive_unplug() implements, is to disconnect the host block device from
> > > > >> > > > the guest device to prevent any further access to it in the case the
> > > > >> > > > guest doesn't respond to the removal request made via ACPI.
> > > > >> > > > 
> > > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > > > >> > > > command so to complete the device removal operation without waiting for
> > > > >> > > > the guest to respond; that's what's going to happen if we invoke the
> > > > >> > > > response callback; it will appear as if the guest responded whether it
> > > > >> > > > did or not.
> > > > >> > > > 
> > > > >> > > > What I was suggesting above was to instead of calling the callback for
> > > > >> > > > handing the guest response was to add a device function called
> > > > >> > > > disconnect which would remove any association of host resources from
> > > > >> > > > guest resources before we notified the guest.  Thinking about it again
> > > > >> > > > I'm not sure this is useful, but if we're going to remove the device
> > > > >> > > > without the guests knowledge, I'm not sure how useful sending the
> > > > >> > > > removal requests via ACPI is in the first place.
> > > > >> > > > 
> > > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> > > > >> > > > from host resources separate from the device removal *if* we're going to
> > > > >> > > > retain the guest notification.  If we don't care to notify the guest,
> > > > >> > > > then we can just do device removal without notifying the guest
> > > > >> > > > and be done with it.
> > > > >> > > 
> > > > >> > > I imagine management would typically want to do this:
> > > > >> > > 1. notify guest
> > > > >> > > 2. wait a bit
> > > > >> > > 3. remove device
> > > > >> > 
> > > > >> > Yes; but this argues for (1) being a separate command from (3)
> > > > >> 
> > > > >> Yes. Long term I think we will want a way to do that.
> > > > >> 
> > > > >> > unless we
> > > > >> > require (3) to include (1) and (2) in the qemu implementation.
> > > > >> > 
> > > > >> > Currently we implement:
> > > > >> > 
> > > > >> > 1. device_del (attempt to remove device)
> > > > >> > 2. notify guest
> > > > >> > 3. if guest responds, remove device
> > > > >> > 4. disconnect host resource from device on destruction
> > > > >> > 
> > > > >> > With my drive_unplug patch we do:
> > > > >> > 
> > > > >> > 1. disconnect host resource from device
> > > > >> 
> > > > >> This is what drive_unplug does, right?
> > > > >
> > > > > Correct.
> > > > >
> > > > >> 
> > > > >> > 2. device_del (attempt to remove device)
> > > > >> > 3. notify guest
> > > > >> > 4. if guest responds, remove device
> > > > >> > 
> > > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> > > > >> > device_del)
> > > > >> > 
> > > > >> > 1. device_del (attemp to remove device)
> > > > >> > 2. notify guest
> > > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > > > >> > 4. if guest responds, invoke device destruction path a second time.
> > > > >> 
> > > > >> By response you mean eject?  No, this is not what I was suggesting.
> > > > >> I was really suggesting that your patch is fine :)
> > > > >> Sorry about confusion.
> > > > >
> > > > > I don't mean eject; I mean responding to the ACPI event by writing a
> > > > > response to the PCI chipset which QEMU then in turn will invoke the
> > > > > qdev_unplug() path which ultimately kills the device and the Drive and
> > > > > BlockState objects.
> > > > >
> > > > >> 
> > > > >> I was also saying that from what I hear, the pci express support
> > > > >> will at some point need interfaces to
> > > > >> - notify guest about device removal/addition
> > > > >> - get eject from guest
> > > > >> - remove device without talking to guest
> > > > >> - add device without talking to guest
> > > > >> - suppress device deletion on eject
> > > > >> 
> > > > >> All this can be generic and can work through express
> > > > >> configuration mechanisms or through acpi for pci.
> > > > >> But this is completely separate from unplugging
> > > > >> the host backend, which should be possible at any point.
> > > > >
> > > > > Yes.  I think we've worked out that we do want an independent
> > > > > unplug/disconnect mechanism rather than tying it to device_del.
> > > > >
> > > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > > > > and that instead of having device_del always succeed and replacing it
> > > > > with a shell, we'd need to provide an explicit command to do the
> > > > > disconnect in a similar fashion to how we're doing drive_unplug?
> > > > 
> > > > I'm not sure I parse this.
> > > 
> > > You were asking for net and block disconnect to have similar mechanisms.
> > > You mentioned the net fix for suprise removal was to have device_del()
> > > always succeed by replacing the device with a shell/zombie.  The
> > > drive_unplug() patch doesn't do the same thing; it doesn't affect the
> > > device_del() path at all, rather it provides mgmt apps a hook to
> > > directly disconnect host resource from guest resource.
> > 
> > Yes, the shell thing is just an implementation detail.
> 
> ok.  What qemu monitor command do I call for net delete to do the
> "disconnect/unplug"?


netdev_del

> > 
> > > > 
> > > > > With at least two of these device types needing an explicit disconnect
> > > > > to sever the bond between host/guest makes me want a device-level
> > > > > interface for doing the disconnect that each device can implement
> > > > > differently.
> > > > 
> > > > I'm fine with having a separate command to forcibly disconnect a device
> > > > from its host resources.
> > > > 
> > > > Typical use:
> > > > 
> > > > 1. device_del
> > > >    ask guest to give up device, via ACPI
> > > > 
> > > > 2a. guest replies "done", delete device, free host resources
> > > > 
> > > > 2b. timeout, device_disconnect (or however we call that)
> > > > 
> > > > Is this what you have in mind?
> > > 
> > > Yeah, aboslutely.  I think Michael was saying we should implement 2b in
> > > the mgmt stack.  The current libvirt does the following 
> > > 
> > > 1. mgmt invokes detach-device
> > > 2. device_del
> > > 3. update mgmt view of resources, assumes guest has done it's part; does
> > > not confirm with qemu that device has been deleted.
> > > 
> > > With drive_unplug in qemu and a patch to libvirt, it looks like:
> > > 
> > > 1. mgmt invokes detach-device
> > > 2a. call drive_unplug, log warning if drive_unplug isn't available
> > > 2b. device_del
> > > 3. update mgmt view of resources, assumes guest has done it's part; does
> > > not confirm with qemu that device has been deleted.
> > > 
> > > I can look at implementing the timeout before invoking the unplug
> > > (that's a bit tricky) in libvirt;
> > 
> > 
> > So we'd 
> > 1. reorder 2a and 2b, and add a small timeout
> > 2. teach libvirt not to reuse the PCI slot and device id
> >    until it is really free
> > 
> > Sounds good.
> 
> We're talking libvirt code here; so we'll need to start up that thread
> there.  (1) is probably reasonable.  (2) is the harder part.  We'll need
> some help in figuring out how to do that one.  Maybe it can be done on
> the attach path (check if the slot is available in qemu).  I know there
> is some code to allocte slots in a structure that libvirt maintains.
> 
> I'll start a thread over there.
> 
> > 
> > > but given the fact that the mgmt is
> > > invoking the removal I think it's reasonable to do forced disconnect
> > > (even if the guest hasn't responded).
> > 
> > This is really making an assumption about the user.
> > Giving the guest a bit of time to respond with eject seems prudent.
> 
> I don't disagree that the notification is nice; but I'm not sure I see
> it as a requirement for correctness of behavior.  The device is being
> deleted *explicitly* at the user's request.  If the user invokes removal
> and it still using the device; the kernel doesn't do anything special
> here; it just responds to the interrupt and destroys the resource; this
> will result in the user app being hung on pending IO.  The same thing
> happens if we disconnect the host side device.  
> 
> Now, if we want to talk about nice; we'd need to do some improvements on
> the Linux acpi removal code where by we flush all pending io in the
> device before we respond to the device removal; 
> 
> > For disk we risk losing data otherwise.
> 
> I don't think that's true at all.  Even on suprise removal we complete
> pending io the device and return -EIO back to the guest.  The app may or
> maynot be robust enough to handle it the errors but we're definitely not
> losing data.
> 
> > The only reason we are pushing this out to management is so
> > it can track state implement timeouts and interact with the user.
> > If it doesn't, what's the point? Let's keep it all in qemu...
> > 
> > > > 
> > > > 
> > > > With qdev, device models are connected to host resources with special
> > > > properties such as qdev_prop_netdev and qdev_prop_drive.  Thus, generic
> > > > qdev code can already find and disconnect them.
> > > > 
> > > > How can we make sure device models survive such a disconnect?
> > > > 
> > > > * Ask the device to disconnect itself (new DeviceInfo method).
> > > >   Drawback: duplicates common functionality in every device model.
> > > >   More code, more bugs.
> > > > 
> > > > * Let qdev core disconnect and free host resources
> > > > 
> > > >   - and replace them with dummies.  I guess we'd need a dummy
> > > >     constructor method for that, in PropertyInfo.  Done right, device
> > > >     models should be able to carry on unawares.
> > > > 
> > > >   - and leave them null.  Device models need to cope with that.  NICs
> > > >     do for netdev.
> > > > 
> > > 
> > > I like the latter here; the BlockDriverState handles nulls.  I think
> > > netdev should be able to as well though I haven't looked very closely
> > > though so maybe Michael can confirm if that's a true statement.
> > 
> > Not at the moment: the issue is that NULL means legacy vlan setup there.
> > We can rework the code to avoid that assumption but it's not on my
> > priority list.
> 
> So, to move the ball forward so to speak we need:
> 
> 1. have qdev code disconnect host resources and leave them null (or some
>    that indicates that the device is disconnected from host resource)
> 2. update netdev code to have some way to distinquish between legacy
>    vlan and disconnected
> 3. implement a device_disconnect monitor command which can disconnect 
>    at least block and net devices? 
> 
> Does that look right?
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-03 21:26                                         ` Michael S. Tsirkin
@ 2010-11-04 16:45                                           ` Ryan Harper
  2010-11-04 17:04                                             ` Michael S. Tsirkin
  2010-11-05 13:27                                             ` Markus Armbruster
  0 siblings, 2 replies; 60+ messages in thread
From: Ryan Harper @ 2010-11-04 16:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-03 16:46]:
> On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
> > > On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> > > > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> > > > > Ryan Harper <ryanh@us.ibm.com> writes:
> > > > > 
> > > > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> > > > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > > > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > > > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > > > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > > > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > > > > >> > > > > > > > 
> > > > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > > > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > > > > >> > > > > > > > between notification and disconnect.  
> > > > > >> > > > > > > 
> > > > > >> > > > > > > The problem with this is that it has no analog in real world.
> > > > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> > > > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> > > > > >> > > > > > > first place.
> > > > > >> > > > > > > 
> > > > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > > > > >> > > > > > > being shown to the user.
> > > > > >> > > > > > 
> > > > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > > > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> > > > > >> > > > > > level (pci)? or just use the existing removal callback and call that
> > > > > >> > > > > > during the initial hotremov event?
> > > > > >> > > > > 
> > > > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > > > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> > > > > >> > > > > callback should do. I don't think we need to talk about disconnect:
> > > > > >> > > > > since we decided we are emulating device removal, let's call it
> > > > > >> > > > > just that.
> > > > > >> > > > 
> > > > > >> > > > Because current the "removal" process depends on the guest actually
> > > > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > > > > >> > > > drive_unplug() implements, is to disconnect the host block device from
> > > > > >> > > > the guest device to prevent any further access to it in the case the
> > > > > >> > > > guest doesn't respond to the removal request made via ACPI.
> > > > > >> > > > 
> > > > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > > > > >> > > > command so to complete the device removal operation without waiting for
> > > > > >> > > > the guest to respond; that's what's going to happen if we invoke the
> > > > > >> > > > response callback; it will appear as if the guest responded whether it
> > > > > >> > > > did or not.
> > > > > >> > > > 
> > > > > >> > > > What I was suggesting above was to instead of calling the callback for
> > > > > >> > > > handing the guest response was to add a device function called
> > > > > >> > > > disconnect which would remove any association of host resources from
> > > > > >> > > > guest resources before we notified the guest.  Thinking about it again
> > > > > >> > > > I'm not sure this is useful, but if we're going to remove the device
> > > > > >> > > > without the guests knowledge, I'm not sure how useful sending the
> > > > > >> > > > removal requests via ACPI is in the first place.
> > > > > >> > > > 
> > > > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> > > > > >> > > > from host resources separate from the device removal *if* we're going to
> > > > > >> > > > retain the guest notification.  If we don't care to notify the guest,
> > > > > >> > > > then we can just do device removal without notifying the guest
> > > > > >> > > > and be done with it.
> > > > > >> > > 
> > > > > >> > > I imagine management would typically want to do this:
> > > > > >> > > 1. notify guest
> > > > > >> > > 2. wait a bit
> > > > > >> > > 3. remove device
> > > > > >> > 
> > > > > >> > Yes; but this argues for (1) being a separate command from (3)
> > > > > >> 
> > > > > >> Yes. Long term I think we will want a way to do that.
> > > > > >> 
> > > > > >> > unless we
> > > > > >> > require (3) to include (1) and (2) in the qemu implementation.
> > > > > >> > 
> > > > > >> > Currently we implement:
> > > > > >> > 
> > > > > >> > 1. device_del (attempt to remove device)
> > > > > >> > 2. notify guest
> > > > > >> > 3. if guest responds, remove device
> > > > > >> > 4. disconnect host resource from device on destruction
> > > > > >> > 
> > > > > >> > With my drive_unplug patch we do:
> > > > > >> > 
> > > > > >> > 1. disconnect host resource from device
> > > > > >> 
> > > > > >> This is what drive_unplug does, right?
> > > > > >
> > > > > > Correct.
> > > > > >
> > > > > >> 
> > > > > >> > 2. device_del (attempt to remove device)
> > > > > >> > 3. notify guest
> > > > > >> > 4. if guest responds, remove device
> > > > > >> > 
> > > > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> > > > > >> > device_del)
> > > > > >> > 
> > > > > >> > 1. device_del (attemp to remove device)
> > > > > >> > 2. notify guest
> > > > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > > > > >> > 4. if guest responds, invoke device destruction path a second time.
> > > > > >> 
> > > > > >> By response you mean eject?  No, this is not what I was suggesting.
> > > > > >> I was really suggesting that your patch is fine :)
> > > > > >> Sorry about confusion.
> > > > > >
> > > > > > I don't mean eject; I mean responding to the ACPI event by writing a
> > > > > > response to the PCI chipset which QEMU then in turn will invoke the
> > > > > > qdev_unplug() path which ultimately kills the device and the Drive and
> > > > > > BlockState objects.
> > > > > >
> > > > > >> 
> > > > > >> I was also saying that from what I hear, the pci express support
> > > > > >> will at some point need interfaces to
> > > > > >> - notify guest about device removal/addition
> > > > > >> - get eject from guest
> > > > > >> - remove device without talking to guest
> > > > > >> - add device without talking to guest
> > > > > >> - suppress device deletion on eject
> > > > > >> 
> > > > > >> All this can be generic and can work through express
> > > > > >> configuration mechanisms or through acpi for pci.
> > > > > >> But this is completely separate from unplugging
> > > > > >> the host backend, which should be possible at any point.
> > > > > >
> > > > > > Yes.  I think we've worked out that we do want an independent
> > > > > > unplug/disconnect mechanism rather than tying it to device_del.
> > > > > >
> > > > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > > > > > and that instead of having device_del always succeed and replacing it
> > > > > > with a shell, we'd need to provide an explicit command to do the
> > > > > > disconnect in a similar fashion to how we're doing drive_unplug?
> > > > > 
> > > > > I'm not sure I parse this.
> > > > 
> > > > You were asking for net and block disconnect to have similar mechanisms.
> > > > You mentioned the net fix for suprise removal was to have device_del()
> > > > always succeed by replacing the device with a shell/zombie.  The
> > > > drive_unplug() patch doesn't do the same thing; it doesn't affect the
> > > > device_del() path at all, rather it provides mgmt apps a hook to
> > > > directly disconnect host resource from guest resource.
> > > 
> > > Yes, the shell thing is just an implementation detail.
> > 
> > ok.  What qemu monitor command do I call for net delete to do the
> > "disconnect/unplug"?
> 
> 
> netdev_del

OK.  With netdev_del and drive_unplug commands (not sure if we care to
change the names to be similar, maybe blockdev_del) in qemu, we can then
implement the following in libvirt:

1) detach-device invocation
2) issue device_del to QEMU
2a) notification is sent)
3) issue netdev_del/blockdev_del as appropriate for the device type
4) update guest XML to indicate device has been removed

And a fancier version would look like:

1) detach-device invocation
2) issue device_del to QEMU
2a) notification is sent)
3) set a timeout for guest to respond
4) when timeout expires
4a) check if the pci device has been removed by quering QEMU
    if it hasn't been removed, issue netdev_del/blockdev_del
5) update guest XML to indicate device has been removed


in both cases, I think we'll also want a patch that validates that the
pci slot is available before handing it out again; this will handle the
case where the guest doesn't respond to the device removal request; our
net/blockdev_del command will break the host/guest association, but we
don't want to attempt to attach a device to the same slot.

Marcus, do you think we're at a point where the mechanisms for
explicitly revoking access to the host resource is consistent between
net and block?

If so, then I suppose I might have a consmetic patch to fix up the
monitor command name to line up with the netdev_del.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-04 16:45                                           ` Ryan Harper
@ 2010-11-04 17:04                                             ` Michael S. Tsirkin
  2010-11-05 13:27                                             ` Markus Armbruster
  1 sibling, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-04 17:04 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

On Thu, Nov 04, 2010 at 11:45:51AM -0500, Ryan Harper wrote:
> OK.  With netdev_del and drive_unplug commands (not sure if we care to
> change the names to be similar, maybe blockdev_del) in qemu, we can then
> implement the following in libvirt:
> 
> 1) detach-device invocation
> 2) issue device_del to QEMU
> 2a) notification is sent)
> 3) issue netdev_del/blockdev_del as appropriate for the device type
> 4) update guest XML to indicate device has been removed
> And a fancier version would look like:
> 
> 1) detach-device invocation
> 2) issue device_del to QEMU
> 2a) notification is sent)
> 3) set a timeout for guest to respond
> 4) when timeout expires
> 4a) check if the pci device has been removed by quering QEMU
>     if it hasn't been removed, issue netdev_del/blockdev_del

I think it's easier to check the network device:
info network and whatever is appropriate for block

> 5) update guest XML to indicate device has been removed
> 
> 
> in both cases, I think we'll also want a patch that validates that the
> pci slot is available before handing it out again; this will handle the
> case where the guest doesn't respond to the device removal request; our
> net/blockdev_del command will break the host/guest association, but we
> don't want to attempt to attach a device to the same slot.

Yes, absolutely.  And same for qdev device id.

> Marcus, do you think we're at a point where the mechanisms for
> explicitly revoking access to the host resource is consistent between
> net and block?
> 
> If so, then I suppose I might have a consmetic patch to fix up the
> monitor command name to line up with the netdev_del.
> 
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-04 16:45                                           ` Ryan Harper
  2010-11-04 17:04                                             ` Michael S. Tsirkin
@ 2010-11-05 13:27                                             ` Markus Armbruster
  2010-11-05 14:17                                               ` Michael S. Tsirkin
  2010-11-05 14:25                                               ` Ryan Harper
  1 sibling, 2 replies; 60+ messages in thread
From: Markus Armbruster @ 2010-11-05 13:27 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Ryan Harper <ryanh@us.ibm.com> writes:

> * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 16:46]:
>> On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
>> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
>> > > On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
>> > > > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
>> > > > > Ryan Harper <ryanh@us.ibm.com> writes:
>> > > > > 
>> > > > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
>> > > > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
>> > > > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
>> > > > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
>> > > > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
>> > > > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
>> > > > > >> > > > > > > > 
>> > > > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
>> > > > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
>> > > > > >> > > > > > > > between notification and disconnect.  
>> > > > > >> > > > > > > 
>> > > > > >> > > > > > > The problem with this is that it has no analog in real world.
>> > > > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
>> > > > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
>> > > > > >> > > > > > > first place.
>> > > > > >> > > > > > > 
>> > > > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
>> > > > > >> > > > > > > being shown to the user.
>> > > > > >> > > > > > 
>> > > > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
>> > > > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
>> > > > > >> > > > > > level (pci)? or just use the existing removal callback and call that
>> > > > > >> > > > > > during the initial hotremov event?
>> > > > > >> > > > > 
>> > > > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
>> > > > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
>> > > > > >> > > > > callback should do. I don't think we need to talk about disconnect:
>> > > > > >> > > > > since we decided we are emulating device removal, let's call it
>> > > > > >> > > > > just that.
>> > > > > >> > > > 
>> > > > > >> > > > Because current the "removal" process depends on the guest actually
>> > > > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
>> > > > > >> > > > drive_unplug() implements, is to disconnect the host block device from
>> > > > > >> > > > the guest device to prevent any further access to it in the case the
>> > > > > >> > > > guest doesn't respond to the removal request made via ACPI.
>> > > > > >> > > > 
>> > > > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
>> > > > > >> > > > command so to complete the device removal operation without waiting for
>> > > > > >> > > > the guest to respond; that's what's going to happen if we invoke the
>> > > > > >> > > > response callback; it will appear as if the guest responded whether it
>> > > > > >> > > > did or not.
>> > > > > >> > > > 
>> > > > > >> > > > What I was suggesting above was to instead of calling the callback for
>> > > > > >> > > > handing the guest response was to add a device function called
>> > > > > >> > > > disconnect which would remove any association of host resources from
>> > > > > >> > > > guest resources before we notified the guest.  Thinking about it again
>> > > > > >> > > > I'm not sure this is useful, but if we're going to remove the device
>> > > > > >> > > > without the guests knowledge, I'm not sure how useful sending the
>> > > > > >> > > > removal requests via ACPI is in the first place.
>> > > > > >> > > > 
>> > > > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
>> > > > > >> > > > from host resources separate from the device removal *if* we're going to
>> > > > > >> > > > retain the guest notification.  If we don't care to notify the guest,
>> > > > > >> > > > then we can just do device removal without notifying the guest
>> > > > > >> > > > and be done with it.
>> > > > > >> > > 
>> > > > > >> > > I imagine management would typically want to do this:
>> > > > > >> > > 1. notify guest
>> > > > > >> > > 2. wait a bit
>> > > > > >> > > 3. remove device
>> > > > > >> > 
>> > > > > >> > Yes; but this argues for (1) being a separate command from (3)
>> > > > > >> 
>> > > > > >> Yes. Long term I think we will want a way to do that.
>> > > > > >> 
>> > > > > >> > unless we
>> > > > > >> > require (3) to include (1) and (2) in the qemu implementation.
>> > > > > >> > 
>> > > > > >> > Currently we implement:
>> > > > > >> > 
>> > > > > >> > 1. device_del (attempt to remove device)
>> > > > > >> > 2. notify guest
>> > > > > >> > 3. if guest responds, remove device
>> > > > > >> > 4. disconnect host resource from device on destruction
>> > > > > >> > 
>> > > > > >> > With my drive_unplug patch we do:
>> > > > > >> > 
>> > > > > >> > 1. disconnect host resource from device
>> > > > > >> 
>> > > > > >> This is what drive_unplug does, right?
>> > > > > >
>> > > > > > Correct.
>> > > > > >
>> > > > > >> 
>> > > > > >> > 2. device_del (attempt to remove device)
>> > > > > >> > 3. notify guest
>> > > > > >> > 4. if guest responds, remove device
>> > > > > >> > 
>> > > > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
>> > > > > >> > device_del)
>> > > > > >> > 
>> > > > > >> > 1. device_del (attemp to remove device)
>> > > > > >> > 2. notify guest
>> > > > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
>> > > > > >> > 4. if guest responds, invoke device destruction path a second time.
>> > > > > >> 
>> > > > > >> By response you mean eject?  No, this is not what I was suggesting.
>> > > > > >> I was really suggesting that your patch is fine :)
>> > > > > >> Sorry about confusion.
>> > > > > >
>> > > > > > I don't mean eject; I mean responding to the ACPI event by writing a
>> > > > > > response to the PCI chipset which QEMU then in turn will invoke the
>> > > > > > qdev_unplug() path which ultimately kills the device and the Drive and
>> > > > > > BlockState objects.
>> > > > > >
>> > > > > >> 
>> > > > > >> I was also saying that from what I hear, the pci express support
>> > > > > >> will at some point need interfaces to
>> > > > > >> - notify guest about device removal/addition
>> > > > > >> - get eject from guest
>> > > > > >> - remove device without talking to guest
>> > > > > >> - add device without talking to guest
>> > > > > >> - suppress device deletion on eject
>> > > > > >> 
>> > > > > >> All this can be generic and can work through express
>> > > > > >> configuration mechanisms or through acpi for pci.
>> > > > > >> But this is completely separate from unplugging
>> > > > > >> the host backend, which should be possible at any point.
>> > > > > >
>> > > > > > Yes.  I think we've worked out that we do want an independent
>> > > > > > unplug/disconnect mechanism rather than tying it to device_del.
>> > > > > >
>> > > > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
>> > > > > > and that instead of having device_del always succeed and replacing it
>> > > > > > with a shell, we'd need to provide an explicit command to do the
>> > > > > > disconnect in a similar fashion to how we're doing drive_unplug?
>> > > > > 
>> > > > > I'm not sure I parse this.
>> > > > 
>> > > > You were asking for net and block disconnect to have similar mechanisms.
>> > > > You mentioned the net fix for suprise removal was to have device_del()
>> > > > always succeed by replacing the device with a shell/zombie.  The
>> > > > drive_unplug() patch doesn't do the same thing; it doesn't affect the
>> > > > device_del() path at all, rather it provides mgmt apps a hook to
>> > > > directly disconnect host resource from guest resource.
>> > > 
>> > > Yes, the shell thing is just an implementation detail.
>> > 
>> > ok.  What qemu monitor command do I call for net delete to do the
>> > "disconnect/unplug"?
>> 
>> 
>> netdev_del
>
> OK.  With netdev_del and drive_unplug commands (not sure if we care to
> change the names to be similar, maybe blockdev_del) in qemu, we can then
> implement the following in libvirt:
>
> 1) detach-device invocation
> 2) issue device_del to QEMU
> 2a) notification is sent)
> 3) issue netdev_del/blockdev_del as appropriate for the device type
> 4) update guest XML to indicate device has been removed
>
> And a fancier version would look like:
>
> 1) detach-device invocation
> 2) issue device_del to QEMU
> 2a) notification is sent)
> 3) set a timeout for guest to respond
> 4) when timeout expires
> 4a) check if the pci device has been removed by quering QEMU
>     if it hasn't been removed, issue netdev_del/blockdev_del
> 5) update guest XML to indicate device has been removed
>
>
> in both cases, I think we'll also want a patch that validates that the
> pci slot is available before handing it out again; this will handle the
> case where the guest doesn't respond to the device removal request; our
> net/blockdev_del command will break the host/guest association, but we
> don't want to attempt to attach a device to the same slot.
>
> Marcus, do you think we're at a point where the mechanisms for
> explicitly revoking access to the host resource is consistent between
> net and block?
>
> If so, then I suppose I might have a consmetic patch to fix up the
> monitor command name to line up with the netdev_del.

I'd be fine with any of these:

1. A new command "device_disconnet ID" (or similar name) to disconnect
   device ID from any host parts.  Nice touch: you don't have to know
   about the device's host part(s) to disconnect it.  But it might be
   more work than the other two.

2. New commands netdev_disconnect, drive_disconnect (or similar names)
   to disconnect a host part from a guest device.  Like (1), except you
   have to point to the other end of the connection to cut it.

3. A new command "drive_del ID" similar to existing netdev_del.  This is
   (2) fused with delete.  Conceptual wart: you can't disconnect and
   keep the host part around.  Moreover, delete is slightly dangerous,
   because it renders any guest device still using the host part
   useless.

Do you need anything else from me to make progress?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 13:27                                             ` Markus Armbruster
@ 2010-11-05 14:17                                               ` Michael S. Tsirkin
  2010-11-05 14:29                                                 ` Ryan Harper
  2010-11-05 16:01                                                 ` Markus Armbruster
  2010-11-05 14:25                                               ` Ryan Harper
  1 sibling, 2 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-05 14:17 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi

On Fri, Nov 05, 2010 at 02:27:49PM +0100, Markus Armbruster wrote:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 16:46]:
> >> On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
> >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
> >> > > On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> >> > > > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> >> > > > > Ryan Harper <ryanh@us.ibm.com> writes:
> >> > > > > 
> >> > > > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> >> > > > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> >> > > > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> >> > > > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> >> > > > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> >> > > > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> >> > > > > >> > > > > > > > 
> >> > > > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> >> > > > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> >> > > > > >> > > > > > > > between notification and disconnect.  
> >> > > > > >> > > > > > > 
> >> > > > > >> > > > > > > The problem with this is that it has no analog in real world.
> >> > > > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> >> > > > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> >> > > > > >> > > > > > > first place.
> >> > > > > >> > > > > > > 
> >> > > > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> >> > > > > >> > > > > > > being shown to the user.
> >> > > > > >> > > > > > 
> >> > > > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> >> > > > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> >> > > > > >> > > > > > level (pci)? or just use the existing removal callback and call that
> >> > > > > >> > > > > > during the initial hotremov event?
> >> > > > > >> > > > > 
> >> > > > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> >> > > > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> >> > > > > >> > > > > callback should do. I don't think we need to talk about disconnect:
> >> > > > > >> > > > > since we decided we are emulating device removal, let's call it
> >> > > > > >> > > > > just that.
> >> > > > > >> > > > 
> >> > > > > >> > > > Because current the "removal" process depends on the guest actually
> >> > > > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> >> > > > > >> > > > drive_unplug() implements, is to disconnect the host block device from
> >> > > > > >> > > > the guest device to prevent any further access to it in the case the
> >> > > > > >> > > > guest doesn't respond to the removal request made via ACPI.
> >> > > > > >> > > > 
> >> > > > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> >> > > > > >> > > > command so to complete the device removal operation without waiting for
> >> > > > > >> > > > the guest to respond; that's what's going to happen if we invoke the
> >> > > > > >> > > > response callback; it will appear as if the guest responded whether it
> >> > > > > >> > > > did or not.
> >> > > > > >> > > > 
> >> > > > > >> > > > What I was suggesting above was to instead of calling the callback for
> >> > > > > >> > > > handing the guest response was to add a device function called
> >> > > > > >> > > > disconnect which would remove any association of host resources from
> >> > > > > >> > > > guest resources before we notified the guest.  Thinking about it again
> >> > > > > >> > > > I'm not sure this is useful, but if we're going to remove the device
> >> > > > > >> > > > without the guests knowledge, I'm not sure how useful sending the
> >> > > > > >> > > > removal requests via ACPI is in the first place.
> >> > > > > >> > > > 
> >> > > > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> >> > > > > >> > > > from host resources separate from the device removal *if* we're going to
> >> > > > > >> > > > retain the guest notification.  If we don't care to notify the guest,
> >> > > > > >> > > > then we can just do device removal without notifying the guest
> >> > > > > >> > > > and be done with it.
> >> > > > > >> > > 
> >> > > > > >> > > I imagine management would typically want to do this:
> >> > > > > >> > > 1. notify guest
> >> > > > > >> > > 2. wait a bit
> >> > > > > >> > > 3. remove device
> >> > > > > >> > 
> >> > > > > >> > Yes; but this argues for (1) being a separate command from (3)
> >> > > > > >> 
> >> > > > > >> Yes. Long term I think we will want a way to do that.
> >> > > > > >> 
> >> > > > > >> > unless we
> >> > > > > >> > require (3) to include (1) and (2) in the qemu implementation.
> >> > > > > >> > 
> >> > > > > >> > Currently we implement:
> >> > > > > >> > 
> >> > > > > >> > 1. device_del (attempt to remove device)
> >> > > > > >> > 2. notify guest
> >> > > > > >> > 3. if guest responds, remove device
> >> > > > > >> > 4. disconnect host resource from device on destruction
> >> > > > > >> > 
> >> > > > > >> > With my drive_unplug patch we do:
> >> > > > > >> > 
> >> > > > > >> > 1. disconnect host resource from device
> >> > > > > >> 
> >> > > > > >> This is what drive_unplug does, right?
> >> > > > > >
> >> > > > > > Correct.
> >> > > > > >
> >> > > > > >> 
> >> > > > > >> > 2. device_del (attempt to remove device)
> >> > > > > >> > 3. notify guest
> >> > > > > >> > 4. if guest responds, remove device
> >> > > > > >> > 
> >> > > > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> >> > > > > >> > device_del)
> >> > > > > >> > 
> >> > > > > >> > 1. device_del (attemp to remove device)
> >> > > > > >> > 2. notify guest
> >> > > > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> >> > > > > >> > 4. if guest responds, invoke device destruction path a second time.
> >> > > > > >> 
> >> > > > > >> By response you mean eject?  No, this is not what I was suggesting.
> >> > > > > >> I was really suggesting that your patch is fine :)
> >> > > > > >> Sorry about confusion.
> >> > > > > >
> >> > > > > > I don't mean eject; I mean responding to the ACPI event by writing a
> >> > > > > > response to the PCI chipset which QEMU then in turn will invoke the
> >> > > > > > qdev_unplug() path which ultimately kills the device and the Drive and
> >> > > > > > BlockState objects.
> >> > > > > >
> >> > > > > >> 
> >> > > > > >> I was also saying that from what I hear, the pci express support
> >> > > > > >> will at some point need interfaces to
> >> > > > > >> - notify guest about device removal/addition
> >> > > > > >> - get eject from guest
> >> > > > > >> - remove device without talking to guest
> >> > > > > >> - add device without talking to guest
> >> > > > > >> - suppress device deletion on eject
> >> > > > > >> 
> >> > > > > >> All this can be generic and can work through express
> >> > > > > >> configuration mechanisms or through acpi for pci.
> >> > > > > >> But this is completely separate from unplugging
> >> > > > > >> the host backend, which should be possible at any point.
> >> > > > > >
> >> > > > > > Yes.  I think we've worked out that we do want an independent
> >> > > > > > unplug/disconnect mechanism rather than tying it to device_del.
> >> > > > > >
> >> > > > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> >> > > > > > and that instead of having device_del always succeed and replacing it
> >> > > > > > with a shell, we'd need to provide an explicit command to do the
> >> > > > > > disconnect in a similar fashion to how we're doing drive_unplug?
> >> > > > > 
> >> > > > > I'm not sure I parse this.
> >> > > > 
> >> > > > You were asking for net and block disconnect to have similar mechanisms.
> >> > > > You mentioned the net fix for suprise removal was to have device_del()
> >> > > > always succeed by replacing the device with a shell/zombie.  The
> >> > > > drive_unplug() patch doesn't do the same thing; it doesn't affect the
> >> > > > device_del() path at all, rather it provides mgmt apps a hook to
> >> > > > directly disconnect host resource from guest resource.
> >> > > 
> >> > > Yes, the shell thing is just an implementation detail.
> >> > 
> >> > ok.  What qemu monitor command do I call for net delete to do the
> >> > "disconnect/unplug"?
> >> 
> >> 
> >> netdev_del
> >
> > OK.  With netdev_del and drive_unplug commands (not sure if we care to
> > change the names to be similar, maybe blockdev_del) in qemu, we can then
> > implement the following in libvirt:
> >
> > 1) detach-device invocation
> > 2) issue device_del to QEMU
> > 2a) notification is sent)
> > 3) issue netdev_del/blockdev_del as appropriate for the device type
> > 4) update guest XML to indicate device has been removed
> >
> > And a fancier version would look like:
> >
> > 1) detach-device invocation
> > 2) issue device_del to QEMU
> > 2a) notification is sent)
> > 3) set a timeout for guest to respond
> > 4) when timeout expires
> > 4a) check if the pci device has been removed by quering QEMU
> >     if it hasn't been removed, issue netdev_del/blockdev_del
> > 5) update guest XML to indicate device has been removed
> >
> >
> > in both cases, I think we'll also want a patch that validates that the
> > pci slot is available before handing it out again; this will handle the
> > case where the guest doesn't respond to the device removal request; our
> > net/blockdev_del command will break the host/guest association, but we
> > don't want to attempt to attach a device to the same slot.
> >
> > Marcus, do you think we're at a point where the mechanisms for
> > explicitly revoking access to the host resource is consistent between
> > net and block?
> >
> > If so, then I suppose I might have a consmetic patch to fix up the
> > monitor command name to line up with the netdev_del.
> 
> I'd be fine with any of these:
> 
> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>    device ID from any host parts.  Nice touch: you don't have to know
>    about the device's host part(s) to disconnect it.  But it might be
>    more work than the other two.
> 
> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>    to disconnect a host part from a guest device.  Like (1), except you
>    have to point to the other end of the connection to cut it.

I think it's cleaner not to introduce a concept of a disconnected
backend.

One thing that we must be careful to explicitly disallow, is
reconnecting guest to another host backend. The reason being
that guest might rely on backend features and changing these
would break this.

Given that, disconnecting without delete isn't helpful.

> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>    keep the host part around.  Moreover, delete is slightly dangerous,
>    because it renders any guest device still using the host part
>    useless.

I don't see how it's more dangerous than disconnecting.
If guest can't access the backend it might not exist
as far as guest is concerned.

> Do you need anything else from me to make progress?

Let's go for 3. Need for 1/2 seems dubious, and it's much harder
to support.

-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 13:27                                             ` Markus Armbruster
  2010-11-05 14:17                                               ` Michael S. Tsirkin
@ 2010-11-05 14:25                                               ` Ryan Harper
  2010-11-05 16:10                                                 ` Markus Armbruster
  1 sibling, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-05 14:25 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 16:46]:
> >> On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
> >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
> >> > > On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> >> > > > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> >> > > > > Ryan Harper <ryanh@us.ibm.com> writes:
> >> > > > > 
> >> > > > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> >> > > > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> >> > > > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> >> > > > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> >> > > > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> >> > > > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> >> > > > > >> > > > > > > > 
> >> > > > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> >> > > > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> >> > > > > >> > > > > > > > between notification and disconnect.  
> >> > > > > >> > > > > > > 
> >> > > > > >> > > > > > > The problem with this is that it has no analog in real world.
> >> > > > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> >> > > > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> >> > > > > >> > > > > > > first place.
> >> > > > > >> > > > > > > 
> >> > > > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> >> > > > > >> > > > > > > being shown to the user.
> >> > > > > >> > > > > > 
> >> > > > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> >> > > > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> >> > > > > >> > > > > > level (pci)? or just use the existing removal callback and call that
> >> > > > > >> > > > > > during the initial hotremov event?
> >> > > > > >> > > > > 
> >> > > > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> >> > > > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> >> > > > > >> > > > > callback should do. I don't think we need to talk about disconnect:
> >> > > > > >> > > > > since we decided we are emulating device removal, let's call it
> >> > > > > >> > > > > just that.
> >> > > > > >> > > > 
> >> > > > > >> > > > Because current the "removal" process depends on the guest actually
> >> > > > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> >> > > > > >> > > > drive_unplug() implements, is to disconnect the host block device from
> >> > > > > >> > > > the guest device to prevent any further access to it in the case the
> >> > > > > >> > > > guest doesn't respond to the removal request made via ACPI.
> >> > > > > >> > > > 
> >> > > > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> >> > > > > >> > > > command so to complete the device removal operation without waiting for
> >> > > > > >> > > > the guest to respond; that's what's going to happen if we invoke the
> >> > > > > >> > > > response callback; it will appear as if the guest responded whether it
> >> > > > > >> > > > did or not.
> >> > > > > >> > > > 
> >> > > > > >> > > > What I was suggesting above was to instead of calling the callback for
> >> > > > > >> > > > handing the guest response was to add a device function called
> >> > > > > >> > > > disconnect which would remove any association of host resources from
> >> > > > > >> > > > guest resources before we notified the guest.  Thinking about it again
> >> > > > > >> > > > I'm not sure this is useful, but if we're going to remove the device
> >> > > > > >> > > > without the guests knowledge, I'm not sure how useful sending the
> >> > > > > >> > > > removal requests via ACPI is in the first place.
> >> > > > > >> > > > 
> >> > > > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> >> > > > > >> > > > from host resources separate from the device removal *if* we're going to
> >> > > > > >> > > > retain the guest notification.  If we don't care to notify the guest,
> >> > > > > >> > > > then we can just do device removal without notifying the guest
> >> > > > > >> > > > and be done with it.
> >> > > > > >> > > 
> >> > > > > >> > > I imagine management would typically want to do this:
> >> > > > > >> > > 1. notify guest
> >> > > > > >> > > 2. wait a bit
> >> > > > > >> > > 3. remove device
> >> > > > > >> > 
> >> > > > > >> > Yes; but this argues for (1) being a separate command from (3)
> >> > > > > >> 
> >> > > > > >> Yes. Long term I think we will want a way to do that.
> >> > > > > >> 
> >> > > > > >> > unless we
> >> > > > > >> > require (3) to include (1) and (2) in the qemu implementation.
> >> > > > > >> > 
> >> > > > > >> > Currently we implement:
> >> > > > > >> > 
> >> > > > > >> > 1. device_del (attempt to remove device)
> >> > > > > >> > 2. notify guest
> >> > > > > >> > 3. if guest responds, remove device
> >> > > > > >> > 4. disconnect host resource from device on destruction
> >> > > > > >> > 
> >> > > > > >> > With my drive_unplug patch we do:
> >> > > > > >> > 
> >> > > > > >> > 1. disconnect host resource from device
> >> > > > > >> 
> >> > > > > >> This is what drive_unplug does, right?
> >> > > > > >
> >> > > > > > Correct.
> >> > > > > >
> >> > > > > >> 
> >> > > > > >> > 2. device_del (attempt to remove device)
> >> > > > > >> > 3. notify guest
> >> > > > > >> > 4. if guest responds, remove device
> >> > > > > >> > 
> >> > > > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> >> > > > > >> > device_del)
> >> > > > > >> > 
> >> > > > > >> > 1. device_del (attemp to remove device)
> >> > > > > >> > 2. notify guest
> >> > > > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> >> > > > > >> > 4. if guest responds, invoke device destruction path a second time.
> >> > > > > >> 
> >> > > > > >> By response you mean eject?  No, this is not what I was suggesting.
> >> > > > > >> I was really suggesting that your patch is fine :)
> >> > > > > >> Sorry about confusion.
> >> > > > > >
> >> > > > > > I don't mean eject; I mean responding to the ACPI event by writing a
> >> > > > > > response to the PCI chipset which QEMU then in turn will invoke the
> >> > > > > > qdev_unplug() path which ultimately kills the device and the Drive and
> >> > > > > > BlockState objects.
> >> > > > > >
> >> > > > > >> 
> >> > > > > >> I was also saying that from what I hear, the pci express support
> >> > > > > >> will at some point need interfaces to
> >> > > > > >> - notify guest about device removal/addition
> >> > > > > >> - get eject from guest
> >> > > > > >> - remove device without talking to guest
> >> > > > > >> - add device without talking to guest
> >> > > > > >> - suppress device deletion on eject
> >> > > > > >> 
> >> > > > > >> All this can be generic and can work through express
> >> > > > > >> configuration mechanisms or through acpi for pci.
> >> > > > > >> But this is completely separate from unplugging
> >> > > > > >> the host backend, which should be possible at any point.
> >> > > > > >
> >> > > > > > Yes.  I think we've worked out that we do want an independent
> >> > > > > > unplug/disconnect mechanism rather than tying it to device_del.
> >> > > > > >
> >> > > > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> >> > > > > > and that instead of having device_del always succeed and replacing it
> >> > > > > > with a shell, we'd need to provide an explicit command to do the
> >> > > > > > disconnect in a similar fashion to how we're doing drive_unplug?
> >> > > > > 
> >> > > > > I'm not sure I parse this.
> >> > > > 
> >> > > > You were asking for net and block disconnect to have similar mechanisms.
> >> > > > You mentioned the net fix for suprise removal was to have device_del()
> >> > > > always succeed by replacing the device with a shell/zombie.  The
> >> > > > drive_unplug() patch doesn't do the same thing; it doesn't affect the
> >> > > > device_del() path at all, rather it provides mgmt apps a hook to
> >> > > > directly disconnect host resource from guest resource.
> >> > > 
> >> > > Yes, the shell thing is just an implementation detail.
> >> > 
> >> > ok.  What qemu monitor command do I call for net delete to do the
> >> > "disconnect/unplug"?
> >> 
> >> 
> >> netdev_del
> >
> > OK.  With netdev_del and drive_unplug commands (not sure if we care to
> > change the names to be similar, maybe blockdev_del) in qemu, we can then
> > implement the following in libvirt:
> >
> > 1) detach-device invocation
> > 2) issue device_del to QEMU
> > 2a) notification is sent)
> > 3) issue netdev_del/blockdev_del as appropriate for the device type
> > 4) update guest XML to indicate device has been removed
> >
> > And a fancier version would look like:
> >
> > 1) detach-device invocation
> > 2) issue device_del to QEMU
> > 2a) notification is sent)
> > 3) set a timeout for guest to respond
> > 4) when timeout expires
> > 4a) check if the pci device has been removed by quering QEMU
> >     if it hasn't been removed, issue netdev_del/blockdev_del
> > 5) update guest XML to indicate device has been removed
> >
> >
> > in both cases, I think we'll also want a patch that validates that the
> > pci slot is available before handing it out again; this will handle the
> > case where the guest doesn't respond to the device removal request; our
> > net/blockdev_del command will break the host/guest association, but we
> > don't want to attempt to attach a device to the same slot.
> >
> > Marcus, do you think we're at a point where the mechanisms for
> > explicitly revoking access to the host resource is consistent between
> > net and block?
> >
> > If so, then I suppose I might have a consmetic patch to fix up the
> > monitor command name to line up with the netdev_del.
> 
> I'd be fine with any of these:
> 
> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>    device ID from any host parts.  Nice touch: you don't have to know
>    about the device's host part(s) to disconnect it.  But it might be
>    more work than the other two.

This is sort of what netdev_del() and drive_unplug() are today; we're
just saying sever the connection of this device id.   

I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
was looking at libvirt and the right call to netdev_del is already
in-place; I'd just need to re-spin my block patch to call blockdev_del()
after invoking device_del() to match what is done for net.

> 
> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>    to disconnect a host part from a guest device.  Like (1), except you
>    have to point to the other end of the connection to cut it.

What's the advantage here? We need an additional piece of info (host
part) in addition to the device id?

> 
> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>    keep the host part around.  Moreover, delete is slightly dangerous,
>    because it renders any guest device still using the host part
>    useless.

Hrm, I thought that's what (1) is.  Well, either (1) or (3); I'd like to
rename drive_unplug() to blockdev_del() since they're similar function
w.r.t removing access to the host resource.  And we can invoke them in
the same way from libvirt (after doing guest notification, remove
access).

> 
> Do you need anything else from me to make progress?

I think just an agreement on the approach; shouldn't take more than a
few hours to respin the qemu and libvirt side.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 14:17                                               ` Michael S. Tsirkin
@ 2010-11-05 14:29                                                 ` Ryan Harper
  2010-11-05 16:01                                                 ` Markus Armbruster
  1 sibling, 0 replies; 60+ messages in thread
From: Ryan Harper @ 2010-11-05 14:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-05 09:18]:
> On Fri, Nov 05, 2010 at 02:27:49PM +0100, Markus Armbruster wrote:
> > Ryan Harper <ryanh@us.ibm.com> writes:
> > 
> > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 16:46]:
> > >> On Wed, Nov 03, 2010 at 03:59:29PM -0500, Ryan Harper wrote:
> > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 13:03]:
> > >> > > On Wed, Nov 03, 2010 at 12:29:10PM -0500, Ryan Harper wrote:
> > >> > > > * Markus Armbruster <armbru@redhat.com> [2010-11-03 11:42]:
> > >> > > > > Ryan Harper <ryanh@us.ibm.com> writes:
> > >> > > > > 
> > >> > > > > > * Michael S. Tsirkin <mst@redhat.com> [2010-11-03 02:22]:
> > >> > > > > >> On Tue, Nov 02, 2010 at 03:23:38PM -0500, Ryan Harper wrote:
> > >> > > > > >> > * Michael S. Tsirkin <mst@redhat.com> [2010-11-02 14:18]:
> > >> > > > > >> > > On Tue, Nov 02, 2010 at 02:01:08PM -0500, Ryan Harper wrote:
> > >> > > > > >> > > > > > > > I like the idea of disconnect; if part of the device_del method was to
> > >> > > > > >> > > > > > > > invoke a disconnect method, we could implement that for block, net, etc;
> > >> > > > > >> > > > > > > > 
> > >> > > > > >> > > > > > > > I'd think we'd want to send the notification, then disconnect.
> > >> > > > > >> > > > > > > > Struggling with whether it's worth having some reasonable timeout
> > >> > > > > >> > > > > > > > between notification and disconnect.  
> > >> > > > > >> > > > > > > 
> > >> > > > > >> > > > > > > The problem with this is that it has no analog in real world.
> > >> > > > > >> > > > > > > In real world, you can send some notifications to the guest, and you can
> > >> > > > > >> > > > > > > remove the card.  Tying them together is what created the problem in the
> > >> > > > > >> > > > > > > first place.
> > >> > > > > >> > > > > > > 
> > >> > > > > >> > > > > > > Timeouts can be implemented by management, maybe with a nice dialog
> > >> > > > > >> > > > > > > being shown to the user.
> > >> > > > > >> > > > > > 
> > >> > > > > >> > > > > > Very true.  I'm fine with forcing a disconnect during the removal path
> > >> > > > > >> > > > > > prior to notification.  Do we want a new disconnect method at the device
> > >> > > > > >> > > > > > level (pci)? or just use the existing removal callback and call that
> > >> > > > > >> > > > > > during the initial hotremov event?
> > >> > > > > >> > > > > 
> > >> > > > > >> > > > > Not sure what you mean by that, but I don't see a device doing anything
> > >> > > > > >> > > > > differently wrt surprise or ordered removal. So probably the existing
> > >> > > > > >> > > > > callback should do. I don't think we need to talk about disconnect:
> > >> > > > > >> > > > > since we decided we are emulating device removal, let's call it
> > >> > > > > >> > > > > just that.
> > >> > > > > >> > > > 
> > >> > > > > >> > > > Because current the "removal" process depends on the guest actually
> > >> > > > > >> > > > responding.  What I'm suggesting is that, in Marcus's term, and what
> > >> > > > > >> > > > drive_unplug() implements, is to disconnect the host block device from
> > >> > > > > >> > > > the guest device to prevent any further access to it in the case the
> > >> > > > > >> > > > guest doesn't respond to the removal request made via ACPI.
> > >> > > > > >> > > > 
> > >> > > > > >> > > > Very specifically, what we're suggesting instead of the drive_unplug()
> > >> > > > > >> > > > command so to complete the device removal operation without waiting for
> > >> > > > > >> > > > the guest to respond; that's what's going to happen if we invoke the
> > >> > > > > >> > > > response callback; it will appear as if the guest responded whether it
> > >> > > > > >> > > > did or not.
> > >> > > > > >> > > > 
> > >> > > > > >> > > > What I was suggesting above was to instead of calling the callback for
> > >> > > > > >> > > > handing the guest response was to add a device function called
> > >> > > > > >> > > > disconnect which would remove any association of host resources from
> > >> > > > > >> > > > guest resources before we notified the guest.  Thinking about it again
> > >> > > > > >> > > > I'm not sure this is useful, but if we're going to remove the device
> > >> > > > > >> > > > without the guests knowledge, I'm not sure how useful sending the
> > >> > > > > >> > > > removal requests via ACPI is in the first place.
> > >> > > > > >> > > > 
> > >> > > > > >> > > > My feeling is that I'd like to have explicit control over the disconnect
> > >> > > > > >> > > > from host resources separate from the device removal *if* we're going to
> > >> > > > > >> > > > retain the guest notification.  If we don't care to notify the guest,
> > >> > > > > >> > > > then we can just do device removal without notifying the guest
> > >> > > > > >> > > > and be done with it.
> > >> > > > > >> > > 
> > >> > > > > >> > > I imagine management would typically want to do this:
> > >> > > > > >> > > 1. notify guest
> > >> > > > > >> > > 2. wait a bit
> > >> > > > > >> > > 3. remove device
> > >> > > > > >> > 
> > >> > > > > >> > Yes; but this argues for (1) being a separate command from (3)
> > >> > > > > >> 
> > >> > > > > >> Yes. Long term I think we will want a way to do that.
> > >> > > > > >> 
> > >> > > > > >> > unless we
> > >> > > > > >> > require (3) to include (1) and (2) in the qemu implementation.
> > >> > > > > >> > 
> > >> > > > > >> > Currently we implement:
> > >> > > > > >> > 
> > >> > > > > >> > 1. device_del (attempt to remove device)
> > >> > > > > >> > 2. notify guest
> > >> > > > > >> > 3. if guest responds, remove device
> > >> > > > > >> > 4. disconnect host resource from device on destruction
> > >> > > > > >> > 
> > >> > > > > >> > With my drive_unplug patch we do:
> > >> > > > > >> > 
> > >> > > > > >> > 1. disconnect host resource from device
> > >> > > > > >> 
> > >> > > > > >> This is what drive_unplug does, right?
> > >> > > > > >
> > >> > > > > > Correct.
> > >> > > > > >
> > >> > > > > >> 
> > >> > > > > >> > 2. device_del (attempt to remove device)
> > >> > > > > >> > 3. notify guest
> > >> > > > > >> > 4. if guest responds, remove device
> > >> > > > > >> > 
> > >> > > > > >> > I think we're suggesting to instead do (if we keep disconnect as part of
> > >> > > > > >> > device_del)
> > >> > > > > >> > 
> > >> > > > > >> > 1. device_del (attemp to remove device)
> > >> > > > > >> > 2. notify guest
> > >> > > > > >> > 3. invoke device destruction callback resulting in disconnect host resource from device
> > >> > > > > >> > 4. if guest responds, invoke device destruction path a second time.
> > >> > > > > >> 
> > >> > > > > >> By response you mean eject?  No, this is not what I was suggesting.
> > >> > > > > >> I was really suggesting that your patch is fine :)
> > >> > > > > >> Sorry about confusion.
> > >> > > > > >
> > >> > > > > > I don't mean eject; I mean responding to the ACPI event by writing a
> > >> > > > > > response to the PCI chipset which QEMU then in turn will invoke the
> > >> > > > > > qdev_unplug() path which ultimately kills the device and the Drive and
> > >> > > > > > BlockState objects.
> > >> > > > > >
> > >> > > > > >> 
> > >> > > > > >> I was also saying that from what I hear, the pci express support
> > >> > > > > >> will at some point need interfaces to
> > >> > > > > >> - notify guest about device removal/addition
> > >> > > > > >> - get eject from guest
> > >> > > > > >> - remove device without talking to guest
> > >> > > > > >> - add device without talking to guest
> > >> > > > > >> - suppress device deletion on eject
> > >> > > > > >> 
> > >> > > > > >> All this can be generic and can work through express
> > >> > > > > >> configuration mechanisms or through acpi for pci.
> > >> > > > > >> But this is completely separate from unplugging
> > >> > > > > >> the host backend, which should be possible at any point.
> > >> > > > > >
> > >> > > > > > Yes.  I think we've worked out that we do want an independent
> > >> > > > > > unplug/disconnect mechanism rather than tying it to device_del.
> > >> > > > > >
> > >> > > > > > Marcus, it sounds like then you wanted to see a net_unplug/disconnect
> > >> > > > > > and that instead of having device_del always succeed and replacing it
> > >> > > > > > with a shell, we'd need to provide an explicit command to do the
> > >> > > > > > disconnect in a similar fashion to how we're doing drive_unplug?
> > >> > > > > 
> > >> > > > > I'm not sure I parse this.
> > >> > > > 
> > >> > > > You were asking for net and block disconnect to have similar mechanisms.
> > >> > > > You mentioned the net fix for suprise removal was to have device_del()
> > >> > > > always succeed by replacing the device with a shell/zombie.  The
> > >> > > > drive_unplug() patch doesn't do the same thing; it doesn't affect the
> > >> > > > device_del() path at all, rather it provides mgmt apps a hook to
> > >> > > > directly disconnect host resource from guest resource.
> > >> > > 
> > >> > > Yes, the shell thing is just an implementation detail.
> > >> > 
> > >> > ok.  What qemu monitor command do I call for net delete to do the
> > >> > "disconnect/unplug"?
> > >> 
> > >> 
> > >> netdev_del
> > >
> > > OK.  With netdev_del and drive_unplug commands (not sure if we care to
> > > change the names to be similar, maybe blockdev_del) in qemu, we can then
> > > implement the following in libvirt:
> > >
> > > 1) detach-device invocation
> > > 2) issue device_del to QEMU
> > > 2a) notification is sent)
> > > 3) issue netdev_del/blockdev_del as appropriate for the device type
> > > 4) update guest XML to indicate device has been removed
> > >
> > > And a fancier version would look like:
> > >
> > > 1) detach-device invocation
> > > 2) issue device_del to QEMU
> > > 2a) notification is sent)
> > > 3) set a timeout for guest to respond
> > > 4) when timeout expires
> > > 4a) check if the pci device has been removed by quering QEMU
> > >     if it hasn't been removed, issue netdev_del/blockdev_del
> > > 5) update guest XML to indicate device has been removed
> > >
> > >
> > > in both cases, I think we'll also want a patch that validates that the
> > > pci slot is available before handing it out again; this will handle the
> > > case where the guest doesn't respond to the device removal request; our
> > > net/blockdev_del command will break the host/guest association, but we
> > > don't want to attempt to attach a device to the same slot.
> > >
> > > Marcus, do you think we're at a point where the mechanisms for
> > > explicitly revoking access to the host resource is consistent between
> > > net and block?
> > >
> > > If so, then I suppose I might have a consmetic patch to fix up the
> > > monitor command name to line up with the netdev_del.
> > 
> > I'd be fine with any of these:
> > 
> > 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >    device ID from any host parts.  Nice touch: you don't have to know
> >    about the device's host part(s) to disconnect it.  But it might be
> >    more work than the other two.
> > 
> > 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >    to disconnect a host part from a guest device.  Like (1), except you
> >    have to point to the other end of the connection to cut it.
> 
> I think it's cleaner not to introduce a concept of a disconnected
> backend.
> 
> One thing that we must be careful to explicitly disallow, is
> reconnecting guest to another host backend. The reason being
> that guest might rely on backend features and changing these
> would break this.
> 
> Given that, disconnecting without delete isn't helpful.
> 
> > 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >    keep the host part around.  Moreover, delete is slightly dangerous,
> >    because it renders any guest device still using the host part
> >    useless.
> 
> I don't see how it's more dangerous than disconnecting.
> If guest can't access the backend it might not exist
> as far as guest is concerned.
> 
> > Do you need anything else from me to make progress?
> 
> Let's go for 3. Need for 1/2 seems dubious, and it's much harder
> to support.

Other than naming I thought (1) and (3) were the same; but if the current
netdev_del() is considered (3), then I'm for renaming drive_unplug to
blockdev_del (or drive_del).


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 14:17                                               ` Michael S. Tsirkin
  2010-11-05 14:29                                                 ` Ryan Harper
@ 2010-11-05 16:01                                                 ` Markus Armbruster
  2010-11-08 21:02                                                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-11-05 16:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Fri, Nov 05, 2010 at 02:27:49PM +0100, Markus Armbruster wrote:
>> I'd be fine with any of these:
>> 
>> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>>    device ID from any host parts.  Nice touch: you don't have to know
>>    about the device's host part(s) to disconnect it.  But it might be
>>    more work than the other two.
>> 
>> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>>    to disconnect a host part from a guest device.  Like (1), except you
>>    have to point to the other end of the connection to cut it.
>
> I think it's cleaner not to introduce a concept of a disconnected
> backend.

Backends start disconnected, so the concept already exists.

> One thing that we must be careful to explicitly disallow, is
> reconnecting guest to another host backend. The reason being
> that guest might rely on backend features and changing these
> would break this.
>
> Given that, disconnecting without delete isn't helpful.

What about disconnect, hot plug new device, connect?

>> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>>    keep the host part around.  Moreover, delete is slightly dangerous,
>>    because it renders any guest device still using the host part
>>    useless.
>
> I don't see how it's more dangerous than disconnecting.
> If guest can't access the backend it might not exist
> as far as guest is concerned.

If we keep disconnect and delete separate operations, we can make delete
fail when still connected.  Typo insurance.

>> Do you need anything else from me to make progress?
>
> Let's go for 3. Need for 1/2 seems dubious, and it's much harder
> to support.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 14:25                                               ` Ryan Harper
@ 2010-11-05 16:10                                                 ` Markus Armbruster
  2010-11-05 16:22                                                   ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-11-05 16:10 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
>> I'd be fine with any of these:
>> 
>> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>>    device ID from any host parts.  Nice touch: you don't have to know
>>    about the device's host part(s) to disconnect it.  But it might be
>>    more work than the other two.
>
> This is sort of what netdev_del() and drive_unplug() are today; we're
> just saying sever the connection of this device id.   

No, I have netdev_del as (3).

All three options are "sort of" the same, just different commands with
a common purpose.

> I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> was looking at libvirt and the right call to netdev_del is already
> in-place; I'd just need to re-spin my block patch to call blockdev_del()
> after invoking device_del() to match what is done for net.

Unless I'm missing something, you can't just rename: your unplug does
not delete the host part.

>> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>>    to disconnect a host part from a guest device.  Like (1), except you
>>    have to point to the other end of the connection to cut it.
>
> What's the advantage here? We need an additional piece of info (host
> part) in addition to the device id?

That's a disadvantage.

Possible advantage: implementation could be slightly easier than (1),
because you don't have to find the host parts.

>> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>>    keep the host part around.  Moreover, delete is slightly dangerous,
>>    because it renders any guest device still using the host part
>>    useless.
>
> Hrm, I thought that's what (1) is.

No.

With (1), the argument is a *device* ID, and we disconnect *all* host
parts connected to this device (typically just one).

With (3), the argument is a netdev/drive ID, and disconnect *this* host
part from the peer device.

>                                     Well, either (1) or (3); I'd like to
> rename drive_unplug() to blockdev_del() since they're similar function
> w.r.t removing access to the host resource.  And we can invoke them in
> the same way from libvirt (after doing guest notification, remove
> access).

I'd call it drive_del for now, to match drive_add.

>> Do you need anything else from me to make progress?
>
> I think just an agreement on the approach; shouldn't take more than a
> few hours to respin the qemu and libvirt side.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 16:10                                                 ` Markus Armbruster
@ 2010-11-05 16:22                                                   ` Ryan Harper
  2010-11-06  8:18                                                     ` Markus Armbruster
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-05 16:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> >> I'd be fine with any of these:
> >> 
> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >>    device ID from any host parts.  Nice touch: you don't have to know
> >>    about the device's host part(s) to disconnect it.  But it might be
> >>    more work than the other two.
> >
> > This is sort of what netdev_del() and drive_unplug() are today; we're
> > just saying sever the connection of this device id.   
> 
> No, I have netdev_del as (3).
> 
> All three options are "sort of" the same, just different commands with
> a common purpose.
> 
> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> > was looking at libvirt and the right call to netdev_del is already
> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> > after invoking device_del() to match what is done for net.
> 
> Unless I'm missing something, you can't just rename: your unplug does
> not delete the host part.
> 
> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >>    to disconnect a host part from a guest device.  Like (1), except you
> >>    have to point to the other end of the connection to cut it.
> >
> > What's the advantage here? We need an additional piece of info (host
> > part) in addition to the device id?
> 
> That's a disadvantage.
> 
> Possible advantage: implementation could be slightly easier than (1),
> because you don't have to find the host parts.
> 
> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> >>    because it renders any guest device still using the host part
> >>    useless.
> >
> > Hrm, I thought that's what (1) is.
> 
> No.
> 
> With (1), the argument is a *device* ID, and we disconnect *all* host
> parts connected to this device (typically just one).
> 
> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> part from the peer device.
> 
> >                                     Well, either (1) or (3); I'd like to
> > rename drive_unplug() to blockdev_del() since they're similar function
> > w.r.t removing access to the host resource.  And we can invoke them in
> > the same way from libvirt (after doing guest notification, remove
> > access).
> 
> I'd call it drive_del for now, to match drive_add.

OK, drive_del() and as you mentioned, drive_unplug will take out the
block driver, but doesn't remove the dinfo object; that ends up dying
when we call the device destructor.  I think for symmetry we'll want
drive_del to remove the dinfo object as well.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 16:22                                                   ` Ryan Harper
@ 2010-11-06  8:18                                                     ` Markus Armbruster
  2010-11-08  2:19                                                       ` Ryan Harper
  0 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-11-06  8:18 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
>> Ryan Harper <ryanh@us.ibm.com> writes:
>> 
>> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
>> >> I'd be fine with any of these:
>> >> 
>> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>> >>    device ID from any host parts.  Nice touch: you don't have to know
>> >>    about the device's host part(s) to disconnect it.  But it might be
>> >>    more work than the other two.
>> >
>> > This is sort of what netdev_del() and drive_unplug() are today; we're
>> > just saying sever the connection of this device id.   
>> 
>> No, I have netdev_del as (3).
>> 
>> All three options are "sort of" the same, just different commands with
>> a common purpose.
>> 
>> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
>> > was looking at libvirt and the right call to netdev_del is already
>> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
>> > after invoking device_del() to match what is done for net.
>> 
>> Unless I'm missing something, you can't just rename: your unplug does
>> not delete the host part.
>> 
>> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>> >>    to disconnect a host part from a guest device.  Like (1), except you
>> >>    have to point to the other end of the connection to cut it.
>> >
>> > What's the advantage here? We need an additional piece of info (host
>> > part) in addition to the device id?
>> 
>> That's a disadvantage.
>> 
>> Possible advantage: implementation could be slightly easier than (1),
>> because you don't have to find the host parts.
>> 
>> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>> >>    keep the host part around.  Moreover, delete is slightly dangerous,
>> >>    because it renders any guest device still using the host part
>> >>    useless.
>> >
>> > Hrm, I thought that's what (1) is.
>> 
>> No.
>> 
>> With (1), the argument is a *device* ID, and we disconnect *all* host
>> parts connected to this device (typically just one).
>> 
>> With (3), the argument is a netdev/drive ID, and disconnect *this* host
>> part from the peer device.
>> 
>> >                                     Well, either (1) or (3); I'd like to
>> > rename drive_unplug() to blockdev_del() since they're similar function
>> > w.r.t removing access to the host resource.  And we can invoke them in
>> > the same way from libvirt (after doing guest notification, remove
>> > access).
>> 
>> I'd call it drive_del for now, to match drive_add.
>
> OK, drive_del() and as you mentioned, drive_unplug will take out the
> block driver, but doesn't remove the dinfo object; that ends up dying
> when we call the device destructor.  I think for symmetry we'll want
> drive_del to remove the dinfo object as well.

Exactly.

a. bdrv_detach() to zap the pointer from bdrv to qdev
b. zap the pointer from qdev to bdrv
c. drive_uninit() to dispose of the host part

Step b could be awkward with (3), because you don't know device details.
I guess you have to search device properties for a drive property
pointing to bdrv.  I like (1) because it puts that loop in the one place
where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
Except for netdev_del, which is special because of VLANs.

To avoid step b, you could try to keep the bdrv around in a special
zombie state.  Still have to free the dinfo, but can't use
drive_uninit() for that then.

If you think I'm overcomplicating this, feel free to prove me wrong with
working code :)

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-06  8:18                                                     ` Markus Armbruster
@ 2010-11-08  2:19                                                       ` Ryan Harper
  2010-11-08 10:32                                                         ` Markus Armbruster
  0 siblings, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-08  2:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> >> >> I'd be fine with any of these:
> >> >> 
> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >> >>    device ID from any host parts.  Nice touch: you don't have to know
> >> >>    about the device's host part(s) to disconnect it.  But it might be
> >> >>    more work than the other two.
> >> >
> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
> >> > just saying sever the connection of this device id.   
> >> 
> >> No, I have netdev_del as (3).
> >> 
> >> All three options are "sort of" the same, just different commands with
> >> a common purpose.
> >> 
> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> >> > was looking at libvirt and the right call to netdev_del is already
> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> >> > after invoking device_del() to match what is done for net.
> >> 
> >> Unless I'm missing something, you can't just rename: your unplug does
> >> not delete the host part.
> >> 
> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >> >>    to disconnect a host part from a guest device.  Like (1), except you
> >> >>    have to point to the other end of the connection to cut it.
> >> >
> >> > What's the advantage here? We need an additional piece of info (host
> >> > part) in addition to the device id?
> >> 
> >> That's a disadvantage.
> >> 
> >> Possible advantage: implementation could be slightly easier than (1),
> >> because you don't have to find the host parts.
> >> 
> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> >> >>    because it renders any guest device still using the host part
> >> >>    useless.
> >> >
> >> > Hrm, I thought that's what (1) is.
> >> 
> >> No.
> >> 
> >> With (1), the argument is a *device* ID, and we disconnect *all* host
> >> parts connected to this device (typically just one).
> >> 
> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> >> part from the peer device.
> >> 
> >> >                                     Well, either (1) or (3); I'd like to
> >> > rename drive_unplug() to blockdev_del() since they're similar function
> >> > w.r.t removing access to the host resource.  And we can invoke them in
> >> > the same way from libvirt (after doing guest notification, remove
> >> > access).
> >> 
> >> I'd call it drive_del for now, to match drive_add.
> >
> > OK, drive_del() and as you mentioned, drive_unplug will take out the
> > block driver, but doesn't remove the dinfo object; that ends up dying
> > when we call the device destructor.  I think for symmetry we'll want
> > drive_del to remove the dinfo object as well.
> 
> Exactly.
> 
> a. bdrv_detach() to zap the pointer from bdrv to qdev
> b. zap the pointer from qdev to bdrv
> c. drive_uninit() to dispose of the host part

a-c need to be done to match netdev_del symmetry?  How hard of a req is
this?

> 
> Step b could be awkward with (3), because you don't know device details.
> I guess you have to search device properties for a drive property
> pointing to bdrv.  I like (1) because it puts that loop in the one place
> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
> Except for netdev_del, which is special because of VLANs.
> 
> To avoid step b, you could try to keep the bdrv around in a special
> zombie state.  Still have to free the dinfo, but can't use
> drive_uninit() for that then.
> 
> If you think I'm overcomplicating this, feel free to prove me wrong with
> working code :)

drive_unplug() works as-is today; so it does feel very combursome at
this point.  Other than the name change and agreement on how mgmt should
invoke the command, it's been a long ride to get here.

I'll take my best shot at trying to clean up the other
pointers and objects; though on one of my attempts when I took out the
dinfo() object that didn't go so well; going to have to audit who uses
dinfo and where and what they check before calling it to have a proper
cleanup that doesn't remove the whole device altogether.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08  2:19                                                       ` Ryan Harper
@ 2010-11-08 10:32                                                         ` Markus Armbruster
  2010-11-08 10:49                                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 60+ messages in thread
From: Markus Armbruster @ 2010-11-08 10:32 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

Ryan Harper <ryanh@us.ibm.com> writes:

> * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
>> Ryan Harper <ryanh@us.ibm.com> writes:
>> 
>> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
>> >> Ryan Harper <ryanh@us.ibm.com> writes:
>> >> 
>> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
>> >> >> I'd be fine with any of these:
>> >> >> 
>> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>> >> >>    device ID from any host parts.  Nice touch: you don't have to know
>> >> >>    about the device's host part(s) to disconnect it.  But it might be
>> >> >>    more work than the other two.
>> >> >
>> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
>> >> > just saying sever the connection of this device id.   
>> >> 
>> >> No, I have netdev_del as (3).
>> >> 
>> >> All three options are "sort of" the same, just different commands with
>> >> a common purpose.
>> >> 
>> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
>> >> > was looking at libvirt and the right call to netdev_del is already
>> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
>> >> > after invoking device_del() to match what is done for net.
>> >> 
>> >> Unless I'm missing something, you can't just rename: your unplug does
>> >> not delete the host part.
>> >> 
>> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>> >> >>    to disconnect a host part from a guest device.  Like (1), except you
>> >> >>    have to point to the other end of the connection to cut it.
>> >> >
>> >> > What's the advantage here? We need an additional piece of info (host
>> >> > part) in addition to the device id?
>> >> 
>> >> That's a disadvantage.
>> >> 
>> >> Possible advantage: implementation could be slightly easier than (1),
>> >> because you don't have to find the host parts.
>> >> 
>> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
>> >> >>    because it renders any guest device still using the host part
>> >> >>    useless.
>> >> >
>> >> > Hrm, I thought that's what (1) is.
>> >> 
>> >> No.
>> >> 
>> >> With (1), the argument is a *device* ID, and we disconnect *all* host
>> >> parts connected to this device (typically just one).
>> >> 
>> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
>> >> part from the peer device.
>> >> 
>> >> >                                     Well, either (1) or (3); I'd like to
>> >> > rename drive_unplug() to blockdev_del() since they're similar function
>> >> > w.r.t removing access to the host resource.  And we can invoke them in
>> >> > the same way from libvirt (after doing guest notification, remove
>> >> > access).
>> >> 
>> >> I'd call it drive_del for now, to match drive_add.
>> >
>> > OK, drive_del() and as you mentioned, drive_unplug will take out the
>> > block driver, but doesn't remove the dinfo object; that ends up dying
>> > when we call the device destructor.  I think for symmetry we'll want
>> > drive_del to remove the dinfo object as well.
>> 
>> Exactly.
>> 
>> a. bdrv_detach() to zap the pointer from bdrv to qdev
>> b. zap the pointer from qdev to bdrv
>> c. drive_uninit() to dispose of the host part
>
> a-c need to be done to match netdev_del symmetry?  How hard of a req is
> this?

Without (c), it's not a delete.  And (c) without (b) leaves a dangling
pointer.  (c) without (a) fails an assertion in bdrv_delete().

Aside: (b) should probably be folded into bdrv_detach().

>> Step b could be awkward with (3), because you don't know device details.
>> I guess you have to search device properties for a drive property
>> pointing to bdrv.  I like (1) because it puts that loop in the one place
>> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
>> Except for netdev_del, which is special because of VLANs.
>> 
>> To avoid step b, you could try to keep the bdrv around in a special
>> zombie state.  Still have to free the dinfo, but can't use
>> drive_uninit() for that then.
>> 
>> If you think I'm overcomplicating this, feel free to prove me wrong with
>> working code :)
>
> drive_unplug() works as-is today; so it does feel very combursome at
> this point.  Other than the name change and agreement on how mgmt should
> invoke the command, it's been a long ride to get here.

Sometimes it takes a tough man to make a tender chicken.

> I'll take my best shot at trying to clean up the other
> pointers and objects; though on one of my attempts when I took out the
> dinfo() object that didn't go so well; going to have to audit who uses
> dinfo and where and what they check before calling it to have a proper
> cleanup that doesn't remove the whole device altogether.

Steps a, b, c are the result of my (admittedly quick) audit.

Here's how the various objects are connected to each other:

               contains
drivelist    -----------> DriveInfo
                                |
                                | .bdrv
                                | .id == .bdrv->device_name
                                |
               contains         V
bdrv_states  -----------> BlockDriverState
                             |   ^
                       .peer |   |
                             |   |                          host part
-----------------------------|---|-----------------------------------
                             |   |                         guest part
                             |   | property "drive"
                             v   |
                          DeviceState

To disconnect host from guest part, you need to cut both pointers.  To
delete the host part, you need to delete both objects, BlockDriverState
and DriveInfo.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 10:32                                                         ` Markus Armbruster
@ 2010-11-08 10:49                                                           ` Michael S. Tsirkin
  2010-11-08 12:03                                                             ` Markus Armbruster
  0 siblings, 1 reply; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-08 10:49 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi

On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
> Ryan Harper <ryanh@us.ibm.com> writes:
> 
> > * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> >> 
> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> >> >> >> I'd be fine with any of these:
> >> >> >> 
> >> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >> >> >>    device ID from any host parts.  Nice touch: you don't have to know
> >> >> >>    about the device's host part(s) to disconnect it.  But it might be
> >> >> >>    more work than the other two.
> >> >> >
> >> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
> >> >> > just saying sever the connection of this device id.   
> >> >> 
> >> >> No, I have netdev_del as (3).
> >> >> 
> >> >> All three options are "sort of" the same, just different commands with
> >> >> a common purpose.
> >> >> 
> >> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> >> >> > was looking at libvirt and the right call to netdev_del is already
> >> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> >> >> > after invoking device_del() to match what is done for net.
> >> >> 
> >> >> Unless I'm missing something, you can't just rename: your unplug does
> >> >> not delete the host part.
> >> >> 
> >> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >> >> >>    to disconnect a host part from a guest device.  Like (1), except you
> >> >> >>    have to point to the other end of the connection to cut it.
> >> >> >
> >> >> > What's the advantage here? We need an additional piece of info (host
> >> >> > part) in addition to the device id?
> >> >> 
> >> >> That's a disadvantage.
> >> >> 
> >> >> Possible advantage: implementation could be slightly easier than (1),
> >> >> because you don't have to find the host parts.
> >> >> 
> >> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> >> >> >>    because it renders any guest device still using the host part
> >> >> >>    useless.
> >> >> >
> >> >> > Hrm, I thought that's what (1) is.
> >> >> 
> >> >> No.
> >> >> 
> >> >> With (1), the argument is a *device* ID, and we disconnect *all* host
> >> >> parts connected to this device (typically just one).
> >> >> 
> >> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> >> >> part from the peer device.
> >> >> 
> >> >> >                                     Well, either (1) or (3); I'd like to
> >> >> > rename drive_unplug() to blockdev_del() since they're similar function
> >> >> > w.r.t removing access to the host resource.  And we can invoke them in
> >> >> > the same way from libvirt (after doing guest notification, remove
> >> >> > access).
> >> >> 
> >> >> I'd call it drive_del for now, to match drive_add.
> >> >
> >> > OK, drive_del() and as you mentioned, drive_unplug will take out the
> >> > block driver, but doesn't remove the dinfo object; that ends up dying
> >> > when we call the device destructor.  I think for symmetry we'll want
> >> > drive_del to remove the dinfo object as well.
> >> 
> >> Exactly.
> >> 
> >> a. bdrv_detach() to zap the pointer from bdrv to qdev
> >> b. zap the pointer from qdev to bdrv
> >> c. drive_uninit() to dispose of the host part
> >
> > a-c need to be done to match netdev_del symmetry?  How hard of a req is
> > this?
> 
> Without (c), it's not a delete.  And (c) without (b) leaves a dangling
> pointer.  (c) without (a) fails an assertion in bdrv_delete().
> 
> Aside: (b) should probably be folded into bdrv_detach().
> 
> >> Step b could be awkward with (3), because you don't know device details.
> >> I guess you have to search device properties for a drive property
> >> pointing to bdrv.  I like (1) because it puts that loop in the one place
> >> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
> >> Except for netdev_del, which is special because of VLANs.
> >> 
> >> To avoid step b, you could try to keep the bdrv around in a special
> >> zombie state.  Still have to free the dinfo, but can't use
> >> drive_uninit() for that then.
> >> 
> >> If you think I'm overcomplicating this, feel free to prove me wrong with
> >> working code :)
> >
> > drive_unplug() works as-is today; so it does feel very combursome at
> > this point.  Other than the name change and agreement on how mgmt should
> > invoke the command, it's been a long ride to get here.
> 
> Sometimes it takes a tough man to make a tender chicken.

> > I'll take my best shot at trying to clean up the other
> > pointers and objects; though on one of my attempts when I took out the
> > dinfo() object that didn't go so well; going to have to audit who uses
> > dinfo and where and what they check before calling it to have a proper
> > cleanup that doesn't remove the whole device altogether.
> 
> Steps a, b, c are the result of my (admittedly quick) audit.
> 
> Here's how the various objects are connected to each other:
> 
>                contains
> drivelist    -----------> DriveInfo
>                                 |
>                                 | .bdrv
>                                 | .id == .bdrv->device_name
>                                 |
>                contains         V
> bdrv_states  -----------> BlockDriverState
>                              |   ^
>                        .peer |   |
>                              |   |                          host part
> -----------------------------|---|-----------------------------------
>                              |   |                         guest part
>                              |   | property "drive"
>                              v   |
>                           DeviceState
> 
> To disconnect host from guest part, you need to cut both pointers.  To
> delete the host part, you need to delete both objects, BlockDriverState
> and DriveInfo.


If we remove DriveInfo, how can management later detect that guest part
was deleted? If you want symmetry with netdev, it's possible to keep a
shell of BlockDriverState/DriveInfo around (solving dangling pointer
problems).


-- 
MST

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 10:49                                                           ` Michael S. Tsirkin
@ 2010-11-08 12:03                                                             ` Markus Armbruster
  2010-11-08 14:02                                                               ` Ryan Harper
  2010-11-08 16:34                                                               ` Michael S. Tsirkin
  0 siblings, 2 replies; 60+ messages in thread
From: Markus Armbruster @ 2010-11-08 12:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
>> Ryan Harper <ryanh@us.ibm.com> writes:
>> 
>> > * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
>> >> Ryan Harper <ryanh@us.ibm.com> writes:
>> >> 
>> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
>> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
>> >> >> 
>> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
>> >> >> >> I'd be fine with any of these:
>> >> >> >> 
>> >> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
>> >> >> >>    device ID from any host parts.  Nice touch: you don't have to know
>> >> >> >>    about the device's host part(s) to disconnect it.  But it might be
>> >> >> >>    more work than the other two.
>> >> >> >
>> >> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
>> >> >> > just saying sever the connection of this device id.   
>> >> >> 
>> >> >> No, I have netdev_del as (3).
>> >> >> 
>> >> >> All three options are "sort of" the same, just different commands with
>> >> >> a common purpose.
>> >> >> 
>> >> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
>> >> >> > was looking at libvirt and the right call to netdev_del is already
>> >> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
>> >> >> > after invoking device_del() to match what is done for net.
>> >> >> 
>> >> >> Unless I'm missing something, you can't just rename: your unplug does
>> >> >> not delete the host part.
>> >> >> 
>> >> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
>> >> >> >>    to disconnect a host part from a guest device.  Like (1), except you
>> >> >> >>    have to point to the other end of the connection to cut it.
>> >> >> >
>> >> >> > What's the advantage here? We need an additional piece of info (host
>> >> >> > part) in addition to the device id?
>> >> >> 
>> >> >> That's a disadvantage.
>> >> >> 
>> >> >> Possible advantage: implementation could be slightly easier than (1),
>> >> >> because you don't have to find the host parts.
>> >> >> 
>> >> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
>> >> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
>> >> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
>> >> >> >>    because it renders any guest device still using the host part
>> >> >> >>    useless.
>> >> >> >
>> >> >> > Hrm, I thought that's what (1) is.
>> >> >> 
>> >> >> No.
>> >> >> 
>> >> >> With (1), the argument is a *device* ID, and we disconnect *all* host
>> >> >> parts connected to this device (typically just one).
>> >> >> 
>> >> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
>> >> >> part from the peer device.
>> >> >> 
>> >> >> >                                     Well, either (1) or (3); I'd like to
>> >> >> > rename drive_unplug() to blockdev_del() since they're similar function
>> >> >> > w.r.t removing access to the host resource.  And we can invoke them in
>> >> >> > the same way from libvirt (after doing guest notification, remove
>> >> >> > access).
>> >> >> 
>> >> >> I'd call it drive_del for now, to match drive_add.
>> >> >
>> >> > OK, drive_del() and as you mentioned, drive_unplug will take out the
>> >> > block driver, but doesn't remove the dinfo object; that ends up dying
>> >> > when we call the device destructor.  I think for symmetry we'll want
>> >> > drive_del to remove the dinfo object as well.
>> >> 
>> >> Exactly.
>> >> 
>> >> a. bdrv_detach() to zap the pointer from bdrv to qdev
>> >> b. zap the pointer from qdev to bdrv
>> >> c. drive_uninit() to dispose of the host part
>> >
>> > a-c need to be done to match netdev_del symmetry?  How hard of a req is
>> > this?
>> 
>> Without (c), it's not a delete.  And (c) without (b) leaves a dangling
>> pointer.  (c) without (a) fails an assertion in bdrv_delete().
>> 
>> Aside: (b) should probably be folded into bdrv_detach().
>> 
>> >> Step b could be awkward with (3), because you don't know device details.
>> >> I guess you have to search device properties for a drive property
>> >> pointing to bdrv.  I like (1) because it puts that loop in the one place
>> >> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
>> >> Except for netdev_del, which is special because of VLANs.
>> >> 
>> >> To avoid step b, you could try to keep the bdrv around in a special
>> >> zombie state.  Still have to free the dinfo, but can't use
>> >> drive_uninit() for that then.
>> >> 
>> >> If you think I'm overcomplicating this, feel free to prove me wrong with
>> >> working code :)
>> >
>> > drive_unplug() works as-is today; so it does feel very combursome at
>> > this point.  Other than the name change and agreement on how mgmt should
>> > invoke the command, it's been a long ride to get here.
>> 
>> Sometimes it takes a tough man to make a tender chicken.
>
>> > I'll take my best shot at trying to clean up the other
>> > pointers and objects; though on one of my attempts when I took out the
>> > dinfo() object that didn't go so well; going to have to audit who uses
>> > dinfo and where and what they check before calling it to have a proper
>> > cleanup that doesn't remove the whole device altogether.
>> 
>> Steps a, b, c are the result of my (admittedly quick) audit.
>> 
>> Here's how the various objects are connected to each other:
>> 
>>                contains
>> drivelist    -----------> DriveInfo
>>                                 |
>>                                 | .bdrv
>>                                 | .id == .bdrv->device_name
>>                                 |
>>                contains         V
>> bdrv_states  -----------> BlockDriverState
>>                              |   ^
>>                        .peer |   |
>>                              |   |                          host part
>> -----------------------------|---|-----------------------------------
>>                              |   |                         guest part
>>                              |   | property "drive"
>>                              v   |
>>                           DeviceState
>> 
>> To disconnect host from guest part, you need to cut both pointers.  To
>> delete the host part, you need to delete both objects, BlockDriverState
>> and DriveInfo.
>
>
> If we remove DriveInfo, how can management later detect that guest part
> was deleted?

Directly: check whether the qdev is gone.

I don't know how to check that indirectly, via DriveInfo.

>              If you want symmetry with netdev, it's possible to keep a
> shell of BlockDriverState/DriveInfo around (solving dangling pointer
> problems).

netdev_del deletes the host network part:

    (qemu) info network
    Devices not on any VLAN:
      net.0: net=10.0.2.0, restricted=n peer=nic.0
      nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
    (qemu) netdev_del net.0
    (qemu) info network
    Devices not on any VLAN:
      nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0

It leaves around the VLAN object.  Since qdev property points to that,
it doesn't dangle.

In my opinion, drive_del should make the drive vanish from "info block",
just like netdev_del makes the netdev vanish from "info network".  And
that means deleting it from bdrv_states.  Whether we delete it
alltogether (which is what I sketched), or turn it into a zombie is a
separate question.  Both work for me.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 12:03                                                             ` Markus Armbruster
@ 2010-11-08 14:02                                                               ` Ryan Harper
  2010-11-08 16:56                                                                 ` Michael S. Tsirkin
  2010-11-08 16:34                                                               ` Michael S. Tsirkin
  1 sibling, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-08 14:02 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Markus Armbruster <armbru@redhat.com> [2010-11-08 06:04]:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> >> 
> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> >> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> >> >> 
> >> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> >> >> >> >> I'd be fine with any of these:
> >> >> >> >> 
> >> >> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >> >> >> >>    device ID from any host parts.  Nice touch: you don't have to know
> >> >> >> >>    about the device's host part(s) to disconnect it.  But it might be
> >> >> >> >>    more work than the other two.
> >> >> >> >
> >> >> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
> >> >> >> > just saying sever the connection of this device id.   
> >> >> >> 
> >> >> >> No, I have netdev_del as (3).
> >> >> >> 
> >> >> >> All three options are "sort of" the same, just different commands with
> >> >> >> a common purpose.
> >> >> >> 
> >> >> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> >> >> >> > was looking at libvirt and the right call to netdev_del is already
> >> >> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> >> >> >> > after invoking device_del() to match what is done for net.
> >> >> >> 
> >> >> >> Unless I'm missing something, you can't just rename: your unplug does
> >> >> >> not delete the host part.
> >> >> >> 
> >> >> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >> >> >> >>    to disconnect a host part from a guest device.  Like (1), except you
> >> >> >> >>    have to point to the other end of the connection to cut it.
> >> >> >> >
> >> >> >> > What's the advantage here? We need an additional piece of info (host
> >> >> >> > part) in addition to the device id?
> >> >> >> 
> >> >> >> That's a disadvantage.
> >> >> >> 
> >> >> >> Possible advantage: implementation could be slightly easier than (1),
> >> >> >> because you don't have to find the host parts.
> >> >> >> 
> >> >> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >> >> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >> >> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> >> >> >> >>    because it renders any guest device still using the host part
> >> >> >> >>    useless.
> >> >> >> >
> >> >> >> > Hrm, I thought that's what (1) is.
> >> >> >> 
> >> >> >> No.
> >> >> >> 
> >> >> >> With (1), the argument is a *device* ID, and we disconnect *all* host
> >> >> >> parts connected to this device (typically just one).
> >> >> >> 
> >> >> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> >> >> >> part from the peer device.
> >> >> >> 
> >> >> >> >                                     Well, either (1) or (3); I'd like to
> >> >> >> > rename drive_unplug() to blockdev_del() since they're similar function
> >> >> >> > w.r.t removing access to the host resource.  And we can invoke them in
> >> >> >> > the same way from libvirt (after doing guest notification, remove
> >> >> >> > access).
> >> >> >> 
> >> >> >> I'd call it drive_del for now, to match drive_add.
> >> >> >
> >> >> > OK, drive_del() and as you mentioned, drive_unplug will take out the
> >> >> > block driver, but doesn't remove the dinfo object; that ends up dying
> >> >> > when we call the device destructor.  I think for symmetry we'll want
> >> >> > drive_del to remove the dinfo object as well.
> >> >> 
> >> >> Exactly.
> >> >> 
> >> >> a. bdrv_detach() to zap the pointer from bdrv to qdev
> >> >> b. zap the pointer from qdev to bdrv
> >> >> c. drive_uninit() to dispose of the host part
> >> >
> >> > a-c need to be done to match netdev_del symmetry?  How hard of a req is
> >> > this?
> >> 
> >> Without (c), it's not a delete.  And (c) without (b) leaves a dangling
> >> pointer.  (c) without (a) fails an assertion in bdrv_delete().
> >> 
> >> Aside: (b) should probably be folded into bdrv_detach().
> >> 
> >> >> Step b could be awkward with (3), because you don't know device details.
> >> >> I guess you have to search device properties for a drive property
> >> >> pointing to bdrv.  I like (1) because it puts that loop in the one place
> >> >> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
> >> >> Except for netdev_del, which is special because of VLANs.
> >> >> 
> >> >> To avoid step b, you could try to keep the bdrv around in a special
> >> >> zombie state.  Still have to free the dinfo, but can't use
> >> >> drive_uninit() for that then.
> >> >> 
> >> >> If you think I'm overcomplicating this, feel free to prove me wrong with
> >> >> working code :)
> >> >
> >> > drive_unplug() works as-is today; so it does feel very combursome at
> >> > this point.  Other than the name change and agreement on how mgmt should
> >> > invoke the command, it's been a long ride to get here.
> >> 
> >> Sometimes it takes a tough man to make a tender chicken.
> >
> >> > I'll take my best shot at trying to clean up the other
> >> > pointers and objects; though on one of my attempts when I took out the
> >> > dinfo() object that didn't go so well; going to have to audit who uses
> >> > dinfo and where and what they check before calling it to have a proper
> >> > cleanup that doesn't remove the whole device altogether.
> >> 
> >> Steps a, b, c are the result of my (admittedly quick) audit.
> >> 
> >> Here's how the various objects are connected to each other:
> >> 
> >>                contains
> >> drivelist    -----------> DriveInfo
> >>                                 |
> >>                                 | .bdrv
> >>                                 | .id == .bdrv->device_name
> >>                                 |
> >>                contains         V
> >> bdrv_states  -----------> BlockDriverState
> >>                              |   ^
> >>                        .peer |   |
> >>                              |   |                          host part
> >> -----------------------------|---|-----------------------------------
> >>                              |   |                         guest part
> >>                              |   | property "drive"
> >>                              v   |
> >>                           DeviceState
> >> 
> >> To disconnect host from guest part, you need to cut both pointers.  To
> >> delete the host part, you need to delete both objects, BlockDriverState
> >> and DriveInfo.
> >
> >
> > If we remove DriveInfo, how can management later detect that guest part
> > was deleted?
> 
> Directly: check whether the qdev is gone.
> 
> I don't know how to check that indirectly, via DriveInfo.
> 
> >              If you want symmetry with netdev, it's possible to keep a
> > shell of BlockDriverState/DriveInfo around (solving dangling pointer
> > problems).
> 
> netdev_del deletes the host network part:
> 
>     (qemu) info network
>     Devices not on any VLAN:
>       net.0: net=10.0.2.0, restricted=n peer=nic.0
>       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
>     (qemu) netdev_del net.0
>     (qemu) info network
>     Devices not on any VLAN:
>       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> 
> It leaves around the VLAN object.  Since qdev property points to that,
> it doesn't dangle.
> 
> In my opinion, drive_del should make the drive vanish from "info block",

Yeah; that's the right thing to do here.  Let me respin the patch with
the name change and the additional work to fix up the pointers and
ensure that we don't see the drive in info block.

> just like netdev_del makes the netdev vanish from "info network".  And
> that means deleting it from bdrv_states.  Whether we delete it
> alltogether (which is what I sketched), or turn it into a zombie is a
> separate question.  Both work for me.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 12:03                                                             ` Markus Armbruster
  2010-11-08 14:02                                                               ` Ryan Harper
@ 2010-11-08 16:34                                                               ` Michael S. Tsirkin
  1 sibling, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-08 16:34 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi

On Mon, Nov 08, 2010 at 01:03:18PM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> 
> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> >> 
> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> >> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> >> >> >> 
> >> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> >> >> >> >> I'd be fine with any of these:
> >> >> >> >> 
> >> >> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >> >> >> >>    device ID from any host parts.  Nice touch: you don't have to know
> >> >> >> >>    about the device's host part(s) to disconnect it.  But it might be
> >> >> >> >>    more work than the other two.
> >> >> >> >
> >> >> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
> >> >> >> > just saying sever the connection of this device id.   
> >> >> >> 
> >> >> >> No, I have netdev_del as (3).
> >> >> >> 
> >> >> >> All three options are "sort of" the same, just different commands with
> >> >> >> a common purpose.
> >> >> >> 
> >> >> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> >> >> >> > was looking at libvirt and the right call to netdev_del is already
> >> >> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> >> >> >> > after invoking device_del() to match what is done for net.
> >> >> >> 
> >> >> >> Unless I'm missing something, you can't just rename: your unplug does
> >> >> >> not delete the host part.
> >> >> >> 
> >> >> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >> >> >> >>    to disconnect a host part from a guest device.  Like (1), except you
> >> >> >> >>    have to point to the other end of the connection to cut it.
> >> >> >> >
> >> >> >> > What's the advantage here? We need an additional piece of info (host
> >> >> >> > part) in addition to the device id?
> >> >> >> 
> >> >> >> That's a disadvantage.
> >> >> >> 
> >> >> >> Possible advantage: implementation could be slightly easier than (1),
> >> >> >> because you don't have to find the host parts.
> >> >> >> 
> >> >> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >> >> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >> >> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> >> >> >> >>    because it renders any guest device still using the host part
> >> >> >> >>    useless.
> >> >> >> >
> >> >> >> > Hrm, I thought that's what (1) is.
> >> >> >> 
> >> >> >> No.
> >> >> >> 
> >> >> >> With (1), the argument is a *device* ID, and we disconnect *all* host
> >> >> >> parts connected to this device (typically just one).
> >> >> >> 
> >> >> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> >> >> >> part from the peer device.
> >> >> >> 
> >> >> >> >                                     Well, either (1) or (3); I'd like to
> >> >> >> > rename drive_unplug() to blockdev_del() since they're similar function
> >> >> >> > w.r.t removing access to the host resource.  And we can invoke them in
> >> >> >> > the same way from libvirt (after doing guest notification, remove
> >> >> >> > access).
> >> >> >> 
> >> >> >> I'd call it drive_del for now, to match drive_add.
> >> >> >
> >> >> > OK, drive_del() and as you mentioned, drive_unplug will take out the
> >> >> > block driver, but doesn't remove the dinfo object; that ends up dying
> >> >> > when we call the device destructor.  I think for symmetry we'll want
> >> >> > drive_del to remove the dinfo object as well.
> >> >> 
> >> >> Exactly.
> >> >> 
> >> >> a. bdrv_detach() to zap the pointer from bdrv to qdev
> >> >> b. zap the pointer from qdev to bdrv
> >> >> c. drive_uninit() to dispose of the host part
> >> >
> >> > a-c need to be done to match netdev_del symmetry?  How hard of a req is
> >> > this?
> >> 
> >> Without (c), it's not a delete.  And (c) without (b) leaves a dangling
> >> pointer.  (c) without (a) fails an assertion in bdrv_delete().
> >> 
> >> Aside: (b) should probably be folded into bdrv_detach().
> >> 
> >> >> Step b could be awkward with (3), because you don't know device details.
> >> >> I guess you have to search device properties for a drive property
> >> >> pointing to bdrv.  I like (1) because it puts that loop in the one place
> >> >> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
> >> >> Except for netdev_del, which is special because of VLANs.
> >> >> 
> >> >> To avoid step b, you could try to keep the bdrv around in a special
> >> >> zombie state.  Still have to free the dinfo, but can't use
> >> >> drive_uninit() for that then.
> >> >> 
> >> >> If you think I'm overcomplicating this, feel free to prove me wrong with
> >> >> working code :)
> >> >
> >> > drive_unplug() works as-is today; so it does feel very combursome at
> >> > this point.  Other than the name change and agreement on how mgmt should
> >> > invoke the command, it's been a long ride to get here.
> >> 
> >> Sometimes it takes a tough man to make a tender chicken.
> >
> >> > I'll take my best shot at trying to clean up the other
> >> > pointers and objects; though on one of my attempts when I took out the
> >> > dinfo() object that didn't go so well; going to have to audit who uses
> >> > dinfo and where and what they check before calling it to have a proper
> >> > cleanup that doesn't remove the whole device altogether.
> >> 
> >> Steps a, b, c are the result of my (admittedly quick) audit.
> >> 
> >> Here's how the various objects are connected to each other:
> >> 
> >>                contains
> >> drivelist    -----------> DriveInfo
> >>                                 |
> >>                                 | .bdrv
> >>                                 | .id == .bdrv->device_name
> >>                                 |
> >>                contains         V
> >> bdrv_states  -----------> BlockDriverState
> >>                              |   ^
> >>                        .peer |   |
> >>                              |   |                          host part
> >> -----------------------------|---|-----------------------------------
> >>                              |   |                         guest part
> >>                              |   | property "drive"
> >>                              v   |
> >>                           DeviceState
> >> 
> >> To disconnect host from guest part, you need to cut both pointers.  To
> >> delete the host part, you need to delete both objects, BlockDriverState
> >> and DriveInfo.
> >
> >
> > If we remove DriveInfo, how can management later detect that guest part
> > was deleted?
> 
> Directly: check whether the qdev is gone.

With info qdev?
I am not at all sure we want management to do that, it'll
require that we keep the output stable.
info block is already parsed, it's easier for management
to look there.

> I don't know how to check that indirectly, via DriveInfo.
> 
> >              If you want symmetry with netdev, it's possible to keep a
> > shell of BlockDriverState/DriveInfo around (solving dangling pointer
> > problems).
> 
> netdev_del deletes the host network part:
> 
>     (qemu) info network
>     Devices not on any VLAN:
>       net.0: net=10.0.2.0, restricted=n peer=nic.0
>       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
>     (qemu) netdev_del net.0
>     (qemu) info network
>     Devices not on any VLAN:
>       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> 
> It leaves around the VLAN object.  Since qdev property points to that,
> it doesn't dangle.
> 
> In my opinion, drive_del should make the drive vanish from "info block",
> just like netdev_del makes the netdev vanish from "info network".

Yes but we need to have something left on info block IMO.

>  And
> that means deleting it from bdrv_states.  Whether we delete it
> alltogether (which is what I sketched), or turn it into a zombie is a
> separate question.  Both work for me.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 14:02                                                               ` Ryan Harper
@ 2010-11-08 16:56                                                                 ` Michael S. Tsirkin
  2010-11-08 17:04                                                                   ` Daniel P. Berrange
  2010-11-08 18:39                                                                   ` Ryan Harper
  0 siblings, 2 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-08 16:56 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Stefan Hajnoczi

On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
> * Markus Armbruster <armbru@redhat.com> [2010-11-08 06:04]:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > 
> > > On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
> > >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> 
> > >> > * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
> > >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> >> 
> > >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> > >> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> > >> >> >> 
> > >> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> > >> >> >> >> I'd be fine with any of these:
> > >> >> >> >> 
> > >> >> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> > >> >> >> >>    device ID from any host parts.  Nice touch: you don't have to know
> > >> >> >> >>    about the device's host part(s) to disconnect it.  But it might be
> > >> >> >> >>    more work than the other two.
> > >> >> >> >
> > >> >> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
> > >> >> >> > just saying sever the connection of this device id.   
> > >> >> >> 
> > >> >> >> No, I have netdev_del as (3).
> > >> >> >> 
> > >> >> >> All three options are "sort of" the same, just different commands with
> > >> >> >> a common purpose.
> > >> >> >> 
> > >> >> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> > >> >> >> > was looking at libvirt and the right call to netdev_del is already
> > >> >> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> > >> >> >> > after invoking device_del() to match what is done for net.
> > >> >> >> 
> > >> >> >> Unless I'm missing something, you can't just rename: your unplug does
> > >> >> >> not delete the host part.
> > >> >> >> 
> > >> >> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> > >> >> >> >>    to disconnect a host part from a guest device.  Like (1), except you
> > >> >> >> >>    have to point to the other end of the connection to cut it.
> > >> >> >> >
> > >> >> >> > What's the advantage here? We need an additional piece of info (host
> > >> >> >> > part) in addition to the device id?
> > >> >> >> 
> > >> >> >> That's a disadvantage.
> > >> >> >> 
> > >> >> >> Possible advantage: implementation could be slightly easier than (1),
> > >> >> >> because you don't have to find the host parts.
> > >> >> >> 
> > >> >> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> > >> >> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> > >> >> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> > >> >> >> >>    because it renders any guest device still using the host part
> > >> >> >> >>    useless.
> > >> >> >> >
> > >> >> >> > Hrm, I thought that's what (1) is.
> > >> >> >> 
> > >> >> >> No.
> > >> >> >> 
> > >> >> >> With (1), the argument is a *device* ID, and we disconnect *all* host
> > >> >> >> parts connected to this device (typically just one).
> > >> >> >> 
> > >> >> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> > >> >> >> part from the peer device.
> > >> >> >> 
> > >> >> >> >                                     Well, either (1) or (3); I'd like to
> > >> >> >> > rename drive_unplug() to blockdev_del() since they're similar function
> > >> >> >> > w.r.t removing access to the host resource.  And we can invoke them in
> > >> >> >> > the same way from libvirt (after doing guest notification, remove
> > >> >> >> > access).
> > >> >> >> 
> > >> >> >> I'd call it drive_del for now, to match drive_add.
> > >> >> >
> > >> >> > OK, drive_del() and as you mentioned, drive_unplug will take out the
> > >> >> > block driver, but doesn't remove the dinfo object; that ends up dying
> > >> >> > when we call the device destructor.  I think for symmetry we'll want
> > >> >> > drive_del to remove the dinfo object as well.
> > >> >> 
> > >> >> Exactly.
> > >> >> 
> > >> >> a. bdrv_detach() to zap the pointer from bdrv to qdev
> > >> >> b. zap the pointer from qdev to bdrv
> > >> >> c. drive_uninit() to dispose of the host part
> > >> >
> > >> > a-c need to be done to match netdev_del symmetry?  How hard of a req is
> > >> > this?
> > >> 
> > >> Without (c), it's not a delete.  And (c) without (b) leaves a dangling
> > >> pointer.  (c) without (a) fails an assertion in bdrv_delete().
> > >> 
> > >> Aside: (b) should probably be folded into bdrv_detach().
> > >> 
> > >> >> Step b could be awkward with (3), because you don't know device details.
> > >> >> I guess you have to search device properties for a drive property
> > >> >> pointing to bdrv.  I like (1) because it puts that loop in the one place
> > >> >> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
> > >> >> Except for netdev_del, which is special because of VLANs.
> > >> >> 
> > >> >> To avoid step b, you could try to keep the bdrv around in a special
> > >> >> zombie state.  Still have to free the dinfo, but can't use
> > >> >> drive_uninit() for that then.
> > >> >> 
> > >> >> If you think I'm overcomplicating this, feel free to prove me wrong with
> > >> >> working code :)
> > >> >
> > >> > drive_unplug() works as-is today; so it does feel very combursome at
> > >> > this point.  Other than the name change and agreement on how mgmt should
> > >> > invoke the command, it's been a long ride to get here.
> > >> 
> > >> Sometimes it takes a tough man to make a tender chicken.
> > >
> > >> > I'll take my best shot at trying to clean up the other
> > >> > pointers and objects; though on one of my attempts when I took out the
> > >> > dinfo() object that didn't go so well; going to have to audit who uses
> > >> > dinfo and where and what they check before calling it to have a proper
> > >> > cleanup that doesn't remove the whole device altogether.
> > >> 
> > >> Steps a, b, c are the result of my (admittedly quick) audit.
> > >> 
> > >> Here's how the various objects are connected to each other:
> > >> 
> > >>                contains
> > >> drivelist    -----------> DriveInfo
> > >>                                 |
> > >>                                 | .bdrv
> > >>                                 | .id == .bdrv->device_name
> > >>                                 |
> > >>                contains         V
> > >> bdrv_states  -----------> BlockDriverState
> > >>                              |   ^
> > >>                        .peer |   |
> > >>                              |   |                          host part
> > >> -----------------------------|---|-----------------------------------
> > >>                              |   |                         guest part
> > >>                              |   | property "drive"
> > >>                              v   |
> > >>                           DeviceState
> > >> 
> > >> To disconnect host from guest part, you need to cut both pointers.  To
> > >> delete the host part, you need to delete both objects, BlockDriverState
> > >> and DriveInfo.
> > >
> > >
> > > If we remove DriveInfo, how can management later detect that guest part
> > > was deleted?
> > 
> > Directly: check whether the qdev is gone.
> > 
> > I don't know how to check that indirectly, via DriveInfo.
> > 
> > >              If you want symmetry with netdev, it's possible to keep a
> > > shell of BlockDriverState/DriveInfo around (solving dangling pointer
> > > problems).
> > 
> > netdev_del deletes the host network part:
> > 
> >     (qemu) info network
> >     Devices not on any VLAN:
> >       net.0: net=10.0.2.0, restricted=n peer=nic.0
> >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> >     (qemu) netdev_del net.0
> >     (qemu) info network
> >     Devices not on any VLAN:
> >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > 
> > It leaves around the VLAN object.  Since qdev property points to that,
> > it doesn't dangle.
> > 
> > In my opinion, drive_del should make the drive vanish from "info block",
> 
> Yeah; that's the right thing to do here.  Let me respin the patch with
> the name change and the additional work to fix up the pointers and
> ensure that we don't see the drive in info block.

Daniel, I'd like your input here: can you live with
device diappearing from info block and parsing
qdev tree info to figure out whether device is really gone?

> > just like netdev_del makes the netdev vanish from "info network".  And
> > that means deleting it from bdrv_states.  Whether we delete it
> > alltogether (which is what I sketched), or turn it into a zombie is a
> > separate question.  Both work for me.
> 
> 
> -- 
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 16:56                                                                 ` Michael S. Tsirkin
@ 2010-11-08 17:04                                                                   ` Daniel P. Berrange
  2010-11-08 18:41                                                                     ` Ryan Harper
  2010-11-08 18:39                                                                   ` Ryan Harper
  1 sibling, 1 reply; 60+ messages in thread
From: Daniel P. Berrange @ 2010-11-08 17:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

On Mon, Nov 08, 2010 at 06:56:02PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
> > * Markus Armbruster <armbru@redhat.com> [2010-11-08 06:04]:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > >> Here's how the various objects are connected to each other:
> > > >> 
> > > >>                contains
> > > >> drivelist    -----------> DriveInfo
> > > >>                                 |
> > > >>                                 | .bdrv
> > > >>                                 | .id == .bdrv->device_name
> > > >>                                 |
> > > >>                contains         V
> > > >> bdrv_states  -----------> BlockDriverState
> > > >>                              |   ^
> > > >>                        .peer |   |
> > > >>                              |   |                          host part
> > > >> -----------------------------|---|-----------------------------------
> > > >>                              |   |                         guest part
> > > >>                              |   | property "drive"
> > > >>                              v   |
> > > >>                           DeviceState
> > > >> 
> > > >> To disconnect host from guest part, you need to cut both pointers.  To
> > > >> delete the host part, you need to delete both objects, BlockDriverState
> > > >> and DriveInfo.
> > > >
> > > >
> > > > If we remove DriveInfo, how can management later detect that guest part
> > > > was deleted?
> > > 
> > > Directly: check whether the qdev is gone.
> > > 
> > > I don't know how to check that indirectly, via DriveInfo.
> > > 
> > > >              If you want symmetry with netdev, it's possible to keep a
> > > > shell of BlockDriverState/DriveInfo around (solving dangling pointer
> > > > problems).
> > > 
> > > netdev_del deletes the host network part:
> > > 
> > >     (qemu) info network
> > >     Devices not on any VLAN:
> > >       net.0: net=10.0.2.0, restricted=n peer=nic.0
> > >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > >     (qemu) netdev_del net.0
> > >     (qemu) info network
> > >     Devices not on any VLAN:
> > >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > > 
> > > It leaves around the VLAN object.  Since qdev property points to that,
> > > it doesn't dangle.
> > > 
> > > In my opinion, drive_del should make the drive vanish from "info block",
> > 
> > Yeah; that's the right thing to do here.  Let me respin the patch with
> > the name change and the additional work to fix up the pointers and
> > ensure that we don't see the drive in info block.
> 
> Daniel, I'd like your input here: can you live with
> device diappearing from info block and parsing
> qdev tree info to figure out whether device is really gone?

We don't use info block for anything. Having to parse the full qdev tree
to determine if a single device is gone seems rather tedious. It would
be better if query-qdev took an optional argument, which is the name
of the device to root the tree at. Then checking whether a device
named 'foo' is gone just means running 'query-qdev foo' and seeing if
that returns an error about the device not existing, then we know it
has gone. No need to parse anything. Being able to query the qdev data
for a single device, or sub-tree of devices seems useful in its own
right.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 16:56                                                                 ` Michael S. Tsirkin
  2010-11-08 17:04                                                                   ` Daniel P. Berrange
@ 2010-11-08 18:39                                                                   ` Ryan Harper
  2010-11-08 19:06                                                                     ` Daniel P. Berrange
  1 sibling, 1 reply; 60+ messages in thread
From: Ryan Harper @ 2010-11-08 18:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Wolf, yamahata, Markus Armbruster, qemu-devel,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi

* Michael S. Tsirkin <mst@redhat.com> [2010-11-08 10:57]:
> On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
> > * Markus Armbruster <armbru@redhat.com> [2010-11-08 06:04]:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > 
> > > > On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
> > > >> Ryan Harper <ryanh@us.ibm.com> writes:
> > > >> 
> > > >> > * Markus Armbruster <armbru@redhat.com> [2010-11-06 04:19]:
> > > >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> > > >> >> 
> > > >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 11:11]:
> > > >> >> >> Ryan Harper <ryanh@us.ibm.com> writes:
> > > >> >> >> 
> > > >> >> >> > * Markus Armbruster <armbru@redhat.com> [2010-11-05 08:28]:
> > > >> >> >> >> I'd be fine with any of these:
> > > >> >> >> >> 
> > > >> >> >> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> > > >> >> >> >>    device ID from any host parts.  Nice touch: you don't have to know
> > > >> >> >> >>    about the device's host part(s) to disconnect it.  But it might be
> > > >> >> >> >>    more work than the other two.
> > > >> >> >> >
> > > >> >> >> > This is sort of what netdev_del() and drive_unplug() are today; we're
> > > >> >> >> > just saying sever the connection of this device id.   
> > > >> >> >> 
> > > >> >> >> No, I have netdev_del as (3).
> > > >> >> >> 
> > > >> >> >> All three options are "sort of" the same, just different commands with
> > > >> >> >> a common purpose.
> > > >> >> >> 
> > > >> >> >> > I'd like to rename drive_unplug() to blockdev_del() and call it done.  I
> > > >> >> >> > was looking at libvirt and the right call to netdev_del is already
> > > >> >> >> > in-place; I'd just need to re-spin my block patch to call blockdev_del()
> > > >> >> >> > after invoking device_del() to match what is done for net.
> > > >> >> >> 
> > > >> >> >> Unless I'm missing something, you can't just rename: your unplug does
> > > >> >> >> not delete the host part.
> > > >> >> >> 
> > > >> >> >> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> > > >> >> >> >>    to disconnect a host part from a guest device.  Like (1), except you
> > > >> >> >> >>    have to point to the other end of the connection to cut it.
> > > >> >> >> >
> > > >> >> >> > What's the advantage here? We need an additional piece of info (host
> > > >> >> >> > part) in addition to the device id?
> > > >> >> >> 
> > > >> >> >> That's a disadvantage.
> > > >> >> >> 
> > > >> >> >> Possible advantage: implementation could be slightly easier than (1),
> > > >> >> >> because you don't have to find the host parts.
> > > >> >> >> 
> > > >> >> >> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> > > >> >> >> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> > > >> >> >> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> > > >> >> >> >>    because it renders any guest device still using the host part
> > > >> >> >> >>    useless.
> > > >> >> >> >
> > > >> >> >> > Hrm, I thought that's what (1) is.
> > > >> >> >> 
> > > >> >> >> No.
> > > >> >> >> 
> > > >> >> >> With (1), the argument is a *device* ID, and we disconnect *all* host
> > > >> >> >> parts connected to this device (typically just one).
> > > >> >> >> 
> > > >> >> >> With (3), the argument is a netdev/drive ID, and disconnect *this* host
> > > >> >> >> part from the peer device.
> > > >> >> >> 
> > > >> >> >> >                                     Well, either (1) or (3); I'd like to
> > > >> >> >> > rename drive_unplug() to blockdev_del() since they're similar function
> > > >> >> >> > w.r.t removing access to the host resource.  And we can invoke them in
> > > >> >> >> > the same way from libvirt (after doing guest notification, remove
> > > >> >> >> > access).
> > > >> >> >> 
> > > >> >> >> I'd call it drive_del for now, to match drive_add.
> > > >> >> >
> > > >> >> > OK, drive_del() and as you mentioned, drive_unplug will take out the
> > > >> >> > block driver, but doesn't remove the dinfo object; that ends up dying
> > > >> >> > when we call the device destructor.  I think for symmetry we'll want
> > > >> >> > drive_del to remove the dinfo object as well.
> > > >> >> 
> > > >> >> Exactly.
> > > >> >> 
> > > >> >> a. bdrv_detach() to zap the pointer from bdrv to qdev
> > > >> >> b. zap the pointer from qdev to bdrv
> > > >> >> c. drive_uninit() to dispose of the host part
> > > >> >
> > > >> > a-c need to be done to match netdev_del symmetry?  How hard of a req is
> > > >> > this?
> > > >> 
> > > >> Without (c), it's not a delete.  And (c) without (b) leaves a dangling
> > > >> pointer.  (c) without (a) fails an assertion in bdrv_delete().
> > > >> 
> > > >> Aside: (b) should probably be folded into bdrv_detach().
> > > >> 
> > > >> >> Step b could be awkward with (3), because you don't know device details.
> > > >> >> I guess you have to search device properties for a drive property
> > > >> >> pointing to bdrv.  I like (1) because it puts that loop in the one place
> > > >> >> where it belongs: qdev core.  (3) duplicates it in every HOSTDEV_del.
> > > >> >> Except for netdev_del, which is special because of VLANs.
> > > >> >> 
> > > >> >> To avoid step b, you could try to keep the bdrv around in a special
> > > >> >> zombie state.  Still have to free the dinfo, but can't use
> > > >> >> drive_uninit() for that then.
> > > >> >> 
> > > >> >> If you think I'm overcomplicating this, feel free to prove me wrong with
> > > >> >> working code :)
> > > >> >
> > > >> > drive_unplug() works as-is today; so it does feel very combursome at
> > > >> > this point.  Other than the name change and agreement on how mgmt should
> > > >> > invoke the command, it's been a long ride to get here.
> > > >> 
> > > >> Sometimes it takes a tough man to make a tender chicken.
> > > >
> > > >> > I'll take my best shot at trying to clean up the other
> > > >> > pointers and objects; though on one of my attempts when I took out the
> > > >> > dinfo() object that didn't go so well; going to have to audit who uses
> > > >> > dinfo and where and what they check before calling it to have a proper
> > > >> > cleanup that doesn't remove the whole device altogether.
> > > >> 
> > > >> Steps a, b, c are the result of my (admittedly quick) audit.
> > > >> 
> > > >> Here's how the various objects are connected to each other:
> > > >> 
> > > >>                contains
> > > >> drivelist    -----------> DriveInfo
> > > >>                                 |
> > > >>                                 | .bdrv
> > > >>                                 | .id == .bdrv->device_name
> > > >>                                 |
> > > >>                contains         V
> > > >> bdrv_states  -----------> BlockDriverState
> > > >>                              |   ^
> > > >>                        .peer |   |
> > > >>                              |   |                          host part
> > > >> -----------------------------|---|-----------------------------------
> > > >>                              |   |                         guest part
> > > >>                              |   | property "drive"
> > > >>                              v   |
> > > >>                           DeviceState
> > > >> 
> > > >> To disconnect host from guest part, you need to cut both pointers.  To
> > > >> delete the host part, you need to delete both objects, BlockDriverState
> > > >> and DriveInfo.
> > > >
> > > >
> > > > If we remove DriveInfo, how can management later detect that guest part
> > > > was deleted?
> > > 
> > > Directly: check whether the qdev is gone.
> > > 
> > > I don't know how to check that indirectly, via DriveInfo.
> > > 
> > > >              If you want symmetry with netdev, it's possible to keep a
> > > > shell of BlockDriverState/DriveInfo around (solving dangling pointer
> > > > problems).
> > > 
> > > netdev_del deletes the host network part:
> > > 
> > >     (qemu) info network
> > >     Devices not on any VLAN:
> > >       net.0: net=10.0.2.0, restricted=n peer=nic.0
> > >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > >     (qemu) netdev_del net.0
> > >     (qemu) info network
> > >     Devices not on any VLAN:
> > >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > > 
> > > It leaves around the VLAN object.  Since qdev property points to that,
> > > it doesn't dangle.
> > > 
> > > In my opinion, drive_del should make the drive vanish from "info block",
> > 
> > Yeah; that's the right thing to do here.  Let me respin the patch with
> > the name change and the additional work to fix up the pointers and
> > ensure that we don't see the drive in info block.
> 
> Daniel, I'd like your input here: can you live with
> device diappearing from info block and parsing
> qdev tree info to figure out whether device is really gone?

AFAICT, libvirt doesn't look at or use info block at all.

I'd rather not have to add info block to libvirt; but currently I can't
see how else we can determine if we should call drive_unplug if we do a
device_del() and the guest removes it before we call drive_unplug().  

What happens is that the guest removes the device and when we call
drive_unplug() it fails to find the target device (since it was deleted
by the guest). Then we fail the PCiDelDisk and libvirt keeps the device
config around even though the guest has finished removing it.

The only way I see out of this is to either movethe severing of
host/guest before doing the delete (and notification).  Or adding some
code to parse info block and don't bother calling unplug if the
blockdevice is already deleted.


> 
> > > just like netdev_del makes the netdev vanish from "info network".  And
> > > that means deleting it from bdrv_states.  Whether we delete it
> > > alltogether (which is what I sketched), or turn it into a zombie is a
> > > separate question.  Both work for me.
> > 
> > 
> > -- 
> > Ryan Harper
> > Software Engineer; Linux Technology Center
> > IBM Corp., Austin, Tx
> > ryanh@us.ibm.com

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 17:04                                                                   ` Daniel P. Berrange
@ 2010-11-08 18:41                                                                     ` Ryan Harper
  0 siblings, 0 replies; 60+ messages in thread
From: Ryan Harper @ 2010-11-08 18:41 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, Markus Armbruster,
	Anthony Liguori, Ryan Harper, Stefan Hajnoczi, yamahata

* Daniel P. Berrange <berrange@redhat.com> [2010-11-08 11:05]:
> On Mon, Nov 08, 2010 at 06:56:02PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
> > > * Markus Armbruster <armbru@redhat.com> [2010-11-08 06:04]:
> > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > >> Here's how the various objects are connected to each other:
> > > > >> 
> > > > >>                contains
> > > > >> drivelist    -----------> DriveInfo
> > > > >>                                 |
> > > > >>                                 | .bdrv
> > > > >>                                 | .id == .bdrv->device_name
> > > > >>                                 |
> > > > >>                contains         V
> > > > >> bdrv_states  -----------> BlockDriverState
> > > > >>                              |   ^
> > > > >>                        .peer |   |
> > > > >>                              |   |                          host part
> > > > >> -----------------------------|---|-----------------------------------
> > > > >>                              |   |                         guest part
> > > > >>                              |   | property "drive"
> > > > >>                              v   |
> > > > >>                           DeviceState
> > > > >> 
> > > > >> To disconnect host from guest part, you need to cut both pointers.  To
> > > > >> delete the host part, you need to delete both objects, BlockDriverState
> > > > >> and DriveInfo.
> > > > >
> > > > >
> > > > > If we remove DriveInfo, how can management later detect that guest part
> > > > > was deleted?
> > > > 
> > > > Directly: check whether the qdev is gone.
> > > > 
> > > > I don't know how to check that indirectly, via DriveInfo.
> > > > 
> > > > >              If you want symmetry with netdev, it's possible to keep a
> > > > > shell of BlockDriverState/DriveInfo around (solving dangling pointer
> > > > > problems).
> > > > 
> > > > netdev_del deletes the host network part:
> > > > 
> > > >     (qemu) info network
> > > >     Devices not on any VLAN:
> > > >       net.0: net=10.0.2.0, restricted=n peer=nic.0
> > > >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > > >     (qemu) netdev_del net.0
> > > >     (qemu) info network
> > > >     Devices not on any VLAN:
> > > >       nic.0: model=virtio-net-pci,macaddr=52:54:00:12:34:56 peer=net.0
> > > > 
> > > > It leaves around the VLAN object.  Since qdev property points to that,
> > > > it doesn't dangle.
> > > > 
> > > > In my opinion, drive_del should make the drive vanish from "info block",
> > > 
> > > Yeah; that's the right thing to do here.  Let me respin the patch with
> > > the name change and the additional work to fix up the pointers and
> > > ensure that we don't see the drive in info block.
> > 
> > Daniel, I'd like your input here: can you live with
> > device diappearing from info block and parsing
> > qdev tree info to figure out whether device is really gone?
> 
> We don't use info block for anything. Having to parse the full qdev tree
> to determine if a single device is gone seems rather tedious. It would
> be better if query-qdev took an optional argument, which is the name
> of the device to root the tree at. Then checking whether a device
> named 'foo' is gone just means running 'query-qdev foo' and seeing if
> that returns an error about the device not existing, then we know it
> has gone. No need to parse anything. Being able to query the qdev data
> for a single device, or sub-tree of devices seems useful in its own
> right.

Since I'm not looking forward to parsing info block (easy) nor parsing
all of qdev tree (much harder) I really like the query approach.  

That makes it easy to put a query in the netdev_del/drive_del commands
to skip invoking them if the guest has already responded.

> 
> Regards,
> Daniel
> -- 
> |: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
> |: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
> |: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-08 18:39                                                                   ` Ryan Harper
@ 2010-11-08 19:06                                                                     ` Daniel P. Berrange
  0 siblings, 0 replies; 60+ messages in thread
From: Daniel P. Berrange @ 2010-11-08 19:06 UTC (permalink / raw)
  To: Ryan Harper
  Cc: Kevin Wolf, yamahata, Michael S. Tsirkin, qemu-devel,
	Markus Armbruster, Anthony Liguori, Stefan Hajnoczi

On Mon, Nov 08, 2010 at 12:39:01PM -0600, Ryan Harper wrote:
> * Michael S. Tsirkin <mst@redhat.com> [2010-11-08 10:57]:
> > On Mon, Nov 08, 2010 at 08:02:50AM -0600, Ryan Harper wrote:
> > > * Markus Armbruster <armbru@redhat.com> [2010-11-08 06:04]:
> > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > 
> > > > > On Mon, Nov 08, 2010 at 11:32:01AM +0100, Markus Armbruster wrote:
> > 
> > Daniel, I'd like your input here: can you live with
> > device diappearing from info block and parsing
> > qdev tree info to figure out whether device is really gone?
> 
> AFAICT, libvirt doesn't look at or use info block at all.
> 
> I'd rather not have to add info block to libvirt; but currently I can't
> see how else we can determine if we should call drive_unplug if we do a
> device_del() and the guest removes it before we call drive_unplug().  
> 
> What happens is that the guest removes the device and when we call
> drive_unplug() it fails to find the target device (since it was deleted
> by the guest). Then we fail the PCiDelDisk and libvirt keeps the device
> config around even though the guest has finished removing it.

This needs drive_unplug to return an explicitly identifiable
'no such device' error code, which libvirt can catch and
ignore.  Making the call to drive_unplug conditional on a
check to query-block/query-qdev is really a bug, because it
has an designed in race condition which means you need to 
check for a 'no such device' error code regardless. So it
is better to just blindly call drive_unplug and handle the
non-fatal failure conditions every time - this ensures that
codepath gets exercised more frequently too :-)

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal
  2010-11-05 16:01                                                 ` Markus Armbruster
@ 2010-11-08 21:02                                                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 60+ messages in thread
From: Michael S. Tsirkin @ 2010-11-08 21:02 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, yamahata, qemu-devel, Anthony Liguori, Ryan Harper,
	Stefan Hajnoczi

On Fri, Nov 05, 2010 at 05:01:49PM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Fri, Nov 05, 2010 at 02:27:49PM +0100, Markus Armbruster wrote:
> >> I'd be fine with any of these:
> >> 
> >> 1. A new command "device_disconnet ID" (or similar name) to disconnect
> >>    device ID from any host parts.  Nice touch: you don't have to know
> >>    about the device's host part(s) to disconnect it.  But it might be
> >>    more work than the other two.
> >> 
> >> 2. New commands netdev_disconnect, drive_disconnect (or similar names)
> >>    to disconnect a host part from a guest device.  Like (1), except you
> >>    have to point to the other end of the connection to cut it.
> >
> > I think it's cleaner not to introduce a concept of a disconnected
> > backend.
> 
> Backends start disconnected, so the concept already exists.
> 
> > One thing that we must be careful to explicitly disallow, is
> > reconnecting guest to another host backend. The reason being
> > that guest might rely on backend features and changing these
> > would break this.
> >
> > Given that, disconnecting without delete isn't helpful.
> 
> What about disconnect, hot plug new device, connect?

Exactly. I don't think we want to support this.
New device might not support all features that old one has.
Or it may have more features.

> >> 3. A new command "drive_del ID" similar to existing netdev_del.  This is
> >>    (2) fused with delete.  Conceptual wart: you can't disconnect and
> >>    keep the host part around.  Moreover, delete is slightly dangerous,
> >>    because it renders any guest device still using the host part
> >>    useless.
> >
> > I don't see how it's more dangerous than disconnecting.
> > If guest can't access the backend it might not exist
> > as far as guest is concerned.
> 
> If we keep disconnect and delete separate operations, we can make delete
> fail when still connected.  Typo insurance.
> 
> >> Do you need anything else from me to make progress?
> >
> > Let's go for 3. Need for 1/2 seems dubious, and it's much harder
> > to support.

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2010-11-08 21:36 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-25 18:22 [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Ryan Harper
2010-10-25 18:22 ` [Qemu-devel] [PATCH 1/3] v2 Add drive_get_by_id Ryan Harper
2010-10-29 13:18   ` Markus Armbruster
2010-10-25 18:22 ` [Qemu-devel] [PATCH 2/3] v2 Fix Block Hotplug race with drive_unplug() Ryan Harper
2010-10-29 14:01   ` Markus Armbruster
2010-10-29 14:15     ` Anthony Liguori
2010-10-29 14:29       ` Kevin Wolf
2010-10-29 14:40         ` Anthony Liguori
2010-10-29 14:57           ` Kevin Wolf
2010-10-29 15:28             ` Anthony Liguori
2010-10-29 16:08               ` Kevin Wolf
2010-10-30 13:25                 ` Christoph Hellwig
2010-10-29 15:28       ` Markus Armbruster
2010-11-01 21:06     ` Ryan Harper
2010-10-25 18:22 ` [Qemu-devel] [PATCH 3/3] Add qmp version of drive_unplug Ryan Harper
2010-10-29 14:12 ` [Qemu-devel] [PATCH 0/3] v4 Decouple block device removal from device removal Markus Armbruster
2010-10-29 15:03   ` Ryan Harper
2010-10-29 16:10     ` Markus Armbruster
2010-10-29 16:50       ` Ryan Harper
2010-11-02  9:40         ` Markus Armbruster
2010-11-02 13:22           ` Michael S. Tsirkin
2010-11-02 13:41           ` Kevin Wolf
2010-11-02 13:46           ` Ryan Harper
2010-11-02 13:58             ` Michael S. Tsirkin
2010-11-02 14:22               ` Ryan Harper
2010-11-02 15:46                 ` Michael S. Tsirkin
2010-11-02 16:53                   ` Ryan Harper
2010-11-02 17:59                     ` Michael S. Tsirkin
2010-11-02 19:01                       ` Ryan Harper
2010-11-02 19:17                         ` Michael S. Tsirkin
2010-11-02 20:23                           ` Ryan Harper
2010-11-03  7:21                             ` Michael S. Tsirkin
2010-11-03 12:04                               ` Ryan Harper
2010-11-03 16:41                                 ` Markus Armbruster
2010-11-03 17:29                                   ` Ryan Harper
2010-11-03 18:02                                     ` Michael S. Tsirkin
2010-11-03 20:59                                       ` Ryan Harper
2010-11-03 21:26                                         ` Michael S. Tsirkin
2010-11-04 16:45                                           ` Ryan Harper
2010-11-04 17:04                                             ` Michael S. Tsirkin
2010-11-05 13:27                                             ` Markus Armbruster
2010-11-05 14:17                                               ` Michael S. Tsirkin
2010-11-05 14:29                                                 ` Ryan Harper
2010-11-05 16:01                                                 ` Markus Armbruster
2010-11-08 21:02                                                   ` Michael S. Tsirkin
2010-11-05 14:25                                               ` Ryan Harper
2010-11-05 16:10                                                 ` Markus Armbruster
2010-11-05 16:22                                                   ` Ryan Harper
2010-11-06  8:18                                                     ` Markus Armbruster
2010-11-08  2:19                                                       ` Ryan Harper
2010-11-08 10:32                                                         ` Markus Armbruster
2010-11-08 10:49                                                           ` Michael S. Tsirkin
2010-11-08 12:03                                                             ` Markus Armbruster
2010-11-08 14:02                                                               ` Ryan Harper
2010-11-08 16:56                                                                 ` Michael S. Tsirkin
2010-11-08 17:04                                                                   ` Daniel P. Berrange
2010-11-08 18:41                                                                     ` Ryan Harper
2010-11-08 18:39                                                                   ` Ryan Harper
2010-11-08 19:06                                                                     ` Daniel P. Berrange
2010-11-08 16:34                                                               ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.