All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
@ 2017-07-27  1:30 Michael Roth
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting Michael Roth
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Michael Roth @ 2017-07-27  1:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, berrange, alex.williamson, pbonzini, david, groug, armbru

This series was motivated by the discussion in this thread:

  https://www.redhat.com/archives/libvir-list/2017-June/msg01370.html

The issue this series addresses is that when libvirt unplugs a VFIO PCI device,
it may attempt to bind the host device back to the host driver when QEMU emits
the DEVICE_DELETED event for the corresponding vfio-pci device. However, the
VFIO group FD is not actually cleaned up until vfio-pci device is *finalized*
by QEMU, whereas the event is emitted earlier during device_unparent.
Depending on the host device and how long certain operations like resetting the
device might take, this can in result in libvirt trying to rebind the device
back to the host while it is still in use by VFIO, leading to host crashes or
other unexpected behavior.

In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
quiesced by vfio-pci's finalize() routine until up to 6s after the
DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
much always crashing the host.

Implementing this change requires 2 prereqs to ensure the same information is
available when the DEVICE_DELETED is finally emitted:

1) Storing the path in the composition patch, which is addressed by PATCH 1,
   which was plucked from another pending series from Greg Kurz:

   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07922.html

   since we are now "disconnected" at the time the event is emitted, and

2) Deferring qemu_opts_del of the DeviceState->QemuOpts till finalize, since
   that is where DeviceState->id is stored. This was actually how it was
   done in the past, so PATCH 2 simply reverts the change which moved it to
   device_unparent.

>From there it's just a mechanical move of the event from device_unparent to
device_finalize.

 hw/core/qdev.c         | 30 +++++++++++++++++++-----------
 include/hw/qdev-core.h |  1 +
 2 files changed, 20 insertions(+), 11 deletions(-)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting
  2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
@ 2017-07-27  1:30 ` Michael Roth
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away" Michael Roth
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2017-07-27  1:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, berrange, alex.williamson, pbonzini, david, groug,
	armbru, Michael S . Tsirkin

device_unparent(dev, ...) is called when a device is unparented,
either directly, or as a result of a parent device being
finalized, and handles some final cleanup for the device. Part
of this includes emiting a DEVICE_DELETED QMP event to notify
management, which includes the device's path in the composition
tree as provided by object_get_canonical_path().

object_get_canonical_path() assumes the device is still connected
to the machine/root container, and will assert otherwise, but
in some situations this isn't the case:

If the parent is finalized as a result of object_unparent(), it
will still be attached to the composition tree at the time any
children are unparented as a result of that same call to
object_unparent(). However, in some cases, object_unparent()
will complete without finalizing the parent device, due to
lingering references that won't be released till some time later.
One such example is if the parent has MemoryRegion children (which
take a ref on their parent), who in turn have AddressSpace's (which
take a ref on their regions), since those AddressSpaces get cleaned
up asynchronously by the RCU thread.

In this case qdev:device_unparent() may be called for a child Device
that no longer has a path to the root/machine container, causing
object_get_canonical_path() to assert.

Fix this by storing the canonical path during realize() so the
information will still be available for device_unparent() in such
cases.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/core/qdev.c         | 15 ++++++++++++---
 include/hw/qdev-core.h |  1 +
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 606ab53..a64b35c 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -928,6 +928,12 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
             goto post_realize_fail;
         }
 
+        /* always re-initialize since we clean up in device_unparent() instead
+         * of unrealize()
+         */
+        g_free(dev->canonical_path);
+        dev->canonical_path = object_get_canonical_path(OBJECT(dev));
+
         if (qdev_get_vmsd(dev)) {
             if (vmstate_register_with_alias_id(dev, -1, qdev_get_vmsd(dev), dev,
                                                dev->instance_id_alias,
@@ -984,6 +990,7 @@ child_realize_fail:
     }
 
 post_realize_fail:
+    g_free(dev->canonical_path);
     if (dc->unrealize) {
         dc->unrealize(dev, NULL);
     }
@@ -1102,10 +1109,12 @@ static void device_unparent(Object *obj)
 
     /* Only send event if the device had been completely realized */
     if (dev->pending_deleted_event) {
-        gchar *path = object_get_canonical_path(OBJECT(dev));
+        g_assert(dev->canonical_path);
 
-        qapi_event_send_device_deleted(!!dev->id, dev->id, path, &error_abort);
-        g_free(path);
+        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
+                                       &error_abort);
+        g_free(dev->canonical_path);
+        dev->canonical_path = NULL;
     }
 
     qemu_opts_del(dev->opts);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index ae31728..9237b684 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -153,6 +153,7 @@ struct DeviceState {
     /*< public >*/
 
     const char *id;
+    char *canonical_path;
     bool realized;
     bool pending_deleted_event;
     QemuOpts *opts;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away"
  2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting Michael Roth
@ 2017-07-27  1:30 ` Michael Roth
  2017-07-31 15:51   ` Greg Kurz
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2017-07-27  1:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, berrange, alex.williamson, pbonzini, david, groug, armbru

This reverts commit abed886ec60cf239a03515cf0b30fb11fa964c44.

This patch originally addressed an issue where a DEVICE_DELETED
event could be emitted (in device_unparent()) before a Device's
QemuOpts were cleaned up (in device_finalize()), leading to a
"duplicate ID" error if management attempted to immediately add
a device with the same ID in response to the DEVICE_DELETED event.

An alternative will be implemented in a subsequent patch where we
defer the DEVICE_DELETED event until device_finalize(), which would
also prevent the race, so we revert the original fix in preparation.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/core/qdev.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index a64b35c..08c4061 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -1067,6 +1067,7 @@ static void device_finalize(Object *obj)
     NamedGPIOList *ngl, *next;
 
     DeviceState *dev = DEVICE(obj);
+    qemu_opts_del(dev->opts);
 
     QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
         QLIST_REMOVE(ngl, node);
@@ -1116,9 +1117,6 @@ static void device_unparent(Object *obj)
         g_free(dev->canonical_path);
         dev->canonical_path = NULL;
     }
-
-    qemu_opts_del(dev->opts);
-    dev->opts = NULL;
 }
 
 static void device_class_init(ObjectClass *class, void *data)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize()
  2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting Michael Roth
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away" Michael Roth
@ 2017-07-27  1:30 ` Michael Roth
  2017-07-31 17:11   ` Greg Kurz
  2017-08-09 14:04   ` Auger Eric
  2017-07-27  9:11 ` [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Peter Maydell
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Michael Roth @ 2017-07-27  1:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, berrange, alex.williamson, pbonzini, david, groug, armbru

DEVICE_DEL is currently emitted when a Device is unparented, as
opposed to when it is finalized. The main design motivation for this
seems to be that after unparent()/unrealize(), the Device is no
longer visible to the guest, and thus the operation is complete
from the perspective of management.

However, there are cases where remaining host-side cleanup is also
pertinent to management. The is generally handled by treating these
resources as aspects of the "backend", which can be managed via
separate interfaces/events, such as blockdev_add/del, netdev_add/del,
object_add/del, etc, but some devices do not have this level of
compartmentalization, namely vfio-pci, and possibly to lend themselves
well to it.

In the case of vfio-pci, the "backend" cleanup happens as part of
the finalization of the vfio-pci device itself, in particular the
cleanup of the VFIO group FD. Failing to wait for this cleanup can
result in tools like libvirt attempting to rebind the device to
the host while it's still being used by VFIO, which can result in
host crashes or other misbehavior depending on the host driver.

Deferring DEVICE_DEL still affords us the ability to manage backends
explicitly, while also addressing cases like vfio-pci's, so we
implement that approach here.

An alternative proposal involving having VFIO emit a separate event
to denote completion of host-side cleanup was discussed, but the
prevailing opinion seems to be that it is not worth the added
complexity, and leaves the issue open for other Device implementations
solve in the future.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
 hw/core/qdev.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 08c4061..d14acba 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -1067,7 +1067,6 @@ static void device_finalize(Object *obj)
     NamedGPIOList *ngl, *next;
 
     DeviceState *dev = DEVICE(obj);
-    qemu_opts_del(dev->opts);
 
     QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
         QLIST_REMOVE(ngl, node);
@@ -1078,6 +1077,18 @@ static void device_finalize(Object *obj)
          * here
          */
     }
+
+    /* Only send event if the device had been completely realized */
+    if (dev->pending_deleted_event) {
+        g_assert(dev->canonical_path);
+
+        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
+                                       &error_abort);
+        g_free(dev->canonical_path);
+        dev->canonical_path = NULL;
+    }
+
+    qemu_opts_del(dev->opts);
 }
 
 static void device_class_base_init(ObjectClass *class, void *data)
@@ -1107,16 +1118,6 @@ static void device_unparent(Object *obj)
         object_unref(OBJECT(dev->parent_bus));
         dev->parent_bus = NULL;
     }
-
-    /* Only send event if the device had been completely realized */
-    if (dev->pending_deleted_event) {
-        g_assert(dev->canonical_path);
-
-        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
-                                       &error_abort);
-        g_free(dev->canonical_path);
-        dev->canonical_path = NULL;
-    }
 }
 
 static void device_class_init(ObjectClass *class, void *data)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
                   ` (2 preceding siblings ...)
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
@ 2017-07-27  9:11 ` Peter Maydell
  2017-07-27 10:53   ` David Gibson
  2017-08-09 14:53 ` Auger Eric
  2017-10-03 22:21 ` Michael Roth
  5 siblings, 1 reply; 24+ messages in thread
From: Peter Maydell @ 2017-07-27  9:11 UTC (permalink / raw)
  To: Michael Roth
  Cc: QEMU Developers, Daniel P. Berrange, Alex Williamson,
	Paolo Bonzini, David Gibson, Greg Kurz, Markus Armbruster

On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> quiesced by vfio-pci's finalize() routine until up to 6s after the
> DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> much always crashing the host.

My initial naive thought is that if the host kernel can crash then
this is a host kernel bug... shouldn't the host kernel refuse
the subsequent libvirt rebind if it would cause a crash ?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27  9:11 ` [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Peter Maydell
@ 2017-07-27 10:53   ` David Gibson
  2017-07-27 11:50     ` Daniel P. Berrange
                       ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: David Gibson @ 2017-07-27 10:53 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Michael Roth, QEMU Developers, Daniel P. Berrange,
	Alex Williamson, Paolo Bonzini, Greg Kurz, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 917 bytes --]

On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:
> On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > much always crashing the host.
> 
> My initial naive thought is that if the host kernel can crash then
> this is a host kernel bug... shouldn't the host kernel refuse
> the subsequent libvirt rebind if it would cause a crash ?

I think so too, but I haven't been able to convince Alex.  Nor
find time to fix it in the kernel myself.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27 10:53   ` David Gibson
@ 2017-07-27 11:50     ` Daniel P. Berrange
  2017-08-08 19:40       ` Alex Williamson
  2017-07-27 11:54     ` Michael Roth
  2017-07-27 14:47     ` Alex Williamson
  2 siblings, 1 reply; 24+ messages in thread
From: Daniel P. Berrange @ 2017-07-27 11:50 UTC (permalink / raw)
  To: David Gibson
  Cc: Peter Maydell, Michael Roth, QEMU Developers, Alex Williamson,
	Paolo Bonzini, Greg Kurz, Markus Armbruster

On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote:
> On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:
> > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > much always crashing the host.
> > 
> > My initial naive thought is that if the host kernel can crash then
> > this is a host kernel bug... shouldn't the host kernel refuse
> > the subsequent libvirt rebind if it would cause a crash ?
> 
> I think so too, but I haven't been able to convince Alex.  Nor
> find time to fix it in the kernel myself.

I think we need to fix both the QEMU premature sending of DEVICE_DELETED
and the kernel bug that allowed the crash.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27 10:53   ` David Gibson
  2017-07-27 11:50     ` Daniel P. Berrange
@ 2017-07-27 11:54     ` Michael Roth
  2017-07-27 14:47     ` Alex Williamson
  2 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2017-07-27 11:54 UTC (permalink / raw)
  To: David Gibson, Peter Maydell
  Cc: QEMU Developers, Daniel P. Berrange, Alex Williamson,
	Paolo Bonzini, Greg Kurz, Markus Armbruster

Quoting David Gibson (2017-07-27 05:53:48)
> On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:
> > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > much always crashing the host.
> > 
> > My initial naive thought is that if the host kernel can crash then
> > this is a host kernel bug... shouldn't the host kernel refuse
> > the subsequent libvirt rebind if it would cause a crash ?
> 
> I think so too, but I haven't been able to convince Alex.  Nor
> find time to fix it in the kernel myself.

In the thread I linked to Alex had mentioned he was pursuing something on the
kernel side, but my understanding what that we'd simply have the kernel fail
more gracefully when attempting to rebind in this situation.

But that still leaves the matter of libvirt failing to rebind the device to
the host. This series addresses that aspect of it, so I think the 2 approaches
are complementary.

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27 10:53   ` David Gibson
  2017-07-27 11:50     ` Daniel P. Berrange
  2017-07-27 11:54     ` Michael Roth
@ 2017-07-27 14:47     ` Alex Williamson
  2017-07-28  3:14       ` David Gibson
  2 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2017-07-27 14:47 UTC (permalink / raw)
  To: David Gibson
  Cc: Peter Maydell, QEMU Developers, Markus Armbruster, Greg Kurz,
	Michael Roth, Paolo Bonzini

On Thu, 27 Jul 2017 20:53:48 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:
> > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:  
> > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > much always crashing the host.  
> > 
> > My initial naive thought is that if the host kernel can crash then
> > this is a host kernel bug... shouldn't the host kernel refuse
> > the subsequent libvirt rebind if it would cause a crash ?  
> 
> I think so too, but I haven't been able to convince Alex.  Nor
> find time to fix it in the kernel myself.

It's not me you need to convince, it's GregKH[1].  That interpretation
is that the user bind request is a mandate and we'll fall over
ourselves to try to do as they ask.  I think the best I might be able
to do is to kill the QEMU process to avoid compromising the kernel
rather than killing the kernel after the isolation compromise has
occurred.  Messing with driver binding is a privileged operation, and
the kernel believes you get to keep all the pieces when it fails.
Sorry.  Thanks,

Alex

[1] https://lkml.org/lkml/2017/7/10/728

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27 14:47     ` Alex Williamson
@ 2017-07-28  3:14       ` David Gibson
  0 siblings, 0 replies; 24+ messages in thread
From: David Gibson @ 2017-07-28  3:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Maydell, QEMU Developers, Markus Armbruster, Greg Kurz,
	Michael Roth, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2106 bytes --]

On Thu, Jul 27, 2017 at 08:47:33AM -0600, Alex Williamson wrote:
> On Thu, 27 Jul 2017 20:53:48 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:
> > > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:  
> > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > > much always crashing the host.  
> > > 
> > > My initial naive thought is that if the host kernel can crash then
> > > this is a host kernel bug... shouldn't the host kernel refuse
> > > the subsequent libvirt rebind if it would cause a crash ?  
> > 
> > I think so too, but I haven't been able to convince Alex.  Nor
> > find time to fix it in the kernel myself.
> 
> It's not me you need to convince, it's GregKH[1].  That interpretation
> is that the user bind request is a mandate and we'll fall over
> ourselves to try to do as they ask.  I think the best I might be able
> to do is to kill the QEMU process to avoid compromising the kernel
> rather than killing the kernel after the isolation compromise has
> occurred.  Messing with driver binding is a privileged operation, and
> the kernel believes you get to keep all the pieces when it fails.
> Sorry.  Thanks,

Ah, my mistake.  Well, I guess I'll whinge at GregKH at some point.

Anyway, the basic point remains - I still think we should address this
in the kernel, but that's not going to happen soon.  So we're left
with addressing it in qemu and/or libvirt.  And as others have pointed
out, there are reasons to do that even if the kernel does get changed
to protect itself better here.

> 
> Alex
> 
> [1] https://lkml.org/lkml/2017/7/10/728
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away"
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away" Michael Roth
@ 2017-07-31 15:51   ` Greg Kurz
  2017-07-31 16:39     ` Michael Roth
  0 siblings, 1 reply; 24+ messages in thread
From: Greg Kurz @ 2017-07-31 15:51 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, peter.maydell, berrange, alex.williamson, pbonzini,
	david, armbru

[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]

On Wed, 26 Jul 2017 20:30:54 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> This reverts commit abed886ec60cf239a03515cf0b30fb11fa964c44.
> 
> This patch originally addressed an issue where a DEVICE_DELETED
> event could be emitted (in device_unparent()) before a Device's
> QemuOpts were cleaned up (in device_finalize()), leading to a
> "duplicate ID" error if management attempted to immediately add
> a device with the same ID in response to the DEVICE_DELETED event.
> 
> An alternative will be implemented in a subsequent patch where we
> defer the DEVICE_DELETED event until device_finalize(), which would
> also prevent the race, so we revert the original fix in preparation.
> 

Do you really need to revert abed886ec60cf239a03515cf0b30fb11fa964c44 ?

IIUC, the purpose of this series is to fix/workaround the VFIO cleanup issue.
Even if it also relates to when we emit DEVICE_DELETED, it isn't exactly the
same problem here, is it ?

> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/core/qdev.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index a64b35c..08c4061 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -1067,6 +1067,7 @@ static void device_finalize(Object *obj)
>      NamedGPIOList *ngl, *next;
>  
>      DeviceState *dev = DEVICE(obj);
> +    qemu_opts_del(dev->opts);
>  
>      QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
>          QLIST_REMOVE(ngl, node);
> @@ -1116,9 +1117,6 @@ static void device_unparent(Object *obj)
>          g_free(dev->canonical_path);
>          dev->canonical_path = NULL;
>      }
> -
> -    qemu_opts_del(dev->opts);
> -    dev->opts = NULL;
>  }
>  
>  static void device_class_init(ObjectClass *class, void *data)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away"
  2017-07-31 15:51   ` Greg Kurz
@ 2017-07-31 16:39     ` Michael Roth
  2017-07-31 17:10       ` Greg Kurz
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Roth @ 2017-07-31 16:39 UTC (permalink / raw)
  To: Greg Kurz
  Cc: qemu-devel, peter.maydell, berrange, alex.williamson, pbonzini,
	david, armbru

Quoting Greg Kurz (2017-07-31 10:51:39)
> On Wed, 26 Jul 2017 20:30:54 -0500
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> 
> > This reverts commit abed886ec60cf239a03515cf0b30fb11fa964c44.
> > 
> > This patch originally addressed an issue where a DEVICE_DELETED
> > event could be emitted (in device_unparent()) before a Device's
> > QemuOpts were cleaned up (in device_finalize()), leading to a
> > "duplicate ID" error if management attempted to immediately add
> > a device with the same ID in response to the DEVICE_DELETED event.
> > 
> > An alternative will be implemented in a subsequent patch where we
> > defer the DEVICE_DELETED event until device_finalize(), which would
> > also prevent the race, so we revert the original fix in preparation.
> > 
> 
> Do you really need to revert abed886ec60cf239a03515cf0b30fb11fa964c44 ?
> 
> IIUC, the purpose of this series is to fix/workaround the VFIO cleanup issue.
> Even if it also relates to when we emit DEVICE_DELETED, it isn't exactly the
> same problem here, is it ?

That's true, but since the implementation of the VFIO fix requires
moving qemu_opts_del() to back to finalize(), and since that
effectively reverts the change made in abed88, I figured it was
better to revert the patch directly as opposed as a side-effect
of patch 3.

> 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/core/qdev.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index a64b35c..08c4061 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -1067,6 +1067,7 @@ static void device_finalize(Object *obj)
> >      NamedGPIOList *ngl, *next;
> >  
> >      DeviceState *dev = DEVICE(obj);
> > +    qemu_opts_del(dev->opts);
> >  
> >      QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
> >          QLIST_REMOVE(ngl, node);
> > @@ -1116,9 +1117,6 @@ static void device_unparent(Object *obj)
> >          g_free(dev->canonical_path);
> >          dev->canonical_path = NULL;
> >      }
> > -
> > -    qemu_opts_del(dev->opts);
> > -    dev->opts = NULL;
> >  }
> >  
> >  static void device_class_init(ObjectClass *class, void *data)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away"
  2017-07-31 16:39     ` Michael Roth
@ 2017-07-31 17:10       ` Greg Kurz
  0 siblings, 0 replies; 24+ messages in thread
From: Greg Kurz @ 2017-07-31 17:10 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, peter.maydell, berrange, alex.williamson, pbonzini,
	david, armbru

[-- Attachment #1: Type: text/plain, Size: 2536 bytes --]

On Mon, 31 Jul 2017 11:39:13 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> Quoting Greg Kurz (2017-07-31 10:51:39)
> > On Wed, 26 Jul 2017 20:30:54 -0500
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> >   
> > > This reverts commit abed886ec60cf239a03515cf0b30fb11fa964c44.
> > > 
> > > This patch originally addressed an issue where a DEVICE_DELETED
> > > event could be emitted (in device_unparent()) before a Device's
> > > QemuOpts were cleaned up (in device_finalize()), leading to a
> > > "duplicate ID" error if management attempted to immediately add
> > > a device with the same ID in response to the DEVICE_DELETED event.
> > > 
> > > An alternative will be implemented in a subsequent patch where we
> > > defer the DEVICE_DELETED event until device_finalize(), which would
> > > also prevent the race, so we revert the original fix in preparation.
> > >   
> > 
> > Do you really need to revert abed886ec60cf239a03515cf0b30fb11fa964c44 ?
> > 
> > IIUC, the purpose of this series is to fix/workaround the VFIO cleanup issue.
> > Even if it also relates to when we emit DEVICE_DELETED, it isn't exactly the
> > same problem here, is it ?  
> 
> That's true, but since the implementation of the VFIO fix requires
> moving qemu_opts_del() to back to finalize(), and since that
> effectively reverts the change made in abed88, I figured it was
> better to revert the patch directly as opposed as a side-effect
> of patch 3.
> 

Fair enough.

Reviewed-by: Greg Kurz <groug@kaod.org>

> >   
> > > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > ---
> > >  hw/core/qdev.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > > index a64b35c..08c4061 100644
> > > --- a/hw/core/qdev.c
> > > +++ b/hw/core/qdev.c
> > > @@ -1067,6 +1067,7 @@ static void device_finalize(Object *obj)
> > >      NamedGPIOList *ngl, *next;
> > >  
> > >      DeviceState *dev = DEVICE(obj);
> > > +    qemu_opts_del(dev->opts);
> > >  
> > >      QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
> > >          QLIST_REMOVE(ngl, node);
> > > @@ -1116,9 +1117,6 @@ static void device_unparent(Object *obj)
> > >          g_free(dev->canonical_path);
> > >          dev->canonical_path = NULL;
> > >      }
> > > -
> > > -    qemu_opts_del(dev->opts);
> > > -    dev->opts = NULL;
> > >  }
> > >  
> > >  static void device_class_init(ObjectClass *class, void *data)  
> >   
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize()
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
@ 2017-07-31 17:11   ` Greg Kurz
  2017-08-09 14:04   ` Auger Eric
  1 sibling, 0 replies; 24+ messages in thread
From: Greg Kurz @ 2017-07-31 17:11 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, peter.maydell, berrange, alex.williamson, pbonzini,
	david, armbru

[-- Attachment #1: Type: text/plain, Size: 3646 bytes --]

On Wed, 26 Jul 2017 20:30:55 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> DEVICE_DEL is currently emitted when a Device is unparented, as
> opposed to when it is finalized. The main design motivation for this
> seems to be that after unparent()/unrealize(), the Device is no
> longer visible to the guest, and thus the operation is complete
> from the perspective of management.
> 
> However, there are cases where remaining host-side cleanup is also
> pertinent to management. The is generally handled by treating these
> resources as aspects of the "backend", which can be managed via
> separate interfaces/events, such as blockdev_add/del, netdev_add/del,
> object_add/del, etc, but some devices do not have this level of
> compartmentalization, namely vfio-pci, and possibly to lend themselves
> well to it.
> 
> In the case of vfio-pci, the "backend" cleanup happens as part of
> the finalization of the vfio-pci device itself, in particular the
> cleanup of the VFIO group FD. Failing to wait for this cleanup can
> result in tools like libvirt attempting to rebind the device to
> the host while it's still being used by VFIO, which can result in
> host crashes or other misbehavior depending on the host driver.
> 
> Deferring DEVICE_DEL still affords us the ability to manage backends
> explicitly, while also addressing cases like vfio-pci's, so we
> implement that approach here.
> 
> An alternative proposal involving having VFIO emit a separate event
> to denote completion of host-side cleanup was discussed, but the
> prevailing opinion seems to be that it is not worth the added
> complexity, and leaves the issue open for other Device implementations
> solve in the future.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/core/qdev.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 08c4061..d14acba 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -1067,7 +1067,6 @@ static void device_finalize(Object *obj)
>      NamedGPIOList *ngl, *next;
>  
>      DeviceState *dev = DEVICE(obj);
> -    qemu_opts_del(dev->opts);
>  
>      QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
>          QLIST_REMOVE(ngl, node);
> @@ -1078,6 +1077,18 @@ static void device_finalize(Object *obj)
>           * here
>           */
>      }
> +
> +    /* Only send event if the device had been completely realized */
> +    if (dev->pending_deleted_event) {
> +        g_assert(dev->canonical_path);
> +
> +        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
> +                                       &error_abort);
> +        g_free(dev->canonical_path);
> +        dev->canonical_path = NULL;
> +    }
> +
> +    qemu_opts_del(dev->opts);
>  }
>  
>  static void device_class_base_init(ObjectClass *class, void *data)
> @@ -1107,16 +1118,6 @@ static void device_unparent(Object *obj)
>          object_unref(OBJECT(dev->parent_bus));
>          dev->parent_bus = NULL;
>      }
> -
> -    /* Only send event if the device had been completely realized */
> -    if (dev->pending_deleted_event) {
> -        g_assert(dev->canonical_path);
> -
> -        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
> -                                       &error_abort);
> -        g_free(dev->canonical_path);
> -        dev->canonical_path = NULL;
> -    }
>  }
>  
>  static void device_class_init(ObjectClass *class, void *data)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27 11:50     ` Daniel P. Berrange
@ 2017-08-08 19:40       ` Alex Williamson
  2017-08-09  5:08         ` David Gibson
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2017-08-08 19:40 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: David Gibson, Peter Maydell, Michael Roth, QEMU Developers,
	Paolo Bonzini, Greg Kurz, Markus Armbruster

On Thu, 27 Jul 2017 12:50:42 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote:
> > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:  
> > > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:  
> > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > > much always crashing the host.  
> > > 
> > > My initial naive thought is that if the host kernel can crash then
> > > this is a host kernel bug... shouldn't the host kernel refuse
> > > the subsequent libvirt rebind if it would cause a crash ?  
> > 
> > I think so too, but I haven't been able to convince Alex.  Nor
> > find time to fix it in the kernel myself.  
> 
> I think we need to fix both the QEMU premature sending of DEVICE_DELETED
> and the kernel bug that allowed the crash.


Where do we stand on this for v2.10?  I'd like to see it get in.  There
may be things to fix in the kernel, some of them may already be fixed
in the latest development kernel, but ultimately the kernel considers
driver binding to be a trusted operation and if userspace doesn't
understand all the dependencies, they shouldn't be doing it.  In this
case libvirt is using the DEVICE_DELETED signal with the assumption
that the device has been fully released by QEMU, which is of course not
accurate (libvirt could test this, but chooses not to).  libvirt
therefore begins trying to unbind a device that is still in use, we try
to handle it, but see official kernel stance that userspace is
responsible for understanding device dependencies, so we can only do so
much.

IMO, the next step along those lines would be that libvirt needs to
understand that even once a device is fully released from QEMU, it's
not necessarily safe to re-bind the device to a host driver.  If the
device is a member of a group where other devices are still in use by
userspace, this will violate user/host device isolation and the kernel
will crash to protect itself.  At best I may be able to improve this to
killing the userspace process making use of the conflicting device, but
the kernel view is that userspace (libvirt) has mandated to bind the
device to the host driver and we must make it so, the user is
responsible for the consequences.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-08-08 19:40       ` Alex Williamson
@ 2017-08-09  5:08         ` David Gibson
  2017-09-05 19:35           ` Greg Kurz
  0 siblings, 1 reply; 24+ messages in thread
From: David Gibson @ 2017-08-09  5:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Daniel P. Berrange, Peter Maydell, Michael Roth, QEMU Developers,
	Paolo Bonzini, Greg Kurz, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 3000 bytes --]

On Tue, Aug 08, 2017 at 01:40:08PM -0600, Alex Williamson wrote:
> On Thu, 27 Jul 2017 12:50:42 +0100
> "Daniel P. Berrange" <berrange@redhat.com> wrote:
> 
> > On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote:
> > > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:  
> > > > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:  
> > > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > > > much always crashing the host.  
> > > > 
> > > > My initial naive thought is that if the host kernel can crash then
> > > > this is a host kernel bug... shouldn't the host kernel refuse
> > > > the subsequent libvirt rebind if it would cause a crash ?  
> > > 
> > > I think so too, but I haven't been able to convince Alex.  Nor
> > > find time to fix it in the kernel myself.  
> > 
> > I think we need to fix both the QEMU premature sending of DEVICE_DELETED
> > and the kernel bug that allowed the crash.
> 
> 
> Where do we stand on this for v2.10?  I'd like to see it get in.  There
> may be things to fix in the kernel, some of them may already be fixed
> in the latest development kernel, but ultimately the kernel considers
> driver binding to be a trusted operation and if userspace doesn't
> understand all the dependencies, they shouldn't be doing it.  In this
> case libvirt is using the DEVICE_DELETED signal with the assumption
> that the device has been fully released by QEMU, which is of course not
> accurate (libvirt could test this, but chooses not to).  libvirt
> therefore begins trying to unbind a device that is still in use, we try
> to handle it, but see official kernel stance that userspace is
> responsible for understanding device dependencies, so we can only do so
> much.
> 
> IMO, the next step along those lines would be that libvirt needs to
> understand that even once a device is fully released from QEMU, it's
> not necessarily safe to re-bind the device to a host driver.  If the
> device is a member of a group where other devices are still in use by
> userspace, this will violate user/host device isolation and the kernel
> will crash to protect itself.  At best I may be able to improve this to
> killing the userspace process making use of the conflicting device, but
> the kernel view is that userspace (libvirt) has mandated to bind the
> device to the host driver and we must make it so, the user is
> responsible for the consequences.  Thanks,

Merging it for 2.10 seems like a good idea to me to, but it's not
really my area of expertise, and therefore not my call.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize()
  2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
  2017-07-31 17:11   ` Greg Kurz
@ 2017-08-09 14:04   ` Auger Eric
  2017-10-07  0:03     ` Michael Roth
  1 sibling, 1 reply; 24+ messages in thread
From: Auger Eric @ 2017-08-09 14:04 UTC (permalink / raw)
  To: Michael Roth, qemu-devel
  Cc: peter.maydell, groug, armbru, alex.williamson, pbonzini, david

Hi Michael,

On 27/07/2017 03:30, Michael Roth wrote:
> DEVICE_DEL is currently emitted when a Device is unparented, as
> opposed to when it is finalized. The main design motivation for this
> seems to be that after unparent()/unrealize(), the Device is no
> longer visible to the guest, and thus the operation is complete
> from the perspective of management.
> 
> However, there are cases where remaining host-side cleanup is also
> pertinent to management. The is generally handled by treating these
> resources as aspects of the "backend", which can be managed via
> separate interfaces/events, such as blockdev_add/del, netdev_add/del,
> object_add/del, etc, but some devices do not have this level of
> compartmentalization, namely vfio-pci, and possibly to lend themselves
> well to it.
> 
> In the case of vfio-pci, the "backend" cleanup happens as part of
> the finalization of the vfio-pci device itself, in particular the
> cleanup of the VFIO group FD. Failing to wait for this cleanup can
> result in tools like libvirt attempting to rebind the device to
> the host while it's still being used by VFIO, which can result in
> host crashes or other misbehavior depending on the host driver.
> 
> Deferring DEVICE_DEL still affords us the ability to manage backends
> explicitly, while also addressing cases like vfio-pci's, so we
> implement that approach here.
> 
> An alternative proposal involving having VFIO emit a separate event
> to denote completion of host-side cleanup was discussed, but the
> prevailing opinion seems to be that it is not worth the added
> complexity, and leaves the issue open for other Device implementations
> solve in the future.
> 
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
>  hw/core/qdev.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 08c4061..d14acba 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -1067,7 +1067,6 @@ static void device_finalize(Object *obj)
>      NamedGPIOList *ngl, *next;
>  
>      DeviceState *dev = DEVICE(obj);
> -    qemu_opts_del(dev->opts);
>  
>      QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
>          QLIST_REMOVE(ngl, node);
> @@ -1078,6 +1077,18 @@ static void device_finalize(Object *obj)
>           * here
>           */
>      }
> +
> +    /* Only send event if the device had been completely realized */
> +    if (dev->pending_deleted_event) {
> +        g_assert(dev->canonical_path);
> +
> +        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
> +                                       &error_abort);
> +        g_free(dev->canonical_path);
> +        dev->canonical_path = NULL;
> +    }
> +
> +    qemu_opts_del(dev->opts);
>  }
>  
>  static void device_class_base_init(ObjectClass *class, void *data)
> @@ -1107,16 +1118,6 @@ static void device_unparent(Object *obj)
>          object_unref(OBJECT(dev->parent_bus));
>          dev->parent_bus = NULL;
>      }
> -
> -    /* Only send event if the device had been completely realized */
> -    if (dev->pending_deleted_event) {
> -        g_assert(dev->canonical_path);
> -
> -        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
> -                                       &error_abort);
> -        g_free(dev->canonical_path);
> -        dev->canonical_path = NULL;
> -    }
is the code below, introduced in patch 1/device_set_realized() still
relevant?
        /* always re-initialize since we clean up in device_unparent()
instead
         * of unrealize()
         */
        g_free(dev->canonical_path);

Thanks

Eric
>  }
>  
>  static void device_class_init(ObjectClass *class, void *data)
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
                   ` (3 preceding siblings ...)
  2017-07-27  9:11 ` [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Peter Maydell
@ 2017-08-09 14:53 ` Auger Eric
  2017-10-03 22:21 ` Michael Roth
  5 siblings, 0 replies; 24+ messages in thread
From: Auger Eric @ 2017-08-09 14:53 UTC (permalink / raw)
  To: Michael Roth, qemu-devel
  Cc: peter.maydell, groug, armbru, alex.williamson, pbonzini, david

Hi Michael,

On 27/07/2017 03:30, Michael Roth wrote:
> This series was motivated by the discussion in this thread:
> 
>   https://www.redhat.com/archives/libvir-list/2017-June/msg01370.html
> 
> The issue this series addresses is that when libvirt unplugs a VFIO PCI device,
> it may attempt to bind the host device back to the host driver when QEMU emits
> the DEVICE_DELETED event for the corresponding vfio-pci device. However, the
> VFIO group FD is not actually cleaned up until vfio-pci device is *finalized*
> by QEMU, whereas the event is emitted earlier during device_unparent.
> Depending on the host device and how long certain operations like resetting the
> device might take, this can in result in libvirt trying to rebind the device
> back to the host while it is still in use by VFIO, leading to host crashes or
> other unexpected behavior.
> 
> In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> quiesced by vfio-pci's finalize() routine until up to 6s after the
> DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> much always crashing the host.
> 
> Implementing this change requires 2 prereqs to ensure the same information is
> available when the DEVICE_DELETED is finally emitted:
> 
> 1) Storing the path in the composition patch, which is addressed by PATCH 1,
>    which was plucked from another pending series from Greg Kurz:
> 
>    https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07922.html
> 
>    since we are now "disconnected" at the time the event is emitted, and
> 
> 2) Deferring qemu_opts_del of the DeviceState->QemuOpts till finalize, since
>    that is where DeviceState->id is stored. This was actually how it was
>    done in the past, so PATCH 2 simply reverts the change which moved it to
>    device_unparent.
> 
> From there it's just a mechanical move of the event from device_unparent to
> device_finalize.
> 
>  hw/core/qdev.c         | 30 +++++++++++++++++++-----------
>  include/hw/qdev-core.h |  1 +
>  2 files changed, 20 insertions(+), 11 deletions(-)
> 
> 

Tested-by: Eric Auger <eric.auger@redhat.com>


Without the series I get the following warning when detaching a pci host
dev from virt-manager:
2017-08-09T14:28:44.825925Z qemu-system-aarch64: Failed to remove group
53 from KVM VFIO device: No such file or directory

as KVM_DEV_VFIO_GROUP_DEL returns -ENOENT due to the fact the fd for the
VFIO group does not exist anymore as libvirt already started to unbind
the device from vfio-pci at vfio_instance_finalize() time.

With the series I don't see the message anymore.

Thanks

Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-08-09  5:08         ` David Gibson
@ 2017-09-05 19:35           ` Greg Kurz
  0 siblings, 0 replies; 24+ messages in thread
From: Greg Kurz @ 2017-09-05 19:35 UTC (permalink / raw)
  To: David Gibson
  Cc: Alex Williamson, Peter Maydell, QEMU Developers,
	Markus Armbruster, Michael Roth, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 3171 bytes --]

On Wed, 9 Aug 2017 15:08:42 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, Aug 08, 2017 at 01:40:08PM -0600, Alex Williamson wrote:
> > On Thu, 27 Jul 2017 12:50:42 +0100
> > "Daniel P. Berrange" <berrange@redhat.com> wrote:
> >   
> > > On Thu, Jul 27, 2017 at 08:53:48PM +1000, David Gibson wrote:  
> > > > On Thu, Jul 27, 2017 at 10:11:48AM +0100, Peter Maydell wrote:    
> > > > > On 27 July 2017 at 02:30, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:    
> > > > > > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > > > > > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > > > > > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > > > > > much always crashing the host.    
> > > > > 
> > > > > My initial naive thought is that if the host kernel can crash then
> > > > > this is a host kernel bug... shouldn't the host kernel refuse
> > > > > the subsequent libvirt rebind if it would cause a crash ?    
> > > > 
> > > > I think so too, but I haven't been able to convince Alex.  Nor
> > > > find time to fix it in the kernel myself.    
> > > 
> > > I think we need to fix both the QEMU premature sending of DEVICE_DELETED
> > > and the kernel bug that allowed the crash.  
> > 
> > 
> > Where do we stand on this for v2.10?  I'd like to see it get in.  There
> > may be things to fix in the kernel, some of them may already be fixed
> > in the latest development kernel, but ultimately the kernel considers
> > driver binding to be a trusted operation and if userspace doesn't
> > understand all the dependencies, they shouldn't be doing it.  In this
> > case libvirt is using the DEVICE_DELETED signal with the assumption
> > that the device has been fully released by QEMU, which is of course not
> > accurate (libvirt could test this, but chooses not to).  libvirt
> > therefore begins trying to unbind a device that is still in use, we try
> > to handle it, but see official kernel stance that userspace is
> > responsible for understanding device dependencies, so we can only do so
> > much.
> > 
> > IMO, the next step along those lines would be that libvirt needs to
> > understand that even once a device is fully released from QEMU, it's
> > not necessarily safe to re-bind the device to a host driver.  If the
> > device is a member of a group where other devices are still in use by
> > userspace, this will violate user/host device isolation and the kernel
> > will crash to protect itself.  At best I may be able to improve this to
> > killing the userspace process making use of the conflicting device, but
> > the kernel view is that userspace (libvirt) has mandated to bind the
> > device to the host driver and we must make it so, the user is
> > responsible for the consequences.  Thanks,  
> 
> Merging it for 2.10 seems like a good idea to me to, but it's not
> really my area of expertise, and therefore not my call.
> 

It looks like there was some kind of consensus on this series, but it
didn't make it for 2.10. Is there something more to discuss ?

Cheers,

--
Greg

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
                   ` (4 preceding siblings ...)
  2017-08-09 14:53 ` Auger Eric
@ 2017-10-03 22:21 ` Michael Roth
  2017-10-04  6:01   ` David Gibson
  2017-10-06 10:23   ` David Gibson
  5 siblings, 2 replies; 24+ messages in thread
From: Michael Roth @ 2017-10-03 22:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, groug, armbru, alex.williamson, pbonzini, david

Quoting Michael Roth (2017-07-26 20:30:52)
> This series was motivated by the discussion in this thread:
> 
>   https://www.redhat.com/archives/libvir-list/2017-June/msg01370.html
> 
> The issue this series addresses is that when libvirt unplugs a VFIO PCI device,
> it may attempt to bind the host device back to the host driver when QEMU emits
> the DEVICE_DELETED event for the corresponding vfio-pci device. However, the
> VFIO group FD is not actually cleaned up until vfio-pci device is *finalized*
> by QEMU, whereas the event is emitted earlier during device_unparent.
> Depending on the host device and how long certain operations like resetting the
> device might take, this can in result in libvirt trying to rebind the device
> back to the host while it is still in use by VFIO, leading to host crashes or
> other unexpected behavior.
> 
> In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> quiesced by vfio-pci's finalize() routine until up to 6s after the
> DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> much always crashing the host.
> 
> Implementing this change requires 2 prereqs to ensure the same information is
> available when the DEVICE_DELETED is finally emitted:
> 
> 1) Storing the path in the composition patch, which is addressed by PATCH 1,
>    which was plucked from another pending series from Greg Kurz:
> 
>    https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07922.html
> 
>    since we are now "disconnected" at the time the event is emitted, and
> 
> 2) Deferring qemu_opts_del of the DeviceState->QemuOpts till finalize, since
>    that is where DeviceState->id is stored. This was actually how it was
>    done in the past, so PATCH 2 simply reverts the change which moved it to
>    device_unparent.
> 
> From there it's just a mechanical move of the event from device_unparent to
> device_finalize.

Ping.

The situation has changed somewhat since original posting as Alex now
has a fix on the kernel side for the VFIO issue noted above:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6586b561a91cd80a91c8f107ed0d144feb3eadc2

However, I think this series would still be useful for addressing the
issue for hosts using older kernels, and there seems to be general
interest from the libvirt side in aligning "DEVICE_DELETED" events
with the notion that QEMU is completely finished with a device.

Patch 1/3 is also needed for another series that Greg is working on
and I don't want to hold that up for this series, so if it's
preferred that we just post that patch separately in the meantime
please let me know.

Thanks!

> 
>  hw/core/qdev.c         | 30 +++++++++++++++++++-----------
>  include/hw/qdev-core.h |  1 +
>  2 files changed, 20 insertions(+), 11 deletions(-)
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-10-03 22:21 ` Michael Roth
@ 2017-10-04  6:01   ` David Gibson
  2017-10-06 10:23   ` David Gibson
  1 sibling, 0 replies; 24+ messages in thread
From: David Gibson @ 2017-10-04  6:01 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, peter.maydell, groug, armbru, alex.williamson, pbonzini

[-- Attachment #1: Type: text/plain, Size: 3240 bytes --]

On Tue, Oct 03, 2017 at 05:21:57PM -0500, Michael Roth wrote:
> Quoting Michael Roth (2017-07-26 20:30:52)
> > This series was motivated by the discussion in this thread:
> > 
> >   https://www.redhat.com/archives/libvir-list/2017-June/msg01370.html
> > 
> > The issue this series addresses is that when libvirt unplugs a VFIO PCI device,
> > it may attempt to bind the host device back to the host driver when QEMU emits
> > the DEVICE_DELETED event for the corresponding vfio-pci device. However, the
> > VFIO group FD is not actually cleaned up until vfio-pci device is *finalized*
> > by QEMU, whereas the event is emitted earlier during device_unparent.
> > Depending on the host device and how long certain operations like resetting the
> > device might take, this can in result in libvirt trying to rebind the device
> > back to the host while it is still in use by VFIO, leading to host crashes or
> > other unexpected behavior.
> > 
> > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > much always crashing the host.
> > 
> > Implementing this change requires 2 prereqs to ensure the same information is
> > available when the DEVICE_DELETED is finally emitted:
> > 
> > 1) Storing the path in the composition patch, which is addressed by PATCH 1,
> >    which was plucked from another pending series from Greg Kurz:
> > 
> >    https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07922.html
> > 
> >    since we are now "disconnected" at the time the event is emitted, and
> > 
> > 2) Deferring qemu_opts_del of the DeviceState->QemuOpts till finalize, since
> >    that is where DeviceState->id is stored. This was actually how it was
> >    done in the past, so PATCH 2 simply reverts the change which moved it to
> >    device_unparent.
> > 
> > From there it's just a mechanical move of the event from device_unparent to
> > device_finalize.
> 
> Ping.
> 
> The situation has changed somewhat since original posting as Alex now
> has a fix on the kernel side for the VFIO issue noted above:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6586b561a91cd80a91c8f107ed0d144feb3eadc2
> 
> However, I think this series would still be useful for addressing the
> issue for hosts using older kernels, and there seems to be general
> interest from the libvirt side in aligning "DEVICE_DELETED" events
> with the notion that QEMU is completely finished with a device.
> 
> Patch 1/3 is also needed for another series that Greg is working on
> and I don't want to hold that up for this series, so if it's
> preferred that we just post that patch separately in the meantime
> please let me know.

Yeah, I'd been meaning to ask you about this.  I think it just got
lost in the rush of things for qemu-2.10.  I think it would be worth
rebasing and reposting.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-10-03 22:21 ` Michael Roth
  2017-10-04  6:01   ` David Gibson
@ 2017-10-06 10:23   ` David Gibson
  2017-10-06 12:31     ` Paolo Bonzini
  1 sibling, 1 reply; 24+ messages in thread
From: David Gibson @ 2017-10-06 10:23 UTC (permalink / raw)
  To: Michael Roth
  Cc: qemu-devel, peter.maydell, groug, armbru, alex.williamson, pbonzini

[-- Attachment #1: Type: text/plain, Size: 3251 bytes --]

On Tue, Oct 03, 2017 at 05:21:57PM -0500, Michael Roth wrote:
> Quoting Michael Roth (2017-07-26 20:30:52)
> > This series was motivated by the discussion in this thread:
> > 
> >   https://www.redhat.com/archives/libvir-list/2017-June/msg01370.html
> > 
> > The issue this series addresses is that when libvirt unplugs a VFIO PCI device,
> > it may attempt to bind the host device back to the host driver when QEMU emits
> > the DEVICE_DELETED event for the corresponding vfio-pci device. However, the
> > VFIO group FD is not actually cleaned up until vfio-pci device is *finalized*
> > by QEMU, whereas the event is emitted earlier during device_unparent.
> > Depending on the host device and how long certain operations like resetting the
> > device might take, this can in result in libvirt trying to rebind the device
> > back to the host while it is still in use by VFIO, leading to host crashes or
> > other unexpected behavior.
> > 
> > In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
> > quiesced by vfio-pci's finalize() routine until up to 6s after the
> > DEVICE_DELETED was emitted, leading to detach-device on the libvirt side pretty
> > much always crashing the host.
> > 
> > Implementing this change requires 2 prereqs to ensure the same information is
> > available when the DEVICE_DELETED is finally emitted:
> > 
> > 1) Storing the path in the composition patch, which is addressed by PATCH 1,
> >    which was plucked from another pending series from Greg Kurz:
> > 
> >    https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07922.html
> > 
> >    since we are now "disconnected" at the time the event is emitted, and
> > 
> > 2) Deferring qemu_opts_del of the DeviceState->QemuOpts till finalize, since
> >    that is where DeviceState->id is stored. This was actually how it was
> >    done in the past, so PATCH 2 simply reverts the change which moved it to
> >    device_unparent.
> > 
> > From there it's just a mechanical move of the event from device_unparent to
> > device_finalize.
> 
> Ping.

I think it got forgotten amongst the rush to get v2.10 out.

> The situation has changed somewhat since original posting as Alex now
> has a fix on the kernel side for the VFIO issue noted above:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6586b561a91cd80a91c8f107ed0d144feb3eadc2
> 
> However, I think this series would still be useful for addressing the
> issue for hosts using older kernels, and there seems to be general
> interest from the libvirt side in aligning "DEVICE_DELETED" events
> with the notion that QEMU is completely finished with a device.
> 
> Patch 1/3 is also needed for another series that Greg is working on
> and I don't want to hold that up for this series, so if it's
> preferred that we just post that patch separately in the meantime
> please let me know.

Yeah, now that I've understood the issue properly, I also think this
is a good idea.  I recommend reposting.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
  2017-10-06 10:23   ` David Gibson
@ 2017-10-06 12:31     ` Paolo Bonzini
  0 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2017-10-06 12:31 UTC (permalink / raw)
  To: David Gibson, Michael Roth
  Cc: qemu-devel, peter.maydell, groug, armbru, alex.williamson

[-- Attachment #1: Type: text/plain, Size: 498 bytes --]

On 06/10/2017 12:23, David Gibson wrote:
> On Tue, Oct 03, 2017 at 05:21:57PM -0500, Michael Roth wrote:
>> Patch 1/3 is also needed for another series that Greg is working on
>> and I don't want to hold that up for this series, so if it's
>> preferred that we just post that patch separately in the meantime
>> please let me know.
> 
> Yeah, now that I've understood the issue properly, I also think this
> is a good idea.  I recommend reposting.

Yes, please do!

Thanks,

Paolo


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize()
  2017-08-09 14:04   ` Auger Eric
@ 2017-10-07  0:03     ` Michael Roth
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Roth @ 2017-10-07  0:03 UTC (permalink / raw)
  To: Auger Eric, qemu-devel
  Cc: peter.maydell, groug, armbru, alex.williamson, pbonzini, david

Quoting Auger Eric (2017-08-09 09:04:54)
> Hi Michael,
> 
> On 27/07/2017 03:30, Michael Roth wrote:
> > DEVICE_DEL is currently emitted when a Device is unparented, as
> > opposed to when it is finalized. The main design motivation for this
> > seems to be that after unparent()/unrealize(), the Device is no
> > longer visible to the guest, and thus the operation is complete
> > from the perspective of management.
> > 
> > However, there are cases where remaining host-side cleanup is also
> > pertinent to management. The is generally handled by treating these
> > resources as aspects of the "backend", which can be managed via
> > separate interfaces/events, such as blockdev_add/del, netdev_add/del,
> > object_add/del, etc, but some devices do not have this level of
> > compartmentalization, namely vfio-pci, and possibly to lend themselves
> > well to it.
> > 
> > In the case of vfio-pci, the "backend" cleanup happens as part of
> > the finalization of the vfio-pci device itself, in particular the
> > cleanup of the VFIO group FD. Failing to wait for this cleanup can
> > result in tools like libvirt attempting to rebind the device to
> > the host while it's still being used by VFIO, which can result in
> > host crashes or other misbehavior depending on the host driver.
> > 
> > Deferring DEVICE_DEL still affords us the ability to manage backends
> > explicitly, while also addressing cases like vfio-pci's, so we
> > implement that approach here.
> > 
> > An alternative proposal involving having VFIO emit a separate event
> > to denote completion of host-side cleanup was discussed, but the
> > prevailing opinion seems to be that it is not worth the added
> > complexity, and leaves the issue open for other Device implementations
> > solve in the future.
> > 
> > Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > ---
> >  hw/core/qdev.c | 23 ++++++++++++-----------
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 08c4061..d14acba 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -1067,7 +1067,6 @@ static void device_finalize(Object *obj)
> >      NamedGPIOList *ngl, *next;
> >  
> >      DeviceState *dev = DEVICE(obj);
> > -    qemu_opts_del(dev->opts);
> >  
> >      QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
> >          QLIST_REMOVE(ngl, node);
> > @@ -1078,6 +1077,18 @@ static void device_finalize(Object *obj)
> >           * here
> >           */
> >      }
> > +
> > +    /* Only send event if the device had been completely realized */
> > +    if (dev->pending_deleted_event) {
> > +        g_assert(dev->canonical_path);
> > +
> > +        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
> > +                                       &error_abort);
> > +        g_free(dev->canonical_path);
> > +        dev->canonical_path = NULL;
> > +    }
> > +
> > +    qemu_opts_del(dev->opts);
> >  }
> >  
> >  static void device_class_base_init(ObjectClass *class, void *data)
> > @@ -1107,16 +1118,6 @@ static void device_unparent(Object *obj)
> >          object_unref(OBJECT(dev->parent_bus));
> >          dev->parent_bus = NULL;
> >      }
> > -
> > -    /* Only send event if the device had been completely realized */
> > -    if (dev->pending_deleted_event) {
> > -        g_assert(dev->canonical_path);
> > -
> > -        qapi_event_send_device_deleted(!!dev->id, dev->id, dev->canonical_path,
> > -                                       &error_abort);
> > -        g_free(dev->canonical_path);
> > -        dev->canonical_path = NULL;
> > -    }
> is the code below, introduced in patch 1/device_set_realized() still
> relevant?
>         /* always re-initialize since we clean up in device_unparent()
> instead
>          * of unrealize()
>          */
>         g_free(dev->canonical_path);

Hi Eric,

Sorry for missing your reply previously. That comment does indeed need
some adjusting after patch 3. Will fix it up for v2.

> 
> Thanks
> 
> Eric
> >  }
> >  
> >  static void device_class_init(ObjectClass *class, void *data)
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-10-07  0:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting Michael Roth
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away" Michael Roth
2017-07-31 15:51   ` Greg Kurz
2017-07-31 16:39     ` Michael Roth
2017-07-31 17:10       ` Greg Kurz
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
2017-07-31 17:11   ` Greg Kurz
2017-08-09 14:04   ` Auger Eric
2017-10-07  0:03     ` Michael Roth
2017-07-27  9:11 ` [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Peter Maydell
2017-07-27 10:53   ` David Gibson
2017-07-27 11:50     ` Daniel P. Berrange
2017-08-08 19:40       ` Alex Williamson
2017-08-09  5:08         ` David Gibson
2017-09-05 19:35           ` Greg Kurz
2017-07-27 11:54     ` Michael Roth
2017-07-27 14:47     ` Alex Williamson
2017-07-28  3:14       ` David Gibson
2017-08-09 14:53 ` Auger Eric
2017-10-03 22:21 ` Michael Roth
2017-10-04  6:01   ` David Gibson
2017-10-06 10:23   ` David Gibson
2017-10-06 12:31     ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.