* [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt
@ 2017-07-27  1:30 Michael Roth
From: Michael Roth @ 2017-07-27  1:30 UTC
  To: qemu-devel
  Cc: peter.maydell, berrange, alex.williamson, pbonzini, david, groug, armbru

This series was motivated by the discussion in this thread:

  https://www.redhat.com/archives/libvir-list/2017-June/msg01370.html

The issue this series addresses is that when libvirt unplugs a VFIO PCI device,
it may attempt to bind the host device back to the host driver when QEMU emits
the DEVICE_DELETED event for the corresponding vfio-pci device. However, the
VFIO group FD is not actually cleaned up until the vfio-pci device is *finalized*
by QEMU, whereas the event is emitted earlier during device_unparent.
Depending on the host device and how long certain operations like resetting the
device might take, this can result in libvirt trying to rebind the device
back to the host while it is still in use by VFIO, leading to host crashes or
other unexpected behavior.

In particular, Mellanox CX4 adapters on PowerNV hosts might not be fully
quiesced by vfio-pci's finalize() routine until up to 6s after the
DEVICE_DELETED event was emitted, so detach-device on the libvirt side
crashes the host pretty much every time.
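
To make the ordering problem concrete, here is a toy model of the two event
placements. It is only a sketch: ToyVfioDevice, unplug() and the
group_fd_open flag are made-up names for illustration, not QEMU or libvirt
APIs. Each recorded event notes whether the (modelled) VFIO group FD was
still open when the event went out, which is the window in which a
management tool reacting to the event could try to rebind the host device.

```python
# Toy model only: none of these names are QEMU or libvirt APIs.
class ToyVfioDevice:
    def __init__(self, dev_id):
        self.dev_id = dev_id
        self.group_fd_open = True  # stands in for the VFIO group FD
        self.events = []           # (event name, was the group FD still open?)

    def emit_device_deleted(self):
        self.events.append(("DEVICE_DELETED", self.group_fd_open))

    def unparent(self, emit_event):
        # The device leaves the composition tree; the group FD stays open.
        if emit_event:
            self.emit_device_deleted()

    def finalize(self, emit_event):
        # The group FD is only released here.
        self.group_fd_open = False
        if emit_event:
            self.emit_device_deleted()


def unplug(event_at_finalize):
    """Run one unplug and return the events a listener would have seen."""
    dev = ToyVfioDevice("hostdev0")
    dev.unparent(emit_event=not event_at_finalize)
    dev.finalize(emit_event=event_at_finalize)
    return dev.events
```

With unplug(False) the single event is recorded while the group FD is still
open, which is the race described above. With unplug(True) the event is only
recorded after the FD has been released.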

Implementing this change requires two prerequisites to ensure the same
information is available when the DEVICE_DELETED event is finally emitted:

1) Storing the path in the composition tree, which is addressed by PATCH 1,
   which was plucked from another pending series from Greg Kurz:

   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07922.html

   since we are now "disconnected" at the time the event is emitted, and

2) Deferring qemu_opts_del of the DeviceState->QemuOpts till finalize, since
   that is where DeviceState->id is stored. This was actually how it was
   done in the past, so PATCH 2 simply reverts the change which moved it to
   device_unparent.

From there it's just a mechanical move of the event from device_unparent to
device_finalize.
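
As a rough illustration of why both prerequisites are needed once the event
moves to finalize, here is a small toy model. It is not the code from the
patches: ToyDevice and its attributes are invented for this sketch, and the
point at which the path gets cached (realize) is an assumption. The "device"
and "path" keys mirror the data fields of the QMP DEVICE_DELETED event.

```python
# Toy model only: ToyDevice is hypothetical and not part of QEMU.
class ToyDevice:
    def __init__(self, dev_id, parent_path):
        self.opts = {"id": dev_id}      # stands in for DeviceState->QemuOpts
        self.parent_path = parent_path  # position in the composition tree
        self.canonical_path = None
        self.emitted = []

    def realize(self):
        # Prerequisite 1: cache the path while the device is still attached
        # (doing this at realize time is an assumption of this sketch).
        self.canonical_path = f"{self.parent_path}/{self.opts['id']}"

    def unparent(self):
        # Once detached, the path can no longer be derived from the tree.
        self.parent_path = None
        # Prerequisite 2: the opts (and therefore the id) are NOT freed here.

    def finalize(self):
        # The event is built from the cached path and the still-present id.
        self.emitted.append({
            "event": "DEVICE_DELETED",
            "device": self.opts["id"],
            "path": self.canonical_path,
        })
        self.opts = None  # opts are only dropped after the event is built
```

Running realize(), unparent() and finalize() in that order yields an event
that still carries both the id and the path, even though the device was
already detached when the event was generated.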

 hw/core/qdev.c         | 30 +++++++++++++++++++-----------
 include/hw/qdev-core.h |  1 +
 2 files changed, 20 insertions(+), 11 deletions(-)



Thread overview: 24+ messages
2017-07-27  1:30 [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Michael Roth
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 1/3] qdev: store DeviceState's canonical path to use when unparenting Michael Roth
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 2/3] Revert "qdev: Free QemuOpts when the QOM path goes away" Michael Roth
2017-07-31 15:51   ` Greg Kurz
2017-07-31 16:39     ` Michael Roth
2017-07-31 17:10       ` Greg Kurz
2017-07-27  1:30 ` [Qemu-devel] [PATCH for-2.10 3/3] qdev: defer DEVICE_DEL event until instance_finalize() Michael Roth
2017-07-31 17:11   ` Greg Kurz
2017-08-09 14:04   ` Auger Eric
2017-10-07  0:03     ` Michael Roth
2017-07-27  9:11 ` [Qemu-devel] [PATCH for-2.10 0/3] qdev/vfio: defer DEVICE_DEL to avoid races with libvirt Peter Maydell
2017-07-27 10:53   ` David Gibson
2017-07-27 11:50     ` Daniel P. Berrange
2017-08-08 19:40       ` Alex Williamson
2017-08-09  5:08         ` David Gibson
2017-09-05 19:35           ` Greg Kurz
2017-07-27 11:54     ` Michael Roth
2017-07-27 14:47     ` Alex Williamson
2017-07-28  3:14       ` David Gibson
2017-08-09 14:53 ` Auger Eric
2017-10-03 22:21 ` Michael Roth
2017-10-04  6:01   ` David Gibson
2017-10-06 10:23   ` David Gibson
2017-10-06 12:31     ` Paolo Bonzini
