qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/11] add failover feature for assigned network devices
@ 2019-10-23  8:27 Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 01/11] qdev/qbus: add hidden device support Jens Freimann
                   ` (10 more replies)
  0 siblings, 11 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This is implementing the host side of the net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Changes since v4:
* Patch 1, qdev, add comment to DeviceClass and qdev_should_hide_device
           function
* Patch 2  pci, set flag that allows unplug during migration in qdev
           code (aw)
* Patch 10 virtio, fix error handling
* Patch 11 vfio, fix error handling in vfio_realize (Conny)
           move allow_unplug_during_migraton flag to generic PCI 

The general idea is that we have a pair of devices, a vfio-pci and a
virtio-net device. Before migration the vfio device is unplugged and data
flows to the virtio-net device, on the target side another vfio-pci device
is plugged in to take over the data-path. In the guest the net_failover
module will pair net devices with the same MAC address.

* Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs

* Patch 2 adds checks to PCIDevice for only allowing ethernet devices as
  failover primary and only PCIExpress capable devices

* Patch 3 sets a new flag for PCIDevice 'partially_hotplugged' which we
  use to skip the unrealize code path when doing a unplug of the primary
  device

* Patch 4 sets the pending_deleted_event before triggering the guest
  unplug request

* Patch 5 and 6 add new qmp events, one sends the device id of a device
  that was just requested to be unplugged from the guest and another one
  to let libvirt know if VIRTIO_NET_F_STANDBY was negotiated

* Patch 7 make sure that we can unplug the vfio-device before
  migration starts

* Patch 8 adds a new migration state that is entered while we wait for
  devices to be unplugged by guest OS

* Patch 9 just adds the new migration state to a check in libqos code

* Patch 10 In the second patch the virtio-net uses the API to defer adding the vfio
  device until the VIRTIO_NET_F_STANDBY feature is acked. It also
  implements the migration handler to unplug the device from the guest and
  re-plug in case of migration failure

* Patch 11 allows migration for failover vfio-pci devices

Previous discussion:
  RFC v1 https://patchwork.ozlabs.org/cover/989098/
  RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
  v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
  v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html
  v3: https://patchew.org/QEMU/20191011112015.11785-1-jfreimann@redhat.com/
  v4: https://patchew.org/QEMU/20191018202040.30349-1-jfreimann@redhat.com/

To summarize concerns/feedback from previous discussion:
1.- guest OS can reject or worse _delay_ unplug by any amount of time.
  Migration might get stuck for unpredictable time with unclear reason.
  This approach combines two tricky things, hot/unplug and migration.
  -> We need to let libvirt know what's happening. Add new qmp events
     and a new migration state. When a primary device is (partially)
     unplugged (only from guest) we send a qmp event with the device id. When
     it is unplugged from the guest the DEVICE_DELETED event is sent.
     Migration will enter the wait-unplug state while waiting for the guest
     os to unplug all primary devices and then move on with migration.
2. PCI devices are a precious ressource. The primary device should never
  be added to QEMU if it won't be used by guest instead of hiding it in
  QEMU.
  -> We only hotplug the device when the standby feature bit was
     negotiated. We save the device cmdline options until we need it for
     qdev_device_add()
     Hiding a device can be a useful concept to model. For example a
     pci device in a powered-off slot could be marked as hidden until the slot is
     powered on (mst).
3. Management layer software should handle this. Open Stack already has
  components/code to handle unplug/replug VFIO devices and metadata to
  provide to the guest for detecting which devices should be paired.
  -> An approach that includes all software from firmware to
     higher-level management software wasn't tried in the last years. This is
     an attempt to keep it simple and contained in QEMU as much as possible.
     One of the problems that stopped management software and libvirt from
     implementing this idea is that it can't be sure that it's possible to
     re-plug the primary device. By not freeing the devices resources in QEMU
     and only asking the guest OS to unplug it is possible to re-plug the
     device in case of a migration failure.
4. Hotplugging a device and then making it part of a failover setup is
   not possible
  -> addressed by extending qdev hotplug functions to check for hidden
     attribute, so e.g. device_add can be used to plug a device.


I have tested this with a mlx5 and igb NIC and was able to migrate the VM.

Command line example:

qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
        -machine q35,kernel-irqchip=split -cpu host   \
        -serial stdio   \
        -net none \
        -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5555,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
	-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,net_failover_pair_id =net1 \
        /root/rhel-guest-image-8.0-1781.x86_64.qcow2

I'm grateful for any remarks or ideas!

Thanks!

regards,
Jens 

Jens Freimann (11):
  qdev/qbus: add hidden device support
  pci: add option for net failover
  pci: mark devices partially unplugged
  pci: mark device having guest unplug request pending
  qapi: add unplug primary event
  qapi: add failover negotiated event
  migration: allow unplug during migration for failover devices
  migration: add new migration state wait-unplug
  libqos: tolerate wait-unplug migration state
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 hw/core/qdev.c                 |  24 +++
 hw/net/virtio-net.c            | 302 +++++++++++++++++++++++++++++++++
 hw/pci/pci.c                   |  18 ++
 hw/pci/pcie.c                  |   6 +
 hw/vfio/pci.c                  |  26 ++-
 hw/vfio/pci.h                  |   1 +
 include/hw/pci/pci.h           |   4 +
 include/hw/qdev-core.h         |  30 ++++
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h     |   1 +
 include/migration/vmstate.h    |   2 +
 migration/migration.c          |  21 +++
 migration/migration.h          |   3 +
 migration/savevm.c             |  36 ++++
 migration/savevm.h             |   2 +
 qapi/migration.json            |  24 ++-
 qapi/net.json                  |  16 ++
 qdev-monitor.c                 |  43 ++++-
 tests/libqos/libqos.c          |   3 +-
 vl.c                           |   6 +-
 20 files changed, 565 insertions(+), 15 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v5 01/11] qdev/qbus: add hidden device support
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 02/11] pci: add option for net failover Jens Freimann
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This adds support for hiding a device to the qbus and qdev APIs.  The
first user of this will be the virtio-net failover feature but the API
introduced with this patch could be used to implement other features as
well, for example hiding pci devices when a pci bus is powered off.

qdev_device_add() is modified to check for a net_failover_pair_id
argument in the option string. A DeviceListener callback
should_be_hidden() is added. It can be used by a standby device to
inform qdev that this device should not be added now. The standby device
handler can store the device options to plug the device in at a later
point in time.

One reason for hiding the device is that we don't want to expose both
devices to the guest kernel until the respective virtio feature bit
VIRTIO_NET_F_STANDBY was negotiated and we know that the devices will be
handled correctly by the guest.

More information on the kernel feature this is using:
 https://www.kernel.org/doc/html/latest/networking/net_failover.html

An example where the primary device is a vfio-pci device and the standby
device is a virtio-net device:

A device is hidden when it has an "net_failover_pair_id" option, e.g.

 -device virtio-net-pci,...,failover=on,...
 -device vfio-pci,...,net_failover_pair_id=net1,...

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
---
 hw/core/qdev.c         | 23 +++++++++++++++++++++++
 include/hw/qdev-core.h | 29 +++++++++++++++++++++++++++++
 qdev-monitor.c         | 41 +++++++++++++++++++++++++++++++++++++----
 vl.c                   |  6 ++++--
 4 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index cbad6c1d55..f786650446 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -212,6 +212,29 @@ void device_listener_unregister(DeviceListener *listener)
     QTAILQ_REMOVE(&device_listeners, listener, link);
 }
 
+bool qdev_should_hide_device(QemuOpts *opts)
+{
+    int rc = -1;
+    DeviceListener *listener;
+
+    QTAILQ_FOREACH(listener, &device_listeners, link) {
+       if (listener->should_be_hidden) {
+            /* should_be_hidden_will return
+             *  1 if device matches opts and it should be hidden
+             *  0 if device matches opts and should not be hidden
+             *  -1 if device doesn't match opts
+             */
+            rc = listener->should_be_hidden(listener, opts);
+        }
+
+        if (rc > 0) {
+            break;
+        }
+    }
+
+    return rc > 0;
+}
+
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version)
 {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index aa123f88cb..710981af36 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -78,6 +78,19 @@ typedef void (*BusUnrealize)(BusState *bus, Error **errp);
  * respective parent types.
  *   </para>
  * </note>
+ *
+ * # Hiding a device #
+ * To hide a device, a DeviceListener function should_be_hidden() needs to
+ * be registered.
+ * It can be used to defer adding a device and therefore hide it from the
+ * guest. The handler registering to this DeviceListener can save the QOpts
+ * passed to it for re-using it later and must return that it wants the device
+ * to be/remain hidden or not. When the handler function decides the device
+ * shall not be hidden it will be added in qdev_device_add() and
+ * realized as any other device. Otherwise qdev_device_add() will return early
+ * without adding the device. The guest will not see a "hidden" device
+ * until it was marked don't hide and qdev_device_add called again.
+ *
  */
 typedef struct DeviceClass {
     /*< private >*/
@@ -154,6 +167,12 @@ struct DeviceState {
 struct DeviceListener {
     void (*realize)(DeviceListener *listener, DeviceState *dev);
     void (*unrealize)(DeviceListener *listener, DeviceState *dev);
+    /*
+     * This callback is called upon init of the DeviceState and allows to
+     * inform qdev that a device should be hidden, depending on the device
+     * opts, for example, to hide a standby device.
+     */
+    int (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts);
     QTAILQ_ENTRY(DeviceListener) link;
 };
 
@@ -451,4 +470,14 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);
 
+/**
+ * @qdev_should_hide_device:
+ * @opts: QemuOpts as passed on cmdline.
+ *
+ * Check if a device should be added.
+ * When a device is added via qdev_device_add() this will be called,
+ * and return if the device should be added now or not.
+ */
+bool qdev_should_hide_device(QemuOpts *opts);
+
 #endif
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 148df9cacf..676a759fb4 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -32,9 +32,11 @@
 #include "qemu/help_option.h"
 #include "qemu/option.h"
 #include "qemu/qemu-print.h"
+#include "qemu/option_int.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/sysemu.h"
 #include "migration/misc.h"
+#include "migration/migration.h"
 
 /*
  * Aliases were a bad idea from the start.  Let's keep them
@@ -562,13 +564,36 @@ void qdev_set_id(DeviceState *dev, const char *id)
     }
 }
 
+static int is_failover_device(void *opaque, const char *name, const char *value,
+                        Error **errp)
+{
+    if (strcmp(name, "net_failover_pair_id") == 0) {
+        QemuOpts *opts = (QemuOpts *)opaque;
+
+        if (qdev_should_hide_device(opts)) {
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
+static bool should_hide_device(QemuOpts *opts)
+{
+    if (qemu_opt_foreach(opts, is_failover_device, opts, NULL) == 0) {
+        return false;
+    }
+    return true;
+}
+
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 {
     DeviceClass *dc;
     const char *driver, *path;
-    DeviceState *dev;
+    DeviceState *dev = NULL;
     BusState *bus = NULL;
     Error *err = NULL;
+    bool hide;
 
     driver = qemu_opt_get(opts, "driver");
     if (!driver) {
@@ -602,11 +627,17 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
             return NULL;
         }
     }
-    if (qdev_hotplug && bus && !qbus_is_hotpluggable(bus)) {
+    hide = should_hide_device(opts);
+
+    if ((hide || qdev_hotplug) && bus && !qbus_is_hotpluggable(bus)) {
         error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name);
         return NULL;
     }
 
+    if (hide) {
+        return NULL;
+    }
+
     if (!migration_is_idle()) {
         error_setg(errp, "device_add not allowed while migrating");
         return NULL;
@@ -648,8 +679,10 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 
 err_del_dev:
     error_propagate(errp, err);
-    object_unparent(OBJECT(dev));
-    object_unref(OBJECT(dev));
+    if (dev) {
+        object_unparent(OBJECT(dev));
+        object_unref(OBJECT(dev));
+    }
     return NULL;
 }
 
diff --git a/vl.c b/vl.c
index 4489cfb2bb..62c388cb49 100644
--- a/vl.c
+++ b/vl.c
@@ -2204,10 +2204,12 @@ static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
     DeviceState *dev;
 
     dev = qdev_device_add(opts, errp);
-    if (!dev) {
+    if (!dev && *errp) {
+        error_report_err(*errp);
         return -1;
+    } else if (dev) {
+        object_unref(OBJECT(dev));
     }
-    object_unref(OBJECT(dev));
     return 0;
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 02/11] pci: add option for net failover
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 01/11] qdev/qbus: add hidden device support Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23 18:06   ` Alex Williamson
                     ` (2 more replies)
  2019-10-23  8:27 ` [PATCH v5 03/11] pci: mark devices partially unplugged Jens Freimann
                   ` (8 subsequent siblings)
  10 siblings, 3 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This patch adds a net_failover_pair_id property to PCIDev which is
used to link the primary device in a failover pair (the PCI dev) to
a standby (a virtio-net-pci) device.

It only supports ethernet devices. Also currently it only supports
PCIe devices. QEMU will exit with an error message otherwise.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/pci/pci.c         | 17 +++++++++++++++++
 include/hw/pci/pci.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index aa05c2b9b2..fa9b5219f8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -75,6 +75,8 @@ static Property pci_props[] = {
                     QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
     DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
                     QEMU_PCIE_EXTCAP_INIT_BITNR, true),
+    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
+            net_failover_pair_id),
     DEFINE_PROP_END_OF_LIST()
 };
 
@@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
     ObjectClass *klass = OBJECT_CLASS(pc);
     Error *local_err = NULL;
     bool is_default_rom;
+    uint16_t class_id;
 
     /* initialize cap_present for pci_is_express() and pci_config_size(),
      * Note that hybrid PCIs are not set automatically and need to manage
@@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
         }
     }
 
+    if (pci_dev->net_failover_pair_id) {
+        if (!pci_is_express(pci_dev)) {
+            error_setg(errp, "failover device is not a PCIExpress device");
+            error_propagate(errp, local_err);
+            return;
+        }
+        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
+        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
+            error_setg(errp, "failover device is not an Ethernet device");
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
     /* rom loading */
     is_default_rom = false;
     if (pci_dev->romfile == NULL && pc->romfile != NULL) {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f3f0ffd5fb..def5435685 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -352,6 +352,9 @@ struct PCIDevice {
     MSIVectorUseNotifier msix_vector_use_notifier;
     MSIVectorReleaseNotifier msix_vector_release_notifier;
     MSIVectorPollNotifier msix_vector_poll_notifier;
+
+    /* ID of standby device in net_failover pair */
+    char *net_failover_pair_id;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 03/11] pci: mark devices partially unplugged
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 01/11] qdev/qbus: add hidden device support Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 02/11] pci: add option for net failover Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 04/11] pci: mark device having guest unplug request pending Jens Freimann
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

Only the guest unplug request was triggered. This is needed for
the failover feature. In case of a failed migration we need to
plug the device back to the guest.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/pci/pcie.c        | 3 +++
 include/hw/pci/pci.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index a6beb567bd..19363ff8ce 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -456,6 +456,9 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
 {
     HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
+    if (dev->partially_hotplugged) {
+        return;
+    }
     hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort);
     object_unparent(OBJECT(dev));
 }
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index def5435685..7b7eac845c 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -265,6 +265,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 
 struct PCIDevice {
     DeviceState qdev;
+    bool partially_hotplugged;
 
     /* PCI config space */
     uint8_t *config;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 04/11] pci: mark device having guest unplug request pending
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (2 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 03/11] pci: mark devices partially unplugged Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 05/11] qapi: add unplug primary event Jens Freimann
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

Set pending_deleted_event in DeviceState for failover
primary devices that were successfully unplugged by the Guest OS.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/pci/pcie.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 19363ff8ce..08718188bb 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -457,6 +457,7 @@ static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
     HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
 
     if (dev->partially_hotplugged) {
+        dev->qdev.pending_deleted_event = false;
         return;
     }
     hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort);
@@ -476,6 +477,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
         return;
     }
 
+    dev->pending_deleted_event = true;
+
     /* In case user cancel the operation of multi-function hot-add,
      * remove the function that is unexposed to guest individually,
      * without interaction with guest.
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 05/11] qapi: add unplug primary event
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (3 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 04/11] pci: mark device having guest unplug request pending Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23 11:32   ` Eric Blake
  2019-10-23  8:27 ` [PATCH v5 06/11] qapi: add failover negotiated event Jens Freimann
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This event is emitted when we sent a request to unplug a
failover primary device from the Guest OS and it includes the
device id of the primary device.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 qapi/migration.json | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/qapi/migration.json b/qapi/migration.json
index 82feb5bd39..52e69e2868 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1448,3 +1448,22 @@
 # Since: 3.0
 ##
 { 'command': 'migrate-pause', 'allow-oob': true }
+
+##
+# @UNPLUG_PRIMARY:
+#
+# Emitted from source side of a migration when migration state is
+# WAIT_UNPLUG. Device was unplugged by guest operating system.
+# Device resources in QEMU are kept on standby to be able to re-plug it in case
+# of migration failure.
+#
+# @device_id: QEMU device id of the unplugged device
+#
+# Since: 4.2
+#
+# Example:
+#   {"event": "UNPLUG_PRIMARY", "data": {"device_id": "hostdev0"} }
+#
+##
+{ 'event': 'UNPLUG_PRIMARY',
+  'data': { 'device_id': 'str' } }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 06/11] qapi: add failover negotiated event
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (4 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 05/11] qapi: add unplug primary event Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-24 17:32   ` Dr. David Alan Gilbert
  2019-10-25  5:35   ` Markus Armbruster
  2019-10-23  8:27 ` [PATCH v5 07/11] migration: allow unplug during migration for failover devices Jens Freimann
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
feature was not negotiated during virtio feature negotiation. If this
event is received it means any primary devices hotplugged before
this were were never really added to QEMU devices.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 qapi/net.json | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/qapi/net.json b/qapi/net.json
index 728990f4fb..8c5f3f1fb2 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -737,3 +737,19 @@
 ##
 { 'command': 'announce-self', 'boxed': true,
   'data' : 'AnnounceParameters'}
+
+##
+# @FAILOVER_NEGOTIATED:
+#
+# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation
+#
+# Since: 4.2
+#
+# Example:
+#
+# <- { "event": "FAILOVER_NEGOTIATED",
+#      "data": {} }
+#
+##
+{ 'event': 'FAILOVER_NEGOTIATED',
+  'data': {} }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 07/11] migration: allow unplug during migration for failover devices
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (5 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 06/11] qapi: add failover negotiated event Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 08/11] migration: add new migration state wait-unplug Jens Freimann
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.

This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for PCI devices that are also
primary devices in a failover pair.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/core/qdev.c         | 1 +
 hw/pci/pci.c           | 1 +
 include/hw/qdev-core.h | 1 +
 qdev-monitor.c         | 2 +-
 4 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f786650446..dc1289da86 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -995,6 +995,7 @@ static void device_initfn(Object *obj)
 
     dev->instance_id_alias = -1;
     dev->realized = false;
+    dev->allow_unplug_during_migration = false;
 
     object_property_add_bool(obj, "realized",
                              device_get_realized, device_set_realized, NULL);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fa9b5219f8..8fbf32d68c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2116,6 +2116,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
             error_propagate(errp, local_err);
             return;
         }
+        qdev->allow_unplug_during_migration = true;
     }
 
     /* rom loading */
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 710981af36..1518495b1e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -156,6 +156,7 @@ struct DeviceState {
     bool pending_deleted_event;
     QemuOpts *opts;
     int hotplugged;
+    bool allow_unplug_during_migration;
     BusState *parent_bus;
     QLIST_HEAD(, NamedGPIOList) gpios;
     QLIST_HEAD(, BusState) child_bus;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 676a759fb4..bc6a41fa37 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -851,7 +851,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
         return;
     }
 
-    if (!migration_is_idle()) {
+    if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
         error_setg(errp, "device_del not allowed while migrating");
         return;
     }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 08/11] migration: add new migration state wait-unplug
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (6 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 07/11] migration: allow unplug during migration for failover devices Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 09/11] libqos: tolerate wait-unplug migration state Jens Freimann
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This patch adds a new migration state called wait-unplug.  It is entered
after the SETUP state if failover devices are present. It will transition
into ACTIVE once all devices were succesfully unplugged from the guest.

So if a guest doesn't respond or takes long to honor the unplug request
the user will see the migration state 'wait-unplug'.

In the migration thread we query failover devices if they're are still
pending the guest unplug. When all are unplugged the migration
continues. If one device won't unplug migration will stay in wait_unplug
state.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 include/migration/vmstate.h |  2 ++
 migration/migration.c       | 21 +++++++++++++++++++++
 migration/migration.h       |  3 +++
 migration/savevm.c          | 36 ++++++++++++++++++++++++++++++++++++
 migration/savevm.h          |  2 ++
 qapi/migration.json         |  5 ++++-
 6 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index b9ee563aa4..ac4f46a67d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -186,6 +186,8 @@ struct VMStateDescription {
     int (*pre_save)(void *opaque);
     int (*post_save)(void *opaque);
     bool (*needed)(void *opaque);
+    bool (*dev_unplug_pending)(void *opaque);
+
     const VMStateField *fields;
     const VMStateDescription **subsections;
 };
diff --git a/migration/migration.c b/migration/migration.c
index 3febd0f8f3..51764f2565 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -52,6 +52,7 @@
 #include "hw/qdev-properties.h"
 #include "monitor/monitor.h"
 #include "net/announce.h"
+#include "qemu/queue.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -819,6 +820,7 @@ bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
+    case MIGRATION_STATUS_WAIT_UNPLUG:
         return true;
 
     default:
@@ -954,6 +956,9 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
         break;
+    case MIGRATION_STATUS_WAIT_UNPLUG:
+        info->has_status = true;
+        break;
     }
     info->status = s->state;
 }
@@ -1694,6 +1699,7 @@ bool migration_is_idle(void)
     case MIGRATION_STATUS_COLO:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
+    case MIGRATION_STATUS_WAIT_UNPLUG:
         return false;
     case MIGRATION_STATUS__MAX:
         g_assert_not_reached();
@@ -3264,6 +3270,19 @@ static void *migration_thread(void *opaque)
 
     qemu_savevm_state_setup(s->to_dst_file);
 
+    if (qemu_savevm_nr_failover_devices()) {
+        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                          MIGRATION_STATUS_WAIT_UNPLUG);
+
+        while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
+                !qemu_savevm_state_guest_unplug_pending()) {
+            qemu_sem_timedwait(&s->wait_unplug_sem, 250);
+        }
+
+        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
+                MIGRATION_STATUS_ACTIVE);
+    }
+
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
     migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                       MIGRATION_STATUS_ACTIVE);
@@ -3511,6 +3530,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_mutex_destroy(&ms->qemu_file_lock);
     g_free(params->tls_hostname);
     g_free(params->tls_creds);
+    qemu_sem_destroy(&ms->wait_unplug_sem);
     qemu_sem_destroy(&ms->rate_limit_sem);
     qemu_sem_destroy(&ms->pause_sem);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
@@ -3556,6 +3576,7 @@ static void migration_instance_init(Object *obj)
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
     qemu_sem_init(&ms->rate_limit_sem, 0);
+    qemu_sem_init(&ms->wait_unplug_sem, 0);
     qemu_mutex_init(&ms->qemu_file_lock);
 }
 
diff --git a/migration/migration.h b/migration/migration.h
index 4f2fe193dc..79b3dda146 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,9 @@ struct MigrationState
     /* Flag set once the migration thread called bdrv_inactivate_all */
     bool block_inactive;
 
+    /* Migration is waiting for guest to unplug device */
+    QemuSemaphore wait_unplug_sem;
+
     /* Migration is paused due to pause-before-switchover */
     QemuSemaphore pause_sem;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 8d95e261f6..0f18dea49e 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1113,6 +1113,42 @@ void qemu_savevm_state_header(QEMUFile *f)
     }
 }
 
+int qemu_savevm_nr_failover_devices(void)
+{
+    SaveStateEntry *se;
+    int n = 0;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd && se->vmsd->dev_unplug_pending) {
+            n++;
+        }
+    }
+
+    return n;
+}
+
+bool qemu_savevm_state_guest_unplug_pending(void)
+{
+    int nr_failover_devs;
+    SaveStateEntry *se;
+    bool ret = false;
+    int n = 0;
+
+    nr_failover_devs = qemu_savevm_nr_failover_devices();
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->vmsd || !se->vmsd->dev_unplug_pending) {
+            continue;
+        }
+        ret = se->vmsd->dev_unplug_pending(se->opaque);
+        if (!ret) {
+            n++;
+        }
+    }
+
+    return n == nr_failover_devs;
+}
+
 void qemu_savevm_state_setup(QEMUFile *f)
 {
     SaveStateEntry *se;
diff --git a/migration/savevm.h b/migration/savevm.h
index 51a4b9caa8..c42b9c80ee 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -31,6 +31,8 @@
 
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_state_setup(QEMUFile *f);
+int qemu_savevm_nr_failover_devices(void);
+bool qemu_savevm_state_guest_unplug_pending(void);
 int qemu_savevm_state_resume_prepare(MigrationState *s);
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
diff --git a/qapi/migration.json b/qapi/migration.json
index 52e69e2868..5a06cd489f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -133,6 +133,9 @@
 # @device: During device serialisation when pause-before-switchover is enabled
 #        (since 2.11)
 #
+# @wait-unplug: wait for device unplug request by guest OS to be completed.
+#               (since 4.2)
+#
 # Since: 2.3
 #
 ##
@@ -140,7 +143,7 @@
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
             'active', 'postcopy-active', 'postcopy-paused',
             'postcopy-recover', 'completed', 'failed', 'colo',
-            'pre-switchover', 'device' ] }
+            'pre-switchover', 'device', 'wait-unplug' ] }
 
 ##
 # @MigrationInfo:
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 09/11] libqos: tolerate wait-unplug migration state
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (7 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 08/11] migration: add new migration state wait-unplug Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 10/11] net/virtio: add failover support Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 11/11] vfio: unplug failover primary device before migration Jens Freimann
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 tests/libqos/libqos.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/libqos/libqos.c b/tests/libqos/libqos.c
index d71557c5cb..f229eb2cb8 100644
--- a/tests/libqos/libqos.c
+++ b/tests/libqos/libqos.c
@@ -125,7 +125,8 @@ void migrate(QOSState *from, QOSState *to, const char *uri)
             break;
         }
 
-        if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)) {
+        if ((strcmp(st, "setup") == 0) || (strcmp(st, "active") == 0)
+            || (strcmp(st, "wait-unplug") == 0)) {
             qobject_unref(rsp);
             g_usleep(5000);
             continue;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 10/11] net/virtio: add failover support
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (8 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 09/11] libqos: tolerate wait-unplug migration state Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23  8:27 ` [PATCH v5 11/11] vfio: unplug failover primary device before migration Jens Freimann
  10 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

This patch adds support to handle failover device pairs of a virtio-net
device and a vfio-pci device, where the virtio-net acts as the standby
device and the vfio-pci device as the primary.

The general idea is that we have a pair of devices, a vfio-pci and a
emulated (virtio-net) device. Before migration the vfio device is
unplugged and data flows to the emulated device, on the target side
another vfio-pci device is plugged in to take over the data-path. In the
guest the net_failover module will pair net devices with the same MAC
address.

To achieve this we need:

1. Provide a callback function for the should_be_hidden DeviceListener.
   It is called when the primary device is plugged in. Evaluate the QOpt
   passed in to check if it is the matching primary device. It returns
   two values:
     - one to signal if the device to be added is the matching
       primary device
     - another one to signal to qdev if it should actually
       continue with adding the device or skip it.

   In the latter case it stores the device options in the VirtioNet
   struct and the device is added once the VIRTIO_NET_F_STANDBY feature is
   negotiated during virtio feature negotiation.

   If the virtio-net devices are not realized at the time the vfio-pci
   devices are realized, we need to connect the devices later. This way
   we make sure primary and standby devices can be specified in any
   order.

2. Register a callback for migration status notifier. When called it
   will unplug its primary device before the migration happens.

3. Register a callback for the migration code that checks if a device
   needs to be unplugged from the guest.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/net/virtio-net.c            | 302 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-net.h |  12 ++
 include/hw/virtio/virtio.h     |   1 +
 3 files changed, 315 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9f11422337..cccaf77bde 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/atomic.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -21,6 +22,10 @@
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"
+#include "qemu/config-file.h"
+#include "qapi/qmp/qdict.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/announce.h"
@@ -28,11 +33,15 @@
 #include "qapi/error.h"
 #include "qapi/qapi-events-net.h"
 #include "hw/qdev-properties.h"
+#include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "sysemu/sysemu.h"
 #include "trace.h"
+#include "monitor/qdev.h"
+#include "hw/pci/pci.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -746,9 +755,99 @@ static inline uint64_t virtio_net_supported_guest_offloads(VirtIONet *n)
     return virtio_net_guest_offloads_by_features(vdev->guest_features);
 }
 
+static void failover_add_primary(VirtIONet *n, Error **errp)
+{
+    Error *err = NULL;
+
+    n->primary_device_opts = qemu_opts_find(qemu_find_opts("device"),
+            n->primary_device_id);
+    if (n->primary_device_opts) {
+        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
+        if (err) {
+            qemu_opts_del(n->primary_device_opts);
+        }
+        if (n->primary_dev) {
+            n->primary_bus = n->primary_dev->parent_bus;
+            if (err) {
+                qdev_unplug(n->primary_dev, &err);
+                qdev_set_id(n->primary_dev, "");
+
+            }
+        }
+    } else {
+        error_setg(errp, "Primary device not found");
+        error_append_hint(errp, "Virtio-net failover will not work. Make "
+            "sure primary device has parameter"
+            " net_failover_pair_id=<virtio-net-id>\n");
+}
+    if (err) {
+        error_propagate(errp, err);
+    }
+}
+
+static int is_my_primary(void *opaque, QemuOpts *opts, Error **errp)
+{
+    VirtIONet *n = opaque;
+    int ret = 0;
+
+    const char *standby_id = qemu_opt_get(opts, "net_failover_pair_id");
+
+    if (standby_id != NULL && (g_strcmp0(standby_id, n->netclient_name) == 0)) {
+        n->primary_device_id = g_strdup(opts->id);
+        ret = 1;
+    }
+
+    return ret;
+}
+
+static DeviceState *virtio_net_find_primary(VirtIONet *n, Error **errp)
+{
+    DeviceState *dev = NULL;
+    Error *err = NULL;
+
+    if (qemu_opts_foreach(qemu_find_opts("device"),
+                         is_my_primary, n, &err)) {
+        if (err) {
+            error_propagate(errp, err);
+            return NULL;
+        }
+        if (n->primary_device_id) {
+            dev = qdev_find_recursive(sysbus_get_default(),
+                    n->primary_device_id);
+        } else {
+            error_setg(errp, "Primary device id not found");
+            return NULL;
+        }
+    }
+    return dev;
+}
+
+
+
+static DeviceState *virtio_connect_failover_devices(VirtIONet *n,
+                                                    DeviceState *dev,
+                                                    Error **errp)
+{
+    DeviceState *prim_dev = NULL;
+    Error *err = NULL;
+
+    prim_dev = virtio_net_find_primary(n, &err);
+    if (prim_dev) {
+        n->primary_device_id = g_strdup(prim_dev->id);
+        n->primary_device_opts = prim_dev->opts;
+    } else {
+        if (err) {
+            error_propagate(errp, err);
+        }
+    }
+
+    return prim_dev;
+}
+
 static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
+    Error *err = NULL;
     int i;
 
     if (n->mtu_bypass_backend &&
@@ -790,6 +889,27 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
     } else {
         memset(n->vlans, 0xff, MAX_VLAN >> 3);
     }
+
+    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+        atomic_set(&n->primary_should_be_hidden, false);
+        failover_add_primary(n, &err);
+        if (err) {
+            n->primary_dev = virtio_connect_failover_devices(n, n->qdev, &err);
+            if (err) {
+                goto out_err;
+            }
+            failover_add_primary(n, &err);
+            if (err) {
+                goto out_err;
+            }
+        }
+    }
+    return;
+
+out_err:
+    if (err) {
+        warn_report_err(err);
+    }
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -2630,6 +2750,151 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
     n->netclient_type = g_strdup(type);
 }
 
+static bool failover_unplug_primary(VirtIONet *n)
+{
+    HotplugHandler *hotplug_ctrl;
+    PCIDevice *pci_dev;
+    Error *err = NULL;
+
+    hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
+    if (hotplug_ctrl) {
+        pci_dev = PCI_DEVICE(n->primary_dev);
+        pci_dev->partially_hotplugged = true;
+        hotplug_handler_unplug_request(hotplug_ctrl, n->primary_dev, &err);
+        if (err) {
+            error_report_err(err);
+            return false;
+        }
+    } else {
+        return false;
+    }
+    return true;
+}
+
+static bool failover_replug_primary(VirtIONet *n, Error **errp)
+{
+    HotplugHandler *hotplug_ctrl;
+    PCIDevice *pdev = PCI_DEVICE(n->primary_dev);
+
+    if (!pdev->partially_hotplugged) {
+        return true;
+    }
+    if (!n->primary_device_opts) {
+        n->primary_device_opts = qemu_opts_from_qdict(
+                qemu_find_opts("device"),
+                n->primary_device_dict, errp);
+    }
+    if (n->primary_device_opts) {
+        if (n->primary_dev) {
+            n->primary_bus = n->primary_dev->parent_bus;
+        }
+        qdev_set_parent_bus(n->primary_dev, n->primary_bus);
+        n->primary_should_be_hidden = false;
+        qemu_opt_set_bool(n->primary_device_opts,
+                "partially_hotplugged", true, errp);
+        hotplug_ctrl = qdev_get_hotplug_handler(n->primary_dev);
+        if (hotplug_ctrl) {
+            hotplug_handler_pre_plug(hotplug_ctrl, n->primary_dev, errp);
+            hotplug_handler_plug(hotplug_ctrl, n->primary_dev, errp);
+        }
+        if (!n->primary_dev) {
+            error_setg(errp, "virtio_net: couldn't find primary device");
+        }
+    }
+    return *errp != NULL;
+}
+
+static void virtio_net_handle_migration_primary(VirtIONet *n,
+                                                MigrationState *s)
+{
+    bool should_be_hidden;
+    Error *err = NULL;
+
+    should_be_hidden = atomic_read(&n->primary_should_be_hidden);
+
+    if (!n->primary_dev) {
+        n->primary_dev = virtio_connect_failover_devices(n, n->qdev, &err);
+        if (!n->primary_dev) {
+            return;
+        }
+    }
+
+    if (migration_in_setup(s) && !should_be_hidden &&
+        n->primary_dev) {
+        if (failover_unplug_primary(n)) {
+            vmstate_unregister(n->primary_dev, qdev_get_vmsd(n->primary_dev),
+                    n->primary_dev);
+            qapi_event_send_unplug_primary(n->primary_device_id);
+            atomic_set(&n->primary_should_be_hidden, true);
+        } else {
+            warn_report("couldn't unplug primary device");
+        }
+    } else if (migration_has_failed(s)) {
+        /* We already unplugged the device let's plugged it back */
+        if (!failover_replug_primary(n, &err)) {
+            if (err) {
+                error_report_err(err);
+            }
+        }
+    }
+}
+
+static void virtio_net_migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *s = data;
+    VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
+    virtio_net_handle_migration_primary(n, s);
+}
+
+static int virtio_net_primary_should_be_hidden(DeviceListener *listener,
+            QemuOpts *device_opts)
+{
+    VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
+    bool match_found;
+    bool hide;
+
+    n->primary_device_dict = qemu_opts_to_qdict(device_opts,
+            n->primary_device_dict);
+    if (n->primary_device_dict) {
+        g_free(n->standby_id);
+        n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict,
+                    "net_failover_pair_id"));
+    }
+    if (device_opts && g_strcmp0(n->standby_id, n->netclient_name) == 0) {
+        match_found = true;
+    } else {
+        match_found = false;
+        hide = false;
+        g_free(n->standby_id);
+        n->primary_device_dict = NULL;
+        warn_report("primary_device_id not found in device options");
+        goto out;
+    }
+
+    n->primary_device_opts = device_opts;
+
+    /* primary_should_be_hidden is set during feature negotiation */
+    hide = atomic_read(&n->primary_should_be_hidden);
+
+    if (n->primary_device_dict) {
+        g_free(n->primary_device_id);
+        n->primary_device_id = g_strdup(qdict_get_try_str(
+                    n->primary_device_dict, "id"));
+        if (!n->primary_device_id) {
+            warn_report("primary_device_id not set");
+        }
+    }
+
+out:
+    if (match_found && hide) {
+        return 1;
+    } else if (match_found && !hide) {
+        return 0;
+    } else {
+        return -1;
+    }
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -2660,6 +2925,16 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
     }
 
+    if (n->failover) {
+        n->primary_listener.should_be_hidden =
+            virtio_net_primary_should_be_hidden;
+        atomic_set(&n->primary_should_be_hidden, true);
+        device_listener_register(&n->primary_listener);
+        n->migration_state.notify = virtio_net_migration_state_notifier;
+        add_migration_state_change_notifier(&n->migration_state);
+        n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
@@ -2782,6 +3057,13 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
     g_free(n->mac_table.macs);
     g_free(n->vlans);
 
+    if (n->failover) {
+        g_free(n->primary_device_id);
+        g_free(n->standby_id);
+        qobject_unref(n->primary_device_dict);
+        n->primary_device_dict = NULL;
+    }
+
     max_queues = n->multiqueue ? n->max_queues : 1;
     for (i = 0; i < max_queues; i++) {
         virtio_net_del_queue(n, i);
@@ -2819,6 +3101,23 @@ static int virtio_net_pre_save(void *opaque)
     return 0;
 }
 
+static bool primary_unplug_pending(void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIONet *n = VIRTIO_NET(vdev);
+
+    return n->primary_dev ? n->primary_dev->pending_deleted_event : false;
+}
+
+static bool dev_unplug_pending(void *opaque)
+{
+    DeviceState *dev = opaque;
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(dev);
+
+    return vdc->primary_unplug_pending(dev);
+}
+
 static const VMStateDescription vmstate_virtio_net = {
     .name = "virtio-net",
     .minimum_version_id = VIRTIO_NET_VM_VERSION,
@@ -2828,6 +3127,7 @@ static const VMStateDescription vmstate_virtio_net = {
         VMSTATE_END_OF_LIST()
     },
     .pre_save = virtio_net_pre_save,
+    .dev_unplug_pending = dev_unplug_pending,
 };
 
 static Property virtio_net_properties[] = {
@@ -2889,6 +3189,7 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_BOOL("failover", VirtIONet, failover, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -2913,6 +3214,7 @@ static void virtio_net_class_init(ObjectClass *klass, void *data)
     vdc->guest_notifier_pending = virtio_net_guest_notifier_pending;
     vdc->legacy_features |= (0x1 << VIRTIO_NET_F_GSO);
     vdc->vmsd = &vmstate_virtio_net_device;
+    vdc->primary_unplug_pending = primary_unplug_pending;
 }
 
 static const TypeInfo virtio_net_info = {
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b96f0c643f..3da4ca382a 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -18,6 +18,7 @@
 #include "standard-headers/linux/virtio_net.h"
 #include "hw/virtio/virtio.h"
 #include "net/announce.h"
+#include "qemu/option_int.h"
 
 #define TYPE_VIRTIO_NET "virtio-net-device"
 #define VIRTIO_NET(obj) \
@@ -43,6 +44,7 @@ typedef struct virtio_net_conf
     int32_t speed;
     char *duplex_str;
     uint8_t duplex;
+    char *primary_id_str;
 } virtio_net_conf;
 
 /* Coalesced packets type & status */
@@ -185,6 +187,16 @@ struct VirtIONet {
     AnnounceTimer announce_timer;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    QemuOpts *primary_device_opts;
+    QDict *primary_device_dict;
+    DeviceState *primary_dev;
+    BusState *primary_bus;
+    char *primary_device_id;
+    char *standby_id;
+    bool primary_should_be_hidden;
+    bool failover;
+    DeviceListener primary_listener;
+    Notifier migration_state;
 };
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 48e8d04ff6..0c857ecf1a 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -159,6 +159,7 @@ typedef struct VirtioDeviceClass {
     void (*save)(VirtIODevice *vdev, QEMUFile *f);
     int (*load)(VirtIODevice *vdev, QEMUFile *f, int version_id);
     const VMStateDescription *vmsd;
+    bool (*primary_unplug_pending)(void *opaque);
 } VirtioDeviceClass;
 
 void virtio_instance_init_common(Object *proxy_obj, void *data,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v5 11/11] vfio: unplug failover primary device before migration
  2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
                   ` (9 preceding siblings ...)
  2019-10-23  8:27 ` [PATCH v5 10/11] net/virtio: add failover support Jens Freimann
@ 2019-10-23  8:27 ` Jens Freimann
  2019-10-23 18:28   ` Alex Williamson
  10 siblings, 1 reply; 34+ messages in thread
From: Jens Freimann @ 2019-10-23  8:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

As usual block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 but also add a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/vfio/pci.c | 26 ++++++++++++++++++++------
 hw/vfio/pci.h |  1 +
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..5dadfec6e8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -40,6 +40,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)    OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         return;
     }
 
+    if (!pdev->net_failover_pair_id) {
+        error_setg(&vdev->migration_blocker,
+                "VFIO device doesn't support migration");
+        ret = migrate_add_blocker(vdev->migration_blocker, &err);
+        if (err) {
+            error_propagate(errp, err);
+            error_free(vdev->migration_blocker);
+            return;
+        }
+    }
+
     vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
     vdev->vbasedev.ops = &vfio_pci_ops;
     vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
@@ -3008,6 +3020,10 @@ out_teardown:
     vfio_bars_exit(vdev);
 error:
     error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+    if (vdev->migration_blocker) {
+        migrate_del_blocker(vdev->migration_blocker);
+        error_free(vdev->migration_blocker);
+    }
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj)
     vfio_bars_finalize(vdev);
     g_free(vdev->emulated_config_bits);
     g_free(vdev->rom);
+    if (vdev->migration_blocker) {
+        migrate_del_blocker(vdev->migration_blocker);
+        error_free(vdev->migration_blocker);
+    }
     /*
      * XXX Leaking igd_opregion is not an oversight, we can't remove the
      * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
-    .name = "vfio-pci",
-    .unmigratable = 1,
-};
-
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     dc->props = vfio_pci_dev_properties;
-    dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..b329d50338 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice {
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
     VFIODisplay *dpy;
+    Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 05/11] qapi: add unplug primary event
  2019-10-23  8:27 ` [PATCH v5 05/11] qapi: add unplug primary event Jens Freimann
@ 2019-10-23 11:32   ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2019-10-23 11:32 UTC (permalink / raw)
  To: Jens Freimann, qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan, parav

On 10/23/19 3:27 AM, Jens Freimann wrote:
> This event is emitted when we sent a request to unplug a
> failover primary device from the Guest OS and it includes the
> device id of the primary device.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>   qapi/migration.json | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 82feb5bd39..52e69e2868 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1448,3 +1448,22 @@
>   # Since: 3.0
>   ##
>   { 'command': 'migrate-pause', 'allow-oob': true }
> +
> +##
> +# @UNPLUG_PRIMARY:
> +#
> +# Emitted from source side of a migration when migration state is
> +# WAIT_UNPLUG. Device was unplugged by guest operating system.
> +# Device resources in QEMU are kept on standby to be able to re-plug it in case
> +# of migration failure.
> +#
> +# @device_id: QEMU device id of the unplugged device
> +#
> +# Since: 4.2
> +#
> +# Example:
> +#   {"event": "UNPLUG_PRIMARY", "data": {"device_id": "hostdev0"} }

Should be device-id


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23  8:27 ` [PATCH v5 02/11] pci: add option for net failover Jens Freimann
@ 2019-10-23 18:06   ` Alex Williamson
  2019-10-23 19:30     ` Jens Freimann
  2019-10-25 10:52     ` Jens Freimann
  2019-10-24  5:03   ` Parav Pandit
  2019-10-24 17:22   ` Dr. David Alan Gilbert
  2 siblings, 2 replies; 34+ messages in thread
From: Alex Williamson @ 2019-10-23 18:06 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, 23 Oct 2019 10:27:02 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> This patch adds a net_failover_pair_id property to PCIDev which is
> used to link the primary device in a failover pair (the PCI dev) to
> a standby (a virtio-net-pci) device.
> 
> It only supports ethernet devices. Also currently it only supports
> PCIe devices. QEMU will exit with an error message otherwise.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>  hw/pci/pci.c         | 17 +++++++++++++++++
>  include/hw/pci/pci.h |  3 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index aa05c2b9b2..fa9b5219f8 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -75,6 +75,8 @@ static Property pci_props[] = {
>                      QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
>      DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
> +    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
> +            net_failover_pair_id),

Nit, white space appears broken here.

>      DEFINE_PROP_END_OF_LIST()
>  };
>  
> @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>      ObjectClass *klass = OBJECT_CLASS(pc);
>      Error *local_err = NULL;
>      bool is_default_rom;
> +    uint16_t class_id;
>  
>      /* initialize cap_present for pci_is_express() and pci_config_size(),
>       * Note that hybrid PCIs are not set automatically and need to manage
> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>          }
>      }
>  
> +    if (pci_dev->net_failover_pair_id) {
> +        if (!pci_is_express(pci_dev)) {
> +            error_setg(errp, "failover device is not a PCIExpress device");
> +            error_propagate(errp, local_err);
> +            return;
> +        }

Did we decide we don't need to test that the device is also in a
hotpluggable slot?  Are there also multi-function considerations that
should be prevented or documented?  For example, if a user tries to
configure both the primary and failover NICs in the same slot, I assume
bad things will happen.

> +        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
> +        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
> +            error_setg(errp, "failover device is not an Ethernet device");
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }

Looks like cleanup is missing both both error cases, the option rom
error path below this does a pci_qdev_unrealize() before returning so
I'd assume we want the same here.  Thanks,

Alex

> +
>      /* rom loading */
>      is_default_rom = false;
>      if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index f3f0ffd5fb..def5435685 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -352,6 +352,9 @@ struct PCIDevice {
>      MSIVectorUseNotifier msix_vector_use_notifier;
>      MSIVectorReleaseNotifier msix_vector_release_notifier;
>      MSIVectorPollNotifier msix_vector_poll_notifier;
> +
> +    /* ID of standby device in net_failover pair */
> +    char *net_failover_pair_id;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 11/11] vfio: unplug failover primary device before migration
  2019-10-23  8:27 ` [PATCH v5 11/11] vfio: unplug failover primary device before migration Jens Freimann
@ 2019-10-23 18:28   ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2019-10-23 18:28 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, 23 Oct 2019 10:27:11 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> As usual block all vfio-pci devices from being migrated, but make an
> exception for failover primary devices. This is achieved by setting
> unmigratable to 0 but also add a migration blocker for all vfio-pci
> devices except failover primary devices. These will be unplugged before
> migration happens by the migration handler of the corresponding
> virtio-net standby device.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>  hw/vfio/pci.c | 26 ++++++++++++++++++++------
>  hw/vfio/pci.h |  1 +
>  2 files changed, 21 insertions(+), 6 deletions(-)

I wonder if this might be cleaner if we add the migration
unconditionally initially, then remove it for specific cases, such as
this or when we start to see mdev devices that can actually support
migration.  That can be refined as we start to introduce the latter
though.

Acked-by: Alex Williamson <alex.williamson@redhat.com>

> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 12fac39804..5dadfec6e8 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -40,6 +40,7 @@
>  #include "pci.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "migration/blocker.h"
>  
>  #define TYPE_VFIO_PCI "vfio-pci"
>  #define PCI_VFIO(obj)    OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
> @@ -2732,6 +2733,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          return;
>      }
>  
> +    if (!pdev->net_failover_pair_id) {
> +        error_setg(&vdev->migration_blocker,
> +                "VFIO device doesn't support migration");
> +        ret = migrate_add_blocker(vdev->migration_blocker, &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            error_free(vdev->migration_blocker);
> +            return;
> +        }
> +    }
> +
>      vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
>      vdev->vbasedev.ops = &vfio_pci_ops;
>      vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
> @@ -3008,6 +3020,10 @@ out_teardown:
>      vfio_bars_exit(vdev);
>  error:
>      error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> +    if (vdev->migration_blocker) {
> +        migrate_del_blocker(vdev->migration_blocker);
> +        error_free(vdev->migration_blocker);
> +    }
>  }
>  
>  static void vfio_instance_finalize(Object *obj)
> @@ -3019,6 +3035,10 @@ static void vfio_instance_finalize(Object *obj)
>      vfio_bars_finalize(vdev);
>      g_free(vdev->emulated_config_bits);
>      g_free(vdev->rom);
> +    if (vdev->migration_blocker) {
> +        migrate_del_blocker(vdev->migration_blocker);
> +        error_free(vdev->migration_blocker);
> +    }
>      /*
>       * XXX Leaking igd_opregion is not an oversight, we can't remove the
>       * fw_cfg entry therefore leaking this allocation seems like the safest
> @@ -3151,11 +3171,6 @@ static Property vfio_pci_dev_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> -static const VMStateDescription vfio_pci_vmstate = {
> -    .name = "vfio-pci",
> -    .unmigratable = 1,
> -};
> -
>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -3163,7 +3178,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>  
>      dc->reset = vfio_pci_reset;
>      dc->props = vfio_pci_dev_properties;
> -    dc->vmsd = &vfio_pci_vmstate;
>      dc->desc = "VFIO-based PCI device assignment";
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      pdc->realize = vfio_realize;
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 834a90d646..b329d50338 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -168,6 +168,7 @@ typedef struct VFIOPCIDevice {
>      bool no_vfio_ioeventfd;
>      bool enable_ramfb;
>      VFIODisplay *dpy;
> +    Error *migration_blocker;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23 18:06   ` Alex Williamson
@ 2019-10-23 19:30     ` Jens Freimann
  2019-10-23 20:02       ` Alex Williamson
  2019-10-25 10:52     ` Jens Freimann
  1 sibling, 1 reply; 34+ messages in thread
From: Jens Freimann @ 2019-10-23 19:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
>On Wed, 23 Oct 2019 10:27:02 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
>
>> This patch adds a net_failover_pair_id property to PCIDev which is
>> used to link the primary device in a failover pair (the PCI dev) to
>> a standby (a virtio-net-pci) device.
>>
>> It only supports ethernet devices. Also currently it only supports
>> PCIe devices. QEMU will exit with an error message otherwise.
>>
>> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
>> ---
>>  hw/pci/pci.c         | 17 +++++++++++++++++
>>  include/hw/pci/pci.h |  3 +++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index aa05c2b9b2..fa9b5219f8 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -75,6 +75,8 @@ static Property pci_props[] = {
>>                      QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
>>      DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
>>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
>> +    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
>> +            net_failover_pair_id),
>
>Nit, white space appears broken here.

I'll fix it.

>>      DEFINE_PROP_END_OF_LIST()
>>  };
>>
>> @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>      ObjectClass *klass = OBJECT_CLASS(pc);
>>      Error *local_err = NULL;
>>      bool is_default_rom;
>> +    uint16_t class_id;
>>
>>      /* initialize cap_present for pci_is_express() and pci_config_size(),
>>       * Note that hybrid PCIs are not set automatically and need to manage
>> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>          }
>>      }
>>
>> +    if (pci_dev->net_failover_pair_id) {
>> +        if (!pci_is_express(pci_dev)) {
>> +            error_setg(errp, "failover device is not a PCIExpress device");
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>
>Did we decide we don't need to test that the device is also in a
>hotpluggable slot?  

Hmm, my reply to you was never sent. I thought the check for
qdev_device_add() was sufficient but you said that works only
after qdev_hotplug is set (after machine creation). I modified
the check to this:

    hide = should_hide_device(opts);                                                                      
                                                                                                          
    if ((hide || qdev_hotplug) && bus && !qbus_is_hotpluggable(bus)) {                                    
        error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name);                                                 
        return NULL;                                                                                      
    }                                                                                                     
                                                                                                          
    if (hide) {                                                                                           
        return NULL;                                                                                      
    }

This will make qemu fail when we have the device in the initial
configuration or when we hotplug it to a bus that doesn't support it.
I tested both with a device on pcie.0. Am I missing something? 

>Are there also multi-function considerations that
>should be prevented or documented?  For example, if a user tries to
>configure both the primary and failover NICs in the same slot, I assume
>bad things will happen.

  I would have expected that this is already checked in pci code, but
it is not. I tried it and when I put both devices into the same slot
they are both unplugged from the guest during boot but nothing else
happens. I don't know what triggers that unplug of the devices.

I'm not aware of any other problems regarding multi-function, which
doesn't mean there aren't any. 

>
>> +        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
>> +        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
>> +            error_setg(errp, "failover device is not an Ethernet device");
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +    }
>
>Looks like cleanup is missing both both error cases, the option rom
>error path below this does a pci_qdev_unrealize() before returning so
>I'd assume we want the same here.  Thanks,

Thanks, I'll fix this too.

regards,
Jens 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23 19:30     ` Jens Freimann
@ 2019-10-23 20:02       ` Alex Williamson
  2019-10-23 20:31         ` Jens Freimann
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2019-10-23 20:02 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, 23 Oct 2019 21:30:35 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
> >On Wed, 23 Oct 2019 10:27:02 +0200
> >Jens Freimann <jfreimann@redhat.com> wrote:
> >  
> >> This patch adds a net_failover_pair_id property to PCIDev which is
> >> used to link the primary device in a failover pair (the PCI dev) to
> >> a standby (a virtio-net-pci) device.
> >>
> >> It only supports ethernet devices. Also currently it only supports
> >> PCIe devices. QEMU will exit with an error message otherwise.
> >>
> >> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> >> ---
> >>  hw/pci/pci.c         | 17 +++++++++++++++++
> >>  include/hw/pci/pci.h |  3 +++
> >>  2 files changed, 20 insertions(+)
> >>
> >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >> index aa05c2b9b2..fa9b5219f8 100644
> >> --- a/hw/pci/pci.c
> >> +++ b/hw/pci/pci.c
> >> @@ -75,6 +75,8 @@ static Property pci_props[] = {
> >>                      QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
> >>      DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
> >>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
> >> +    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
> >> +            net_failover_pair_id),  
> >
> >Nit, white space appears broken here.  
> 
> I'll fix it.
> 
> >>      DEFINE_PROP_END_OF_LIST()
> >>  };
> >>
> >> @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >>      ObjectClass *klass = OBJECT_CLASS(pc);
> >>      Error *local_err = NULL;
> >>      bool is_default_rom;
> >> +    uint16_t class_id;
> >>
> >>      /* initialize cap_present for pci_is_express() and pci_config_size(),
> >>       * Note that hybrid PCIs are not set automatically and need to manage
> >> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >>          }
> >>      }
> >>
> >> +    if (pci_dev->net_failover_pair_id) {
> >> +        if (!pci_is_express(pci_dev)) {
> >> +            error_setg(errp, "failover device is not a PCIExpress device");
> >> +            error_propagate(errp, local_err);
> >> +            return;
> >> +        }  
> >
> >Did we decide we don't need to test that the device is also in a
> >hotpluggable slot?    
> 
> Hmm, my reply to you was never sent. I thought the check for
> qdev_device_add() was sufficient but you said that works only
> after qdev_hotplug is set (after machine creation). I modified
> the check to this:
> 
>     hide = should_hide_device(opts);                                                                      
>                                                                                                           
>     if ((hide || qdev_hotplug) && bus && !qbus_is_hotpluggable(bus)) {                                    
>         error_setg(errp, QERR_BUS_NO_HOTPLUG, bus->name);                                                 
>         return NULL;                                                                                      
>     }                                                                                                     
>                                                                                                           
>     if (hide) {                                                                                           
>         return NULL;                                                                                      
>     }
> 
> This will make qemu fail when we have the device in the initial
> configuration or when we hotplug it to a bus that doesn't support it.
> I tested both with a device on pcie.0. Am I missing something? 

Nope, sorry, I was expecting the check here and didn't see that you
perform it elsewhere.  Seems good enough for me.
 
> >Are there also multi-function considerations that
> >should be prevented or documented?  For example, if a user tries to
> >configure both the primary and failover NICs in the same slot, I assume
> >bad things will happen.  
> 
>   I would have expected that this is already checked in pci code, but
> it is not. I tried it and when I put both devices into the same slot
> they are both unplugged from the guest during boot but nothing else
> happens. I don't know what triggers that unplug of the devices.
> 
> I'm not aware of any other problems regarding multi-function, which
> doesn't mean there aren't any. 

Hmm, was the hidden device at function #0?  The guest won't find any
functions if function #0 isn't present, but I don't know what would
trigger the hotplug.  The angle I'm thinking is that we only have slot
level granularity for hotplug, so any sort of automatic hotplug of a
slot should probably think about bystander devices within the slot.
Thanks,

Alex

> >> +        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
> >> +        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
> >> +            error_setg(errp, "failover device is not an Ethernet device");
> >> +            error_propagate(errp, local_err);
> >> +            return;
> >> +        }
> >> +    }  
> >
> >Looks like cleanup is missing both both error cases, the option rom
> >error path below this does a pci_qdev_unrealize() before returning so
> >I'd assume we want the same here.  Thanks,  
> 
> Thanks, I'll fix this too.
> 
> regards,
> Jens 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23 20:02       ` Alex Williamson
@ 2019-10-23 20:31         ` Jens Freimann
  2019-10-23 21:15           ` Alex Williamson
  0 siblings, 1 reply; 34+ messages in thread
From: Jens Freimann @ 2019-10-23 20:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, Oct 23, 2019 at 02:02:11PM -0600, Alex Williamson wrote:
>On Wed, 23 Oct 2019 21:30:35 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
>
>> On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
>> >On Wed, 23 Oct 2019 10:27:02 +0200
>> >Jens Freimann <jfreimann@redhat.com> wrote:
[...]
>> >Are there also multi-function considerations that
>> >should be prevented or documented?  For example, if a user tries to
>> >configure both the primary and failover NICs in the same slot, I assume
>> >bad things will happen.
>>
>>   I would have expected that this is already checked in pci code, but
>> it is not. I tried it and when I put both devices into the same slot
>> they are both unplugged from the guest during boot but nothing else
>> happens. I don't know what triggers that unplug of the devices.
>>
>> I'm not aware of any other problems regarding multi-function, which
>> doesn't mean there aren't any.
>
>Hmm, was the hidden device at function #0?  The guest won't find any
>functions if function #0 isn't present, but I don't know what would
>trigger the hotplug.  The angle I'm thinking is that we only have slot
>level granularity for hotplug, so any sort of automatic hotplug of a
>slot should probably think about bystander devices within the slot.

Yes that would be a problem, but isn't it the same in the non-failover case
where a user configures it wrong? The slot where the device is plugged is not
chosen automatically it's configured by the user, no? I might be mixing something
up here.  I have no idea yet how to check if a slot is already populated, but
I'll think about it. 

Thanks!

regards,
Jens 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23 20:31         ` Jens Freimann
@ 2019-10-23 21:15           ` Alex Williamson
  2019-10-24 17:57             ` Laine Stump
  0 siblings, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2019-10-23 21:15 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, 23 Oct 2019 22:31:37 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> On Wed, Oct 23, 2019 at 02:02:11PM -0600, Alex Williamson wrote:
> >On Wed, 23 Oct 2019 21:30:35 +0200
> >Jens Freimann <jfreimann@redhat.com> wrote:
> >  
> >> On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:  
> >> >On Wed, 23 Oct 2019 10:27:02 +0200
> >> >Jens Freimann <jfreimann@redhat.com> wrote:  
> [...]
> >> >Are there also multi-function considerations that
> >> >should be prevented or documented?  For example, if a user tries to
> >> >configure both the primary and failover NICs in the same slot, I assume
> >> >bad things will happen.  
> >>
> >>   I would have expected that this is already checked in pci code, but
> >> it is not. I tried it and when I put both devices into the same slot
> >> they are both unplugged from the guest during boot but nothing else
> >> happens. I don't know what triggers that unplug of the devices.
> >>
> >> I'm not aware of any other problems regarding multi-function, which
> >> doesn't mean there aren't any.  
> >
> >Hmm, was the hidden device at function #0?  The guest won't find any
> >functions if function #0 isn't present, but I don't know what would
> >trigger the hotplug.  The angle I'm thinking is that we only have slot
> >level granularity for hotplug, so any sort of automatic hotplug of a
> >slot should probably think about bystander devices within the slot.  
> 
> Yes that would be a problem, but isn't it the same in the non-failover case
> where a user configures it wrong? The slot where the device is plugged is not
> chosen automatically it's configured by the user, no? I might be mixing something
> up here.  I have no idea yet how to check if a slot is already populated, but
> I'll think about it. 

I don't think libvirt will automatically make use of multifunction
endpoints, except maybe for some built-in devices, so yes it probably
would be up to the user to explicitly create a multifunction device.
But are there other scenarios that generate an automatic hot-unplug?
If a user creates a multifunction slot and then triggers a hot-unplug
themselves, it's easy to place the blame on the user if the result is
unexpected, but is it so obviously a user configuration error if the
hotplug occurs as an automatic response to a migration?  I'm not as
sure about that.

As indicated, I don't know whether this should just be documented or if
we should spend time preventing it, but someone, somewhere will
probably think it's a good idea to put their primary and failover NIC
in the same slot and be confused that the underlying mechanisms cannot
support it.  It doesn't appear that it would be too difficult to test
QEMU_PCI_CAP_MULTIFUNCTION (not set) and PCI_FUNC (is 0) for the
primary, but maybe I'm just being paranoid.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23  8:27 ` [PATCH v5 02/11] pci: add option for net failover Jens Freimann
  2019-10-23 18:06   ` Alex Williamson
@ 2019-10-24  5:03   ` Parav Pandit
  2019-10-24  9:37     ` Jens Freimann
  2019-10-24 17:22   ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 34+ messages in thread
From: Parav Pandit @ 2019-10-24  5:03 UTC (permalink / raw)
  To: Jens Freimann, qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	alex.williamson, laine, ailan



> -----Original Message-----
> From: Jens Freimann <jfreimann@redhat.com>
> Sent: Wednesday, October 23, 2019 3:27 AM
> To: qemu-devel@nongnu.org
> Cc: ehabkost@redhat.com; mst@redhat.com; berrange@redhat.com;
> pkrempa@redhat.com; laine@redhat.com; aadam@redhat.com;
> ailan@redhat.com; Parav Pandit <parav@mellanox.com>;
> dgilbert@redhat.com; alex.williamson@redhat.com
> Subject: [PATCH v5 02/11] pci: add option for net failover
> 
> This patch adds a net_failover_pair_id property to PCIDev which is used to
> link the primary device in a failover pair (the PCI dev) to a standby (a virtio-
> net-pci) device.
> 
> It only supports ethernet devices. Also currently it only supports PCIe
> devices. QEMU will exit with an error message otherwise.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>  hw/pci/pci.c         | 17 +++++++++++++++++
>  include/hw/pci/pci.h |  3 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index aa05c2b9b2..fa9b5219f8 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -75,6 +75,8 @@ static Property pci_props[] = {
>                      QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
>      DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
> +    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
> +            net_failover_pair_id),
>      DEFINE_PROP_END_OF_LIST()
>  };
> 
> @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev,
> Error **errp)
>      ObjectClass *klass = OBJECT_CLASS(pc);
>      Error *local_err = NULL;
>      bool is_default_rom;
> +    uint16_t class_id;
> 
>      /* initialize cap_present for pci_is_express() and pci_config_size(),
>       * Note that hybrid PCIs are not set automatically and need to manage
> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev,
> Error **errp)
>          }
>      }
> 
> +    if (pci_dev->net_failover_pair_id) {
> +        if (!pci_is_express(pci_dev)) {

I am testing and integrating this piece with mlx5 devices.
I see that pci_is_express() return true only for first PCI function.
Didn't yet dig the API.
Commenting out this check and below class check progresses further.

While reviewing, I realized that we shouldn't have this check for below reasons.

1. It is user's responsibility to pass networking device.
But its ok to check the class, if PCI Device is passed.
So class comparison should be inside the pci_check().

2. It is limiting to only consider PCI devices. 
Automated and regression tests should be able validate this feature without PCI Device.
This will enhance the stability of feature in long run.

3. net failover driver doesn't limit it to have it over only PCI device.
So similarly hypervisor should be limiting.

> +            error_setg(errp, "failover device is not a PCIExpress device");
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
> +        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
> +            error_setg(errp, "failover device is not an Ethernet device");
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
>      /* rom loading */
>      is_default_rom = false;
>      if (pci_dev->romfile == NULL && pc->romfile != NULL) { diff --git
> a/include/hw/pci/pci.h b/include/hw/pci/pci.h index f3f0ffd5fb..def5435685
> 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -352,6 +352,9 @@ struct PCIDevice {
>      MSIVectorUseNotifier msix_vector_use_notifier;
>      MSIVectorReleaseNotifier msix_vector_release_notifier;
>      MSIVectorPollNotifier msix_vector_poll_notifier;
> +
> +    /* ID of standby device in net_failover pair */
> +    char *net_failover_pair_id;
>  };
> 
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> --
> 2.21.0



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-24  5:03   ` Parav Pandit
@ 2019-10-24  9:37     ` Jens Freimann
  2019-10-24 16:34       ` Parav Pandit
  2019-10-24 16:52       ` Alex Williamson
  0 siblings, 2 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-24  9:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	alex.williamson, laine, ailan

On Thu, Oct 24, 2019 at 05:03:46AM +0000, Parav Pandit wrote:
>> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev,
>> Error **errp)
>>          }
>>      }
>>
>> +    if (pci_dev->net_failover_pair_id) {
>> +        if (!pci_is_express(pci_dev)) {
>
>I am testing and integrating this piece with mlx5 devices.
>I see that pci_is_express() return true only for first PCI function.
>Didn't yet dig the API.
>Commenting out this check and below class check progresses further.

First of all, thanks for testing this!
Could you share your commandline please? I can't reproduce it.
>
>While reviewing, I realized that we shouldn't have this check for below reasons.
>
>1. It is user's responsibility to pass networking device.
>But its ok to check the class, if PCI Device is passed.
>So class comparison should be inside the pci_check().

I'm not sure I understand this point, could you please elaborate?
You're suggesting to move the check for the class into the check for
pci_is_express?

>2. It is limiting to only consider PCI devices.
>Automated and regression tests should be able validate this feature without PCI Device.
>This will enhance the stability of feature in long run.
>
>3. net failover driver doesn't limit it to have it over only PCI device.
>So similarly hypervisor should be limiting.

I agree that we don't have to limit it to PCI(e) forever. But for this
first shot I think we should and then extend it continually. There are
more things we can support in the future like other hotplug types etc.

regards,
Jens  



^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v5 02/11] pci: add option for net failover
  2019-10-24  9:37     ` Jens Freimann
@ 2019-10-24 16:34       ` Parav Pandit
  2019-10-24 17:06         ` Alex Williamson
  2019-10-24 16:52       ` Alex Williamson
  1 sibling, 1 reply; 34+ messages in thread
From: Parav Pandit @ 2019-10-24 16:34 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	alex.williamson, laine, ailan



> -----Original Message-----
> From: Jens Freimann <jfreimann@redhat.com>
> Sent: Thursday, October 24, 2019 4:38 AM
> To: Parav Pandit <parav@mellanox.com>
> Cc: qemu-devel@nongnu.org; ehabkost@redhat.com; mst@redhat.com;
> berrange@redhat.com; pkrempa@redhat.com; laine@redhat.com;
> aadam@redhat.com; ailan@redhat.com; dgilbert@redhat.com;
> alex.williamson@redhat.com
> Subject: Re: [PATCH v5 02/11] pci: add option for net failover
> 
> On Thu, Oct 24, 2019 at 05:03:46AM +0000, Parav Pandit wrote:
> >> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState
> >> *qdev, Error **errp)
> >>          }
> >>      }
> >>
> >> +    if (pci_dev->net_failover_pair_id) {
> >> +        if (!pci_is_express(pci_dev)) {
> >
> >I am testing and integrating this piece with mlx5 devices.
> >I see that pci_is_express() return true only for first PCI function.
> >Didn't yet dig the API.
> >Commenting out this check and below class check progresses further.
> 
> First of all, thanks for testing this!
> Could you share your commandline please? I can't reproduce it.
> >
I added debug prints to get the difference between VF1 and VF2 behavior.
What I see is, vfio_populate_device() below code is activated for VF2 where qemu claims that its not a PCIe device.

    vdev->config_size = reg_info->size;
    if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
        vdev->pdev.cap_present &= ~QEMU_PCI_CAP_EXPRESS;
        printf("%s clearing QEMU PCI bits\n", __func__);
    }

Command line:
/usr/local/bin/qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
               -machine q35,usb=off,vmport=off,dump-guest-core=off -cpu Haswell-noTSX-IBRS \
           -net none \
               -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5556,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,id=hostnet1,fd=4 4<>/dev/tap49\
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:02:02:02,bus=root2,failover=on \
        -device vfio-pci,id=hostdev0,host=05:00.2,bus=root1,net_failover_pair_id=net1 \
        /var/lib/libvirt/images/sriov-lm-02.qcow2

> >While reviewing, I realized that we shouldn't have this check for below
> reasons.
> >
> >1. It is user's responsibility to pass networking device.
> >But its ok to check the class, if PCI Device is passed.
> >So class comparison should be inside the pci_check().
> 
> I'm not sure I understand this point, could you please elaborate?
> You're suggesting to move the check for the class into the check for
> pci_is_express?
> 
No. Below is the suggestion.

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8fbf32d68c..8004309973 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2105,17 +2105,14 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
     }

     if (pci_dev->net_failover_pair_id) {
-        if (!pci_is_express(pci_dev)) {
-            error_setg(errp, "failover device is not a PCIExpress device");
-            error_propagate(errp, local_err);
-            return;
-        }
-        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
-        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
-            error_setg(errp, "failover device is not an Ethernet device");
-            error_propagate(errp, local_err);
-            return;
-        }
+        if (pci_is_express(pci_dev)) {
+            class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
+            if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
+                error_setg(errp, "failover device is not an Ethernet device");
+                error_propagate(errp, local_err);
+                return;
+            }
+       }

This will allow to map non PCI device as failover too.
After writing above hunk I think that when code reaches to check for 
If (pci_dev->net_failover_pair_id),... it is already gone gone through do_pci_register_device().
There should not be any check needed again for pci_is_express().
Isn't it?


> >2. It is limiting to only consider PCI devices.
> >Automated and regression tests should be able validate this feature without
> PCI Device.
> >This will enhance the stability of feature in long run.
> >
> >3. net failover driver doesn't limit it to have it over only PCI device.
> >So similarly hypervisor should be limiting.
> 
> I agree that we don't have to limit it to PCI(e) forever. But for this first shot I
> think we should and then extend it continually. There are more things we can
> support in the future like other hotplug types etc.
> 
o.k. But probably net_failover_pair_id field should be in DeviceState instead of PCIDevice at minimum?
Or you want to refactor it later?


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-24  9:37     ` Jens Freimann
  2019-10-24 16:34       ` Parav Pandit
@ 2019-10-24 16:52       ` Alex Williamson
  2019-10-24 20:08         ` Jens Freimann
  1 sibling, 1 reply; 34+ messages in thread
From: Alex Williamson @ 2019-10-24 16:52 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, Parav Pandit, mst, aadam, qemu-devel,
	dgilbert, laine, ailan, ehabkost

On Thu, 24 Oct 2019 11:37:54 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> On Thu, Oct 24, 2019 at 05:03:46AM +0000, Parav Pandit wrote:
> >> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev,
> >> Error **errp)
> >>          }
> >>      }
> >>
> >> +    if (pci_dev->net_failover_pair_id) {
> >> +        if (!pci_is_express(pci_dev)) {  
> >
> >I am testing and integrating this piece with mlx5 devices.
> >I see that pci_is_express() return true only for first PCI function.
> >Didn't yet dig the API.
> >Commenting out this check and below class check progresses further.  
> 
> First of all, thanks for testing this!
> Could you share your commandline please? I can't reproduce it.
> >
> >While reviewing, I realized that we shouldn't have this check for below reasons.
> >
> >1. It is user's responsibility to pass networking device.
> >But its ok to check the class, if PCI Device is passed.
> >So class comparison should be inside the pci_check().  
> 
> I'm not sure I understand this point, could you please elaborate?
> You're suggesting to move the check for the class into the check for
> pci_is_express?

Seems like the suggestion is that net_failover_pair_id should be an
option on the base class of PCIDevice (DeviceState?) and only if it's a
PCI device would we check the class code.  But there are dependencies
at the hotplug controller, which I think is why this is currently
specific to PCI.

However, it's an interesting point about pci_is_express().  This test
is really just meant to check whether the hotplug controller supports
this feature, which is only implemented in pciehp via this series.
There's a bit of a mismatch though that pcie_is_express() checks
whether the device is express, not whether the bus it sits on is
express.  I think we really want the latter, so maybe this should be:

pci_bus_is_express(pci_get_bus(dev)

For example this feature should work if I plug an e1000 (not e1000e)
into an express slot, but not if I plug an e1000e into a conventional
slot.
 
> >2. It is limiting to only consider PCI devices.
> >Automated and regression tests should be able validate this feature without PCI Device.
> >This will enhance the stability of feature in long run.
> >
> >3. net failover driver doesn't limit it to have it over only PCI device.
> >So similarly hypervisor should be limiting.  
> 
> I agree that we don't have to limit it to PCI(e) forever. But for this
> first shot I think we should and then extend it continually. There are
> more things we can support in the future like other hotplug types etc.

Yep, long term it seems very generic, but there's a dependency in the
hotplug controller and it is beneficial that PCI has a class code
feature that allows us to error if this is specified on a non-net
device.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-24 16:34       ` Parav Pandit
@ 2019-10-24 17:06         ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2019-10-24 17:06 UTC (permalink / raw)
  To: Parav Pandit
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, Jens Freimann, ailan

On Thu, 24 Oct 2019 16:34:01 +0000
Parav Pandit <parav@mellanox.com> wrote:

> > -----Original Message-----
> > From: Jens Freimann <jfreimann@redhat.com>
> > Sent: Thursday, October 24, 2019 4:38 AM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: qemu-devel@nongnu.org; ehabkost@redhat.com; mst@redhat.com;
> > berrange@redhat.com; pkrempa@redhat.com; laine@redhat.com;
> > aadam@redhat.com; ailan@redhat.com; dgilbert@redhat.com;
> > alex.williamson@redhat.com
> > Subject: Re: [PATCH v5 02/11] pci: add option for net failover
> > 
> > On Thu, Oct 24, 2019 at 05:03:46AM +0000, Parav Pandit wrote:  
> > >> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState
> > >> *qdev, Error **errp)
> > >>          }
> > >>      }
> > >>
> > >> +    if (pci_dev->net_failover_pair_id) {
> > >> +        if (!pci_is_express(pci_dev)) {  
> > >
> > >I am testing and integrating this piece with mlx5 devices.
> > >I see that pci_is_express() return true only for first PCI function.
> > >Didn't yet dig the API.
> > >Commenting out this check and below class check progresses further.  
> > 
> > First of all, thanks for testing this!
> > Could you share your commandline please? I can't reproduce it.  
> > >  
> I added debug prints to get the difference between VF1 and VF2 behavior.
> What I see is, vfio_populate_device() below code is activated for VF2 where qemu claims that its not a PCIe device.
> 
>     vdev->config_size = reg_info->size;
>     if (vdev->config_size == PCI_CONFIG_SPACE_SIZE) {
>         vdev->pdev.cap_present &= ~QEMU_PCI_CAP_EXPRESS;
>         printf("%s clearing QEMU PCI bits\n", __func__);
>     }
> 
> Command line:
> /usr/local/bin/qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>                -machine q35,usb=off,vmport=off,dump-guest-core=off -cpu Haswell-noTSX-IBRS \
>            -net none \
>                -qmp unix:/tmp/qmp.socket,server,nowait \
>         -monitor telnet:127.0.0.1:5556,server,nowait \
>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>         -netdev tap,id=hostnet1,fd=4 4<>/dev/tap49\
>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:02:02:02,bus=root2,failover=on \
>         -device vfio-pci,id=hostdev0,host=05:00.2,bus=root1,net_failover_pair_id=net1 \
>         /var/lib/libvirt/images/sriov-lm-02.qcow2
> 
> > >While reviewing, I realized that we shouldn't have this check for below  
> > reasons.  
> > >
> > >1. It is user's responsibility to pass networking device.
> > >But its ok to check the class, if PCI Device is passed.
> > >So class comparison should be inside the pci_check().  
> > 
> > I'm not sure I understand this point, could you please elaborate?
> > You're suggesting to move the check for the class into the check for
> > pci_is_express?
> >   
> No. Below is the suggestion.
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 8fbf32d68c..8004309973 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2105,17 +2105,14 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>      }
> 
>      if (pci_dev->net_failover_pair_id) {
> -        if (!pci_is_express(pci_dev)) {
> -            error_setg(errp, "failover device is not a PCIExpress device");
> -            error_propagate(errp, local_err);
> -            return;
> -        }
> -        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
> -        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
> -            error_setg(errp, "failover device is not an Ethernet device");
> -            error_propagate(errp, local_err);
> -            return;
> -        }
> +        if (pci_is_express(pci_dev)) {
> +            class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
> +            if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
> +                error_setg(errp, "failover device is not an Ethernet device");
> +                error_propagate(errp, local_err);
> +                return;
> +            }
> +       }
> 
> This will allow to map non PCI device as failover too.

As in previous email, the point of the check was to exclude devices
when the hotplug controller is known not to support the feature.  It's
a topology check masked as a device check, it only exists because
support at the hotplug controller is not ubiquitous.  Thanks,

Alex

> After writing above hunk I think that when code reaches to check for 
> If (pci_dev->net_failover_pair_id),... it is already gone gone through do_pci_register_device().
> There should not be any check needed again for pci_is_express().
> Isn't it?
> 
> 
> > >2. It is limiting to only consider PCI devices.
> > >Automated and regression tests should be able validate this feature without  
> > PCI Device.  
> > >This will enhance the stability of feature in long run.
> > >
> > >3. net failover driver doesn't limit it to have it over only PCI device.
> > >So similarly hypervisor should be limiting.  
> > 
> > I agree that we don't have to limit it to PCI(e) forever. But for this first shot I
> > think we should and then extend it continually. There are more things we can
> > support in the future like other hotplug types etc.
> >   
> o.k. But probably net_failover_pair_id field should be in DeviceState instead of PCIDevice at minimum?
> Or you want to refactor it later?



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23  8:27 ` [PATCH v5 02/11] pci: add option for net failover Jens Freimann
  2019-10-23 18:06   ` Alex Williamson
  2019-10-24  5:03   ` Parav Pandit
@ 2019-10-24 17:22   ` Dr. David Alan Gilbert
  2019-10-24 19:56     ` Jens Freimann
  2 siblings, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-24 17:22 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel,
	alex.williamson, laine, ailan, parav

* Jens Freimann (jfreimann@redhat.com) wrote:
> This patch adds a net_failover_pair_id property to PCIDev which is
> used to link the primary device in a failover pair (the PCI dev) to
> a standby (a virtio-net-pci) device.
> 
> It only supports ethernet devices. Also currently it only supports
> PCIe devices. QEMU will exit with an error message otherwise.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>  hw/pci/pci.c         | 17 +++++++++++++++++
>  include/hw/pci/pci.h |  3 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index aa05c2b9b2..fa9b5219f8 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -75,6 +75,8 @@ static Property pci_props[] = {
>                      QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
>      DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
> +    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
> +            net_failover_pair_id),

Should we just make this 'failover_pair_id' - then when someone in the
future figures out how to make it work for something else (e.g.
multipath block devices) then it's all good?

Dave

>      DEFINE_PROP_END_OF_LIST()
>  };
>  
> @@ -2077,6 +2079,7 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>      ObjectClass *klass = OBJECT_CLASS(pc);
>      Error *local_err = NULL;
>      bool is_default_rom;
> +    uint16_t class_id;
>  
>      /* initialize cap_present for pci_is_express() and pci_config_size(),
>       * Note that hybrid PCIs are not set automatically and need to manage
> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>          }
>      }
>  
> +    if (pci_dev->net_failover_pair_id) {
> +        if (!pci_is_express(pci_dev)) {
> +            error_setg(errp, "failover device is not a PCIExpress device");
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +        class_id = pci_get_word(pci_dev->config + PCI_CLASS_DEVICE);
> +        if (class_id != PCI_CLASS_NETWORK_ETHERNET) {
> +            error_setg(errp, "failover device is not an Ethernet device");
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
>      /* rom loading */
>      is_default_rom = false;
>      if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index f3f0ffd5fb..def5435685 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -352,6 +352,9 @@ struct PCIDevice {
>      MSIVectorUseNotifier msix_vector_use_notifier;
>      MSIVectorReleaseNotifier msix_vector_release_notifier;
>      MSIVectorPollNotifier msix_vector_poll_notifier;
> +
> +    /* ID of standby device in net_failover pair */
> +    char *net_failover_pair_id;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 06/11] qapi: add failover negotiated event
  2019-10-23  8:27 ` [PATCH v5 06/11] qapi: add failover negotiated event Jens Freimann
@ 2019-10-24 17:32   ` Dr. David Alan Gilbert
  2019-10-24 20:03     ` Jens Freimann
  2019-10-25  5:35   ` Markus Armbruster
  1 sibling, 1 reply; 34+ messages in thread
From: Dr. David Alan Gilbert @ 2019-10-24 17:32 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel,
	alex.williamson, laine, ailan, parav

* Jens Freimann (jfreimann@redhat.com) wrote:
> This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
> feature was not negotiated during virtio feature negotiation. If this
> event is received it means any primary devices hotplugged before
> this were were never really added to QEMU devices.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>

Can I just understand a bit more about what the meaning of this is.

Say my VM boots:
   a) BIOS
   b) Boot loader
   c) Linux
   d) Reboots
      (possibly a',b', different c')

When would I get that event?
When can libvirt know it can use it?

Dave

> ---
>  qapi/net.json | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/qapi/net.json b/qapi/net.json
> index 728990f4fb..8c5f3f1fb2 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -737,3 +737,19 @@
>  ##
>  { 'command': 'announce-self', 'boxed': true,
>    'data' : 'AnnounceParameters'}
> +
> +##
> +# @FAILOVER_NEGOTIATED:
> +#
> +# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation
> +#
> +# Since: 4.2
> +#
> +# Example:
> +#
> +# <- { "event": "FAILOVER_NEGOTIATED",
> +#      "data": {} }
> +#
> +##
> +{ 'event': 'FAILOVER_NEGOTIATED',
> +  'data': {} }
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23 21:15           ` Alex Williamson
@ 2019-10-24 17:57             ` Laine Stump
  0 siblings, 0 replies; 34+ messages in thread
From: Laine Stump @ 2019-10-24 17:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, berrange, ehabkost, mst, aadam, dgilbert,
	Alex Williamson, Jens Freimann, ailan, parav

On 10/23/19 5:15 PM, Alex Williamson wrote:
> On Wed, 23 Oct 2019 22:31:37 +0200
> Jens Freimann <jfreimann@redhat.com> wrote:
> 
>> On Wed, Oct 23, 2019 at 02:02:11PM -0600, Alex Williamson wrote:
>>> On Wed, 23 Oct 2019 21:30:35 +0200
>>> Jens Freimann <jfreimann@redhat.com> wrote:
>>>   
>>>> On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
>>>>> On Wed, 23 Oct 2019 10:27:02 +0200
>>>>> Jens Freimann <jfreimann@redhat.com> wrote:
>> [...]
>>>>> Are there also multi-function considerations that
>>>>> should be prevented or documented?  For example, if a user tries to
>>>>> configure both the primary and failover NICs in the same slot, I assume
>>>>> bad things will happen.
>>>>
>>>>    I would have expected that this is already checked in pci code, but
>>>> it is not. I tried it and when I put both devices into the same slot
>>>> they are both unplugged from the guest during boot but nothing else
>>>> happens. I don't know what triggers that unplug of the devices.
>>>>
>>>> I'm not aware of any other problems regarding multi-function, which
>>>> doesn't mean there aren't any.
>>>
>>> Hmm, was the hidden device at function #0?  The guest won't find any
>>> functions if function #0 isn't present, but I don't know what would
>>> trigger the hotplug.  The angle I'm thinking is that we only have slot
>>> level granularity for hotplug, so any sort of automatic hotplug of a
>>> slot should probably think about bystander devices within the slot.
>>
>> Yes that would be a problem, but isn't it the same in the non-failover case
>> where a user configures it wrong? The slot where the device is plugged is not
>> chosen automatically it's configured by the user, no? I might be mixing something
>> up here.  I have no idea yet how to check if a slot is already populated, but
>> I'll think about it.
> 
> I don't think libvirt will automatically make use of multifunction
> endpoints, except maybe for some built-in devices, so yes it probably
> would be up to the user to explicitly create a multifunction device.

Correct. The only place libvirt will ever assign devices anywhere except 
function 0 is when we are adding pcie-root-ports - those are combined 
8-per-slot in order to conserve space on pcie.0 (this permits us to have 
up to 240 PCIe devices without needing to resort to upstream/downstream 
switches).


> But are there other scenarios that generate an automatic hot-unplug?
> If a user creates a multifunction slot and then triggers a hot-unplug
> themselves, it's easy to place the blame on the user if the result is
> unexpected, but is it so obviously a user configuration error if the
> hotplug occurs as an automatic response to a migration?  I'm not as
> sure about that.

I guess that's all a matter of opinion. If the user never enters in any 
PCI address info and it's all handled by someone else, then I wouldn't 
expect them to know exactly where the devices were (and only vaguely 
understand that their hostdev network interface is going to be unplugged 
during migration). In that case (as long as it's libvirt assigning the 
PCI addresses) the situation we're considering would never ever happen, 
so it's a non-issue.

If, on the other hand, the user wants to mess around assigning PCI 
addresses themselves, then they get to pick up all the pieces. It might 
be nice if they could be given a clue about why it broke though.

> 
> As indicated, I don't know whether this should just be documented or if
> we should spend time preventing it, but someone, somewhere will
> probably think it's a good idea to put their primary and failover NIC
> in the same slot and be confused that the underlying mechanisms cannot
> support it.  It doesn't appear that it would be too difficult to test
> QEMU_PCI_CAP_MULTIFUNCTION (not set) and PCI_FUNC (is 0) for the
> primary, but maybe I'm just being paranoid.  Thanks,

If, as you claim, it's not difficult, then I guess why not?



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-24 17:22   ` Dr. David Alan Gilbert
@ 2019-10-24 19:56     ` Jens Freimann
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-24 19:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel,
	alex.williamson, laine, ailan, parav

On Thu, Oct 24, 2019 at 06:22:27PM +0100, Dr. David Alan Gilbert wrote:
>* Jens Freimann (jfreimann@redhat.com) wrote:
>> This patch adds a net_failover_pair_id property to PCIDev which is
>> used to link the primary device in a failover pair (the PCI dev) to
>> a standby (a virtio-net-pci) device.
>>
>> It only supports ethernet devices. Also currently it only supports
>> PCIe devices. QEMU will exit with an error message otherwise.
>>
>> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
>> ---
>>  hw/pci/pci.c         | 17 +++++++++++++++++
>>  include/hw/pci/pci.h |  3 +++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index aa05c2b9b2..fa9b5219f8 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -75,6 +75,8 @@ static Property pci_props[] = {
>>                      QEMU_PCIE_LNKSTA_DLLLA_BITNR, true),
>>      DEFINE_PROP_BIT("x-pcie-extcap-init", PCIDevice, cap_present,
>>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
>> +    DEFINE_PROP_STRING("net_failover_pair_id", PCIDevice,
>> +            net_failover_pair_id),
>
>Should we just make this 'failover_pair_id' - then when someone in the
>future figures out how to make it work for something else (e.g.
>multipath block devices) then it's all good?

Yes, I see no reason why not to rename it. 

Thanks!

regards,
Jens



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 06/11] qapi: add failover negotiated event
  2019-10-24 17:32   ` Dr. David Alan Gilbert
@ 2019-10-24 20:03     ` Jens Freimann
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-24 20:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel,
	alex.williamson, laine, ailan, parav

On Thu, Oct 24, 2019 at 06:32:45PM +0100, Dr. David Alan Gilbert wrote:
>* Jens Freimann (jfreimann@redhat.com) wrote:
>> This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
>> feature was not negotiated during virtio feature negotiation. If this
>> event is received it means any primary devices hotplugged before
>> this were were never really added to QEMU devices.
>>
>> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
>
>Can I just understand a bit more about what the meaning of this is.
>
>Say my VM boots:
>   a) BIOS
>   b) Boot loader
>   c) Linux
>   d) Reboots
>      (possibly a',b', different c')
>
>When would I get that event?
>When can libvirt know it can use it?

The event is sent every time we do feature negotiation for the virtio-net
device, so you'll get it during Linux boot and reboots.

In v6, I add a data field 'id' to the event to pass the device id. 

regards,
Jens 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-24 16:52       ` Alex Williamson
@ 2019-10-24 20:08         ` Jens Freimann
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-24 20:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, Parav Pandit, mst, aadam, qemu-devel,
	dgilbert, laine, ailan, ehabkost

On Thu, Oct 24, 2019 at 10:52:36AM -0600, Alex Williamson wrote:
>On Thu, 24 Oct 2019 11:37:54 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
[...]
> >
>> >While reviewing, I realized that we shouldn't have this check for below reasons.
>> >
>> >1. It is user's responsibility to pass networking device.
>> >But its ok to check the class, if PCI Device is passed.
>> >So class comparison should be inside the pci_check().
>>
>> I'm not sure I understand this point, could you please elaborate?
>> You're suggesting to move the check for the class into the check for
>> pci_is_express?
>
>Seems like the suggestion is that net_failover_pair_id should be an
>option on the base class of PCIDevice (DeviceState?) and only if it's a
>PCI device would we check the class code.  But there are dependencies
>at the hotplug controller, which I think is why this is currently
>specific to PCI.

Yes, It doesn't support acpi, shpc,... hotplug as of now. It
shouldn't be hard to add but I'd like to do it as a follow-on series.

>However, it's an interesting point about pci_is_express().  This test
>is really just meant to check whether the hotplug controller supports
>this feature, which is only implemented in pciehp via this series.
>There's a bit of a mismatch though that pcie_is_express() checks
>whether the device is express, not whether the bus it sits on is
>express.  I think we really want the latter, so maybe this should be:
>
>pci_bus_is_express(pci_get_bus(dev)
>
>For example this feature should work if I plug an e1000 (not e1000e)
>into an express slot, but not if I plug an e1000e into a conventional
>slot.

I'll try this and test. 

Thanks!

regards,
Jens 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 06/11] qapi: add failover negotiated event
  2019-10-23  8:27 ` [PATCH v5 06/11] qapi: add failover negotiated event Jens Freimann
  2019-10-24 17:32   ` Dr. David Alan Gilbert
@ 2019-10-25  5:35   ` Markus Armbruster
  2019-10-25  7:51     ` Jens Freimann
  1 sibling, 1 reply; 34+ messages in thread
From: Markus Armbruster @ 2019-10-25  5:35 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	alex.williamson, laine, ailan, parav

We ask patch submitters to cc: subject matter experts for review.  You
did.  When such patches touch the QAPI schema, it's best to cc the qapi
schema maintainers (Eric Blake and me) as well, because we can't require
all subject matter experts to be fluent in the QAPI schema language and
conventions.  I found this one more or less by chance.

Jens Freimann <jfreimann@redhat.com> writes:

> This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
> feature was not negotiated during virtio feature negotiation. If this
> event is received it means any primary devices hotplugged before
> this were were never really added to QEMU devices.

Too many negations for my poor old brain to process.

>
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>  qapi/net.json | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index 728990f4fb..8c5f3f1fb2 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -737,3 +737,19 @@
>  ##
>  { 'command': 'announce-self', 'boxed': true,
>    'data' : 'AnnounceParameters'}
> +
> +##
> +# @FAILOVER_NEGOTIATED:
> +#
> +# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation
> +#
> +# Since: 4.2
> +#
> +# Example:
> +#
> +# <- { "event": "FAILOVER_NEGOTIATED",
> +#      "data": {} }
> +#
> +##
> +{ 'event': 'FAILOVER_NEGOTIATED',
> +  'data': {} }

The commit message at least tries to explain intended use.  The doc
string does not.  Should it?



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 06/11] qapi: add failover negotiated event
  2019-10-25  5:35   ` Markus Armbruster
@ 2019-10-25  7:51     ` Jens Freimann
  0 siblings, 0 replies; 34+ messages in thread
From: Jens Freimann @ 2019-10-25  7:51 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	alex.williamson, laine, ailan, parav

On Fri, Oct 25, 2019 at 07:35:28AM +0200, Markus Armbruster wrote:
>We ask patch submitters to cc: subject matter experts for review.  You
>did.  When such patches touch the QAPI schema, it's best to cc the qapi
>schema maintainers (Eric Blake and me) as well, because we can't require
>all subject matter experts to be fluent in the QAPI schema language and
>conventions.  I found this one more or less by chance.

Sorry about that, I'll make sure to get the right people next time.
>
>Jens Freimann <jfreimann@redhat.com> writes:
>
>> This event is sent to let libvirt know that VIRTIO_NET_F_STANDBY
>> feature was not negotiated during virtio feature negotiation. If this
>> event is received it means any primary devices hotplugged before
>> this were were never really added to QEMU devices.
>
>Too many negations for my poor old brain to process.

I'll try to explain better :) 
>
>>
>> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
>> ---
>>  qapi/net.json | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/qapi/net.json b/qapi/net.json
>> index 728990f4fb..8c5f3f1fb2 100644
>> --- a/qapi/net.json
>> +++ b/qapi/net.json
>> @@ -737,3 +737,19 @@
>>  ##
>>  { 'command': 'announce-self', 'boxed': true,
>>    'data' : 'AnnounceParameters'}
>> +
>> +##
>> +# @FAILOVER_NEGOTIATED:
>> +#
>> +# Emitted when VIRTIO_NET_F_STANDBY was negotiated during feature negotiation
>> +#
>> +# Since: 4.2
>> +#
>> +# Example:
>> +#
>> +# <- { "event": "FAILOVER_NEGOTIATED",
>> +#      "data": {} }
>> +#
>> +##
>> +{ 'event': 'FAILOVER_NEGOTIATED',
>> +  'data': {} }
>
>The commit message at least tries to explain intended use.  The doc
>string does not.  Should it?

Sure, I'll add it. 

Thanks for the review!

regards,
Jens 



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-23 18:06   ` Alex Williamson
  2019-10-23 19:30     ` Jens Freimann
@ 2019-10-25 10:52     ` Jens Freimann
  2019-10-25 14:56       ` Alex Williamson
  1 sibling, 1 reply; 34+ messages in thread
From: Jens Freimann @ 2019-10-25 10:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
>On Wed, 23 Oct 2019 10:27:02 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
[...]
>> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
>>          }
>>      }
>>
>> +    if (pci_dev->net_failover_pair_id) {
>> +        if (!pci_is_express(pci_dev)) {
>> +            error_setg(errp, "failover device is not a PCIExpress device");
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>
>Did we decide we don't need to test that the device is also in a
>hotpluggable slot?  Are there also multi-function considerations that
>should be prevented or documented?  For example, if a user tries to
>configure both the primary and failover NICs in the same slot, I assume
>bad things will happen.

I added this check

        if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)
            && (PCI_FUNC(pci_dev->devfn) == 0)) {
            qdev->allow_unplug_during_migration = true;
        } else {
            error_setg(errp, "failover: primary device must be in its own "
                              "PCI slot");
            error_propagate(errp, local_err);
            pci_qdev_unrealize(DEVICE(pci_dev), NULL);
            return;
        }

When I first add a vfio-pci with net_failover_pair_id=x,multifunction=on
and addr=0.0 I will now get an error.

(qemu) device_add vfio-pci,...,bus=root2,net_failover_pair_id=net1,multifunction=on,addr=0.0
Error: failover: primary device must be in its own PCI slot

If I put in a virtio-net device in slot 0 and then try to add a
vfio-pci device in the same slot I get the following error message:

-device virtio-net-pci,...id=net1bus=root1,failover=on,multifunction=on,addr=0.0
(qemu) device_add vfio-pci,id=hostdev1,host=0000:5e:00.2,bus=root1,net_failover_pair_id=net1,addr=0.1
Error: PCI: slot 0 function 0 already ocuppied by virtio-net-pci,
   new func vfio-pci cannot be exposed to guest.


regards,
Jens



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v5 02/11] pci: add option for net failover
  2019-10-25 10:52     ` Jens Freimann
@ 2019-10-25 14:56       ` Alex Williamson
  0 siblings, 0 replies; 34+ messages in thread
From: Alex Williamson @ 2019-10-25 14:56 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, dgilbert,
	laine, ailan, parav

On Fri, 25 Oct 2019 12:52:32 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> On Wed, Oct 23, 2019 at 12:06:48PM -0600, Alex Williamson wrote:
> >On Wed, 23 Oct 2019 10:27:02 +0200
> >Jens Freimann <jfreimann@redhat.com> wrote:  
> [...]
> >> @@ -2101,6 +2104,20 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
> >>          }
> >>      }
> >>
> >> +    if (pci_dev->net_failover_pair_id) {
> >> +        if (!pci_is_express(pci_dev)) {
> >> +            error_setg(errp, "failover device is not a PCIExpress device");
> >> +            error_propagate(errp, local_err);
> >> +            return;
> >> +        }  
> >
> >Did we decide we don't need to test that the device is also in a
> >hotpluggable slot?  Are there also multi-function considerations that
> >should be prevented or documented?  For example, if a user tries to
> >configure both the primary and failover NICs in the same slot, I assume
> >bad things will happen.  
> 
> I added this check
> 
>         if (!(pci_dev->cap_present & QEMU_PCI_CAP_MULTIFUNCTION)
>             && (PCI_FUNC(pci_dev->devfn) == 0)) {
>             qdev->allow_unplug_during_migration = true;
>         } else {
>             error_setg(errp, "failover: primary device must be in its own "
>                               "PCI slot");
>             error_propagate(errp, local_err);
>             pci_qdev_unrealize(DEVICE(pci_dev), NULL);
>             return;
>         }
> 
> When I first add a vfio-pci with net_failover_pair_id=x,multifunction=on
> and addr=0.0 I will now get an error.
> 
> (qemu) device_add vfio-pci,...,bus=root2,net_failover_pair_id=net1,multifunction=on,addr=0.0
> Error: failover: primary device must be in its own PCI slot
> 
> If I put in a virtio-net device in slot 0 and then try to add a
> vfio-pci device in the same slot I get the following error message:
> 
> -device virtio-net-pci,...id=net1bus=root1,failover=on,multifunction=on,addr=0.0
> (qemu) device_add vfio-pci,id=hostdev1,host=0000:5e:00.2,bus=root1,net_failover_pair_id=net1,addr=0.1
> Error: PCI: slot 0 function 0 already ocuppied by virtio-net-pci,
>    new func vfio-pci cannot be exposed to guest.

Cool, looks good.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2019-10-25 15:12 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23  8:27 [PATCH v5 0/11] add failover feature for assigned network devices Jens Freimann
2019-10-23  8:27 ` [PATCH v5 01/11] qdev/qbus: add hidden device support Jens Freimann
2019-10-23  8:27 ` [PATCH v5 02/11] pci: add option for net failover Jens Freimann
2019-10-23 18:06   ` Alex Williamson
2019-10-23 19:30     ` Jens Freimann
2019-10-23 20:02       ` Alex Williamson
2019-10-23 20:31         ` Jens Freimann
2019-10-23 21:15           ` Alex Williamson
2019-10-24 17:57             ` Laine Stump
2019-10-25 10:52     ` Jens Freimann
2019-10-25 14:56       ` Alex Williamson
2019-10-24  5:03   ` Parav Pandit
2019-10-24  9:37     ` Jens Freimann
2019-10-24 16:34       ` Parav Pandit
2019-10-24 17:06         ` Alex Williamson
2019-10-24 16:52       ` Alex Williamson
2019-10-24 20:08         ` Jens Freimann
2019-10-24 17:22   ` Dr. David Alan Gilbert
2019-10-24 19:56     ` Jens Freimann
2019-10-23  8:27 ` [PATCH v5 03/11] pci: mark devices partially unplugged Jens Freimann
2019-10-23  8:27 ` [PATCH v5 04/11] pci: mark device having guest unplug request pending Jens Freimann
2019-10-23  8:27 ` [PATCH v5 05/11] qapi: add unplug primary event Jens Freimann
2019-10-23 11:32   ` Eric Blake
2019-10-23  8:27 ` [PATCH v5 06/11] qapi: add failover negotiated event Jens Freimann
2019-10-24 17:32   ` Dr. David Alan Gilbert
2019-10-24 20:03     ` Jens Freimann
2019-10-25  5:35   ` Markus Armbruster
2019-10-25  7:51     ` Jens Freimann
2019-10-23  8:27 ` [PATCH v5 07/11] migration: allow unplug during migration for failover devices Jens Freimann
2019-10-23  8:27 ` [PATCH v5 08/11] migration: add new migration state wait-unplug Jens Freimann
2019-10-23  8:27 ` [PATCH v5 09/11] libqos: tolerate wait-unplug migration state Jens Freimann
2019-10-23  8:27 ` [PATCH v5 10/11] net/virtio: add failover support Jens Freimann
2019-10-23  8:27 ` [PATCH v5 11/11] vfio: unplug failover primary device before migration Jens Freimann
2019-10-23 18:28   ` Alex Williamson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).