* [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
@ 2019-05-17 12:58 Jens Freimann
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices Jens Freimann
                   ` (7 more replies)
  0 siblings, 8 replies; 77+ messages in thread
From: Jens Freimann @ 2019-05-17 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: pkrempa, berrange, ehabkost, mst, aadam, laine, ailan

This is another attempt at implementing the host side of the
net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)

Changes since last RFC:
- work around the circular dependency of command line options: just add
  failover=on to the virtio-net standby options and reference it from the
  primary (vfio-pci) device with standby=<id>
- add patch 4/4 to allow migration of a vfio-pci device when it is part of a
  failover pair, while still disallowing it for all other devices
- add patch 1/4 to allow unplug of a device during migration, making an
  exception for failover primary devices. I'd like feedback on how to
  solve this more elegantly. I added a boolean to DeviceState that defaults
  to false for all devices except primary devices.
- not tested yet with surprise removal
- I don't expect this to go in as it is, it still needs more testing, but
  I'd like to get feedback on the above mentioned changes.

The general idea is that we have a pair of devices, a vfio-pci and an
emulated device. Before migration the vfio device is unplugged and data
flows through the emulated device; on the target side another vfio-pci
device is plugged in to take over the data path. In the guest the
net_failover module will pair net devices with the same MAC address.

* Patch 2/4 adds the infrastructure for hiding a device to the qbus and
  qdev APIs.

* Patch 3/4 makes virtio-net use that API to defer adding the vfio
  device until the VIRTIO_NET_F_STANDBY feature is acked.

Previous discussion: 
  RFC v1 https://patchwork.ozlabs.org/cover/989098/
  RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html

To summarize concerns/feedback from previous discussion:
1. The guest OS can reject or, worse, _delay_ the unplug by an arbitrary
  amount of time. Migration might get stuck for an unpredictable time
  with no clear reason. This approach combines two tricky things,
  hot(un)plug and migration.
  -> We can surprise-remove the PCI device and in QEMU we can do all
     necessary rollbacks transparently to management software. Will it be
     easy? Probably not.
2. PCI devices are a precious resource. The primary device should never
  be added to QEMU if it won't be used by the guest, instead of hiding it
  in QEMU.
  -> We only hotplug the device once the standby feature bit has been
     negotiated. We save the device cmdline options until we need them
     for qdev_device_add().
     Hiding a device can be a useful concept to model. For example a
     PCI device in a powered-off slot could be marked as hidden until the
     slot is powered on (mst).
3. Management layer software should handle this. OpenStack already has
  components/code to handle unplugging/replugging VFIO devices and
  metadata to provide to the guest for detecting which devices should be
  paired.
  -> An approach that includes all software from firmware to
     higher-level management software hasn't been tried in recent years.
     This is an attempt to keep it simple and contained in QEMU as much
     as possible.
4. Hotplugging a device and then making it part of a failover setup is
   not possible.
  -> Addressed by extending the qdev hotplug functions to check for the
     hidden attribute, so e.g. device_add can be used to plug a device.


I have tested this with an mlx5 NIC and was able to migrate the VM with
the above mentioned workarounds for the open problems.

Command line example:

qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
        -machine q35,kernel-irqchip=split -cpu host   \
        -k fr   \
        -serial stdio   \
        -net none \
        -qmp unix:/tmp/qmp.socket,server,nowait \
        -monitor telnet:127.0.0.1:5555,server,nowait \
        -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
        -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
        -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
        -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
        -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
        /root/rhel-guest-image-8.0-1781.x86_64.qcow2

Then the primary device can be hotplugged via
 (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
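
For management software driving QEMU through the QMP socket configured
above, the same hotplug can be issued over QMP; roughly (a sketch of the
equivalent QMP command, reusing the ids from the example above):

  { "execute": "device_add",
    "arguments": { "driver": "vfio-pci", "host": "5e:00.2",
                   "id": "hostdev0", "bus": "root1", "standby": "net1" } }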


I'm grateful for any remarks or ideas!

Thanks!


Jens Freimann (4):
  migration: allow unplug during migration for failover devices
  qdev/qbus: Add hidden device support
  net/virtio: add failover support
  vfio: unplug failover primary device before migration

 hw/core/qdev.c                 |  20 ++++++
 hw/net/virtio-net.c            | 117 +++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                  |  25 ++++++-
 hw/vfio/pci.h                  |   2 +
 include/hw/qdev-core.h         |  10 +++
 include/hw/virtio/virtio-net.h |  12 ++++
 qdev-monitor.c                 |  43 ++++++++++--
 vl.c                           |   6 +-
 8 files changed, 228 insertions(+), 7 deletions(-)

-- 
2.21.0




* [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
@ 2019-05-17 12:58 ` Jens Freimann
  2019-05-21  9:33   ` Dr. David Alan Gilbert
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support Jens Freimann
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-05-17 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: pkrempa, berrange, ehabkost, mst, aadam, laine, ailan

In "b06424de62 migration: Disable hotplug/unplug during migration" we
added a check to disable unplug for all devices until we have figured
out what works. For failover primary devices qdev_unplug() is called
from the migration handler, i.e. during migration.

This patch adds a flag to DeviceState which is set to false for all
devices and makes an exception for vfio-pci devices that are also
primary devices in a failover pair.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/core/qdev.c         | 1 +
 include/hw/qdev-core.h | 1 +
 qdev-monitor.c         | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index f9b6efe509..98cdaa6bf7 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -954,6 +954,7 @@ static void device_initfn(Object *obj)
 
     dev->instance_id_alias = -1;
     dev->realized = false;
+    dev->allow_unplug_during_migration = false;
 
     object_property_add_bool(obj, "realized",
                              device_get_realized, device_set_realized, NULL);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 33ed3b8dde..5437395779 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -146,6 +146,7 @@ struct DeviceState {
     bool pending_deleted_event;
     QemuOpts *opts;
     int hotplugged;
+    bool allow_unplug_during_migration;
     BusState *parent_bus;
     QLIST_HEAD(, NamedGPIOList) gpios;
     QLIST_HEAD(, BusState) child_bus;
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 373b9ad445..9cce8b93c2 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -867,7 +867,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
         return;
     }
 
-    if (!migration_is_idle()) {
+    if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
         error_setg(errp, "device_del not allowed while migrating");
         return;
     }
-- 
2.21.0




* [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices Jens Freimann
@ 2019-05-17 12:58 ` Jens Freimann
  2019-05-21 11:33   ` Michael S. Tsirkin
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 3/4] net/virtio: add failover support Jens Freimann
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-05-17 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: pkrempa, berrange, ehabkost, mst, aadam, laine, ailan

This adds support for hiding a device to the qbus and qdev APIs.
qdev_device_add() is modified to check for a standby argument in the
option string, and a DeviceListener callback should_be_hidden() is added.
It can be used by a standby device to inform qdev that this device should
not be added now. The standby device handler can store the device
options and plug the device in at a later point in time.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/core/qdev.c         | 19 +++++++++++++++++++
 hw/vfio/pci.c          |  1 +
 hw/vfio/pci.h          |  1 +
 include/hw/qdev-core.h |  9 +++++++++
 qdev-monitor.c         | 41 ++++++++++++++++++++++++++++++++++++++---
 vl.c                   |  6 ++++--
 6 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 98cdaa6bf7..d55fe00ae7 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -211,6 +211,25 @@ void device_listener_unregister(DeviceListener *listener)
     QTAILQ_REMOVE(&device_listeners, listener, link);
 }
 
+bool qdev_should_hide_device(QemuOpts *opts, Error **errp)
+{
+    bool res = false;
+    bool match_found = false;
+
+    DeviceListener *listener;
+
+    QTAILQ_FOREACH(listener, &device_listeners, link) {
+        if (listener->should_be_hidden) {
+            listener->should_be_hidden(listener, opts, &match_found, &res);
+        }
+
+        if (match_found) {
+            break;
+        }
+    }
+    return res;
+}
+
 void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
                                  int required_for_version)
 {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8cecb53d5c..835249c61d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3215,6 +3215,7 @@ static Property vfio_pci_dev_properties[] = {
                             display, ON_OFF_AUTO_OFF),
     DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
     DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
+    DEFINE_PROP_STRING("standby", VFIOPCIDevice, standby),
     DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
                        intx.mmap_timeout, 1100),
     DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index cfcd1a81b8..1a87f91889 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -135,6 +135,7 @@ typedef struct VFIOPCIDevice {
     PCIHostDeviceAddress host;
     EventNotifier err_notifier;
     EventNotifier req_notifier;
+    char *standby;
     int (*resetfn)(struct VFIOPCIDevice *);
     uint32_t vendor_id;
     uint32_t device_id;
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 5437395779..d54d3ae62a 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -158,6 +158,13 @@ struct DeviceState {
 struct DeviceListener {
     void (*realize)(DeviceListener *listener, DeviceState *dev);
     void (*unrealize)(DeviceListener *listener, DeviceState *dev);
+    /*
+     * This callback is called just upon init of the DeviceState
+     * and can be used by a standby device to inform qdev, based on
+     * the device opts, whether this device should be hidden.
+     */
+    void (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts,
+            bool *match_found, bool *res);
     QTAILQ_ENTRY(DeviceListener) link;
 };
 
@@ -454,4 +461,6 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);
 
+bool qdev_should_hide_device(QemuOpts *opts, Error **errp);
+
 #endif
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 9cce8b93c2..a81226529a 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -32,8 +32,10 @@
 #include "qemu/help_option.h"
 #include "qemu/option.h"
 #include "qemu/qemu-print.h"
+#include "qemu/option_int.h"
 #include "sysemu/block-backend.h"
 #include "migration/misc.h"
+#include "migration/migration.h"
 
 /*
  * Aliases were a bad idea from the start.  Let's keep them
@@ -561,14 +563,45 @@ void qdev_set_id(DeviceState *dev, const char *id)
     }
 }
 
+static int is_failover_device(void *opaque, const char *name, const char *value,
+                        Error **errp)
+{
+    if (strcmp(name, "standby") == 0) {
+        QemuOpts *opts = (QemuOpts *)opaque;
+
+        if (qdev_should_hide_device(opts, errp) && errp && !*errp) {
+            return 1;
+        } else if (errp && *errp) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static bool should_hide_device(QemuOpts *opts, Error **err)
+{
+    if (qemu_opt_foreach(opts, is_failover_device, opts, err) == 0) {
+        return false;
+    }
+    return true;
+}
+
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 {
     DeviceClass *dc;
     const char *driver, *path;
-    DeviceState *dev;
+    DeviceState *dev = NULL;
     BusState *bus = NULL;
     Error *err = NULL;
 
+    if (opts && should_hide_device(opts, &err)) {
+        if (err) {
+            goto err_del_dev;
+        }
+        return NULL;
+    }
+
     driver = qemu_opt_get(opts, "driver");
     if (!driver) {
         error_setg(errp, QERR_MISSING_PARAMETER, "driver");
@@ -640,8 +673,10 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 
 err_del_dev:
     error_propagate(errp, err);
-    object_unparent(OBJECT(dev));
-    object_unref(OBJECT(dev));
+    if (dev) {
+        object_unparent(OBJECT(dev));
+        object_unref(OBJECT(dev));
+    }
     return NULL;
 }
 
diff --git a/vl.c b/vl.c
index b6709514c1..4b5b878275 100644
--- a/vl.c
+++ b/vl.c
@@ -2355,10 +2355,12 @@ static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
     DeviceState *dev;
 
     dev = qdev_device_add(opts, errp);
-    if (!dev) {
+    if (!dev && *errp) {
+        error_report_err(*errp);
         return -1;
+    } else if (dev) {
+        object_unref(OBJECT(dev));
     }
-    object_unref(OBJECT(dev));
     return 0;
 }
 
-- 
2.21.0




* [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices Jens Freimann
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support Jens Freimann
@ 2019-05-17 12:58 ` Jens Freimann
  2019-05-21  9:45   ` Dr. David Alan Gilbert
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 4/4] vfio/pci: unplug failover primary device before migration Jens Freimann
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-05-17 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: pkrempa, berrange, ehabkost, mst, aadam, laine, ailan

This patch adds support for handling failover device pairs of a virtio-net
device and a vfio-pci device, where the virtio-net device acts as the
standby device and the vfio-pci device as the primary.

The general idea is that we have a pair of devices, a vfio-pci and an
emulated (virtio-net) device. Before migration the vfio device is
unplugged and data flows through the emulated device; on the target side
another vfio-pci device is plugged in to take over the data path. In the
guest the net_failover module will pair net devices with the same MAC
address.

To achieve this we need to:

1. Provide a callback function for the should_be_hidden DeviceListener.
   It is called when the primary device is plugged in. It evaluates the
   QemuOpts passed in to check whether it is the matching primary device.
   It returns two values:
     - one to signal if the device to be added is the matching
       primary device
     - another one to signal to qdev if it should actually
       continue with adding the device or skip it.

   In the latter case it stores the device options in the VirtIONet
   struct, and the device is added once the VIRTIO_NET_F_STANDBY feature
   is negotiated during virtio feature negotiation.

2. Register a callback for the migration status notifier. When called it
   will unplug the primary device before the migration happens.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/net/virtio-net.c            | 117 +++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-net.h |  12 ++++
 2 files changed, 129 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ffe0872fff..120eccbb98 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/atomic.h"
 #include "qemu/iov.h"
 #include "hw/virtio/virtio.h"
 #include "net/net.h"
@@ -19,6 +20,10 @@
 #include "net/tap.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"
+#include "qemu/config-file.h"
+#include "qapi/qmp/qdict.h"
 #include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/announce.h"
@@ -29,6 +34,8 @@
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "trace.h"
+#include "monitor/qdev.h"
+#include "hw/pci/pci.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -364,6 +371,9 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
     }
 }
 
+
+static void virtio_net_primary_plug_timer(void *opaque);
+
 static void virtio_net_set_link_status(NetClientState *nc)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
@@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
     } else {
         memset(n->vlans, 0xff, MAX_VLAN >> 3);
     }
+
+    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+        atomic_set(&n->primary_should_be_hidden, false);
+        if (n->primary_device_timer)
+            timer_mod(n->primary_device_timer,
+                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                4000);
+    }
 }
 
 static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
@@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
     n->netclient_type = g_strdup(type);
 }
 
+static void virtio_net_primary_plug_timer(void *opaque)
+{
+    VirtIONet *n = opaque;
+    Error *err = NULL;
+
+    if (n->primary_device_dict)
+        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
+            n->primary_device_dict, &err);
+    if (n->primary_device_opts) {
+        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
+        error_setg(&err, "virtio_net: couldn't plug in primary device");
+        return;
+    }
+    if (!n->primary_device_dict && err) {
+        if (n->primary_device_timer) {
+            timer_mod(n->primary_device_timer,
+                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                100);
+        }
+    }
+}
+
+static void virtio_net_handle_migration_primary(VirtIONet *n,
+                                                MigrationState *s)
+{
+    Error *err = NULL;
+    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
+
+    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
+            n->primary_device_id);
+    if (!n->primary_dev) {
+        error_setg(&err, "virtio_net: couldn't find primary device");
+    }
+    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
+        qdev_unplug(n->primary_dev, &err);
+        if (!err) {
+            atomic_set(&n->primary_should_be_hidden, true);
+            n->primary_dev = NULL;
+        }
+    } else if (migration_has_failed(s)) {
+        if (should_be_hidden && !n->primary_dev) {
+            /* We already unplugged the device, let's plug it back */
+            n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
+        }
+    }
+}
+
+static void migration_state_notifier(Notifier *notifier, void *data)
+{
+    MigrationState *s = data;
+    VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
+    virtio_net_handle_migration_primary(n, s);
+}
+
+static void virtio_net_primary_should_be_hidden(DeviceListener *listener,
+            QemuOpts *device_opts, bool *match_found, bool *res)
+{
+    VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
+
+    if (device_opts) {
+        n->primary_device_dict = qemu_opts_to_qdict(device_opts,
+                n->primary_device_dict);
+    }
+    g_free(n->standby_id);
+    n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict,
+                             "standby"));
+    if (n->standby_id) {
+        *match_found = true;
+    }
+    /* primary_should_be_hidden is set during feature negotiation */
+    if (atomic_read(&n->primary_should_be_hidden) && *match_found) {
+        *res = true;
+    } else if (*match_found)  {
+        n->primary_device_dict = qemu_opts_to_qdict(device_opts,
+                n->primary_device_dict);
+        *res = false;
+    }
+    g_free(n->primary_device_id);
+    n->primary_device_id = g_strdup(device_opts->id);
+}
+
 static void virtio_net_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -2656,6 +2755,18 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
     }
 
+    if (n->failover) {
+        n->primary_listener.should_be_hidden =
+            virtio_net_primary_should_be_hidden;
+        atomic_set(&n->primary_should_be_hidden, true);
+        device_listener_register(&n->primary_listener);
+        n->migration_state.notify = migration_state_notifier;
+        add_migration_state_change_notifier(&n->migration_state);
+        n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
+        n->primary_device_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                     virtio_net_primary_plug_timer, n);
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
@@ -2778,6 +2889,11 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
     g_free(n->mac_table.macs);
     g_free(n->vlans);
 
+    g_free(n->primary_device_id);
+    g_free(n->standby_id);
+    qobject_unref(n->primary_device_dict);
+    n->primary_device_dict = NULL;
+
     max_queues = n->multiqueue ? n->max_queues : 1;
     for (i = 0; i < max_queues; i++) {
         virtio_net_del_queue(n, i);
@@ -2885,6 +3001,7 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_BOOL("failover", VirtIONet, failover, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b96f0c643f..c2bb6ada44 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -18,6 +18,7 @@
 #include "standard-headers/linux/virtio_net.h"
 #include "hw/virtio/virtio.h"
 #include "net/announce.h"
+#include "qemu/option_int.h"
 
 #define TYPE_VIRTIO_NET "virtio-net-device"
 #define VIRTIO_NET(obj) \
@@ -43,6 +44,7 @@ typedef struct virtio_net_conf
     int32_t speed;
     char *duplex_str;
     uint8_t duplex;
+    char *primary_id_str;
 } virtio_net_conf;
 
 /* Coalesced packets type & status */
@@ -185,6 +187,16 @@ struct VirtIONet {
     AnnounceTimer announce_timer;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    QemuOpts *primary_device_opts;
+    QDict *primary_device_dict;
+    DeviceState *primary_dev;
+    char *primary_device_id;
+    char *standby_id;
+    bool primary_should_be_hidden;
+    bool failover;
+    DeviceListener primary_listener;
+    QEMUTimer *primary_device_timer;
+    Notifier migration_state;
 };
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
-- 
2.21.0




* [Qemu-devel] [PATCH 4/4] vfio/pci: unplug failover primary device before migration
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
                   ` (2 preceding siblings ...)
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 3/4] net/virtio: add failover support Jens Freimann
@ 2019-05-17 12:58 ` Jens Freimann
  2019-05-20 22:56 ` [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Alex Williamson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 77+ messages in thread
From: Jens Freimann @ 2019-05-17 12:58 UTC (permalink / raw)
  To: qemu-devel; +Cc: pkrempa, berrange, ehabkost, mst, aadam, laine, ailan

As before, block all vfio-pci devices from being migrated, but make an
exception for failover primary devices. This is achieved by setting
unmigratable to 0 and adding a migration blocker for all vfio-pci
devices except failover primary devices. These will be unplugged before
migration happens by the migration handler of the corresponding
virtio-net standby device.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
---
 hw/vfio/pci.c | 24 +++++++++++++++++++++++-
 hw/vfio/pci.h |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 835249c61d..60cda7dbc9 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,9 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/blocker.h"
+#include "qemu/option.h"
+#include "qemu/option_int.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -2803,6 +2806,12 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static int has_standby_arg(void *opaque, const char *name,
+                           const char *value, Error **errp)
+{
+    return strcmp(name, "standby") == 0;
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -2816,6 +2825,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     int i, ret;
     bool is_mdev;
 
+    if (qemu_opt_foreach(pdev->qdev.opts, has_standby_arg,
+                         (void *) pdev->qdev.opts, &err) == 0) {
+        error_setg(&vdev->migration_blocker,
+                "VFIO device doesn't support migration");
+        ret = migrate_add_blocker(vdev->migration_blocker, &err);
+        if (err) {
+            error_propagate(errp, err);
+            error_free(vdev->migration_blocker);
+        }
+    } else {
+        pdev->qdev.allow_unplug_during_migration = true;
+    }
+
     if (!vdev->vbasedev.sysfsdev) {
         if (!(~vdev->host.domain || ~vdev->host.bus ||
               ~vdev->host.slot || ~vdev->host.function)) {
@@ -3258,7 +3280,7 @@ static Property vfio_pci_dev_properties[] = {
 
 static const VMStateDescription vfio_pci_vmstate = {
     .name = "vfio-pci",
-    .unmigratable = 1,
+    .unmigratable = 0,
 };
 
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 1a87f91889..390ba2c767 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -170,6 +170,7 @@ typedef struct VFIOPCIDevice {
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
     VFIODisplay *dpy;
+    Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0




* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
                   ` (3 preceding siblings ...)
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 4/4] vfio/pci: unplug failover primary device before migration Jens Freimann
@ 2019-05-20 22:56 ` Alex Williamson
  2019-05-21  7:21   ` Jens Freimann
  2019-05-21  8:37 ` Daniel P. Berrangé
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 77+ messages in thread
From: Alex Williamson @ 2019-05-20 22:56 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

On Fri, 17 May 2019 14:58:16 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> This is another attempt at implementing the host side of the
> net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> 
> Changes since last RFC:
> - work around circular dependency of commandline options. Just add
>   failover=on to the virtio-net standby options and reference it from
>   primary (vfio-pci) device with standby=<id>  
> - add patch 3/4 to allow migration of vfio-pci device when it is part of a
>   failover pair, still disallow for all other devices
> - add patch 4/4 to allow unplug of device during migrationm, make an
>   exception for failover primary devices. I'd like feedback on how to
>   solve this more elegant. I added a boolean to DeviceState, have it
>   default to false for all devices except for primary devices. 
> - not tested yet with surprise removal
> - I don't expect this to go in as it is, still needs more testing but
>   I'd like to get feedback on above mentioned changes.
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> emulated device. Before migration the vfio device is unplugged and data
> flows to the emulated device, on the target side another vfio-pci device
> is plugged in to take over the data-path. In the guest the net_failover
> module will pair net devices with the same MAC address.
> 
> * In the first patch the infrastructure for hiding the device is added
>   for the qbus and qdev APIs. 
> 
> * In the second patch the virtio-net uses the API to defer adding the vfio
>   device until the VIRTIO_NET_F_STANDBY feature is acked.
> 
> Previous discussion: 
>   RFC v1 https://patchwork.ozlabs.org/cover/989098/
>   RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
> 
> To summarize concerns/feedback from previous discussion:
> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>   Migration might get stuck for unpredictable time with unclear reason.
>   This approach combines two tricky things, hot/unplug and migration. 
>   -> We can surprise-remove the PCI device and in QEMU we can do all  
>      necessary rollbacks transparent to management software. Will it be
>      easy, probably not.
> 2. PCI devices are a precious ressource. The primary device should never
>   be added to QEMU if it won't be used by guest instead of hiding it in
>   QEMU. 
>   -> We only hotplug the device when the standby feature bit was  
>      negotiated. We save the device cmdline options until we need it for
>      qdev_device_add()
>      Hiding a device can be a useful concept to model. For example a
>      pci device in a powered-off slot could be marked as hidden until the slot is
>      powered on (mst).
> 3. Management layer software should handle this. Open Stack already has
>   components/code to handle unplug/replug VFIO devices and metadata to
>   provide to the guest for detecting which devices should be paired.
>   -> An approach that includes all software from firmware to  
>      higher-level management software wasn't tried in the last years. This is
>      an attempt to keep it simple and contained in QEMU as much as possible.
> 4. Hotplugging a device and then making it part of a failover setup is
>    not possible
>   -> addressed by extending qdev hotplug functions to check for hidden  
>      attribute, so e.g. device_add can be used to plug a device.
> 
> 
> I have tested this with a mlx5 NIC and was able to migrate the VM with
> above mentioned workarounds for open problems.
> 
> Command line example:
> 
> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>         -machine q35,kernel-irqchip=split -cpu host   \
>         -k fr   \
>         -serial stdio   \
>         -net none \
>         -qmp unix:/tmp/qmp.socket,server,nowait \
>         -monitor telnet:127.0.0.1:5555,server,nowait \
>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \                                                                                    
>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> 
> Then the primary device can be hotplugged via
>  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1

Is this standby= option only valid for Network/Ethernet class code
devices?  If so, perhaps the vfio-pci code should reject the option on
any non-Ethernet device.  The option is also non-intuitive for users;
only through examples like the above can we see it relates to the id of
the secondary device.  Could we instead name it something like
"standby_net_failover_pair_id="?

Also, this feature requires matching MAC addresses per the description;
where is that done?  Is it the user's responsibility to set the MAC on
the host device prior to the device_add?  If so, is this actually not
only specific to Ethernet devices, but to Ethernet VFs?

Finally, please copy me on code touching vfio.  Thanks,

Alex



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-20 22:56 ` [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Alex Williamson
@ 2019-05-21  7:21   ` Jens Freimann
  2019-05-21 11:37     ` Michael S. Tsirkin
  2019-05-21 14:18     ` Alex Williamson
  0 siblings, 2 replies; 77+ messages in thread
From: Jens Freimann @ 2019-05-21  7:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
>On Fri, 17 May 2019 14:58:16 +0200
>Jens Freimann <jfreimann@redhat.com> wrote:
>> Command line example:
>>
>> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>>         -machine q35,kernel-irqchip=split -cpu host   \
>>         -k fr   \
>>         -serial stdio   \
>>         -net none \
>>         -qmp unix:/tmp/qmp.socket,server,nowait \
>>         -monitor telnet:127.0.0.1:5555,server,nowait \
>>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
>>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>>
>> Then the primary device can be hotplugged via
>>  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
>
>Is this standby= option only valid for Network/Ethernet class code
>devices?  If so, perhaps vfio-pci code should reject the option on any
>non-ethernet devices.  The option is also non-intuitive for users, only
>through examples like above can we see it relates to the id of the
>secondary device.  Could we instead name it something like
>"standby_net_failover_pair_id="?

It is only for Ethernet (VFs); I will add code to reject non-Ethernet VF
devices. I agree the name is not descriptive, and the one you suggest
seems good to me.
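
A minimal sketch of such a check in vfio_realize(), assuming the class
code has already been read into the emulated config space at that point
(untested; the standby property is the one added by this series):

    /* Only Ethernet class devices make sense as failover primaries. */
    if (vdev->standby &&
        pci_get_word(pdev->config + PCI_CLASS_DEVICE) !=
        PCI_CLASS_NETWORK_ETHERNET) {
        error_setg(errp, "standby is only supported for Ethernet devices");
        return;
    }
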
>
>Also, this feature requires matching MAC addresses per the description,
>where is that done?  Is it the user's responsibility to set the MAC on
>the host device prior to the device_add?  If so, is this actually not
>only specific to ethernet devices, but ethernet VFs?

Yes, it's the user's responsibility, and the MACs are then matched by
the net_failover driver in the guest. It makes sense for Ethernet VFs
only; I'll add a check for that.
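
For example, the VF MAC can be set on the host to match the virtio-net
device before the device_add (PF netdev name and VF index below are just
placeholders for the 5e:00.2 VF used in the example above):

    ip link set dev <pf-netdev> vf <vf-index> mac 52:54:00:6f:55:cc
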
>
>Finally, please copy me on code touching vfio.  Thanks,

I'm sorry about that, will do.

Thanks for the review Alex!

regards,
Jens 



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
                   ` (4 preceding siblings ...)
  2019-05-20 22:56 ` [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Alex Williamson
@ 2019-05-21  8:37 ` Daniel P. Berrangé
  2019-05-21 10:10 ` Michael S. Tsirkin
  2019-06-11 15:42 ` Laine Stump
  7 siblings, 0 replies; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-05-21  8:37 UTC (permalink / raw)
  To: Jens Freimann; +Cc: pkrempa, ehabkost, mst, aadam, qemu-devel, laine, ailan

On Fri, May 17, 2019 at 02:58:16PM +0200, Jens Freimann wrote:
> This is another attempt at implementing the host side of the
> net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> 
> Changes since last RFC:
> - work around circular dependency of commandline options. Just add
>   failover=on to the virtio-net standby options and reference it from
>   primary (vfio-pci) device with standby=<id>  
> - add patch 3/4 to allow migration of vfio-pci device when it is part of a
>   failover pair, still disallow for all other devices
> - add patch 4/4 to allow unplug of device during migrationm, make an
>   exception for failover primary devices. I'd like feedback on how to
>   solve this more elegant. I added a boolean to DeviceState, have it
>   default to false for all devices except for primary devices. 
> - not tested yet with surprise removal
> - I don't expect this to go in as it is, still needs more testing but
>   I'd like to get feedback on above mentioned changes.
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> emulated device. Before migration the vfio device is unplugged and data
> flows to the emulated device, on the target side another vfio-pci device
> is plugged in to take over the data-path. In the guest the net_failover
> module will pair net devices with the same MAC address.
> 
> * In the first patch the infrastructure for hiding the device is added
>   for the qbus and qdev APIs. 
> 
> * In the second patch the virtio-net uses the API to defer adding the vfio
>   device until the VIRTIO_NET_F_STANDBY feature is acked.
> 
> Previous discussion: 
>   RFC v1 https://patchwork.ozlabs.org/cover/989098/
>   RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
> 
> To summarize concerns/feedback from previous discussion:
> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>   Migration might get stuck for unpredictable time with unclear reason.
>   This approach combines two tricky things, hot/unplug and migration. 
>   -> We can surprise-remove the PCI device and in QEMU we can do all
>      necessary rollbacks transparent to management software. Will it be
>      easy, probably not.
> 2. PCI devices are a precious ressource. The primary device should never
>   be added to QEMU if it won't be used by guest instead of hiding it in
>   QEMU. 
>   -> We only hotplug the device when the standby feature bit was
>      negotiated. We save the device cmdline options until we need it for
>      qdev_device_add()
>      Hiding a device can be a useful concept to model. For example a
>      pci device in a powered-off slot could be marked as hidden until the slot is
>      powered on (mst).
> 3. Management layer software should handle this. Open Stack already has
>   components/code to handle unplug/replug VFIO devices and metadata to
>   provide to the guest for detecting which devices should be paired.
>   -> An approach that includes all software from firmware to
>      higher-level management software wasn't tried in the last years. This is
>      an attempt to keep it simple and contained in QEMU as much as possible.
> 4. Hotplugging a device and then making it part of a failover setup is
>    not possible
>   -> addressed by extending qdev hotplug functions to check for hidden
>      attribute, so e.g. device_add can be used to plug a device.
> 
> 
> I have tested this with a mlx5 NIC and was able to migrate the VM with
> above mentioned workarounds for open problems.
> 
> Command line example:
> 
> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>         -machine q35,kernel-irqchip=split -cpu host   \
>         -k fr   \
>         -serial stdio   \
>         -net none \
>         -qmp unix:/tmp/qmp.socket,server,nowait \
>         -monitor telnet:127.0.0.1:5555,server,nowait \
>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \                                                                                    
>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> 
> Then the primary device can be hotplugged via
>  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1

This command line syntax looks much saner now that the circular dep is
gone. I think this approach is now viable to use from libvirt's POV.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices Jens Freimann
@ 2019-05-21  9:33   ` Dr. David Alan Gilbert
  2019-05-21  9:47     ` Daniel P. Berrangé
  2019-05-23  8:01     ` Jens Freimann
  0 siblings, 2 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-21  9:33 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

* Jens Freimann (jfreimann@redhat.com) wrote:
> In "b06424de62 migration: Disable hotplug/unplug during migration" we
> added a check to disable unplug for all devices until we have figured
> out what works. For failover primary devices qdev_unplug() is called
> from the migration handler, i.e. during migration.
> 
> This patch adds a flag to DeviceState which is set to false for all
> devices and makes an exception for vfio-pci devices that are also
> primary devices in a failover pair.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>

So I think this is safe in your case, because you trigger the unplug
right at the start of migration, during setup, and plug it back after a
failure; however, it's not generally safe - I can't unplug a device while
the migration is actually in progress.

Dave

> ---
>  hw/core/qdev.c         | 1 +
>  include/hw/qdev-core.h | 1 +
>  qdev-monitor.c         | 2 +-
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index f9b6efe509..98cdaa6bf7 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -954,6 +954,7 @@ static void device_initfn(Object *obj)
>  
>      dev->instance_id_alias = -1;
>      dev->realized = false;
> +    dev->allow_unplug_during_migration = false;
>  
>      object_property_add_bool(obj, "realized",
>                               device_get_realized, device_set_realized, NULL);
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 33ed3b8dde..5437395779 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -146,6 +146,7 @@ struct DeviceState {
>      bool pending_deleted_event;
>      QemuOpts *opts;
>      int hotplugged;
> +    bool allow_unplug_during_migration;
>      BusState *parent_bus;
>      QLIST_HEAD(, NamedGPIOList) gpios;
>      QLIST_HEAD(, BusState) child_bus;
> diff --git a/qdev-monitor.c b/qdev-monitor.c
> index 373b9ad445..9cce8b93c2 100644
> --- a/qdev-monitor.c
> +++ b/qdev-monitor.c
> @@ -867,7 +867,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> -    if (!migration_is_idle()) {
> +    if (!migration_is_idle() && !dev->allow_unplug_during_migration) {
>          error_setg(errp, "device_del not allowed while migrating");
>          return;
>      }
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 3/4] net/virtio: add failover support Jens Freimann
@ 2019-05-21  9:45   ` Dr. David Alan Gilbert
  2019-05-30 14:56     ` Jens Freimann
  0 siblings, 1 reply; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-21  9:45 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

* Jens Freimann (jfreimann@redhat.com) wrote:
> This patch adds support to handle failover device pairs of a virtio-net
> device and a vfio-pci device, where the virtio-net acts as the standby
> device and the vfio-pci device as the primary.
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> emulated (virtio-net) device. Before migration the vfio device is
> unplugged and data flows to the emulated device, on the target side
> another vfio-pci device is plugged in to take over the data-path. In the
> guest the net_failover module will pair net devices with the same MAC
> address.
> 
> To achieve this we need:
> 
> 1. Provide a callback function for the should_be_hidden DeviceListener.
>    It is called when the primary device is plugged in. Evaluate the QOpt
>    passed in to check if it is the matching primary device. It returns
>    two values:
>      - one to signal if the device to be added is the matching
>        primary device
>      - another one to signal to qdev if it should actually
>        continue with adding the device or skip it.
> 
>    In the latter case it stores the device options in the VirtioNet
>    struct and the device is added once the VIRTIO_NET_F_STANDBY feature is
>    negotiated during virtio feature negotiation.
> 
> 2. Register a callback for migration status notifier. When called it
>    will unplug its primary device before the migration happens.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> ---
>  hw/net/virtio-net.c            | 117 +++++++++++++++++++++++++++++++++
>  include/hw/virtio/virtio-net.h |  12 ++++
>  2 files changed, 129 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index ffe0872fff..120eccbb98 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -12,6 +12,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/atomic.h"
>  #include "qemu/iov.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/net.h"
> @@ -19,6 +20,10 @@
>  #include "net/tap.h"
>  #include "qemu/error-report.h"
>  #include "qemu/timer.h"
> +#include "qemu/option.h"
> +#include "qemu/option_int.h"
> +#include "qemu/config-file.h"
> +#include "qapi/qmp/qdict.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "net/vhost_net.h"
>  #include "net/announce.h"
> @@ -29,6 +34,8 @@
>  #include "migration/misc.h"
>  #include "standard-headers/linux/ethtool.h"
>  #include "trace.h"
> +#include "monitor/qdev.h"
> +#include "hw/pci/pci.h"
>  
>  #define VIRTIO_NET_VM_VERSION    11
>  
> @@ -364,6 +371,9 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>      }
>  }
>  
> +
> +static void virtio_net_primary_plug_timer(void *opaque);
> +
>  static void virtio_net_set_link_status(NetClientState *nc)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
> @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
>      } else {
>          memset(n->vlans, 0xff, MAX_VLAN >> 3);
>      }
> +
> +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> +        atomic_set(&n->primary_should_be_hidden, false);
> +        if (n->primary_device_timer)
> +            timer_mod(n->primary_device_timer,
> +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                4000);
> +    }

What's this magic timer constant and why?

>  }
>  
>  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>      n->netclient_type = g_strdup(type);
>  }
>  
> +static void virtio_net_primary_plug_timer(void *opaque)
> +{
> +    VirtIONet *n = opaque;
> +    Error *err = NULL;
> +
> +    if (n->primary_device_dict)
> +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> +            n->primary_device_dict, &err);
> +    if (n->primary_device_opts) {
> +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> +        return;
> +    }
> +    if (!n->primary_device_dict && err) {
> +        if (n->primary_device_timer) {
> +            timer_mod(n->primary_device_timer,
> +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                100);

same here.


> +        }
> +    }
> +}
> +
> +static void virtio_net_handle_migration_primary(VirtIONet *n,
> +                                                MigrationState *s)
> +{
> +    Error *err = NULL;
> +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> +
> +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> +            n->primary_device_id);
> +    if (!n->primary_dev) {
> +        error_setg(&err, "virtio_net: couldn't find primary device");

There's something broken with the error handling in this function - the
'err' never goes anywhere - I don't think it ever gets printed or
reported or stops the migration.

> +    }
> +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> +        qdev_unplug(n->primary_dev, &err);

Not knowing unplug well; can you just explain - is that device hard
unplugged and it's gone by the time this function returns or is it still
hanging around for some indeterminate time?

> +        if (!err) {
> +            atomic_set(&n->primary_should_be_hidden, true);
> +            n->primary_dev = NULL;
> +        }
> +    } else if (migration_has_failed(s)) {
> +        if (should_be_hidden && !n->primary_dev) {
> +            /* We already unplugged the device let's plugged it back */
> +            n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> +        }
> +    }
> +}
> +
> +static void migration_state_notifier(Notifier *notifier, void *data)
> +{
> +    MigrationState *s = data;
> +    VirtIONet *n = container_of(notifier, VirtIONet, migration_state);
> +    virtio_net_handle_migration_primary(n, s);
> +}
> +
> +static void virtio_net_primary_should_be_hidden(DeviceListener *listener,
> +            QemuOpts *device_opts, bool *match_found, bool *res)
> +{
> +    VirtIONet *n = container_of(listener, VirtIONet, primary_listener);
> +
> +    if (device_opts) {
> +        n->primary_device_dict = qemu_opts_to_qdict(device_opts,
> +                n->primary_device_dict);
> +    }
> +    g_free(n->standby_id);
> +    n->standby_id = g_strdup(qdict_get_try_str(n->primary_device_dict,
> +                             "standby"));
> +    if (n->standby_id) {
> +        *match_found = true;
> +    }
> +    /* primary_should_be_hidden is set during feature negotiation */
> +    if (atomic_read(&n->primary_should_be_hidden) && *match_found) {
> +        *res = true;
> +    } else if (*match_found)  {
> +        n->primary_device_dict = qemu_opts_to_qdict(device_opts,
> +                n->primary_device_dict);
> +        *res = false;
> +    }
> +    g_free(n->primary_device_id);
> +    n->primary_device_id = g_strdup(device_opts->id);
> +}
> +
>  static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> @@ -2656,6 +2755,18 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
>          n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
>      }
>  
> +    if (n->failover) {
> +        n->primary_listener.should_be_hidden =
> +            virtio_net_primary_should_be_hidden;
> +        atomic_set(&n->primary_should_be_hidden, true);
> +        device_listener_register(&n->primary_listener);
> +        n->migration_state.notify = migration_state_notifier;
> +        add_migration_state_change_notifier(&n->migration_state);
> +        n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
> +        n->primary_device_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> +                                     virtio_net_primary_plug_timer, n);
> +    }
> +
>      virtio_net_set_config_size(n, n->host_features);
>      virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
>  
> @@ -2778,6 +2889,11 @@ static void virtio_net_device_unrealize(DeviceState *dev, Error **errp)
>      g_free(n->mac_table.macs);
>      g_free(n->vlans);
>  
> +    g_free(n->primary_device_id);
> +    g_free(n->standby_id);
> +    qobject_unref(n->primary_device_dict);
> +    n->primary_device_dict = NULL;
> +
>      max_queues = n->multiqueue ? n->max_queues : 1;
>      for (i = 0; i < max_queues; i++) {
>          virtio_net_del_queue(n, i);
> @@ -2885,6 +3001,7 @@ static Property virtio_net_properties[] = {
>                       true),
>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> +    DEFINE_PROP_BOOL("failover", VirtIONet, failover, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index b96f0c643f..c2bb6ada44 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -18,6 +18,7 @@
>  #include "standard-headers/linux/virtio_net.h"
>  #include "hw/virtio/virtio.h"
>  #include "net/announce.h"
> +#include "qemu/option_int.h"
>  
>  #define TYPE_VIRTIO_NET "virtio-net-device"
>  #define VIRTIO_NET(obj) \
> @@ -43,6 +44,7 @@ typedef struct virtio_net_conf
>      int32_t speed;
>      char *duplex_str;
>      uint8_t duplex;
> +    char *primary_id_str;
>  } virtio_net_conf;
>  
>  /* Coalesced packets type & status */
> @@ -185,6 +187,16 @@ struct VirtIONet {
>      AnnounceTimer announce_timer;
>      bool needs_vnet_hdr_swap;
>      bool mtu_bypass_backend;
> +    QemuOpts *primary_device_opts;
> +    QDict *primary_device_dict;
> +    DeviceState *primary_dev;
> +    char *primary_device_id;
> +    char *standby_id;
> +    bool primary_should_be_hidden;
> +    bool failover;
> +    DeviceListener primary_listener;
> +    QEMUTimer *primary_device_timer;
> +    Notifier migration_state;
>  };
>  
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices
  2019-05-21  9:33   ` Dr. David Alan Gilbert
@ 2019-05-21  9:47     ` Daniel P. Berrangé
  2019-05-23  8:01     ` Jens Freimann
  1 sibling, 0 replies; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-05-21  9:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, ehabkost, mst, aadam, qemu-devel, laine, Jens Freimann, ailan

On Tue, May 21, 2019 at 10:33:36AM +0100, Dr. David Alan Gilbert wrote:
> * Jens Freimann (jfreimann@redhat.com) wrote:
> > In "b06424de62 migration: Disable hotplug/unplug during migration" we
> > added a check to disable unplug for all devices until we have figured
> > out what works. For failover primary devices qdev_unplug() is called
> > from the migration handler, i.e. during migration.
> > 
> > This patch adds a flag to DeviceState which is set to false for all
> > devices and makes an exception for vfio-pci devices that are also
> > primary devices in a failover pair.
> > 
> > Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> 
> So I think this is safe in your case, because you trigger the unplug
> right at the start of migration during setup and plug after failure;
> however it's not generally safe - I can't unplug a device while the
> migration is actually in progress.

Libvirt will also block any attempt to hotplug/unplug a device during
migration.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
                   ` (5 preceding siblings ...)
  2019-05-21  8:37 ` Daniel P. Berrangé
@ 2019-05-21 10:10 ` Michael S. Tsirkin
  2019-05-21 19:17   ` Jens Freimann
  2019-06-11 15:42 ` Laine Stump
  7 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-21 10:10 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine, ailan

On Fri, May 17, 2019 at 02:58:16PM +0200, Jens Freimann wrote:
> This is another attempt at implementing the host side of the
> net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> 
> Changes since last RFC:
> - work around circular dependency of commandline options. Just add
>   failover=on to the virtio-net standby options and reference it from
>   primary (vfio-pci) device with standby=<id>  
> - add patch 3/4 to allow migration of vfio-pci device when it is part of a
>   failover pair, still disallow for all other devices
> - add patch 4/4 to allow unplug of device during migrationm, make an
>   exception for failover primary devices. I'd like feedback on how to
>   solve this more elegant. I added a boolean to DeviceState, have it
>   default to false for all devices except for primary devices. 
> - not tested yet with surprise removal
> - I don't expect this to go in as it is, still needs more testing but
>   I'd like to get feedback on above mentioned changes.
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> emulated device. Before migration the vfio device is unplugged and data
> flows to the emulated device, on the target side another vfio-pci device
> is plugged in to take over the data-path. In the guest the net_failover
> module will pair net devices with the same MAC address.
> 
> * In the first patch the infrastructure for hiding the device is added
>   for the qbus and qdev APIs. 
> 
> * In the second patch the virtio-net uses the API to defer adding the vfio
>   device until the VIRTIO_NET_F_STANDBY feature is acked.
> 
> Previous discussion: 
>   RFC v1 https://patchwork.ozlabs.org/cover/989098/
>   RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
> 
> To summarize concerns/feedback from previous discussion:
> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>   Migration might get stuck for unpredictable time with unclear reason.
>   This approach combines two tricky things, hot/unplug and migration. 
>   -> We can surprise-remove the PCI device and in QEMU we can do all
>      necessary rollbacks transparent to management software. Will it be
>      easy, probably not.
> 2. PCI devices are a precious ressource. The primary device should never
>   be added to QEMU if it won't be used by guest instead of hiding it in
>   QEMU. 
>   -> We only hotplug the device when the standby feature bit was
>      negotiated. We save the device cmdline options until we need it for
>      qdev_device_add()
>      Hiding a device can be a useful concept to model. For example a
>      pci device in a powered-off slot could be marked as hidden until the slot is
>      powered on (mst).
> 3. Management layer software should handle this. Open Stack already has
>   components/code to handle unplug/replug VFIO devices and metadata to
>   provide to the guest for detecting which devices should be paired.
>   -> An approach that includes all software from firmware to
>      higher-level management software wasn't tried in the last years. This is
>      an attempt to keep it simple and contained in QEMU as much as possible.
> 4. Hotplugging a device and then making it part of a failover setup is
>    not possible
>   -> addressed by extending qdev hotplug functions to check for hidden
>      attribute, so e.g. device_add can be used to plug a device.
> 
> 
> I have tested this with a mlx5 NIC and was able to migrate the VM with
> above mentioned workarounds for open problems.
> 
> Command line example:
> 
> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>         -machine q35,kernel-irqchip=split -cpu host   \
>         -k fr   \
>         -serial stdio   \
>         -net none \
>         -qmp unix:/tmp/qmp.socket,server,nowait \
>         -monitor telnet:127.0.0.1:5555,server,nowait \
>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \                                                                                    
>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> 
> Then the primary device can be hotplugged via
>  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
> 
> 
> I'm grateful for any remarks or ideas!
> 
> Thanks!

Hi Jens!
Overall I like the patches. Thanks!

Could you please tell us a bit more about other hardware: does this work
more or less universally across vendors? Were any other cards tested?

Thanks in advance.

> 
> Jens Freimann (4):
>   migration: allow unplug during migration for failover devices
>   qdev/qbus: Add hidden device support
>   net/virtio: add failover support
>   vfio: unplug failover primary device before migration
> 
>  hw/core/qdev.c                 |  20 ++++++
>  hw/net/virtio-net.c            | 117 +++++++++++++++++++++++++++++++++
>  hw/vfio/pci.c                  |  25 ++++++-
>  hw/vfio/pci.h                  |   2 +
>  include/hw/qdev-core.h         |  10 +++
>  include/hw/virtio/virtio-net.h |  12 ++++
>  qdev-monitor.c                 |  43 ++++++++++--
>  vl.c                           |   6 +-
>  8 files changed, 228 insertions(+), 7 deletions(-)
> 
> -- 
> 2.21.0



* Re: [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support
  2019-05-17 12:58 ` [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support Jens Freimann
@ 2019-05-21 11:33   ` Michael S. Tsirkin
  0 siblings, 0 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-21 11:33 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine, ailan

On Fri, May 17, 2019 at 02:58:18PM +0200, Jens Freimann wrote:
> This adds support for hiding a device to the qbus and qdev APIs.
> qdev_device_add() is modified to check for a standby argument in the
> option string. A DeviceListener callback should_be_hidden() is added. It
> can be used by a standby device to inform qdev that this device should
> not be added now. The standby device handler can store the device
> options to plug the device in at a later point in time.
> 
> Signed-off-by: Jens Freimann <jfreimann@redhat.com>


I really like this approach. I think it has value beyond failover:
e.g. if the PCI bus is powered off then devices on it should
also be invisible. Right now we kind of work around this
but we could switch to this API down the road.
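
To make the contract of the new callback concrete, here is a rough sketch
(not taken from the patches) of how a standby device could consume it; the
MyStandby type, the feature_acked field and the id comparison are
illustrative only:

    /* Sketch: a standby device deciding whether its primary stays hidden.
     * Set *match_found once the device described by device_opts is
     * recognised as our primary, and *res to whether it should be hidden
     * for now (e.g. until the guest has acked VIRTIO_NET_F_STANDBY).
     */
    static void my_standby_should_be_hidden(DeviceListener *listener,
                                            QemuOpts *device_opts,
                                            bool *match_found, bool *res)
    {
        MyStandby *s = container_of(listener, MyStandby, listener);
        const char *standby = qemu_opt_get(device_opts, "standby");

        if (!standby || strcmp(standby, s->id) != 0) {
            return;              /* not our primary, leave *match_found false */
        }
        *match_found = true;     /* stop qdev from asking other listeners */
        *res = !s->feature_acked;
    }

    /* registered once, e.g. at realize time */
    s->listener.should_be_hidden = my_standby_should_be_hidden;
    device_listener_register(&s->listener);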

> ---
>  hw/core/qdev.c         | 19 +++++++++++++++++++
>  hw/vfio/pci.c          |  1 +
>  hw/vfio/pci.h          |  1 +
>  include/hw/qdev-core.h |  9 +++++++++
>  qdev-monitor.c         | 41 ++++++++++++++++++++++++++++++++++++++---
>  vl.c                   |  6 ++++--
>  6 files changed, 72 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 98cdaa6bf7..d55fe00ae7 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -211,6 +211,25 @@ void device_listener_unregister(DeviceListener *listener)
>      QTAILQ_REMOVE(&device_listeners, listener, link);
>  }
>  
> +bool qdev_should_hide_device(QemuOpts *opts, Error **errp)
> +{
> +    bool res = false;
> +    bool match_found = false;
> +
> +    DeviceListener *listener;
> +
> +    QTAILQ_FOREACH(listener, &device_listeners, link) {
> +       if (listener->should_be_hidden) {
> +            listener->should_be_hidden(listener, opts, &match_found, &res);
> +        }
> +
> +        if (match_found) {
> +            break;
> +        }
> +    }
> +    return res;
> +}
> +
>  void qdev_set_legacy_instance_id(DeviceState *dev, int alias_id,
>                                   int required_for_version)
>  {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8cecb53d5c..835249c61d 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3215,6 +3215,7 @@ static Property vfio_pci_dev_properties[] = {
>                              display, ON_OFF_AUTO_OFF),
>      DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
>      DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
> +    DEFINE_PROP_STRING("standby", VFIOPCIDevice, standby),
>      DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
>                         intx.mmap_timeout, 1100),
>      DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index cfcd1a81b8..1a87f91889 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -135,6 +135,7 @@ typedef struct VFIOPCIDevice {
>      PCIHostDeviceAddress host;
>      EventNotifier err_notifier;
>      EventNotifier req_notifier;
> +    char *standby;
>      int (*resetfn)(struct VFIOPCIDevice *);
>      uint32_t vendor_id;
>      uint32_t device_id;
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 5437395779..d54d3ae62a 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -158,6 +158,13 @@ struct DeviceState {
>  struct DeviceListener {
>      void (*realize)(DeviceListener *listener, DeviceState *dev);
>      void (*unrealize)(DeviceListener *listener, DeviceState *dev);
> +    /*
> +     * This callback is called just upon init of the DeviceState
> +     * and can be used by a standby device for informing qdev if this
> +     * device should be hidden by checking the device opts
> +     */
> +    void (*should_be_hidden)(DeviceListener *listener, QemuOpts *device_opts,
> +            bool *match_found, bool *res);
>      QTAILQ_ENTRY(DeviceListener) link;
>  };
>  
> @@ -454,4 +461,6 @@ static inline bool qbus_is_hotpluggable(BusState *bus)
>  void device_listener_register(DeviceListener *listener);
>  void device_listener_unregister(DeviceListener *listener);
>  
> +bool qdev_should_hide_device(QemuOpts *opts, Error **errp);
> +
>  #endif
> diff --git a/qdev-monitor.c b/qdev-monitor.c
> index 9cce8b93c2..a81226529a 100644
> --- a/qdev-monitor.c
> +++ b/qdev-monitor.c
> @@ -32,8 +32,10 @@
>  #include "qemu/help_option.h"
>  #include "qemu/option.h"
>  #include "qemu/qemu-print.h"
> +#include "qemu/option_int.h"
>  #include "sysemu/block-backend.h"
>  #include "migration/misc.h"
> +#include "migration/migration.h"
>  
>  /*
>   * Aliases were a bad idea from the start.  Let's keep them
> @@ -561,14 +563,45 @@ void qdev_set_id(DeviceState *dev, const char *id)
>      }
>  }
>  
> +static int is_failover_device(void *opaque, const char *name, const char *value,
> +                        Error **errp)
> +{
> +    if (strcmp(name, "standby") == 0) {
> +        QemuOpts *opts = (QemuOpts *)opaque;
> +
> +        if (qdev_should_hide_device(opts, errp) && errp && !*errp) {
> +            return 1;
> +        } else if (errp && *errp) {
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static bool should_hide_device(QemuOpts *opts, Error **err)
> +{
> +    if (qemu_opt_foreach(opts, is_failover_device, opts, err) == 0) {
> +        return false;
> +    }
> +    return true;
> +}
> +
>  DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
>  {
>      DeviceClass *dc;
>      const char *driver, *path;
> -    DeviceState *dev;
> +    DeviceState *dev = NULL;
>      BusState *bus = NULL;
>      Error *err = NULL;
>  
> +    if (opts && should_hide_device(opts, &err)) {
> +        if (err) {
> +            goto err_del_dev;
> +        }
> +        return NULL;
> +    }
> +
>      driver = qemu_opt_get(opts, "driver");
>      if (!driver) {
>          error_setg(errp, QERR_MISSING_PARAMETER, "driver");
> @@ -640,8 +673,10 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
>  
>  err_del_dev:
>      error_propagate(errp, err);
> -    object_unparent(OBJECT(dev));
> -    object_unref(OBJECT(dev));
> +    if (dev) {
> +        object_unparent(OBJECT(dev));
> +        object_unref(OBJECT(dev));
> +    }
>      return NULL;
>  }
>  
> diff --git a/vl.c b/vl.c
> index b6709514c1..4b5b878275 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2355,10 +2355,12 @@ static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
>      DeviceState *dev;
>  
>      dev = qdev_device_add(opts, errp);
> -    if (!dev) {
> +    if (!dev && *errp) {
> +        error_report_err(*errp);
>          return -1;
> +    } else if (dev) {
> +        object_unref(OBJECT(dev));
>      }
> -    object_unref(OBJECT(dev));
>      return 0;
>  }
>  
> -- 
> 2.21.0



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21  7:21   ` Jens Freimann
@ 2019-05-21 11:37     ` Michael S. Tsirkin
  2019-05-21 18:49       ` Jens Freimann
  2019-05-21 14:18     ` Alex Williamson
  1 sibling, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-21 11:37 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan

On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
> On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
> > On Fri, 17 May 2019 14:58:16 +0200
> > Jens Freimann <jfreimann@redhat.com> wrote:
> > > Command line example:
> > > 
> > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > >         -machine q35,kernel-irqchip=split -cpu host   \
> > >         -k fr   \
> > >         -serial stdio   \
> > >         -net none \
> > >         -qmp unix:/tmp/qmp.socket,server,nowait \
> > >         -monitor telnet:127.0.0.1:5555,server,nowait \
> > >         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > >         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > >         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > >         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> > >         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
> > >         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > 
> > > Then the primary device can be hotplugged via
> > >  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
> > 
> > Is this standby= option only valid for Network/Ethernet class code
> > devices?  If so, perhaps vfio-pci code should reject the option on any
> > non-ethernet devices.  The option is also non-intuitive for users, only
> > through examples like above can we see it relates to the id of the
> > secondary device.  Could we instead name it something like
> > "standby_net_failover_pair_id="?
> 
> It is only for ethernet (VFs), I will add code to reject non-ethernet VF devices.
> I agree the name is not descriptive and the one you suggest seems good to
> me.
> > 
> > Also, this feature requires matching MAC addresses per the description,
> > where is that done?  Is it the user's responsibility to set the MAC on
> > the host device prior to the device_add?  If so, is this actually not
> > only specific to ethernet devices, but ethernet VFs?
> 
> Yes, it's the users responsibility and the MACs are then matched by
> the net_failover driver in the guest. It makes sense for ethernet VFs only,
> I'll add a check for that.
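
For reference, the host-side MAC assignment that has to happen before the
device_add is the usual iproute2 step; the PF name and VF index below are
only examples and need to match the actual setup:

    # on the host, before hotplugging the VF
    ip link set ens1f0 vf 0 mac 52:54:00:6f:55:cc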

Actually, is there a list of devices for which this has been tested
besides mlx5? I think someone said some old Intel cards
don't support this well; we might need to blacklist those ...

> > 
> > Finally, please copy me on code touching vfio.  Thanks,
> 
> I'm sorry about that, will do.
> 
> Thanks for the review Alex!
> 
> regards,
> Jens



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21  7:21   ` Jens Freimann
  2019-05-21 11:37     ` Michael S. Tsirkin
@ 2019-05-21 14:18     ` Alex Williamson
  1 sibling, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2019-05-21 14:18 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

On Tue, 21 May 2019 09:21:57 +0200
Jens Freimann <jfreimann@redhat.com> wrote:

> On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
> >On Fri, 17 May 2019 14:58:16 +0200
> >Jens Freimann <jfreimann@redhat.com> wrote:  
> >> Command line example:
> >>
> >> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> >>         -machine q35,kernel-irqchip=split -cpu host   \
> >>         -k fr   \
> >>         -serial stdio   \
> >>         -net none \
> >>         -qmp unix:/tmp/qmp.socket,server,nowait \
> >>         -monitor telnet:127.0.0.1:5555,server,nowait \
> >>         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> >>         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> >>         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> >>         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> >>         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
> >>         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> >>
> >> Then the primary device can be hotplugged via
> >>  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1  
> >
> >Is this standby= option only valid for Network/Ethernet class code
> >devices?  If so, perhaps vfio-pci code should reject the option on any
> >non-ethernet devices.  The option is also non-intuitive for users, only
> >through examples like above can we see it relates to the id of the
> >secondary device.  Could we instead name it something like
> >"standby_net_failover_pair_id="?  
> 
> It is only for ethernet (VFs), I will add code to reject non-ethernet VF devices.
> I agree the name is not descriptive and the one you suggest seems good to
> me. 
> >
> >Also, this feature requires matching MAC addresses per the description,
> >where is that done?  Is it the user's responsibility to set the MAC on
> >the host device prior to the device_add?  If so, is this actually not
> >only specific to ethernet devices, but ethernet VFs?  
> 
> Yes, it's the users responsibility and the MACs are then matched by
> the net_failover driver in the guest. It makes sense for ethernet VFs only,
> I'll add a check for that.

FWIW, I'd probably stop at Ethernet class devices; vfio doesn't really
expose whether a device is a VF, so we'd likely need to resort to
getting that info through sysfs.  It also seems like there might be
some limited-use cases of copying the MAC from a PF to the virtio nic
or use of utilities on the host for modifying a PF MAC, perhaps via
eeprom.  So while we expect the typical use case to be a VF, it's
probably ugly and unnecessarily restrictive to enforce it.  Thanks,

Alex



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21 11:37     ` Michael S. Tsirkin
@ 2019-05-21 18:49       ` Jens Freimann
  2019-05-29  0:14         ` si-wei liu
  2019-05-29  2:40         ` Michael S. Tsirkin
  0 siblings, 2 replies; 77+ messages in thread
From: Jens Freimann @ 2019-05-21 18:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan

On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
>On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
>> On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
>> > On Fri, 17 May 2019 14:58:16 +0200
>> > Jens Freimann <jfreimann@redhat.com> wrote:
>> > > Command line example:
>> > >
>> > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>> > >         -machine q35,kernel-irqchip=split -cpu host   \
>> > >         -k fr   \
>> > >         -serial stdio   \
>> > >         -net none \
>> > >         -qmp unix:/tmp/qmp.socket,server,nowait \
>> > >         -monitor telnet:127.0.0.1:5555,server,nowait \
>> > >         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>> > >         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>> > >         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>> > >         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>> > >         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
>> > >         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>> > >
>> > > Then the primary device can be hotplugged via
>> > >  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
>> >
>> > Is this standby= option only valid for Network/Ethernet class code
>> > devices?  If so, perhaps vfio-pci code should reject the option on any
>> > non-ethernet devices.  The option is also non-intuitive for users, only
>> > through examples like above can we see it relates to the id of the
>> > secondary device.  Could we instead name it something like
>> > "standby_net_failover_pair_id="?
>>
>> It is only for ethernet (VFs), I will add code to reject non-ethernet VF devices.
>> I agree the name is not descriptive and the one you suggest seems good to
>> me.
>> >
>> > Also, this feature requires matching MAC addresses per the description,
>> > where is that done?  Is it the user's responsibility to set the MAC on
>> > the host device prior to the device_add?  If so, is this actually not
>> > only specific to ethernet devices, but ethernet VFs?
>>
>> Yes, it's the users responsibility and the MACs are then matched by
>> the net_failover driver in the guest. It makes sense for ethernet VFs only,
>> I'll add a check for that.
>
>Actually is there a list of devices for which this has been tested
>besides mlx5? I think someone said some old intel cards
>don't support this well, we might need to blacklist these ...

So far I've tested mlx5 and XL710, which both worked, but I'm
working on testing with more devices. Of course, help with testing
is greatly appreciated.

regards,
Jens 



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21 10:10 ` Michael S. Tsirkin
@ 2019-05-21 19:17   ` Jens Freimann
  2019-05-21 21:43     ` Michael S. Tsirkin
  0 siblings, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-05-21 19:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine, ailan

On Tue, May 21, 2019 at 06:10:19AM -0400, Michael S. Tsirkin wrote:
>On Fri, May 17, 2019 at 02:58:16PM +0200, Jens Freimann wrote:
>> I'm grateful for any remarks or ideas!
>>
>> Thanks!
>
>Hi Jens!
>Overall I like the patches. Thanks!
>
>Could you please tell us a bit more about other hardware: does this work
>more or less universally across vendors? were any other cards tested?

Thank you. I have tested only Mellanox and XL710 so far, but I am working
on testing with more NICs at the moment. I think there are a few more
things to work out with the patches (especially the unplug before
migration), which should give me a bit more time to test other cards.

Also, I haven't yet tested unplug with surprise removal. My understanding
is that device_del was switched from using surprise removal to an
attention button etc. a while back. So I'll first have to find out how
to try removing a card with surprise removal in QEMU.

regards,
Jens  



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21 19:17   ` Jens Freimann
@ 2019-05-21 21:43     ` Michael S. Tsirkin
  0 siblings, 0 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-21 21:43 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine, ailan

On Tue, May 21, 2019 at 09:17:54PM +0200, Jens Freimann wrote:
> On Tue, May 21, 2019 at 06:10:19AM -0400, Michael S. Tsirkin wrote:
> > On Fri, May 17, 2019 at 02:58:16PM +0200, Jens Freimann wrote:
> > > I'm grateful for any remarks or ideas!
> > > 
> > > Thanks!
> > 
> > Hi Jens!
> > Overall I like the patches. Thanks!
> > 
> > Could you please tell us a bit more about other hardware: does this work
> > more or less universally across vendors? were any other cards tested?
> 
> Thank you, I have tested only Mellanox and XL710 so far but am working
> on testing with more nics at the moment. I think there's a few more
> things to work out with the patches (especially the unplug before
> migration) which should give me a bit more time to test other cards.
> 
> Also I haven't yet tested unplug with surprise removal. My understanding
> is that device_del was switched from using surprise removal to
> attention button etc. a while back.

It never used surprise removal.

> So I'll have to find out first how
> to try removing a card with surprise removal in qemu.
> 
> regards,
> Jens

I would not do this at this stage yet. Lots of work is needed to
make Linux not crash.

-- 
MST



* Re: [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices
  2019-05-21  9:33   ` Dr. David Alan Gilbert
  2019-05-21  9:47     ` Daniel P. Berrangé
@ 2019-05-23  8:01     ` Jens Freimann
  2019-05-23 15:37       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-05-23  8:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

On Tue, May 21, 2019 at 10:33:36AM +0100, Dr. David Alan Gilbert wrote:
>* Jens Freimann (jfreimann@redhat.com) wrote:
>> In "b06424de62 migration: Disable hotplug/unplug during migration" we
>> added a check to disable unplug for all devices until we have figured
>> out what works. For failover primary devices qdev_unplug() is called
>> from the migration handler, i.e. during migration.
>>
>> This patch adds a flag to DeviceState which is set to false for all
>> devices and makes an exception for vfio-pci devices that are also
>> primary devices in a failover pair.
>>
>> Signed-off-by: Jens Freimann <jfreimann@redhat.com>
>
>So I think this is safe in your case, because you trigger the unplug
>right at the start of migration during setup and plug after failure;
>however it's not generally safe - I can't unplug a device while the
>migration is actually in progress.

I tried to limit it so that it is only allowed in the failover case. Are you
saying it's missing something and is not strict enough? I could allow it only
during migration setup. I guess we'll need a similar exception for
failover in libvirt.
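
Something along these lines, as a sketch only (the DeviceState flag name below
is illustrative, not necessarily what patch 1/4 ends up using):

    /* in qdev_unplug(): only let failover primaries through, and only
     * while the migration is still in its setup phase */
    if (!migration_is_idle() &&
        !(dev->allow_unplug_during_migration &&
          migration_in_setup(migrate_get_current()))) {
        error_setg(errp, "device_del not allowed while migrating");
        return;
    }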

regards,
Jens 



* Re: [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices
  2019-05-23  8:01     ` Jens Freimann
@ 2019-05-23 15:37       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-23 15:37 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

* Jens Freimann (jfreimann@redhat.com) wrote:
> On Tue, May 21, 2019 at 10:33:36AM +0100, Dr. David Alan Gilbert wrote:
> > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > In "b06424de62 migration: Disable hotplug/unplug during migration" we
> > > added a check to disable unplug for all devices until we have figured
> > > out what works. For failover primary devices qdev_unplug() is called
> > > from the migration handler, i.e. during migration.
> > > 
> > > This patch adds a flag to DeviceState which is set to false for all
> > > devices and makes an exception for vfio-pci devices that are also
> > > primary devices in a failover pair.
> > > 
> > > Signed-off-by: Jens Freimann <jfreimann@redhat.com>
> > 
> > So I think this is safe in your case, because you trigger the unplug
> > right at the start of migration during setup and plug after failure;
> > however it's not generally safe - I can't unplug a device while the
> > migration is actually in progress.
> 
> I tried to limit it to only allow it in failover case. You're saying
> it's missing something and not strict enough? I could allow it only
> during migration setup. I guess we'll need a similar exception for
> failover in libvirt.

I might be wrong, but I think with your patch I could hot-unplug your
device part way through migration, whereas I think you only care about
doing it at a very specific point during setup.

(I would still prefer the hotplug to be done outside QEMU, but
that's a separate discussion.)

Dave
> regards,
> Jens
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21 18:49       ` Jens Freimann
@ 2019-05-29  0:14         ` si-wei liu
  2019-05-29  2:54           ` Michael S. Tsirkin
  2019-05-29  2:40         ` Michael S. Tsirkin
  1 sibling, 1 reply; 77+ messages in thread
From: si-wei liu @ 2019-05-29  0:14 UTC (permalink / raw)
  To: Jens Freimann, Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan



On 5/21/2019 11:49 AM, Jens Freimann wrote:
> On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
>> On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
>>> On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
>>> > On Fri, 17 May 2019 14:58:16 +0200
>>> > Jens Freimann <jfreimann@redhat.com> wrote:
>>> > > Command line example:
>>> > >
>>> > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>>> > >         -machine q35,kernel-irqchip=split -cpu host   \
>>> > >         -k fr   \
>>> > >         -serial stdio   \
>>> > >         -net none \
>>> > >         -qmp unix:/tmp/qmp.socket,server,nowait \
>>> > >         -monitor telnet:127.0.0.1:5555,server,nowait \
>>> > >         -device 
>>> pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>>> > >         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>>> > >         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>>> > >         -netdev 
>>> tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>>> > >         -device 
>>> virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on 
>>> \
>>> > >         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>>> > >
>>> > > Then the primary device can be hotplugged via
>>> > >  (qemu) device_add 
>>> vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
>>> >
>>> > Is this standby= option only valid for Network/Ethernet class code
>>> > devices?  If so, perhaps vfio-pci code should reject the option on 
>>> any
>>> > non-ethernet devices.  The option is also non-intuitive for users, 
>>> only
>>> > through examples like above can we see it relates to the id of the
>>> > secondary device.  Could we instead name it something like
>>> > "standby_net_failover_pair_id="?
>>>
>>> It is only for ethernet (VFs), I will add code to reject 
>>> non-ethernet VF devices.
>>> I agree the name is not descriptive and the one you suggest seems 
>>> good to
>>> me.
>>> >
>>> > Also, this feature requires matching MAC addresses per the 
>>> description,
>>> > where is that done?  Is it the user's responsibility to set the 
>>> MAC on
>>> > the host device prior to the device_add?  If so, is this actually not
>>> > only specific to ethernet devices, but ethernet VFs?
>>>
>>> Yes, it's the users responsibility and the MACs are then matched by
>>> the net_failover driver in the guest. It makes sense for ethernet 
>>> VFs only,
>>> I'll add a check for that.
>>
>> Actually is there a list of devices for which this has been tested
>> besides mlx5? I think someone said some old intel cards
>> don't support this well, we might need to blacklist these ...
>
> So far I've tested mlx5 and XL710 which both worked, but I'm
> working on testing with more devices. But of course help with testing
> is greatly appreciated.
It won't work on Intel ixgbe and Broadcom bnxt_en, which require
toggling the state of the tap backing the virtio-net in order to
release/reprogram the MAC filter. Actually, very few NICs could
work with this - even where some work by chance, the behavior is undefined.
Instead of blacklisting, it makes more sense to whitelist the NICs that
support it - presumably with some new sysfs attribute claiming the
support.
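
For anyone who wants to reproduce that toggling by hand: it is just a link
flap of the backing tap on the host, where "tap0" below stands for whatever
tap device backs the virtio-net in the actual setup:

    ip link set tap0 down && ip link set tap0 up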

-Siwei

>
> regards,
> Jens




* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-21 18:49       ` Jens Freimann
  2019-05-29  0:14         ` si-wei liu
@ 2019-05-29  2:40         ` Michael S. Tsirkin
  2019-05-29  7:48           ` Jens Freimann
  1 sibling, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-29  2:40 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan

On Tue, May 21, 2019 at 08:49:18PM +0200, Jens Freimann wrote:
> On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
> > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
> > > On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
> > > > On Fri, 17 May 2019 14:58:16 +0200
> > > > Jens Freimann <jfreimann@redhat.com> wrote:
> > > > > Command line example:
> > > > >
> > > > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > > > >         -machine q35,kernel-irqchip=split -cpu host   \
> > > > >         -k fr   \
> > > > >         -serial stdio   \
> > > > >         -net none \
> > > > >         -qmp unix:/tmp/qmp.socket,server,nowait \
> > > > >         -monitor telnet:127.0.0.1:5555,server,nowait \
> > > > >         -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > > > >         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > > > >         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > > > >         -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> > > > >         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
> > > > >         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > > >
> > > > > Then the primary device can be hotplugged via
> > > > >  (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
> > > >
> > > > Is this standby= option only valid for Network/Ethernet class code
> > > > devices?  If so, perhaps vfio-pci code should reject the option on any
> > > > non-ethernet devices.  The option is also non-intuitive for users, only
> > > > through examples like above can we see it relates to the id of the
> > > > secondary device.  Could we instead name it something like
> > > > "standby_net_failover_pair_id="?
> > > 
> > > It is only for ethernet (VFs), I will add code to reject non-ethernet VF devices.
> > > I agree the name is not descriptive and the one you suggest seems good to
> > > me.
> > > >
> > > > Also, this feature requires matching MAC addresses per the description,
> > > > where is that done?  Is it the user's responsibility to set the MAC on
> > > > the host device prior to the device_add?  If so, is this actually not
> > > > only specific to ethernet devices, but ethernet VFs?
> > > 
> > > Yes, it's the users responsibility and the MACs are then matched by
> > > the net_failover driver in the guest. It makes sense for ethernet VFs only,
> > > I'll add a check for that.
> > 
> > Actually is there a list of devices for which this has been tested
> > besides mlx5? I think someone said some old intel cards
> > don't support this well, we might need to blacklist these ...
> 
> So far I've tested mlx5 and XL710 which both worked, but I'm
> working on testing with more devices. But of course help with testing
> is greatly appreciated.
> 
> regards,
> Jens

A testing tool that people can run to get a pass/fail
result would be needed for that.
Do you have something like this?

-- 
MST



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-29  0:14         ` si-wei liu
@ 2019-05-29  2:54           ` Michael S. Tsirkin
  2019-06-03 18:06             ` Laine Stump
  0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-29  2:54 UTC (permalink / raw)
  To: si-wei liu
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, Jens Freimann, ailan

On Tue, May 28, 2019 at 05:14:22PM -0700, si-wei liu wrote:
> 
> 
> On 5/21/2019 11:49 AM, Jens Freimann wrote:
> > On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
> > > > On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
> > > > > On Fri, 17 May 2019 14:58:16 +0200
> > > > > Jens Freimann <jfreimann@redhat.com> wrote:
> > > > > > Command line example:
> > > > > >
> > > > > > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> > > > > >         -machine q35,kernel-irqchip=split -cpu host   \
> > > > > >         -k fr   \
> > > > > >         -serial stdio   \
> > > > > >         -net none \
> > > > > >         -qmp unix:/tmp/qmp.socket,server,nowait \
> > > > > >         -monitor telnet:127.0.0.1:5555,server,nowait \
> > > > > >         -device
> > > > pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> > > > > >         -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> > > > > >         -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> > > > > >         -netdev
> > > > tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on
> > > > \
> > > > > >         -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on
> > > > \
> > > > > >         /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > > > > >
> > > > > > Then the primary device can be hotplugged via
> > > > > >  (qemu) device_add
> > > > vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
> > > > >
> > > > > Is this standby= option only valid for Network/Ethernet class code
> > > > > devices?  If so, perhaps vfio-pci code should reject the
> > > > option on any
> > > > > non-ethernet devices.  The option is also non-intuitive for
> > > > users, only
> > > > > through examples like above can we see it relates to the id of the
> > > > > secondary device.  Could we instead name it something like
> > > > > "standby_net_failover_pair_id="?
> > > > 
> > > > It is only for ethernet (VFs), I will add code to reject
> > > > non-ethernet VF devices.
> > > > I agree the name is not descriptive and the one you suggest
> > > > seems good to
> > > > me.
> > > > >
> > > > > Also, this feature requires matching MAC addresses per the
> > > > description,
> > > > > where is that done?  Is it the user's responsibility to set
> > > > the MAC on
> > > > > the host device prior to the device_add?  If so, is this actually not
> > > > > only specific to ethernet devices, but ethernet VFs?
> > > > 
> > > > Yes, it's the users responsibility and the MACs are then matched by
> > > > the net_failover driver in the guest. It makes sense for
> > > > ethernet VFs only,
> > > > I'll add a check for that.
> > > 
> > > Actually is there a list of devices for which this has been tested
> > > besides mlx5? I think someone said some old intel cards
> > > don't support this well, we might need to blacklist these ...
> > 
> > So far I've tested mlx5 and XL710 which both worked, but I'm
> > working on testing with more devices. But of course help with testing
> > is greatly appreciated.
> It won't work on Intel ixgbe and Broadcom bnxt_en, which requires toggling
> the state of tap backing the virtio-net in order to release/reprogram MAC
> filter. Actually, it's very few NICs that could work with this - even some
> works by chance the behavior is undefined. Instead of blacklisting it makes
> more sense to whitelist the NIC that supports it - with some new sysfs
> attribute claiming the support presumably.
> 
> -Siwei

I agree that for many cards we won't know how they behave until we try.  One
can consider this a bug in Linux that cards don't behave in a consistent
way.  The best thing to do IMHO would be to write a tool that people can
run to test the behaviour.


> > 
> > regards,
> > Jens



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-29  2:40         ` Michael S. Tsirkin
@ 2019-05-29  7:48           ` Jens Freimann
  2019-05-30 18:12             ` Michael S. Tsirkin
  0 siblings, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-05-29  7:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan

On Tue, May 28, 2019 at 10:40:42PM -0400, Michael S. Tsirkin wrote:
>On Tue, May 21, 2019 at 08:49:18PM +0200, Jens Freimann wrote:
>> On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
>> > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
>> > Actually is there a list of devices for which this has been tested
>> > besides mlx5? I think someone said some old intel cards
>> > don't support this well, we might need to blacklist these ...
>>
>> So far I've tested mlx5 and XL710 which both worked, but I'm
>> working on testing with more devices. But of course help with testing
>> is greatly appreciated.
>
>A testing tool that people can run to get a pass/fail
>result would be needed for that.
>Do you have something like this?

I have two simple tools: one that sends packets and another that
sniffs for packets to see which device the packets go to. Find them at
https://github.com/jensfr/netfailover_driver_detect

Feedback and/or patches welcome.
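
Without the tools, a rough manual check is to watch both lower devices in the
guest while pinging it from another host; the interface names below are just
examples for the VF netdev and the virtio standby netdev:

    # in the guest, one per terminal
    tcpdump -ni ens4 icmp     # primary (VF)
    tcpdump -ni ens5 icmp     # standby (virtio-net)
    # then ping the guest from another host and see which capture shows traffic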

regards,
Jens 



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-21  9:45   ` Dr. David Alan Gilbert
@ 2019-05-30 14:56     ` Jens Freimann
  2019-05-30 17:46       ` Michael S. Tsirkin
                         ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Jens Freimann @ 2019-05-30 14:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

Hi David,

Sorry for the delayed reply.

On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
>> * Jens Freimann (jfreimann@redhat.com) wrote:
>> > +static void virtio_net_primary_plug_timer(void *opaque);
>> > +
>> >  static void virtio_net_set_link_status(NetClientState *nc)
>> >  {
>> >      VirtIONet *n = qemu_get_nic_opaque(nc);
>> > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
>> >      } else {
>> >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
>> >      }
>> > +
>> > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
>> > +        atomic_set(&n->primary_should_be_hidden, false);
>> > +        if (n->primary_device_timer)
>> > +            timer_mod(n->primary_device_timer,
>> > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> > +                4000);
>> > +    }
>>
>> What's this magic timer constant and why?

To be honest it's a leftover from previous versions (before I took
over) of the patches and I'm not sure why the timer is there.
I removed it and so far see no reason to keep it.  

>>
>> >  }
>> >
>> >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
>> > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
>> >      n->netclient_type = g_strdup(type);
>> >  }
>> >
>> > +static void virtio_net_primary_plug_timer(void *opaque)
>> > +{
>> > +    VirtIONet *n = opaque;
>> > +    Error *err = NULL;
>> > +
>> > +    if (n->primary_device_dict)
>> > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
>> > +            n->primary_device_dict, &err);
>> > +    if (n->primary_device_opts) {
>> > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
>> > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
>> > +        return;
>> > +    }
>> > +    if (!n->primary_device_dict && err) {
>> > +        if (n->primary_device_timer) {
>> > +            timer_mod(n->primary_device_timer,
>> > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> > +                100);
>>
>> same here.

see above

>>
>>
>> > +        }
>> > +    }
>> > +}
>> > +
>> > +static void virtio_net_handle_migration_primary(VirtIONet *n,
>> > +                                                MigrationState *s)
>> > +{
>> > +    Error *err = NULL;
>> > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
>> > +
>> > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
>> > +            n->primary_device_id);
>> > +    if (!n->primary_dev) {
>> > +        error_setg(&err, "virtio_net: couldn't find primary device");
>>
>> There's something broken with the error handling in this function - the
>> 'err' never goes anywhere - I don't think it ever gets printed or
>> reported or stops the migration.

yes, I'll fix it.

>> > +    }
>> > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
>> > +        qdev_unplug(n->primary_dev, &err);
>>
>> Not knowing unplug well; can you just explain - is that device hard
>> unplugged and it's gone by the time this function returns or is it still
>> hanging around for some indeterminate time?

QEMU will trigger an unplug request via the PCIe attention button, in which case
there could be a delay by the guest operating system. We could give it some
amount of time and, if nothing happens, try surprise removal or otherwise
handle the error.
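
For comparison, when management drives the unplug today, completion is only
known once the guest has reacted, i.e. when DEVICE_DELETED fires. Roughly (ids
as in the earlier command line example, timestamp made up):

    -> { "execute": "device_del", "arguments": { "id": "hostdev0" } }
    <- { "return": {} }
    ... guest handles the attention button press and releases the device ...
    <- { "event": "DEVICE_DELETED",
         "data": { "device": "hostdev0",
                   "path": "/machine/peripheral/hostdev0" },
         "timestamp": { "seconds": 1559224605, "microseconds": 0 } }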


regards,
Jens 



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 14:56     ` Jens Freimann
@ 2019-05-30 17:46       ` Michael S. Tsirkin
  2019-05-30 18:00         ` Dr. David Alan Gilbert
  2019-05-30 19:09       ` Dr. David Alan Gilbert
  2019-05-31 21:47       ` Eduardo Habkost
  2 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-30 17:46 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, Dr. David Alan Gilbert,
	qemu-devel, laine, ailan

On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> Hi David,
> 
> sorry for the  delayed reply.
> 
> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > +
> > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > >  {
> > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > >      } else {
> > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > >      }
> > > > +
> > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > +        if (n->primary_device_timer)
> > > > +            timer_mod(n->primary_device_timer,
> > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > +                4000);
> > > > +    }
> > > 
> > > What's this magic timer constant and why?
> 
> To be honest it's a leftover from previous versions (before I took
> over) of the patches and I'm not sure why the timer is there.
> I removed it and so far see no reason to keep it.
> 
> > > 
> > > >  }
> > > >
> > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > >      n->netclient_type = g_strdup(type);
> > > >  }
> > > >
> > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > +{
> > > > +    VirtIONet *n = opaque;
> > > > +    Error *err = NULL;
> > > > +
> > > > +    if (n->primary_device_dict)
> > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > +            n->primary_device_dict, &err);
> > > > +    if (n->primary_device_opts) {
> > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > +        return;
> > > > +    }
> > > > +    if (!n->primary_device_dict && err) {
> > > > +        if (n->primary_device_timer) {
> > > > +            timer_mod(n->primary_device_timer,
> > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > +                100);
> > > 
> > > same here.
> 
> see above
> 
> > > 
> > > 
> > > > +        }
> > > > +    }
> > > > +}
> > > > +
> > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > +                                                MigrationState *s)
> > > > +{
> > > > +    Error *err = NULL;
> > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > +
> > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > +            n->primary_device_id);
> > > > +    if (!n->primary_dev) {
> > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > 
> > > There's something broken with the error handling in this function - the
> > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > reported or stops the migration.
> 
> yes, I'll fix it.
> 
> > > > +    }
> > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > +        qdev_unplug(n->primary_dev, &err);
> > > 
> > > Not knowing unplug well; can you just explain - is that device hard
> > > unplugged and it's gone by the time this function returns or is it still
> > > hanging around for some indeterminate time?
> 
> Qemu will trigger an unplug request via pcie attention button in which case
> there could be a delay by the guest operating system. We could give it some
> amount of time and if nothing happens try surpise removal or handle the
> error otherwise.
> 
> 
> regards,
> Jens

That's a subject for another day. Let's get the basic thing
working.

-- 
MST



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 17:46       ` Michael S. Tsirkin
@ 2019-05-30 18:00         ` Dr. David Alan Gilbert
  2019-05-30 18:09           ` Michael S. Tsirkin
  2019-05-30 18:17           ` Eduardo Habkost
  0 siblings, 2 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-30 18:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine,
	Jens Freimann, ailan

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > Hi David,
> > 
> > sorry for the  delayed reply.
> > 
> > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > +
> > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > >  {
> > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > >      } else {
> > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > >      }
> > > > > +
> > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > +        if (n->primary_device_timer)
> > > > > +            timer_mod(n->primary_device_timer,
> > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > +                4000);
> > > > > +    }
> > > > 
> > > > What's this magic timer constant and why?
> > 
> > To be honest it's a leftover from previous versions (before I took
> > over) of the patches and I'm not sure why the timer is there.
> > I removed it and so far see no reason to keep it.
> > 
> > > > 
> > > > >  }
> > > > >
> > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > >      n->netclient_type = g_strdup(type);
> > > > >  }
> > > > >
> > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > +{
> > > > > +    VirtIONet *n = opaque;
> > > > > +    Error *err = NULL;
> > > > > +
> > > > > +    if (n->primary_device_dict)
> > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > +            n->primary_device_dict, &err);
> > > > > +    if (n->primary_device_opts) {
> > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > +        return;
> > > > > +    }
> > > > > +    if (!n->primary_device_dict && err) {
> > > > > +        if (n->primary_device_timer) {
> > > > > +            timer_mod(n->primary_device_timer,
> > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > +                100);
> > > > 
> > > > same here.
> > 
> > see above
> > 
> > > > 
> > > > 
> > > > > +        }
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > +                                                MigrationState *s)
> > > > > +{
> > > > > +    Error *err = NULL;
> > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > +
> > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > +            n->primary_device_id);
> > > > > +    if (!n->primary_dev) {
> > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > 
> > > > There's something broken with the error handling in this function - the
> > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > reported or stops the migration.
> > 
> > yes, I'll fix it.
> > 
> > > > > +    }
> > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > 
> > > > Not knowing unplug well; can you just explain - is that device hard
> > > > unplugged and it's gone by the time this function returns or is it still
> > > > hanging around for some indeterminate time?
> > 
> > Qemu will trigger an unplug request via pcie attention button in which case
> > there could be a delay by the guest operating system. We could give it some
> > amount of time and if nothing happens try surprise removal or handle the
> > error otherwise.
> > 
> > 
> > regards,
> > Jens
> 
> That's a subject for another day. Let's get the basic thing
> working.

Well no, we need to know this thing isn't going to hang in the migration
setup phase, or if it does how we recover.  This patch series is very
odd precisely because it's trying to do the unplug itself in the
migration phase rather than let the management layer do it - so unless
it's nailed down how to make sure that's really really bullet proof
then we've got to go back and ask the question about whether we should
really fix it so it can be done by the management layer.

Dave

> -- 
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 18:00         ` Dr. David Alan Gilbert
@ 2019-05-30 18:09           ` Michael S. Tsirkin
  2019-05-30 18:22             ` Eduardo Habkost
                               ` (2 more replies)
  2019-05-30 18:17           ` Eduardo Habkost
  1 sibling, 3 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-30 18:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine,
	Jens Freimann, ailan

On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > Hi David,
> > > 
> > > sorry for the  delayed reply.
> > > 
> > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > +
> > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > >  {
> > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > >      } else {
> > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > >      }
> > > > > > +
> > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > +        if (n->primary_device_timer)
> > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > +                4000);
> > > > > > +    }
> > > > > 
> > > > > What's this magic timer constant and why?
> > > 
> > > To be honest it's a leftover from previous versions (before I took
> > > over) of the patches and I'm not sure why the timer is there.
> > > I removed it and so far see no reason to keep it.
> > > 
> > > > > 
> > > > > >  }
> > > > > >
> > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > >      n->netclient_type = g_strdup(type);
> > > > > >  }
> > > > > >
> > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > +{
> > > > > > +    VirtIONet *n = opaque;
> > > > > > +    Error *err = NULL;
> > > > > > +
> > > > > > +    if (n->primary_device_dict)
> > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > +            n->primary_device_dict, &err);
> > > > > > +    if (n->primary_device_opts) {
> > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > +        return;
> > > > > > +    }
> > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > +        if (n->primary_device_timer) {
> > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > +                100);
> > > > > 
> > > > > same here.
> > > 
> > > see above
> > > 
> > > > > 
> > > > > 
> > > > > > +        }
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > +                                                MigrationState *s)
> > > > > > +{
> > > > > > +    Error *err = NULL;
> > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > +
> > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > +            n->primary_device_id);
> > > > > > +    if (!n->primary_dev) {
> > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > 
> > > > > There's something broken with the error handling in this function - the
> > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > reported or stops the migration.
> > > 
> > > yes, I'll fix it.
> > > 
> > > > > > +    }
> > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > 
> > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > hanging around for some indeterminate time?
> > > 
> > > Qemu will trigger an unplug request via pcie attention button in which case
> > > there could be a delay by the guest operating system. We could give it some
> > > amount of time and if nothing happens try surprise removal or handle the
> > > error otherwise.
> > > 
> > > 
> > > regards,
> > > Jens
> > 
> > That's a subject for another day. Let's get the basic thing
> > working.
> 
> Well no, we need to know this thing isn't going to hang in the migration
> setup phase, or if it does how we recover.


This thing is *supposed* to be stuck in migration startup phase
if guest is malicious.

If migration does not progress management needs
a way to detect this and cancel.

Some more documentation about how this is supposed to happen
would be helpful.
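
To make the detect-and-cancel flow concrete, a minimal QMP-side watchdog
could look like the sketch below. The socket path and the 30-second
deadline are illustrative assumptions, not values from this series;
qmp_capabilities, query-migrate and migrate_cancel are existing QMP
commands.

# Sketch only: poll migration status over QMP, cancel if it never progresses.
import json, socket, time

def qmp(f, cmd):
    f.write(json.dumps({"execute": cmd}) + "\n")
    f.flush()
    while True:
        msg = json.loads(f.readline())
        if "return" in msg or "error" in msg:   # skip async events
            return msg

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect("/tmp/qmp.sock")                   # assumed monitor socket
f = sock.makefile("rw")
json.loads(f.readline())                        # QMP greeting
qmp(f, "qmp_capabilities")

deadline = time.monotonic() + 30                # management policy, assumed
while True:
    status = qmp(f, "query-migrate")["return"].get("status", "none")
    if status in ("completed", "failed", "cancelled"):
        break
    if time.monotonic() > deadline:             # e.g. guest never acks the unplug
        qmp(f, "migrate_cancel")                # give up and roll back
        break
    time.sleep(1)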

>  This patch series is very
> odd precisely because it's trying to do the unplug itself in the
> migration phase rather than let the management layer do it - so unless
> it's nailed down how to make sure that's really really bullet proof
> then we've got to go back and ask the question about whether we should
> really fix it so it can be done by the management layer.
> 
> Dave

management already said they can't because files get closed and
resources freed on unplug and so they might not be able to re-add device
on migration failure. We do it in migration because that is
where failures can happen and we can recover.

> > -- 
> > MST
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-29  7:48           ` Jens Freimann
@ 2019-05-30 18:12             ` Michael S. Tsirkin
  2019-05-31 15:12               ` Jens Freimann
  0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-30 18:12 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan

On Wed, May 29, 2019 at 09:48:02AM +0200, Jens Freimann wrote:
> On Tue, May 28, 2019 at 10:40:42PM -0400, Michael S. Tsirkin wrote:
> > On Tue, May 21, 2019 at 08:49:18PM +0200, Jens Freimann wrote:
> > > On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
> > > > Actually is there a list of devices for which this has been tested
> > > > besides mlx5? I think someone said some old intel cards
> > > > don't support this well, we might need to blacklist these ...
> > > 
> > > So far I've tested mlx5 and XL710 which both worked, but I'm
> > > working on testing with more devices. But of course help with testing
> > > is greatly appreciated.
> > 
> > A testing tool that people can run to get a pass/fail
> > result would be needed for that.
> > Do you have something like this?
> 
> I have two simple tools. One that sends packets and another one that
> sniffs for packets to see which device the packet goes to. Find it at
> https://github.com/jensfr/netfailover_driver_detect
> 
> Feedback and/or patches welcome.
> 
> regards,
> Jens

The docs say:
 ./is_legacy -d . If is_legacy returns 0 it means it has received the packets sent by send_packet. If it returns 1 it didn't receive the packet. Now run ./is_legacy -d 


So -d twice. What is the difference?

-- 
MST



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 18:00         ` Dr. David Alan Gilbert
  2019-05-30 18:09           ` Michael S. Tsirkin
@ 2019-05-30 18:17           ` Eduardo Habkost
  1 sibling, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-30 18:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Daniel P. Berrangé
  Cc: pkrempa, Sameeh Jubran, mdroth, Michael S. Tsirkin, aadam,
	Jason Wang, qemu-devel, armbru, liran.alon, laine,
	Yan Vugenfirer, ogerlitz, Jens Freimann, ailan

On Thu, Dec 06, 2018 at 10:01:46AM +0000, Daniel P. Berrangé wrote:
> Users absolutely *do* care why migration is not finishing. A migration that
> does not finish is a major problem for mgmt apps in many of the use
> cases for migration. Especially important when evacuating VMs from a host
> in order to do a software upgrade or replace faulty hardware. As mentioned
> previously, they will also often serialize migrations to prevent the network
> being overutilized, so a migration that runs indefinitely will stall
> evacuation of additional VMs too.  Predictable execution of migration and
> clear error reporting/handling are critical features. IMHO this is the key
> reason VFIO unplug/plug needs to be done explicitly by the mgmt app, so it
> can be in control over when each part of the process takes place.

On Fri, Apr 05, 2019 at 09:56:29AM +0100, Dr. David Alan Gilbert wrote:
> Why not just let this happen at the libvirt level; then you do the
> hotunplug etc before you actually tell qemu anything about starting a
> migration?

On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> Well no, we need to know this thing isn't going to hang in the migration
> setup phase, or if it does how we recover.  This patch series is very
> odd precisely because it's trying to do the unplug itself in the
> migration phase rather than let the management layer do it - so unless
> it's nailed down how to make sure that's really really bullet proof
> then we've got to go back and ask the question about whether we should
> really fix it so it can be done by the management layer.
> 

I have the impression we are running in circles here.

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 18:09           ` Michael S. Tsirkin
@ 2019-05-30 18:22             ` Eduardo Habkost
  2019-05-30 23:06               ` Michael S. Tsirkin
  2019-05-30 19:08             ` Dr. David Alan Gilbert
  2019-06-05 15:23             ` Daniel P. Berrangé
  2 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-30 18:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	laine, Jens Freimann, ailan

On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > Hi David,
> > > > 
> > > > sorry for the  delayed reply.
> > > > 
> > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > +
> > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > >  {
> > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > >      } else {
> > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > >      }
> > > > > > > +
> > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > +        if (n->primary_device_timer)
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                4000);
> > > > > > > +    }
> > > > > > 
> > > > > > What's this magic timer constant and why?
> > > > 
> > > > To be honest it's a leftover from previous versions (before I took
> > > > over) of the patches and I'm not sure why the timer is there.
> > > > I removed it and so far see no reason to keep it.
> > > > 
> > > > > > 
> > > > > > >  }
> > > > > > >
> > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > +{
> > > > > > > +    VirtIONet *n = opaque;
> > > > > > > +    Error *err = NULL;
> > > > > > > +
> > > > > > > +    if (n->primary_device_dict)
> > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > +            n->primary_device_dict, &err);
> > > > > > > +    if (n->primary_device_opts) {
> > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > +        if (n->primary_device_timer) {
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                100);
> > > > > > 
> > > > > > same here.
> > > > 
> > > > see above
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > +                                                MigrationState *s)
> > > > > > > +{
> > > > > > > +    Error *err = NULL;
> > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > +
> > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > +            n->primary_device_id);
> > > > > > > +    if (!n->primary_dev) {
> > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > 
> > > > > > There's something broken with the error handling in this function - the
> > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > reported or stops the migration.
> > > > 
> > > > yes, I'll fix it.
> > > > 
> > > > > > > +    }
> > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > 
> > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > hanging around for some indeterminate time?
> > > > 
> > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > there could be a delay by the guest operating system. We could give it some
> > > > amount of time and if nothing happens try surprise removal or handle the
> > > > error otherwise.
> > > > 
> > > > 
> > > > regards,
> > > > Jens
> > > 
> > > That's a subject for another day. Let's get the basic thing
> > > working.
> > 
> > Well no, we need to know this thing isn't going to hang in the migration
> > setup phase, or if it does how we recover.
> 
> 
> This thing is *supposed* to be stuck in migration startup phase
> if guest is malicious.
> 
> If migration does not progress management needs
> a way to detect this and cancel.
> 
> Some more documentation about how this is supposed to happen
> would be helpful.

Do we have confirmation from libvirt developers that this would
be a reasonable API for them?


> >  This patch series is very
> > odd precisely because it's trying to do the unplug itself in the
> > migration phase rather than let the management layer do it - so unless
> > it's nailed down how to make sure that's really really bullet proof
> > then we've got to go back and ask the question about whether we should
> > really fix it so it can be done by the management layer.
> > 
> > Dave
> 
> management already said they can't because files get closed and
> resources freed on unplug and so they might not be able to re-add device
> on migration failure. We do it in migration because that is
> where failures can happen and we can recover.

We are capable of providing an API to libvirt where files won't
get closed when a device is unplugged, if necessary.

This might become necessary if libvirt or management software
developers tell us the interface we are providing is not going to
work for them.

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 18:09           ` Michael S. Tsirkin
  2019-05-30 18:22             ` Eduardo Habkost
@ 2019-05-30 19:08             ` Dr. David Alan Gilbert
  2019-05-30 19:21               ` Michael S. Tsirkin
  2019-06-05 15:23             ` Daniel P. Berrangé
  2 siblings, 1 reply; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-30 19:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine,
	Jens Freimann, ailan

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > Hi David,
> > > > 
> > > > sorry for the  delayed reply.
> > > > 
> > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > +
> > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > >  {
> > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > >      } else {
> > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > >      }
> > > > > > > +
> > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > +        if (n->primary_device_timer)
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                4000);
> > > > > > > +    }
> > > > > > 
> > > > > > What's this magic timer constant and why?
> > > > 
> > > > To be honest it's a leftover from previous versions (before I took
> > > > over) of the patches and I'm not sure why the timer is there.
> > > > I removed it and so far see no reason to keep it.
> > > > 
> > > > > > 
> > > > > > >  }
> > > > > > >
> > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > +{
> > > > > > > +    VirtIONet *n = opaque;
> > > > > > > +    Error *err = NULL;
> > > > > > > +
> > > > > > > +    if (n->primary_device_dict)
> > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > +            n->primary_device_dict, &err);
> > > > > > > +    if (n->primary_device_opts) {
> > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > +        if (n->primary_device_timer) {
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                100);
> > > > > > 
> > > > > > same here.
> > > > 
> > > > see above
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > +                                                MigrationState *s)
> > > > > > > +{
> > > > > > > +    Error *err = NULL;
> > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > +
> > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > +            n->primary_device_id);
> > > > > > > +    if (!n->primary_dev) {
> > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > 
> > > > > > There's something broken with the error handling in this function - the
> > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > reported or stops the migration.
> > > > 
> > > > yes, I'll fix it.
> > > > 
> > > > > > > +    }
> > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > 
> > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > hanging around for some indeterminate time?
> > > > 
> > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > there could be a delay by the guest operating system. We could give it some
> > > > > amount of time and if nothing happens try surprise removal or handle the
> > > > error otherwise.
> > > > 
> > > > 
> > > > regards,
> > > > Jens
> > > 
> > > That's a subject for another day. Let's get the basic thing
> > > working.
> > 
> > Well no, we need to know this thing isn't going to hang in the migration
> > setup phase, or if it does how we recover.
> 
> 
> This thing is *supposed* to be stuck in migration startup phase
> if guest is malicious.
> 
> If migration does not progress management needs
> a way to detect this and cancel.
> 
> Some more documentation about how this is supposed to happen
> would be helpful.

I want to see that first, because I want to be convinced it's just a
documentation problem and that we actually really have a method of
recovering.

> >  This patch series is very
> > odd precisely because it's trying to do the unplug itself in the
> > migration phase rather than let the management layer do it - so unless
> > it's nailed down how to make sure that's really really bullet proof
> > then we've got to go back and ask the question about whether we should
> > really fix it so it can be done by the management layer.
> > 
> > Dave
> 
> management already said they can't because files get closed and
> resources freed on unplug and so they might not be able to re-add device
> on migration failure. We do it in migration because that is
> where failures can happen and we can recover.

I find this explanation confusing - I can kind of see where it's coming
from, but we've got a pretty clear separation between a NIC and the
netdev that backs it; those files and resources should be associated
with the netdev and not the NIC.  So does hot-removing the NIC really
clean up the netdev?  (I guess maybe this is a difference in vfio
which is the problem)

Dave

> > > -- 
> > > MST
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 14:56     ` Jens Freimann
  2019-05-30 17:46       ` Michael S. Tsirkin
@ 2019-05-30 19:09       ` Dr. David Alan Gilbert
  2019-05-31 21:47       ` Eduardo Habkost
  2 siblings, 0 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-30 19:09 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, ehabkost, mst, aadam, qemu-devel, laine, ailan

* Jens Freimann (jfreimann@redhat.com) wrote:
> Hi David,
> 
> sorry for the  delayed reply.
> 
> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > +
> > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > >  {
> > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > >      } else {
> > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > >      }
> > > > +
> > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > +        if (n->primary_device_timer)
> > > > +            timer_mod(n->primary_device_timer,
> > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > +                4000);
> > > > +    }
> > > 
> > > What's this magic timer constant and why?
> 
> To be honest it's a leftover from previous versions (before I took
> over) of the patches and I'm not sure why the timer is there.
> I removed it and so far see no reason to keep it.
> 
> > > 
> > > >  }
> > > >
> > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > >      n->netclient_type = g_strdup(type);
> > > >  }
> > > >
> > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > +{
> > > > +    VirtIONet *n = opaque;
> > > > +    Error *err = NULL;
> > > > +
> > > > +    if (n->primary_device_dict)
> > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > +            n->primary_device_dict, &err);
> > > > +    if (n->primary_device_opts) {
> > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > +        return;
> > > > +    }
> > > > +    if (!n->primary_device_dict && err) {
> > > > +        if (n->primary_device_timer) {
> > > > +            timer_mod(n->primary_device_timer,
> > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > +                100);
> > > 
> > > same here.
> 
> see above
> 
> > > 
> > > 
> > > > +        }
> > > > +    }
> > > > +}
> > > > +
> > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > +                                                MigrationState *s)
> > > > +{
> > > > +    Error *err = NULL;
> > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > +
> > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > +            n->primary_device_id);
> > > > +    if (!n->primary_dev) {
> > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > 
> > > There's something broken with the error handling in this function - the
> > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > reported or stops the migration.
> 
> yes, I'll fix it.
> 
> > > > +    }
> > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > +        qdev_unplug(n->primary_dev, &err);
> > > 
> > > Not knowing unplug well; can you just explain - is that device hard
> > > unplugged and it's gone by the time this function returns or is it still
> > > hanging around for some indeterminate time?
> 
> Qemu will trigger an unplug request via pcie attention button in which case
> there could be a delay by the guest operating system. We could give it some
> amount of time and if nothing happens try surprise removal or handle the
> error otherwise.

OK, can you show how one of those is going to work and try it out?
This setup is weird enough that I just want to make sure it works.

Dave

> 
> regards,
> Jens
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
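
For reference, this is roughly what the timeout-guarded variant would look
like when driven from a QMP client: device_del requests the unplug (the
attention-button path Jens describes), and DEVICE_DELETED is only emitted
once the guest has actually released the device. A sketch only; the socket
path, the device id "hostdev0" and the 10-second wait are assumptions.

import json, socket

def send(f, execute, arguments=None):
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    f.write(json.dumps(cmd) + "\n")
    f.flush()

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect("/tmp/qmp.sock")             # assumed monitor socket
sock.settimeout(10)                       # how long we give the guest
f = sock.makefile("rw")
json.loads(f.readline())                  # QMP greeting
send(f, "qmp_capabilities"); f.readline()
send(f, "device_del", {"id": "hostdev0"}) # assumed id of the primary device

try:
    while True:
        msg = json.loads(f.readline())    # skip returns and other events
        if msg.get("event") == "DEVICE_DELETED" and \
           msg.get("data", {}).get("device") == "hostdev0":
            print("guest released the device")
            break
except socket.timeout:
    # The guest never completed the unplug; this is exactly the point
    # where surprise removal or aborting the migration has to kick in.
    print("unplug timed out")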



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 19:08             ` Dr. David Alan Gilbert
@ 2019-05-30 19:21               ` Michael S. Tsirkin
  2019-05-31  8:23                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-30 19:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine,
	Jens Freimann, ailan

On Thu, May 30, 2019 at 08:08:23PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > Hi David,
> > > > > 
> > > > > sorry for the  delayed reply.
> > > > > 
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > > +
> > > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > >  {
> > > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > >      } else {
> > > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > >      }
> > > > > > > > +
> > > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > > +        if (n->primary_device_timer)
> > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > +                4000);
> > > > > > > > +    }
> > > > > > > 
> > > > > > > What's this magic timer constant and why?
> > > > > 
> > > > > To be honest it's a leftover from previous versions (before I took
> > > > > over) of the patches and I'm not sure why the timer is there.
> > > > > I removed it and so far see no reason to keep it.
> > > > > 
> > > > > > > 
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > > +{
> > > > > > > > +    VirtIONet *n = opaque;
> > > > > > > > +    Error *err = NULL;
> > > > > > > > +
> > > > > > > > +    if (n->primary_device_dict)
> > > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > > +            n->primary_device_dict, &err);
> > > > > > > > +    if (n->primary_device_opts) {
> > > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > > +        if (n->primary_device_timer) {
> > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > +                100);
> > > > > > > 
> > > > > > > same here.
> > > > > 
> > > > > see above
> > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > +        }
> > > > > > > > +    }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > > +                                                MigrationState *s)
> > > > > > > > +{
> > > > > > > > +    Error *err = NULL;
> > > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > > +
> > > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > > +            n->primary_device_id);
> > > > > > > > +    if (!n->primary_dev) {
> > > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > > 
> > > > > > > There's something broken with the error handling in this function - the
> > > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > > reported or stops the migration.
> > > > > 
> > > > > yes, I'll fix it.
> > > > > 
> > > > > > > > +    }
> > > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > > 
> > > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > > hanging around for some indeterminate time?
> > > > > 
> > > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > > there could be a delay by the guest operating system. We could give it some
> > > > > > amount of time and if nothing happens try surprise removal or handle the
> > > > > error otherwise.
> > > > > 
> > > > > 
> > > > > regards,
> > > > > Jens
> > > > 
> > > > That's a subject for another day. Let's get the basic thing
> > > > working.
> > > 
> > > Well no, we need to know this thing isn't going to hang in the migration
> > > setup phase, or if it does how we recover.
> > 
> > 
> > This thing is *supposed* to be stuck in migration startup phase
> > if guest is malicious.
> > 
> > If migration does not progress management needs
> > a way to detect this and cancel.
> > 
> > Some more documentation about how this is supposed to happen
> > would be helpful.
> 
> I want to see that first, because I want to be convinced it's just a
> documentation problem and that we actually really have a method of
> recovering.
> 
> > >  This patch series is very
> > > odd precisely because it's trying to do the unplug itself in the
> > > migration phase rather than let the management layer do it - so unless
> > > it's nailed down how to make sure that's really really bullet proof
> > > then we've got to go back and ask the question about whether we should
> > > really fix it so it can be done by the management layer.
> > > 
> > > Dave
> > 
> > management already said they can't because files get closed and
> > resources freed on unplug and so they might not be able to re-add device
> > on migration failure. We do it in migration because that is
> > where failures can happen and we can recover.
> 
> I find this explanation confusing - I can kind of see where it's coming
> from, but we've got a pretty clear separation between a NIC and the
> netdev that backs it; those files and resources should be associated
> with the netdev and not the NIC.  So does hot-removing the NIC really
> clean up the netdev?  (I guess maybe this is a difference in vfio
> which is the problem)
> 
> Dave

what we are removing is the VFIO device.
Nothing to do with nic or netdev.

> > > > -- 
> > > > MST
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 18:22             ` Eduardo Habkost
@ 2019-05-30 23:06               ` Michael S. Tsirkin
  2019-05-31 17:01                 ` Eduardo Habkost
  0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-30 23:06 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	laine, Jens Freimann, ailan

On Thu, May 30, 2019 at 03:22:10PM -0300, Eduardo Habkost wrote:
> On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > Hi David,
> > > > > 
> > > > > sorry for the  delayed reply.
> > > > > 
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > > +
> > > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > >  {
> > > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > >      } else {
> > > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > >      }
> > > > > > > > +
> > > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > > +        if (n->primary_device_timer)
> > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > +                4000);
> > > > > > > > +    }
> > > > > > > 
> > > > > > > What's this magic timer constant and why?
> > > > > 
> > > > > To be honest it's a leftover from previous versions (before I took
> > > > > over) of the patches and I'm not sure why the timer is there.
> > > > > I removed it and so far see no reason to keep it.
> > > > > 
> > > > > > > 
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > > +{
> > > > > > > > +    VirtIONet *n = opaque;
> > > > > > > > +    Error *err = NULL;
> > > > > > > > +
> > > > > > > > +    if (n->primary_device_dict)
> > > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > > +            n->primary_device_dict, &err);
> > > > > > > > +    if (n->primary_device_opts) {
> > > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > > +        return;
> > > > > > > > +    }
> > > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > > +        if (n->primary_device_timer) {
> > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > +                100);
> > > > > > > 
> > > > > > > same here.
> > > > > 
> > > > > see above
> > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > +        }
> > > > > > > > +    }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > > +                                                MigrationState *s)
> > > > > > > > +{
> > > > > > > > +    Error *err = NULL;
> > > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > > +
> > > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > > +            n->primary_device_id);
> > > > > > > > +    if (!n->primary_dev) {
> > > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > > 
> > > > > > > There's something broken with the error handling in this function - the
> > > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > > reported or stops the migration.
> > > > > 
> > > > > yes, I'll fix it.
> > > > > 
> > > > > > > > +    }
> > > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > > 
> > > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > > hanging around for some indeterminate time?
> > > > > 
> > > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > > there could be a delay by the guest operating system. We could give it some
> > > > > > amount of time and if nothing happens try surprise removal or handle the
> > > > > error otherwise.
> > > > > 
> > > > > 
> > > > > regards,
> > > > > Jens
> > > > 
> > > > That's a subject for another day. Let's get the basic thing
> > > > working.
> > > 
> > > Well no, we need to know this thing isn't going to hang in the migration
> > > setup phase, or if it does how we recover.
> > 
> > 
> > This thing is *supposed* to be stuck in migration startup phase
> > if guest is malicious.
> > 
> > If migration does not progress management needs
> > a way to detect this and cancel.
> > 
> > Some more documentation about how this is supposed to happen
> > would be helpful.
> 
> Do we have confirmation from libvirt developers that this would
> be a reasonable API for them?
> 
> 
> > >  This patch series is very
> > > odd precisely because it's trying to do the unplug itself in the
> > > migration phase rather than let the management layer do it - so unless
> > > it's nailed down how to make sure that's really really bullet proof
> > > then we've got to go back and ask the question about whether we should
> > > really fix it so it can be done by the management layer.
> > > 
> > > Dave
> > 
> > management already said they can't because files get closed and
> > resources freed on unplug and so they might not be able to re-add device
> > on migration failure. We do it in migration because that is
> > where failures can happen and we can recover.
> 
> We are capable of providing an API to libvirt where files won't
> get closed when a device is unplugged, if necessary.
> 
> This might become necessary if libvirt or management software
> developers tell us the interface we are providing is not going to
> work for them.
> 
> -- 
> Eduardo

Yes. It's just lots of extremely low level interfaces
and all rather pointless.

And down the road extensions like surprise removal support will make it
all cleaner and more transparent. Floating things up to libvirt means
all these low level details will require more and more hacks.

-- 
MST



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 19:21               ` Michael S. Tsirkin
@ 2019-05-31  8:23                 ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-31  8:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, laine,
	Jens Freimann, ailan

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Thu, May 30, 2019 at 08:08:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > Hi David,
> > > > > > 
> > > > > > sorry for the  delayed reply.
> > > > > > 
> > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > > > +
> > > > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > > >  {
> > > > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > > >      } else {
> > > > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > > >      }
> > > > > > > > > +
> > > > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > > > +        if (n->primary_device_timer)
> > > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > > +                4000);
> > > > > > > > > +    }
> > > > > > > > 
> > > > > > > > What's this magic timer constant and why?
> > > > > > 
> > > > > > To be honest it's a leftover from previous versions (before I took
> > > > > > over) of the patches and I'm not sure why the timer is there.
> > > > > > I removed it and so far see no reason to keep it.
> > > > > > 
> > > > > > > > 
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > > > +{
> > > > > > > > > +    VirtIONet *n = opaque;
> > > > > > > > > +    Error *err = NULL;
> > > > > > > > > +
> > > > > > > > > +    if (n->primary_device_dict)
> > > > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > > > +            n->primary_device_dict, &err);
> > > > > > > > > +    if (n->primary_device_opts) {
> > > > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > > > +        if (n->primary_device_timer) {
> > > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > > +                100);
> > > > > > > > 
> > > > > > > > same here.
> > > > > > 
> > > > > > see above
> > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > +        }
> > > > > > > > > +    }
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > > > +                                                MigrationState *s)
> > > > > > > > > +{
> > > > > > > > > +    Error *err = NULL;
> > > > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > > > +
> > > > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > > > +            n->primary_device_id);
> > > > > > > > > +    if (!n->primary_dev) {
> > > > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > > > 
> > > > > > > > There's something broken with the error handling in this function - the
> > > > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > > > reported or stops the migration.
> > > > > > 
> > > > > > yes, I'll fix it.
> > > > > > 
> > > > > > > > > +    }
> > > > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > > > 
> > > > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > > > hanging around for some indeterminate time?
> > > > > > 
> > > > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > > > there could be a delay by the guest operating system. We could give it some
> > > > > > amount of time and if nothing happens try surprise removal or handle the
> > > > > > error otherwise.
> > > > > > 
> > > > > > 
> > > > > > regards,
> > > > > > Jens
> > > > > 
> > > > > That's a subject for another day. Let's get the basic thing
> > > > > working.
> > > > 
> > > > Well no, we need to know this thing isn't going to hang in the migration
> > > > setup phase, or if it does how we recover.
> > > 
> > > 
> > > This thing is *supposed* to be stuck in migration startup phase
> > > if guest is malicious.
> > > 
> > > If migration does not progress management needs
> > > a way to detect this and cancel.
> > > 
> > > Some more documentation about how this is supposed to happen
> > > would be helpful.
> > 
> > I want to see that first, because I want to be convinced it's just a
> > documentation problem and that we actually really have a method of
> > recovering.
> > 
> > > >  This patch series is very
> > > > odd precisely because it's trying to do the unplug itself in the
> > > > migration phase rather than let the management layer do it - so unless
> > > > it's nailed down how to make sure that's really really bullet proof
> > > > then we've got to go back and ask the question about whether we should
> > > > really fix it so it can be done by the management layer.
> > > > 
> > > > Dave
> > > 
> > > management already said they can't because files get closed and
> > > resources freed on unplug and so they might not be able to re-add device
> > > on migration failure. We do it in migration because that is
> > > where failures can happen and we can recover.
> > 
> > I find this explanation confusing - I can kind of see where it's coming
> > from, but we've got a pretty clear separation between a NIC and the
> > netdev that backs it; those files and resources should be associated
> > with the netdev and not the NIC.  So does hot-removing the NIC really
> > clean up the netdev?  (I guess maybe this is a difference in vfio
> > which is the problem)
> > 
> > Dave
> 
> what we are removing is the VFIO device.
> Nothing to do with nic or netdev.

OK, but at the same time why can't we hold open the VFIO device's
resources in a comparable way - i.e. don't really let qemu let go of
it even when the guest has unplugged it?

Dave

> > > > > -- 
> > > > > MST
> > > > --
> > > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-30 18:12             ` Michael S. Tsirkin
@ 2019-05-31 15:12               ` Jens Freimann
  0 siblings, 0 replies; 77+ messages in thread
From: Jens Freimann @ 2019-05-31 15:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	laine, ailan

On Thu, May 30, 2019 at 02:12:21PM -0400, Michael S. Tsirkin wrote:
>On Wed, May 29, 2019 at 09:48:02AM +0200, Jens Freimann wrote:
>> On Tue, May 28, 2019 at 10:40:42PM -0400, Michael S. Tsirkin wrote:
>> > On Tue, May 21, 2019 at 08:49:18PM +0200, Jens Freimann wrote:
>> > > On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
>> > > > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
>> > > > Actually is there a list of devices for which this has been tested
>> > > > besides mlx5? I think someone said some old intel cards
>> > > > don't support this well, we might need to blacklist these ...
>> > >
>> > > So far I've tested mlx5 and XL710 which both worked, but I'm
>> > > working on testing with more devices. But of course help with testing
>> > > is greatly appreciated.
>> >
>> > A testing tool that people can run to get a pass/fail
>> > result would be needed for that.
>> > Do you have something like this?
>>
>> I have two simple tools. One that sends packets and another one that
>> sniffs for packets to see which device the packet goes to. Find it at
>> https://github.com/jensfr/netfailover_driver_detect
>>
>> Feedback and/or patches welcome.
>>
>> regards,
>> Jens
>
>The docs say:
> ./is_legacy -d . If is_legacy returns 0 it means it has received the packets sent by send_packet. If it returns 1 it didn't receive the packet. Now run ./is_legacy -d
>
>So -d twice. What is the difference?

Should say "Now run ./is_legacy -d <vf device>" to sniff on the vf device.
I'll fix the README, thanks!

regards,
Jens 
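
To give an idea of what the sniffing side of such a check involves, a
minimal interface-bound listener could look like the sketch below. It is
an illustration only, not the code from the repository above; the
interface argument and the 5-second window are assumptions, the exit codes
simply mirror the README's 0/1 convention, and AF_PACKET needs root.

import socket, sys

ETH_P_ALL = 0x0003                      # all protocols, Linux-specific
ifname = sys.argv[1]                    # e.g. the VF netdev to sniff on

s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
s.bind((ifname, 0))                     # listen on this interface only
s.settimeout(5)

try:
    frame, _ = s.recvfrom(65535)
    print(f"{ifname}: got a frame ({len(frame)} bytes), traffic lands here")
    sys.exit(0)                         # packets seen on this device
except socket.timeout:
    print(f"{ifname}: no traffic seen")
    sys.exit(1)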



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 23:06               ` Michael S. Tsirkin
@ 2019-05-31 17:01                 ` Eduardo Habkost
  2019-05-31 18:04                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-31 17:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	laine, Jens Freimann, ailan

On Thu, May 30, 2019 at 07:06:29PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 30, 2019 at 03:22:10PM -0300, Eduardo Habkost wrote:
> > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > Hi David,
> > > > > > 
> > > > > > sorry for the  delayed reply.
> > > > > > 
> > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > > > +
> > > > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > > >  {
> > > > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > > >      } else {
> > > > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > > >      }
> > > > > > > > > +
> > > > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > > > +        if (n->primary_device_timer)
> > > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > > +                4000);
> > > > > > > > > +    }
> > > > > > > > 
> > > > > > > > What's this magic timer constant and why?
> > > > > > 
> > > > > > To be honest it's a leftover from previous versions (before I took
> > > > > > over) of the patches and I'm not sure why the timer is there.
> > > > > > I removed it and so far see no reason to keep it.
> > > > > > 
> > > > > > > > 
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > > > +{
> > > > > > > > > +    VirtIONet *n = opaque;
> > > > > > > > > +    Error *err = NULL;
> > > > > > > > > +
> > > > > > > > > +    if (n->primary_device_dict)
> > > > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > > > +            n->primary_device_dict, &err);
> > > > > > > > > +    if (n->primary_device_opts) {
> > > > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > > > +        if (n->primary_device_timer) {
> > > > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > > > +                100);
> > > > > > > > 
> > > > > > > > same here.
> > > > > > 
> > > > > > see above
> > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > +        }
> > > > > > > > > +    }
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > > > +                                                MigrationState *s)
> > > > > > > > > +{
> > > > > > > > > +    Error *err = NULL;
> > > > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > > > +
> > > > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > > > +            n->primary_device_id);
> > > > > > > > > +    if (!n->primary_dev) {
> > > > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > > > 
> > > > > > > > There's something broken with the error handling in this function - the
> > > > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > > > reported or stops the migration.
> > > > > > 
> > > > > > yes, I'll fix it.
> > > > > > 
> > > > > > > > > +    }
> > > > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > > > 
> > > > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > > > hanging around for some indeterminate time?
> > > > > > 
> > > > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > > > there could be a delay by the guest operating system. We could give it some
> > > > > > amount of time and if nothing happens try surpise removal or handle the
> > > > > > error otherwise.
> > > > > > 
> > > > > > 
> > > > > > regards,
> > > > > > Jens
> > > > > 
> > > > > That's a subject for another day. Let's get the basic thing
> > > > > working.
> > > > 
> > > > Well no, we need to know this thing isn't going to hang in the migration
> > > > setup phase, or if it does how we recover.
> > > 
> > > 
> > > This thing is *supposed* to be stuck in migration startup phase
> > > if guest is malicious.
> > > 
> > > If migration does not progress management needs
> > > a way to detect this and cancel.
> > > 
> > > Some more documentation about how this is supposed to happen
> > > would be helpful.
> > 
> > Do we have confirmation from libvirt developers that this would
> > be a reasonable API for them?
> > 
> > 
> > > >  This patch series is very
> > > > odd precisely because it's trying to do the unplug itself in the
> > > > migration phase rather than let the management layer do it - so unless
> > > > it's nailed down how to make sure that's really really bullet proof
> > > > then we've got to go back and ask the question about whether we should
> > > > really fix it so it can be done by the management layer.
> > > > 
> > > > Dave
> > > 
> > > management already said they can't because files get closed and
> > > resources freed on unplug and so they might not be able to re-add device
> > > on migration failure. We do it in migration because that is
> > > where failures can happen and we can recover.
> > 
> > We are capable of providing an API to libvirt where files won't
> > get closed when a device is unplugged, if necessary.
> > 
> > This might become necessary if libvirt or management software
> > developers tell us the interface we are providing is not going to
> > work for them.
> > 
> > -- 
> > Eduardo
> 
> Yes. It's just lots of extremely low level interfaces
> and all rather pointless.
> 
> And down the road extensions like surprise removal support will make it
> all cleaner and more transparent. Floating things up to libvirt means
> all these low level details will require more and more hacks.

Why do you call it pointless?  If we want this to work before
surprise removal is implemented, we need to provide an API that
works for management software.  Don't we want to make this work
without surprise removal too?

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 17:01                 ` Eduardo Habkost
@ 2019-05-31 18:04                   ` Michael S. Tsirkin
  2019-05-31 18:42                     ` Eduardo Habkost
  2019-05-31 18:45                     ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-31 18:04 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	laine, Jens Freimann, ailan

On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:
> > Yes. It's just lots of extremely low level interfaces
> > and all rather pointless.
> > 
> > And down the road extensions like surprise removal support will make it
> > all cleaner and more transparent. Floating things up to libvirt means
> > all these low level details will require more and more hacks.
> 
> Why do you call it pointless?

We'd need APIs to manipulate device visibility to guest, hotplug
controller state and separately manipulate the resources allocated. This
is low level stuff that users really have no idea what to do about.
Exposing such a level of detail to management is imho pointless.
We are better off with a high level API, see below.

> If we want this to work before
> surprise removal is implemented, we need to provide an API that
> works for management software.
>  Don't we want to make this work
> without surprise removal too?

This patchset adds optional, off-by-default support for
migrating guests with an assigned network device.
If enabled, this requires the guest to allow migration.

Of course this can be viewed as a security problem, since it allows the guest
to block migration. We can't detect a malicious guest reliably imho.
What we can do is report to management when the guest allows migration.
Policy, such as what to do when this does not happen for a while and
what timeout to set, would be up to management.

The API in question would be a high level one, something
along the lines of a single "guest allowed migration" event.
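
As a rough illustration of that kind of high-level API: a QAPI event could be
emitted from virtio-net at the moment the guest acks VIRTIO_NET_F_STANDBY.
The event name and the generated qapi_event_send_*() helper in the sketch
below are assumptions for illustration only; this series does not define
such an event:

    /* Hedged sketch, not part of this series: report "guest allowed
     * migration" to management as a QAPI event.  The
     * qapi_event_send_failover_standby_negotiated() helper is hypothetical
     * and would be generated from a corresponding QAPI schema entry. */
    static void virtio_net_report_standby_negotiated(VirtIONet *n)
    {
        DeviceState *dev = DEVICE(n);

        /* hypothetical generated QAPI helper, keyed by the device id so
         * management can match the event to the standby virtio-net device */
        qapi_event_send_failover_standby_negotiated(dev->id);
    }

Management would then treat the absence of this event within a timeout of its
own choosing as "guest never allowed migration" and apply whatever policy it
prefers (cancel, retry, or migrate without the assigned device).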


-- 
MST



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 18:04                   ` Michael S. Tsirkin
@ 2019-05-31 18:42                     ` Eduardo Habkost
  2019-05-31 18:45                     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-31 18:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	laine, Jens Freimann, ailan

On Fri, May 31, 2019 at 02:04:49PM -0400, Michael S. Tsirkin wrote:
> On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:
> > > Yes. It's just lots of extremely low level interfaces
> > > and all rather pointless.
> > > 
> > > And down the road extensions like surprise removal support will make it
> > > all cleaner and more transparent. Floating things up to libvirt means
> > > all these low level details will require more and more hacks.
> > 
> > Why do you call it pointless?
> 
> We'd need APIs to manipulate device visibility to guest, hotplug
> controller state and separately manipulate the resources allocated. This
> is low level stuff that users really have no idea what to do about.
> Exposing such a level of detail to management is imho pointless.
> We are better off with a high level API, see below.

I don't disagree it's low level.  I just disagree it's pointless.
The goal here is to provide an API that management software can
use.

> 
> > If we want this to work before
> > surprise removal is implemented, we need to provide an API that
> > works for management software.
> >  Don't we want to make this work
> > without surprise removal too?
> 
> This patchset adds an optional, off by default support for
> migrating guests with an assigned network device.
> If enabled this requires guest to allow migration.
> 
> Of course this can be viewed as a security problem since it allows guest
> to block migration. We can't detect a malicious guest reliably imho.
> What we can do is report to management when guest allows migration.
> Policy such what to do when this does not happen for a while and
> what timeout to set would be up to management.
> 
> The API in question would be a high level one, something
> along the lines of a single "guest allowed migration" event.

If you want to hide the low level details behind a higher level
API, that's OK.  I just want to be sure we have really listened
to management software developers to confirm the API we're
providing will work for them.

This will probably require documenting the new interface in more
detail (as you have already mentioned in this thread).

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 18:04                   ` Michael S. Tsirkin
  2019-05-31 18:42                     ` Eduardo Habkost
@ 2019-05-31 18:45                     ` Dr. David Alan Gilbert
  2019-05-31 20:29                       ` Alex Williamson
  2019-05-31 20:43                       ` Michael S. Tsirkin
  1 sibling, 2 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-05-31 18:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel, laine,
	jdenemar, Jens Freimann, ailan

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:
> > > Yes. It's just lots of extremely low level interfaces
> > > and all rather pointless.
> > > 
> > > And down the road extensions like surprise removal support will make it
> > > all cleaner and more transparent. Floating things up to libvirt means
> > > all these low level details will require more and more hacks.
> > 
> > Why do you call it pointless?
> 
> We'd need APIs to manipulate device visibility to guest, hotplug
> controller state and separately manipulate the resources allocated. This
> is low level stuff that users really have no idea what to do about.
> Exposing such a level of detail to management is imho pointless.
> We are better off with a high level API, see below.

so I don't know much about vfio; but it strikes me that
you wouldn't need that low-level detail if we just reworked vfio
to look more like all our other devices; something like:

  -vfiodev  host=02:00.0,id=gpu
  -device vfio-pci,dev=gpu

The 'vfiodev' would own the resources; so to do this trick, the
management layer would:
   hotunplug the vfio-pci
   migrate

if anything went wrong it would
   hotplug the vfio-pci back in

you wouldn't have free'd up any resources because they belonged
to the vfiodev.
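
A very rough sketch of what such a backend object could look like, purely as
illustration: a user-creatable QOM object that opens and owns the host VFIO
resources, with vfio-pci reduced to a frontend that references it.  Every
type, field and function name below is an assumption; nothing like this
exists in QEMU today:

    /* Hypothetical "vfiodev" backend: holds the host side (group/device
     * file descriptors) so that hot-unplugging the vfio-pci frontend does
     * not release them.  Sketch only, under the assumptions stated above. */
    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "qom/object.h"
    #include "qom/object_interfaces.h"

    #define TYPE_VFIO_DEV_BACKEND "vfiodev"

    typedef struct VFIODevBackend {
        Object parent_obj;
        char *host;       /* e.g. "02:00.0", set as an object property */
        int group_fd;     /* kept open across frontend unplug/replug */
        int device_fd;
    } VFIODevBackend;

    static void vfio_dev_backend_complete(UserCreatable *uc, Error **errp)
    {
        /* open /dev/vfio/<group> and the device fd here; they stay open
         * for the lifetime of the object, so a failed migration can
         * re-plug the vfio-pci frontend without re-acquiring anything */
    }

    static void vfio_dev_backend_class_init(ObjectClass *oc, void *data)
    {
        UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);

        ucc->complete = vfio_dev_backend_complete;
    }

    static const TypeInfo vfio_dev_backend_info = {
        .name = TYPE_VFIO_DEV_BACKEND,
        .parent = TYPE_OBJECT,
        .instance_size = sizeof(VFIODevBackend),
        .class_init = vfio_dev_backend_class_init,
        .interfaces = (InterfaceInfo[]) {
            { TYPE_USER_CREATABLE },
            { }
        },
    };

    static void vfio_dev_backend_register_types(void)
    {
        type_register_static(&vfio_dev_backend_info);
    }
    type_init(vfio_dev_backend_register_types);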


> > If we want this to work before
> > surprise removal is implemented, we need to provide an API that
> > works for management software.
> >  Don't we want to make this work
> > without surprise removal too?
> 
> This patchset adds an optional, off by default support for
> migrating guests with an assigned network device.
> If enabled this requires guest to allow migration.
> 
> Of course this can be viewed as a security problem since it allows guest
> to block migration. We can't detect a malicious guest reliably imho.
> What we can do is report to management when guest allows migration.
> Policy such what to do when this does not happen for a while and
> what timeout to set would be up to management.
> 
> The API in question would be a high level one, something
> along the lines of a single "guest allowed migration" event.

These are all fairly normal problems with hot unplugging - they're
already dealt with at higher levels for normal hot unplugging.

The question here is to try to avoid duplicating that fairly
painful process in qemu.

Dave

> 
> -- 
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 18:45                     ` Dr. David Alan Gilbert
@ 2019-05-31 20:29                       ` Alex Williamson
  2019-05-31 21:05                         ` Michael S. Tsirkin
  2019-06-03  8:59                         ` Dr. David Alan Gilbert
  2019-05-31 20:43                       ` Michael S. Tsirkin
  1 sibling, 2 replies; 77+ messages in thread
From: Alex Williamson @ 2019-05-31 20:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, Eduardo Habkost, Michael S. Tsirkin, aadam,
	qemu-devel, laine, jdenemar, Jens Freimann, ailan

On Fri, 31 May 2019 19:45:13 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:  
> > > > Yes. It's just lots of extremely low level interfaces
> > > > and all rather pointless.
> > > > 
> > > > And down the road extensions like surprise removal support will make it
> > > > all cleaner and more transparent. Floating things up to libvirt means
> > > > all these low level details will require more and more hacks.  
> > > 
> > > Why do you call it pointless?  
> > 
> > We'd need APIs to manipulate device visibility to guest, hotplug
> > controller state and separately manipulate the resources allocated. This
> > is low level stuff that users really have no idea what to do about.
> > Exposing such a level of detail to management is imho pointless.
> > We are better off with a high level API, see below.  
> 
> so I don't know much about vfio; but to me it strikes me that
> you wouldn't need that low level detail if we just reworked vfio
> to look more like all our other devices;

I don't understand what this means, I thought vfio-pci followed a very
standard device model.

> something like:
> 
>   -vfiodev  host=02:00.0,id=gpu
>   -device vfio-pci,dev=gpu
>
> The 'vfiodev' would own the resources; so to do this trick, the
> management layer would:
>    hotunplug the vfio-pci
>    migrate
> 
> if anything went wrong it would
>    hotplug the vfio-pci backin
> 
> you wouldn't have free'd up any resources because they belonged
> to the vfiodev.

So you're looking more for some sort of frontend-backend separation: we
hot-unplug the frontend device that's exposed to the guest while the
backend device that holds the host resources is still attached.  I
would have hardly guessed that's "like all our other devices".  I was
under the impression (from previous discussions mostly) that the device
removal would be caught before actually allowing the device to finalize
and exit, such that with a failed migration, re-adding the device would
be deterministic since the device is never released back to the host.
I expected that could be done within QEMU, but I guess that's what we're
getting into here: how management tools specify that eject-without-release
semantic.  I don't know what this frontend/backend rework would
look like for vfio-pci, but it seems non-trivial for this one use case
and I don't see that it adds any value outside of this use case,
perhaps quite the opposite, it's an overly complicated interface for
the majority of use cases so we either move to a more complicated
interface or maintain both.  Poor choices either way.  Thanks,

Alex



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 18:45                     ` Dr. David Alan Gilbert
  2019-05-31 20:29                       ` Alex Williamson
@ 2019-05-31 20:43                       ` Michael S. Tsirkin
  2019-05-31 21:03                         ` Eduardo Habkost
  2019-06-03  8:06                         ` Dr. David Alan Gilbert
  1 sibling, 2 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-31 20:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel, laine,
	jdenemar, Jens Freimann, ailan

On Fri, May 31, 2019 at 07:45:13PM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:
> > > > Yes. It's just lots of extremely low level interfaces
> > > > and all rather pointless.
> > > > 
> > > > And down the road extensions like surprise removal support will make it
> > > > all cleaner and more transparent. Floating things up to libvirt means
> > > > all these low level details will require more and more hacks.
> > > 
> > > Why do you call it pointless?
> > 
> > We'd need APIs to manipulate device visibility to guest, hotplug
> > controller state and separately manipulate the resources allocated. This
> > is low level stuff that users really have no idea what to do about.
> > Exposing such a level of detail to management is imho pointless.
> > We are better off with a high level API, see below.
> 
> so I don't know much about vfio; but to me it strikes me that
> you wouldn't need that low level detail if we just reworked vfio
> to look more like all our other devices; something like:
> 
>   -vfiodev  host=02:00.0,id=gpu
>   -device vfio-pci,dev=gpu
> 
> The 'vfiodev' would own the resources; so to do this trick, the
> management layer would:
>    hotunplug the vfio-pci
>    migrate
> 
> if anything went wrong it would
>    hotplug the vfio-pci backin
> 
> you wouldn't have free'd up any resources because they belonged
> to the vfiodev.


IIUC that doesn't really work with passthrough
unless guests support surprise removal.


> > > If we want this to work before
> > > surprise removal is implemented, we need to provide an API that
> > > works for management software.
> > >  Don't we want to make this work
> > > without surprise removal too?
> > 
> > This patchset adds an optional, off by default support for
> > migrating guests with an assigned network device.
> > If enabled this requires guest to allow migration.
> > 
> > Of course this can be viewed as a security problem since it allows guest
> > to block migration. We can't detect a malicious guest reliably imho.
> > What we can do is report to management when guest allows migration.
> > Policy such what to do when this does not happen for a while and
> > what timeout to set would be up to management.
> > 
> > The API in question would be a high level one, something
> > along the lines of a single "guest allowed migration" event.
> 
> This is all fairly normal problems with hot unplugging - that's
> already dealt with at higher levels for normal hot unplugging.
> 
> The question here is to try to avoid duplicating that fairly
> painful process in qemu.
> 
> Dave
> > 
> > -- 
> > MST
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 20:43                       ` Michael S. Tsirkin
@ 2019-05-31 21:03                         ` Eduardo Habkost
  2019-06-03  8:06                         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-31 21:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	laine, jdenemar, Jens Freimann, ailan

On Fri, May 31, 2019 at 04:43:44PM -0400, Michael S. Tsirkin wrote:
> On Fri, May 31, 2019 at 07:45:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:
> > > > > Yes. It's just lots of extremely low level interfaces
> > > > > and all rather pointless.
> > > > > 
> > > > > And down the road extensions like surprise removal support will make it
> > > > > all cleaner and more transparent. Floating things up to libvirt means
> > > > > all these low level details will require more and more hacks.
> > > > 
> > > > Why do you call it pointless?
> > > 
> > > We'd need APIs to manipulate device visibility to guest, hotplug
> > > controller state and separately manipulate the resources allocated. This
> > > is low level stuff that users really have no idea what to do about.
> > > Exposing such a level of detail to management is imho pointless.
> > > We are better off with a high level API, see below.
> > 
> > so I don't know much about vfio; but to me it strikes me that
> > you wouldn't need that low level detail if we just reworked vfio
> > to look more like all our other devices; something like:
> > 
> >   -vfiodev  host=02:00.0,id=gpu
> >   -device vfio-pci,dev=gpu
> > 
> > The 'vfiodev' would own the resources; so to do this trick, the
> > management layer would:
> >    hotunplug the vfio-pci
> >    migrate
> > 
> > if anything went wrong it would
> >    hotplug the vfio-pci backin
> > 
> > you wouldn't have free'd up any resources because they belonged
> > to the vfiodev.
> 
> 
> IIUC that doesn't really work with passthrough
> unless guests support surprise removal.

Why?  For the guest, this is indistinguishable from the unplug
request implemented by this series.

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 20:29                       ` Alex Williamson
@ 2019-05-31 21:05                         ` Michael S. Tsirkin
  2019-05-31 21:59                           ` Eduardo Habkost
  2019-06-03  8:59                         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-05-31 21:05 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, jdenemar, Jens Freimann, ailan

On Fri, May 31, 2019 at 02:29:33PM -0600, Alex Williamson wrote:
> I don't know what this frontend/backend rework would
> look like for vfio-pci, but it seems non-trivial for this one use case
> and I don't see that it adds any value outside of this use case,
> perhaps quite the opposite, it's an overly complicated interface for
> the majority of use cases so we either move to a more complicated
> interface or maintain both.  Poor choices either way.

Well put, Alex. This is what I meant when I said it's a useless
interface: it only has a single use.




* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 14:56     ` Jens Freimann
  2019-05-30 17:46       ` Michael S. Tsirkin
  2019-05-30 19:09       ` Dr. David Alan Gilbert
@ 2019-05-31 21:47       ` Eduardo Habkost
  2019-06-03  8:24         ` Jens Freimann
  2 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-31 21:47 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, mst, aadam, Dr. David Alan Gilbert,
	qemu-devel, laine, ailan

On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > * Jens Freimann (jfreimann@redhat.com) wrote:
[...]
> > > > +    }
> > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > +        qdev_unplug(n->primary_dev, &err);
> > > 
> > > Not knowing unplug well; can you just explain - is that device hard
> > > unplugged and it's gone by the time this function returns or is it still
> > > hanging around for some indeterminate time?
> 
> Qemu will trigger an unplug request via pcie attention button in which case
> there could be a delay by the guest operating system. We could give it some
> amount of time and if nothing happens try surpise removal or handle the
> error otherwise.

I'm missing something here:

Isn't the whole point of the new device-hiding infrastructure to
prevent QEMU from closing the VFIO until migration ended
successfully?

What exactly is preventing QEMU from closing the host VFIO device
after the guest OS has handled the unplug request?

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 21:05                         ` Michael S. Tsirkin
@ 2019-05-31 21:59                           ` Eduardo Habkost
  0 siblings, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2019-05-31 21:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, aadam, Dr. David Alan Gilbert, qemu-devel,
	Alex Williamson, laine, jdenemar, Jens Freimann, ailan

On Fri, May 31, 2019 at 05:05:26PM -0400, Michael S. Tsirkin wrote:
> On Fri, May 31, 2019 at 02:29:33PM -0600, Alex Williamson wrote:
> > I don't know what this frontend/backend rework would
> > look like for vfio-pci, but it seems non-trivial for this one use case
> > and I don't see that it adds any value outside of this use case,
> > perhaps quite the opposite, it's an overly complicated interface for
> > the majority of use cases so we either move to a more complicated
> > interface or maintain both.  Poor choices either way.
> 
> Well put Alex this is what I meant when I said it's a useless
> interface. I meant it only has a single use.

I might agree if the code needed to hide the VFIO device from the
guest while keeping resources open (so it can be re-added if
migration fails) is demonstrably simpler than the code that would
be necessary to separate the device backend from the frontend.

But I couldn't find the code that does that in this series.  Is
this already implemented?

All I see is a qdev_unplug() call (which will close host
resources) and a qdev_device_add() call (which will reopen the
host device).

-- 
Eduardo



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 20:43                       ` Michael S. Tsirkin
  2019-05-31 21:03                         ` Eduardo Habkost
@ 2019-06-03  8:06                         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-03  8:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel, laine,
	jdenemar, Jens Freimann, ailan

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Fri, May 31, 2019 at 07:45:13PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:
> > > > > Yes. It's just lots of extremely low level interfaces
> > > > > and all rather pointless.
> > > > > 
> > > > > And down the road extensions like surprise removal support will make it
> > > > > all cleaner and more transparent. Floating things up to libvirt means
> > > > > all these low level details will require more and more hacks.
> > > > 
> > > > Why do you call it pointless?
> > > 
> > > We'd need APIs to manipulate device visibility to guest, hotplug
> > > controller state and separately manipulate the resources allocated. This
> > > is low level stuff that users really have no idea what to do about.
> > > Exposing such a level of detail to management is imho pointless.
> > > We are better off with a high level API, see below.
> > 
> > so I don't know much about vfio; but to me it strikes me that
> > you wouldn't need that low level detail if we just reworked vfio
> > to look more like all our other devices; something like:
> > 
> >   -vfiodev  host=02:00.0,id=gpu
> >   -device vfio-pci,dev=gpu
> > 
> > The 'vfiodev' would own the resources; so to do this trick, the
> > management layer would:
> >    hotunplug the vfio-pci
> >    migrate
> > 
> > if anything went wrong it would
> >    hotplug the vfio-pci backin
> > 
> > you wouldn't have free'd up any resources because they belonged
> > to the vfiodev.
> 
> 
> IIUC that doesn't really work with passthrough
> unless guests support surprise removal.

Why? The view from the guest here is just like what this series
has added without the special hack.

Dave

> 
> > > > If we want this to work before
> > > > surprise removal is implemented, we need to provide an API that
> > > > works for management software.
> > > >  Don't we want to make this work
> > > > without surprise removal too?
> > > 
> > > This patchset adds an optional, off by default support for
> > > migrating guests with an assigned network device.
> > > If enabled this requires guest to allow migration.
> > > 
> > > Of course this can be viewed as a security problem since it allows guest
> > > to block migration. We can't detect a malicious guest reliably imho.
> > > What we can do is report to management when guest allows migration.
> > > Policy such what to do when this does not happen for a while and
> > > what timeout to set would be up to management.
> > > 
> > > The API in question would be a high level one, something
> > > along the lines of a single "guest allowed migration" event.
> > 
> > This is all fairly normal problems with hot unplugging - that's
> > already dealt with at higher levels for normal hot unplugging.
> > 
> > The question here is to try to avoid duplicating that fairly
> > painful process in qemu.
> > 
> > Dave
> > > 
> > > -- 
> > > MST
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 21:47       ` Eduardo Habkost
@ 2019-06-03  8:24         ` Jens Freimann
  2019-06-03  9:26           ` Jens Freimann
                             ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Jens Freimann @ 2019-06-03  8:24 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pkrempa, berrange, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
>On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
>> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
>> > > * Jens Freimann (jfreimann@redhat.com) wrote:
>[...]
>> > > > +    }
>> > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
>> > > > +        qdev_unplug(n->primary_dev, &err);
>> > >
>> > > Not knowing unplug well; can you just explain - is that device hard
>> > > unplugged and it's gone by the time this function returns or is it still
>> > > hanging around for some indeterminate time?
>>
>> Qemu will trigger an unplug request via pcie attention button in which case
>> there could be a delay by the guest operating system. We could give it some
>> amount of time and if nothing happens try surpise removal or handle the
>> error otherwise.
>
>I'm missing something here:
>
>Isn't the whole point of the new device-hiding infrastructure to
>prevent QEMU from closing the VFIO until migration ended
>successfully?

No. The point of hiding it is to only add the VFIO device (that is
configured with the same MAC as the virtio-net device) once the
VIRTIO_NET_F_STANDBY feature is negotiated. We don't want to expose two
devices with the same MAC to guests that can't handle it.
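
Stripped of the arbitrary timer that earlier versions carried, the intended
gating logic boils down to something like the sketch below.  It reuses the
field and helper names from the patch hunks quoted earlier in the thread,
but it is a simplified illustration, not the patch as posted:

    /* Sketch: plug the saved primary (vfio-pci) device only once the guest
     * has acked VIRTIO_NET_F_STANDBY, i.e. once we know it can cope with
     * two NICs sharing one MAC.  Error handling is simplified and the
     * retry timer is omitted on purpose. */
    static void virtio_net_plug_primary_if_ready(VirtIONet *n, uint64_t features)
    {
        Error *err = NULL;

        if (!virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
            return;                     /* keep the primary hidden */
        }
        atomic_set(&n->primary_should_be_hidden, false);
        if (n->primary_dev) {
            return;                     /* already plugged */
        }
        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
        if (!n->primary_dev) {
            error_reportf_err(err, "virtio-net: couldn't plug primary device: ");
        }
    }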

>What exactly is preventing QEMU from closing the host VFIO device
>after the guest OS has handled the unplug request?

We qdev_unplug() the VFIO device and want the virtio-net standby device to
take over. If something goes wrong with unplug or
migration in general we have to qdev_plug() the device back.

This series does not try to implement new functionality to close a
device without freeing the resources.

From the discussion in this thread I understand that is what libvirt
needs though. Something that will trigger the unplug from the
guest but not free the device's resources in the host system (which is
what qdev_unplug() does). Correct? 
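
For what it's worth, issuing only the guest-visible half of the unplug is
straightforward with the existing hotplug helpers; the hard part is what
happens afterwards.  A hedged sketch (the helper itself is hypothetical and
not something this series or QEMU provides):

    /* Hypothetical helper: ask the guest to release the device (attention
     * button press on the PCIe slot) without going through qdev_unplug(). */
    static void qdev_request_unplug_only(DeviceState *dev, Error **errp)
    {
        HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);

        if (!hotplug_ctrl) {
            error_setg(errp, "device '%s' is not hotpluggable",
                       object_get_typename(OBJECT(dev)));
            return;
        }
        hotplug_handler_unplug_request(hotplug_ctrl, dev, errp);
    }

The catch is the second half: once the guest ejects the slot, the hotplug
controller's unplug path still unparents and finalizes the device, and for
vfio-pci that is what closes the host file descriptors.  Keeping the backend
resources open would also require changing that path, which this series does
not attempt.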

Why is it bad to fully re-create the device in case of a failed migration?


regards,
Jens 
 


>-- 
>Eduardo
>



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-31 20:29                       ` Alex Williamson
  2019-05-31 21:05                         ` Michael S. Tsirkin
@ 2019-06-03  8:59                         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-03  8:59 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, Eduardo Habkost, Michael S. Tsirkin, aadam,
	qemu-devel, laine, jdenemar, Jens Freimann, ailan

* Alex Williamson (alex.williamson@redhat.com) wrote:
> On Fri, 31 May 2019 19:45:13 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Fri, May 31, 2019 at 02:01:54PM -0300, Eduardo Habkost wrote:  
> > > > > Yes. It's just lots of extremely low level interfaces
> > > > > and all rather pointless.
> > > > > 
> > > > > And down the road extensions like surprise removal support will make it
> > > > > all cleaner and more transparent. Floating things up to libvirt means
> > > > > all these low level details will require more and more hacks.  
> > > > 
> > > > Why do you call it pointless?  
> > > 
> > > We'd need APIs to manipulate device visibility to guest, hotplug
> > > controller state and separately manipulate the resources allocated. This
> > > is low level stuff that users really have no idea what to do about.
> > > Exposing such a level of detail to management is imho pointless.
> > > We are better off with a high level API, see below.  
> > 
> > so I don't know much about vfio; but to me it strikes me that
> > you wouldn't need that low level detail if we just reworked vfio
> > to look more like all our other devices;
> 
> I don't understand what this means, I thought vfio-pci followed a very
> standard device model.
> 
> > something like:
> > 
> >   -vfiodev  host=02:00.0,id=gpu
> >   -device vfio-pci,dev=gpu
> >
> > The 'vfiodev' would own the resources; so to do this trick, the
> > management layer would:
> >    hotunplug the vfio-pci
> >    migrate
> > 
> > if anything went wrong it would
> >    hotplug the vfio-pci backin
> > 
> > you wouldn't have free'd up any resources because they belonged
> > to the vfiodev.
> 
> So you're looking more for some sort of frontend-backend separation, we
> hot-unplug the frontend device that's exposed to the guest while the
> backend device that holds the host resources is still attached.  I
> would have hardly guessed that's "like all our other devices".


Well, we have netdevs and the NICs that connect them to the guest,
        and blockdevs and the guest devices that expose them to the guest.

> I was
> under the impression (from previous discussions mostly) that the device
> removal would be caught before actually allowing the device to finalize
> and exit, such that with a failed migration, re-adding the device would
> be deterministic since the device is never released back to the host.
> I expected that could be done within QEMU, but I guess that's what
> we're getting into here is how management tools specify that eject w/o
> release semantic.

My worry here is that this is all being done behind the back of the
management tools in this series.
The management tools already deal with hot-unplugging and problems with
it;  here we're duplicating that set of problems and trying to stuff
them into the start of migration.

> I don't know what this frontend/backend rework would
> look like for vfio-pci, but it seems non-trivial for this one use case
> and I don't see that it adds any value outside of this use case,
> perhaps quite the opposite, it's an overly complicated interface for
> the majority of use cases so we either move to a more complicated
> interface or maintain both.  Poor choices either way.  Thanks,

Yep, tricky.

Dave

> Alex
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03  8:24         ` Jens Freimann
@ 2019-06-03  9:26           ` Jens Freimann
  2019-06-03 18:10           ` Laine Stump
  2019-06-03 19:36           ` Eduardo Habkost
  2 siblings, 0 replies; 77+ messages in thread
From: Jens Freimann @ 2019-06-03  9:26 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pkrempa, berrange, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
>On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
>>On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
>>>On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>>>> On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
>>>> > * Jens Freimann (jfreimann@redhat.com) wro
>>What exactly is preventing QEMU from closing the host VFIO device
>>after the guest OS has handled the unplug request?
>
>We qdev_unplug() the VFIO device and want the virtio-net standby device to
>take over. If something goes wrong with unplug or
>migration in general we have to qdev_plug() the device back.

I meant qdev_device_add, not qdev_plug.

regards,
Jens 



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-29  2:54           ` Michael S. Tsirkin
@ 2019-06-03 18:06             ` Laine Stump
  2019-06-03 18:12               ` Michael S. Tsirkin
  0 siblings, 1 reply; 77+ messages in thread
From: Laine Stump @ 2019-06-03 18:06 UTC (permalink / raw)
  To: Michael S. Tsirkin, si-wei liu
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	Jens Freimann, ailan

On 5/28/19 10:54 PM, Michael S. Tsirkin wrote:
> On Tue, May 28, 2019 at 05:14:22PM -0700, si-wei liu wrote:
>>
>>
>> On 5/21/2019 11:49 AM, Jens Freimann wrote:
>>> On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
>>>> On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
>>>>> On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:

>>>> Actually is there a list of devices for which this has been tested
>>>> besides mlx5? I think someone said some old intel cards
>>>> don't support this well, we might need to blacklist these ...
>>>
>>> So far I've tested mlx5 and XL710 which both worked, but I'm
>>> working on testing with more devices. But of course help with testing
>>> is greatly appreciated.
 >>
>> It won't work on Intel ixgbe and Broadcom bnxt_en, which requires toggling
>> the state of tap backing the virtio-net in order to release/reprogram MAC
>> filter. Actually, it's very few NICs that could work with this - even some
>> works by chance the behavior is undefined. Instead of blacklisting it makes
>> more sense to whitelist the NIC that supports it - with some new sysfs
>> attribute claiming the support presumably.
>>
>> -Siwei
> 
> I agree for many cards we won't know how they behave until we try.  One
> can consider this a bug in Linux that cards don't behave in a consistent
> way.  The best thing to do IMHO would be to write a tool that people can
> run to test the behaviour.

Is the "bad behavior" something due to the hardware of the cards, or 
their drivers? If it's the latter, then at least initially having a 
whitelist would be counterproductive, since it would make it difficult 
for relative outsiders to test and report success/failure of various cards.

(It's probably just a pipe dream, but it would be nice if it eventually 
could work with old igb cards - I have several of them that I use for 
SRIOV testing, and would rather avoid having to buy new hardware.)



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03  8:24         ` Jens Freimann
  2019-06-03  9:26           ` Jens Freimann
@ 2019-06-03 18:10           ` Laine Stump
  2019-06-03 18:46             ` Alex Williamson
  2019-06-03 19:36           ` Eduardo Habkost
  2 siblings, 1 reply; 77+ messages in thread
From: Laine Stump @ 2019-06-03 18:10 UTC (permalink / raw)
  To: Jens Freimann, Eduardo Habkost
  Cc: pkrempa, berrange, mst, aadam, qemu-devel, Dr. David Alan Gilbert, ailan

On 6/3/19 4:24 AM, Jens Freimann wrote:
> On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
>> On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
>>> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>>> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert 
>>> wrote:
>>> > > * Jens Freimann (jfreimann@redhat.com) wrote:
>> [...]
>>> > > > +    }
>>> > > > +    if (migration_in_setup(s) && !should_be_hidden && 
>>> n->primary_dev) {
>>> > > > +        qdev_unplug(n->primary_dev, &err);
>>> > >
>>> > > Not knowing unplug well; can you just explain - is that device hard
>>> > > unplugged and it's gone by the time this function returns or is 
>>> it still
>>> > > hanging around for some indeterminate time?
>>>
>>> Qemu will trigger an unplug request via pcie attention button in 
>>> which case
>>> there could be a delay by the guest operating system. We could give 
>>> it some
>>> amount of time and if nothing happens try surpise removal or handle the
>>> error otherwise.
>>
>> I'm missing something here:
>>
>> Isn't the whole point of the new device-hiding infrastructure to
>> prevent QEMU from closing the VFIO until migration ended
>> successfully?
> 
> No. The point of hiding it is to only add the VFIO (that is configured
> with the same MAC as the virtio-net device) until the
> VIRTIO_NET_F_STANDBY feature is negotiated. We don't want to expose to
> devices with the same MAC to guests who can't handle it.
> 
>> What exactly is preventing QEMU from closing the host VFIO device
>> after the guest OS has handled the unplug request?
> 
> We qdev_unplug() the VFIO device and want the virtio-net standby device to
> take over. If something goes wrong with unplug or
> migration in general we have to qdev_plug() the device back.
> 
> This series does not try to implement new functionality to close a
> device without freeing the resources.
> 
>  From the discussion in this thread I understand that is what libvirt
> needs though. Something that will trigger the unplug from the
> guest but not free the devices resources in the host system (which is
> what qdev_unplug() does). Correct?
> Why is it bad to fully re-create the device in case of a failed migration?

I think the concern is that if the device was fully released by qemu 
during migration, it might have already been given to some other/new 
guest during the time that migration is trying to complete. If migration 
then fails, you may be unable to restore the guest to the previous state.



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-03 18:06             ` Laine Stump
@ 2019-06-03 18:12               ` Michael S. Tsirkin
  2019-06-03 18:18                 ` Laine Stump
  0 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-06-03 18:12 UTC (permalink / raw)
  To: Laine Stump
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	si-wei liu, Jens Freimann, ailan

On Mon, Jun 03, 2019 at 02:06:47PM -0400, Laine Stump wrote:
> On 5/28/19 10:54 PM, Michael S. Tsirkin wrote:
> > On Tue, May 28, 2019 at 05:14:22PM -0700, si-wei liu wrote:
> > > 
> > > 
> > > On 5/21/2019 11:49 AM, Jens Freimann wrote:
> > > > On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
> > > > > > On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
> 
> > > > > Actually is there a list of devices for which this has been tested
> > > > > besides mlx5? I think someone said some old intel cards
> > > > > don't support this well, we might need to blacklist these ...
> > > > 
> > > > So far I've tested mlx5 and XL710 which both worked, but I'm
> > > > working on testing with more devices. But of course help with testing
> > > > is greatly appreciated.
> >>
> > > It won't work on Intel ixgbe and Broadcom bnxt_en, which requires toggling
> > > the state of tap backing the virtio-net in order to release/reprogram MAC
> > > filter. Actually, it's very few NICs that could work with this - even some
> > > works by chance the behavior is undefined. Instead of blacklisting it makes
> > > more sense to whitelist the NIC that supports it - with some new sysfs
> > > attribute claiming the support presumably.
> > > 
> > > -Siwei
> > 
> > I agree for many cards we won't know how they behave until we try.  One
> > can consider this a bug in Linux that cards don't behave in a consistent
> > way.  The best thing to do IMHO would be to write a tool that people can
> > run to test the behaviour.
> 
> Is the "bad behavior" something due to the hardware of the cards, or their
> drivers? If it's the latter, then at least initially having a whitelist
> would be counterproductive, since it would make it difficult for relative
> outsiders to test and report success/failure of various cards.

We can add an "ignore whitelist" flag. Would that address the issue?

> (It's probably just a pipe dream, but it would be nice if it eventually
> could work with old igb cards - I have several of them that I use for SRIOV
> testing, and would rather avoid having to buy new hardware.)

I think it generally can be worked around in the driver.
Most host drivers do get a notification when guest driver
loads/unloads and can use that to manipulate the on-device
switch.

-- 
MST



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-03 18:12               ` Michael S. Tsirkin
@ 2019-06-03 18:18                 ` Laine Stump
  2019-06-06 21:49                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 77+ messages in thread
From: Laine Stump @ 2019-06-03 18:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	si-wei liu, Jens Freimann, ailan

On 6/3/19 2:12 PM, Michael S. Tsirkin wrote:
> On Mon, Jun 03, 2019 at 02:06:47PM -0400, Laine Stump wrote:
>> On 5/28/19 10:54 PM, Michael S. Tsirkin wrote:
>>> On Tue, May 28, 2019 at 05:14:22PM -0700, si-wei liu wrote:
>>>>
>>>>
>>>> On 5/21/2019 11:49 AM, Jens Freimann wrote:
>>>>> On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
>>>>>> On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
>>>>>>> On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
>>
>>>>>> Actually is there a list of devices for which this has been tested
>>>>>> besides mlx5? I think someone said some old intel cards
>>>>>> don't support this well, we might need to blacklist these ...
>>>>>
>>>>> So far I've tested mlx5 and XL710 which both worked, but I'm
>>>>> working on testing with more devices. But of course help with testing
>>>>> is greatly appreciated.
>>>>
>>>> It won't work on Intel ixgbe and Broadcom bnxt_en, which requires toggling
>>>> the state of tap backing the virtio-net in order to release/reprogram MAC
>>>> filter. Actually, it's very few NICs that could work with this - even some
>>>> works by chance the behavior is undefined. Instead of blacklisting it makes
>>>> more sense to whitelist the NIC that supports it - with some new sysfs
>>>> attribute claiming the support presumably.
>>>>
>>>> -Siwei
>>>
>>> I agree for many cards we won't know how they behave until we try.  One
>>> can consider this a bug in Linux that cards don't behave in a consistent
>>> way.  The best thing to do IMHO would be to write a tool that people can
>>> run to test the behaviour.
>>
>> Is the "bad behavior" something due to the hardware of the cards, or their
>> drivers? If it's the latter, then at least initially having a whitelist
>> would be counterproductive, since it would make it difficult for relative
>> outsiders to test and report success/failure of various cards.
> 
> We can add an "ignore whitelist" flag. Would that address the issue?

It would be better than requiring a kernel/qemu recompile :-)


Where would the whitelist live? In qemu or in the kernel? It would be 
problematic to have the whitelist in qemu if kernel driver changes could 
fix a particular card.

Beyond that, what about *always* just issuing some sort of warning 
rather than completely forbidding a card that wasn't whitelisted? 
(Haven't decided if I like that better or not, and it probably doesn't 
matter since I'm not a "real" user, but I thought I would mention it.)
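
One way to reconcile the whitelist idea, the "ignore whitelist" flag and the
always-warn variant is sketched below.  The failover_nic_whitelisted()
helper and the ignore_whitelist option are made up for illustration; only
warn_report() and error_setg() are existing QEMU APIs:

    /* Sketch: refuse, or merely warn about, primary NICs that are not known
     * to handle the datapath switch.  failover_nic_whitelisted() is a
     * hypothetical helper standing in for whatever whitelist source (a table
     * in QEMU, a sysfs attribute, ...) ends up being used. */
    static bool failover_primary_nic_ok(uint16_t vendor_id, uint16_t device_id,
                                        bool ignore_whitelist, Error **errp)
    {
        if (failover_nic_whitelisted(vendor_id, device_id)) {
            return true;
        }
        if (ignore_whitelist) {
            warn_report("NIC %04x:%04x is not known to support failover; "
                        "datapath switching behaviour may be undefined",
                        vendor_id, device_id);
            return true;
        }
        error_setg(errp, "NIC %04x:%04x is not whitelisted for failover "
                   "(set ignore_whitelist to override)",
                   vendor_id, device_id);
        return false;
    }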



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03 18:10           ` Laine Stump
@ 2019-06-03 18:46             ` Alex Williamson
  2019-06-05 15:20               ` Daniel P. Berrangé
  2019-06-06 15:00               ` Roman Kagan
  0 siblings, 2 replies; 77+ messages in thread
From: Alex Williamson @ 2019-06-03 18:46 UTC (permalink / raw)
  To: Laine Stump
  Cc: pkrempa, berrange, Eduardo Habkost, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, Jens Freimann, ailan

On Mon, 3 Jun 2019 14:10:52 -0400
Laine Stump <laine@redhat.com> wrote:

> On 6/3/19 4:24 AM, Jens Freimann wrote:
> > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:  
> >> On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:  
> >>> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:  
> >>> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert   
> >>> wrote:  
> >>> > > * Jens Freimann (jfreimann@redhat.com) wrote:  
> >> [...]  
> >>> > > > +    }
> >>> > > > +    if (migration_in_setup(s) && !should_be_hidden &&   
> >>> n->primary_dev) {  
> >>> > > > +        qdev_unplug(n->primary_dev, &err);  
> >>> > >
> >>> > > Not knowing unplug well; can you just explain - is that device hard
> >>> > > unplugged and it's gone by the time this function returns or is   
> >>> it still  
> >>> > > hanging around for some indeterminate time?  
> >>>
> >>> Qemu will trigger an unplug request via pcie attention button in 
> >>> which case
> >>> there could be a delay by the guest operating system. We could give 
> >>> it some
> >>> amount of time and if nothing happens try surpise removal or handle the
> >>> error otherwise.  
> >>
> >> I'm missing something here:
> >>
> >> Isn't the whole point of the new device-hiding infrastructure to
> >> prevent QEMU from closing the VFIO until migration ended
> >> successfully?  
> > 
> > No. The point of hiding it is to only add the VFIO (that is configured
> > with the same MAC as the virtio-net device) until the
> > VIRTIO_NET_F_STANDBY feature is negotiated. We don't want to expose to
> > devices with the same MAC to guests who can't handle it.
> >   
> >> What exactly is preventing QEMU from closing the host VFIO device
> >> after the guest OS has handled the unplug request?  
> > 
> > We qdev_unplug() the VFIO device and want the virtio-net standby device to
> > take over. If something goes wrong with unplug or
> > migration in general we have to qdev_plug() the device back.
> > 
> > This series does not try to implement new functionality to close a
> > device without freeing the resources.
> > 
> >  From the discussion in this thread I understand that is what libvirt
> > needs though. Something that will trigger the unplug from the
> > guest but not free the devices resources in the host system (which is
> > what qdev_unplug() does). Correct?
> > Why is it bad to fully re-create the device in case of a failed migration?  
> 
> I think the concern is that if the device was fully released by qemu 
> during migration, it might have already been given to some other/new 
> guest during the time that migration is trying to complete. If migration 
> then fails, you may be unable to restore the guest to the previous state.

Yep, plus I think the memory pinning and IOMMU resources could be a
variable as well.  Essentially, there's no guaranteed reservation to
the device or any of the additional resources that the device implies
once it's released, so we want to keep as much of that on hot-standby
as we can in case the migration fails.  Unfortunately even just
unmapping the BARs for a guest-only hot-unplug unmaps those regions
from the IOMMU, but aside from catastrophic resource issues on the
host, we can essentially guarantee being able to remap those.  Thanks,

Alex



* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03  8:24         ` Jens Freimann
  2019-06-03  9:26           ` Jens Freimann
  2019-06-03 18:10           ` Laine Stump
@ 2019-06-03 19:36           ` Eduardo Habkost
  2019-06-04 13:43             ` Jens Freimann
  2 siblings, 1 reply; 77+ messages in thread
From: Eduardo Habkost @ 2019-06-03 19:36 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > [...]
> > > > > > +    }
> > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > >
> > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > hanging around for some indeterminate time?
> > > 
> > > Qemu will trigger an unplug request via pcie attention button in which case
> > > there could be a delay by the guest operating system. We could give it some
> > > amount of time and if nothing happens try surpise removal or handle the
> > > error otherwise.
> > 
> > I'm missing something here:
> > 
> > Isn't the whole point of the new device-hiding infrastructure to
> > prevent QEMU from closing the VFIO until migration ended
> > successfully?
> 
> No. The point of hiding it is to only add the VFIO (that is configured
> with the same MAC as the virtio-net device) until the
> VIRTIO_NET_F_STANDBY feature is negotiated. We don't want to expose to
> devices with the same MAC to guests who can't handle it.
> 
> > What exactly is preventing QEMU from closing the host VFIO device
> > after the guest OS has handled the unplug request?
> 
> We qdev_unplug() the VFIO device and want the virtio-net standby device to
> take over. If something goes wrong with unplug or
> migration in general we have to qdev_plug() the device back.
> 
> This series does not try to implement new functionality to close a
> device without freeing the resources.
> 
> From the discussion in this thread I understand that is what libvirt
> needs though. Something that will trigger the unplug from the
> guest but not free the devices resources in the host system (which is
> what qdev_unplug() does). Correct?

This is what I understand we need, but this is not what
qdev_unplug() does.

> 
> Why is it bad to fully re-create the device in case of a failed migration?

Bad or not, I thought the whole point of doing it inside QEMU was
to do something libvirt wouldn't be able to do (namely,
unplugging the device while not freeing resources).  If we are
doing something that management software is already capable of
doing, what's the point?

Quoting a previous message from this thread:

On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
| > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
| > >  This patch series is very
| > > odd precisely because it's trying to do the unplug itself in the
| > > migration phase rather than let the management layer do it - so unless
| > > it's nailed down how to make sure that's really really bullet proof
| > > then we've got to go back and ask the question about whether we should
| > > really fix it so it can be done by the management layer.
| > > 
| > > Dave
| > 
| > management already said they can't because files get closed and
| > resources freed on unplug and so they might not be able to re-add device
| > on migration failure. We do it in migration because that is
| > where failures can happen and we can recover.


-- 
Eduardo


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03 19:36           ` Eduardo Habkost
@ 2019-06-04 13:43             ` Jens Freimann
  2019-06-04 14:09               ` Eduardo Habkost
                                 ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Jens Freimann @ 2019-06-04 13:43 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pkrempa, berrange, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
>On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
>> On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
>> > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
>> > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>> > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
>> > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
>> Why is it bad to fully re-create the device in case of a failed migration?
>
>Bad or not, I thought the whole point of doing it inside QEMU was
>to do something libvirt wouldn't be able to do (namely,
>unplugging the device while not freeing resources).  If we are
>doing something that management software is already capable of
>doing, what's the point?

Even though management software seems to be capable of it, a failover
implementation has never happened. As Michael says, network failover is
a mechanism (there's no good reason not to use a PT device if it is
available), not a policy. We are now trying to implement it in a
simple way, contained within QEMU. 

>Quoting a previous message from this thread:
>
>On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
>| > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
>| > >  This patch series is very
>| > > odd precisely because it's trying to do the unplug itself in the
>| > > migration phase rather than let the management layer do it - so unless
>| > > it's nailed down how to make sure that's really really bullet proof
>| > > then we've got to go back and ask the question about whether we should
>| > > really fix it so it can be done by the management layer.
>| > >
>| > > Dave
>| >
>| > management already said they can't because files get closed and
>| > resources freed on unplug and so they might not be able to re-add device
>| > on migration failure. We do it in migration because that is
>| > where failures can happen and we can recover.

This is something that I can work on as well, but it doesn't have to
be part of this patch set in my opinion. Let's say migration fails and we can't
re-plug the primary device. We can still use the standby (virtio-net)
device which would only mean slower networking. How likely is it that
the primary device is grabbed by another VM between unplugging and
migration failure anyway? 

regards,
Jens 


>
>-- 
>Eduardo


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-04 13:43             ` Jens Freimann
@ 2019-06-04 14:09               ` Eduardo Habkost
  2019-06-04 17:06               ` Michael S. Tsirkin
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 77+ messages in thread
From: Eduardo Habkost @ 2019-06-04 14:09 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > Why is it bad to fully re-create the device in case of a failed migration?
> > 
> > Bad or not, I thought the whole point of doing it inside QEMU was
> > to do something libvirt wouldn't be able to do (namely,
> > unplugging the device while not freeing resources).  If we are
> > doing something that management software is already capable of
> > doing, what's the point?
> 
> Event though management software seems to be capable of it, a failover
> implementation has never happened. As Michael says network failover is
> a mechanism (there's no good reason not to use a PT device if it is
> available), not a policy. We are now trying to implement it in a
> simple way, contained within QEMU.

I don't think this is a strong enough reason to move complexity
to QEMU.

This might look like it's reducing complexity in the
QEMU<->libvirt interface, but having QEMU unplugging/plugging
devices automatically without libvirt involvement is actually
complicating that interface.

That said, I won't try to prevent this from being merged if the
maintainers and libvirt developers agree on this interface.

-- 
Eduardo


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-04 13:43             ` Jens Freimann
  2019-06-04 14:09               ` Eduardo Habkost
@ 2019-06-04 17:06               ` Michael S. Tsirkin
  2019-06-04 19:00                 ` Dr. David Alan Gilbert
  2019-06-05 14:36               ` Daniel P. Berrangé
  2019-06-05 16:04               ` Laine Stump
  3 siblings, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-06-04 17:06 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > Why is it bad to fully re-create the device in case of a failed migration?
> > 
> > Bad or not, I thought the whole point of doing it inside QEMU was
> > to do something libvirt wouldn't be able to do (namely,
> > unplugging the device while not freeing resources).  If we are
> > doing something that management software is already capable of
> > doing, what's the point?
> 
> Event though management software seems to be capable of it, a failover
> implementation has never happened. As Michael says network failover is
> a mechanism (there's no good reason not to use a PT device if it is
> available), not a policy. We are now trying to implement it in a
> simple way, contained within QEMU.
> 
> > Quoting a previous message from this thread:
> > 
> > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > | > >  This patch series is very
> > | > > odd precisely because it's trying to do the unplug itself in the
> > | > > migration phase rather than let the management layer do it - so unless
> > | > > it's nailed down how to make sure that's really really bullet proof
> > | > > then we've got to go back and ask the question about whether we should
> > | > > really fix it so it can be done by the management layer.
> > | > >
> > | > > Dave
> > | >
> > | > management already said they can't because files get closed and
> > | > resources freed on unplug and so they might not be able to re-add device
> > | > on migration failure. We do it in migration because that is
> > | > where failures can happen and we can recover.
> 
> This is something that I can work on as well, but it doesn't have to
> be part of this patch set in my opinion. Let's say migration fails and we can't
> re-plug the primary device. We can still use the standby (virtio-net)
> device which would only mean slower networking. How likely is it that
> the primary device is grabbed by another VM between unplugging and
> migration failure anyway?
> 
> regards,
> Jens

I think I agree with Eduardo it's very important to handle this corner
case correctly. Fast networking outside migration is why people use
failover at all.  Someone who can live with a slower virtio would use
just that.

And IIRC this corner case is exactly why libvirt could not
implement it correctly itself and had to push it up the stack
until it fell off the cliff :).

> 
> > 
> > -- 
> > Eduardo


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-04 17:06               ` Michael S. Tsirkin
@ 2019-06-04 19:00                 ` Dr. David Alan Gilbert
  2019-06-07 14:14                   ` Jens Freimann
  0 siblings, 1 reply; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-04 19:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel, laine,
	Jens Freimann, ailan

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> > On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > Why is it bad to fully re-create the device in case of a failed migration?
> > > 
> > > Bad or not, I thought the whole point of doing it inside QEMU was
> > > to do something libvirt wouldn't be able to do (namely,
> > > unplugging the device while not freeing resources).  If we are
> > > doing something that management software is already capable of
> > > doing, what's the point?
> > 
> > Event though management software seems to be capable of it, a failover
> > implementation has never happened. As Michael says network failover is
> > a mechanism (there's no good reason not to use a PT device if it is
> > available), not a policy. We are now trying to implement it in a
> > simple way, contained within QEMU.
> > 
> > > Quoting a previous message from this thread:
> > > 
> > > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > | > >  This patch series is very
> > > | > > odd precisely because it's trying to do the unplug itself in the
> > > | > > migration phase rather than let the management layer do it - so unless
> > > | > > it's nailed down how to make sure that's really really bullet proof
> > > | > > then we've got to go back and ask the question about whether we should
> > > | > > really fix it so it can be done by the management layer.
> > > | > >
> > > | > > Dave
> > > | >
> > > | > management already said they can't because files get closed and
> > > | > resources freed on unplug and so they might not be able to re-add device
> > > | > on migration failure. We do it in migration because that is
> > > | > where failures can happen and we can recover.
> > 
> > This is something that I can work on as well, but it doesn't have to
> > be part of this patch set in my opinion. Let's say migration fails and we can't
> > re-plug the primary device. We can still use the standby (virtio-net)
> > device which would only mean slower networking. How likely is it that
> > the primary device is grabbed by another VM between unplugging and
> > migration failure anyway?
> > 
> > regards,
> > Jens
> 
> I think I agree with Eduardo it's very important to handle this corner
> case correctly. Fast networking outside migration is why people use
> failover at all.  Someone who can live with a slower virtio would use
> just that.
> 
> And IIRC this corner case is exactly why libvirt could not
> implement it correctly itself and had to push it up the stack
> until it fell off the cliff :).

So I think we need to have the code that shows we can cope with the
corner cases - or provide a way for libvirt to handle it (which is
my strong preference).

Dave


> > 
> > > 
> > > -- 
> > > Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-04 13:43             ` Jens Freimann
  2019-06-04 14:09               ` Eduardo Habkost
  2019-06-04 17:06               ` Michael S. Tsirkin
@ 2019-06-05 14:36               ` Daniel P. Berrangé
  2019-06-05 16:04               ` Laine Stump
  3 siblings, 0 replies; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-06-05 14:36 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, Eduardo Habkost, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > Why is it bad to fully re-create the device in case of a failed migration?
> > 
> > Bad or not, I thought the whole point of doing it inside QEMU was
> > to do something libvirt wouldn't be able to do (namely,
> > unplugging the device while not freeing resources).  If we are
> > doing something that management software is already capable of
> > doing, what's the point?
> 
> Event though management software seems to be capable of it, a failover
> implementation has never happened. As Michael says network failover is
> a mechanism (there's no good reason not to use a PT device if it is
> available), not a policy. We are now trying to implement it in a
> simple way, contained within QEMU.
> 
> > Quoting a previous message from this thread:
> > 
> > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > | > >  This patch series is very
> > | > > odd precisely because it's trying to do the unplug itself in the
> > | > > migration phase rather than let the management layer do it - so unless
> > | > > it's nailed down how to make sure that's really really bullet proof
> > | > > then we've got to go back and ask the question about whether we should
> > | > > really fix it so it can be done by the management layer.
> > | > >
> > | > > Dave
> > | >
> > | > management already said they can't because files get closed and
> > | > resources freed on unplug and so they might not be able to re-add device
> > | > on migration failure. We do it in migration because that is
> > | > where failures can happen and we can recover.
> 
> This is something that I can work on as well, but it doesn't have to
> be part of this patch set in my opinion. Let's say migration fails and we can't
> re-plug the primary device. We can still use the standby (virtio-net)
> device which would only mean slower networking. How likely is it that
> the primary device is grabbed by another VM between unplugging and
> migration failure anyway?

The case of another VM taking the primary device is *not* a problem for
libvirt. We keep track of which device is allocated for use by which
guest, so even if it's not currently plugged into the guest, we won't
give it away to a second guest.

The failure scenario is the edge case where replugging the device fails
for some reason outside libvirt's control: running out of file
descriptors, a memory allocation failure when pinning guest RAM -
essentially any failure path that may arise from "device-add vfio..."

In such a case the device won't get replugged. So the mgmt app will think
the migration was rolled back, but the rollback won't be complete as the
original device will be missing.

I guess the question is whether that's really something to worry about?

Can we justifiably just leave this as a docs problem, given that it would
be a very rare failure?

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03 18:46             ` Alex Williamson
@ 2019-06-05 15:20               ` Daniel P. Berrangé
  2019-06-06 15:00               ` Roman Kagan
  1 sibling, 0 replies; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-06-05 15:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, Eduardo Habkost, mst, aadam, qemu-devel,
	Dr. David Alan Gilbert, Laine Stump, Jens Freimann, ailan

On Mon, Jun 03, 2019 at 12:46:52PM -0600, Alex Williamson wrote:
> On Mon, 3 Jun 2019 14:10:52 -0400
> Laine Stump <laine@redhat.com> wrote:
> 
> > On 6/3/19 4:24 AM, Jens Freimann wrote:
> > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:  
> > >> On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:  
> > >>> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:  
> > >>> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert   
> > >>> wrote:  
> > >>> > > * Jens Freimann (jfreimann@redhat.com) wrote:  
> > >> [...]  
> > >>> > > > +    }
> > >>> > > > +    if (migration_in_setup(s) && !should_be_hidden &&   
> > >>> n->primary_dev) {  
> > >>> > > > +        qdev_unplug(n->primary_dev, &err);  
> > >>> > >
> > >>> > > Not knowing unplug well; can you just explain - is that device hard
> > >>> > > unplugged and it's gone by the time this function returns or is   
> > >>> it still  
> > >>> > > hanging around for some indeterminate time?  
> > >>>
> > >>> Qemu will trigger an unplug request via pcie attention button in 
> > >>> which case
> > >>> there could be a delay by the guest operating system. We could give 
> > >>> it some
> > >>> amount of time and if nothing happens try surpise removal or handle the
> > >>> error otherwise.  
> > >>
> > >> I'm missing something here:
> > >>
> > >> Isn't the whole point of the new device-hiding infrastructure to
> > >> prevent QEMU from closing the VFIO until migration ended
> > >> successfully?  
> > > 
> > > No. The point of hiding it is to only add the VFIO (that is configured
> > > with the same MAC as the virtio-net device) until the
> > > VIRTIO_NET_F_STANDBY feature is negotiated. We don't want to expose to
> > > devices with the same MAC to guests who can't handle it.
> > >   
> > >> What exactly is preventing QEMU from closing the host VFIO device
> > >> after the guest OS has handled the unplug request?  
> > > 
> > > We qdev_unplug() the VFIO device and want the virtio-net standby device to
> > > take over. If something goes wrong with unplug or
> > > migration in general we have to qdev_plug() the device back.
> > > 
> > > This series does not try to implement new functionality to close a
> > > device without freeing the resources.
> > > 
> > >  From the discussion in this thread I understand that is what libvirt
> > > needs though. Something that will trigger the unplug from the
> > > guest but not free the devices resources in the host system (which is
> > > what qdev_unplug() does). Correct?
> > > Why is it bad to fully re-create the device in case of a failed migration?  
> > 
> > I think the concern is that if the device was fully released by qemu 
> > during migration, it might have already been given to some other/new 
> > guest during the time that migration is trying to complete. If migration 
> > then fails, you may be unable to restore the guest to the previous state.
> 
> Yep, plus I think the memory pinning and IOMMU resources could be a
> variable as well.  Essentially, there's no guaranteed reservation to
> the device or any of the additional resources that the device implies
> once it's released, so we want to keep as much of that on hot-standby
> as we can in case the migration fails.  Unfortunately even just
> unmapping the BARs for a guest-only hot-unplug unmaps those regions
> from the IOMMU, but aside from catastrophic resource issues on the
> host, we can essentially guarantee being able to remap those.  Thanks,

Yes, it's these other resource allocations that are the problem. Libvirt
can easily ensure that the actual PCI device is not given away to a
second guest until migration completes. The mgmt app above libvirt is
likely to ensure this exclusion of PCI devices too.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-05-30 18:09           ` Michael S. Tsirkin
  2019-05-30 18:22             ` Eduardo Habkost
  2019-05-30 19:08             ` Dr. David Alan Gilbert
@ 2019-06-05 15:23             ` Daniel P. Berrangé
  2 siblings, 0 replies; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-06-05 15:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, ehabkost, aadam, qemu-devel, Dr. David Alan Gilbert,
	laine, Jens Freimann, ailan

On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > Hi David,
> > > > 
> > > > sorry for the  delayed reply.
> > > > 
> > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > +
> > > > > > >  static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > >  {
> > > > > > >      VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > @@ -786,6 +796,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > >      } else {
> > > > > > >          memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > >      }
> > > > > > > +
> > > > > > > +    if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > +        atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > +        if (n->primary_device_timer)
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                4000);
> > > > > > > +    }
> > > > > > 
> > > > > > What's this magic timer constant and why?
> > > > 
> > > > To be honest it's a leftover from previous versions (before I took
> > > > over) of the patches and I'm not sure why the timer is there.
> > > > I removed it and so far see no reason to keep it.
> > > > 
> > > > > > 
> > > > > > >  }
> > > > > > >
> > > > > > >  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > @@ -2626,6 +2644,87 @@ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > >      n->netclient_type = g_strdup(type);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > +{
> > > > > > > +    VirtIONet *n = opaque;
> > > > > > > +    Error *err = NULL;
> > > > > > > +
> > > > > > > +    if (n->primary_device_dict)
> > > > > > > +        n->primary_device_opts = qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > +            n->primary_device_dict, &err);
> > > > > > > +    if (n->primary_device_opts) {
> > > > > > > +        n->primary_dev = qdev_device_add(n->primary_device_opts, &err);
> > > > > > > +        error_setg(&err, "virtio_net: couldn't plug in primary device");
> > > > > > > +        return;
> > > > > > > +    }
> > > > > > > +    if (!n->primary_device_dict && err) {
> > > > > > > +        if (n->primary_device_timer) {
> > > > > > > +            timer_mod(n->primary_device_timer,
> > > > > > > +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > +                100);
> > > > > > 
> > > > > > same here.
> > > > 
> > > > see above
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > +                                                MigrationState *s)
> > > > > > > +{
> > > > > > > +    Error *err = NULL;
> > > > > > > +    bool should_be_hidden = atomic_read(&n->primary_should_be_hidden);
> > > > > > > +
> > > > > > > +    n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > +            n->primary_device_id);
> > > > > > > +    if (!n->primary_dev) {
> > > > > > > +        error_setg(&err, "virtio_net: couldn't find primary device");
> > > > > > 
> > > > > > There's something broken with the error handling in this function - the
> > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > reported or stops the migration.
> > > > 
> > > > yes, I'll fix it.
> > > > 
> > > > > > > +    }
> > > > > > > +    if (migration_in_setup(s) && !should_be_hidden && n->primary_dev) {
> > > > > > > +        qdev_unplug(n->primary_dev, &err);
> > > > > > 
> > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > unplugged and it's gone by the time this function returns or is it still
> > > > > > hanging around for some indeterminate time?
> > > > 
> > > > Qemu will trigger an unplug request via pcie attention button in which case
> > > > there could be a delay by the guest operating system. We could give it some
> > > > amount of time and if nothing happens try surpise removal or handle the
> > > > error otherwise.
> > > > 
> > > > 
> > > > regards,
> > > > Jens
> > > 
> > > That's a subject for another day. Let's get the basic thing
> > > working.
> > 
> > Well no, we need to know this thing isn't going to hang in the migration
> > setup phase, or if it does how we recover.
> 
> 
> This thing is *supposed* to be stuck in migration startup phase
> if guest is malicious.
> 
> If migration does not progress management needs
> a way to detect this and cancel.
> 
> Some more documentation about how this is supposed to happen
> would be helpful.

We need more than merely documentation in this area. We need some explicit
migration status or event exported from QMP to reflect the fact that QEMU
has not yet started migration, because it is waiting for PCI device unplug
to complete.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-04 13:43             ` Jens Freimann
                                 ` (2 preceding siblings ...)
  2019-06-05 14:36               ` Daniel P. Berrangé
@ 2019-06-05 16:04               ` Laine Stump
  2019-06-05 16:19                 ` Daniel P. Berrangé
  3 siblings, 1 reply; 77+ messages in thread
From: Laine Stump @ 2019-06-05 16:04 UTC (permalink / raw)
  To: Jens Freimann, Eduardo Habkost
  Cc: pkrempa, berrange, mst, aadam, qemu-devel, Dr. David Alan Gilbert, ailan

On 6/4/19 9:43 AM, Jens Freimann wrote:
> On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
>> On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
>>> On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
>>> > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
>>> > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>>> > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan 
>>> Gilbert wrote:
>>> > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
>>> Why is it bad to fully re-create the device in case of a failed 
>>> migration?
>>
>> Bad or not, I thought the whole point of doing it inside QEMU was
>> to do something libvirt wouldn't be able to do (namely,
>> unplugging the device while not freeing resources).  If we are
>> doing something that management software is already capable of
>> doing, what's the point?
> 
> Event though management software seems to be capable of it, a failover
> implementation has never happened.

I'm pretty sure RHV/oVirt+vdsm has implemented it and it is even being 
used in production. Of course it requires a bond/team device to be 
configured in the guest OS, but the part about auto-detaching the VF 
before migration, then reattaching a similar VF on the destination is 
all done by vdsm. (Don't misunderstand this as discouraging this new 
method! Just wanted to set the record straight.)



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-05 16:04               ` Laine Stump
@ 2019-06-05 16:19                 ` Daniel P. Berrangé
  0 siblings, 0 replies; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-06-05 16:19 UTC (permalink / raw)
  To: Laine Stump
  Cc: pkrempa, Eduardo Habkost, mst, aadam, Dr. David Alan Gilbert,
	qemu-devel, Jens Freimann, ailan

On Wed, Jun 05, 2019 at 12:04:28PM -0400, Laine Stump wrote:
> On 6/4/19 9:43 AM, Jens Freimann wrote:
> > On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan
> > > > Gilbert wrote:
> > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > Why is it bad to fully re-create the device in case of a failed
> > > > migration?
> > > 
> > > Bad or not, I thought the whole point of doing it inside QEMU was
> > > to do something libvirt wouldn't be able to do (namely,
> > > unplugging the device while not freeing resources).  If we are
> > > doing something that management software is already capable of
> > > doing, what's the point?
> > 
> > Event though management software seems to be capable of it, a failover
> > implementation has never happened.
> 
> I'm pretty sure RHV/oVirt+vdsm has implemented it and it is even being used
> in production. Of course it requires a bond/team device to be configured in
> the guest OS, but the part about auto-detaching the VF before migration,
> then reattaching a similar VF on the destination is all done by vdsm. (Don't
> misunderstand this as discouraging this new method! Just wanted to set the
> record straight.)

OpenStack will detach/reattach PCI devices around a save-to-disk, but
does not currently do that for live migration, though it easily could.
The blocker is the issue of exposing to the guest which pairs of
devices are intended to be used together in the bond.

OpenStack could have defined a way to express that, but it would be
specific to OpenStack which is not very desirable. Standardization
of how to express the relationship between the pair of devices would
be very desirable, allowing the host side solution to be done in either
QEMU or the mgmt app as they see fit.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-03 18:46             ` Alex Williamson
  2019-06-05 15:20               ` Daniel P. Berrangé
@ 2019-06-06 15:00               ` Roman Kagan
  1 sibling, 0 replies; 77+ messages in thread
From: Roman Kagan @ 2019-06-06 15:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: pkrempa, berrange, Eduardo Habkost, mst, aadam, qemu-devel,
	Dr. David Alan	Gilbert, Laine Stump, Jens Freimann, ailan

On Mon, Jun 03, 2019 at 12:46:52PM -0600, Alex Williamson wrote:
> On Mon, 3 Jun 2019 14:10:52 -0400
> Laine Stump <laine@redhat.com> wrote:
> 
> > On 6/3/19 4:24 AM, Jens Freimann wrote:
> > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:  
> > >> On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:  
> > >>> On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:  
> > >>> > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert   
> > >>> wrote:  
> > >>> > > * Jens Freimann (jfreimann@redhat.com) wrote:  
> > >> [...]  
> > >>> > > > +    }
> > >>> > > > +    if (migration_in_setup(s) && !should_be_hidden &&   
> > >>> n->primary_dev) {  
> > >>> > > > +        qdev_unplug(n->primary_dev, &err);  
> > >>> > >
> > >>> > > Not knowing unplug well; can you just explain - is that device hard
> > >>> > > unplugged and it's gone by the time this function returns or is   
> > >>> it still  
> > >>> > > hanging around for some indeterminate time?  
> > >>>
> > >>> Qemu will trigger an unplug request via pcie attention button in 
> > >>> which case
> > >>> there could be a delay by the guest operating system. We could give 
> > >>> it some
> > >>> amount of time and if nothing happens try surpise removal or handle the
> > >>> error otherwise.  
> > >>
> > >> I'm missing something here:
> > >>
> > >> Isn't the whole point of the new device-hiding infrastructure to
> > >> prevent QEMU from closing the VFIO until migration ended
> > >> successfully?  
> > > 
> > > No. The point of hiding it is to only add the VFIO (that is configured
> > > with the same MAC as the virtio-net device) until the
> > > VIRTIO_NET_F_STANDBY feature is negotiated. We don't want to expose to
> > > devices with the same MAC to guests who can't handle it.
> > >   
> > >> What exactly is preventing QEMU from closing the host VFIO device
> > >> after the guest OS has handled the unplug request?  
> > > 
> > > We qdev_unplug() the VFIO device and want the virtio-net standby device to
> > > take over. If something goes wrong with unplug or
> > > migration in general we have to qdev_plug() the device back.
> > > 
> > > This series does not try to implement new functionality to close a
> > > device without freeing the resources.
> > > 
> > >  From the discussion in this thread I understand that is what libvirt
> > > needs though. Something that will trigger the unplug from the
> > > guest but not free the devices resources in the host system (which is
> > > what qdev_unplug() does). Correct?
> > > Why is it bad to fully re-create the device in case of a failed migration?  
> > 
> > I think the concern is that if the device was fully released by qemu 
> > during migration, it might have already been given to some other/new 
> > guest during the time that migration is trying to complete. If migration 
> > then fails, you may be unable to restore the guest to the previous state.
> 
> Yep, plus I think the memory pinning and IOMMU resources could be a
> variable as well.  Essentially, there's no guaranteed reservation to
> the device or any of the additional resources that the device implies
> once it's released, so we want to keep as much of that on hot-standby
> as we can in case the migration fails.  Unfortunately even just
> unmapping the BARs for a guest-only hot-unplug unmaps those regions
> from the IOMMU, but aside from catastrophic resource issues on the
> host, we can essentially guarantee being able to remap those.  Thanks,

Isn't this also the case for guest (re)boots?

IOW libvirt/mgmt will anyway have to be aware of the PT-PV pairing and
that pair being in a degraded state sometimes.  Then the migration of
such VMs would just imply transitioning to the degraded state prior to
the actual migration.  Which sounds much like putting the mostly
existing bits together in libvirt/mgmt and nothing to be done in QEMU.

Am I missing anything?

Thanks,
Roman.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-03 18:18                 ` Laine Stump
@ 2019-06-06 21:49                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-06-06 21:49 UTC (permalink / raw)
  To: Laine Stump
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Alex Williamson,
	si-wei liu, Jens Freimann, ailan

On Mon, Jun 03, 2019 at 02:18:19PM -0400, Laine Stump wrote:
> On 6/3/19 2:12 PM, Michael S. Tsirkin wrote:
> > On Mon, Jun 03, 2019 at 02:06:47PM -0400, Laine Stump wrote:
> > > On 5/28/19 10:54 PM, Michael S. Tsirkin wrote:
> > > > On Tue, May 28, 2019 at 05:14:22PM -0700, si-wei liu wrote:
> > > > > 
> > > > > 
> > > > > On 5/21/2019 11:49 AM, Jens Freimann wrote:
> > > > > > On Tue, May 21, 2019 at 07:37:19AM -0400, Michael S. Tsirkin wrote:
> > > > > > > On Tue, May 21, 2019 at 09:21:57AM +0200, Jens Freimann wrote:
> > > > > > > > On Mon, May 20, 2019 at 04:56:57PM -0600, Alex Williamson wrote:
> > > 
> > > > > > > Actually is there a list of devices for which this has been tested
> > > > > > > besides mlx5? I think someone said some old intel cards
> > > > > > > don't support this well, we might need to blacklist these ...
> > > > > > 
> > > > > > So far I've tested mlx5 and XL710 which both worked, but I'm
> > > > > > working on testing with more devices. But of course help with testing
> > > > > > is greatly appreciated.
> > > > > 
> > > > > It won't work on Intel ixgbe and Broadcom bnxt_en, which requires toggling
> > > > > the state of tap backing the virtio-net in order to release/reprogram MAC
> > > > > filter. Actually, it's very few NICs that could work with this - even some
> > > > > works by chance the behavior is undefined. Instead of blacklisting it makes
> > > > > more sense to whitelist the NIC that supports it - with some new sysfs
> > > > > attribute claiming the support presumably.
> > > > > 
> > > > > -Siwei
> > > > 
> > > > I agree for many cards we won't know how they behave until we try.  One
> > > > can consider this a bug in Linux that cards don't behave in a consistent
> > > > way.  The best thing to do IMHO would be to write a tool that people can
> > > > run to test the behaviour.
> > > 
> > > Is the "bad behavior" something due to the hardware of the cards, or their
> > > drivers? If it's the latter, then at least initially having a whitelist
> > > would be counterproductive, since it would make it difficult for relative
> > > outsiders to test and report success/failure of various cards.
> > 
> > We can add an "ignore whitelist" flag. Would that address the issue?
> 
> It would be better than requiring a kernel/qemu recompile :-)
> 
> 
> Where would the whilelist live? In qemu or in the kernel? It would be
> problematic to have the whitelist in qemu if kernel driver changes could fix
> a particular card.

So originally I thought:
- add some interface in the kernel to signal new behaviour
- start with a whitelist in qemu
- if not on the whitelist, check the new interface
- if not there, check a "force" flag on the device

But one problem with all of the above is that it's actually
too late. With a broken driver, once management sets the MAC on the
to-be-primary VF, traffic stops being sent to the standby.
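
A minimal sketch of that check order, assuming it would be evaluated per
primary device at plug time; every helper and driver name below is a
stand-in invented for illustration, not a QEMU or kernel API:

#include <stdbool.h>
#include <string.h>

/* Stand-in helpers, invented for illustration only. */
static bool driver_in_whitelist(const char *drv)
{
    /* mlx5 and XL710 (i40e) were the devices reported as tested so far */
    return strcmp(drv, "mlx5_core") == 0 || strcmp(drv, "i40e") == 0;
}

static bool kernel_signals_failover_ok(const char *drv)
{
    (void)drv;
    return false;   /* no such kernel interface exists today */
}

/* Whitelist first, then the (future) kernel hint, then the "force" flag. */
static bool failover_primary_allowed(const char *drv, bool force_flag)
{
    return driver_in_whitelist(drv) ||
           kernel_signals_failover_ok(drv) ||
           force_flag;
}

As noted above, any such check would still come too late for a driver
that only misbehaves once the MAC is programmed on the VF.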

> Beyond that, what about *always* just issuing some sort of warning rather
> than completely forbidding a card that wasn't whitelisted? (Haven't decided
> if I like that better or not (and it probably doesn't matter, since I'm not
> a "real" user, but I thought I would mention it).

People tend to ignore warnings :)

-- 
MST


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-04 19:00                 ` Dr. David Alan Gilbert
@ 2019-06-07 14:14                   ` Jens Freimann
  2019-06-07 14:32                     ` Michael S. Tsirkin
  2019-06-07 17:51                     ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 77+ messages in thread
From: Jens Freimann @ 2019-06-07 14:14 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: pkrempa, berrange, Eduardo Habkost, Michael S. Tsirkin, aadam,
	qemu-devel, laine, ailan

On Tue, Jun 04, 2019 at 08:00:19PM +0100, Dr. David Alan Gilbert wrote:
>* Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
>> > On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
>> > > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
>> > > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
>> > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
>> > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
>> > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
>> > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
>> > > > Why is it bad to fully re-create the device in case of a failed migration?
>> > >
>> > > Bad or not, I thought the whole point of doing it inside QEMU was
>> > > to do something libvirt wouldn't be able to do (namely,
>> > > unplugging the device while not freeing resources).  If we are
>> > > doing something that management software is already capable of
>> > > doing, what's the point?
>> >
>> > Event though management software seems to be capable of it, a failover
>> > implementation has never happened. As Michael says network failover is
>> > a mechanism (there's no good reason not to use a PT device if it is
>> > available), not a policy. We are now trying to implement it in a
>> > simple way, contained within QEMU.
>> >
>> > > Quoting a previous message from this thread:
>> > >
>> > > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
>> > > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
>> > > | > >  This patch series is very
>> > > | > > odd precisely because it's trying to do the unplug itself in the
>> > > | > > migration phase rather than let the management layer do it - so unless
>> > > | > > it's nailed down how to make sure that's really really bullet proof
>> > > | > > then we've got to go back and ask the question about whether we should
>> > > | > > really fix it so it can be done by the management layer.
>> > > | > >
>> > > | > > Dave
>> > > | >
>> > > | > management already said they can't because files get closed and
>> > > | > resources freed on unplug and so they might not be able to re-add device
>> > > | > on migration failure. We do it in migration because that is
>> > > | > where failures can happen and we can recover.
>> >
>> > This is something that I can work on as well, but it doesn't have to
>> > be part of this patch set in my opinion. Let's say migration fails and we can't
>> > re-plug the primary device. We can still use the standby (virtio-net)
>> > device which would only mean slower networking. How likely is it that
>> > the primary device is grabbed by another VM between unplugging and
>> > migration failure anyway?
>> >
>> > regards,
>> > Jens
>>
>> I think I agree with Eduardo it's very important to handle this corner
>> case correctly. Fast networking outside migration is why people use
>> failover at all.  Someone who can live with a slower virtio would use
>> just that.
>>
>> And IIRC this corner case is exactly why libvirt could not
>> implement it correctly itself and had to push it up the stack
>> until it fell off the cliff :).
>
>So I think we need to have the code that shows we can cope with the
>corner cases - or provide a way for libvirt to handle it (which is
>my strong preference).

Would this work: We add a new migration state MIGRATE_WAIT_UNPLUG (or
a better, more generic name) which tells libvirt that migration has not
started yet because we are waiting for the guest, and extend the QMP
events for the migration state. When we know the device was
successfully unplugged we send a QMP event DEVICE_DELETED or a new one
DEVICE_DELETED_PARTIALLY (not sure about that yet), let migration
start and set the migration state to active?

To do a partial unplug, I imagine we have to separate the vfio(-pci)
code to differentiate between release of resources (fds, mappings etc.)
and unplug (I haven't yet found out how it works in vfio). In the
failover case we only do the unplug part but not the release part.
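
A minimal sketch of how such a state could gate the start of migration,
assuming QEMU's existing migration internals (migrate_set_state(),
MIGRATION_STATUS_SETUP/ACTIVE, qemu_sem_timedwait()); the
MIGRATION_STATUS_WAIT_UNPLUG state, the wait_unplug_sem semaphore and
qdev_failover_unplug_pending() are hypothetical names, not something
this series implements:

/*
 * Sketch only: hold migration in a dedicated state until all failover
 * primary devices have been acked as unplugged by the guest, then move
 * on to the normal ACTIVE state.
 */
static void migration_wait_for_primary_unplug(MigrationState *s)
{
    if (!qdev_failover_unplug_pending()) {            /* hypothetical */
        return;
    }
    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                      MIGRATION_STATUS_WAIT_UNPLUG);   /* hypothetical */
    while (qdev_failover_unplug_pending() &&
           s->state == MIGRATION_STATUS_WAIT_UNPLUG) {
        /* woken up (or timing out) when the guest completes the unplug */
        qemu_sem_timedwait(&s->wait_unplug_sem, 250);  /* hypothetical */
    }
    migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
                      MIGRATION_STATUS_ACTIVE);
}

Management would then see the new state via query-migrate and the
MIGRATION event, and could cancel if the guest never completes the
unplug.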

regards,
Jens 


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-07 14:14                   ` Jens Freimann
@ 2019-06-07 14:32                     ` Michael S. Tsirkin
  2019-06-07 17:51                     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-06-07 14:32 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, Eduardo Habkost, aadam, qemu-devel,
	Dr. David Alan Gilbert, laine, ailan

On Fri, Jun 07, 2019 at 04:14:07PM +0200, Jens Freimann wrote:
> On Tue, Jun 04, 2019 at 08:00:19PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> > > > On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > > > > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > > > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > Why is it bad to fully re-create the device in case of a failed migration?
> > > > >
> > > > > Bad or not, I thought the whole point of doing it inside QEMU was
> > > > > to do something libvirt wouldn't be able to do (namely,
> > > > > unplugging the device while not freeing resources).  If we are
> > > > > doing something that management software is already capable of
> > > > > doing, what's the point?
> > > >
> > > > Event though management software seems to be capable of it, a failover
> > > > implementation has never happened. As Michael says network failover is
> > > > a mechanism (there's no good reason not to use a PT device if it is
> > > > available), not a policy. We are now trying to implement it in a
> > > > simple way, contained within QEMU.
> > > >
> > > > > Quoting a previous message from this thread:
> > > > >
> > > > > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > > > > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > > > | > >  This patch series is very
> > > > > | > > odd precisely because it's trying to do the unplug itself in the
> > > > > | > > migration phase rather than let the management layer do it - so unless
> > > > > | > > it's nailed down how to make sure that's really really bullet proof
> > > > > | > > then we've got to go back and ask the question about whether we should
> > > > > | > > really fix it so it can be done by the management layer.
> > > > > | > >
> > > > > | > > Dave
> > > > > | >
> > > > > | > management already said they can't because files get closed and
> > > > > | > resources freed on unplug and so they might not be able to re-add device
> > > > > | > on migration failure. We do it in migration because that is
> > > > > | > where failures can happen and we can recover.
> > > >
> > > > This is something that I can work on as well, but it doesn't have to
> > > > be part of this patch set in my opinion. Let's say migration fails and we can't
> > > > re-plug the primary device. We can still use the standby (virtio-net)
> > > > device which would only mean slower networking. How likely is it that
> > > > the primary device is grabbed by another VM between unplugging and
> > > > migration failure anyway?
> > > >
> > > > regards,
> > > > Jens
> > > 
> > > I think I agree with Eduardo it's very important to handle this corner
> > > case correctly. Fast networking outside migration is why people use
> > > failover at all.  Someone who can live with a slower virtio would use
> > > just that.
> > > 
> > > And IIRC this corner case is exactly why libvirt could not
> > > implement it correctly itself and had to push it up the stack
> > > until it fell off the cliff :).
> > 
> > So I think we need to have the code that shows we can cope with the
> > corner cases - or provide a way for libvirt to handle it (which is
> > my strong preference).
> 
> Would this work: We add a new migration state MIGRATE_WAIT_UNPLUG (or
> a better more generic name) which tells libvirt that migration has not
> started yet because we are waiting for the guest. And extend the qmp
> events for the migration state. When we know the device was
> sucessfully unplugged we sent a qmp event DEVICE_DELETED or a new one
> DEVICE_DELETED_PARTIALLY (not sure about that yet), let migration
> start and set the migration state to active?
> 
> To do a partial unplug I imagine, we have to separate vfio(-pci) code
> to differ between release of resources (fds, mappings etc) and unplug
> (I haven't yet found out how it works in vfio). In the failover case
> we only do the unplug part but not the release part.
> 
> regards,
> Jens

I think the first is done in vfio_exitfn and the second in
vfio_instance_finalize.
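
A minimal sketch of what that split could look like from the failover
side, assuming the existing vfio_exitfn()/vfio_instance_finalize()
division of labour; failover_partial_unplug() and the
defer_host_release flag are invented for illustration and are not part
of this series or of QEMU's vfio code:

/*
 * Sketch only: run the guest-visible teardown now, keep the host
 * resources (region mmaps, group/container fds) around until the
 * migration outcome is known.
 */
static void failover_partial_unplug(VFIOPCIDevice *vdev)
{
    vdev->defer_host_release = true;    /* hypothetical flag */
    /* guest-visible teardown: interrupts, BARs, config space */
    vfio_exitfn(PCI_DEVICE(vdev));
    /*
     * The release half - what vfio_instance_finalize() does today -
     * would only run once migration has succeeded, or be skipped on
     * rollback so the device can be re-plugged cheaply.
     */
}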

-- 
MST


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
  2019-06-07 14:14                   ` Jens Freimann
  2019-06-07 14:32                     ` Michael S. Tsirkin
@ 2019-06-07 17:51                     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 77+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-07 17:51 UTC (permalink / raw)
  To: Jens Freimann
  Cc: pkrempa, berrange, Eduardo Habkost, Michael S. Tsirkin, aadam,
	qemu-devel, laine, ailan

* Jens Freimann (jfreimann@redhat.com) wrote:
> On Tue, Jun 04, 2019 at 08:00:19PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Tue, Jun 04, 2019 at 03:43:21PM +0200, Jens Freimann wrote:
> > > > On Mon, Jun 03, 2019 at 04:36:48PM -0300, Eduardo Habkost wrote:
> > > > > On Mon, Jun 03, 2019 at 10:24:56AM +0200, Jens Freimann wrote:
> > > > > > On Fri, May 31, 2019 at 06:47:48PM -0300, Eduardo Habkost wrote:
> > > > > > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > > > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > > > > > * Jens Freimann (jfreimann@redhat.com) wrote:
> > > > > > Why is it bad to fully re-create the device in case of a failed migration?
> > > > >
> > > > > Bad or not, I thought the whole point of doing it inside QEMU was
> > > > > to do something libvirt wouldn't be able to do (namely,
> > > > > unplugging the device while not freeing resources).  If we are
> > > > > doing something that management software is already capable of
> > > > > doing, what's the point?
> > > >
> > > > Event though management software seems to be capable of it, a failover
> > > > implementation has never happened. As Michael says network failover is
> > > > a mechanism (there's no good reason not to use a PT device if it is
> > > > available), not a policy. We are now trying to implement it in a
> > > > simple way, contained within QEMU.
> > > >
> > > > > Quoting a previous message from this thread:
> > > > >
> > > > > On Thu, May 30, 2019 at 02:09:42PM -0400, Michael S. Tsirkin wrote:
> > > > > | > On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > > > > | > >  This patch series is very
> > > > > | > > odd precisely because it's trying to do the unplug itself in the
> > > > > | > > migration phase rather than let the management layer do it - so unless
> > > > > | > > it's nailed down how to make sure that's really really bullet proof
> > > > > | > > then we've got to go back and ask the question about whether we should
> > > > > | > > really fix it so it can be done by the management layer.
> > > > > | > >
> > > > > | > > Dave
> > > > > | >
> > > > > | > management already said they can't because files get closed and
> > > > > | > resources freed on unplug and so they might not be able to re-add device
> > > > > | > on migration failure. We do it in migration because that is
> > > > > | > where failures can happen and we can recover.
> > > >
> > > > This is something that I can work on as well, but it doesn't have to
> > > > be part of this patch set in my opinion. Let's say migration fails and we can't
> > > > re-plug the primary device. We can still use the standby (virtio-net)
> > > > device which would only mean slower networking. How likely is it that
> > > > the primary device is grabbed by another VM between unplugging and
> > > > migration failure anyway?
> > > >
> > > > regards,
> > > > Jens
> > > 
> > > I think I agree with Eduardo it's very important to handle this corner
> > > case correctly. Fast networking outside migration is why people use
> > > failover at all.  Someone who can live with a slower virtio would use
> > > just that.
> > > 
> > > And IIRC this corner case is exactly why libvirt could not
> > > implement it correctly itself and had to push it up the stack
> > > until it fell off the cliff :).
> > 
> > So I think we need to have the code that shows we can cope with the
> > corner cases - or provide a way for libvirt to handle it (which is
> > my strong preference).
> 
> Would this work: we add a new migration state MIGRATE_WAIT_UNPLUG (or
> a better, more generic name) which tells libvirt that migration has not
> started yet because we are waiting for the guest, and we extend the QMP
> events for the migration state. When we know the device was
> successfully unplugged we send a QMP event, DEVICE_DELETED or a new
> DEVICE_DELETED_PARTIALLY (not sure about that yet), let migration
> start and set the migration state to active?
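
A minimal sketch of how such a wait-for-unplug state could look on the
QEMU side (illustrative only; all names below are made up and not part
of the posted patches):

    /* migration thread, before RAM transfer starts (sketch only) */
    static void migrate_wait_for_unplug(MigrationState *s)
    {
        /* visible via query-migrate / MIGRATION events, so libvirt can
         * tell we are not moving data yet, only waiting for the guest
         * to ack the unplug of the failover primary */
        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                          MIGRATION_STATUS_WAIT_UNPLUG);

        /* hypothetical predicate over the list of failover primaries */
        while (!qdev_failover_primaries_unplugged()) {
            /* wait_unplug_sem is a hypothetical field, kicked from the
             * unplug completion path */
            qemu_sem_timedwait(&s->wait_unplug_sem, 250);
        }

        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
                          MIGRATION_STATUS_ACTIVE);
    }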

Potentially; let's see what the libvirt people have to say.
What happens if you have multiple devices and one of them unplugs OK and
then the other fails?

> To do a partial unplug, I imagine we have to separate the vfio(-pci) code
> so it distinguishes between releasing resources (fds, mappings, etc.) and
> the unplug itself (I haven't yet found out how this works in vfio). In the
> failover case we would only do the unplug part but not the release part.

Dave

> regards,
> Jens
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
                   ` (6 preceding siblings ...)
  2019-05-21 10:10 ` Michael S. Tsirkin
@ 2019-06-11 15:42 ` Laine Stump
  2019-06-11 15:51   ` Michael S. Tsirkin
  2019-06-12  9:11   ` Daniel P. Berrangé
  7 siblings, 2 replies; 77+ messages in thread
From: Laine Stump @ 2019-06-11 15:42 UTC (permalink / raw)
  To: Jens Freimann, qemu-devel; +Cc: pkrempa, berrange, ehabkost, mst, aadam, ailan

On 5/17/19 8:58 AM, Jens Freimann wrote:
> This is another attempt at implementing the host side of the
> net_failover concept
> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> 
> Changes since last RFC:
> - work around circular dependency of commandline options. Just add
>    failover=on to the virtio-net standby options and reference it from
>    primary (vfio-pci) device with standby=<id>
> - add patch 3/4 to allow migration of vfio-pci device when it is part of a
>    failover pair, still disallow for all other devices
> - add patch 4/4 to allow unplug of device during migrationm, make an
>    exception for failover primary devices. I'd like feedback on how to
>    solve this more elegant. I added a boolean to DeviceState, have it
>    default to false for all devices except for primary devices.
> - not tested yet with surprise removal
> - I don't expect this to go in as it is, still needs more testing but
>    I'd like to get feedback on above mentioned changes.
> 
> The general idea is that we have a pair of devices, a vfio-pci and a
> emulated device. Before migration the vfio device is unplugged and data
> flows to the emulated device, on the target side another vfio-pci device
> is plugged in to take over the data-path. In the guest the net_failover
> module will pair net devices with the same MAC address.
> 
> * In the first patch the infrastructure for hiding the device is added
>    for the qbus and qdev APIs.
> 
> * In the second patch the virtio-net uses the API to defer adding the vfio
>    device until the VIRTIO_NET_F_STANDBY feature is acked.
> 
> Previous discussion:
>    RFC v1 https://patchwork.ozlabs.org/cover/989098/
>    RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
> 
> To summarize concerns/feedback from previous discussion:
> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>    Migration might get stuck for unpredictable time with unclear reason.
>    This approach combines two tricky things, hot/unplug and migration.
>    -> We can surprise-remove the PCI device and in QEMU we can do all
>       necessary rollbacks transparent to management software. Will it be
>       easy, probably not.
> 2. PCI devices are a precious ressource. The primary device should never
>    be added to QEMU if it won't be used by guest instead of hiding it in
>    QEMU.
>    -> We only hotplug the device when the standby feature bit was
>       negotiated. We save the device cmdline options until we need it for
>       qdev_device_add()
>       Hiding a device can be a useful concept to model. For example a
>       pci device in a powered-off slot could be marked as hidden until the slot is
>       powered on (mst).
> 3. Management layer software should handle this. Open Stack already has
>    components/code to handle unplug/replug VFIO devices and metadata to
>    provide to the guest for detecting which devices should be paired.
>    -> An approach that includes all software from firmware to
>       higher-level management software wasn't tried in the last years. This is
>       an attempt to keep it simple and contained in QEMU as much as possible.
> 4. Hotplugging a device and then making it part of a failover setup is
>     not possible
>    -> addressed by extending qdev hotplug functions to check for hidden
>       attribute, so e.g. device_add can be used to plug a device.
> 
> 
> I have tested this with a mlx5 NIC and was able to migrate the VM with
> above mentioned workarounds for open problems.
> 
> Command line example:
> 
> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>          -machine q35,kernel-irqchip=split -cpu host   \
>          -k fr   \
>          -serial stdio   \
>          -net none \
>          -qmp unix:/tmp/qmp.socket,server,nowait \
>          -monitor telnet:127.0.0.1:5555,server,nowait \
>          -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>          -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>          -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>          -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>          -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
>          /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> 
> Then the primary device can be hotplugged via
>   (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1


I guess this is the commandline on the migration destination, and as far 
as I understand from this example, on the destination we (meaning 
libvirt or higher level management application) must *not* include the 
assigned device on the qemu commandline, but must instead hotplug the 
device later after the guest CPUs have been restarted on the destination.

So if I'm understanding correctly, the idea is that on the migration 
source, the device may have been hotplugged, or may have been included 
when qemu was originally started. Then qemu automatically handles the 
unplug of the device on the source, but it seems qemu does nothing on 
the destination, leaving that up to libvirt or a higher layer to implement.

Then in order for this to work, libvirt (or OpenStack or oVirt or 
whoever) needs to understand that the device in the libvirt config (it 
will still be in the libvirt config, since from libvirt's POV it hasn't 
been unplugged):

1) shouldn't be included in the qemu commandline on the destination,

2) will almost surely need to be replaced with a different device on the 
destination (since it's almost certain that the destination won't have 
an available device at the same PCI address)

3) will probably need to be unbound from the VF net driver (does this 
need to happen before migration is finished? If we want to lower the 
probability of a failure after we're already committed to the migration, 
then I think we must, but libvirt isn't set up for that in any way; see 
the sysfs sketch after this list).

4) will need to be hotplugged after the migration has finished *and* 
after the guest CPUs have been restarted on the destination.
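
For reference, the driver unbind in (3) and the re-bind to vfio-pci in
(c) below boil down to a couple of sysfs writes on the destination host.
A self-contained illustration (the BDF is just an example taken from the
cover letter; where this code should live, libvirt or the management
application, is exactly the open question):

    /* rebind an example VF from its host net driver to vfio-pci */
    #include <stdio.h>

    static int sysfs_write(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");

        if (!f) {
            perror(path);
            return -1;
        }
        fprintf(f, "%s", val);
        return fclose(f);
    }

    int main(void)
    {
        const char *bdf = "0000:5e:00.2";   /* example VF address */
        char path[256];

        /* detach from the current (VF net) driver, if any */
        snprintf(path, sizeof(path),
                 "/sys/bus/pci/devices/%s/driver/unbind", bdf);
        sysfs_write(path, bdf);

        /* steer the device to vfio-pci and reprobe it */
        snprintf(path, sizeof(path),
                 "/sys/bus/pci/devices/%s/driver_override", bdf);
        sysfs_write(path, "vfio-pci");
        sysfs_write("/sys/bus/pci/drivers_probe", bdf);

        return 0;
    }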


While it will be possible to ensure that there is a destination device, 
and to replace the old device with the new one in the config (and maybe, 
either with some major reworking of the device assignment code, or by 
offloading the responsibility to the management application(s), possible 
to re-bind the device to the vfio-pci driver), prior to marking the 
migration as "successful" (thus committing to running it on the 
destination), we can't say as much for actually assigning the device. So 
if the assignment fails, what happens then?


So a few issues I see that will need to be solved by [someone] 
(apparently either libvirt or management):

a) there isn't anything in libvirt's XML grammar that allows us to 
signify a device that is "present in the config but shouldn't be 
included in the commandline"

b) someone will need to replace the device from the source with an 
equivalent device on the destination in the libvirt XML. There are other 
cases of management modifying the XML during migration (I think), but 
this does point out that putting the auto-unplug code into qemu isn't 
turning this into a trivial change for the management layer.

c) there is nothing in libvirt's migration logic that can cause a device 
to be re-binded to vfio-pci prior to completion of a migration. Unless 
this is added to libvirt (or the re-bind operation is passed off to the 
management application), we will need to live with the possibility that 
hotplugging the device will fail due to failed re-bind *after* we've 
committed to the migration.

d) once the guest CPUs are restarted on the destination, [someone] 
(libvirt or management) needs to hotplug the new device on the 
destination. (I'm guessing that a hotplug can only be done while the 
guest CPUs are running; correct me if this is wrong!)

This sounds like a lot of complexity for something that was supposed to 
be handled completely/transparently by qemu :-P.



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-11 15:42 ` Laine Stump
@ 2019-06-11 15:51   ` Michael S. Tsirkin
  2019-06-11 16:12     ` Laine Stump
  2019-06-12  9:11   ` Daniel P. Berrangé
  1 sibling, 1 reply; 77+ messages in thread
From: Michael S. Tsirkin @ 2019-06-11 15:51 UTC (permalink / raw)
  To: Laine Stump
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Jens Freimann, ailan

On Tue, Jun 11, 2019 at 11:42:54AM -0400, Laine Stump wrote:
> On 5/17/19 8:58 AM, Jens Freimann wrote:
> > This is another attempt at implementing the host side of the
> > net_failover concept
> > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > 
> > Changes since last RFC:
> > - work around circular dependency of commandline options. Just add
> >    failover=on to the virtio-net standby options and reference it from
> >    primary (vfio-pci) device with standby=<id>
> > - add patch 3/4 to allow migration of vfio-pci device when it is part of a
> >    failover pair, still disallow for all other devices
> > - add patch 4/4 to allow unplug of device during migrationm, make an
> >    exception for failover primary devices. I'd like feedback on how to
> >    solve this more elegant. I added a boolean to DeviceState, have it
> >    default to false for all devices except for primary devices.
> > - not tested yet with surprise removal
> > - I don't expect this to go in as it is, still needs more testing but
> >    I'd like to get feedback on above mentioned changes.
> > 
> > The general idea is that we have a pair of devices, a vfio-pci and a
> > emulated device. Before migration the vfio device is unplugged and data
> > flows to the emulated device, on the target side another vfio-pci device
> > is plugged in to take over the data-path. In the guest the net_failover
> > module will pair net devices with the same MAC address.
> > 
> > * In the first patch the infrastructure for hiding the device is added
> >    for the qbus and qdev APIs.
> > 
> > * In the second patch the virtio-net uses the API to defer adding the vfio
> >    device until the VIRTIO_NET_F_STANDBY feature is acked.
> > 
> > Previous discussion:
> >    RFC v1 https://patchwork.ozlabs.org/cover/989098/
> >    RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
> > 
> > To summarize concerns/feedback from previous discussion:
> > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> >    Migration might get stuck for unpredictable time with unclear reason.
> >    This approach combines two tricky things, hot/unplug and migration.
> >    -> We can surprise-remove the PCI device and in QEMU we can do all
> >       necessary rollbacks transparent to management software. Will it be
> >       easy, probably not.
> > 2. PCI devices are a precious ressource. The primary device should never
> >    be added to QEMU if it won't be used by guest instead of hiding it in
> >    QEMU.
> >    -> We only hotplug the device when the standby feature bit was
> >       negotiated. We save the device cmdline options until we need it for
> >       qdev_device_add()
> >       Hiding a device can be a useful concept to model. For example a
> >       pci device in a powered-off slot could be marked as hidden until the slot is
> >       powered on (mst).
> > 3. Management layer software should handle this. Open Stack already has
> >    components/code to handle unplug/replug VFIO devices and metadata to
> >    provide to the guest for detecting which devices should be paired.
> >    -> An approach that includes all software from firmware to
> >       higher-level management software wasn't tried in the last years. This is
> >       an attempt to keep it simple and contained in QEMU as much as possible.
> > 4. Hotplugging a device and then making it part of a failover setup is
> >     not possible
> >    -> addressed by extending qdev hotplug functions to check for hidden
> >       attribute, so e.g. device_add can be used to plug a device.
> > 
> > 
> > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > above mentioned workarounds for open problems.
> > 
> > Command line example:
> > 
> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> >          -machine q35,kernel-irqchip=split -cpu host   \
> >          -k fr   \
> >          -serial stdio   \
> >          -net none \
> >          -qmp unix:/tmp/qmp.socket,server,nowait \
> >          -monitor telnet:127.0.0.1:5555,server,nowait \
> >          -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> >          -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> >          -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> >          -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> >          -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
> >          /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > 
> > Then the primary device can be hotplugged via
> >   (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
> 
> 
> I guess this is the commandline on the migration destination, and as far as
> I understand from this example, on the destination we (meaning libvirt or
> higher level management application) must *not* include the assigned device
> on the qemu commandline, but must instead hotplug the device later after the
> guest CPUs have been restarted on the destination.
> 
> So if I'm understanding correctly, the idea is that on the migration source,
> the device may have been hotplugged, or may have been included when qemu was
> originally started. Then qemu automatically handles the unplug of the device
> on the source, but it seems qemu does nothing on the destination, leaving
> that up to libvirt or a higher layer to implement.

Good point. I don't see why it would not work just as well
with device present straight away.

Did I miss something?

I think Jens was just testing local machine migration
and of course you can only assign a device to 1 VM at a time.

> Then in order for this to work, libvirt (or OpenStack or oVirt or whoever)
> needs to understand that the device in the libvirt config (it will still be
> in the libvirt config, since from libvirt's POV it hasn't been unplugged):
> 
> 1) shouldn't be included in the qemu commandline on the destination,
> 
> 2) will almost surely need to be replaced with a different device on the
> destination (since it's almost certain that the destination won't have an
> available device at the same PCI address)
> 
> 3) will probably need to be unbinded from the VF net driver (does this need
> to happen before migration is finished? If we want to lower the probability
> of a failure after we're already committed to the migration, then I think we
> must, but libvirt isn't set up for that in any way).
> 
> 4) will need to be hotplugged after the migration has finished *and* after
> the guest CPUs have been restarted on the destination.
> 
> 
> While it will be possible to assure that there is a destination device, and
> to replace the old device with new in the config (and maybe, either with
> some major reworking of device assignment code, or offloading the
> responsibility to the management application(s), possible to re-bind the
> device to the vfio-pci driver), prior to marking the migration as
> "successful" (thus committing to running it on the destination), we can't
> say as much for actually assigning the device. So if the assignment fails,
> then what happens?
> 
> 
> So a few issues I see that will need to be solved by [someone] (apparently
> either libvirt or management):
> 
> a) there isn't anything in libvirt's XML grammar that allows us to signify a
> device that is "present in the config but shouldn't be included in the
> commandline"
> 
> b) someone will need to replace the device from the source with an
> equivalent device on the destination in the libvirt XML. There are other
> cases of management modifying the XML during migration (I think), but this
> does point out that putting the "auto-unplug code into qemu isn't turning
> this into a trivial
> 
> c) there is nothing in libvirt's migration logic that can cause a device to
> be re-binded to vfio-pci prior to completion of a migration. Unless this is
> added to libvirt (or the re-bind operation is passed off to the management
> application), we will need to live with the possibility that hotplugging the
> device will fail due to failed re-bind *after* we've committed to the
> migration.
> 
> d) once the guest CPUs are restarted on the destination, [someone] (libvirt
> or management) needs to hotplug the new device on the destination. (I'm
> guessing that a hotplug can only be done while the guest CPUs are running;
> correct me if this is wrong!)
> 
> This sounds like a lot of complexity for something that was supposed to be
> handled completely/transparently by qemu :-P.



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-11 15:51   ` Michael S. Tsirkin
@ 2019-06-11 16:12     ` Laine Stump
  0 siblings, 0 replies; 77+ messages in thread
From: Laine Stump @ 2019-06-11 16:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pkrempa, berrange, ehabkost, aadam, qemu-devel, Jens Freimann, ailan

On 6/11/19 11:51 AM, Michael S. Tsirkin wrote:
> On Tue, Jun 11, 2019 at 11:42:54AM -0400, Laine Stump wrote:
>> On 5/17/19 8:58 AM, Jens Freimann wrote:
>>> This is another attempt at implementing the host side of the
>>> net_failover concept
>>> (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
>>>
>>> Changes since last RFC:
>>> - work around circular dependency of commandline options. Just add
>>>     failover=on to the virtio-net standby options and reference it from
>>>     primary (vfio-pci) device with standby=<id>
>>> - add patch 3/4 to allow migration of vfio-pci device when it is part of a
>>>     failover pair, still disallow for all other devices
>>> - add patch 4/4 to allow unplug of device during migrationm, make an
>>>     exception for failover primary devices. I'd like feedback on how to
>>>     solve this more elegant. I added a boolean to DeviceState, have it
>>>     default to false for all devices except for primary devices.
>>> - not tested yet with surprise removal
>>> - I don't expect this to go in as it is, still needs more testing but
>>>     I'd like to get feedback on above mentioned changes.
>>>
>>> The general idea is that we have a pair of devices, a vfio-pci and a
>>> emulated device. Before migration the vfio device is unplugged and data
>>> flows to the emulated device, on the target side another vfio-pci device
>>> is plugged in to take over the data-path. In the guest the net_failover
>>> module will pair net devices with the same MAC address.
>>>
>>> * In the first patch the infrastructure for hiding the device is added
>>>     for the qbus and qdev APIs.
>>>
>>> * In the second patch the virtio-net uses the API to defer adding the vfio
>>>     device until the VIRTIO_NET_F_STANDBY feature is acked.
>>>
>>> Previous discussion:
>>>     RFC v1 https://patchwork.ozlabs.org/cover/989098/
>>>     RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
>>>
>>> To summarize concerns/feedback from previous discussion:
>>> 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
>>>     Migration might get stuck for unpredictable time with unclear reason.
>>>     This approach combines two tricky things, hot/unplug and migration.
>>>     -> We can surprise-remove the PCI device and in QEMU we can do all
>>>        necessary rollbacks transparent to management software. Will it be
>>>        easy, probably not.
>>> 2. PCI devices are a precious ressource. The primary device should never
>>>     be added to QEMU if it won't be used by guest instead of hiding it in
>>>     QEMU.
>>>     -> We only hotplug the device when the standby feature bit was
>>>        negotiated. We save the device cmdline options until we need it for
>>>        qdev_device_add()
>>>        Hiding a device can be a useful concept to model. For example a
>>>        pci device in a powered-off slot could be marked as hidden until the slot is
>>>        powered on (mst).
>>> 3. Management layer software should handle this. Open Stack already has
>>>     components/code to handle unplug/replug VFIO devices and metadata to
>>>     provide to the guest for detecting which devices should be paired.
>>>     -> An approach that includes all software from firmware to
>>>        higher-level management software wasn't tried in the last years. This is
>>>        an attempt to keep it simple and contained in QEMU as much as possible.
>>> 4. Hotplugging a device and then making it part of a failover setup is
>>>      not possible
>>>     -> addressed by extending qdev hotplug functions to check for hidden
>>>        attribute, so e.g. device_add can be used to plug a device.
>>>
>>>
>>> I have tested this with a mlx5 NIC and was able to migrate the VM with
>>> above mentioned workarounds for open problems.
>>>
>>> Command line example:
>>>
>>> qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>>>           -machine q35,kernel-irqchip=split -cpu host   \
>>>           -k fr   \
>>>           -serial stdio   \
>>>           -net none \
>>>           -qmp unix:/tmp/qmp.socket,server,nowait \
>>>           -monitor telnet:127.0.0.1:5555,server,nowait \
>>>           -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>>>           -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>>>           -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>>>           -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>>>           -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
>>>           /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>>>
>>> Then the primary device can be hotplugged via
>>>    (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
>>
>>
>> I guess this is the commandline on the migration destination, and as far as
>> I understand from this example, on the destination we (meaning libvirt or
>> higher level management application) must *not* include the assigned device
>> on the qemu commandline, but must instead hotplug the device later after the
>> guest CPUs have been restarted on the destination.
>>
>> So if I'm understanding correctly, the idea is that on the migration source,
>> the device may have been hotplugged, or may have been included when qemu was
>> originally started. Then qemu automatically handles the unplug of the device
>> on the source, but it seems qemu does nothing on the destination, leaving
>> that up to libvirt or a higher layer to implement.
> 
> Good point. I don't see why it would not work just as well
> with device present straight away.

Will the guest get properly notified about the device if it had been 
unplugged (from the guest POV) prior to the migration (so the last thing 
the guest knows is that there is no device), but is suddenly/magically 
back in place from the instant the CPUs start on the destination? 
Doesn't there need to be some sort of notification sent from qemu to the 
guest OS to let it know that a "new" device has been plugged in? (I've 
always assumed this was the case, but it's really just guessing on my 
part :-)

It will certainly make things simpler if the device can be present in 
the qemu commandline. If that's the case, then I *think* only item (2) 
below will need solving.

> 
> Did I miss something?
> 
> I think Jens was just testing local machine migration
> and of course you can only assign a device to 1 VM at a time.


That's useful and convenient for a smoke test, but doesn't account for 
the (almost 100%) possibility of having a different device address 
(maybe even a different model of device) on source and destination, or 
for the need to unbind/rebind vfio-pci and the host net driver.


> 
>> Then in order for this to work, libvirt (or OpenStack or oVirt or whoever)
>> needs to understand that the device in the libvirt config (it will still be
>> in the libvirt config, since from libvirt's POV it hasn't been unplugged):
>>
>> 1) shouldn't be included in the qemu commandline on the destination,
>>
>> 2) will almost surely need to be replaced with a different device on the
>> destination (since it's almost certain that the destination won't have an
>> available device at the same PCI address)
>>
>> 3) will probably need to be unbinded from the VF net driver (does this need
>> to happen before migration is finished? If we want to lower the probability
>> of a failure after we're already committed to the migration, then I think we
>> must, but libvirt isn't set up for that in any way).
>>
>> 4) will need to be hotplugged after the migration has finished *and* after
>> the guest CPUs have been restarted on the destination.
>>
>>
>> While it will be possible to assure that there is a destination device, and
>> to replace the old device with new in the config (and maybe, either with
>> some major reworking of device assignment code, or offloading the
>> responsibility to the management application(s), possible to re-bind the
>> device to the vfio-pci driver), prior to marking the migration as
>> "successful" (thus committing to running it on the destination), we can't
>> say as much for actually assigning the device. So if the assignment fails,
>> then what happens?
>>
>>
>> So a few issues I see that will need to be solved by [someone] (apparently
>> either libvirt or management):
>>
>> a) there isn't anything in libvirt's XML grammar that allows us to signify a
>> device that is "present in the config but shouldn't be included in the
>> commandline"
>>
>> b) someone will need to replace the device from the source with an
>> equivalent device on the destination in the libvirt XML. There are other
>> cases of management modifying the XML during migration (I think), but this
>> does point out that putting the "auto-unplug code into qemu isn't turning
>> this into a trivial
>>
>> c) there is nothing in libvirt's migration logic that can cause a device to
>> be re-binded to vfio-pci prior to completion of a migration. Unless this is
>> added to libvirt (or the re-bind operation is passed off to the management
>> application), we will need to live with the possibility that hotplugging the
>> device will fail due to failed re-bind *after* we've committed to the
>> migration.
>>
>> d) once the guest CPUs are restarted on the destination, [someone] (libvirt
>> or management) needs to hotplug the new device on the destination. (I'm
>> guessing that a hotplug can only be done while the guest CPUs are running;
>> correct me if this is wrong!)
>>
>> This sounds like a lot of complexity for something that was supposed to be
>> handled completely/transparently by qemu :-P.




* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-11 15:42 ` Laine Stump
  2019-06-11 15:51   ` Michael S. Tsirkin
@ 2019-06-12  9:11   ` Daniel P. Berrangé
  2019-06-12 11:59     ` Jens Freimann
  1 sibling, 1 reply; 77+ messages in thread
From: Daniel P. Berrangé @ 2019-06-12  9:11 UTC (permalink / raw)
  To: Laine Stump
  Cc: pkrempa, ehabkost, mst, aadam, qemu-devel, Jens Freimann, ailan

On Tue, Jun 11, 2019 at 11:42:54AM -0400, Laine Stump wrote:
> On 5/17/19 8:58 AM, Jens Freimann wrote:
> > This is another attempt at implementing the host side of the
> > net_failover concept
> > (https://www.kernel.org/doc/html/latest/networking/net_failover.html)
> > 
> > Changes since last RFC:
> > - work around circular dependency of commandline options. Just add
> >    failover=on to the virtio-net standby options and reference it from
> >    primary (vfio-pci) device with standby=<id>
> > - add patch 3/4 to allow migration of vfio-pci device when it is part of a
> >    failover pair, still disallow for all other devices
> > - add patch 4/4 to allow unplug of device during migrationm, make an
> >    exception for failover primary devices. I'd like feedback on how to
> >    solve this more elegant. I added a boolean to DeviceState, have it
> >    default to false for all devices except for primary devices.
> > - not tested yet with surprise removal
> > - I don't expect this to go in as it is, still needs more testing but
> >    I'd like to get feedback on above mentioned changes.
> > 
> > The general idea is that we have a pair of devices, a vfio-pci and a
> > emulated device. Before migration the vfio device is unplugged and data
> > flows to the emulated device, on the target side another vfio-pci device
> > is plugged in to take over the data-path. In the guest the net_failover
> > module will pair net devices with the same MAC address.
> > 
> > * In the first patch the infrastructure for hiding the device is added
> >    for the qbus and qdev APIs.
> > 
> > * In the second patch the virtio-net uses the API to defer adding the vfio
> >    device until the VIRTIO_NET_F_STANDBY feature is acked.
> > 
> > Previous discussion:
> >    RFC v1 https://patchwork.ozlabs.org/cover/989098/
> >    RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
> > 
> > To summarize concerns/feedback from previous discussion:
> > 1.- guest OS can reject or worse _delay_ unplug by any amount of time.
> >    Migration might get stuck for unpredictable time with unclear reason.
> >    This approach combines two tricky things, hot/unplug and migration.
> >    -> We can surprise-remove the PCI device and in QEMU we can do all
> >       necessary rollbacks transparent to management software. Will it be
> >       easy, probably not.
> > 2. PCI devices are a precious ressource. The primary device should never
> >    be added to QEMU if it won't be used by guest instead of hiding it in
> >    QEMU.
> >    -> We only hotplug the device when the standby feature bit was
> >       negotiated. We save the device cmdline options until we need it for
> >       qdev_device_add()
> >       Hiding a device can be a useful concept to model. For example a
> >       pci device in a powered-off slot could be marked as hidden until the slot is
> >       powered on (mst).
> > 3. Management layer software should handle this. Open Stack already has
> >    components/code to handle unplug/replug VFIO devices and metadata to
> >    provide to the guest for detecting which devices should be paired.
> >    -> An approach that includes all software from firmware to
> >       higher-level management software wasn't tried in the last years. This is
> >       an attempt to keep it simple and contained in QEMU as much as possible.
> > 4. Hotplugging a device and then making it part of a failover setup is
> >     not possible
> >    -> addressed by extending qdev hotplug functions to check for hidden
> >       attribute, so e.g. device_add can be used to plug a device.
> > 
> > 
> > I have tested this with a mlx5 NIC and was able to migrate the VM with
> > above mentioned workarounds for open problems.
> > 
> > Command line example:
> > 
> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
> >          -machine q35,kernel-irqchip=split -cpu host   \
> >          -k fr   \
> >          -serial stdio   \
> >          -net none \
> >          -qmp unix:/tmp/qmp.socket,server,nowait \
> >          -monitor telnet:127.0.0.1:5555,server,nowait \
> >          -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
> >          -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
> >          -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
> >          -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
> >          -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
> >          /root/rhel-guest-image-8.0-1781.x86_64.qcow2
> > 
> > Then the primary device can be hotplugged via
> >   (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
> 
> 
> I guess this is the commandline on the migration destination, and as far as
> I understand from this example, on the destination we (meaning libvirt or
> higher level management application) must *not* include the assigned device
> on the qemu commandline, but must instead hotplug the device later after the
> guest CPUs have been restarted on the destination.
> 
> So if I'm understanding correctly, the idea is that on the migration source,
> the device may have been hotplugged, or may have been included when qemu was
> originally started. Then qemu automatically handles the unplug of the device
> on the source, but it seems qemu does nothing on the destination, leaving
> that up to libvirt or a higher layer to implement.
> 
> Then in order for this to work, libvirt (or OpenStack or oVirt or whoever)
> needs to understand that the device in the libvirt config (it will still be
> in the libvirt config, since from libvirt's POV it hasn't been unplugged):
> 
> 1) shouldn't be included in the qemu commandline on the destination,

I don't believe that's the case.  The CLI args above are just illustrating
that it is now possible to *optionally* not specify the VFIO device on the
CLI. This is because previous versions of the patchset *always* required
the device on the CLI due to a circular dependency in the CLI syntax. This
patch series version fixed that limitation, so now the VFIO device can be
cold plugged or hotplugged as desired.

> 2) will almost surely need to be replaced with a different device on the
> destination (since it's almost certain that the destination won't have an
> available device at the same PCI address)

Yes, the management application that triggers the migration will need to
pass in a new XML document to libvirt when starting the migration so that
a suitable new device is used on the target host.

> 3) will probably need to be unbinded from the VF net driver (does this need
> to happen before migration is finished? If we want to lower the probability
> of a failure after we're already committed to the migration, then I think we
> must, but libvirt isn't set up for that in any way).
> 
> 4) will need to be hotplugged after the migration has finished *and* after
> the guest CPUs have been restarted on the destination.

My understanding is that QEMU takes care of this.

> a) there isn't anything in libvirt's XML grammar that allows us to signify a
> device that is "present in the config but shouldn't be included in the
> commandline"

I don't think we need that.

> b) someone will need to replace the device from the source with an
> equivalent device on the destination in the libvirt XML. There are other
> cases of management modifying the XML during migration (I think), but this
> does point out that putting the "auto-unplug code into qemu isn't turning
> this into a trivial

The mgmt app should pass the new device details in the XML when starting
migration. Shouldn't be a big deal as OpenStack already does that for 
quite a few other parts of the config.

> c) there is nothing in libvirt's migration logic that can cause a device to
> be re-binded to vfio-pci prior to completion of a migration. Unless this is
> added to libvirt (or the re-bind operation is passed off to the management
> application), we will need to live with the possibility that hotplugging the
> device will fail due to failed re-bind *after* we've committed to the
> migration.

IIUC, we should be binding to vfio-pci during the prepare phase of the
migration, since that's when QEMU is started by libvirt on the target.

> d) once the guest CPUs are restarted on the destination, [someone] (libvirt
> or management) needs to hotplug the new device on the destination. (I'm
> guessing that a hotplug can only be done while the guest CPUs are running;
> correct me if this is wrong!)

I don't believe so, since we'll be able to cold plug it during prepare
phase.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-12  9:11   ` Daniel P. Berrangé
@ 2019-06-12 11:59     ` Jens Freimann
  2019-06-12 15:54       ` Laine Stump
  0 siblings, 1 reply; 77+ messages in thread
From: Jens Freimann @ 2019-06-12 11:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: pkrempa, ehabkost, mst, aadam, qemu-devel, Laine Stump, ailan

On Wed, Jun 12, 2019 at 11:11:23AM +0200, Daniel P. Berrangé wrote:
>On Tue, Jun 11, 2019 at 11:42:54AM -0400, Laine Stump wrote:
>> On 5/17/19 8:58 AM, Jens Freimann wrote:
> >
>> > Command line example:
>> >
>> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>> >          -machine q35,kernel-irqchip=split -cpu host   \
>> >          -k fr   \
>> >          -serial stdio   \
>> >          -net none \
>> >          -qmp unix:/tmp/qmp.socket,server,nowait \
>> >          -monitor telnet:127.0.0.1:5555,server,nowait \
>> >          -device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>> >          -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>> >          -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>> >          -netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>> >          -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
>> >          /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>> >
>> > Then the primary device can be hotplugged via
>> >   (qemu) device_add vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
>>
>>
>> I guess this is the commandline on the migration destination, and as far as
>> I understand from this example, on the destination we (meaning libvirt or
>> higher level management application) must *not* include the assigned device
>> on the qemu commandline, but must instead hotplug the device later after the
>> guest CPUs have been restarted on the destination.
>>
>> So if I'm understanding correctly, the idea is that on the migration source,
>> the device may have been hotplugged, or may have been included when qemu was
>> originally started. Then qemu automatically handles the unplug of the device
>> on the source, but it seems qemu does nothing on the destination, leaving
>> that up to libvirt or a higher layer to implement.
>>
>> Then in order for this to work, libvirt (or OpenStack or oVirt or whoever)
>> needs to understand that the device in the libvirt config (it will still be
>> in the libvirt config, since from libvirt's POV it hasn't been unplugged):
>>
>> 1) shouldn't be included in the qemu commandline on the destination,
>
>I don't believe that's the case.  The CLI args above are just illustrating
>that it is now possible to *optionally* not specify the VFIO device on the
>CLI. This is because previous versions of the patchset *always* required
>the device on the CLI due to a circular dependancy in the CLI syntax. This
>patch series version fixed that limitation, so now the VFIO device can be
>cold plugged or hotplugged as desired.

I've mostly tested hotplugging but cold plugging should work as well.

>> 2) will almost surely need to be replaced with a different device on the
>> destination (since it's almost certain that the destination won't have an
>> available device at the same PCI address)
>
>Yes, the management application that triggers the migration will need to
>pass in a new XML document to libvirt when starting the migration so that
>we use the suitable new device on the target host.

Yes, that's how I expected it to work. In my tests the PCI address was
the same on the destination and source host, but that was more by accident.
I think the libvirt XML on the destination just needs to have the PCI
address of a NIC of the same type for it to work.

>> 3) will probably need to be unbinded from the VF net driver (does this need
>> to happen before migration is finished? If we want to lower the probability
>> of a failure after we're already committed to the migration, then I think we
>> must, but libvirt isn't set up for that in any way).

Yes, so I think that's part of the 'partial' unplug I'm trying to
figure out at the moment.

>> 4) will need to be hotplugged after the migration has finished *and* after
>> the guest CPUs have been restarted on the destination.
>
>My understanding is that QEMU takes care of this.

So the re-plugging of the device on the destination is not in the v1
of the patches, which I failed to mention, my bad. I will send out a v2
that has this part as well shortly. I added a runstate change handler
that is called on the destination when the run state changes from INMIGRATE
to something else. When the new state is RUNNING I hotplug the primary device.
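
A rough sketch of what such a handler could look like (hypothetical
names, not the actual v2 patch; virtio_net_replug_primary() and the
fields on VirtIONet stand in for the deferred qdev_device_add() of the
saved primary device options):

    static void virtio_net_failover_state_change(void *opaque, int running,
                                                 RunState state)
    {
        VirtIONet *n = opaque;

        /* we only care about the transition out of inmigrate into a
         * running guest on the destination */
        if (running && n->primary_dev_pending) {   /* hypothetical flag */
            Error *err = NULL;

            virtio_net_replug_primary(n, &err);    /* wraps qdev_device_add() */
            if (err) {
                warn_report_err(err);
            }
        }
    }

    /* registered from virtio-net realize when failover=on: */
    n->vmstate = qemu_add_vm_change_state_handler(
                     virtio_net_failover_state_change, n);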

>> a) there isn't anything in libvirt's XML grammar that allows us to signify a
>> device that is "present in the config but shouldn't be included in the
>> commandline"
>
>I don't thin we need that.
>
>> b) someone will need to replace the device from the source with an
>> equivalent device on the destination in the libvirt XML. There are other
>> cases of management modifying the XML during migration (I think), but this
>> does point out that putting the "auto-unplug code into qemu isn't turning
>> this into a trivial
>
>The mgmt app should pass the new device details in the XML when starting
>migration. Shouldn't be a big deal as OpenStack already does that for
>quite a few other parts of the config.
>
>> c) there is nothing in libvirt's migration logic that can cause a device to
>> be re-binded to vfio-pci prior to completion of a migration. Unless this is
>> added to libvirt (or the re-bind operation is passed off to the management
>> application), we will need to live with the possibility that hotplugging the
>> device will fail due to failed re-bind *after* we've committed to the
>> migration.
>
>IIUC, we should be binding to vfio-pci during the prepare phase of the
>migration, since that's when QEMU is started by libvirt on the target.
>
>> d) once the guest CPUs are restarted on the destination, [someone] (libvirt
>> or management) needs to hotplug the new device on the destination. (I'm
>> guessing that a hotplug can only be done while the guest CPUs are running;
>> correct me if this is wrong!)
>
>I don't believe so, since we'll be able to cold plug it during prepare
>phase.

I don't think I understand what happens during the prepare phase on
the destination; I need to look into that. But I think there was an error
in my logic that QEMU needs to plug the device on the destination
side. You're saying we could just always cold plug it directly when
the VM is started. I think an exception would be when the guest was
migrated before we added the primary device on the source, i.e. before
virtio feature negotiation.

regards,
Jens 



* Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices
  2019-06-12 11:59     ` Jens Freimann
@ 2019-06-12 15:54       ` Laine Stump
  0 siblings, 0 replies; 77+ messages in thread
From: Laine Stump @ 2019-06-12 15:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: pkrempa, Daniel P. Berrangé,
	ehabkost, mst, aadam, Jens Freimann, ailan

On 6/12/19 7:59 AM, Jens Freimann wrote:
> On Wed, Jun 12, 2019 at 11:11:23AM +0200, Daniel P. Berrangé wrote:
>> On Tue, Jun 11, 2019 at 11:42:54AM -0400, Laine Stump wrote:
>>> On 5/17/19 8:58 AM, Jens Freimann wrote:
>> >
>>> > Command line example:
>>> >
>>> > qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
>>> >          -machine q35,kernel-irqchip=split -cpu host   \
>>> >          -k fr   \
>>> >          -serial stdio   \
>>> >          -net none \
>>> >          -qmp unix:/tmp/qmp.socket,server,nowait \
>>> >          -monitor telnet:127.0.0.1:5555,server,nowait \
>>> >          -device 
>>> pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
>>> >          -device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
>>> >          -device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
>>> >          -netdev 
>>> tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
>>> >          -device 
>>> virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on 
>>> \
>>> >          /root/rhel-guest-image-8.0-1781.x86_64.qcow2
>>> >
>>> > Then the primary device can be hotplugged via
>>> >   (qemu) device_add 
>>> vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,standby=net1
>>>
>>>
>>> I guess this is the commandline on the migration destination, and as 
>>> far as
>>> I understand from this example, on the destination we (meaning 
>>> libvirt or
>>> higher level management application) must *not* include the assigned 
>>> device
>>> on the qemu commandline, but must instead hotplug the device later 
>>> after the
>>> guest CPUs have been restarted on the destination.
>>>
>>> So if I'm understanding correctly, the idea is that on the migration 
>>> source,
>>> the device may have been hotplugged, or may have been included when 
>>> qemu was
>>> originally started. Then qemu automatically handles the unplug of the 
>>> device
>>> on the source, but it seems qemu does nothing on the destination, 
>>> leaving
>>> that up to libvirt or a higher layer to implement.
>>>
>>> Then in order for this to work, libvirt (or OpenStack or oVirt or 
>>> whoever)
>>> needs to understand that the device in the libvirt config (it will 
>>> still be
>>> in the libvirt config, since from libvirt's POV it hasn't been 
>>> unplugged):
>>>
>>> 1) shouldn't be included in the qemu commandline on the destination,
>>
>> I don't believe that's the case.  The CLI args above are just 
>> illustrating
>> that it is now possible to *optionally* not specify the VFIO device on 
>> the
>> CLI. This is because previous versions of the patchset *always* required
>> the device on the CLI due to a circular dependancy in the CLI syntax. 
>> This
>> patch series version fixed that limitation, so now the VFIO device can be
>> cold plugged or hotplugged as desired.
> 
> I've mostly tested hotplugging but cold plugged should work as well.


Okay, in that case my issues 1, 3, and 4 are irrelevant (and (2) is 
handled by management), so the concerns from my previous email are all 
addressed.



Thread overview: 77+ messages
2019-05-17 12:58 [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Jens Freimann
2019-05-17 12:58 ` [Qemu-devel] [PATCH 1/4] migration: allow unplug during migration for failover devices Jens Freimann
2019-05-21  9:33   ` Dr. David Alan Gilbert
2019-05-21  9:47     ` Daniel P. Berrangé
2019-05-23  8:01     ` Jens Freimann
2019-05-23 15:37       ` Dr. David Alan Gilbert
2019-05-17 12:58 ` [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support Jens Freimann
2019-05-21 11:33   ` Michael S. Tsirkin
2019-05-17 12:58 ` [Qemu-devel] [PATCH 3/4] net/virtio: add failover support Jens Freimann
2019-05-21  9:45   ` Dr. David Alan Gilbert
2019-05-30 14:56     ` Jens Freimann
2019-05-30 17:46       ` Michael S. Tsirkin
2019-05-30 18:00         ` Dr. David Alan Gilbert
2019-05-30 18:09           ` Michael S. Tsirkin
2019-05-30 18:22             ` Eduardo Habkost
2019-05-30 23:06               ` Michael S. Tsirkin
2019-05-31 17:01                 ` Eduardo Habkost
2019-05-31 18:04                   ` Michael S. Tsirkin
2019-05-31 18:42                     ` Eduardo Habkost
2019-05-31 18:45                     ` Dr. David Alan Gilbert
2019-05-31 20:29                       ` Alex Williamson
2019-05-31 21:05                         ` Michael S. Tsirkin
2019-05-31 21:59                           ` Eduardo Habkost
2019-06-03  8:59                         ` Dr. David Alan Gilbert
2019-05-31 20:43                       ` Michael S. Tsirkin
2019-05-31 21:03                         ` Eduardo Habkost
2019-06-03  8:06                         ` Dr. David Alan Gilbert
2019-05-30 19:08             ` Dr. David Alan Gilbert
2019-05-30 19:21               ` Michael S. Tsirkin
2019-05-31  8:23                 ` Dr. David Alan Gilbert
2019-06-05 15:23             ` Daniel P. Berrangé
2019-05-30 18:17           ` Eduardo Habkost
2019-05-30 19:09       ` Dr. David Alan Gilbert
2019-05-31 21:47       ` Eduardo Habkost
2019-06-03  8:24         ` Jens Freimann
2019-06-03  9:26           ` Jens Freimann
2019-06-03 18:10           ` Laine Stump
2019-06-03 18:46             ` Alex Williamson
2019-06-05 15:20               ` Daniel P. Berrangé
2019-06-06 15:00               ` Roman Kagan
2019-06-03 19:36           ` Eduardo Habkost
2019-06-04 13:43             ` Jens Freimann
2019-06-04 14:09               ` Eduardo Habkost
2019-06-04 17:06               ` Michael S. Tsirkin
2019-06-04 19:00                 ` Dr. David Alan Gilbert
2019-06-07 14:14                   ` Jens Freimann
2019-06-07 14:32                     ` Michael S. Tsirkin
2019-06-07 17:51                     ` Dr. David Alan Gilbert
2019-06-05 14:36               ` Daniel P. Berrangé
2019-06-05 16:04               ` Laine Stump
2019-06-05 16:19                 ` Daniel P. Berrangé
2019-05-17 12:58 ` [Qemu-devel] [PATCH 4/4] vfio/pci: unplug failover primary device before migration Jens Freimann
2019-05-20 22:56 ` [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices Alex Williamson
2019-05-21  7:21   ` Jens Freimann
2019-05-21 11:37     ` Michael S. Tsirkin
2019-05-21 18:49       ` Jens Freimann
2019-05-29  0:14         ` si-wei liu
2019-05-29  2:54           ` Michael S. Tsirkin
2019-06-03 18:06             ` Laine Stump
2019-06-03 18:12               ` Michael S. Tsirkin
2019-06-03 18:18                 ` Laine Stump
2019-06-06 21:49                   ` Michael S. Tsirkin
2019-05-29  2:40         ` Michael S. Tsirkin
2019-05-29  7:48           ` Jens Freimann
2019-05-30 18:12             ` Michael S. Tsirkin
2019-05-31 15:12               ` Jens Freimann
2019-05-21 14:18     ` Alex Williamson
2019-05-21  8:37 ` Daniel P. Berrangé
2019-05-21 10:10 ` Michael S. Tsirkin
2019-05-21 19:17   ` Jens Freimann
2019-05-21 21:43     ` Michael S. Tsirkin
2019-06-11 15:42 ` Laine Stump
2019-06-11 15:51   ` Michael S. Tsirkin
2019-06-11 16:12     ` Laine Stump
2019-06-12  9:11   ` Daniel P. Berrangé
2019-06-12 11:59     ` Jens Freimann
2019-06-12 15:54       ` Laine Stump
