[Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration

* [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration
@ 2019-01-07 22:29 ` Venu Busireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This series is based on the current VIRTIO_NET_F_STANDBY feature
   bit and MAC address device pairing.

2. The events added by this series (FAILOVER_STANDBY_CHANGED and
   FAILOVER_PRIMARY_CHANGED) are meant to be consumed by userspace
   management software to orchestrate all the hot plug and datapath
   switching activities. This scheme requires the fewest QEMU
   modifications while allowing userspace software to build its own
   intelligence to control the whole process of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
   being explored for automatic hot plugging (QEMU) and automatic datapath
   switching (host-kernel), this series provides a supplemental set
   of interfaces if management software wants to drive the SR-IOV live
   migration on its own. It should not conflict with the hidden device
   model; it only offers a simpler implementation.
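
As an illustration of item 2, here is a minimal sketch of how a management
tool might react to FAILOVER_STANDBY_CHANGED. It is not part of this series:
the mgmt_* names and the plug/unplug policy are assumptions made up for the
example, and a real tool would also have to handle migration state and
errors.

```c
#include <stdbool.h>

/* Hypothetical actions a management tool might take; not part of QEMU. */
enum mgmt_action {
    MGMT_NONE,
    MGMT_PLUG_VF,    /* hot plug the VF (the failover primary) */
    MGMT_UNPLUG_VF,  /* hot unplug the VF */
};

/* Hypothetical per-guest state tracked by the management tool. */
struct mgmt_state {
    bool standby_enabled;  /* last "enabled" value seen in the event */
    bool vf_plugged;       /* whether the tool believes the VF is attached */
};

/*
 * Decide what to do when a FAILOVER_STANDBY_CHANGED event arrives.
 * enabled=true means the guest virtio_net driver is up with F_STANDBY
 * negotiated, so the VF can be plugged; enabled=false means the driver
 * went away, so the VF should be removed.
 */
enum mgmt_action mgmt_on_standby_changed(struct mgmt_state *st, bool enabled)
{
    st->standby_enabled = enabled;

    if (enabled && !st->vf_plugged) {
        st->vf_plugged = true;
        return MGMT_PLUG_VF;
    }
    if (!enabled && st->vf_plugged) {
        st->vf_plugged = false;
        return MGMT_UNPLUG_VF;
    }
    return MGMT_NONE;  /* repeated event, nothing to change */
}
```

Keeping the decision in one pure function makes it easy to replay recorded
events against it; the actual device_add/device_del calls would live in the
caller.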

Si-Wei Liu (2):
  vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  pci: query command extension to check the bus master enabling status of the failover-primary device

Sridhar Samudrala (1):
  virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (2):
  virtio_net: Add support for "Data Path Switching" during Live Migration.
  virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

---
Changes in v3:
  Fix issues with coding style in patch 3/5.

Changes in v2:
  Added a query command for FAILOVER_STANDBY_CHANGED event.
  Added a query command for FAILOVER_PRIMARY_CHANGED event.

 hmp.c                          |   5 +++
 hw/acpi/pcihp.c                |  27 +++++++++++
 hw/net/virtio-net.c            |  42 +++++++++++++++++
 hw/pci/pci.c                   |   5 +++
 hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
 hw/vfio/pci.h                  |   1 +
 include/hw/pci/pci.h           |   1 +
 include/hw/virtio/virtio-net.h |   1 +
 include/net/net.h              |   2 +
 net/net.c                      |  61 +++++++++++++++++++++++++
 qapi/misc.json                 |   5 ++-
 qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
 12 files changed, 309 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
@ 2019-01-07 22:29   ` Venu Busireddy
  -1 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Sridhar Samudrala, virtio-dev, qemu-devel

From: Sridhar Samudrala <sridhar.samudrala@intel.com>

This feature bit can be used by a hypervisor to indicate to the virtio_net
device that it can act as a standby for another device with the same MAC
address.
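
For readers unfamiliar with virtio feature bits, the following stand-alone
sketch shows the bit arithmetic behind a 64-bit feature property. The
sketch_* names are hypothetical; the bit number 62 is the value of
VIRTIO_NET_F_STANDBY in the Linux UAPI header and is repeated here only so
the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number of VIRTIO_NET_F_STANDBY; redefined so the sketch stands alone. */
#define SKETCH_VIRTIO_NET_F_STANDBY 62

/* Features the host offers: the standby bit is set only if the
 * (hypothetical) "standby" property is switched on. */
uint64_t sketch_host_features(uint64_t base, bool standby_prop)
{
    if (standby_prop) {
        base |= UINT64_C(1) << SKETCH_VIRTIO_NET_F_STANDBY;
    }
    return base;
}

/* A feature is negotiated only if the host offered it and the guest
 * driver acknowledged it. */
bool sketch_standby_negotiated(uint64_t host_features, uint64_t guest_acked)
{
    uint64_t negotiated = host_features & guest_acked;

    return (negotiated & (UINT64_C(1) << SKETCH_VIRTIO_NET_F_STANDBY)) != 0;
}
```

Because the property defaults to false, a guest only sees the bit when the
device is created with standby=on.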

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/net/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 385b1a0..411f8fb 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
+                      false),
     DEFINE_PROP_END_OF_LIST(),
 };
 

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
@ 2019-01-07 22:29   ` Venu Busireddy
  -1 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

Add a new event, FAILOVER_STANDBY_CHANGED, which is emitted whenever
the status of the virtio_net driver in the guest changes (either the
guest successfully loads the driver after the F_STANDBY feature bit
is negotiated, or the guest unloads the driver or reboots). The
management stack can use this event to determine when to plug the VF
device into the guest or unplug it from the guest.
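
The event is edge-triggered on the DRIVER_OK bit of the device status. A
minimal stand-alone sketch of that edge detection follows; the sketch_*
names are hypothetical, and the value 4 is the DRIVER_OK status bit from
the virtio specification, repeated so the example compiles on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of VIRTIO_CONFIG_S_DRIVER_OK in the virtio specification. */
#define SKETCH_S_DRIVER_OK 4

enum sketch_standby_event {
    SKETCH_EVENT_NONE,      /* DRIVER_OK did not change: no event */
    SKETCH_EVENT_ENABLED,   /* would be reported as enabled=true */
    SKETCH_EVENT_DISABLED,  /* would be reported as enabled=false */
};

/* Compare the old and new device status and report which event, if any,
 * the transition corresponds to. Only the DRIVER_OK bit matters. */
enum sketch_standby_event sketch_standby_transition(uint8_t old_status,
                                                    uint8_t new_status)
{
    bool was_ok = (old_status & SKETCH_S_DRIVER_OK) != 0;
    bool is_ok = (new_status & SKETCH_S_DRIVER_OK) != 0;

    if (!was_ok && is_ok) {
        return SKETCH_EVENT_ENABLED;
    }
    if (was_ok && !is_ok) {
        return SKETCH_EVENT_DISABLED;
    }
    return SKETCH_EVENT_NONE;
}
```

Status writes that leave DRIVER_OK unchanged produce no event, so a
listener sees at most one event per driver load or unload.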

Also, the Virtual Functions will be automatically removed from the guest
if the guest is rebooted. To properly identify the VFIO devices that
must be removed, a new property named "failover-primary" is added to
the vfio-pci devices. Only the vfio-pci devices that have this property
enabled are removed from the guest upon reboot.
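
The reboot-time removal amounts to setting one bit per slot in a 32-bit
"down" bitmap for every device flagged as failover primary. A simplified
sketch, using a plain array instead of QEMU's bus child list (the sketch_*
names and the struct are assumptions for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PCI device on a hotplug bus. */
struct sketch_dev {
    uint8_t devfn;          /* slot in bits 7..3, function in bits 2..0 */
    bool failover_primary;  /* mirrors the "failover-primary" property */
};

/* Same arithmetic as the PCI_SLOT() macro: extract the slot number. */
unsigned int sketch_pci_slot(uint8_t devfn)
{
    return (devfn >> 3) & 0x1f;
}

/* Return the "down" bitmap with the slot bit set for every device that
 * is marked failover primary; other bits are left untouched. */
uint32_t sketch_mark_failover_primary_down(uint32_t down,
                                           const struct sketch_dev *devs,
                                           size_t ndevs)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].failover_primary) {
            down |= UINT32_C(1) << sketch_pci_slot(devs[i].devfn);
        }
    }
    return down;
}
```

Devices without the property keep their slot bit clear and therefore
survive the reboot.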

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/acpi/pcihp.c      | 27 +++++++++++++++++++++++++++
 hw/net/virtio-net.c  | 24 ++++++++++++++++++++++++
 hw/vfio/pci.c        |  3 +++
 hw/vfio/pci.h        |  1 +
 include/hw/pci/pci.h |  1 +
 qapi/net.json        | 28 ++++++++++++++++++++++++++++
 6 files changed, 84 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e1..2a3ffd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
     }
 }
 
+static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
+{
+    BusChild *kid, *next;
+    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
+
+    if (!bus) {
+        return;
+    }
+    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
+        DeviceState *qdev = kid->child;
+        PCIDevice *pdev = PCI_DEVICE(qdev);
+        int slot = PCI_SLOT(pdev->devfn);
+
+        if (pdev->failover_primary) {
+            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
+        }
+    }
+}
+
 static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
 {
     BusChild *kid, *next;
@@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
     int i;
 
     for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
+        /*
+         * Set the acpi_pcihp_pci_status[].down bits of all the
+         * failover_primary devices so that the devices are ejected
+         * from the guest. We can use neither qdev_unplug() nor the
+         * hotplug_handler to unplug the devices, because the guest
+         * may not be in a state to cooperate.
+         */
+        acpi_pcihp_cleanup_failover_primary(s, i);
         acpi_pcihp_update_hotplug_bus(s, i);
     }
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 411f8fb..7b1bcde 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -248,6 +248,29 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+    if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
+        const char *ncn = n->netclient_name;
+        gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+        /*
+         * Emit FAILOVER_STANDBY_CHANGED event with enabled=true
+         *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
+         * Emit FAILOVER_STANDBY_CHANGED event with enabled=false
+         *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
+         */
+        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+                (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+            qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
+        } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
+                (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+            qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
+        }
+    }
+}
+
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
@@ -256,6 +279,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
     uint8_t queue_status;
 
     virtio_net_vnet_endian_status(n, status);
+    virtio_net_failover_notify_event(n, status);
     virtio_net_vhost_status(n, status);
 
     for (i = 0; i < n->max_queues; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bd96..bd83b58 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_err_notifier(vdev);
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
+    pdev->failover_primary = vdev->failover_primary;
 
     return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
                                    qdev_prop_nv_gpudirect_clique, uint8_t),
     DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
                                 OFF_AUTOPCIBAR_OFF),
+    DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
+                     false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b1ae4c0..06ca661 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -167,6 +167,7 @@ typedef struct VFIOPCIDevice {
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
     VFIODisplay *dpy;
+    bool failover_primary;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e6514bb..b0111d1 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -351,6 +351,7 @@ struct PCIDevice {
     MSIVectorUseNotifier msix_vector_use_notifier;
     MSIVectorReleaseNotifier msix_vector_release_notifier;
     MSIVectorPollNotifier msix_vector_poll_notifier;
+    bool failover_primary;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/qapi/net.json b/qapi/net.json
index 8f99fd9..6a6d6fe 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -683,3 +683,31 @@
 ##
 { 'event': 'NIC_RX_FILTER_CHANGED',
   'data': { '*name': 'str', 'path': 'str' } }
+
+##
+# @FAILOVER_STANDBY_CHANGED:
+#
+# Emitted whenever the virtio_net driver status changes (either the guest
+# successfully loads the driver after the F_STANDBY feature bit is negotiated,
+# or the guest unloads the driver or reboots).
+#
+# @device: Indicates the virtio_net device.
+#
+# @path: Indicates the device path.
+#
+# @enabled: true if the virtio_net driver is loaded.
+#           false if the virtio_net driver is unloaded or the guest reboots.
+#
+# Since: 4.0
+#
+# Example:
+#
+# <- { "event": "FAILOVER_STANDBY_CHANGED",
+#      "data": { "device": "net0",
+#                "path": "/machine/peripheral/net0/virtio-backend",
+#                "enabled": true },
+#      "timestamp": { "seconds": 1432121972, "microseconds": 744001 } }
+#
+##
+{ 'event': 'FAILOVER_STANDBY_CHANGED',
+  'data': {'*device': 'str', 'path': 'str', 'enabled': 'bool'} }

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
  2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
@ 2019-01-07 22:29   ` Venu Busireddy
  -1 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

Add a query command, query-standby-status, to check the standby state
of the virtio_net devices, i.e. the state last reported by the
FAILOVER_STANDBY_CHANGED event.
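
The query walks all net clients, keeps only queue 0 of each NIC, optionally
filters by name, and appends the results to a singly linked list through a
tail pointer. A simplified stand-alone sketch of that pattern (the sketch_*
types are assumptions; unlike the real command it reports no error for a
named client that is not a NIC and simply returns NULL when nothing
matches):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a net client. */
struct sketch_nic {
    const char *name;
    bool is_nic;           /* false for backends such as hubs or taps */
    int queue_index;       /* only queue 0 carries per-NIC state */
    bool standby_enabled;
};

struct sketch_entry {
    const char *device;
    bool enabled;
    struct sketch_entry *next;
};

void sketch_free_list(struct sketch_entry *e)
{
    while (e) {
        struct sketch_entry *next = e->next;
        free(e);
        e = next;
    }
}

/* Build the result list; device == NULL means "all NICs". */
struct sketch_entry *sketch_query_standby(const struct sketch_nic *nics,
                                          size_t n, const char *device)
{
    struct sketch_entry *head = NULL, *tail = NULL;

    for (size_t i = 0; i < n; i++) {
        if (device && strcmp(nics[i].name, device) != 0) {
            continue;  /* name filter */
        }
        if (!nics[i].is_nic || nics[i].queue_index != 0) {
            continue;  /* state is per NIC, not per queue */
        }
        struct sketch_entry *e = calloc(1, sizeof(*e));
        if (!e) {
            sketch_free_list(head);
            return NULL;
        }
        e->device = nics[i].name;
        e->enabled = nics[i].standby_enabled;
        if (!head) {
            head = e;
        } else {
            tail->next = e;
        }
        tail = e;  /* append at the tail to preserve client order */
        if (device) {
            break;  /* a named query yields at most one entry */
        }
    }
    return head;
}
```

Appending through the tail pointer keeps the entries in the same order as
the client list without a second pass to reverse it.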

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/net/virtio-net.c            | 16 +++++++++++
 include/hw/virtio/virtio-net.h |  1 +
 include/net/net.h              |  2 ++
 net/net.c                      | 61 ++++++++++++++++++++++++++++++++++++++++++
 qapi/net.json                  | 46 +++++++++++++++++++++++++++++++
 5 files changed, 126 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7b1bcde..a4e07ac 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
          */
         if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
                 (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+            n->standby_enabled = true;
             qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
         } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
                 (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+            n->standby_enabled = false;
             qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
         }
     }
@@ -448,6 +450,19 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
     return info;
 }
 
+static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
+{
+    StandbyStatusInfo *info;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
+
+    info = g_malloc0(sizeof(*info));
+    info->device = g_strdup(n->netclient_name);
+    info->path = object_get_canonical_path(OBJECT(n->qdev));
+    info->enabled = n->standby_enabled;
+
+    return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
@@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
     .receive = virtio_net_receive,
     .link_status_changed = virtio_net_set_link_status,
     .query_rx_filter = virtio_net_query_rxfilter,
+    .query_standby_status = virtio_net_query_standby_status,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..9071e96 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -103,6 +103,7 @@ typedef struct VirtIONet {
     int announce_counter;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    bool standby_enabled;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/net/net.h b/include/net/net.h
index ec13702..61e8513 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
     NetCleanup *cleanup;
     LinkStatusChanged *link_status_changed;
     QueryRxFilter *query_rx_filter;
+    QueryStandbyStatus *query_standby_status;
     NetPoll *poll;
     HasUfo *has_ufo;
     HasVnetHdr *has_vnet_hdr;
diff --git a/net/net.c b/net/net.c
index 1f7d626..fbf288e 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1320,6 +1320,67 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
     return filter_list;
 }
 
+StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
+                                                const char *device,
+                                                Error **errp)
+{
+    NetClientState *nc;
+    StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        StandbyStatusInfoList *entry;
+        StandbyStatusInfo *info;
+
+        if (has_device && strcmp(nc->name, device) != 0) {
+            continue;
+        }
+
+        /* only query standby status information of NIC */
+        if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
+            if (has_device) {
+                error_setg(errp, "net client(%s) isn't a NIC", device);
+                return NULL;
+            }
+            continue;
+        }
+
+        /*
+         * only query information on queue 0 since the info is per nic,
+         * not per queue.
+         */
+        if (nc->queue_index != 0) {
+            continue;
+        }
+
+        if (nc->info->query_standby_status) {
+            info = nc->info->query_standby_status(nc);
+            entry = g_malloc0(sizeof(*entry));
+            entry->value = info;
+
+            if (!status_list) {
+                status_list = entry;
+            } else {
+                last_entry->next = entry;
+            }
+            last_entry = entry;
+        } else if (has_device) {
+            error_setg(errp, "net client(%s) doesn't support"
+                       " standby status querying", device);
+            return NULL;
+        }
+
+        if (has_device) {
+            break;
+        }
+    }
+
+    if (status_list == NULL && has_device) {
+        error_setg(errp, "invalid net client name: %s", device);
+    }
+
+    return status_list;
+}
+
 void hmp_info_network(Monitor *mon, const QDict *qdict)
 {
     NetClientState *nc, *peer;
diff --git a/qapi/net.json b/qapi/net.json
index 6a6d6fe..633ac87 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -711,3 +711,49 @@
 ##
 { 'event': 'FAILOVER_STANDBY_CHANGED',
   'data': {'*device': 'str', 'path': 'str', 'enabled': 'bool'} }
+
+##
+# @StandbyStatusInfo:
+#
+# Standby status information for a virtio_net device.
+#
+# @device: Indicates the virtio_net device.
+#
+# @path: Indicates the device path.
+#
+# @enabled: true if the virtio_net driver is loaded.
+#           false if the virtio_net driver is unloaded or the guest rebooted.
+#
+# Since: 4.0
+##
+{ 'struct': 'StandbyStatusInfo',
+  'data': {'device': 'str', 'path': 'str', 'enabled': 'bool'} }
+
+##
+# @query-standby-status:
+#
+# Return Standby status information for all virtio_net devices,
+#        or for the given virtio_net device.
+#
+# @device: Name of the virtio_net device.
+#
+# Returns: List of @StandbyStatusInfo for all virtio_net devices,
+#          or for the given virtio_net device.
+#          Returns an error if the given @device doesn't exist.
+#
+# Since: 4.0
+#
+# Example:
+#
+# -> { "execute": "query-standby-status", "arguments": { "device": "net0" } }
+# <- { "return": [
+#                  { "device": "net0",
+#                    "path": "/machine/peripheral/net0/virtio-backend",
+#                    "enabled": true
+#                  }
+#                ]
+#    }
+#
+##
+{ 'command': 'query-standby-status', 'data': { '*device': 'str' },
+  'returns': ['StandbyStatusInfo'] }

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [virtio-dev] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
@ 2019-01-07 22:29   ` Venu Busireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

Add a query command, query-standby-status, to check the standby state
of the virtio_net devices, i.e. the state last reported by the
FAILOVER_STANDBY_CHANGED event.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/net/virtio-net.c            | 16 +++++++++++
 include/hw/virtio/virtio-net.h |  1 +
 include/net/net.h              |  2 ++
 net/net.c                      | 61 ++++++++++++++++++++++++++++++++++++++++++
 qapi/net.json                  | 46 +++++++++++++++++++++++++++++++
 5 files changed, 126 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7b1bcde..a4e07ac 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
          */
         if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
                 (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+            n->standby_enabled = true;
             qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
         } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
                 (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+            n->standby_enabled = false;
             qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
         }
     }
@@ -448,6 +450,19 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
     return info;
 }
 
+static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
+{
+    StandbyStatusInfo *info;
+    VirtIONet *n = qemu_get_nic_opaque(nc);
+
+    info = g_malloc0(sizeof(*info));
+    info->device = g_strdup(n->netclient_name);
+    info->path = object_get_canonical_path(OBJECT(n->qdev));
+    info->enabled = n->standby_enabled;
+
+    return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
@@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
     .receive = virtio_net_receive,
     .link_status_changed = virtio_net_set_link_status,
     .query_rx_filter = virtio_net_query_rxfilter,
+    .query_standby_status = virtio_net_query_standby_status,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..9071e96 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -103,6 +103,7 @@ typedef struct VirtIONet {
     int announce_counter;
     bool needs_vnet_hdr_swap;
     bool mtu_bypass_backend;
+    bool standby_enabled;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/net/net.h b/include/net/net.h
index ec13702..61e8513 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
     NetCleanup *cleanup;
     LinkStatusChanged *link_status_changed;
     QueryRxFilter *query_rx_filter;
+    QueryStandbyStatus *query_standby_status;
     NetPoll *poll;
     HasUfo *has_ufo;
     HasVnetHdr *has_vnet_hdr;
diff --git a/net/net.c b/net/net.c
index 1f7d626..fbf288e 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1320,6 +1320,67 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
     return filter_list;
 }
 
+StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
+                                                const char *device,
+                                                Error **errp)
+{
+    NetClientState *nc;
+    StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        StandbyStatusInfoList *entry;
+        StandbyStatusInfo *info;
+
+        if (has_device && strcmp(nc->name, device) != 0) {
+            continue;
+        }
+
+        /* only query standby status information of NIC */
+        if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
+            if (has_device) {
+                error_setg(errp, "net client(%s) isn't a NIC", device);
+                return NULL;
+            }
+            continue;
+        }
+
+        /*
+         * only query information on queue 0 since the info is per nic,
+         * not per queue.
+         */
+        if (nc->queue_index != 0) {
+            continue;
+        }
+
+        if (nc->info->query_standby_status) {
+            info = nc->info->query_standby_status(nc);
+            entry = g_malloc0(sizeof(*entry));
+            entry->value = info;
+
+            if (!status_list) {
+                status_list = entry;
+            } else {
+                last_entry->next = entry;
+            }
+            last_entry = entry;
+        } else if (has_device) {
+            error_setg(errp, "net client(%s) doesn't support"
+                       " standby status querying", device);
+            return NULL;
+        }
+
+        if (has_device) {
+            break;
+        }
+    }
+
+    if (status_list == NULL && has_device) {
+        error_setg(errp, "invalid net client name: %s", device);
+    }
+
+    return status_list;
+}
+
 void hmp_info_network(Monitor *mon, const QDict *qdict)
 {
     NetClientState *nc, *peer;
diff --git a/qapi/net.json b/qapi/net.json
index 6a6d6fe..633ac87 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -711,3 +711,49 @@
 ##
 { 'event': 'FAILOVER_STANDBY_CHANGED',
   'data': {'*device': 'str', 'path': 'str', 'enabled': 'bool'} }
+
+##
+# @StandbyStatusInfo:
+#
+# Standby status information for a virtio_net device.
+#
+# @device: Indicates the virtio_net device.
+#
+# @path: Indicates the device path.
+#
+# @enabled: true if the virtio_net driver is loaded.
+#           false if the virtio_net driver is unloaded or the guest rebooted.
+#
+# Since: 4.0
+##
+{ 'struct': 'StandbyStatusInfo',
+  'data': {'device': 'str', 'path': 'str', 'enabled': 'bool'} }
+
+##
+# @query-standby-status:
+#
+# Return Standby status information for all virtio_net devices,
+#        or for the given virtio_net device.
+#
+# @device: Name of the virtio_net device.
+#
+# Returns: List of @StandbyStatusInfo for all virtio_net devices,
+#          or for the given virtio_net device.
+#          Returns an error if the given @device doesn't exist.
+#
+# Since: 4.0
+#
+# Example:
+#
+# -> { "execute": "query-standby-status", "arguments": { "device": "net0" } }
+# <- { "return": [
+#                  { "device": "net0",
+#                    "path": "/machine/peripheral/net0/virtio-backend",
+#                    "enabled": true
+#                  }
+#                ]
+#    }
+#
+##
+{ 'command': 'query-standby-status', 'data': { '*device': 'str' },
+  'returns': ['StandbyStatusInfo'] }

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


* [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
@ 2019-01-07 22:29   ` Venu Busireddy
  -1 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Si-Wei Liu, virtio-dev, qemu-devel

From: Si-Wei Liu <si-wei.liu@oracle.com>

When a VF is hotplugged into the guest, the datapath is switched
immediately, which is sub-optimal in terms of timing and can result in
substantial network downtime. One way to shorten this downtime is to
switch the datapath only after the guest is seen to enable the VF, as
indicated by the bus master bit in the VF's PCI config space being set.
The FAILOVER_PRIMARY_CHANGED event is emitted at that time to indicate
this condition. The management stack can then kick off datapath
switching upon receiving the event.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/net.json | 26 ++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bd83b58..adcc95a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
     uint32_t val_le = cpu_to_le32(val);
+    bool may_notify = false;
+    bool master_was = false;
 
     trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
                      __func__, vdev->vbasedev.name, addr, val, len);
     }
 
+    /* Bus Master Enabling/Disabling */
+    if (pdev->failover_primary && current_cpu &&
+        range_covers_byte(addr, len, PCI_COMMAND)) {
+        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+                        PCI_COMMAND_MASTER);
+        may_notify = true;
+    }
+
     /* MSI/MSI-X Enabling/Disabling */
     if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
         ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
         /* Write everything to QEMU to keep emulated bits correct */
         pci_default_write_config(pdev, addr, val, len);
     }
+
+    if (may_notify) {
+        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+                             PCI_COMMAND_MASTER);
+        if (master_was != master_now) {
+            vfio_failover_notify(vdev, master_now);
+        }
+    }
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    const char *n;
+    gchar *path;
+
+    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+    path = object_get_canonical_path(OBJECT(vdev));
+    qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
     vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+    PCIDevice *pdev = &vdev->pdev;
+
+    /*
+     * Guest driver may not get the chance to disable bus mastering
+     * before the device object gets to be unrealized. In that event,
+     * send out a "disabled" notification on behalf of guest driver.
+     */
+    if (pdev->failover_primary &&
+        pdev->bus_master_enable_region.enabled) {
+        vfio_failover_notify(vdev, false);
+    }
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+    /*
+     * During the guest reboot sequence, it is sometimes possible that
+     * the guest may not get sufficient time to complete the entire driver
+     * removal sequence, near the end of which a PCI config space write to
+     * disable bus mastering can be intercepted by device. In such cases,
+     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+     * is imperative to generate the event on the guest's behalf if the
+     * guest fails to make it.
+     */
+    vfio_exit_failover_notify(vdev);
+
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 633ac87..a5b8d70 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -757,3 +757,29 @@
 ##
 { 'command': 'query-standby-status', 'data': { '*device': 'str' },
   'returns': ['StandbyStatusInfo'] }
+
+##
+# @FAILOVER_PRIMARY_CHANGED:
+#
+# Emitted whenever the driver of failover primary is loaded or unloaded
+# by the guest.
+#
+# @device: device name
+#
+# @path: device path
+#
+# @enabled: true if driver is loaded thus device is enabled in guest
+#
+# Since: 4.0
+#
+# Example:
+#
+# <- { "event": "FAILOVER_PRIMARY_CHANGED",
+#      "data": { "device": "vfio-0",
+#                "path": "/machine/peripheral/vfio-0",
+#                "enabled": true },
+#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
+#
+##
+{ 'event': 'FAILOVER_PRIMARY_CHANGED',
+  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
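
The before/after sampling that the patch performs around the config write
can be sketched in isolation. This is a simplified standalone model under
stated assumptions: config_write(), covers_byte() and master_enabled() are
hypothetical helpers, the config space is a plain byte array, and write
masks are ignored. Only the PCI_COMMAND offset (0x04) and the bus master
bit (0x4) are taken from the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONFIG_SIZE        256
#define PCI_COMMAND        0x04  /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER 0x4   /* bus master enable bit */

/* True if the write [addr, addr + len) touches config byte `byte`. */
static bool covers_byte(uint32_t addr, uint32_t len, uint32_t byte)
{
    return addr <= byte && byte < addr + len;
}

/* Read the little-endian command word and test the bus master bit. */
static bool master_enabled(const uint8_t *config)
{
    uint16_t cmd = (uint16_t)(config[PCI_COMMAND] |
                              (config[PCI_COMMAND + 1] << 8));
    return (cmd & PCI_COMMAND_MASTER) != 0;
}

/*
 * Simulated little-endian config write of 1, 2 or 4 bytes.
 * Returns +1 if bus mastering was just enabled, -1 if it was just
 * disabled, and 0 if the bit did not change or was not touched.
 */
static int config_write(uint8_t *config, uint32_t addr, uint32_t val,
                        uint32_t len)
{
    bool may_notify, was, now;

    assert(len == 1 || len == 2 || len == 4);
    assert(addr + len <= CONFIG_SIZE);

    /* Sample the old state only when the write can affect the bit. */
    may_notify = covers_byte(addr, len, PCI_COMMAND);
    was = may_notify && master_enabled(config);

    for (uint32_t i = 0; i < len; i++) {
        config[addr + i] = (uint8_t)(val >> (8 * i));
    }

    if (!may_notify) {
        return 0;
    }
    now = master_enabled(config);
    if (now == was) {
        return 0;
    }
    return now ? 1 : -1;
}
```

Comparing the sampled states, rather than inspecting the written value
alone, avoids a duplicate notification when the guest rewrites the command
register without changing the bus master bit.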

* [virtio-dev] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
@ 2019-01-07 22:29   ` Venu Busireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Si-Wei Liu, virtio-dev, qemu-devel

From: Si-Wei Liu <si-wei.liu@oracle.com>

When a VF is hotplugged into the guest, the datapath is switched
immediately, which is sub-optimal in terms of timing and can result in
substantial network downtime. One way to shorten this downtime is to
switch the datapath only after the guest is seen to enable the VF, as
indicated by the bus master bit in the VF's PCI config space being set.
The FAILOVER_PRIMARY_CHANGED event is emitted at that time to indicate
this condition. The management stack can then kick off datapath
switching upon receiving the event.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/net.json | 26 ++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bd83b58..adcc95a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
     uint32_t val_le = cpu_to_le32(val);
+    bool may_notify = false;
+    bool master_was = false;
 
     trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
                      __func__, vdev->vbasedev.name, addr, val, len);
     }
 
+    /* Bus Master Enabling/Disabling */
+    if (pdev->failover_primary && current_cpu &&
+        range_covers_byte(addr, len, PCI_COMMAND)) {
+        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+                        PCI_COMMAND_MASTER);
+        may_notify = true;
+    }
+
     /* MSI/MSI-X Enabling/Disabling */
     if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
         ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
         /* Write everything to QEMU to keep emulated bits correct */
         pci_default_write_config(pdev, addr, val, len);
     }
+
+    if (may_notify) {
+        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+                             PCI_COMMAND_MASTER);
+        if (master_was != master_now) {
+            vfio_failover_notify(vdev, master_now);
+        }
+    }
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    const char *n;
+    gchar *path;
+
+    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+    path = object_get_canonical_path(OBJECT(vdev));
+    qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
     vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+    PCIDevice *pdev = &vdev->pdev;
+
+    /*
+     * Guest driver may not get the chance to disable bus mastering
+     * before the device object gets to be unrealized. In that event,
+     * send out a "disabled" notification on behalf of guest driver.
+     */
+    if (pdev->failover_primary &&
+        pdev->bus_master_enable_region.enabled) {
+        vfio_failover_notify(vdev, false);
+    }
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+    /*
+     * During the guest reboot sequence, it is sometimes possible that
+     * the guest may not get sufficient time to complete the entire driver
+     * removal sequence, near the end of which a PCI config space write to
+     * disable bus mastering can be intercepted by device. In such cases,
+     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+     * is imperative to generate the event on the guest's behalf if the
+     * guest fails to make it.
+     */
+    vfio_exit_failover_notify(vdev);
+
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 633ac87..a5b8d70 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -757,3 +757,29 @@
 ##
 { 'command': 'query-standby-status', 'data': { '*device': 'str' },
   'returns': ['StandbyStatusInfo'] }
+
+##
+# @FAILOVER_PRIMARY_CHANGED:
+#
+# Emitted whenever the driver of failover primary is loaded or unloaded
+# by the guest.
+#
+# @device: device name
+#
+# @path: device path
+#
+# @enabled: true if driver is loaded thus device is enabled in guest
+#
+# Since: 4.0
+#
+# Example:
+#
+# <- { "event": "FAILOVER_PRIMARY_CHANGED",
+#      "data": { "device": "vfio-0",
+#                "path": "/machine/peripheral/vfio-0",
+#                "enabled": true },
+#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
+#
+##
+{ 'event': 'FAILOVER_PRIMARY_CHANGED',
+  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }

* [Qemu-devel] [PATCH v3 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device
  2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
@ 2019-01-07 22:29   ` Venu Busireddy
  -1 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Si-Wei Liu, virtio-dev, qemu-devel

From: Si-Wei Liu <si-wei.liu@oracle.com>

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hmp.c          | 5 +++++
 hw/pci/pci.c   | 5 +++++
 qapi/misc.json | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 7828f93..7a75c93 100644
--- a/hmp.c
+++ b/hmp.c
@@ -890,6 +890,11 @@ static void hmp_info_pci_device(Monitor *mon, const PciDeviceInfo *dev)
         }
     }
 
+    if (dev->has_failover_status) {
+        monitor_printf(mon, "      Failover primary, bus master %s.\n",
+                       dev->failover_status ? "enabled" : "disabled");
+    }
+
     monitor_printf(mon, "      id \"%s\"\n", dev->qdev_id);
 
     if (dev->has_pci_bridge) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 56b13b3..9da49fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1761,6 +1761,11 @@ static PciDeviceInfo *qmp_query_pci_device(PCIDevice *dev, PCIBus *bus,
             pci_get_word(dev->config + PCI_CB_SUBSYSTEM_VENDOR_ID);
     }
 
+    if (dev->failover_primary) {
+        info->has_failover_status = true;
+        info->failover_status = dev->bus_master_enable_region.enabled;
+    }
+
     return info;
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 6c1c5c0..05f003e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -865,6 +865,9 @@
 #
 # @regions: a list of the PCI I/O regions associated with the device
 #
+# @failover_status: present only if the 'failover-primary' property is set;
+#                   true if the PCI bus master bit of the device is enabled
+#
 # Notes: the contents of @class_info.desc are not stable and should only be
 #        treated as informational.
 #
@@ -874,7 +877,7 @@
   'data': {'bus': 'int', 'slot': 'int', 'function': 'int',
            'class_info': 'PciDeviceClass', 'id': 'PciDeviceId',
            '*irq': 'int', 'qdev_id': 'str', '*pci_bridge': 'PciBridgeInfo',
-           'regions': ['PciMemoryRegion']} }
+           'regions': ['PciMemoryRegion'], '*failover_status': 'bool'} }
 
 ##
 # @PciInfo:
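
The optional-field pattern used for failover_status can be illustrated with
a small standalone sketch. PciInfoModel, fill_info() and format_info() are
hypothetical names and not part of QEMU; the sketch only mirrors the idea
that the field is reported for failover-primary devices and omitted for all
others, with the output string taken from the HMP hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of an optional boolean member: a presence flag
 * plus the value, which is only meaningful when the flag is set. */
typedef struct {
    bool has_failover_status;
    bool failover_status;
} PciInfoModel;

/* Populate the optional member only for failover-primary devices. */
static void fill_info(PciInfoModel *info, bool failover_primary,
                      bool bus_master_enabled)
{
    assert(info != NULL);
    info->has_failover_status = failover_primary;
    info->failover_status = failover_primary && bus_master_enabled;
}

/* Format the monitor line into buf; an absent member yields an empty
 * string. Returns the number of characters written. */
static int format_info(const PciInfoModel *info, char *buf, size_t len)
{
    assert(info != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    if (!info->has_failover_status) {
        return 0;
    }
    return snprintf(buf, len, "Failover primary, bus master %s.",
                    info->failover_status ? "enabled" : "disabled");
}
```

Keeping the member optional means existing consumers of the query see no
new output for devices that do not take part in failover.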

* [virtio-dev] [PATCH v3 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device
@ 2019-01-07 22:29   ` Venu Busireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-07 22:29 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Si-Wei Liu, virtio-dev, qemu-devel

From: Si-Wei Liu <si-wei.liu@oracle.com>

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hmp.c          | 5 +++++
 hw/pci/pci.c   | 5 +++++
 qapi/misc.json | 5 ++++-
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 7828f93..7a75c93 100644
--- a/hmp.c
+++ b/hmp.c
@@ -890,6 +890,11 @@ static void hmp_info_pci_device(Monitor *mon, const PciDeviceInfo *dev)
         }
     }
 
+    if (dev->has_failover_status) {
+        monitor_printf(mon, "      Failover primary, bus master %s.\n",
+                       dev->failover_status ? "enabled" : "disabled");
+    }
+
     monitor_printf(mon, "      id \"%s\"\n", dev->qdev_id);
 
     if (dev->has_pci_bridge) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 56b13b3..9da49fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1761,6 +1761,11 @@ static PciDeviceInfo *qmp_query_pci_device(PCIDevice *dev, PCIBus *bus,
             pci_get_word(dev->config + PCI_CB_SUBSYSTEM_VENDOR_ID);
     }
 
+    if (dev->failover_primary) {
+        info->has_failover_status = true;
+        info->failover_status = dev->bus_master_enable_region.enabled;
+    }
+
     return info;
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 6c1c5c0..05f003e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -865,6 +865,9 @@
 #
 # @regions: a list of the PCI I/O regions associated with the device
 #
+# @failover_status: present only if the 'failover-primary' property is set;
+#                   true if the PCI bus master bit of the device is enabled
+#
 # Notes: the contents of @class_info.desc are not stable and should only be
 #        treated as informational.
 #
@@ -874,7 +877,7 @@
   'data': {'bus': 'int', 'slot': 'int', 'function': 'int',
            'class_info': 'PciDeviceClass', 'id': 'PciDeviceId',
            '*irq': 'int', 'qdev_id': 'str', '*pci_bridge': 'PciBridgeInfo',
-           'regions': ['PciMemoryRegion']} }
+           'regions': ['PciMemoryRegion'], '*failover_status': 'bool'} }
 
 ##
 # @PciInfo:

* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
  (?)
@ 2019-01-07 23:17   ` Alex Williamson
  2019-01-07 23:22       ` [virtio-dev] " Michael S. Tsirkin
  -1 siblings, 1 reply; 57+ messages in thread
From: Alex Williamson @ 2019-01-07 23:17 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Michael S. Tsirkin, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon,  7 Jan 2019 17:29:43 -0500
Venu Busireddy <venu.busireddy@oracle.com> wrote:

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> 
> When a VF is hotplugged into the guest, the datapath is switched
> immediately, which is sub-optimal in terms of timing and can result in
> substantial network downtime. One way to shorten this downtime is to
> switch the datapath only after the guest is seen to enable the VF, as
> indicated by the bus master bit in the VF's PCI config space being set.
> The FAILOVER_PRIMARY_CHANGED event is emitted at that time to indicate
> this condition. The management stack can then kick off datapath
> switching upon receiving the event.
> 
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi/net.json | 26 ++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)

Why is this done at the vfio driver layer rather than the PCI core
layer?  We write everything through using pci_default_write_config(), I
don't see that anything here is particularly vfio specific.  Please copy
me on any changes in hw/vfio.  Thanks,

Alex

> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index bd83b58..adcc95a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -34,6 +34,7 @@
>  #include "pci.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-net.h"
>  
>  #define MSIX_CAP_LENGTH 12
>  
> @@ -42,6 +43,7 @@
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
>  
>  /*
>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>      uint32_t val_le = cpu_to_le32(val);
> +    bool may_notify = false;
> +    bool master_was = false;
>  
>      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>  
> @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>                       __func__, vdev->vbasedev.name, addr, val, len);
>      }
>  
> +    /* Bus Master Enabling/Disabling */
> +    if (pdev->failover_primary && current_cpu &&
> +        range_covers_byte(addr, len, PCI_COMMAND)) {
> +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> +                        PCI_COMMAND_MASTER);
> +        may_notify = true;
> +    }
> +
>      /* MSI/MSI-X Enabling/Disabling */
>      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
>          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>          /* Write everything to QEMU to keep emulated bits correct */
>          pci_default_write_config(pdev, addr, val, len);
>      }
> +
> +    if (may_notify) {
> +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> +                             PCI_COMMAND_MASTER);
> +        if (master_was != master_now) {
> +            vfio_failover_notify(vdev, master_now);
> +        }
> +    }
>  }
>  
>  /*
> @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> +{
> +    PCIDevice *pdev = &vdev->pdev;
> +    const char *n;
> +    gchar *path;
> +
> +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> +    path = object_get_canonical_path(OBJECT(vdev));
> +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> +}
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
>      vfio_put_group(group);
>  }
>  
> +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *pdev = &vdev->pdev;
> +
> +    /*
> +     * Guest driver may not get the chance to disable bus mastering
> +     * before the device object gets to be unrealized. In that event,
> +     * send out a "disabled" notification on behalf of guest driver.
> +     */
> +    if (pdev->failover_primary &&
> +        pdev->bus_master_enable_region.enabled) {
> +        vfio_failover_notify(vdev, false);
> +    }
> +}
> +
>  static void vfio_exitfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>  
> +    /*
> +     * During the guest reboot sequence, it is sometimes possible that
> +     * the guest may not get sufficient time to complete the entire driver
> +     * removal sequence, near the end of which a PCI config space write to
> +     * disable bus mastering can be intercepted by device. In such cases,
> +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> +     * is imperative to generate the event on the guest's behalf if the
> +     * guest fails to make it.
> +     */
> +    vfio_exit_failover_notify(vdev);
> +
>      vfio_unregister_req_notifier(vdev);
>      vfio_unregister_err_notifier(vdev);
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> diff --git a/qapi/net.json b/qapi/net.json
> index 633ac87..a5b8d70 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -757,3 +757,29 @@
>  ##
>  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
>    'returns': ['StandbyStatusInfo'] }
> +
> +##
> +# @FAILOVER_PRIMARY_CHANGED:
> +#
> +# Emitted whenever the driver of failover primary is loaded or unloaded
> +# by the guest.
> +#
> +# @device: device name
> +#
> +# @path: device path
> +#
> +# @enabled: true if driver is loaded thus device is enabled in guest
> +#
> +# Since: 4.0
> +#
> +# Example:
> +#
> +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> +#      "data": { "device": "vfio-0",
> +#                "path": "/machine/peripheral/vfio-0",
> +#                "enabled": true },
> +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> +#
> +##
> +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> 

* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-07 23:17   ` [Qemu-devel] " Alex Williamson
@ 2019-01-07 23:22       ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-07 23:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Venu Busireddy, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
> On Mon,  7 Jan 2019 17:29:43 -0500
> Venu Busireddy <venu.busireddy@oracle.com> wrote:
> 
> > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > 
> > When a VF is hotplugged into the guest, the datapath is switched
> > immediately, which is sub-optimal in terms of timing and can result in
> > substantial network downtime. One way to shorten this downtime is to
> > switch the datapath only after the guest is seen to enable the VF, as
> > indicated by the bus master bit in the VF's PCI config space being set.
> > The FAILOVER_PRIMARY_CHANGED event is emitted at that time to indicate
> > this condition. The management stack can then kick off datapath
> > switching upon receiving the event.
> > 
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > ---
> >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi/net.json | 26 ++++++++++++++++++++++++++
> >  2 files changed, 83 insertions(+)
> 
> Why is this done at the vfio driver layer rather than the PCI core
> layer?  We write everything through using pci_default_write_config(), I
> don't see that anything here is particularly vfio specific.  Please copy
> me on any changes in hw/vfio.  Thanks,
> 
> Alex

Hmm so you are saying let's send events for each device?
I don't have a problem with this but in this case
I think I would like to see a per-device option "send events".
We don't want a ton of events in the simple default config.

> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index bd83b58..adcc95a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -34,6 +34,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "qapi/qapi-events-net.h"
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > @@ -42,6 +43,7 @@
> >  
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> >  
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >      uint32_t val_le = cpu_to_le32(val);
> > +    bool may_notify = false;
> > +    bool master_was = false;
> >  
> >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> >  
> > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >                       __func__, vdev->vbasedev.name, addr, val, len);
> >      }
> >  
> > +    /* Bus Master Enabling/Disabling */
> > +    if (pdev->failover_primary && current_cpu &&
> > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +                        PCI_COMMAND_MASTER);
> > +        may_notify = true;
> > +    }
> > +
> >      /* MSI/MSI-X Enabling/Disabling */
> >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >          /* Write everything to QEMU to keep emulated bits correct */
> >          pci_default_write_config(pdev, addr, val, len);
> >      }
> > +
> > +    if (may_notify) {
> > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +                             PCI_COMMAND_MASTER);
> > +        if (master_was != master_now) {
> > +            vfio_failover_notify(vdev, master_now);
> > +        }
> > +    }
> >  }
> >  
> >  /*
> > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >      vdev->req_enabled = false;
> >  }
> >  
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > +{
> > +    PCIDevice *pdev = &vdev->pdev;
> > +    const char *n;
> > +    gchar *path;
> > +
> > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > +    path = object_get_canonical_path(OBJECT(vdev));
> > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> >      vfio_put_group(group);
> >  }
> >  
> > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > +{
> > +    PCIDevice *pdev = &vdev->pdev;
> > +
> > +    /*
> > +     * Guest driver may not get the chance to disable bus mastering
> > +     * before the device object gets to be unrealized. In that event,
> > +     * send out a "disabled" notification on behalf of guest driver.
> > +     */
> > +    if (pdev->failover_primary &&
> > +        pdev->bus_master_enable_region.enabled) {
> > +        vfio_failover_notify(vdev, false);
> > +    }
> > +}
> > +
> >  static void vfio_exitfn(PCIDevice *pdev)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >  
> > +    /*
> > +     * During the guest reboot sequence, it is sometimes possible that
> > +     * the guest may not get sufficient time to complete the entire driver
> > +     * removal sequence, near the end of which a PCI config space write to
> > +     * disable bus mastering can be intercepted by device. In such cases,
> > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > +     * is imperative to generate the event on the guest's behalf if the
> > +     * guest fails to make it.
> > +     */
> > +    vfio_exit_failover_notify(vdev);
> > +
> >      vfio_unregister_req_notifier(vdev);
> >      vfio_unregister_err_notifier(vdev);
> >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 633ac87..a5b8d70 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -757,3 +757,29 @@
> >  ##
> >  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
> >    'returns': ['StandbyStatusInfo'] }
> > +
> > +##
> > +# @FAILOVER_PRIMARY_CHANGED:
> > +#
> > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > +# by the guest.
> > +#
> > +# @device: device name
> > +#
> > +# @path: device path
> > +#
> > +# @enabled: true if driver is loaded thus device is enabled in guest
> > +#
> > +# Since: 3.0
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> > +#      "data": { "device": "vfio-0",
> > +#                "path": "/machine/peripheral/vfio-0",
> > +#                "enabled": true },
> > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > +#
> > +##
> > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration
  2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
@ 2019-01-07 23:32   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-07 23:32 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Marcel Apfelbaum, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
> Implement the infrastructure to support datapath switching during live
> migration involving SR-IOV devices.
> 
> 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
>    bit and MAC address device pairing.
> 
> 2. This set of events will be consumed by userspace management software
>    to orchestrate all the hot plug and datapath switching activities.
>    This scheme has the least QEMU modifications while allowing userspace
>    software to build its own intelligence to control the whole process
>    of SR-IOV live migration.
> 
> 3. While the hidden device model (viz. coupled device model) is still
>    being explored for automatic hot plugging (QEMU) and automatic datapath
>    switching (host-kernel), this series provides a supplemental set
>    of interfaces if management software wants to drive the SR-IOV live
>    migration on its own. It should not conflict with the hidden device
>    model but just offers simplicity of implementation.
> 
> 
> Si-Wei Liu (2):
>   vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
>   pci: query command extension to check the bus master enabling status of the failover-primary device
> 
> Sridhar Samudrala (1):
>   virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
> 
> Venu Busireddy (2):
>   virtio_net: Add support for "Data Path Switching" during Live Migration.
>   virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
> 
> ---
> Changes in v3:
>   Fix issues with coding style in patch 3/5.
> 
> Changes in v2:
>   Added a query command for FAILOVER_STANDBY_CHANGED event.
>   Added a query command for FAILOVER_PRIMARY_CHANGED event.

Hmm it looks like all feedback I sent e.g. here:
https://patchwork.kernel.org/patch/10721571/
got ignored.

To summarize I suggest reworking the series adding a new command along
the lines of (naming is up to you):

query-pci-master - this returns status for a device
		   and enables a *single* event after
		   it changes

and then removing all status data from the event:
just notify about the change, and *only once*.

Upon receiving the event, management runs query-pci-master
and acts accordingly.
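
The query-then-single-event scheme suggested above could be sketched
roughly as follows. This is only an illustration under stated
assumptions, not QEMU code: MasterNotifier, master_status_update() and
the events_sent counter are hypothetical names, and query_pci_master()
merely stands in for the proposed command. The event carries no status
and stays disarmed until management queries again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; none of these names exist in QEMU. */
typedef struct MasterNotifier {
    bool master_enabled;   /* current bus master status */
    bool event_armed;      /* set by a query, cleared when an event fires */
    unsigned events_sent;  /* counts emitted events, for illustration only */
} MasterNotifier;

/* Stand-in for the proposed query command: report status, re-arm event. */
static bool query_pci_master(MasterNotifier *s)
{
    assert(s != NULL);
    s->event_armed = true;
    return s->master_enabled;
}

/*
 * Called whenever the status may have changed. Emits at most one
 * data-less "changed" event per query. Returns true if an event fired.
 */
static bool master_status_update(MasterNotifier *s, bool enabled)
{
    assert(s != NULL);
    if (s->master_enabled == enabled) {
        return false;               /* no change, nothing to report */
    }
    s->master_enabled = enabled;
    if (!s->event_armed) {
        return false;               /* stay silent until the next query */
    }
    s->event_armed = false;
    s->events_sent++;               /* a real device would emit the event here */
    return true;
}
```

With this shape, a guest that toggles the bit repeatedly produces one
event per management query rather than one per toggle.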




>  hmp.c                          |   5 +++
>  hw/acpi/pcihp.c                |  27 +++++++++++
>  hw/net/virtio-net.c            |  42 +++++++++++++++++
>  hw/pci/pci.c                   |   5 +++
>  hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
>  hw/vfio/pci.h                  |   1 +
>  include/hw/pci/pci.h           |   1 +
>  include/hw/virtio/virtio-net.h |   1 +
>  include/net/net.h              |   2 +
>  net/net.c                      |  61 +++++++++++++++++++++++++
>  qapi/misc.json                 |   5 ++-
>  qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
>  12 files changed, 309 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-07 23:22       ` [virtio-dev] " Michael S. Tsirkin
  (?)
@ 2019-01-07 23:41       ` Alex Williamson
  2019-01-08  0:12           ` [virtio-dev] " Michael S. Tsirkin
  2019-01-08  1:13           ` [virtio-dev] " si-wei liu
  -1 siblings, 2 replies; 57+ messages in thread
From: Alex Williamson @ 2019-01-07 23:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, 7 Jan 2019 18:22:20 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
> > On Mon,  7 Jan 2019 17:29:43 -0500
> > Venu Busireddy <venu.busireddy@oracle.com> wrote:
> >   
> > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > 
> > > When a VF is hotplugged into the guest, datapath switching will be
> > > performed immediately, which is sub-optimal in terms of timing, and
> > > > could end up with substantial network downtime. One way to shorten
> > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > enabled by the guest, indicated by the bus master bit in the VF's PCI
> > > > config space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > at that time to indicate this condition. The management stack can then
> > > > kick off datapath switching upon receiving the event.
> > > 
> > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > > ---
> > >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  qapi/net.json | 26 ++++++++++++++++++++++++++
> > >  2 files changed, 83 insertions(+)  
> > 
> > Why is this done at the vfio driver layer rather than the PCI core
> > layer?  We write everything through using pci_default_write_config(), I
> > don't see that anything here is particularly vfio specific.  Please copy
> > me on any changes in hw/vfio.  Thanks,
> > 
> > Alex  
> 
> Hmm so you are saying let's send events for each device?
> I don't have a problem with this but in this case
> I think I would like to see a per-device option "send events".
> We don't want a ton of events in the simple default config.

In the patch below we only send events for PCIDevice.failover_primary,
which seems like it would filter out the other non-NIC PCI devices just
as well as it does for non-NIC VFIO PCI devices.  The only thing remotely
vfio specific below is that it might notify based on the vfio device
name, but that's only a fallback for PCIDevice.qdev.id.  A real ID could
just be a requirement to make use of this.  Thanks,

Alex

> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index bd83b58..adcc95a 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -34,6 +34,7 @@
> > >  #include "pci.h"
> > >  #include "trace.h"
> > >  #include "qapi/error.h"
> > > +#include "qapi/qapi-events-net.h"
> > >  
> > >  #define MSIX_CAP_LENGTH 12
> > >  
> > > @@ -42,6 +43,7 @@
> > >  
> > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > >  
> > >  /*
> > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > >  {
> > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > >      uint32_t val_le = cpu_to_le32(val);
> > > +    bool may_notify = false;
> > > +    bool master_was = false;
> > >  
> > >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > >  
> > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > >                       __func__, vdev->vbasedev.name, addr, val, len);
> > >      }
> > >  
> > > +    /* Bus Master Enabling/Disabling */
> > > +    if (pdev->failover_primary && current_cpu &&
> > > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > +                        PCI_COMMAND_MASTER);
> > > +        may_notify = true;
> > > +    }
> > > +
> > >      /* MSI/MSI-X Enabling/Disabling */
> > >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > >          /* Write everything to QEMU to keep emulated bits correct */
> > >          pci_default_write_config(pdev, addr, val, len);
> > >      }
> > > +
> > > +    if (may_notify) {
> > > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > +                             PCI_COMMAND_MASTER);
> > > +        if (master_was != master_now) {
> > > +            vfio_failover_notify(vdev, master_now);
> > > +        }
> > > +    }
> > >  }
> > >  
> > >  /*
> > > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > >      vdev->req_enabled = false;
> > >  }
> > >  
> > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > > +{
> > > +    PCIDevice *pdev = &vdev->pdev;
> > > +    const char *n;
> > > +    gchar *path;
> > > +
> > > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > > +    path = object_get_canonical_path(OBJECT(vdev));
> > > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > > +}
> > > +
> > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> > >      vfio_put_group(group);
> > >  }
> > >  
> > > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > > +{
> > > +    PCIDevice *pdev = &vdev->pdev;
> > > +
> > > +    /*
> > > +     * Guest driver may not get the chance to disable bus mastering
> > > +     * before the device object gets to be unrealized. In that event,
> > > +     * send out a "disabled" notification on behalf of guest driver.
> > > +     */
> > > +    if (pdev->failover_primary &&
> > > +        pdev->bus_master_enable_region.enabled) {
> > > +        vfio_failover_notify(vdev, false);
> > > +    }
> > > +}
> > > +
> > >  static void vfio_exitfn(PCIDevice *pdev)
> > >  {
> > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > >  
> > > +    /*
> > > +     * During the guest reboot sequence, it is sometimes possible that
> > > +     * the guest may not get sufficient time to complete the entire driver
> > > +     * removal sequence, near the end of which a PCI config space write to
> > > +     * disable bus mastering can be intercepted by device. In such cases,
> > > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > > +     * is imperative to generate the event on the guest's behalf if the
> > > +     * guest fails to make it.
> > > +     */
> > > +    vfio_exit_failover_notify(vdev);
> > > +
> > >      vfio_unregister_req_notifier(vdev);
> > >      vfio_unregister_err_notifier(vdev);
> > >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > > diff --git a/qapi/net.json b/qapi/net.json
> > > index 633ac87..a5b8d70 100644
> > > --- a/qapi/net.json
> > > +++ b/qapi/net.json
> > > @@ -757,3 +757,29 @@
> > >  ##
> > >  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
> > >    'returns': ['StandbyStatusInfo'] }
> > > +
> > > +##
> > > +# @FAILOVER_PRIMARY_CHANGED:
> > > +#
> > > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > > +# by the guest.
> > > +#
> > > +# @device: device name
> > > +#
> > > +# @path: device path
> > > +#
> > > +# @enabled: true if driver is loaded thus device is enabled in guest
> > > +#
> > > +# Since: 3.0
> > > +#
> > > +# Example:
> > > +#
> > > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> > > +#      "data": { "device": "vfio-0",
> > > +#                "path": "/machine/peripheral/vfio-0",
> > > +#                "enabled": true },
> > > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > > +#
> > > +##
> > > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> > >   

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
  2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
@ 2019-01-08  0:10     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-08  0:10 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Marcel Apfelbaum, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 05:29:42PM -0500, Venu Busireddy wrote:
> Add a query command to check the status of the FAILOVER_STANDBY_CHANGED
> state of the virtio_net devices.
> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  hw/net/virtio-net.c            | 16 +++++++++++
>  include/hw/virtio/virtio-net.h |  1 +
>  include/net/net.h              |  2 ++
>  net/net.c                      | 61 ++++++++++++++++++++++++++++++++++++++++++
>  qapi/net.json                  | 46 +++++++++++++++++++++++++++++++
>  5 files changed, 126 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 7b1bcde..a4e07ac 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
>           */
>          if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>                  (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
> +            n->standby_enabled = true;
>              qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
>          } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
>                  (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +            n->standby_enabled = false;
>              qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
>          }
>      }

Here too, we are sending an endless stream of events.

Instead, let's send one "changed" event without data,
and then be silent until management runs the query command.




> @@ -448,6 +450,19 @@ static RxFilterInfo *virtio_net_query_rxfilter(NetClientState *nc)
>      return info;
>  }
>  
> +static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
> +{
> +    StandbyStatusInfo *info;
> +    VirtIONet *n = qemu_get_nic_opaque(nc);
> +
> +    info = g_malloc0(sizeof(*info));
> +    info->device = g_strdup(n->netclient_name);
> +    info->path = object_get_canonical_path(OBJECT(n->qdev));
> +    info->enabled = n->standby_enabled;
> +
> +    return info;
> +}
> +
>  static void virtio_net_reset(VirtIODevice *vdev)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);
> @@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
>      .receive = virtio_net_receive,
>      .link_status_changed = virtio_net_set_link_status,
>      .query_rx_filter = virtio_net_query_rxfilter,
> +    .query_standby_status = virtio_net_query_standby_status,
>  };
>  
>  static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 4d7f3c8..9071e96 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -103,6 +103,7 @@ typedef struct VirtIONet {
>      int announce_counter;
>      bool needs_vnet_hdr_swap;
>      bool mtu_bypass_backend;
> +    bool standby_enabled;
>  } VirtIONet;
>  
>  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> diff --git a/include/net/net.h b/include/net/net.h
> index ec13702..61e8513 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
>  typedef void (LinkStatusChanged)(NetClientState *);
>  typedef void (NetClientDestructor)(NetClientState *);
>  typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
> +typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
>  typedef bool (HasUfo)(NetClientState *);
>  typedef bool (HasVnetHdr)(NetClientState *);
>  typedef bool (HasVnetHdrLen)(NetClientState *, int);
> @@ -71,6 +72,7 @@ typedef struct NetClientInfo {
>      NetCleanup *cleanup;
>      LinkStatusChanged *link_status_changed;
>      QueryRxFilter *query_rx_filter;
> +    QueryStandbyStatus *query_standby_status;
>      NetPoll *poll;
>      HasUfo *has_ufo;
>      HasVnetHdr *has_vnet_hdr;
> diff --git a/net/net.c b/net/net.c
> index 1f7d626..fbf288e 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -1320,6 +1320,67 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, const char *name,
>      return filter_list;
>  }
>  
> +StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
> +                                                const char *device,
> +                                                Error **errp)
> +{
> +    NetClientState *nc;
> +    StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
> +
> +    QTAILQ_FOREACH(nc, &net_clients, next) {
> +        StandbyStatusInfoList *entry;
> +        StandbyStatusInfo *info;
> +
> +        if (has_device && strcmp(nc->name, device) != 0) {
> +            continue;
> +        }
> +
> +        /* only query standby status information of NIC */
> +        if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
> +            if (has_device) {
> +                error_setg(errp, "net client(%s) isn't a NIC", device);
> +                return NULL;
> +            }
> +            continue;
> +        }
> +
> +        /*
> +         * only query information on queue 0 since the info is per nic,
> +         * not per queue.
> +         */
> +        if (nc->queue_index != 0) {
> +            continue;
> +        }
> +
> +        if (nc->info->query_standby_status) {
> +            info = nc->info->query_standby_status(nc);
> +            entry = g_malloc0(sizeof(*entry));
> +            entry->value = info;
> +
> +            if (!status_list) {
> +                status_list = entry;
> +            } else {
> +                last_entry->next = entry;
> +            }
> +            last_entry = entry;
> +        } else if (has_device) {
> +            error_setg(errp, "net client(%s) doesn't support"
> +                       " standby status querying", device);
> +            return NULL;
> +        }
> +
> +        if (has_device) {
> +            break;
> +        }
> +    }
> +
> +    if (status_list == NULL && has_device) {
> +        error_setg(errp, "invalid net client name: %s", device);
> +    }
> +
> +    return status_list;
> +}
> +
>  void hmp_info_network(Monitor *mon, const QDict *qdict)
>  {
>      NetClientState *nc, *peer;
> diff --git a/qapi/net.json b/qapi/net.json
> index 6a6d6fe..633ac87 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -711,3 +711,49 @@
>  ##
>  { 'event': 'FAILOVER_STANDBY_CHANGED',
>    'data': {'*device': 'str', 'path': 'str', 'enabled': 'bool'} }
> +
> +##
> +# @StandbyStatusInfo:
> +#
> +# Standby status information for a virtio_net device.
> +#
> +# @device: Indicates the virtio_net device.
> +#
> +# @path: Indicates the device path.
> +#
> +# @enabled: true if the virtio_net driver is loaded.
> +#           false if the virtio_net driver is unloaded or the guest rebooted.
> +#
> +# Since: 4.0
> +##
> +{ 'struct': 'StandbyStatusInfo',
> +  'data': {'device': 'str', 'path': 'str', 'enabled': 'bool'} }
> +
> +##
> +# @query-standby-status:
> +#
> +# Return Standby status information for all virtio_net devices,
> +#        or for the given virtio_net device.
> +#
> +# @device: Name of the virtio_net device.
> +#
> +# Returns: List of @StandbyStatusInfo for all virtio_net devices,
> +#          or for the given virtio_net device.
> +#          Returns an error if the given @device doesn't exist.
> +#
> +# Since: 4.0
> +#
> +# Example:
> +#
> +# -> { "execute": "query-standby-status", "arguments": { "device": "net0" } }
> +# <- { "return": [
> +#                  { "device": "net0",
> +#                    "path": "/machine/peripheral/net0/virtio-backend",
> +#                    "enabled": true
> +#                  }
> +#                ]
> +#    }
> +#
> +##
> +{ 'command': 'query-standby-status', 'data': { '*device': 'str' },
> +  'returns': ['StandbyStatusInfo'] }



* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-07 23:41       ` Alex Williamson
@ 2019-01-08  0:12           ` Michael S. Tsirkin
  2019-01-08  1:13           ` [virtio-dev] " si-wei liu
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-08  0:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Venu Busireddy, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> On Mon, 7 Jan 2019 18:22:20 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
> > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > Venu Busireddy <venu.busireddy@oracle.com> wrote:
> > >   
> > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > 
> > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > > could end up with substantial network downtime. One of the ways to shorten
> > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > at that time to indicate this condition. Then management stack can kick
> > > > off datapath switching upon receiving the event.
> > > > 
> > > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > > > ---
> > > >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  qapi/net.json | 26 ++++++++++++++++++++++++++
> > > >  2 files changed, 83 insertions(+)  
> > > 
> > > Why is this done at the vfio driver layer rather than the PCI core
> > > layer?  We write everything through using pci_default_write_config(), I
> > > don't see that anything here is particularly vfio specific.  Please copy
> > > me on any changes in hw/vfio.  Thanks,
> > > 
> > > Alex  
> > 
> > Hmm so you are saying let's send events for each device?
> > I don't have a problem with this but in this case
> > I think I would like to see a per-device option "send events".
> > We don't want a ton of events in the simple default config.
> 
> In the below we're only sending events for PCIDevice.failover_primary,

Well, failover_primary in this patch is a vfio property, not a
PCI device property.


> seems like that would filter out the rest of the non-NIC PCI devices as
> well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> vfio specific below is that it might notify based on the vfio device
> name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> be a requirement to make use of this.


Right, and in fact I don't see why we can't make reporting
bus master status a capability of all devices.
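
A rough sketch of what device-independent reporting could look like, using a toy model rather than QEMU's PCIDevice. ToyPCIDevice, report_bus_master and the toy_* helpers are all made up for illustration, and the write path ignores the write mask that a real config-space write would honour:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_COMMAND         0x04  /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER  0x4   /* bus master enable bit */

/* Toy device model; all fields are hypothetical. */
typedef struct ToyPCIDevice {
    uint8_t config[256];
    bool report_bus_master;   /* per-device opt-in for notifications */
    unsigned notifications;   /* how many notifications were raised */
    bool last_reported;       /* value carried by the last notification */
} ToyPCIDevice;

static void toy_pci_init(ToyPCIDevice *d, bool report)
{
    memset(d, 0, sizeof(*d));
    d->report_bus_master = report;
}

static bool toy_bus_master_enabled(const ToyPCIDevice *d)
{
    /* Config space is little-endian. */
    uint16_t cmd = (uint16_t)(d->config[PCI_COMMAND] |
                              (d->config[PCI_COMMAND + 1] << 8));
    return (cmd & PCI_COMMAND_MASTER) != 0;
}

/* Stand-in for sending an event to management. */
static void toy_notify(ToyPCIDevice *d, bool enabled)
{
    d->notifications++;
    d->last_reported = enabled;
}

/* Generic config write: compare the bit before and after the write. */
static void toy_pci_write_config(ToyPCIDevice *d, uint32_t addr,
                                 uint32_t val, int len)
{
    assert(d != NULL);
    assert(len >= 1 && len <= 4);
    assert((size_t)addr + (size_t)len <= sizeof(d->config));

    bool master_was = toy_bus_master_enabled(d);
    for (int i = 0; i < len; i++) {
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
    }
    bool master_now = toy_bus_master_enabled(d);

    if (d->report_bus_master && master_was != master_now) {
        toy_notify(d, master_now);
    }
}
```

Nothing in the comparison depends on the device type, which is the point being made above; the opt-in flag keeps the default configuration free of extra events.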


>  Thanks,
> 
> Alex
> 
> > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > index bd83b58..adcc95a 100644
> > > > --- a/hw/vfio/pci.c
> > > > +++ b/hw/vfio/pci.c
> > > > @@ -34,6 +34,7 @@
> > > >  #include "pci.h"
> > > >  #include "trace.h"
> > > >  #include "qapi/error.h"
> > > > +#include "qapi/qapi-events-net.h"
> > > >  
> > > >  #define MSIX_CAP_LENGTH 12
> > > >  
> > > > @@ -42,6 +43,7 @@
> > > >  
> > > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > > >  
> > > >  /*
> > > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > >  {
> > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > >      uint32_t val_le = cpu_to_le32(val);
> > > > +    bool may_notify = false;
> > > > +    bool master_was = false;
> > > >  
> > > >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > > >  
> > > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > >                       __func__, vdev->vbasedev.name, addr, val, len);
> > > >      }
> > > >  
> > > > +    /* Bus Master Enabling/Disabling */
> > > > +    if (pdev->failover_primary && current_cpu &&
> > > > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > > > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > +                        PCI_COMMAND_MASTER);
> > > > +        may_notify = true;
> > > > +    }
> > > > +
> > > >      /* MSI/MSI-X Enabling/Disabling */
> > > >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > > >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > >          /* Write everything to QEMU to keep emulated bits correct */
> > > >          pci_default_write_config(pdev, addr, val, len);
> > > >      }
> > > > +
> > > > +    if (may_notify) {
> > > > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > +                             PCI_COMMAND_MASTER);
> > > > +        if (master_was != master_now) {
> > > > +            vfio_failover_notify(vdev, master_now);
> > > > +        }
> > > > +    }
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > > >      vdev->req_enabled = false;
> > > >  }
> > > >  
> > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > > > +{
> > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > +    const char *n;
> > > > +    gchar *path;
> > > > +
> > > > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > > > +    path = object_get_canonical_path(OBJECT(vdev));
> > > > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > > > +}
> > > > +
> > > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > > >  {
> > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> > > >      vfio_put_group(group);
> > > >  }
> > > >  
> > > > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > > > +{
> > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > +
> > > > +    /*
> > > > +     * Guest driver may not get the chance to disable bus mastering
> > > > +     * before the device object gets to be unrealized. In that event,
> > > > +     * send out a "disabled" notification on behalf of guest driver.
> > > > +     */
> > > > +    if (pdev->failover_primary &&
> > > > +        pdev->bus_master_enable_region.enabled) {
> > > > +        vfio_failover_notify(vdev, false);
> > > > +    }
> > > > +}
> > > > +
> > > >  static void vfio_exitfn(PCIDevice *pdev)
> > > >  {
> > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > >  
> > > > +    /*
> > > > +     * During the guest reboot sequence, it is sometimes possible that
> > > > +     * the guest may not get sufficient time to complete the entire driver
> > > > +     * removal sequence, near the end of which a PCI config space write to
> > > > +     * disable bus mastering can be intercepted by device. In such cases,
> > > > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > > > +     * is imperative to generate the event on the guest's behalf if the
> > > > +     * guest fails to make it.
> > > > +     */
> > > > +    vfio_exit_failover_notify(vdev);
> > > > +
> > > >      vfio_unregister_req_notifier(vdev);
> > > >      vfio_unregister_err_notifier(vdev);
> > > >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > index 633ac87..a5b8d70 100644
> > > > --- a/qapi/net.json
> > > > +++ b/qapi/net.json
> > > > @@ -757,3 +757,29 @@
> > > >  ##
> > > >  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
> > > >    'returns': ['StandbyStatusInfo'] }
> > > > +
> > > > +##
> > > > +# @FAILOVER_PRIMARY_CHANGED:
> > > > +#
> > > > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > > > +# by the guest.
> > > > +#
> > > > +# @device: device name
> > > > +#
> > > > +# @path: device path
> > > > +#
> > > > +# @enabled: true if driver is loaded thus device is enabled in guest
> > > > +#
> > > > > +# Since: 4.0
> > > > +#
> > > > +# Example:
> > > > +#
> > > > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> > > > +#      "data": { "device": "vfio-0",
> > > > +#                "path": "/machine/peripheral/vfio-0" },
> > > > +#                "enabled": "true" },
> > > > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > > > +#
> > > > +##
> > > > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > > > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> > > >   



* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-08  0:12           ` [virtio-dev] " Michael S. Tsirkin
  (?)
@ 2019-01-08  0:24           ` Alex Williamson
  2019-01-08  0:43               ` [virtio-dev] " Michael S. Tsirkin
  -1 siblings, 1 reply; 57+ messages in thread
From: Alex Williamson @ 2019-01-08  0:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, 7 Jan 2019 19:12:06 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> > On Mon, 7 Jan 2019 18:22:20 -0500
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> > > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:  
> > > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > > Venu Busireddy <venu.busireddy@oracle.com> wrote:
> > > >     
> > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > 
> > > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > > could end up with substantial network downtime. One of ways to shorten
> > > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > > at that time to indicate this condition. Then management stack can kick
> > > > > off datapath switching upon receiving the event.
> > > > > 
> > > > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > > > > ---
> > > > >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  qapi/net.json | 26 ++++++++++++++++++++++++++
> > > > >  2 files changed, 83 insertions(+)    
> > > > 
> > > > Why is this done at the vfio driver layer rather than the PCI core
> > > > layer?  We write everything through using pci_default_write_config(), I
> > > > don't see that anything here is particularly vfio specific.  Please copy
> > > > me on any changes in hw/vfio.  Thanks,
> > > > 
> > > > Alex    
> > > 
> > > Hmm so you are saying let's send events for each device?
> > > I don't have a problem with this but in this case
> > > I think I would like to see a per-device option "send events".
> > > We don't want a ton of events in the simple default config.  
> > 
> > In the below we're only sending events for PCIDevice.failover_primary,  
> 
> Well failover_primary in this patch is a vfio property, not a
> pci device property.

It's both and it's kind of a kludge (from 2/5):

--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_err_notifier(vdev);
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
+    pdev->failover_primary = vdev->failover_primary;
 
     return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
                                    qdev_prop_nv_gpudirect_clique, uint8_t),
     DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
                                 OFF_AUTOPCIBAR_OFF),
+    DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
+                     false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),

The property could have set VFIOPCIDevice.pdev.failover_primary
directly.  I'm not thrilled about that name either, it's a very NIC
centric property whereas vfio-pci supports plenty of non-networking
devices, as of course does PCIDevice.  Maybe the concept needs to be
more general or the name needs to be more NIC specific and fail for
devices that don't have the correct class code.  Thanks,

Alex
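
To make the PCI-core argument concrete, here is a minimal standalone
sketch of detecting a bus master transition in a generic config write
path, gated by a per-device opt-in flag. It is not QEMU code:
DemoPCIDevice, demo_bus_master(), demo_notify() and demo_write_config()
are hypothetical names used only for illustration, and the notifier
merely prints where real code would presumably emit the QMP event. Only
the PCI_COMMAND offset (0x04) and the PCI_COMMAND_MASTER bit (0x0004)
come from the PCI specification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_COMMAND         0x04   /* offset of the 16-bit command register */
#define PCI_COMMAND_MASTER  0x0004 /* bus master enable bit */

/* Hypothetical stand-in for a PCI device model; not QEMU's PCIDevice. */
typedef struct DemoPCIDevice {
    uint8_t config[256];     /* emulated config space, little-endian */
    bool failover_primary;   /* opt-in flag, named after the patch's property */
    int events_sent;         /* number of notifications, for illustration */
    bool last_status;        /* status carried by the last notification */
} DemoPCIDevice;

/* Read the command register and test the bus master bit. */
static bool demo_bus_master(const DemoPCIDevice *d)
{
    uint16_t cmd = (uint16_t)(d->config[PCI_COMMAND] |
                              (d->config[PCI_COMMAND + 1] << 8));
    return (cmd & PCI_COMMAND_MASTER) != 0;
}

/* Hypothetical notifier; a real implementation would send an event. */
static void demo_notify(DemoPCIDevice *d, bool enabled)
{
    d->events_sent++;
    d->last_status = enabled;
    printf("FAILOVER_PRIMARY_CHANGED enabled=%s\n",
           enabled ? "true" : "false");
}

/*
 * Generic config write: sample the bus master bit, store the value
 * byte by byte, then notify only if the bit changed and the device
 * opted in. Nothing here depends on the device being a vfio device.
 */
static void demo_write_config(DemoPCIDevice *d, uint32_t addr,
                              uint32_t val, int len)
{
    if (len < 1 || len > 4 || addr > sizeof(d->config) - (size_t)len) {
        return; /* ignore out-of-range accesses in this sketch */
    }

    bool master_was = demo_bus_master(d);

    for (int i = 0; i < len; i++) {
        d->config[addr + i] = (uint8_t)(val >> (8 * i));
    }

    bool master_now = demo_bus_master(d);
    if (d->failover_primary && master_was != master_now) {
        demo_notify(d, master_now);
    }
}
```

On a zero-initialized device with failover_primary set, calling
demo_write_config(&dev, PCI_COMMAND, PCI_COMMAND_MASTER, 2) reports one
"enabled" transition, and repeating the same write reports nothing,
since the bit did not change. A device without the flag never notifies,
which is the filtering behaviour discussed above.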

> > seems like that would filter out the rest of the non-NIC PCI devices as
> > well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> > vfio specific below is that it might notify based on the vfio device
> > name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> > be a requirement to make use of this.  
> 
> 
> Right and in fact I don't see why we can't make reporting
> bus master status a capability of all devices.
> 
> 
> >  Thanks,
> > 
> > Alex
> >   
> > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > index bd83b58..adcc95a 100644
> > > > > --- a/hw/vfio/pci.c
> > > > > +++ b/hw/vfio/pci.c
> > > > > @@ -34,6 +34,7 @@
> > > > >  #include "pci.h"
> > > > >  #include "trace.h"
> > > > >  #include "qapi/error.h"
> > > > > +#include "qapi/qapi-events-net.h"
> > > > >  
> > > > >  #define MSIX_CAP_LENGTH 12
> > > > >  
> > > > > @@ -42,6 +43,7 @@
> > > > >  
> > > > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > > > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > > > >  
> > > > >  /*
> > > > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > >  {
> > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > >      uint32_t val_le = cpu_to_le32(val);
> > > > > +    bool may_notify = false;
> > > > > +    bool master_was = false;
> > > > >  
> > > > >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > > > >  
> > > > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > >                       __func__, vdev->vbasedev.name, addr, val, len);
> > > > >      }
> > > > >  
> > > > > +    /* Bus Master Enabling/Disabling */
> > > > > +    if (pdev->failover_primary && current_cpu &&
> > > > > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > > > > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > > +                        PCI_COMMAND_MASTER);
> > > > > +        may_notify = true;
> > > > > +    }
> > > > > +
> > > > >      /* MSI/MSI-X Enabling/Disabling */
> > > > >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > > > >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > >          /* Write everything to QEMU to keep emulated bits correct */
> > > > >          pci_default_write_config(pdev, addr, val, len);
> > > > >      }
> > > > > +
> > > > > +    if (may_notify) {
> > > > > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > > +                             PCI_COMMAND_MASTER);
> > > > > +        if (master_was != master_now) {
> > > > > +            vfio_failover_notify(vdev, master_now);
> > > > > +        }
> > > > > +    }
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > > > >      vdev->req_enabled = false;
> > > > >  }
> > > > >  
> > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > > > > +{
> > > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > > +    const char *n;
> > > > > +    gchar *path;
> > > > > +
> > > > > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > > > > +    path = object_get_canonical_path(OBJECT(vdev));
> > > > > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > > > > +}
> > > > > +
> > > > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > > > >  {
> > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> > > > >      vfio_put_group(group);
> > > > >  }
> > > > >  
> > > > > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > > > > +{
> > > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > > +
> > > > > +    /*
> > > > > +     * Guest driver may not get the chance to disable bus mastering
> > > > > +     * before the device object gets to be unrealized. In that event,
> > > > > +     * send out a "disabled" notification on behalf of guest driver.
> > > > > +     */
> > > > > +    if (pdev->failover_primary &&
> > > > > +        pdev->bus_master_enable_region.enabled) {
> > > > > +        vfio_failover_notify(vdev, false);
> > > > > +    }
> > > > > +}
> > > > > +
> > > > >  static void vfio_exitfn(PCIDevice *pdev)
> > > > >  {
> > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > >  
> > > > > +    /*
> > > > > +     * During the guest reboot sequence, it is sometimes possible that
> > > > > +     * the guest may not get sufficient time to complete the entire driver
> > > > > +     * removal sequence, near the end of which a PCI config space write to
> > > > > +     * disable bus mastering can be intercepted by device. In such cases,
> > > > > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > > > > +     * is imperative to generate the event on the guest's behalf if the
> > > > > +     * guest fails to make it.
> > > > > +     */
> > > > > +    vfio_exit_failover_notify(vdev);
> > > > > +
> > > > >      vfio_unregister_req_notifier(vdev);
> > > > >      vfio_unregister_err_notifier(vdev);
> > > > >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > > index 633ac87..a5b8d70 100644
> > > > > --- a/qapi/net.json
> > > > > +++ b/qapi/net.json
> > > > > @@ -757,3 +757,29 @@
> > > > >  ##
> > > > >  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
> > > > >    'returns': ['StandbyStatusInfo'] }
> > > > > +
> > > > > +##
> > > > > +# @FAILOVER_PRIMARY_CHANGED:
> > > > > +#
> > > > > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > > > > +# by the guest.
> > > > > +#
> > > > > +# @device: device name
> > > > > +#
> > > > > +# @path: device path
> > > > > +#
> > > > > +# @enabled: true if driver is loaded thus device is enabled in guest
> > > > > +#
> > > > > +# Since: 3.0
> > > > > +#
> > > > > +# Example:
> > > > > +#
> > > > > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> > > > > +#      "data": { "device": "vfio-0",
> > > > > +#                "path": "/machine/peripheral/vfio-0",
> > > > > +#                "enabled": true },
> > > > > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > > > > +#
> > > > > +##
> > > > > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > > > > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> > > > >     

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-08  0:24           ` Alex Williamson
@ 2019-01-08  0:43               ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-08  0:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Venu Busireddy, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 05:24:15PM -0700, Alex Williamson wrote:
> On Mon, 7 Jan 2019 19:12:06 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> > > On Mon, 7 Jan 2019 18:22:20 -0500
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:  
> > > > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > > > Venu Busireddy <venu.busireddy@oracle.com> wrote:
> > > > >     
> > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > 
> > > > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > > > could end up with substantial network downtime. One of ways to shorten
> > > > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > > > at that time to indicate this condition. Then management stack can kick
> > > > > > off datapath switching upon receiving the event.
> > > > > > 
> > > > > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > > > > > ---
> > > > > >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  qapi/net.json | 26 ++++++++++++++++++++++++++
> > > > > >  2 files changed, 83 insertions(+)    
> > > > > 
> > > > > Why is this done at the vfio driver layer rather than the PCI core
> > > > > layer?  We write everything through using pci_default_write_config(), I
> > > > > don't see that anything here is particularly vfio specific.  Please copy
> > > > > me on any changes in hw/vfio.  Thanks,
> > > > > 
> > > > > Alex    
> > > > 
> > > > Hmm so you are saying let's send events for each device?
> > > > I don't have a problem with this but in this case
> > > > I think I would like to see a per-device option "send events".
> > > > We don't want a ton of events in the simple default config.  
> > > 
> > > In the below we're only sending events for PCIDevice.failover_primary,  
> > 
> > Well failover_primary in this patch is a vfio property, not a
> > pci device property.
> 
> It's both and it's kind of a kludge (from 2/5):
> 
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      vfio_register_err_notifier(vdev);
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
> +    pdev->failover_primary = vdev->failover_primary;
>  
>      return;
>  
> @@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
>                                     qdev_prop_nv_gpudirect_clique, uint8_t),
>      DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
>                                  OFF_AUTOPCIBAR_OFF),
> +    DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
> +                     false),
>      /*
>       * TODO - support passed fds... is this necessary?
>       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> 
> The property could have set VFIOPCIDevice.pdev.failover_primary
> directly.  I'm not thrilled about that name either, it's a very NIC
> centric property whereas vfio-pci supports plenty of non-networking
> devices, as of course does PCIDevice.  Maybe the concept needs to be
> more general or the name needs to be more NIC specific and fail for
> devices that don't have the correct class code.  Thanks,
> 
> Alex

I actually think it's a generic concept. I came up with the name
"failover" exactly to avoid the "bonding" name that was used originally
and was net specific.

In particular
https://fedoraproject.org/wiki/Features/Virt_Device_Failover
suggests using multipath for storage.

In theory it can easily be imagined to work with rng and crypto devices
too, even though I don't think Linux makes supporting this easy.



> > > seems like that would filter out the rest of the non-NIC PCI devices as
> > > well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> > > vfio specific below is that it might notify based on the vfio device
> > > name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> > > be a requirement to make use of this.  
> > 
> > 
> > Right and in fact I don't see why we can't make reporting
> > bus master status a capability of all devices.
> > 
> > 
> > >  Thanks,
> > > 
> > > Alex
> > >   
> > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > > index bd83b58..adcc95a 100644
> > > > > > --- a/hw/vfio/pci.c
> > > > > > +++ b/hw/vfio/pci.c
> > > > > > @@ -34,6 +34,7 @@
> > > > > >  #include "pci.h"
> > > > > >  #include "trace.h"
> > > > > >  #include "qapi/error.h"
> > > > > > +#include "qapi/qapi-events-net.h"
> > > > > >  
> > > > > >  #define MSIX_CAP_LENGTH 12
> > > > > >  
> > > > > > @@ -42,6 +43,7 @@
> > > > > >  
> > > > > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > > > > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > > > > >  
> > > > > >  /*
> > > > > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > > >  {
> > > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > >      uint32_t val_le = cpu_to_le32(val);
> > > > > > +    bool may_notify = false;
> > > > > > +    bool master_was = false;
> > > > > >  
> > > > > >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > > > > >  
> > > > > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > > >                       __func__, vdev->vbasedev.name, addr, val, len);
> > > > > >      }
> > > > > >  
> > > > > > +    /* Bus Master Enabling/Disabling */
> > > > > > +    if (pdev->failover_primary && current_cpu &&
> > > > > > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > > > > > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > > > +                        PCI_COMMAND_MASTER);
> > > > > > +        may_notify = true;
> > > > > > +    }
> > > > > > +
> > > > > >      /* MSI/MSI-X Enabling/Disabling */
> > > > > >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > > > > >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > > > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > > >          /* Write everything to QEMU to keep emulated bits correct */
> > > > > >          pci_default_write_config(pdev, addr, val, len);
> > > > > >      }
> > > > > > +
> > > > > > +    if (may_notify) {
> > > > > > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > > > +                             PCI_COMMAND_MASTER);
> > > > > > +        if (master_was != master_now) {
> > > > > > +            vfio_failover_notify(vdev, master_now);
> > > > > > +        }
> > > > > > +    }
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > > > > >      vdev->req_enabled = false;
> > > > > >  }
> > > > > >  
> > > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > > > > > +{
> > > > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > > > +    const char *n;
> > > > > > +    gchar *path;
> > > > > > +
> > > > > > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > > > > > +    path = object_get_canonical_path(OBJECT(vdev));
> > > > > > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > > > > > +}
> > > > > > +
> > > > > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > > > > >  {
> > > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> > > > > >      vfio_put_group(group);
> > > > > >  }
> > > > > >  
> > > > > > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > > > > > +{
> > > > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > > > +
> > > > > > +    /*
> > > > > > +     * Guest driver may not get the chance to disable bus mastering
> > > > > > +     * before the device object gets to be unrealized. In that event,
> > > > > > +     * send out a "disabled" notification on behalf of guest driver.
> > > > > > +     */
> > > > > > +    if (pdev->failover_primary &&
> > > > > > +        pdev->bus_master_enable_region.enabled) {
> > > > > > +        vfio_failover_notify(vdev, false);
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > >  static void vfio_exitfn(PCIDevice *pdev)
> > > > > >  {
> > > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > >  
> > > > > > +    /*
> > > > > > +     * During the guest reboot sequence, it is sometimes possible that
> > > > > > +     * the guest may not get sufficient time to complete the entire driver
> > > > > > +     * removal sequence, near the end of which a PCI config space write to
> > > > > > +     * disable bus mastering can be intercepted by device. In such cases,
> > > > > > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > > > > > +     * is imperative to generate the event on the guest's behalf if the
> > > > > > +     * guest fails to make it.
> > > > > > +     */
> > > > > > +    vfio_exit_failover_notify(vdev);
> > > > > > +
> > > > > >      vfio_unregister_req_notifier(vdev);
> > > > > >      vfio_unregister_err_notifier(vdev);
> > > > > >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > > > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > > > index 633ac87..a5b8d70 100644
> > > > > > --- a/qapi/net.json
> > > > > > +++ b/qapi/net.json
> > > > > > @@ -757,3 +757,29 @@
> > > > > >  ##
> > > > > >  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
> > > > > >    'returns': ['StandbyStatusInfo'] }
> > > > > > +
> > > > > > +##
> > > > > > +# @FAILOVER_PRIMARY_CHANGED:
> > > > > > +#
> > > > > > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > > > > > +# by the guest.
> > > > > > +#
> > > > > > +# @device: device name
> > > > > > +#
> > > > > > +# @path: device path
> > > > > > +#
> > > > > > +# @enabled: true if driver is loaded thus device is enabled in guest
> > > > > > +#
> > > > > > +# Since: 3.0
> > > > > > +#
> > > > > > +# Example:
> > > > > > +#
> > > > > > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> > > > > > +#      "data": { "device": "vfio-0",
> > > > > > +#                "path": "/machine/peripheral/vfio-0",
> > > > > > +#                "enabled": true },
> > > > > > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > > > > > +#
> > > > > > +##
> > > > > > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > > > > > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> > > > > >     

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [virtio-dev] Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
@ 2019-01-08  0:43               ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-08  0:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Venu Busireddy, Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 05:24:15PM -0700, Alex Williamson wrote:
> On Mon, 7 Jan 2019 19:12:06 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jan 07, 2019 at 04:41:15PM -0700, Alex Williamson wrote:
> > > On Mon, 7 Jan 2019 18:22:20 -0500
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > > > On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:  
> > > > > On Mon,  7 Jan 2019 17:29:43 -0500
> > > > > Venu Busireddy <venu.busireddy@oracle.com> wrote:
> > > > >     
> > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > 
> > > > > > When a VF is hotplugged into the guest, datapath switching will be
> > > > > > performed immediately, which is sub-optimal in terms of timing, and
> > > > > > could end up with substantial network downtime. One of ways to shorten
> > > > > > this downtime is to switch the datapath only after the VF is seen to get
> > > > > > enabled by guest, indicated by the bus master bit in VF's PCI config
> > > > > > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > > > > > at that time to indicate this condition. Then management stack can kick
> > > > > > off datapath switching upon receiving the event.
> > > > > > 
> > > > > > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > > > > > ---
> > > > > >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  qapi/net.json | 26 ++++++++++++++++++++++++++
> > > > > >  2 files changed, 83 insertions(+)    
> > > > > 
> > > > > Why is this done at the vfio driver layer rather than the PCI core
> > > > > layer?  We write everything through using pci_default_write_config(), I
> > > > > don't see that anything here is particularly vfio specific.  Please copy
> > > > > me on any changes in hw/vfio.  Thanks,
> > > > > 
> > > > > Alex    
> > > > 
> > > > Hmm so you are saying let's send events for each device?
> > > > I don't have a problem with this but in this case
> > > > I think I would like to see a per-device option "send events".
> > > > We don't want a ton of events in the simple default config.  
> > > 
> > > In the below we're only sending events for PCIDevice.failover_primary,  
> > 
> > Well failover_primary in this patch is a vfio property, not a
> > pci device property.
> 
> It's both and it's kind of a kludge (from 2/5):
> 
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      vfio_register_err_notifier(vdev);
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
> +    pdev->failover_primary = vdev->failover_primary;
>  
>      return;
>  
> @@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
>                                     qdev_prop_nv_gpudirect_clique, uint8_t),
>      DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
>                                  OFF_AUTOPCIBAR_OFF),
> +    DEFINE_PROP_BOOL("failover-primary", VFIOPCIDevice, failover_primary,
> +                     false),
>      /*
>       * TODO - support passed fds... is this necessary?
>       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> 
> The property could have set VFIOPCIDevice.pdev.failover_primary
> directly.  I'm not thrilled about that name either, it's a very NIC
> centric property whereas vfio-pci supports plenty of non-networking
> devices, as of course does PCIDevice.  Maybe the concept needs to be
> more general or the name needs to be more NIC specific and fail for
> devices that don't have the correct class code.  Thanks,
> 
> Alex

I actually think it's generic concept. I came with a name failover
exactly to avoid the "bonding" name that was used originally
and was net specific.

In particular
https://fedoraproject.org/wiki/Features/Virt_Device_Failover
suggests using multipath for storage.

Can in theory easily be imagined to work with  rng, crypto
even though I don't think Linux makes supporting this easy.



> > > seems like that would filter out the rest of the non-NIC PCI devices as
> > > well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> > > vfio specific below is that it might notify based on the vfio device
> > > name, but it's a fallback to PCIDevice.qdev.id.  A real ID could just
> > > be a requirement to make use of this.  
> > 
> > 
> > Right and in fact I don't see why we can't make reporting
> > bus master status a capability of all devices.
> > 
> > 
> > >  Thanks,
> > > 
> > > Alex
> > >   
> > > > > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > > > > index bd83b58..adcc95a 100644
> > > > > > --- a/hw/vfio/pci.c
> > > > > > +++ b/hw/vfio/pci.c
> > > > > > @@ -34,6 +34,7 @@
> > > > > >  #include "pci.h"
> > > > > >  #include "trace.h"
> > > > > >  #include "qapi/error.h"
> > > > > > +#include "qapi/qapi-events-net.h"
> > > > > >  
> > > > > >  #define MSIX_CAP_LENGTH 12
> > > > > >  
> > > > > > @@ -42,6 +43,7 @@
> > > > > >  
> > > > > >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> > > > > >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> > > > > >  
> > > > > >  /*
> > > > > >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > > > > > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > > >  {
> > > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > >      uint32_t val_le = cpu_to_le32(val);
> > > > > > +    bool may_notify = false;
> > > > > > +    bool master_was = false;
> > > > > >  
> > > > > >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> > > > > >  
> > > > > > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > > >                       __func__, vdev->vbasedev.name, addr, val, len);
> > > > > >      }
> > > > > >  
> > > > > > +    /* Bus Master Enabling/Disabling */
> > > > > > +    if (pdev->failover_primary && current_cpu &&
> > > > > > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > > > > > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > > > +                        PCI_COMMAND_MASTER);
> > > > > > +        may_notify = true;
> > > > > > +    }
> > > > > > +
> > > > > >      /* MSI/MSI-X Enabling/Disabling */
> > > > > >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> > > > > >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > > > > > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> > > > > >          /* Write everything to QEMU to keep emulated bits correct */
> > > > > >          pci_default_write_config(pdev, addr, val, len);
> > > > > >      }
> > > > > > +
> > > > > > +    if (may_notify) {
> > > > > > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > > > > > +                             PCI_COMMAND_MASTER);
> > > > > > +        if (master_was != master_now) {
> > > > > > +            vfio_failover_notify(vdev, master_now);
> > > > > > +        }
> > > > > > +    }
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> > > > > >      vdev->req_enabled = false;
> > > > > >  }
> > > > > >  
> > > > > > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > > > > > +{
> > > > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > > > +    const char *n;
> > > > > > +    gchar *path;
> > > > > > +
> > > > > > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > > > > > +    path = object_get_canonical_path(OBJECT(vdev));
> > > > > > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > > > > > +}
> > > > > > +
> > > > > >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> > > > > >  {
> > > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> > > > > >      vfio_put_group(group);
> > > > > >  }
> > > > > >  
> > > > > > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > > > > > +{
> > > > > > +    PCIDevice *pdev = &vdev->pdev;
> > > > > > +
> > > > > > +    /*
> > > > > > +     * Guest driver may not get the chance to disable bus mastering
> > > > > > +     * before the device object gets to be unrealized. In that event,
> > > > > > +     * send out a "disabled" notification on behalf of guest driver.
> > > > > > +     */
> > > > > > +    if (pdev->failover_primary &&
> > > > > > +        pdev->bus_master_enable_region.enabled) {
> > > > > > +        vfio_failover_notify(vdev, false);
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > >  static void vfio_exitfn(PCIDevice *pdev)
> > > > > >  {
> > > > > >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > > > > >  
> > > > > > +    /*
> > > > > > +     * During the guest reboot sequence, it is sometimes possible that
> > > > > > +     * the guest may not get sufficient time to complete the entire driver
> > > > > > +     * removal sequence, near the end of which a PCI config space write to
> > > > > > +     * disable bus mastering can be intercepted by device. In such cases,
> > > > > > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > > > > > +     * is imperative to generate the event on the guest's behalf if the
> > > > > > +     * guest fails to make it.
> > > > > > +     */
> > > > > > +    vfio_exit_failover_notify(vdev);
> > > > > > +
> > > > > >      vfio_unregister_req_notifier(vdev);
> > > > > >      vfio_unregister_err_notifier(vdev);
> > > > > >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > > > > > diff --git a/qapi/net.json b/qapi/net.json
> > > > > > index 633ac87..a5b8d70 100644
> > > > > > --- a/qapi/net.json
> > > > > > +++ b/qapi/net.json
> > > > > > @@ -757,3 +757,29 @@
> > > > > >  ##
> > > > > >  { 'command': 'query-standby-status', 'data': { '*device': 'str' },
> > > > > >    'returns': ['StandbyStatusInfo'] }
> > > > > > +
> > > > > > +##
> > > > > > +# @FAILOVER_PRIMARY_CHANGED:
> > > > > > +#
> > > > > > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > > > > > +# by the guest.
> > > > > > +#
> > > > > > +# @device: device name
> > > > > > +#
> > > > > > +# @path: device path
> > > > > > +#
> > > > > > +# @enabled: true if driver is loaded thus device is enabled in guest
> > > > > > +#
> > > > > > +# Since: 3.0
> > > > > > +#
> > > > > > +# Example:
> > > > > > +#
> > > > > > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> > > > > > +#      "data": { "device": "vfio-0",
> > > > > > +#                "path": "/machine/peripheral/vfio-0",
> > > > > > +#                "enabled": true },
> > > > > > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > > > > > +#
> > > > > > +##
> > > > > > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > > > > > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> > > > > >     

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2019-01-07 23:41       ` Alex Williamson
@ 2019-01-08  1:13           ` si-wei liu
  2019-01-08  1:13           ` [virtio-dev] " si-wei liu
  1 sibling, 0 replies; 57+ messages in thread
From: si-wei liu @ 2019-01-08  1:13 UTC (permalink / raw)
  To: Alex Williamson, Michael S. Tsirkin
  Cc: Marcel Apfelbaum, Venu Busireddy, qemu-devel, virtio-dev



On 01/07/2019 03:41 PM, Alex Williamson wrote:
> On Mon, 7 Jan 2019 18:22:20 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
>> On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
>>> On Mon,  7 Jan 2019 17:29:43 -0500
>>> Venu Busireddy <venu.busireddy@oracle.com> wrote:
>>>    
>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>
>>>> When a VF is hotplugged into the guest, datapath switching will be
>>>> performed immediately, which is sub-optimal in terms of timing, and
>>>> could end up with substantial network downtime. One of ways to shorten
>>>> this downtime is to switch the datapath only after the VF is seen to get
>>>> enabled by guest, indicated by the bus master bit in VF's PCI config
>>>> space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
>>>> at that time to indicate this condition. Then management stack can kick
>>>> off datapath switching upon receiving the event.
>>>>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
>>>> ---
>>>>   hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   qapi/net.json | 26 ++++++++++++++++++++++++++
>>>>   2 files changed, 83 insertions(+)
>>> Why is this done at the vfio driver layer rather than the PCI core
>>> layer?  We write everything through using pci_default_write_config(), I
>>> don't see that anything here is particularly vfio specific.  Please copy
>>> me on any changes in hw/vfio.  Thanks,
>>>
>>> Alex
>> Hmm so you are saying let's send events for each device?
>> I don't have a problem with this but in this case
>> I think I would like to see a per-device option "send events".
>> We don't want a ton of events in the simple default config.
> In the below we're only sending events for PCIDevice.failover_primary,
> seems like that would filter out the rest of the non-NIC PCI devices as
> well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> vfio specific below is that it might notify based on the vfio device
> name, but it's a fallback to PCIDevice.qdev.id.
Not exactly. It first tries to notify using the qdev ID. If the qdev ID
is missing (a vfio-pci device can live without one), the sysfsdev name
is used instead (the host device location in the form
"<bus>:<device>.<function>" rather than an ID). The intent was indeed
to make this notification applicable to every possible vfio-pci device,
even those without a qdev ID.
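
To make the fallback order concrete, here is a minimal standalone
sketch. The struct and function names are made up for illustration and
are not the ones in the patch; only the "qdev ID first, host location
second" order is taken from vfio_failover_notify().

```c
#include <stddef.h>

/* Hypothetical stand-in for the two name sources discussed above. */
struct dev_names {
    const char *qdev_id;   /* user-assigned id=..., may be NULL */
    const char *host_name; /* host device location, never NULL here */
};

/* Prefer the qdev ID; fall back to the host device location. */
static const char *event_device_name(const struct dev_names *d)
{
    return d->qdev_id ? d->qdev_id : d->host_name;
}
```

With a qdev ID set, the event carries that ID; without one, it carries
the host location, so the "device" member is never missing for a
vfio-pci device.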

>   A real ID could just
> be a requirement to make use of this.
I'm fine with making the qdev ID required for a failover_primary PCI
device. But please note that this is a restriction rather than a
generalization, and it would also have to apply to all the non-VFIO PCI
devices that don't have to specify a qdev ID today. I'm not sure it's a
good idea to restrict it this early.

Thanks,
-Siwei

>   Thanks,
>
> Alex
>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index bd83b58..adcc95a 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -34,6 +34,7 @@
>>>>   #include "pci.h"
>>>>   #include "trace.h"
>>>>   #include "qapi/error.h"
>>>> +#include "qapi/qapi-events-net.h"
>>>>   
>>>>   #define MSIX_CAP_LENGTH 12
>>>>   
>>>> @@ -42,6 +43,7 @@
>>>>   
>>>>   static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>>>   static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>>>> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
>>>>   
>>>>   /*
>>>>    * Disabling BAR mmaping can be slow, but toggling it around INTx can
>>>> @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>   {
>>>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>>>       uint32_t val_le = cpu_to_le32(val);
>>>> +    bool may_notify = false;
>>>> +    bool master_was = false;
>>>>   
>>>>       trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>>>>   
>>>> @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>                        __func__, vdev->vbasedev.name, addr, val, len);
>>>>       }
>>>>   
>>>> +    /* Bus Master Enabling/Disabling */
>>>> +    if (pdev->failover_primary && current_cpu &&
>>>> +        range_covers_byte(addr, len, PCI_COMMAND)) {
>>>> +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
>>>> +                        PCI_COMMAND_MASTER);
>>>> +        may_notify = true;
>>>> +    }
>>>> +
>>>>       /* MSI/MSI-X Enabling/Disabling */
>>>>       if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
>>>>           ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
>>>> @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>           /* Write everything to QEMU to keep emulated bits correct */
>>>>           pci_default_write_config(pdev, addr, val, len);
>>>>       }
>>>> +
>>>> +    if (may_notify) {
>>>> +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
>>>> +                             PCI_COMMAND_MASTER);
>>>> +        if (master_was != master_now) {
>>>> +            vfio_failover_notify(vdev, master_now);
>>>> +        }
>>>> +    }
>>>>   }
>>>>   
>>>>   /*
>>>> @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>>>       vdev->req_enabled = false;
>>>>   }
>>>>   
>>>> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
>>>> +{
>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>> +    const char *n;
>>>> +    gchar *path;
>>>> +
>>>> +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
>>>> +    path = object_get_canonical_path(OBJECT(vdev));
>>>> +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
>>>> +}
>>>> +
>>>>   static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>>   {
>>>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>>> @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
>>>>       vfio_put_group(group);
>>>>   }
>>>>   
>>>> +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
>>>> +{
>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>> +
>>>> +    /*
>>>> +     * Guest driver may not get the chance to disable bus mastering
>>>> +     * before the device object gets to be unrealized. In that event,
>>>> +     * send out a "disabled" notification on behalf of guest driver.
>>>> +     */
>>>> +    if (pdev->failover_primary &&
>>>> +        pdev->bus_master_enable_region.enabled) {
>>>> +        vfio_failover_notify(vdev, false);
>>>> +    }
>>>> +}
>>>> +
>>>>   static void vfio_exitfn(PCIDevice *pdev)
>>>>   {
>>>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>>>   
>>>> +    /*
>>>> +     * During the guest reboot sequence, it is sometimes possible that
>>>> +     * the guest may not get sufficient time to complete the entire driver
>>>> +     * removal sequence, near the end of which a PCI config space write to
>>>> +     * disable bus mastering can be intercepted by device. In such cases,
>>>> +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
>>>> +     * is imperative to generate the event on the guest's behalf if the
>>>> +     * guest fails to make it.
>>>> +     */
>>>> +    vfio_exit_failover_notify(vdev);
>>>> +
>>>>       vfio_unregister_req_notifier(vdev);
>>>>       vfio_unregister_err_notifier(vdev);
>>>>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>> diff --git a/qapi/net.json b/qapi/net.json
>>>> index 633ac87..a5b8d70 100644
>>>> --- a/qapi/net.json
>>>> +++ b/qapi/net.json
>>>> @@ -757,3 +757,29 @@
>>>>   ##
>>>>   { 'command': 'query-standby-status', 'data': { '*device': 'str' },
>>>>     'returns': ['StandbyStatusInfo'] }
>>>> +
>>>> +##
>>>> +# @FAILOVER_PRIMARY_CHANGED:
>>>> +#
>>>> +# Emitted whenever the driver of failover primary is loaded or unloaded
>>>> +# by the guest.
>>>> +#
>>>> +# @device: device name
>>>> +#
>>>> +# @path: device path
>>>> +#
>>>> +# @enabled: true if driver is loaded thus device is enabled in guest
>>>> +#
>>>> +# Since: 3.0
>>>> +#
>>>> +# Example:
>>>> +#
>>>> +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
>>>> +#      "data": { "device": "vfio-0",
>>>> +#                "path": "/machine/peripheral/vfio-0",
>>>> +#                "enabled": true },
>>>> +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
>>>> +#
>>>> +##
>>>> +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
>>>> +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
>>>>    
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration
  2019-01-07 23:32   ` [virtio-dev] " Michael S. Tsirkin
@ 2019-01-08  1:45     ` si-wei liu
  -1 siblings, 0 replies; 57+ messages in thread
From: si-wei liu @ 2019-01-08  1:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, Venu Busireddy
  Cc: Marcel Apfelbaum, virtio-dev, qemu-devel



On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
>> Implement the infrastructure to support datapath switching during live
>> migration involving SR-IOV devices.
>>
>> 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
>>     bit and MAC address device pairing.
>>
>> 2. This set of events will be consumed by userspace management software
>>     to orchestrate all the hot plug and datapath switching activities.
>>     This scheme has the least QEMU modifications while allowing userspace
>>     software to build its own intelligence to control the whole process
>>     of SR-IOV live migration.
>>
>> 3. While the hidden device model (viz. coupled device model) is still
>>     being explored for automatic hot plugging (QEMU) and automatic datapath
>>     switching (host-kernel), this series provides a supplemental set
>>     of interfaces if management software wants to drive the SR-IOV live
>>     migration on its own. It should not conflict with the hidden device
>>     model but just offers simplicity of implementation.
>>
>>
>> Si-Wei Liu (2):
>>    vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
>>    pci: query command extension to check the bus master enabling status of the failover-primary device
>>
>> Sridhar Samudrala (1):
>>    virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
>>
>> Venu Busireddy (2):
>>    virtio_net: Add support for "Data Path Switching" during Live Migration.
>>    virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
>>
>> ---
>> Changes in v3:
>>    Fix issues with coding style in patch 3/5.
>>
>> Changes in v2:
>>    Added a query command for FAILOVER_STANDBY_CHANGED event.
>>    Added a query command for FAILOVER_PRIMARY_CHANGED event.
> Hmm it looks like all feedback I sent e.g. here:
> https://patchwork.kernel.org/patch/10721571/
> got ignored.
>
> To summarize I suggest reworking the series adding a new command along
> the lines of (naming is up to you):
>
> query-pci-master - this returns status for a device
> 		   and enables a *single* event after
> 		   it changes
>
> and then removing all status data from the event,
> just notify about the change and *only once*.
Why remove all status data from the event? It does not hurt to keep it,
as the FAILOVER_PRIMARY_CHANGED event is generally pretty
low-frequency. Other similar low-frequency QMP events do carry data.

As this event relates to datapath switching, coalescing events has a
cost: during a quick transition like disabled->enabled->disabled,
nothing would ever happen, and packets might never get a chance to be
sent out. I would rather allow at least a few packets to be sent over
the wire than nothing. Who knows how fast management can react and
consume these events?
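
A rough illustration of the concern, as a standalone sketch rather than
QEMU code (the function names are hypothetical): count the events
management would see for a sequence of bus-master states, once with
every transition reported and once with the sequence coalesced into its
net change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count events when every bus-master transition is reported. */
static size_t events_per_transition(bool initial, const bool *states, size_t n)
{
    size_t events = 0;
    bool prev = initial;

    for (size_t i = 0; i < n; i++) {
        if (states[i] != prev) {
            events++;
        }
        prev = states[i];
    }
    return events;
}

/* Count events when the whole sequence is coalesced into one net change. */
static size_t events_coalesced(bool initial, const bool *states, size_t n)
{
    if (n == 0) {
        return 0;
    }
    return states[n - 1] != initial ? 1 : 0;
}
```

For disabled->enabled->disabled the first variant reports two events
and the second reports none, so with coalescing management never learns
that the VF was briefly usable.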

Thanks,
-Siwei

> 	
>
> upon event management does query-pci-master
> and acts accordingly.
>
>
>
>
>>   hmp.c                          |   5 +++
>>   hw/acpi/pcihp.c                |  27 +++++++++++
>>   hw/net/virtio-net.c            |  42 +++++++++++++++++
>>   hw/pci/pci.c                   |   5 +++
>>   hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
>>   hw/vfio/pci.h                  |   1 +
>>   include/hw/pci/pci.h           |   1 +
>>   include/hw/virtio/virtio-net.h |   1 +
>>   include/net/net.h              |   2 +
>>   net/net.c                      |  61 +++++++++++++++++++++++++
>>   qapi/misc.json                 |   5 ++-
>>   qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
>>   12 files changed, 309 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration
  2019-01-08  1:45     ` si-wei liu
@ 2019-01-08  2:25       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-08  2:25 UTC (permalink / raw)
  To: si-wei liu; +Cc: Venu Busireddy, Marcel Apfelbaum, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 05:45:22PM -0800, si-wei liu wrote:
> 
> 
> On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
> > > Implement the infrastructure to support datapath switching during live
> > > migration involving SR-IOV devices.
> > > 
> > > 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
> > >     bit and MAC address device pairing.
> > > 
> > > 2. This set of events will be consumed by userspace management software
> > >     to orchestrate all the hot plug and datapath switching activities.
> > >     This scheme has the least QEMU modifications while allowing userspace
> > >     software to build its own intelligence to control the whole process
> > >     of SR-IOV live migration.
> > > 
> > > 3. While the hidden device model (viz. coupled device model) is still
> > >     being explored for automatic hot plugging (QEMU) and automatic datapath
> > >     switching (host-kernel), this series provides a supplemental set
> > >     of interfaces if management software wants to drive the SR-IOV live
> > >     migration on its own. It should not conflict with the hidden device
> > >     model but just offers simplicity of implementation.
> > > 
> > > 
> > > Si-Wei Liu (2):
> > >    vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
> > >    pci: query command extension to check the bus master enabling status of the failover-primary device
> > > 
> > > Sridhar Samudrala (1):
> > >    virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
> > > 
> > > Venu Busireddy (2):
> > >    virtio_net: Add support for "Data Path Switching" during Live Migration.
> > >    virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
> > > 
> > > ---
> > > Changes in v3:
> > >    Fix issues with coding style in patch 3/5.
> > > 
> > > Changes in v2:
> > >    Added a query command for FAILOVER_STANDBY_CHANGED event.
> > >    Added a query command for FAILOVER_PRIMARY_CHANGED event.
> > Hmm it looks like all feedback I sent e.g. here:
> > https://patchwork.kernel.org/patch/10721571/
> > got ignored.
> > 
> > To summarize I suggest reworking the series adding a new command along
> > the lines of (naming is up to you):
> > 
> > query-pci-master - this returns status for a device
> > 		   and enables a *single* event after
> > 		   it changes
> > 
> > and then removing all status data from the event,
> > just notify about the change and *only once*.
> Why removing all status data from the event?

To make sure users do not forget to call query-pci-master to
re-enable more events.

> It does not hurt to keep them
> as the FAILOVER_PRIMARY_CHANGED event in general is of pretty low-frequency.

A malicious guest can make it as frequent as it wants to.
OTOH there is no way to limit it.


> As can be seen other similar low-frequent QMP events do have data carried
> over.
> 
> As this event relates to datapath switching, there's implication to coalesce
> events as packets might not get a chance to send out as nothing would ever
> happen when  going through quick transitions like
> disabled->enabled->disabled. I would allow at least few packets to be sent
> over wire rather than nothing. Who knows how fast management can react and
> consume these events?
> 
> Thanks,
> -Siwei

OK, if it's so important for latency, let's include data in the event.
Please add comments explaining that you must always re-run the query
afterwards to make sure the state is stable and to re-enable more events.
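
A minimal sketch of the handshake described above, seen from the management
side. It runs against a stand-in object, not a real QMP connection.
"query-pci-master" is only the name suggested earlier in this thread (naming
was left open); the FakeQemu class and the "enabled" field are hypothetical
and exist only for illustration.

```python
class FakeQemu:
    """Stand-in for QEMU: emits at most one event until the query re-arms it."""

    def __init__(self):
        self.enabled = False  # bus master status of the primary device
        self.armed = True     # whether the next change may emit an event
        self.events = []      # events delivered to management

    def guest_toggles(self, enabled):
        """Guest changes bus master enable; emit an event only if armed."""
        self.enabled = enabled
        if self.armed:
            self.armed = False
            self.events.append({"event": "FAILOVER_PRIMARY_CHANGED",
                                "data": {"enabled": enabled}})

    def query_pci_master(self):
        """Return the current status and re-enable a single further event."""
        self.armed = True
        return {"enabled": self.enabled}


def handle_event(qemu, event):
    """Read the event data as an early hint, then always re-run the query.

    The query result is treated as authoritative, because the guest may
    have changed state again after the event was emitted.
    """
    hint = event["data"]["enabled"]
    final = qemu.query_pci_master()["enabled"]
    return hint, final
```

With this shape, a guest that flips the state repeatedly produces one event
per query, so the event rate is bounded by how fast management re-arms it.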



> > 	
> > 
> > upon event management does query-pci-master
> > and acts accordingly.
> > 
> > 
> > 
> > 
> > >   hmp.c                          |   5 +++
> > >   hw/acpi/pcihp.c                |  27 +++++++++++
> > >   hw/net/virtio-net.c            |  42 +++++++++++++++++
> > >   hw/pci/pci.c                   |   5 +++
> > >   hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
> > >   hw/vfio/pci.h                  |   1 +
> > >   include/hw/pci/pci.h           |   1 +
> > >   include/hw/virtio/virtio-net.h |   1 +
> > >   include/net/net.h              |   2 +
> > >   net/net.c                      |  61 +++++++++++++++++++++++++
> > >   qapi/misc.json                 |   5 ++-
> > >   qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
> > >   12 files changed, 309 insertions(+), 1 deletion(-)
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
  (?)
@ 2019-01-08 16:56   ` Dongli Zhang
  2019-01-08 17:25       ` [virtio-dev] " Venu Busireddy
  -1 siblings, 1 reply; 57+ messages in thread
From: Dongli Zhang @ 2019-01-08 16:56 UTC (permalink / raw)
  To: Venu Busireddy, si-wei.liu
  Cc: Michael S. Tsirkin, Marcel Apfelbaum, Sridhar Samudrala,
	qemu-devel, virtio-dev

I am not familiar with libvirt and I would like to play with this only with qemu.

With failover, we need to hotplug the VF on the destination server into the VM
after live migration. However, the VF on the destination server would have a
different MAC address.

How can we specify the MAC for the new VF to be hotplugged via qemu, given that
the VF is only a vfio-pci device?

I am trying to play with this with only qemu (w/o libvirt).

Thank you very much!

Dongli Zhang

On 01/08/2019 06:29 AM, Venu Busireddy wrote:
> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
> 
> This feature bit can be used by a hypervisor to indicate to the virtio_net
> device that it can act as a standby for another device with the same MAC
> address.
> 
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  hw/net/virtio-net.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 385b1a0..411f8fb 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>                       true),
>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> +                      false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-08 16:56   ` [Qemu-devel] " Dongli Zhang
@ 2019-01-08 17:25       ` Venu Busireddy
  0 siblings, 0 replies; 57+ messages in thread
From: Venu Busireddy @ 2019-01-08 17:25 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: si-wei.liu, Michael S. Tsirkin, Marcel Apfelbaum,
	Sridhar Samudrala, qemu-devel, virtio-dev

On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
> I am not familiar with libvirt and I would like to play with this only with qemu.
> 
> With failover, we need to hotplug the VF on destination server to VM after live
> migration. However, the VF on destination server would have different mac address.
> 
> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
> vfio pci device?

How is the VF device on the destination host any different from the VF
on the source host?

As you do on the source host, you first assign the MAC address of
00:00:00:00:00:00 to the VF. After the migration, you assign the same
MAC address as that of the virtio_net device to the VF, and hot-add the VF
device to the VM. Then, after you receive the FAILOVER_PRIMARY_CHANGED
event, set the macvtap device to the down state.
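
The steps above can be sketched as an ordered list of actions. This is an
illustration only and executes nothing. The interface names, the PCI address
and the use of QMP "device_add" with the vfio-pci driver for the hot-add step
are assumptions, not something this message specifies.

```python
def destination_steps(pf, vf_num, virtio_mac, vf_pci_addr, macvtap):
    """Return the destination-host actions in order as (when, kind, payload).

    All arguments are placeholders supplied by the caller; the function only
    builds the commands and does not run them.
    """
    zero_mac = "00:00:00:00:00:00"
    return [
        # Start with an all-zero MAC on the VF, as on the source host.
        ("first", "shell",
         ["ip", "link", "set", pf, "vf", str(vf_num), "mac", zero_mac]),
        # After migration, give the VF the MAC of the virtio_net device.
        ("after migration", "shell",
         ["ip", "link", "set", pf, "vf", str(vf_num), "mac", virtio_mac]),
        # Hot-add the VF to the VM (assumed here to go through device_add).
        ("after migration", "qmp",
         {"execute": "device_add",
          "arguments": {"driver": "vfio-pci", "host": vf_pci_addr}}),
        # Once the guest has switched to the VF, take the macvtap down.
        ("after FAILOVER_PRIMARY_CHANGED", "shell",
         ["ip", "link", "set", macvtap, "down"]),
    ]
```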

Venu

> 
> I am trying to play with this with only qemu (w/o libvirt).
> 
> Thank you very much!
> 
> Dongli Zhang
> 
> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
> > From: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > 
> > This feature bit can be used by a hypervisor to indicate to the virtio_net
> > device that it can act as a standby for another device with the same MAC
> > address.
> > 
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > ---
> >  hw/net/virtio-net.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 385b1a0..411f8fb 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> >                       true),
> >      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> > +                      false),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-08 17:25       ` [virtio-dev] " Venu Busireddy
  (?)
@ 2019-01-09  0:14       ` Dongli Zhang
  2019-01-09  0:18           ` [virtio-dev] " Samudrala, Sridhar
  -1 siblings, 1 reply; 57+ messages in thread
From: Dongli Zhang @ 2019-01-09  0:14 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: virtio-dev, Michael S. Tsirkin, Sridhar Samudrala, qemu-devel,
	Marcel Apfelbaum, si-wei.liu

Hi Venu,

On 2019/1/9 1:25 AM, Venu Busireddy wrote:
> On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
>> I am not familiar with libvirt and I would like to play with this only with qemu.
>>
>> With failover, we need to hotplug the VF on destination server to VM after live
>> migration. However, the VF on destination server would have different mac address.
>>
>> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
>> vfio pci device?
> 
> How is the VF device on the destination host any different from the VF
> on the source host?
> 
> As you do on the source host, you first assign the MAC address of
> 00:00:00:00:00:00 to the VF. After the migration, you assign the same
> MAC address as that of the virtio_net device to the VF, and hotadd the VF

This was what I was wondering.

How is the MAC address configured for a VF (or any NIC, such as a PF) after it
is assigned to vfio?

Thank you very much!

Dongli Zhang


> device to the VM. And then, after you receive the FAILOVER_PRIMARY_CHANGED
> event, set the macvtap device to down state.
> 
> Venu
> 
>>
>> I am trying to play with this with only qemu (w/o libvirt).
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
>>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>
>>> This feature bit can be used by a hypervisor to indicate to the virtio_net
>>> device that it can act as a standby for another device with the same MAC
>>> address.
>>>
>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
>>> ---
>>>  hw/net/virtio-net.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 385b1a0..411f8fb 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>                       true),
>>>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>> +                      false),
>>>      DEFINE_PROP_END_OF_LIST(),
>>>  };
>>>  
>>>
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-09  0:14       ` Dongli Zhang
@ 2019-01-09  0:18           ` Samudrala, Sridhar
  0 siblings, 0 replies; 57+ messages in thread
From: Samudrala, Sridhar @ 2019-01-09  0:18 UTC (permalink / raw)
  To: Dongli Zhang, Venu Busireddy
  Cc: virtio-dev, Michael S. Tsirkin, qemu-devel, Marcel Apfelbaum, si-wei.liu

On 1/8/2019 4:14 PM, Dongli Zhang wrote:
> Hi Venu,
>
> On 2019/1/9 1:25 AM, Venu Busireddy wrote:
>> On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
>>> I am not familiar with libvirt and I would like to play with this only with qemu.
>>>
>>> With failover, we need to hotplug the VF on destination server to VM after live
>>> migration. However, the VF on destination server would have different mac address.
>>>
>>> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
>>> vfio pci device?
>> How is the VF device on the destination host any different from the VF
>> on the source host?
>>
>> As you do on the source host, you first assign the MAC address of
>> 00:00:00:00:00:00 to the VF. After the migration, you assign the same
>> MAC address as that of the virtio_net device to the VF, and hotadd the VF
> This was what I was wondering.
>
> How the mac address is configured for VF (or any NIC like PF) after it is
> assigned to vfio?

ip link set <pf> vf <vf-num> mac <MAC>

See https://www.kernel.org/doc/html/latest/networking/net_failover.html
for a sample script that shows the steps to initiate live migration with VF
and virtio-net in standby mode.


>
> Thank you very much!
>
> Dongli Zhang
>
>
>> device to the VM. And then, after you receive the FAILOVER_PRIMARY_CHANGED
>> event, set the macvtap device to down state.
>>
>> Venu
>>
>>> I am trying to play with this with only qemu (w/o libvirt).
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>>>
>>> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
>>>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>
>>>> This feature bit can be used by a hypervisor to indicate to the virtio_net
>>>> device that it can act as a standby for another device with the same MAC
>>>> address.
>>>>
>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
>>>> ---
>>>>   hw/net/virtio-net.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>> index 385b1a0..411f8fb 100644
>>>> --- a/hw/net/virtio-net.c
>>>> +++ b/hw/net/virtio-net.c
>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>                        true),
>>>>       DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>>       DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>> +                      false),
>>>>       DEFINE_PROP_END_OF_LIST(),
>>>>   };
>>>>   
>>>>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-09  0:18           ` [virtio-dev] " Samudrala, Sridhar
  (?)
@ 2019-01-09  0:39           ` Dongli Zhang
  2019-01-09  4:17               ` [virtio-dev] " Michael S. Tsirkin
  -1 siblings, 1 reply; 57+ messages in thread
From: Dongli Zhang @ 2019-01-09  0:39 UTC (permalink / raw)
  To: Samudrala, Sridhar, Venu Busireddy
  Cc: virtio-dev, Michael S. Tsirkin, qemu-devel, Marcel Apfelbaum, si-wei.liu

Hi Samudrala,

On 2019/1/9 8:18 AM, Samudrala, Sridhar wrote:
> On 1/8/2019 4:14 PM, Dongli Zhang wrote:
>> Hi Venu,
>>
>> On 2019/1/9 1:25 AM, Venu Busireddy wrote:
>>> On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
>>>> I am not familiar with libvirt and I would like to play with this only with qemu.
>>>>
>>>> With failover, we need to hotplug the VF on destination server to VM after live
>>>> migration. However, the VF on destination server would have different mac address.
>>>>
>>>> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
>>>> vfio pci device?
>>> How is the VF device on the destination host any different from the VF
>>> on the source host?
>>>
>>> As you do on the source host, you first assign the MAC address of
>>> 00:00:00:00:00:00 to the VF. After the migration, you assign the same
>>> MAC address as that of the virtio_net device to the VF, and hotadd the VF
>> This was what I was wondering.
>>
>> How the mac address is configured for VF (or any NIC like PF) after it is
>> assigned to vfio?
> 
> ip link set <pf> vf  <vf-num>  mac <MAC>
> 
> See https://www.kernel.org/doc/html/latest/networking/net_failover.html
> for a sample script that shows the steps to initiate live migration with VF 
> and virtio-net in standby mode.

Thank you very much for the help!

Sorry that I did not ask the question in the right way.

Although I was talking about a VF, I would like to pass through the entire PF
(with sriov_numvfs=0) to the guest VM.

In this situation, I am not able to configure the MAC address when the entire PF
(or NIC) is assigned to VFIO. <pf> does not exist, as it belongs to VFIO.

Thank you very much!

Dongli Zhang

> 
> 
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>
>>> device to the VM. And then, after you receive the FAILOVER_PRIMARY_CHANGED
>>> event, set the macvtap device to down state.
>>>
>>> Venu
>>>
>>>> I am trying to play with this with only qemu (w/o libvirt).
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>>>
>>>> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
>>>>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>
>>>>> This feature bit can be used by a hypervisor to indicate to the virtio_net
>>>>> device that it can act as a standby for another device with the same MAC
>>>>> address.
>>>>>
>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
>>>>> ---
>>>>>  hw/net/virtio-net.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>> index 385b1a0..411f8fb 100644
>>>>> --- a/hw/net/virtio-net.c
>>>>> +++ b/hw/net/virtio-net.c
>>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>>>>                       true),
>>>>>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>>>>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>>>> +                      false),
>>>>>      DEFINE_PROP_END_OF_LIST(),
>>>>>  };
>>>>>  
>>>>>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2019-01-09  0:39           ` Dongli Zhang
@ 2019-01-09  4:17               ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-09  4:17 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Samudrala, Sridhar, Venu Busireddy, virtio-dev, qemu-devel,
	Marcel Apfelbaum, si-wei.liu

On Wed, Jan 09, 2019 at 08:39:07AM +0800, Dongli Zhang wrote:
> Hi Samudrala,
> 
> On 2019/1/9 8:18 AM, Samudrala, Sridhar wrote:
> > On 1/8/2019 4:14 PM, Dongli Zhang wrote:
> >> Hi Venu,
> >>
> >> On 2019/1/9 1:25 AM, Venu Busireddy wrote:
> >>> On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
> >>>> I am not familiar with libvirt and I would like to play with this only with qemu.
> >>>>
> >>>> With failover, we need to hotplug the VF on destination server to VM after live
> >>>> migration. However, the VF on destination server would have different mac address.
> >>>>
> >>>> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
> >>>> vfio pci device?
> >>> How is the VF device on the destination host any different from the VF
> >>> on the source host?
> >>>
> >>> As you do on the source host, you first assign the MAC address of
> >>> 00:00:00:00:00:00 to the VF. After the migration, you assign the same
> >>> MAC address as that of the virtio_net device to the VF, and hotadd the VF
> >> This was what I was wondering.
> >>
> >> How the mac address is configured for VF (or any NIC like PF) after it is
> >> assigned to vfio?
> > 
> > ip link set <pf> vf  <vf-num>  mac <MAC>
> > 
> > See https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > for a sample script that shows the steps to initiate live migration with VF 
> > and virtio-net in standby mode.
> 
> Thank you very much for the help!
> 
> Sorry that I did not ask the question in the right way.
> 
> Although I was talking about VF, I would like to passthrough the entire PF (with
> sriov_numvfs=0) to guest VM.
> 
> In this situation, I am not able to configure the mac address when the entire PF
> (or NIC) is assigned to VFIO. <pf> does not exist as it is belong to VFIO.
> 
> Thank you very much!
> 
> Dongli Zhang

I think that this mode isn't a good fit for the current (MAC address
based) failover. There was talk about supporting other ways to match
devices for failover, but no one has implemented the driver changes
required for this yet.


> > 
> > 
> >> Thank you very much!
> >>
> >> Dongli Zhang
> >>
> >>
> >>> device to the VM. And then, after you receive the FAILOVER_PRIMARY_CHANGED
> >>> event, set the macvtap device to down state.
> >>>
> >>> Venu
> >>>
> >>>> I am trying to play with this with only qemu (w/o libvirt).
> >>>>
> >>>> Thank you very much!
> >>>>
> >>>> Dongli Zhang
> >>>>
> >>>> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
> >>>>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >>>>>
> >>>>> This feature bit can be used by a hypervisor to indicate to the virtio_net
> >>>>> device that it can act as a standby for another device with the same MAC
> >>>>> address.
> >>>>>
> >>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >>>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> >>>>> ---
> >>>>>  hw/net/virtio-net.c | 2 ++
> >>>>>  1 file changed, 2 insertions(+)
> >>>>>
> >>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>>> index 385b1a0..411f8fb 100644
> >>>>> --- a/hw/net/virtio-net.c
> >>>>> +++ b/hw/net/virtio-net.c
> >>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> >>>>>                       true),
> >>>>>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >>>>>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> >>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> >>>>> +                      false),
> >>>>>      DEFINE_PROP_END_OF_LIST(),
> >>>>>  };
> >>>>>  
> >>>>>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [virtio-dev] Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
@ 2019-01-09  4:17               ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-09  4:17 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Samudrala, Sridhar, Venu Busireddy, virtio-dev, qemu-devel,
	Marcel Apfelbaum, si-wei.liu

On Wed, Jan 09, 2019 at 08:39:07AM +0800, Dongli Zhang wrote:
> Hi Samudrala,
> 
> On 2019/1/9 8:18 AM, Samudrala, Sridhar wrote:
> > On 1/8/2019 4:14 PM, Dongli Zhang wrote:
> >> Hi Venu,
> >>
> >> On 2019/1/9 1:25 AM, Venu Busireddy wrote:
> >>> On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
> >>>> I am not familiar with libvirt and I would like to play with this only with qemu.
> >>>>
> >>>> With failover, we need to hotplug the VF on destination server to VM after live
> >>>> migration. However, the VF on destination server would have different mac address.
> >>>>
> >>>> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
> >>>> vfio pci device?
> >>> How is the VF device on the destination host any different from the VF
> >>> on the source host?
> >>>
> >>> As you do on the source host, you first assign the MAC address of
> >>> 00:00:00:00:00:00 to the VF. After the migration, you assign the same
> >>> MAC address as that of the virtio_net device to the VF, and hotadd the VF
> >> This was what I was wondering.
> >>
> >> How is the MAC address configured for a VF (or any NIC such as a PF) after it is
> >> assigned to vfio?
> > 
> > ip link set <pf> vf  <vf-num>  mac <MAC>
> > 
> > See https://www.kernel.org/doc/html/latest/networking/net_failover.html
> > for a sample script that shows the steps to initiate live migration with VF 
> > and virtio-net in standby mode.
> 
> Thank you very much for the help!
> 
> Sorry that I did not ask the question in the right way.
> 
> Although I was talking about VF, I would like to passthrough the entire PF (with
> sriov_numvfs=0) to guest VM.
> 
> In this situation, I am not able to configure the MAC address when the entire PF
> (or NIC) is assigned to VFIO. <pf> does not exist, as it belongs to VFIO.
> 
> Thank you very much!
> 
> Dongli Zhang

I think that this mode isn't a good fit for the current (MAC address
based) failover. There was talk about supporting other ways to match
devices for failover, but no one has implemented the driver changes
required for this yet.


> > 
> > 
> >> Thank you very much!
> >>
> >> Dongli Zhang
> >>
> >>
> >>> device to the VM. And then, after you receive the FAILOVER_PRIMARY_CHANGED
> >>> event, set the macvtap device to down state.
> >>>
> >>> Venu
> >>>
> >>>> I am trying to play with this with only qemu (w/o libvirt).
> >>>>
> >>>> Thank you very much!
> >>>>
> >>>> Dongli Zhang
> >>>>
> >>>> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
> >>>>> From: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >>>>>
> >>>>> This feature bit can be used by a hypervisor to indicate to the virtio_net
> >>>>> device that it can act as a standby for another device with the same MAC
> >>>>> address.
> >>>>>
> >>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >>>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> >>>>> ---
> >>>>>  hw/net/virtio-net.c | 2 ++
> >>>>>  1 file changed, 2 insertions(+)
> >>>>>
> >>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>>> index 385b1a0..411f8fb 100644
> >>>>> --- a/hw/net/virtio-net.c
> >>>>> +++ b/hw/net/virtio-net.c
> >>>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> >>>>>                       true),
> >>>>>      DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >>>>>      DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> >>>>> +    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> >>>>> +                      false),
> >>>>>      DEFINE_PROP_END_OF_LIST(),
> >>>>>  };
> >>>>>  
> >>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration
  2019-01-08  2:25       ` Michael S. Tsirkin
@ 2019-01-09  4:55         ` si-wei liu
  -1 siblings, 0 replies; 57+ messages in thread
From: si-wei liu @ 2019-01-09  4:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Marcel Apfelbaum, virtio-dev, qemu-devel, Liran Alon



On 1/7/2019 6:25 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 05:45:22PM -0800, si-wei liu wrote:
>> On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:
>>> On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
>>>> Implement the infrastructure to support datapath switching during live
>>>> migration involving SR-IOV devices.
>>>>
>>>> 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
>>>>      bit and MAC address device pairing.
>>>>
>>>> 2. This set of events will be consumed by userspace management software
>>>>      to orchestrate all the hot plug and datapath switching activities.
>>>>      This scheme has the least QEMU modifications while allowing userspace
>>>>      software to build its own intelligence to control the whole process
>>>>      of SR-IOV live migration.
>>>>
>>>> 3. While the hidden device model (viz. coupled device model) is still
>>>>      being explored for automatic hot plugging (QEMU) and automatic datapath
>>>>      switching (host-kernel), this series provides a supplemental set
>>>>      of interfaces if management software wants to drive the SR-IOV live
>>>>      migration on its own. It should not conflict with the hidden device
>>>>      model but just offers simplicity of implementation.
>>>>
>>>>
>>>> Si-Wei Liu (2):
>>>>     vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
>>>>     pci: query command extension to check the bus master enabling status of the failover-primary device
>>>>
>>>> Sridhar Samudrala (1):
>>>>     virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
>>>>
>>>> Venu Busireddy (2):
>>>>     virtio_net: Add support for "Data Path Switching" during Live Migration.
>>>>     virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
>>>>
>>>> ---
>>>> Changes in v3:
>>>>     Fix issues with coding style in patch 3/5.
>>>>
>>>> Changes in v2:
>>>>     Added a query command for FAILOVER_STANDBY_CHANGED event.
>>>>     Added a query command for FAILOVER_PRIMARY_CHANGED event.
>>> Hmm it looks like all feedback I sent e.g. here:
>>> https://patchwork.kernel.org/patch/10721571/
>>> got ignored.
>>>
>>> To summarize I suggest reworking the series adding a new command along
>>> the lines of (naming is up to you):
>>>
>>> query-pci-master - this returns status for a device
>>> 		   and enables a *single* event after
>>> 		   it changes
>>>
>>> and then removing all status data from the event,
>>> just notify about the change and *only once*.
>> Why removing all status data from the event?
> To make sure users do not forget to call query-pci-master to
> re-enable more events.
IMO the FAILOVER_PRIMARY_CHANGED event is on the performance path; it's 
overkill to enforce a round-trip query for each event in normal situations.
>> It does not hurt to keep them
>> as the FAILOVER_PRIMARY_CHANGED event in general is of pretty low-frequency.
> A malicious guest can make it as frequent as it wants to.
> OTOH there is no way to limit.
Would throttling the event rate (say, limiting it to no more than 1 event per 
second) be a way to limit it (as opposed to controlling guest behavior)? The 
other similar events that apply rate limiting don't suppress event 
emission until the next query at all. Doing so would just cause more 
events to be missed. As stated in the earlier example, we should give the guest 
NIC a chance to flush queued packets even if the ending state is the same 
between two events.

>> As can be seen other similar low-frequent QMP events do have data carried
>> over.
>>
>> As this event relates to datapath switching, there's implication to coalesce
>> events as packets might not get a chance to send out as nothing would ever
>> happen when  going through quick transitions like
>> disabled->enabled->disabled. I would allow at least few packets to be sent
>> over wire rather than nothing. Who knows how fast management can react and
>> consume these events?
>>
>> Thanks,
>> -Siwei
> OK if it's so important for latency let's include data in the event.
> Please add comments explaining that you must always re-run query
> afterwards to make sure it's stable and re-enable more events.
I can add comments describing why we need to carry data in the event, 
and apply rate limiting to events. But I don't follow why events must be 
suppressed until the next query.


Thanks,
-Siwei

>
>
>>> 	
>>>
>>> upon event management does query-pci-master
>>> and acts accordingly.
>>>
>>>
>>>
>>>
>>>>    hmp.c                          |   5 +++
>>>>    hw/acpi/pcihp.c                |  27 +++++++++++
>>>>    hw/net/virtio-net.c            |  42 +++++++++++++++++
>>>>    hw/pci/pci.c                   |   5 +++
>>>>    hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
>>>>    hw/vfio/pci.h                  |   1 +
>>>>    include/hw/pci/pci.h           |   1 +
>>>>    include/hw/virtio/virtio-net.h |   1 +
>>>>    include/net/net.h              |   2 +
>>>>    net/net.c                      |  61 +++++++++++++++++++++++++
>>>>    qapi/misc.json                 |   5 ++-
>>>>    qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
>>>>    12 files changed, 309 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration
  2019-01-09  4:55         ` si-wei liu
@ 2019-01-09 13:39           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-09 13:39 UTC (permalink / raw)
  To: si-wei liu
  Cc: Venu Busireddy, Marcel Apfelbaum, virtio-dev, qemu-devel, Liran Alon

On Tue, Jan 08, 2019 at 08:55:35PM -0800, si-wei liu wrote:
> 
> 
> On 1/7/2019 6:25 PM, Michael S. Tsirkin wrote:
> 
>     On Mon, Jan 07, 2019 at 05:45:22PM -0800, si-wei liu wrote:
> 
>         On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:
> 
>             On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
> 
>                 Implement the infrastructure to support datapath switching during live
>                 migration involving SR-IOV devices.
> 
>                 1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
>                     bit and MAC address device pairing.
> 
>                 2. This set of events will be consumed by userspace management software
>                     to orchestrate all the hot plug and datapath switching activities.
>                     This scheme has the least QEMU modifications while allowing userspace
>                     software to build its own intelligence to control the whole process
>                     of SR-IOV live migration.
> 
>                 3. While the hidden device model (viz. coupled device model) is still
>                     being explored for automatic hot plugging (QEMU) and automatic datapath
>                     switching (host-kernel), this series provides a supplemental set
>                     of interfaces if management software wants to drive the SR-IOV live
>                     migration on its own. It should not conflict with the hidden device
>                     model but just offers simplicity of implementation.
> 
> 
>                 Si-Wei Liu (2):
>                    vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
>                    pci: query command extension to check the bus master enabling status of the failover-primary device
> 
>                 Sridhar Samudrala (1):
>                    virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
> 
>                 Venu Busireddy (2):
>                    virtio_net: Add support for "Data Path Switching" during Live Migration.
>                    virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
> 
>                 ---
>                 Changes in v3:
>                    Fix issues with coding style in patch 3/5.
> 
>                 Changes in v2:
>                    Added a query command for FAILOVER_STANDBY_CHANGED event.
>                    Added a query command for FAILOVER_PRIMARY_CHANGED event.
> 
>             Hmm it looks like all feedback I sent e.g. here:
>             https://patchwork.kernel.org/patch/10721571/
>             got ignored.
> 
>             To summarize I suggest reworking the series adding a new command along
>             the lines of (naming is up to you):
> 
>             query-pci-master - this returns status for a device
>                                and enables a *single* event after
>                                it changes
> 
>             and then removing all status data from the event,
>             just notify about the change and *only once*.
> 
>         Why removing all status data from the event?
> 
>     To make sure users do not forget to call query-pci-master to
>     re-enable more events.
> 
> IMO the FAILOVER_PRIMARY_CHANGED event is on the performance path, it's an
> overkill to enforce round trip query for each event in normal situations.
> 
>         It does not hurt to keep them
>         as the FAILOVER_PRIMARY_CHANGED event in general is of pretty low-frequency.
> 
>     A malicious guest can make it as frequent as it wants to.
>     OTOH there is no way to limit.
> 
> Will throttle the event rate (say, limiting to no more than 1 event per second)

And if the guest *does* need the switch because e.g. attaching xdp wants to
reset the card?

> a way to limit (as opposed to control guest behavior) ? The other similar
> events that apply rate limiting don't suppress event emission until the next
> query at all.

We have some problematic interfaces already, that's true.

> Doing so would just cause more events missing. As stated in the
> earlier example, we should give guest NIC a chance to flush queued packets even
> if ending state is same between two events.

I haven't seen that requirement. I guess a reset just stops processing
buffers rather than flush. Care to repeat?

> 
>         As can be seen other similar low-frequent QMP events do have data carried
>         over.
> 
>         As this event relates to datapath switching, there's implication to coalesce
>         events as packets might not get a chance to send out as nothing would ever
>         happen when  going through quick transitions like
>         disabled->enabled->disabled. I would allow at least few packets to be sent
>         over wire rather than nothing. Who knows how fast management can react and
>         consume these events?
> 
>         Thanks,
>         -Siwei
> 
>     OK if it's so important for latency let's include data in the event.
>     Please add comments explaining that you must always re-run query
>     afterwards to make sure it's stable and re-enable more events.
> 
> I can add comments describing why we need to carry data in the event, and apply
> rate limiting to events. But I don't follow why it must suppress event until
> next query.

Rate limiting is fundamentally broken.
Try a stress of resets and there goes your promise of low downtime.

Let me try to re-state: state query is fundamentally required
because otherwise e.g. management restarts do not work.
And it is much better to force management to run query every event
and every restart than just hope it handles racy corner cases correctly.

If we do include data in the event then there is no real latency cost to
that, since management can take action in response to the event then do
the query asynchronously at leisure.
So let me turn it around and say I don't follow why you have objections
to blocking following events until query.


> 
> Thanks,
> -Siwei
> 
> 
> 
> 
> 
> 
> 
>             upon event management does query-pci-master
>             and acts accordingly.
> 
> 
> 
> 
> 
>                   hmp.c                          |   5 +++
>                   hw/acpi/pcihp.c                |  27 +++++++++++
>                   hw/net/virtio-net.c            |  42 +++++++++++++++++
>                   hw/pci/pci.c                   |   5 +++
>                   hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
>                   hw/vfio/pci.h                  |   1 +
>                   include/hw/pci/pci.h           |   1 +
>                   include/hw/virtio/virtio-net.h |   1 +
>                   include/net/net.h              |   2 +
>                   net/net.c                      |  61 +++++++++++++++++++++++++
>                   qapi/misc.json                 |   5 ++-
>                   qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
>                   12 files changed, 309 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
@ 2019-01-09 13:39     ` Cornelia Huck
  -1 siblings, 0 replies; 57+ messages in thread
From: Cornelia Huck @ 2019-01-09 13:39 UTC (permalink / raw)
  To: Venu Busireddy
  Cc: Michael S. Tsirkin, Marcel Apfelbaum, virtio-dev, qemu-devel

On Mon,  7 Jan 2019 17:29:41 -0500
Venu Busireddy <venu.busireddy@oracle.com> wrote:

> Added a new event, FAILOVER_STANDBY_CHANGED, which is emitted whenever
> the status of the virtio_net driver in the guest changes (either the
> guest successfully loads the driver after the F_STANDBY feature bit
> is negotiated, or the guest unloads the driver or reboots). Management
> stack can use this event to determine when to plug/unplug the VF device
> to/from the guest.
> 
> Also, the Virtual Functions will be automatically removed from the guest
> if the guest is rebooted. To properly identify the VFIO devices that
> must be removed, a new property named "failover-primary" is added to
> the vfio-pci devices. Only the vfio-pci devices that have this property
> enabled are removed from the guest upon reboot.
> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---
>  hw/acpi/pcihp.c      | 27 +++++++++++++++++++++++++++
>  hw/net/virtio-net.c  | 24 ++++++++++++++++++++++++
>  hw/vfio/pci.c        |  3 +++
>  hw/vfio/pci.h        |  1 +
>  include/hw/pci/pci.h |  1 +
>  qapi/net.json        | 28 ++++++++++++++++++++++++++++
>  6 files changed, 84 insertions(+)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 80d42e1..2a3ffd3 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>      }
>  }
>  
> +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
> +{
> +    BusChild *kid, *next;
> +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
> +
> +    if (!bus) {
> +        return;
> +    }
> +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
> +        DeviceState *qdev = kid->child;
> +        PCIDevice *pdev = PCI_DEVICE(qdev);
> +        int slot = PCI_SLOT(pdev->devfn);
> +
> +        if (pdev->failover_primary) {
> +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
> +        }
> +    }
> +}
> +
>  static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
>  {
>      BusChild *kid, *next;
> @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
>      int i;
>  
>      for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
> +        /*
> +         * Set the acpi_pcihp_pci_status[].down bits of all the
> +         * failover_primary devices so that the devices are ejected
> +         * from the guest. We can't use the qdev_unplug() as well as the
> +         * hotplug_handler to unplug the devices, because the guest may
> +         * not be in a state to cooperate.
> +         */
> +        acpi_pcihp_cleanup_failover_primary(s, i);
>          acpi_pcihp_update_hotplug_bus(s, i);
>      }
>  }


It seems that you rely on ACPI to get the processing right. On a
non-ACPI system, the required changes won't be made. Maybe only
advertise the failover feature if you are actually on a system that
supports handling of the primary correctly (which, at least currently,
means a system with ACPI)?

> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 411f8fb..7b1bcde 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -248,6 +248,29 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
>      }
>  }
>  
> +static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> +
> +    if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
> +        const char *ncn = n->netclient_name;
> +        gchar *path = object_get_canonical_path(OBJECT(n->qdev));
> +        /*
> +         * Emit FAILOVER_STANDBY_CHANGED event with enabled=true
> +         *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
> +         * Emit FAILOVER_STANDBY_CHANGED event with enabled=false
> +         *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
> +         */
> +        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +                (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
> +            qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
> +        } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
> +                (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +            qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
> +        }

Do you also need a notification if something goes wrong in the guest
and it sets VIRTIO_CONFIG_S_FAILED?
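
A minimal sketch of how the transition check could also cover the FAILED
case, assuming a status that carries VIRTIO_CONFIG_S_FAILED should count as
"driver not usable". The enum and the function name are hypothetical and not
part of the posted patch; only the two status bit values come from the virtio
specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_DRIVER_OK 0x04
#define VIRTIO_CONFIG_S_FAILED    0x80

/* Hypothetical result of a status write; not part of the posted patch. */
enum standby_event {
    STANDBY_EVENT_NONE,
    STANDBY_EVENT_ENABLED,
    STANDBY_EVENT_DISABLED,
};

/* A driver counts as usable only with DRIVER_OK set and FAILED clear. */
static bool status_usable(uint8_t status)
{
    return (status & VIRTIO_CONFIG_S_DRIVER_OK) &&
           !(status & VIRTIO_CONFIG_S_FAILED);
}

/*
 * Compare the status the device currently holds (old_status) with the one
 * the guest is writing (new_status) and decide which event, if any, a
 * caller would emit.
 */
static enum standby_event standby_classify(uint8_t old_status,
                                           uint8_t new_status)
{
    bool was_ok = status_usable(old_status);
    bool is_ok = status_usable(new_status);

    if (!was_ok && is_ok) {
        return STANDBY_EVENT_ENABLED;
    }
    if (was_ok && !is_ok) {
        return STANDBY_EVENT_DISABLED;
    }
    return STANDBY_EVENT_NONE;
}
```

With this shape, a running driver that sets FAILED while DRIVER_OK is still
set would be reported the same way as a driver unload.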

> +    }
> +}
> +
>  static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
@ 2019-01-09 15:56     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-09 15:56 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Marcel Apfelbaum, virtio-dev, qemu-devel

On Mon, Jan 07, 2019 at 05:29:41PM -0500, Venu Busireddy wrote:
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 80d42e1..2a3ffd3 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>      }
>  }
>  
> +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
> +{
> +    BusChild *kid, *next;
> +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
> +
> +    if (!bus) {
> +        return;
> +    }
> +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
> +        DeviceState *qdev = kid->child;
> +        PCIDevice *pdev = PCI_DEVICE(qdev);
> +        int slot = PCI_SLOT(pdev->devfn);
> +
> +        if (pdev->failover_primary) {
> +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
> +        }
> +    }
> +}
> +
>  static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
>  {
>      BusChild *kid, *next;

So the result here will be that the device will be deleted completely,
and will not reappear after guest reboot.
I don't think this is what we wanted.
I think we wanted a special state that will hide the device from the guest
until the guest acks the failover bit.
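
A minimal sketch of such a "hidden until acked" state, assuming the primary
is hidden again on every system reset and revealed only once the guest has
negotiated VIRTIO_NET_F_STANDBY. The structure and both function names are
hypothetical; this series has no such state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of the standby feature in the virtio-net feature set. */
#define VIRTIO_NET_F_STANDBY 62

/* Hypothetical per-primary record; this series defines no such structure. */
struct primary_dev {
    bool hidden; /* true: the guest must not see the device on the bus */
};

/* On system reset, hide the primary until the next feature negotiation. */
static void primary_on_reset(struct primary_dev *p)
{
    p->hidden = true;
}

/* Called when the guest has finished negotiating features on the standby. */
static void primary_on_features(struct primary_dev *p, uint64_t guest_features)
{
    if (guest_features & (UINT64_C(1) << VIRTIO_NET_F_STANDBY)) {
        p->hidden = false;
    }
}
```

A guest that never acks the bit, for example an older kernel, would then
never see the primary at all.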


> @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
>      int i;
>  
>      for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
> +        /*
> +         * Set the acpi_pcihp_pci_status[].down bits of all the
> +         * failover_primary devices so that the devices are ejected
> +         * from the guest. We can't use the qdev_unplug() as well as the
> +         * hotplug_handler to unplug the devices, because the guest may
> +         * not be in a state to cooperate.
> +         */
> +        acpi_pcihp_cleanup_failover_primary(s, i);
>          acpi_pcihp_update_hotplug_bus(s, i);
>      }
>  }

I really don't want ACPI to know anything about failover.

All that needs to happen is sending a device delete request
to the guest. That should work with any hotplug removal mechanism:
standard PCI hotplug, ACPI, etc.
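
A minimal sketch of that separation, assuming each hotplug controller
exposes one "request unplug" callback and the failover code only ever calls
through it. All types and names here are hypothetical stand-ins and are not
QEMU's hotplug handler API.

```c
#include <stdbool.h>
#include <stddef.h>

struct hp_dev;

/* Hypothetical controller interface: one instance per hotplug mechanism. */
struct hp_ops {
    const char *name;
    void (*request_unplug)(struct hp_dev *dev);
};

struct hp_dev {
    const struct hp_ops *ops; /* controller the device sits behind */
    bool failover_primary;
    bool unplug_requested;
};

/* Stand-ins for mechanism-specific code (ACPI, PCIe native hotplug). */
static void acpi_request_unplug(struct hp_dev *dev)
{
    dev->unplug_requested = true;
}

static void pcie_request_unplug(struct hp_dev *dev)
{
    dev->unplug_requested = true;
}

static const struct hp_ops acpi_ops = { "acpi", acpi_request_unplug };
static const struct hp_ops pcie_ops = { "pcie-native", pcie_request_unplug };

/*
 * The failover code knows nothing about the mechanism: it asks whichever
 * controller owns each failover primary to request removal. Returns the
 * number of requests sent.
 */
static size_t failover_request_unplug(struct hp_dev *devs, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].failover_primary && devs[i].ops) {
            devs[i].ops->request_unplug(&devs[i]);
            sent++;
        }
    }
    return sent;
}
```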


-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-09 15:56     ` [virtio-dev] " Michael S. Tsirkin
@ 2019-01-11  2:09       ` si-wei liu
  -1 siblings, 0 replies; 57+ messages in thread
From: si-wei liu @ 2019-01-11  2:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Venu Busireddy
  Cc: Marcel Apfelbaum, virtio-dev, qemu-devel



On 01/09/2019 07:56 AM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 05:29:41PM -0500, Venu Busireddy wrote:
>> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
>> index 80d42e1..2a3ffd3 100644
>> --- a/hw/acpi/pcihp.c
>> +++ b/hw/acpi/pcihp.c
>> @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>>       }
>>   }
>>   
>> +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
>> +{
>> +    BusChild *kid, *next;
>> +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
>> +
>> +    if (!bus) {
>> +        return;
>> +    }
>> +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
>> +        DeviceState *qdev = kid->child;
>> +        PCIDevice *pdev = PCI_DEVICE(qdev);
>> +        int slot = PCI_SLOT(pdev->devfn);
>> +
>> +        if (pdev->failover_primary) {
>> +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
>> +        }
>> +    }
>> +}
>> +
>>   static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
>>   {
>>       BusChild *kid, *next;
> So the result here will be that device will be deleted completely,
> and will not reappear after guest reboot.
The management stack will replug the VF once it sees the STANDBY_CHANGED 
"enabled" event, which is emitted after the guest driver finishes feature 
negotiation and sets driver_ok.
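
A minimal sketch of the management-side decision described here, assuming
the management stack tracks whether the VF is currently plugged and reacts
to the "enabled" flag carried by the event. The enum and the function name
are hypothetical; a real management stack would issue the corresponding
hot plug or unplug commands instead of returning a value.

```c
#include <stdbool.h>

/* Hypothetical action the management stack takes in response to an event. */
enum mgmt_action {
    MGMT_NONE,
    MGMT_PLUG_VF,
    MGMT_UNPLUG_VF,
};

/*
 * enabled:    value carried by the STANDBY_CHANGED event
 * vf_plugged: whether the VF is currently attached to the guest
 */
static enum mgmt_action mgmt_on_standby_changed(bool enabled, bool vf_plugged)
{
    if (enabled && !vf_plugged) {
        return MGMT_PLUG_VF;   /* driver is ready: give it the fast path */
    }
    if (!enabled && vf_plugged) {
        return MGMT_UNPLUG_VF; /* driver went away: release the VF */
    }
    return MGMT_NONE;          /* state already matches the event */
}
```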

> I don't think this is what we wanted.
> I think we wanted a special state that will hide device from guest until
> guest acks the failover bit.
What do we get by hiding? On the next reboot after a system reset, the guest 
may load an older OS instance that does not support standby. Can the VF not 
be unplugged then?

The model we adopt here doesn't pair virtio with the VF at the QEMU level. 
If the VF isn't being used by the guest, it would make sense to notify 
management to release the VF anyway.

>
>
>> @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
>>       int i;
>>   
>>       for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
>> +        /*
>> +         * Set the acpi_pcihp_pci_status[].down bits of all the
>> +         * failover_primary devices so that the devices are ejected
>> +         * from the guest. We can't use the qdev_unplug() as well as the
>> +         * hotplug_handler to unplug the devices, because the guest may
>> +         * not be in a state to cooperate.
>> +         */
>> +        acpi_pcihp_cleanup_failover_primary(s, i);
>>           acpi_pcihp_update_hotplug_bus(s, i);
>>       }
>>   }
> I really don't want acpi to know anything about failover.
>
> All that needs to happen is sending a device delete request
> to guest. Should work with any hotplug removal:
> pci standard,acpi, etc.
>
As the code comment above indicates, we uncovered an issue where the 
guest may not be in a state to respond to interrupts during reboot. 
A management stack that runs fast enough is supposed to do this 
graceful hot-plug removal upon receiving the STANDBY_CHANGED "disabled" 
event. However, if the management stack is unable to do so, the code here 
makes sure the VF is deleted and won't be seen by an older kernel 
after reboot.
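
A minimal standalone sketch of that fallback, mirroring the loop in the
patch: every device flagged as a failover primary gets its slot bit set in
a "down" mask, which the eject logic then consumes on reset. The structure
and the function name are simplified, hypothetical stand-ins for the QEMU
types used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PCI device on one hotplug bus. */
struct slot_dev {
    unsigned slot;         /* PCI slot number, 0..31 */
    bool failover_primary; /* set via the "failover-primary" property */
};

/* Build the mask of slots to eject, one bit per failover primary. */
static uint32_t failover_down_mask(const struct slot_dev *devs, size_t n)
{
    uint32_t down = 0;

    for (size_t i = 0; i < n; i++) {
        assert(devs[i].slot < 32); /* a PCI bus has at most 32 slots */
        if (devs[i].failover_primary) {
            down |= UINT32_C(1) << devs[i].slot;
        }
    }
    return down;
}
```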

-Siwei

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-11  2:09       ` [virtio-dev] " si-wei liu
@ 2019-01-11  3:20         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2019-01-11  3:20 UTC (permalink / raw)
  To: si-wei liu; +Cc: Venu Busireddy, Marcel Apfelbaum, virtio-dev, qemu-devel

On Thu, Jan 10, 2019 at 06:09:23PM -0800, si-wei liu wrote:
> 
> 
> On 01/09/2019 07:56 AM, Michael S. Tsirkin wrote:
> > On Mon, Jan 07, 2019 at 05:29:41PM -0500, Venu Busireddy wrote:
> > > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > > index 80d42e1..2a3ffd3 100644
> > > --- a/hw/acpi/pcihp.c
> > > +++ b/hw/acpi/pcihp.c
> > > @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
> > >       }
> > >   }
> > > +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
> > > +{
> > > +    BusChild *kid, *next;
> > > +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
> > > +
> > > +    if (!bus) {
> > > +        return;
> > > +    }
> > > +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
> > > +        DeviceState *qdev = kid->child;
> > > +        PCIDevice *pdev = PCI_DEVICE(qdev);
> > > +        int slot = PCI_SLOT(pdev->devfn);
> > > +
> > > +        if (pdev->failover_primary) {
> > > +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
> > > +        }
> > > +    }
> > > +}
> > > +
> > >   static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
> > >   {
> > >       BusChild *kid, *next;
> > So the result here will be that device will be deleted completely,
> > and will not reappear after guest reboot.
> The management stack will replug the VF until seeing the STANDBY_CHANGED
> "enabled" event after guest driver finishes feature negotiation and sets
> driver_ok.
>
> > I don't think this is what we wanted.
> > I think we wanted a special state that will hide device from guest until
> > guest acks the failover bit.
> What do we get by hiding? On the next reboot after system reset guest may
> load an older OS instance without standby advertised. The VF can't be
> plugged out then?
> 
> The model we adopt here doesn't pair virtio with VF in the QEMU level. If
> the VF isn't being used by guest, it would make sense to notify management
> to release VF anyways.

Hmm it's different from what I envisioned and more work for management,
but maybe it's ok ... I will need to think about it.

> > 
> > 
> > > @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
> > >       int i;
> > >       for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
> > > +        /*
> > > +         * Set the acpi_pcihp_pci_status[].down bits of all the
> > > +         * failover_primary devices so that the devices are ejected
> > > +         * from the guest. We can't use the qdev_unplug() as well as the
> > > +         * hotplug_handler to unplug the devices, because the guest may
> > > +         * not be in a state to cooperate.
> > > +         */
> > > +        acpi_pcihp_cleanup_failover_primary(s, i);
> > >           acpi_pcihp_update_hotplug_bus(s, i);
> > >       }
> > >   }
> > I really don't want acpi to know anything about failover.
> > 
> > All that needs to happen is sending a device delete request
> > to guest. Should work with any hotplug removal:
> > pci standard,acpi, etc.
> > 
> As the code comments above indicated, there was issue uncovered that the
> guest may not be in a state to respond to interrupt during reboot.

If you request removal, the hotplug machinery will normally eject
the device on system reset. You need to request it early enough, though.
I guess the lack of an early enough request is what happened.
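
A minimal sketch of the ordering issue, assuming a slot records a pending
removal request and the reset path ejects every slot with a request pending
at that moment. The structure and function names are hypothetical and only
model the timing, not any particular hotplug controller.

```c
#include <stdbool.h>

/* Hypothetical hotplug slot state. */
struct slot {
    bool present;           /* device is visible to the guest */
    bool removal_requested; /* a removal request is pending */
};

/* Record a removal request; a cooperating guest would ack it later. */
static void slot_request_removal(struct slot *s)
{
    if (s->present) {
        s->removal_requested = true;
    }
}

/* On system reset, eject every device whose removal is already pending. */
static void slot_system_reset(struct slot *s)
{
    if (s->removal_requested) {
        s->present = false;
        s->removal_requested = false;
    }
}
```

A request that arrives only after the reset has been processed leaves the
device present until the next reset, which is the "too late" case.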

> Actually
> management stack running fast enough is supposed to do this graceful hot
> plug removal upon receiving the STANDBY_CHANGED "disabled" event. However,
> if management stack's unable to do so, the code here makes sure the VF can
> be deleted and won't be seen by an older kernel after reboot.
> 
> -Siwei

I'm sorry, I don't understand. On a system with PCIe native hotplug,
poking at ACPI is just wrong.


-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 0/5] Support for datapath switching during live migration
  2019-01-09 13:39           ` Michael S. Tsirkin
@ 2019-01-11  6:57             ` si-wei liu
  -1 siblings, 0 replies; 57+ messages in thread
From: si-wei liu @ 2019-01-11  6:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Venu Busireddy, Marcel Apfelbaum, virtio-dev, qemu-devel, Liran Alon



On 01/09/2019 05:39 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 08, 2019 at 08:55:35PM -0800, si-wei liu wrote:
>>
>> On 1/7/2019 6:25 PM, Michael S. Tsirkin wrote:
>>
>>      On Mon, Jan 07, 2019 at 05:45:22PM -0800, si-wei liu wrote:
>>
>>          On 01/07/2019 03:32 PM, Michael S. Tsirkin wrote:
>>
>>              On Mon, Jan 07, 2019 at 05:29:39PM -0500, Venu Busireddy wrote:
>>
>>                  Implement the infrastructure to support datapath switching during live
>>                  migration involving SR-IOV devices.
>>
>>                  1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
>>                      bit and MAC address device pairing.
>>
>>                  2. This set of events will be consumed by userspace management software
>>                      to orchestrate all the hot plug and datapath switching activities.
>>                      This scheme has the least QEMU modifications while allowing userspace
>>                      software to build its own intelligence to control the whole process
>>                      of SR-IOV live migration.
>>
>>                  3. While the hidden device model (viz. coupled device model) is still
>>                      being explored for automatic hot plugging (QEMU) and automatic datapath
>>                      switching (host-kernel), this series provides a supplemental set
>>                      of interfaces if management software wants to drive the SR-IOV live
>>                      migration on its own. It should not conflict with the hidden device
>>                      model but just offers simplicity of implementation.
>>
>>
>>
>>                  Si-Wei Liu (2):
>>                     vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
>>                     pci: query command extension to check the bus master enabling status of the failover-primary device
>>
>>                  Sridhar Samudrala (1):
>>                     virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
>>
>>                  Venu Busireddy (2):
>>                     virtio_net: Add support for "Data Path Switching" during Live Migration.
>>                     virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.
>>
>>                  ---
>>                  Changes in v3:
>>                     Fix issues with coding style in patch 3/5.
>>
>>                  Changes in v2:
>>                     Added a query command for FAILOVER_STANDBY_CHANGED event.
>>                     Added a query command for FAILOVER_PRIMARY_CHANGED event.
>>
>>              Hmm it looks like all feedback I sent e.g. here:
>>              https://patchwork.kernel.org/patch/10721571/
>>              got ignored.
>>
>>              To summarize I suggest reworking the series adding a new command along
>>              the lines of (naming is up to you):
>>
>>              query-pci-master - this returns status for a device
>>                                 and enables a *single* event after
>>                                 it changes
>>
>>              and then removing all status data from the event,
>>              just notify about the change and *only once*.
>>
>>          Why removing all status data from the event?
>>
>>      To make sure users do not forget to call query-pci-master to
>>      re-enable more events.
>>
>> IMO the FAILOVER_PRIMARY_CHANGED event is on the performance path, it's an
>> overkill to enforce round trip query for each event in normal situations.
>>
>>          It does not hurt to keep them
>>          as the FAILOVER_PRIMARY_CHANGED event in general is of pretty low-frequency.
>>
>>      A malicious guest can make it as frequent as it wants to.
>>      OTOH there is no way to limit.
>>
>> Will throttle the event rate (say, limiting to no more than 1 event per second)
> And if guest *does* need the switch because e.g. attaching xdp wants to
> reset the card?
A device reset while attaching an XDP program doesn't reset the PCI bus
master status, as far as I can see. Even if it did (say, with FLR or a
slot reset), the ending state would still be reflected in the event
(without having to do a query) when the state goes through quick
transitions. And suppressing event emission until the next query seems
to make the management stack perform worse in this case: it ends up
having to run the query command a couple of times to see whether the
status has settled.

>
>> a way to limit (as opposed to control guest behavior) ? The other similar
>> events that apply rate limiting don't suppress event emission until the next
>> query at all.
> We have some problematic interfaces already, that's true.
>
>> Doing so would just cause more events missing. As stated in the
>> earlier example, we should give guest NIC a chance to flush queued packets even
>> if ending state is same between two events.
> I haven't seen that requirement. I guess a reset just stops processing
> buffers rather than flush. Care to repeat?
If for some reason a VF driver runs into a hardware/firmware fault and
keeps resetting in an attempt to fix itself, wouldn't it be necessary
for failover to use the PV path temporarily? Without proper
interleaving, all packets can be dropped in the worst case. Severe
packet drops degrade network performance quite badly, and may even lead
to fatal errors, particularly if a storage device is hosted over the
network (e.g. iSCSI).

>
>>          As can be seen other similar low-frequent QMP events do have data carried
>>          over.
>>
>>          As this event relates to datapath switching, there's implication to coalesce
>>          events as packets might not get a chance to send out as nothing would ever
>>          happen when  going through quick transitions like
>>          disabled->enabled->disabled. I would allow at least few packets to be sent
>>          over wire rather than nothing. Who knows how fast management can react and
>>          consume these events?
>>
>>          Thanks,
>>          -Siwei
>>
>>      OK if it's so important for latency let's include data in the event.
>>      Please add comments explaining that you must always re-run query
>>      afterwards to make sure it's stable and re-enable more events.
>>
>> I can add comments describing why we need to carry data in the event, and apply
>> rate limiting to events. But I don't follow why it must suppress event until
>> next query.
> Rate limiting is fundamentally broken.
What is the issue with rate limiting? It seems a pretty good fit to me,
as otherwise the mgmt stack still needs to check the status
periodically at some slow rate when under a malicious attack.
> Try a stress of resets and there goes your promise of low downtime.
Is this something a normal user would do? I guess only a malicious guest
would. With this in mind, what's more important is to defend against the
attack by imposing a rate limiter, and downtime is not even a factor to
consider in this context. The point is that I view rate limiting as a
way to defend against malicious attacks, and it's a good trade-off as
resets don't happen very often in normal situations. And we can always
find a proper rate that balances both needs well.

BTW, there's no way to promise latency/downtime either way: everything
is best effort with the current datapath switching scheme, which
involves the userspace management stack. IMHO the only way to get some
latency/downtime guarantee is to have the guest kick off datapath
switching through a control queue message, which would cause the guest
to exit, and the host kernel would then take its turn to handle the MAC
filter movement in a synchronized context. Or we should wait for Intel
to fix their legacy drivers.

>
> Let me try to re-state: state query is fundamentally required
> because otherwise e.g. management restarts do not work.
Yes, we've added the query command for that purpose (handling management
restarts). There's no argument about this.

> And it is much better to force management to run query every event
> and every restart than just hope it handles racy corner cases correctly.
There's no way to handle those racy corner cases with what you
suggested. I see missing packets when switching promiscuous mode on
virtio-net, and I see no chance it can be improved with the current scheme.

>
> If we do include data in the event then there is no real latency cost to
> that, since management can take action in response to the event then do
> the query asynchronously at leisure.
> So let me turn it around and say I don't follow why you have objections
> to blocking following events until query.
See my explanations above.

thanks,
-Siwei
>
>
>> Thanks,
>> -Siwei
>>
>>
>>
>>
>>
>>
>>
>>              upon event management does query-pci-master
>>              and acts accordingly.
>>
>>
>>
>>
>>
>>                    hmp.c                          |   5 +++
>>                    hw/acpi/pcihp.c                |  27 +++++++++++
>>                    hw/net/virtio-net.c            |  42 +++++++++++++++++
>>                    hw/pci/pci.c                   |   5 +++
>>                    hw/vfio/pci.c                  |  60 +++++++++++++++++++++++++
>>                    hw/vfio/pci.h                  |   1 +
>>                    include/hw/pci/pci.h           |   1 +
>>                    include/hw/virtio/virtio-net.h |   1 +
>>                    include/net/net.h              |   2 +
>>                    net/net.c                      |  61 +++++++++++++++++++++++++
>>                    qapi/misc.json                 |   5 ++-
>>                    qapi/net.json                  | 100 +++++++++++++++++++++++++++++++++++++++++
>>                    12 files changed, 309 insertions(+), 1 deletion(-)
>>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-11  3:20         ` [virtio-dev] " Michael S. Tsirkin
@ 2019-01-11  7:09           ` si-wei liu
  -1 siblings, 0 replies; 57+ messages in thread
From: si-wei liu @ 2019-01-11  7:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, Venu Busireddy
  Cc: Marcel Apfelbaum, virtio-dev, qemu-devel



On 01/10/2019 07:20 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 10, 2019 at 06:09:23PM -0800, si-wei liu wrote:
>>
>> On 01/09/2019 07:56 AM, Michael S. Tsirkin wrote:
>>> On Mon, Jan 07, 2019 at 05:29:41PM -0500, Venu Busireddy wrote:
>>>> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
>>>> index 80d42e1..2a3ffd3 100644
>>>> --- a/hw/acpi/pcihp.c
>>>> +++ b/hw/acpi/pcihp.c
>>>> @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>>>>        }
>>>>    }
>>>> +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
>>>> +{
>>>> +    BusChild *kid, *next;
>>>> +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
>>>> +
>>>> +    if (!bus) {
>>>> +        return;
>>>> +    }
>>>> +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
>>>> +        DeviceState *qdev = kid->child;
>>>> +        PCIDevice *pdev = PCI_DEVICE(qdev);
>>>> +        int slot = PCI_SLOT(pdev->devfn);
>>>> +
>>>> +        if (pdev->failover_primary) {
>>>> +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>>    static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
>>>>    {
>>>>        BusChild *kid, *next;
>>> So the result here will be that device will be deleted completely,
>>> and will not reappear after guest reboot.
>> The management stack will replug the VF until seeing the STANDBY_CHANGED
>> "enabled" event after guest driver finishes feature negotiation and sets
>> driver_ok.
>>
>>> I don't think this is what we wanted.
>>> I think we wanted a special state that will hide device from guest until
>>> guest acks the failover bit.
>> What do we get by hiding? On the next reboot after system reset guest may
>> load an older OS instance without standby advertised. The VF can't be
>> plugged out then?
>>
>> The model we adopt here doesn't pair virtio with VF in the QEMU level. If
>> the VF isn't being used by guest, it would make sense to notify management
>> to release VF anyways.
> Hmm it's different from what I envisioned and more work for management,
> but maybe it's ok ... I will need to think about it.
>
>>>
>>>> @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
>>>>        int i;
>>>>        for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
>>>> +        /*
>>>> +         * Set the acpi_pcihp_pci_status[].down bits of all the
>>>> +         * failover_primary devices so that the devices are ejected
>>>> +         * from the guest. We can't use the qdev_unplug() as well as the
>>>> +         * hotplug_handler to unplug the devices, because the guest may
>>>> +         * not be in a state to cooperate.
>>>> +         */
>>>> +        acpi_pcihp_cleanup_failover_primary(s, i);
>>>>            acpi_pcihp_update_hotplug_bus(s, i);
>>>>        }
>>>>    }
>>> I really don't want acpi to know anything about failover.
>>>
>>> All that needs to happen is sending a device delete request
>>> to guest. Should work with any hotplug removal:
>>> pci standard,acpi, etc.
>>>
>> As the code comments above indicated, there was issue uncovered that the
>> guest may not be in a state to respond to interrupt during reboot.
> If you request removal then hotplug machinery normally will eject
> the device on system reset. You need to request it early enough though.
Given the asynchronous nature of interrupt injection and guest handling,
there's no way you can guarantee it's early enough, is there?

Surely that's why I said the event is in a "performance" path that has 
to be handled as fast as possible by management.

> I guess this missing is what happened.
>
>> Actually
>> management stack running fast enough is supposed to do this graceful hot
>> plug removal upon receiving the STANDBY_CHANGED "disabled" event. However,
>> if management stack's unable to do so, the code here makes sure the VF can
>> be deleted and won't be seen by an older kernel after reboot.
>>
>> -Siwei
> I'm sorry I don't understand.  On a system with PCIe native hotplug
> poking at ACPI is just wrong.
Venu, what's your plan for adding SHPC and PCIe native hotplug support?
People are starting to get confused. I did not see it mentioned in the
cover letter.

Thanks,
-Siwei

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2019-01-11  7:09           ` [virtio-dev] Re: [Qemu-devel] " si-wei liu
  (?)
@ 2019-01-14 11:10           ` Roman Kagan
  -1 siblings, 0 replies; 57+ messages in thread
From: Roman Kagan @ 2019-01-14 11:10 UTC (permalink / raw)
  To: si-wei liu
  Cc: Michael S. Tsirkin, Venu Busireddy, Marcel Apfelbaum, virtio-dev,
	qemu-devel

On Thu, Jan 10, 2019 at 11:09:09PM -0800, si-wei liu wrote:
> On 01/10/2019 07:20 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 10, 2019 at 06:09:23PM -0800, si-wei liu wrote:
> > > 
> > > On 01/09/2019 07:56 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Jan 07, 2019 at 05:29:41PM -0500, Venu Busireddy wrote:
> > > > > @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
> > > > >        int i;
> > > > >        for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
> > > > > +        /*
> > > > > +         * Set the acpi_pcihp_pci_status[].down bits of all the
> > > > > +         * failover_primary devices so that the devices are ejected
> > > > > +         * from the guest. We can't use the qdev_unplug() as well as the
> > > > > +         * hotplug_handler to unplug the devices, because the guest may
> > > > > +         * not be in a state to cooperate.
> > > > > +         */
> > > > > +        acpi_pcihp_cleanup_failover_primary(s, i);
> > > > >            acpi_pcihp_update_hotplug_bus(s, i);
> > > > >        }
> > > > >    }
> > > > I really don't want acpi to know anything about failover.
> > > > 
> > > > All that needs to happen is sending a device delete request
> > > > to guest. Should work with any hotplug removal:
> > > > pci standard,acpi, etc.
> > > > 
> > > As the code comments above indicated, there was issue uncovered that the
> > > guest may not be in a state to respond to interrupt during reboot.
> > If you request removal then hotplug machinery normally will eject
> > the device on system reset. You need to request it early enough though.
> With asynchronous nature of interrupt injection and guest handling, there's
> no way you can guarantee it's early enough, do you?

I wonder if this could be better addressed by an "eject-on-parent-reset"
or "eject-on-vm-reset" property that automatically ejects the device
when the parent bridge or the VM is reset, so that the device is in a
predictably unplugged state on every boot?

Roman.
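
The "eject-on-vm-reset" idea above can be illustrated with a minimal,
self-contained sketch. It is a toy model, not QEMU code: the Device and Vm
types, the eject_on_reset field, and every function name are hypothetical
and chosen only for illustration. It assumes, as stated earlier in the
thread, that a device with a pending removal request is also ejected at
system reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are QEMU APIs. */
typedef struct Device {
    const char *id;
    bool plugged;          /* device is currently visible to the guest */
    bool eject_on_reset;   /* the proposed property (assumed name) */
    bool unplug_requested; /* graceful removal requested, not yet acked */
} Device;

#define MAX_DEVICES 8

typedef struct Vm {
    Device *devices[MAX_DEVICES];
    size_t ndevices;
} Vm;

static void vm_add_device(Vm *vm, Device *dev)
{
    assert(vm->ndevices < MAX_DEVICES);
    vm->devices[vm->ndevices++] = dev;
}

/* Graceful path: asynchronous, completes only if the guest cooperates. */
static void device_request_unplug(Device *dev)
{
    dev->unplug_requested = true;
}

/* The guest acknowledges a pending request and the device goes away. */
static void device_guest_ack_unplug(Device *dev)
{
    if (dev->unplug_requested) {
        dev->plugged = false;
        dev->unplug_requested = false;
    }
}

/*
 * Reset path: eject every plugged device that either carries the
 * property or still has an unacknowledged removal request, without
 * relying on the guest. Returns the number of devices ejected.
 */
static size_t vm_reset(Vm *vm)
{
    size_t ejected = 0;

    for (size_t i = 0; i < vm->ndevices; i++) {
        Device *dev = vm->devices[i];

        if (dev->plugged &&
            (dev->eject_on_reset || dev->unplug_requested)) {
            dev->plugged = false;
            ejected++;
        }
        dev->unplug_requested = false;
    }
    return ejected;
}
```

In this model a VF marked with eject_on_reset is gone after every reset
whether or not management or the guest reacted in time, while a device
without the property stays plugged unless a removal request was pending.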

Thread overview: 57+ messages
2019-01-07 22:29 [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration Venu Busireddy
2019-01-07 22:29 ` [virtio-dev] " Venu Busireddy
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit Venu Busireddy
2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
2019-01-08 16:56   ` [Qemu-devel] " Dongli Zhang
2019-01-08 17:25     ` Venu Busireddy
2019-01-08 17:25       ` [virtio-dev] " Venu Busireddy
2019-01-09  0:14       ` Dongli Zhang
2019-01-09  0:18         ` Samudrala, Sridhar
2019-01-09  0:18           ` [virtio-dev] " Samudrala, Sridhar
2019-01-09  0:39           ` Dongli Zhang
2019-01-09  4:17             ` Michael S. Tsirkin
2019-01-09  4:17               ` [virtio-dev] " Michael S. Tsirkin
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
2019-01-09 13:39   ` [Qemu-devel] " Cornelia Huck
2019-01-09 13:39     ` Cornelia Huck
2019-01-09 15:56   ` [Qemu-devel] " Michael S. Tsirkin
2019-01-09 15:56     ` [virtio-dev] " Michael S. Tsirkin
2019-01-11  2:09     ` [Qemu-devel] " si-wei liu
2019-01-11  2:09       ` [virtio-dev] " si-wei liu
2019-01-11  3:20       ` Michael S. Tsirkin
2019-01-11  3:20         ` [virtio-dev] " Michael S. Tsirkin
2019-01-11  7:09         ` [Qemu-devel] [virtio-dev] " si-wei liu
2019-01-11  7:09           ` [virtio-dev] Re: [Qemu-devel] " si-wei liu
2019-01-14 11:10           ` [Qemu-devel] [virtio-dev] " Roman Kagan
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event Venu Busireddy
2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
2019-01-08  0:10   ` [Qemu-devel] " Michael S. Tsirkin
2019-01-08  0:10     ` [virtio-dev] " Michael S. Tsirkin
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover Venu Busireddy
2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
2019-01-07 23:17   ` [Qemu-devel] " Alex Williamson
2019-01-07 23:22     ` Michael S. Tsirkin
2019-01-07 23:22       ` [virtio-dev] " Michael S. Tsirkin
2019-01-07 23:41       ` Alex Williamson
2019-01-08  0:12         ` Michael S. Tsirkin
2019-01-08  0:12           ` [virtio-dev] " Michael S. Tsirkin
2019-01-08  0:24           ` Alex Williamson
2019-01-08  0:43             ` Michael S. Tsirkin
2019-01-08  0:43               ` [virtio-dev] " Michael S. Tsirkin
2019-01-08  1:13         ` si-wei liu
2019-01-08  1:13           ` [virtio-dev] " si-wei liu
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device Venu Busireddy
2019-01-07 22:29   ` [virtio-dev] " Venu Busireddy
2019-01-07 23:32 ` [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration Michael S. Tsirkin
2019-01-07 23:32   ` [virtio-dev] " Michael S. Tsirkin
2019-01-08  1:45   ` [Qemu-devel] " si-wei liu
2019-01-08  1:45     ` si-wei liu
2019-01-08  2:25     ` [Qemu-devel] " Michael S. Tsirkin
2019-01-08  2:25       ` Michael S. Tsirkin
2019-01-09  4:55       ` [Qemu-devel] " si-wei liu
2019-01-09  4:55         ` si-wei liu
2019-01-09 13:39         ` [Qemu-devel] " Michael S. Tsirkin
2019-01-09 13:39           ` Michael S. Tsirkin
2019-01-11  6:57           ` [Qemu-devel] " si-wei liu
2019-01-11  6:57             ` si-wei liu
