qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device
@ 2019-07-20 23:48 Sukrit Bhatnagar
  2019-07-20 23:48 ` [Qemu-devel] [RFC v3 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Sukrit Bhatnagar @ 2019-07-20 23:48 UTC
  To: qemu-devel; +Cc: Yuval Shaia

In v2, we achieved successful migration of the PCI and MSIX states as
well as various DMA addresses and ring page information.
This series adds migration of the GIDs used by the device.

We have switched to a setup with two hosts, each running a VM.
Migrations are now performed over the local network, which resolves the
same-host issue we had with libvirt.

We have also performed various ping-pong tests (ibv_rc_pingpong) in the
guest(s) after adding GID migration support; the current status is:
- ping-pong to localhost succeeds, both when performed before starting
  the migration and after the migration completes.
- ping-pong to a peer succeeds, before and after migration as above,
  provided that both VMs are running on, or have been migrated to, the
  same host. So, if two VMs were started on different hosts and one of
  them was migrated to the other host, the ping-pong succeeded.
  Similarly, if both VMs were migrated to the same host, the ping-pong
  succeeded after migration.
- ping-pong to a peer on a remote host does not work yet.

Our next goal is to achieve successful migration with live traffic.

This series can also be found at:
https://github.com/skrtbhtngr/qemu/tree/gsoc19


History:

v2 -> v3:
- remove struct PVRDMAMigTmp and VMSTATE_WITH_TMP
- use predefined PVRDMA_HW_NAME for the vmsd name
- add vmsd for gids and a gid table field in pvrdma_state
- perform gid registration in pvrdma_post_load
- define pvrdma_post_save to unregister gids in the source host

v1 -> v2:
- modify load_dsr() to make it idempotent
- switch to VMStateDescription
- add fields for PCI and MSIX state
- define a temporary struct PVRDMAMigTmp to use WITH_TMP macro
- perform mappings to CQ and event notification rings at load
- vmxnet3 issue solved by Marcel's patch
- BounceBuffer issue solved automatically by switching to VMStateDescription


Link(s) to v2:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01848.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01849.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01850.html

Link(s) to v1:
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04924.html
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04923.html

Sukrit Bhatnagar (2):
  hw/pvrdma: make DSR mapping idempotent in load_dsr()
  hw/pvrdma: add live migration support

 hw/rdma/vmw/pvrdma_main.c | 94 +++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 8 deletions(-)

-- 
2.21.0




* [Qemu-devel] [RFC v3 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr()
  2019-07-20 23:48 [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
@ 2019-07-20 23:48 ` Sukrit Bhatnagar
  2019-07-20 23:48 ` [Qemu-devel] [RFC v3 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
  2019-08-15 11:03 ` [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device Yuval Shaia
  2 siblings, 0 replies; 4+ messages in thread
From: Sukrit Bhatnagar @ 2019-07-20 23:48 UTC
  To: qemu-devel; +Cc: Yuval Shaia

Map the DSR only when no mapping has been done already, i.e. when
dev->dsr_info.dsr is NULL. This allows load_dsr() to be called to
perform the remaining mappings and ring initializations even when the
DSR has already been mapped elsewhere.

Move free_dsr() out of load_dsr() and call it separately wherever it is
needed. This handles the case where load_dsr() is called with the DSR
mapping already in place but the remaining map and init operations
still pending, without unmapping the DSR in the process.
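
For reference, here is a condensed view of the resulting logic, lifted
from the diff below; the caller that maps the DSR beforehand is the
migration post_load hook added in the next patch:

    /* load_dsr() now maps the DSR only when no mapping exists yet: */
    if (!dev->dsr_info.dsr) {
        dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
                                  sizeof(struct pvrdma_device_shared_region));
        if (!dev->dsr_info.dsr) {
            rdma_error_report("Failed to map to DSR");
            rc = -ENOMEM;
            goto out;
        }
    }

    /* The guest-write path drops any stale mapping itself before
     * reloading, whereas a caller that has already mapped the DSR
     * calls load_dsr() directly: */
    case PVRDMA_REG_DSRHIGH:
        trace_pvrdma_regs_write(addr, val, "DSRHIGH", "");
        dev->dsr_info.dma |= val << 32;
        free_dsr(dev);
        load_dsr(dev);
        init_dsr_dev_caps(dev);
        break;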

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
---
 hw/rdma/vmw/pvrdma_main.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index adcf79cd63..6c90db96f9 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -172,15 +172,15 @@ static int load_dsr(PVRDMADev *dev)
     DSRInfo *dsr_info;
     struct pvrdma_device_shared_region *dsr;
 
-    free_dsr(dev);
-
-    /* Map to DSR */
-    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
-                              sizeof(struct pvrdma_device_shared_region));
     if (!dev->dsr_info.dsr) {
-        rdma_error_report("Failed to map to DSR");
-        rc = -ENOMEM;
-        goto out;
+        /* Map to DSR */
+        dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
+                                  sizeof(struct pvrdma_device_shared_region));
+        if (!dev->dsr_info.dsr) {
+            rdma_error_report("Failed to map to DSR");
+            rc = -ENOMEM;
+            goto out;
+        }
     }
 
     /* Shortcuts */
@@ -402,6 +402,7 @@ static void pvrdma_regs_write(void *opaque, hwaddr addr, uint64_t val,
     case PVRDMA_REG_DSRHIGH:
         trace_pvrdma_regs_write(addr, val, "DSRHIGH", "");
         dev->dsr_info.dma |= val << 32;
+        free_dsr(dev);
         load_dsr(dev);
         init_dsr_dev_caps(dev);
         break;
-- 
2.21.0




* [Qemu-devel] [RFC v3 2/2] hw/pvrdma: add live migration support
  2019-07-20 23:48 [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
  2019-07-20 23:48 ` [Qemu-devel] [RFC v3 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
@ 2019-07-20 23:48 ` Sukrit Bhatnagar
  2019-08-15 11:03 ` [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device Yuval Shaia
  2 siblings, 0 replies; 4+ messages in thread
From: Sukrit Bhatnagar @ 2019-07-20 23:48 UTC
  To: qemu-devel; +Cc: Yuval Shaia

vmstate_pvrdma describes the PCI and MSIX states as well as the DMA
address of the DSR and the device's GID table.
vmstate_pvrdma_gids describes each GID in the GID table.

pvrdma_post_save() unregisters the GID entries from the backend device
on the source host.

pvrdma_post_load() maps the DSR using the loaded DMA address, registers
each loaded GID with the backend device, and finally calls load_dsr()
to perform the remaining mappings and ring init operations.
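
Both hooks walk the GID table the same way, skipping empty slots (a
slot is unused when its interface_id is zero). Condensed from the diff
below:

    for (i = 0; i < MAX_GIDS; i++) {
        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
            continue;   /* empty GID slot, nothing to transfer */
        }
        /* pvrdma_post_save(), on the source host: */
        rc = rdma_backend_del_gid(&dev->backend_dev,
                                  dev->backend_eth_device_name,
                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
        /* pvrdma_post_load(), on the destination, calls
         * rdma_backend_add_gid() with the same arguments instead. */
    }

The two hooks are needed because the backend GID registrations are
host-side resources, not guest-visible state that can be copied over
the migration stream: they must be released on the source in post_save
and recreated on the destination in post_load.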

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
---
 hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 6c90db96f9..6f8b56dea3 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -28,6 +28,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "hw/rdma/rdma.h"
+#include "migration/register.h"
 
 #include "../rdma_rm.h"
 #include "../rdma_backend.h"
@@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
     pvrdma_fini(pci_dev);
 }
 
+static int pvrdma_post_save(void *opaque)
+{
+    int i, rc;
+    PVRDMADev *dev = opaque;
+
+    for (i = 0; i < MAX_GIDS; i++) {
+
+        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
+            continue;
+        }
+        rc = rdma_backend_del_gid(&dev->backend_dev,
+                                   dev->backend_eth_device_name,
+                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
+        if (rc) {
+            return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
+static int pvrdma_post_load(void *opaque, int version_id)
+{
+    int i, rc;
+    PVRDMADev *dev = opaque;
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    DSRInfo *dsr_info = &dev->dsr_info;
+
+    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
+                                sizeof(struct pvrdma_device_shared_region));
+    if (!dsr_info->dsr) {
+        rdma_error_report("Failed to map to DSR");
+        return -ENOMEM;
+    }
+
+    for (i = 0; i < MAX_GIDS; i++) {
+
+        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
+            continue;
+        }
+
+        rc = rdma_backend_add_gid(&dev->backend_dev,
+                                  dev->backend_eth_device_name,
+                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
+        if (rc) {
+            return -EINVAL;
+        }
+    }
+
+    return load_dsr(dev);
+}
+
+static const VMStateDescription vmstate_pvrdma_gids = {
+    .name = "pvrdma-gids",
+    .fields = (VMStateField[]) {
+            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
+            VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_pvrdma = {
+    .name = PVRDMA_HW_NAME,
+    .post_save = pvrdma_post_save,
+    .post_load = pvrdma_post_load,
+    .fields = (VMStateField[]) {
+            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
+            VMSTATE_MSIX(parent_obj, PVRDMADev),
+            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
+            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
+                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
+                                 RdmaRmGid),
+            VMSTATE_END_OF_LIST()
+    }
+};
+
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
     int rc = 0;
@@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
 
     dc->desc = "RDMA Device";
     dc->props = pvrdma_dev_properties;
+    dc->vmsd = &vmstate_pvrdma;
     set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 
     ir->print_statistics = pvrdma_print_statistics;
-- 
2.21.0




* Re: [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device
  2019-07-20 23:48 [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
  2019-07-20 23:48 ` [Qemu-devel] [RFC v3 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
  2019-07-20 23:48 ` [Qemu-devel] [RFC v3 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
@ 2019-08-15 11:03 ` Yuval Shaia
  2 siblings, 0 replies; 4+ messages in thread
From: Yuval Shaia @ 2019-08-15 11:03 UTC
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Sun, Jul 21, 2019 at 05:18:01AM +0530, Sukrit Bhatnagar wrote:
> [...]
> Our next goal is to achieve successful migration with live traffic.

As this is a major milestone which enables live migration (still only
when there are no QPs), I believe we are OK for a patch.

Yuval



