* [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device
@ 2019-08-28 14:23 Sukrit Bhatnagar
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Sukrit Bhatnagar @ 2019-08-28 14:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yuval Shaia

This series enables the migration of various GIDs used by the device.    
This is in addition to the successful migration of PCI and MSIX states
as well as various DMA addresses and ring page information.
    
We have a setup with two hosts and two VMs running atop them.
Migrations are performed over the local network.    

We have also performed various ping-pong tests (ibv_rc_pingpong) in the
guest(s) after adding GID migration support; this is the current status:
- ping-pong to localhost succeeds when performed before starting the
  migration and after the completion of migration.
- ping-pong to a peer succeeds, both before and after migration as above,    
  provided that both VMs are running on/migrated to the same host.    
  So, if two VMs were started on two different hosts, and one of them    
  was migrated to the other host, the ping-pong was successful.    
  Similarly, if two VMs are migrated to the same host, then after migration,    
  the ping-pong was successful.    
- ping-pong to a peer on the remote host is not working as of now.    
    
Our next goal is to achieve successful migration with live traffic.    
    
This is the same as the RFC v3 series posted earlier:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04752.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04753.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04754.html


Sukrit Bhatnagar (2):
  hw/pvrdma: make DSR mapping idempotent in load_dsr()
  hw/pvrdma: add live migration support

 hw/rdma/vmw/pvrdma_main.c | 94 +++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 8 deletions(-)

-- 
2.21.0




* [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr()
  2019-08-28 14:23 [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
@ 2019-08-28 14:23 ` Sukrit Bhatnagar
  2019-08-29 12:57   ` Yuval Shaia
  2019-08-31 19:39   ` Marcel Apfelbaum
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Sukrit Bhatnagar @ 2019-08-28 14:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yuval Shaia

Map the DSR only when there is no mapping done already, i.e. when
dev->dsr_info.dsr is NULL. This allows the rest of the mappings and
ring inits to be done by calling load_dsr() even when the DSR has
already been mapped somewhere else.

Move free_dsr() out of load_dsr() and call it before the latter as
and when needed. This helps the case where load_dsr() is called with
the DSR mapping already done but the rest of the map and init
operations still pending, and prevents an unwanted unmap of the DSR.

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
---
 hw/rdma/vmw/pvrdma_main.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index adcf79cd63..6c90db96f9 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -172,15 +172,15 @@ static int load_dsr(PVRDMADev *dev)
     DSRInfo *dsr_info;
     struct pvrdma_device_shared_region *dsr;
 
-    free_dsr(dev);
-
-    /* Map to DSR */
-    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
-                              sizeof(struct pvrdma_device_shared_region));
     if (!dev->dsr_info.dsr) {
-        rdma_error_report("Failed to map to DSR");
-        rc = -ENOMEM;
-        goto out;
+        /* Map to DSR */
+        dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
+                                  sizeof(struct pvrdma_device_shared_region));
+        if (!dev->dsr_info.dsr) {
+            rdma_error_report("Failed to map to DSR");
+            rc = -ENOMEM;
+            goto out;
+        }
     }
 
     /* Shortcuts */
@@ -402,6 +402,7 @@ static void pvrdma_regs_write(void *opaque, hwaddr addr, uint64_t val,
     case PVRDMA_REG_DSRHIGH:
         trace_pvrdma_regs_write(addr, val, "DSRHIGH", "");
         dev->dsr_info.dma |= val << 32;
+        free_dsr(dev);
         load_dsr(dev);
         init_dsr_dev_caps(dev);
         break;
-- 
2.21.0




* [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-28 14:23 [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
@ 2019-08-28 14:23 ` Sukrit Bhatnagar
  2019-08-29 12:53   ` Yuval Shaia
                     ` (2 more replies)
  2019-08-29 12:00 ` [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device Yuval Shaia
  2019-08-31 19:37 ` Marcel Apfelbaum
  3 siblings, 3 replies; 14+ messages in thread
From: Sukrit Bhatnagar @ 2019-08-28 14:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yuval Shaia

vmstate_pvrdma describes the PCI and MSIX states as well as the dma
address for the dsr and the gid table of the device.
vmstate_pvrdma_gids describes each gid in the gid table.

pvrdma_post_save() does the job of unregistering gid entries from the
backend device in the source host.

pvrdma_post_load() maps to dsr using the loaded dma address, registers
each loaded gid into the backend device, and finally calls load_dsr()
to perform other mappings and ring init operations.

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
---
 hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 6c90db96f9..6f8b56dea3 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -28,6 +28,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "hw/rdma/rdma.h"
+#include "migration/register.h"
 
 #include "../rdma_rm.h"
 #include "../rdma_backend.h"
@@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
     pvrdma_fini(pci_dev);
 }
 
+static int pvrdma_post_save(void *opaque)
+{
+    int i, rc;
+    PVRDMADev *dev = opaque;
+
+    for (i = 0; i < MAX_GIDS; i++) {
+
+        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
+            continue;
+        }
+        rc = rdma_backend_del_gid(&dev->backend_dev,
+                                   dev->backend_eth_device_name,
+                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
+        if (rc) {
+            return -EINVAL;
+        }
+    }
+
+    return 0;
+}
+
+static int pvrdma_post_load(void *opaque, int version_id)
+{
+    int i, rc;
+    PVRDMADev *dev = opaque;
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+    DSRInfo *dsr_info = &dev->dsr_info;
+
+    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
+                                sizeof(struct pvrdma_device_shared_region));
+    if (!dsr_info->dsr) {
+        rdma_error_report("Failed to map to DSR");
+        return -ENOMEM;
+    }
+
+    for (i = 0; i < MAX_GIDS; i++) {
+
+        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
+            continue;
+        }
+
+        rc = rdma_backend_add_gid(&dev->backend_dev,
+                                  dev->backend_eth_device_name,
+                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
+        if (rc) {
+            return -EINVAL;
+        }
+    }
+
+    return load_dsr(dev);
+}
+
+static const VMStateDescription vmstate_pvrdma_gids = {
+    .name = "pvrdma-gids",
+    .fields = (VMStateField[]) {
+            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
+            VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_pvrdma = {
+    .name = PVRDMA_HW_NAME,
+    .post_save = pvrdma_post_save,
+    .post_load = pvrdma_post_load,
+    .fields = (VMStateField[]) {
+            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
+            VMSTATE_MSIX(parent_obj, PVRDMADev),
+            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
+            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
+                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
+                                 RdmaRmGid),
+            VMSTATE_END_OF_LIST()
+    }
+};
+
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
     int rc = 0;
@@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
 
     dc->desc = "RDMA Device";
     dc->props = pvrdma_dev_properties;
+    dc->vmsd = &vmstate_pvrdma;
     set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 
     ir->print_statistics = pvrdma_print_statistics;
-- 
2.21.0




* Re: [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device
  2019-08-28 14:23 [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
@ 2019-08-29 12:00 ` Yuval Shaia
  2019-08-31 19:37 ` Marcel Apfelbaum
  3 siblings, 0 replies; 14+ messages in thread
From: Yuval Shaia @ 2019-08-29 12:00 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Wed, Aug 28, 2019 at 07:53:26PM +0530, Sukrit Bhatnagar wrote:
> This series enables the migration of various GIDs used by the device.    
> This is in addition to the successful migration of PCI and MSIX states
> as well as various DMA addresses and ring page information.
>     
> We have a setup with two hosts and two VMs running atop them.
> Migrations are performed over the local network.    
> 
> We have also performed various ping-pong tests (ibv_rc_pingpong) in the
> guest(s) after adding GID migration support; this is the current status:
> - ping-pong to localhost succeeds when performed before starting the
>   migration and after the completion of migration.
> - ping-pong to a peer succeeds, both before and after migration as above,    
>   provided that both VMs are running on/migrated to the same host.    
>   So, if two VMs were started on two different hosts, and one of them    
>   was migrated to the other host, the ping-pong was successful.    

This limitation looks to me like a wrongly configured network. This
patch set should enable any migration.

After our last meeting, please confirm.

In addition, I don't see why a non-VM peer (i.e. bare-metal) is not
supported. Can you run a test and update us with the results?

>   Similarly, if two VMs are migrated to the same host, then after migration,    
>   the ping-pong was successful.    
> - ping-pong to a peer on the remote host is not working as of now.    
>     
> Our next goal is to achieve successful migration with live traffic.    
>     
> This is the same as the RFC v3 series posted earlier:
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04752.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04753.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04754.html
> 
> 
> Sukrit Bhatnagar (2):
>   hw/pvrdma: make DSR mapping idempotent in load_dsr()
>   hw/pvrdma: add live migration support
> 
>  hw/rdma/vmw/pvrdma_main.c | 94 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 86 insertions(+), 8 deletions(-)
> 
> -- 
> 2.21.0
> 
> 



* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
@ 2019-08-29 12:53   ` Yuval Shaia
  2019-09-03 21:33     ` Sukrit Bhatnagar
  2019-08-29 12:56   ` Yuval Shaia
  2019-08-31 19:45   ` Marcel Apfelbaum
  2 siblings, 1 reply; 14+ messages in thread
From: Yuval Shaia @ 2019-08-29 12:53 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Wed, Aug 28, 2019 at 07:53:28PM +0530, Sukrit Bhatnagar wrote:
> vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> address for the dsr and the gid table of the device.
> vmstate_pvrdma_gids describes each gid in the gid table.
> 
> pvrdma_post_save() does the job of unregistering gid entries from the
> backend device in the source host.
> 
> pvrdma_post_load() maps to dsr using the loaded dma address, registers
> each loaded gid into the backend device, and finally calls load_dsr()
> to perform other mappings and ring init operations.

I think it is worth mentioning that the dma address is kept in driver/device
shared memory (dsr->dma), which is migrated as part of memory migration; it
is out of the scope of this change, so we do not need to save/load the dma
address during migration.

Also, you should specifically comment that this migration support does not
include QP migration. This means that support for live migration *during*
traffic is not yet supported.

> 
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 6c90db96f9..6f8b56dea3 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/sysemu.h"
>  #include "monitor/monitor.h"
>  #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>  
>  #include "../rdma_rm.h"
>  #include "../rdma_backend.h"
> @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
>      pvrdma_fini(pci_dev);
>  }
>  
> +static int pvrdma_post_save(void *opaque)
> +{
> +    int i, rc;
> +    PVRDMADev *dev = opaque;
> +
> +    for (i = 0; i < MAX_GIDS; i++) {
> +

Empty line is redundant here.

> +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> +            continue;
> +        }
> +        rc = rdma_backend_del_gid(&dev->backend_dev,
> +                                   dev->backend_eth_device_name,
> +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> +        if (rc) {
> +            return -EINVAL;

Some error report would help here, I guess.

> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int pvrdma_post_load(void *opaque, int version_id)
> +{
> +    int i, rc;
> +    PVRDMADev *dev = opaque;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    DSRInfo *dsr_info = &dev->dsr_info;
> +
> +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> +                                sizeof(struct pvrdma_device_shared_region));
> +    if (!dsr_info->dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -ENOMEM;
> +    }
> +
> +    for (i = 0; i < MAX_GIDS; i++) {
> +

Empty line is redundant here.

> +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> +            continue;
> +        }
> +
> +        rc = rdma_backend_add_gid(&dev->backend_dev,
> +                                  dev->backend_eth_device_name,
> +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> +        if (rc) {
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return load_dsr(dev);
> +}
> +
> +static const VMStateDescription vmstate_pvrdma_gids = {
> +    .name = "pvrdma-gids",
> +    .fields = (VMStateField[]) {
> +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_pvrdma = {
> +    .name = PVRDMA_HW_NAME,
> +    .post_save = pvrdma_post_save,
> +    .post_load = pvrdma_post_load,
> +    .fields = (VMStateField[]) {
> +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> +                                 RdmaRmGid),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>  {
>      int rc = 0;
> @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
>  
>      dc->desc = "RDMA Device";
>      dc->props = pvrdma_dev_properties;
> +    dc->vmsd = &vmstate_pvrdma;
>      set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
>  
>      ir->print_statistics = pvrdma_print_statistics;
> -- 
> 2.21.0
> 
> 



* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
  2019-08-29 12:53   ` Yuval Shaia
@ 2019-08-29 12:56   ` Yuval Shaia
  2019-08-31 19:45   ` Marcel Apfelbaum
  2 siblings, 0 replies; 14+ messages in thread
From: Yuval Shaia @ 2019-08-29 12:56 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Wed, Aug 28, 2019 at 07:53:28PM +0530, Sukrit Bhatnagar wrote:
> vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> address for the dsr and the gid table of the device.
> vmstate_pvrdma_gids describes each gid in the gid table.
> 
> pvrdma_post_save() does the job of unregistering gid entries from the
> backend device in the source host.
> 
> pvrdma_post_load() maps to dsr using the loaded dma address, registers
> each loaded gid into the backend device, and finally calls load_dsr()
> to perform other mappings and ring init operations.
> 
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 6c90db96f9..6f8b56dea3 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/sysemu.h"
>  #include "monitor/monitor.h"
>  #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>  
>  #include "../rdma_rm.h"
>  #include "../rdma_backend.h"
> @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
>      pvrdma_fini(pci_dev);
>  }
>  
> +static int pvrdma_post_save(void *opaque)
> +{
> +    int i, rc;
> +    PVRDMADev *dev = opaque;
> +
> +    for (i = 0; i < MAX_GIDS; i++) {
> +
> +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> +            continue;
> +        }
> +        rc = rdma_backend_del_gid(&dev->backend_dev,
> +                                   dev->backend_eth_device_name,
> +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> +        if (rc) {
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int pvrdma_post_load(void *opaque, int version_id)
> +{
> +    int i, rc;
> +    PVRDMADev *dev = opaque;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    DSRInfo *dsr_info = &dev->dsr_info;
> +
> +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> +                                sizeof(struct pvrdma_device_shared_region));
> +    if (!dsr_info->dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -ENOMEM;
> +    }
> +
> +    for (i = 0; i < MAX_GIDS; i++) {
> +
> +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> +            continue;
> +        }
> +
> +        rc = rdma_backend_add_gid(&dev->backend_dev,
> +                                  dev->backend_eth_device_name,
> +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> +        if (rc) {
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return load_dsr(dev);

This call is better performed before any gid manipulation on the host,
because no one will undo the gid changes if load_dsr() fails.
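
i.e. something along these lines (just a sketch of the ordering, reusing
the add_gid loop from the patch as-is):

    rc = load_dsr(dev);
    if (rc) {
        return rc;
    }

    /* ... the rdma_backend_add_gid() loop from above ... */

    return 0;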

> +}
> +
> +static const VMStateDescription vmstate_pvrdma_gids = {
> +    .name = "pvrdma-gids",
> +    .fields = (VMStateField[]) {
> +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_pvrdma = {
> +    .name = PVRDMA_HW_NAME,
> +    .post_save = pvrdma_post_save,
> +    .post_load = pvrdma_post_load,
> +    .fields = (VMStateField[]) {
> +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> +                                 RdmaRmGid),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>  {
>      int rc = 0;
> @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
>  
>      dc->desc = "RDMA Device";
>      dc->props = pvrdma_dev_properties;
> +    dc->vmsd = &vmstate_pvrdma;
>      set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
>  
>      ir->print_statistics = pvrdma_print_statistics;
> -- 
> 2.21.0
> 
> 



* Re: [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr()
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
@ 2019-08-29 12:57   ` Yuval Shaia
  2019-08-31 19:39   ` Marcel Apfelbaum
  1 sibling, 0 replies; 14+ messages in thread
From: Yuval Shaia @ 2019-08-29 12:57 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Wed, Aug 28, 2019 at 07:53:27PM +0530, Sukrit Bhatnagar wrote:
> Map the DSR only when there is no mapping done already, i.e. when
> dev->dsr_info.dsr is NULL. This allows the rest of the mappings and
> ring inits to be done by calling load_dsr() even when the DSR has
> already been mapped somewhere else.
> 
> Move free_dsr() out of load_dsr() and call it before the latter as
> and when needed. This helps the case where load_dsr() is called with
> the DSR mapping already done but the rest of the map and init
> operations still pending, and prevents an unwanted unmap of the DSR.
> 
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index adcf79cd63..6c90db96f9 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -172,15 +172,15 @@ static int load_dsr(PVRDMADev *dev)
>      DSRInfo *dsr_info;
>      struct pvrdma_device_shared_region *dsr;
>  
> -    free_dsr(dev);
> -
> -    /* Map to DSR */
> -    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> -                              sizeof(struct pvrdma_device_shared_region));
>      if (!dev->dsr_info.dsr) {
> -        rdma_error_report("Failed to map to DSR");
> -        rc = -ENOMEM;
> -        goto out;
> +        /* Map to DSR */
> +        dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> +                                  sizeof(struct pvrdma_device_shared_region));
> +        if (!dev->dsr_info.dsr) {
> +            rdma_error_report("Failed to map to DSR");
> +            rc = -ENOMEM;
> +            goto out;
> +        }
>      }
>  
>      /* Shortcuts */
> @@ -402,6 +402,7 @@ static void pvrdma_regs_write(void *opaque, hwaddr addr, uint64_t val,
>      case PVRDMA_REG_DSRHIGH:
>          trace_pvrdma_regs_write(addr, val, "DSRHIGH", "");
>          dev->dsr_info.dma |= val << 32;
> +        free_dsr(dev);
>          load_dsr(dev);
>          init_dsr_dev_caps(dev);
>          break;

LGTM, thanks.

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

> -- 
> 2.21.0
> 
> 



* Re: [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device
  2019-08-28 14:23 [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device Sukrit Bhatnagar
                   ` (2 preceding siblings ...)
  2019-08-29 12:00 ` [Qemu-devel] [PATCH v1 0/2] Add live migration support in the PVRDMA device Yuval Shaia
@ 2019-08-31 19:37 ` Marcel Apfelbaum
  3 siblings, 0 replies; 14+ messages in thread
From: Marcel Apfelbaum @ 2019-08-31 19:37 UTC (permalink / raw)
  To: Sukrit Bhatnagar, qemu-devel; +Cc: Yuval Shaia

Hi Sukrit,

On 8/28/19 5:23 PM, Sukrit Bhatnagar wrote:
> This series enables the migration of various GIDs used by the device.
> This is in addition to the successful migration of PCI and MSIX states
> as well as various DMA addresses and ring page information.
>      
> We have a setup with two hosts and two VMs running atop them.
> Migrations are performed over the local network.
>
> We have also performed various ping-pong tests (ibv_rc_pingpong) in the
> guest(s) after adding GID migration support; this is the current status:
> - ping-pong to localhost succeeds when performed before starting the
>    migration and after the completion of migration.
> - ping-pong to a peer succeeds, both before and after migration as above,
>    provided that both VMs are running on/migrated to the same host.
>    So, if two VMs were started on two different hosts, and one of them
>    was migrated to the other host, the ping-pong was successful.
>    Similarly, if two VMs are migrated to the same host, then after migration,
>    the ping-pong was successful.
> - ping-pong to a peer on the remote host is not working as of now.
>      
> Our next goal is to achieve successful migration with live traffic.
>      

Nice work!
I am sorry for the GSOC program interruption and congrats
on your decision to continue the project!

Thanks,
Marcel

> This is the same as the RFC v3 series posted earlier:
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04752.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04753.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg04754.html
>
>
> Sukrit Bhatnagar (2):
>    hw/pvrdma: make DSR mapping idempotent in load_dsr()
>    hw/pvrdma: add live migration support
>
>   hw/rdma/vmw/pvrdma_main.c | 94 +++++++++++++++++++++++++++++++++++----
>   1 file changed, 86 insertions(+), 8 deletions(-)
>




* Re: [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr()
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 1/2] hw/pvrdma: make DSR mapping idempotent in load_dsr() Sukrit Bhatnagar
  2019-08-29 12:57   ` Yuval Shaia
@ 2019-08-31 19:39   ` Marcel Apfelbaum
  1 sibling, 0 replies; 14+ messages in thread
From: Marcel Apfelbaum @ 2019-08-31 19:39 UTC (permalink / raw)
  To: Sukrit Bhatnagar, qemu-devel; +Cc: Yuval Shaia



On 8/28/19 5:23 PM, Sukrit Bhatnagar wrote:
> Map the DSR only when there is no mapping done already, i.e. when
> dev->dsr_info.dsr is NULL. This allows the rest of the mappings and
> ring inits to be done by calling load_dsr() even when the DSR has
> already been mapped somewhere else.
>
> Move free_dsr() out of load_dsr() and call it before the latter as
> and when needed. This helps the case where load_dsr() is called with
> the DSR mapping already done but the rest of the map and init
> operations still pending, and prevents an unwanted unmap of the DSR.
>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>   hw/rdma/vmw/pvrdma_main.c | 17 +++++++++--------
>   1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index adcf79cd63..6c90db96f9 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -172,15 +172,15 @@ static int load_dsr(PVRDMADev *dev)
>       DSRInfo *dsr_info;
>       struct pvrdma_device_shared_region *dsr;
>   
> -    free_dsr(dev);
> -
> -    /* Map to DSR */
> -    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> -                              sizeof(struct pvrdma_device_shared_region));
>       if (!dev->dsr_info.dsr) {
> -        rdma_error_report("Failed to map to DSR");
> -        rc = -ENOMEM;
> -        goto out;
> +        /* Map to DSR */
> +        dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> +                                  sizeof(struct pvrdma_device_shared_region));
> +        if (!dev->dsr_info.dsr) {
> +            rdma_error_report("Failed to map to DSR");
> +            rc = -ENOMEM;
> +            goto out;
> +        }
>       }
>   
>       /* Shortcuts */
> @@ -402,6 +402,7 @@ static void pvrdma_regs_write(void *opaque, hwaddr addr, uint64_t val,
>       case PVRDMA_REG_DSRHIGH:
>           trace_pvrdma_regs_write(addr, val, "DSRHIGH", "");
>           dev->dsr_info.dma |= val << 32;
> +        free_dsr(dev);
>           load_dsr(dev);
>           init_dsr_dev_caps(dev);
>           break;

Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>

Thanks,
Marcel



* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-28 14:23 ` [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support Sukrit Bhatnagar
  2019-08-29 12:53   ` Yuval Shaia
  2019-08-29 12:56   ` Yuval Shaia
@ 2019-08-31 19:45   ` Marcel Apfelbaum
  2019-09-01  9:35     ` Yuval Shaia
  2019-09-03 11:05     ` Sukrit Bhatnagar
  2 siblings, 2 replies; 14+ messages in thread
From: Marcel Apfelbaum @ 2019-08-31 19:45 UTC (permalink / raw)
  To: Sukrit Bhatnagar, qemu-devel; +Cc: Yuval Shaia



On 8/28/19 5:23 PM, Sukrit Bhatnagar wrote:
> vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> address for the dsr and the gid table of the device.
> vmstate_pvrdma_gids describes each gid in the gid table.
>
> pvrdma_post_save() does the job of unregistering gid entries from the
> backend device in the source host.
>
> pvrdma_post_load() maps to dsr using the loaded dma address, registers
> each loaded gid into the backend device, and finally calls load_dsr()
> to perform other mappings and ring init operations.
>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>   hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 77 insertions(+)
>
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index 6c90db96f9..6f8b56dea3 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>   #include "sysemu/sysemu.h"
>   #include "monitor/monitor.h"
>   #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>   
>   #include "../rdma_rm.h"
>   #include "../rdma_backend.h"
> @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
>       pvrdma_fini(pci_dev);
>   }
>   
> +static int pvrdma_post_save(void *opaque)
> +{
> +    int i, rc;
> +    PVRDMADev *dev = opaque;
> +
> +    for (i = 0; i < MAX_GIDS; i++) {
> +

No need for the extra line
> +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> +            continue;
> +        }
> +        rc = rdma_backend_del_gid(&dev->backend_dev,
> +                                   dev->backend_eth_device_name,
> +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> +        if (rc) {
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int pvrdma_post_load(void *opaque, int version_id)
> +{
> +    int i, rc;
> +    PVRDMADev *dev = opaque;
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +    DSRInfo *dsr_info = &dev->dsr_info;
> +
> +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> +                                sizeof(struct pvrdma_device_shared_region));
> +    if (!dsr_info->dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -ENOMEM;
> +    }
> +
> +    for (i = 0; i < MAX_GIDS; i++) {
> +

The same here

> +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> +            continue;
> +        }
> +
> +        rc = rdma_backend_add_gid(&dev->backend_dev,
> +                                  dev->backend_eth_device_name,
> +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> +        if (rc) {
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return load_dsr(dev);
> +}
> +
> +static const VMStateDescription vmstate_pvrdma_gids = {
> +    .name = "pvrdma-gids",
> +    .fields = (VMStateField[]) {
> +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),

Is 16 the array length? If yes, do we have a macro definition for it?

> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_pvrdma = {
> +    .name = PVRDMA_HW_NAME,
> +    .post_save = pvrdma_post_save,
> +    .post_load = pvrdma_post_load,
> +    .fields = (VMStateField[]) {
> +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> +                                 RdmaRmGid),
> +            VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>   static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>   {
>       int rc = 0;
> @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
>   
>       dc->desc = "RDMA Device";
>       dc->props = pvrdma_dev_properties;
> +    dc->vmsd = &vmstate_pvrdma;
>       set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
>   
>       ir->print_statistics = pvrdma_print_statistics;

Very simple and elegant.
If I understand correctly, the live migration of a pvrdma device with no
active workloads works with this patch, right?
If yes, I think we should consider merging this code already.
Yuval, do you agree?

Thanks,
Marcel





* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-31 19:45   ` Marcel Apfelbaum
@ 2019-09-01  9:35     ` Yuval Shaia
  2019-09-03 11:05     ` Sukrit Bhatnagar
  1 sibling, 0 replies; 14+ messages in thread
From: Yuval Shaia @ 2019-09-01  9:35 UTC (permalink / raw)
  To: Marcel Apfelbaum; +Cc: Sukrit Bhatnagar, qemu-devel

On Sat, Aug 31, 2019 at 10:45:44PM +0300, Marcel Apfelbaum wrote:
> 
> 
> On 8/28/19 5:23 PM, Sukrit Bhatnagar wrote:
> > vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> > address for the dsr and the gid table of the device.
> > vmstate_pvrdma_gids describes each gid in the gid table.
> > 
> > pvrdma_post_save() does the job of unregistering gid entries from the
> > backend device in the source host.
> > 
> > pvrdma_post_load() maps to dsr using the loaded dma address, registers
> > each loaded gid into the backend device, and finally calls load_dsr()
> > to perform other mappings and ring init operations.
> > 
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > ---
> >   hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 77 insertions(+)
> > 
> > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > index 6c90db96f9..6f8b56dea3 100644
> > --- a/hw/rdma/vmw/pvrdma_main.c
> > +++ b/hw/rdma/vmw/pvrdma_main.c
> > @@ -28,6 +28,7 @@
> >   #include "sysemu/sysemu.h"
> >   #include "monitor/monitor.h"
> >   #include "hw/rdma/rdma.h"
> > +#include "migration/register.h"
> >   #include "../rdma_rm.h"
> >   #include "../rdma_backend.h"
> > @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> >       pvrdma_fini(pci_dev);
> >   }
> > +static int pvrdma_post_save(void *opaque)
> > +{
> > +    int i, rc;
> > +    PVRDMADev *dev = opaque;
> > +
> > +    for (i = 0; i < MAX_GIDS; i++) {
> > +
> 
> No need for the extra line
> > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > +            continue;
> > +        }
> > +        rc = rdma_backend_del_gid(&dev->backend_dev,
> > +                                   dev->backend_eth_device_name,
> > +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > +        if (rc) {
> > +            return -EINVAL;
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int pvrdma_post_load(void *opaque, int version_id)
> > +{
> > +    int i, rc;
> > +    PVRDMADev *dev = opaque;
> > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > +    DSRInfo *dsr_info = &dev->dsr_info;
> > +
> > +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> > +                                sizeof(struct pvrdma_device_shared_region));
> > +    if (!dsr_info->dsr) {
> > +        rdma_error_report("Failed to map to DSR");
> > +        return -ENOMEM;
> > +    }
> > +
> > +    for (i = 0; i < MAX_GIDS; i++) {
> > +
> 
> The same here
> 
> > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > +            continue;
> > +        }
> > +
> > +        rc = rdma_backend_add_gid(&dev->backend_dev,
> > +                                  dev->backend_eth_device_name,
> > +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > +        if (rc) {
> > +            return -EINVAL;
> > +        }
> > +    }
> > +
> > +    return load_dsr(dev);
> > +}
> > +
> > +static const VMStateDescription vmstate_pvrdma_gids = {
> > +    .name = "pvrdma-gids",
> > +    .fields = (VMStateField[]) {
> > +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
> 
> Is 16 the array length? If yes, do we have a macro definition for it?
> 
> > +            VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> > +static const VMStateDescription vmstate_pvrdma = {
> > +    .name = PVRDMA_HW_NAME,
> > +    .post_save = pvrdma_post_save,
> > +    .post_load = pvrdma_post_load,
> > +    .fields = (VMStateField[]) {
> > +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> > +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> > +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> > +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> > +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> > +                                 RdmaRmGid),
> > +            VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> >   static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> >   {
> >       int rc = 0;
> > @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
> >       dc->desc = "RDMA Device";
> >       dc->props = pvrdma_dev_properties;
> > +    dc->vmsd = &vmstate_pvrdma;
> >       set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> >       ir->print_statistics = pvrdma_print_statistics;
> 
> Very simple and elegant.
> If I understand correctly, the live migration of a pvrdma device with no
> active workloads works with this patch, right?

And no QPs either.

> If yes, I think we should consider merging this code already.
> Yuval, do you agree?

Sure I do!
Even with the limitation, this is a huge enhancement that can be used right
away.

But I first suggested some fixes; let's see v2.

> 
> Thanks,
> Marcel
> 
> 



* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-31 19:45   ` Marcel Apfelbaum
  2019-09-01  9:35     ` Yuval Shaia
@ 2019-09-03 11:05     ` Sukrit Bhatnagar
  1 sibling, 0 replies; 14+ messages in thread
From: Sukrit Bhatnagar @ 2019-09-03 11:05 UTC (permalink / raw)
  To: Marcel Apfelbaum; +Cc: qemu-devel, Yuval Shaia

On Sun, 1 Sep 2019 at 01:15, Marcel Apfelbaum
<marcel.apfelbaum@gmail.com> wrote:
>
>
>
> On 8/28/19 5:23 PM, Sukrit Bhatnagar wrote:
> > vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> > address for the dsr and the gid table of the device.
> > vmstate_pvrdma_gids describes each gid in the gid table.
> >
> > pvrdma_post_save() does the job of unregistering gid entries from the
> > backend device in the source host.
> >
> > pvrdma_post_load() maps to dsr using the loaded dma address, registers
> > each loaded gid into the backend device, and finally calls load_dsr()
> > to perform other mappings and ring init operations.
> >
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > ---
> >   hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 77 insertions(+)
> >
> > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > index 6c90db96f9..6f8b56dea3 100644
> > --- a/hw/rdma/vmw/pvrdma_main.c
> > +++ b/hw/rdma/vmw/pvrdma_main.c
> > @@ -28,6 +28,7 @@
> >   #include "sysemu/sysemu.h"
> >   #include "monitor/monitor.h"
> >   #include "hw/rdma/rdma.h"
> > +#include "migration/register.h"
> >
> >   #include "../rdma_rm.h"
> >   #include "../rdma_backend.h"
> > @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> >       pvrdma_fini(pci_dev);
> >   }
> >
> > +static int pvrdma_post_save(void *opaque)
> > +{
> > +    int i, rc;
> > +    PVRDMADev *dev = opaque;
> > +
> > +    for (i = 0; i < MAX_GIDS; i++) {
> > +
>
> No need for the extra line
> > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > +            continue;
> > +        }
> > +        rc = rdma_backend_del_gid(&dev->backend_dev,
> > +                                   dev->backend_eth_device_name,
> > +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > +        if (rc) {
> > +            return -EINVAL;
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int pvrdma_post_load(void *opaque, int version_id)
> > +{
> > +    int i, rc;
> > +    PVRDMADev *dev = opaque;
> > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > +    DSRInfo *dsr_info = &dev->dsr_info;
> > +
> > +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> > +                                sizeof(struct pvrdma_device_shared_region));
> > +    if (!dsr_info->dsr) {
> > +        rdma_error_report("Failed to map to DSR");
> > +        return -ENOMEM;
> > +    }
> > +
> > +    for (i = 0; i < MAX_GIDS; i++) {
> > +
>
> The same here
>
> > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > +            continue;
> > +        }
> > +
> > +        rc = rdma_backend_add_gid(&dev->backend_dev,
> > +                                  dev->backend_eth_device_name,
> > +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > +        if (rc) {
> > +            return -EINVAL;
> > +        }
> > +    }
> > +
> > +    return load_dsr(dev);
> > +}
> > +
> > +static const VMStateDescription vmstate_pvrdma_gids = {
> > +    .name = "pvrdma-gids",
> > +    .fields = (VMStateField[]) {
> > +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
>
> > Is 16 the array length? If yes, do we have a macro definition for it?

16 here represents the number of bytes in a GID.
This comes from the verbs definition of ibv_gid:

union ibv_gid {
    uint8_t         raw[16];
    struct {
        __be64  subnet_prefix;
        __be64  interface_id;
    } global;
};

I suppose there is no macro for this, but we can declare
our own (something like IBV_GID_SIZE).
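
For example (IBV_GID_SIZE is just a name I am proposing here, not an
existing verbs macro):

/* Number of bytes in a GID, taken from the verbs definition above. */
#define IBV_GID_SIZE sizeof(((union ibv_gid *)0)->raw)

and then in vmstate_pvrdma_gids:

    VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, IBV_GID_SIZE, 0),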

> > +            VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> > +static const VMStateDescription vmstate_pvrdma = {
> > +    .name = PVRDMA_HW_NAME,
> > +    .post_save = pvrdma_post_save,
> > +    .post_load = pvrdma_post_load,
> > +    .fields = (VMStateField[]) {
> > +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> > +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> > +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> > +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> > +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> > +                                 RdmaRmGid),
> > +            VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> >   static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> >   {
> >       int rc = 0;
> > @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
> >
> >       dc->desc = "RDMA Device";
> >       dc->props = pvrdma_dev_properties;
> > +    dc->vmsd = &vmstate_pvrdma;
> >       set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> >
> >       ir->print_statistics = pvrdma_print_statistics;
>
> Very simple and elegant.
> If I understand correctly, the live migration of a pvrdma device with no
> active workloads works with this patch, right?

Yes.

> If yes, I think we should consider merging this code already.
> Yuval, do you agree?
>
> Thanks,
> Marcel
>
>



* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-08-29 12:53   ` Yuval Shaia
@ 2019-09-03 21:33     ` Sukrit Bhatnagar
  2019-09-04  5:04       ` Yuval Shaia
  0 siblings, 1 reply; 14+ messages in thread
From: Sukrit Bhatnagar @ 2019-09-03 21:33 UTC (permalink / raw)
  To: Yuval Shaia; +Cc: qemu-devel

On Thu, 29 Aug 2019 at 18:23, Yuval Shaia <yuval.shaia@oracle.com> wrote:
>
> On Wed, Aug 28, 2019 at 07:53:28PM +0530, Sukrit Bhatnagar wrote:
> > vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> > address for the dsr and the gid table of the device.
> > vmstate_pvrdma_gids describes each gid in the gid table.
> >
> > pvrdma_post_save() does the job of unregistering gid entries from the
> > backend device in the source host.
> >
> > pvrdma_post_load() maps to dsr using the loaded dma address, registers
> > each loaded gid into the backend device, and finally calls load_dsr()
> > to perform other mappings and ring init operations.
>
> I think it is worth mentioning that the dma address is kept in driver/device
> shared memory (dsr->dma), which is migrated as part of memory migration; it
> is out of the scope of this change, so we do not need to save/load the dma
> address during migration.
>
> Also, you should specifically comment that this migration support does not
> include QP migration. This means that support for live migration *during*
> traffic is not yet supported.
>
> >
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > ---
> >  hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 77 insertions(+)
> >
> > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > index 6c90db96f9..6f8b56dea3 100644
> > --- a/hw/rdma/vmw/pvrdma_main.c
> > +++ b/hw/rdma/vmw/pvrdma_main.c
> > @@ -28,6 +28,7 @@
> >  #include "sysemu/sysemu.h"
> >  #include "monitor/monitor.h"
> >  #include "hw/rdma/rdma.h"
> > +#include "migration/register.h"
> >
> >  #include "../rdma_rm.h"
> >  #include "../rdma_backend.h"
> > @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> >      pvrdma_fini(pci_dev);
> >  }
> >
> > +static int pvrdma_post_save(void *opaque)
> > +{
> > +    int i, rc;
> > +    PVRDMADev *dev = opaque;
> > +
> > +    for (i = 0; i < MAX_GIDS; i++) {
> > +
>
> Empty line is redundant here.
>
> > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > +            continue;
> > +        }
> > +        rc = rdma_backend_del_gid(&dev->backend_dev,
> > +                                   dev->backend_eth_device_name,
> > +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > +        if (rc) {
> > +            return -EINVAL;
>
> Some error report would help here, I guess.

rdma_backend_del_gid() already generates an error report
when rc isn't 0.

Adding another statement for the same seems redundant.

> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int pvrdma_post_load(void *opaque, int version_id)
> > +{
> > +    int i, rc;
> > +    PVRDMADev *dev = opaque;
> > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > +    DSRInfo *dsr_info = &dev->dsr_info;
> > +
> > +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> > +                                sizeof(struct pvrdma_device_shared_region));
> > +    if (!dsr_info->dsr) {
> > +        rdma_error_report("Failed to map to DSR");
> > +        return -ENOMEM;
> > +    }
> > +
> > +    for (i = 0; i < MAX_GIDS; i++) {
> > +
>
> Empty line is redundant here.
>
> > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > +            continue;
> > +        }
> > +
> > +        rc = rdma_backend_add_gid(&dev->backend_dev,
> > +                                  dev->backend_eth_device_name,
> > +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > +        if (rc) {
> > +            return -EINVAL;
> > +        }
> > +    }
> > +
> > +    return load_dsr(dev);

Now that I will move load_dsr() before the del_gid loop,
I can use goto jumps on exit/error paths, so that I can
undo load_dsr if any del_gid fails.
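
A rough sketch of what I have in mind (the err_free_dsr label is new, and
this assumes free_dsr() can safely undo a partial or complete load_dsr()):

static int pvrdma_post_load(void *opaque, int version_id)
{
    int i, rc;
    PVRDMADev *dev = opaque;
    PCIDevice *pci_dev = PCI_DEVICE(dev);
    DSRInfo *dsr_info = &dev->dsr_info;

    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
                                sizeof(struct pvrdma_device_shared_region));
    if (!dsr_info->dsr) {
        rdma_error_report("Failed to map to DSR");
        return -ENOMEM;
    }

    /* Do the remaining mappings and ring inits first... */
    rc = load_dsr(dev);
    if (rc) {
        goto err_free_dsr;
    }

    /* ...and only then register the loaded gids with the backend. */
    for (i = 0; i < MAX_GIDS; i++) {
        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
            continue;
        }
        rc = rdma_backend_add_gid(&dev->backend_dev,
                                  dev->backend_eth_device_name,
                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
        if (rc) {
            rc = -EINVAL;
            goto err_free_dsr;
        }
    }

    return 0;

err_free_dsr:
    /* Undo the DSR mappings so nothing is leaked on a failed load. */
    free_dsr(dev);
    return rc;
}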

> > +}
> > +
> > +static const VMStateDescription vmstate_pvrdma_gids = {
> > +    .name = "pvrdma-gids",
> > +    .fields = (VMStateField[]) {
> > +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
> > +            VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> > +static const VMStateDescription vmstate_pvrdma = {
> > +    .name = PVRDMA_HW_NAME,
> > +    .post_save = pvrdma_post_save,
> > +    .post_load = pvrdma_post_load,
> > +    .fields = (VMStateField[]) {
> > +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> > +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> > +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> > +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> > +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> > +                                 RdmaRmGid),
> > +            VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      int rc = 0;
> > @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
> >
> >      dc->desc = "RDMA Device";
> >      dc->props = pvrdma_dev_properties;
> > +    dc->vmsd = &vmstate_pvrdma;
> >      set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> >
> >      ir->print_statistics = pvrdma_print_statistics;
> > --
> > 2.21.0
> >
> >



* Re: [Qemu-devel] [PATCH v1 2/2] hw/pvrdma: add live migration support
  2019-09-03 21:33     ` Sukrit Bhatnagar
@ 2019-09-04  5:04       ` Yuval Shaia
  0 siblings, 0 replies; 14+ messages in thread
From: Yuval Shaia @ 2019-09-04  5:04 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Wed, Sep 04, 2019 at 03:03:20AM +0530, Sukrit Bhatnagar wrote:
> On Thu, 29 Aug 2019 at 18:23, Yuval Shaia <yuval.shaia@oracle.com> wrote:
> >
> > On Wed, Aug 28, 2019 at 07:53:28PM +0530, Sukrit Bhatnagar wrote:
> > > vmstate_pvrdma describes the PCI and MSIX states as well as the dma
> > > address for the dsr and the gid table of the device.
> > > vmstate_pvrdma_gids describes each gid in the gid table.
> > >
> > > pvrdma_post_save() does the job of unregistering gid entries from the
> > > backend device in the source host.
> > >
> > > pvrdma_post_load() maps to dsr using the loaded dma address, registers
> > > each loaded gid into the backend device, and finally calls load_dsr()
> > > to perform other mappings and ring init operations.
> >
> > I think it is worth mentioning that the dma address is kept in driver/device
> > shared memory (dsr->dma), which is migrated as part of memory migration; it
> > is out of the scope of this change, so we do not need to save/load the dma
> > address during migration.
> >
> > Also, you should specifically comment that this migration support does not
> > include QP migration. This means that support for live migration *during*
> > traffic is not yet supported.
> >
> > >
> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > > ---
> > >  hw/rdma/vmw/pvrdma_main.c | 77 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 77 insertions(+)
> > >
> > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > index 6c90db96f9..6f8b56dea3 100644
> > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > @@ -28,6 +28,7 @@
> > >  #include "sysemu/sysemu.h"
> > >  #include "monitor/monitor.h"
> > >  #include "hw/rdma/rdma.h"
> > > +#include "migration/register.h"
> > >
> > >  #include "../rdma_rm.h"
> > >  #include "../rdma_backend.h"
> > > @@ -593,6 +594,81 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > >      pvrdma_fini(pci_dev);
> > >  }
> > >
> > > +static int pvrdma_post_save(void *opaque)
> > > +{
> > > +    int i, rc;
> > > +    PVRDMADev *dev = opaque;
> > > +
> > > +    for (i = 0; i < MAX_GIDS; i++) {
> > > +
> >
> > Empty line is redundant here.
> >
> > > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > > +            continue;
> > > +        }
> > > +        rc = rdma_backend_del_gid(&dev->backend_dev,
> > > +                                   dev->backend_eth_device_name,
> > > +                                   &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > > +        if (rc) {
> > > +            return -EINVAL;
> >
> > Some error report would help here, I guess.
> 
> rdma_backend_del_gid() already generates an error report
> when rc isn't 0.
> 
> Adding another statement for the same seems redundant.

Sure, makes sense.

> 
> > > +        }
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int pvrdma_post_load(void *opaque, int version_id)
> > > +{
> > > +    int i, rc;
> > > +    PVRDMADev *dev = opaque;
> > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > +    DSRInfo *dsr_info = &dev->dsr_info;
> > > +
> > > +    dsr_info->dsr = rdma_pci_dma_map(pci_dev, dsr_info->dma,
> > > +                                sizeof(struct pvrdma_device_shared_region));
> > > +    if (!dsr_info->dsr) {
> > > +        rdma_error_report("Failed to map to DSR");
> > > +        return -ENOMEM;
> > > +    }
> > > +
> > > +    for (i = 0; i < MAX_GIDS; i++) {
> > > +
> >
> > Empty line is redundant here.
> >
> > > +        if (!dev->rdma_dev_res.port.gid_tbl[i].gid.global.interface_id) {
> > > +            continue;
> > > +        }
> > > +
> > > +        rc = rdma_backend_add_gid(&dev->backend_dev,
> > > +                                  dev->backend_eth_device_name,
> > > +                                  &dev->rdma_dev_res.port.gid_tbl[i].gid);
> > > +        if (rc) {
> > > +            return -EINVAL;
> > > +        }
> > > +    }
> > > +
> > > +    return load_dsr(dev);
> 
> Now that I will move load_dsr() before the del_gid loop,

You probably meant before the add_gid loop.

> I can use goto jumps on exit/error paths, so that I can
> undo load_dsr if any del_gid fails.

Yeah, it will be easier to undo load_dsr than add_gid.

> 
> > > +}
> > > +
> > > +static const VMStateDescription vmstate_pvrdma_gids = {
> > > +    .name = "pvrdma-gids",
> > > +    .fields = (VMStateField[]) {
> > > +            VMSTATE_UINT8_ARRAY_V(gid.raw, RdmaRmGid, 16, 0),
> > > +            VMSTATE_END_OF_LIST()
> > > +    }
> > > +};
> > > +
> > > +static const VMStateDescription vmstate_pvrdma = {
> > > +    .name = PVRDMA_HW_NAME,
> > > +    .post_save = pvrdma_post_save,
> > > +    .post_load = pvrdma_post_load,
> > > +    .fields = (VMStateField[]) {
> > > +            VMSTATE_PCI_DEVICE(parent_obj, PVRDMADev),
> > > +            VMSTATE_MSIX(parent_obj, PVRDMADev),
> > > +            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
> > > +            VMSTATE_STRUCT_ARRAY(rdma_dev_res.port.gid_tbl, PVRDMADev,
> > > +                                 MAX_PORT_GIDS, 0, vmstate_pvrdma_gids,
> > > +                                 RdmaRmGid),
> > > +            VMSTATE_END_OF_LIST()
> > > +    }
> > > +};
> > > +
> > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      int rc = 0;
> > > @@ -688,6 +764,7 @@ static void pvrdma_class_init(ObjectClass *klass, void *data)
> > >
> > >      dc->desc = "RDMA Device";
> > >      dc->props = pvrdma_dev_properties;
> > > +    dc->vmsd = &vmstate_pvrdma;
> > >      set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
> > >
> > >      ir->print_statistics = pvrdma_print_statistics;
> > > --
> > > 2.21.0
> > >
> > >


