qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
@ 2019-06-21 14:45 Sukrit Bhatnagar
  2019-06-21 14:45 ` [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support Sukrit Bhatnagar
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Sukrit Bhatnagar @ 2019-06-21 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yuval Shaia

Hi,

I am a GSoC participant working on live migration support for the
pvrdma device, with help from my mentors Marcel and Yuval.
My current task is to save and load the various addresses that the
device uses for DMA mapping. We will be adding the device state to
live migration incrementally. As the first step in the implementation,
we are migrating to the same host. This spares us many complexities,
such as GID changes, at this stage; we will address migration across
hosts once same-host migration works.

Currently, the save and load logic uses SaveVMHandlers, which is the
legacy way, and will be ported to VMStateDescription once the
existing issues are solved.
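
For reference, the VMStateDescription I have in mind would look roughly
like the sketch below (untested; it assumes dma_addr_t is 64-bit and
covers only dsr_info.dma, since the two slot addresses live inside the
mapped DSR and will need a post_load hook to be restored and remapped):

    static const VMStateDescription vmstate_pvrdma = {
        .name = "pvrdma",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
            VMSTATE_END_OF_LIST()
        }
    };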

This RFC is meant to request suggestions on the things which are
working and for help on the things which are not.


What is working:

* The pvrdma device gets initialized in a VM, its GID entry gets added
  to the host, and rc_pingpong succeeds between two such VMs. This
  works when libvirt is used to launch the VMs.

* The dma, cmd_slot_dma and resp_slot_dma addresses are saved at the
  source and loaded properly at the destination upon migration. That is,
  the values loaded at the dest during migration are the same as the
  ones saved.

  `dma` is provided by the guest driver when it writes to BAR1 and is
  stored in dev->dsr_info.dma. The DSR is created by mapping this
  address. `cmd_slot_dma` and `resp_slot_dma` are the dma addresses of
  the command and response buffers, respectively, which the guest
  provides through the DSR.

* The DSR is successfully (re)mapped at the dest from the dma address
  loaded during migration.


What is not working:

* In the pvrdma_load() logic, the mapping of the DSR succeeds at the
  dest, but the mappings for the cmd and resp slots fail.
  rdma_pci_dma_map() eventually calls address_space_map(). Inside the
  latter, the global BounceBuffer `bounce` is checked (via the
  atomic_xchg() primitive) to see whether it is already in use.
  At the dest it is in use, so the dma remapping fails there, which
  fails the whole migration process. Essentially, I am looking for a
  way to remap a guest physical address after a live migration (to the
  same host). Any tips on avoiding the BounceBuffer would also be
  great. The check being hit is excerpted below.

  I have also tried unmapping the cmd and resp slots at the source
  before saving the dma addresses in pvrdma_save(), but the mapping
  fails anyway.
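
  For reference, the path being hit looks roughly like this (abridged
  from address_space_map() in exec.c, quoted from memory, so the exact
  code may differ slightly):

    if (!memory_access_is_direct(mr, is_write)) {
        /* There is only one global bounce buffer; if someone already
         * holds it, the mapping request is refused. */
        if (atomic_xchg(&bounce.in_use, true)) {
            return NULL;
        }
        /* Avoid unbounded allocations */
        l = MIN(l, TARGET_PAGE_SIZE);
        bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l);
        bounce.addr = addr;
        bounce.len = l;
        ...
    }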

* It seems that vmxnet3 migration itself is not working properly, at
  least for me. The pvrdma device depends on it: vmxnet3 is function 0
  and pvrdma is function 1. This happens even with a build of
  unmodified code from the master branch.
  After migration, network connectivity is lost at the destination.
  Things are fine at the source before migration.
  This is the command I am using at the src:

  sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -m 2G -smp cpus=2 \
    -hda /home/skrtbhtngr/fedora.img \
    -netdev tap,id=hostnet0 \
    -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
    -monitor telnet:127.0.0.1:4444,server,nowait \
    -trace events=/home/skrtbhtngr/trace-events \
    -vnc 0.0.0.0:0

  A similar command is used for the dest. Currently, I am trying
  same-host migration for testing purposes, without the pvrdma device.
  Two tap interfaces, for src and dest, were created successfully on
  the host. Kernel logs:
  ...
  br0: port 2(tap0) entered forwarding state
  ...
  br0: port 3(tap1) entered forwarding state

  tcpdump at the dest reports only outgoing ARP packets, which ask
  for the gateway: "ARP, Request who-has _gateway tell guest1".

  Tried using user (slirp) as the network backend, but no luck.
  
  Also tried git bisect to find the issue, starting from a working
  commit (given by Marcel), but it turned out to be very old and I
  faced build errors one after another.

  Please note that e1000 live migration is working fine in the same setup.

* Since we are aiming at same-host migration first, I cannot use
  libvirt, as it does not allow this. Currently, I am running the VMs
  using qemu-system commands directly. But libvirt is needed to add the
  GID entry of the guest device on the host. I am looking for a
  workaround, if that is possible at all.
  I started a thread about this a few days ago on libvirt-users:
  https://www.redhat.com/archives/libvirt-users/2019-June/msg00011.html


Sukrit Bhatnagar (1):
  hw/pvrdma: Add live migration support

 hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-21 14:45 [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device Sukrit Bhatnagar
@ 2019-06-21 14:45 ` Sukrit Bhatnagar
  2019-06-25 15:32   ` Yuval Shaia
  2019-06-25 15:38   ` Yuval Shaia
  2019-06-21 17:55 ` [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device no-reply
  2019-06-25  8:14 ` Marcel Apfelbaum
  2 siblings, 2 replies; 17+ messages in thread
From: Sukrit Bhatnagar @ 2019-06-21 14:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yuval Shaia

Define and register SaveVMHandlers pvrdma_save and
pvrdma_load for saving and loading the device state,
which currently includes only the dma, command slot
and response slot addresses.

Remap the DSR, command slot and response slot upon
loading the addresses in the pvrdma_load function.

Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
---
 hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index adcf79cd63..cd8573173c 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -28,6 +28,7 @@
 #include "sysemu/sysemu.h"
 #include "monitor/monitor.h"
 #include "hw/rdma/rdma.h"
+#include "migration/register.h"
 
 #include "../rdma_rm.h"
 #include "../rdma_backend.h"
@@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
     pvrdma_fini(pci_dev);
 }
 
+static void pvrdma_save(QEMUFile *f, void *opaque)
+{
+    PVRDMADev *dev = PVRDMA_DEV(opaque);
+
+    qemu_put_be64(f, dev->dsr_info.dma);
+    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
+    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
+}
+
+static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
+{
+    PVRDMADev *dev = PVRDMA_DEV(opaque);
+    PCIDevice *pci_dev = PCI_DEVICE(dev);
+
+    // Remap DSR
+    dev->dsr_info.dma = qemu_get_be64(f);
+    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
+                                    sizeof(struct pvrdma_device_shared_region));
+    if (!dev->dsr_info.dsr) {
+        rdma_error_report("Failed to map to DSR");
+        return -1;
+    }
+    qemu_log("pvrdma_load: successfully remapped to DSR\n");
+
+    // Remap cmd slot
+    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
+    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
+                                     sizeof(union pvrdma_cmd_req));
+    if (!dev->dsr_info.req) {
+        rdma_error_report("Failed to map to command slot address");
+        return -1;
+    }
+    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
+
+    // Remap rsp slot
+    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
+    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
+                                     sizeof(union pvrdma_cmd_resp));
+    if (!dev->dsr_info.rsp) {
+        rdma_error_report("Failed to map to response slot address");
+        return -1;
+    }
+    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_pvrdma = {
+    .save_state = pvrdma_save,
+    .load_state = pvrdma_load,
+};
+
 static void pvrdma_realize(PCIDevice *pdev, Error **errp)
 {
     int rc = 0;
+    DeviceState *s = DEVICE(pdev);
     PVRDMADev *dev = PVRDMA_DEV(pdev);
     Object *memdev_root;
     bool ram_shared = false;
@@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
     dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
     qemu_register_shutdown_notifier(&dev->shutdown_notifier);
 
+    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
+
 out:
     if (rc) {
         pvrdma_fini(pdev);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-21 14:45 [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device Sukrit Bhatnagar
  2019-06-21 14:45 ` [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support Sukrit Bhatnagar
@ 2019-06-21 17:55 ` no-reply
  2019-06-25  8:14 ` Marcel Apfelbaum
  2 siblings, 0 replies; 17+ messages in thread
From: no-reply @ 2019-06-21 17:55 UTC (permalink / raw)
  To: skrtbhtngr; +Cc: qemu-devel, yuval.shaia

Patchew URL: https://patchew.org/QEMU/20190621144541.13770-1-skrtbhtngr@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Message-id: 20190621144541.13770-1-skrtbhtngr@gmail.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
21c2b30 hw/pvrdma: Add live migration support

=== OUTPUT BEGIN ===
ERROR: do not use C99 // comments
#50: FILE: hw/rdma/vmw/pvrdma_main.c:610:
+    // Remap DSR

ERROR: do not use C99 // comments
#60: FILE: hw/rdma/vmw/pvrdma_main.c:620:
+    // Remap cmd slot

WARNING: line over 80 characters
#62: FILE: hw/rdma/vmw/pvrdma_main.c:622:
+    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,

ERROR: do not use C99 // comments
#70: FILE: hw/rdma/vmw/pvrdma_main.c:630:
+    // Remap rsp slot

WARNING: line over 80 characters
#72: FILE: hw/rdma/vmw/pvrdma_main.c:632:
+    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,

total: 3 errors, 2 warnings, 77 lines checked

Commit 21c2b3077a6c (hw/pvrdma: Add live migration support) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190621144541.13770-1-skrtbhtngr@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-21 14:45 [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device Sukrit Bhatnagar
  2019-06-21 14:45 ` [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support Sukrit Bhatnagar
  2019-06-21 17:55 ` [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device no-reply
@ 2019-06-25  8:14 ` Marcel Apfelbaum
  2019-06-25  8:39   ` Dmitry Fleytman
  2 siblings, 1 reply; 17+ messages in thread
From: Marcel Apfelbaum @ 2019-06-25  8:14 UTC (permalink / raw)
  To: Sukrit Bhatnagar, qemu-devel, Dmitry Fleytman, Dr. David Alan Gilbert
  Cc: Yuval Shaia

Hi Sukrit

On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
> Hi,
[...]
> This RFC is meant to request suggestions on the things which are
> working and for help on the things which are not.
>
[...]
> What is not working:
>
[...]
>
> * It seems that vmxnet3 migration itself is not working properly, at least
>    for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
>    is function 1. This is happening even for a build of unmodified code from
>    the master branch.
>    After migration, the network connectivity is lost at destination.
>    Things are fine at the source before migration.
>    This is the command I am using at src:
>
>    sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
>      -enable-kvm \
>      -m 2G -smp cpus=2 \
>      -hda /home/skrtbhtngr/fedora.img \
>      -netdev tap,id=hostnet0 \
>      -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
>      -monitor telnet:127.0.0.1:4444,server,nowait \
>      -trace events=/home/skrtbhtngr/trace-events \
>      -vnc 0.0.0.0:0
>
>    Similar command is used for the dest. Currently, I am trying
>    same-host migration for testing purpose, without the pvrdma device.
>    Two tap interfaces, for src and dest were created successfully at
>    the host. Kernel logs:
>    ...
>    br0: port 2(tap0) entered forwarding state
>    ...
>    br0: port 3(tap1) entered forwarding state
>
>    tcpdump at the dest reports only outgoing ARP packets, which ask
>    for gateway: "ARP, Request who-has _gateway tell guest1".
>
>    Tried using user (slirp) as the network backend, but no luck.
>    
>    Also tried git bisect to find the issue using a working commit (given
>    by Marcel), but it turns out that it is very old and I faced build
>    errors one after another.
>
>    Please note that e1000 live migration is working fine in the same setup.
>

I tried to git bisect, but I couldn't find a working version of vmxnet3
supporting live migration.
I even tried a commit from December 2014 and it didn't work.

What is strange (to me) is that the networking packets can't be sent 
from the guest (after migration)
even after rebooting the guest.

Any help or pointer would be greatly appreciated.
Thanks,
Marcel


[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-25  8:14 ` Marcel Apfelbaum
@ 2019-06-25  8:39   ` Dmitry Fleytman
  2019-06-25  8:49     ` Marcel Apfelbaum
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Fleytman @ 2019-06-25  8:39 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Sukrit Bhatnagar, qemu-devel, Yuval Shaia, Dr. David Alan Gilbert



> On 25 Jun 2019, at 11:14, Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
> 
> Hi Sukrit
> 
> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>> Hi,
> [...]
>> This RFC is meant to request suggestions on the things which are
>> working and for help on the things which are not.
>> 
> [...]
>> What is not working:
>> 
> [...]
>> 
>> * It seems that vmxnet3 migration itself is not working properly, at least
>>   for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
>>   is function 1. This is happening even for a build of unmodified code from
>>   the master branch.
>>   After migration, the network connectivity is lost at destination.
>>   Things are fine at the source before migration.
>>   This is the command I am using at src:
>> 
>>   sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
>>     -enable-kvm \
>>     -m 2G -smp cpus=2 \
>>     -hda /home/skrtbhtngr/fedora.img \
>>     -netdev tap,id=hostnet0 \
>>     -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
>>     -monitor telnet:127.0.0.1:4444,server,nowait \
>>     -trace events=/home/skrtbhtngr/trace-events \
>>     -vnc 0.0.0.0:0
>> 
>>   Similar command is used for the dest. Currently, I am trying
>>   same-host migration for testing purpose, without the pvrdma device.
>>   Two tap interfaces, for src and dest were created successfully at
>>   the host. Kernel logs:
>>   ...
>>   br0: port 2(tap0) entered forwarding state
>>   ...
>>   br0: port 3(tap1) entered forwarding state
>> 
>>   tcpdump at the dest reports only outgoing ARP packets, which ask
>>   for gateway: "ARP, Request who-has _gateway tell guest1".
>> 
>>   Tried using user (slirp) as the network backend, but no luck.
>>      Also tried git bisect to find the issue using a working commit (given
>>   by Marcel), but it turns out that it is very old and I faced build
>>   errors one after another.
>> 
>>   Please note that e1000 live migration is working fine in the same setup.
>> 
> 
> I tried to git bisect , but I couldn't find a working version of vmxnet supporting live migration ....
> I tried even a commit from December 2014 and it didn't work.
> 
> What is strange (to me) is that the networking packets can't be sent from the guest (after migration)
> even after rebooting the guest.

This makes me think that some network offload configuration wasn’t properly migrated or applied.
What network backend are you using?

Do you see any outgoing packets in the sniffer?

> 
> Any help or pointer would be greatly appreciated.
> Thanks,
> Marcel
> 
> 
> [...]



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-25  8:39   ` Dmitry Fleytman
@ 2019-06-25  8:49     ` Marcel Apfelbaum
  2019-06-25  9:11       ` Dr. David Alan Gilbert
  2019-06-25 10:25       ` Dmitry Fleytman
  0 siblings, 2 replies; 17+ messages in thread
From: Marcel Apfelbaum @ 2019-06-25  8:49 UTC (permalink / raw)
  To: Dmitry Fleytman
  Cc: Sukrit Bhatnagar, qemu-devel, Yuval Shaia, Dr. David Alan Gilbert

Hi Dmitry,

On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
>
>> On 25 Jun 2019, at 11:14, Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
>>
>> Hi Sukrit
>>
>> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>>> Hi,
>> [...]
>>> This RFC is meant to request suggestions on the things which are
>>> working and for help on the things which are not.
>>>
>> [...]
>>> What is not working:
>>>
>> [...]
>>> * It seems that vmxnet3 migration itself is not working properly, at least
>>>    for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
>>>    is function 1. This is happening even for a build of unmodified code from
>>>    the master branch.
>>>    After migration, the network connectivity is lost at destination.
>>>    Things are fine at the source before migration.
>>>    This is the command I am using at src:
>>>
>>>    sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
>>>      -enable-kvm \
>>>      -m 2G -smp cpus=2 \
>>>      -hda /home/skrtbhtngr/fedora.img \
>>>      -netdev tap,id=hostnet0 \
>>>      -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
>>>      -monitor telnet:127.0.0.1:4444,server,nowait \
>>>      -trace events=/home/skrtbhtngr/trace-events \
>>>      -vnc 0.0.0.0:0
>>>
>>>    Similar command is used for the dest. Currently, I am trying
>>>    same-host migration for testing purpose, without the pvrdma device.
>>>    Two tap interfaces, for src and dest were created successfully at
>>>    the host. Kernel logs:
>>>    ...
>>>    br0: port 2(tap0) entered forwarding state
>>>    ...
>>>    br0: port 3(tap1) entered forwarding state
>>>
>>>    tcpdump at the dest reports only outgoing ARP packets, which ask
>>>    for gateway: "ARP, Request who-has _gateway tell guest1".
>>>
>>>    Tried using user (slirp) as the network backend, but no luck.
>>>       Also tried git bisect to find the issue using a working commit (given
>>>    by Marcel), but it turns out that it is very old and I faced build
>>>    errors one after another.
>>>
>>>    Please note that e1000 live migration is working fine in the same setup.
>>>
>> I tried to git bisect , but I couldn't find a working version of vmxnet supporting live migration ....
>> I tried even a commit from December 2014 and it didn't work.
>>
>> What is strange (to me) is that the networking packets can't be sent from the guest (after migration)
>> even after rebooting the guest.
> This makes me think that some network offload configuration wasn’t properly migrated or applied.
> What network backend are you using?

Sukrit tried with a tap device, I tried with slirp.
If you can point me to the property that disables all the offloads, it
will really help.

> Do you see any outgoing packets in the sniffer?

I didn't use a sniffer; I checked dmesg in the guest, and there was a
line complaining that it can't send packets.

Thanks,
Marcel

>> Any help or pointer would be greatly appreciated.
>> Thanks,
>> Marcel
>>
>>
>> [...]



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-25  8:49     ` Marcel Apfelbaum
@ 2019-06-25  9:11       ` Dr. David Alan Gilbert
  2019-06-25 11:39         ` Marcel Apfelbaum
  2019-06-25 10:25       ` Dmitry Fleytman
  1 sibling, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-25  9:11 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Sukrit Bhatnagar, Dmitry Fleytman, qemu-devel, Yuval Shaia

* Marcel Apfelbaum (marcel.apfelbaum@gmail.com) wrote:
> Hi Dmitry,
> 
> On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
> > 
> > > On 25 Jun 2019, at 11:14, Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
> > > 
> > > Hi Sukrit
> > > 
> > > On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
> > > > Hi,
> > > [...]
> > > > This RFC is meant to request suggestions on the things which are
> > > > working and for help on the things which are not.
> > > > 
> > > [...]
> > > > What is not working:
> > > > 
> > > [...]
> > > > * It seems that vmxnet3 migration itself is not working properly, at least
> > > >    for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
> > > >    is function 1. This is happening even for a build of unmodified code from
> > > >    the master branch.
> > > >    After migration, the network connectivity is lost at destination.
> > > >    Things are fine at the source before migration.
> > > >    This is the command I am using at src:
> > > > 
> > > >    sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
> > > >      -enable-kvm \
> > > >      -m 2G -smp cpus=2 \
> > > >      -hda /home/skrtbhtngr/fedora.img \
> > > >      -netdev tap,id=hostnet0 \
> > > >      -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
> > > >      -monitor telnet:127.0.0.1:4444,server,nowait \
> > > >      -trace events=/home/skrtbhtngr/trace-events \
> > > >      -vnc 0.0.0.0:0
> > > > 
> > > >    Similar command is used for the dest. Currently, I am trying
> > > >    same-host migration for testing purpose, without the pvrdma device.
> > > >    Two tap interfaces, for src and dest were created successfully at
> > > >    the host. Kernel logs:
> > > >    ...
> > > >    br0: port 2(tap0) entered forwarding state
> > > >    ...
> > > >    br0: port 3(tap1) entered forwarding state
> > > > 
> > > >    tcpdump at the dest reports only outgoing ARP packets, which ask
> > > >    for gateway: "ARP, Request who-has _gateway tell guest1".
> > > > 
> > > >    Tried using user (slirp) as the network backend, but no luck.
> > > >       Also tried git bisect to find the issue using a working commit (given
> > > >    by Marcel), but it turns out that it is very old and I faced build
> > > >    errors one after another.
> > > > 
> > > >    Please note that e1000 live migration is working fine in the same setup.
> > > > 
> > > I tried to git bisect , but I couldn't find a working version of vmxnet supporting live migration ....
> > > I tried even a commit from December 2014 and it didn't work.
> > > 
> > > What is strange (to me) is that the networking packets can't be sent from the guest (after migration)
> > > even after rebooting the guest.
> > This makes me think that some network offload configuration wasn’t properly migrated or applied.
> > What network backend are you using?
> 
> Sukrit tried with a tap device, I tried with slirp.
> If you can point me to the property that disables all the offloads it will
> really help.
> 
> > Do you see any outgoing packets in the sniffer?
> 
> I didn't use the sniffer, I checked dmesg in guest, there was a line
> complaining that it can't send packets.

What exactly was the error?

I don't know much about vmxnet3;  but if the guest driver is seeing the
problem then I guess that's the best pointer we have.


Dave

> Thanks,
> Marcel
> 
> > > Any help or pointer would be greatly appreciated.
> > > Thanks,
> > > Marcel
> > > 
> > > 
> > > [...]
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-25  8:49     ` Marcel Apfelbaum
  2019-06-25  9:11       ` Dr. David Alan Gilbert
@ 2019-06-25 10:25       ` Dmitry Fleytman
  1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Fleytman @ 2019-06-25 10:25 UTC (permalink / raw)
  To: Marcel Apfelbaum
  Cc: Sukrit Bhatnagar, QEMU Developers, Yuval Shaia, Dr. David Alan Gilbert



> On 25 Jun 2019, at 11:49, Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
> 
> Hi Dmitry,
> 
> On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
>> 
>>> On 25 Jun 2019, at 11:14, Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
>>> 
>>> Hi Sukrit
>>> 
>>> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>>>> Hi,
>>> [...]
>>>> This RFC is meant to request suggestions on the things which are
>>>> working and for help on the things which are not.
>>>> 
>>> [...]
>>>> What is not working:
>>>> 
>>> [...]
>>>> * It seems that vmxnet3 migration itself is not working properly, at least
>>>>   for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
>>>>   is function 1. This is happening even for a build of unmodified code from
>>>>   the master branch.
>>>>   After migration, the network connectivity is lost at destination.
>>>>   Things are fine at the source before migration.
>>>>   This is the command I am using at src:
>>>> 
>>>>   sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
>>>>     -enable-kvm \
>>>>     -m 2G -smp cpus=2 \
>>>>     -hda /home/skrtbhtngr/fedora.img \
>>>>     -netdev tap,id=hostnet0 \
>>>>     -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
>>>>     -monitor telnet:127.0.0.1:4444,server,nowait \
>>>>     -trace events=/home/skrtbhtngr/trace-events \
>>>>     -vnc 0.0.0.0:0
>>>> 
>>>>   Similar command is used for the dest. Currently, I am trying
>>>>   same-host migration for testing purpose, without the pvrdma device.
>>>>   Two tap interfaces, for src and dest were created successfully at
>>>>   the host. Kernel logs:
>>>>   ...
>>>>   br0: port 2(tap0) entered forwarding state
>>>>   ...
>>>>   br0: port 3(tap1) entered forwarding state
>>>> 
>>>>   tcpdump at the dest reports only outgoing ARP packets, which ask
>>>>   for gateway: "ARP, Request who-has _gateway tell guest1".
>>>> 
>>>>   Tried using user (slirp) as the network backend, but no luck.
>>>>      Also tried git bisect to find the issue using a working commit (given
>>>>   by Marcel), but it turns out that it is very old and I faced build
>>>>   errors one after another.
>>>> 
>>>>   Please note that e1000 live migration is working fine in the same setup.
>>>> 
>>> I tried to git bisect , but I couldn't find a working version of vmxnet supporting live migration ....
>>> I tried even a commit from December 2014 and it didn't work.
>>> 
>>> What is strange (to me) is that the networking packets can't be sent from the guest (after migration)
>>> even after rebooting the guest.
>> This makes me think that some network offload configuration wasn’t properly migrated or applied.
>> What network backend are you using?
> 
> Sukrit tried with a tap device, I tried with slirp.
> If you can point me to the property that disables all the offloads it will really help.
> 
>> Do you see any outgoing packets in the sniffer?
> 
> I didn't use the sniffer, I checked dmesg in guest, there was a line complaining that it can't send packets.

I see. If it cannot send packets on the guest side, then it's not an
offload issue. A snippet from dmesg would indeed be helpful.

> 
> Thanks,
> Marcel
> 
>>> Any help or pointer would be greatly appreciated.
>>> Thanks,
>>> Marcel
>>> 
>>> 
>>> [...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
  2019-06-25  9:11       ` Dr. David Alan Gilbert
@ 2019-06-25 11:39         ` Marcel Apfelbaum
  0 siblings, 0 replies; 17+ messages in thread
From: Marcel Apfelbaum @ 2019-06-25 11:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Sukrit Bhatnagar, Dmitry Fleytman, qemu-devel, Yuval Shaia



On 6/25/19 12:11 PM, Dr. David Alan Gilbert wrote:
> * Marcel Apfelbaum (marcel.apfelbaum@gmail.com) wrote:
>> Hi Dmitry,
>>
>> On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
>>>> On 25 Jun 2019, at 11:14, Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
>>>>
>>>> Hi Sukrit
>>>>
>>>> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>>>>> Hi,
>>>> [...]
>>>>> This RFC is meant to request suggestions on the things which are
>>>>> working and for help on the things which are not.
>>>>>
>>>> [...]
>>>>> What is not working:
>>>>>
>>>> [...]
>>>>> * It seems that vmxnet3 migration itself is not working properly, at least
>>>>>     for me. The pvrdma device depends on it, vmxnet3 is function 0 and pvrdma
>>>>>     is function 1. This is happening even for a build of unmodified code from
>>>>>     the master branch.
>>>>>     After migration, the network connectivity is lost at destination.
>>>>>     Things are fine at the source before migration.
>>>>>     This is the command I am using at src:
>>>>>
>>>>>     sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
>>>>>       -enable-kvm \
>>>>>       -m 2G -smp cpus=2 \
>>>>>       -hda /home/skrtbhtngr/fedora.img \
>>>>>       -netdev tap,id=hostnet0 \
>>>>>       -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
>>>>>       -monitor telnet:127.0.0.1:4444,server,nowait \
>>>>>       -trace events=/home/skrtbhtngr/trace-events \
>>>>>       -vnc 0.0.0.0:0
>>>>>
>>>>>     Similar command is used for the dest. Currently, I am trying
>>>>>     same-host migration for testing purpose, without the pvrdma device.
>>>>>     Two tap interfaces, for src and dest were created successfully at
>>>>>     the host. Kernel logs:
>>>>>     ...
>>>>>     br0: port 2(tap0) entered forwarding state
>>>>>     ...
>>>>>     br0: port 3(tap1) entered forwarding state
>>>>>
>>>>>     tcpdump at the dest reports only outgoing ARP packets, which ask
>>>>>     for gateway: "ARP, Request who-has _gateway tell guest1".
>>>>>
>>>>>     Tried using user (slirp) as the network backend, but no luck.
>>>>>        Also tried git bisect to find the issue using a working commit (given
>>>>>     by Marcel), but it turns out that it is very old and I faced build
>>>>>     errors one after another.
>>>>>
>>>>>     Please note that e1000 live migration is working fine in the same setup.
>>>>>
>>>> I tried to git bisect , but I couldn't find a working version of vmxnet supporting live migration ....
>>>> I tried even a commit from December 2014 and it didn't work.
>>>>
>>>> What is strange (to me) is that the networking packets can't be sent from the guest (after migration)
>>>> even after rebooting the guest.
>>> This makes me think that some network offload configuration wasn’t properly migrated or applied.
>>> What network backend are you using?
>> Sukrit tried with a tap device, I tried with slirp.
>> If you can point me to the property that disables all the offloads it will
>> really help.
>>
>>> Do you see any outgoing packets in the sniffer?
>> I didn't use the sniffer, I checked dmesg in guest, there was a line
>> complaining that it can't send packets.
> What exactly was the error?

I'll try to reproduce the error

Thanks,
Marcel

> I don't know much about vmxnet3;  but if the guest driver is seeing the
> problem then I guess that's the best pointer we have.
>
>
> Dave
>
>> Thanks,
>> Marcel
>>
>>>> Any help or pointer would be greatly appreciated.
>>>> Thanks,
>>>> Marcel
>>>>
>>>>
>>>> [...]
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-21 14:45 ` [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support Sukrit Bhatnagar
@ 2019-06-25 15:32   ` Yuval Shaia
  2019-06-28 11:26     ` Dr. David Alan Gilbert
  2019-06-25 15:38   ` Yuval Shaia
  1 sibling, 1 reply; 17+ messages in thread
From: Yuval Shaia @ 2019-06-25 15:32 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> Define and register SaveVMHandlers pvrdma_save and
> pvrdma_load for saving and loading the device state,
> which currently includes only the dma, command slot
> and response slot addresses.
> 
> Remap the DSR, command slot and response slot upon
> loading the addresses in the pvrdma_load function.
> 
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index adcf79cd63..cd8573173c 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/sysemu.h"
>  #include "monitor/monitor.h"
>  #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>  
>  #include "../rdma_rm.h"
>  #include "../rdma_backend.h"
> @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
>      pvrdma_fini(pci_dev);
>  }
>  
> +static void pvrdma_save(QEMUFile *f, void *opaque)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> +
> +    qemu_put_be64(f, dev->dsr_info.dma);
> +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> +}
> +
> +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +    // Remap DSR
> +    dev->dsr_info.dma = qemu_get_be64(f);
> +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> +                                    sizeof(struct pvrdma_device_shared_region));
> +    if (!dev->dsr_info.dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -1;
> +    }
> +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> +
> +    // Remap cmd slot
> +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> +                                     sizeof(union pvrdma_cmd_req));
> +    if (!dev->dsr_info.req) {

So this is where you hit the error, right?
All looks good to me; I wonder why the first pci_dma_map works fine but
the second fails because the global BounceBuffer is marked as used.

Anyone?

> +        rdma_error_report("Failed to map to command slot address");
> +        return -1;
> +    }
> +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> +
> +    // Remap rsp slot

btw, this is an RFC so we are fine, but try to avoid this style of comment.

> +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> +                                     sizeof(union pvrdma_cmd_resp));
> +    if (!dev->dsr_info.rsp) {
> +        rdma_error_report("Failed to map to response slot address");
> +        return -1;
> +    }
> +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> +
> +    return 0;
> +}
> +
> +static SaveVMHandlers savevm_pvrdma = {
> +    .save_state = pvrdma_save,
> +    .load_state = pvrdma_load,
> +};
> +
>  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>  {
>      int rc = 0;
> +    DeviceState *s = DEVICE(pdev);
>      PVRDMADev *dev = PVRDMA_DEV(pdev);
>      Object *memdev_root;
>      bool ram_shared = false;
> @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
>      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
>  
> +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> +
>  out:
>      if (rc) {
>          pvrdma_fini(pdev);
> -- 
> 2.21.0
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-21 14:45 ` [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support Sukrit Bhatnagar
  2019-06-25 15:32   ` Yuval Shaia
@ 2019-06-25 15:38   ` Yuval Shaia
  1 sibling, 0 replies; 17+ messages in thread
From: Yuval Shaia @ 2019-06-25 15:38 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: qemu-devel

On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> Define and register SaveVMHandlers pvrdma_save and
> pvrdma_load for saving and loading the device state,
> which currently includes only the dma, command slot
> and response slot addresses.
> 
> Remap the DSR, command slot and response slot upon
> loading the addresses in the pvrdma_load function.
> 
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yuval Shaia <yuval.shaia@oracle.com>
> Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> ---
>  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> index adcf79cd63..cd8573173c 100644
> --- a/hw/rdma/vmw/pvrdma_main.c
> +++ b/hw/rdma/vmw/pvrdma_main.c
> @@ -28,6 +28,7 @@
>  #include "sysemu/sysemu.h"
>  #include "monitor/monitor.h"
>  #include "hw/rdma/rdma.h"
> +#include "migration/register.h"
>  
>  #include "../rdma_rm.h"
>  #include "../rdma_backend.h"
> @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
>      pvrdma_fini(pci_dev);
>  }
>  
> +static void pvrdma_save(QEMUFile *f, void *opaque)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> +
> +    qemu_put_be64(f, dev->dsr_info.dma);
> +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> +}
> +
> +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> +
> +    // Remap DSR
> +    dev->dsr_info.dma = qemu_get_be64(f);
> +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> +                                    sizeof(struct pvrdma_device_shared_region));
> +    if (!dev->dsr_info.dsr) {
> +        rdma_error_report("Failed to map to DSR");
> +        return -1;
> +    }
> +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> +
> +    // Remap cmd slot
> +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> +                                     sizeof(union pvrdma_cmd_req));
> +    if (!dev->dsr_info.req) {
> +        rdma_error_report("Failed to map to command slot address");
> +        return -1;
> +    }
> +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> +
> +    // Remap rsp slot
> +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> +                                     sizeof(union pvrdma_cmd_resp));
> +    if (!dev->dsr_info.rsp) {
> +        rdma_error_report("Failed to map to response slot address");
> +        return -1;
> +    }
> +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> +
> +    return 0;
> +}
> +
> +static SaveVMHandlers savevm_pvrdma = {
> +    .save_state = pvrdma_save,
> +    .load_state = pvrdma_load,
> +};
> +
>  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>  {
>      int rc = 0;
> +    DeviceState *s = DEVICE(pdev);
>      PVRDMADev *dev = PVRDMA_DEV(pdev);
>      Object *memdev_root;
>      bool ram_shared = false;
> @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
>      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
>      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
>  
> +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> +

Don't forget to unregister_savevm on fini()
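
Something like this, roughly (just a sketch, untested; names assumed
from your patch, and I'm assuming pvrdma_fini() still takes the
PCIDevice):

    static void pvrdma_fini(PCIDevice *pdev)
    {
        PVRDMADev *dev = PVRDMA_DEV(pdev);

        /* Pair with the register_savevm_live() done in realize, so the
         * handlers don't outlive the device. */
        unregister_savevm(DEVICE(pdev), "pvrdma", dev);
        ...
    }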

>  out:
>      if (rc) {
>          pvrdma_fini(pdev);
> -- 
> 2.21.0
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-25 15:32   ` Yuval Shaia
@ 2019-06-28 11:26     ` Dr. David Alan Gilbert
  2019-06-29 12:45       ` Sukrit Bhatnagar
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-28 11:26 UTC (permalink / raw)
  To: Yuval Shaia; +Cc: Sukrit Bhatnagar, qemu-devel, stefanha

* Yuval Shaia (yuval.shaia@oracle.com) wrote:
> On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > Define and register SaveVMHandlers pvrdma_save and
> > pvrdma_load for saving and loading the device state,
> > which currently includes only the dma, command slot
> > and response slot addresses.
> > 
> > Remap the DSR, command slot and response slot upon
> > loading the addresses in the pvrdma_load function.
> > 
> > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > ---
> >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> > 
> > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > index adcf79cd63..cd8573173c 100644
> > --- a/hw/rdma/vmw/pvrdma_main.c
> > +++ b/hw/rdma/vmw/pvrdma_main.c
> > @@ -28,6 +28,7 @@
> >  #include "sysemu/sysemu.h"
> >  #include "monitor/monitor.h"
> >  #include "hw/rdma/rdma.h"
> > +#include "migration/register.h"
> >  
> >  #include "../rdma_rm.h"
> >  #include "../rdma_backend.h"
> > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> >      pvrdma_fini(pci_dev);
> >  }
> >  
> > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > +{
> > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > +
> > +    qemu_put_be64(f, dev->dsr_info.dma);
> > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > +}
> > +
> > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > +{
> > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > +
> > +    // Remap DSR
> > +    dev->dsr_info.dma = qemu_get_be64(f);
> > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > +                                    sizeof(struct pvrdma_device_shared_region));
> > +    if (!dev->dsr_info.dsr) {
> > +        rdma_error_report("Failed to map to DSR");
> > +        return -1;
> > +    }
> > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > +
> > +    // Remap cmd slot
> > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > +                                     sizeof(union pvrdma_cmd_req));
> > +    if (!dev->dsr_info.req) {
> 
> So this is where you hit the error, right?
> All looks good to me, i wonder why the first pci_dma_map works fine but
> second fails where the global BounceBuffer is marked as used.
> 
> Anyone?

I've got a few guesses:
  a) My reading of address_space_map is that it gives a bounce buffer
pointer and then it has to be freed; so if it's going wrong and using a
bounce buffer, then the 1st call would work and only the 2nd would fail.

  b) Perhaps the dma address read from the stream is bad, and isn't
pointing into RAM properly - and that's why you're getting a bounce
buffer.

  c) Do you have some ordering problems? i.e. is this code happening
before some other device has been loaded/initialised?  If you're relying
on some other state, then perhaps the right thing is to perform these
mappings later, at the end of migration, not during the loading itself.

Other notes:
  d) Use VMSTATE macros and structures rather than open-coding qemu_get
calls; a rough sketch follows below.

  e) QEMU normally uses /* comments */ rather than //
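
For (c)+(d), one way this could look is the following rough sketch
(untested; reusing your load_dsr() to redo the mappings is an assumption
on my part, and this still runs during loading, just after the device's
fields are in place; doing it even later, e.g. from a
migration-completion notifier, would be another option):

    static int pvrdma_post_load(void *opaque, int version_id)
    {
        PVRDMADev *dev = PVRDMA_DEV(opaque);

        /* dsr_info.dma has been restored by the VMState machinery at
         * this point; redo the guest memory mappings now. */
        load_dsr(dev);
        return dev->dsr_info.dsr ? 0 : -EINVAL;
    }

wired up with .post_load = pvrdma_post_load in the device's
VMStateDescription, with dsr_info.dma as a VMSTATE_UINT64 field.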

Dave

> > +        rdma_error_report("Failed to map to command slot address");
> > +        return -1;
> > +    }
> > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > +
> > +    // Remap rsp slot
> 
> btw, this is RFC so we are fine but try to avoid such way of comments.
> 
> > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > +                                     sizeof(union pvrdma_cmd_resp));
> > +    if (!dev->dsr_info.rsp) {
> > +        rdma_error_report("Failed to map to response slot address");
> > +        return -1;
> > +    }
> > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > +
> > +    return 0;
> > +}
> > +
> > +static SaveVMHandlers savevm_pvrdma = {
> > +    .save_state = pvrdma_save,
> > +    .load_state = pvrdma_load,
> > +};
> > +
> >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      int rc = 0;
> > +    DeviceState *s = DEVICE(pdev);
> >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> >      Object *memdev_root;
> >      bool ram_shared = false;
> > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> >  
> > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > +
> >  out:
> >      if (rc) {
> >          pvrdma_fini(pdev);
> > -- 
> > 2.21.0
> > 
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-28 11:26     ` Dr. David Alan Gilbert
@ 2019-06-29 12:45       ` Sukrit Bhatnagar
  2019-06-30  8:13         ` Yuval Shaia
  2019-07-08 10:54         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 17+ messages in thread
From: Sukrit Bhatnagar @ 2019-06-29 12:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: stefanha, qemu-devel, Yuval Shaia

On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Yuval Shaia (yuval.shaia@oracle.com) wrote:
> > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > Define and register SaveVMHandlers pvrdma_save and
> > > pvrdma_load for saving and loading the device state,
> > > which currently includes only the dma, command slot
> > > and response slot addresses.
> > >
> > > Remap the DSR, command slot and response slot upon
> > > loading the addresses in the pvrdma_load function.
> > >
> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > > ---
> > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 56 insertions(+)
> > >
> > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > index adcf79cd63..cd8573173c 100644
> > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > @@ -28,6 +28,7 @@
> > >  #include "sysemu/sysemu.h"
> > >  #include "monitor/monitor.h"
> > >  #include "hw/rdma/rdma.h"
> > > +#include "migration/register.h"
> > >
> > >  #include "../rdma_rm.h"
> > >  #include "../rdma_backend.h"
> > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > >      pvrdma_fini(pci_dev);
> > >  }
> > >
> > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > +{
> > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > +
> > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > +}
> > > +
> > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > +{
> > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > +
> > > +    // Remap DSR
> > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > +                                    sizeof(struct pvrdma_device_shared_region));
> > > +    if (!dev->dsr_info.dsr) {
> > > +        rdma_error_report("Failed to map to DSR");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > +
> > > +    // Remap cmd slot
> > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > +                                     sizeof(union pvrdma_cmd_req));
> > > +    if (!dev->dsr_info.req) {
> >
> > So this is where you hit the error, right?
> > All looks good to me, i wonder why the first pci_dma_map works fine but
> > second fails where the global BounceBuffer is marked as used.
> >
> > Anyone?
>
> I've got a few guesses:
>   a) My reading of address_space_map is that it gives a bounce buffer
> pointer and then it has to be freed; so if it's going wrong and using a
> bounce buffer, then the 1st call would work and only the 2nd would fail.

In the scenario without any migration, the device is initialized and
load_dsr() is called when the guest writes the DSR address to BAR1 of
pvrdma.

Inside load_dsr(), there are similar calls to rdma_pci_dma_map().

The difference is that when address_space_map() is called there, the
!memory_access_is_direct() condition is FALSE.
So, the BounceBuffer is not used.


In the migration scenario, when the device at the dest calls load and
subsequently rdma_pci_dma_map(), the !memory_access_is_direct()
condition is TRUE.
So the first rdma_pci_dma_map() sets bounce.in_use from FALSE to TRUE
and succeeds, but the subsequent calls fail at the atomic_xchg().

The BounceBuffer is only freed when address_space_unmap() is called.


I am guessing that when the address is translated to one in the
FlatView, the MemoryRegion returned at the dest is causing the issue.
To be honest, at this point I am not sure how FlatView translation works.

I tried adding qemu_log to memory_access_is_direct(), but I guess it is
called too many times, so the VM stalls before it even starts.

>   b) Perhaps the dma address read from the stream is bad, and isn't
> pointing into RAM properly - and that's why you're getting a bounce
> buffer.

I have logged the addresses saved in pvrdma_save() and the ones loaded
in pvrdma_load(). They are the same, so I suppose there is no problem
with the stream itself.

>   c) Do you have some ordering problems? i.e. is this code happening
> before some other device has been loaded/initialised?  If you're relying
> on some other state, then perhaps the right thing is to perform these
> mappings later, at the end of migration, not during the loading itself.

The device which is saved/loaded before pvrdma is vmxnet3, on which
pvrdma depends.

I have included the last few lines of my traces during migration below.
The pvrdma migration is performed in this last part of the migration.
I added some debug messages at various places to see which parts of
the code are touched. The ones I added are without any timestamp.

Source:
15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3,
section_id 46 -> 0
15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
pvrdma_save: dev->dsr_info.dma is 2032963584
pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma,
section_id 47 -> 0
15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
15942@1561806657.197420:vmstate_save acpi_build, acpi_build
15942@1561806657.197441:vmstate_save_state_top acpi_build
15942@1561806657.197459:vmstate_n_elems patched: 1
15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
15942@1561806657.197503:vmstate_subsection_save_top acpi_build
15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
15942@1561806657.197545:savevm_section_start globalstate, section_id 49
15942@1561806657.197568:vmstate_save globalstate, globalstate
15942@1561806657.197589:vmstate_save_state_top globalstate
15942@1561806657.197606:migrate_global_state_pre_save saved state: running
15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
15942@1561806657.197645:vmstate_n_elems size: 1
15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
15942@1561806657.197710:vmstate_n_elems runstate: 1
15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
15942@1561806657.197822:vmstate_subsection_save_top globalstate
15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
15942@1561806657.198240:migrate_set_state new state completed
15942@1561806657.198269:migration_thread_after_loop
15797@1561806657.198329:savevm_state_cleanup
15797@1561806657.198377:migrate_fd_cleanup
15797@1561806657.199072:qemu_file_fclose

Destination:
15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
15830@1561806657.667263:qemu_loadvm_state_section 4
15830@1561806657.667293:qemu_loadvm_state_section_startfull
47(0000:00:10.1/pvrdma) 0 1
15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
pvrdma_load: dev->dsr_info.dma is 2032963584
address_space_map: is_write is 0
address_space_map: addr is 2032963584
address_space_map: Inside !memory_access_is_direct
15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
pvrdma_load: successfully remapped to DSR
pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
address_space_map: is_write is 0
address_space_map: addr is 2032660480
address_space_map: Inside !memory_access_is_direct
address_space_map: Inside atomic_xchg
qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
qemu-system-x86_64: rdma: Failed to map to command slot address
qemu-system-x86_64: error while loading state for instance 0x0 of
device '0000:00:10.1/pvrdma'
15830@1561806657.667693:qemu_loadvm_state_post_main -1
15830@1561806657.667729:loadvm_state_cleanup
15830@1561806657.668568:process_incoming_migration_co_end ret=-1
postcopy-state=0
qemu-system-x86_64: load of migration failed: Operation not permitted
15830@1561806657.668721:migrate_set_state new state failed
15830@1561806657.668807:qemu_file_fclose
qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256

@Marcel, @Yuval: As David has suggested, what if we just read the dma
addresses in pvrdma_load(), and let the load_dsr() do the mapping?
In pvrdma_regs_write(), we can check if dev->dsr_info.dma is already set, so
that its value is not overwritten.
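
Something like this minimal version is what I have in mind (a rough
sketch, untested; it restores only the address the guest wrote to BAR1
and leaves all the mapping to the existing load_dsr() path):

    static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
    {
        PVRDMADev *dev = PVRDMA_DEV(opaque);

        /* Do not map anything here; only restore the DSR address. The
         * cmd/resp slot addresses are read back from the DSR itself
         * once it is mapped again. */
        dev->dsr_info.dma = qemu_get_be64(f);

        return 0;
    }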


> Other notes:
>   d) Use VMSTATE macros and structures rather than open-coding qemu_get
> calls.

Yes, I am trying it currently.

>   e) QEMU normally uses /* comments */ rather than //

I have corrected this mistake in my code. Patchew notified me about this
and the line-length issues minutes after I had sent this patch.


> Dave
>
> > > +        rdma_error_report("Failed to map to command slot address");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > +
> > > +    // Remap rsp slot
> >
> > btw, this is RFC so we are fine but try to avoid such way of comments.
> >
> > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > +                                     sizeof(union pvrdma_cmd_resp));
> > > +    if (!dev->dsr_info.rsp) {
> > > +        rdma_error_report("Failed to map to response slot address");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static SaveVMHandlers savevm_pvrdma = {
> > > +    .save_state = pvrdma_save,
> > > +    .load_state = pvrdma_load,
> > > +};
> > > +
> > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      int rc = 0;
> > > +    DeviceState *s = DEVICE(pdev);
> > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > >      Object *memdev_root;
> > >      bool ram_shared = false;
> > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > >
> > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > +
> > >  out:
> > >      if (rc) {
> > >          pvrdma_fini(pdev);
> > > --
> > > 2.21.0
> > >
> > >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-29 12:45       ` Sukrit Bhatnagar
@ 2019-06-30  8:13         ` Yuval Shaia
  2019-07-01  2:15           ` Sukrit Bhatnagar
  2019-07-08 10:54         ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 17+ messages in thread
From: Yuval Shaia @ 2019-06-30  8:13 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: Dr. David Alan Gilbert, stefanha, qemu-devel

On Sat, Jun 29, 2019 at 06:15:21PM +0530, Sukrit Bhatnagar wrote:
> On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Yuval Shaia (yuval.shaia@oracle.com) wrote:
> > > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > > Define and register SaveVMHandlers pvrdma_save and
> > > > pvrdma_load for saving and loading the device state,
> > > > which currently includes only the dma, command slot
> > > > and response slot addresses.
> > > >
> > > > Remap the DSR, command slot and response slot upon
> > > > loading the addresses in the pvrdma_load function.
> > > >
> > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > > > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > > > ---
> > > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 56 insertions(+)
> > > >
> > > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > > index adcf79cd63..cd8573173c 100644
> > > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > > @@ -28,6 +28,7 @@
> > > >  #include "sysemu/sysemu.h"
> > > >  #include "monitor/monitor.h"
> > > >  #include "hw/rdma/rdma.h"
> > > > +#include "migration/register.h"
> > > >
> > > >  #include "../rdma_rm.h"
> > > >  #include "../rdma_backend.h"
> > > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > > >      pvrdma_fini(pci_dev);
> > > >  }
> > > >
> > > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > > +{
> > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > +
> > > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > > +}
> > > > +
> > > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > > +{
> > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > > +
> > > > +    // Remap DSR
> > > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > > +                                    sizeof(struct pvrdma_device_shared_region));
> > > > +    if (!dev->dsr_info.dsr) {
> > > > +        rdma_error_report("Failed to map to DSR");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > > +
> > > > +    // Remap cmd slot
> > > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > > +                                     sizeof(union pvrdma_cmd_req));
> > > > +    if (!dev->dsr_info.req) {
> > >
> > > So this is where you hit the error, right?
> > > All looks good to me, i wonder why the first pci_dma_map works fine but
> > > second fails where the global BounceBuffer is marked as used.
> > >
> > > Anyone?
> >
> > I've got a few guesses:
> >   a) My reading of address_space_map is that it gives a bounce buffer
> > pointer and then it has to be freed; so if it's going wrong and using a
> > bounce buffer, then the 1st call would work and only the 2nd would fail.
> 
> In the scenario without any migration, the device is init and load_dsr()
> is called when the guest writes to DSR in BAR1 of pvrdma.
> 
> Inside the load_dsr(), there are similar calls to rdma_pci_dma_map().
> 
> The difference is that when the address_space_map() is called there,
> !memory_access_is_direct() condition is FALSE.
> So, there is no use for BounceBuffer.
> 
> 
> In the migration scenario, when the device at dest calls load and
> subsequently rdma_pci_dma_map(), the !memory_access_is_direct()
> condition is TRUE.
> So, the first rdma_pci_dma_map() will set BounceBuffer from FALSE to
> TRUE and succeed. But, the subsequent calls will fail at atomic_xchg().
> 
> This BounceBuffer is only freed when address_space_unmap() is called.
> 
> 
> I am guessing that when the address is translated to one in FlatView,
> the MemoryRegion returned at dest is causing the issue.
> To be honest, at this point I am not sure how FlatView translations works.
> 
> I tried adding qemu_log to memory_access_is_direct(), but I guess it is
> called too many times, so the vm stalls before it even starts.
> 
> >   b) Perhaps the dma address read from the stream is bad, and isn't
> > pointing into RAM properly - and that's why you're getting a bounce
> > buffer.
> 
> I have logged the addresses saved in pvrdma_save(), and the ones
> loaded in pvrdma_load(). They are same. So, there is no problem in the
> stream itself, I suppose.
> 
> >   c) Do you have some ordering problems? i.e. is this code happening
> > before some other device has been loaded/initialised?  If you're relying
> > on some other state, then perhaps the right thing is to perform these
> > mappings later, at the end of migration, not during the loading itself.
> 
> The device which is saved/loaded before pvrdma is vmxnet3, on which
> the pvrdma depends.
> 
> I have included the last few lines of my traces during migration below.
> The pvrdma migration is performed in this last part of migration.
> I had added some debug messages at various places to see which parts
> of the code are touched. The ones I added are without any timestamp.
> 
> Source:
> 15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
> 15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
> 15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3,
> section_id 46 -> 0
> 15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
> 15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
> pvrdma_save: dev->dsr_info.dma is 2032963584
> pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
> 15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma,
> section_id 47 -> 0
> 15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
> 15942@1561806657.197420:vmstate_save acpi_build, acpi_build
> 15942@1561806657.197441:vmstate_save_state_top acpi_build
> 15942@1561806657.197459:vmstate_n_elems patched: 1
> 15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
> 15942@1561806657.197503:vmstate_subsection_save_top acpi_build
> 15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
> 15942@1561806657.197545:savevm_section_start globalstate, section_id 49
> 15942@1561806657.197568:vmstate_save globalstate, globalstate
> 15942@1561806657.197589:vmstate_save_state_top globalstate
> 15942@1561806657.197606:migrate_global_state_pre_save saved state: running
> 15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
> 15942@1561806657.197645:vmstate_n_elems size: 1
> 15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
> 15942@1561806657.197710:vmstate_n_elems runstate: 1
> 15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
> 15942@1561806657.197822:vmstate_subsection_save_top globalstate
> 15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
> 15942@1561806657.198240:migrate_set_state new state completed
> 15942@1561806657.198269:migration_thread_after_loop
> 15797@1561806657.198329:savevm_state_cleanup
> 15797@1561806657.198377:migrate_fd_cleanup
> 15797@1561806657.199072:qemu_file_fclose
> 
> Destination:
> 15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
> 15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
> 15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
> 15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
> 15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
> 15830@1561806657.667263:qemu_loadvm_state_section 4
> 15830@1561806657.667293:qemu_loadvm_state_section_startfull
> 47(0000:00:10.1/pvrdma) 0 1
> 15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
> pvrdma_load: dev->dsr_info.dma is 2032963584
> address_space_map: is_write is 0
> address_space_map: addr is 2032963584
> address_space_map: Inside !memory_access_is_direct
> 15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
> pvrdma_load: successfully remapped to DSR
> pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> address_space_map: is_write is 0
> address_space_map: addr is 2032660480
> address_space_map: Inside !memory_access_is_direct
> address_space_map: Inside atomic_xchg
> qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
> qemu-system-x86_64: rdma: Failed to map to command slot address
> qemu-system-x86_64: error while loading state for instance 0x0 of
> device '0000:00:10.1/pvrdma'
> 15830@1561806657.667693:qemu_loadvm_state_post_main -1
> 15830@1561806657.667729:loadvm_state_cleanup
> 15830@1561806657.668568:process_incoming_migration_co_end ret=-1
> postcopy-state=0
> qemu-system-x86_64: load of migration failed: Operation not permitted
> 15830@1561806657.668721:migrate_set_state new state failed
> 15830@1561806657.668807:qemu_file_fclose
> qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256
> 
> @Marcel, @Yuval: As David has suggested, what if we just read the dma
> addresses in pvrdma_load(), and let the load_dsr() do the mapping?
> In pvrdma_regs_write(), we can check if dev->dsr_info.dma is already set, so
> that its value is not overwritten.

Have you tried that? Did it work?

So it is like a lazy load and the first command will be a bit slower; we can
live with that, I guess.

Is there another way to postpone the work so it is executed right after
migration completes? I just don't like the fact that we are adding an
'if' statement.

In any case, wrap the "if (dev->dsr_info.dma)" check with "unlikely", because as
soon as it is initialized it will always be true.

> 
> 
> > Other notes:
> >   d) Use VMSTATE macros and structures rather than open-coding qemu_get
> > calls.
> 
> Yes, I am trying it currently.
> 
> >   e) QEMU normally uses /* comments */ rather than //
> 
> I have corrected this mistake in my code. patchew notified me about this
> and the line length issues minutes after I had sent this patch.
> 
> 
> > Dave
> >
> > > > +        rdma_error_report("Failed to map to command slot address");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > > +
> > > > +    // Remap rsp slot
> > >
> > > btw, this is RFC so we are fine but try to avoid such way of comments.
> > >
> > > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > > +                                     sizeof(union pvrdma_cmd_resp));
> > > > +    if (!dev->dsr_info.rsp) {
> > > > +        rdma_error_report("Failed to map to response slot address");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +static SaveVMHandlers savevm_pvrdma = {
> > > > +    .save_state = pvrdma_save,
> > > > +    .load_state = pvrdma_load,
> > > > +};
> > > > +
> > > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > >  {
> > > >      int rc = 0;
> > > > +    DeviceState *s = DEVICE(pdev);
> > > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > > >      Object *memdev_root;
> > > >      bool ram_shared = false;
> > > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > > >
> > > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > > +
> > > >  out:
> > > >      if (rc) {
> > > >          pvrdma_fini(pdev);
> > > > --
> > > > 2.21.0
> > > >
> > > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-30  8:13         ` Yuval Shaia
@ 2019-07-01  2:15           ` Sukrit Bhatnagar
  2019-07-01 12:08             ` Yuval Shaia
  0 siblings, 1 reply; 17+ messages in thread
From: Sukrit Bhatnagar @ 2019-07-01  2:15 UTC (permalink / raw)
  To: Yuval Shaia; +Cc: Dr. David Alan Gilbert, stefanha, qemu-devel

On Sun, 30 Jun 2019 at 13:43, Yuval Shaia <yuval.shaia@oracle.com> wrote:
>
> On Sat, Jun 29, 2019 at 06:15:21PM +0530, Sukrit Bhatnagar wrote:
> > On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> > >
> > > * Yuval Shaia (yuval.shaia@oracle.com) wrote:
> > > > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > > > Define and register SaveVMHandlers pvrdma_save and
> > > > > pvrdma_load for saving and loading the device state,
> > > > > which currently includes only the dma, command slot
> > > > > and response slot addresses.
> > > > >
> > > > > Remap the DSR, command slot and response slot upon
> > > > > loading the addresses in the pvrdma_load function.
> > > > >
> > > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > > > > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > > > > ---
> > > > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 56 insertions(+)
> > > > >
> > > > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > > > index adcf79cd63..cd8573173c 100644
> > > > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > > > @@ -28,6 +28,7 @@
> > > > >  #include "sysemu/sysemu.h"
> > > > >  #include "monitor/monitor.h"
> > > > >  #include "hw/rdma/rdma.h"
> > > > > +#include "migration/register.h"
> > > > >
> > > > >  #include "../rdma_rm.h"
> > > > >  #include "../rdma_backend.h"
> > > > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > > > >      pvrdma_fini(pci_dev);
> > > > >  }
> > > > >
> > > > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > > > +{
> > > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > > +
> > > > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > > > +}
> > > > > +
> > > > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > > > +{
> > > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > > > +
> > > > > +    // Remap DSR
> > > > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > > > +                                    sizeof(struct pvrdma_device_shared_region));
> > > > > +    if (!dev->dsr_info.dsr) {
> > > > > +        rdma_error_report("Failed to map to DSR");
> > > > > +        return -1;
> > > > > +    }
> > > > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > > > +
> > > > > +    // Remap cmd slot
> > > > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > > > +                                     sizeof(union pvrdma_cmd_req));
> > > > > +    if (!dev->dsr_info.req) {
> > > >
> > > > So this is where you hit the error, right?
> > > > All looks good to me, i wonder why the first pci_dma_map works fine but
> > > > second fails where the global BounceBuffer is marked as used.
> > > >
> > > > Anyone?
> > >
> > > I've got a few guesses:
> > >   a) My reading of address_space_map is that it gives a bounce buffer
> > > pointer and then it has to be freed; so if it's going wrong and using a
> > > bounce buffer, then the 1st call would work and only the 2nd would fail.
> >
> > In the scenario without any migration, the device is init and load_dsr()
> > is called when the guest writes to DSR in BAR1 of pvrdma.
> >
> > Inside the load_dsr(), there are similar calls to rdma_pci_dma_map().
> >
> > The difference is that when the address_space_map() is called there,
> > !memory_access_is_direct() condition is FALSE.
> > So, there is no use for BounceBuffer.
> >
> >
> > In the migration scenario, when the device at dest calls load and
> > subsequently rdma_pci_dma_map(), the !memory_access_is_direct()
> > condition is TRUE.
> > So, the first rdma_pci_dma_map() will set BounceBuffer from FALSE to
> > TRUE and succeed. But, the subsequent calls will fail at atomic_xchg().
> >
> > This BounceBuffer is only freed when address_space_unmap() is called.
> >
> >
> > I am guessing that when the address is translated to one in FlatView,
> > the MemoryRegion returned at dest is causing the issue.
> > To be honest, at this point I am not sure how FlatView translations works.
> >
> > I tried adding qemu_log to memory_access_is_direct(), but I guess it is
> > called too many times, so the vm stalls before it even starts.
> >
> > >   b) Perhaps the dma address read from the stream is bad, and isn't
> > > pointing into RAM properly - and that's why you're getting a bounce
> > > buffer.
> >
> > I have logged the addresses saved in pvrdma_save(), and the ones
> > loaded in pvrdma_load(). They are same. So, there is no problem in the
> > stream itself, I suppose.
> >
> > >   c) Do you have some ordering problems? i.e. is this code happening
> > > before some other device has been loaded/initialised?  If you're relying
> > > on some other state, then perhaps the right thing is to perform these
> > > mappings later, at the end of migration, not during the loading itself.
> >
> > The device which is saved/loaded before pvrdma is vmxnet3, on which
> > the pvrdma depends.
> >
> > I have included the last few lines of my traces during migration below.
> > The pvrdma migration is performed in this last part of migration.
> > I had added some debug messages at various places to see which parts
> > of the code are touched. The ones I added are without any timestamp.
> >
> > Source:
> > 15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
> > 15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
> > 15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3,
> > section_id 46 -> 0
> > 15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
> > 15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
> > pvrdma_save: dev->dsr_info.dma is 2032963584
> > pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> > pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
> > 15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma,
> > section_id 47 -> 0
> > 15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
> > 15942@1561806657.197420:vmstate_save acpi_build, acpi_build
> > 15942@1561806657.197441:vmstate_save_state_top acpi_build
> > 15942@1561806657.197459:vmstate_n_elems patched: 1
> > 15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
> > 15942@1561806657.197503:vmstate_subsection_save_top acpi_build
> > 15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
> > 15942@1561806657.197545:savevm_section_start globalstate, section_id 49
> > 15942@1561806657.197568:vmstate_save globalstate, globalstate
> > 15942@1561806657.197589:vmstate_save_state_top globalstate
> > 15942@1561806657.197606:migrate_global_state_pre_save saved state: running
> > 15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
> > 15942@1561806657.197645:vmstate_n_elems size: 1
> > 15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
> > 15942@1561806657.197710:vmstate_n_elems runstate: 1
> > 15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
> > 15942@1561806657.197822:vmstate_subsection_save_top globalstate
> > 15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
> > 15942@1561806657.198240:migrate_set_state new state completed
> > 15942@1561806657.198269:migration_thread_after_loop
> > 15797@1561806657.198329:savevm_state_cleanup
> > 15797@1561806657.198377:migrate_fd_cleanup
> > 15797@1561806657.199072:qemu_file_fclose
> >
> > Destination:
> > 15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
> > 15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
> > 15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
> > 15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
> > 15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
> > 15830@1561806657.667263:qemu_loadvm_state_section 4
> > 15830@1561806657.667293:qemu_loadvm_state_section_startfull
> > 47(0000:00:10.1/pvrdma) 0 1
> > 15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
> > pvrdma_load: dev->dsr_info.dma is 2032963584
> > address_space_map: is_write is 0
> > address_space_map: addr is 2032963584
> > address_space_map: Inside !memory_access_is_direct
> > 15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
> > pvrdma_load: successfully remapped to DSR
> > pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> > address_space_map: is_write is 0
> > address_space_map: addr is 2032660480
> > address_space_map: Inside !memory_access_is_direct
> > address_space_map: Inside atomic_xchg
> > qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
> > qemu-system-x86_64: rdma: Failed to map to command slot address
> > qemu-system-x86_64: error while loading state for instance 0x0 of
> > device '0000:00:10.1/pvrdma'
> > 15830@1561806657.667693:qemu_loadvm_state_post_main -1
> > 15830@1561806657.667729:loadvm_state_cleanup
> > 15830@1561806657.668568:process_incoming_migration_co_end ret=-1
> > postcopy-state=0
> > qemu-system-x86_64: load of migration failed: Operation not permitted
> > 15830@1561806657.668721:migrate_set_state new state failed
> > 15830@1561806657.668807:qemu_file_fclose
> > qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256
> >
> > @Marcel, @Yuval: As David has suggested, what if we just read the dma
> > addresses in pvrdma_load(), and let the load_dsr() do the mapping?
> > In pvrdma_regs_write(), we can check if dev->dsr_info.dma is already set, so
> > that its value is not overwritten.
>
> Have you tried that? Did it work?

I haven't tried it, but in the dest, the guest will not call pvrdma_regs_write()
for DSR again as it calls it only when the guest driver probes pci.
load_dsr() will not be called then.
So, I guess my assumption was wrong.

> So it is like a lazy load and the first command will be a bit slower; we can
> live with that, I guess.
>
> Is there another way to postpone the work so it is executed right after
> migration completes? I just don't like the fact that we are adding an
> 'if' statement.

Once the guest driver has initialized the device at the dest, the only time it
writes to the registers is when it sends a command to the cmd slot.
So, I think the dma mapping for the DSR and the cmd/resp slots has to be done
in the migration process itself.

I am not sure if there is a callback that we can register to run after the
migration process finishes.
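
If nothing better exists, maybe a VM change state handler would do, since it
runs when the VM starts running at the dest after an incoming migration.
A rough, untested sketch (pvrdma_vm_state_change is a name I made up, and
load_dsr() stands in for whatever ends up redoing the rdma_pci_dma_map()
calls; error handling omitted):

    static void pvrdma_vm_state_change(void *opaque, int running,
                                       RunState state)
    {
        PVRDMADev *dev = opaque;

        /* addresses restored by pvrdma_load() but nothing mapped yet */
        if (running && dev->dsr_info.dma && !dev->dsr_info.dsr) {
            load_dsr(dev);
        }
    }

    /* registered once, e.g. from pvrdma_realize(): */
    qemu_add_vm_change_state_handler(pvrdma_vm_state_change, dev);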


For the BounceBuffer problem:
My understanding is that the VM at the dest has its own address space,
separate from the VM at the src.
We have the same guest-physical address (at src and dest) that we want the
backend to map.
When we hit address_space_map(), a flatview translation takes place and
returns a MemoryRegion for that address.
At the dest (but not at the src), this MemoryRegion is apparently either not
RAM / a RAM device, or is read-only.
So the same address, in another address space, translates to a different
type of MemoryRegion, even though the whole of RAM has already been copied
by migration at this point.
I guess the BounceBuffer is not really the issue, just where the failure
shows up.
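
For reference, this is roughly the path we keep hitting (a simplified
paraphrase of address_space_map() in exec.c as I read it, not a verbatim
excerpt):

    mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs);
    if (!memory_access_is_direct(mr, is_write)) {
        /* not plain RAM: fall back to the single global BounceBuffer;
         * a second map attempt while it is in use fails right here */
        if (atomic_xchg(&bounce.in_use, true)) {
            return NULL;
        }
        ...
    }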

I need to see what is different inside the flatview translation at the src and
the dest, but when I add debug messages the VM stalls indefinitely because the
function is called far too many times. Is there a better way
to debug such cases?
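
One thing I might try (untested): filter the debug output on the one guest
address that fails, inside address_space_map() right after the flatview
translation, instead of logging in memory_access_is_direct() itself. The
address below is the cmd_slot_dma from the trace, and mr/addr are the local
names in address_space_map() as far as I can tell, so treat this as a sketch:

    if (addr == 0x7927f000ULL) {
        qemu_log("address_space_map: mr=%s is_ram=%d is_ram_device=%d"
                 " is_romd=%d\n",
                 memory_region_name(mr),
                 memory_region_is_ram(mr),
                 memory_region_is_ram_device(mr),
                 memory_region_is_romd(mr));
    }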


Apart from this, I have been trying to convert my code to use the VMSTATE
macros, but I keep hitting dead ends because of the complex order of the
values I need to save/load. Shall I start a new thread for this discussion,
or should I continue in this one?
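
Roughly the direction I have in mind (untested sketch; vmstate_pvrdma,
pvrdma_post_load and pvrdma_remap_dsr_and_slots are names I made up, the
last one standing in for whatever redoes the rdma_pci_dma_map() calls, and
this assumes dsr_info.dma is 64-bit here):

    static int pvrdma_post_load(void *opaque, int version_id)
    {
        PVRDMADev *dev = opaque;

        /* dsr_info.dma has been restored; remap DSR and cmd/resp slots */
        return pvrdma_remap_dsr_and_slots(dev);
    }

    static const VMStateDescription vmstate_pvrdma = {
        .name = "pvrdma",
        .version_id = 1,
        .minimum_version_id = 1,
        .post_load = pvrdma_post_load,
        .fields = (VMStateField[]) {
            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
            /* cmd_slot_dma/resp_slot_dma live inside the guest-visible DSR,
             * so they may not need their own fields once the DSR is mapped */
            VMSTATE_END_OF_LIST()
        }
    };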

> In any case, wrap the "if (dev->dsr_info.dma)" check with "unlikely", because as
> soon as it is initialized it will always be true.
>
> >
> >
> > > Other notes:
> > >   d) Use VMSTATE macros and structures rather than open-coding qemu_get
> > > calls.
> >
> > Yes, I am trying it currently.
> >
> > >   e) QEMU normally uses /* comments */ rather than //
> >
> > I have corrected this mistake in my code. patchew notified me about this
> > and the line length issues minutes after I had sent this patch.
> >
> >
> > > Dave
> > >
> > > > > +        rdma_error_report("Failed to map to command slot address");
> > > > > +        return -1;
> > > > > +    }
> > > > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > > > +
> > > > > +    // Remap rsp slot
> > > >
> > > > btw, this is RFC so we are fine but try to avoid such way of comments.
> > > >
> > > > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > > > +                                     sizeof(union pvrdma_cmd_resp));
> > > > > +    if (!dev->dsr_info.rsp) {
> > > > > +        rdma_error_report("Failed to map to response slot address");
> > > > > +        return -1;
> > > > > +    }
> > > > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > > +static SaveVMHandlers savevm_pvrdma = {
> > > > > +    .save_state = pvrdma_save,
> > > > > +    .load_state = pvrdma_load,
> > > > > +};
> > > > > +
> > > > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > > >  {
> > > > >      int rc = 0;
> > > > > +    DeviceState *s = DEVICE(pdev);
> > > > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > > > >      Object *memdev_root;
> > > > >      bool ram_shared = false;
> > > > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > > > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > > > >
> > > > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > > > +
> > > > >  out:
> > > > >      if (rc) {
> > > > >          pvrdma_fini(pdev);
> > > > > --
> > > > > 2.21.0
> > > > >
> > > > >
> > > >
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-07-01  2:15           ` Sukrit Bhatnagar
@ 2019-07-01 12:08             ` Yuval Shaia
  0 siblings, 0 replies; 17+ messages in thread
From: Yuval Shaia @ 2019-07-01 12:08 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: Dr. David Alan Gilbert, stefanha, qemu-devel

> > >
> > > @Marcel, @Yuval: As David has suggested, what if we just read the dma
> > > addresses in pvrdma_load(), and let the load_dsr() do the mapping?
> > > In pvrdma_regs_write(), we can check if dev->dsr_info.dma is already set, so
> > > that its value is not overwritten.
> >
> > Have you tried that? Did it work?
> 
> I haven't tried it, but in the dest, the guest will not call pvrdma_regs_write()
> for DSR again as it calls it only when the guest driver probes pci.
> load_dsr() will not be called then.
> So, I guess my assumption was wrong.

To clarify: the guest is not "calling" pvrdma_regs_write() directly; it
writes into a register and that triggers the function.

So my thinking was that you meant to hook into *any* such register write
and call load_dsr() indirectly, i.e. simulate a guest writing into the DSR
register.

This might work and it is a nice idea; I am just wondering whether you
actually gave it a try.
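
Something like this at the top of pvrdma_regs_write(), just to illustrate
what I mean (untested; the dsr_info fields and load_dsr() are the existing
ones, the rest is a sketch):

    static void pvrdma_regs_write(void *opaque, hwaddr addr, uint64_t val,
                                  unsigned size)
    {
        PVRDMADev *dev = opaque;

        /* lazy load: addresses were restored by migration but the DSR and
         * cmd/resp slots are not mapped yet, so remap before handling the
         * first post-migration register access */
        if (unlikely(dev->dsr_info.dma && !dev->dsr_info.dsr)) {
            load_dsr(dev);
        }

        /* ... existing register handling ... */
    }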

> 
> > So it is like a lazy load and the first command will be a bit slower; we can
> > live with that, I guess.

That is why I called it "lazy load".

> >



* Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
  2019-06-29 12:45       ` Sukrit Bhatnagar
  2019-06-30  8:13         ` Yuval Shaia
@ 2019-07-08 10:54         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-07-08 10:54 UTC (permalink / raw)
  To: Sukrit Bhatnagar; +Cc: stefanha, qemu-devel, Yuval Shaia

* Sukrit Bhatnagar (skrtbhtngr@gmail.com) wrote:
> On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Yuval Shaia (yuval.shaia@oracle.com) wrote:
> > > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > > Define and register SaveVMHandlers pvrdma_save and
> > > > pvrdma_load for saving and loading the device state,
> > > > which currently includes only the dma, command slot
> > > > and response slot addresses.
> > > >
> > > > Remap the DSR, command slot and response slot upon
> > > > loading the addresses in the pvrdma_load function.
> > > >
> > > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > > > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > > > ---
> > > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 56 insertions(+)
> > > >
> > > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > > index adcf79cd63..cd8573173c 100644
> > > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > > @@ -28,6 +28,7 @@
> > > >  #include "sysemu/sysemu.h"
> > > >  #include "monitor/monitor.h"
> > > >  #include "hw/rdma/rdma.h"
> > > > +#include "migration/register.h"
> > > >
> > > >  #include "../rdma_rm.h"
> > > >  #include "../rdma_backend.h"
> > > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > > >      pvrdma_fini(pci_dev);
> > > >  }
> > > >
> > > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > > +{
> > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > +
> > > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > > +}
> > > > +
> > > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > > +{
> > > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > > +
> > > > +    // Remap DSR
> > > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > > +                                    sizeof(struct pvrdma_device_shared_region));
> > > > +    if (!dev->dsr_info.dsr) {
> > > > +        rdma_error_report("Failed to map to DSR");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > > +
> > > > +    // Remap cmd slot
> > > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > > +                                     sizeof(union pvrdma_cmd_req));
> > > > +    if (!dev->dsr_info.req) {
> > >
> > > So this is where you hit the error, right?
> > > All looks good to me, i wonder why the first pci_dma_map works fine but
> > > second fails where the global BounceBuffer is marked as used.
> > >
> > > Anyone?
> >
> > I've got a few guesses:
> >   a) My reading of address_space_map is that it gives a bounce buffer
> > pointer and then it has to be freed; so if it's going wrong and using a
> > bounce buffer, then the 1st call would work and only the 2nd would fail.
> 
> In the scenario without any migration, the device is init and load_dsr()
> is called when the guest writes to DSR in BAR1 of pvrdma.
> 
> Inside the load_dsr(), there are similar calls to rdma_pci_dma_map().
> 
> The difference is that when the address_space_map() is called there,
> !memory_access_is_direct() condition is FALSE.
> So, there is no use for BounceBuffer.
> 
> 
> In the migration scenario, when the device at dest calls load and
> subsequently rdma_pci_dma_map(), the !memory_access_is_direct()
> condition is TRUE.
> So, the first rdma_pci_dma_map() will set BounceBuffer from FALSE to
> TRUE and succeed. But, the subsequent calls will fail at atomic_xchg().
> 
> This BounceBuffer is only freed when address_space_unmap() is called.
> 
> 
> I am guessing that when the address is translated to one in FlatView,
> the MemoryRegion returned at dest is causing the issue.
> To be honest, at this point I am not sure how FlatView translations works.
> 
> I tried adding qemu_log to memory_access_is_direct(), but I guess it is
> called too many times, so the vm stalls before it even starts.

The insides of flatview are quite complex, but the idea is pretty
simple: you've got a memory address and you need to find the chunk of
QEMU device it corresponds to (e.g. part of a RAM or ROM). The basics
start out as a hierarchy (e.g. 'system memory' contains 'RAM', 'ROM',
etc.), broken down as a tree; flatview squashes that tree into a simpler
address->device mapping.

At the qemu monitor you can do:
  info mtree
to see the tree, or:
  info mtree -f
to see all the flatviews.

So perhaps you should print out some of the addresses you're migrating
and try 'info mtree -f' on the now-paused source to see what they are
part of. You might also be able to reuse the code behind HMP's 'info
mtree' to dump the flatview on the destination just before your migration
code runs, to see if it looks the same.

Dave

> >   b) Perhaps the dma address read from the stream is bad, and isn't
> > pointing into RAM properly - and that's why you're getting a bounce
> > buffer.
> 
> I have logged the addresses saved in pvrdma_save(), and the ones
> loaded in pvrdma_load(). They are same. So, there is no problem in the
> stream itself, I suppose.
> 
> >   c) Do you have some ordering problems? i.e. is this code happening
> > before some other device has been loaded/initialised?  If you're relying
> > on some other state, then perhaps the right thing is to perform these
> > mappings later, at the end of migration, not during the loading itself.
> 
> The device which is saved/loaded before pvrdma is vmxnet3, on which
> the pvrdma depends.
> 
> I have included the last few lines of my traces during migration below.
> The pvrdma migration is performed in this last part of migration.
> I had added some debug messages at various places to see which parts
> of the code are touched. The ones I added are without any timestamp.
> 
> Source:
> 15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
> 15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
> 15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3,
> section_id 46 -> 0
> 15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
> 15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
> pvrdma_save: dev->dsr_info.dma is 2032963584
> pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
> 15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma,
> section_id 47 -> 0
> 15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
> 15942@1561806657.197420:vmstate_save acpi_build, acpi_build
> 15942@1561806657.197441:vmstate_save_state_top acpi_build
> 15942@1561806657.197459:vmstate_n_elems patched: 1
> 15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
> 15942@1561806657.197503:vmstate_subsection_save_top acpi_build
> 15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
> 15942@1561806657.197545:savevm_section_start globalstate, section_id 49
> 15942@1561806657.197568:vmstate_save globalstate, globalstate
> 15942@1561806657.197589:vmstate_save_state_top globalstate
> 15942@1561806657.197606:migrate_global_state_pre_save saved state: running
> 15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
> 15942@1561806657.197645:vmstate_n_elems size: 1
> 15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
> 15942@1561806657.197710:vmstate_n_elems runstate: 1
> 15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
> 15942@1561806657.197822:vmstate_subsection_save_top globalstate
> 15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
> 15942@1561806657.198240:migrate_set_state new state completed
> 15942@1561806657.198269:migration_thread_after_loop
> 15797@1561806657.198329:savevm_state_cleanup
> 15797@1561806657.198377:migrate_fd_cleanup
> 15797@1561806657.199072:qemu_file_fclose
> 
> Destination:
> 15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
> 15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
> 15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
> 15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
> 15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
> 15830@1561806657.667263:qemu_loadvm_state_section 4
> 15830@1561806657.667293:qemu_loadvm_state_section_startfull
> 47(0000:00:10.1/pvrdma) 0 1
> 15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
> pvrdma_load: dev->dsr_info.dma is 2032963584
> address_space_map: is_write is 0
> address_space_map: addr is 2032963584
> address_space_map: Inside !memory_access_is_direct
> 15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
> pvrdma_load: successfully remapped to DSR
> pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
> address_space_map: is_write is 0
> address_space_map: addr is 2032660480
> address_space_map: Inside !memory_access_is_direct
> address_space_map: Inside atomic_xchg
> qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
> qemu-system-x86_64: rdma: Failed to map to command slot address
> qemu-system-x86_64: error while loading state for instance 0x0 of
> device '0000:00:10.1/pvrdma'
> 15830@1561806657.667693:qemu_loadvm_state_post_main -1
> 15830@1561806657.667729:loadvm_state_cleanup
> 15830@1561806657.668568:process_incoming_migration_co_end ret=-1
> postcopy-state=0
> qemu-system-x86_64: load of migration failed: Operation not permitted
> 15830@1561806657.668721:migrate_set_state new state failed
> 15830@1561806657.668807:qemu_file_fclose
> qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256
> 
> @Marcel, @Yuval: As David has suggested, what if we just read the dma
> addresses in pvrdma_load(), and let the load_dsr() do the mapping?
> In pvrdma_regs_write(), we can check if dev->dsr_info.dma is already set, so
> that its value is not overwritten.
> 
> 
> > Other notes:
> >   d) Use VMSTATE macros and structures rather than open-coding qemu_get
> > calls.
> 
> Yes, I am trying it currently.
> 
> >   e) QEMU normally uses /* comments */ rather than //
> 
> I have corrected this mistake in my code. patchew notified me about this
> and the line length issues minutes after I had sent this patch.
> 
> 
> > Dave
> >
> > > > +        rdma_error_report("Failed to map to command slot address");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > > +
> > > > +    // Remap rsp slot
> > >
> > > btw, this is RFC so we are fine but try to avoid such way of comments.
> > >
> > > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > > +                                     sizeof(union pvrdma_cmd_resp));
> > > > +    if (!dev->dsr_info.rsp) {
> > > > +        rdma_error_report("Failed to map to response slot address");
> > > > +        return -1;
> > > > +    }
> > > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +static SaveVMHandlers savevm_pvrdma = {
> > > > +    .save_state = pvrdma_save,
> > > > +    .load_state = pvrdma_load,
> > > > +};
> > > > +
> > > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > >  {
> > > >      int rc = 0;
> > > > +    DeviceState *s = DEVICE(pdev);
> > > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > > >      Object *memdev_root;
> > > >      bool ram_shared = false;
> > > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > > >
> > > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > > +
> > > >  out:
> > > >      if (rc) {
> > > >          pvrdma_fini(pdev);
> > > > --
> > > > 2.21.0
> > > >
> > > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



end of thread, other threads:[~2019-07-08 10:56 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-21 14:45 [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device Sukrit Bhatnagar
2019-06-21 14:45 ` [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support Sukrit Bhatnagar
2019-06-25 15:32   ` Yuval Shaia
2019-06-28 11:26     ` Dr. David Alan Gilbert
2019-06-29 12:45       ` Sukrit Bhatnagar
2019-06-30  8:13         ` Yuval Shaia
2019-07-01  2:15           ` Sukrit Bhatnagar
2019-07-01 12:08             ` Yuval Shaia
2019-07-08 10:54         ` Dr. David Alan Gilbert
2019-06-25 15:38   ` Yuval Shaia
2019-06-21 17:55 ` [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device no-reply
2019-06-25  8:14 ` Marcel Apfelbaum
2019-06-25  8:39   ` Dmitry Fleytman
2019-06-25  8:49     ` Marcel Apfelbaum
2019-06-25  9:11       ` Dr. David Alan Gilbert
2019-06-25 11:39         ` Marcel Apfelbaum
2019-06-25 10:25       ` Dmitry Fleytman
