From: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: stefanha@redhat.com, qemu-devel@nongnu.org,
	Yuval Shaia <yuval.shaia@oracle.com>
Subject: Re: [Qemu-devel] [RFC 1/1] hw/pvrdma: Add live migration support
Date: Sat, 29 Jun 2019 18:15:21 +0530
Message-ID: <CAMzgYoOb7VbFk3U6zP9OvDPtcbT3K4CHjBVuaKUzQWfe_yFCYQ@mail.gmail.com>
In-Reply-To: <20190628112634.GB2987@work-vm>

On Fri, 28 Jun 2019 at 16:56, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Yuval Shaia (yuval.shaia@oracle.com) wrote:
> > On Fri, Jun 21, 2019 at 08:15:41PM +0530, Sukrit Bhatnagar wrote:
> > > Define and register SaveVMHandlers pvrdma_save and
> > > pvrdma_load for saving and loading the device state,
> > > which currently includes only the dma, command slot
> > > and response slot addresses.
> > >
> > > Remap the DSR, command slot and response slot upon
> > > loading the addresses in the pvrdma_load function.
> > >
> > > Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > Cc: Yuval Shaia <yuval.shaia@oracle.com>
> > > Signed-off-by: Sukrit Bhatnagar <skrtbhtngr@gmail.com>
> > > ---
> > >  hw/rdma/vmw/pvrdma_main.c | 56 +++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 56 insertions(+)
> > >
> > > diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
> > > index adcf79cd63..cd8573173c 100644
> > > --- a/hw/rdma/vmw/pvrdma_main.c
> > > +++ b/hw/rdma/vmw/pvrdma_main.c
> > > @@ -28,6 +28,7 @@
> > >  #include "sysemu/sysemu.h"
> > >  #include "monitor/monitor.h"
> > >  #include "hw/rdma/rdma.h"
> > > +#include "migration/register.h"
> > >
> > >  #include "../rdma_rm.h"
> > >  #include "../rdma_backend.h"
> > > @@ -592,9 +593,62 @@ static void pvrdma_shutdown_notifier(Notifier *n, void *opaque)
> > >      pvrdma_fini(pci_dev);
> > >  }
> > >
> > > +static void pvrdma_save(QEMUFile *f, void *opaque)
> > > +{
> > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > +
> > > +    qemu_put_be64(f, dev->dsr_info.dma);
> > > +    qemu_put_be64(f, dev->dsr_info.dsr->cmd_slot_dma);
> > > +    qemu_put_be64(f, dev->dsr_info.dsr->resp_slot_dma);
> > > +}
> > > +
> > > +static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
> > > +{
> > > +    PVRDMADev *dev = PVRDMA_DEV(opaque);
> > > +    PCIDevice *pci_dev = PCI_DEVICE(dev);
> > > +
> > > +    // Remap DSR
> > > +    dev->dsr_info.dma = qemu_get_be64(f);
> > > +    dev->dsr_info.dsr = rdma_pci_dma_map(pci_dev, dev->dsr_info.dma,
> > > +                                    sizeof(struct pvrdma_device_shared_region));
> > > +    if (!dev->dsr_info.dsr) {
> > > +        rdma_error_report("Failed to map to DSR");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to DSR\n");
> > > +
> > > +    // Remap cmd slot
> > > +    dev->dsr_info.dsr->cmd_slot_dma = qemu_get_be64(f);
> > > +    dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,
> > > +                                     sizeof(union pvrdma_cmd_req));
> > > +    if (!dev->dsr_info.req) {
> >
> > So this is where you hit the error, right?
> > All looks good to me, i wonder why the first pci_dma_map works fine but
> > second fails where the global BounceBuffer is marked as used.
> >
> > Anyone?
>
> I've got a few guesses:
>   a) My reading of address_space_map is that it gives a bounce buffer
> pointer and then it has to be freed; so if it's going wrong and using a
> bounce buffer, then the 1st call would work and only the 2nd would fail.

In the scenario without any migration, the device is initialized and
load_dsr() is called when the guest writes the DSR address to BAR1 of
pvrdma.

Inside load_dsr(), there are similar calls to rdma_pci_dma_map().

The difference is that when address_space_map() is called there, the
!memory_access_is_direct() condition is FALSE, so the BounceBuffer is
not used.


In the migration scenario, when the device at the destination calls
load and subsequently rdma_pci_dma_map(), the
!memory_access_is_direct() condition is TRUE. So the first
rdma_pci_dma_map() marks the global BounceBuffer as in use and
succeeds, but the subsequent calls fail at the atomic_xchg().

The BounceBuffer is only freed when address_space_unmap() is called.
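
For reference, the relevant logic in address_space_map() (exec.c) is
roughly the following; I am paraphrasing from memory, so the exact
upstream code may differ slightly:

    /* Paraphrased sketch of the bounce-buffer path in
     * address_space_map(); not the exact upstream code. */
    if (!memory_access_is_direct(mr, is_write)) {
        /* "bounce" is a single global BounceBuffer. atomic_xchg()
         * returns its previous in_use value, so while one mapping
         * holds it, any further non-direct mapping fails here and
         * returns NULL, which rdma_pci_dma_map() then reports as
         * "pci_dma_map fail". */
        if (atomic_xchg(&bounce.in_use, true)) {
            *plen = 0;
            return NULL;
        }
        /* ... otherwise the bounce buffer is set up and returned ... */
    }
    /* In the direct case, a pointer straight into guest RAM is
     * returned and no bounce buffer is involved. */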


I am guessing that when the address is translated in the FlatView,
the MemoryRegion returned at the destination is causing the issue. To
be honest, at this point I am not sure how FlatView translation works.

I tried adding qemu_log() to memory_access_is_direct(), but I guess it
is called too many times, so the VM stalls before it even starts.
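
For context, whether the bounce buffer is needed at all is decided by
memory_access_is_direct() on the MemoryRegion that the translation
returns. From memory (so possibly not letter-exact), it is roughly:

    /* Paraphrased: an access is "direct" only if the translated
     * region is plain RAM (or ROMD for reads) and not a ram_device
     * region; anything else takes the bounce-buffer path above. */
    static inline bool memory_access_is_direct(MemoryRegion *mr,
                                               bool is_write)
    {
        if (is_write) {
            return memory_region_is_ram(mr) && !mr->readonly &&
                   !memory_region_is_ram_device(mr);
        }
        return (memory_region_is_ram(mr) &&
                !memory_region_is_ram_device(mr)) ||
               memory_region_is_romd(mr);
    }

So if this returns FALSE at the destination, the region that the DSR
address translates to is apparently not being seen as plain RAM at
that point in the incoming migration.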

>   b) Perhaps the dma address read from the stream is bad, and isn't
> pointing into RAM properly - and that's why you're getting a bounce
> buffer.

I have logged the addresses saved in pvrdma_save() and the ones
loaded in pvrdma_load(), and they are the same. So I suppose there is
no problem in the stream itself.

>   c) Do you have some ordering problems? i.e. is this code happening
> before some other device has been loaded/initialised?  If you're relying
> on some other state, then perhaps the right thing is to perform these
> mappings later, at the end of migration, not during the loading itself.

The device saved/loaded just before pvrdma is vmxnet3, on which
pvrdma depends.

I have included the last few lines of my traces during migration
below; the pvrdma state is saved/loaded in this last part of the
migration. I had added some debug messages at various places to see
which parts of the code are touched; the lines I added have no
timestamp.

Source:
15942@1561806657.197239:vmstate_subsection_save_top PCIDevice
15942@1561806657.197257:vmstate_subsection_save_top vmxnet3/pcie
15942@1561806657.197275:savevm_section_end 0000:00:10.0/vmxnet3,
section_id 46 -> 0
15942@1561806657.197302:savevm_section_start 0000:00:10.1/pvrdma, section_id 47
15942@1561806657.197326:vmstate_save 0000:00:10.1/pvrdma, (old)
pvrdma_save: dev->dsr_info.dma is 2032963584
pvrdma_save: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
pvrdma_save: dev->dsr_info.dsr->resp_slot_dma is 893681664
15942@1561806657.197372:savevm_section_end 0000:00:10.1/pvrdma,
section_id 47 -> 0
15942@1561806657.197397:savevm_section_start acpi_build, section_id 48
15942@1561806657.197420:vmstate_save acpi_build, acpi_build
15942@1561806657.197441:vmstate_save_state_top acpi_build
15942@1561806657.197459:vmstate_n_elems patched: 1
15942@1561806657.197479:vmstate_save_state_loop acpi_build/patched[1]
15942@1561806657.197503:vmstate_subsection_save_top acpi_build
15942@1561806657.197520:savevm_section_end acpi_build, section_id 48 -> 0
15942@1561806657.197545:savevm_section_start globalstate, section_id 49
15942@1561806657.197568:vmstate_save globalstate, globalstate
15942@1561806657.197589:vmstate_save_state_top globalstate
15942@1561806657.197606:migrate_global_state_pre_save saved state: running
15942@1561806657.197624:vmstate_save_state_pre_save_res globalstate/0
15942@1561806657.197645:vmstate_n_elems size: 1
15942@1561806657.197677:vmstate_save_state_loop globalstate/size[1]
15942@1561806657.197710:vmstate_n_elems runstate: 1
15942@1561806657.197730:vmstate_save_state_loop globalstate/runstate[1]
15942@1561806657.197822:vmstate_subsection_save_top globalstate
15942@1561806657.197885:savevm_section_end globalstate, section_id 49 -> 0
15942@1561806657.198240:migrate_set_state new state completed
15942@1561806657.198269:migration_thread_after_loop
15797@1561806657.198329:savevm_state_cleanup
15797@1561806657.198377:migrate_fd_cleanup
15797@1561806657.199072:qemu_file_fclose

Destination:
15830@1561806657.667024:vmstate_subsection_load vmxnet3/pcie
15830@1561806657.667064:vmstate_subsection_load_good vmxnet3/pcie
15830@1561806657.667106:vmstate_load_state_end vmxnet3/pcie end/0
15830@1561806657.667150:vmstate_subsection_load_good vmxnet3
15830@1561806657.667213:vmstate_load_state_end vmxnet3 end/0
15830@1561806657.667263:qemu_loadvm_state_section 4
15830@1561806657.667293:qemu_loadvm_state_section_startfull
47(0000:00:10.1/pvrdma) 0 1
15830@1561806657.667346:vmstate_load 0000:00:10.1/pvrdma, (old)
pvrdma_load: dev->dsr_info.dma is 2032963584
address_space_map: is_write is 0
address_space_map: addr is 2032963584
address_space_map: Inside !memory_access_is_direct
15830@1561806657.667453:rdma_pci_dma_map 0x792c9000 -> 0x5616d1f66000 (len=280)
pvrdma_load: successfully remapped to DSR
pvrdma_load: dev->dsr_info.dsr->cmd_slot_dma is 2032660480
address_space_map: is_write is 0
address_space_map: addr is 2032660480
address_space_map: Inside !memory_access_is_direct
address_space_map: Inside atomic_xchg
qemu-system-x86_64: rdma: pci_dma_map fail, addr=0x7927f000, len=184
qemu-system-x86_64: rdma: Failed to map to command slot address
qemu-system-x86_64: error while loading state for instance 0x0 of
device '0000:00:10.1/pvrdma'
15830@1561806657.667693:qemu_loadvm_state_post_main -1
15830@1561806657.667729:loadvm_state_cleanup
15830@1561806657.668568:process_incoming_migration_co_end ret=-1
postcopy-state=0
qemu-system-x86_64: load of migration failed: Operation not permitted
15830@1561806657.668721:migrate_set_state new state failed
15830@1561806657.668807:qemu_file_fclose
qemu-system-x86_64: network script /etc/qemu-ifdown failed with status 256

@Marcel, @Yuval: As David has suggested, what if we just read the dma
addresses in pvrdma_load() and let load_dsr() do the mapping? In
pvrdma_regs_write(), we can check whether dev->dsr_info.dma is already
set, so that its value is not overwritten.
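
Something like the following untested sketch is what I have in mind
(the guard in pvrdma_regs_write() is hypothetical, and the register
handling details are from memory):

    static int pvrdma_load(QEMUFile *f, void *opaque, int version_id)
    {
        PVRDMADev *dev = PVRDMA_DEV(opaque);

        /* Only record the DSR address; no rdma_pci_dma_map() calls
         * here. The cmd/resp slot addresses live inside the DSR,
         * which sits in guest RAM and is migrated with it, so
         * load_dsr() should be able to pick them up when it maps the
         * DSR later. pvrdma_save() would correspondingly only put
         * dsr_info.dma. */
        dev->dsr_info.dma = qemu_get_be64(f);
        return 0;
    }

    /* ... and in pvrdma_regs_write(), a hypothetical guard (the
     * DSRHIGH case would need the same treatment) so that an address
     * restored from the migration stream is not overwritten by a
     * later register write: */
    case PVRDMA_REG_DSRLOW:
        if (!dev->dsr_info.dma) {
            dev->dsr_info.dma = val;
        }
        break;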


> Other notes:
>   d) Use VMSTATE macros and structures rather than open-coding qemu_get
> calls.

Yes, I am currently working on that.
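
As a first attempt, I am thinking of something along these lines
(untested sketch; it assumes dsr_info.dma can be stored as a uint64_t
and that the remapping moves into a post_load hook):

    static int pvrdma_post_load(void *opaque, int version_id)
    {
        /* Remap the DSR and the command/response slots here, or
         * defer that to load_dsr(), instead of doing it in an
         * open-coded load handler. */
        return 0;
    }

    static const VMStateDescription vmstate_pvrdma = {
        .name = "pvrdma",
        .version_id = 1,
        .minimum_version_id = 1,
        .post_load = pvrdma_post_load,
        .fields = (VMStateField[]) {
            VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
            VMSTATE_END_OF_LIST()
        }
    };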

>   e) QEMU normally uses /* comments */ rather than //

I have corrected this in my code; patchew notified me about it, along
with the line-length issues, minutes after I had sent this patch.


> Dave
>
> > > +        rdma_error_report("Failed to map to command slot address");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to cmd slot\n");
> > > +
> > > +    // Remap rsp slot
> >
> > btw, this is RFC so we are fine but try to avoid such way of comments.
> >
> > > +    dev->dsr_info.dsr->resp_slot_dma = qemu_get_be64(f);
> > > +    dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,
> > > +                                     sizeof(union pvrdma_cmd_resp));
> > > +    if (!dev->dsr_info.rsp) {
> > > +        rdma_error_report("Failed to map to response slot address");
> > > +        return -1;
> > > +    }
> > > +    qemu_log("pvrdma_load: successfully remapped to rsp slot\n");
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static SaveVMHandlers savevm_pvrdma = {
> > > +    .save_state = pvrdma_save,
> > > +    .load_state = pvrdma_load,
> > > +};
> > > +
> > >  static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >  {
> > >      int rc = 0;
> > > +    DeviceState *s = DEVICE(pdev);
> > >      PVRDMADev *dev = PVRDMA_DEV(pdev);
> > >      Object *memdev_root;
> > >      bool ram_shared = false;
> > > @@ -666,6 +720,8 @@ static void pvrdma_realize(PCIDevice *pdev, Error **errp)
> > >      dev->shutdown_notifier.notify = pvrdma_shutdown_notifier;
> > >      qemu_register_shutdown_notifier(&dev->shutdown_notifier);
> > >
> > > +    register_savevm_live(s, "pvrdma", -1, 1, &savevm_pvrdma, dev);
> > > +
> > >  out:
> > >      if (rc) {
> > >          pvrdma_fini(pdev);
> > > --
> > > 2.21.0
> > >
> > >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


