All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Zeng <jason.zeng@intel.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Jason Zeng" <jason.zeng@linux.intel.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH V1 30/32] vfio-pci: save and restore
Date: Mon, 10 Aug 2020 11:50:59 +0800	[thread overview]
Message-ID: <20200810035059.GA3463@x48> (raw)
In-Reply-To: <5d2e3c90-eb8c-569f-ef4a-5016756725c7@oracle.com>

On Fri, Aug 07, 2020 at 04:38:12PM -0400, Steven Sistare wrote:
> On 8/6/2020 6:22 AM, Jason Zeng wrote:
> > Hi Steve,
> > 
> > On Thu, Jul 30, 2020 at 08:14:34AM -0700, Steve Sistare wrote:
> >> @@ -3182,6 +3207,51 @@ static Property vfio_pci_dev_properties[] = {
> >>      DEFINE_PROP_END_OF_LIST(),
> >>  };
> >>  
> >> +static int vfio_pci_post_load(void *opaque, int version_id)
> >> +{
> >> +    int vector;
> >> +    MSIMessage msg;
> >> +    Error *err = 0;
> >> +    VFIOPCIDevice *vdev = opaque;
> >> +    PCIDevice *pdev = &vdev->pdev;
> >> +
> >> +    if (msix_enabled(pdev)) {
> >> +        vfio_msix_enable(vdev);
> >> +        pdev->msix_function_masked = false;
> >> +
> >> +        for (vector = 0; vector < vdev->pdev.msix_entries_nr; vector++) {
> >> +            if (!msix_is_masked(pdev, vector)) {
> >> +                msg = msix_get_message(pdev, vector);
> >> +                vfio_msix_vector_use(pdev, vector, msg);
> >> +            }
> >> +        }
> > 
> > It looks to me MSIX re-init here may lose device IRQs and impact
> > device hardware state?
> > 
> > The re-init will cause the kernel vfio driver to connect the device
> > MSIX vectors to new eventfds and KVM instance. But before that, device
> > IRQs will be routed to previous eventfd. Looks these IRQs will be lost.
> 
> Thanks Jason, that sounds like a problem.  I could try reading and saving an 
> event from eventfd before shutdown, and injecting it into the eventfd after
> restart, but that would be racy unless I disable interrupts.  Or, unconditionally
> inject a spurious interrupt after restart to kick it, in case an interrupt 
> was lost.
> 
> Do you have any other ideas?

Maybe we can consider to also hand over the eventfd file descriptor, or
even the KVM fds to the new Qemu?

If the KVM fds can be preserved, we will just need to restore Qemu KVM
side states. But not sure how complicated the implementation would be.

If we only preserve the eventfd fd, we can attach the old eventfd to
vfio devices. But looks it may turn out we always inject an interrupt
unconditionally, because kernel KVM irqfd eventfd handling is a bit
different than normal user land eventfd read/write. It doesn't decrease
the counter in the eventfd context. So if we read the eventfd from new
Qemu, it looks will always have a non-zero counter, which requires an
interrupt injection.

> 
> > And the re-init will make the device go through the procedure of
> > disabling MSIX, enabling INTX, and re-enabling MSIX and vectors.
> > So if the device is active, its hardware state will be impacted?
> 
> Again thanks.  vfio_msix_enable() does indeed call vfio_disable_interrupts().
> For a quick experiment, I deleted that call in for the post_load code path, and 
> it seems to work fine, but I need to study it more.

vfio_msix_vector_use() will also trigger this procedure in the kernel.

Looks we shouldn't trigger any kernel vfio actions here? Because we
preserve vfio fds, so its kernel state shouldn't be touched. Here we
may only need to restore Qemu states. Re-connect to KVM instance should
be done automatically when we setup the KVM irqfds with the same eventfd.

BTW, if I remember correctly, it is not enough to only save MSIX state
in the snapshot. We should also save the Qemu side pci config space
cache to the snapshot, because Qemu's copy is not exactly the same as
the kernel's copy. I encountered this before, but I don't remember which
field it was.

And another question, why don't we support MSI? I see the code only
handles MSIX?

Thanks,
Jason


> 
> - Steve
>  
> >> +
> >> +    } else if (vfio_pci_read_config(pdev, PCI_INTERRUPT_PIN, 1)) {
> >> +        vfio_intx_enable(vdev, &err);
> >> +        if (err) {
> >> +            error_report_err(err);
> >> +        }
> >> +    }
> >> +
> >> +    vdev->vbasedev.group->container->reused = false;
> >> +    vdev->pdev.reused = false;
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static const VMStateDescription vfio_pci_vmstate = {
> >> +    .name = "vfio-pci",
> >> +    .unmigratable = 1,
> >> +    .mode_mask = VMS_RESTART,
> >> +    .version_id = 0,
> >> +    .minimum_version_id = 0,
> >> +    .post_load = vfio_pci_post_load,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_MSIX(pdev, VFIOPCIDevice),
> >> +        VMSTATE_END_OF_LIST()
> >> +    }
> >> +};
> >> +
> >>  static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> >>  {
> >>      DeviceClass *dc = DEVICE_CLASS(klass);
> >> @@ -3189,6 +3259,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> >>  
> >>      dc->reset = vfio_pci_reset;
> >>      device_class_set_props(dc, vfio_pci_dev_properties);
> >> +    dc->vmsd = &vfio_pci_vmstate;
> >>      dc->desc = "VFIO-based PCI device assignment";
> >>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> >>      pdc->realize = vfio_realize;
> >> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> >> index ac2cefc..e6e1a5d 100644
> >> --- a/hw/vfio/platform.c
> >> +++ b/hw/vfio/platform.c
> >> @@ -592,7 +592,7 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
> >>              return -EBUSY;
> >>          }
> >>      }
> >> -    ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
> >> +    ret = vfio_get_device(group, vbasedev->name, vbasedev, 0, errp);
> >>      if (ret) {
> >>          vfio_put_group(group);
> >>          return ret;
> >> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >> index bd07c86..c926a24 100644
> >> --- a/include/hw/pci/pci.h
> >> +++ b/include/hw/pci/pci.h
> >> @@ -358,6 +358,7 @@ struct PCIDevice {
> >>  
> >>      /* ID of standby device in net_failover pair */
> >>      char *failover_pair_id;
> >> +    bool reused;
> >>  };
> >>  
> >>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> >> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> >> index c78f3ff..4e2a332 100644
> >> --- a/include/hw/vfio/vfio-common.h
> >> +++ b/include/hw/vfio/vfio-common.h
> >> @@ -73,6 +73,8 @@ typedef struct VFIOContainer {
> >>      unsigned iommu_type;
> >>      Error *error;
> >>      bool initialized;
> >> +    bool reused;
> >> +    int cid;
> >>      unsigned long pgsizes;
> >>      QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> >>      QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
> >> @@ -177,7 +179,7 @@ void vfio_reset_handler(void *opaque);
> >>  VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
> >>  void vfio_put_group(VFIOGroup *group);
> >>  int vfio_get_device(VFIOGroup *group, const char *name,
> >> -                    VFIODevice *vbasedev, Error **errp);
> >> +                    VFIODevice *vbasedev, bool *reused, Error **errp);
> >>  
> >>  extern const MemoryRegionOps vfio_region_ops;
> >>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
> >> diff --git a/migration/savevm.c b/migration/savevm.c
> >> index 881dc13..2606cf0 100644
> >> --- a/migration/savevm.c
> >> +++ b/migration/savevm.c
> >> @@ -1568,7 +1568,7 @@ static int qemu_savevm_state(QEMUFile *f, VMStateMode mode, Error **errp)
> >>          return -EINVAL;
> >>      }
> >>  
> >> -    if (migrate_use_block()) {
> >> +    if ((mode & (VMS_SNAPSHOT | VMS_MIGRATE)) && migrate_use_block()) {
> >>          error_setg(errp, "Block migration and snapshots are incompatible");
> >>          return -EINVAL;
> >>      }
> >> -- 
> >> 1.8.3.1
> >>
> >>


  reply	other threads:[~2020-08-10  3:52 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-30 15:14 [PATCH V1 00/32] Live Update Steve Sistare
2020-07-30 15:14 ` [PATCH V1 01/32] savevm: add vmstate handler iterators Steve Sistare
2020-09-11 16:24   ` Dr. David Alan Gilbert
2020-09-24 21:43     ` Steven Sistare
2020-09-25  9:07       ` Dr. David Alan Gilbert
2020-07-30 15:14 ` [PATCH V1 02/32] savevm: VM handlers mode mask Steve Sistare
2020-07-30 15:14 ` [PATCH V1 03/32] savevm: QMP command for cprsave Steve Sistare
2020-07-30 16:12   ` Eric Blake
2020-07-30 17:52     ` Steven Sistare
2020-09-11 16:43   ` Dr. David Alan Gilbert
2020-09-25 18:43     ` Steven Sistare
2020-09-25 22:22       ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 04/32] savevm: HMP Command " Steve Sistare
2020-09-11 16:57   ` Dr. David Alan Gilbert
2020-09-24 21:44     ` Steven Sistare
2020-09-25  9:26       ` Dr. David Alan Gilbert
2020-07-30 15:14 ` [PATCH V1 05/32] savevm: QMP command for cprload Steve Sistare
2020-07-30 16:14   ` Eric Blake
2020-07-30 18:00     ` Steven Sistare
2020-09-11 17:18       ` Dr. David Alan Gilbert
2020-09-24 21:49         ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 06/32] savevm: HMP Command " Steve Sistare
2020-07-30 15:14 ` [PATCH V1 07/32] savevm: QMP command for cprinfo Steve Sistare
2020-07-30 16:17   ` Eric Blake
2020-07-30 18:02     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 08/32] savevm: HMP " Steve Sistare
2020-09-11 17:27   ` Dr. David Alan Gilbert
2020-09-24 21:50     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 09/32] savevm: prevent cprsave if memory is volatile Steve Sistare
2020-09-11 17:35   ` Dr. David Alan Gilbert
2020-09-24 21:51     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 10/32] kvmclock: restore paused KVM clock Steve Sistare
2020-09-11 17:50   ` Dr. David Alan Gilbert
2020-09-25 18:07     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 11/32] cpu: disable ticks when suspended Steve Sistare
2020-09-11 17:53   ` Dr. David Alan Gilbert
2020-09-24 20:42     ` Steven Sistare
2020-09-25  9:03       ` Dr. David Alan Gilbert
2020-07-30 15:14 ` [PATCH V1 12/32] vl: pause option Steve Sistare
2020-07-30 16:20   ` Eric Blake
2020-07-30 18:11     ` Steven Sistare
2020-07-31 10:07       ` Daniel P. Berrangé
2020-07-31 15:18         ` Steven Sistare
2020-07-30 17:03   ` Alex Bennée
2020-07-30 18:14     ` Steven Sistare
2020-07-31  9:44       ` Alex Bennée
2020-09-11 17:59       ` Dr. David Alan Gilbert
2020-09-24 21:51         ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 13/32] gdbstub: gdb support for suspended state Steve Sistare
2020-09-11 18:41   ` Dr. David Alan Gilbert
2020-09-24 21:51     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 14/32] savevm: VMS_RESTART and cprsave restart Steve Sistare
2020-07-30 16:22   ` Eric Blake
2020-07-30 18:14     ` Steven Sistare
2020-09-11 18:44   ` Dr. David Alan Gilbert
2020-09-24 21:44     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 15/32] vl: QEMU_START_FREEZE env var Steve Sistare
2020-09-11 18:49   ` Dr. David Alan Gilbert
2020-09-24 21:47     ` Steven Sistare
2020-09-25 15:52       ` Dr. David Alan Gilbert
2020-07-30 15:14 ` [PATCH V1 16/32] oslib: add qemu_clr_cloexec Steve Sistare
2020-09-11 18:52   ` Dr. David Alan Gilbert
2020-07-30 15:14 ` [PATCH V1 17/32] util: env var helpers Steve Sistare
2020-09-11 19:00   ` Dr. David Alan Gilbert
2020-09-24 21:52     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 18/32] osdep: import MADV_DOEXEC Steve Sistare
2020-08-17 18:30   ` Steven Sistare
2020-08-17 20:48     ` Alex Williamson
2020-08-17 21:20       ` Steven Sistare
2020-08-17 21:44         ` Alex Williamson
2020-08-18  2:42           ` Alex Williamson
2020-08-19 21:52             ` Steven Sistare
2020-08-24 22:30               ` Alex Williamson
2020-10-08 16:32                 ` Steven Sistare
2020-10-15 20:36                   ` Alex Williamson
2020-10-19 16:33                     ` Steven Sistare
2020-10-26 18:28                       ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 19/32] memory: ram_block_add cosmetic changes Steve Sistare
2020-07-30 15:14 ` [PATCH V1 20/32] vl: add helper to request re-exec Steve Sistare
2020-07-30 15:14 ` [PATCH V1 21/32] exec, memory: exec(3) to restart Steve Sistare
2020-07-30 15:14 ` [PATCH V1 22/32] char: qio_channel_socket_accept reuse fd Steve Sistare
2020-09-15 17:33   ` Dr. David Alan Gilbert
2020-09-15 17:53     ` Daniel P. Berrangé
2020-09-24 21:54     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 23/32] char: save/restore chardev socket fds Steve Sistare
2020-07-30 15:14 ` [PATCH V1 24/32] ui: save/restore vnc " Steve Sistare
2020-07-31  9:06   ` Daniel P. Berrangé
2020-07-31 16:51     ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 25/32] char: save/restore chardev pty fds Steve Sistare
2020-07-30 15:14 ` [PATCH V1 26/32] monitor: save/restore QMP negotiation status Steve Sistare
2020-07-30 15:14 ` [PATCH V1 27/32] vhost: reset vhost devices upon cprsave Steve Sistare
2020-07-30 15:14 ` [PATCH V1 28/32] char: restore terminal on restart Steve Sistare
2020-07-30 15:14 ` [PATCH V1 29/32] pci: export pci_update_mappings Steve Sistare
2020-07-30 15:14 ` [PATCH V1 30/32] vfio-pci: save and restore Steve Sistare
2020-08-06 10:22   ` Jason Zeng
2020-08-07 20:38     ` Steven Sistare
2020-08-10  3:50       ` Jason Zeng [this message]
2020-08-19 21:15         ` Steven Sistare
2020-08-20 10:33           ` Jason Zeng
2020-10-07 21:25             ` Steven Sistare
2020-07-30 15:14 ` [PATCH V1 31/32] vfio-pci: trace pci config Steve Sistare
2020-07-30 15:14 ` [PATCH V1 32/32] vfio-pci: improved tracing Steve Sistare
2020-09-15 18:49   ` Dr. David Alan Gilbert
2020-09-24 21:52     ` Steven Sistare
2020-07-30 16:52 ` [PATCH V1 00/32] Live Update Daniel P. Berrangé
2020-07-30 18:48   ` Steven Sistare
2020-07-31  8:53     ` Daniel P. Berrangé
2020-07-31 15:27       ` Steven Sistare
2020-07-31 15:52         ` Daniel P. Berrangé
2020-07-31 17:20           ` Steven Sistare
2020-08-11 19:08           ` Dr. David Alan Gilbert
2020-07-30 17:15 ` Paolo Bonzini
2020-07-30 19:09   ` Steven Sistare
2020-07-30 21:39     ` Paolo Bonzini
2020-07-31 19:22       ` Steven Sistare
2020-07-30 17:49 ` Dr. David Alan Gilbert
2020-07-30 19:31   ` Steven Sistare
2020-08-04 18:18 ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200810035059.GA3463@x48 \
    --to=jason.zeng@intel.com \
    --cc=alex.bennee@linaro.org \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=jason.zeng@linux.intel.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.