All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: "Daniel P. Berrange" <berrange@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Jason Zeng" <jason.zeng@linux.intel.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	qemu-devel@nongnu.org, "Eric Blake" <eblake@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Zheng Chuan" <zhengchuan@huawei.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>
Subject: Re: [PATCH V7 19/29] vfio-pci: cpr part 1 (fd and dma)
Date: Wed, 5 Jan 2022 16:14:09 -0500	[thread overview]
Message-ID: <20220105161046-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <f1cadf51-795b-200c-8abb-f8f97b34c228@oracle.com>

On Wed, Jan 05, 2022 at 12:24:21PM -0500, Steven Sistare wrote:
> On 12/22/2021 6:15 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 22, 2021 at 11:05:24AM -0800, Steve Sistare wrote:
> >> Enable vfio-pci devices to be saved and restored across an exec restart
> >> of qemu.
> >>
> >> At vfio creation time, save the value of vfio container, group, and device
> >> descriptors in cpr state.
> >>
> >> In cpr-save and cpr-exec, suspend the use of virtual addresses in DMA
> >> mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest ram will be remapped
> >> at a different VA after exec.  DMA to already-mapped pages continues.  Save
> >> the msi message area as part of vfio-pci vmstate, save the interrupt and
> >> notifier eventfd's in cpr state, and clear the close-on-exec flag for the
> >> vfio descriptors.  The flag is not cleared earlier because the descriptors
> >> should not persist across miscellaneous fork and exec calls that may be
> >> performed during normal operation.
> >>
> >> On qemu restart, vfio_realize() finds the saved descriptors, uses
> >> the descriptors, and notes that the device is being reused.  Device and
> >> iommu state is already configured, so operations in vfio_realize that
> >> would modify the configuration are skipped for a reused device, including
> >> vfio ioctl's and writes to PCI configuration space.  The result is that
> >> vfio_realize constructs qemu data structures that reflect the current
> >> state of the device.  However, the reconstruction is not complete until
> >> cpr-load is called. cpr-load loads the msi data and finds eventfds in cpr
> >> state.  It rebuilds vector data structures and attaches the interrupts to
> >> the new KVM instance.  cpr-load then invokes the main vfio listener callback,
> >> which walks the flattened ranges of the vfio_address_spaces and calls
> >> VFIO_DMA_MAP_FLAG_VADDR to inform the kernel of the new VA's.  Lastly, it
> >> starts the VM and suppresses vfio pci device reset.
> >>
> >> This functionality is delivered by 3 patches for clarity.  Part 1 handles
> >> device file descriptors and DMA.  Part 2 adds eventfd and MSI/MSI-X vector
> >> support.  Part 3 adds INTX support.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  MAINTAINERS                   |   1 +
> >>  hw/pci/pci.c                  |  10 ++++
> >>  hw/vfio/common.c              | 115 ++++++++++++++++++++++++++++++++++++++----
> >>  hw/vfio/cpr.c                 |  94 ++++++++++++++++++++++++++++++++++
> >>  hw/vfio/meson.build           |   1 +
> >>  hw/vfio/pci.c                 |  77 ++++++++++++++++++++++++++++
> >>  hw/vfio/trace-events          |   1 +
> >>  include/hw/pci/pci.h          |   1 +
> >>  include/hw/vfio/vfio-common.h |   8 +++
> >>  include/migration/cpr.h       |   3 ++
> >>  migration/cpr.c               |  10 +++-
> >>  migration/target.c            |  14 +++++
> >>  12 files changed, 324 insertions(+), 11 deletions(-)
> >>  create mode 100644 hw/vfio/cpr.c
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index cfe7480..feed239 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -2992,6 +2992,7 @@ CPR
> >>  M: Steve Sistare <steven.sistare@oracle.com>
> >>  M: Mark Kanda <mark.kanda@oracle.com>
> >>  S: Maintained
> >> +F: hw/vfio/cpr.c
> >>  F: include/migration/cpr.h
> >>  F: migration/cpr.c
> >>  F: qapi/cpr.json
> >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >> index 0fd21e1..e35df4f 100644
> >> --- a/hw/pci/pci.c
> >> +++ b/hw/pci/pci.c
> >> @@ -307,6 +307,16 @@ static void pci_do_device_reset(PCIDevice *dev)
> >>  {
> >>      int r;
> >>  
> >> +    /*
> >> +     * A reused vfio-pci device is already configured, so do not reset it
> >> +     * during qemu_system_reset prior to cpr-load, else interrupts may be
> >> +     * lost.  By contrast, pure-virtual pci devices may be reset here and
> >> +     * updated with new state in cpr-load with no ill effects.
> >> +     */
> >> +    if (dev->reused) {
> >> +        return;
> >> +    }
> >> +
> >>      pci_device_deassert_intx(dev);
> >>      assert(dev->irq_state == 0);
> >>  
> > 
> > 
> > Hmm that's a weird thing to do. I suspect this works because
> > "reused" means something like "in the process of being restored"?
> > Because clearly, we do not want to skip this part e.g. when
> > guest resets the device.
> 
> Exactly.  vfio_realize sets the flag if it detects the device is reused during
> a restart, and vfio_pci_post_load clears the reused flag.
> 
> > So a better name could be called for, but really I don't
> > love how vfio gets to poke at internal PCI state.
> > I'd rather we found a way just not to call this function.
> > If we can't, maybe an explicit API, and make it
> > actually say what it's doing?
> 
> How about:
> 
> pci_set_restore(PCIDevice *dev) { dev->restore = true; }
> pci_clr_restore(PCIDevice *dev) { dev->restore = false; }
> 
> vfio_realize()
>   pci_set_restore(pdev)
> 
> vfio_pci_post_load()
>   pci_clr_restore(pdev)
> 
> pci_do_device_reset()
>     if (dev->restore)
>         return;
> 
> - Steve


Not too bad. I'd like a better definition of what dev->restore is
exactly and to add them in comments near where it
is defined and used.

E.g. does this mean "device is being restored because of qemu restart"?

Do we need a per device flag for this thing or would a global
"qemu restart in progress" flag be enough?

-- 
MST



  reply	other threads:[~2022-01-05 21:17 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 19:05 [PATCH V7 00/29] Live Update Steve Sistare
2021-12-22 19:05 ` [PATCH V7 01/29] memory: qemu_check_ram_volatile Steve Sistare
2022-02-24 18:28   ` Dr. David Alan Gilbert
2022-03-03 15:55     ` Steven Sistare
2022-03-04 12:47   ` Philippe Mathieu-Daudé
2021-12-22 19:05 ` [PATCH V7 02/29] migration: fix populate_vfio_info Steve Sistare
2022-02-24 18:42   ` Peter Maydell
2022-03-03 15:55     ` Steven Sistare
2022-03-03 16:21       ` Peter Maydell
2022-03-03 16:38         ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 03/29] migration: qemu file wrappers Steve Sistare
2022-02-24 18:21   ` Dr. David Alan Gilbert
2022-03-03 15:55     ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 04/29] migration: simplify savevm Steve Sistare
2022-02-24 18:25   ` Dr. David Alan Gilbert
2022-03-03 15:55     ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 05/29] vl: start on wakeup request Steve Sistare
2022-02-24 18:51   ` Dr. David Alan Gilbert
2022-03-03 15:56     ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 06/29] cpr: reboot mode Steve Sistare
2021-12-22 19:05 ` [PATCH V7 07/29] cpr: reboot HMP interfaces Steve Sistare
2021-12-22 19:05 ` [PATCH V7 08/29] memory: flat section iterator Steve Sistare
2022-03-04 12:48   ` Philippe Mathieu-Daudé
2022-03-07 14:42     ` Steven Sistare
2022-03-09 14:18   ` Marc-André Lureau
2021-12-22 19:05 ` [PATCH V7 09/29] oslib: qemu_clear_cloexec Steve Sistare
2021-12-22 19:05 ` [PATCH V7 10/29] machine: memfd-alloc option Steve Sistare
2022-02-18  8:05   ` Guoyi Tu
2022-03-03 15:55     ` Steven Sistare
2022-02-24 17:56   ` Dr. David Alan Gilbert
2022-03-03 15:56     ` Steven Sistare
2022-03-03 17:21   ` Michael S. Tsirkin
2022-03-04 10:41     ` Igor Mammedov
2022-03-07 14:41       ` Steven Sistare
2022-03-08  6:50         ` Michael S. Tsirkin
2022-03-08  7:20           ` Igor Mammedov
2022-03-10 15:36             ` Steven Sistare
2022-03-10 16:00               ` Igor Mammedov
2022-03-10 17:28                 ` Steven Sistare
2022-03-10 18:18                   ` Steven Sistare
2022-03-11  9:42                     ` Igor Mammedov
2022-03-29 17:43                       ` Steven Sistare
2022-03-11 10:08         ` Daniel P. Berrangé
2022-03-11 10:25     ` David Hildenbrand
2022-03-11  9:54   ` David Hildenbrand
2021-12-22 19:05 ` [PATCH V7 11/29] qapi: list utility functions Steve Sistare
2022-03-09 14:11   ` Marc-André Lureau
2022-03-11 16:45     ` Steven Sistare
2022-03-11 21:59       ` Marc-André Lureau
2021-12-22 19:05 ` [PATCH V7 12/29] vl: helper to request re-exec Steve Sistare
2022-03-09 14:16   ` Marc-André Lureau
2022-03-11 16:45     ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 13/29] cpr: preserve extra state Steve Sistare
2021-12-22 19:05 ` [PATCH V7 14/29] cpr: restart mode Steve Sistare
2021-12-22 19:05 ` [PATCH V7 15/29] cpr: restart HMP interfaces Steve Sistare
2021-12-22 19:05 ` [PATCH V7 16/29] hostmem-memfd: cpr for memory-backend-memfd Steve Sistare
2021-12-22 19:05 ` [PATCH V7 17/29] pci: export functions for cpr Steve Sistare
2021-12-22 23:07   ` Michael S. Tsirkin
2022-01-05 17:22     ` Steven Sistare
2022-01-05 20:16       ` Michael S. Tsirkin
2022-01-06 22:48         ` Steven Sistare
2022-01-07 10:03           ` Michael S. Tsirkin
2021-12-22 19:05 ` [PATCH V7 18/29] vfio-pci: refactor " Steve Sistare
2022-03-03 23:21   ` Alex Williamson
2022-03-07 14:42     ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 19/29] vfio-pci: cpr part 1 (fd and dma) Steve Sistare
2021-12-22 23:15   ` Michael S. Tsirkin
2022-01-05 17:24     ` Steven Sistare
2022-01-05 21:14       ` Michael S. Tsirkin [this message]
2022-01-05 21:40         ` Steven Sistare
2022-01-05 23:09           ` Michael S. Tsirkin
2022-01-05 23:24             ` Steven Sistare
2022-01-06  9:12               ` Michael S. Tsirkin
2022-01-06 19:13                 ` Steven Sistare
2022-03-07 22:16   ` Alex Williamson
2022-03-10 15:00     ` Steven Sistare
2022-03-10 18:35       ` Alex Williamson
2022-03-10 19:55         ` Steven Sistare
2022-03-10 22:30           ` Alex Williamson
2022-03-11 16:22             ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 20/29] vfio-pci: cpr part 2 (msi) Steve Sistare
2021-12-22 19:05 ` [PATCH V7 21/29] vfio-pci: cpr part 3 (intx) Steve Sistare
2021-12-22 19:05 ` [PATCH V7 22/29] vfio-pci: recover from unmap-all-vaddr failure Steve Sistare
2021-12-22 19:05 ` [PATCH V7 23/29] vhost: reset vhost devices for cpr Steve Sistare
2021-12-22 19:05 ` [PATCH V7 24/29] loader: suppress rom_reset during cpr Steve Sistare
2021-12-22 19:05 ` [PATCH V7 25/29] chardev: cpr framework Steve Sistare
2021-12-22 19:05 ` [PATCH V7 26/29] chardev: cpr for simple devices Steve Sistare
2021-12-22 19:05 ` [PATCH V7 27/29] chardev: cpr for pty Steve Sistare
2021-12-22 19:05 ` [PATCH V7 28/29] chardev: cpr for sockets Steve Sistare
2022-02-18  9:03   ` Guoyi Tu
2022-03-03 15:55     ` Steven Sistare
2021-12-22 19:05 ` [PATCH V7 29/29] cpr: only-cpr-capable option Steve Sistare
2022-02-18  9:43   ` Guoyi Tu
2022-03-03 15:54     ` Steven Sistare
2022-01-07 18:45 ` [PATCH V7 00/29] Live Update Steven Sistare
2022-02-18 13:36   ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220105161046-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=jason.zeng@linux.intel.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=steven.sistare@oracle.com \
    --cc=zhengchuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.