All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Fam Zheng <fam.zheng@bytedance.com>
Cc: "Jason Zeng" <jason.zeng@linux.intel.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Zheng Chuan" <zhengchuan@huawei.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Markus Armbruster" <armbru@redhat.com>
Subject: Re: [PATCH V6 21/27] vfio-pci: cpr part 3 (intx)
Date: Mon, 11 Apr 2022 12:23:50 -0400	[thread overview]
Message-ID: <f18cf0e7-bb4f-1b43-ee2b-5e00c0c282d6@oracle.com> (raw)
In-Reply-To: <20220329110326.GA447081@fam-dell>

On 3/29/2022 7:03 AM, Fam Zheng wrote:
> On 2021-08-06 14:43, Steve Sistare wrote:
>> Preserve vfio INTX state across cpr restart.  Preserve VFIOINTx fields as
>> follows:
>>   pin : Recover this from the vfio config in kernel space
>>   interrupt : Preserve its eventfd descriptor across exec.
>>   unmask : Ditto
>>   route.irq : This could perhaps be recovered in vfio_pci_post_load by
>>     calling pci_device_route_intx_to_irq(pin), whose implementation reads
>>     config space for a bridge device such as ich9.  However, there is no
>>     guarantee that the bridge vmstate is read before vfio vmstate.  Rather
>>     than fiddling with MigrationPriority for vmstate handlers, explicitly
>>     save route.irq in vfio vmstate.
>>   pending : save in vfio vmstate.
>>   mmap_timeout, mmap_timer : Re-initialize
>>   bool kvm_accel : Re-initialize
>>
>> In vfio_realize, defer calling vfio_intx_enable until the vmstate
>> is available, in vfio_pci_post_load.  Modify vfio_intx_enable and
>> vfio_intx_kvm_enable to skip vfio initialization, but still perform
>> kvm initialization.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> Hi Steve,
> 
> Not directly related to this patch, but since the context is close: it looks
> like this series only takes care of exec restart mode of vfio-pci, have you had
> any thoughts on kexec reboot mode with vfio-pci?
> 
> The general idea is if DMAR context is not lost during kexec, we should be able
> to set up irqfds again and things will just work?
> 
> Fam

Hi Fam,
  I have thought about that use case, but only in general terms.
IMO it best fits in the cpr framework as a new mode (rather than as 
a new -restore command line argument).  

In your code below, you would have fewer code changes if you set 
'reused = true' for the new mode, rather than testing both 'reused and restored' 
at multiple sites. Lastly, I cleaned up the vector handling somewhat from V6 
to V7, so you may want to try your code using V7 as a base.

- Steve

> PS some more info below:
> 
> I have some local kernel patches to kexec reboot most part of the host kernel
> while keeping IOMMU DMAR tables in a valid state; with that, not many extra
> things are needed in addition to restore it. A PoC is like below (I can share
> more details of the kernel changes if this patch makes any sense):
> 
> 
> commit f8951e58be86bd6e37f816394a9a73f28d8059fc
> Author: Fam Zheng <fam.zheng@bytedance.com>
> Date:   Mon Mar 21 13:19:49 2022 +0000
> 
>     cpr: Add live-update support to vfio-pci devices
>     
>     In cpr-save, always serialize the vfio-pci states.
>     
>     In cpr-load, add a '-restore' mode that will do
>     VFIO_GROUP_GET_DEVICE_FD_INTACT and skip DMAR setup, somewhat similar to
>     the current cpr exec mode.
>     
>     Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 73f4259556..e36f0ef97d 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -584,10 +584,15 @@ void msix_init_vector_notifiers(PCIDevice *dev,
>                                  MSIVectorReleaseNotifier release_notifier,
>                                  MSIVectorPollNotifier poll_notifier)
>  {
> +    int vector;
> +
>      assert(use_notifier && release_notifier);
>      dev->msix_vector_use_notifier = use_notifier;
>      dev->msix_vector_release_notifier = release_notifier;
>      dev->msix_vector_poll_notifier = poll_notifier;
> +    for (vector = 0; vector < dev->msix_entries_nr; ++vector) {
> +        msix_handle_mask_update(dev, vector, true);
> +    }
>  }
>  
>  int msix_set_vector_notifiers(PCIDevice *dev,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 605ffbb5d0..f1240410a8 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -2066,6 +2066,9 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>      bool reused;
>      VFIOAddressSpace *space;
>  
> +    if (restore) {
> +        return 0;
> +    }
>      space = vfio_get_address_space(as);
>      fd = cpr_find_fd("vfio_container_for_group", group->groupid);
>      reused = (fd > 0);
> @@ -2486,7 +2489,8 @@ int vfio_get_device(VFIOGroup *group, const char *name,
>      fd = cpr_find_fd(name, 0);
>      reused = (fd >= 0);
>      if (!reused) {
> -        fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
> +        int op = restore ? VFIO_GROUP_GET_DEVICE_FD_INTACT : VFIO_GROUP_GET_DEVICE_FD;
> +        fd = ioctl(group->fd, op, name);
>      }
>  
>      if (fd < 0) {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e32513c668..9da5f93228 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -361,7 +361,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
>       * Do not alter interrupt state during vfio_realize and cpr-load.  The
>       * reused flag is cleared thereafter.
>       */
> -    if (!vdev->pdev.reused) {
> +    if (!vdev->pdev.reused && !restore) {
>          vfio_disable_interrupts(vdev);
>      }
>  
> @@ -388,7 +388,7 @@ static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
>      fd = event_notifier_get_fd(&vdev->intx.interrupt);
>      qemu_set_fd_handler(fd, vfio_intx_interrupt, NULL, vdev);
>  
> -    if (vdev->pdev.reused) {
> +    if (vdev->pdev.reused && !restore) {
>          vfio_intx_reenable_kvm(vdev, &err);
>          goto finish;
>      }
> @@ -2326,6 +2326,9 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>      int ret, i, count;
>      bool multi = false;
>  
> +    if (restore) {
> +        return 0;
> +    }
>      trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
>  
>      if (!single) {
> @@ -3185,7 +3188,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>          kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>  
>          /* Wait until cpr-load reads intx routing data to enable */
> -        if (!pdev->reused) {
> +        if (!pdev->reused && !restore) {
>              ret = vfio_intx_enable(vdev, errp);
>              if (ret) {
>                  goto out_deregister;
> @@ -3295,7 +3298,7 @@ static void vfio_pci_reset(DeviceState *dev)
>      VFIOPCIDevice *vdev = VFIO_PCI(dev);
>  
>      /* Do not reset the device during qemu_system_reset prior to cpr-load */
> -    if (vdev->pdev.reused) {
> +    if (vdev->pdev.reused || restore) {
>          return;
>      }
>  
> @@ -3429,33 +3432,40 @@ static void vfio_merge_config(VFIOPCIDevice *vdev)
>  
>  static void vfio_claim_vectors(VFIOPCIDevice *vdev, int nr_vectors, bool msix)
>  {
> -    int i, fd;
> +    int i, fd, ret;
>      bool pending = false;
>      PCIDevice *pdev = &vdev->pdev;
>  
> +    pdev->msix_entries_nr = nr_vectors;
>      vdev->nr_vectors = nr_vectors;
>      vdev->msi_vectors = g_new0(VFIOMSIVector, nr_vectors);
>      vdev->interrupt = msix ? VFIO_INT_MSIX : VFIO_INT_MSI;
>  
> -    for (i = 0; i < nr_vectors; i++) {
> -        VFIOMSIVector *vector = &vdev->msi_vectors[i];
> -
> -        fd = load_event_fd(vdev, "interrupt", i);
> -        if (fd >= 0) {
> -            vfio_vector_init(vdev, i);
> -            qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
> +    if (restore) {
> +        ret = vfio_enable_vectors(vdev, true);
> +        if (ret) {
> +            error_report("vfio: failed to enable vectors, %d", ret);
>          }
> +    } else {
> +        for (i = 0; i < nr_vectors; i++) {
> +            VFIOMSIVector *vector = &vdev->msi_vectors[i];
>  
> -        if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) {
> -            vfio_add_kvm_msi_virq(vdev, vector, i, msix);
> -        }
> +            fd = load_event_fd(vdev, "interrupt", i);
> +            if (fd >= 0) {
> +                vfio_vector_init(vdev, i);
> +                qemu_set_fd_handler(fd, vfio_msi_interrupt, NULL, vector);
> +            }
>  
> -        if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
> -            set_bit(i, vdev->msix->pending);
> -            pending = true;
> +            if (load_event_fd(vdev, "kvm_interrupt", i) >= 0) {
> +                vfio_add_kvm_msi_virq(vdev, vector, i, msix);
> +            }
> +
> +            if (msix && msix_is_pending(pdev, i) && msix_is_masked(pdev, i)) {
> +                set_bit(i, vdev->msix->pending);
> +                pending = true;
> +            }
>          }
>      }
> -
>      if (msix) {
>          memory_region_set_enabled(&pdev->msix_pba_mmio, pending);
>      }
> @@ -3534,7 +3544,7 @@ static const VMStateDescription vfio_intx_vmstate = {
>  
>  static bool vfio_pci_needed(void *opaque)
>  {
> -    return cpr_get_mode() == CPR_MODE_RESTART;
> +    return 1;
>  }
>  
>  static const VMStateDescription vfio_pci_vmstate = {
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 6241c20fb1..0179b0aa90 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -26,6 +26,7 @@ void configure_rtc(QemuOpts *opts);
>  void qemu_init_subsystems(void);
>  
>  extern int autostart;
> +extern int restore;
>  
>  typedef enum {
>      VGA_NONE, VGA_STD, VGA_CIRRUS, VGA_VMWARE, VGA_XENFB, VGA_QXL,
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index e680594f27..65c3bab074 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -188,6 +188,8 @@ struct vfio_group_status {
>   */
>  #define VFIO_GROUP_GET_DEVICE_FD	_IO(VFIO_TYPE, VFIO_BASE + 6)
>  
> +#define VFIO_GROUP_GET_DEVICE_FD_INTACT	_IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
>  
>  /**
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8b90d04cb9..03666a59b3 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -3984,6 +3984,10 @@ SRST
>      option is experimental.
>  ERST
>  
> +DEF("restore", 0, QEMU_OPTION_restore, \
> +    "-restore              restore mode",
> +    QEMU_ARCH_ALL)
> +
>  DEF("S", 0, QEMU_OPTION_S, \
>      "-S              freeze CPU at startup (use 'c' to start execution)\n",
>      QEMU_ARCH_ALL)
> diff --git a/softmmu/globals.c b/softmmu/globals.c
> index a18fd8dcf3..6fcb5846b4 100644
> --- a/softmmu/globals.c
> +++ b/softmmu/globals.c
> @@ -41,6 +41,7 @@ bool enable_cpu_pm;
>  int nb_nics;
>  NICInfo nd_table[MAX_NICS];
>  int autostart = 1;
> +int restore;
>  int vga_interface_type = VGA_NONE;
>  Chardev *parallel_hds[MAX_PARALLEL_PORTS];
>  int win2k_install_hack;
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index f14e29e622..fba6b577cb 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -3088,6 +3088,9 @@ void qemu_init(int argc, char **argv, char **envp)
>              case QEMU_OPTION_S:
>                  autostart = 0;
>                  break;
> +            case QEMU_OPTION_restore:
> +                restore = 1;
> +                break;
>              case QEMU_OPTION_k:
>                  keyboard_layout = optarg;
>                  break;


  reply	other threads:[~2022-04-11 16:27 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-06 21:43 [PATCH V6 00/27] Live Update Steve Sistare
2021-08-06 21:43 ` [PATCH V6 01/27] memory: qemu_check_ram_volatile Steve Sistare
2021-08-06 21:43 ` [PATCH V6 02/27] migration: fix populate_vfio_info Steve Sistare
2021-08-06 21:43 ` [PATCH V6 03/27] migration: qemu file wrappers Steve Sistare
2021-08-06 21:43 ` [PATCH V6 04/27] migration: simplify savevm Steve Sistare
2021-08-06 21:43 ` [PATCH V6 05/27] vl: start on wakeup request Steve Sistare
2021-08-06 21:43 ` [PATCH V6 06/27] cpr: reboot mode Steve Sistare
2021-08-06 21:43 ` [PATCH V6 07/27] cpr: reboot HMP interfaces Steve Sistare
2021-08-06 21:43 ` [PATCH V6 08/27] memory: flat section iterator Steve Sistare
2021-08-06 21:43 ` [PATCH V6 09/27] oslib: qemu_clear_cloexec Steve Sistare
2021-08-06 21:43 ` [PATCH V6 10/27] machine: memfd-alloc option Steve Sistare
2021-08-06 21:43 ` [PATCH V6 11/27] qapi: list utility functions Steve Sistare
2021-08-06 21:43 ` [PATCH V6 12/27] vl: helper to request re-exec Steve Sistare
2021-08-06 21:43 ` [PATCH V6 13/27] cpr: preserve extra state Steve Sistare
2021-08-06 21:43 ` [PATCH V6 14/27] cpr: restart mode Steve Sistare
2021-08-06 21:43 ` [PATCH V6 15/27] cpr: restart HMP interfaces Steve Sistare
2021-08-06 21:43 ` [PATCH V6 16/27] hostmem-memfd: cpr for memory-backend-memfd Steve Sistare
2021-08-06 21:43 ` [PATCH V6 17/27] pci: export functions for cpr Steve Sistare
2021-08-06 21:43 ` [PATCH V6 18/27] vfio-pci: refactor " Steve Sistare
2021-08-10 16:53   ` Alex Williamson
2021-08-23 16:52     ` Steven Sistare
2021-08-06 21:43 ` [PATCH V6 19/27] vfio-pci: cpr part 1 (fd and dma) Steve Sistare
2021-08-10 17:06   ` Alex Williamson
2021-08-23 19:43     ` Steven Sistare
2021-11-10  7:48     ` Zheng Chuan
2021-11-30 16:11       ` Steven Sistare
2021-08-06 21:43 ` [PATCH V6 20/27] vfio-pci: cpr part 2 (msi) Steve Sistare
2021-08-06 21:43 ` [PATCH V6 21/27] vfio-pci: cpr part 3 (intx) Steve Sistare
2022-03-29 11:03   ` Fam Zheng
2022-04-11 16:23     ` Steven Sistare [this message]
2022-04-12 11:01       ` Fam Zheng
2021-08-06 21:43 ` [PATCH V6 22/27] vhost: reset vhost devices for cpr Steve Sistare
2021-08-06 21:43 ` [PATCH V6 23/27] chardev: cpr framework Steve Sistare
2021-08-06 21:43 ` [PATCH V6 24/27] chardev: cpr for simple devices Steve Sistare
2021-08-06 21:43 ` [PATCH V6 25/27] chardev: cpr for pty Steve Sistare
2021-08-06 21:44 ` [PATCH V6 26/27] chardev: cpr for sockets Steve Sistare
2021-08-06 21:44 ` [PATCH V6 27/27] cpr: only-cpr-capable option Steve Sistare
2021-08-09 16:02 ` [PATCH V6 00/27] Live Update Steven Sistare
2021-08-21  8:54 ` Zheng Chuan
2021-08-23 21:36   ` Steven Sistare
2021-08-24  9:36     ` Zheng Chuan
2021-08-31 21:15       ` Steven Sistare
2021-10-27  6:16         ` Zheng Chuan
2021-10-27 12:25           ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f18cf0e7-bb4f-1b43-ee2b-5e00c0c282d6@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=alex.bennee@linaro.org \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam.zheng@bytedance.com \
    --cc=jason.zeng@linux.intel.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=zhengchuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.