All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Cc: qemu-devel@nongnu.org, fan.chen@easystack.cn,
	izumi.taku@jp.fujitsu.com, caoj.fnst@cn.fujitsu.com,
	mst@redhat.com, Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification handler for aer resume
Date: Fri, 27 May 2016 10:06:55 -0600	[thread overview]
Message-ID: <20160527100655.60db8206@t450s.home> (raw)
In-Reply-To: <1464315131-25834-1-git-send-email-zhoujie2011@cn.fujitsu.com>

On Fri, 27 May 2016 10:12:11 +0800
Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:

> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> 
> For supporting aer recovery, host and guest would run the same aer
> recovery code, that would do the secondary bus reset if the error
> is fatal, the aer recovery process:
>   1. error_detected
>   2. reset_link (if fatal)
>   3. slot_reset/mmio_enabled
>   4. resume
> 
> It indicates that host will do secondary bus reset to reset
> the physical devices under bus in step 2, that would cause
> devices in D3 status in a short time. But in qemu, we register
> an error detected handler, that would be invoked as host broadcasts
> the error-detected event in step 1, in order to avoid guest do
> reset_link when host do reset_link simultaneously. it may cause
> fatal error. we introduce a resmue notifier to assure host reset
> completely.
> In qemu, the aer recovery process:
>   1. Detect support for resume notification
>      If host vfio driver does not support for resume notification,
>      directly fail to boot up VM as with aer enabled.
>   2. Immediately notify the VM on error detected.
>   3. Delay the guest directed bus reset.
>   4. Wait for resume notification.
>      If we don't get the resume notification from the host after
>      some timeout, we would abort the guest directed bus reset
>      altogether and unplug of the device to prevent it from further
>      interacting with the VM.
>   5. After get the resume notification reset bus.


It seems like we have a number of questions open in the thread with MST
from the previous version, particularly whether we should actually drop
the resume notifier and block the reset in the kernel.  The concern
being that it's not very well specified what we can and cannot do
between the error interrupt and the resume interrupt.  We'd probably
need some other indicate of whether the host has this capability,
perhaps a flag in vfio_device_info.  Appreciate your opinions there.
Thanks,

Alex

 
> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com>
> ---
> v7->v8
>  *Add a comment for why VFIO_RESET_TIMEOUT value is 1000.
>  *change vfio_resume_cap to pci_aer_has_resume.
>  *change vfio_resume_notifier_handler to vfio_aer_resume_notifier_handler.
>  *change reset_timer to pci_aer_reset_blocked_timer.
>  *Remove error_report for not supporting resume notification in
>   vfio_populate_device function.
>  *All error and resume tracking is done with atomic bitmap on function 0.
>  *Remove stalling any access to the device until resume is signaled.
>   Because guest OS need read configure space to know the reason of error.
>   And it should up to guest OS to decide to stop access BAR region.
>  *Still use timer to delay reset.
>   Because vfio_aer_resume_notifier_handler cann't be invoked when
>   vfio_pci_reset is blocked.
> 
>  hw/vfio/pci.c              | 223 ++++++++++++++++++++++++++++++++++++++++++++-
>  hw/vfio/pci.h              |   5 +
>  linux-headers/linux/vfio.h |   1 +
>  3 files changed, 228 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6877a3d..77d86d8 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -35,6 +35,12 @@
>  #include "trace.h"
>  
>  #define MSIX_CAP_LENGTH 12
> +/*
> + * Timeout for waiting resume notification, it is 1 second.
> + * The resume notificaton will be sent after host aer error recovery.
> + * For hardware bus reset 1 second will be enough.
> + */
> +#define VFIO_RESET_TIMEOUT 1000
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> @@ -1937,6 +1943,14 @@ static void vfio_check_hot_bus_reset(VFIOPCIDevice *vdev, Error **errp)
>      VFIOGroup *group;
>      int ret, i, devfn, range_limit;
>  
> +    if (!vdev->pci_aer_has_resume) {
> +        error_setg(errp, "vfio: Cannot enable AER for device %s,"
> +                   " host vfio driver does not support for"
> +                   " resume notification",
> +                   vdev->vbasedev.name);
> +        return;
> +    }
> +
>      ret = vfio_get_hot_reset_info(vdev, &info);
>      if (ret) {
>          error_setg(errp, "vfio: Cannot enable AER for device %s,"
> @@ -2594,6 +2608,17 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
>                       vbasedev->name);
>      }
>  
> +    irq_info.index = VFIO_PCI_RESUME_IRQ_INDEX;
> +
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    if (ret) {
> +        /* This can fail for an old kernel or legacy PCI dev */
> +        trace_vfio_populate_device_get_irq_info_failure();
> +        ret = 0;
> +    } else if (irq_info.count == 1) {
> +        vdev->pci_aer_has_resume = true;
> +    }
> +
>  error:
>      return ret;
>  }
> @@ -2606,6 +2631,72 @@ static void vfio_put_device(VFIOPCIDevice *vdev)
>      vfio_put_base_device(&vdev->vbasedev);
>  }
>  
> +static void vfio_aer_error_set_signaled(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIDevice *dev_0 = pci_get_function_0(dev);
> +    VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +    unsigned int func = PCI_FUNC(dev->devfn);
> +
> +    bitmap_set_atomic(vdev_0->pci_aer_error_signaled_bitmap, func, 1);
> +}
> +
> +static void vfio_aer_resume_set_signaled(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIDevice *dev_0 = pci_get_function_0(dev);
> +    VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +    unsigned int func = PCI_FUNC(dev->devfn);
> +
> +    bitmap_set_atomic(vdev_0->pci_aer_resume_signaled_bitmap, func, 1);
> +}
> +
> +static void vfio_aer_error_clear_all_signaled(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIDevice *dev_0 = pci_get_function_0(dev);
> +    VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +
> +    bitmap_test_and_clear_atomic(vdev_0->pci_aer_error_signaled_bitmap,
> +                                 0, PCI_FUNC_MAX);
> +}
> +
> +static void vfio_aer_resume_clear_all_signaled(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIDevice *dev_0 = pci_get_function_0(dev);
> +    VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +
> +    bitmap_test_and_clear_atomic(vdev_0->pci_aer_resume_signaled_bitmap,
> +                                 0, PCI_FUNC_MAX);
> +}
> +
> +static int vfio_aer_error_is_not_signaled(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIDevice *dev_0 = pci_get_function_0(dev);
> +    VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +
> +    return slow_bitmap_empty(vdev_0->pci_aer_error_signaled_bitmap,
> +                             PCI_FUNC_MAX);
> +}
> +
> +static int vfio_aer_resume_match_error_signaled(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *dev = &vdev->pdev;
> +    PCIDevice *dev_0 = pci_get_function_0(dev);
> +    VFIOPCIDevice *vdev_0 = DO_UPCAST(VFIOPCIDevice, pdev, dev_0);
> +    DECLARE_BITMAP(resume_bitmap, PCI_FUNC_MAX);
> +
> +    slow_bitmap_and(resume_bitmap,
> +                    vdev_0->pci_aer_error_signaled_bitmap,
> +                    vdev_0->pci_aer_resume_signaled_bitmap,
> +                    PCI_FUNC_MAX);
> +    return slow_bitmap_equal(resume_bitmap,
> +                             vdev_0->pci_aer_error_signaled_bitmap,
> +                             PCI_FUNC_MAX);
> +}
> +
>  static void vfio_err_notifier_handler(void *opaque)
>  {
>      VFIOPCIDevice *vdev = opaque;
> @@ -2661,6 +2752,12 @@ static void vfio_err_notifier_handler(void *opaque)
>          msg.severity = isfatal ? PCI_ERR_ROOT_CMD_FATAL_EN :
>                                   PCI_ERR_ROOT_CMD_NONFATAL_EN;
>  
> +        if (isfatal) {
> +            if (vfio_aer_error_is_not_signaled(vdev)) {
> +                vfio_aer_resume_clear_all_signaled(vdev);
> +            }
> +            vfio_aer_error_set_signaled(vdev);
> +        }
>          pcie_aer_msg(dev, &msg);
>          return;
>      }
> @@ -2756,6 +2853,98 @@ static void vfio_unregister_err_notifier(VFIOPCIDevice *vdev)
>      event_notifier_cleanup(&vdev->err_notifier);
>  }
>  
> +static void vfio_aer_resume_notifier_handler(void *opaque)
> +{
> +    VFIOPCIDevice *vdev = opaque;
> +
> +    if (!event_notifier_test_and_clear(&vdev->resume_notifier)) {
> +        return;
> +    }
> +
> +    vfio_aer_resume_set_signaled(vdev);
> +    if (vfio_aer_resume_match_error_signaled(vdev) &&
> +        vdev->pci_aer_reset_blocked_timer != NULL) {
> +        timer_mod(vdev->pci_aer_reset_blocked_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
> +    }
> +}
> +
> +static void vfio_register_aer_resume_notifier(VFIOPCIDevice *vdev)
> +{
> +    int ret;
> +    int argsz;
> +    struct vfio_irq_set *irq_set;
> +    int32_t *pfd;
> +
> +    if (!(vdev->features & VFIO_FEATURE_ENABLE_AER) ||
> +        !vdev->pci_aer_has_resume) {
> +        return;
> +    }
> +
> +    if (event_notifier_init(&vdev->resume_notifier, 0)) {
> +        error_report("vfio: Unable to init event notifier for"
> +                     " resume notification");
> +        vdev->pci_aer_has_resume = false;
> +        return;
> +    }
> +
> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
> +
> +    irq_set = g_malloc0(argsz);
> +    irq_set->argsz = argsz;
> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
> +    irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
> +    irq_set->start = 0;
> +    irq_set->count = 1;
> +    pfd = (int32_t *)&irq_set->data;
> +
> +    *pfd = event_notifier_get_fd(&vdev->resume_notifier);
> +    qemu_set_fd_handler(*pfd, vfio_aer_resume_notifier_handler, NULL, vdev);
> +
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    if (ret) {
> +        error_report("vfio: Failed to set up resume notification");
> +        qemu_set_fd_handler(*pfd, NULL, NULL, vdev);
> +        event_notifier_cleanup(&vdev->resume_notifier);
> +        vdev->pci_aer_has_resume = false;
> +    }
> +    g_free(irq_set);
> +}
> +
> +static void vfio_unregister_aer_resume_notifier(VFIOPCIDevice *vdev)
> +{
> +    int argsz;
> +    struct vfio_irq_set *irq_set;
> +    int32_t *pfd;
> +    int ret;
> +
> +    if (!vdev->pci_aer_has_resume) {
> +        return;
> +    }
> +
> +    argsz = sizeof(*irq_set) + sizeof(*pfd);
> +
> +    irq_set = g_malloc0(argsz);
> +    irq_set->argsz = argsz;
> +    irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> +                     VFIO_IRQ_SET_ACTION_TRIGGER;
> +    irq_set->index = VFIO_PCI_RESUME_IRQ_INDEX;
> +    irq_set->start = 0;
> +    irq_set->count = 1;
> +    pfd = (int32_t *)&irq_set->data;
> +    *pfd = -1;
> +
> +    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    if (ret) {
> +        error_report("vfio: Failed to de-assign error fd: %m");
> +    }
> +    g_free(irq_set);
> +    qemu_set_fd_handler(event_notifier_get_fd(&vdev->resume_notifier),
> +                        NULL, NULL, vdev);
> +    event_notifier_cleanup(&vdev->resume_notifier);
> +}
> +
>  static void vfio_req_notifier_handler(void *opaque)
>  {
>      VFIOPCIDevice *vdev = opaque;
> @@ -3061,6 +3250,7 @@ static int vfio_initfn(PCIDevice *pdev)
>      }
>  
>      vfio_register_err_notifier(vdev);
> +    vfio_register_aer_resume_notifier(vdev);
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
>  
> @@ -3091,6 +3281,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>  
>      vfio_unregister_req_notifier(vdev);
> +    vfio_unregister_aer_resume_notifier(vdev);
>      vfio_unregister_err_notifier(vdev);
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>      vfio_disable_interrupts(vdev);
> @@ -3101,6 +3292,22 @@ static void vfio_exitfn(PCIDevice *pdev)
>      vfio_bars_exit(vdev);
>  }
>  
> +static void vfio_pci_delayed_reset(void *opaque)
> +{
> +    VFIOPCIDevice *vdev = opaque;
> +    QEMUTimer *reset_timer = vdev->pci_aer_reset_blocked_timer;
> +
> +    vdev->pci_aer_reset_blocked_timer = NULL;
> +    timer_free(reset_timer);
> +
> +    if (vfio_aer_resume_match_error_signaled(vdev)) {
> +        vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
> +        vfio_aer_error_clear_all_signaled(vdev);
> +    } else {
> +        qdev_unplug(&vdev->pdev.qdev, NULL);
> +    }
> +}
> +
>  static void vfio_pci_reset(DeviceState *dev)
>  {
>      PCIDevice *pdev = DO_UPCAST(PCIDevice, qdev, dev);
> @@ -3114,7 +3321,21 @@ static void vfio_pci_reset(DeviceState *dev)
>          if ((pci_get_word(br->config + PCI_BRIDGE_CONTROL) &
>               PCI_BRIDGE_CTL_BUS_RESET)) {
>              if (pci_get_function_0(pdev) == pdev) {
> -                vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
> +                if (vfio_aer_error_is_not_signaled(vdev)) {
> +                    vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
> +                } else {
> +                    if (vfio_aer_resume_match_error_signaled(vdev)) {
> +                        vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
> +                        vfio_aer_error_clear_all_signaled(vdev);
> +                    } else {
> +                        vdev->pci_aer_reset_blocked_timer = timer_new_ms(
> +                            QEMU_CLOCK_VIRTUAL,
> +                            vfio_pci_delayed_reset, vdev);
> +                        timer_mod(vdev->pci_aer_reset_blocked_timer,
> +                            qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)
> +                            + VFIO_RESET_TIMEOUT);
> +                    }
> +                }
>              }
>              return;
>          }
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 9fb0206..be3b883 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -119,6 +119,7 @@ typedef struct VFIOPCIDevice {
>      VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
>      PCIHostDeviceAddress host;
>      EventNotifier err_notifier;
> +    EventNotifier resume_notifier;
>      EventNotifier req_notifier;
>      int (*resetfn)(struct VFIOPCIDevice *);
>      uint32_t vendor_id;
> @@ -144,6 +145,10 @@ typedef struct VFIOPCIDevice {
>      bool no_kvm_msi;
>      bool no_kvm_msix;
>      bool single_depend_dev;
> +    bool pci_aer_has_resume;
> +    DECLARE_BITMAP(pci_aer_error_signaled_bitmap, PCI_FUNC_MAX);
> +    DECLARE_BITMAP(pci_aer_resume_signaled_bitmap, PCI_FUNC_MAX);
> +    QEMUTimer *pci_aer_reset_blocked_timer;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 759b850..01dfd5d 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -433,6 +433,7 @@ enum {
>  	VFIO_PCI_MSIX_IRQ_INDEX,
>  	VFIO_PCI_ERR_IRQ_INDEX,
>  	VFIO_PCI_REQ_IRQ_INDEX,
> +        VFIO_PCI_RESUME_IRQ_INDEX,
>  	VFIO_PCI_NUM_IRQS
>  };
>  

  reply	other threads:[~2016-05-27 16:07 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-27  2:12 [Qemu-devel] [PATCH v8 11/12] vfio: register aer resume notification handler for aer resume Zhou Jie
2016-05-27 16:06 ` Alex Williamson [this message]
2016-06-12  2:38   ` Zhou Jie
2016-06-20  7:41     ` Zhou Jie
2016-06-20 16:32       ` Alex Williamson
2016-06-21  2:16         ` Zhou Jie
2016-06-21  3:13           ` Alex Williamson
2016-06-21 12:41             ` Chen Fan
2016-06-21 14:44               ` Alex Williamson
2016-06-22  3:28                 ` Zhou Jie
2016-06-22  3:56                   ` Alex Williamson
2016-06-22  5:45                     ` Zhou Jie
2016-06-22  7:49                       ` Zhou Jie
2016-06-22 15:42                         ` Alex Williamson
2016-06-25  1:24                           ` Zhou Jie
2016-06-27 15:54                             ` Alex Williamson
2016-06-28  3:26                               ` Zhou Jie
2016-06-28  3:58                                 ` Alex Williamson
2016-06-28  5:27                                   ` Zhou Jie
2016-06-28 14:40                                     ` Alex Williamson
2016-06-29  8:54                                       ` Zhou Jie
2016-06-29 18:22                                         ` Alex Williamson
2016-06-30  1:45                                           ` Zhou Jie
2016-07-03  4:00                                             ` Zhou Jie
2016-07-05  1:36                                               ` Zhou Jie
2016-07-05 17:03                                                 ` Alex Williamson
2016-07-06  2:01                                                   ` Zhou Jie
2016-07-07 19:04                                                     ` Alex Williamson
2016-07-08  1:38                                                       ` Zhou Jie
2016-07-08 17:33                                                         ` Alex Williamson
2016-07-10  1:28                                                           ` Zhou Jie
2016-07-11 16:24                                                             ` Alex Williamson
2016-07-12  1:42                                                               ` Zhou Jie
2016-07-12 15:45                                                                 ` Alex Williamson
2016-07-13  1:04                                                                   ` Zhou Jie
2016-07-13  2:54                                                                     ` Alex Williamson
2016-07-13  3:33                                                                       ` Zhou Jie
2016-06-22 15:25                       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160527100655.60db8206@t450s.home \
    --to=alex.williamson@redhat.com \
    --cc=caoj.fnst@cn.fujitsu.com \
    --cc=chen.fan.fnst@cn.fujitsu.com \
    --cc=fan.chen@easystack.cn \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=zhoujie2011@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.