[Qemu-devel] [PATCH v3 0/2] Bug fixes for EEH on VFIO PCI devices

* [Qemu-devel] [PATCH v3 0/2] Bug fixes for EEH on VFIO PCI devices
@ 2015-03-26  5:35 Gavin Shan
  2015-03-26  5:35 ` [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
  2015-03-26  5:35 ` [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot Gavin Shan
  0 siblings, 2 replies; 14+ messages in thread
From: Gavin Shan @ 2015-03-26  5:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, Gavin Shan, alex.williamson, qemu-ppc, david

This series fixes two issues with EEH on VFIO PCI devices. PATCH[1/2]
clears the stale MSIx table of VFIO PCI devices when asserting a
fundamental or hot PE reset, so that the MSIx tables can be restored
properly after the reset and a recursive EEH error is avoided. PATCH[2/2]
clears the PE frozen state in case the guest has hit excessive EEH
errors. With the fix, the VFIO PCI devices are expected to work again
after the guest is rebooted.
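
As a rough illustration of the first fix, the following self-contained C
sketch models why the cached table has to be cleared together with the
reset. It does not use any QEMU or VFIO API: struct toy_dev and every
toy_* function are hypothetical names invented for this example, and the
MSIx table is reduced to four integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MSIX_ENTRIES 4

/* Hypothetical stand-in for a passed-through device; not a QEMU type. */
struct toy_dev {
    bool msix_enabled;                        /* emulated MSIx enable bit */
    unsigned cached_table[TOY_MSIX_ENTRIES];  /* emulator's copy of the table */
    unsigned hw_table[TOY_MSIX_ENTRIES];      /* what the hardware holds */
};

/* Hardware side of a PE reset: the device's MSIx table is wiped. */
static void toy_hw_reset(struct toy_dev *dev)
{
    memset(dev->hw_table, 0, sizeof(dev->hw_table));
}

/* Emulator side: disable MSIx and forget the cached entries. */
static void toy_clear_stale_msix(struct toy_dev *dev)
{
    dev->msix_enabled = false;
    memset(dev->cached_table, 0, sizeof(dev->cached_table));
}

/* Reset every device in the PE, clearing the cache before the reset. */
static void toy_pe_reset(struct toy_dev *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        toy_clear_stale_msix(&devs[i]);
        toy_hw_reset(&devs[i]);
    }
}

/* True when the emulator's view matches the hardware table. */
static bool toy_in_sync(const struct toy_dev *dev)
{
    return memcmp(dev->cached_table, dev->hw_table,
                  sizeof(dev->hw_table)) == 0;
}
```

In this model, calling toy_hw_reset() alone leaves toy_in_sync() false for
any device whose cache held a non-zero entry. After toy_pe_reset() the two
copies agree again and MSIx is marked disabled, which is the state the
real patch aims for before the guest re-enables MSIx.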

Changelog
=========
v2 -> v3:
        * Introduced vfio_eeh_pe_reset(), which is called from spapr_pci_vfio.c
          when asserting PE reset and replaces vfio_container_eeh_event() to
          clear stale MSIx tables.
        * Dropped the patch "VFIO: Disable INTx interrupt on EEH reset"; the
          problem it addressed is caused by KVM bugs on the host side.
v1 -> v2:
        * Added a vfio_container_eeh_event() stub for !CONFIG_PCI and a
          separate error message for this function. Dropped vfio_put_group()
          on a NULL group.
        * Disabled the INTx interrupt, instead of clearing the INTx pending
          flag, during PE reset.
 
Gavin Shan (2):
  VFIO: Clear stale MSIx table during EEH reset
  sPAPR: Reenable EEH functionality on reboot

 hw/ppc/spapr_pci_vfio.c | 27 ++++++++++++++++++++++-----
 hw/vfio/Makefile.objs   |  6 +++++-
 hw/vfio/pci-stub.c      | 16 ++++++++++++++++
 hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio.h  |  2 ++
 5 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 hw/vfio/pci-stub.c

-- 
1.8.3.2

* [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-26  5:35 [Qemu-devel] [PATCH v3 0/2] Bug fixes for EEH on VFIO PCI devices Gavin Shan
@ 2015-03-26  5:35 ` Gavin Shan
  2015-03-27  6:00   ` David Gibson
  2015-03-30  2:39   ` David Gibson
  2015-03-26  5:35 ` [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot Gavin Shan
  1 sibling, 2 replies; 14+ messages in thread
From: Gavin Shan @ 2015-03-26  5:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, Gavin Shan, alex.williamson, qemu-ppc, david

The PCI device's MSIx table is cleared in hardware after an EEH PE
reset. However, QEMU still holds the stale MSIx entries, which should
be cleared accordingly. Otherwise, we run into another (recursive) EEH
error and the PCI devices contained in the PE have to be taken offline.

The patch introduces the function vfio_eeh_pe_reset(), which sPAPR
calls when asserting a hot or fundamental reset. It clears the stale
MSIx table before the EEH PE reset so that the MSIx table can be
restored properly after the reset.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
 hw/vfio/Makefile.objs   |  6 +++++-
 hw/vfio/pci-stub.c      | 16 ++++++++++++++++
 hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio.h  |  2 ++
 5 files changed, 68 insertions(+), 5 deletions(-)
 create mode 100644 hw/vfio/pci-stub.c

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 99a1be5..6fa3afe 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
     switch (option) {
     case RTAS_SLOT_RESET_DEACTIVATE:
         op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
+        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
+                                   svphb->iommugroupid,
+                                   VFIO_EEH_PE_OP, &op);
         break;
     case RTAS_SLOT_RESET_HOT:
-        op.op = VFIO_EEH_PE_RESET_HOT;
+        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
+                                svphb->iommugroupid,
+                                VFIO_EEH_PE_RESET_HOT);
         break;
     case RTAS_SLOT_RESET_FUNDAMENTAL:
-        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
+        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
+                                svphb->iommugroupid,
+                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
         break;
     default:
         return RTAS_OUT_PARAM_ERROR;
     }
 
-    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
-                               VFIO_EEH_PE_OP, &op);
     if (ret < 0) {
         return RTAS_OUT_HW_ERROR;
     }
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..1b8a065 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,8 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
-obj-$(CONFIG_PCI) += pci.o
+ifeq ($(CONFIG_PCI), y)
+obj-y += pci.o
+else
+obj-y += pci-stub.o
+endif
 endif
diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
new file mode 100644
index 0000000..f317c1e
--- /dev/null
+++ b/hw/vfio/pci-stub.c
@@ -0,0 +1,16 @@
+/*
+ * To include the file on !CONFIG_PCI
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/vfio.h>
+
+#include "exec/memory.h"
+#include "hw/vfio/vfio.h"
+
+int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
+{
+    return -1;
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6b80539..d0fd4b4 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    VFIOPCIDevice *vdev;
+    struct vfio_eeh_pe_op op = {
+        .argsz = sizeof(op),
+        .op = option
+    };
+
+    group = vfio_get_group(groupid, as);
+    if (!group) {
+        error_report("vfio: group %d not found\n", groupid);
+        return -1;
+    }
+
+    /*
+     * The MSIx table will be cleaned out by reset. We need
+     * disable it so that it can be reenabled properly. Also,
+     * the cached MSIx table should be cleared as it's not
+     * reflecting the contents in hardware.
+     */
+    QLIST_FOREACH(vbasedev, &group->device_list, next) {
+        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+        if (msix_enabled(&vdev->pdev)) {
+            vfio_disable_msix(vdev);
+        }
+
+        msix_reset(&vdev->pdev);
+    }
+
+    vfio_put_group(group);
+
+    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
+}
+
 static int vfio_initfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 0b26cd8..52de277 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -5,5 +5,7 @@
 
 extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
                                 int req, void *param);
+extern int vfio_eeh_pe_reset(AddressSpace *as,
+                             int32_t groupid, uint32_t option);
 
 #endif
-- 
1.8.3.2

* [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot
  2015-03-26  5:35 [Qemu-devel] [PATCH v3 0/2] Bug fixes for EEH on VFIO PCI devices Gavin Shan
  2015-03-26  5:35 ` [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
@ 2015-03-26  5:35 ` Gavin Shan
  2015-03-27  6:01   ` David Gibson
  2015-03-30  2:40   ` David Gibson
  1 sibling, 2 replies; 14+ messages in thread
From: Gavin Shan @ 2015-03-26  5:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: agraf, Gavin Shan, alex.williamson, qemu-ppc, david

When the guest is rebooted, some PEs might be in the frozen state. The
PCI devices they contain won't work properly if the frozen state isn't
cleared in time. One way to run into this situation is for the guest to
hit the maximum allowed number of EEH errors.

The patch reenables the EEH functionality on PEs in the PHB's reset
callback, which clears their frozen states if needed.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci_vfio.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 6fa3afe..25c1b3e 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -71,9 +71,21 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
                                 spapr_tce_get_iommu(tcet));
 }
 
+/*
+ * The PE might be in frozen state. To reenable the EEH
+ * functionality on it will clean the frozen state, which
+ * ensures that the contained PCI devices will work properly.
+ */
 static void spapr_phb_vfio_reset(DeviceState *qdev)
 {
-    /* Do nothing */
+    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(qdev);
+    struct vfio_eeh_pe_op op = {
+        .argsz = sizeof(op),
+        .op = VFIO_EEH_PE_ENABLE
+    };
+
+    vfio_container_ioctl(&svphb->phb.iommu_as,
+                         svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
 }
 
 static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
-- 
1.8.3.2

* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-26  5:35 ` [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
@ 2015-03-27  6:00   ` David Gibson
  2015-03-30  9:32     ` Gavin Shan
  2015-03-30  2:39   ` David Gibson
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2015-03-27  6:00 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel, agraf

On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
> The PCI device MSIx table is cleaned out in hardware after EEH PE
> reset. However, we still hold the stale MSIx entries in QEMU, which
> should be cleared accordingly. Otherwise, we will run into another
> (recursive) EEH error and the PCI devices contained in the PE have
> to be offlined exceptionally.
> 
> The patch introduces function vfio_eeh_pe_reset(), which is called
> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
> table before EEH PE reset so that MSIx table could be restored properly
> after EEH PE reset.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
>  hw/vfio/Makefile.objs   |  6 +++++-
>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h  |  2 ++
>  5 files changed, 68 insertions(+), 5 deletions(-)
>  create mode 100644 hw/vfio/pci-stub.c
> 
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index 99a1be5..6fa3afe 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>      switch (option) {
>      case RTAS_SLOT_RESET_DEACTIVATE:
>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
> +                                   svphb->iommugroupid,
> +                                   VFIO_EEH_PE_OP, &op);
>          break;
>      case RTAS_SLOT_RESET_HOT:
> -        op.op = VFIO_EEH_PE_RESET_HOT;
> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> +                                svphb->iommugroupid,
> +                                VFIO_EEH_PE_RESET_HOT);
>          break;
>      case RTAS_SLOT_RESET_FUNDAMENTAL:
> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> +                                svphb->iommugroupid,
> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
>          break;
>      default:
>          return RTAS_OUT_PARAM_ERROR;
>      }
>  
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> -                               VFIO_EEH_PE_OP, &op);
>      if (ret < 0) {
>          return RTAS_OUT_HW_ERROR;
>      }
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..1b8a065 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,8 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
> -obj-$(CONFIG_PCI) += pci.o
> +ifeq ($(CONFIG_PCI), y)
> +obj-y += pci.o
> +else
> +obj-y += pci-stub.o
> +endif
>  endif
> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
> new file mode 100644
> index 0000000..f317c1e
> --- /dev/null
> +++ b/hw/vfio/pci-stub.c
> @@ -0,0 +1,16 @@
> +/*
> + * To include the file on !CONFIG_PCI
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/vfio.h>
> +
> +#include "exec/memory.h"
> +#include "hw/vfio/vfio.h"
> +
> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> +{
> +    return -1;
> +}

This doesn't seem quite right.  AFAICT the only caller of
vfio_eeh_pe_reset() is in spapr_pci_vfio.c, which is only built if
CONFIG_PCI is enabled.  So if there needed to be !PCI stubs, I'd
expect them further up the call stack.

> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    VFIOPCIDevice *vdev;
> +    struct vfio_eeh_pe_op op = {
> +        .argsz = sizeof(op),
> +        .op = option
> +    };
> +
> +    group = vfio_get_group(groupid, as);
> +    if (!group) {
> +        error_report("vfio: group %d not found\n", groupid);
> +        return -1;
> +    }
> +
> +    /*
> +     * The MSIx table will be cleaned out by reset. We need
> +     * disable it so that it can be reenabled properly. Also,
> +     * the cached MSIx table should be cleared as it's not
> +     * reflecting the contents in hardware.
> +     */
> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +        if (msix_enabled(&vdev->pdev)) {
> +            vfio_disable_msix(vdev);
> +        }
> +
> +        msix_reset(&vdev->pdev);
> +    }
> +
> +    vfio_put_group(group);
> +
> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
> +}

This is much better than the vfio_eeh_event() stuff.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot
  2015-03-26  5:35 ` [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot Gavin Shan
@ 2015-03-27  6:01   ` David Gibson
  2015-03-30  2:40   ` David Gibson
  1 sibling, 0 replies; 14+ messages in thread
From: David Gibson @ 2015-03-27  6:01 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel, agraf

On Thu, Mar 26, 2015 at 04:35:02PM +1100, Gavin Shan wrote:
> When rebooting the guest, some PEs might be in frozen state. The
> contained PCI devices won't work properly if their frozen states
> aren't cleared in time. One case running into this situation would
> be maximal EEH error times encountered in the guest.
> 
> The patch reenables the EEH functionality on PEs on PHB's reset
> callback, which will clear their frozen states if needed.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-26  5:35 ` [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
  2015-03-27  6:00   ` David Gibson
@ 2015-03-30  2:39   ` David Gibson
  2015-03-30  9:34     ` Gavin Shan
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2015-03-30  2:39 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel, agraf

On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
> The PCI device MSIx table is cleaned out in hardware after EEH PE
> reset. However, we still hold the stale MSIx entries in QEMU, which
> should be cleared accordingly. Otherwise, we will run into another
> (recursive) EEH error and the PCI devices contained in the PE have
> to be offlined exceptionally.
> 
> The patch introduces function vfio_eeh_pe_reset(), which is called
> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
> table before EEH PE reset so that MSIx table could be restored properly
> after EEH PE reset.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
>  hw/vfio/Makefile.objs   |  6 +++++-
>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h  |  2 ++
>  5 files changed, 68 insertions(+), 5 deletions(-)
>  create mode 100644 hw/vfio/pci-stub.c
> 
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index 99a1be5..6fa3afe 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>      switch (option) {
>      case RTAS_SLOT_RESET_DEACTIVATE:
>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
> +                                   svphb->iommugroupid,
> +                                   VFIO_EEH_PE_OP, &op);

For consistency, I think all the reset operations should go through
vfio_eeh_pe_reset(), even though in this case it won't do more than
call vfio_container_ioctl().
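
The suggestion above can be sketched as a single entry point that accepts
every reset option and only does the extra MSIx work for the options that
need it. This is a toy model under stated assumptions, not QEMU code: the
toy_* names, the enum values and the mock kernel call are all
hypothetical, and the real option codes would come from <linux/vfio.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option codes; the real ones are VFIO_EEH_PE_RESET_*. */
enum toy_reset_option {
    TOY_RESET_DEACTIVATE,
    TOY_RESET_HOT,
    TOY_RESET_FUNDAMENTAL,
};

/* Hypothetical PE state tracked by the emulator. */
struct toy_pe {
    bool msix_cache_valid;  /* cached MSIx table still matches hardware */
    int last_op;            /* last option handed to the mock kernel */
};

/* Mock of the ioctl path: records the option and reports success. */
static int toy_kernel_pe_op(struct toy_pe *pe, int op)
{
    pe->last_op = op;
    return 0;
}

/* Single entry point for all reset options, as the review proposes. */
static int toy_eeh_pe_reset(struct toy_pe *pe, enum toy_reset_option option)
{
    assert(pe != NULL);

    switch (option) {
    case TOY_RESET_HOT:
    case TOY_RESET_FUNDAMENTAL:
        /* Asserting a reset wipes the table, so drop the cached copy. */
        pe->msix_cache_valid = false;
        break;
    case TOY_RESET_DEACTIVATE:
        /* Nothing extra to do; only the kernel call is needed. */
        break;
    default:
        return -1;
    }

    return toy_kernel_pe_op(pe, option);
}
```

With this shape the caller's switch only has to translate the RTAS option
into the reset option, and the decision about which options touch the
MSIx state lives in one place.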

>          break;
>      case RTAS_SLOT_RESET_HOT:
> -        op.op = VFIO_EEH_PE_RESET_HOT;
> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> +                                svphb->iommugroupid,
> +                                VFIO_EEH_PE_RESET_HOT);
>          break;
>      case RTAS_SLOT_RESET_FUNDAMENTAL:
> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> +                                svphb->iommugroupid,
> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
>          break;
>      default:
>          return RTAS_OUT_PARAM_ERROR;
>      }
>  
> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> -                               VFIO_EEH_PE_OP, &op);
>      if (ret < 0) {
>          return RTAS_OUT_HW_ERROR;
>      }
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..1b8a065 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,8 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
> -obj-$(CONFIG_PCI) += pci.o
> +ifeq ($(CONFIG_PCI), y)
> +obj-y += pci.o
> +else
> +obj-y += pci-stub.o
> +endif
>  endif
> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
> new file mode 100644
> index 0000000..f317c1e
> --- /dev/null
> +++ b/hw/vfio/pci-stub.c
> @@ -0,0 +1,16 @@
> +/*
> + * To include the file on !CONFIG_PCI
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/vfio.h>
> +
> +#include "exec/memory.h"
> +#include "hw/vfio/vfio.h"
> +
> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> +{
> +    return -1;

Probably should have assert(0) here - this should never be called if !CONFIG_PCI.

> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6b80539..d0fd4b4 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    VFIOPCIDevice *vdev;
> +    struct vfio_eeh_pe_op op = {
> +        .argsz = sizeof(op),
> +        .op = option
> +    };
> +
> +    group = vfio_get_group(groupid, as);
> +    if (!group) {
> +        error_report("vfio: group %d not found\n", groupid);
> +        return -1;
> +    }
> +
> +    /*
> +     * The MSIx table will be cleaned out by reset. We need
> +     * disable it so that it can be reenabled properly. Also,
> +     * the cached MSIx table should be cleared as it's not
> +     * reflecting the contents in hardware.
> +     */
> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +        if (msix_enabled(&vdev->pdev)) {
> +            vfio_disable_msix(vdev);
> +        }
> +
> +        msix_reset(&vdev->pdev);
> +    }
> +
> +    vfio_put_group(group);
> +
> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
> +}
> +
>  static int vfio_initfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..52de277 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -5,5 +5,7 @@
>  
>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>                                  int req, void *param);
> +extern int vfio_eeh_pe_reset(AddressSpace *as,
> +                             int32_t groupid, uint32_t option);
>  
>  #endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot
  2015-03-26  5:35 ` [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot Gavin Shan
  2015-03-27  6:01   ` David Gibson
@ 2015-03-30  2:40   ` David Gibson
  2015-03-30  9:35     ` Gavin Shan
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2015-03-30  2:40 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel, agraf

On Thu, Mar 26, 2015 at 04:35:02PM +1100, Gavin Shan wrote:
> When rebooting the guest, some PEs might be in frozen state. The
> contained PCI devices won't work properly if their frozen states
> aren't cleared in time. One case running into this situation would
> be maximal EEH error times encountered in the guest.
> 
> The patch reenables the EEH functionality on PEs on PHB's reset
> callback, which will clear their frozen states if needed.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-27  6:00   ` David Gibson
@ 2015-03-30  9:32     ` Gavin Shan
  0 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2015-03-30  9:32 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, Gavin Shan, agraf, alex.williamson, qemu-ppc

On Fri, Mar 27, 2015 at 05:00:25PM +1100, David Gibson wrote:
>On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch introduces function vfio_eeh_pe_reset(), which is called
>> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
>> table before EEH PE reset so that MSIx table could be restored properly
>> after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
>>  hw/vfio/Makefile.objs   |  6 +++++-
>>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
>>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio.h  |  2 ++
>>  5 files changed, 68 insertions(+), 5 deletions(-)
>>  create mode 100644 hw/vfio/pci-stub.c
>> 
>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>> index 99a1be5..6fa3afe 100644
>> --- a/hw/ppc/spapr_pci_vfio.c
>> +++ b/hw/ppc/spapr_pci_vfio.c
>> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>>      switch (option) {
>>      case RTAS_SLOT_RESET_DEACTIVATE:
>>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
>> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
>> +                                   svphb->iommugroupid,
>> +                                   VFIO_EEH_PE_OP, &op);
>>          break;
>>      case RTAS_SLOT_RESET_HOT:
>> -        op.op = VFIO_EEH_PE_RESET_HOT;
>> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> +                                svphb->iommugroupid,
>> +                                VFIO_EEH_PE_RESET_HOT);
>>          break;
>>      case RTAS_SLOT_RESET_FUNDAMENTAL:
>> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
>> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> +                                svphb->iommugroupid,
>> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
>>          break;
>>      default:
>>          return RTAS_OUT_PARAM_ERROR;
>>      }
>>  
>> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
>> -                               VFIO_EEH_PE_OP, &op);
>>      if (ret < 0) {
>>          return RTAS_OUT_HW_ERROR;
>>      }
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..1b8a065 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,8 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>> -obj-$(CONFIG_PCI) += pci.o
>> +ifeq ($(CONFIG_PCI), y)
>> +obj-y += pci.o
>> +else
>> +obj-y += pci-stub.o
>> +endif
>>  endif
>> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>> new file mode 100644
>> index 0000000..f317c1e
>> --- /dev/null
>> +++ b/hw/vfio/pci-stub.c
>> @@ -0,0 +1,16 @@
>> +/*
>> + * To include the file on !CONFIG_PCI
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include <linux/vfio.h>
>> +
>> +#include "exec/memory.h"
>> +#include "hw/vfio/vfio.h"
>> +
>> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> +{
>> +    return -1;
>> +}
>
>This doesn't seem quite right.  AFAICT the only caller of
>vfio_eeh_pe_reset() is in spapr_pci_vfio.c, which is only built if
>CONFIG_PCI is enabled.  So if there needed to be !PCI stubs, I'd
>expect them further up the call stack.
>

Or we could simply drop the stub for !CONFIG_PCI if Alex W. agrees. Alex,
what's your opinion?

Thanks,
Gavin

>> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    VFIOPCIDevice *vdev;
>> +    struct vfio_eeh_pe_op op = {
>> +        .argsz = sizeof(op),
>> +        .op = option
>> +    };
>> +
>> +    group = vfio_get_group(groupid, as);
>> +    if (!group) {
>> +        error_report("vfio: group %d not found\n", groupid);
>> +        return -1;
>> +    }
>> +
>> +    /*
>> +     * The MSIx table will be cleaned out by reset. We need
>> +     * disable it so that it can be reenabled properly. Also,
>> +     * the cached MSIx table should be cleared as it's not
>> +     * reflecting the contents in hardware.
>> +     */
>> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +        if (msix_enabled(&vdev->pdev)) {
>> +            vfio_disable_msix(vdev);
>> +        }
>> +
>> +        msix_reset(&vdev->pdev);
>> +    }
>> +
>> +    vfio_put_group(group);
>> +
>> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
>> +}
>
>This is much better than the vfio_eeh_event() stuff.
>
>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-30  2:39   ` David Gibson
@ 2015-03-30  9:34     ` Gavin Shan
  2015-03-31 19:36       ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Gavin Shan @ 2015-03-30  9:34 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, Gavin Shan, agraf, alex.williamson, qemu-ppc

On Mon, Mar 30, 2015 at 01:39:16PM +1100, David Gibson wrote:
>On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch introduces function vfio_eeh_pe_reset(), which is called
>> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
>> table before EEH PE reset so that MSIx table could be restored properly
>> after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
>>  hw/vfio/Makefile.objs   |  6 +++++-
>>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
>>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio.h  |  2 ++
>>  5 files changed, 68 insertions(+), 5 deletions(-)
>>  create mode 100644 hw/vfio/pci-stub.c
>> 
>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>> index 99a1be5..6fa3afe 100644
>> --- a/hw/ppc/spapr_pci_vfio.c
>> +++ b/hw/ppc/spapr_pci_vfio.c
>> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>>      switch (option) {
>>      case RTAS_SLOT_RESET_DEACTIVATE:
>>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
>> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
>> +                                   svphb->iommugroupid,
>> +                                   VFIO_EEH_PE_OP, &op);
>
>For consistency, I think all the reset operations should go through
>vfio_eeh_pe_reset(), even though in this case it won't do more than
>call vfio_container_ioctl().
>

Fair enough. I'll fix :-)
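
A rough, self-contained sketch of what that could look like, assuming one helper that accepts every reset option and clears the cached MSIx state only for hot and fundamental reset. The enum, the device struct and issue_pe_op() are hypothetical stand-ins for the VFIO_EEH_PE_RESET_* op codes, the per-device MSIx state and vfio_container_ioctl(); this is not the actual QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VFIO_EEH_PE_RESET_* op codes. */
enum pe_reset_option {
    PE_RESET_DEACTIVATE,
    PE_RESET_HOT,
    PE_RESET_FUNDAMENTAL,
};

/* Hypothetical model of the per-device state the helper cares about. */
struct model_dev {
    bool msix_enabled;        /* MSIx currently enabled by the guest */
    unsigned cached_entries;  /* MSIx entries cached on the QEMU side */
};

/* Records the last op issued; stands in for vfio_container_ioctl(). */
static int last_issued_op = -1;

static int issue_pe_op(enum pe_reset_option option)
{
    last_issued_op = (int)option;
    return 0;
}

/*
 * Single entry point for all three reset options. Only hot and
 * fundamental reset wipe the hardware MSIx table, so only those
 * clear the cached state; deactivation just issues the op.
 */
int pe_reset(struct model_dev *devs, size_t ndevs,
             enum pe_reset_option option)
{
    size_t i;

    assert(devs != NULL || ndevs == 0);

    if (option == PE_RESET_HOT || option == PE_RESET_FUNDAMENTAL) {
        for (i = 0; i < ndevs; i++) {
            devs[i].msix_enabled = false;
            devs[i].cached_entries = 0;
        }
    }

    return issue_pe_op(option);
}
```

With that shape the switch in spapr_phb_vfio_eeh_reset() would only map the RTAS option to the op code and make one call.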


>>          break;
>>      case RTAS_SLOT_RESET_HOT:
>> -        op.op = VFIO_EEH_PE_RESET_HOT;
>> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> +                                svphb->iommugroupid,
>> +                                VFIO_EEH_PE_RESET_HOT);
>>          break;
>>      case RTAS_SLOT_RESET_FUNDAMENTAL:
>> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
>> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> +                                svphb->iommugroupid,
>> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
>>          break;
>>      default:
>>          return RTAS_OUT_PARAM_ERROR;
>>      }
>>  
>> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
>> -                               VFIO_EEH_PE_OP, &op);
>>      if (ret < 0) {
>>          return RTAS_OUT_HW_ERROR;
>>      }
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..1b8a065 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,8 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>> -obj-$(CONFIG_PCI) += pci.o
>> +ifeq ($(CONFIG_PCI), y)
>> +obj-y += pci.o
>> +else
>> +obj-y += pci-stub.o
>> +endif
>>  endif
>> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>> new file mode 100644
>> index 0000000..f317c1e
>> --- /dev/null
>> +++ b/hw/vfio/pci-stub.c
>> @@ -0,0 +1,16 @@
>> +/*
>> + * To include the file on !CONFIG_PCI
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include <linux/vfio.h>
>> +
>> +#include "exec/memory.h"
>> +#include "hw/vfio/vfio.h"
>> +
>> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> +{
>> +    return -1;
>
>Probably should have assert(0) here - this should never be called if !CONFIG_PCI.
>

Indeed, assert(0) would be better. I just replied to ask for dropping the stub
for !CONFIG_PCI if you and Alex.W agree.

Thanks,
Gavin

>> +}
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 6b80539..d0fd4b4 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>      vdev->req_enabled = false;
>>  }
>>  
>> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    VFIOPCIDevice *vdev;
>> +    struct vfio_eeh_pe_op op = {
>> +        .argsz = sizeof(op),
>> +        .op = option
>> +    };
>> +
>> +    group = vfio_get_group(groupid, as);
>> +    if (!group) {
>> +        error_report("vfio: group %d not found\n", groupid);
>> +        return -1;
>> +    }
>> +
>> +    /*
>> +     * The MSIx table will be cleaned out by reset. We need
>> +     * disable it so that it can be reenabled properly. Also,
>> +     * the cached MSIx table should be cleared as it's not
>> +     * reflecting the contents in hardware.
>> +     */
>> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +        if (msix_enabled(&vdev->pdev)) {
>> +            vfio_disable_msix(vdev);
>> +        }
>> +
>> +        msix_reset(&vdev->pdev);
>> +    }
>> +
>> +    vfio_put_group(group);
>> +
>> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
>> +}
>> +
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> index 0b26cd8..52de277 100644
>> --- a/include/hw/vfio/vfio.h
>> +++ b/include/hw/vfio/vfio.h
>> @@ -5,5 +5,7 @@
>>  
>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>                                  int req, void *param);
>> +extern int vfio_eeh_pe_reset(AddressSpace *as,
>> +                             int32_t groupid, uint32_t option);
>>  
>>  #endif
>
>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot
  2015-03-30  2:40   ` David Gibson
@ 2015-03-30  9:35     ` Gavin Shan
  0 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2015-03-30  9:35 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-devel, Gavin Shan, agraf, alex.williamson, qemu-ppc

On Mon, Mar 30, 2015 at 01:40:04PM +1100, David Gibson wrote:
>On Thu, Mar 26, 2015 at 04:35:02PM +1100, Gavin Shan wrote:
>> When rebooting the guest, some PEs might be in frozen state. The
>> contained PCI devices won't work properly if their frozen states
>> aren't cleared in time. One case running into this situation would
>> be maximal EEH error times encountered in the guest.
>> 
>> The patch reenables the EEH functionality on PEs on PHB's reset
>> callback, which will clear their frozen states if needed.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
>

Thanks for your review, David.

>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-30  9:34     ` Gavin Shan
@ 2015-03-31 19:36       ` Alex Williamson
  2015-04-01  0:20         ` Gavin Shan
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2015-03-31 19:36 UTC (permalink / raw)
  To: Gavin Shan; +Cc: agraf, qemu-ppc, qemu-devel, David Gibson

On Mon, 2015-03-30 at 20:34 +1100, Gavin Shan wrote:
> On Mon, Mar 30, 2015 at 01:39:16PM +1100, David Gibson wrote:
> >On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> should be cleared accordingly. Otherwise, we will run into another
> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> to be offlined exceptionally.
> >> 
> >> The patch introduces function vfio_eeh_pe_reset(), which is called
> >> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
> >> table before EEH PE reset so that MSIx table could be restored properly
> >> after EEH PE reset.
> >> 
> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >> ---
> >>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
> >>  hw/vfio/Makefile.objs   |  6 +++++-
> >>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
> >>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
> >>  include/hw/vfio/vfio.h  |  2 ++
> >>  5 files changed, 68 insertions(+), 5 deletions(-)
> >>  create mode 100644 hw/vfio/pci-stub.c
> >> 
> >> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> >> index 99a1be5..6fa3afe 100644
> >> --- a/hw/ppc/spapr_pci_vfio.c
> >> +++ b/hw/ppc/spapr_pci_vfio.c
> >> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> >>      switch (option) {
> >>      case RTAS_SLOT_RESET_DEACTIVATE:
> >>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
> >> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
> >> +                                   svphb->iommugroupid,
> >> +                                   VFIO_EEH_PE_OP, &op);
> >
> >For consistency, I think all the reset operations should go through
> >vfio_eeh_pe_reset(), even though in this case it won't do more than
> >call vfio_container_ioctl().
> >
> 
> Fair enough. I'll fix :-)
> 
> 
> >>          break;
> >>      case RTAS_SLOT_RESET_HOT:
> >> -        op.op = VFIO_EEH_PE_RESET_HOT;
> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> >> +                                svphb->iommugroupid,
> >> +                                VFIO_EEH_PE_RESET_HOT);
> >>          break;
> >>      case RTAS_SLOT_RESET_FUNDAMENTAL:
> >> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> >> +                                svphb->iommugroupid,
> >> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
> >>          break;
> >>      default:
> >>          return RTAS_OUT_PARAM_ERROR;
> >>      }
> >>  
> >> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> >> -                               VFIO_EEH_PE_OP, &op);
> >>      if (ret < 0) {
> >>          return RTAS_OUT_HW_ERROR;
> >>      }
> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> index e31f30e..1b8a065 100644
> >> --- a/hw/vfio/Makefile.objs
> >> +++ b/hw/vfio/Makefile.objs
> >> @@ -1,4 +1,8 @@
> >>  ifeq ($(CONFIG_LINUX), y)
> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> -obj-$(CONFIG_PCI) += pci.o
> >> +ifeq ($(CONFIG_PCI), y)
> >> +obj-y += pci.o
> >> +else
> >> +obj-y += pci-stub.o
> >> +endif
> >>  endif
> >> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
> >> new file mode 100644
> >> index 0000000..f317c1e
> >> --- /dev/null
> >> +++ b/hw/vfio/pci-stub.c
> >> @@ -0,0 +1,16 @@
> >> +/*
> >> + * To include the file on !CONFIG_PCI
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> >> + * the COPYING file in the top-level directory.
> >> + */
> >> +
> >> +#include <linux/vfio.h>
> >> +
> >> +#include "exec/memory.h"
> >> +#include "hw/vfio/vfio.h"
> >> +
> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> >> +{
> >> +    return -1;
> >
> >Probably should have assert(0) here - this should never be called if !CONFIG_PCI.
> >
> 
> Indeed, assert(0) would be better. I just replied to ask for dropping the stub
> for !CONFIG_PCI if you and Alex.W agree.

I certainly don't see the reason for the stub; it was only suggested
before because a previous version had the callout in hw/vfio/common.c.

> >> +}
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index 6b80539..d0fd4b4 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >>      vdev->req_enabled = false;
> >>  }
> >>  
> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> >> +{
> >> +    VFIOGroup *group;
> >> +    VFIODevice *vbasedev;
> >> +    VFIOPCIDevice *vdev;
> >> +    struct vfio_eeh_pe_op op = {
> >> +        .argsz = sizeof(op),
> >> +        .op = option
> >> +    };
> >> +
> >> +    group = vfio_get_group(groupid, as);
> >> +    if (!group) {
> >> +        error_report("vfio: group %d not found\n", groupid);
> >> +        return -1;
> >> +    }
> >> +
> >> +    /*
> >> +     * The MSIx table will be cleaned out by reset. We need
> >> +     * disable it so that it can be reenabled properly. Also,
> >> +     * the cached MSIx table should be cleared as it's not
> >> +     * reflecting the contents in hardware.
> >> +     */
> >> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> >> +        if (msix_enabled(&vdev->pdev)) {
> >> +            vfio_disable_msix(vdev);
> >> +        }
> >> +
> >> +        msix_reset(&vdev->pdev);
> >> +    }
> >> +
> >> +    vfio_put_group(group);
> >> +
> >> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
> >> +}

So all you're trying to do here is find the devices in the PE and
disable/reset MSI-X, but do you really need yet another ugly callback
into vfio to do that?  Isn't it possible to find the devices based on
the address space or PCI topology?  If we have EEH emulation, don't you
also want to do this for emulated devices?  The vfio_disable_msix() call
could be replaced by the equivalent config space access to make it look
like the guest disabled MSI-X.

> >> +
> >>  static int vfio_initfn(PCIDevice *pdev)
> >>  {
> >>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> >> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> >> index 0b26cd8..52de277 100644
> >> --- a/include/hw/vfio/vfio.h
> >> +++ b/include/hw/vfio/vfio.h
> >> @@ -5,5 +5,7 @@
> >>  
> >>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> >>                                  int req, void *param);
> >> +extern int vfio_eeh_pe_reset(AddressSpace *as,
> >> +                             int32_t groupid, uint32_t option);
> >>  
> >>  #endif
> >
> >-- 
> >David Gibson			| I'll have my music baroque, and my code
> >david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> >				| _way_ _around_!
> >http://www.ozlabs.org/~dgibson
> 
> 

* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-03-31 19:36       ` Alex Williamson
@ 2015-04-01  0:20         ` Gavin Shan
  2015-04-01  1:16           ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Gavin Shan @ 2015-04-01  0:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, agraf, qemu-ppc, Gavin Shan, David Gibson

On Tue, Mar 31, 2015 at 01:36:30PM -0600, Alex Williamson wrote:
>On Mon, 2015-03-30 at 20:34 +1100, Gavin Shan wrote:
>> On Mon, Mar 30, 2015 at 01:39:16PM +1100, David Gibson wrote:
>> >On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
>> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> should be cleared accordingly. Otherwise, we will run into another
>> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> to be offlined exceptionally.
>> >> 
>> >> The patch introduces function vfio_eeh_pe_reset(), which is called
>> >> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
>> >> table before EEH PE reset so that MSIx table could be restored properly
>> >> after EEH PE reset.
>> >> 
>> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >> ---
>> >>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
>> >>  hw/vfio/Makefile.objs   |  6 +++++-
>> >>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
>> >>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
>> >>  include/hw/vfio/vfio.h  |  2 ++
>> >>  5 files changed, 68 insertions(+), 5 deletions(-)
>> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> 
>> >> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>> >> index 99a1be5..6fa3afe 100644
>> >> --- a/hw/ppc/spapr_pci_vfio.c
>> >> +++ b/hw/ppc/spapr_pci_vfio.c
>> >> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>> >>      switch (option) {
>> >>      case RTAS_SLOT_RESET_DEACTIVATE:
>> >>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
>> >> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
>> >> +                                   svphb->iommugroupid,
>> >> +                                   VFIO_EEH_PE_OP, &op);
>> >
>> >For consistency, I think all the reset operations should go through
>> >vfio_eeh_pe_reset(), even though in this case it won't do more than
>> >call vfio_container_ioctl().
>> >
>> 
>> Fair enough. I'll fix :-)
>> 
>> 
>> >>          break;
>> >>      case RTAS_SLOT_RESET_HOT:
>> >> -        op.op = VFIO_EEH_PE_RESET_HOT;
>> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> >> +                                svphb->iommugroupid,
>> >> +                                VFIO_EEH_PE_RESET_HOT);
>> >>          break;
>> >>      case RTAS_SLOT_RESET_FUNDAMENTAL:
>> >> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
>> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> >> +                                svphb->iommugroupid,
>> >> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
>> >>          break;
>> >>      default:
>> >>          return RTAS_OUT_PARAM_ERROR;
>> >>      }
>> >>  
>> >> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
>> >> -                               VFIO_EEH_PE_OP, &op);
>> >>      if (ret < 0) {
>> >>          return RTAS_OUT_HW_ERROR;
>> >>      }
>> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> index e31f30e..1b8a065 100644
>> >> --- a/hw/vfio/Makefile.objs
>> >> +++ b/hw/vfio/Makefile.objs
>> >> @@ -1,4 +1,8 @@
>> >>  ifeq ($(CONFIG_LINUX), y)
>> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> -obj-$(CONFIG_PCI) += pci.o
>> >> +ifeq ($(CONFIG_PCI), y)
>> >> +obj-y += pci.o
>> >> +else
>> >> +obj-y += pci-stub.o
>> >> +endif
>> >>  endif
>> >> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>> >> new file mode 100644
>> >> index 0000000..f317c1e
>> >> --- /dev/null
>> >> +++ b/hw/vfio/pci-stub.c
>> >> @@ -0,0 +1,16 @@
>> >> +/*
>> >> + * To include the file on !CONFIG_PCI
>> >> + *
>> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> >> + * the COPYING file in the top-level directory.
>> >> + */
>> >> +
>> >> +#include <linux/vfio.h>
>> >> +
>> >> +#include "exec/memory.h"
>> >> +#include "hw/vfio/vfio.h"
>> >> +
>> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> >> +{
>> >> +    return -1;
>> >
>> >Probably should have assert(0) here - this should never be called if !CONFIG_PCI.
>> >
>> 
>> Indeed, assert(0) would be better. I just replied to ask for dropping the stub
>> for !CONFIG_PCI if you and Alex.W agree.
>
>I certainly don't see the reason for the stub; it was only suggested
>before because a previous version had the callout in hw/vfio/common.c.
>

OK, I'll drop it in the next revision.

>> >> +}
>> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> >> index 6b80539..d0fd4b4 100644
>> >> --- a/hw/vfio/pci.c
>> >> +++ b/hw/vfio/pci.c
>> >> @@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>> >>      vdev->req_enabled = false;
>> >>  }
>> >>  
>> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> >> +{
>> >> +    VFIOGroup *group;
>> >> +    VFIODevice *vbasedev;
>> >> +    VFIOPCIDevice *vdev;
>> >> +    struct vfio_eeh_pe_op op = {
>> >> +        .argsz = sizeof(op),
>> >> +        .op = option
>> >> +    };
>> >> +
>> >> +    group = vfio_get_group(groupid, as);
>> >> +    if (!group) {
>> >> +        error_report("vfio: group %d not found\n", groupid);
>> >> +        return -1;
>> >> +    }
>> >> +
>> >> +    /*
>> >> +     * The MSIx table will be cleaned out by reset. We need
>> >> +     * disable it so that it can be reenabled properly. Also,
>> >> +     * the cached MSIx table should be cleared as it's not
>> >> +     * reflecting the contents in hardware.
>> >> +     */
>> >> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> >> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> >> +        if (msix_enabled(&vdev->pdev)) {
>> >> +            vfio_disable_msix(vdev);
>> >> +        }
>> >> +
>> >> +        msix_reset(&vdev->pdev);
>> >> +    }
>> >> +
>> >> +    vfio_put_group(group);
>> >> +
>> >> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
>> >> +}
>
>So all you're trying to do here is find the devices in the PE and
>disable/reset MSI-X, but do you really need yet another ugly callback
>into vfio to do that?  Isn't it possible to find the devices based on
>the address space or PCI topology?  If we have EEH emulation, don't you
>also want to do this for emulated devices?  The vfio_disable_msix() call
>could be replaced by the equivalent config space access to make it look
>like the guest disabled MSI-X.
>

EEH for emulated PCI devices is out of scope for now, since it depends on a
fully emulated IBM PHB. The PE reset is requested by the guest, and the guest
is aware that the MSIx table is lost after that.

I'm not sure I'm following your suggestion, but yes, the VFIO PCI devices
can be identified by checking their class string with the help of some QOM
helper functions. So I guess you are suggesting something like the following,
which would make the code a bit cleaner.

- In hw/ppc/spapr_pci_vfio.c::spapr_phb_vfio_eeh_reset(), walk all PCI
  devices hooked to the PHB. For each VFIO PCI device that has MSIx
  interrupts enabled, disable them by clearing MSIX_ENABLE in the config
  space and clean out the MSIx table.
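
Roughly, as a self-contained sketch: the struct below is a hypothetical model of a PCI device, not a QEMU type, and the string compare stands in for whatever QOM helper would do the type check. "vfio-pci" is the VFIO PCI device type name; everything else is named for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE   16      /* bytes per MSI-X table entry */
#define MSIX_MAX_ENTRIES  8       /* arbitrary bound for this model */
#define MSIX_CTRL_ENABLE  0x8000  /* MSI-X Enable bit in Message Control */

/* Hypothetical model of a PCI device sitting behind the PHB. */
struct model_pci_dev {
    const char *type_name;   /* QOM-style type string */
    uint16_t msix_ctrl;      /* cached MSI-X Message Control */
    unsigned msix_entries;   /* number of table entries in use */
    uint8_t msix_table[MSIX_MAX_ENTRIES * MSIX_ENTRY_SIZE];
};

static bool is_vfio_pci(const struct model_pci_dev *dev)
{
    return strcmp(dev->type_name, "vfio-pci") == 0;
}

/*
 * Walk the devices behind the PHB. For each VFIO PCI device with
 * MSI-X enabled, clear the enable bit and wipe the cached table.
 * Returns the number of devices that were touched.
 */
size_t phb_clear_stale_msix(struct model_pci_dev *devs, size_t ndevs)
{
    size_t i, cleared = 0;

    assert(devs != NULL || ndevs == 0);

    for (i = 0; i < ndevs; i++) {
        struct model_pci_dev *dev = &devs[i];

        assert(dev->msix_entries <= MSIX_MAX_ENTRIES);
        if (!is_vfio_pci(dev) || !(dev->msix_ctrl & MSIX_CTRL_ENABLE)) {
            continue;
        }
        dev->msix_ctrl &= (uint16_t)~MSIX_CTRL_ENABLE;
        memset(dev->msix_table, 0, dev->msix_entries * MSIX_ENTRY_SIZE);
        cleared++;
    }
    return cleared;
}
```

Dropping the is_vfio_pci() test would apply the same reset to every device behind the PHB.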

Thanks,
Gavin

>> >> +
>> >>  static int vfio_initfn(PCIDevice *pdev)
>> >>  {
>> >>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> >> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> >> index 0b26cd8..52de277 100644
>> >> --- a/include/hw/vfio/vfio.h
>> >> +++ b/include/hw/vfio/vfio.h
>> >> @@ -5,5 +5,7 @@
>> >>  
>> >>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>> >>                                  int req, void *param);
>> >> +extern int vfio_eeh_pe_reset(AddressSpace *as,
>> >> +                             int32_t groupid, uint32_t option);
>> >>  
>> >>  #endif
>> >
>> >-- 
>> >David Gibson			| I'll have my music baroque, and my code
>> >david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>> >				| _way_ _around_!
>> >http://www.ozlabs.org/~dgibson
>> 
>> 
>
>
>

* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-04-01  0:20         ` Gavin Shan
@ 2015-04-01  1:16           ` Alex Williamson
  2015-04-01  3:05             ` Gavin Shan
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2015-04-01  1:16 UTC (permalink / raw)
  To: Gavin Shan; +Cc: agraf, qemu-ppc, qemu-devel, David Gibson

On Wed, 2015-04-01 at 11:20 +1100, Gavin Shan wrote:
> On Tue, Mar 31, 2015 at 01:36:30PM -0600, Alex Williamson wrote:
> >On Mon, 2015-03-30 at 20:34 +1100, Gavin Shan wrote:
> >> On Mon, Mar 30, 2015 at 01:39:16PM +1100, David Gibson wrote:
> >> >On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> >> should be cleared accordingly. Otherwise, we will run into another
> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> >> to be offlined exceptionally.
> >> >> 
> >> >> The patch introduces function vfio_eeh_pe_reset(), which is called
> >> >> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
> >> >> table before EEH PE reset so that MSIx table could be restored properly
> >> >> after EEH PE reset.
> >> >> 
> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >> >> ---
> >> >>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
> >> >>  hw/vfio/Makefile.objs   |  6 +++++-
> >> >>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
> >> >>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
> >> >>  include/hw/vfio/vfio.h  |  2 ++
> >> >>  5 files changed, 68 insertions(+), 5 deletions(-)
> >> >>  create mode 100644 hw/vfio/pci-stub.c
> >> >> 
> >> >> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> >> >> index 99a1be5..6fa3afe 100644
> >> >> --- a/hw/ppc/spapr_pci_vfio.c
> >> >> +++ b/hw/ppc/spapr_pci_vfio.c
> >> >> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
> >> >>      switch (option) {
> >> >>      case RTAS_SLOT_RESET_DEACTIVATE:
> >> >>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
> >> >> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
> >> >> +                                   svphb->iommugroupid,
> >> >> +                                   VFIO_EEH_PE_OP, &op);
> >> >
> >> >For consistency, I think all the reset operations should go through
> >> >vfio_eeh_pe_reset(), even though in this case it won't do more than
> >> >call vfio_container_ioctl().
> >> >
> >> 
> >> Fair enough. I'll fix :-)
> >> 
> >> 
> >> >>          break;
> >> >>      case RTAS_SLOT_RESET_HOT:
> >> >> -        op.op = VFIO_EEH_PE_RESET_HOT;
> >> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> >> >> +                                svphb->iommugroupid,
> >> >> +                                VFIO_EEH_PE_RESET_HOT);
> >> >>          break;
> >> >>      case RTAS_SLOT_RESET_FUNDAMENTAL:
> >> >> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
> >> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
> >> >> +                                svphb->iommugroupid,
> >> >> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
> >> >>          break;
> >> >>      default:
> >> >>          return RTAS_OUT_PARAM_ERROR;
> >> >>      }
> >> >>  
> >> >> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
> >> >> -                               VFIO_EEH_PE_OP, &op);
> >> >>      if (ret < 0) {
> >> >>          return RTAS_OUT_HW_ERROR;
> >> >>      }
> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> >> index e31f30e..1b8a065 100644
> >> >> --- a/hw/vfio/Makefile.objs
> >> >> +++ b/hw/vfio/Makefile.objs
> >> >> @@ -1,4 +1,8 @@
> >> >>  ifeq ($(CONFIG_LINUX), y)
> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> >> -obj-$(CONFIG_PCI) += pci.o
> >> >> +ifeq ($(CONFIG_PCI), y)
> >> >> +obj-y += pci.o
> >> >> +else
> >> >> +obj-y += pci-stub.o
> >> >> +endif
> >> >>  endif
> >> >> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
> >> >> new file mode 100644
> >> >> index 0000000..f317c1e
> >> >> --- /dev/null
> >> >> +++ b/hw/vfio/pci-stub.c
> >> >> @@ -0,0 +1,16 @@
> >> >> +/*
> >> >> + * To include the file on !CONFIG_PCI
> >> >> + *
> >> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> >> >> + * the COPYING file in the top-level directory.
> >> >> + */
> >> >> +
> >> >> +#include <linux/vfio.h>
> >> >> +
> >> >> +#include "exec/memory.h"
> >> >> +#include "hw/vfio/vfio.h"
> >> >> +
> >> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> >> >> +{
> >> >> +    return -1;
> >> >
> >> >Probably should have assert(0) here - this should never be called if !CONFIG_PCI.
> >> >
> >> 
> >> Indeed, assert(0) would be better. I just replied to ask for dropping the stub
> >> for !CONFIG_PCI if you and Alex.W agree.
> >
> >I certainly don't see the reason for the stub; it was only suggested
> >before because a previous version had the callout in hw/vfio/common.c.
> >
> 
> OK, I'll drop it in the next revision.
> 
> >> >> +}
> >> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> >> index 6b80539..d0fd4b4 100644
> >> >> --- a/hw/vfio/pci.c
> >> >> +++ b/hw/vfio/pci.c
> >> >> @@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >> >>      vdev->req_enabled = false;
> >> >>  }
> >> >>  
> >> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
> >> >> +{
> >> >> +    VFIOGroup *group;
> >> >> +    VFIODevice *vbasedev;
> >> >> +    VFIOPCIDevice *vdev;
> >> >> +    struct vfio_eeh_pe_op op = {
> >> >> +        .argsz = sizeof(op),
> >> >> +        .op = option
> >> >> +    };
> >> >> +
> >> >> +    group = vfio_get_group(groupid, as);
> >> >> +    if (!group) {
> >> >> +        error_report("vfio: group %d not found\n", groupid);
> >> >> +        return -1;
> >> >> +    }
> >> >> +
> >> >> +    /*
> >> >> +     * The MSIx table will be cleaned out by reset. We need
> >> >> +     * disable it so that it can be reenabled properly. Also,
> >> >> +     * the cached MSIx table should be cleared as it's not
> >> >> +     * reflecting the contents in hardware.
> >> >> +     */
> >> >> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> >> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> >> >> +        if (msix_enabled(&vdev->pdev)) {
> >> >> +            vfio_disable_msix(vdev);
> >> >> +        }
> >> >> +
> >> >> +        msix_reset(&vdev->pdev);
> >> >> +    }
> >> >> +
> >> >> +    vfio_put_group(group);
> >> >> +
> >> >> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
> >> >> +}
> >
> >So all you're trying to do here is find the devices in the PE and
> >disable/reset MSI-X, but do you really need yet another ugly callback
> >into vfio to do that?  Isn't it possible to find the devices based on
> >the address space or PCI topology?  If we have EEH emulation, don't you
> >also want to do this for emulated devices?  The vfio_disable_msix() call
> >could be replaced by the equivalent config space access to make it look
> >like the guest disabled MSI-X.
> >
> 
> EEH for emulated PCI devices is out of scope for now, since it depends on a
> fully emulated IBM PHB. The PE reset is requested by the guest, and the guest
> is aware that the MSIx table is lost after that.
> 
> I'm not sure I'm following your suggestion, but yes, the VFIO PCI devices
> can be identified by checking their class string with the help of some QOM
> helper functions. So I guess you are suggesting something like the following,
> which would make the code a bit cleaner.
> 
> - In hw/ppc/spapr_pci_vfio.c::spapr_phb_vfio_eeh_reset(), walk all PCI
>   devices hooked to the PHB. For each VFIO PCI device that has MSIx
>   interrupts enabled, disable them by clearing MSIX_ENABLE in the config
>   space and clean out the MSIx table.

That's what I'm suggesting, but why do you even need to check whether
the subordinate device is vfio?  I imagine you can't mix emulated and
vfio devices behind a PHB, but even if you could, what's the harm in
doing the same MSI-X reset on emulated devices?  You don't need to
support EEH on emulated devices, but you also don't need to handle vfio
specially here.  Thanks,

Alex

* Re: [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset
  2015-04-01  1:16           ` Alex Williamson
@ 2015-04-01  3:05             ` Gavin Shan
  0 siblings, 0 replies; 14+ messages in thread
From: Gavin Shan @ 2015-04-01  3:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, agraf, qemu-ppc, Gavin Shan, David Gibson

On Tue, Mar 31, 2015 at 07:16:43PM -0600, Alex Williamson wrote:
>On Wed, 2015-04-01 at 11:20 +1100, Gavin Shan wrote:
>> On Tue, Mar 31, 2015 at 01:36:30PM -0600, Alex Williamson wrote:
>> >On Mon, 2015-03-30 at 20:34 +1100, Gavin Shan wrote:
>> >> On Mon, Mar 30, 2015 at 01:39:16PM +1100, David Gibson wrote:
>> >> >On Thu, Mar 26, 2015 at 04:35:01PM +1100, Gavin Shan wrote:
>> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> >> should be cleared accordingly. Otherwise, we will run into another
>> >> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> >> to be offlined exceptionally.
>> >> >> 
>> >> >> The patch introduces function vfio_eeh_pe_reset(), which is called
>> >> >> by sPAPR when asserting hot or fundamental reset, to clear stale MSIx
>> >> >> table before EEH PE reset so that MSIx table could be restored properly
>> >> >> after EEH PE reset.
>> >> >> 
>> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >> >> ---
>> >> >>  hw/ppc/spapr_pci_vfio.c | 13 +++++++++----
>> >> >>  hw/vfio/Makefile.objs   |  6 +++++-
>> >> >>  hw/vfio/pci-stub.c      | 16 ++++++++++++++++
>> >> >>  hw/vfio/pci.c           | 36 ++++++++++++++++++++++++++++++++++++
>> >> >>  include/hw/vfio/vfio.h  |  2 ++
>> >> >>  5 files changed, 68 insertions(+), 5 deletions(-)
>> >> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> >> 
>> >> >> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>> >> >> index 99a1be5..6fa3afe 100644
>> >> >> --- a/hw/ppc/spapr_pci_vfio.c
>> >> >> +++ b/hw/ppc/spapr_pci_vfio.c
>> >> >> @@ -151,19 +151,24 @@ static int spapr_phb_vfio_eeh_reset(sPAPRPHBState *sphb, int option)
>> >> >>      switch (option) {
>> >> >>      case RTAS_SLOT_RESET_DEACTIVATE:
>> >> >>          op.op = VFIO_EEH_PE_RESET_DEACTIVATE;
>> >> >> +        ret = vfio_container_ioctl(&svphb->phb.iommu_as,
>> >> >> +                                   svphb->iommugroupid,
>> >> >> +                                   VFIO_EEH_PE_OP, &op);
>> >> >
>> >> >For consistency, I think all the reset operations should go through
>> >> >vfio_eeh_pe_reset(), even though in this case it won't do more than
>> >> >call vfio_container_ioctl().
>> >> >
>> >> 
>> >> Fair enough. I'll fix :-)
>> >> 
>> >> 
>> >> >>          break;
>> >> >>      case RTAS_SLOT_RESET_HOT:
>> >> >> -        op.op = VFIO_EEH_PE_RESET_HOT;
>> >> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> >> >> +                                svphb->iommugroupid,
>> >> >> +                                VFIO_EEH_PE_RESET_HOT);
>> >> >>          break;
>> >> >>      case RTAS_SLOT_RESET_FUNDAMENTAL:
>> >> >> -        op.op = VFIO_EEH_PE_RESET_FUNDAMENTAL;
>> >> >> +        ret = vfio_eeh_pe_reset(&svphb->phb.iommu_as,
>> >> >> +                                svphb->iommugroupid,
>> >> >> +                                VFIO_EEH_PE_RESET_FUNDAMENTAL);
>> >> >>          break;
>> >> >>      default:
>> >> >>          return RTAS_OUT_PARAM_ERROR;
>> >> >>      }
>> >> >>  
>> >> >> -    ret = vfio_container_ioctl(&svphb->phb.iommu_as, svphb->iommugroupid,
>> >> >> -                               VFIO_EEH_PE_OP, &op);
>> >> >>      if (ret < 0) {
>> >> >>          return RTAS_OUT_HW_ERROR;
>> >> >>      }
>> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> >> index e31f30e..1b8a065 100644
>> >> >> --- a/hw/vfio/Makefile.objs
>> >> >> +++ b/hw/vfio/Makefile.objs
>> >> >> @@ -1,4 +1,8 @@
>> >> >>  ifeq ($(CONFIG_LINUX), y)
>> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> >> -obj-$(CONFIG_PCI) += pci.o
>> >> >> +ifeq ($(CONFIG_PCI), y)
>> >> >> +obj-y += pci.o
>> >> >> +else
>> >> >> +obj-y += pci-stub.o
>> >> >> +endif
>> >> >>  endif
>> >> >> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>> >> >> new file mode 100644
>> >> >> index 0000000..f317c1e
>> >> >> --- /dev/null
>> >> >> +++ b/hw/vfio/pci-stub.c
>> >> >> @@ -0,0 +1,16 @@
>> >> >> +/*
>> >> >> + * To include the file on !CONFIG_PCI
>> >> >> + *
>> >> >> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> >> >> + * the COPYING file in the top-level directory.
>> >> >> + */
>> >> >> +
>> >> >> +#include <linux/vfio.h>
>> >> >> +
>> >> >> +#include "exec/memory.h"
>> >> >> +#include "hw/vfio/vfio.h"
>> >> >> +
>> >> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> >> >> +{
>> >> >> +    return -1;
>> >> >
>> >> >Probably should have assert(0) here - this should never be called if !CONFIG_PCI.
>> >> >
>> >> 
>> >> Indeed, assert(0) would be better. I just replied to ask for dropping the stub
>> >> for !CONFIG_PCI if you and Alex.W agree.
>> >
>> >I certainly don't see the reason for the stub, it was only suggested
>> >before because a previous version had the callout in hw/vfio/common.c
>> >
>> 
>> Ok. I'll drop it in next revision.
>> 
>> >> >> +}
>> >> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> >> >> index 6b80539..d0fd4b4 100644
>> >> >> --- a/hw/vfio/pci.c
>> >> >> +++ b/hw/vfio/pci.c
>> >> >> @@ -3319,6 +3319,42 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>> >> >>      vdev->req_enabled = false;
>> >> >>  }
>> >> >>  
>> >> >> +int vfio_eeh_pe_reset(AddressSpace *as, int32_t groupid, uint32_t option)
>> >> >> +{
>> >> >> +    VFIOGroup *group;
>> >> >> +    VFIODevice *vbasedev;
>> >> >> +    VFIOPCIDevice *vdev;
>> >> >> +    struct vfio_eeh_pe_op op = {
>> >> >> +        .argsz = sizeof(op),
>> >> >> +        .op = option
>> >> >> +    };
>> >> >> +
>> >> >> +    group = vfio_get_group(groupid, as);
>> >> >> +    if (!group) {
>> >> >> +        error_report("vfio: group %d not found\n", groupid);
>> >> >> +        return -1;
>> >> >> +    }
>> >> >> +
>> >> >> +    /*
>> >> >> +     * The MSIx table will be cleaned out by reset. We need
>> >> >> +     * disable it so that it can be reenabled properly. Also,
>> >> >> +     * the cached MSIx table should be cleared as it's not
>> >> >> +     * reflecting the contents in hardware.
>> >> >> +     */
>> >> >> +    QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> >> >> +        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> >> >> +        if (msix_enabled(&vdev->pdev)) {
>> >> >> +            vfio_disable_msix(vdev);
>> >> >> +        }
>> >> >> +
>> >> >> +        msix_reset(&vdev->pdev);
>> >> >> +    }
>> >> >> +
>> >> >> +    vfio_put_group(group);
>> >> >> +
>> >> >> +    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, &op);
>> >> >> +}
>> >
>> >So all you're trying to do here is find the devices in the PE and
>> >disable/reset MSI-X, but do you really need yet another ugly callback
>> >into vfio to do that?  Isn't it possible to find the devices based on
>> >the address space or PCI topology?  If we have EEH emulation, don't you
>> >also want to do this for emulated devices?  The vfio_disable_msix() call
>> >could be replaced by the equivalent config space access to make it look
>> >like the guest disabled MSI-X.
>> >
>> 
>> EEH for emulated PCI device is out of scope for now, which depends on
>> fully emulated IBM's PHB. The PE reset is requested by guest and guest
>> is aware of losing MSIx table after that.
>> 
>> I'm not sure I'm following your suggestion, but yes, the VFIO PCI devices
>> can be identified by checking its class string with help of some QOM helper
>> functions. So I guess you are suggesting something as follows, which would
>> make the code a bit cleaner.
>> 
>> - In hw/ppc/spapr_pci_vfio.c::spapr_phb_vfio_eeh_reset(), check all PCI
>>   devices hooked to the PHB and if it's a VFIO PCI device, disable MSIx
>>   interrupt by clearing MSIX_ENABLE in the config space and cleaning out
>>   the MSIx table if MSIx interrupt has been enabled on the PCI device.
>
>That's what I'm suggesting, but why do you even need to check whether
>the subordinate device is vfio?  I imagine you can't mix emulated and
>vfio devices behind a phb, but even if you could, what's the harm in
>doing the same MSI-X reset on emulated devices?  You don't need to
>support EEH on emulated, but you also don't need to handle vfio uniquely
>here.  Thanks,
>

Thanks for confirming. Yes, that's fair enough, since EEH is a
platform-specific feature. I'll move the logic to the sPAPR platform code.

VFIO PCI devices could be attached to a PCI bus that sits behind an
emulated PCI bridge. Resetting the MSIx table of the upstream emulated
bridge would probably be harmless, but it is pointless, since the bridge
is not part of the PE we're resetting. However, I doubt that an emulated
PCI bridge needs MSIx at all.
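
For illustration, the sPAPR-side logic could look roughly like the sketch
below. It is a self-contained model, not QEMU code: MockPCIDevice and the
mock_*/cfg_* helpers are hypothetical stand-ins for PCIDevice and the real
config-space accessors. Only the register offsets and bit values come from
the PCI spec (using the names from Linux's pci_regs.h). Devices without an
MSIx capability are simply skipped, so nothing VFIO-specific is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and bits as defined by the PCI spec (names from pci_regs.h). */
#define PCI_MSIX_FLAGS               2      /* Message Control, relative to the capability */
#define PCI_MSIX_FLAGS_ENABLE        0x8000 /* MSI-X Enable bit */
#define PCI_MSIX_ENTRY_SIZE          16     /* bytes per MSI-X table entry */
#define PCI_MSIX_ENTRY_VECTOR_CTRL   12     /* Vector Control dword within an entry */
#define PCI_MSIX_ENTRY_CTRL_MASKBIT  1      /* per-vector mask bit */

/* Hypothetical stand-in for a device on the PHB; NOT QEMU's PCIDevice. */
typedef struct MockPCIDevice {
    uint8_t config[256];   /* little-endian config space */
    uint8_t msix_cap;      /* offset of the MSI-X capability, 0 if absent */
    uint8_t *msix_table;   /* cached MSI-X table */
    size_t msix_entries;   /* number of entries in the table */
} MockPCIDevice;

static uint16_t cfg_read16(const MockPCIDevice *d, unsigned off)
{
    return (uint16_t)(d->config[off] | (d->config[off + 1] << 8));
}

static void cfg_write16(MockPCIDevice *d, unsigned off, uint16_t val)
{
    d->config[off] = (uint8_t)(val & 0xff);
    d->config[off + 1] = (uint8_t)(val >> 8);
}

static int mock_msix_enabled(const MockPCIDevice *d)
{
    return d->msix_cap &&
           (cfg_read16(d, d->msix_cap + PCI_MSIX_FLAGS) & PCI_MSIX_FLAGS_ENABLE);
}

/* Make one device look as if the guest had disabled MSI-X, then drop the
 * stale table so it can be reprogrammed after the PE reset. */
static void mock_eeh_reset_msix(MockPCIDevice *d)
{
    size_t i;

    assert(d != NULL);
    if (!d->msix_cap) {
        return;                       /* no MSI-X capability: nothing to do */
    }

    if (mock_msix_enabled(d)) {
        uint16_t ctrl = cfg_read16(d, d->msix_cap + PCI_MSIX_FLAGS);
        cfg_write16(d, d->msix_cap + PCI_MSIX_FLAGS,
                    (uint16_t)(ctrl & ~PCI_MSIX_FLAGS_ENABLE));
    }

    /* Clear every entry and leave it masked, as after a hardware reset. */
    memset(d->msix_table, 0, d->msix_entries * PCI_MSIX_ENTRY_SIZE);
    for (i = 0; i < d->msix_entries; i++) {
        d->msix_table[i * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL] =
            PCI_MSIX_ENTRY_CTRL_MASKBIT;
    }
}

/* Apply the MSI-X reset to every device under the (mock) PHB. */
static void mock_phb_eeh_reset_msix(MockPCIDevice *devs, size_t ndevs)
{
    size_t i;

    for (i = 0; i < ndevs; i++) {
        mock_eeh_reset_msix(&devs[i]);
    }
}
```

In the real code the loop would walk the devices on the PHB's bus, and the
table clearing would presumably be left to msix_reset(), as in the patch
quoted above.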

Thanks,
Gavin

>Alex
>


end of thread, other threads:[~2015-04-01  3:06 UTC | newest]

Thread overview: 14+ messages
2015-03-26  5:35 [Qemu-devel] [PATCH v3 0/2] Bug fixes for EEH on VFIO PCI devices Gavin Shan
2015-03-26  5:35 ` [Qemu-devel] [PATCH v3 1/2] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
2015-03-27  6:00   ` David Gibson
2015-03-30  9:32     ` Gavin Shan
2015-03-30  2:39   ` David Gibson
2015-03-30  9:34     ` Gavin Shan
2015-03-31 19:36       ` Alex Williamson
2015-04-01  0:20         ` Gavin Shan
2015-04-01  1:16           ` Alex Williamson
2015-04-01  3:05             ` Gavin Shan
2015-03-26  5:35 ` [Qemu-devel] [PATCH 2/2] sPAPR: Reenable EEH functionality on reboot Gavin Shan
2015-03-27  6:01   ` David Gibson
2015-03-30  2:40   ` David Gibson
2015-03-30  9:35     ` Gavin Shan
