All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
@ 2015-03-16 16:31 Gavin Shan
  2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on " Gavin Shan
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Gavin Shan @ 2015-03-16 16:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, Gavin Shan, alex.williamson, qemu-ppc, david

The PCI device MSIx table is cleaned out in hardware after EEH PE
reset. However, we still hold the stale MSIx entries in QEMU, which
should be cleared accordingly. Otherwise, we will run into another
(recursive) EEH error and the PCI devices contained in the PE have
to be offlined exceptionally.

The patch clears stale MSIx table before EEH PE reset so that MSIx
table could be restored properly after EEH PE reset.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
    error message for this function. Dropped vfio_put_group()
    on NULL group
---
 hw/vfio/Makefile.objs  |  6 +++++-
 hw/vfio/common.c       |  7 +++++++
 hw/vfio/pci-stub.c     | 17 +++++++++++++++++
 hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio.h |  2 ++
 5 files changed, 69 insertions(+), 1 deletion(-)
 create mode 100644 hw/vfio/pci-stub.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index e31f30e..1b8a065 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,8 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
-obj-$(CONFIG_PCI) += pci.o
+ifeq ($(CONFIG_PCI), y)
+obj-y += pci.o
+else
+obj-y += pci-stub.o
+endif
 endif
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 148eb53..ed07814 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
     switch (req) {
     case VFIO_CHECK_EXTENSION:
     case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
+        break;
     case VFIO_EEH_PE_OP:
+        if (vfio_container_eeh_event(as, groupid, param) != 0) {
+            error_report("vfio: cannot handle EEH event on group %d\n",
+                         groupid);
+            return -1;
+        }
+
         break;
     default:
         /* Return an error on unknown requests */
diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
new file mode 100644
index 0000000..7726a7a
--- /dev/null
+++ b/hw/vfio/pci-stub.c
@@ -0,0 +1,17 @@
+/*
+ * To include the file on !CONFIG_PCI
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/vfio.h>
+
+#include "exec/memory.h"
+#include "hw/vfio/vfio.h"
+
+int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
+                             struct vfio_eeh_pe_op *op)
+{
+    return -1;
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6b80539..fca1edc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3319,6 +3319,44 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
+                             struct vfio_eeh_pe_op *op)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    VFIOPCIDevice *vdev;
+
+    group = vfio_get_group(groupid, as);
+    if (!group) {
+        error_report("vfio: group %d not found\n", groupid);
+        return -1;
+    }
+
+    switch (op->op) {
+    case VFIO_EEH_PE_RESET_HOT:
+    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
+        /*
+         * The MSIx table will be cleaned out by reset. We need
+         * disable it so that it can be reenabled properly. Also,
+         * the cached MSIx table should be cleared as it's not
+         * reflecting the contents in hardware.
+         */
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+            if (msix_enabled(&vdev->pdev)) {
+                vfio_disable_msix(vdev);
+            }
+
+            msix_reset(&vdev->pdev);
+        }
+
+        break;
+    }
+
+    vfio_put_group(group);
+    return 0;
+}
+
 static int vfio_initfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 0b26cd8..e9a3abb 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -5,5 +5,7 @@
 
 extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
                                 int req, void *param);
+extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
+                                    struct vfio_eeh_pe_op *op);
 
 #endif
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
  2015-03-16 16:31 [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
@ 2015-03-16 16:31 ` Gavin Shan
  2015-03-17 21:16   ` Alex Williamson
  2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 3/3] sPAPR: Reenable EEH functionality on reboot Gavin Shan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-16 16:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, Gavin Shan, alex.williamson, qemu-ppc, david

When Linux guest recovers from EEH error on the following Emulex
adapter, the MSIx interrupts are disabled and the INTx emulation
is enabled. One INTx interrupt is injected to the guest by host
because of detected pending INTx interrupts on the adapter. QEMU
disables mmap'ed BAR regions and starts a timer to enable those
regions at later point the INTx interrupt handler. Unfortunately,
"VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
mapp'ed BAR regions won't be reenabled properly. It leads to EEH
recovery failure at guest side because of hanged MMIO access.

 # lspci | grep Emulex
 0000:01:00.0 Ethernet controller: Emulex Corporation \
              OneConnect 10Gb NIC (be3) (rev 02)
 0000:01:00.1 Ethernet controller: Emulex Corporation \
              OneConnect 10Gb NIC (be3) (rev 02)

The patch disables INTx interrupt before doing EEH reset to avoid
the issue.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/vfio/pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index fca1edc..bfa3d0c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
          * disable it so that it can be reenabled properly. Also,
          * the cached MSIx table should be cleared as it's not
          * reflecting the contents in hardware.
+         *
+         * We might have INTx interrupt whose handler disables the
+         * memory mapped BARs. The INTx pending state can't be
+         * cleared with memory BAR access in slow path. The timer
+         * kicked by the INTx interrupt handler won't enable those
+         * disabled memory mapped BARs, which leads to hanged MMIO
+         * register access and EEH recovery failure. We simply disable
+         * INTx if it has been enabled.
          */
         QLIST_FOREACH(vbasedev, &group->device_list, next) {
             vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
@@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
             }
 
             msix_reset(&vdev->pdev);
+
+            /* Disable INTx */
+            if (vdev->interrupt == VFIO_INT_INTx) {
+                vfio_disable_intx(vdev);
+            }
         }
 
         break;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] sPAPR: Reenable EEH functionality on reboot
  2015-03-16 16:31 [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
  2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on " Gavin Shan
@ 2015-03-16 16:31 ` Gavin Shan
  2015-03-17 21:09 ` [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Alex Williamson
  2015-03-20  6:04 ` David Gibson
  3 siblings, 0 replies; 21+ messages in thread
From: Gavin Shan @ 2015-03-16 16:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, Gavin Shan, alex.williamson, qemu-ppc, david

When rebooting the guest, some PEs might be in frozen state. The
contained PCI devices won't work properly if their frozen states
aren't cleared in time. One case running into this situation would
be maximal EEH error times encountered in the guest.

The patch reenables the EEH functinality on PEs on PHB's reset
callback, which will clear their frozen states if needed.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci_vfio.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 99a1be5..0cd96d3 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -71,9 +71,21 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
                                 spapr_tce_get_iommu(tcet));
 }
 
+/*
+ * The PE might be in frozen state. To reenable the EEH
+ * functionality on it will clean the frozen state, which
+ * ensures that the comtained PCI devices will work properly.
+ */
 static void spapr_phb_vfio_reset(DeviceState *qdev)
 {
-    /* Do nothing */
+    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(qdev);
+    struct vfio_eeh_pe_op op = {
+        .argsz = sizeof(op),
+        .op = VFIO_EEH_PE_ENABLE,
+    };
+
+    vfio_container_ioctl(&svphb->phb.iommu_as,
+                         svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
 }
 
 static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-16 16:31 [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
  2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on " Gavin Shan
  2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 3/3] sPAPR: Reenable EEH functionality on reboot Gavin Shan
@ 2015-03-17 21:09 ` Alex Williamson
  2015-03-17 23:26   ` Gavin Shan
  2015-03-20  6:04 ` David Gibson
  3 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2015-03-17 21:09 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, qemu-devel, agraf, qemu-ppc, david

On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
> The PCI device MSIx table is cleaned out in hardware after EEH PE
> reset. However, we still hold the stale MSIx entries in QEMU, which
> should be cleared accordingly. Otherwise, we will run into another
> (recursive) EEH error and the PCI devices contained in the PE have
> to be offlined exceptionally.
> 
> The patch clears stale MSIx table before EEH PE reset so that MSIx
> table could be restored properly after EEH PE reset.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>     error message for this function. Dropped vfio_put_group()
>     on NULL group
> ---
>  hw/vfio/Makefile.objs  |  6 +++++-
>  hw/vfio/common.c       |  7 +++++++
>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h |  2 ++
>  5 files changed, 69 insertions(+), 1 deletion(-)
>  create mode 100644 hw/vfio/pci-stub.c
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..1b8a065 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,8 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
> -obj-$(CONFIG_PCI) += pci.o
> +ifeq ($(CONFIG_PCI), y)
> +obj-y += pci.o
> +else
> +obj-y += pci-stub.o
> +endif
>  endif
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 148eb53..ed07814 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>      switch (req) {
>      case VFIO_CHECK_EXTENSION:
>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> +        break;
>      case VFIO_EEH_PE_OP:
> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
> +            error_report("vfio: cannot handle EEH event on group %d\n",
> +                         groupid);
> +            return -1;
> +        }
> +
>          break;
>      default:
>          /* Return an error on unknown requests */
> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
> new file mode 100644
> index 0000000..7726a7a
> --- /dev/null
> +++ b/hw/vfio/pci-stub.c
> @@ -0,0 +1,17 @@
> +/*
> + * To include the file on !CONFIG_PCI
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/vfio.h>
> +
> +#include "exec/memory.h"
> +#include "hw/vfio/vfio.h"
> +
> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                             struct vfio_eeh_pe_op *op)
> +{
> +    return -1;
> +}

How about just a static inline in vfio.h?

> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6b80539..fca1edc 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3319,6 +3319,44 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                             struct vfio_eeh_pe_op *op)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    VFIOPCIDevice *vdev;
> +
> +    group = vfio_get_group(groupid, as);
> +    if (!group) {
> +        error_report("vfio: group %d not found\n", groupid);
> +        return -1;
> +    }
> +
> +    switch (op->op) {
> +    case VFIO_EEH_PE_RESET_HOT:
> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
> +        /*
> +         * The MSIx table will be cleaned out by reset. We need
> +         * disable it so that it can be reenabled properly. Also,
> +         * the cached MSIx table should be cleared as it's not
> +         * reflecting the contents in hardware.
> +         */
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +            if (msix_enabled(&vdev->pdev)) {
> +                vfio_disable_msix(vdev);
> +            }
> +
> +            msix_reset(&vdev->pdev);
> +        }
> +
> +        break;
> +    }
> +
> +    vfio_put_group(group);
> +    return 0;
> +}
> +
>  static int vfio_initfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..e9a3abb 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -5,5 +5,7 @@
>  
>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>                                  int req, void *param);
> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                                    struct vfio_eeh_pe_op *op);
>  
>  #endif

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
  2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on " Gavin Shan
@ 2015-03-17 21:16   ` Alex Williamson
  2015-03-18  4:54     ` [Qemu-devel] [Qemu-ppc] " Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2015-03-17 21:16 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, qemu-devel, agraf, qemu-ppc, david

On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
> When Linux guest recovers from EEH error on the following Emulex
> adapter, the MSIx interrupts are disabled and the INTx emulation
> is enabled. One INTx interrupt is injected to the guest by host
> because of detected pending INTx interrupts on the adapter. QEMU
> disables mmap'ed BAR regions and starts a timer to enable those
> regions at later point the INTx interrupt handler. Unfortunately,
> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
> recovery failure at guest side because of hanged MMIO access.
> 
>  # lspci | grep Emulex
>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>               OneConnect 10Gb NIC (be3) (rev 02)
>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>               OneConnect 10Gb NIC (be3) (rev 02)
> 
> The patch disables INTx interrupt before doing EEH reset to avoid
> the issue.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/vfio/pci.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index fca1edc..bfa3d0c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>           * disable it so that it can be reenabled properly. Also,
>           * the cached MSIx table should be cleared as it's not
>           * reflecting the contents in hardware.
> +         *
> +         * We might have INTx interrupt whose handler disables the
> +         * memory mapped BARs. The INTx pending state can't be
> +         * cleared with memory BAR access in slow path. The timer
> +         * kicked by the INTx interrupt handler won't enable those
> +         * disabled memory mapped BARs, which leads to hanged MMIO
> +         * register access and EEH recovery failure. We simply disable
> +         * INTx if it has been enabled.
>           */

This feels like a quick hack for a problem we don't really understand.
Why is INTx being fired through QEMU rather than KVM?  Why isn't the
INTx re-enabling happening since this is exactly the scenario where it's
supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
BAR, mmap re-enabled, INTx unmasked)?

>          QLIST_FOREACH(vbasedev, &group->device_list, next) {
>              vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>              }
>  
>              msix_reset(&vdev->pdev);
> +
> +            /* Disable INTx */
> +            if (vdev->interrupt == VFIO_INT_INTx) {
> +                vfio_disable_intx(vdev);
> +            }
>          }
>  
>          break;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-17 21:09 ` [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Alex Williamson
@ 2015-03-17 23:26   ` Gavin Shan
  2015-03-23  5:05     ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-17 23:26 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, aik, agraf, Gavin Shan, qemu-ppc, david

On Tue, Mar 17, 2015 at 03:09:52PM -0600, Alex Williamson wrote:
>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> table could be restored properly after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>>     error message for this function. Dropped vfio_put_group()
>>     on NULL group
>> ---
>>  hw/vfio/Makefile.objs  |  6 +++++-
>>  hw/vfio/common.c       |  7 +++++++
>>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio.h |  2 ++
>>  5 files changed, 69 insertions(+), 1 deletion(-)
>>  create mode 100644 hw/vfio/pci-stub.c
>> 
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..1b8a065 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,8 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>> -obj-$(CONFIG_PCI) += pci.o
>> +ifeq ($(CONFIG_PCI), y)
>> +obj-y += pci.o
>> +else
>> +obj-y += pci-stub.o
>> +endif
>>  endif
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 148eb53..ed07814 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>      switch (req) {
>>      case VFIO_CHECK_EXTENSION:
>>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> +        break;
>>      case VFIO_EEH_PE_OP:
>> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> +            error_report("vfio: cannot handle EEH event on group %d\n",
>> +                         groupid);
>> +            return -1;
>> +        }
>> +
>>          break;
>>      default:
>>          /* Return an error on unknown requests */
>> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>> new file mode 100644
>> index 0000000..7726a7a
>> --- /dev/null
>> +++ b/hw/vfio/pci-stub.c
>> @@ -0,0 +1,17 @@
>> +/*
>> + * To include the file on !CONFIG_PCI
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include <linux/vfio.h>
>> +
>> +#include "exec/memory.h"
>> +#include "hw/vfio/vfio.h"
>> +
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                             struct vfio_eeh_pe_op *op)
>> +{
>> +    return -1;
>> +}
>
>How about just a static inline in vfio.h?
>

Yes, It would make things simpler to have something as follows, but I didn't
find there is a header file, which includes "#define CONFIG_PCI 1" or similar
thing. Do you know the header file containing CONFIG_PCI, or alternative macro
for the purpose?

#ifdef CONFIG_PCI /* Or alternative macro */
extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
				    struct vfio_eeh_pe_op *op);
#else
static ineline int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
                                            struct vfio_eeh_pe_op *op)
{
    return -1;
}
#endif /* CONFIG_PCI */

Thanks,
Gavin

>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 6b80539..fca1edc 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3319,6 +3319,44 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>      vdev->req_enabled = false;
>>  }
>>  
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                             struct vfio_eeh_pe_op *op)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    VFIOPCIDevice *vdev;
>> +
>> +    group = vfio_get_group(groupid, as);
>> +    if (!group) {
>> +        error_report("vfio: group %d not found\n", groupid);
>> +        return -1;
>> +    }
>> +
>> +    switch (op->op) {
>> +    case VFIO_EEH_PE_RESET_HOT:
>> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>> +        /*
>> +         * The MSIx table will be cleaned out by reset. We need
>> +         * disable it so that it can be reenabled properly. Also,
>> +         * the cached MSIx table should be cleared as it's not
>> +         * reflecting the contents in hardware.
>> +         */
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +            if (msix_enabled(&vdev->pdev)) {
>> +                vfio_disable_msix(vdev);
>> +            }
>> +
>> +            msix_reset(&vdev->pdev);
>> +        }
>> +
>> +        break;
>> +    }
>> +
>> +    vfio_put_group(group);
>> +    return 0;
>> +}
>> +
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> index 0b26cd8..e9a3abb 100644
>> --- a/include/hw/vfio/vfio.h
>> +++ b/include/hw/vfio/vfio.h
>> @@ -5,5 +5,7 @@
>>  
>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>                                  int req, void *param);
>> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                                    struct vfio_eeh_pe_op *op);
>>  
>>  #endif
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
  2015-03-17 21:16   ` Alex Williamson
@ 2015-03-18  4:54     ` Gavin Shan
  2015-03-20  4:01       ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-18  4:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: david, qemu-ppc, Gavin Shan, qemu-devel

On Tue, Mar 17, 2015 at 03:16:46PM -0600, Alex Williamson wrote:
>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>> When Linux guest recovers from EEH error on the following Emulex
>> adapter, the MSIx interrupts are disabled and the INTx emulation
>> is enabled. One INTx interrupt is injected to the guest by host
>> because of detected pending INTx interrupts on the adapter. QEMU
>> disables mmap'ed BAR regions and starts a timer to enable those
>> regions at later point the INTx interrupt handler. Unfortunately,
>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>> recovery failure at guest side because of hanged MMIO access.
>> 
>>  # lspci | grep Emulex
>>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>>               OneConnect 10Gb NIC (be3) (rev 02)
>>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>>               OneConnect 10Gb NIC (be3) (rev 02)
>> 
>> The patch disables INTx interrupt before doing EEH reset to avoid
>> the issue.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/vfio/pci.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>> 
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index fca1edc..bfa3d0c 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>           * disable it so that it can be reenabled properly. Also,
>>           * the cached MSIx table should be cleared as it's not
>>           * reflecting the contents in hardware.
>> +         *
>> +         * We might have INTx interrupt whose handler disables the
>> +         * memory mapped BARs. The INTx pending state can't be
>> +         * cleared with memory BAR access in slow path. The timer
>> +         * kicked by the INTx interrupt handler won't enable those
>> +         * disabled memory mapped BARs, which leads to hanged MMIO
>> +         * register access and EEH recovery failure. We simply disable
>> +         * INTx if it has been enabled.
>>           */
>
>This feels like a quick hack for a problem we don't really understand.
>Why is INTx being fired through QEMU rather than KVM?  Why isn't the
>INTx re-enabling happening since this is exactly the scenario where it's
>supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
>BAR, mmap re-enabled, INTx unmasked)?
>

Indeed. It's a quick hack before finding the root cause about why slow
path doesn't work when fast path is disabled. I'm still tracing it and
hopefully I can find something soon. Note that: KVM IRQFD isn't enabled
on the system I was doing experiments.

Thanks,
Gavin

>>          QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>              vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>              }
>>  
>>              msix_reset(&vdev->pdev);
>> +
>> +            /* Disable INTx */
>> +            if (vdev->interrupt == VFIO_INT_INTx) {
>> +                vfio_disable_intx(vdev);
>> +            }
>>          }
>>  
>>          break;
>
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
  2015-03-18  4:54     ` [Qemu-devel] [Qemu-ppc] " Gavin Shan
@ 2015-03-20  4:01       ` Gavin Shan
  2015-03-20  5:57         ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-20  4:01 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Alex Williamson, qemu-ppc, qemu-devel, david

On Wed, Mar 18, 2015 at 03:54:09PM +1100, Gavin Shan wrote:
>On Tue, Mar 17, 2015 at 03:16:46PM -0600, Alex Williamson wrote:
>>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>>> When Linux guest recovers from EEH error on the following Emulex
>>> adapter, the MSIx interrupts are disabled and the INTx emulation
>>> is enabled. One INTx interrupt is injected to the guest by host
>>> because of detected pending INTx interrupts on the adapter. QEMU
>>> disables mmap'ed BAR regions and starts a timer to enable those
>>> regions at later point the INTx interrupt handler. Unfortunately,
>>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>>> recovery failure at guest side because of hanged MMIO access.
>>> 
>>>  # lspci | grep Emulex
>>>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>>>               OneConnect 10Gb NIC (be3) (rev 02)
>>>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>>>               OneConnect 10Gb NIC (be3) (rev 02)
>>> 
>>> The patch disables INTx interrupt before doing EEH reset to avoid
>>> the issue.
>>> 
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>>  hw/vfio/pci.c | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>> 
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index fca1edc..bfa3d0c 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>>           * disable it so that it can be reenabled properly. Also,
>>>           * the cached MSIx table should be cleared as it's not
>>>           * reflecting the contents in hardware.
>>> +         *
>>> +         * We might have INTx interrupt whose handler disables the
>>> +         * memory mapped BARs. The INTx pending state can't be
>>> +         * cleared with memory BAR access in slow path. The timer
>>> +         * kicked by the INTx interrupt handler won't enable those
>>> +         * disabled memory mapped BARs, which leads to hanged MMIO
>>> +         * register access and EEH recovery failure. We simply disable
>>> +         * INTx if it has been enabled.
>>>           */
>>
>>This feels like a quick hack for a problem we don't really understand.
>>Why is INTx being fired through QEMU rather than KVM?  Why isn't the
>>INTx re-enabling happening since this is exactly the scenario where it's
>>supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
>>BAR, mmap re-enabled, INTx unmasked)?
>>
>
>Indeed. It's a quick hack before finding the root cause about why slow
>path doesn't work when fast path is disabled. I'm still tracing it and
>hopefully I can find something soon. Note that: KVM IRQFD isn't enabled
>on the system I was doing experiments.
>

I'm not sure if there're further replies to this thread since then. My
mailbox is broken for quite a while. Also, I don't know how Ben is missed
from the cc list. Add him who might have more thoughts. Sorry for mail 
flooding to you, Ben :-)

The root cause of the problem would be specific to how MMIO regions are
supported on KVM for book3s_hv, where invalid page table entries (PTEs)
are built to trace the address mapping between guest virtual address
and host physical address. Since they're marked as "invalid", accessing
to the MMIO region will cause page fault in host kernel and then the
PTE table is searched to locate the address mapping, the access is
propogated to QEMU for emulation if the address mapping exits in PTE
table. It's how the "slow path" works.

For the "fast path", KVM creates memory slot and update PTE table with
the mappings upon page fault. Those PTEs are "valid" and seen by CPU
to do address translation automatically. When the memory region for
the "fast path" is disabled from QEMU, the host will purge TLB/PTEs
for the mappings. So when the "fast path" regions are disabled, the
PTE table doesn't have the entries required for "slow path" access
any more. At later point, guest tries access to the region and the
host just doesn't know how to handle the case.

I already have couple of patches to introduce addtional flag to
identify the "fast path" region, which has properties that RAM
region and MMIO region have. Once the region is disabled, host
purges its mappings in PTEs/TLB, and then update PTEs with the
mappings required by "slow path" region. With that, I didn't see
the problem. Both host/QEMU needs changes. I'll remove this patch
from the series and send out a new revision. The patches to address
the "fast path" issue will be sent separately after checking with
Ben or Paul they're worthy to be sent out.

Thanks,
Gavin

>
>>>          QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>              vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>>              }
>>>  
>>>              msix_reset(&vdev->pdev);
>>> +
>>> +            /* Disable INTx */
>>> +            if (vdev->interrupt == VFIO_INT_INTx) {
>>> +                vfio_disable_intx(vdev);
>>> +            }
>>>          }
>>>  
>>>          break;
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 2/3] VFIO: Disable INTx interrupt on EEH reset
  2015-03-20  4:01       ` Gavin Shan
@ 2015-03-20  5:57         ` Gavin Shan
  0 siblings, 0 replies; 21+ messages in thread
From: Gavin Shan @ 2015-03-20  5:57 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Alex Williamson, qemu-ppc, qemu-devel, david

On Fri, Mar 20, 2015 at 03:01:43PM +1100, Gavin Shan wrote:
>On Wed, Mar 18, 2015 at 03:54:09PM +1100, Gavin Shan wrote:
>>On Tue, Mar 17, 2015 at 03:16:46PM -0600, Alex Williamson wrote:
>>>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>>>> When Linux guest recovers from EEH error on the following Emulex
>>>> adapter, the MSIx interrupts are disabled and the INTx emulation
>>>> is enabled. One INTx interrupt is injected to the guest by host
>>>> because of detected pending INTx interrupts on the adapter. QEMU
>>>> disables mmap'ed BAR regions and starts a timer to enable those
>>>> regions at later point the INTx interrupt handler. Unfortunately,
>>>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>>>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>>>> recovery failure at guest side because of hanged MMIO access.
>>>> 
>>>>  # lspci | grep Emulex
>>>>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>>>>               OneConnect 10Gb NIC (be3) (rev 02)
>>>>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>>>>               OneConnect 10Gb NIC (be3) (rev 02)
>>>> 
>>>> The patch disables INTx interrupt before doing EEH reset to avoid
>>>> the issue.
>>>> 
>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> ---
>>>>  hw/vfio/pci.c | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>> 
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index fca1edc..bfa3d0c 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -3340,6 +3340,14 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>>>           * disable it so that it can be reenabled properly. Also,
>>>>           * the cached MSIx table should be cleared as it's not
>>>>           * reflecting the contents in hardware.
>>>> +         *
>>>> +         * We might have INTx interrupt whose handler disables the
>>>> +         * memory mapped BARs. The INTx pending state can't be
>>>> +         * cleared with memory BAR access in slow path. The timer
>>>> +         * kicked by the INTx interrupt handler won't enable those
>>>> +         * disabled memory mapped BARs, which leads to hanged MMIO
>>>> +         * register access and EEH recovery failure. We simply disable
>>>> +         * INTx if it has been enabled.
>>>>           */
>>>
>>>This feels like a quick hack for a problem we don't really understand.
>>>Why is INTx being fired through QEMU rather than KVM?  Why isn't the
>>>INTx re-enabling happening since this is exactly the scenario where it's
>>>supposed to work (ie. INTx occurs, BAR mmap disabled, guest accesses
>>>BAR, mmap re-enabled, INTx unmasked)?
>>>
>>
>>Indeed. It's a quick hack before finding the root cause about why slow
>>path doesn't work when fast path is disabled. I'm still tracing it and
>>hopefully I can find something soon. Note that: KVM IRQFD isn't enabled
>>on the system I was doing experiments.
>>
>
>I'm not sure if there're further replies to this thread since then. My
>mailbox is broken for quite a while. Also, I don't know how Ben is missed
>from the cc list. Add him who might have more thoughts. Sorry for mail 
>flooding to you, Ben :-)
>
>The root cause of the problem would be specific to how MMIO regions are
>supported on KVM for book3s_hv, where invalid page table entries (PTEs)
>are built to trace the address mapping between guest virtual address
>and host physical address. Since they're marked as "invalid", accessing
>to the MMIO region will cause page fault in host kernel and then the
>PTE table is searched to locate the address mapping, the access is
>propogated to QEMU for emulation if the address mapping exits in PTE
>table. It's how the "slow path" works.
>
>For the "fast path", KVM creates memory slot and update PTE table with
>the mappings upon page fault. Those PTEs are "valid" and seen by CPU
>to do address translation automatically. When the memory region for
>the "fast path" is disabled from QEMU, the host will purge TLB/PTEs
>for the mappings. So when the "fast path" regions are disabled, the
>PTE table doesn't have the entries required for "slow path" access
>any more. At later point, guest tries access to the region and the
>host just doesn't know how to handle the case.
>
>I already have couple of patches to introduce addtional flag to
>identify the "fast path" region, which has properties that RAM
>region and MMIO region have. Once the region is disabled, host
>purges its mappings in PTEs/TLB, and then update PTEs with the
>mappings required by "slow path" region. With that, I didn't see
>the problem. Both host/QEMU needs changes. I'll remove this patch
>from the series and send out a new revision. The patches to address
>the "fast path" issue will be sent separately after checking with
>Ben or Paul they're worthy to be sent out.
>

Checked with Paul Mackerras. He already has patch fixing the issue
in an different way :-)

Thanks,
Gavin

>
>>
>>>>          QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>>>              vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>>> @@ -3348,6 +3356,11 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>>>              }
>>>>  
>>>>              msix_reset(&vdev->pdev);
>>>> +
>>>> +            /* Disable INTx */
>>>> +            if (vdev->interrupt == VFIO_INT_INTx) {
>>>> +                vfio_disable_intx(vdev);
>>>> +            }
>>>>          }
>>>>  
>>>>          break;
>>>
>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-16 16:31 [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
                   ` (2 preceding siblings ...)
  2015-03-17 21:09 ` [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Alex Williamson
@ 2015-03-20  6:04 ` David Gibson
  2015-03-20  6:27   ` [Qemu-devel] [Qemu-ppc] " Gavin Shan
  3 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2015-03-20  6:04 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, qemu-devel, agraf, alex.williamson, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 5166 bytes --]

On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> The PCI device MSIx table is cleaned out in hardware after EEH PE
> reset. However, we still hold the stale MSIx entries in QEMU, which
> should be cleared accordingly. Otherwise, we will run into another
> (recursive) EEH error and the PCI devices contained in the PE have
> to be offlined exceptionally.
> 
> The patch clears stale MSIx table before EEH PE reset so that MSIx
> table could be restored properly after EEH PE reset.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>     error message for this function. Dropped vfio_put_group()
>     on NULL group
> ---
>  hw/vfio/Makefile.objs  |  6 +++++-
>  hw/vfio/common.c       |  7 +++++++
>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h |  2 ++
>  5 files changed, 69 insertions(+), 1 deletion(-)
>  create mode 100644 hw/vfio/pci-stub.c
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index e31f30e..1b8a065 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -1,4 +1,8 @@
>  ifeq ($(CONFIG_LINUX), y)
>  obj-$(CONFIG_SOFTMMU) += common.o
> -obj-$(CONFIG_PCI) += pci.o
> +ifeq ($(CONFIG_PCI), y)
> +obj-y += pci.o
> +else
> +obj-y += pci-stub.o
> +endif
>  endif
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 148eb53..ed07814 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>      switch (req) {
>      case VFIO_CHECK_EXTENSION:
>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> +        break;
>      case VFIO_EEH_PE_OP:
> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {

I really dislike the idea of having an arbitrarily complex side effect
from a function whose name suggest's it's just a trivial wrapper
around the ioctl().


> +            error_report("vfio: cannot handle EEH event on group %d\n",
> +                         groupid);
> +            return -1;
> +        }
> +
>          break;
>      default:
>          /* Return an error on unknown requests */
> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
> new file mode 100644
> index 0000000..7726a7a
> --- /dev/null
> +++ b/hw/vfio/pci-stub.c
> @@ -0,0 +1,17 @@
> +/*
> + * To include the file on !CONFIG_PCI
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include <linux/vfio.h>
> +
> +#include "exec/memory.h"
> +#include "hw/vfio/vfio.h"
> +
> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                             struct vfio_eeh_pe_op *op)
> +{
> +    return -1;
> +}
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6b80539..fca1edc 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3319,6 +3319,44 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                             struct vfio_eeh_pe_op *op)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    VFIOPCIDevice *vdev;
> +
> +    group = vfio_get_group(groupid, as);
> +    if (!group) {
> +        error_report("vfio: group %d not found\n", groupid);
> +        return -1;
> +    }
> +
> +    switch (op->op) {
> +    case VFIO_EEH_PE_RESET_HOT:
> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
> +        /*
> +         * The MSIx table will be cleaned out by reset. We need
> +         * disable it so that it can be reenabled properly. Also,
> +         * the cached MSIx table should be cleared as it's not
> +         * reflecting the contents in hardware.
> +         */
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +            if (msix_enabled(&vdev->pdev)) {
> +                vfio_disable_msix(vdev);
> +            }
> +
> +            msix_reset(&vdev->pdev);
> +        }
> +
> +        break;
> +    }
> +
> +    vfio_put_group(group);
> +    return 0;
> +}
> +
>  static int vfio_initfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..e9a3abb 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -5,5 +5,7 @@
>  
>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>                                  int req, void *param);
> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                                    struct vfio_eeh_pe_op *op);
>  
>  #endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-20  6:04 ` David Gibson
@ 2015-03-20  6:27   ` Gavin Shan
  2015-03-23  5:06     ` David Gibson
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-20  6:27 UTC (permalink / raw)
  To: David Gibson; +Cc: alex.williamson, qemu-ppc, Gavin Shan, qemu-devel

On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> table could be restored properly after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>>     error message for this function. Dropped vfio_put_group()
>>     on NULL group
>> ---
>>  hw/vfio/Makefile.objs  |  6 +++++-
>>  hw/vfio/common.c       |  7 +++++++
>>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio.h |  2 ++
>>  5 files changed, 69 insertions(+), 1 deletion(-)
>>  create mode 100644 hw/vfio/pci-stub.c
>> 
>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> index e31f30e..1b8a065 100644
>> --- a/hw/vfio/Makefile.objs
>> +++ b/hw/vfio/Makefile.objs
>> @@ -1,4 +1,8 @@
>>  ifeq ($(CONFIG_LINUX), y)
>>  obj-$(CONFIG_SOFTMMU) += common.o
>> -obj-$(CONFIG_PCI) += pci.o
>> +ifeq ($(CONFIG_PCI), y)
>> +obj-y += pci.o
>> +else
>> +obj-y += pci-stub.o
>> +endif
>>  endif
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 148eb53..ed07814 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>      switch (req) {
>>      case VFIO_CHECK_EXTENSION:
>>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> +        break;
>>      case VFIO_EEH_PE_OP:
>> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>
>I really dislike the idea of having an arbitrarily complex side effect
>from a function whose name suggest's it's just a trivial wrapper
>around the ioctl().
>

Ok. I guess you would like putting the complex in the callers of
vfio_container_ioctl(). All those callers are implemtend in sPAPR
platform (spapr_pci_vfio.c). I didn't put the logic there because:

- VFIOPCIDevice is invisible to sPAPR platform.
- We can't tell one particular PCI device attached to sPAPRPHBState
  is the basement of VFIOPCIDevice, or emulated PCI device.

Thanks,
Gavin

>
>> +            error_report("vfio: cannot handle EEH event on group %d\n",
>> +                         groupid);
>> +            return -1;
>> +        }
>> +
>>          break;
>>      default:
>>          /* Return an error on unknown requests */
>> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>> new file mode 100644
>> index 0000000..7726a7a
>> --- /dev/null
>> +++ b/hw/vfio/pci-stub.c
>> @@ -0,0 +1,17 @@
>> +/*
>> + * To include the file on !CONFIG_PCI
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include <linux/vfio.h>
>> +
>> +#include "exec/memory.h"
>> +#include "hw/vfio/vfio.h"
>> +
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                             struct vfio_eeh_pe_op *op)
>> +{
>> +    return -1;
>> +}
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 6b80539..fca1edc 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3319,6 +3319,44 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>      vdev->req_enabled = false;
>>  }
>>  
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                             struct vfio_eeh_pe_op *op)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    VFIOPCIDevice *vdev;
>> +
>> +    group = vfio_get_group(groupid, as);
>> +    if (!group) {
>> +        error_report("vfio: group %d not found\n", groupid);
>> +        return -1;
>> +    }
>> +
>> +    switch (op->op) {
>> +    case VFIO_EEH_PE_RESET_HOT:
>> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>> +        /*
>> +         * The MSIx table will be cleaned out by reset. We need
>> +         * disable it so that it can be reenabled properly. Also,
>> +         * the cached MSIx table should be cleared as it's not
>> +         * reflecting the contents in hardware.
>> +         */
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +            if (msix_enabled(&vdev->pdev)) {
>> +                vfio_disable_msix(vdev);
>> +            }
>> +
>> +            msix_reset(&vdev->pdev);
>> +        }
>> +
>> +        break;
>> +    }
>> +
>> +    vfio_put_group(group);
>> +    return 0;
>> +}
>> +
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> index 0b26cd8..e9a3abb 100644
>> --- a/include/hw/vfio/vfio.h
>> +++ b/include/hw/vfio/vfio.h
>> @@ -5,5 +5,7 @@
>>  
>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>                                  int req, void *param);
>> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                                    struct vfio_eeh_pe_op *op);
>>  
>>  #endif
>
>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-17 23:26   ` Gavin Shan
@ 2015-03-23  5:05     ` Gavin Shan
  0 siblings, 0 replies; 21+ messages in thread
From: Gavin Shan @ 2015-03-23  5:05 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, qemu-devel, agraf, Alex Williamson, qemu-ppc, david

On Wed, Mar 18, 2015 at 10:26:33AM +1100, Gavin Shan wrote:
>On Tue, Mar 17, 2015 at 03:09:52PM -0600, Alex Williamson wrote:
>>On Tue, 2015-03-17 at 03:31 +1100, Gavin Shan wrote:
>>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>>> reset. However, we still hold the stale MSIx entries in QEMU, which
>>> should be cleared accordingly. Otherwise, we will run into another
>>> (recursive) EEH error and the PCI devices contained in the PE have
>>> to be offlined exceptionally.
>>> 
>>> The patch clears stale MSIx table before EEH PE reset so that MSIx
>>> table could be restored properly after EEH PE reset.
>>> 
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> ---
>>> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>>>     error message for this function. Dropped vfio_put_group()
>>>     on NULL group
>>> ---
>>>  hw/vfio/Makefile.objs  |  6 +++++-
>>>  hw/vfio/common.c       |  7 +++++++
>>>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>>>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>>>  include/hw/vfio/vfio.h |  2 ++
>>>  5 files changed, 69 insertions(+), 1 deletion(-)
>>>  create mode 100644 hw/vfio/pci-stub.c
>>> 
>>> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>>> index e31f30e..1b8a065 100644
>>> --- a/hw/vfio/Makefile.objs
>>> +++ b/hw/vfio/Makefile.objs
>>> @@ -1,4 +1,8 @@
>>>  ifeq ($(CONFIG_LINUX), y)
>>>  obj-$(CONFIG_SOFTMMU) += common.o
>>> -obj-$(CONFIG_PCI) += pci.o
>>> +ifeq ($(CONFIG_PCI), y)
>>> +obj-y += pci.o
>>> +else
>>> +obj-y += pci-stub.o
>>> +endif
>>>  endif
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 148eb53..ed07814 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>>      switch (req) {
>>>      case VFIO_CHECK_EXTENSION:
>>>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>>> +        break;
>>>      case VFIO_EEH_PE_OP:
>>> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>>> +            error_report("vfio: cannot handle EEH event on group %d\n",
>>> +                         groupid);
>>> +            return -1;
>>> +        }
>>> +
>>>          break;
>>>      default:
>>>          /* Return an error on unknown requests */
>>> diff --git a/hw/vfio/pci-stub.c b/hw/vfio/pci-stub.c
>>> new file mode 100644
>>> index 0000000..7726a7a
>>> --- /dev/null
>>> +++ b/hw/vfio/pci-stub.c
>>> @@ -0,0 +1,17 @@
>>> +/*
>>> + * To include the file on !CONFIG_PCI
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>>> + * the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include <linux/vfio.h>
>>> +
>>> +#include "exec/memory.h"
>>> +#include "hw/vfio/vfio.h"
>>> +
>>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>> +                             struct vfio_eeh_pe_op *op)
>>> +{
>>> +    return -1;
>>> +}
>>
>>How about just a static inline in vfio.h?
>>
>
>Yes, It would make things simpler to have something as follows, but I didn't
>find there is a header file, which includes "#define CONFIG_PCI 1" or similar
>thing. Do you know the header file containing CONFIG_PCI, or alternative macro
>for the purpose?
>
>#ifdef CONFIG_PCI /* Or alternative macro */
>extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>				    struct vfio_eeh_pe_op *op);
>#else
>static ineline int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>                                            struct vfio_eeh_pe_op *op)
>{
>    return -1;
>}
>#endif /* CONFIG_PCI */
>

Alex.W, any idea on this? :-)

Thanks,
Gavin

>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index 6b80539..fca1edc 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -3319,6 +3319,44 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>>      vdev->req_enabled = false;
>>>  }
>>>  
>>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>> +                             struct vfio_eeh_pe_op *op)
>>> +{
>>> +    VFIOGroup *group;
>>> +    VFIODevice *vbasedev;
>>> +    VFIOPCIDevice *vdev;
>>> +
>>> +    group = vfio_get_group(groupid, as);
>>> +    if (!group) {
>>> +        error_report("vfio: group %d not found\n", groupid);
>>> +        return -1;
>>> +    }
>>> +
>>> +    switch (op->op) {
>>> +    case VFIO_EEH_PE_RESET_HOT:
>>> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>>> +        /*
>>> +         * The MSIx table will be cleaned out by reset. We need
>>> +         * disable it so that it can be reenabled properly. Also,
>>> +         * the cached MSIx table should be cleared as it's not
>>> +         * reflecting the contents in hardware.
>>> +         */
>>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>> +            if (msix_enabled(&vdev->pdev)) {
>>> +                vfio_disable_msix(vdev);
>>> +            }
>>> +
>>> +            msix_reset(&vdev->pdev);
>>> +        }
>>> +
>>> +        break;
>>> +    }
>>> +
>>> +    vfio_put_group(group);
>>> +    return 0;
>>> +}
>>> +
>>>  static int vfio_initfn(PCIDevice *pdev)
>>>  {
>>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>>> index 0b26cd8..e9a3abb 100644
>>> --- a/include/hw/vfio/vfio.h
>>> +++ b/include/hw/vfio/vfio.h
>>> @@ -5,5 +5,7 @@
>>>  
>>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>>                                  int req, void *param);
>>> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>> +                                    struct vfio_eeh_pe_op *op);
>>>  
>>>  #endif
>>
>>
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-20  6:27   ` [Qemu-devel] [Qemu-ppc] " Gavin Shan
@ 2015-03-23  5:06     ` David Gibson
  2015-03-23  5:25       ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2015-03-23  5:06 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3331 bytes --]

On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> should be cleared accordingly. Otherwise, we will run into another
> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> to be offlined exceptionally.
> >> 
> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> >> table could be restored properly after EEH PE reset.
> >> 
> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >> ---
> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> >>     error message for this function. Dropped vfio_put_group()
> >>     on NULL group
> >> ---
> >>  hw/vfio/Makefile.objs  |  6 +++++-
> >>  hw/vfio/common.c       |  7 +++++++
> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
> >>  include/hw/vfio/vfio.h |  2 ++
> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> >>  create mode 100644 hw/vfio/pci-stub.c
> >> 
> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> index e31f30e..1b8a065 100644
> >> --- a/hw/vfio/Makefile.objs
> >> +++ b/hw/vfio/Makefile.objs
> >> @@ -1,4 +1,8 @@
> >>  ifeq ($(CONFIG_LINUX), y)
> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> -obj-$(CONFIG_PCI) += pci.o
> >> +ifeq ($(CONFIG_PCI), y)
> >> +obj-y += pci.o
> >> +else
> >> +obj-y += pci-stub.o
> >> +endif
> >>  endif
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 148eb53..ed07814 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> >>      switch (req) {
> >>      case VFIO_CHECK_EXTENSION:
> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> >> +        break;
> >>      case VFIO_EEH_PE_OP:
> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
> >
> >I really dislike the idea of having an arbitrarily complex side effect
> >from a function whose name suggest's it's just a trivial wrapper
> >around the ioctl().
> >
> 
> Ok. I guess you would like putting the complex in the callers of
> vfio_container_ioctl().

Well.. maybe.  I'd also be happy if helper functions were implemeneted
which both called the ioctl() and did the other necessary pieces.
They should just be called something that indicates their full
function, not a name which suggests they're just an ioctl wrapper.

> All those callers are implemtend in sPAPR
> platform (spapr_pci_vfio.c). I didn't put the logic there because:
> 
> - VFIOPCIDevice is invisible to sPAPR platform.

It seems kind of silly that VFIOPCIDevice isn't visible to
spapr_pci_vfio.c.

> - We can't tell one particular PCI device attached to sPAPRPHBState
>   is the basement of VFIOPCIDevice, or emulated PCI device.

I'm not sure what you mean by that.


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-23  5:06     ` David Gibson
@ 2015-03-23  5:25       ` Gavin Shan
  2015-03-24  5:41         ` David Gibson
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-23  5:25 UTC (permalink / raw)
  To: David Gibson; +Cc: alex.williamson, qemu-ppc, Gavin Shan, qemu-devel

On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
>On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
>> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> should be cleared accordingly. Otherwise, we will run into another
>> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> to be offlined exceptionally.
>> >> 
>> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> >> table could be restored properly after EEH PE reset.
>> >> 
>> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >> ---
>> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>> >>     error message for this function. Dropped vfio_put_group()
>> >>     on NULL group
>> >> ---
>> >>  hw/vfio/Makefile.objs  |  6 +++++-
>> >>  hw/vfio/common.c       |  7 +++++++
>> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>> >>  include/hw/vfio/vfio.h |  2 ++
>> >>  5 files changed, 69 insertions(+), 1 deletion(-)
>> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> 
>> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> index e31f30e..1b8a065 100644
>> >> --- a/hw/vfio/Makefile.objs
>> >> +++ b/hw/vfio/Makefile.objs
>> >> @@ -1,4 +1,8 @@
>> >>  ifeq ($(CONFIG_LINUX), y)
>> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> -obj-$(CONFIG_PCI) += pci.o
>> >> +ifeq ($(CONFIG_PCI), y)
>> >> +obj-y += pci.o
>> >> +else
>> >> +obj-y += pci-stub.o
>> >> +endif
>> >>  endif
>> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> >> index 148eb53..ed07814 100644
>> >> --- a/hw/vfio/common.c
>> >> +++ b/hw/vfio/common.c
>> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>> >>      switch (req) {
>> >>      case VFIO_CHECK_EXTENSION:
>> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> >> +        break;
>> >>      case VFIO_EEH_PE_OP:
>> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> >
>> >I really dislike the idea of having an arbitrarily complex side effect
>> >from a function whose name suggest's it's just a trivial wrapper
>> >around the ioctl().
>> >
>> 
>> Ok. I guess you would like putting the complex in the callers of
>> vfio_container_ioctl().
>
>Well.. maybe.  I'd also be happy if helper functions were implemeneted
>which both called the ioctl() and did the other necessary pieces.
>They should just be called something that indicates their full
>function, not a name which suggests they're just an ioctl wrapper.
>

Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
at giving a good function name :)

Thanks,
Gavin

>> All those callers are implemtend in sPAPR
>> platform (spapr_pci_vfio.c). I didn't put the logic there because:
>> 
>> - VFIOPCIDevice is invisible to sPAPR platform.
>
>It seems kind of silly that VFIOPCIDevice isn't visible to
>spapr_pci_vfio.c.
>
>> - We can't tell one particular PCI device attached to sPAPRPHBState
>>   is the basement of VFIOPCIDevice, or emulated PCI device.
>
>I'm not sure what you mean by that.
>
>
>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-23  5:25       ` Gavin Shan
@ 2015-03-24  5:41         ` David Gibson
  2015-03-24  6:24           ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2015-03-24  5:41 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3898 bytes --]

On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> >> should be cleared accordingly. Otherwise, we will run into another
> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> >> to be offlined exceptionally.
> >> >> 
> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> >> >> table could be restored properly after EEH PE reset.
> >> >> 
> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >> >> ---
> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> >> >>     error message for this function. Dropped vfio_put_group()
> >> >>     on NULL group
> >> >> ---
> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
> >> >>  hw/vfio/common.c       |  7 +++++++
> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
> >> >>  include/hw/vfio/vfio.h |  2 ++
> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> >> >>  create mode 100644 hw/vfio/pci-stub.c
> >> >> 
> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> >> index e31f30e..1b8a065 100644
> >> >> --- a/hw/vfio/Makefile.objs
> >> >> +++ b/hw/vfio/Makefile.objs
> >> >> @@ -1,4 +1,8 @@
> >> >>  ifeq ($(CONFIG_LINUX), y)
> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> >> -obj-$(CONFIG_PCI) += pci.o
> >> >> +ifeq ($(CONFIG_PCI), y)
> >> >> +obj-y += pci.o
> >> >> +else
> >> >> +obj-y += pci-stub.o
> >> >> +endif
> >> >>  endif
> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> >> index 148eb53..ed07814 100644
> >> >> --- a/hw/vfio/common.c
> >> >> +++ b/hw/vfio/common.c
> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> >> >>      switch (req) {
> >> >>      case VFIO_CHECK_EXTENSION:
> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> >> >> +        break;
> >> >>      case VFIO_EEH_PE_OP:
> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
> >> >
> >> >I really dislike the idea of having an arbitrarily complex side effect
> >> >from a function whose name suggest's it's just a trivial wrapper
> >> >around the ioctl().
> >> >
> >> 
> >> Ok. I guess you would like putting the complex in the callers of
> >> vfio_container_ioctl().
> >
> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
> >which both called the ioctl() and did the other necessary pieces.
> >They should just be called something that indicates their full
> >function, not a name which suggests they're just an ioctl wrapper.
> >
> 
> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
> at giving a good function name :)

Well, I don't think your wrapper should be multiplexed.  The multiplex
works for the simple ioctl() wrapper, because there really is nothing
that varies apart from the exact ioctl number called.

But now that you have different operations here, I think you want
wrappers for each one - each one will call the ioctl(), then do the
specific extra steps necessary for that operation.  So
vfio_container_event() will go away as well, split into various other
functions.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-24  5:41         ` David Gibson
@ 2015-03-24  6:24           ` Gavin Shan
  2015-03-24  6:54             ` David Gibson
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-24  6:24 UTC (permalink / raw)
  To: David Gibson; +Cc: alex.williamson, qemu-ppc, Gavin Shan, qemu-devel

On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
>On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
>> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
>> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
>> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> >> should be cleared accordingly. Otherwise, we will run into another
>> >> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> >> to be offlined exceptionally.
>> >> >> 
>> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> >> >> table could be restored properly after EEH PE reset.
>> >> >> 
>> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >> >> ---
>> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>> >> >>     error message for this function. Dropped vfio_put_group()
>> >> >>     on NULL group
>> >> >> ---
>> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
>> >> >>  hw/vfio/common.c       |  7 +++++++
>> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>> >> >>  include/hw/vfio/vfio.h |  2 ++
>> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
>> >> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> >> 
>> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> >> index e31f30e..1b8a065 100644
>> >> >> --- a/hw/vfio/Makefile.objs
>> >> >> +++ b/hw/vfio/Makefile.objs
>> >> >> @@ -1,4 +1,8 @@
>> >> >>  ifeq ($(CONFIG_LINUX), y)
>> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> >> -obj-$(CONFIG_PCI) += pci.o
>> >> >> +ifeq ($(CONFIG_PCI), y)
>> >> >> +obj-y += pci.o
>> >> >> +else
>> >> >> +obj-y += pci-stub.o
>> >> >> +endif
>> >> >>  endif
>> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> >> >> index 148eb53..ed07814 100644
>> >> >> --- a/hw/vfio/common.c
>> >> >> +++ b/hw/vfio/common.c
>> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>> >> >>      switch (req) {
>> >> >>      case VFIO_CHECK_EXTENSION:
>> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> >> >> +        break;
>> >> >>      case VFIO_EEH_PE_OP:
>> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> >> >
>> >> >I really dislike the idea of having an arbitrarily complex side effect
>> >> >from a function whose name suggest's it's just a trivial wrapper
>> >> >around the ioctl().
>> >> >
>> >> 
>> >> Ok. I guess you would like putting the complex in the callers of
>> >> vfio_container_ioctl().
>> >
>> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
>> >which both called the ioctl() and did the other necessary pieces.
>> >They should just be called something that indicates their full
>> >function, not a name which suggests they're just an ioctl wrapper.
>> >
>> 
>> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
>> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
>> at giving a good function name :)
>
>Well, I don't think your wrapper should be multiplexed.  The multiplex
>works for the simple ioctl() wrapper, because there really is nothing
>that varies apart from the exact ioctl number called.
>
>But now that you have different operations here, I think you want
>wrappers for each one - each one will call the ioctl(), then do the
>specific extra steps necessary for that operation.  So
>vfio_container_event() will go away as well, split into various other
>functions.
>

It wouldn't a good idea if I understand your proposal correctly. Currnetly,
the global function vfio_container_ioctl() can be called from sPAPR platform
for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
which means the function isn't called for EEH only. Other sPAPR TCE container
ioctl commands are also routed by this function. There will be lots if having
one global function for each ioctl commands, which just improve the cost to
maintain the code.

Alternatively, we might expose another function vfio_container_eeh_ioctl(),
which calls vfio_container_ioctl() after doing what we did in vfio_container_event()
if necessary.

Thanks,
Gavin

>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-24  6:24           ` Gavin Shan
@ 2015-03-24  6:54             ` David Gibson
  2015-03-24 12:53               ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2015-03-24  6:54 UTC (permalink / raw)
  To: Gavin Shan; +Cc: alex.williamson, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 5269 bytes --]

On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
> On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
> >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
> >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
> >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> >> >> should be cleared accordingly. Otherwise, we will run into another
> >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> >> >> to be offlined exceptionally.
> >> >> >> 
> >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> >> >> >> table could be restored properly after EEH PE reset.
> >> >> >> 
> >> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >> >> >> ---
> >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> >> >> >>     error message for this function. Dropped vfio_put_group()
> >> >> >>     on NULL group
> >> >> >> ---
> >> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
> >> >> >>  hw/vfio/common.c       |  7 +++++++
> >> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
> >> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
> >> >> >>  include/hw/vfio/vfio.h |  2 ++
> >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> >> >> >>  create mode 100644 hw/vfio/pci-stub.c
> >> >> >> 
> >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> >> >> index e31f30e..1b8a065 100644
> >> >> >> --- a/hw/vfio/Makefile.objs
> >> >> >> +++ b/hw/vfio/Makefile.objs
> >> >> >> @@ -1,4 +1,8 @@
> >> >> >>  ifeq ($(CONFIG_LINUX), y)
> >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> >> >> -obj-$(CONFIG_PCI) += pci.o
> >> >> >> +ifeq ($(CONFIG_PCI), y)
> >> >> >> +obj-y += pci.o
> >> >> >> +else
> >> >> >> +obj-y += pci-stub.o
> >> >> >> +endif
> >> >> >>  endif
> >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> >> >> index 148eb53..ed07814 100644
> >> >> >> --- a/hw/vfio/common.c
> >> >> >> +++ b/hw/vfio/common.c
> >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> >> >> >>      switch (req) {
> >> >> >>      case VFIO_CHECK_EXTENSION:
> >> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> >> >> >> +        break;
> >> >> >>      case VFIO_EEH_PE_OP:
> >> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
> >> >> >
> >> >> >I really dislike the idea of having an arbitrarily complex side effect
> >> >> >from a function whose name suggest's it's just a trivial wrapper
> >> >> >around the ioctl().
> >> >> >
> >> >> 
> >> >> Ok. I guess you would like putting the complex in the callers of
> >> >> vfio_container_ioctl().
> >> >
> >> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
> >> >which both called the ioctl() and did the other necessary pieces.
> >> >They should just be called something that indicates their full
> >> >function, not a name which suggests they're just an ioctl wrapper.
> >> >
> >> 
> >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
> >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
> >> at giving a good function name :)
> >
> >Well, I don't think your wrapper should be multiplexed.  The multiplex
> >works for the simple ioctl() wrapper, because there really is nothing
> >that varies apart from the exact ioctl number called.
> >
> >But now that you have different operations here, I think you want
> >wrappers for each one - each one will call the ioctl(), then do the
> >specific extra steps necessary for that operation.  So
> >vfio_container_event() will go away as well, split into various other
> >functions.
> >
> 
> It wouldn't a good idea if I understand your proposal correctly. Currnetly,
> the global function vfio_container_ioctl() can be called from sPAPR platform
> for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
> which means the function isn't called for EEH only. Other sPAPR TCE container
> ioctl commands are also routed by this function. There will be lots if having
> one global function for each ioctl commands, which just improve the cost to
> maintain the code.

I don't really follow your objection.  I'm only suggesting separate
wrappers for things which require extra actions currently implemented
in vfio_container_event().  Things which only ned the plain ioctl()
can still use the simple vfio_container_ioctl() wrapper.

> Alternatively, we might expose another function vfio_container_eeh_ioctl(),
> which calls vfio_container_ioctl() after doing what we did in vfio_container_event()
> if necessary.
> 
> Thanks,
> Gavin
> 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-24  6:54             ` David Gibson
@ 2015-03-24 12:53               ` Alex Williamson
  2015-03-26  0:53                 ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2015-03-24 12:53 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, Gavin Shan, qemu-devel

On Tue, 2015-03-24 at 17:54 +1100, David Gibson wrote:
> On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
> > On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
> > >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
> > >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
> > >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> > >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> > >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> > >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> > >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> > >> >> >> should be cleared accordingly. Otherwise, we will run into another
> > >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> > >> >> >> to be offlined exceptionally.
> > >> >> >> 
> > >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> > >> >> >> table could be restored properly after EEH PE reset.
> > >> >> >> 
> > >> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> > >> >> >> ---
> > >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> > >> >> >>     error message for this function. Dropped vfio_put_group()
> > >> >> >>     on NULL group
> > >> >> >> ---
> > >> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
> > >> >> >>  hw/vfio/common.c       |  7 +++++++
> > >> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
> > >> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
> > >> >> >>  include/hw/vfio/vfio.h |  2 ++
> > >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> > >> >> >>  create mode 100644 hw/vfio/pci-stub.c
> > >> >> >> 
> > >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > >> >> >> index e31f30e..1b8a065 100644
> > >> >> >> --- a/hw/vfio/Makefile.objs
> > >> >> >> +++ b/hw/vfio/Makefile.objs
> > >> >> >> @@ -1,4 +1,8 @@
> > >> >> >>  ifeq ($(CONFIG_LINUX), y)
> > >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> > >> >> >> -obj-$(CONFIG_PCI) += pci.o
> > >> >> >> +ifeq ($(CONFIG_PCI), y)
> > >> >> >> +obj-y += pci.o
> > >> >> >> +else
> > >> >> >> +obj-y += pci-stub.o
> > >> >> >> +endif
> > >> >> >>  endif
> > >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > >> >> >> index 148eb53..ed07814 100644
> > >> >> >> --- a/hw/vfio/common.c
> > >> >> >> +++ b/hw/vfio/common.c
> > >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > >> >> >>      switch (req) {
> > >> >> >>      case VFIO_CHECK_EXTENSION:
> > >> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> > >> >> >> +        break;
> > >> >> >>      case VFIO_EEH_PE_OP:
> > >> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
> > >> >> >
> > >> >> >I really dislike the idea of having an arbitrarily complex side effect
> > >> >> >from a function whose name suggest's it's just a trivial wrapper
> > >> >> >around the ioctl().
> > >> >> >
> > >> >> 
> > >> >> Ok. I guess you would like putting the complex in the callers of
> > >> >> vfio_container_ioctl().
> > >> >
> > >> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
> > >> >which both called the ioctl() and did the other necessary pieces.
> > >> >They should just be called something that indicates their full
> > >> >function, not a name which suggests they're just an ioctl wrapper.
> > >> >
> > >> 
> > >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
> > >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
> > >> at giving a good function name :)
> > >
> > >Well, I don't think your wrapper should be multiplexed.  The multiplex
> > >works for the simple ioctl() wrapper, because there really is nothing
> > >that varies apart from the exact ioctl number called.
> > >
> > >But now that you have different operations here, I think you want
> > >wrappers for each one - each one will call the ioctl(), then do the
> > >specific extra steps necessary for that operation.  So
> > >vfio_container_event() will go away as well, split into various other
> > >functions.
> > >
> > 
> > It wouldn't a good idea if I understand your proposal correctly. Currnetly,
> > the global function vfio_container_ioctl() can be called from sPAPR platform
> > for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
> > which means the function isn't called for EEH only. Other sPAPR TCE container
> > ioctl commands are also routed by this function. There will be lots if having
> > one global function for each ioctl commands, which just improve the cost to
> > maintain the code.
> 
> I don't really follow your objection.  I'm only suggesting separate
> wrappers for things which require extra actions currently implemented
> in vfio_container_event().  Things which only ned the plain ioctl()
> can still use the simple vfio_container_ioctl() wrapper.

vfio_container_ioctl() also filters to a limited set of ioctls, it
clearly does not allow any ioctl.

> > Alternatively, we might expose another function vfio_container_eeh_ioctl(),
> > which calls vfio_container_ioctl() after doing what we did in vfio_container_event()
> > if necessary.
> > 
> > Thanks,
> > Gavin
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-24 12:53               ` Alex Williamson
@ 2015-03-26  0:53                 ` Gavin Shan
  2015-03-26  1:10                   ` David Gibson
  0 siblings, 1 reply; 21+ messages in thread
From: Gavin Shan @ 2015-03-26  0:53 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, qemu-ppc, Gavin Shan, David Gibson

On Tue, Mar 24, 2015 at 06:53:29AM -0600, Alex Williamson wrote:
>On Tue, 2015-03-24 at 17:54 +1100, David Gibson wrote:
>> On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
>> > On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
>> > >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
>> > >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
>> > >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
>> > >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>> > >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> > >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> > >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> > >> >> >> should be cleared accordingly. Otherwise, we will run into another
>> > >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
>> > >> >> >> to be offlined exceptionally.
>> > >> >> >> 
>> > >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> > >> >> >> table could be restored properly after EEH PE reset.
>> > >> >> >> 
>> > >> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> > >> >> >> ---
>> > >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>> > >> >> >>     error message for this function. Dropped vfio_put_group()
>> > >> >> >>     on NULL group
>> > >> >> >> ---
>> > >> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
>> > >> >> >>  hw/vfio/common.c       |  7 +++++++
>> > >> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>> > >> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>> > >> >> >>  include/hw/vfio/vfio.h |  2 ++
>> > >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
>> > >> >> >>  create mode 100644 hw/vfio/pci-stub.c
>> > >> >> >> 
>> > >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> > >> >> >> index e31f30e..1b8a065 100644
>> > >> >> >> --- a/hw/vfio/Makefile.objs
>> > >> >> >> +++ b/hw/vfio/Makefile.objs
>> > >> >> >> @@ -1,4 +1,8 @@
>> > >> >> >>  ifeq ($(CONFIG_LINUX), y)
>> > >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> > >> >> >> -obj-$(CONFIG_PCI) += pci.o
>> > >> >> >> +ifeq ($(CONFIG_PCI), y)
>> > >> >> >> +obj-y += pci.o
>> > >> >> >> +else
>> > >> >> >> +obj-y += pci-stub.o
>> > >> >> >> +endif
>> > >> >> >>  endif
>> > >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> > >> >> >> index 148eb53..ed07814 100644
>> > >> >> >> --- a/hw/vfio/common.c
>> > >> >> >> +++ b/hw/vfio/common.c
>> > >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>> > >> >> >>      switch (req) {
>> > >> >> >>      case VFIO_CHECK_EXTENSION:
>> > >> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> > >> >> >> +        break;
>> > >> >> >>      case VFIO_EEH_PE_OP:
>> > >> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> > >> >> >
>> > >> >> >I really dislike the idea of having an arbitrarily complex side effect
>> > >> >> >from a function whose name suggest's it's just a trivial wrapper
>> > >> >> >around the ioctl().
>> > >> >> >
>> > >> >> 
>> > >> >> Ok. I guess you would like putting the complex in the callers of
>> > >> >> vfio_container_ioctl().
>> > >> >
>> > >> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
>> > >> >which both called the ioctl() and did the other necessary pieces.
>> > >> >They should just be called something that indicates their full
>> > >> >function, not a name which suggests they're just an ioctl wrapper.
>> > >> >
>> > >> 
>> > >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
>> > >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
>> > >> at giving a good function name :)
>> > >
>> > >Well, I don't think your wrapper should be multiplexed.  The multiplex
>> > >works for the simple ioctl() wrapper, because there really is nothing
>> > >that varies apart from the exact ioctl number called.
>> > >
>> > >But now that you have different operations here, I think you want
>> > >wrappers for each one - each one will call the ioctl(), then do the
>> > >specific extra steps necessary for that operation.  So
>> > >vfio_container_event() will go away as well, split into various other
>> > >functions.
>> > >
>> > 
>> > It wouldn't a good idea if I understand your proposal correctly. Currnetly,
>> > the global function vfio_container_ioctl() can be called from sPAPR platform
>> > for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
>> > which means the function isn't called for EEH only. Other sPAPR TCE container
>> > ioctl commands are also routed by this function. There will be lots if having
>> > one global function for each ioctl commands, which just improve the cost to
>> > maintain the code.
>> 
>> I don't really follow your objection.  I'm only suggesting separate
>> wrappers for things which require extra actions currently implemented
>> in vfio_container_event().  Things which only ned the plain ioctl()
>> can still use the simple vfio_container_ioctl() wrapper.
>
>vfio_container_ioctl() also filters to a limited set of ioctls, it
>clearly does not allow any ioctl.
>

Ok. I think your guys expect something like follows. Note that the following
vfio_container_eeh_ioctl() will accept a limited set of EEH operations, similar
to what's doing in vfio_contain_ioctl() to the ioctl commands:

If you agree to have the changes, I'll put another patch on top of this one
to replace vfio_container_ioctl() in spapr_pci_vfio.c with vfio_container_eeh_ioctl()
for EEH cases.

int vfio_container_eeh_ioctl(AddressSpace *as, int32_t groupid,
                             struct vfio_eeh_pe_op *op)
{
    switch (op->op) {
    case VFIO_EEH_PE_RESET_HOT:
    case VFIO_EEH_PE_RESET_FUNDAMENTAL: {
        VFIOGroup *group;
        VFIODevice *vbasedev;
        VFIOPCIDevice *vdev;

        /*
         * The MSIx table will be cleaned out by reset. We need
         * disable it so that it can be reenabled properly. Also,
         * the cached MSIx table should be cleared as it's not
         * reflecting the contents in hardware.
         */
        group = vfio_get_group(groupid, as);
        if (!group) {
            error_report("vfio: group %d not found\n", groupid);
            return -1;
        }

        QLIST_FOREACH(vbasedev, &group->device_list, next) {
            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
            if (msix_enabled(&vdev->pdev)) {
                vfio_disable_msix(vdev);
            }

            msix_reset(&vdev->pdev);
        }

        vfio_put_group(group);

        break;
    }
    case VFIO_EEH_PE_DISABLE:
    case VFIO_EEH_PE_ENABLE:
    case VFIO_EEH_PE_UNFREEZE_IO:
    case VFIO_EEH_PE_UNFREEZE_DMA:
    case VFIO_EEH_PE_GET_STATE:
    case VFIO_EEH_PE_RESET_DEACTIVATE:
    case VFIO_EEH_PE_CONFIGURE:
        break;
    default:
        error_report("vfio: unsupported EEH operation %X\n", op->op);
        return -1;
    }

    return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, op);
}

Thanks,
Gavin

>> > Alternatively, we might expose another function vfio_container_eeh_ioctl(),
>> > which calls vfio_container_ioctl() after doing what we did in vfio_container_event()
>> > if necessary.
>> > 
>> > Thanks,
>> > Gavin
>> > 
>> > 
>> > 
>> 
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-26  0:53                 ` Gavin Shan
@ 2015-03-26  1:10                   ` David Gibson
  2015-03-26  1:30                     ` Gavin Shan
  0 siblings, 1 reply; 21+ messages in thread
From: David Gibson @ 2015-03-26  1:10 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Alex Williamson, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 9270 bytes --]

On Thu, Mar 26, 2015 at 11:53:48AM +1100, Gavin Shan wrote:
> On Tue, Mar 24, 2015 at 06:53:29AM -0600, Alex Williamson wrote:
> >On Tue, 2015-03-24 at 17:54 +1100, David Gibson wrote:
> >> On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
> >> > On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
> >> > >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
> >> > >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
> >> > >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
> >> > >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
> >> > >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
> >> > >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
> >> > >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
> >> > >> >> >> should be cleared accordingly. Otherwise, we will run into another
> >> > >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
> >> > >> >> >> to be offlined exceptionally.
> >> > >> >> >> 
> >> > >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
> >> > >> >> >> table could be restored properly after EEH PE reset.
> >> > >> >> >> 
> >> > >> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >> > >> >> >> ---
> >> > >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
> >> > >> >> >>     error message for this function. Dropped vfio_put_group()
> >> > >> >> >>     on NULL group
> >> > >> >> >> ---
> >> > >> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
> >> > >> >> >>  hw/vfio/common.c       |  7 +++++++
> >> > >> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
> >> > >> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
> >> > >> >> >>  include/hw/vfio/vfio.h |  2 ++
> >> > >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
> >> > >> >> >>  create mode 100644 hw/vfio/pci-stub.c
> >> > >> >> >> 
> >> > >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> >> > >> >> >> index e31f30e..1b8a065 100644
> >> > >> >> >> --- a/hw/vfio/Makefile.objs
> >> > >> >> >> +++ b/hw/vfio/Makefile.objs
> >> > >> >> >> @@ -1,4 +1,8 @@
> >> > >> >> >>  ifeq ($(CONFIG_LINUX), y)
> >> > >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
> >> > >> >> >> -obj-$(CONFIG_PCI) += pci.o
> >> > >> >> >> +ifeq ($(CONFIG_PCI), y)
> >> > >> >> >> +obj-y += pci.o
> >> > >> >> >> +else
> >> > >> >> >> +obj-y += pci-stub.o
> >> > >> >> >> +endif
> >> > >> >> >>  endif
> >> > >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> > >> >> >> index 148eb53..ed07814 100644
> >> > >> >> >> --- a/hw/vfio/common.c
> >> > >> >> >> +++ b/hw/vfio/common.c
> >> > >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> >> > >> >> >>      switch (req) {
> >> > >> >> >>      case VFIO_CHECK_EXTENSION:
> >> > >> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> >> > >> >> >> +        break;
> >> > >> >> >>      case VFIO_EEH_PE_OP:
> >> > >> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
> >> > >> >> >
> >> > >> >> >I really dislike the idea of having an arbitrarily complex side effect
> >> > >> >> >from a function whose name suggest's it's just a trivial wrapper
> >> > >> >> >around the ioctl().
> >> > >> >> >
> >> > >> >> 
> >> > >> >> Ok. I guess you would like putting the complex in the callers of
> >> > >> >> vfio_container_ioctl().
> >> > >> >
> >> > >> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
> >> > >> >which both called the ioctl() and did the other necessary pieces.
> >> > >> >They should just be called something that indicates their full
> >> > >> >function, not a name which suggests they're just an ioctl wrapper.
> >> > >> >
> >> > >> 
> >> > >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
> >> > >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
> >> > >> at giving a good function name :)
> >> > >
> >> > >Well, I don't think your wrapper should be multiplexed.  The multiplex
> >> > >works for the simple ioctl() wrapper, because there really is nothing
> >> > >that varies apart from the exact ioctl number called.
> >> > >
> >> > >But now that you have different operations here, I think you want
> >> > >wrappers for each one - each one will call the ioctl(), then do the
> >> > >specific extra steps necessary for that operation.  So
> >> > >vfio_container_event() will go away as well, split into various other
> >> > >functions.
> >> > >
> >> > 
> >> > It wouldn't a good idea if I understand your proposal correctly. Currnetly,
> >> > the global function vfio_container_ioctl() can be called from sPAPR platform
> >> > for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
> >> > which means the function isn't called for EEH only. Other sPAPR TCE container
> >> > ioctl commands are also routed by this function. There will be lots if having
> >> > one global function for each ioctl commands, which just improve the cost to
> >> > maintain the code.
> >> 
> >> I don't really follow your objection.  I'm only suggesting separate
> >> wrappers for things which require extra actions currently implemented
> >> in vfio_container_event().  Things which only ned the plain ioctl()
> >> can still use the simple vfio_container_ioctl() wrapper.
> >
> >vfio_container_ioctl() also filters to a limited set of ioctls, it
> >clearly does not allow any ioctl.
> >
> 
> Ok. I think your guys expect something like follows. Note that the following
> vfio_container_eeh_ioctl() will accept a limited set of EEH operations, similar
> to what's doing in vfio_contain_ioctl() to the ioctl commands:
> 
> If you agree to have the changes, I'll put another patch on top of this one
> to replace vfio_container_ioctl() in spapr_pci_vfio.c with vfio_container_eeh_ioctl()
> for EEH cases.
> 
> int vfio_container_eeh_ioctl(AddressSpace *as, int32_t groupid,
>                              struct vfio_eeh_pe_op *op)
> {
>     switch (op->op) {
>     case VFIO_EEH_PE_RESET_HOT:
>     case VFIO_EEH_PE_RESET_FUNDAMENTAL: {
>         VFIOGroup *group;
>         VFIODevice *vbasedev;
>         VFIOPCIDevice *vdev;
> 
>         /*
>          * The MSIx table will be cleaned out by reset. We need
>          * disable it so that it can be reenabled properly. Also,
>          * the cached MSIx table should be cleared as it's not
>          * reflecting the contents in hardware.
>          */
>         group = vfio_get_group(groupid, as);
>         if (!group) {
>             error_report("vfio: group %d not found\n", groupid);
>             return -1;
>         }
> 
>         QLIST_FOREACH(vbasedev, &group->device_list, next) {
>             vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>             if (msix_enabled(&vdev->pdev)) {
>                 vfio_disable_msix(vdev);
>             }
> 
>             msix_reset(&vdev->pdev);
>         }
> 
>         vfio_put_group(group);
> 
>         break;
>     }
>     case VFIO_EEH_PE_DISABLE:
>     case VFIO_EEH_PE_ENABLE:
>     case VFIO_EEH_PE_UNFREEZE_IO:
>     case VFIO_EEH_PE_UNFREEZE_DMA:
>     case VFIO_EEH_PE_GET_STATE:
>     case VFIO_EEH_PE_RESET_DEACTIVATE:
>     case VFIO_EEH_PE_CONFIGURE:
>         break;
>     default:
>         error_report("vfio: unsupported EEH operation %X\n", op->op);
>         return -1;
>     }
> 
>     return vfio_container_ioctl(as, groupid, VFIO_EEH_PE_OP, op);
> }


No, extra operation specific logic inside the ioctl wrapper is exactly
what I want to avoid.  Instead I want to see
vfio_container_eeh_ioctl() remain as it is now - doing nothing but
verifying the ioctl() number, then passing the arguments on to
ioctl().

What I'm expecting is then to add a new functions, along the lines of:

int vfio_eeh_pe_reset(...)
{
    VFIOGroup *group;
    VFIODevice *vbasedev;
    VFIOPCIDevice *vdev;

    /*
     * The MSIx table will be cleaned out by reset. We need
     * disable it so that it can be reenabled properly. Also,
     * the cached MSIx table should be cleared as it's not
     * reflecting the contents in hardware.
     */
    group = vfio_get_group(groupid, as);
    if (!group) {
        error_report("vfio: group %d not found\n", groupid);
        return -1;
    }

    QLIST_FOREACH(vbasedev, &group->device_list, next) {
        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
        if (msix_enabled(&vdev->pdev)) {
            vfio_disable_msix(vdev);
        }

        msix_reset(&vdev->pdev);
    }

    vfio_put_group(group);

    return vfio_eeh_container_ioctl(as, groupid,
                                    VFIO_EEH_PE_RESET_FUNDAMENTAL, op);
}

I this function can build the op structure itself from sensible
arguments, then that's even better.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-26  1:10                   ` David Gibson
@ 2015-03-26  1:30                     ` Gavin Shan
  0 siblings, 0 replies; 21+ messages in thread
From: Gavin Shan @ 2015-03-26  1:30 UTC (permalink / raw)
  To: David Gibson; +Cc: Alex Williamson, qemu-ppc, Gavin Shan, qemu-devel

On Thu, Mar 26, 2015 at 12:10:52PM +1100, David Gibson wrote:
>On Thu, Mar 26, 2015 at 11:53:48AM +1100, Gavin Shan wrote:
>> On Tue, Mar 24, 2015 at 06:53:29AM -0600, Alex Williamson wrote:
>> >On Tue, 2015-03-24 at 17:54 +1100, David Gibson wrote:
>> >> On Tue, Mar 24, 2015 at 05:24:55PM +1100, Gavin Shan wrote:
>> >> > On Tue, Mar 24, 2015 at 04:41:21PM +1100, David Gibson wrote:
>> >> > >On Mon, Mar 23, 2015 at 04:25:10PM +1100, Gavin Shan wrote:
>> >> > >> On Mon, Mar 23, 2015 at 04:06:56PM +1100, David Gibson wrote:
>> >> > >> >On Fri, Mar 20, 2015 at 05:27:29PM +1100, Gavin Shan wrote:
>> >> > >> >> On Fri, Mar 20, 2015 at 05:04:01PM +1100, David Gibson wrote:
>> >> > >> >> >On Tue, Mar 17, 2015 at 03:31:24AM +1100, Gavin Shan wrote:
>> >> > >> >> >> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> >> > >> >> >> reset. However, we still hold the stale MSIx entries in QEMU, which
>> >> > >> >> >> should be cleared accordingly. Otherwise, we will run into another
>> >> > >> >> >> (recursive) EEH error and the PCI devices contained in the PE have
>> >> > >> >> >> to be offlined exceptionally.
>> >> > >> >> >> 
>> >> > >> >> >> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> >> > >> >> >> table could be restored properly after EEH PE reset.
>> >> > >> >> >> 
>> >> > >> >> >> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> >> > >> >> >> ---
>> >> > >> >> >> v2: vfio_container_eeh_event() stub for !CONFIG_PCI and separate
>> >> > >> >> >>     error message for this function. Dropped vfio_put_group()
>> >> > >> >> >>     on NULL group
>> >> > >> >> >> ---
>> >> > >> >> >>  hw/vfio/Makefile.objs  |  6 +++++-
>> >> > >> >> >>  hw/vfio/common.c       |  7 +++++++
>> >> > >> >> >>  hw/vfio/pci-stub.c     | 17 +++++++++++++++++
>> >> > >> >> >>  hw/vfio/pci.c          | 38 ++++++++++++++++++++++++++++++++++++++
>> >> > >> >> >>  include/hw/vfio/vfio.h |  2 ++
>> >> > >> >> >>  5 files changed, 69 insertions(+), 1 deletion(-)
>> >> > >> >> >>  create mode 100644 hw/vfio/pci-stub.c
>> >> > >> >> >> 
>> >> > >> >> >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
>> >> > >> >> >> index e31f30e..1b8a065 100644
>> >> > >> >> >> --- a/hw/vfio/Makefile.objs
>> >> > >> >> >> +++ b/hw/vfio/Makefile.objs
>> >> > >> >> >> @@ -1,4 +1,8 @@
>> >> > >> >> >>  ifeq ($(CONFIG_LINUX), y)
>> >> > >> >> >>  obj-$(CONFIG_SOFTMMU) += common.o
>> >> > >> >> >> -obj-$(CONFIG_PCI) += pci.o
>> >> > >> >> >> +ifeq ($(CONFIG_PCI), y)
>> >> > >> >> >> +obj-y += pci.o
>> >> > >> >> >> +else
>> >> > >> >> >> +obj-y += pci-stub.o
>> >> > >> >> >> +endif
>> >> > >> >> >>  endif
>> >> > >> >> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> >> > >> >> >> index 148eb53..ed07814 100644
>> >> > >> >> >> --- a/hw/vfio/common.c
>> >> > >> >> >> +++ b/hw/vfio/common.c
>> >> > >> >> >> @@ -949,7 +949,14 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>> >> > >> >> >>      switch (req) {
>> >> > >> >> >>      case VFIO_CHECK_EXTENSION:
>> >> > >> >> >>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> >> > >> >> >> +        break;
>> >> > >> >> >>      case VFIO_EEH_PE_OP:
>> >> > >> >> >> +        if (vfio_container_eeh_event(as, groupid, param) != 0) {
>> >> > >> >> >
>> >> > >> >> >I really dislike the idea of having an arbitrarily complex side effect
>> >> > >> >> >from a function whose name suggest's it's just a trivial wrapper
>> >> > >> >> >around the ioctl().
>> >> > >> >> >
>> >> > >> >> 
>> >> > >> >> Ok. I guess you would like putting the complex in the callers of
>> >> > >> >> vfio_container_ioctl().
>> >> > >> >
>> >> > >> >Well.. maybe.  I'd also be happy if helper functions were implemeneted
>> >> > >> >which both called the ioctl() and did the other necessary pieces.
>> >> > >> >They should just be called something that indicates their full
>> >> > >> >function, not a name which suggests they're just an ioctl wrapper.
>> >> > >> >
>> >> > >> 
>> >> > >> Indeed, vfio_container_ioctl() isn't indicating what the function is doing.
>> >> > >> How about renaming it to vfio_container_event_and_ioctl()? I'm always bad
>> >> > >> at giving a good function name :)
>> >> > >
>> >> > >Well, I don't think your wrapper should be multiplexed.  The multiplex
>> >> > >works for the simple ioctl() wrapper, because there really is nothing
>> >> > >that varies apart from the exact ioctl number called.
>> >> > >
>> >> > >But now that you have different operations here, I think you want
>> >> > >wrappers for each one - each one will call the ioctl(), then do the
>> >> > >specific extra steps necessary for that operation.  So
>> >> > >vfio_container_event() will go away as well, split into various other
>> >> > >functions.
>> >> > >
>> >> > 
>> >> > It wouldn't a good idea if I understand your proposal correctly. Currnetly,
>> >> > the global function vfio_container_ioctl() can be called from sPAPR platform
>> >> > for any ioctl commands handled in kernel source file vfio_iommu_spapr_tce.c,
>> >> > which means the function isn't called for EEH only. Other sPAPR TCE container
>> >> > ioctl commands are also routed by this function. There will be lots if having
>> >> > one global function for each ioctl commands, which just improve the cost to
>> >> > maintain the code.
>> >> 
>> >> I don't really follow your objection.  I'm only suggesting separate
>> >> wrappers for things which require extra actions currently implemented
>> >> in vfio_container_event().  Things which only ned the plain ioctl()
>> >> can still use the simple vfio_container_ioctl() wrapper.
>> >
>> >vfio_container_ioctl() also filters to a limited set of ioctls, it
>> >clearly does not allow any ioctl.
>> >
>> 
>> Ok. I think your guys expect something like follows. Note that the following
>> vfio_container_eeh_ioctl() will accept a limited set of EEH operations, similar
>> to what's doing in vfio_contain_ioctl() to the ioctl commands:
>> 
>> If you agree to have the changes, I'll put another patch on top of this one
>> to replace vfio_container_ioctl() in spapr_pci_vfio.c with vfio_container_eeh_ioctl()
>> for EEH cases.
>> 
[snip ...]
>
>No, extra operation specific logic inside the ioctl wrapper is exactly
>what I want to avoid.  Instead I want to see
>vfio_container_eeh_ioctl() remain as it is now - doing nothing but
>verifying the ioctl() number, then passing the arguments on to
>ioctl().
>

I think you were talking about vfio_container_ioctl() :)

>What I'm expecting is then to add a new functions, along the lines of:
>
>int vfio_eeh_pe_reset(...)
>{
>    VFIOGroup *group;
>    VFIODevice *vbasedev;
>    VFIOPCIDevice *vdev;
>
>    /*
>     * The MSIx table will be cleaned out by reset. We need
>     * disable it so that it can be reenabled properly. Also,
>     * the cached MSIx table should be cleared as it's not
>     * reflecting the contents in hardware.
>     */
>    group = vfio_get_group(groupid, as);
>    if (!group) {
>        error_report("vfio: group %d not found\n", groupid);
>        return -1;
>    }
>
>    QLIST_FOREACH(vbasedev, &group->device_list, next) {
>        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>        if (msix_enabled(&vdev->pdev)) {
>            vfio_disable_msix(vdev);
>        }
>
>        msix_reset(&vdev->pdev);
>    }
>
>    vfio_put_group(group);
>
>    return vfio_eeh_container_ioctl(as, groupid,
>                                    VFIO_EEH_PE_RESET_FUNDAMENTAL, op);
>}
>
>I this function can build the op structure itself from sensible
>arguments, then that's even better.
>

Thanks, David. I assume that Alex doesn't object to this and I'll
change the code according to your suggestion. I'll send next revision
soon.

Thanks,
Gavin

>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-03-26  1:31 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-16 16:31 [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 2/3] VFIO: Disable INTx interrupt on " Gavin Shan
2015-03-17 21:16   ` Alex Williamson
2015-03-18  4:54     ` [Qemu-devel] [Qemu-ppc] " Gavin Shan
2015-03-20  4:01       ` Gavin Shan
2015-03-20  5:57         ` Gavin Shan
2015-03-16 16:31 ` [Qemu-devel] [PATCH v2 3/3] sPAPR: Reenable EEH functionality on reboot Gavin Shan
2015-03-17 21:09 ` [Qemu-devel] [PATCH v2 1/3] VFIO: Clear stale MSIx table during EEH reset Alex Williamson
2015-03-17 23:26   ` Gavin Shan
2015-03-23  5:05     ` Gavin Shan
2015-03-20  6:04 ` David Gibson
2015-03-20  6:27   ` [Qemu-devel] [Qemu-ppc] " Gavin Shan
2015-03-23  5:06     ` David Gibson
2015-03-23  5:25       ` Gavin Shan
2015-03-24  5:41         ` David Gibson
2015-03-24  6:24           ` Gavin Shan
2015-03-24  6:54             ` David Gibson
2015-03-24 12:53               ` Alex Williamson
2015-03-26  0:53                 ` Gavin Shan
2015-03-26  1:10                   ` David Gibson
2015-03-26  1:30                     ` Gavin Shan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.