All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset
@ 2015-03-11  6:11 Gavin Shan
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on " Gavin Shan
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-11  6:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, Gavin Shan, alex.williamson, qemu-ppc, david

The PCI device MSIx table is cleaned out in hardware after EEH PE
reset. However, we still hold the stale MSIx entries in QEMU, which
should be cleared accordingly. Otherwise, we will run into another
(recursive) EEH error and the PCI devices contained in the PE have
to be offlined exceptionally.

The patch clears stale MSIx table before EEH PE reset so that MSIx
table could be restored properly after EEH PE reset.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/vfio/common.c       |  6 +++++-
 hw/vfio/pci.c          | 39 +++++++++++++++++++++++++++++++++++++++
 include/hw/vfio/vfio.h |  3 ++-
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 148eb53..e3833f4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -949,8 +949,12 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
     switch (req) {
     case VFIO_CHECK_EXTENSION:
     case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
-    case VFIO_EEH_PE_OP:
         break;
+    case VFIO_EEH_PE_OP:
+        if (!vfio_container_eeh_event(as, groupid, param)) {
+            break;
+        }
+        /* fallthru */
     default:
         /* Return an error on unknown requests */
         error_report("vfio: unsupported ioctl %X", req);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6b80539..8c4a8cb 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3319,6 +3319,45 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
+                             struct vfio_eeh_pe_op *op)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+    VFIOPCIDevice *vdev;
+
+    group = vfio_get_group(groupid, as);
+    if (!group) {
+        vfio_put_group(group);
+        error_report("vfio: group %d not found\n", groupid);
+        return -1;
+    }
+
+    switch (op->op) {
+    case VFIO_EEH_PE_RESET_HOT:
+    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
+        /*
+         * The MSIx table will be cleaned out by reset. We need
+         * disable it so that it can be reenabled properly. Also,
+         * the cached MSIx table should be cleared as it's not
+         * reflecting the contents in hardware.
+         */
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+            if (msix_enabled(&vdev->pdev)) {
+                vfio_disable_msix(vdev);
+            }
+
+            msix_reset(&vdev->pdev);
+        }
+
+        break;
+    }
+
+    vfio_put_group(group);
+    return 0;
+}
+
 static int vfio_initfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
index 0b26cd8..99528a3 100644
--- a/include/hw/vfio/vfio.h
+++ b/include/hw/vfio/vfio.h
@@ -5,5 +5,6 @@
 
 extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
                                 int req, void *param);
-
+extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
+                                    struct vfio_eeh_pe_op *op);
 #endif
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-11  6:11 [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
@ 2015-03-11  6:11 ` Gavin Shan
  2015-03-12  1:48   ` David Gibson
  2015-03-13 21:51   ` Alex Williamson
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 3/3] sPAPR: Reenable EEH functionality on reboot Gavin Shan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-11  6:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, Gavin Shan, alex.williamson, qemu-ppc, david

When Linux guest recovers from EEH error on the following Emulex
adapter, the MSIx interrupts are disabled and the INTx emulation
is enabled. One INTx interrupt is injected to the guest by host
because of detected pending INTx interrupts on the adapter. QEMU
disables mmap'ed BAR regions and starts a timer to enable those
regions at later point the INTx interrupt handler. Unfortunately,
"VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
mapp'ed BAR regions won't be reenabled properly. It leads to EEH
recovery failure at guest side because of hanged MMIO access.

 # lspci | grep Emulex
 0000:01:00.0 Ethernet controller: Emulex Corporation \
              OneConnect 10Gb NIC (be3) (rev 02)
 0000:01:00.1 Ethernet controller: Emulex Corporation \
              OneConnect 10Gb NIC (be3) (rev 02)

The patch clears "VFIOPCIDevice->intx.pending" after EEH reset
is completed on the PE, which contains the adapter. In turn, the
mmap'ed BAR regions can be reenabled to avoid EEH recovery failure.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/vfio/pci.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8c4a8cb..55e0904 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3352,6 +3352,20 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
         }
 
         break;
+    case VFIO_EEH_PE_RESET_DEACTIVATE:
+        /*
+         * We might have INTx interrupt whose handler disabled the
+         * memory mapped BARs. Without clearing the INTx pending
+         * state, the timer kicked by the INTx interrupt handler
+         * won't enable those disabled memory mapped BARs, which
+         * leads EEH recovery failure.
+         */
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+            vdev->intx.pending = false;
+        }
+
+        break;
     }
 
     vfio_put_group(group);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH 3/3] sPAPR: Reenable EEH functionality on reboot
  2015-03-11  6:11 [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on " Gavin Shan
@ 2015-03-11  6:11 ` Gavin Shan
  2015-03-12  1:04 ` [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset David Gibson
  2015-03-13 21:33 ` Alex Williamson
  3 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-11  6:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: aik, agraf, Gavin Shan, alex.williamson, qemu-ppc, david

When rebooting the guest, some PEs might be in frozen state. The
contained PCI devices won't work properly if their frozen states
aren't cleared in time. One case running into this situation would
be maximal EEH error times encountered in the guest.

The patch reenables the EEH functinality on PEs on PHB's reset
callback, which will clear their frozen states if needed.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 hw/ppc/spapr_pci_vfio.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 99a1be5..0cd96d3 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -71,9 +71,21 @@ static void spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
                                 spapr_tce_get_iommu(tcet));
 }
 
+/*
+ * The PE might be in frozen state. To reenable the EEH
+ * functionality on it will clean the frozen state, which
+ * ensures that the comtained PCI devices will work properly.
+ */
 static void spapr_phb_vfio_reset(DeviceState *qdev)
 {
-    /* Do nothing */
+    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(qdev);
+    struct vfio_eeh_pe_op op = {
+        .argsz = sizeof(op),
+        .op = VFIO_EEH_PE_ENABLE,
+    };
+
+    vfio_container_ioctl(&svphb->phb.iommu_as,
+                         svphb->iommugroupid, VFIO_EEH_PE_OP, &op);
 }
 
 static int spapr_phb_vfio_eeh_set_option(sPAPRPHBState *sphb,
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-11  6:11 [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on " Gavin Shan
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 3/3] sPAPR: Reenable EEH functionality on reboot Gavin Shan
@ 2015-03-12  1:04 ` David Gibson
  2015-03-12  3:02   ` Gavin Shan
  2015-03-13 21:33 ` Alex Williamson
  3 siblings, 1 reply; 15+ messages in thread
From: David Gibson @ 2015-03-12  1:04 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, alex.williamson, qemu-ppc, qemu-devel, agraf

[-- Attachment #1: Type: text/plain, Size: 4048 bytes --]

On Wed, Mar 11, 2015 at 05:11:52PM +1100, Gavin Shan wrote:
> The PCI device MSIx table is cleaned out in hardware after EEH PE
> reset. However, we still hold the stale MSIx entries in QEMU, which
> should be cleared accordingly. Otherwise, we will run into another
> (recursive) EEH error and the PCI devices contained in the PE have
> to be offlined exceptionally.
> 
> The patch clears stale MSIx table before EEH PE reset so that MSIx
> table could be restored properly after EEH PE reset.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/vfio/common.c       |  6 +++++-
>  hw/vfio/pci.c          | 39 +++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h |  3 ++-
>  3 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 148eb53..e3833f4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -949,8 +949,12 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>      switch (req) {
>      case VFIO_CHECK_EXTENSION:
>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> -    case VFIO_EEH_PE_OP:
>          break;
> +    case VFIO_EEH_PE_OP:
> +        if (!vfio_container_eeh_event(as, groupid, param)) {

Please use == 0 not !, remembering that !some_function() is the
success case hurts my brain.

> +            break;
> +        }
> +        /* fallthru */

It doesn't look like the fallthrough will generate the correct error
message: it will say "unsupported ioctl" but
vfio_container_eeh_event() could fail for some other reason.

>      default:
>          /* Return an error on unknown requests */
>          error_report("vfio: unsupported ioctl %X", req);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6b80539..8c4a8cb 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3319,6 +3319,45 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                             struct vfio_eeh_pe_op *op)
> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    VFIOPCIDevice *vdev;
> +
> +    group = vfio_get_group(groupid, as);
> +    if (!group) {
> +        vfio_put_group(group);

Is vfio_put_group(NULL) really what you want?

> +        error_report("vfio: group %d not found\n", groupid);
> +        return -1;
> +    }
> +
> +    switch (op->op) {
> +    case VFIO_EEH_PE_RESET_HOT:
> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
> +        /*
> +         * The MSIx table will be cleaned out by reset. We need
> +         * disable it so that it can be reenabled properly. Also,
> +         * the cached MSIx table should be cleared as it's not
> +         * reflecting the contents in hardware.
> +         */
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +            if (msix_enabled(&vdev->pdev)) {
> +                vfio_disable_msix(vdev);
> +            }
> +
> +            msix_reset(&vdev->pdev);
> +        }
> +
> +        break;
> +    }
> +
> +    vfio_put_group(group);
> +    return 0;
> +}
> +
>  static int vfio_initfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..99528a3 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -5,5 +5,6 @@
>  
>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>                                  int req, void *param);
> -
> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                                    struct vfio_eeh_pe_op *op);
>  #endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on " Gavin Shan
@ 2015-03-12  1:48   ` David Gibson
  2015-03-12  3:07     ` Gavin Shan
  2015-03-13 21:51   ` Alex Williamson
  1 sibling, 1 reply; 15+ messages in thread
From: David Gibson @ 2015-03-12  1:48 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, alex.williamson, qemu-ppc, qemu-devel, agraf

[-- Attachment #1: Type: text/plain, Size: 2588 bytes --]

On Wed, Mar 11, 2015 at 05:11:53PM +1100, Gavin Shan wrote:
> When Linux guest recovers from EEH error on the following Emulex
> adapter, the MSIx interrupts are disabled and the INTx emulation
> is enabled. One INTx interrupt is injected to the guest by host
> because of detected pending INTx interrupts on the adapter. QEMU
> disables mmap'ed BAR regions and starts a timer to enable those
> regions at later point the INTx interrupt handler. Unfortunately,
> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
> recovery failure at guest side because of hanged MMIO access.
> 
>  # lspci | grep Emulex
>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>               OneConnect 10Gb NIC (be3) (rev 02)
>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>               OneConnect 10Gb NIC (be3) (rev 02)
> 
> The patch clears "VFIOPCIDevice->intx.pending" after EEH reset
> is completed on the PE, which contains the adapter. In turn, the
> mmap'ed BAR regions can be reenabled to avoid EEH recovery failure.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/vfio/pci.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8c4a8cb..55e0904 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3352,6 +3352,20 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>          }
>  
>          break;
> +    case VFIO_EEH_PE_RESET_DEACTIVATE:
> +        /*
> +         * We might have INTx interrupt whose handler disabled the
> +         * memory mapped BARs. Without clearing the INTx pending
> +         * state, the timer kicked by the INTx interrupt handler
> +         * won't enable those disabled memory mapped BARs, which
> +         * leads EEH recovery failure.
> +         */
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +            vdev->intx.pending = false;
> +        }
> +
> +        break;
>      }
>  
>      vfio_put_group(group);

I'm not sure that invoking these side effects from the low-level
ioctl() wrapper makes a lot of sense.  Wouldn't it be clearer to
explicitly do the necessary state cleanup in the reset callers.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-12  1:04 ` [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset David Gibson
@ 2015-03-12  3:02   ` Gavin Shan
  0 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-12  3:02 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, aik, agraf, Gavin Shan, alex.williamson, qemu-ppc

On Thu, Mar 12, 2015 at 12:04:59PM +1100, David Gibson wrote:
>On Wed, Mar 11, 2015 at 05:11:52PM +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> table could be restored properly after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/vfio/common.c       |  6 +++++-
>>  hw/vfio/pci.c          | 39 +++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio.h |  3 ++-
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 148eb53..e3833f4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -949,8 +949,12 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>      switch (req) {
>>      case VFIO_CHECK_EXTENSION:
>>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> -    case VFIO_EEH_PE_OP:
>>          break;
>> +    case VFIO_EEH_PE_OP:
>> +        if (!vfio_container_eeh_event(as, groupid, param)) {
>
>Please use == 0 not !, remembering that !some_function() is the
>success case hurts my brain.
>

Yes, I'll fix it as below.

>> +            break;
>> +        }
>> +        /* fallthru */
>
>It doesn't look like the fallthrough will generate the correct error
>message: it will say "unsupported ioctl" but
>vfio_container_eeh_event() could fail for some other reason.
>

For now, vfio_container_eeh_event() fails when VFIO group can't be
found, which is checked by vfio_container_do_ioctl() as well. However,
it's worthy to have precise message as follows:

	case VFIO_EEH_PE_OP:
	    if (vfio_container_eeh_event(as, groupid, param) != 0) {
                error_report("vfio: cannot handle EEH event on group %d\n",
                             groupid);
            }

            break;

>>      default:
>>          /* Return an error on unknown requests */
>>          error_report("vfio: unsupported ioctl %X", req);
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 6b80539..8c4a8cb 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3319,6 +3319,45 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>      vdev->req_enabled = false;
>>  }
>>  
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                             struct vfio_eeh_pe_op *op)
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    VFIOPCIDevice *vdev;
>> +
>> +    group = vfio_get_group(groupid, as);
>> +    if (!group) {
>> +        vfio_put_group(group);
>
>Is vfio_put_group(NULL) really what you want?
>

No, it should be dropped.

Thanks,
Gavin

>> +        error_report("vfio: group %d not found\n", groupid);
>> +        return -1;
>> +    }
>> +
>> +    switch (op->op) {
>> +    case VFIO_EEH_PE_RESET_HOT:
>> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>> +        /*
>> +         * The MSIx table will be cleaned out by reset. We need
>> +         * disable it so that it can be reenabled properly. Also,
>> +         * the cached MSIx table should be cleared as it's not
>> +         * reflecting the contents in hardware.
>> +         */
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +            if (msix_enabled(&vdev->pdev)) {
>> +                vfio_disable_msix(vdev);
>> +            }
>> +
>> +            msix_reset(&vdev->pdev);
>> +        }
>> +
>> +        break;
>> +    }
>> +
>> +    vfio_put_group(group);
>> +    return 0;
>> +}
>> +
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> index 0b26cd8..99528a3 100644
>> --- a/include/hw/vfio/vfio.h
>> +++ b/include/hw/vfio/vfio.h
>> @@ -5,5 +5,6 @@
>>  
>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>                                  int req, void *param);
>> -
>> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                                    struct vfio_eeh_pe_op *op);
>>  #endif
>
>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-12  1:48   ` David Gibson
@ 2015-03-12  3:07     ` Gavin Shan
  0 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-12  3:07 UTC (permalink / raw)
  To: David Gibson
  Cc: qemu-devel, aik, agraf, Gavin Shan, alex.williamson, qemu-ppc

On Thu, Mar 12, 2015 at 12:48:16PM +1100, David Gibson wrote:
>On Wed, Mar 11, 2015 at 05:11:53PM +1100, Gavin Shan wrote:
>> When Linux guest recovers from EEH error on the following Emulex
>> adapter, the MSIx interrupts are disabled and the INTx emulation
>> is enabled. One INTx interrupt is injected to the guest by host
>> because of detected pending INTx interrupts on the adapter. QEMU
>> disables mmap'ed BAR regions and starts a timer to enable those
>> regions at later point the INTx interrupt handler. Unfortunately,
>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>> recovery failure at guest side because of hanged MMIO access.
>> 
>>  # lspci | grep Emulex
>>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>>               OneConnect 10Gb NIC (be3) (rev 02)
>>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>>               OneConnect 10Gb NIC (be3) (rev 02)
>> 
>> The patch clears "VFIOPCIDevice->intx.pending" after EEH reset
>> is completed on the PE, which contains the adapter. In turn, the
>> mmap'ed BAR regions can be reenabled to avoid EEH recovery failure.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/vfio/pci.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>> 
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 8c4a8cb..55e0904 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3352,6 +3352,20 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>          }
>>  
>>          break;
>> +    case VFIO_EEH_PE_RESET_DEACTIVATE:
>> +        /*
>> +         * We might have INTx interrupt whose handler disabled the
>> +         * memory mapped BARs. Without clearing the INTx pending
>> +         * state, the timer kicked by the INTx interrupt handler
>> +         * won't enable those disabled memory mapped BARs, which
>> +         * leads EEH recovery failure.
>> +         */
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +            vdev->intx.pending = false;
>> +        }
>> +
>> +        break;
>>      }
>>  
>>      vfio_put_group(group);
>
>I'm not sure that invoking these side effects from the low-level
>ioctl() wrapper makes a lot of sense.  Wouldn't it be clearer to
>explicitly do the necessary state cleanup in the reset callers.
>

Yes, I agree that putting this into reset caller can help isolating
PowerPC unique EEH code from general code. The only problem I have is
"struct VFIOPCIDevice" is only visible in hw/vfio/pci.c

Thanks,
Gavin

>-- 
>David Gibson			| I'll have my music baroque, and my code
>david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
>				| _way_ _around_!
>http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-11  6:11 [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
                   ` (2 preceding siblings ...)
  2015-03-12  1:04 ` [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset David Gibson
@ 2015-03-13 21:33 ` Alex Williamson
  2015-03-15 22:27   ` Gavin Shan
  3 siblings, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2015-03-13 21:33 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, david, qemu-ppc, qemu-devel, agraf

On Wed, 2015-03-11 at 17:11 +1100, Gavin Shan wrote:
> The PCI device MSIx table is cleaned out in hardware after EEH PE
> reset. However, we still hold the stale MSIx entries in QEMU, which
> should be cleared accordingly. Otherwise, we will run into another
> (recursive) EEH error and the PCI devices contained in the PE have
> to be offlined exceptionally.
> 
> The patch clears stale MSIx table before EEH PE reset so that MSIx
> table could be restored properly after EEH PE reset.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/vfio/common.c       |  6 +++++-
>  hw/vfio/pci.c          | 39 +++++++++++++++++++++++++++++++++++++++
>  include/hw/vfio/vfio.h |  3 ++-
>  3 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 148eb53..e3833f4 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -949,8 +949,12 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>      switch (req) {
>      case VFIO_CHECK_EXTENSION:
>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
> -    case VFIO_EEH_PE_OP:
>          break;
> +    case VFIO_EEH_PE_OP:
> +        if (!vfio_container_eeh_event(as, groupid, param)) {
> +            break;
> +        }
> +        /* fallthru */
>      default:
>          /* Return an error on unknown requests */
>          error_report("vfio: unsupported ioctl %X", req);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6b80539..8c4a8cb 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3319,6 +3319,45 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                             struct vfio_eeh_pe_op *op)

Called by common, but lives in pci.  This would need a stub if QEMU is
built without CONFIG_PCI.  Thanks,

Alex

> +{
> +    VFIOGroup *group;
> +    VFIODevice *vbasedev;
> +    VFIOPCIDevice *vdev;
> +
> +    group = vfio_get_group(groupid, as);
> +    if (!group) {
> +        vfio_put_group(group);
> +        error_report("vfio: group %d not found\n", groupid);
> +        return -1;
> +    }
> +
> +    switch (op->op) {
> +    case VFIO_EEH_PE_RESET_HOT:
> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
> +        /*
> +         * The MSIx table will be cleaned out by reset. We need
> +         * disable it so that it can be reenabled properly. Also,
> +         * the cached MSIx table should be cleared as it's not
> +         * reflecting the contents in hardware.
> +         */
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +            if (msix_enabled(&vdev->pdev)) {
> +                vfio_disable_msix(vdev);
> +            }
> +
> +            msix_reset(&vdev->pdev);
> +        }
> +
> +        break;
> +    }
> +
> +    vfio_put_group(group);
> +    return 0;
> +}
> +
>  static int vfio_initfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
> index 0b26cd8..99528a3 100644
> --- a/include/hw/vfio/vfio.h
> +++ b/include/hw/vfio/vfio.h
> @@ -5,5 +5,6 @@
>  
>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>                                  int req, void *param);
> -
> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
> +                                    struct vfio_eeh_pe_op *op);
>  #endif

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-11  6:11 ` [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on " Gavin Shan
  2015-03-12  1:48   ` David Gibson
@ 2015-03-13 21:51   ` Alex Williamson
  2015-03-16  1:04     ` Gavin Shan
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2015-03-13 21:51 UTC (permalink / raw)
  To: Gavin Shan; +Cc: aik, david, qemu-ppc, qemu-devel, agraf

On Wed, 2015-03-11 at 17:11 +1100, Gavin Shan wrote:
> When Linux guest recovers from EEH error on the following Emulex
> adapter, the MSIx interrupts are disabled and the INTx emulation
> is enabled. One INTx interrupt is injected to the guest by host
> because of detected pending INTx interrupts on the adapter. QEMU
> disables mmap'ed BAR regions and starts a timer to enable those
> regions at later point the INTx interrupt handler. Unfortunately,
> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
> recovery failure at guest side because of hanged MMIO access.
> 
>  # lspci | grep Emulex
>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>               OneConnect 10Gb NIC (be3) (rev 02)
>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>               OneConnect 10Gb NIC (be3) (rev 02)
> 
> The patch clears "VFIOPCIDevice->intx.pending" after EEH reset
> is completed on the PE, which contains the adapter. In turn, the
> mmap'ed BAR regions can be reenabled to avoid EEH recovery failure.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  hw/vfio/pci.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8c4a8cb..55e0904 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3352,6 +3352,20 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>          }
>  
>          break;
> +    case VFIO_EEH_PE_RESET_DEACTIVATE:
> +        /*
> +         * We might have INTx interrupt whose handler disabled the
> +         * memory mapped BARs. Without clearing the INTx pending
> +         * state, the timer kicked by the INTx interrupt handler
> +         * won't enable those disabled memory mapped BARs, which
> +         * leads EEH recovery failure.
> +         */
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +            vdev->intx.pending = false;
> +        }

I'm nervous that "pending" is trying to track that a) the host interrupt
is masked and b) the emulated INTx line for the device is asserted, but
we're not clearing the state of any of that here.  We can handle a
spurious EOI, the device should simply re-assert the interrupt, but
changing one piece of tracking w/o getting everything in sync seems like
a looming bug.  Thanks,

Alex

> +
> +        break;
>      }
>  
>      vfio_put_group(group);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset
  2015-03-13 21:33 ` Alex Williamson
@ 2015-03-15 22:27   ` Gavin Shan
  0 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-15 22:27 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, aik, agraf, Gavin Shan, qemu-ppc, david

On Fri, Mar 13, 2015 at 03:33:50PM -0600, Alex Williamson wrote:
>On Wed, 2015-03-11 at 17:11 +1100, Gavin Shan wrote:
>> The PCI device MSIx table is cleaned out in hardware after EEH PE
>> reset. However, we still hold the stale MSIx entries in QEMU, which
>> should be cleared accordingly. Otherwise, we will run into another
>> (recursive) EEH error and the PCI devices contained in the PE have
>> to be offlined exceptionally.
>> 
>> The patch clears stale MSIx table before EEH PE reset so that MSIx
>> table could be restored properly after EEH PE reset.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/vfio/common.c       |  6 +++++-
>>  hw/vfio/pci.c          | 39 +++++++++++++++++++++++++++++++++++++++
>>  include/hw/vfio/vfio.h |  3 ++-
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 148eb53..e3833f4 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -949,8 +949,12 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>      switch (req) {
>>      case VFIO_CHECK_EXTENSION:
>>      case VFIO_IOMMU_SPAPR_TCE_GET_INFO:
>> -    case VFIO_EEH_PE_OP:
>>          break;
>> +    case VFIO_EEH_PE_OP:
>> +        if (!vfio_container_eeh_event(as, groupid, param)) {
>> +            break;
>> +        }
>> +        /* fallthru */
>>      default:
>>          /* Return an error on unknown requests */
>>          error_report("vfio: unsupported ioctl %X", req);
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 6b80539..8c4a8cb 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3319,6 +3319,45 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>      vdev->req_enabled = false;
>>  }
>>  
>> +int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                             struct vfio_eeh_pe_op *op)
>
>Called by common, but lives in pci.  This would need a stub if QEMU is
>built without CONFIG_PCI.  Thanks,
>

Right. I'll add a stub for !CONFIG_PCI.

Thanks,
Gavin

>Alex
>
>> +{
>> +    VFIOGroup *group;
>> +    VFIODevice *vbasedev;
>> +    VFIOPCIDevice *vdev;
>> +
>> +    group = vfio_get_group(groupid, as);
>> +    if (!group) {
>> +        vfio_put_group(group);
>> +        error_report("vfio: group %d not found\n", groupid);
>> +        return -1;
>> +    }
>> +
>> +    switch (op->op) {
>> +    case VFIO_EEH_PE_RESET_HOT:
>> +    case VFIO_EEH_PE_RESET_FUNDAMENTAL:
>> +        /*
>> +         * The MSIx table will be cleaned out by reset. We need
>> +         * disable it so that it can be reenabled properly. Also,
>> +         * the cached MSIx table should be cleared as it's not
>> +         * reflecting the contents in hardware.
>> +         */
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +            if (msix_enabled(&vdev->pdev)) {
>> +                vfio_disable_msix(vdev);
>> +            }
>> +
>> +            msix_reset(&vdev->pdev);
>> +        }
>> +
>> +        break;
>> +    }
>> +
>> +    vfio_put_group(group);
>> +    return 0;
>> +}
>> +
>>  static int vfio_initfn(PCIDevice *pdev)
>>  {
>>      VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
>> index 0b26cd8..99528a3 100644
>> --- a/include/hw/vfio/vfio.h
>> +++ b/include/hw/vfio/vfio.h
>> @@ -5,5 +5,6 @@
>>  
>>  extern int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
>>                                  int req, void *param);
>> -
>> +extern int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>> +                                    struct vfio_eeh_pe_op *op);
>>  #endif
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-13 21:51   ` Alex Williamson
@ 2015-03-16  1:04     ` Gavin Shan
  2015-03-16  4:05       ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Gavin Shan @ 2015-03-16  1:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, aik, agraf, Gavin Shan, qemu-ppc, david

On Fri, Mar 13, 2015 at 03:51:27PM -0600, Alex Williamson wrote:
>On Wed, 2015-03-11 at 17:11 +1100, Gavin Shan wrote:
>> When Linux guest recovers from EEH error on the following Emulex
>> adapter, the MSIx interrupts are disabled and the INTx emulation
>> is enabled. One INTx interrupt is injected to the guest by host
>> because of detected pending INTx interrupts on the adapter. QEMU
>> disables mmap'ed BAR regions and starts a timer to enable those
>> regions at later point the INTx interrupt handler. Unfortunately,
>> "VFIOPCIDevice->intx.pending" isn't cleared, meaning those disabled
>> mapp'ed BAR regions won't be reenabled properly. It leads to EEH
>> recovery failure at guest side because of hanged MMIO access.
>> 
>>  # lspci | grep Emulex
>>  0000:01:00.0 Ethernet controller: Emulex Corporation \
>>               OneConnect 10Gb NIC (be3) (rev 02)
>>  0000:01:00.1 Ethernet controller: Emulex Corporation \
>>               OneConnect 10Gb NIC (be3) (rev 02)
>> 
>> The patch clears "VFIOPCIDevice->intx.pending" after EEH reset
>> is completed on the PE, which contains the adapter. In turn, the
>> mmap'ed BAR regions can be reenabled to avoid EEH recovery failure.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  hw/vfio/pci.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>> 
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 8c4a8cb..55e0904 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3352,6 +3352,20 @@ int vfio_container_eeh_event(AddressSpace *as, int32_t groupid,
>>          }
>>  
>>          break;
>> +    case VFIO_EEH_PE_RESET_DEACTIVATE:
>> +        /*
>> +         * We might have INTx interrupt whose handler disabled the
>> +         * memory mapped BARs. Without clearing the INTx pending
>> +         * state, the timer kicked by the INTx interrupt handler
>> +         * won't enable those disabled memory mapped BARs, which
>> +         * leads EEH recovery failure.
>> +         */
>> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> +            vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +            vdev->intx.pending = false;
>> +        }
>
>I'm nervous that "pending" is trying to track that a) the host interrupt
>is masked and b) the emulated INTx line for the device is asserted, but
>we're not clearing the state of any of that here.  We can handle a
>spurious EOI, the device should simply re-assert the interrupt, but
>changing one piece of tracking w/o getting everything in sync seems like
>a looming bug.  Thanks,
>

After the PCI device takes EEH reset, we shouldn't see pending INTx from
hardware side. So I think it's safe to clear "intx.pending" when EEH reset
is completed. However, I think I have to explain more about how this issue
was found and my understanding:

(1) Guest detects EEH errors. The device driver disables MSIx interrupt by
calling pci_disable_msix(), which disable MSIx vectors and then enable
INTx.

(2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
this stage the INTx is still masked. At later point, the guest is requesting
unmasking INTx, which is captured by host. Host checks and founds pending
INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
to reenable mmap'ed regions if "intx.pending" is cleared there. However,
"intx.pending" is only cleared upon BAR access in slow path, which is never
happing.

(3) After guest disables MSIx and issue EEH reset, the device driver starts
to check its firmware state by reading MMIO register, which isn't completed
by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
been disabled). Eventually, the guest hangs on reading MMIO register. With
this patch applied to QEMU, I didn't see the problem again. 

Thanks,
Gavin

>Alex
>
>> +
>> +        break;
>>      }
>>  
>>      vfio_put_group(group);
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-16  1:04     ` Gavin Shan
@ 2015-03-16  4:05       ` Benjamin Herrenschmidt
  2015-03-16 14:34         ` Gavin Shan
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2015-03-16  4:05 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Alex Williamson, qemu-ppc, qemu-devel, david

On Mon, 2015-03-16 at 12:04 +1100, Gavin Shan wrote:
> 
> 
> (2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
> this stage the INTx is still masked. At later point, the guest is requesting
> unmasking INTx, which is captured by host. Host checks and founds pending
> INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
> the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
> to reenable mmap'ed regions if "intx.pending" is cleared there. However,
> "intx.pending" is only cleared upon BAR access in slow path, which is never
> happing.
> 
> (3) After guest disables MSIx and issue EEH reset, the device driver starts
> to check its firmware state by reading MMIO register, which isn't completed
> by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
> been disabled). Eventually, the guest hangs on reading MMIO register. With
> this patch applied to QEMU, I didn't see the problem again. 

Note that it might be a good idea to disable INTx (and synchronize with a cfg
read of some sort) around resetting a device.

Otherwise, you may hit a known issue if the device is behind a switch and has
sent the INTx "assert" message, and not the "deassert" one before it gets reset.

That can cause the INTx to effectively be "stuck" in the switch preventing a
subsequent one from being delivered.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-16  4:05       ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
@ 2015-03-16 14:34         ` Gavin Shan
  2015-03-16 15:05           ` Alex Williamson
  0 siblings, 1 reply; 15+ messages in thread
From: Gavin Shan @ 2015-03-16 14:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alex Williamson, david, qemu-ppc, Gavin Shan, qemu-devel

On Mon, Mar 16, 2015 at 03:05:32PM +1100, Benjamin Herrenschmidt wrote:
>On Mon, 2015-03-16 at 12:04 +1100, Gavin Shan wrote:
>> 
>> 
>> (2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
>> this stage the INTx is still masked. At later point, the guest is requesting
>> unmasking INTx, which is captured by host. Host checks and founds pending
>> INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
>> the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
>> to reenable mmap'ed regions if "intx.pending" is cleared there. However,
>> "intx.pending" is only cleared upon BAR access in slow path, which is never
>> happing.
>> 
>> (3) After guest disables MSIx and issue EEH reset, the device driver starts
>> to check its firmware state by reading MMIO register, which isn't completed
>> by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
>> been disabled). Eventually, the guest hangs on reading MMIO register. With
>> this patch applied to QEMU, I didn't see the problem again. 
>
>Note that it might be a good idea to disable INTx (and synchronize with a cfg
>read of some sort) around resetting a device.
>
>Otherwise, you may hit a known issue if the device is behind a switch and has
>sent the INTx "assert" message, and not the "deassert" one before it gets reset.
>
>That can cause the INTx to effectively be "stuck" in the switch preventing a
>subsequent one from being delivered.
>

Yeah, It makes more sense to disable INTx before issuing EEH reset. I verified
that disabling INTx interrupt upon EEH reset can avoid the issue as well. I'll
post updated patch accordingly if Alex Williamson doesn't object.

Thanks,
Gavin

>Cheers,
>Ben.
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-16 14:34         ` Gavin Shan
@ 2015-03-16 15:05           ` Alex Williamson
  2015-03-16 15:38             ` Gavin Shan
  0 siblings, 1 reply; 15+ messages in thread
From: Alex Williamson @ 2015-03-16 15:05 UTC (permalink / raw)
  To: Gavin Shan; +Cc: qemu-ppc, qemu-devel, david

On Tue, 2015-03-17 at 01:34 +1100, Gavin Shan wrote:
> On Mon, Mar 16, 2015 at 03:05:32PM +1100, Benjamin Herrenschmidt wrote:
> >On Mon, 2015-03-16 at 12:04 +1100, Gavin Shan wrote:
> >> 
> >> 
> >> (2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
> >> this stage the INTx is still masked. At later point, the guest is requesting
> >> unmasking INTx, which is captured by host. Host checks and founds pending
> >> INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
> >> the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
> >> to reenable mmap'ed regions if "intx.pending" is cleared there. However,
> >> "intx.pending" is only cleared upon BAR access in slow path, which is never
> >> happing.
> >> 
> >> (3) After guest disables MSIx and issue EEH reset, the device driver starts
> >> to check its firmware state by reading MMIO register, which isn't completed
> >> by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
> >> been disabled). Eventually, the guest hangs on reading MMIO register. With
> >> this patch applied to QEMU, I didn't see the problem again. 
> >
> >Note that it might be a good idea to disable INTx (and synchronize with a cfg
> >read of some sort) around resetting a device.
> >
> >Otherwise, you may hit a known issue if the device is behind a switch and has
> >sent the INTx "assert" message, and not the "deassert" one before it gets reset.
> >
> >That can cause the INTx to effectively be "stuck" in the switch preventing a
> >subsequent one from being delivered.
> >
> 
> Yeah, It makes more sense to disable INTx before issuing EEH reset. I verified
> that disabling INTx interrupt upon EEH reset can avoid the issue as well. I'll
> post updated patch accordingly if Alex Williamson doesn't object.

That sounds like a cleaner approach, but you seem to be skipping
something around why the slow-path clearing of intx.pending isn't
working for you.  Step (2) says "... is only cleared upon BAR access in
slow path, which is never happening."  Step (3) "the device driver
starts to check its firmware state by reading MMIO register, which isn't
completed by QEMU VFIO BAR slow path".   So it sounds like (3) is doing
exactly what should allow the QEMU path INTx state machine to advance,
so why doesn't it work?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/3] VFIO: Clear INTx pending state on EEH reset
  2015-03-16 15:05           ` Alex Williamson
@ 2015-03-16 15:38             ` Gavin Shan
  0 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2015-03-16 15:38 UTC (permalink / raw)
  To: Alex Williamson; +Cc: david, qemu-ppc, Gavin Shan, qemu-devel

On Mon, Mar 16, 2015 at 09:05:27AM -0600, Alex Williamson wrote:
>On Tue, 2015-03-17 at 01:34 +1100, Gavin Shan wrote:
>> On Mon, Mar 16, 2015 at 03:05:32PM +1100, Benjamin Herrenschmidt wrote:
>> >On Mon, 2015-03-16 at 12:04 +1100, Gavin Shan wrote:
>> >> 
>> >> (2) QEMU sends IOCTL commands to host to disable MSIx and enable INTx. At
>> >> this stage the INTx is still masked. At later point, the guest is requesting
>> >> unmasking INTx, which is captured by host. Host checks and founds pending
>> >> INTx, which is sent to QEMU. In QEMU INTx handler (vfio_intx_interrupt()),
>> >> the mmap'ed regions are disabled, "intx.pending" is set and a timer is started
>> >> to reenable mmap'ed regions if "intx.pending" is cleared there. However,
>> >> "intx.pending" is only cleared upon BAR access in slow path, which is never
>> >> happing.
>> >> 
>> >> (3) After guest disables MSIx and issue EEH reset, the device driver starts
>> >> to check its firmware state by reading MMIO register, which isn't completed
>> >> by QEMU VFIO BAR slow path (Note: fast path supported by mmaped regions have
>> >> been disabled). Eventually, the guest hangs on reading MMIO register. With
>> >> this patch applied to QEMU, I didn't see the problem again. 
>> >
>> >Note that it might be a good idea to disable INTx (and synchronize with a cfg
>> >read of some sort) around resetting a device.
>> >
>> >Otherwise, you may hit a known issue if the device is behind a switch and has
>> >sent the INTx "assert" message, and not the "deassert" one before it gets reset.
>> >
>> >That can cause the INTx to effectively be "stuck" in the switch preventing a
>> >subsequent one from being delivered.
>> >
>> 
>> Yeah, It makes more sense to disable INTx before issuing EEH reset. I verified
>> that disabling INTx interrupt upon EEH reset can avoid the issue as well. I'll
>> post updated patch accordingly if Alex Williamson doesn't object.
>
>That sounds like a cleaner approach, but you seem to be skipping
>something around why the slow-path clearing of intx.pending isn't
>working for you.  Step (2) says "... is only cleared upon BAR access in
>slow path, which is never happening."  Step (3) "the device driver
>starts to check its firmware state by reading MMIO register, which isn't
>completed by QEMU VFIO BAR slow path".   So it sounds like (3) is doing
>exactly what should allow the QEMU path INTx state machine to advance,
>so why doesn't it work?  Thanks,
>

Thanks for confirm. I'll send out v2 tomorrow.

Nope, I'm not skipping why the slow path doesn't work. I didn't
understand well on how the QEMU memory model works together with
KVM. I need more time to trace and update with the findings.

Thanks,
Gavin 

>Alex
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-03-16 15:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-11  6:11 [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset Gavin Shan
2015-03-11  6:11 ` [Qemu-devel] [PATCH 2/3] VFIO: Clear INTx pending state on " Gavin Shan
2015-03-12  1:48   ` David Gibson
2015-03-12  3:07     ` Gavin Shan
2015-03-13 21:51   ` Alex Williamson
2015-03-16  1:04     ` Gavin Shan
2015-03-16  4:05       ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
2015-03-16 14:34         ` Gavin Shan
2015-03-16 15:05           ` Alex Williamson
2015-03-16 15:38             ` Gavin Shan
2015-03-11  6:11 ` [Qemu-devel] [PATCH 3/3] sPAPR: Reenable EEH functionality on reboot Gavin Shan
2015-03-12  1:04 ` [Qemu-devel] [PATCH 1/3] VFIO: Clear stale MSIx table during EEH reset David Gibson
2015-03-12  3:02   ` Gavin Shan
2015-03-13 21:33 ` Alex Williamson
2015-03-15 22:27   ` Gavin Shan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.