* [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF
@ 2016-02-26  0:04 Gavin Shan
  2016-02-26  0:04 ` [PATCH 1/3] powerpc/eeh: Don't propagate error to guest Gavin Shan
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Gavin Shan @ 2016-02-26  0:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

These patches are based on the series that adds EEH support for VFs,
which is pending merge: https://patchwork.ozlabs.org/patch/581315/

This series fixes a few issues in the previous patchset:

   * The error handlers provided by the vfio-pci driver shouldn't be called.
     Otherwise, the guest is simply killed.
   * When a partial hotplug happens during error recovery, we shouldn't
     remove the passed-through devices. Otherwise, the guest is left in
     an undefined state.
   * When errors are detected on the PF PE, hold any VF PE that has been
     passed through to the guest until the recovery on the PF PE is done.

Gavin Shan (3):
  powerpc/eeh: Don't propagate error to guest
  powerpc/eeh: Don't remove passed VFs
  powerpc/eeh: Synchronize recovery in host/guest

 arch/powerpc/kernel/eeh.c        | 11 +++++++++++
 arch/powerpc/kernel/eeh_driver.c | 13 ++++++++-----
 2 files changed, 19 insertions(+), 5 deletions(-)

-- 
2.1.0


* [PATCH 1/3] powerpc/eeh: Don't propagate error to guest
  2016-02-26  0:04 [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF Gavin Shan
@ 2016-02-26  0:04 ` Gavin Shan
  2016-02-26  0:04 ` [PATCH 2/3] powerpc/eeh: Don't remove passed VFs Gavin Shan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2016-02-26  0:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

When an EEH error hits the parent PE of PEs that have been passed
through to a guest, the error is propagated to the guest domain and
the VFIO driver's error handlers are called. That is incorrect: an
error in the host domain shouldn't be propagated to guests and
affect them.

This adds one more restriction on calling the EEH error handlers:
if the PE has been passed through to a guest, the error handlers
aren't called.
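
As a rough illustration only (not part of the patch), the guard can be
modelled in user space as below. The model_* names, the struct layouts,
and the pass_dev_cnt counter are assumptions made for this sketch; they
are not the kernel's definitions of struct eeh_dev, struct eeh_pe, or
eeh_pe_passed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct model_pe {
	int pass_dev_cnt;	/* assumed: > 0 while the PE is passed to a guest */
};

struct model_edev {
	struct model_pe *pe;
	bool removed;		/* models eeh_dev_removed() */
	int handler_calls;	/* counts how often the driver handler ran */
};

/* Assumed semantics of eeh_pe_passed(): true when a guest owns the PE. */
static bool model_pe_passed(const struct model_pe *pe)
{
	return pe ? pe->pass_dev_cnt > 0 : false;
}

/*
 * Models the early return added to each eeh_report_*() callback.
 * Returns true if the driver's error handler was invoked.
 */
static bool model_report_error(struct model_edev *edev)
{
	assert(edev != NULL);

	if (edev->removed || model_pe_passed(edev->pe))
		return false;	/* skip: the handler must not run */

	edev->handler_calls++;
	return true;
}
```

With this model, a device on a host-owned PE has its handler called,
while a device on a passed-through PE is skipped and its handler count
stays at zero.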

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index c0fe7a6..6c59de8 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -195,7 +195,7 @@ static void *eeh_report_error(void *data, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev))
+	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
 		return NULL;
 	dev->error_state = pci_channel_io_frozen;
 
@@ -237,7 +237,7 @@ static void *eeh_report_mmio_enabled(void *data, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev))
+	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
 		return NULL;
 
 	driver = eeh_pcid_get(dev);
@@ -277,7 +277,7 @@ static void *eeh_report_reset(void *data, void *userdata)
 	enum pci_ers_result rc, *res = userdata;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev))
+	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
 		return NULL;
 	dev->error_state = pci_channel_io_normal;
 
@@ -336,7 +336,7 @@ static void *eeh_report_resume(void *data, void *userdata)
 	bool was_in_error;
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev))
+	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
 		return NULL;
 	dev->error_state = pci_channel_io_normal;
 
@@ -375,7 +375,7 @@ static void *eeh_report_failure(void *data, void *userdata)
 	struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
 	struct pci_driver *driver;
 
-	if (!dev || eeh_dev_removed(edev))
+	if (!dev || eeh_dev_removed(edev) || eeh_pe_passed(edev->pe))
 		return NULL;
 	dev->error_state = pci_channel_io_perm_failure;
 
-- 
2.1.0


* [PATCH 2/3] powerpc/eeh: Don't remove passed VFs
  2016-02-26  0:04 [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF Gavin Shan
  2016-02-26  0:04 ` [PATCH 1/3] powerpc/eeh: Don't propagate error to guest Gavin Shan
@ 2016-02-26  0:04 ` Gavin Shan
  2016-02-26  0:04 ` [PATCH 3/3] powerpc/eeh: Synchronize recovery in host/guest Gavin Shan
  2016-03-02  1:04 ` [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF Russell Currey
  3 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2016-02-26  0:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

When a partial hotplug is performed as part of error recovery on the PF,
the VFs that are bound to the vfio-pci driver are hotplugged as well.
That's not allowed.

This checks whether the VF PE has been passed through. If it has, we
leave the VF in place instead of removing it.
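
A minimal sketch of the resulting decision, covering only the conditions
visible in the hunk below and nothing else in eeh_rmv_device(). The
struct and function names are hypothetical, and the "partial" field is
an assumed stand-in for the "removed" condition in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the removal decision; names are illustrative. */
struct hotplug_case {
	bool has_driver;		/* a driver is bound to the device */
	bool partial;			/* models the 'removed' condition */
	bool pe_passed;			/* models eeh_pe_passed(edev->pe) */
	bool driver_handles_eeh;	/* driver has error_detected + slot_reset */
};

/* Returns true if the device would go on to be removed. */
static bool model_should_remove(const struct hotplug_case *c)
{
	assert(c != NULL);

	if (c->has_driver && c->partial) {
		if (c->pe_passed)
			return false;	/* new check: keep passed-through VFs */
		if (c->driver_handles_eeh)
			return false;	/* existing check: driver recovers itself */
	}
	return true;
}
```

In this model a passed-through VF with a bound driver survives a partial
hotplug even when that driver provides no EEH-aware handlers.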

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh_driver.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 6c59de8..fb6207d 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -455,6 +455,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	if (driver) {
 		eeh_pcid_put(dev);
 		if (removed &&
+		    eeh_pe_passed(edev->pe))
+			return NULL;
+		if (removed &&
 		    driver->err_handler &&
 		    driver->err_handler->error_detected &&
 		    driver->err_handler->slot_reset)
-- 
2.1.0


* [PATCH 3/3] powerpc/eeh: Synchronize recovery in host/guest
  2016-02-26  0:04 [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF Gavin Shan
  2016-02-26  0:04 ` [PATCH 1/3] powerpc/eeh: Don't propagate error to guest Gavin Shan
  2016-02-26  0:04 ` [PATCH 2/3] powerpc/eeh: Don't remove passed VFs Gavin Shan
@ 2016-02-26  0:04 ` Gavin Shan
  2016-03-02  1:03   ` Russell Currey
  2016-03-02  1:04 ` [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF Russell Currey
  3 siblings, 1 reply; 7+ messages in thread
From: Gavin Shan @ 2016-02-26  0:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: mpe, Gavin Shan

When SRIOV VFs are passed through to a guest, we may encounter an EEH
error on the PF. In this case, the VF PEs are put into the frozen state.
The error could be reported to the guest before it's captured by the
host, which means the guest could attempt to recover the errors on the
VFs before the host gets a chance to recover the errors on the PF. The
VFs won't be recovered successfully.

This enforces the recovery order for the above case: the recovery on
the child PE in the guest is held until the recovery on the parent PE
in the host is completed.
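
The ordering rule can be sketched in user space as follows. This is an
illustration under stated assumptions, not the kernel code: the SKETCH_*
flag and state values are arbitrary placeholders rather than the
kernel's constants, and the final return stands in for the platform
query that the real function performs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag and state values; the kernel's constants may differ. */
#define SKETCH_PE_ISOLATED	(1 << 0)
#define SKETCH_PE_RECOVERING	(1 << 1)
#define SKETCH_PE_REMOVED	(1 << 2)

#define SKETCH_STATE_NORMAL	0
#define SKETCH_STATE_UNAVAIL	1

/* Hypothetical, reduced stand-in for struct eeh_pe. */
struct sketch_pe {
	int state;
	struct sketch_pe *parent;
};

/*
 * Reports "temporarily unavailable" for a child PE while its parent is
 * frozen or being recovered, so the caller retries later.
 */
static int sketch_pe_get_state(const struct sketch_pe *pe)
{
	assert(pe != NULL);

	if (pe->parent &&
	    !(pe->state & SKETCH_PE_REMOVED) &&
	    (pe->parent->state & (SKETCH_PE_ISOLATED | SKETCH_PE_RECOVERING)))
		return SKETCH_STATE_UNAVAIL;

	/* Placeholder: the real function would query the platform here. */
	return SKETCH_STATE_NORMAL;
}
```

A guest polling the VF PE keeps seeing the unavailable state while the
PF PE is isolated or recovering, and only proceeds once the parent's
flags are cleared.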

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index fd9c782..42bd546 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1541,6 +1541,17 @@ int eeh_pe_get_state(struct eeh_pe *pe)
 	if (!eeh_ops || !eeh_ops->get_state)
 		return -ENOENT;
 
+	/*
+	 * If the parent PE, which is owned by host kernel, is experiencing
+	 * error recovery. We should return temporarily unavailable PE state
+	 * so that the recovery on guest side is suspended until the error
+	 * recovery is completed on host side.
+	 */
+	if (pe->parent &&
+	    !(pe->state & EEH_PE_REMOVED) &&
+	    (pe->parent->state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)))
+		return EEH_PE_STATE_UNAVAIL;
+
 	result = eeh_ops->get_state(pe, NULL);
 	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
 	dma_en = !!(result & EEH_STATE_DMA_ENABLED);
-- 
2.1.0


* Re: [PATCH 3/3] powerpc/eeh: Synchronize recovery in host/guest
  2016-02-26  0:04 ` [PATCH 3/3] powerpc/eeh: Synchronize recovery in host/guest Gavin Shan
@ 2016-03-02  1:03   ` Russell Currey
  2016-03-02  2:13     ` Gavin Shan
  0 siblings, 1 reply; 7+ messages in thread
From: Russell Currey @ 2016-03-02  1:03 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev

On Fri, 2016-02-26 at 11:04 +1100, Gavin Shan wrote:
> When passing through SRIOV VFs to guest, we possibly encounter EEH
> error on PF. In this case, the VF PEs are put into frozen state.
> The error could be reported to guest before it's captured by the
> host. That means the guest could attempt to recover errors on VFs
> before host gets chance to recover errors on PFs. The VFs won't be
> recovered successfully.
> 
> This enforces the recovery order for above case: the recovery on
> child PE in guest is hold until the recovery on parent PE in host
> is completed.
> 
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/eeh.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index fd9c782..42bd546 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -1541,6 +1541,17 @@ int eeh_pe_get_state(struct eeh_pe *pe)
>  	if (!eeh_ops || !eeh_ops->get_state)
>  		return -ENOENT;
>  
> +	/*
> +	 * If the parent PE, which is owned by host kernel, is experiencing
> +	 * error recovery. We should return temporarily unavailable PE state
> +	 * so that the recovery on guest side is suspended until the error
> +	 * recovery is completed on host side.
> +	 */

Hi Gavin,

I think this could be worded a little better.  For example:

/*
 * If the parent PE is owned by the host kernel and is undergoing
 * error recovery, we should return the PE state as temporarily
 * unavailable so that the error recovery on the guest is suspended
 * until the recovery completes on the host.
 */

> +	if (pe->parent &&
> +	    !(pe->state & EEH_PE_REMOVED) &&
> +	    (pe->parent->state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)))
> +		return EEH_PE_STATE_UNAVAIL;
> +
>  	result = eeh_ops->get_state(pe, NULL);
>  	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
>  	dma_en = !!(result & EEH_STATE_DMA_ENABLED);


* Re: [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF
  2016-02-26  0:04 [PATCH 0/3] powerpc/eeh: Enhancement to EEH for VF Gavin Shan
                   ` (2 preceding siblings ...)
  2016-02-26  0:04 ` [PATCH 3/3] powerpc/eeh: Synchronize recovery in host/guest Gavin Shan
@ 2016-03-02  1:04 ` Russell Currey
  3 siblings, 0 replies; 7+ messages in thread
From: Russell Currey @ 2016-03-02  1:04 UTC (permalink / raw)
  To: Gavin Shan, linuxppc-dev

On Fri, 2016-02-26 at 11:04 +1100, Gavin Shan wrote:
> Those patches are based on the series of patches supporting EEH for VF,
> which is pending for merging: https://patchwork.ozlabs.org/patch/581315/
> 
> This series of patches fixes couple of issue that resides in previous
> patchset:
> 
>    * The error handlers provided by vfio-pci driver shouldn't be called.
>      Otherwise, the guest is simply killed.
>    * When we have partially hoplug in error recovery, we shouldn't remove
>      those passed-through devices. Otherwise, the guest will be brought
>      to undefined situation.
>    * When we have errors detected on PF PE, hold VF PE that has been passed
>      through to guest until the recovery on PF PE is done
> 
> Gavin Shan (3):
>   powerpc/eeh: Don't propagate error to guest
>   powerpc/eeh: Don't remove passed VFs
>   powerpc/eeh: Synchronize recovery in host/guest
> 
>  arch/powerpc/kernel/eeh.c        | 11 +++++++++++
>  arch/powerpc/kernel/eeh_driver.c | 13 ++++++++-----
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 

For the whole series (incorporating my comments on patch 3):

Reviewed-by: Russell Currey <ruscur@russell.cc>


* Re: [PATCH 3/3] powerpc/eeh: Synchronize recovery in host/guest
  2016-03-02  1:03   ` Russell Currey
@ 2016-03-02  2:13     ` Gavin Shan
  0 siblings, 0 replies; 7+ messages in thread
From: Gavin Shan @ 2016-03-02  2:13 UTC (permalink / raw)
  To: Russell Currey; +Cc: Gavin Shan, linuxppc-dev

On Wed, Mar 02, 2016 at 12:03:20PM +1100, Russell Currey wrote:
>On Fri, 2016-02-26 at 11:04 +1100, Gavin Shan wrote:
>> When passing through SRIOV VFs to guest, we possibly encounter EEH
>> error on PF. In this case, the VF PEs are put into frozen state.
>> The error could be reported to guest before it's captured by the
>> host. That means the guest could attempt to recover errors on VFs
>> before host gets chance to recover errors on PFs. The VFs won't be
>> recovered successfully.
>> 
>> This enforces the recovery order for above case: the recovery on
>> child PE in guest is hold until the recovery on parent PE in host
>> is completed.
>> 
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/eeh.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>> 
>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index fd9c782..42bd546 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -1541,6 +1541,17 @@ int eeh_pe_get_state(struct eeh_pe *pe)
>>  	if (!eeh_ops || !eeh_ops->get_state)
>>  		return -ENOENT;
>>  
>> +	/*
>> +	 * If the parent PE, which is owned by host kernel, is experiencing
>> +	 * error recovery. We should return temporarily unavailable PE state
>> +	 * so that the recovery on guest side is suspended until the error
>> +	 * recovery is completed on host side.
>> +	 */
>
>Hi Gavin,
>
>I think this could be worded a little better.  For example:
>
>/*
> * If the parent PE is owned by the host kernel and is undergoing
> * error recovery, we should return the PE state as temporarily
> * unavailable so that the error recovery on the guest is suspended
> * until the recovery completes on the host.
> */
>

Yes, it will be integrated into v2. Thanks for the review.

>> +	if (pe->parent &&
>> +	    !(pe->state & EEH_PE_REMOVED) &&
>> +	    (pe->parent->state & (EEH_PE_ISOLATED | EEH_PE_RECOVERING)))
>> +		return EEH_PE_STATE_UNAVAIL;
>> +
>>  	result = eeh_ops->get_state(pe, NULL);
>>  	rst_active = !!(result & EEH_STATE_RESET_ACTIVE);
>>  	dma_en = !!(result & EEH_STATE_DMA_ENABLED);
>

Thanks,
Gavin
