All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
@ 2015-01-18 22:47 Gavin Shan
  2015-01-20  9:28 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Gavin Shan @ 2015-01-18 22:47 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Gavin Shan

On pseries platform, the EEH reset backend pseries_eeh_reset() can
be called in atomic context as follows. For this case, we should
call udelay() instead of msleep() to avoid context switching.

     drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
     drivers/pci/pci.c::pci_set_pcie_reset_state()
     arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
     arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index a6c7e19..67623a3 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
  */
 static int pseries_eeh_reset(struct eeh_pe *pe, int option)
 {
-	int config_addr;
-	int ret;
+	int config_addr, delay, ret;
 
 	/* Figure out PE address */
 	config_addr = pe->config_addr;
@@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
 	/* We need reset hold or settlement delay */
 	if (option == EEH_RESET_FUNDAMENTAL ||
 	    option == EEH_RESET_HOT)
-		msleep(EEH_PE_RST_HOLD_TIME);
+		delay = EEH_PE_RST_HOLD_TIME;
+	else
+		delay = EEH_PE_RST_SETTLE_TIME;
+
+	if (in_atomic())
+		udelay(delay * 1000);
 	else
-		msleep(EEH_PE_RST_SETTLE_TIME);
+		msleep(delay);
 
 	return ret;
 }
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-18 22:47 [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required Gavin Shan
@ 2015-01-20  9:28 ` Benjamin Herrenschmidt
  2015-01-20 22:56   ` Gavin Shan
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2015-01-20  9:28 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev

On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
> On pseries platform, the EEH reset backend pseries_eeh_reset() can
> be called in atomic context as follows. For this case, we should
> call udelay() instead of msleep() to avoid context switching.
> 
>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()

It's not acceptable to introduce multi-millisecond delays at interrupt
time. In fact, we should generally not use udelay in such context.

I understand that this is an exceptional error handling case but it's
still not right.

Are there many other users of pci_set_pcie_reset_state() at interrupt
time ? Can we have a discussion with the PCI folks as to whether that
should be legal or not ?

I'm tempted to require that it's made illegal.

Ben.

> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index a6c7e19..67623a3 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>   */
>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>  {
> -	int config_addr;
> -	int ret;
> +	int config_addr, delay, ret;
>  
>  	/* Figure out PE address */
>  	config_addr = pe->config_addr;
> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>  	/* We need reset hold or settlement delay */
>  	if (option == EEH_RESET_FUNDAMENTAL ||
>  	    option == EEH_RESET_HOT)
> -		msleep(EEH_PE_RST_HOLD_TIME);
> +		delay = EEH_PE_RST_HOLD_TIME;
> +	else
> +		delay = EEH_PE_RST_SETTLE_TIME;
> +
> +	if (in_atomic())
> +		udelay(delay * 1000);
>  	else
> -		msleep(EEH_PE_RST_SETTLE_TIME);
> +		msleep(delay);
>  
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-20  9:28 ` Benjamin Herrenschmidt
@ 2015-01-20 22:56   ` Gavin Shan
  2015-01-20 23:53     ` Gavin Shan
  0 siblings, 1 reply; 11+ messages in thread
From: Gavin Shan @ 2015-01-20 22:56 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, wenxiong, Gavin Shan

On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>> be called in atomic context as follows. For this case, we should
>> call udelay() instead of msleep() to avoid context switching.
>> 
>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>
>It's not acceptable to introduce multi-millisecond delays at interrupt
>time. In fact, we should generally not use udelay in such context.
>
>I understand that this is an exceptional error handling case but it's
>still not right.
>

Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
works in atomic context is expected to be completed as soon as possible.

>Are there many other users of pci_set_pcie_reset_state() at interrupt
>time ? Can we have a discussion with the PCI folks as to whether that
>should be legal or not ?
>
>I'm tempted to require that it's made illegal.

Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
repository. For now, IPR driver is the only one call this function in atomic
context. 

Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
to call pci_set_pcie_reset_state() in atomic context. If it's the case,
I'm afraid Wendy has to change IPR driver to replace the reset timer with
something else (e.g. workqueue).

Thanks,
Gavin

>
>Ben.
>
>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> index a6c7e19..67623a3 100644
>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>   */
>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>  {
>> -	int config_addr;
>> -	int ret;
>> +	int config_addr, delay, ret;
>>  
>>  	/* Figure out PE address */
>>  	config_addr = pe->config_addr;
>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>  	/* We need reset hold or settlement delay */
>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>  	    option == EEH_RESET_HOT)
>> -		msleep(EEH_PE_RST_HOLD_TIME);
>> +		delay = EEH_PE_RST_HOLD_TIME;
>> +	else
>> +		delay = EEH_PE_RST_SETTLE_TIME;
>> +
>> +	if (in_atomic())
>> +		udelay(delay * 1000);
>>  	else
>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>> +		msleep(delay);
>>  
>>  	return ret;
>>  }
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-20 22:56   ` Gavin Shan
@ 2015-01-20 23:53     ` Gavin Shan
  2015-01-23  3:50       ` Gavin Shan
  0 siblings, 1 reply; 11+ messages in thread
From: Gavin Shan @ 2015-01-20 23:53 UTC (permalink / raw)
  To: Gavin Shan; +Cc: wenxiong, linuxppc-dev

On Wed, Jan 21, 2015 at 09:56:07AM +1100, Gavin Shan wrote:
>On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>>> be called in atomic context as follows. For this case, we should
>>> call udelay() instead of msleep() to avoid context switching.
>>> 
>>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>>
>>It's not acceptable to introduce multi-millisecond delays at interrupt
>>time. In fact, we should generally not use udelay in such context.
>>
>>I understand that this is an exceptional error handling case but it's
>>still not right.
>>
>
>Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
>works in atomic context is expected to be completed as soon as possible.
>
>>Are there many other users of pci_set_pcie_reset_state() at interrupt
>>time ? Can we have a discussion with the PCI folks as to whether that
>>should be legal or not ?
>>
>>I'm tempted to require that it's made illegal.
>
>Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
>Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
>repository. For now, IPR driver is the only one call this function in atomic
>context. 
>
>Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
>to call pci_set_pcie_reset_state() in atomic context. If it's the case,
>I'm afraid Wendy has to change IPR driver to replace the reset timer with
>something else (e.g. workqueue).
>

Another way is to drop the hold/settle delays for pcibios_set_pcie_reset_state()
and IPR relies on the timer interval to cover them. Wendy, could you please
let me know if it would work for you or not?

    Start reset timer;
    Timer expires, assert the reset. Restart the timer with assert delay;
    Timer expires, deassert the reset. Restart the timer with settle delay;
    Timer expires, ready for subsequent works;

Thanks,
Gavin

>>
>>Ben.
>>
>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> index a6c7e19..67623a3 100644
>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>   */
>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>  {
>>> -	int config_addr;
>>> -	int ret;
>>> +	int config_addr, delay, ret;
>>>  
>>>  	/* Figure out PE address */
>>>  	config_addr = pe->config_addr;
>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>  	/* We need reset hold or settlement delay */
>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>  	    option == EEH_RESET_HOT)
>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>> +	else
>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>> +
>>> +	if (in_atomic())
>>> +		udelay(delay * 1000);
>>>  	else
>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>> +		msleep(delay);
>>>  
>>>  	return ret;
>>>  }
>>
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-20 23:53     ` Gavin Shan
@ 2015-01-23  3:50       ` Gavin Shan
  2015-01-24  9:57         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Gavin Shan @ 2015-01-23  3:50 UTC (permalink / raw)
  To: Gavin Shan; +Cc: wenxiong, linuxppc-dev

On Wed, Jan 21, 2015 at 10:53:38AM +1100, Gavin Shan wrote:
>On Wed, Jan 21, 2015 at 09:56:07AM +1100, Gavin Shan wrote:
>>On Tue, Jan 20, 2015 at 10:28:16AM +0100, Benjamin Herrenschmidt wrote:
>>>On Mon, 2015-01-19 at 09:47 +1100, Gavin Shan wrote:
>>>> On pseries platform, the EEH reset backend pseries_eeh_reset() can
>>>> be called in atomic context as follows. For this case, we should
>>>> call udelay() instead of msleep() to avoid context switching.
>>>> 
>>>>      drivers/scsi/ipr.c::ipr_reset_slot_reset_done()
>>>>      drivers/pci/pci.c::pci_set_pcie_reset_state()
>>>>      arch/powerpc/kernel/eeh.c::pcibios_set_pcie_reset_state()
>>>>      arch/powerpc/platforms/pseries/eeh_pseries.c::pseries_eeh_reset()
>>>
>>>It's not acceptable to introduce multi-millisecond delays at interrupt
>>>time. In fact, we should generally not use udelay in such context.
>>>
>>>I understand that this is an exceptional error handling case but it's
>>>still not right.
>>>
>>
>>Yes, I agree it's unsafe to udelay for multi-milliseconds as the queued
>>works in atomic context is expected to be completed as soon as possible.
>>
>>>Are there many other users of pci_set_pcie_reset_state() at interrupt
>>>time ? Can we have a discussion with the PCI folks as to whether that
>>>should be legal or not ?
>>>
>>>I'm tempted to require that it's made illegal.
>>
>>Currently, there are 2 drivers calling this function: IPR and misc/genwqe.
>>Also, VFIO would call this function for IBM and Mellanox adapters in PowerKVM
>>repository. For now, IPR driver is the only one call this function in atomic
>>context. 
>>
>>Sure, I'll send one email to confirm with PCI folks. I guess it's illegal
>>to call pci_set_pcie_reset_state() in atomic context. If it's the case,
>>I'm afraid Wendy has to change IPR driver to replace the reset timer with
>>something else (e.g. workqueue).
>>
>
>Another way is to drop the hold/settle delays for pcibios_set_pcie_reset_state()
>and IPR relies on the timer interval to cover them. Wendy, could you please
>let me know if it would work for you or not?
>
>    Start reset timer;
>    Timer expires, assert the reset. Restart the timer with assert delay;
>    Timer expires, deassert the reset. Restart the timer with settle delay;
>    Timer expires, ready for subsequent works;
>

Brian King is the author of pci_set_pcie_reset_state() and as he said, it was
expected to be called in atomic context. So I'm going to come up with another
approach as above, caller of pci_set_pcie_reset_state() should take corresponding 
delays as what IPR driver is doing.

Messages from Brian for reference:

| The API has changed. I wrote the pci_set_pcie_reset_state API originally.
| When this API was put in place initially, it was perfectly legal to call it
| from an atomic context. Can you clarify why we have to have the delay in the
| pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
| callers to ensure a proper delay is used? This was always the case until
| recently.

So please ignore this patch and I'll send another one, which is implemented in
above approach.

Thanks,
Gavin

>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>>>> ---
>>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> index a6c7e19..67623a3 100644
>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>>   */
>>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>  {
>>>> -	int config_addr;
>>>> -	int ret;
>>>> +	int config_addr, delay, ret;
>>>>  
>>>>  	/* Figure out PE address */
>>>>  	config_addr = pe->config_addr;
>>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>  	/* We need reset hold or settlement delay */
>>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>>  	    option == EEH_RESET_HOT)
>>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>>> +	else
>>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>>> +
>>>> +	if (in_atomic())
>>>> +		udelay(delay * 1000);
>>>>  	else
>>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>>> +		msleep(delay);
>>>>  
>>>>  	return ret;
>>>>  }
>>>
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-23  3:50       ` Gavin Shan
@ 2015-01-24  9:57         ` Benjamin Herrenschmidt
  2015-01-26 23:36           ` Brian King
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2015-01-24  9:57 UTC (permalink / raw)
  To: Gavin Shan; +Cc: linuxppc-dev, wenxiong, Brian J King

On Fri, 2015-01-23 at 14:50 +1100, Gavin Shan wrote:

> Messages from Brian for reference:
> 
> | The API has changed. I wrote the pci_set_pcie_reset_state API originally.
> | When this API was put in place initially, it was perfectly legal to call it
> | from an atomic context. Can you clarify why we have to have the delay in the
> | pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
> | callers to ensure a proper delay is used? This was always the case until
> | recently.
> 
> So please ignore this patch and I'll send another one, which is implemented in
> above approach.

I still think it's not a great idea to allow that API to be called in
atomic context.

Brian, the reset API for PCIe involves FW calls which might have to do
a bunch of stuff under the hood, including potentially significant
delays.

For example, under OPAL (and I suppose PowerVM), if doing a PERST, the
API calls will loop until the link is back up, at least when "releasing"
the reset line.

I wouldn't be surprised if on x86, similar kinds of ACPI calls are
needed which may not be the best thing to do in atomic context.

I don't see any specific performance issues with issuing resets, so I
would strongly advocate for changing the API requirements instead so
that it's called from a task context.

Cheers,
Ben.



> Thanks,
> Gavin
> 
> >>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> >>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
> >>>> ---
> >>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
> >>>>  1 file changed, 8 insertions(+), 4 deletions(-)
> >>>> 
> >>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>>> index a6c7e19..67623a3 100644
> >>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> >>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
> >>>>   */
> >>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
> >>>>  {
> >>>> -	int config_addr;
> >>>> -	int ret;
> >>>> +	int config_addr, delay, ret;
> >>>>  
> >>>>  	/* Figure out PE address */
> >>>>  	config_addr = pe->config_addr;
> >>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
> >>>>  	/* We need reset hold or settlement delay */
> >>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
> >>>>  	    option == EEH_RESET_HOT)
> >>>> -		msleep(EEH_PE_RST_HOLD_TIME);
> >>>> +		delay = EEH_PE_RST_HOLD_TIME;
> >>>> +	else
> >>>> +		delay = EEH_PE_RST_SETTLE_TIME;
> >>>> +
> >>>> +	if (in_atomic())
> >>>> +		udelay(delay * 1000);
> >>>>  	else
> >>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
> >>>> +		msleep(delay);
> >>>>  
> >>>>  	return ret;
> >>>>  }
> >>>
> >>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-24  9:57         ` Benjamin Herrenschmidt
@ 2015-01-26 23:36           ` Brian King
  2015-01-27  4:31             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Brian King @ 2015-01-26 23:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Gavin Shan; +Cc: wenxiong, linuxppc-dev, Brian J King

On 01/24/2015 03:57 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2015-01-23 at 14:50 +1100, Gavin Shan wrote:
> 
>> Messages from Brian for reference:
>>
>> | The API has changed. I wrote the pci_set_pcie_reset_state API originally.
>> | When this API was put in place initially, it was perfectly legal to call it
>> | from an atomic context. Can you clarify why we have to have the delay in the
>> | pci_set_pcie_reset_state function? Shouldn't it be the responsibility of the
>> | callers to ensure a proper delay is used? This was always the case until
>> | recently.
>>
>> So please ignore this patch and I'll send another one, which is implemented in
>> above approach.
> 
> I still think it's not a great idea to allow that API to be called in
> atomic context.

That was the entire purpose of the API though. If a driver doesn't need to
do the reset in atomic context, why bother having separate assert / deassert
APIs and just have an API that does the reset, delay, and deassert?


> Brian, the reset API for PCIe involves FW calls which might have to do
> a bunch of stuff under the hood, including potentially significant
> delays.
> 
> For example, under OPAL (and I suppose PowerVM), if doing a PERST, the
> API calls will loop until the link is back up, at least when "releasing"
> the reset line.

Under PowerVM, this maps to the ibm,set-slot-reset. According to PAPR+ V2.7,
R1-7.2.4-5:

For RTAS calls which do not allow the Status of -2 (Busy), there may be “rare” instances where an
anomaly may occur and the call may take longer than a “very short period of time.” In these cases, the call
must complete within 250 microseconds. Otherwise, a hardware error response must be given.


> I wouldn't be surprised if on x86, similar kinds of ACPI calls are
> needed which may not be the best thing to do in atomic context.

x86 hasn't wired up the function yet, so we don't have any code
to look at here. In fact, Power is the only architecture that has
wired up this API at all, all other architectures will return -EINVAL
if it is called.


> I don't see any specific performance issues with issuing resets, so I
> would strongly advocate for changing the API requirements instead so
> that it's called from a task context.

To set some context, this function is only used by ipr for some old
broken adapters. These are adapters that are not supported on p8,
so will never show up under OPAL, only PowerVM. I'm fine with looking
at alternatives for the future, but I can't say I'm too excited about
changing the calling requirements for an API that has been around
for many years. Particularly given that this code is only needed for
these old adapters. If its difficult to implement this for OPAL without
noticeable delays, could we just return -EINVAL for this function on OPAL?,
since its not needed there today anyway.

Thanks,

Brian

> 
> Cheers,
> Ben.
> 
> 
> 
>> Thanks,
>> Gavin
>>
>>>>>> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
>>>>>> Tested-by: Wen Xiong<wenxiong@linux.vnet.ibm.com>
>>>>>> ---
>>>>>>  arch/powerpc/platforms/pseries/eeh_pseries.c | 12 ++++++++----
>>>>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>> index a6c7e19..67623a3 100644
>>>>>> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
>>>>>> @@ -503,8 +503,7 @@ static int pseries_eeh_get_state(struct eeh_pe *pe, int *state)
>>>>>>   */
>>>>>>  static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>>>  {
>>>>>> -	int config_addr;
>>>>>> -	int ret;
>>>>>> +	int config_addr, delay, ret;
>>>>>>  
>>>>>>  	/* Figure out PE address */
>>>>>>  	config_addr = pe->config_addr;
>>>>>> @@ -528,9 +527,14 @@ static int pseries_eeh_reset(struct eeh_pe *pe, int option)
>>>>>>  	/* We need reset hold or settlement delay */
>>>>>>  	if (option == EEH_RESET_FUNDAMENTAL ||
>>>>>>  	    option == EEH_RESET_HOT)
>>>>>> -		msleep(EEH_PE_RST_HOLD_TIME);
>>>>>> +		delay = EEH_PE_RST_HOLD_TIME;
>>>>>> +	else
>>>>>> +		delay = EEH_PE_RST_SETTLE_TIME;
>>>>>> +
>>>>>> +	if (in_atomic())
>>>>>> +		udelay(delay * 1000);
>>>>>>  	else
>>>>>> -		msleep(EEH_PE_RST_SETTLE_TIME);
>>>>>> +		msleep(delay);
>>>>>>  
>>>>>>  	return ret;
>>>>>>  }
>>>>>
>>>>>
> 
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-26 23:36           ` Brian King
@ 2015-01-27  4:31             ` Benjamin Herrenschmidt
  2015-01-27 22:58               ` Brian King
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2015-01-27  4:31 UTC (permalink / raw)
  To: Brian King; +Cc: wenxiong, linuxppc-dev, Gavin Shan, Brian J King

On Mon, 2015-01-26 at 17:36 -0600, Brian King wrote:
> To set some context, this function is only used by ipr for some old
> broken adapters. These are adapters that are not supported on p8,
> so will never show up under OPAL, only PowerVM. I'm fine with looking
> at alternatives for the future, but I can't say I'm too excited about
> changing the calling requirements for an API that has been around
> for many years. Particularly given that this code is only needed for
> these old adapters. If its difficult to implement this for OPAL without
> noticeable delays, could we just return -EINVAL for this function on OPAL?,
> since its not needed there today anyway.

Because it's needed for other things nowadays afaik, though IPR is the only one
that needs this to be done at interrupt time...

In fact, even with IPR and the existing call, how do you wait for the link to come
back for a PERST ? That can take a while...

Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-27  4:31             ` Benjamin Herrenschmidt
@ 2015-01-27 22:58               ` Brian King
  2015-01-27 23:58                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Brian King @ 2015-01-27 22:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, wenxiong, Gavin Shan, Brian J King

On 01/26/2015 10:31 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2015-01-26 at 17:36 -0600, Brian King wrote:
>> To set some context, this function is only used by ipr for some old
>> broken adapters. These are adapters that are not supported on p8,
>> so will never show up under OPAL, only PowerVM. I'm fine with looking
>> at alternatives for the future, but I can't say I'm too excited about
>> changing the calling requirements for an API that has been around
>> for many years. Particularly given that this code is only needed for
>> these old adapters. If its difficult to implement this for OPAL without
>> noticeable delays, could we just return -EINVAL for this function on OPAL?,
>> since its not needed there today anyway.
> 
> Because it's needed for other things nowadays afaik, though IPR is the only one
> that needs this to be done at interrupt time...

I'd argue we are our own worst enemy here really. The new user is EEH code.
I don't see a huge reason that code would need to use this exact same API.

> In fact, even with IPR and the existing call, how do you wait for the link to come
> back for a PERST ? That can take a while...

Basically, I assert reset, delay for 1/2 second via a timer interrupt, deassert reset,
delay for 2 seconds via another timer interrupt, then proceed with adapter initialization.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-27 22:58               ` Brian King
@ 2015-01-27 23:58                 ` Benjamin Herrenschmidt
  2015-01-30  1:37                   ` Gavin Shan
  0 siblings, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2015-01-27 23:58 UTC (permalink / raw)
  To: Brian King; +Cc: linuxppc-dev, wenxiong, Gavin Shan, Brian J King

On Tue, 2015-01-27 at 16:58 -0600, Brian King wrote:
> I'd argue we are our own worst enemy here really. The new user is EEH
> code.
> I don't see a huge reason that code would need to use this exact same
> API.
> 
> > In fact, even with IPR and the existing call, how do you wait for
> the link to come
> > back for a PERST ? That can take a while...
> 
> Basically, I assert reset, delay for 1/2 second via a timer interrupt,
> deassert reset,
> delay for 2 seconds via another timer interrupt, then proceed with
> adapter initialization.

I'm surprised that even works properly... for example in the case of
PERST we need to mask various error traps before asserting and unmask
them when the link comes up (such as the surprise link down error), I
don't see an opportunity in that scheme for FW to do that latter...

Ben.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required
  2015-01-27 23:58                 ` Benjamin Herrenschmidt
@ 2015-01-30  1:37                   ` Gavin Shan
  0 siblings, 0 replies; 11+ messages in thread
From: Gavin Shan @ 2015-01-30  1:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Brian King, linuxppc-dev, wenxiong, Gavin Shan, Brian J King

On Wed, Jan 28, 2015 at 10:58:42AM +1100, Benjamin Herrenschmidt wrote:
>On Tue, 2015-01-27 at 16:58 -0600, Brian King wrote:
>> I'd argue we are our own worst enemy here really. The new user is EEH
>> code.
>> I don't see a huge reason that code would need to use this exact same
>> API.
>> 
>> > In fact, even with IPR and the existing call, how do you wait for
>> the link to come
>> > back for a PERST ? That can take a while...
>> 
>> Basically, I assert reset, delay for 1/2 second via a timer interrupt,
>> deassert reset,
>> delay for 2 seconds via another timer interrupt, then proceed with
>> adapter initialization.
>
>I'm surprised that even works properly... for example in the case of
>PERST we need to mask various error traps before asserting and unmask
>them when the link comes up (such as the surprise link down error), I
>don't see an opportunity in that scheme for FW to do that latter...
>

The FW perhaps does more than what's supposed to do for assert, and
less than what's supposed to do for deassert, but need confirm with
FW developers later. In this case, the link should come up in 1/2
second, which is really short. Otherwise, FW need implement deassert
function in blocking mode to wait the link to come back, which forces
the API to be called in non-atomic context. I'll check with FW developer
later on this.

I guess we have to change the API to be called in non-atomic context in
long run. For now, Wendy is waiting for the fix and port it to RHEL7.1.
I also sent another alternative patch, which was verified by Wendy.
I'm not sure if it's reasonable to include the following patch and
change driver's code to call this API under non-atomic context later
as proceeding enhancement?

https://patchwork.ozlabs.org/patch/432065/

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-01-30  1:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-18 22:47 [PATCH] powerpc/pseries: Avoid context switch in EEH reset if required Gavin Shan
2015-01-20  9:28 ` Benjamin Herrenschmidt
2015-01-20 22:56   ` Gavin Shan
2015-01-20 23:53     ` Gavin Shan
2015-01-23  3:50       ` Gavin Shan
2015-01-24  9:57         ` Benjamin Herrenschmidt
2015-01-26 23:36           ` Brian King
2015-01-27  4:31             ` Benjamin Herrenschmidt
2015-01-27 22:58               ` Brian King
2015-01-27 23:58                 ` Benjamin Herrenschmidt
2015-01-30  1:37                   ` Gavin Shan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.