linux-kernel.vger.kernel.org archive mirror
* [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
@ 2010-05-18 21:33 Darren Hart
  2010-05-18 21:52 ` Brian King
  0 siblings, 1 reply; 23+ messages in thread
From: Darren Hart @ 2010-05-18 21:33 UTC (permalink / raw)
  To: lkml, 
  Cc: Will Schmidt, Thomas Gleixner, Jan-Bernd Themann,
	Nivedita Singhvi, Brian King, Michael Ellerman, Doug Maxey

From ad81664794e33d785f533c5edee37aaba20dd92d Mon Sep 17 00:00:00 2001
From: Darren Hart <dvhltc@us.ibm.com>
Date: Tue, 18 May 2010 11:07:13 -0700
Subject: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

The underlying hardware is edge triggered but presented by XICS as level
triggered. The edge triggered interrupts are not reissued after masking. This
is not a problem in mainline which does not mask the interrupt (relying on the
EOI mechanism instead). The threaded interrupts in PREEMPT_RT do mask the
interrupt, and can lose interrupts that occurred while masked, resulting in a
hung ethernet interface.

The receive handler simply calls napi_schedule(), as such, there is no
significant additional overhead in making this non-threaded, since we either
wakeup the threaded irq handler to call napi_schedule(), or just call
napi_schedule() directly to wakeup the softirqs.  As the receive handler is
lockless, there is no need to convert any of the ehea spinlock_t's to
atomic_spinlock_t's.
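
For reference, the receive handler in question is roughly the following
(paraphrased from drivers/net/ehea/ehea_main.c of that era; treat it as a
sketch rather than a verbatim copy):

static irqreturn_t ehea_recv_irq_handler(int irq, void *param)
{
	struct ehea_port_res *pr = param;

	/* no locks taken, no hardware touched: just kick the NAPI softirq */
	napi_schedule(&pr->napi);

	return IRQ_HANDLED;
}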

Without this patch, a simple scp file copy loop would fail quickly (usually
seconds). We have over two hours of sustained scp activity with the patch
applied.

Credit goes to Will Schmidt for lots of instrumentation and tracing which
clarified the scenario and to Thomas Gleixner for the incredibly simple
solution.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jan-Bernd Themann <themann@de.ibm.com>
Cc: Nivedita Singhvi <niv@us.ibm.com>
Cc: Brian King <bjking1@us.ibm.com>
Cc: Michael Ellerman <ellerman@au1.ibm.com>
Cc: Doug Maxey <doug.maxey@us.ibm.com>
---
 drivers/net/ehea/ehea_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 977c3d3..2c53df2 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -1263,7 +1263,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
 			 "%s-queue%d", dev->name, i);
 		ret = ibmebus_request_irq(pr->eq->attr.ist1,
 					  ehea_recv_irq_handler,
-					  IRQF_DISABLED, pr->int_send_name,
+					  IRQF_DISABLED | IRQF_NODELAY, pr->int_send_name,
 					  pr);
 		if (ret) {
 			ehea_error("failed registering irq for ehea_queue "
-- 
1.5.5.6
-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-18 21:33 [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY) Darren Hart
@ 2010-05-18 21:52 ` Brian King
  2010-05-18 22:19   ` Nivedita Singhvi
  2010-05-18 22:22   ` Darren Hart
  0 siblings, 2 replies; 23+ messages in thread
From: Brian King @ 2010-05-18 21:52 UTC (permalink / raw)
  To: dvhltc
  Cc: linux-kernel, Will Schmidt, Thomas Gleixner, Jan-Bernd Themann,
	niv, Michael Ellerman, Doug Maxey, linuxppc-dev

Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...

-Brian


On 05/18/2010 04:33 PM, dvhltc@linux.vnet.ibm.com wrote:
>>From ad81664794e33d785f533c5edee37aaba20dd92d Mon Sep 17 00:00:00 2001
> From: Darren Hart <dvhltc@us.ibm.com>
> Date: Tue, 18 May 2010 11:07:13 -0700
> Subject: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
> 
> The underlying hardware is edge triggered but presented by XICS as level
> triggered. The edge triggered interrupts are not reissued after masking. This
> is not a problem in mainline which does not mask the interrupt (relying on the
> EOI mechanism instead). The threaded interrupts in PREEMPT_RT do mask the
> interrupt, and can lose interrupts that occurred while masked, resulting in a
> hung ethernet interface.
> 
> The receive handler simply calls napi_schedule(), as such, there is no
> significant additional overhead in making this non-threaded, since we either
> wakeup the threaded irq handler to call napi_schedule(), or just call
> napi_schedule() directly to wakeup the softirqs.  As the receive handler is
> lockless, there is no need to convert any of the ehea spinlock_t's to
> atomic_spinlock_t's.
> 
> Without this patch, a simple scp file copy loop would fail quickly (usually
> seconds). We have over two hours of sustained scp activity with the patch
> applied.
> 
> Credit goes to Will Schmidt for lots of instrumentation and tracing which
> clarified the scenario and to Thomas Gleixner for the incredibly simple
> solution.
> 
> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com>
> Cc: Thomas Gleixner <tglx@linuxtronix.de>
> Cc: Jan-Bernd Themann <themann@de.ibm.com>
> Cc: Nivedita Singhvi <niv@us.ibm.com>
> Cc: Brian King <bjking1@us.ibm.com>
> Cc: Michael Ellerman <ellerman@au1.ibm.com>
> Cc: Doug Maxey <doug.maxey@us.ibm.com>
> ---
>  drivers/net/ehea/ehea_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
> index 977c3d3..2c53df2 100644
> --- a/drivers/net/ehea/ehea_main.c
> +++ b/drivers/net/ehea/ehea_main.c
> @@ -1263,7 +1263,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
>  			 "%s-queue%d", dev->name, i);
>  		ret = ibmebus_request_irq(pr->eq->attr.ist1,
>  					  ehea_recv_irq_handler,
> -					  IRQF_DISABLED, pr->int_send_name,
> +					  IRQF_DISABLED | IRQF_NODELAY, pr->int_send_name,
>  					  pr);
>  		if (ret) {
>  			ehea_error("failed registering irq for ehea_queue "


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-18 21:52 ` Brian King
@ 2010-05-18 22:19   ` Nivedita Singhvi
  2010-05-18 22:22   ` Darren Hart
  1 sibling, 0 replies; 23+ messages in thread
From: Nivedita Singhvi @ 2010-05-18 22:19 UTC (permalink / raw)
  To: Brian King
  Cc: dvhltc, linux-kernel, Will Schmidt, Thomas Gleixner,
	Jan-Bernd Themann, Michael Ellerman, Doug Maxey, linuxppc-dev

Brian King wrote:
> Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...

Yep, this is RT only.

thanks,
Nivedita


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-18 21:52 ` Brian King
  2010-05-18 22:19   ` Nivedita Singhvi
@ 2010-05-18 22:22   ` Darren Hart
  2010-05-19  1:25     ` Michael Ellerman
  1 sibling, 1 reply; 23+ messages in thread
From: Darren Hart @ 2010-05-18 22:22 UTC (permalink / raw)
  To: Brian King
  Cc: dvhltc, linux-kernel, Will Schmidt, Thomas Gleixner,
	Jan-Bernd Themann, niv, Michael Ellerman, Doug Maxey,
	linuxppc-dev

On 05/18/2010 02:52 PM, Brian King wrote:
> Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...

Yes, it basically says "don't make this handler threaded".
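
A minimal sketch of what that means inside the RT interrupt core (an
assumption about the -rt patch internals of that era, with a hypothetical
helper name, not mainline code):

/* sketch: how the RT core decides whether to thread a handler */
irqreturn_t run_irq_action(struct irqaction *action, int irq, void *dev_id)
{
	if (action->flags & IRQF_NODELAY)
		/* behave like mainline: run the handler in hard irq context */
		return action->handler(irq, dev_id);

	/* default on RT: defer to the per-interrupt kernel thread, which
	 * calls action->handler() later in schedulable context */
	wake_irq_thread(action);	/* hypothetical helper */
	return IRQ_HANDLED;
}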

--
Darren

>
> -Brian
>
>
> On 05/18/2010 04:33 PM, dvhltc@linux.vnet.ibm.com wrote:
>> > From ad81664794e33d785f533c5edee37aaba20dd92d Mon Sep 17 00:00:00 2001
>> From: Darren Hart<dvhltc@us.ibm.com>
>> Date: Tue, 18 May 2010 11:07:13 -0700
>> Subject: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
>>
>> The underlying hardware is edge triggered but presented by XICS as level
>> triggered. The edge triggered interrupts are not reissued after masking. This
>> is not a problem in mainline which does not mask the interrupt (relying on the
>> EOI mechanism instead). The threaded interrupts in PREEMPT_RT do mask the
>> interrupt, and can lose interrupts that occurred while masked, resulting in a
>> hung ethernet interface.
>>
>> The receive handler simply calls napi_schedule(), as such, there is no
>> significant additional overhead in making this non-threaded, since we either
>> wakeup the threaded irq handler to call napi_schedule(), or just call
>> napi_schedule() directly to wakeup the softirqs.  As the receive handler is
>> lockless, there is no need to convert any of the ehea spinlock_t's to
>> atomic_spinlock_t's.
>>
>> Without this patch, a simple scp file copy loop would fail quickly (usually
>> seconds). We have over two hours of sustained scp activity with the patch
>> applied.
>>
>> Credit goes to Will Schmidt for lots of instrumentation and tracing which
>> clarified the scenario and to Thomas Gleixner for the incredibly simple
>> solution.
>>
>> Signed-off-by: Darren Hart<dvhltc@us.ibm.com>
>> Acked-by: Will Schmidt<will_schmidt@vnet.ibm.com>
>> Cc: Thomas Gleixner<tglx@linuxtronix.de>
>> Cc: Jan-Bernd Themann<themann@de.ibm.com>
>> Cc: Nivedita Singhvi<niv@us.ibm.com>
>> Cc: Brian King<bjking1@us.ibm.com>
>> Cc: Michael Ellerman<ellerman@au1.ibm.com>
>> Cc: Doug Maxey<doug.maxey@us.ibm.com>
>> ---
>>   drivers/net/ehea/ehea_main.c |    2 +-
>>   1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
>> index 977c3d3..2c53df2 100644
>> --- a/drivers/net/ehea/ehea_main.c
>> +++ b/drivers/net/ehea/ehea_main.c
>> @@ -1263,7 +1263,7 @@ static int ehea_reg_interrupts(struct net_device *dev)
>>   			 "%s-queue%d", dev->name, i);
>>   		ret = ibmebus_request_irq(pr->eq->attr.ist1,
>>   					  ehea_recv_irq_handler,
>> -					  IRQF_DISABLED, pr->int_send_name,
>> +					  IRQF_DISABLED | IRQF_NODELAY, pr->int_send_name,
>>   					  pr);
>>   		if (ret) {
>>   			ehea_error("failed registering irq for ehea_queue "
>
>


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-18 22:22   ` Darren Hart
@ 2010-05-19  1:25     ` Michael Ellerman
  2010-05-19 14:16       ` Darren Hart
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2010-05-19  1:25 UTC (permalink / raw)
  To: Darren Hart
  Cc: Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Thomas Gleixner, Doug Maxey, linuxppc-dev


On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> On 05/18/2010 02:52 PM, Brian King wrote:
> > Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...
> 
> Yes, it basically says "don't make this handler threaded".

That is a good fix for EHEA, but the threaded handling is still broken
for anything else that is edge triggered, isn't it?

The result of the discussion about two years ago on this was that we
needed a custom flow handler for XICS on RT.

Apart from the issue of losing interrupts there is also the fact that
masking on the XICS requires an RTAS call which takes a global lock.

cheers





* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-19  1:25     ` Michael Ellerman
@ 2010-05-19 14:16       ` Darren Hart
  2010-05-19 14:38         ` Thomas Gleixner
  2010-05-20  1:28         ` Michael Ellerman
  0 siblings, 2 replies; 23+ messages in thread
From: Darren Hart @ 2010-05-19 14:16 UTC (permalink / raw)
  To: michael
  Cc: Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Thomas Gleixner, Doug Maxey, linuxppc-dev

On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
>> On 05/18/2010 02:52 PM, Brian King wrote:
>>> Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...
>>
>> Yes, it basically says "don't make this handler threaded".
>
> That is a good fix for EHEA, but the threaded handling is still broken
> for anything else that is edge triggered isn't it?

No, I don't believe so. Edge triggered interrupts that are reported as 
edge triggered interrupts will use the edge handler (which was the 
approach Sebastien took to make this work back in 2008). Since XICS 
presents all interrupts as Level Triggered, they use the fasteoi path.
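
For context, the XICS setup hands every source to the level/fasteoi flow,
roughly like this (from memory of arch/powerpc/platforms/pseries/xics.c of
that era; names and details are approximate):

static int xics_host_map(struct irq_host *h, unsigned int virq,
			 irq_hw_number_t hw)
{
	/* every interrupt, whatever the underlying wiring, is treated as
	 * level and gets the fasteoi flow handler */
	irq_to_desc(virq)->status |= IRQ_LEVEL;
	set_irq_chip_and_handler(virq, xics_irq_chip /* direct or LPAR variant */,
				 handle_fasteoi_irq);
	return 0;
}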

>
> The result of the discussion about two years ago on this was that we
> needed a custom flow handler for XICS on RT.

I'm still not clear on why the ultimate solution wasn't to have XICS 
report edge triggered as edge triggered. Probably some complexity of the 
entire power stack that I am ignorant of.

> Apart from the issue of loosing interrupts there is also the fact that
> masking on the XICS requires an RTAS call which takes a global lock.

Right, one of many reasons why we felt this was the right fix. The other 
is that there is no real additional overhead in running this as 
non-threaded since the receive handler is so short (just napi_schedule()).

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-19 14:16       ` Darren Hart
@ 2010-05-19 14:38         ` Thomas Gleixner
  2010-05-19 21:08           ` Thomas Gleixner
  2010-05-20  1:32           ` Michael Ellerman
  2010-05-20  1:28         ` Michael Ellerman
  1 sibling, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2010-05-19 14:38 UTC (permalink / raw)
  To: Darren Hart
  Cc: michael, Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Doug Maxey, linuxppc-dev

On Wed, 19 May 2010, Darren Hart wrote:

> On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> > > On 05/18/2010 02:52 PM, Brian King wrote:
> > > > Is IRQF_NODELAY something specific to the RT kernel? I don't see it in
> > > > mainline...
> > > 
> > > Yes, it basically says "don't make this handler threaded".
> > 
> > That is a good fix for EHEA, but the threaded handling is still broken
> > for anything else that is edge triggered isn't it?
> 
> No, I don't believe so. Edge triggered interrupts that are reported as edge
> triggered interrupts will use the edge handler (which was the approach
> Sebastien took to make this work back in 2008). Since XICS presents all
> interrupts as Level Triggered, they use the fasteoi path.

I wonder whether the XICS interrupts which are edge type can be
identified from the irq_desc->flags. Then we could avoid the masking
for those in the fasteoi_handler in general.
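
A rough sketch of that idea (assumptions about the -rt fasteoi flow and the
2.6.33-era desc/chip fields, not actual code): skip the mask/unmask for
descriptors whose recorded trigger type is edge, so a threaded handler
cannot lose edges.

static void handle_fasteoi_irq_rt_sketch(unsigned int irq, struct irq_desc *desc)
{
	/* trigger type as recorded in the descriptor status bits */
	int edge = desc->status & IRQ_TYPE_EDGE_BOTH;

	if (!edge)
		desc->chip->mask(irq);	/* level sources stay masked while the thread runs */

	/* ... mark the action pending and wake the irq thread ... */

	desc->chip->eoi(irq);

	/* the thread unmasks level sources when the handler has finished;
	 * edge sources were never masked, so no edges can be lost */
}
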

> > 
> > The result of the discussion about two years ago on this was that we
> > needed a custom flow handler for XICS on RT.
> 
> I'm still not clear on why the ultimate solution wasn't to have XICS report
> edge triggered as edge triggered. Probably some complexity of the entire power
> stack that I am ignorant of.
> 
> > Apart from the issue of loosing interrupts there is also the fact that
> > masking on the XICS requires an RTAS call which takes a global lock.

Right, I'd love to avoid that but with real level interrupts we'd run
into an interrupt storm. Though another solution would be to issue the
EOI after the threaded handler finished, that'd work as well, but
needs testing.

> Right, one of may reasons why we felt this was the right fix. The other is
> that there is no real additional overhead in running this as non-threaded
> since the receive handler is so short (just napi_schedule()).

Yes, in the case at hand it's the right thing to do, as we avoid
another wakeup/context switch.

Thanks,

	tglx


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-19 14:38         ` Thomas Gleixner
@ 2010-05-19 21:08           ` Thomas Gleixner
  2010-05-20  1:34             ` Michael Ellerman
  2010-05-20  1:32           ` Michael Ellerman
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2010-05-19 21:08 UTC (permalink / raw)
  To: Darren Hart
  Cc: michael, Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Doug Maxey, linuxppc-dev

On Wed, 19 May 2010, Thomas Gleixner wrote:
> > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > edge triggered as edge triggered. Probably some complexity of the entire power
> > stack that I am ignorant of.
> > 
> > > Apart from the issue of loosing interrupts there is also the fact that
> > > masking on the XICS requires an RTAS call which takes a global lock.
> 
> Right, I'd love to avoid that but with real level interrupts we'd run
> into an interrupt storm. Though another solution would be to issue the
> EOI after the threaded handler finished, that'd work as well, but
> needs testing.

Thought more about that. The case at hand (ehea) is nasty:

The driver does _NOT_ disable the rx interrupt in the card in the rx
interrupt handler - for whatever reason.

 So even in mainline you get repeated rx interrupts when packets
 arrive while napi is processing the poll, which is suboptimal at
 least. In fact it is counterproductive as the whole purpose of NAPI
 is to _NOT_ get interrupts for consecutive incoming packets while the
 poll is active.

Most of the other network drivers do:

      rx_irq()
	disable rx interrupts on card
	napi_schedule()

Now when the napi poll is done (no more packets available) then the
driver reenables the rx interrupt on the card.
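
For illustration, a minimal sketch of that pattern in generic code (the
foo_* names are placeholders for a hypothetical driver, not ehea):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct foo_priv {
	struct napi_struct napi;
	/* ... device state ... */
};

/* provided elsewhere by the hypothetical driver */
static void foo_disable_rx_irq(struct foo_priv *priv);
static void foo_enable_rx_irq(struct foo_priv *priv);
static int foo_process_rx(struct foo_priv *priv, int budget);

static irqreturn_t foo_rx_irq(int irq, void *dev_id)
{
	struct foo_priv *priv = dev_id;

	/* tell the NIC to stop raising rx interrupts ... */
	foo_disable_rx_irq(priv);
	/* ... and let NAPI poll the queue from softirq context */
	napi_schedule(&priv->napi);
	return IRQ_HANDLED;
}

static int foo_poll(struct napi_struct *napi, int budget)
{
	struct foo_priv *priv = container_of(napi, struct foo_priv, napi);
	int done = foo_process_rx(priv, budget);

	if (done < budget) {
		/* queue drained: leave polling mode and re-arm the rx irq */
		napi_complete(napi);
		foo_enable_rx_irq(priv);
	}
	return done;
}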

Thanks,

	tglx


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-19 14:16       ` Darren Hart
  2010-05-19 14:38         ` Thomas Gleixner
@ 2010-05-20  1:28         ` Michael Ellerman
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2010-05-20  1:28 UTC (permalink / raw)
  To: Darren Hart
  Cc: Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Thomas Gleixner, Doug Maxey, linuxppc-dev,
	Milton Miller


On Wed, 2010-05-19 at 07:16 -0700, Darren Hart wrote:
> On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> >> On 05/18/2010 02:52 PM, Brian King wrote:
> >>> Is IRQF_NODELAY something specific to the RT kernel? I don't see it in mainline...
> >>
> >> Yes, it basically says "don't make this handler threaded".
> >
> > That is a good fix for EHEA, but the threaded handling is still broken
> > for anything else that is edge triggered isn't it?
> 
> No, I don't believe so. Edge triggered interrupts that are reported as 
> edge triggered interrupts will use the edge handler (which was the 
> approach Sebastien took to make this work back in 2008). Since XICS 
> presents all interrupts as Level Triggered, they use the fasteoi path.

But that's the point, no interrupts on XICS are reported as edge, even
if they are actually edge somewhere deep in the hardware. I don't think
we have any reliable way to determine what is what.

> > The result of the discussion about two years ago on this was that we
> > needed a custom flow handler for XICS on RT.
> 
> I'm still not clear on why the ultimate solution wasn't to have XICS 
> report edge triggered as edge triggered. Probably some complexity of the 
> entire power stack that I am ignorant of.

I'm not really sure either, but I think it's a case of a leaky
abstraction on the part of the hypervisor. Edge interrupts behave as
level as long as you handle the irq before EOI, but if you mask they
don't. But Milton's the expert on that.

> > Apart from the issue of loosing interrupts there is also the fact that
> > masking on the XICS requires an RTAS call which takes a global lock.
> 
> Right, one of may reasons why we felt this was the right fix. The other 
> is that there is no real additional overhead in running this as 
> non-threaded since the receive handler is so short (just napi_schedule()).

True. It's not a fix in general though. I'm worried that we're going to
see the exact same bug for MSI(-X) interrupts.

cheers





* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-19 14:38         ` Thomas Gleixner
  2010-05-19 21:08           ` Thomas Gleixner
@ 2010-05-20  1:32           ` Michael Ellerman
  2010-05-20  8:21             ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2010-05-20  1:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Doug Maxey, linuxppc-dev, Milton Miller


On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> On Wed, 19 May 2010, Darren Hart wrote:
> 
> > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> > > > On 05/18/2010 02:52 PM, Brian King wrote:
> > > > > Is IRQF_NODELAY something specific to the RT kernel? I don't see it in
> > > > > mainline...
> > > > 
> > > > Yes, it basically says "don't make this handler threaded".
> > > 
> > > That is a good fix for EHEA, but the threaded handling is still broken
> > > for anything else that is edge triggered isn't it?
> > 
> > No, I don't believe so. Edge triggered interrupts that are reported as edge
> > triggered interrupts will use the edge handler (which was the approach
> > Sebastien took to make this work back in 2008). Since XICS presents all
> > interrupts as Level Triggered, they use the fasteoi path.
> 
> I wonder whether the XICS interrupts which are edge type can be
> identified from the irq_desc->flags. Then we could avoid the masking
> for those in the fasteoi_handler in general.

I'm not sure they can be. I know on other similar HW we can detect LSI
vs MSI, but that's without the HV in the equation.

> > > The result of the discussion about two years ago on this was that we
> > > needed a custom flow handler for XICS on RT.
> > 
> > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > edge triggered as edge triggered. Probably some complexity of the entire power
> > stack that I am ignorant of.
> > 
> > > Apart from the issue of loosing interrupts there is also the fact that
> > > masking on the XICS requires an RTAS call which takes a global lock.
> 
> Right, I'd love to avoid that but with real level interrupts we'd run
> into an interrupt storm. Though another solution would be to issue the
> EOI after the threaded handler finished, that'd work as well, but
> needs testing.

Yeah I think that was the idea for the custom flow handler. We'd reset
the processor priority so we can take other interrupts (which the EOI
usually does for you), then do the actual EOI after the handler
finished.
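
Very roughly, that could look like the sketch below (hypothetical helper
names throughout; none of this exists as shown, it only illustrates the
split between "reopen the CPU now" and "EOI later"):

/* hard irq context: let further interrupts in, but do not EOI yet */
static void xics_fasteoi_rt_flow(unsigned int irq, struct irq_desc *desc)
{
	xics_reset_cppr();	/* hypothetical: restore CPU priority so other irqs can be taken */
	wake_irq_thread(desc);	/* hypothetical: hand the work to the per-irq thread */
}

/* called from the irq thread once the device handler has completed */
static void xics_fasteoi_rt_finish(unsigned int irq, struct irq_desc *desc)
{
	desc->chip->eoi(irq);	/* the deferred EOI */
}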

cheers



* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-19 21:08           ` Thomas Gleixner
@ 2010-05-20  1:34             ` Michael Ellerman
  2010-05-20  7:37               ` Jan-Bernd Themann
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2010-05-20  1:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Doug Maxey, linuxppc-dev


On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
> On Wed, 19 May 2010, Thomas Gleixner wrote:
> > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > stack that I am ignorant of.
> > > 
> > > > Apart from the issue of loosing interrupts there is also the fact that
> > > > masking on the XICS requires an RTAS call which takes a global lock.
> > 
> > Right, I'd love to avoid that but with real level interrupts we'd run
> > into an interrupt storm. Though another solution would be to issue the
> > EOI after the threaded handler finished, that'd work as well, but
> > needs testing.
> 
> Thought more about that. The case at hand (ehea) is nasty:
> 
> The driver does _NOT_ disable the rx interrupt in the card in the rx
> interrupt handler - for whatever reason.

Yeah I saw that, but I don't know why it's written that way. Perhaps
Jan-Bernd or Doug will chime in and enlighten us? :)

cheers





* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  1:34             ` Michael Ellerman
@ 2010-05-20  7:37               ` Jan-Bernd Themann
  2010-05-20  8:14                 ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Jan-Bernd Themann @ 2010-05-20  7:37 UTC (permalink / raw)
  To: michael
  Cc: Brian King, Doug Maxey, dvhltc, Darren Hart, linux-kernel,
	linuxppc-dev, niv, Thomas Gleixner, Will Schmidt

Hi,


Michael Ellerman <michael@ellerman.id.au> wrote on 20.05.2010 03:34:08:


> Subject:
>
> Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
>
> On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
> > On Wed, 19 May 2010, Thomas Gleixner wrote:
> > > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > > stack that I am ignorant of.
> > > >
> > > > > Apart from the issue of loosing interrupts there is also the fact that
> > > > > masking on the XICS requires an RTAS call which takes a global lock.
> > >
> > > Right, I'd love to avoid that but with real level interrupts we'd run
> > > into an interrupt storm. Though another solution would be to issue the
> > > EOI after the threaded handler finished, that'd work as well, but
> > > needs testing.
> >
> > Thought more about that. The case at hand (ehea) is nasty:
> >
> > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > interrupt handler - for whatever reason.
>
> Yeah I saw that, but I don't know why it's written that way. Perhaps
> Jan-Bernd or Doug will chime in and enlighten us? :)

From our perspective there is no need to disable interrupts for the RX side,
as the chip does not fire further interrupts until we tell the chip to do so
for a particular queue. We have multiple receive queues, each with its own
interrupt, so that the interrupts can arrive on multiple CPUs in parallel.
Interrupts are enabled again when we leave the NAPI poll function for the
corresponding receive queue.

Michael, does this answer your question?

Regards,
Jan-Bernd



* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  7:37               ` Jan-Bernd Themann
@ 2010-05-20  8:14                 ` Thomas Gleixner
  2010-05-20  9:05                   ` Jan-Bernd Themann
  2010-05-20 14:39                   ` Darren Hart
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2010-05-20  8:14 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: michael, Brian King, Doug Maxey, dvhltc, Darren Hart,
	linux-kernel, linuxppc-dev, niv, Will Schmidt

On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > Thought more about that. The case at hand (ehea) is nasty:
> > >
> > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > interrupt handler - for whatever reason.
> >
> > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > Jan-Bernd or Doug will chime in and enlighten us? :)
> 
> From our perspective there is no need to disable interrupts for the
> RX side as the chip does not fire further interrupts until we tell
> the chip to do so for a particular queue. We have multiple receive

The traces tell a different story though:

    ehea_recv_irq_handler()
      napi_reschedule()
    eoi()
    ehea_poll()
      ...
      ehea_recv_irq_handler()    <---------------- ???
        napi_reschedule()
      ...
      napi_complete()

Can't tell whether you can see the same behaviour in mainline, but I
don't see a reason why not.

> queues with an own interrupt each so that the interrupts can arrive
> on multiple CPUs in parallel.  Interrupts are enabled again when we
> leave the NAPI Poll function for the corresponding receive queue.

I can't see a piece of code which does that, but that's probably just
lack of detailed hardware knowledge on my side.

Thanks,

	tglx


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  1:32           ` Michael Ellerman
@ 2010-05-20  8:21             ` Thomas Gleixner
  2010-05-21  9:18               ` [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY), " Milton Miller
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2010-05-20  8:21 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Darren Hart, Brian King, Jan-Bernd Themann, dvhltc, linux-kernel,
	Will Schmidt, niv, Doug Maxey, linuxppc-dev, Milton Miller

On Thu, 20 May 2010, Michael Ellerman wrote:
> On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> > On Wed, 19 May 2010, Darren Hart wrote:
> > 
> > > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> 
> > > > The result of the discussion about two years ago on this was that we
> > > > needed a custom flow handler for XICS on RT.
> > > 
> > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > stack that I am ignorant of.
> > > 
> > > > Apart from the issue of loosing interrupts there is also the fact that
> > > > masking on the XICS requires an RTAS call which takes a global lock.
> > 
> > Right, I'd love to avoid that but with real level interrupts we'd run
> > into an interrupt storm. Though another solution would be to issue the
> > EOI after the threaded handler finished, that'd work as well, but
> > needs testing.
> 
> Yeah I think that was the idea for the custom flow handler. We'd reset
> the processor priority so we can take other interrupts (which the EOI
> usually does for you), then do the actual EOI after the handler
> finished.

That only works when the card does not issue new interrupts until the
EOI happens. If the EOI is only relevant for the interrupt controller,
then you are going to lose any edge which comes in before the EOI as
well.

Thanks,

	tglx



* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  8:14                 ` Thomas Gleixner
@ 2010-05-20  9:05                   ` Jan-Bernd Themann
  2010-05-20  9:19                     ` Thomas Gleixner
  2010-05-20 14:53                     ` Will Schmidt
  2010-05-20 14:39                   ` Darren Hart
  1 sibling, 2 replies; 23+ messages in thread
From: Jan-Bernd Themann @ 2010-05-20  9:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Brian King, Doug Maxey, Darren Hart, dvhltc, linux-kernel,
	linuxppc-dev, michael, niv, Will Schmidt


Hi Thomas

> Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
>
> On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > Thought more about that. The case at hand (ehea) is nasty:
> > > >
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > >
> > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > Jan-Bernd or Doug will chime in and enlighten us? :)
> >
> > From our perspective there is no need to disable interrupts for the
> > RX side as the chip does not fire further interrupts until we tell
> > the chip to do so for a particular queue. We have multiple receive
>
> The traces tell a different story though:
>
>     ehea_recv_irq_handler()
>       napi_reschedule()
>     eoi()
>     ehea_poll()
>       ...
>       ehea_recv_irq_handler()    <---------------- ???
>         napi_reschedule()
>       ...
>       napi_complete()
>
> Can't tell whether you can see the same behaviour in mainline, but I
> don't see a reason why not.

Is this the same interrupt we are seeing here, or do we see a second,
different interrupt popping up on the same CPU? As I said, with multiple
receive queues (if enabled) you can have multiple interrupts in parallel.

Please check if multiple queues are enabled. The following module parameter
is used for that:

MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");

You should also see the number of used HEA interrupts in /proc/interrupts.


>
> > queues with an own interrupt each so that the interrupts can arrive
> > on multiple CPUs in parallel.  Interrupts are enabled again when we
> > leave the NAPI Poll function for the corresponding receive queue.
>
> I can't see a piece of code which does that, but that's probably just
> lack of detailed hardware knowledge on my side.

If you mean the "re-enable" piece of code, it is not very obvious, you are
right. Interrupts are only generated if a particular register for our
completion queues is written. We do this in the following lines:

          ehea_reset_cq_ep(pr->recv_cq);
          ehea_reset_cq_ep(pr->send_cq);
          ehea_reset_cq_n1(pr->recv_cq);
          ehea_reset_cq_n1(pr->send_cq);

So this is an indirect way to ask for interrupts when new completions were
written to memory. We don't really disable/enable interrupts on the HEA
chip itself.

I think there are some mechanisms built into the HEA chip that should
prevent interrupts from getting lost. But that is something that is / was
completely hidden from us, so my skill is very limited there.

If more details are needed here we should involve the PHYP guys + eHEA HW
guys if not already done. Did anyone already talk to them?

Regards,
Jan-Bernd



* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  9:05                   ` Jan-Bernd Themann
@ 2010-05-20  9:19                     ` Thomas Gleixner
  2010-05-20 14:26                       ` Nivedita Singhvi
  2010-05-20 14:53                     ` Will Schmidt
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2010-05-20  9:19 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: Brian King, Doug Maxey, Darren Hart, dvhltc, linux-kernel,
	linuxppc-dev, michael, niv, Will Schmidt

Jan-Bernd,

On Thu, 20 May 2010, Jan-Bernd Themann wrote:

> 
> Hi Thomas
> 
> > Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
> >
> > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > >
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > > >
> > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > >
> > > From our perspective there is no need to disable interrupts for the
> > > RX side as the chip does not fire further interrupts until we tell
> > > the chip to do so for a particular queue. We have multiple receive
> >
> > The traces tell a different story though:
> >
> >     ehea_recv_irq_handler()
> >       napi_reschedule()
> >     eoi()
> >     ehea_poll()
> >       ...
> >       ehea_recv_irq_handler()    <---------------- ???
> >         napi_reschedule()
> >       ...
> >       napi_complete()
> >
> > Can't tell whether you can see the same behaviour in mainline, but I
> > don't see a reason why not.
> 
> Is this the same interrupt we are seeing here, or do we see a second other
> interrupt popping up on the same CPU? As I said, with multiple receive
> queues
> (if enabled) you can have multiple interrupts in parallel.

According to the traces it's the very same interrupt number.

> Pleaes check if multiple queues are enabled. The following module parameter
> is used for that:
> 
> MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0
> ");
> 
> you should also see the number of used HEA interrupts in /proc/interrupts

I leave that for Will and Darren, they have the hardware :)

> >
> > > queues with an own interrupt each so that the interrupts can arrive
> > > on multiple CPUs in parallel.  Interrupts are enabled again when we
> > > leave the NAPI Poll function for the corresponding receive queue.
> >
> > I can't see a piece of code which does that, but that's probably just
> > lack of detailed hardware knowledge on my side.
> 
> If you mean the "re-enable" piece of code, it is not very obvious,
> you are right.  Interrupts are only generated if a particular
> register for our completion queues is written. We do this in the
> following line:
> 
>           ehea_reset_cq_ep(pr->recv_cq);
>           ehea_reset_cq_ep(pr->send_cq);
>           ehea_reset_cq_n1(pr->recv_cq);
>           ehea_reset_cq_n1(pr->send_cq);
> 
> So this is in a way an indirect way to ask for interrupts when new
> completions were written to memory. We don't really disable/enable
> interrupts on the HEA chip itself.

Ah, ok. That's after the napi_complete which looks correct.
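
Roughly, the relevant tail of ehea_poll() then looks like this (paraphrased
from the driver, with one placeholder standing in for the cqe peek; treat
the details as approximate):

static int ehea_poll(struct napi_struct *napi, int budget)
{
	struct ehea_port_res *pr = container_of(napi, struct ehea_port_res, napi);
	int rx = ehea_proc_rwqes(napi->dev, pr, budget);

	while (rx < budget) {
		napi_complete(napi);

		/* re-arm: these CQ register writes ask the HEA for a new
		 * interrupt when further completions are written */
		ehea_reset_cq_ep(pr->recv_cq);
		ehea_reset_cq_ep(pr->send_cq);
		ehea_reset_cq_n1(pr->recv_cq);
		ehea_reset_cq_n1(pr->send_cq);

		/* close the race with completions that arrived before the
		 * re-arm: if any are pending, pull the poll back in */
		if (!ehea_pending_cqes(pr))	/* placeholder for the cqe peek */
			break;
		if (!napi_reschedule(napi))
			break;
		rx += ehea_proc_rwqes(napi->dev, pr, budget - rx);
	}
	return rx;
}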
 
> I think there are some mechanisms build in the HEA chip that should
> prevent that interrupts don't get lost. But that is something that
> is / was completely hidden from us, so my skill is very limited
> there.
>
> If more details are needed here we should involve the PHYP guys +
> eHEA HW guys if not already done. Did anyone already talk to them?

Will or Darren might have, but let's gather more information first
before we rack their nerves :)

Thanks,

	tglx


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  9:19                     ` Thomas Gleixner
@ 2010-05-20 14:26                       ` Nivedita Singhvi
  0 siblings, 0 replies; 23+ messages in thread
From: Nivedita Singhvi @ 2010-05-20 14:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan-Bernd Themann, Brian King, Doug Maxey, Darren Hart, dvhltc,
	linux-kernel, linuxppc-dev, michael, niv, Will Schmidt

Thomas Gleixner wrote:

>> Pleaes check if multiple queues are enabled. The following module parameter
>> is used for that:
>>
>> MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0
>> ");
>>
>> you should also see the number of used HEA interrupts in /proc/interrupts
> 
> I leave that for Will and Darren, they have the hardware :)

  16:     477477	...	XICS      Level     IPI
  17:        129 ...	XICS      Level     hvc_console
  18:          0 ...	XICS      Level     RAS_EPOW
  33:     139232 ...     XICS      Level     mlx4_core
256:          3 ...     XICS      Level     ehea_neq
259:          0 ...     XICS      Level     eth0-aff
260:    2082153 ...	XICS      Level     eth0-queue0
289:     119166 ...	XICS      Level     ipr
305:          0 ...	XICS      Level     ohci_hcd:usb2
306:          0 ...	XICS      Level     ohci_hcd:usb3
307:    2389839 ...	XICS      Level     ehci_hcd:usb1


Nope, multiple rx queues not enabled.

thanks,
Nivedita


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  8:14                 ` Thomas Gleixner
  2010-05-20  9:05                   ` Jan-Bernd Themann
@ 2010-05-20 14:39                   ` Darren Hart
  2010-05-20 14:45                     ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Darren Hart @ 2010-05-20 14:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jan-Bernd Themann, michael, Brian King, Doug Maxey, dvhltc,
	linux-kernel, linuxppc-dev, niv, Will Schmidt

On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
> On Thu, 20 May 2010, Jan-Bernd Themann wrote:
>>>> Thought more about that. The case at hand (ehea) is nasty:
>>>>
>>>> The driver does _NOT_ disable the rx interrupt in the card in the rx
>>>> interrupt handler - for whatever reason.
>>>
>>> Yeah I saw that, but I don't know why it's written that way. Perhaps
>>> Jan-Bernd or Doug will chime in and enlighten us? :)
>>
>>  From our perspective there is no need to disable interrupts for the
>> RX side as the chip does not fire further interrupts until we tell
>> the chip to do so for a particular queue. We have multiple receive
>
> The traces tell a different story though:
>
>      ehea_recv_irq_handler()
>        napi_reschedule()
>      eoi()
>      ehea_poll()
>        ...
>        ehea_recv_irq_handler()<---------------- ???
>          napi_reschedule()
>        ...
>        napi_complete()
>
> Can't tell whether you can see the same behaviour in mainline, but I
> don't see a reason why not.

I was going to suggest that because these are threaded handlers, perhaps 
they are rescheduled on a different CPU and then receive the interrupt 
for the other CPU/queue that Jan was mentioning.

But the handlers are affined, if I remember correctly, and we aren't
running with multiple receive queues. So we're back to the same question:
why are we seeing another irq? It comes in before napi_complete() and
therefore before the ehea_reset*() block of calls which does the
equivalent of re-enabling interrupts.

--
Darren

>
>> queues with an own interrupt each so that the interrupts can arrive
>> on multiple CPUs in parallel.  Interrupts are enabled again when we
>> leave the NAPI Poll function for the corresponding receive queue.
>
> I can't see a piece of code which does that, but that's probably just
> lack of detailed hardware knowledge on my side.
>
> Thanks,
>
> 	tglx


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20 14:39                   ` Darren Hart
@ 2010-05-20 14:45                     ` Thomas Gleixner
  2010-05-20 21:44                       ` Will Schmidt
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2010-05-20 14:45 UTC (permalink / raw)
  To: Darren Hart
  Cc: Jan-Bernd Themann, michael, Brian King, Doug Maxey, dvhltc,
	linux-kernel, linuxppc-dev, niv, Will Schmidt

On Thu, 20 May 2010, Darren Hart wrote:

> On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
> > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > > 
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > > > 
> > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > > 
> > >  From our perspective there is no need to disable interrupts for the
> > > RX side as the chip does not fire further interrupts until we tell
> > > the chip to do so for a particular queue. We have multiple receive
> > 
> > The traces tell a different story though:
> > 
> >      ehea_recv_irq_handler()
> >        napi_reschedule()
> >      eoi()
> >      ehea_poll()
> >        ...
> >        ehea_recv_irq_handler()<---------------- ???
> >          napi_reschedule()
> >        ...
> >        napi_complete()
> > 
> > Can't tell whether you can see the same behaviour in mainline, but I
> > don't see a reason why not.
> 
> I was going to suggest that because these are threaded handlers, perhaps they
> are rescheduled on a different CPU and then receive the interrupt for the
> other CPU/queue that Jan was mentioning.
> 
> But, the handlers are affined if I remember correctly, and we aren't running
> with multiple receive queues. So, we're back to the same question, why are we
> seeing another irq. It comes in before napi_complete() and therefor before the
> ehea_reset*() block of calls which do the equivalent of re-enabling
> interrupts.

Can you slap a few trace points into that driver with a stock mainline
kernel and verify that?

Thanks,

	tglx


* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  9:05                   ` Jan-Bernd Themann
  2010-05-20  9:19                     ` Thomas Gleixner
@ 2010-05-20 14:53                     ` Will Schmidt
  1 sibling, 0 replies; 23+ messages in thread
From: Will Schmidt @ 2010-05-20 14:53 UTC (permalink / raw)
  To: Jan-Bernd Themann
  Cc: Thomas Gleixner, Brian King, Doug Maxey, Darren Hart, dvhltc,
	linux-kernel, linuxppc-dev, michael, niv

On Thu, 2010-05-20 at 11:05 +0200, Jan-Bernd Themann wrote:
> Hi Thomas
> 
> > Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
> >
> > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > >
> > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > interrupt handler - for whatever reason.
> > > >
> > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > >
> > > From our perspective there is no need to disable interrupts for the
> > > RX side as the chip does not fire further interrupts until we tell
> > > the chip to do so for a particular queue. We have multiple receive
> >
> > The traces tell a different story though:
> >
> >     ehea_recv_irq_handler()
> >       napi_reschedule()
> >     eoi()
> >     ehea_poll()
> >       ...
> >       ehea_recv_irq_handler()    <---------------- ???
> >         napi_reschedule()
> >       ...
> >       napi_complete()
> >
> > Can't tell whether you can see the same behaviour in mainline, but I
> > don't see a reason why not.
> 
> Is this the same interrupt we are seeing here, or do we see a second other
> interrupt popping up on the same CPU? As I said, with multiple receive
> queues
> (if enabled) you can have multiple interrupts in parallel.

Same interrupt number (260).      Per the trace data, the first
ehea_recv_irq_handler (at 117.904525) was on cpu 0, the second (at
117.904689) was on cpu 1.



<...>-2180  [000]  117.904525: .ehea_recv_irq_handler: ENTER 0 c0000000e8bd08b0
<...>-2180  [000]  117.904527: .ehea_recv_irq_handler: napi_reschedule COMpleted  c0000000e8bd08b0
<...>-2180  [000]  117.904528: .ehea_recv_irq_handler: EXIT reschedule(1) 1 c0000000e8bd08b0
<...>-2180  [000]  117.904529: .xics_unmask_irq: xics: unmask virq 260 772
<...>-2180  [000]  117.904547: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
<...>-2180  [000]  117.904586: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:11416 status:0 5
<...>-2180  [000]  117.904602: .handle_fasteoi_irq: 260 8004000
<...>-2180  [000]  117.904603: .xics_mask_irq: xics: mask virq 260 772
<...>-2180  [000]  117.904634: .xics_mask_real_irq: xics: before: mask_real 772 status:0 5
<...>-2180  [000]  117.904668: .xics_mask_real_irq: xics: after: mask_real 772 status:0 ff
<...>-2180  [000]  117.904669: .handle_fasteoi_irq: pre-action:  260 8004100
<...>-2180  [000]  117.904671: .handle_fasteoi_irq: post-action: 260 8004100
<...>-2180  [000]  117.904672: .handle_fasteoi_irq: exit.  260 8004000
<...>-7     [000]  117.904681: .ehea_poll:  ENTER  1  c0000000e8bd08b0 poll_counter:0 force:0
<...>-7     [000]  117.904683: .ehea_proc_rwqes: ehea_check_cqe 0
<...>-2180  [001]  117.904689: .ehea_recv_irq_handler: ENTER 1 c0000000e8bd08b0
<...>-7     [000]  117.904690: .ehea_proc_rwqes: ehea_check_cqe 0
<...>-2180  [001]  117.904691: .ehea_recv_irq_handler: napi_reschedule inCOMplete  c0000000e8bd08b0
<...>-2180  [001]  117.904692: .ehea_recv_irq_handler: EXIT reschedule(0) 1 c0000000e8bd08b0
<...>-2180  [001]  117.904694: .xics_unmask_irq: xics: unmask virq 260 772
<...>-7     [000]  117.904702: .ehea_refill_rq2: ehea_refill_rq2
<...>-7     [000]  117.904703: .ehea_refill_rq_def: ehea_refill_rq_def
<...>-7     [000]  117.904704: .ehea_refill_rq3: ehea_refill_rq3
<...>-7     [000]  117.904705: .ehea_refill_rq_def: ehea_refill_rq_def
<...>-7     [000]  117.904706: .napi_complete: napi_complete: ENTER  state: 1  c0000000e8bd08b0
<...>-7     [000]  117.904707: .napi_complete: napi_complete: EXIT  state: 0  c0000000e8bd08b0
<...>-7     [000]  117.904710: .ehea_poll: EXIT  !cqe rx(2).   0  c0000000e8bd08b0
<...>-2180  [001]  117.904719: .xics_unmask_irq: xics: unmask virq pre-xive 260 772 0 status:0 ff
<...>-2180  [001]  117.904761: .xics_unmask_irq: xics: unmask virq post-xive 260 772 0 D:12705 status:0 5




> Pleaes check if multiple queues are enabled. The following module parameter
> is used for that:
> 
> MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0
> ");

No module parameters were used, should be plain old defaults.  

> 
> you should also see the number of used HEA interrupts in /proc/interrupts
> 

256:          1    0    0    0    0    0   0    0   XICS      Level     ehea_neq
259:          0    0    0    0    0    0   0    0   XICS      Level     eth0-aff
260:     361965    0    0    0    0    0   0    0   XICS      Level     eth0-queue0




> 
> >
> > > queues with an own interrupt each so that the interrupts can arrive
> > > on multiple CPUs in parallel.  Interrupts are enabled again when we
> > > leave the NAPI Poll function for the corresponding receive queue.
> >
> > I can't see a piece of code which does that, but that's probably just
> > lack of detailed hardware knowledge on my side.
> 
> If you mean the "re-enable" piece of code, it is not very obvious, you are
> right.
> Interrupts are only generated if a particular register for our completion
> queues
> is written. We do this in the following line:
> 
>           ehea_reset_cq_ep(pr->recv_cq);
>           ehea_reset_cq_ep(pr->send_cq);
>           ehea_reset_cq_n1(pr->recv_cq);
>           ehea_reset_cq_n1(pr->send_cq);
> 
> So this is in a way an indirect way to ask for interrupts when new
> completions were
> written to memory. We don't really disable/enable interrupts on the HEA
> chip itself.
> 
> I think there are some mechanisms build in the HEA chip that should prevent
> that
> interrupts don't get lost. But that is something that is / was completely
> hidden from
> us, so my skill is very limited there.
> 
> If more details are needed here we should involve the PHYP guys + eHEA HW
> guys if not
> already done. Did anyone already talk to them?
> 
> Regards,
> Jan-Bernd
> 




* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20 14:45                     ` Thomas Gleixner
@ 2010-05-20 21:44                       ` Will Schmidt
  0 siblings, 0 replies; 23+ messages in thread
From: Will Schmidt @ 2010-05-20 21:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Jan-Bernd Themann, michael, Brian King, Doug Maxey,
	dvhltc, linux-kernel, linuxppc-dev, niv

On Thu, 2010-05-20 at 16:45 +0200, Thomas Gleixner wrote:
> On Thu, 20 May 2010, Darren Hart wrote:
> 
> > On 05/20/2010 01:14 AM, Thomas Gleixner wrote:
> > > On Thu, 20 May 2010, Jan-Bernd Themann wrote:
> > > > > > Thought more about that. The case at hand (ehea) is nasty:
> > > > > > 
> > > > > > The driver does _NOT_ disable the rx interrupt in the card in the rx
> > > > > > interrupt handler - for whatever reason.
> > > > > 
> > > > > Yeah I saw that, but I don't know why it's written that way. Perhaps
> > > > > Jan-Bernd or Doug will chime in and enlighten us? :)
> > > > 
> > > >  From our perspective there is no need to disable interrupts for the
> > > > RX side as the chip does not fire further interrupts until we tell
> > > > the chip to do so for a particular queue. We have multiple receive
> > > 
> > > The traces tell a different story though:
> > > 
> > >      ehea_recv_irq_handler()
> > >        napi_reschedule()
> > >      eoi()
> > >      ehea_poll()
> > >        ...
> > >        ehea_recv_irq_handler()<---------------- ???
> > >          napi_reschedule()
> > >        ...
> > >        napi_complete()
> > > 
> > > Can't tell whether you can see the same behaviour in mainline, but I
> > > don't see a reason why not.
> > 
> > I was going to suggest that because these are threaded handlers, perhaps they
> > are rescheduled on a different CPU and then receive the interrupt for the
> > other CPU/queue that Jan was mentioning.
> > 
> > But, the handlers are affined if I remember correctly, and we aren't running
> > with multiple receive queues. So, we're back to the same question, why are we
> > seeing another irq. It comes in before napi_complete() and therefor before the
> > ehea_reset*() block of calls which do the equivalent of re-enabling
> > interrupts.
> 
> Can you slap a few trace points into that driver with a stock mainline
> kernel and verify that ?

2.6.33.4 (non-rt kernel) with similar trace_printk hooks in place...
Most data lumps look like so:

          <idle>-0     [000]  1097.685337: .handle_fasteoi_irq: ENTER 260 4000
          <idle>-0     [000]  1097.685339: .handle_fasteoi_irq: pre-action 260 4100
          <idle>-0     [000]  1097.685339: .ehea_recv_irq_handler:  ENTER  c0000000e8980700
          <idle>-0     [000]  1097.685340: .ehea_recv_irq_handler: napi_schedule ... c0000000e8980700
          <idle>-0     [000]  1097.685341: .ehea_recv_irq_handler: napi_schedule Calling __napi_schedule ... c0000000e8980700
          <idle>-0     [000]  1097.685342: .ehea_recv_irq_handler:  EXIT c0000000e8980700
          <idle>-0     [000]  1097.685343: .handle_fasteoi_irq: post-action 260 4100
          <idle>-0     [000]  1097.685344: .handle_fasteoi_irq: EXIT. 260 4000
          <idle>-0     [000]  1097.685346: .ehea_poll:  ENTER c0000000e8980700
          <idle>-0     [000]  1097.685352: .napi_complete: napi_complete: ENTER c0000000e8980700
          <idle>-0     [000]  1097.685352: .napi_complete: napi_complete: EXIT c0000000e8980700
          <idle>-0     [000]  1097.685355: .ehea_poll:  EXIT !cqe rx(1) c0000000e8980700

But I did see one like this, which shows an ehea_recv_irq_handler ENTER
within an ehea_poll ENTER (which I think is what you were expecting,
or wanted to verify).


          <idle>-0     [000]  1097.616261: .handle_fasteoi_irq: ENTER 260 4000
          <idle>-0     [000]  1097.616262: .handle_fasteoi_irq: pre-action 260 4100
*          <idle>-0     [000]  1097.616263: .ehea_recv_irq_handler:  ENTER  c0000000e8980700
          <idle>-0     [000]  1097.616264: .ehea_recv_irq_handler: napi_schedule ... c0000000e8980700
          <idle>-0     [000]  1097.616265: .ehea_recv_irq_handler: napi_schedule Calling __napi_schedule ... c0000000e8980700
          <idle>-0     [000]  1097.616265: .ehea_recv_irq_handler:  EXIT c0000000e8980700
          <idle>-0     [000]  1097.616266: .handle_fasteoi_irq: post-action 260 4100
          <idle>-0     [000]  1097.616268: .handle_fasteoi_irq: EXIT. 260 4000
*          <idle>-0     [000]  1097.616270: .ehea_poll:  ENTER c0000000e8980700
          <idle>-0     [000]  1097.616282: .handle_fasteoi_irq: ENTER 260 4000
          <idle>-0     [000]  1097.616283: .handle_fasteoi_irq: pre-action 260 4100
*          <idle>-0     [000]  1097.616284: .ehea_recv_irq_handler:  ENTER  c0000000e8980700
          <idle>-0     [000]  1097.616285: .ehea_recv_irq_handler: napi_schedule ... c0000000e8980700
          <idle>-0     [000]  1097.616286: .ehea_recv_irq_handler: napi_schedule NOT Calling __napi_schedule... c0000000e8980700
          <idle>-0     [000]  1097.616286: .ehea_recv_irq_handler:  EXIT c0000000e8980700
          <idle>-0     [000]  1097.616287: .handle_fasteoi_irq: post-action 260 4100
          <idle>-0     [000]  1097.616289: .handle_fasteoi_irq: EXIT. 260 4000
          <idle>-0     [000]  1097.616299: .napi_complete: napi_complete: ENTER c0000000e8980700
          <idle>-0     [000]  1097.616300: .napi_complete: napi_complete: EXIT c0000000e8980700
          <idle>-0     [000]  1097.616302: .ehea_poll: napi_reschedule COMpleted c0000000e8980700
          <idle>-0     [000]  1097.616303: .napi_complete: napi_complete: ENTER c0000000e8980700
          <idle>-0     [000]  1097.616304: .napi_complete: napi_complete: EXIT c0000000e8980700
*          <idle>-0     [000]  1097.616306: .ehea_poll:  EXIT !cqe rx(4) c0000000e8980700
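
The napi_complete / napi_reschedule lines map onto the tail of ehea_poll(),
which in the 2.6.33-era driver looks roughly like the sketch below. This is
simplified from memory rather than copied from the source: the ehea_reset_*()
names follow what was mentioned earlier in the thread, and
ehea_pending_work() / ehea_process_pending() are hypothetical stand-ins for
the driver's actual CQE checks and processing.

/* Simplified sketch of the end of ehea_poll(), not a verbatim copy.  The
 * point is the ordering: napi_complete() first, then the ehea_reset_*()
 * calls that effectively re-enable the interrupt, then a re-check for work
 * that raced in, with napi_reschedule() -- the "napi_reschedule COMpleted"
 * line in the trace -- if anything is found.
 */
static int ehea_poll_tail_sketch(struct napi_struct *napi, int budget,
				 int rx, struct ehea_port_res *pr)
{
	for (;;) {
		napi_complete(napi);

		/* re-arm the event/completion queues: the equivalent of
		 * re-enabling the interrupt */
		ehea_reset_cq_ep(pr->recv_cq);
		ehea_reset_cq_ep(pr->send_cq);
		ehea_reset_cq_n1(pr->recv_cq);
		ehea_reset_cq_n1(pr->send_cq);

		/* did anything slip in before napi_complete()?
		 * (hypothetical helper standing in for the CQE checks) */
		if (!ehea_pending_work(pr))
			return rx;

		/* someone else may already have rescheduled us */
		if (!napi_reschedule(napi))
			return rx;

		/* hypothetical helper for the actual rx/tx processing */
		rx += ehea_process_pending(pr, budget - rx);
	}
}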



Let me know if you want/need more, or a variation, etc.

Thanks, 
-Will


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY), Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-20  8:21             ` Thomas Gleixner
@ 2010-05-21  9:18               ` Milton Miller
  2010-09-20 14:26                 ` Jan-Bernd Themann
  0 siblings, 1 reply; 23+ messages in thread
From: Milton Miller @ 2010-05-21  9:18 UTC (permalink / raw)
  To: Thomas Gleixner, Jan-Bernd Themann
  Cc: Darren Hart, Brian King, Michael Ellerman, dvhltc, linux-kernel,
	Will Schmidt, niv, Doug Maxey, linuxppc-dev, Milton Miller

On Thu, 20 May 2010 at 10:21:36 +0200 (CEST) Thomas Gleixner  wrote:
> On Thu, 20 May 2010, Michael Ellerman wrote:
> > On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> > > On Wed, 19 May 2010, Darren Hart wrote:
> > >
> > > > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> >
> > > > > The result of the discussion about two years ago on this was that we
> > > > > needed a custom flow handler for XICS on RT.
> > > >
> > > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > > stack that I am ignorant of.
> > > >
> > > > > Apart from the issue of losing interrupts there is also the fact that
> > > > > masking on the XICS requires an RTAS call which takes a global lock.
> > >
> > > Right, I'd love to avoid that but with real level interrupts we'd run
> > > into an interrupt storm. Though another solution would be to issue the
> > > EOI after the threaded handler finished, that'd work as well, but
> > > needs testing.
> >
> > Yeah I think that was the idea for the custom flow handler. We'd reset
> > the processor priority so we can take other interrupts (which the EOI
> > usually does for you), then do the actual EOI after the handler
> > finished.
> 
> That only works when the card does not issue new interrupts until the
> EOI happens. If the EOI is only relevant for the interrupt controller,
> then you are going to lose any edge which comes in before the EOI as
> well.

Well, the real MSIs have an extra bit to allow the EOI to dally behind
the MMIO on another path, and that should cover this race when the irq
is left enabled.

Jan-Bernd, HEA has that change, right?

milton
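
For anyone trying to follow the flow-handler idea being kicked around here,
the two orderings under discussion look roughly like the sketch below. This
is purely illustrative pseudo-kernel code, not anything proposed in the
thread: it glosses over the irq_desc locking and, on RT, over the fact that
handle_IRQ_event() would only wake the irq thread rather than run the handler
to completion (which is exactly why a custom flow handler would be needed).
xics_lower_priority() and xics_restore_priority() are hypothetical
placeholders for whatever the real XICS priority primitives would be.

/* Illustrative only.  Standard fasteoi ordering (heavily simplified): the
 * EOI -- which re-enables the source -- is issued as soon as the hard-irq
 * work is done, even if that work was only waking an irq thread.
 */
static void fasteoi_order(unsigned int irq, struct irq_desc *desc)
{
	handle_IRQ_event(irq, desc->action);	/* on RT: may just wake the thread */
	desc->chip->eoi(irq);			/* source re-enabled here */
}

/* The ordering discussed above (heavily simplified): drop the processor
 * priority so other interrupt sources get through, let the handler run to
 * completion, and only then EOI this source -- nothing is masked, so no
 * edge can be lost while masked and no RTAS call is needed.
 */
static void delayed_eoi_order(unsigned int irq, struct irq_desc *desc)
{
	xics_lower_priority();			/* hypothetical placeholder */
	handle_IRQ_event(irq, desc->action);	/* would have to be the full
						 * (threaded) handler, run to
						 * completion */
	xics_restore_priority();		/* hypothetical placeholder */
	desc->chip->eoi(irq);			/* EOI only after the handler */
}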

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)
  2010-05-21  9:18               ` [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY), " Milton Miller
@ 2010-09-20 14:26                 ` Jan-Bernd Themann
  0 siblings, 0 replies; 23+ messages in thread
From: Jan-Bernd Themann @ 2010-09-20 14:26 UTC (permalink / raw)
  To: Milton Miller
  Cc: Brian King, Doug Maxey, dvhltc, Darren Hart, linux-kernel,
	linux-kernel-owner, linuxppc-dev, Michael Ellerman,
	Milton Miller, niv, Thomas Gleixner, Will Schmidt

Hi Milton,

sorry for the delayed answer, I was on vacation.

linux-kernel-owner@vger.kernel.org wrote on 21.05.2010 11:18:01:
> linux-kernel-owner@vger.kernel.org
>
> On Thu, 20 May 2010 at 10:21:36 +0200 (CEST) Thomas Gleixner  wrote:
> > On Thu, 20 May 2010, Michael Ellerman wrote:
> > > On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
> > > > On Wed, 19 May 2010, Darren Hart wrote:
> > > >
> > > > > On 05/18/2010 06:25 PM, Michael Ellerman wrote:
> > > > > > On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
> > >
> > > > > > The result of the discussion about two years ago on this was that we
> > > > > > needed a custom flow handler for XICS on RT.
> > > > >
> > > > > I'm still not clear on why the ultimate solution wasn't to have XICS report
> > > > > edge triggered as edge triggered. Probably some complexity of the entire power
> > > > > stack that I am ignorant of.
> > > > >
> > > > > > Apart from the issue of losing interrupts there is also the fact that
> > > > > > masking on the XICS requires an RTAS call which takes a global lock.
> > > >
> > > > Right, I'd love to avoid that but with real level interrupts we'd run
> > > > into an interrupt storm. Though another solution would be to issue the
> > > > EOI after the threaded handler finished, that'd work as well, but
> > > > needs testing.
> > >
> > > Yeah I think that was the idea for the custom flow handler. We'd reset
> > > the processor priority so we can take other interrupts (which the EOI
> > > usually does for you), then do the actual EOI after the handler
> > > finished.
> >
> > That only works when the card does not issue new interrupts until the
> > EOI happens. If the EOI is only relevant for the interrupt controller,
> > then you are going to lose any edge which comes in before the EOI as
> > well.
>
> Well, the real MSIs have an extra bit to allow the eoi to dally behind
> the mmio on another path and that should cover this race when the irq
> is left enabled.
>
> Jan-Bernd HEA has that change, right?

I don't know. We never hit problems, so we did not look very deeply into
this area. We probably have to talk to the HEA HW developers to be sure.

Regards,
Jan-Bernd


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2010-09-20 14:26 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-18 21:33 [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY) Darren Hart
2010-05-18 21:52 ` Brian King
2010-05-18 22:19   ` Nivedita Singhvi
2010-05-18 22:22   ` Darren Hart
2010-05-19  1:25     ` Michael Ellerman
2010-05-19 14:16       ` Darren Hart
2010-05-19 14:38         ` Thomas Gleixner
2010-05-19 21:08           ` Thomas Gleixner
2010-05-20  1:34             ` Michael Ellerman
2010-05-20  7:37               ` Jan-Bernd Themann
2010-05-20  8:14                 ` Thomas Gleixner
2010-05-20  9:05                   ` Jan-Bernd Themann
2010-05-20  9:19                     ` Thomas Gleixner
2010-05-20 14:26                       ` Nivedita Singhvi
2010-05-20 14:53                     ` Will Schmidt
2010-05-20 14:39                   ` Darren Hart
2010-05-20 14:45                     ` Thomas Gleixner
2010-05-20 21:44                       ` Will Schmidt
2010-05-20  1:32           ` Michael Ellerman
2010-05-20  8:21             ` Thomas Gleixner
2010-05-21  9:18               ` [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY), " Milton Miller
2010-09-20 14:26                 ` Jan-Bernd Themann
2010-05-20  1:28         ` Michael Ellerman
