* [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
@ 2015-05-13  7:55 Hongfei Cheng
  2015-05-13 18:06 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Hongfei Cheng @ 2015-05-13  7:55 UTC (permalink / raw)
  To: xenomai

Hi All,

The document "Porting Xenomai dual kernel to a new ARM SoC", in its
"flow handler" section, states: if the flow handler is
“handle_edge_irq” and the system locks up when the first interrupt
happens, try replacing “handle_edge_irq” with “handle_level_irq”.

Could anyone explain why “handle_edge_irq” can cause the system to
hang while “handle_level_irq” doesn't? I searched the archives but
failed to find any discussion on this topic.

Thanks!
Hongfei


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-13  7:55 [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq” Hongfei Cheng
@ 2015-05-13 18:06 ` Gilles Chanteperdrix
  2015-05-18 17:32   ` Hongfei Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-05-13 18:06 UTC (permalink / raw)
  To: Hongfei Cheng; +Cc: xenomai

On Wed, May 13, 2015 at 12:55:46AM -0700, Hongfei Cheng wrote:
> Hi All,
> 
> In the document of "Porting Xenomai dual kernel to a new ARM SoC",
> section "flow handler" states: If the flow handler is
> “handle_edge_irq”, and the system locks up when the first interrupt
> happens, try replacing “handle_edge_irq” with “handle_level_irq”.
> 
> Could anyone explain why “handle_edge_irq” can cause system to hang
> while “handle_level_irq” doesn't? I searched the archives but failed
> to find any discussion on this topic.

Level irqs are acked and masked at the interrupt controller level on
entry, whereas edge irqs are only acked. The Adeos patch unmasks
irqs at the processor level right after this initial ack-and-mask
(or ack alone). So, if the interrupt controller keeps the interrupt
asserted at the processor level while the input level stays the same
(IOW, if the interrupt controller only supports the "level"
behaviour), unmasking the interrupt line will cause the interrupt to
trigger again, be acked again, etc., without any chance for the
handler to ack the interrupt at the device level. The fix is to use
handle_level_irq, so that the irq is masked at the interrupt
controller level before irqs are unmasked at the processor level.

This does not happen with an unpatched kernel, because irqs are not
unmasked at processor level until the handler has had a chance to
execute and ack the interrupt at device level.

-- 
					    Gilles.



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-13 18:06 ` Gilles Chanteperdrix
@ 2015-05-18 17:32   ` Hongfei Cheng
  2015-05-18 17:36     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Hongfei Cheng @ 2015-05-18 17:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Wed, May 13, 2015 at 11:06 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> [...]
> level irqs are acked and masked at interrupt controller level on
> entry, whereas edge interrupts are only acked. The Adeos patch
> unmasks the irqs at processor level after this initial ack and mask
> or ack. So, if the interrupt controller keeps the interrupt asserted
> at the processor level when the level stays the same (IOW, if the
> interrupt controller only supports the "level" behaviour), unmasking
> the interrupt line will cause the interrupt to trigger again, being
> ack again, etc... without any chance for the handler to ack the
> interrupt at device level. The fix is to use handle_level_irq so
> that the irq is masked at interrupt controller level before irqs are
> unmasked at processor level.
>
> This does not happen with an unpatched kernel, because irqs are not
> unmasked at processor level until the handler has had a chance to
> execute and ack the interrupt at device level.

Gilles, thank you for the detailed explanation.

For ARM Cortex-A9/A15/A57, can we make the following statements as to
which irq flow handler can be used for a given interrupt:
1. If an interrupt input is programmed as "level-sensitive" on the
interrupt controller (GIC), only the “handle_level_irq” flow handler
should be used at the device driver level.
2. If an interrupt input is programmed as "edge-triggered" on the
interrupt controller (GIC), either the "edge" or the "level" flow
handler can be used at the device driver level.

Hongfei



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-18 17:32   ` Hongfei Cheng
@ 2015-05-18 17:36     ` Gilles Chanteperdrix
  2015-05-19 17:23       ` Hongfei Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-05-18 17:36 UTC (permalink / raw)
  To: Hongfei Cheng; +Cc: xenomai

On Mon, May 18, 2015 at 10:32:49AM -0700, Hongfei Cheng wrote:
> [...]
> Gilles, thank you for the detailed explanation.
> 
> For ARM Cortex-A9/A15/A57, can we make the following statements as to
> which irq flow handler can be used for a specific interrupt:
> 1. An interrupt input is programmed as "level-sensitive" on the
> interrupt controller (GIC): only flow handler “handle_level_irq”
> should be used at the device driver level.
> 2. An interrupt input is programmed as "edge-triggered" on the
> interrupt controller (GIC): either "edge" or "level" flow handler can
> be used at the device driver level.

I believe on Cortex-A9 the GIC irqs are of the "fasteoi" type, which
is a kind of level handling, but requires more patching of the irq
controller routines. Isn't that code shared between 32 and 64 bits?

-- 
					    Gilles.



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-18 17:36     ` Gilles Chanteperdrix
@ 2015-05-19 17:23       ` Hongfei Cheng
  2015-05-19 18:39         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Hongfei Cheng @ 2015-05-19 17:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Mon, May 18, 2015 at 10:36 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> [...]
> I believe on cortex A9, the GIC irqs are of the "fasteoi" type.
> Which is a kind of level, but requires more patching of the irq
> controller routines. Is not that code shared between 32 and 64 bits?

The GIC code is indeed shared between AArch32 and AArch64. However, on
our ARMv8 platform, only part of the SPIs (32~447) are initialized
and mapped to fasteoi. The remaining SPIs (448~639) are for
platform-specific peripheral drivers. These drivers use
handle_level_irq and handle_edge_irq as flow handlers.

What we are seeing is that, when I-pipe is enabled, some of the
platform-specific peripheral drivers receive numerous interrupts which
are nonexistent when I-pipe is disabled. At first glance, such a
problem appears to match the pattern of using "handle_edge_irq" for
level-triggered interrupts at the GIC. However, the number of "bogus"
interrupts per second is only in the hundreds, and the handlers in
the device drivers still have a chance to process these interrupts.

Here is a partial call stack:
<peripheral-driver-irq-mask>+0xc/0x18
gic_hold_irq+0xb8/0x134
__ipipe_ack_fasteoi_irq+0x14/0x20
__ipipe_dispatch_irq+0x90/0x2a8
__ipipe_grab_irq+0x5c/0x74

Does this call stack look correct to you? Any idea as to the cause of
such problem?

Thanks,
Hongfei



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-19 17:23       ` Hongfei Cheng
@ 2015-05-19 18:39         ` Gilles Chanteperdrix
  2015-05-19 19:13           ` Hongfei Cheng
  2015-05-29 21:32           ` Hongfei Cheng
  0 siblings, 2 replies; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-05-19 18:39 UTC (permalink / raw)
  To: Hongfei Cheng; +Cc: xenomai

On Tue, May 19, 2015 at 10:23:36AM -0700, Hongfei Cheng wrote:
> [...]
> The GIC code is indeed shared between AArch32 and AArch64. However on
> our ARMv8 platform, only parts of the SPIs, 32~447, are initialized
> and mapped to fasteoi. The rest of the SPIs, 448~639, are for platform
> specific peripheral drivers. These drivers use handle_level_irq and
> handle_edge_irq as flow handlers.
> 
> What we are seeing is that, when I-pipe is enabled, some of the
> platform specific peripheral drivers receive numerous interrupts which
> are nonexistent when I-pipe is disabled. At first glance, such problem
> appears to match the pattern of using "handle_edge_irq" for
> level-triggered interrupts at the GIC. However the number of the
> "bogus" interrupts (per second) are only in the 100s. The handlers in
> the device drivers still have chance to process these interrupts.
> 
> Here is a partial call stack:
> <peripheral-driver-irq-mask>+0xc/0x18
> gic_hold_irq+0xb8/0x134
> __ipipe_ack_fasteoi_irq+0x14/0x20
> __ipipe_dispatch_irq+0x90/0x2a8
> __ipipe_grab_irq+0x5c/0x74
> 
> Does this call stack look correct to you? Any idea as to the cause of
> such problem?

No, this call stack does not look correct. gic_hold_irq should only
mask the irq at the interrupt controller level, not call back into
anything related to the peripheral driver itself.

Also, if an irq does not use handle_fasteoi_irq, gic_hold_irq should
not be invoked.

-- 
					    Gilles.



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-19 18:39         ` Gilles Chanteperdrix
@ 2015-05-19 19:13           ` Hongfei Cheng
  2015-05-29 21:32           ` Hongfei Cheng
  1 sibling, 0 replies; 13+ messages in thread
From: Hongfei Cheng @ 2015-05-19 19:13 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Tue, May 19, 2015 at 11:39 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> [...]
> No, this call stack does not look correct. gic_hold_irq should only
> mask the irq at interrupt controller level, not call back anything
> related to peripheral driver itself.
>
> Also, if an irq does not use handle_fasteoi_irq, gic_hold_irq should
> not be invoked.

Thanks, Gilles. I'll dive into this and report back my findings.

Hongfei



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-19 18:39         ` Gilles Chanteperdrix
  2015-05-19 19:13           ` Hongfei Cheng
@ 2015-05-29 21:32           ` Hongfei Cheng
  2015-05-30 14:12             ` Gilles Chanteperdrix
  1 sibling, 1 reply; 13+ messages in thread
From: Hongfei Cheng @ 2015-05-29 21:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Tue, May 19, 2015 at 11:39 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> [...]
> No, this call stack does not look correct. gic_hold_irq should only
> mask the irq at interrupt controller level, not call back anything
> related to peripheral driver itself.
>
> Also, if an irq does not use handle_fasteoi_irq, gic_hold_irq should
> not be invoked.

What we've found is that one of the vendor-specific platform drivers
defines an arch-specific GIC irq extension, gic_arch_extn. Through this
extension, the GIC callbacks added by the I-pipe core, gic_hold_irq()
and gic_release_irq(), end up calling into the platform driver's
irq_mask and irq_unmask callbacks for some of the GIC irqs, even though
none of these GIC irqs are registered by the platform driver. The code
changes below are needed to successfully probe the platform driver.

In addition, the spinlock used in the irq callbacks of the platform
driver has to be converted to an I-pipe spinlock. Below is the sequence
of spinlocks taken during gic_mask_irq(), for instance:

<drivers/irqchip/irq-gic.c> gic_mask_irq():
  raw_spin_lock_irqsave_cond(&irq_controller_lock, flags);
      /* IPIPE_DEFINE_RAW_SPINLOCK(irq_controller_lock) */
  <platform driver> msm_mpm_disable_irq():
    spin_lock_irqsave(&msm_mpm_lock, flags);
        /* IPIPE_DEFINE_SPINLOCK(msm_mpm_lock) */
    <kernel/sched/core.c> complete(struct completion *x):
      spin_lock_irqsave(&x->wait.lock, flags);
          /* x->wait.lock is declared as a plain spinlock_t */
      ......
      spin_unlock_irqrestore(&x->wait.lock, flags);
    spin_unlock_irqrestore(&msm_mpm_lock, flags);
  <drivers/irqchip/irq-gic.c> gic_mask_irq():
    raw_spin_unlock_irqrestore_cond(&irq_controller_lock, flags);

Two questions:
1) Is the Linux spinlock used in the scheduler's complete() function
safe in this chain of spinlocks?
2) Is there any risk if gic_arch_extn.irq_hold() and
gic_arch_extn.irq_release() are set to NULL (in the platform driver)?

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index debcd50..fdb0a5b 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -363,11 +363,16 @@ static void gic_hold_irq(struct irq_data *d)
 
        raw_spin_lock_irqsave_cond(&irq_controller_lock, flags);
        writel_relaxed_no_log(mask, gic_dist_base(d) + GIC_DIST_ENABLE_CLEAR +
                              (gic_irq(d) / 32) * 4);
+#ifdef CONFIG_ARCH_MSM
+       if (gic_arch_extn.irq_hold)
+               gic_arch_extn.irq_hold(d);
+#else /* !CONFIG_ARCH_MSM */
        if (gic_arch_extn.irq_mask)
                gic_arch_extn.irq_mask(d);
        if (gic_arch_extn.irq_eoi) {
                gic_arch_extn.irq_eoi(d);
        }
+#endif /* CONFIG_ARCH_MSM */
        writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
        raw_spin_unlock_irqrestore_cond(&irq_controller_lock, flags);
 }
@@ -378,8 +383,13 @@ static void gic_release_irq(struct irq_data *d)
        unsigned long flags;
 
        raw_spin_lock_irqsave_cond(&irq_controller_lock, flags);
+#ifdef CONFIG_ARCH_MSM
+       if (gic_arch_extn.irq_release)
+               gic_arch_extn.irq_release(d);
+#else /* !CONFIG_ARCH_MSM */
        if (gic_arch_extn.irq_unmask)
                gic_arch_extn.irq_unmask(d);
+#endif /* CONFIG_ARCH_MSM */
        writel_relaxed(mask, gic_dist_base(d) + GIC_DIST_ENABLE_SET +
                       (gic_irq(d) / 32) * 4);
        raw_spin_unlock_irqrestore_cond(&irq_controller_lock, flags);
 }


* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-29 21:32           ` Hongfei Cheng
@ 2015-05-30 14:12             ` Gilles Chanteperdrix
  2015-06-01 17:36               ` Hongfei Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-05-30 14:12 UTC (permalink / raw)
  To: Hongfei Cheng; +Cc: xenomai

On Fri, May 29, 2015 at 02:32:51PM -0700, Hongfei Cheng wrote:
> Two questions:
> 1). Is the linux spinlock used in its scheduler's complete() function safe
> in this chain of spinlocks?

Definitely not. Obviously. I am even surprised you ask.

> 2). Is there any risk if gic_arch_extn.irq_hold() and
> gic_arch_extn.irq_release() are set as NULL (in the platform driver)?

There should be no reason to define these architecture-specific hold
and release callbacks. Using the mask and unmask ones should be
sufficient.

-- 
					    Gilles.



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-05-30 14:12             ` Gilles Chanteperdrix
@ 2015-06-01 17:36               ` Hongfei Cheng
  2015-06-01 17:49                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Hongfei Cheng @ 2015-06-01 17:36 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Sat, May 30, 2015 at 7:12 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Fri, May 29, 2015 at 02:32:51PM -0700, Hongfei Cheng wrote:
>> Two questions:
>> 1). Is the linux spinlock used in its scheduler's complete() function safe
>> in this chain of spinlocks?
>
> Definitely not. Obviously. I am even surprised you ask.

What surprised me is that, with a Linux spinlock in the chain, the
platform driver is probed successfully and the system boots just fine
with I-pipe and Xenomai enabled. We can even run the latency test with
reasonable results.

What are the obvious signs to look for when:
1) a Linux spinlock is mistakenly used in place of an ipipe spinlock?
2) an ipipe spinlock is mistakenly used in place of a Linux spinlock?

>> 2). Is there any risk if gic_arch_extn.irq_hold() and
>> gic_arch_extn.irq_release() are set as NULL (in the platform driver)?
>
> There should be no reason to define this archiecture hold and
> release. Using the mask and unmask ones should be sufficient.

When gic_arch_extn is defined by the platform driver and the changes
to gic_hold_irq() and gic_release_irq() are not applied, we get the
following call stack, which you previously confirmed to be incorrect.
The platform driver also fails during system bootup.

msm_mpm_disable_irq+0xc/0x18
gic_hold_irq+0xb8/0x134
__ipipe_ack_fasteoi_irq+0x14/0x20
__ipipe_dispatch_irq+0x90/0x2a8
__ipipe_grab_irq+0x5c/0x74

By the way, what are the purposes of gic_hold_irq() and gic_release_irq()?

Thanks,
Hongfei



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-06-01 17:36               ` Hongfei Cheng
@ 2015-06-01 17:49                 ` Gilles Chanteperdrix
  2015-06-01 19:48                   ` Hongfei Cheng
  0 siblings, 1 reply; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-06-01 17:49 UTC (permalink / raw)
  To: Hongfei Cheng; +Cc: xenomai

On Mon, Jun 01, 2015 at 10:36:12AM -0700, Hongfei Cheng wrote:
> On Sat, May 30, 2015 at 7:12 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Fri, May 29, 2015 at 02:32:51PM -0700, Hongfei Cheng wrote:
> >> Two questions:
> >> 1). Is the linux spinlock used in its scheduler's complete() function safe
> >> in this chain of spinlocks?
> >
> > Definitely not. Obviously. I am even surprised you ask.
> 
> What surprised me is that, with a Linux spinlock in the loop, the
> platform driver is probed successfully and the system boots just fine
> with I-pipe and Xenomai enabled. We can even run the latency test with
> reasonable results.
> 
> What are the obvious signs to look for when:
> 1). a Linux spinlock is mistakenly used in place of an ipipe
> spinlock?

A Linux spinlock, when it disables interrupts, does not disable
them for real, so the risk is being preempted by a real-time
activity that tries to take the same spinlock, resulting in a
deadlock. Of course, you will not observe anything as long as this
does not happen. But it can happen, and that is enough to want to
avoid it.

> 2). an ipipe spinlock is mistakenly used in place of a Linux spinlock?

If the section protected by the ipipe spinlock does things that
take a long time, or ends up calling Linux functions, the risk is
creating long interrupt-masking sections for the real-time activity.

> 
> >> 2). Is there any risk if gic_arch_extn.irq_hold() and
> >> gic_arch_extn.irq_release() are set as NULL (in the platform driver)?
> >
> > There should be no reason to define this architecture hold and
> > release. Using the mask and unmask ones should be sufficient.
> 
> When gic_arch_extn is defined by the platform driver, without making
> the changes in gic_hold_irq() and gic_release_irq(), we'd get the
> following incorrect call stack, which you previously confirmed.
> The

No, I did not confirm that this call stack was wrong. There is
nothing wrong with gic_hold_irq calling msm_mpm_disable_irq.
However, if msm_mpm_disable_irq calls Linux functions, such as
taking a spinlock or a scheduler-related function, that is wrong.

> platform driver also fails during system bootup.
> 
> msm_mpm_disable_irq+0xc/0x18
> gic_hold_irq+0xb8/0x134
> __ipipe_ack_fasteoi_irq+0x14/0x20
> __ipipe_dispatch_irq+0x90/0x2a8
> __ipipe_grab_irq+0x5c/0x74
> 
> By the way, what are the purposes of gic_hold_irq() and gic_release_irq()?

See the I-pipe porting guide for ARM.

Note that you could have found the answers to the questions in this
mail by understanding how things work. So, it would perhaps be
better for you to understand how the patch works, rather than
jumping headlong into a port.

-- 
					    Gilles.



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-06-01 17:49                 ` Gilles Chanteperdrix
@ 2015-06-01 19:48                   ` Hongfei Cheng
  2015-06-01 21:16                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 13+ messages in thread
From: Hongfei Cheng @ 2015-06-01 19:48 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Mon, Jun 1, 2015 at 10:49 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
>> What surprised me is that, with a Linux spinlock in the loop, the
>> platform driver is probed successfully and the system boots just fine
>> with I-pipe and Xenomai enabled. We can even run the latency test with
>> reasonable results.
>>
>> What are the obvious signs to look for when:
>> 1). a Linux spinlock is mistakenly used in place of an ipipe
>> spinlock?
>
> A Linux spinlock, when it disables interrupts, does not disable
> them for real, so the risk is being preempted by a real-time
> activity that tries to take the same spinlock, resulting in a
> deadlock. Of course, you will not observe anything as long as this
> does not happen. But it can happen, and that is enough to want to
> avoid it.

Makes sense to me. Thanks for the clarification.

>> 2). an ipipe spinlock is mistakenly used in place of a Linux spinlock?
>
> If the section protected by the ipipe spinlock does things that
> take a long time, or ends up calling Linux functions, the risk is
> creating long interrupt-masking sections for the real-time activity.

Also makes sense.

>>
>> >> 2). Is there any risk if gic_arch_extn.irq_hold() and
>> >> gic_arch_extn.irq_release() are set as NULL (in the platform driver)?
>> >
>> > There should be no reason to define this architecture hold and
>> > release. Using the mask and unmask ones should be sufficient.
>>
>> When gic_arch_extn is defined by the platform driver, without making
>> the changes in gic_hold_irq() and gic_release_irq(), we'd get the
>> following incorrect call stack, which you previously confirmed.
>> The
>
> No, I did not confirm that this call stack was wrong. There is
> nothing wrong with gic_hold_irq calling msm_mpm_disable_irq.

Looks like I misinterpreted the comment you made on May 19.
--------
>> Here is a partial call stack:
>> <peripheral-driver-irq-mask>+0xc/0x18
>> gic_hold_irq+0xb8/0x134
>> __ipipe_ack_fasteoi_irq+0x14/0x20
>> __ipipe_dispatch_irq+0x90/0x2a8
>> __ipipe_grab_irq+0x5c/0x74
>>
>> Does this call stack look correct to you?
>
> No, this call stack does not look correct. gic_hold_irq should only
> mask the irq at interrupt controller level, not call back anything
> related to peripheral driver itself.
>
> Also, if an irq does not use handle_fasteoi_irq, gic_hold_irq should
> not be invoked.
~~~~~

> However, if msm_mpm_disable_irq calls Linux functions, such as
> taking a spinlock or a scheduler-related function,
> that is wrong.

I agree.

>> platform driver also fails during system bootup.
>>
>> msm_mpm_disable_irq+0xc/0x18
>> gic_hold_irq+0xb8/0x134
>> __ipipe_ack_fasteoi_irq+0x14/0x20
>> __ipipe_dispatch_irq+0x90/0x2a8
>> __ipipe_grab_irq+0x5c/0x74
>>
>> By the way, what are the purposes of gic_hold_irq() and gic_release_irq()?
>
> See the I-pipe porting guide for ARM.
>
> Note that you could have found the answers to the questions in this
> mail by understanding how things work. So, it would perhaps be
> better for you to understand how the patch works, rather than
> jumping headlong into a port.

I've been through the porting guide for ARM SoCs several times, and
I have also carefully studied the I-pipe core patch code. However,
the only two references to gic_hold_irq() and gic_release_irq() I've
found, in the "flow handler" section, don't answer my questions. I'd
appreciate it if you could point me to any document or code that
describes or demonstrates the usage of these two callbacks added by
I-pipe.

Thanks,
Hongfei



* Re: [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq”
  2015-06-01 19:48                   ` Hongfei Cheng
@ 2015-06-01 21:16                     ` Gilles Chanteperdrix
  0 siblings, 0 replies; 13+ messages in thread
From: Gilles Chanteperdrix @ 2015-06-01 21:16 UTC (permalink / raw)
  To: Hongfei Cheng; +Cc: xenomai

On Mon, Jun 01, 2015 at 12:48:44PM -0700, Hongfei Cheng wrote:
> On Mon, Jun 1, 2015 at 10:49 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> >> What surprised me is that, with a Linux spinlock in the loop, the
> >> platform driver is probed successfully and the system boots just fine
> >> with I-pipe and Xenomai enabled. We can even run the latency test with
> >> reasonable results.
> >>
> >> What are the obvious signs to look for when:
> >> 1). a Linux spinlock is mistakenly used in place of an ipipe
> >> spinlock?
> >
> > A Linux spinlock, when it disables interrupts, does not disable
> > them for real, so the risk is being preempted by a real-time
> > activity that tries to take the same spinlock, resulting in a
> > deadlock. Of course, you will not observe anything as long as this
> > does not happen. But it can happen, and that is enough to want to
> > avoid it.
> 
> Makes sense to me. Thanks for the clarification.
> 
> >> 2). an ipipe spinlock is mistakenly used in place of a Linux spinlock?
> >
> > If the section protected by the ipipe spinlock does things that
> > take a long time, or ends up calling Linux functions, the risk is
> > creating long interrupt-masking sections for the real-time activity.
> 
> Also makes sense.
> 
> >>
> >> >> 2). Is there any risk if gic_arch_extn.irq_hold() and
> >> >> gic_arch_extn.irq_release() are set as NULL (in the platform driver)?
> >> >
> >> > There should be no reason to define this architecture hold and
> >> > release. Using the mask and unmask ones should be sufficient.
> >>
> >> When gic_arch_extn is defined by the platform driver, without making
> >> the changes in gic_hold_irq() and gic_release_irq(), we'd get the
> >> following incorrect call stack, which you previously confirmed.
> >> The
> >
> > No, I did not confirm that this call stack was wrong. There is
> > nothing wrong with gic_hold_irq calling msm_mpm_disable_irq.
> 
> Looks like I misinterpreted the comment you made on May 19.
> --------
> >> Here is a partial call stack:
> >> <peripheral-driver-irq-mask>+0xc/0x18
> >> gic_hold_irq+0xb8/0x134
> >> __ipipe_ack_fasteoi_irq+0x14/0x20
> >> __ipipe_dispatch_irq+0x90/0x2a8
> >> __ipipe_grab_irq+0x5c/0x74
> >>
> >> Does this call stack look correct to you?
> >
> > No, this call stack does not look correct. gic_hold_irq should only
> > mask the irq at interrupt controller level, not call back anything
> > related to peripheral driver itself.
> >
> > Also, if an irq does not use handle_fasteoi_irq, gic_hold_irq should
> > not be invoked.

What I meant is that the peripheral driver business should end up
being handled in the peripheral driver's handler, not in the
per-architecture irq controller hook. However, I would rather think
msm_mpm_disable_irq is platform-specific, not peripheral-driver
specific. If it is platform-specific, then this is exactly what the
irq controller hook is for.

> ~~~~~
> 
> > However, if msm_mpm_disable_irq calls Linux functions, such as
> > taking a spinlock or a scheduler-related function,
> > that is wrong.
> 
> I agree.
> 
> >> platform driver also fails during system bootup.
> >>
> >> msm_mpm_disable_irq+0xc/0x18
> >> gic_hold_irq+0xb8/0x134
> >> __ipipe_ack_fasteoi_irq+0x14/0x20
> >> __ipipe_dispatch_irq+0x90/0x2a8
> >> __ipipe_grab_irq+0x5c/0x74
> >>
> >> By the way, what are the purposes of gic_hold_irq() and gic_release_irq()?
> >
> > See the I-pipe porting guide for ARM.
> >
> > Note that you could have found the answers to the questions in this
> > mail by understanding how things work. So, it would perhaps be
> > better for you to understand how the patch works, rather than
> > jumping headlong into a port.
> 
> I've been going through the porting guide for ARM SoC several times. I
> also carefully studied the I-pipe core patch code. However the only
> two references I've found about gic_hold_irq() and gic_release_irq()
> in the "flow handler" section don't answer my questions. I'd
> appreciate if you can point me to any document or code which describes
> or demonstrates the usage of these two callbacks added by I-pipe.

A fasteoi irq is a level irq, so it should be masked upon reception,
before interrupts are re-enabled at processor level; otherwise the
irq would trigger again. The mask_ack and unmask callbacks contain
calls to ipipe_lock/unlock_irq, which only make sense for Linux
domain irqs. When masking the irq line upon irq reception, we do not
yet know in which domain the irq will be handled, so the irq_hold
and irq_release callbacks do the same as mask and eoi + unmask, only
without the calls to ipipe_lock_irq/ipipe_unlock_irq.

-- 
					    Gilles.



end of thread, other threads:[~2015-06-01 21:16 UTC | newest]

Thread overview: 13+ messages
-- links below jump to the message on this page --
2015-05-13  7:55 [Xenomai] IRQ flow handler: “handle_edge_irq” vs “handle_level_irq” Hongfei Cheng
2015-05-13 18:06 ` Gilles Chanteperdrix
2015-05-18 17:32   ` Hongfei Cheng
2015-05-18 17:36     ` Gilles Chanteperdrix
2015-05-19 17:23       ` Hongfei Cheng
2015-05-19 18:39         ` Gilles Chanteperdrix
2015-05-19 19:13           ` Hongfei Cheng
2015-05-29 21:32           ` Hongfei Cheng
2015-05-30 14:12             ` Gilles Chanteperdrix
2015-06-01 17:36               ` Hongfei Cheng
2015-06-01 17:49                 ` Gilles Chanteperdrix
2015-06-01 19:48                   ` Hongfei Cheng
2015-06-01 21:16                     ` Gilles Chanteperdrix
