linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Pending interrupts not always replayed
@ 2010-04-18  3:34 Guillaume Knispel
  2010-04-18 14:46 ` Guillaume Knispel
  0 siblings, 1 reply; 4+ messages in thread
From: Guillaume Knispel @ 2010-04-18  3:34 UTC (permalink / raw)
  To: linux-kernel, Linuxppc-dev
  Cc: Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Ingo Molnar,
	Lars-Peter Clausen, Linus Torvalds, Michael Buesch,
	Peter Zijlstra, Thomas Gleixner

Hi,

>From reading the code (kernel/irq stuffs), it seems that at least some
handle_edge_irq based interrupts are not replayed when enabling if
desc->chip->retrigger == NULL and on a platform where
CONFIG_HARDIRQS_SW_RESEND is not set (which for now is only defined for
(some?) arm and avr32). Depending on the pic this problem might
be masked if multiple interrupts are received between disable_irq and
enable_irq (maybe other conditions exist with controllers i don't know).

I happen to program a PPC SoC (MPC8555E to be precise), using some edge
interrupts on the PortC of the CPM2 pic (arch/powerpc/sysdev/cpm2_pic.c)
in a driver that does some enable_irq/disable_irq while various
patterns of interrupts are received and i think i've hit the problem.

Is the behavior intended (not always replaying edge triggered interrupts)
or is it just a bug to fix?

Anybody knows if it's safe to activate CONFIG_HARDIRQS_SW_RESEND for a
powerpc or are there some known dangerous interactions?
(I'll give it a try on my side in the meantime anyway)

Cheers!
-- 
Guillaume Knispel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Pending interrupts not always replayed
  2010-04-18  3:34 Pending interrupts not always replayed Guillaume Knispel
@ 2010-04-18 14:46 ` Guillaume Knispel
  2010-04-18 19:14   ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Guillaume Knispel @ 2010-04-18 14:46 UTC (permalink / raw)
  To: linux-kernel, Linuxppc-dev
  Cc: Bartlomiej Zolnierkiewicz, Benjamin Herrenschmidt, Ingo Molnar,
	Lars-Peter Clausen, Linus Torvalds, Michael Buesch,
	Peter Zijlstra, Thomas Gleixner

On Sun, 18 Apr 2010 05:34:39 +0200
Guillaume Knispel <gknispel@proformatique.com> wrote:

> From reading the code (kernel/irq stuffs), it seems that at least some
> handle_edge_irq based interrupts are not replayed when enabling if
> desc->chip->retrigger == NULL and on a platform where
> CONFIG_HARDIRQS_SW_RESEND is not set (which for now is only defined for
> (some?) arm and avr32). Depending on the pic this problem might
> be masked if multiple interrupts are received between disable_irq and
> enable_irq (maybe other conditions exist with controllers i don't know).
> 
> I happen to program a PPC SoC (MPC8555E to be precise), using some edge
> interrupts on the PortC of the CPM2 pic (arch/powerpc/sysdev/cpm2_pic.c)
> in a driver that does some enable_irq/disable_irq while various
> patterns of interrupts are received and i think i've hit the problem.
> 
> Is the behavior intended (not always replaying edge triggered interrupts)
> or is it just a bug to fix?
> 
> Anybody knows if it's safe to activate CONFIG_HARDIRQS_SW_RESEND for a
> powerpc or are there some known dangerous interactions?
> (I'll give it a try on my side in the meantime anyway)

A follow up on this issue and on my CONFIG_HARDIRQS_SW_RESEND tests:
I've basically done this:

Index: linux-2.6.22.18/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.22.18.orig/arch/powerpc/Kconfig   2010-04-18 15:22:24.000000000 +0200
+++ linux-2.6.22.18/arch/powerpc/Kconfig        2010-04-18 15:23:21.000000000 +0200
@@ -35,6 +35,10 @@
        bool
        default y
 
+config HARDIRQS_SW_RESEND
+       bool
+       default y
+
 config IRQ_PER_CPU
        bool
        default y

(yeah, I know, old kernel, but the code involved hasn't change much
 since)

and of course +CONFIG_HARDIRQS_SW_RESEND=y in my .config

Now everything seems to work fine: my device was not previously not
interrupting anymore after typically 1 or 2 minutes (because the
interrupt signal stays at level low until the device is served, so
if one falling edge is missed no more interruption will ever be seen),
and right now it has been running for 1 hour without any observable
problem.

Cheers!
-- 
Guillaume Knispel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Pending interrupts not always replayed
  2010-04-18 14:46 ` Guillaume Knispel
@ 2010-04-18 19:14   ` Thomas Gleixner
  2010-04-19 13:09     ` Guillaume Knispel
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2010-04-18 19:14 UTC (permalink / raw)
  To: Guillaume Knispel
  Cc: linux-kernel, Linuxppc-dev, Bartlomiej Zolnierkiewicz,
	Benjamin Herrenschmidt, Ingo Molnar, Lars-Peter Clausen,
	Linus Torvalds, Michael Buesch, Peter Zijlstra

On Sun, 18 Apr 2010, Guillaume Knispel wrote:
> Now everything seems to work fine: my device was not previously not
> interrupting anymore after typically 1 or 2 minutes (because the
> interrupt signal stays at level low until the device is served, so

So you are having a level interrupt device and the irq line is handled
by an edge type handler ? Why not configuring the irq line for level
and in the first place ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Pending interrupts not always replayed
  2010-04-18 19:14   ` Thomas Gleixner
@ 2010-04-19 13:09     ` Guillaume Knispel
  0 siblings, 0 replies; 4+ messages in thread
From: Guillaume Knispel @ 2010-04-19 13:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Linuxppc-dev, Bartlomiej Zolnierkiewicz,
	Benjamin Herrenschmidt, Ingo Molnar, Lars-Peter Clausen,
	Linus Torvalds, Michael Buesch, Peter Zijlstra

On Sun, 18 Apr 2010 21:14:12 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sun, 18 Apr 2010, Guillaume Knispel wrote:
> > Now everything seems to work fine: my device was not previously not
> > interrupting anymore after typically 1 or 2 minutes (because the
> > interrupt signal stays at level low until the device is served, so
> 
> So you are having a level interrupt device and the irq line is handled
> by an edge type handler ? Why not configuring the irq line for level
> and in the first place ?

Not really the problem here, it indeed helped me do discover the real
bug (because I would not have seen that an interrupt went missing if the
device have generated another one all by itself soon after that).

I would of course prefer to have this line in level trigger, but the
pic just can't do it for this signal, so I play with masking and
unmasking interrupts on the device side to regenerate edges when needed.
When interrupts are enabled, I do that in the isr, so if it is not
called after a falling edge it won't be anymore in the future.

What is a real bug IMHO is that when in falling edge trigger mode,
the sequence:
 - disable_irq(),
 - 1 falling edge,
 - enable_irq()
doesn't lead to the isr being called just after enable_irq().

This is because:

* __disable_irq basically do (when really disabling)
	desc->status |= IRQ_DISABLED;
	desc->chip->disable(irq);	<= noop or racy see under

* arch/powerpc/sysdev/cpm2_pic.c does not define a disable() in
  struct irq_chip cpm2_pic. So default_disable is used, and
  it's a noop. For pics where this is not a noop, there can be a race
  if the interrupt triggers before the disable(): the flow handler will
  still be executed just after desc->lock is unlocked in
  disable_irq_nosync(). So it does not really matters whether
  disable() is implemented or is a noop.

* When the flow handler is executed after disable_irq_nosync(),
  handle_edge_irq() does:
	desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
	desc->status |= (IRQ_PENDING | IRQ_MASKED);
	mask_ack_irq(desc, irq);
  The responsibility of triggering the interrupt has now been
  transferred from the pic to the kernel (the interrupt has been acked
  at pic level), so I would say that if the kernel does not run replay
  it later, it will be a bug.

* When __enable_irq() is called, it does:
	unsigned int status = desc->status & ~IRQ_DISABLED;
	desc->status = status | IRQ_NOPROBE;
	check_irq_resend(desc, irq);

* if it's an edge triggered IRQ (and in my case it is)
  check_irq_resend() sometimes, if the planets are aligned, resend the
  IRQ. Planet alignement is defined by retrigger being provided for the
  irq_chip and retrigger succeed, or CONFIG_HARDIRQS_SW_RESEND is
  defined (in which case and if retrigger is absent or has failed, the
  hardirq is indeed executed in a tasklet -- which might maybe cause
  obscure problems?).

My guess is that CONFIG_HARDIRQS_SW_RESEND was meant to be defined on
any platform where a pic can exist and can do edge trigger but does not
implement retrigger or implements a retrigger that can fail (because
otherwise interrupts can be missed).

This seems to be a quite unknown fact that might have been missed by
multiple irq_chips (and could be missed by more in the future), and I
would propose to just unconditionally build the resend_irqs() tasklet
and its scheduling code, and completely remove
CONFIG_HARDIRQS_SW_RESEND, at least if it is sure that executing a flow
handler in a tasklet can never cause obscure problems.

Cheers!
-- 
Guillaume Knispel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-04-19 13:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-18  3:34 Pending interrupts not always replayed Guillaume Knispel
2010-04-18 14:46 ` Guillaume Knispel
2010-04-18 19:14   ` Thomas Gleixner
2010-04-19 13:09     ` Guillaume Knispel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).