* [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
From: Tom Evans @ 2011-09-01 2:22 UTC
To: adeos-main
This problem has probably been solved years ago, but neither Google nor
a search of this list turned anything up.
I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 and the
Adeos patches on an MPC5200 (ppc).
Every now and then, when I stress the system, it crashes because
"ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid IRQ on this
system) in this code:
kernel/include/asm/ipipe.h:
#define __ipipe_run_isr(ipd, irq, cpuid) \
do { \
    if (ipd == ipipe_root_domain) { \
        /* \
         * Linux handlers are called w/ hw interrupts on so \
         * that they could not defer interrupts for higher \
         * priority domains. \
         */ \
        local_irq_enable_hw(); \
        ((void (*)(unsigned, struct pt_regs *)) \
            ipd->irqs[irq].handler) (irq, __ipipe_tick_regs + cpuid); \
        local_irq_disable_hw(); \
    } else { \
        __clear_bit(IPIPE_SYNC_FLAG, &cpudata->status); \
        ipd->irqs[irq].handler(irq,ipd->irqs[irq].cookie); \
        __set_bit(IPIPE_SYNC_FLAG, &cpudata->status); \
    } \
} while(0)
If I add a printk() that reports the NULL handler, plus a printk() in
ipipe_virtualize_irq() that logs every interrupt registration and
deregistration, I get the following:
[ 53.32] 1080:closing...
[ 53.32] ipipe_virtualize_irq(256, 0x00000000)
[ 53.32] ipipe_virtualize_irq(56, 0x00000000)
[ 53.34] 1463:mscan_hwrelease out
[ 53.34] ipipe_virtualize_irq(57, 0x00000000)
[ 53.34] 1463:mscan_hwrelease out
[ 53.35] pcan: pccard_release()
[ 53.35] ipipe_virtualize_irq(1, 0x00000000)
[ 53.36] __ipipe_run_isr(, 1, ) handler is NULL! #######
So it looks as though the interrupt fires in hardware and is queued in
the pipeline, THEN the handler is deregistered (ipipe_virtualize_irq()
sets it to NULL), and then the queued interrupt is pulled from the
pipeline and run, which (usually) crashes.
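The sequence above can be modeled in a few lines of userspace C. This is a hypothetical sketch, not Adeos code: irqs[] and pending[] stand in for ipd->irqs[] and one domain's interrupt log, and log_irq(), virtualize_irq(), can_isr() and replay_race() are made-up names. It only shows that, if deregistration leaves the log alone (as the trace suggests), a queued IRQ can outlive its handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 8 /* hypothetical: a tiny IRQ table is enough for the model */

typedef void (*irq_handler_t)(unsigned irq, void *cookie);

/* Hypothetical stand-ins for ipd->irqs[] and one domain's interrupt log. */
static struct { irq_handler_t handler; void *cookie; } irqs[NR_IRQS];
static bool pending[NR_IRQS];

static void log_irq(unsigned irq)
{
    assert(irq < NR_IRQS);
    pending[irq] = true; /* IRQ is queued, not yet run */
}

/* Models the behaviour the report describes: only the handler slot
 * changes, the interrupt log is not touched. */
static void virtualize_irq(unsigned irq, irq_handler_t handler, void *cookie)
{
    assert(irq < NR_IRQS);
    irqs[irq].handler = handler;
    irqs[irq].cookie = cookie;
}

static void can_isr(unsigned irq, void *cookie)
{
    (void)irq;
    (void)cookie;
}

/* Replays the order seen in the trace and reports whether a later
 * pipeline sync would find a queued IRQ whose handler is NULL. */
bool replay_race(unsigned irq)
{
    virtualize_irq(irq, can_isr, NULL); /* driver registers its ISR */
    log_irq(irq);                       /* hardware IRQ arrives, is queued */
    virtualize_irq(irq, NULL, NULL);    /* pccard_release(): handler = NULL */
    return pending[irq] && irqs[irq].handler == NULL;
}
```

Calling replay_race(1) returns true: the log still holds IRQ 1, but the slot it would be dispatched through is empty.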
I've checked every Adeos patch I can find, for all architectures, up to
the present, and none of them adds a check for a NULL interrupt handler
in the pipeline.
Simply adding a test in __ipipe_run_isr() that ignores these entries
seems to fix the problem for me.
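A minimal sketch of the kind of test described, again as a userspace model under assumed names (run_isr_guarded(), set_handler() and count_isr() are hypothetical, not I-pipe functions): the dispatcher reads the handler pointer once and skips the entry when it is NULL instead of calling through it.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 8 /* hypothetical table size for the model */

typedef void (*irq_handler_t)(unsigned irq, void *cookie);

/* Hypothetical stand-in for ipd->irqs[]. */
static struct { irq_handler_t handler; void *cookie; } irqs[NR_IRQS];
static unsigned isr_runs; /* counts handler invocations in the model */

static void count_isr(unsigned irq, void *cookie)
{
    (void)irq;
    (void)cookie;
    isr_runs++;
}

void set_handler(unsigned irq, irq_handler_t handler, void *cookie)
{
    assert(irq < NR_IRQS);
    irqs[irq].handler = handler;
    irqs[irq].cookie = cookie;
}

/* Guarded dispatch: returns 1 if a handler ran, 0 if a stale log entry
 * (handler already unregistered) was skipped. */
int run_isr_guarded(unsigned irq)
{
    irq_handler_t handler;

    assert(irq < NR_IRQS);
    handler = irqs[irq].handler; /* read once: the check and the call see the same value */
    if (handler == NULL)
        return 0;
    handler(irq, irqs[irq].cookie);
    return 1;
}
```

Skipping silently means the stale occurrence is simply dropped, which is what is wanted for an IRQ nobody owns any more. In the real macro the test would have to cover both the root-domain and the non-root branch; because the body is wrapped in do { } while(0), an early break at the top is one plausible way to write it (untested).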
The other solution I can think of would be to make
ipipe_virtualize_irq() smarter, so that on deregistration it removes any
pending occurrences of that interrupt from the pipelines. Has that been
done in any newer versions?
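The alternative could look roughly like this, as a hypothetical model rather than anything in Adeos: when the handler is cleared, the pending flag for that IRQ is cleared too, so a later sync finds nothing to run. The names (log_irq(), virtualize_irq_purging(), irq_pending()) are made up and the model is single-threaded; in the kernel the purge would presumably need hardware interrupts off and would have to cover the log of every domain (and, judging by the per-CPU cpudata in the macro, every CPU) that might hold the IRQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 8 /* hypothetical table size for the model */

typedef void (*irq_handler_t)(unsigned irq, void *cookie);

/* Hypothetical stand-ins for ipd->irqs[] and one domain's interrupt log. */
static struct { irq_handler_t handler; void *cookie; } irqs[NR_IRQS];
static bool pending[NR_IRQS];

void log_irq(unsigned irq)
{
    assert(irq < NR_IRQS);
    pending[irq] = true;
}

/* "Smarter" deregistration: clearing the handler also discards any
 * occurrence of the IRQ still waiting in the log. */
void virtualize_irq_purging(unsigned irq, irq_handler_t handler, void *cookie)
{
    assert(irq < NR_IRQS);
    irqs[irq].handler = handler;
    irqs[irq].cookie = cookie;
    if (handler == NULL)
        pending[irq] = false;
}

bool irq_pending(unsigned irq)
{
    assert(irq < NR_IRQS);
    return pending[irq];
}
```

The trade-off is the same as with the guard: the discarded occurrence is never delivered to anyone, but the check happens once at deregistration instead of on every dispatch.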
This problem might match the old (2007) and long-running (40-message)
bug report "Re: Xenomai and MSI enabled crashes kernel", listed here:
http://thread.gmane.org/gmane.linux.real-time.xenomai.users/3643/focus=3657
I'd be interested in any observations, comments or pointers to the "real
cause" and any other "real fixes".
Tom Evans
* Re: [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
From: Philippe Gerum @ 2011-09-01 9:56 UTC
To: Tom Evans; +Cc: adeos-main
On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
> This problem has probably been solved years ago, but Google and
> searching this list didn't find me anything.
>
> I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the
> Adeos patches on an MPC5200 (ppc).
>
> Every now and then when I stress the system it crashes because
> "ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this
> system) in this code:
<snip>
> So it looks like the interrupt is happening in hardware and being queued
> and THEN it is being deregistered (with the handler being set to zero in
> ipipe_virtualize_irq()) and then it is being pulled from the pipe, run
> and (usually) crashes.
>
> I've checked all the Adeos patches I can find for all architectures up
> to the current date, and none of them have had changes made to check for
> the condition of a NULL interrupt handler in the pipe.
>
> Simply adding a test in __ipipe_run_isr() to ignore these entries seems
> to fix this problem for me.
>
> The other solution I can think of would be to make
> ipipe_virtualize_irq() smarter so on deregistration it removes any
> pending interrupts from the pipelines. Has that been done in any newer
> versions?
>
> This problem might match the old (2007) and long running (40 messages)
> bug report "Re: Xenomai and MSI enabled crashes kernel" listed here:
>
> http://thread.gmane.org/gmane.linux.real-time.xenomai.users/3643/focus=3657
>
Actually, the issue discussed in that thread is specific to MSI on x86
and relates to the interrupt namespace, so it does not apply to your case.
> I'd be interested in any observations, comments or pointers to the "real
> cause" and any other "real fixes".
ipipe_virtualize_irq() is an internal service. When used to unregister
an IRQ, it should be called only after the source has been shut down at
the device level, and possibly masked at the interrupt controller. It
must also be called with interrupts enabled for the domain that owns the
handler being unregistered. On uniprocessor systems, these two
conditions are enough to ensure that no IRQ lingers in the interrupt log
after the handler has been nullified.
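To illustrate why these two conditions suffice on a uniprocessor, here is a toy single-IRQ model (hypothetical names throughout, not I-pipe code). It assumes that a logged IRQ is played as soon as the owning domain is unstalled, which is the behaviour the second condition relies on: once the source is quiesced and masked, nothing new can be logged, and with the domain's interrupts enabled anything already logged has run before the handler is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*irq_handler_t)(void);

/* Hypothetical uniprocessor model of one IRQ line and its owning domain. */
static struct {
    bool dev_enabled;      /* device may assert the line */
    bool masked;           /* line masked at the interrupt controller */
    bool stalled;          /* owning domain has its interrupts disabled */
    bool logged;           /* an occurrence is waiting in the interrupt log */
    irq_handler_t handler; /* what ipipe_virtualize_irq() would install */
    unsigned runs;         /* number of times the handler has run */
} model;

/* Assumption: logged IRQs are played as soon as the domain is unstalled. */
static void sync_log(void)
{
    if (!model.stalled && model.logged) {
        model.logged = false;
        assert(model.handler != NULL); /* the NULL call from the report */
        model.handler();
    }
}

void hw_raise(void)
{
    if (!model.dev_enabled || model.masked)
        return; /* quiesced or masked: nothing reaches the log */
    model.logged = true;
    sync_log();
}

void set_stalled(bool stalled)
{
    model.stalled = stalled;
    sync_log();
}

static void isr(void)
{
    model.runs++;
}

void driver_open(void)
{
    model.handler = isr;
    model.dev_enabled = true;
    model.masked = false;
}

/* Teardown in the order described above. Returns true if the log was
 * empty at the moment the handler was cleared. */
bool driver_close(void)
{
    model.dev_enabled = false; /* 1. shut the source at device level */
    model.masked = true;       /* 2. mask the line at the controller */
    set_stalled(false);        /* 3. domain interrupts enabled: any logged
                                *    occurrence is played here */
    bool clean = !model.logged;
    model.handler = NULL;      /* 4. only now unregister the handler */
    return clean;
}
```

For example, calling driver_open(), set_stalled(true) and hw_raise() leaves one occurrence logged but not run; driver_close() then plays it during step 3, so the handler is cleared with an empty log.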
I can't find the routines that appear in the trace you sent in the
vanilla Linux/Xenomai code I have at hand, but if this is a real-time
CAN stack, you may want to check whether the device is properly quiesced
and the IRQ line masked before the interrupt is unregistered from the
pipeline.
--
Philippe.
* Re: [Adeos-main] NULL interrupt handler "ipd->irqs[irq].handler" in __ipipe_run_irq()
From: Tom Evans @ 2011-09-05 23:36 UTC
To: adeos-main; +Cc: Philippe Gerum
Philippe Gerum wrote:
> On Thu, 2011-09-01 at 12:22 +1000, Tom Evans wrote:
>> This problem has probably been solved years ago, but Google and
>> searching this list didn't find me anything.
>>
>> I'm running an old (2006) Linux 2.4 kernel with Xenomai 2.1 with the
>> Adeos patches on an MPC5200 (ppc).
>>
>> Every now and then when I stress the system it crashes because
>> "ipd->irqs[irq].handler" is NULL for "irq == 1" (a valid irq on this
>> system) in this code:
>
> <snip>
>
>> So it looks like the interrupt is happening in hardware and being queued
>> and THEN it is being deregistered (with the handler being set to zero in
>> ipipe_virtualize_irq()) and then it is being pulled from the pipe, run
>> and (usually) crashes.
<snip>
>> I'd be interested in any observations, comments or pointers to the "real
>> cause" and any other "real fixes".
>
> ipipe_virtualize_irq() is an internal service which should be called for
> unregistering an IRQ only after the source was shut at device level, and
> possibly masked on the interrupt controller. It must be called with
> interrupt enabled for the domain which owns the unregistered handler.
> On uniprocessor systems, these two conditions are enough to make sure
> that no IRQ is lingering in the interrupt log after the handler was
> nullified.
>
> I can't spot the routines appearing in the backtrace you sent in the
> vanilla linux/xenomai code I have at hand, but if this is a real-time
> CAN stack, you may want to check whether the device is properly quiesced
> and the IRQ line masked prior to unregistering the interrupt in the
> pipeline.
Thanks for your prompt and detailed reply.
Yes, it is a real-time CAN stack. It supports Philips SJA1000 CAN chips
on Peak Systems PCMCIA cards, connected through TI PCI1520 PCMCIA bridge
chips (using the "Yenta" drivers) to the Freescale MPC5200's PCI
interface. The four CAN chips and the PCMCIA bridge all share a single
interrupt. The CAN chip interrupts are handled in real time, but if the
real-time handler finds that none of the CAN chips raised the interrupt,
it passes it on (via XN_ISR_PROPAGATE) to the Linux-based PCMCIA code to
see whether it was a card-insertion event.
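The shared-line arrangement described above might be sketched as follows. Everything here is hypothetical: chip_irq_pending[] and chip_serviced[] stand in for reading and acknowledging each SJA1000's interrupt register, and ISR_HANDLED/ISR_PROPAGATE are stand-ins for Xenomai's ISR return codes (XN_ISR_PROPAGATE and friends), whose exact names and values vary between versions.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CAN_CHIPS 4 /* four SJA1000s share the line, per the description */

/* Hypothetical stand-ins for Xenomai's ISR return codes. */
enum isr_result { ISR_HANDLED, ISR_PROPAGATE };

/* Hypothetical per-chip state; a real driver would read each chip's
 * interrupt register instead. */
static bool chip_irq_pending[NR_CAN_CHIPS];
static unsigned chip_serviced[NR_CAN_CHIPS];

enum isr_result shared_can_isr(void)
{
    bool claimed = false;

    /* Check every chip: several may have asserted the shared line. */
    for (int i = 0; i < NR_CAN_CHIPS; i++) {
        if (chip_irq_pending[i]) {
            chip_irq_pending[i] = false; /* service/acknowledge the chip */
            chip_serviced[i]++;
            claimed = true;
        }
    }

    /* No CAN chip raised the line: hand it to Linux (PCMCIA code),
     * e.g. to check for a card insertion. */
    return claimed ? ISR_HANDLED : ISR_PROPAGATE;
}
```

Checking every chip rather than stopping at the first one means simultaneous events on several chips are all serviced in one pass; only when none of them claims the interrupt does it go to the Linux side.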
There's a lot that can go wrong; frankly, it's amazing it works as well
as it does. Because of the PCMCIA sharing, I can't guarantee that all
interrupt sources are shut down at the time ipipe_virtualize_irq() is
called.
I've found that checking for a NULL interrupt handler and ignoring it
solves my problem, and makes the code more robust against other corner
cases.
Tom