linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.5.68-mmX: Drowning in irq 7: nobody cared!
@ 2003-05-05 13:23 Shane Shrybman
  2003-05-05 21:30 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Shane Shrybman @ 2003-05-05 13:23 UTC (permalink / raw)
  To: linux-kernel

Hi,

I am getting a lot of these in the logs. This is with the ALSA emu10k1
driver for a SB live card. This is a x86, UP, KT133 system with preempt
enabled. The system seems to be running fine.

handlers:
[<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
irq 7: nobody cared!
Call Trace:
 [<c010c5c2>] handle_IRQ_event+0xa2/0x110
 [<c010c7c0>] do_IRQ+0xa0/0x130
 [<c010b14c>] common_interrupt+0x18/0x20
 [<c012048c>] do_softirq+0x3c/0xa0
 [<c010c826>] do_IRQ+0x106/0x130
 [<c010b14c>] common_interrupt+0x18/0x20
 [<c0108884>] default_idle+0x24/0x30
 [<c0114e55>] apm_cpu_idle+0x125/0x170
 [<c0114d30>] apm_cpu_idle+0x0/0x170
 [<c0108860>] default_idle+0x0/0x30
 [<c0108902>] cpu_idle+0x32/0x50
 [<c0105000>] _stext+0x0/0x60
 [<c02c46be>] start_kernel+0x15e/0x170

handlers:
[<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
irq 7: nobody cared!
Call Trace:
 [<c010c5c2>] handle_IRQ_event+0xa2/0x110
 [<c010c7c0>] do_IRQ+0xa0/0x130
 [<c010b14c>] common_interrupt+0x18/0x20
 [<c012048c>] do_softirq+0x3c/0xa0
 [<c010c826>] do_IRQ+0x106/0x130
 [<c010b14c>] common_interrupt+0x18/0x20
 [<c01203dd>] current_kernel_time+0xd/0x40
 [<c015f4d5>] inode_update_time+0x15/0x90
 [<c01fd188>] memcpy_toiovec+0x68/0xb0
 [<c0132af0>] generic_file_aio_write_nolock+0x390/0x9c0
 [<c01fb9f7>] kfree_skbmem+0x17/0x20
 [<c011b0f0>] autoremove_wake_function+0x0/0x40
 [<c01f86dc>] sock_recvmsg+0x8c/0xb0
 [<c013318f>] generic_file_write_nolock+0x6f/0x90
 [<c0119cec>] __wake_up+0x1c/0x40
 [<c0134737>] __alloc_pages+0x97/0x3a0
 [<c0134a33>] __alloc_pages+0x393/0x3a0
 [<c01f97f2>] sys_recvfrom+0xa2/0x100
 [<c01f9836>] sys_recvfrom+0xe6/0x100
 [<c0134a5a>] __get_free_pages+0x1a/0x50
 [<c0134481>] free_hot_cold_page+0x21/0xf0
 [<c0133350>] generic_file_writev+0x30/0x50
 [<c014933d>] do_readv_writev+0x1bd/0x260
 [<c0148e30>] do_sync_write+0x0/0xb0
 [<c01f9ecf>] sys_socketcall+0x15f/0x1f0
 [<c0149474>] vfs_writev+0x44/0x50
 [<c01494e8>] sys_writev+0x28/0x40
 [<c010a7df>] syscall_call+0x7/0xb

           CPU0       
  0:   45130454          XT-PIC  timer
  1:       6730          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:     278881          XT-PIC  uhci-hcd, uhci-hcd
  7:     128713          XT-PIC  EMU10K1
  8:          1          XT-PIC  rtc
 10:     983196          XT-PIC  ide2, ide3, bttv0
 11:    3031816          XT-PIC  eth0
 12:         60          XT-PIC  i8042, i8042, i8042, i8042
 14:      67179          XT-PIC  ide0
 15:        520          XT-PIC  ide1
NMI:          0 
LOC:   45131802 
ERR:        182
MIS:          0

BTW: What about the 4 i8042's on irq 12. Is this normal/OK?

Regards,

Shane


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-05 13:23 2.5.68-mmX: Drowning in irq 7: nobody cared! Shane Shrybman
@ 2003-05-05 21:30 ` Andrew Morton
  2003-05-06  9:35   ` Alan Cox
  2003-05-07 22:10   ` Shane Shrybman
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Morton @ 2003-05-05 21:30 UTC (permalink / raw)
  To: Shane Shrybman; +Cc: linux-kernel

Shane Shrybman <shrybman@sympatico.ca> wrote:
>
> Hi,
> 
> I am getting a lot of these in the logs. This is with the ALSA emu10k1
> driver for a SB live card. This is a x86, UP, KT133 system with preempt
> enabled. The system seems to be running fine.
> 
> handlers:
> [<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
> irq 7: nobody cared!

Beats me.  Does this fix it up?

diff -puN sound/pci/emu10k1/irq.c~sound-irq-hack sound/pci/emu10k1/irq.c
--- 25/sound/pci/emu10k1/irq.c~sound-irq-hack	Mon May  5 14:28:58 2003
+++ 25-akpm/sound/pci/emu10k1/irq.c	Mon May  5 14:29:17 2003
@@ -147,5 +147,5 @@ irqreturn_t snd_emu10k1_interrupt(int ir
 			outl(IPR_FXDSP, emu->port + IPR);
 		}
 	}
-	return IRQ_RETVAL(handled);
+	return IRQ_HANDLED;
 }

_


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-05 21:30 ` Andrew Morton
@ 2003-05-06  9:35   ` Alan Cox
  2003-05-06 10:41     ` Zwane Mwaikambo
  2003-05-06 15:17     ` Andrew Morton
  2003-05-07 22:10   ` Shane Shrybman
  1 sibling, 2 replies; 10+ messages in thread
From: Alan Cox @ 2003-05-06  9:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shane Shrybman, Linux Kernel Mailing List

On Llu, 2003-05-05 at 22:30, Andrew Morton wrote:
> Shane Shrybman <shrybman@sympatico.ca> wrote:
> >
> > Hi,
> > 
> > I am getting a lot of these in the logs. This is with the ALSA emu10k1
> > driver for a SB live card. This is a x86, UP, KT133 system with preempt
> > enabled. The system seems to be running fine.
> > 
> > handlers:
> > [<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
> > irq 7: nobody cared!
> 
> Beats me.  Does this fix it up?

With APIC at least it doesnt suprise me the least. The IRQ hack seems
extremely racey. Remember on most systems (especially with PIII type
APIC) IRQ delivery is asynchronous to the bus so you get

IRQ arrives
			sound card
			loop
				clean up IRQ
IRQ sent
				still more work, do it
			done
			HANDLED

IRQ arrives
			sound card
			Umm duh no work for me
			NOT HANDLED

		Whine

For anything where you get pairs of close IRQ's


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-06  9:35   ` Alan Cox
@ 2003-05-06 10:41     ` Zwane Mwaikambo
  2003-05-06 11:17       ` Alan Cox
  2003-05-06 15:17     ` Andrew Morton
  1 sibling, 1 reply; 10+ messages in thread
From: Zwane Mwaikambo @ 2003-05-06 10:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, Shane Shrybman, Linux Kernel Mailing List

On Tue, 6 May 2003, Alan Cox wrote:

> With APIC at least it doesnt suprise me the least. The IRQ hack seems
> extremely racey. Remember on most systems (especially with PIII type
> APIC) IRQ delivery is asynchronous to the bus so you get
> 
> IRQ arrives
> 			sound card
> 			loop
> 				clean up IRQ
> IRQ sent
> 				still more work, do it
> 			done
> 			HANDLED
> 
> IRQ arrives
> 			sound card
> 			Umm duh no work for me
> 			NOT HANDLED
> 
> 		Whine
> 
> For anything where you get pairs of close IRQ's

Shouldn't this also be observed more easily on P4/xAPIC since you can have 
a pending vector in the IRR and ISR whilst the core processes one.

	Zwane
-- 
function.linuxpower.ca

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-06 10:41     ` Zwane Mwaikambo
@ 2003-05-06 11:17       ` Alan Cox
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Cox @ 2003-05-06 11:17 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andrew Morton, Shane Shrybman, Linux Kernel Mailing List

On Maw, 2003-05-06 at 11:41, Zwane Mwaikambo wrote:
> > For anything where you get pairs of close IRQ's
> 
> Shouldn't this also be observed more easily on P4/xAPIC since you can have 
> a pending vector in the IRR and ISR whilst the core processes one.

I don't know enough about the pending vector stuff. For the older APIC the
IRQ's go via a suprisingly slow seperate APIC bus (4 wire if I remember
rightly). 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-06 15:17     ` Andrew Morton
@ 2003-05-06 15:07       ` Alan Cox
  2003-05-06 16:12         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Cox @ 2003-05-06 15:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: shrybman, Linux Kernel Mailing List

On Maw, 2003-05-06 at 16:17, Andrew Morton wrote:
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> >
> >  With APIC at least it doesnt suprise me the least. The IRQ hack seems
> >  extremely racey.
> 
> Good point.  How about we do something like "if half of the past 1000
> interrupts weren't handled then try to kill the IRQ"?

And if its a sound card generating close pairs of IRQs you might still
trip. It seems the heuristic is more complicated


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-06  9:35   ` Alan Cox
  2003-05-06 10:41     ` Zwane Mwaikambo
@ 2003-05-06 15:17     ` Andrew Morton
  2003-05-06 15:07       ` Alan Cox
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-05-06 15:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: shrybman, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
>  With APIC at least it doesnt suprise me the least. The IRQ hack seems
>  extremely racey.

Good point.  How about we do something like "if half of the past 1000
interrupts weren't handled then try to kill the IRQ"?



 arch/i386/kernel/irq.c |   55 +++++++++++++++++++++++++++++++++++++++----------
 include/linux/irq.h    |    2 +
 2 files changed, 46 insertions(+), 11 deletions(-)

diff -puN arch/i386/kernel/irq.c~irq-check-rate-limit arch/i386/kernel/irq.c
--- 25/arch/i386/kernel/irq.c~irq-check-rate-limit	2003-05-06 07:54:17.000000000 -0700
+++ 25-akpm/arch/i386/kernel/irq.c	2003-05-06 08:16:15.000000000 -0700
@@ -66,8 +66,12 @@
 /*
  * Controller mappings for all interrupt sources:
  */
-irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
-	{ [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
+irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned = {
+	[0 ... NR_IRQS-1] = {
+		.handler = &no_irq_type,
+		.lock = SPIN_LOCK_UNLOCKED
+	}
+};
 
 static void register_irq_proc (unsigned int irq);
 
@@ -209,7 +213,6 @@ int handle_IRQ_event(unsigned int irq,
 {
 	int status = 1;	/* Force the "do bottom halves" bit */
 	int retval = 0;
-	struct irqaction *first_action = action;
 
 	if (!(action->flags & SA_INTERRUPT))
 		local_irq_enable();
@@ -222,19 +225,43 @@ int handle_IRQ_event(unsigned int irq,
 	if (status & SA_SAMPLE_RANDOM)
 		add_interrupt_randomness(irq);
 	local_irq_disable();
-	if (retval != 1) {
+	return status;
+}
+
+/*
+ * If 500 of the previous 1000 interrupts have not been handled then assume
+ * that the IRQ is stuck in some manner.  Drop a diagnostic and try to turn the
+ * IRQ off.
+ *
+ * Called under desc->lock
+ */
+static void note_interrupt(irq_desc_t *desc, int irq, int status)
+{
+	if (status != IRQ_HANDLED)
+		desc->irqs_unhandled++;
+	desc->irq_count++;
+	if (desc->irq_count < 1000)
+		return;
+
+	desc->irq_count = 0;
+	if (desc->irqs_unhandled > 500) {
+		/*
+		 * The interrupt is stuck
+		 */
 		static int count = 100;
+		struct irqaction *action;
+
 		if (count) {
 			count--;
-			if (retval) {
+			if (status) {
 				printk("irq event %d: bogus retval mask %x\n",
-					irq, retval);
+					irq, status);
 			} else {
 				printk("irq %d: nobody cared!\n", irq);
 			}
 			dump_stack();
 			printk("handlers:\n");
-			action = first_action;
+			action = desc->action;
 			do {
 				printk("[<%p>]", action->handler);
 				print_symbol(" (%s)",
@@ -243,9 +270,13 @@ int handle_IRQ_event(unsigned int irq,
 				action = action->next;
 			} while (action);
 		}
+		/*
+		 * Now kill the IRQ
+		 */
+		desc->status |= IRQ_DISABLED;
+		desc->handler->disable(irq);
 	}
-
-	return status;
+	desc->irqs_unhandled = 0;
 }
 
 /*
@@ -418,10 +449,12 @@ asmlinkage unsigned int do_IRQ(struct pt
 	 * SMP environment.
 	 */
 	for (;;) {
+		int status;
+
 		spin_unlock(&desc->lock);
-		handle_IRQ_event(irq, &regs, action);
+		status = handle_IRQ_event(irq, &regs, action);
 		spin_lock(&desc->lock);
-		
+		note_interrupt(desc, irq, status);
 		if (likely(!(desc->status & IRQ_PENDING)))
 			break;
 		desc->status &= ~IRQ_PENDING;
diff -puN include/linux/irq.h~irq-check-rate-limit include/linux/irq.h
--- 25/include/linux/irq.h~irq-check-rate-limit	2003-05-06 07:56:03.000000000 -0700
+++ 25-akpm/include/linux/irq.h	2003-05-06 08:05:17.000000000 -0700
@@ -61,6 +61,8 @@ typedef struct {
 	hw_irq_controller *handler;
 	struct irqaction *action;	/* IRQ action list */
 	unsigned int depth;		/* nested irq disables */
+	unsigned int irq_count;		/* For detecting broken interrupts */
+	unsigned int irqs_unhandled;
 	spinlock_t lock;
 } ____cacheline_aligned irq_desc_t;
 

_


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-06 15:07       ` Alan Cox
@ 2003-05-06 16:12         ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2003-05-06 16:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: shrybman, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> It seems the heuristic is more complicated

Any suggestions?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
  2003-05-05 21:30 ` Andrew Morton
  2003-05-06  9:35   ` Alan Cox
@ 2003-05-07 22:10   ` Shane Shrybman
  1 sibling, 0 replies; 10+ messages in thread
From: Shane Shrybman @ 2003-05-07 22:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Hi Andrew & Alan,

Sorry for the delay but the one liner below does seem to have cleared up
the issue. I have been running it for about eight hours, with some sound
on all the time, and haven't seen any 'nobody cared' messages.

BTW: I hand applied this to 2.5.69-mm1. So I am confident that this one
liner did fix it.

On Mon, 2003-05-05 at 17:30, Andrew Morton wrote:
> Shane Shrybman <shrybman@sympatico.ca> wrote:
> >
> > Hi,
> > 
> > I am getting a lot of these in the logs. This is with the ALSA emu10k1
> > driver for a SB live card. This is a x86, UP, KT133 system with preempt
> > enabled. The system seems to be running fine.
> > 
> > handlers:
> > [<d8986540>] (gcc2_compiled.+0x0/0x390 [snd_emu10k1])
> > irq 7: nobody cared!
> 
> Beats me.  Does this fix it up?
> 
> diff -puN sound/pci/emu10k1/irq.c~sound-irq-hack sound/pci/emu10k1/irq.c
> --- 25/sound/pci/emu10k1/irq.c~sound-irq-hack	Mon May  5 14:28:58 2003
> +++ 25-akpm/sound/pci/emu10k1/irq.c	Mon May  5 14:29:17 2003
> @@ -147,5 +147,5 @@ irqreturn_t snd_emu10k1_interrupt(int ir
>  			outl(IPR_FXDSP, emu->port + IPR);
>  		}
>  	}
> -	return IRQ_RETVAL(handled);
> +	return IRQ_HANDLED;
>  }

Regards,

Shane



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.5.68-mmX: Drowning in irq 7: nobody cared!
@ 2003-05-06 21:44 Chuck Ebbert
  0 siblings, 0 replies; 10+ messages in thread
From: Chuck Ebbert @ 2003-05-06 21:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Alan Cox

Andrew Morton wrote:

> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> >
> > It seems the heuristic is more complicated
> 
> Any suggestions?


 Does this pseudocode look like it would work?  It should make it
only complain if two or more interrupts in a row go unhandled.

   
int last_irq_was_dropped[NR_IRQS];

/* call each handler in turn for this irq */

for (each_driver(irq)) {
        ret = call_driver();
        if (ret == irq_handled) {
                if (unlikely(last_irq_was_dropped[irq])
                        last_irq_was_dropped[irq] = 0;
                break;
        }
}
if (ret != irq_handled) {
        if (unlikely(last_irq_was_dropped[irq]))
                complain();
        else
                last_irq_was_dropped[irq] = 1;
}

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-05-07 21:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-05 13:23 2.5.68-mmX: Drowning in irq 7: nobody cared! Shane Shrybman
2003-05-05 21:30 ` Andrew Morton
2003-05-06  9:35   ` Alan Cox
2003-05-06 10:41     ` Zwane Mwaikambo
2003-05-06 11:17       ` Alan Cox
2003-05-06 15:17     ` Andrew Morton
2003-05-06 15:07       ` Alan Cox
2003-05-06 16:12         ` Andrew Morton
2003-05-07 22:10   ` Shane Shrybman
2003-05-06 21:44 Chuck Ebbert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).