linux-kernel.vger.kernel.org archive mirror
* Question regarding ERR in /proc/interrupts.
@ 2005-01-12 19:12 Justin Piszcz
  2005-01-12 19:31 ` Randy.Dunlap
  2005-01-12 20:16 ` linux-os
  0 siblings, 2 replies; 5+ messages in thread
From: Justin Piszcz @ 2005-01-12 19:12 UTC
  To: linux-kernel

Is there any way to log each ERR to a file, or some other way to find out
what caused each ERR?

For example, I know this is the cause of a few of them:
spurious 8259A interrupt: IRQ7.

But that doesn't account for all 20. Is there any available option to do this?

$ cat /proc/interrupts
            CPU0
   0:  887759057          XT-PIC  timer
   1:       3138          XT-PIC  i8042
   2:          0          XT-PIC  cascade
   5:       5811          XT-PIC  Crystal audio controller
   9:  265081861          XT-PIC  ide4, eth1, eth2
  10:    9087912          XT-PIC  ide6, ide7
  11:     837707          XT-PIC  ide2, ide3
  12:      13854          XT-PIC  i8042
  14:   63373075          XT-PIC  eth0
NMI:          0
ERR:         20
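
A possible way to approximate "logging each ERR" without patching the kernel is to poll /proc/interrupts and record each time the ERR: value grows. The C sketch below covers only the parsing step, assuming the 2.6-era layout shown above; parse_err_count is a hypothetical helper, not a kernel or libc function, and the polling loop is left out. Such polling can only tell you when the count changed, not which device was involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the "ERR:" line in a /proc/interrupts
 * snapshot and store its value in *out. Returns 1 on success, 0 if
 * no ERR line was found. Assumes the 2.6-era layout shown above.
 */
static int parse_err_count(const char *text, unsigned long *out)
{
    assert(text != NULL && out != NULL);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Skip leading blanks; the label column is right-aligned. */
        const char *p = line + strspn(line, " \t");
        if (strncmp(p, "ERR:", 4) == 0)
            return sscanf(p + 4, "%lu", out) == 1;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}
```

Comparing the value from two successive snapshots shows whether (and roughly when) new spurious interrupts arrived.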



* Re: Question regarding ERR in /proc/interrupts.
  2005-01-12 19:12 Question regarding ERR in /proc/interrupts Justin Piszcz
@ 2005-01-12 19:31 ` Randy.Dunlap
  2005-01-12 19:52   ` Justin Piszcz
  2005-01-12 20:16 ` linux-os
  1 sibling, 1 reply; 5+ messages in thread
From: Randy.Dunlap @ 2005-01-12 19:31 UTC
  To: Justin Piszcz; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1235 bytes --]

Justin Piszcz wrote:
> Is there any way to log each ERR to a file, or some other way to find out
> what caused each ERR?
> 
> For example, I know this is the cause of a few of them:
> spurious 8259A interrupt: IRQ7.
> 
> But that doesn't account for all 20. Is there any available option to do this?

Are you sure about that?

MOTD:  what kernel version?

2.6.10 (and probably all versions) prints such a message one time
for each "spurious" IRQ, sets a flag for that IRQ, and then doesn't
print such a message for that IRQ again (so that the log
isn't spammed).  Each distinct spurious IRQ should be
logged (one time).  If you want more, you'll need to patch
a source file and rebuild the kernel (the attached patch is for
the i8259 PIC, not the APIC, since that's what you seem to have).
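
The once-per-IRQ behavior is essentially a test-and-set on a bitmask with one bit per IRQ line. Below is a simplified userspace model of it, not the kernel code itself: spurious_irq_mask and irqmask echo names visible in the attached patch, while should_report is a hypothetical wrapper that returns 1 where the kernel would call printk().

```c
#include <assert.h>

/* One bit per legacy IRQ line (0-15); a set bit means "already reported". */
static unsigned int spurious_irq_mask;

/*
 * Hypothetical model: returns 1 the first time a given IRQ is seen as
 * spurious (the caller would print a message), 0 on every later one.
 */
static int should_report(int irq)
{
    assert(irq >= 0 && irq < 16);
    unsigned int irqmask = 1u << irq;

    if (spurious_irq_mask & irqmask)
        return 0;               /* already reported once: stay quiet */
    spurious_irq_mask |= irqmask;
    return 1;
}
```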

> $ cat /proc/interrupts
>            CPU0
>   0:  887759057          XT-PIC  timer
>   1:       3138          XT-PIC  i8042
>   2:          0          XT-PIC  cascade
>   5:       5811          XT-PIC  Crystal audio controller
>   9:  265081861          XT-PIC  ide4, eth1, eth2
>  10:    9087912          XT-PIC  ide6, ide7
>  11:     837707          XT-PIC  ide2, ide3
>  12:      13854          XT-PIC  i8042
>  14:   63373075          XT-PIC  eth0
> NMI:          0
> ERR:         20

-- 
~Randy

[-- Attachment #2: irq_err_msg.patch --]
[-- Type: text/x-patch, Size: 749 bytes --]

linux-2610-bk13

Print all spurious IRQs. (!)

Signed-off-by: Randy Dunlap <rddunlap@osdl.org>

diffstat:=
 arch/i386/kernel/i8259.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -Naurp ./arch/i386/kernel/i8259.c~irq_err ./arch/i386/kernel/i8259.c
--- ./arch/i386/kernel/i8259.c~irq_err	2004-12-24 13:35:28.000000000 -0800
+++ ./arch/i386/kernel/i8259.c	2005-01-12 11:28:44.233785256 -0800
@@ -225,7 +225,7 @@ spurious_8259A_irq:
 		 * At this point we can be sure the IRQ is spurious,
 		 * lets ACK and report it. [once per IRQ]
 		 */
-		if (!(spurious_irq_mask & irqmask)) {
+		/* if (!(spurious_irq_mask & irqmask)) */ {
 			printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
 			spurious_irq_mask |= irqmask;
 		}


* Re: Re: Question regarding ERR in /proc/interrupts.
  2005-01-12 19:31 ` Randy.Dunlap
@ 2005-01-12 19:52   ` Justin Piszcz
  2005-01-12 20:11     ` Randy.Dunlap
  0 siblings, 1 reply; 5+ messages in thread
From: Justin Piszcz @ 2005-01-12 19:52 UTC
  To: Randy.Dunlap; +Cc: linux-kernel

The kernel is 2.6.10.

The patch would effectively increment the ERR counter for each ERR,
correct?

Is there any way to trace the path or cause of an ERR?

For instance, I know I can make one occur like this:

I have 3 Promise boards in a box. When I am doing multiple transfers
across 2-3 drives and an NFS transfer at the same time, I may hear the IBM or
Hitachi disk click and the ERR count will increment, or there is just a long
pause. Also, I have used the IBM drive for 4-5+ yrs and never had any data
corruption. The disks themselves are not bad. It would just be nice to
understand why such spurious interrupts occur.

Dell Setup:

  PCI SLOT 1 = PCI1

  The PCI slots are on a riser board (Dell GX1p)

  PCI1 = Closest to motherboard.

  PCI1 = Intel GigE NIC
  PCI2 = Promise ATA/100
  PCI3 = Maxtor Promise ATA/133
  PCI4 = Maxtor Promise ATA/133
  PCI5 = 4 Port 10/100 NIC
  ISA1 = Empty
  ISA2 = Empty
  ISA3 = Empty

  Note: Nothing is attached to the system's IDE ports, they are disabled.
        I also turned off ACPI/stuff I do not use.





* Re: Question regarding ERR in /proc/interrupts.
  2005-01-12 19:52   ` Justin Piszcz
@ 2005-01-12 20:11     ` Randy.Dunlap
  0 siblings, 0 replies; 5+ messages in thread
From: Randy.Dunlap @ 2005-01-12 20:11 UTC
  To: Justin Piszcz; +Cc: linux-kernel

Justin Piszcz wrote:
> The kernel is 2.6.10.
> 
> The patch would effectively increment the ERR counter for each ERR, correct?

No, that counter is already incremented for each ERR; the patch just makes
each and every one of them get printed (warning: that could spam the log).

> Is there any way to trace the path or cause of an ERR?

Just the interrupt number and hence what device it is used by
(in /proc/interrupts).  However, the 8259 PIC reports spurious
interrupts on IRQ 7.  That's "normal" for it.
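
As I understand the 8259A, a genuinely serviced interrupt has its bit set in the PIC's In-Service Register (ISR) after the acknowledge cycle, while a spurious IRQ 7 (or IRQ 15 on the slave) does not, which is how the spurious case can be told apart. The sketch below is a pure function over ISR values assumed to be already read from the chips (the port I/O is omitted); irq_is_real is a hypothetical name, and IRQs 8-15 are assumed to sit on the cascaded slave as in the usual PC layout.

```c
#include <assert.h>

/*
 * Hypothetical helper: given In-Service Register values already read
 * from the master and slave 8259A, report whether 'irq' is really being
 * serviced. A spurious IRQ 7 (or 15) has its ISR bit clear.
 */
static int irq_is_real(unsigned char master_isr, unsigned char slave_isr,
                       int irq)
{
    assert(irq >= 0 && irq < 16);
    if (irq < 8)
        return (master_isr >> irq) & 1;      /* IRQ 0-7: master chip */
    return (slave_isr >> (irq - 8)) & 1;     /* IRQ 8-15: slave chip */
}
```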

> For instance, I know I can make one occur like this:
> 
> I have 3 Promise boards in a box. When I am doing multiple transfers
> across 2-3 drives and an NFS transfer at the same time, I may hear the IBM or
> Hitachi disk click and the ERR count will increment, or there is just a long
> pause. Also, I have used the IBM drive for 4-5+ yrs and never had any data
> corruption. The disks themselves are not bad. It would just be nice to
> understand why such spurious interrupts occur.

No idea, sorry.  I've seen a few problems with riser boards (in
general, mostly timing related), but I don't know anything about
this one.

Did this start happening recently?

Have you tried asking the drives if they have any SMART data
(problems) logged?

-- 
~Randy


* Re: Question regarding ERR in /proc/interrupts.
  2005-01-12 19:12 Question regarding ERR in /proc/interrupts Justin Piszcz
  2005-01-12 19:31 ` Randy.Dunlap
@ 2005-01-12 20:16 ` linux-os
  1 sibling, 0 replies; 5+ messages in thread
From: linux-os @ 2005-01-12 20:16 UTC
  To: Justin Piszcz; +Cc: Linux kernel

On Wed, 12 Jan 2005, Justin Piszcz wrote:

> Is there any way to log each ERR to a file, or some other way to find out
> what caused each ERR?
>
> For example, I know this is the cause of a few of them:
> spurious 8259A interrupt: IRQ7.
>
> But that doesn't account for all 20. Is there any available option to do this?
>
> $ cat /proc/interrupts
>           CPU0
>  0:  887759057          XT-PIC  timer
>  1:       3138          XT-PIC  i8042
>  2:          0          XT-PIC  cascade
>  5:       5811          XT-PIC  Crystal audio controller
>  9:  265081861          XT-PIC  ide4, eth1, eth2
> 10:    9087912          XT-PIC  ide6, ide7
> 11:     837707          XT-PIC  ide2, ide3
> 12:      13854          XT-PIC  i8042
> 14:   63373075          XT-PIC  eth0
> NMI:          0
> ERR:         20
>

I'm not sure you really want to do that! The ERR value is a
spurious interrupt total. You will never learn where
it comes from because it comes from nowhere, which is
why it is called "spurious". Spurious interrupts are
really caused by the CPU, not a particular interrupt
controller. When the INT line is raised, the hardware
is supposed to put an address on the bus so the CPU can
branch to the handler (via some indirection). The
INT pin to the CPU is supposed to be manipulated
by a controller, either the PIC or IO-APIC.

Suppose the controller didn't raise an interrupt, but
the CPU thought it did. In that case, when the CPU signals
the controller to output the vector, the controller says:
"Dohhh... WTF. It's not me...". But the CPU needs some
address to complete the cycle, so the controller puts its
last, lowest-priority vector on the bus to complete the
cycle. The CPU branches to the code and the code checks
for a possible printer interrupt (IRQ7). If the printer
didn't signal, the kernel used to write a nasty-gram to the
log before acknowledging the interrupt. Recent kernels write
such a message only once per IRQ. However, the number of such
instances is totaled for your review. If you have a lot of them,
it generally means you have:

(1) Too much crosstalk on the motherboard.
(2) Power supplies out of specification.
(3) Too hot so timing gets skewed.
(4) Etc.

It's NEVER the interrupt controller! NEVER. The spurious
interrupt proves that the controller did its job by completing
the hardware handshake with the CPU. Don't kill the messenger.
It's just doing its job!

FYI, 20 spurious interrupts out of the bazillion shown isn't
too bad. It shows that your hardware isn't perfect.
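
For scale: summing the per-IRQ counts in the /proc/interrupts dump quoted above gives 1,226,162,415 interrupts (timer included), so 20 spurious ones is roughly one in 61 million. The arithmetic as a small C sketch, with the counts copied from that dump and a hypothetical total_interrupts helper:

```c
#include <assert.h>
#include <stddef.h>

/* Per-IRQ counts from the dump above: IRQ 0, 1, 2, 5, 9, 10, 11, 12, 14. */
static const unsigned long long irq_counts[] = {
    887759057ULL, 3138ULL, 0ULL, 5811ULL, 265081861ULL,
    9087912ULL, 837707ULL, 13854ULL, 63373075ULL,
};

/* Hypothetical helper: total of all per-IRQ counts. */
static unsigned long long total_interrupts(void)
{
    unsigned long long sum = 0;

    for (size_t i = 0; i < sizeof irq_counts / sizeof irq_counts[0]; i++)
        sum += irq_counts[i];
    return sum;   /* 1226162415 for these counts */
}
```

Dividing that total by the ERR value of 20 gives about 61.3 million interrupts per spurious one.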

Cheers,
Dick Johnson
Penguin : Linux version 2.6.10 on an i686 machine (5537.79 BogoMips).


Thread overview: 5 messages
2005-01-12 19:12 Question regarding ERR in /proc/interrupts Justin Piszcz
2005-01-12 19:31 ` Randy.Dunlap
2005-01-12 19:52   ` Justin Piszcz
2005-01-12 20:11     ` Randy.Dunlap
2005-01-12 20:16 ` linux-os
