linux-kernel.vger.kernel.org archive mirror
* QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
@ 2001-01-10 21:30 Frank de Lange
  2001-01-10 22:21 ` Manfred Spraul
  2001-01-11 11:48 ` Andrew Morton
  0 siblings, 2 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-10 21:30 UTC (permalink / raw)
  To: linux-kernel

Hi'all,

Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
clones) in my BP-6 system, I've been experiencing intermittent network hangs. A
hang manifests itself as a total failure to communicate through either network
card, and can only be solved by rebooting. Removing and reloading the modules
does not fix the problem, only a reboot works.

I have searched high and low for possible causes, but I found no definite
answer. I suspect the problem may be hardware-related (the BP-6 can be tricky
sometimes), but I want to rule out software issues before I take the system
apart. So, my question: does anyone else experience these network-related
problems with a BP-6 based system? Or maybe other similar problems, where a
specific subsystem hangs and can only be revived by rebooting the box? The
network cards are currently in PCI4 and PCI5 (which should work, they are
NON-busmastering cards after all...), but relocating the cards does not solve
the problem. This problem has been nagging me ever since I moved to 2.3.x
(somewhere around 2.3.30). As it is intermittent, it is very difficult to pin
down. I suspect the APIC is not completely sane, or some timing on the bus is
out of spec, but that's no more than a suspicion. And since I do not have
access to a logic analyzer it is somewhat hard to prove...

Here's a lspci for the box:

00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:09.0 Ethernet controller: Winbond Electronics Corp W89C940
00:0b.0 Multimedia video controller: Brooktree Corporation Bt878 (rev 02)
00:0b.1 Multimedia controller: Brooktree Corporation Bt878 (rev 02)
00:0d.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR) 53c875 (rev 26)
00:0f.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 06)
00:11.0 Ethernet controller: Winbond Electronics Corp W89C940 (rev 0b)
00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
00:13.1 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)

and here's a cat /proc/<interesting_file>

/proc/cpuinfo shows this box contains dual Celeron 466's (non-overclocked)

/proc/interrupts:
           CPU0       CPU1       
  0:   10003353    9483961    IO-APIC-edge  timer
  1:      85449      84279    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:        167        249    IO-APIC-edge  serial
  4:     380807     381140    IO-APIC-edge  serial
 14:     136991     132077    IO-APIC-edge  ide0
 15:      25836      24605    IO-APIC-edge  ide1
 16:      42510      42482   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         26         26   IO-APIC-level  sym53c8xx
 18:       9287       8837   IO-APIC-level  bttv
 19:     205294     205191   IO-APIC-level  eth0, eth1, usb-uhci
NMI:   19487238   19487238 
LOC:   19488621   19488620 
ERR:          0
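
The listing above puts eth0, eth1 and usb-uhci on the same line, IRQ 19. A
minimal sketch for spotting such shared lines automatically, assuming the
2.4-era /proc/interrupts layout shown here (a CPU header row, then
"IRQ: counts controller devices"). The function name shared_irqs is made up
for illustration, and SAMPLE is an excerpt of the dump above:

```python
SAMPLE = """\
           CPU0       CPU1
 16:      42510      42482   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         26         26   IO-APIC-level  sym53c8xx
 18:       9287       8837   IO-APIC-level  bttv
 19:     205294     205191   IO-APIC-level  eth0, eth1, usb-uhci
"""


def shared_irqs(text):
    """Map IRQ number -> device list for every IRQ used by several devices."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip summary rows (NMI, LOC, ERR) and anything too short to parse.
        if not label.strip().isdigit() or len(fields) < ncpus + 2:
            continue
        # After the per-CPU counters comes the controller type, then devices.
        devices = " ".join(fields[ncpus + 1:]).split(", ")
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    for irq, devs in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq}: {', '.join(devs)}")
```

Sharing is legal for level-triggered PCI interrupts, so this only tells you
which devices would go quiet together if that one line stopped being serviced.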

/proc/meminfo:
        total:    used:    free:  shared: buffers:  cached:
Mem:  261984256 260354048  1630208        0 12873728 99012608
Swap: 511926272 14245888 497680384
...
...

# network cards share IRQ with USB, which hosts...
/proc/bus/usb/devices:
T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=12  MxCh= 2
B:  Alloc= 11/900 us ( 1%), #Int=  1, #Iso=  0
D:  Ver= 1.00 Cls=09(hub  ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0000 ProdID=0000 Rev= 0.00
S:  Product=USB UHCI Root Hub
S:  SerialNumber=c000
C:* #Ifs= 1 Cfg#= 1 Atr=40 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
E:  Ad=81(I) Atr=03(Int.) MxPS=   8 Ivl=255ms
T:  Bus=01 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=12  MxCh= 0
D:  Ver= 1.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=03f0 ProdID=0401 Rev= 1.00
S:  Product=HP ScanJet 5200C
S:  SerialNumber=SG95D1720ZHT
C:* #Ifs= 1 Cfg#= 1 Atr=60 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=00(>ifc ) Sub=00 Prot=00 Driver=usbscanner
E:  Ad=81(I) Atr=02(Bulk) MxPS=  64 Ivl=  0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS=  16 Ivl=  0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=   1 Ivl=250ms

uname -a:
Linux behemoth.localnet 2.4.0 #1 SMP Fri Jan 5 15:41:39 CET 2001 i686 unknown

Cheers//Frank

-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-10 21:30 QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Frank de Lange
@ 2001-01-10 22:21 ` Manfred Spraul
  2001-01-10 22:29   ` Frank de Lange
  2001-01-10 22:40   ` Frank de Lange
  2001-01-11 11:48 ` Andrew Morton
  1 sibling, 2 replies; 36+ messages in thread
From: Manfred Spraul @ 2001-01-10 22:21 UTC (permalink / raw)
  To: Frank de Lange; +Cc: linux-kernel

Frank de Lange wrote:
> 
> Hi'all,
> 
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs.
>

Which driver do you use? The driver in 2.4.0 contains several bugfixes.
If that driver still hangs then I'll double check the documentation.

> A
> hang manifests itself as a total failure to communicate through either network
> card, and can only be solved by rebooting. Removing and reloading the modules
> does not fix the problem, only a reboot works.
> 

That's different from my problems:
unload+reload always fixed my problems with the unpatched winbond-840
driver.

> which should work, they are
> NON-busmastering cards after all...),
third line in w840_probe1():

	pci_set_master().

And the documentation begins with
W89C840F
	PCI Bus Master Fast Ethernet LAN Controller.
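
Whether bus mastering is switched on for a given card can be checked rather
than guessed: pci_set_master() sets the bus-master-enable bit (bit 2) in the
16-bit PCI command register at offset 0x04 of the device's config space. A
minimal sketch that tests that bit in a raw config-space dump. The helper name
and the example IDs are hypothetical. On a current Linux system the bytes
could come from /sys/bus/pci/devices/<address>/config, which is an assumption
about the reader's setup, not something from this thread:

```python
import struct

PCI_COMMAND = 0x04          # offset of the 16-bit command register
PCI_COMMAND_MASTER = 0x04   # bit 2: bus master enable


def bus_master_enabled(config: bytes) -> bool:
    """Return True if the bus-master-enable bit is set in PCI config space."""
    if len(config) < PCI_COMMAND + 2:
        raise ValueError("need at least the first 6 bytes of config space")
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND)
    return bool(command & PCI_COMMAND_MASTER)


if __name__ == "__main__":
    # Synthetic headers (made-up vendor/device IDs), not read from hardware:
    # vendor ID, device ID, command register, all little-endian.
    io_only = struct.pack("<HHH", 0x1234, 0x5678, 0x0001)
    with_master = struct.pack("<HHH", 0x1234, 0x5678, 0x0007)
    print(bus_master_enabled(io_only))
    print(bus_master_enabled(with_master))
```

The bit only says that mastering is permitted. It does not prove the chip
ever initiates a transfer.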


--
	Manfred

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-10 22:21 ` Manfred Spraul
@ 2001-01-10 22:29   ` Frank de Lange
  2001-01-10 22:40   ` Frank de Lange
  1 sibling, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-10 22:29 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel

On Wed, Jan 10, 2001 at 11:21:49PM +0100, Manfred Spraul wrote:
> Which driver do you use? The driver in 2.4.0 contains several bugfixes.
> If that driver still hangs then I'll double check the documentation.

The NE2K PCI one... I'll try to fiddle around with the driver, who knows...

> And the documentation begins with
> W89C840F
> 	PCI Bus Master Fast Ethernet LAN Controller.

That is the 'F' (Fast) version. My cards are just plain and simple 10base2/T,
humble and non-busmastering (AFAIK).

Cheers//Frank



* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-10 22:21 ` Manfred Spraul
  2001-01-10 22:29   ` Frank de Lange
@ 2001-01-10 22:40   ` Frank de Lange
  1 sibling, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-10 22:40 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel

On Wed, Jan 10, 2001 at 11:21:49PM +0100, Manfred Spraul wrote:
> > which should work, they are
> > NON-busmastering cards after all...),
> third line in w840_probe1():
> 
> 	pci_set_master().
> 
> And the documentation begins with
> W89C840F
> 	PCI Bus Master Fast Ethernet LAN Controller.

...in addition to my previous reply, your cards use the Winbond 840 series,
while my cards use the 940 series. Higher number, but a less capable chipset or
so it seems...

Hm, but that reminds me not to get 840's to solve my problems :-)

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-10 21:30 QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Frank de Lange
  2001-01-10 22:21 ` Manfred Spraul
@ 2001-01-11 11:48 ` Andrew Morton
  2001-01-11 15:22   ` Frank de Lange
                     ` (5 more replies)
  1 sibling, 6 replies; 36+ messages in thread
From: Andrew Morton @ 2001-01-11 11:48 UTC (permalink / raw)
  To: Frank de Lange; +Cc: linux-kernel

Frank de Lange wrote:
> 
> Hi'all,
> 
> Ever since I put two ethernet-cards (cheap Winbond W89C940 based PCI NE2K
> clones) in my BP-6 system, I've been experiencing intermittent network hangs. A
> hang manifests itself as a total failure to communicate through either network
> card, and can only be solved by rebooting. Removing and reloading the modules
> does not fix the problem, only a reboot works.
> 

Losing both NICs at the same time could be the elusive "APIC
stops generating interrupts" problem.

Do you get any transmit timeout messages in the logs?  If
so, send them.

Does it happen with a uniprocessor build?

Are you able to boot with the `noapic' LILO option?
If so, does that make it stop?
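
One way to test the "APIC stops generating interrupts" theory from user space
is to take two /proc/interrupts snapshots a few seconds apart while traffic is
being sent, and see whether the counters on the NIC's line still move. A
minimal sketch, assuming the layout posted earlier in the thread. The function
names are made up, BEFORE is taken from Frank's dump, and AFTER is invented
purely to illustrate a stalled IRQ 19:

```python
BEFORE = """\
           CPU0       CPU1
  0:   10003353    9483961    IO-APIC-edge  timer
 19:     205294     205191   IO-APIC-level  eth0, eth1, usb-uhci
"""

# Invented second sample: the timer has advanced, IRQ 19 has not.
AFTER = """\
           CPU0       CPU1
  0:   10004353    9484961    IO-APIC-edge  timer
 19:     205294     205191   IO-APIC-level  eth0, eth1, usb-uhci
"""


def irq_totals(text):
    """Return {irq_label: count summed over all CPUs} for one snapshot."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        counts = rest.split()[:ncpus]
        if sep and counts and all(c.isdigit() for c in counts):
            totals[label.strip()] = sum(int(c) for c in counts)
    return totals


def stalled(before, after, watch):
    """IRQs from `watch` whose totals did not advance between the snapshots."""
    a, b = irq_totals(before), irq_totals(after)
    return [irq for irq in watch if irq in a and b.get(irq) == a[irq]]


if __name__ == "__main__":
    print(stalled(BEFORE, AFTER, ["0", "19"]))
```

A counter that stands still on an idle link proves nothing, so the check only
means something while packets are actually being pushed at the card.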


* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 11:48 ` Andrew Morton
@ 2001-01-11 15:22   ` Frank de Lange
  2001-01-11 16:55   ` Frank de Lange
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 15:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Jan 11, 2001 at 10:48:23PM +1100, Andrew Morton wrote:
> Losing both NICs at the same time could be the elusive "APIC
> stops generating interrupts" problem.

Yup, that's what I thought... But the real question is, is this a
software/configuration problem or a hardware problem which can only be fixed by
physically changing something on the board?... As it is, as you call it,
'elusive', it is a b*tch to pinpoint the source of these problems...

> Do you get any transmit timeout messages in the logs?  If
> so, send them.

Here they are (marked with ***):

grep -B2 -A2 transmit /var/log/messages:
    Jan 10 22:24:47 behemoth kernel: usb_control/bulk_msg: timeout 
    Jan 10 22:24:50 behemoth kernel: usb_control/bulk_msg: timeout 
*** Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
    Jan 10 22:57:03 behemoth last message repeated 7 times
    Jan 10 22:57:03 behemoth kernel: SysRq: Emergency Sync 
    --
    Jan 10 22:57:09 behemoth kernel: Syncing device 16:07 ... OK 
    Jan 10 22:57:09 behemoth kernel: Done. 
*** Jan 10 22:57:09 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
    Jan 10 22:57:09 behemoth kernel: SysRq: Emergency Sync 
    Jan 10 22:57:09 behemoth kernel: Syncing device 03:01 ... OK 

> Does it happen with a uniprocessor build?

Not tried yet, since I wanna use both CPU's :-).

> Are you able to boot with the `noapic' LILO option?

I am, and did it a while ago. As far as I remember, it did not make it stop...
I'll try again (even though it is not a real solution, since that APIC is there
for a reason...)

Cheers//Frank


* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 11:48 ` Andrew Morton
  2001-01-11 15:22   ` Frank de Lange
@ 2001-01-11 16:55   ` Frank de Lange
  2001-01-11 19:18   ` Frank de Lange
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 16:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

> Do you get any transmit timeout messages in the logs?  If
> so, send them.

In addition to my previous message, here's what I get from the debug log
facility:

Jan 10 22:56:51 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:56:51 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=33. 
Jan 10 22:56:52 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:56:52 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=26. 
Jan 10 22:56:53 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:56:53 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=30. 
Jan 10 22:56:56 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:56:56 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=78. 
Jan 10 22:56:56 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:56:56 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=32. 
Jan 10 22:56:58 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:56:58 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=89. 
Jan 10 22:57:00 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:57:00 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=77. 
Jan 10 22:57:03 behemoth kernel: NETDEV WATCHDOG: eth0: transmit timed out 
Jan 10 22:57:03 behemoth kernel: eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=171. 

So yeah, I get timeouts all right...
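
The TSR and ISR values in those lines are the 8390 transmit status and
interrupt status registers, and they can be decoded bit by bit. A minimal
sketch follows. The bit names are as I recall them from the Linux 8390 driver
header, so treat them as an assumption and check them against your tree. The
function names are made up:

```python
import re

# Bit 0 first. Names recalled from the 8390 driver header (assumption).
ISR_BITS = ["RX", "TX", "RX_ERR", "TX_ERR", "OVER", "COUNTERS", "RDC", "RESET"]
TSR_BITS = ["PTX", "ND", "COL", "ABT", "CRS", "FU", "CDH", "OWC"]

LINE = re.compile(r"(\w+): Tx timed out, lost interrupt\? "
                  r"TSR=(0x[0-9a-fA-F]+), ISR=(0x[0-9a-fA-F]+), t=(\d+)\.")


def decode(value, names):
    """Return the names of all bits set in `value`."""
    return [name for bit, name in enumerate(names) if value & (1 << bit)]


def parse_timeout(line):
    """Decode one 'Tx timed out, lost interrupt?' syslog line, or return None."""
    m = LINE.search(line)
    if m is None:
        return None
    dev, tsr, isr, ticks = m.groups()
    return {"dev": dev,
            "tsr": decode(int(tsr, 16), TSR_BITS),
            "isr": decode(int(isr, 16), ISR_BITS),
            "ticks": int(ticks)}


if __name__ == "__main__":
    print(parse_timeout("Jan 10 22:56:51 behemoth kernel: eth0: Tx timed out, "
                        "lost interrupt? TSR=0x3, ISR=0x3, t=33."))
```

If those names are right, ISR=0x3 means the chip is flagging both a received
packet and a completed transmit that nobody has acknowledged, which is what
you would expect when the card raises its interrupt and the handler never runs.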

Currently running NOAPIC, pity to see CPU1 receiving no interrupts at all... In the same debug log I now just saw this:

Jan 11 17:37:05 behemoth kernel: spurious 8259A interrupt: IRQ7

That's weird, since there's nothing there...:

cat /proc/interrupts 
           CPU0       CPU1       
  0:     232967          0          XT-PIC  timer
  1:       6424          0          XT-PIC  keyboard
  2:          0          0          XT-PIC  cascade
  3:        138          0          XT-PIC  serial
  4:      46201          0          XT-PIC  serial
  9:         52          0          XT-PIC  sym53c8xx
 10:     744329          0          XT-PIC  eth0, eth1, usb-uhci
 11:          0          0          XT-PIC  bttv
 12:          0          0          XT-PIC  es1371, mga@PCI:1:0:0
 14:      19778          0          XT-PIC  ide0
 15:       4520          0          XT-PIC  ide1
NMI:          0          0 
LOC:     232916     232914 
ERR:          1

See? Nothing on 7... This is with NOAPIC (as you can see from the XT-PIC's in
the above dump). BP6 again?
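
The effect of noapic on interrupt distribution can be put into numbers by
summing the per-CPU columns. A minimal sketch, again assuming the
/proc/interrupts layout shown in this thread. The function name is made up,
and the two samples are excerpts from the dumps posted above:

```python
NOAPIC = """\
           CPU0       CPU1
  0:     232967          0          XT-PIC  timer
 10:     744329          0          XT-PIC  eth0, eth1, usb-uhci
"""

WITH_APIC = """\
           CPU0       CPU1
 19:     205294     205191   IO-APIC-level  eth0, eth1, usb-uhci
"""


def cpu_share(text):
    """Fraction of numbered-IRQ interrupts handled by each CPU."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())
    per_cpu = [0] * ncpus
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = rest.split()[:ncpus]
        # Only numbered IRQ rows; NMI/LOC/ERR summary rows are ignored.
        if label.strip().isdigit() and len(counts) == ncpus:
            for cpu, count in enumerate(counts):
                per_cpu[cpu] += int(count)
    total = sum(per_cpu)
    return [n / total if total else 0.0 for n in per_cpu]


if __name__ == "__main__":
    print(cpu_share(NOAPIC))
    print(cpu_share(WITH_APIC))
```

With the XT-PIC everything lands on CPU0, while the IO-APIC dump splits the
load almost evenly between the two CPUs.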

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 11:48 ` Andrew Morton
  2001-01-11 15:22   ` Frank de Lange
  2001-01-11 16:55   ` Frank de Lange
@ 2001-01-11 19:18   ` Frank de Lange
       [not found]     ` <3A5E0849.EB428D70@mandrakesoft.com>
  2001-01-11 19:38   ` Frank de Lange
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 19:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Here's another posting to the list which mentions problems with NE2K and BP6:

http://web.gnu.walfield.org/mail-archive/linux-kernel/2000-August/0132.html

"...In another machine, a dual celeron abit-bp6, recent 2.3.x kernels seem to 
dislike my realtek 8029 NIC. (I know, it's garbage plugged in to 
garbage...) The network card will die randomly, usually when I'm sending 
large amounts of data. When it dies, there are no kernel messages, and 
the interrupt count in /proc/interrupts for the card stop changing. Minor 
(painful) experimentation has shown that if the card is sharing the 
interrupt with anything else (say, ide2), it takes that with it. This 
only happens in "newer" kernels, it's fine in 2.2.16, and in some earlier 
2.3.x kernels. It goes away if I boot with the noapic=1 kernel parameter, 
and seems to be replaced with harmless "spurious 8259A interrupt: IRQ7." 
messages. (I haven't configured any hardware at all to be on IRQ7 - 
though I'm lead to believe IRQ7 has some sort of special purpose) ..."

So I'm not the only one...

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 11:48 ` Andrew Morton
                     ` (2 preceding siblings ...)
  2001-01-11 19:18   ` Frank de Lange
@ 2001-01-11 19:38   ` Frank de Lange
  2001-01-11 19:49   ` Frank de Lange
  2001-01-11 21:09   ` Frank de Lange
  5 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 19:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Hm, the noapic option seems to help, as I'm currently beating the network to
death but it won't die... As the problem is elusive, it is hard to tell, and it
would not surprise me if the net dropped dead the moment this mail went
through, but current indication is that noapic makes the sudden net-death
disappear.

So.... we're still left with the question 'is this hardware-related, or is it a
software/configuration problem'? Other people seem to have similar problems
with dissimilar hardware (tulip cards instead of Winbond, etc), on 2.2.x as
well as 2.3/4.x. As I do not run Windows (NT or 2K), I can not tell if this
problem also occurs there. And my FreeBSD-box is uniprocessor... So... has
anyone seen anything like this on other 'true' (SMP) OS's? If so, that would
indicate a hardware problem...

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 11:48 ` Andrew Morton
                     ` (3 preceding siblings ...)
  2001-01-11 19:38   ` Frank de Lange
@ 2001-01-11 19:49   ` Frank de Lange
  2001-01-11 21:09   ` Frank de Lange
  5 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 19:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Another observation wrt. behaviour with 'noapic'...

When streaming time-critical data over the network (running esound to another
server, etc), sometimes there are hiccups in the stream. These hiccups seem to
be much less frequent, if at all present, when running with 'noapic'. I'm
currently running sound over a heavily loaded ethernet, no hiccups at all...
Weird, since the apic ought to spread the load of handling the interrupts over
all available CPU's.

Whatever is causing this, there seems to be something fishy in the way
interrupts are handled when the apic(s) is/are enabled...

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 11:48 ` Andrew Morton
                     ` (4 preceding siblings ...)
  2001-01-11 19:49   ` Frank de Lange
@ 2001-01-11 21:09   ` Frank de Lange
  2001-01-11 21:47     ` Jeff Garzik
  5 siblings, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 21:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

OK, just one last addition to what has nearly become my own thread...

I now am fairly certain that the problem (network stalls on multiprocessor systems) is not BP6 or NE2K-PCI specific. I found several postings which relate to similar problems on dissimilar hardware. Another interesting one is:

Re: PROBLEM : Networking stops working with kernel 2.4.0-test11 
  (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg18722.html)

"...I have an almos identical system as you, 2x200MMX motherboard (Gigabyte
       586DX) also Voodoo3 (2000 pci) the same nic Realtek 8029AS, also a bt848
       tv card, also SCSI (Aic-7880 onboard, but not used).

       I have reported it some time ago, and now all I get with
       2.4.0-test11-pre4 and I think a additional patch is  NETDEV WATCHDOG:
       eth0: transmit timed out, and something in the console about lost irq?

       I can't reproduce it with a uniprocesor kernel, and I have a 3c503 card
       wich uses the 8390 module, so I suppose that the problem it's not in the
       8390, and it seems to be smp related...."


ne2k-pci freezes with APIC error on 2.4.0-testX SMP
  (http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg14468.html)

"...

       When doing massive NFS transfers (2.4 machine as the client) on my SMP
       box
       (Abit BP6 2x celeronA 533mhz (non-overclocked) 64Mb ram, latest
       apt-get-ed
       debian woody) my ne2k-pci card (Realtek Semiconductor Co., Ltd.
       RTL-8029(AS)
        (rev 0)) suddenly stops working. test5 spits that in syslog:..."

More to be found when searching the archives. This problem has been around for
a long, long time (probably since the current level of apic-support was added,
somewhere around 2.3.1x?). It has been reported by several people, several
times. I feel like rigging every apic-related piece of code with a zillion
bells and printk's but that would surely only create more mayhem as this whole
thing seems to be timing-related...

Anyone got any ideas on how to tackle this? Anyone who is 'intimate with' the
apic-related code? It'll take me some time to dive into that part, so if there
is anyone who already has taken the plunge, do tell...

Cheers//Frank

[ who is still running apic-less, without problems ]

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 21:09   ` Frank de Lange
@ 2001-01-11 21:47     ` Jeff Garzik
  2001-01-11 21:53       ` Frank de Lange
  2001-01-12 14:35       ` David Woodhouse
  0 siblings, 2 replies; 36+ messages in thread
From: Jeff Garzik @ 2001-01-11 21:47 UTC (permalink / raw)
  To: Frank de Lange; +Cc: Andrew Morton, linux-kernel

Frank de Lange wrote:
> 
> OK, just one last addition to what has nearly become my own thread...
> 
> I now am fairly certain that the problem (network stalls on multiprocessor systems) is not BP6 or NE2K-PCI specific. I found several postings which relate to similar problems on dissimilar hardware. Another interesting one is:

>        I have reported it some time ago, and now all I get with
>        2.4.0-test11-pre4 and I think a additional patch is  NETDEV WATCHDOG:
>        eth0: transmit timed out, and something in the console about lost irq?



Are you judging based on the error message?  The 'netdev watchdog ...'
message is a generic error message that could have any number of
causes.  It's just saying, well, what it says :)  The kernel was unable
to transmit a packet in a certain amount of time.  You might get these
messages if you unplug a cable suddenly, or if your hardware isn't
delivering interrupts, or many other things...
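
To make the point concrete, here is a toy model of the check behind that
message. It is a simplified sketch, not the kernel code: the field names
follow my recollection of the 2.4 net_device structure and should be read as
assumptions. The watchdog knows only that the driver stopped its queue and
that too much time has passed since the last transmit was started:

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    queue_stopped: bool     # driver asked the stack to hold further packets
    trans_start: int        # tick at which the last transmit was started
    watchdog_timeo: int     # ticks allowed before the watchdog complains


def watchdog_fires(dev: Device, now: int) -> bool:
    """True when a stopped queue has waited longer than the timeout."""
    return dev.queue_stopped and (now - dev.trans_start) > dev.watchdog_timeo


if __name__ == "__main__":
    eth0 = Device("eth0", queue_stopped=True, trans_start=100, watchdog_timeo=50)
    if watchdog_fires(eth0, now=151):
        print(f"NETDEV WATCHDOG: {eth0.name}: transmit timed out")
```

Nothing in that condition says why the transmit never completed, which is why
the same message shows up for pulled cables and for lost interrupts alike.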

	Jeff


-- 
Jeff Garzik       | "You see, in this world there's two kinds of
Building 1024     |  people, my friend: Those with loaded guns
MandrakeSoft      |  and those who dig. You dig."  --Blondie

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 21:47     ` Jeff Garzik
@ 2001-01-11 21:53       ` Frank de Lange
  2001-01-12 14:35       ` David Woodhouse
  1 sibling, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-11 21:53 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel

On Thu, Jan 11, 2001 at 04:47:00PM -0500, Jeff Garzik wrote:
> Are you judging based on the error message?  The 'netdev watchdog ...'
> message is a generic error message that could have any number of
> causes.  It's just saying, well, what it says :)  The kernel was unable
> to transmit a packet in a certain amount of time.  You might get these
> messages if you unplug a cable suddenly, or if your hardware isn't
> delivering interrupts, or many other things...

No, I'm judging based on the fact that I found reports from people using
NE2K-PCI with several cards as well as tulip-based cards (different driver) on
abit BP6 as well as Gigabyte motherboards, mostly on 2.3.x/2.4.x kernels. I
found some postings with these problems on 2.2.x kernels.

Cheers//Frank

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
       [not found]     ` <3A5E0849.EB428D70@mandrakesoft.com>
@ 2001-01-12  0:28       ` Frank de Lange
  2001-01-12 11:40         ` Andrew Morton
  0 siblings, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-12  0:28 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

On Thu, Jan 11, 2001 at 02:23:53PM -0500, Jeff Garzik wrote:
> Just out of curiosity, if you boot a Linux 2.4.0 kernel with the
> "noapic" command line option, does behavior improve?

For the curious, here's a summary of some tests I did:

apic, 2 cpu's, no smp affinity -> network hangs under load
apic, maxcpus=1, no smp affinity -> network hangs under load
apic, 2 cpu's, smp affinity for all irq's on CPU1 -> network hangs under load
noapic, 2 cpu's, no smp affinity -> NO HANG, WORKSFORME

Quick and dirty conclusion: as soon as the apic comes in to play, things get
messy...
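
The same conclusion can be read off the four runs mechanically: look for the
factor whose every value always goes with the same outcome, and whose values
lead to different outcomes. A minimal sketch. The dictionary keys and the
function name are made up, and the data is just the table above:

```python
# One entry per test run listed above.
TESTS = [
    {"apic": True,  "cpus": 2, "pinned": False, "hang": True},
    {"apic": True,  "cpus": 1, "pinned": False, "hang": True},
    {"apic": True,  "cpus": 2, "pinned": True,  "hang": True},
    {"apic": False, "cpus": 2, "pinned": False, "hang": False},
]


def predictors(tests, outcome="hang"):
    """Factors whose value alone determines the outcome in every run."""
    found = []
    for factor in (key for key in tests[0] if key != outcome):
        seen = {}                      # factor value -> set of outcomes seen
        for run in tests:
            seen.setdefault(run[factor], set()).add(run[outcome])
        consistent = all(len(results) == 1 for results in seen.values())
        varies = len(set().union(*seen.values())) > 1
        if consistent and varies:
            found.append(factor)
    return found


if __name__ == "__main__":
    print(predictors(TESTS))
```

Four runs with a single noapic data point make this an indication, not a
proof. A longer noapic soak test would strengthen it.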

ps. load == 2 simultaneous nfs cp -rd <big_directory> sessions and streaming
esd audio over the network

Cheers//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware  related?
  2001-01-12  0:28       ` Frank de Lange
@ 2001-01-12 11:40         ` Andrew Morton
  2001-01-12 15:06           ` Frank de Lange
  2001-01-12 15:36           ` Frank de Lange
  0 siblings, 2 replies; 36+ messages in thread
From: Andrew Morton @ 2001-01-12 11:40 UTC (permalink / raw)
  To: Frank de Lange; +Cc: Maciej W. Rozycki, linux-kernel

Frank de Lange wrote:
> 
> Quick and dirty conclusion: as soon as the apic comes into play, things get
> messy...

Yup.

Frank, for over a year there have been sporadic reports
of APICs forgetting how to deliver interrupts.  Not only
on BP6s.  Often with 3com NICs, so I've never been 100% sure
that it's not a failure in the NIC.

In your case, you have three devices on the same IRQ and
they all go to lunch at the same time.  That's pretty
convincing.

Nobody has been able to repeat this frequently enough
for any useful debugging to be done. Don't go away!

Here is a debugging patch.  Could you please apply this,
rebuild and:

1: Type ALT-SYSRQ-A when everything is good
2: Type ALT-SYSRQ-A when everything is bad
3: send the resulting logs.

I've Cc'ed Maciej, who understands this stuff.





--- linux-2.4.0-test11.macro/arch/i386/kernel/io_apic.c	Thu Oct  5 21:08:17 2000
+++ linux-2.4.0-test11/arch/i386/kernel/io_apic.c	Sun Nov 26 12:39:01 2000
@@ -692,7 +692,7 @@ void __init UNEXPECTED_IO_APIC(void)
 	printk(KERN_WARNING "          to linux-smp@vger.kernel.org\n");
 }
 
-void __init print_IO_APIC(void)
+void /*__init*/ print_IO_APIC(void)
 {
 	int apic, i;
 	struct IO_APIC_reg_00 reg_00;
diff -up --recursive --new-file linux-2.4.0-test11.macro/drivers/char/sysrq.c linux-2.4.0-test11/drivers/char/sysrq.c
--- linux-2.4.0-test11.macro/drivers/char/sysrq.c	Tue Nov 14 10:24:52 2000
+++ linux-2.4.0-test11/drivers/char/sysrq.c	Sun Nov 26 12:42:11 2000
@@ -72,6 +72,15 @@ void handle_sysrq(int key, struct pt_reg
 	console_loglevel = 7;
 	printk(KERN_INFO "SysRq: ");
 	switch (key) {
+	case 'a':
+		printk("\n");
+		printk("print_PIC()\n");
+		print_PIC();
+		printk("print_IO_APIC()\n");
+		print_IO_APIC();
+		printk("print_all_local_APICs()\n");
+		print_all_local_APICs();
+		break;
 	case 'r':					    /* R -- Reset raw mode */
 		if (kbd) {
 			kbd->kbdmode = VC_XLATE;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-11 21:47     ` Jeff Garzik
  2001-01-11 21:53       ` Frank de Lange
@ 2001-01-12 14:35       ` David Woodhouse
  1 sibling, 0 replies; 36+ messages in thread
From: David Woodhouse @ 2001-01-12 14:35 UTC (permalink / raw)
  To: Frank de Lange; +Cc: Jeff Garzik, Andrew Morton, linux-kernel


frank@unternet.org said:
>  No, I'm judging based on the fact that I found reports from people
> using NE2K-PCI with several cards as well as tulip-based cards
> (different driver) on abit BP6 as well as Gigabyte motherboards,
> mostly on 2.3.x/2.4.x kernels. I found some postings with these
> problems on 2.2.x kernels. 

IRQ 19 on my BP6 stopped arriving a few days ago. 

 19:      90373      90473   IO-APIC-level  usb-uhci

Removing and reloading the usb-uhci driver didn't help. Loading the uhci 
driver just oopsed, which seems to be its normal behaviour on the occasions 
on which I try it.

Rebooting fixed it. I was half tempted to code a 'kick APIC because I think 
it broke' function, but then decided not to bother.

It might be nice in 2.5 to give drivers some way of kicking the APIC when 
they think they've missed an interrupt, much like the network code kicks 
the driver. And to deal more gracefully with IRQ storms.

Once a driver has a way of saying "Oi! Why isn't IRQ x working?" it would 
be feasible to just disable the damn thing if we receive a million of them 
in rapid succession.
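
A minimal user-space sketch of the storm check described above, not kernel
code. The names irq_watch and irq_watch_hit, the tick-based window, and the
STORM_WINDOW value are all hypothetical; only the "a million of them" limit
comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STORM_LIMIT   1000000u  /* "a million of them" */
#define STORM_WINDOW  100u      /* window length in ticks; arbitrary, for illustration */

/* Hypothetical per-IRQ bookkeeping. */
struct irq_watch {
	uint32_t window_start;  /* tick at which the current window began */
	uint32_t count;         /* interrupts seen in the current window */
	bool disabled;          /* set once the line is considered stormy */
};

/*
 * Call once per received interrupt, with the current tick.
 * Returns true if the line should be (or already is) disabled.
 */
bool irq_watch_hit(struct irq_watch *w, uint32_t now)
{
	assert(w != NULL);

	if (w->disabled)
		return true;

	/* Start a fresh window once the old one has expired. */
	if (now - w->window_start >= STORM_WINDOW) {
		w->window_start = now;
		w->count = 0;
	}

	if (++w->count >= STORM_LIMIT)
		w->disabled = true;

	return w->disabled;
}
```

With these values a line is only flagged when STORM_LIMIT interrupts land
inside one window; interrupts spread over many windows never accumulate.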

--
dwmw2



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 11:40         ` Andrew Morton
@ 2001-01-12 15:06           ` Frank de Lange
  2001-01-12 15:36           ` Frank de Lange
  1 sibling, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 15:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Maciej W. Rozycki, linux-kernel

On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Frank de Lange wrote:
> > 
> > Quick and dirty conclusion: as soon as the apic comes into play, things get
> > messy...
> Here is a debugging patch.  Could you please apply this,
> rebuild and:
> 
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.

WillCo...

Now rebuilding...


Cheers//Frank

-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 11:40         ` Andrew Morton
  2001-01-12 15:06           ` Frank de Lange
@ 2001-01-12 15:36           ` Frank de Lange
  1 sibling, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 15:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Maciej W. Rozycki, linux-kernel

On Fri, Jan 12, 2001 at 10:40:04PM +1100, Andrew Morton wrote:
> Here is a debugging patch.  Could you please apply this,
> rebuild and:
> 
> 1: Type ALT-SYSRQ-A when everything is good
> 2: Type ALT-SYSRQ-A when everything is bad
> 3: send the resulting logs.

OK, here's the results I get...

Before network hang
===================

print_PIC()
printing PIC contents
print_IO_APIC()
testing the IO APIC.......................

.................................... done.
print_all_local_APICs()

... APIC ID:      01000000 (1)
... APIC VERSION: 00040011
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000001000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000010000000000000000


... APIC ID:      00000000 (0)
... APIC VERSION: 00040011
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000010000000000000001000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000001000

NOTICE: results differ every time I hit ALT-SYSRQ-A.
	The '1' bit at 'row 11, col. 26' stays '1'
	no matter how many times I use the magic keys.
	The other '1' bits jump around a bit, or
	disappear altogether. Also, the sequence
	in which the APICs appear in the dump sometimes
	differs (this example shows 1 first, then 0,
	other times you'd see 0 first, then 1)

After network hang
==================

print_PIC()
printing PIC contents
print_IO_APIC()
testing the IO APIC.......................

.................................... done.
print_all_local_APICs()

... APIC ID:      00000000 (0)
... APIC VERSION: 00040011
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000010000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000010000000000000000


... APIC ID:      01000000 (1)
... APIC VERSION: 00040011
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000001000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000010000000000000000

NOTICE:	hmmm... see, now that '1' bit at row 11,
	col. 26 for APIC 0 which was '1' before
	has turned to '0'. It will stay '0' no
	matter how many times I hit the magic keys...
	It seems to have been replaced by the '1'
	bit at row 11, col. 10, since that bit
	stays '1' no matter how much magic I
	throw at it...

Hope this helps... If you need more, let me know...

Cheers//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
       [not found] <20010112165104.A22465@unternet.org>
@ 2001-01-16 19:23 ` Maciej W. Rozycki
  0 siblings, 0 replies; 36+ messages in thread
From: Maciej W. Rozycki @ 2001-01-16 19:23 UTC (permalink / raw)
  To: Frank de Lange; +Cc: Andrew Morton, linux-kernel

On Fri, 12 Jan 2001, Frank de Lange wrote:

[I've cut syslog junk away for clarity -- you could just do `dmesg -s 32768'.]
> before network hang
> ===================
[...]
>  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[...]
>  13 0FF 0F  0    1    0   1   0    1    1    99
[...]
> printing local APIC contents on CPU#0/0:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000010000000000000001000000
                           ^
[...]
> printing local APIC contents on CPU#1/1:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000000000000000000001000000
                           ^
[...]

 Here everything is fine.  Vector 0x99 (the one you are having troubles
with) is set up as level-triggered (Trig is 1) and the respective bits of
the Trigger Mode Register (TMR) of the local APIC of both CPUs are set,
i.e. the last 0x99 IRQ processed was level-triggered as expected.  As a
part of the usual inter-APIC handshake for level-triggered interrupts an
EOI message was sent to the originating I/O APIC (IRR is 0) to inform it
that it is free to send the IRQ again if still asserted.
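
To see why the marked column belongs to vector 0x99: the local APIC's TMR
holds one bit per vector, 256 bits in eight 32-bit registers. The sketch
below maps a vector to its register and bit. The names tmr_pos, tmr_locate
and tmr_is_level are made up for illustration, and it assumes the dump
prints bit 0 in the leftmost column, as the 0123456789abcdef ruler suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Position of one vector's bit inside the 256-bit TMR. */
struct tmr_pos {
	unsigned word;  /* which 32-bit register, 0..7 */
	unsigned bit;   /* bit inside that register, 0..31 (column in the dump) */
};

/* Map an interrupt vector to its TMR register and bit. */
struct tmr_pos tmr_locate(unsigned vector)
{
	struct tmr_pos pos;

	assert(vector <= 0xff);
	pos.word = vector / 32;
	pos.bit = vector % 32;
	return pos;
}

/* Non-zero if the TMR says the last such interrupt was level-triggered. */
int tmr_is_level(const uint32_t tmr[8], unsigned vector)
{
	struct tmr_pos pos = tmr_locate(vector);

	return (int)((tmr[pos.word] >> pos.bit) & 1u);
}
```

For vector 0x99 this gives register 4, bit 25 (0x19), which is the column
under the caret in the excerpts.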

> after network hang
> ==================
[...]
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
[...]
>  13 0FF 0F  0    1    1   1   0    1    1    99
[...]
> printing local APIC contents on CPU#1/1:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000000000000000000001000000
                           ^
> printing local APIC contents on CPU#0/0:
[...]
> ... APIC TMR field:
> 0123456789abcdef0123456789abcdef
[...]
> 00000000010000000000000000000000
                           ^ Gotcha!
[...]

 Here the last 0x99 IRQ delivered to CPU#1 was fine, just like before. 
But the last 0x99 IRQ CPU#0 received was apparently delivered as
edge-triggered -- the respective bit of the Trigger Mode Register (TMR)
of the local APIC is cleared.  Hence the local APIC decided no EOI message
is needed for the originating I/O APIC as edge-triggered interrupts are
always sent by an I/O APIC whenever arriving (it's not possible for
level-triggered ones as an IRQ storm would result).  Upon receiving an EOI
command from Linux the local APIC decides everything is finished and the
I/O APIC is left stuck with the IRR bit set to 1.  It's still waiting for
an EOI message to arrive for further 0x99 IRQs to send.
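
The failure sequence above can be replayed with a toy model. This is a
sketch of the handshake as described in this message only, not of real APIC
hardware; the types and the deliver()/eoi() helpers are invented for the
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trigger { TRIG_EDGE = 0, TRIG_LEVEL = 1 };

/* One I/O APIC redirection entry, reduced to what matters here. */
struct ioapic_pin {
	enum trigger trig;  /* how the entry is programmed (Trig column) */
	bool irr;           /* IRR column: set while an EOI is awaited */
};

/* One local APIC, reduced to the TMR bit of the vector in question. */
struct local_apic {
	bool tmr;           /* trigger mode of the last message received */
};

/*
 * The I/O APIC sends the interrupt. `seen_as` is the trigger mode the
 * local APIC decodes from the bus message; it differs from pin->trig
 * only if the message was corrupted. Returns false if the pin is
 * blocked because a previous EOI never arrived.
 */
bool deliver(struct ioapic_pin *pin, struct local_apic *lapic,
	     enum trigger seen_as)
{
	assert(pin != NULL && lapic != NULL);

	if (pin->trig == TRIG_LEVEL) {
		if (pin->irr)
			return false;
		pin->irr = true;
	}
	lapic->tmr = (seen_as == TRIG_LEVEL);
	return true;
}

/*
 * The handler issues EOI to the local APIC, which forwards an EOI
 * message to the I/O APIC only if its TMR bit says "level".
 */
void eoi(struct ioapic_pin *pin, const struct local_apic *lapic)
{
	assert(pin != NULL && lapic != NULL);

	if (lapic->tmr)
		pin->irr = false;
}
```

A level-programmed pin whose message is decoded as edge once stays with irr
set after eoi(), and every later deliver() on it returns false.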

 How could it happen?  Well, I guess a transmission error could have
happened that remained unnoticed by the checksumming hardware.  As the
checksum algorithm is pretty trivial -- a cumulative sum of 2-bit values
-- it might just have happened that two bits got toggled.  I believe such
errors are happening due to marginal hardware -- not every i386 SMP box
shows this problem, even if a high volume of level-triggered interrupts is
observed. 
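
A small illustration of why such a checksum can miss a double bit flip. It
follows only the description above (a running sum of 2-bit values); the
real bus frame layout is not modelled, and truncating the sum to two bits
is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add up the message as consecutive 2-bit values, keeping 2 bits. */
unsigned checksum2(const uint8_t *msg, size_t len)
{
	unsigned sum = 0;
	size_t i;
	int shift;

	assert(msg != NULL || len == 0);

	for (i = 0; i < len; i++)
		for (shift = 0; shift < 8; shift += 2)
			sum += (msg[i] >> shift) & 0x3u;

	return sum & 0x3u;
}
```

Clearing a bit in one 2-bit group and setting the same bit in another
leaves the sum unchanged: the bytes { 0x01, 0x00 } and { 0x00, 0x01 } differ
in two bits but produce the same checksum.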

 Thank you very much for the log.  I already have an idea how to
automatically recover from such a situation.  No driver change is required
-- apparently no driver is at fault, it's just a load of the inter-APIC
bus, I/O APICs (including system bus accesses) or the whole system in
general. 

 I'm hereby asking everyone not to modify drivers just to circumvent APIC
lock-ups.  Especially if such changes would "punish" perfectly good
systems.  It won't cure anything -- it might only make it happen less
frequently due to different conditions.  I hope to have changes ready to
test by the next week.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-14  0:13                   ` Roeland Th. Jansen
@ 2001-01-14  0:23                     ` Frank de Lange
  0 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-14  0:23 UTC (permalink / raw)
  To: Roeland Th. Jansen
  Cc: Ingo Molnar, Linus Torvalds, Manfred Spraul, dwmw2, linux-kernel,
	Alan Cox

On Sun, Jan 14, 2001 at 12:13:58AM +0000, Roeland Th. Jansen wrote:
> On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> > well, some time ago i had an ne2k card in an SMP system as well, and found
> > this very problem. Disabling/enabling focus-cpu appeared to make a
> > difference, but later on i made experiments that show that in both cases
> > the hang happens. I spent a good deal of time trying to fix this problem,
> > but failed - so any fresh ideas are more than welcome.
> 
> for the record. my BP6, non OC, apic smp system with ne2k fails within
> 24 hours here too. if I can be of any help..... (2.4.0. kernel. no
> vmware or opensound)

You can help yourself by applying Manfred's patch to 8390.c (in preference to
my own patch to the same file). This will solve the hanging-network problem. If
your entire box hangs, that's another story which will probably not be fixed by
that patch. You can find the patch in Manfred's posting to the list from Fri
Jan 12 2001 - 14:04:24 EST.

I've been running a patched driver for more than a day now, under heavy network
load, without problems.

Frank

-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 20:03                 ` Ingo Molnar
@ 2001-01-14  0:13                   ` Roeland Th. Jansen
  2001-01-14  0:23                     ` Frank de Lange
  0 siblings, 1 reply; 36+ messages in thread
From: Roeland Th. Jansen @ 2001-01-14  0:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Frank de Lange, Manfred Spraul, dwmw2,
	linux-kernel, Alan Cox

On Fri, Jan 12, 2001 at 09:03:49PM +0100, Ingo Molnar wrote:
> well, some time ago i had an ne2k card in an SMP system as well, and found
> this very problem. Disabling/enabling focus-cpu appeared to make a
> difference, but later on i made experiments that show that in both cases
> the hang happens. I spent a good deal of time trying to fix this problem,
> but failed - so any fresh ideas are more than welcome.

for the record. my BP6, non OC, apic smp system with ne2k fails within
24 hours here too. if I can be of any help..... (2.4.0. kernel. no
vmware or opensound)

-- 
Grobbebol's Home                   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel       | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |        on Usenet.             _\_v  

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:59               ` Linus Torvalds
  2001-01-12 20:03                 ` Ingo Molnar
  2001-01-12 20:05                 ` Frank de Lange
@ 2001-01-12 21:21                 ` Frank de Lange
  2 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 21:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Manfred Spraul, dwmw2, linux-kernel, mingo, Alan Cox

> Remind me: what polarity are your io-apic irq's? Level, edge, sideways?
> Anything else that might be relevant?

Well, sideways of course! :-)

here's a cat /proc/interrupts from the (BP6) box:

           CPU0       CPU1       
  0:     104936     105433    IO-APIC-edge  timer
  1:       4444       4384    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:         79         59    IO-APIC-edge  serial
  4:      12743      12850    IO-APIC-edge  serial
 14:       7855       7885    IO-APIC-edge  ide0
 15:       1990       1703    IO-APIC-edge  ide1
 16:          0          0   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         24         28   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  bttv
 19:     460435     460402   IO-APIC-level  eth0, eth1, usb-uhci
NMI:     210303     210303 
LOC:     210285     210284 
ERR:          0

The interrupt which caused problems was 19 (with both network cards and USB on
it). It shows a high number of interrupts because I've been load-testing the
network. The mere fact that it shows this high number of interrupts shows the
fix works...

As this is a BP6, I'm now supposed to go on about the dead chickens, dedicated
air conditioners, nuclear powersupplies and other magic you're supposed to buy
to get these boards running. Well, nothing of that sort, it is running on a
simple (but high quality) 235W PSU with heatgreased coolers on the CPUs and the
BX chipset. Nothing is overclocked. CPU and chipset temperatures are 24°C and
32°C, respectively.

In short, nothing remarkable. All PCI slots are used, as you can see from my
first posting in this thread (which contains more info on the hardware).

//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:59               ` Linus Torvalds
  2001-01-12 20:03                 ` Ingo Molnar
@ 2001-01-12 20:05                 ` Frank de Lange
  2001-01-12 21:21                 ` Frank de Lange
  2 siblings, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 20:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Manfred Spraul, dwmw2, linux-kernel, mingo, Alan Cox

On Fri, Jan 12, 2001 at 11:59:25AM -0800, Linus Torvalds wrote:
> > Could this really be the solution?
> 
> I'd like to know _which_ of the two makes a difference (or does it only
> trigger with both of them enabled)? And even then I'm not sure that it is
> "the" solution - both changes to io-apic handling had some reason for
> them. Ingo, what was the focus-cpu thing?

Well, with 'this' (in 'could THIS be') I really meant the move from disable_irq
to the irq_safe spinlocks. I'm currently running with the patched 8390.c
driver, patched io_apic (TARGET_CPUS 0xff) and patched apic.c (focus cpu
enabled), and have had no problems yet... even though I'm running several
simultaneous nfs cp -rd <big_dir>, streaming network audio, scanning with a
USB scanner, etc.

So far, it seems that the patch to 8390.c removed the symptoms. The changes to
apic.c and io_apic.c did not make the network hang come back. 

Cheers//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:59               ` Linus Torvalds
@ 2001-01-12 20:03                 ` Ingo Molnar
  2001-01-14  0:13                   ` Roeland Th. Jansen
  2001-01-12 20:05                 ` Frank de Lange
  2001-01-12 21:21                 ` Frank de Lange
  2 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2001-01-12 20:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Frank de Lange, Manfred Spraul, dwmw2, linux-kernel, Alan Cox


On Fri, 12 Jan 2001, Linus Torvalds wrote:

> [...] Ingo, what was the focus-cpu thing?

well, some time ago i had an ne2k card in an SMP system as well, and found
this very problem. Disabling/enabling focus-cpu appeared to make a
difference, but later on i made experiments that show that in both cases
the hang happens. I spent a good deal of time trying to fix this problem,
but failed - so any fresh ideas are more than welcome.

	Ingo


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:52             ` Frank de Lange
@ 2001-01-12 19:59               ` Linus Torvalds
  2001-01-12 20:03                 ` Ingo Molnar
                                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Linus Torvalds @ 2001-01-12 19:59 UTC (permalink / raw)
  To: Frank de Lange; +Cc: Manfred Spraul, dwmw2, linux-kernel, mingo, Alan Cox



On Fri, 12 Jan 2001, Frank de Lange wrote:

> On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> > Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
> > 
> > * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> > * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
> > 
> > Could you disable both bandaids? I disabled them, no problems so far.
> 
> I disabled both (I guess you meant the 'define TARGET_CPUS cpu_online' in
> io_apic.c?), and reverted my own patch, added your patch... Now running with
> the usual heavy network load, no problems so far... Also made USB produce
> interrupts (shares irq with network), no problems...
> 
> Could this really be the solution?

I'd like to know _which_ of the two makes a difference (or does it only
trigger with both of them enabled)? And even then I'm not sure that it is
"the" solution - both changes to io-apic handling had some reason for
them. Ingo, what was the focus-cpu thing?

		Linus


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:33           ` Manfred Spraul
@ 2001-01-12 19:52             ` Frank de Lange
  2001-01-12 19:59               ` Linus Torvalds
  0 siblings, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 19:52 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds

On Fri, Jan 12, 2001 at 08:33:15PM +0100, Manfred Spraul wrote:
> Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:
> 
> * From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
> * From myself: TARGET_CPU = cpu_online_mask, was 0xFF.
> 
> Could you disable both bandaids? I disabled them, no problems so far.

I disabled both (I guess you meant the 'define TARGET_CPUS cpu_online' in
io_apic.c?), and reverted my own patch, added your patch... Now running with
the usual heavy network load, no problems so far... Also made USB produce
interrupts (shares irq with network), no problems...

Could this really be the solution?

Cheers//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware  related?
  2001-01-12 19:21         ` Frank de Lange
@ 2001-01-12 19:33           ` Manfred Spraul
  2001-01-12 19:52             ` Frank de Lange
  0 siblings, 1 reply; 36+ messages in thread
From: Manfred Spraul @ 2001-01-12 19:33 UTC (permalink / raw)
  To: Frank de Lange; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds

Frank de Lange wrote:
> 
> On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> > I removed the disable_irq lines from 8390.c, and that fixed the problem:
> > no hang within 2 minutes - the test is still running.
> >
> > Frank, could you double check it?
> 
> I'm currently running my own patched version, which uses
> spin_lock_irq/spin_unlock_irq instead of
> spin_lock_irqsave/spin_unlock_irqrestore like your patch uses. Looking at
> spinlock.h, spin_lock_irq does a local irq disable, which seems to be closer to
> the original intent (disable_irq) than spin_lock_irqsave. Anyone want to
> comment on this?
> 
It's a bit dangerous: _if_ one of the functions is called with disabled
local interrupts, then spin_unlock_irq would enable these interrupts.
That could cause other problems, but I haven't checked if these functions
are actually called with disabled interrupts - e.g. the transmit
function is called with enabled interrupts.
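
The difference can be shown with a user-space toy model. A single bool
stands in for the CPU's interrupt-enable flag and the model_* helpers only
mimic the flag handling of the real primitives; none of this is kernel code
and all names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Mimics spin_lock_irq(): disable unconditionally. */
void model_lock_irq(void)
{
	irqs_enabled = false;
}

/* Mimics spin_unlock_irq(): enable unconditionally. */
void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Mimics spin_lock_irqsave(): remember the old state, then disable. */
bool model_lock_irqsave(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Mimics spin_unlock_irqrestore(): put back what was saved. */
void model_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}

/*
 * A caller that already runs with interrupts off takes the lock.
 * Returns true if interrupts are still off after the unlock.
 */
bool still_disabled_after(bool use_irqsave)
{
	irqs_enabled = false;  /* caller's context: interrupts off */

	if (use_irqsave) {
		bool flags = model_lock_irqsave();

		assert(!irqs_enabled);  /* inside the critical section */
		model_unlock_irqrestore(flags);
	} else {
		model_lock_irq();
		assert(!irqs_enabled);  /* inside the critical section */
		model_unlock_irq();
	}

	return !irqs_enabled;
}
```

still_disabled_after(true) reports that the caller's state survived, while
still_disabled_after(false) shows interrupts switched back on behind the
caller's back.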

Frank, the 2.4.0 contains 2 band aids that were added for ne2k smp:

* From Ingo: focus cpu disabled, in arch/i386/kernel/apic.c
* From myself: TARGET_CPU = cpu_online_mask, was 0xFF.

Could you disable both bandaids? I disabled them, no problems so far.

Now back to the disable_irq_nosync().
--
	Manfred

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:04       ` Manfred Spraul
  2001-01-12 19:07         ` Frank de Lange
@ 2001-01-12 19:21         ` Frank de Lange
  2001-01-12 19:33           ` Manfred Spraul
  1 sibling, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 19:21 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds

On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
> 
> Frank, could you double check it?

I'm currently running my own patched version, which uses
spin_lock_irq/spin_unlock_irq instead of
spin_lock_irqsave/spin_unlock_irqrestore like your patch uses. Looking at
spinlock.h, spin_lock_irq does a local irq disable, which seems to be closer to
the original intent (disable_irq) than spin_lock_irqsave. Anyone want to
comment on this?

Anyway, still running under load, also got USB (which uses the same irq) to
produce some interrupts by scanning some stuff. No problems so far...

Cheers//Frank

-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 19:04       ` Manfred Spraul
@ 2001-01-12 19:07         ` Frank de Lange
  2001-01-12 19:21         ` Frank de Lange
  1 sibling, 0 replies; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 19:07 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds

On Fri, Jan 12, 2001 at 08:04:24PM +0100, Manfred Spraul wrote:
> Linus wrote:
> > Does this seem to happen mainly with drivers that use "disable_irq()" 
> > and "enable_irq()"? I know the ne drivers do (through the 8390 module), 
> > and some others do too (3c59x). 
> 
> I removed the disable_irq lines from 8390.c, and that fixed the problem:
> no hang within 2 minutes - the test is still running.
> 
> Frank, could you double check it?

Hm, I also sent in a (somewhat different) patch of my own... :-)

Anyway, still running under heavy load...

Cheers//Frank
-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware  related?
  2001-01-12 18:25     ` Frank de Lange
@ 2001-01-12 19:04       ` Manfred Spraul
  2001-01-12 19:07         ` Frank de Lange
  2001-01-12 19:21         ` Frank de Lange
  0 siblings, 2 replies; 36+ messages in thread
From: Manfred Spraul @ 2001-01-12 19:04 UTC (permalink / raw)
  To: Frank de Lange; +Cc: dwmw2, linux-kernel, mingo, Alan Cox, torvalds

[-- Attachment #1: Type: text/plain, Size: 371 bytes --]

Linus wrote:
> Does this seem to happen mainly with drivers that use "disable_irq()" 
> and "enable_irq()"? I know the ne drivers do (through the 8390 module), 
> and some others do too (3c59x). 

I removed the disable_irq lines from 8390.c, and that fixed the problem:
no hang within 2 minutes - the test is still running.

Frank, could you double check it?

--
	Manfred

[-- Attachment #2: patch-frank --]
[-- Type: text/plain, Size: 1620 bytes --]

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 4
//  SUBLEVEL = 0
//  EXTRAVERSION =
--- 2.4/drivers/net/8390.c	Thu Jan  4 22:00:55 2001
+++ build-2.4/drivers/net/8390.c	Fri Jan 12 19:53:47 2001
@@ -242,15 +242,15 @@
 
 	/* Ugly but a reset can be slow, yet must be protected */
 		
-	disable_irq_nosync(dev->irq);
-	spin_lock(&ei_local->page_lock);
+/*	disable_irq_nosync(dev->irq);*/
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 		
 	/* Try to restart the card.  Perhaps the user has fixed something. */
 	ei_reset_8390(dev);
 	NS8390_init(dev, 1);
 		
-	spin_unlock(&ei_local->page_lock);
-	enable_irq(dev->irq);
+	spin_unlock_irqrestore(&ei_local->page_lock, flags);
+/*	enable_irq(dev->irq); */
 	netif_wake_queue(dev);
 }
     
@@ -285,9 +285,9 @@
 	 *	Slow phase with lock held.
 	 */
 	 
-	disable_irq_nosync(dev->irq);
+/*	disable_irq_nosync(dev->irq);*/
 	
-	spin_lock(&ei_local->page_lock);
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 	
 	ei_local->irqlock = 1;
 
@@ -327,8 +327,8 @@
 		ei_local->irqlock = 0;
 		netif_stop_queue(dev);
 		outb_p(ENISR_ALL, e8390_base + EN0_IMR);
-		spin_unlock(&ei_local->page_lock);
-		enable_irq(dev->irq);
+		spin_unlock_irqrestore(&ei_local->page_lock, flags);
+/*		enable_irq(dev->irq);*/
 		ei_local->stat.tx_errors++;
 		return 1;
 	}
@@ -383,8 +383,8 @@
 	ei_local->irqlock = 0;
 	outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 	
-	spin_unlock(&ei_local->page_lock);
-	enable_irq(dev->irq);
+	spin_unlock_irqrestore(&ei_local->page_lock, flags);
+/*	enable_irq(dev->irq); */
 
 	dev_kfree_skb (skb);
 	ei_local->stat.tx_bytes += send_length;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 17:51   ` Manfred Spraul
@ 2001-01-12 18:25     ` Frank de Lange
  2001-01-12 19:04       ` Manfred Spraul
  0 siblings, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 18:25 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: dwmw2, linux-kernel

On Fri, Jan 12, 2001 at 06:51:36PM +0100, Manfred Spraul wrote:
> Frank, I've attached a proposed kick_IOAPIC pin. Could you try it?
> I'm rebooting with that patch right now.

I added the patch, and tried it out. When the network hangs, I am able to
revive it with ALT-SYSRQ-Q. The debug log shows these entries:

Jan 12 19:22:57 behemoth kernel: SysRq: <0> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
Jan 12 19:22:57 behemoth kernel: Before:
Jan 12 19:22:57 behemoth kernel:  00 003 03  0    1    1   1   1    1    1    99
Jan 12 19:22:57 behemoth kernel: After switching to edge:
Jan 12 19:22:57 behemoth kernel:  00 003 03  0    0    1   1   1    1    1    99
Jan 12 19:22:57 behemoth kernel: After switch back:
Jan 12 19:22:57 behemoth kernel:  00 003 03  0    1    1   1   1    1    1    99
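
A caveat when reading these three lines: in the patch as posted, print_line()
is handed the local copy of the entry, which is read from the IO APIC only
once at the start of kick_IOAPIC_pin(). The Trig column changes because the
code changes it in that copy, and the IRR column cannot change at all. To see
what the hardware does, the entry would have to be read back after each write.
A userspace sketch of that idea follows. The fake register array and the
model_* helpers are hypothetical stand-ins for io_apic_read()/io_apic_write(),
and the rule that a switch to edge mode clears the IRR bit is only the
hypothesis being tested in this thread, not documented behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define RTE_IRR     (1u << 14)	/* remote IRR, read only for software */
#define RTE_TRIGGER (1u << 15)	/* 0 = edge, 1 = level */
#define NR_PINS     24

/* Low words of the redirection entries of a pretend IO APIC. */
static uint32_t fake_low[NR_PINS];

/* "Hardware side" helper: set a pin's state directly, IRR included. */
void model_set_raw(int pin, uint32_t low)
{
	assert(pin >= 0 && pin < NR_PINS);
	fake_low[pin] = low;
}

uint32_t model_read_low(int pin)
{
	assert(pin >= 0 && pin < NR_PINS);
	return fake_low[pin];
}

void model_write_low(int pin, uint32_t val)
{
	uint32_t irr;

	assert(pin >= 0 && pin < NR_PINS);
	irr = fake_low[pin] & RTE_IRR;	/* software cannot change IRR ... */
	if (!(val & RTE_TRIGGER))
		irr = 0;		/* ASSUMPTION: edge mode drops it */
	fake_low[pin] = (val & ~RTE_IRR) | irr;
}

/* Same edge/level toggle as the patch, but re-reading after each write. */
void model_kick_pin(int pin, uint32_t seen[3])
{
	seen[0] = model_read_low(pin);			/* before */
	model_write_low(pin, seen[0] & ~RTE_TRIGGER);
	seen[1] = model_read_low(pin);			/* after edge */
	model_write_low(pin, seen[1] | RTE_TRIGGER);
	seen[2] = model_read_low(pin);			/* after level */
}
```

In the real function this would amount to repeating the two io_apic_read()
calls before each print_line().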

-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware  related?
  2001-01-12 17:33 ` Frank de Lange
@ 2001-01-12 17:51   ` Manfred Spraul
  2001-01-12 18:25     ` Frank de Lange
  0 siblings, 1 reply; 36+ messages in thread
From: Manfred Spraul @ 2001-01-12 17:51 UTC (permalink / raw)
  To: Frank de Lange; +Cc: dwmw2, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

Frank de Lange wrote:
> 
> On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote:
> > I would first concentrate on the differences between 2.2 and 2.4:
> >
> > Frank, could you try what happens with the NMI oopser disabled?
> 
> Here's the results with nmi_watchdog=0
> 
> 
> After network hang (nmi_watchdog=0)
> ===================================
> 

It still hangs.

Frank, I've attached a proposed kick_IOAPIC pin. Could you try it?
I'm rebooting with that patch right now.
--
	Manfred

[-- Attachment #2: patch-frank --]
[-- Type: text/plain, Size: 1629 bytes --]

1) add to the end of io_apic.c:

static void print_line(struct IO_APIC_route_entry* entry)
{
	printk(KERN_EMERG " %02x %03X %02X  ",
			0,
			entry->dest.logical.logical_dest,
			entry->dest.physical.physical_dest
		);

	printk("%1d    %1d    %1d   %1d   %1d    %1d    %1d    %02X\n",
			entry->mask,
			entry->trigger,
			entry->irr,
			entry->polarity,
			entry->delivery_status,
			entry->dest_mode,
			entry->delivery_mode,
			entry->vector
		);
}

void kick_IOAPIC_pin(int pin)
{
	unsigned long flags;
	struct IO_APIC_route_entry entry;

	local_irq_save(flags);

	*(((int *)&entry) + 1) = io_apic_read(0, 0x11 + 2 * pin);
	*(((int *)&entry) + 0) = io_apic_read(0, 0x10 + 2 * pin);

	printk(KERN_EMERG " NR Log Phy Mask Trig IRR Pol"
			  " Stat Dest Deli Vect:   \n");
	printk(KERN_EMERG "Before:\n");
	print_line(&entry);

	entry.trigger = 0;
	io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry) + 1));
	io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry) + 0));
	udelay(10);
	printk(KERN_EMERG "After switching to edge:\n");
	print_line(&entry);

	entry.trigger = 1;
	io_apic_write(0, 0x11 + 2 * pin, *(((int *)&entry) + 1));
	io_apic_write(0, 0x10 + 2 * pin, *(((int *)&entry) + 0));
	udelay(10);
	printk(KERN_EMERG "After switch back:\n");
	print_line(&entry);

	local_irq_restore(flags);
}

2) add to sysrq.c:
--- 2.4/drivers/char/sysrq.c	Mon Dec  4 02:48:19 2000
+++ build-2.4/drivers/char/sysrq.c	Fri Jan 12 18:37:57 2001
@@ -137,6 +137,9 @@
 		send_sig_all(SIGKILL, 1);
 		orig_log_level = 8;
 		break;
+	case 'q':
+		kick_IOAPIC_pin(19);
+		break;
 	default:					    /* Unknown: help */
 		if (kbd)
 			printk("unRaw ");

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 17:16 Manfred Spraul
@ 2001-01-12 17:33 ` Frank de Lange
  2001-01-12 17:51   ` Manfred Spraul
  0 siblings, 1 reply; 36+ messages in thread
From: Frank de Lange @ 2001-01-12 17:33 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: dwmw2, linux-kernel

On Fri, Jan 12, 2001 at 06:16:36PM +0100, Manfred Spraul wrote:
> I would first concentrate on the differences between 2.2 and 2.4:
> 
> Frank, could you try what happens with the NMI oopser disabled?

Here's the results with nmi_watchdog=0

Before network hang (nmi_watchdog=0)
====================================
Jan 12 18:24:43 behemoth kernel: SysRq:
Jan 12 18:24:43 behemoth kernel: print_PIC()
Jan 12 18:24:43 behemoth kernel:
Jan 12 18:24:43 behemoth kernel: printing PIC contents
Jan 12 18:24:43 behemoth kernel: ... PIC  IMR: fffa
Jan 12 18:24:43 behemoth kernel: ... PIC  IRR: 0000
Jan 12 18:24:43 behemoth kernel: ... PIC  ISR: 0000
Jan 12 18:24:43 behemoth kernel: ... PIC ELCR: 1e00
Jan 12 18:24:43 behemoth kernel: print_IO_APIC()
Jan 12 18:24:43 behemoth kernel: number of MP IRQ sources: 23.
Jan 12 18:24:43 behemoth kernel: number of IO-APIC #2 registers: 24.
Jan 12 18:24:43 behemoth kernel: testing the IO APIC.......................
Jan 12 18:24:43 behemoth kernel:
Jan 12 18:24:43 behemoth kernel: IO APIC #2......
Jan 12 18:24:43 behemoth kernel: .... register #00: 02000000
Jan 12 18:24:43 behemoth kernel: .......    : physical APIC id: 02
Jan 12 18:24:43 behemoth kernel: .... register #01: 00170011
Jan 12 18:24:43 behemoth kernel: .......     : max redirection entries: 0017
Jan 12 18:24:43 behemoth kernel: .......     : IO APIC version: 0011
Jan 12 18:24:43 behemoth kernel: .... register #02: 00000000
Jan 12 18:24:43 behemoth kernel: .......     : arbitration: 00
Jan 12 18:24:43 behemoth kernel: .... IRQ redirection table:
Jan 12 18:24:43 behemoth kernel:  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
Jan 12 18:24:43 behemoth kernel:  00 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  01 003 03  0    0    0   0   0    1    1    39
Jan 12 18:24:43 behemoth kernel:  02 003 03  0    0    0   0   0    1    1    31
Jan 12 18:24:43 behemoth kernel:  03 003 03  0    0    0   0   0    1    1    41
Jan 12 18:24:43 behemoth kernel:  04 003 03  0    0    0   0   0    1    1    49
Jan 12 18:24:43 behemoth kernel:  05 003 03  0    0    0   0   0    1    1    51
Jan 12 18:24:43 behemoth kernel:  06 003 03  0    0    0   0   0    1    1    59
Jan 12 18:24:43 behemoth kernel:  07 003 03  0    0    0   0   0    1    1    61
Jan 12 18:24:43 behemoth kernel:  08 003 03  0    0    0   0   0    1    1    69
Jan 12 18:24:43 behemoth kernel:  09 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  0a 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  0b 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  0c 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  0d 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  0e 003 03  0    0    0   0   0    1    1    71
Jan 12 18:24:43 behemoth kernel:  0f 003 03  0    0    0   0   0    1    1    79
Jan 12 18:24:43 behemoth kernel:  10 003 03  0    1    0   1   0    1    1    81
Jan 12 18:24:43 behemoth kernel:  11 003 03  0    1    0   1   0    1    1    89
Jan 12 18:24:43 behemoth kernel:  12 003 03  0    1    0   1   0    1    1    91
Jan 12 18:24:43 behemoth kernel:  13 003 03  0    1    0   1   0    1    1    99
Jan 12 18:24:43 behemoth kernel:  14 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  15 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  16 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel:  17 000 00  1    0    0   0   0    0    0    00
Jan 12 18:24:43 behemoth kernel: IRQ to pin mappings:
Jan 12 18:24:43 behemoth kernel: IRQ0 -> 2
Jan 12 18:24:43 behemoth kernel: IRQ1 -> 1
Jan 12 18:24:43 behemoth kernel: IRQ3 -> 3
Jan 12 18:24:43 behemoth kernel: IRQ4 -> 4
Jan 12 18:24:43 behemoth kernel: IRQ5 -> 5
Jan 12 18:24:43 behemoth kernel: IRQ6 -> 6
Jan 12 18:24:43 behemoth kernel: IRQ7 -> 7
Jan 12 18:24:43 behemoth kernel: IRQ8 -> 8
Jan 12 18:24:43 behemoth kernel: IRQ13 -> 13
Jan 12 18:24:43 behemoth kernel: IRQ14 -> 14
Jan 12 18:24:43 behemoth kernel: IRQ15 -> 15
Jan 12 18:24:43 behemoth kernel: IRQ16 -> 16
Jan 12 18:24:43 behemoth kernel: IRQ17 -> 17
Jan 12 18:24:43 behemoth kernel: IRQ18 -> 18
Jan 12 18:24:43 behemoth kernel: IRQ19 -> 19
Jan 12 18:24:43 behemoth kernel: .................................... done.
Jan 12 18:24:43 behemoth kernel: print_all_local_APICs()
Jan 12 18:24:43 behemoth kernel:
Jan 12 18:24:43 behemoth kernel: printing local APIC contents on CPU#1/1:
Jan 12 18:24:43 behemoth kernel: ... APIC ID:      01000000 (1)
Jan 12 18:24:43 behemoth kernel: ... APIC VERSION: 00040011
Jan 12 18:24:43 behemoth kernel: ... APIC TASKPRI: 00000000 (00)
Jan 12 18:24:43 behemoth kernel: ... APIC ARBPRI: 00000000 (00)
Jan 12 18:24:43 behemoth kernel: ... APIC PROCPRI: 00000000
Jan 12 18:24:43 behemoth kernel: ... APIC EOI: 00000000
Jan 12 18:24:43 behemoth kernel: ... APIC LDR: 02000000
Jan 12 18:24:43 behemoth kernel: ... APIC DFR: ffffffff
Jan 12 18:24:43 behemoth kernel: ... APIC SPIV: 000003ff
Jan 12 18:24:43 behemoth kernel: ... APIC ISR field:
Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 7 times
Jan 12 18:24:43 behemoth kernel: ... APIC TMR field:
Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 3 times
Jan 12 18:24:43 behemoth kernel: 00000000010000000000000001000000
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 2 times
Jan 12 18:24:43 behemoth kernel: ... APIC IRR field:
Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 7 times
Jan 12 18:24:43 behemoth kernel: ... APIC ESR: 00000000
Jan 12 18:24:43 behemoth kernel: ... APIC ICR: 000008fc
Jan 12 18:24:43 behemoth kernel: ... APIC ICR2: 01000000
Jan 12 18:24:43 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 18:24:43 behemoth kernel: ... APIC LVTPC: 00010000
Jan 12 18:24:43 behemoth kernel: ... APIC LVT0: 00010700
Jan 12 18:24:43 behemoth kernel: ... APIC LVT1: 00010400
Jan 12 18:24:43 behemoth kernel: ... APIC LVTERR: 000000fe
Jan 12 18:24:43 behemoth kernel: ... APIC TMICT: 0000a322
Jan 12 18:24:43 behemoth kernel: ... APIC TMCCT: 00000be6
Jan 12 18:24:43 behemoth kernel: ... APIC TDCR: 00000003
Jan 12 18:24:43 behemoth kernel:
Jan 12 18:24:43 behemoth kernel:
Jan 12 18:24:43 behemoth kernel: printing local APIC contents on CPU#0/0:
Jan 12 18:24:43 behemoth kernel: ... APIC ID:      00000000 (0)
Jan 12 18:24:43 behemoth kernel: ... APIC VERSION: 00040011
Jan 12 18:24:43 behemoth kernel: ... APIC TASKPRI: 00000000 (00)
Jan 12 18:24:43 behemoth kernel: ... APIC ARBPRI: 000000e0 (e0)
Jan 12 18:24:43 behemoth kernel: ... APIC PROCPRI: 00000000
Jan 12 18:24:43 behemoth kernel: ... APIC EOI: 00000000
Jan 12 18:24:43 behemoth kernel: ... APIC LDR: 01000000
Jan 12 18:24:43 behemoth kernel: ... APIC DFR: ffffffff
Jan 12 18:24:43 behemoth kernel: ... APIC SPIV: 000003ff
Jan 12 18:24:43 behemoth kernel: ... APIC ISR field:
Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 7 times
Jan 12 18:24:43 behemoth kernel: ... APIC TMR field:
Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 3 times
Jan 12 18:24:43 behemoth kernel: 00000000010000000000000001000000
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 2 times
Jan 12 18:24:43 behemoth kernel: ... APIC IRR field:
Jan 12 18:24:43 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth kernel: 00000000000000000100000000000000
Jan 12 18:24:43 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:24:43 behemoth last message repeated 4 times
Jan 12 18:24:43 behemoth kernel: 00000000000000010000000000000000
Jan 12 18:24:43 behemoth kernel: ... APIC ESR: 00000000
Jan 12 18:24:43 behemoth kernel: ... APIC ICR: 000c08fb
Jan 12 18:24:43 behemoth kernel: ... APIC ICR2: 02000000
Jan 12 18:24:43 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 18:24:43 behemoth kernel: ... APIC LVTPC: 00010000
Jan 12 18:24:43 behemoth kernel: ... APIC LVT0: 00010700
Jan 12 18:24:43 behemoth kernel: ... APIC LVT1: 00000400
Jan 12 18:24:43 behemoth kernel: ... APIC LVTERR: 000000fe
Jan 12 18:24:43 behemoth kernel: ... APIC TMICT: 0000a322
Jan 12 18:24:43 behemoth kernel: ... APIC TMCCT: 000041e1
Jan 12 18:24:43 behemoth kernel: ... APIC TDCR: 00000003


After network hang (nmi_watchdog=0)
===================================

Jan 12 18:26:21 behemoth kernel: SysRq:
Jan 12 18:26:21 behemoth kernel: print_PIC()
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel: printing PIC contents
Jan 12 18:26:21 behemoth kernel: ... PIC  IMR: fffa
Jan 12 18:26:21 behemoth kernel: ... PIC  IRR: 0000
Jan 12 18:26:21 behemoth kernel: ... PIC  ISR: 0000
Jan 12 18:26:21 behemoth kernel: ... PIC ELCR: 1e00
Jan 12 18:26:21 behemoth kernel: print_IO_APIC()
Jan 12 18:26:21 behemoth kernel: number of MP IRQ sources: 23.
Jan 12 18:26:21 behemoth kernel: number of IO-APIC #2 registers: 24.
Jan 12 18:26:21 behemoth kernel: testing the IO APIC.......................
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel: IO APIC #2......
Jan 12 18:26:21 behemoth kernel: .... register #00: 02000000
Jan 12 18:26:21 behemoth kernel: .......    : physical APIC id: 02
Jan 12 18:26:21 behemoth kernel: .... register #01: 00170011
Jan 12 18:26:21 behemoth kernel: .......     : max redirection entries: 0017
Jan 12 18:26:21 behemoth kernel: .......     : IO APIC version: 0011
Jan 12 18:26:21 behemoth kernel: .... register #02: 00000000
Jan 12 18:26:21 behemoth kernel: .......     : arbitration: 00
Jan 12 18:26:21 behemoth kernel: .... IRQ redirection table:
Jan 12 18:26:21 behemoth kernel:  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
Jan 12 18:26:21 behemoth kernel:  00 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  01 003 03  0    0    0   0   0    1    1    39
Jan 12 18:26:21 behemoth kernel:  02 003 03  0    0    0   0   0    1    1    31
Jan 12 18:26:21 behemoth kernel:  03 003 03  0    0    0   0   0    1    1    41
Jan 12 18:26:21 behemoth kernel:  04 003 03  0    0    0   0   0    1    1    49
Jan 12 18:26:21 behemoth kernel:  05 003 03  0    0    0   0   0    1    1    51
Jan 12 18:26:21 behemoth kernel:  06 003 03  0    0    0   0   0    1    1    59
Jan 12 18:26:21 behemoth kernel:  07 003 03  0    0    0   0   0    1    1    61
Jan 12 18:26:21 behemoth kernel:  08 003 03  0    0    0   0   0    1    1    69
Jan 12 18:26:21 behemoth kernel:  09 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  0a 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  0b 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  0c 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  0d 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  0e 003 03  0    0    0   0   0    1    1    71
Jan 12 18:26:21 behemoth kernel:  0f 003 03  0    0    0   0   0    1    1    79
Jan 12 18:26:21 behemoth kernel:  10 003 03  0    1    0   1   0    1    1    81
Jan 12 18:26:21 behemoth kernel:  11 003 03  0    1    0   1   0    1    1    89
Jan 12 18:26:21 behemoth kernel:  12 003 03  0    1    0   1   0    1    1    91
Jan 12 18:26:21 behemoth kernel:  13 003 03  0    1    1   1   0    1    1    99
Jan 12 18:26:21 behemoth kernel:  14 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  15 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  16 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel:  17 000 00  1    0    0   0   0    0    0    00
Jan 12 18:26:21 behemoth kernel: IRQ to pin mappings:
Jan 12 18:26:21 behemoth kernel: IRQ0 -> 2
Jan 12 18:26:21 behemoth kernel: IRQ1 -> 1
Jan 12 18:26:21 behemoth kernel: IRQ3 -> 3
Jan 12 18:26:21 behemoth kernel: IRQ4 -> 4
Jan 12 18:26:21 behemoth kernel: IRQ5 -> 5
Jan 12 18:26:21 behemoth kernel: IRQ6 -> 6
Jan 12 18:26:21 behemoth kernel: IRQ7 -> 7
Jan 12 18:26:21 behemoth kernel: IRQ8 -> 8
Jan 12 18:26:21 behemoth kernel: IRQ13 -> 13
Jan 12 18:26:21 behemoth kernel: IRQ14 -> 14
Jan 12 18:26:21 behemoth kernel: IRQ15 -> 15
Jan 12 18:26:21 behemoth kernel: IRQ16 -> 16
Jan 12 18:26:21 behemoth kernel: IRQ17 -> 17
Jan 12 18:26:21 behemoth kernel: IRQ18 -> 18
Jan 12 18:26:21 behemoth kernel: IRQ19 -> 19
Jan 12 18:26:21 behemoth kernel: .................................... done.
Jan 12 18:26:21 behemoth kernel: print_all_local_APICs()
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel: printing local APIC contents on CPU#1/1:
Jan 12 18:26:21 behemoth kernel: ... APIC ID:      01000000 (1)
Jan 12 18:26:21 behemoth kernel: ... APIC VERSION: 00040011
Jan 12 18:26:21 behemoth kernel: ... APIC TASKPRI: 00000000 (00)
Jan 12 18:26:21 behemoth kernel: ... APIC ARBPRI: 00000000 (00)
Jan 12 18:26:21 behemoth kernel: ... APIC PROCPRI: 00000000
Jan 12 18:26:21 behemoth kernel: ... APIC EOI: 00000000
Jan 12 18:26:21 behemoth kernel: ... APIC LDR: 02000000
Jan 12 18:26:21 behemoth kernel: ... APIC DFR: ffffffff
Jan 12 18:26:21 behemoth kernel: ... APIC SPIV: 000003ff
Jan 12 18:26:21 behemoth kernel: ... APIC ISR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 7 times
Jan 12 18:26:21 behemoth kernel: ... APIC TMR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 3 times
Jan 12 18:26:21 behemoth kernel: 00000000010000000000000000000000
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 2 times
Jan 12 18:26:21 behemoth kernel: ... APIC IRR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 7 times
Jan 12 18:26:21 behemoth kernel: ... APIC ESR: 00000000
Jan 12 18:26:21 behemoth kernel: ... APIC ICR: 000008fc
Jan 12 18:26:21 behemoth kernel: ... APIC ICR2: 01000000
Jan 12 18:26:21 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 18:26:21 behemoth kernel: ... APIC LVTPC: 00010000
Jan 12 18:26:21 behemoth kernel: ... APIC LVT0: 00010700
Jan 12 18:26:21 behemoth kernel: ... APIC LVT1: 00010400
Jan 12 18:26:21 behemoth kernel: ... APIC LVTERR: 000000fe
Jan 12 18:26:21 behemoth kernel: ... APIC TMICT: 0000a322
Jan 12 18:26:21 behemoth kernel: ... APIC TMCCT: 00001803
Jan 12 18:26:21 behemoth kernel: ... APIC TDCR: 00000003
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel:
Jan 12 18:26:21 behemoth kernel: printing local APIC contents on CPU#0/0:
Jan 12 18:26:21 behemoth kernel: ... APIC ID:      00000000 (0)
Jan 12 18:26:21 behemoth kernel: ... APIC VERSION: 00040011
Jan 12 18:26:21 behemoth kernel: ... APIC TASKPRI: 00000000 (00)
Jan 12 18:26:21 behemoth kernel: ... APIC ARBPRI: 000000e0 (e0)
Jan 12 18:26:21 behemoth kernel: ... APIC PROCPRI: 00000000
Jan 12 18:26:21 behemoth kernel: ... APIC EOI: 00000000
Jan 12 18:26:21 behemoth kernel: ... APIC LDR: 01000000
Jan 12 18:26:21 behemoth kernel: ... APIC DFR: ffffffff
Jan 12 18:26:21 behemoth kernel: ... APIC SPIV: 000003ff
Jan 12 18:26:21 behemoth kernel: ... APIC ISR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 7 times
Jan 12 18:26:21 behemoth kernel: ... APIC TMR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 3 times
Jan 12 18:26:21 behemoth kernel: 00000000010000000000000001000000
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 2 times
Jan 12 18:26:21 behemoth kernel: ... APIC IRR field:
Jan 12 18:26:21 behemoth kernel: 0123456789abcdef0123456789abcdef
Jan 12 18:26:21 behemoth kernel: 00000000000000000000000000000000
Jan 12 18:26:21 behemoth last message repeated 6 times
Jan 12 18:26:21 behemoth kernel: 00000000000000010000000000000000
Jan 12 18:26:21 behemoth kernel: ... APIC ESR: 00000000
Jan 12 18:26:21 behemoth kernel: ... APIC ICR: 000c08fb
Jan 12 18:26:21 behemoth kernel: ... APIC ICR2: 02000000
Jan 12 18:26:21 behemoth kernel: ... APIC LVTT: 000200ef
Jan 12 18:26:21 behemoth kernel: ... APIC LVTPC: 00010000
Jan 12 18:26:21 behemoth kernel: ... APIC LVT0: 00010700
Jan 12 18:26:21 behemoth kernel: ... APIC LVT1: 00000400
Jan 12 18:26:21 behemoth kernel: ... APIC LVTERR: 000000fe
Jan 12 18:26:21 behemoth kernel: ... APIC TMICT: 0000a322
Jan 12 18:26:21 behemoth kernel: ... APIC TMCCT: 00004e26
Jan 12 18:26:21 behemoth kernel: ... APIC TDCR: 00000003

-- 
  WWWWW      _______________________
 ## o o\    /     Frank de Lange     \
 }#   \|   /                          \
  ##---# _/     <Hacker for Hire>      \
   ####   \      +31-320-252965        /
           \    frank@unternet.org    /
            -------------------------
 [ "Omnis enim res, quae dando non deficit, dum habetur
    et non datur, nondum habetur, quomodo habenda est."  ]

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware  related?
@ 2001-01-12 17:16 Manfred Spraul
  2001-01-12 17:33 ` Frank de Lange
  0 siblings, 1 reply; 36+ messages in thread
From: Manfred Spraul @ 2001-01-12 17:16 UTC (permalink / raw)
  To: dwmw2, linux-kernel, frank

> 
> manfred@colorfullife.com said: 
> > IRR for interrupt 19 is set, that means the IO APIC has sent the 
> > interrupt to a cpu but not yet received the corresponding EOI. 
> 
> OK, but couldn't we reset it by sending an extra EOI when the drivers 
> decide that they've missed interrupts? 

How?
You send an EOI by writing 0 to the EOI register of the local apic, and
then the local apic automagically checks its ISR bitfield.
It takes the highest set bit and clears it. Then it checks that bit in
the TMR, and if it's also set in the TMR then it sends an EOI to the IO
apic.
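
The sequence above can be written down as a small model. This is a userspace
sketch in C, not kernel code; struct lapic_model and the model_* functions are
hypothetical names. It shows why an extra EOI does not help in the hung state:
with the ISR empty, the write finds no bit to clear and nothing is forwarded
to the IO APIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the two 256-bit local APIC bitfields involved in an EOI. */
struct lapic_model {
	uint32_t isr[8];	/* in-service register */
	uint32_t tmr[8];	/* trigger mode register */
};

void model_set_bit(uint32_t *map, int vector)
{
	assert(vector >= 0 && vector < 256);
	map[vector / 32] |= UINT32_C(1) << (vector % 32);
}

bool model_test_bit(const uint32_t *map, int vector)
{
	assert(vector >= 0 && vector < 256);
	return (map[vector / 32] >> (vector % 32)) & 1u;
}

/*
 * Model of a write to the EOI register.  Returns the vector for which
 * an EOI is forwarded to the IO APIC, or -1 if nothing is forwarded.
 */
int model_eoi(struct lapic_model *apic)
{
	for (int vector = 255; vector >= 0; vector--) {
		if (!model_test_bit(apic->isr, vector))
			continue;
		/* highest in-service bit found: clear it */
		apic->isr[vector / 32] &= ~(UINT32_C(1) << (vector % 32));
		/* level triggered (TMR bit set): tell the IO APIC */
		return model_test_bit(apic->tmr, vector) ? vector : -1;
	}
	return -1;	/* ISR empty: the write has no effect */
}
```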

The magic seems to be tamper proof: all bits are read only.

The bit on the IO apic is also read only.
Perhaps with brute force? Switch the interrupt to edge triggered on the
io apic, wait 1 usec, switch it back to level triggered. The IRR bit is
undefined for edge triggered interrupts, perhaps that clears the IRR
bit.

I would first concentrate on the differences between 2.2 and 2.4:

Frank, could you try what happens with the NMI oopser disabled?

The second major difference I'm immediately aware of is the number of
the reschedule/tlb flush/etc interrupt: 2.2 uses the lowest priority,
2.4 the highest priority.
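
For background: the local APIC derives an interrupt's priority class from the
upper four bits of its vector, so a higher vector number means a higher
priority. A minimal sketch (the function name is made up for illustration):

```c
#include <assert.h>

/* Priority class of a vector: bits 7:4, i.e. vector / 16. */
int vector_priority_class(int vector)
{
	assert(vector >= 0 && vector < 256);
	return vector >> 4;
}
```

With the vectors seen in this thread, 0xfc (reschedule) falls in class 15,
0xef (local timer) in class 14 and 0x99 (the interrupt on pin 19) in class 9.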

--
	Manfred

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related?
  2001-01-12 16:53 Manfred Spraul
@ 2001-01-12 17:02 ` David Woodhouse
  0 siblings, 0 replies; 36+ messages in thread
From: David Woodhouse @ 2001-01-12 17:02 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: frank, linux-kernel



manfred@colorfullife.com said:
> IRR for interrupt 19 is set, that means the IO APIC has sent the
> interrupt to a cpu but not yet received the corresponding EOI.

OK, but couldn't we reset it by sending an extra EOI when the drivers 
decide that they've missed interrupts?

--
dwmw2



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware  related?
@ 2001-01-12 16:53 Manfred Spraul
  2001-01-12 17:02 ` David Woodhouse
  0 siblings, 1 reply; 36+ messages in thread
From: Manfred Spraul @ 2001-01-12 16:53 UTC (permalink / raw)
  To: frank, linux-kernel

Let's decode it:

> IO APIC #2...... 
> NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 
> 12 0FF 0F 0 1 0 1 0 1 1 91 
> 13 0FF 0F 0 1 1 1 0 1 1 99 

IRR for interrupt 19 is set, that means the IO APIC has sent the
interrupt to a cpu but not yet received the corresponding EOI.

That bit is read only, so we can't set it to 0 to kick the io apic.
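
The columns of that table are the fields of the low 32 bits of a redirection
table entry (the Log/Phy columns come from the destination field in the high
word). A sketch of the decoding in plain C follows, using the bit positions
from the IO APIC datasheet. struct rte_fields and the function names are made
up for illustration, and 0xE999 is simply the low word that corresponds to the
line for pin 13 (hex) above.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the low word of an IO APIC redirection table entry. */
struct rte_fields {
	unsigned int vector;		/* bits 7:0   "Vect" */
	unsigned int delivery_mode;	/* bits 10:8  "Deli" */
	unsigned int dest_mode;		/* bit 11     "Dest" */
	unsigned int delivery_status;	/* bit 12     "Stat" */
	unsigned int polarity;		/* bit 13     "Pol"  */
	unsigned int irr;		/* bit 14     "IRR"  */
	unsigned int trigger;		/* bit 15     "Trig" */
	unsigned int mask;		/* bit 16     "Mask" */
};

/* Register index of the low word for a pin: 0x10 + 2 * pin. */
unsigned int rte_low_reg(int pin)
{
	assert(pin >= 0 && pin < 24);	/* this IO APIC has 24 entries */
	return 0x10 + 2 * (unsigned int)pin;
}

struct rte_fields decode_rte_low(uint32_t low)
{
	struct rte_fields f;

	f.vector          = low & 0xff;
	f.delivery_mode   = (low >> 8) & 0x7;
	f.dest_mode       = (low >> 11) & 1;
	f.delivery_status = (low >> 12) & 1;
	f.polarity        = (low >> 13) & 1;
	f.irr             = (low >> 14) & 1;
	f.trigger         = (low >> 15) & 1;
	f.mask            = (low >> 16) & 1;
	return f;
}
```

decode_rte_low(0xE999) gives vector 0x99, level trigger and IRR set, which is
the stuck state discussed here.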

The Vector is 99, we must check that bit in the ISR, TMR and IRR of both
cpus.

cpu1:
> ISR: all bits 0
> TMR: only bit 0x99 is set
> IRR: all bits 0

cpu0:
> ISR: all bits 0
> TMR: only bit 0x89 is set
> IRR: bit 0xfc and bit 0xef are set.

ISR is the in-service register, 0 means that the cpu is not processing an
interrupt right now.

TMR is the trigger mode register, 1 means that the local apic should
send an EOI to the io apic when the cpu signals the EOI to the local
apic.


IRR is the list of pending interrupts:
0xef is the local timer interrupt,
0xfc is the reschedule interrupt
(see include/asm-i386/hw_irq.h)

These bits are also read only.
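
The ISR/TMR/IRR bitfields in the dumps in this thread are printed as eight
rows of 32 digits under a 0123456789abcdef0123456789abcdef header. Assuming
that row r, column c stands for vector r * 32 + c (which is consistent with
the vectors named above), a row can be turned back into vector numbers with a
few lines of C. The function name is made up for illustration. Note that
syslog collapses identical rows into "last message repeated N times", so the
row index has to be counted by hand.

```c
#include <assert.h>

/*
 * Collect the vectors whose bit is set in one printed row of a local
 * APIC bitfield dump.  Returns the number of vectors stored in out[].
 */
int row_vectors(int row, const char *bits, int *out, int max_out)
{
	int n = 0;

	assert(row >= 0 && row < 8);	/* 8 rows of 32 bits = 256 vectors */
	for (int col = 0; col < 32 && bits[col] != '\0'; col++) {
		if (bits[col] == '1' && n < max_out)
			out[n++] = row * 32 + col;
	}
	return n;
}
```

For example, row 4 reading 00000000010000000000000001000000 yields 0x89 and
0x99, and row 7 reading 00000000000000010000000000000000 yields 0xef.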

If you're looking for the IO APIC documentation: the document number is
29056601 - just search with google. The local APIC is documented in the
main cpu handbook (PPro or later), in the chapter about multiple
processor management.

--
	Manfred


end of thread, other threads:[~2001-01-16 19:25 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
2001-01-10 21:30 QUESTION: Network hangs with BP6 and 2.4.x kernels, hardware related? Frank de Lange
2001-01-10 22:21 ` Manfred Spraul
2001-01-10 22:29   ` Frank de Lange
2001-01-10 22:40   ` Frank de Lange
2001-01-11 11:48 ` Andrew Morton
2001-01-11 15:22   ` Frank de Lange
2001-01-11 16:55   ` Frank de Lange
2001-01-11 19:18   ` Frank de Lange
     [not found]     ` <3A5E0849.EB428D70@mandrakesoft.com>
2001-01-12  0:28       ` Frank de Lange
2001-01-12 11:40         ` Andrew Morton
2001-01-12 15:06           ` Frank de Lange
2001-01-12 15:36           ` Frank de Lange
2001-01-11 19:38   ` Frank de Lange
2001-01-11 19:49   ` Frank de Lange
2001-01-11 21:09   ` Frank de Lange
2001-01-11 21:47     ` Jeff Garzik
2001-01-11 21:53       ` Frank de Lange
2001-01-12 14:35       ` David Woodhouse
2001-01-12 16:53 Manfred Spraul
2001-01-12 17:02 ` David Woodhouse
2001-01-12 17:16 Manfred Spraul
2001-01-12 17:33 ` Frank de Lange
2001-01-12 17:51   ` Manfred Spraul
2001-01-12 18:25     ` Frank de Lange
2001-01-12 19:04       ` Manfred Spraul
2001-01-12 19:07         ` Frank de Lange
2001-01-12 19:21         ` Frank de Lange
2001-01-12 19:33           ` Manfred Spraul
2001-01-12 19:52             ` Frank de Lange
2001-01-12 19:59               ` Linus Torvalds
2001-01-12 20:03                 ` Ingo Molnar
2001-01-14  0:13                   ` Roeland Th. Jansen
2001-01-14  0:23                     ` Frank de Lange
2001-01-12 20:05                 ` Frank de Lange
2001-01-12 21:21                 ` Frank de Lange
     [not found] <20010112165104.A22465@unternet.org>
2001-01-16 19:23 ` Maciej W. Rozycki
