linux-kernel.vger.kernel.org archive mirror
* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
       [not found] <0001F3D0@gwia.compu-shack.com>
@ 2003-07-22 10:24 ` Michael Troß
  2003-07-22 11:51   ` Udo A. Steinberg
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Troß @ 2003-07-22 10:24 UTC
  To: Udo A. Steinberg; +Cc: Linux Kernel Mailing List

On Mon, 2003-07-21 at 16:17, Udo A. Steinberg wrote:
> On Mon, 21 Jul 2003 16:12:26 +0200 Udo A. Steinberg (UAS) wrote:
> 
> UAS> We have a Dual-Xeon machine with Hyperthreading which keeps locking up hard,
> UAS> so that not even Sysrq works anymore. I have captured such a lockup using the
> UAS> NMI oopser. Below you'll find the lockup fed through ksymoops. Note that
> UAS> after CPU3 locked up, CPU2 did too. But that lockup couldn't be captured
> UAS> anymore. Kernel is a monolithic 2.4.22-pre6. Problem also happened on
> UAS> plain 2.4.21. I can provide more information wrt. hardware, config etc.
> UAS> on request.

Would be really useful if you do so.

> Sorry, I used the wrong System.map. Below is the fixed decode. Looks like
> the lockup is caused by the 3rd party Compushack FDDI driver.

What makes you believe this? There is no matching code sequence like the
one from your dump in the driver, to be exact: in a driver compiled with
gcc 3.3 and kernel 2.4.21.

> Regards,
> -Udo.

Regards,
Michael



* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
  2003-07-22 10:24 ` CPU Lockup with 2.4.21 and 2.4.22-pre Michael Troß
@ 2003-07-22 11:51   ` Udo A. Steinberg
  0 siblings, 0 replies; 8+ messages in thread
From: Udo A. Steinberg @ 2003-07-22 11:51 UTC
  To: Michael Troß; +Cc: Linux Kernel Mailing List

On 22 Jul 2003 12:24:24 +0200 Michael Troß (MT) wrote:

UAS> I can provide more information wrt. hardware, config etc.
UAS> on request.

MT> Would be really useful if you do so.

I have put the following information at: http://www.wh8.tu-dresden.de/fddi/

* My .config for 2.4.22-pre6
* dmesg output of 2.4.22-pre6 (both 2.4.21 and 2.4.22-pre6 behave the same)
* the ksymoops output of the lockup
* the output of lspci -v
* the fddi patch I used (applies cleanly to 2.4.21 and with fuzz to -pre6)

Note that the fddi patch includes a patch you've previously sent me, which
isn't present in the driver on your website.

If you need more information, let me know. Also if you have any tips or
patches that would help in debugging the issue, I'm happy to try them.

MT> What makes you believe this? There is no matching code sequence like the
MT> one from your dump in the driver, to be exact: in a driver compiled with
MT> gcc 3.3 and kernel 2.4.21.

The fact that the backtrace in the decoded oops looks like the lockup
happened in the fddi driver led me to the conclusion that this may be
the culprit. I have compiled the 2.4.22-pre6 kernel with gcc-3.3 also.

Regards,
-Udo.


* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
  2003-07-22 15:26 ` Michael Troß
@ 2003-07-22 15:30   ` Udo A. Steinberg
  0 siblings, 0 replies; 8+ messages in thread
From: Udo A. Steinberg @ 2003-07-22 15:30 UTC
  To: Michael Troß; +Cc: Linux Kernel Mailing List

On 22 Jul 2003 17:26:10 +0200 Michael Troß (MT) wrote:

MT> Seems that a spin lock is already held. Do you get this oops right after
MT> opening the device? Then please try NoSelfTest.

No, the lockup happens during operation. Sometimes the kernel runs only for
about one hour, sometimes for a day, but never longer before the lockups
happen.

I don't think going back to 2.4.18 will make a difference for this case,
or do you think it will?

Regards,
-Udo.
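
Michael's remark that a spin lock seems to be already held matches the shape of the loop at the trapped EIP. The following is a toy Python model, not kernel code. It assumes the 2.4 i386 convention that a lock byte of 1 means "free" and that the fast path is an atomic decrement; neither is shown in this thread, where only the slow path (cmpb / rep nop / jle / jmp) appears in the posted disassembly. The names SpinLock, spin_lock and max_spins are made up for illustration, and max_spins exists only so the model terminates.

```python
class SpinLock:
    """Toy model of a byte spin lock: 1 = free, 0 or negative = held."""

    def __init__(self):
        self.byte = 1

    def try_lock(self):
        # Assumed fast path: atomic decrement, lock taken if result is not negative.
        self.byte -= 1
        return self.byte >= 0

    def unlock(self):
        # Assumed release: store 1 back into the lock byte.
        self.byte = 1


def spin_lock(lock, max_spins):
    """Return the number of spins needed, or None if the lock never came free."""
    spins = 0
    while True:
        if lock.try_lock():
            return spins
        # Slow path as in the posted disassembly: cmpb $0,lock; rep nop; jle.
        while lock.byte <= 0:
            spins += 1
            if spins >= max_spins:
                return None  # a real CPU would keep spinning here


lock = SpinLock()
print(spin_lock(lock, max_spins=1000))  # uncontended acquire
print(spin_lock(lock, max_spins=1000))  # second acquire without unlock
lock.unlock()
print(spin_lock(lock, max_spins=1000))  # acquire again after release
```

In this model the second acquire never succeeds because nothing releases the lock. Whether the real lockup comes from a missing unlock, a recursive acquire, or a holder stuck elsewhere cannot be told from the model.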


* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
       [not found] <0001F49C@gwia.compu-shack.com>
@ 2003-07-22 15:26 ` Michael Troß
  2003-07-22 15:30   ` Udo A. Steinberg
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Troß @ 2003-07-22 15:26 UTC
  To: Udo A. Steinberg; +Cc: Linux Kernel Mailing List

On Tue, 2003-07-22 at 16:24, Udo A. Steinberg wrote:

> MT> As you might know, the Compu-Shack fddi products reached end-of-life
> MT> last year.
> 
> Yes. Just thought I'd let you know that we aren't using the same
> patch as on the website, but one that has been rediffed for 2.4.21 and
> has an additional fix from you in it.

I mentioned it only to let you know that the company no longer
provides new drivers for new kernels. You are probably better off
staying with 2.4.18.

[snip]

Seems that a spin lock is already held. Do you get this oops right after
opening the device? Then please try NoSelfTest.

Regards,
Michael
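
The register dump posted earlier in the thread gives EFLAGS: 00000082 at the trapped EIP, which can be checked against the "spin lock is already held" reading. The sketch below decodes the standard x86 flag bits and evaluates the jle condition (ZF set, or SF different from OF). The helper names decode_eflags and jle_taken are illustrative only.

```python
# Bit positions of the commonly used x86 EFLAGS status and control flags.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_eflags(value):
    """Map each flag name to True/False for the given EFLAGS value."""
    return {name: bool((value >> bit) & 1) for name, bit in EFLAGS_BITS.items()}


def jle_taken(flags):
    """jle branches when ZF is set or SF differs from OF."""
    return flags["ZF"] or (flags["SF"] != flags["OF"])


flags = decode_eflags(0x00000082)
print("IF:", flags["IF"])            # interrupt enable flag
print("SF:", flags["SF"])            # sign of the last comparison result
print("jle taken:", jle_taken(flags))
```

With 0x82, IF is clear and SF is set. Since rep nop does not change flags, the flags come from the preceding cmpb $0x0,0x94(%esi), so the lock byte compared as negative and the jle loops again. IF being clear means interrupts were disabled on that CPU, which would be consistent with Sysrq no longer responding, although the dump alone does not show which code disabled them.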



* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
  2003-07-22 12:55 ` Michael Troß
@ 2003-07-22 14:24   ` Udo A. Steinberg
  0 siblings, 0 replies; 8+ messages in thread
From: Udo A. Steinberg @ 2003-07-22 14:24 UTC
  To: Michael Troß; +Cc: Linux Kernel Mailing List

On 22 Jul 2003 14:55:05 +0200 Michael Troß (MT) wrote:

MT> As you might know, the Compu-Shack fddi products reached end-of-life
MT> last year.

Yes. Just thought I'd let you know that we aren't using the same
patch as on the website, but one that has been rediffed for 2.4.21 and
has an additional fix from you in it.

MT> As I can't locate the code sequence in my driver module, please check it
MT> with your compiled kernel:
MT>   objdump -d vmlinux | grep -A 20 "7e f5" | grep csfddi

c01f8334:       7e f5                   jle    c01f832b <.text.lock.csfddi>
c01f8336:       e9 87 d1 ff ff          jmp    c01f54c2 <csfddi_transmit+0x22>
c01f8344:       7e f5                   jle    c01f833b <.text.lock.csfddi+0x10>
c01f8346:       e9 b2 d2 ff ff          jmp    c01f55fd <csfddi_transmit_timeout+0x1d>
c01f8354:       7e f5                   jle    c01f834b <.text.lock.csfddi+0x20>
c01f8356:       e9 02 d7 ff ff          jmp    c01f5a5d <csfddi_interrupt+0xd>
c01f8364:       7e f5                   jle    c01f835b <.text.lock.csfddi+0x30>
c01f8366:       e9 e8 d9 ff ff          jmp    c01f5d53 <csfddi_timer_work+0x33>
c01f8374:       7e f5                   jle    c01f836b <.text.lock.csfddi+0x40>
c01f8376:       e9 db da ff ff          jmp    c01f5e56 <csfddi_timer+0x56>

MT> Do you get a result like the code line from your oops, which eip is
MT> referring to?

It's referring to EIP c01f8364. Here is the disassembly of the code fragment.

c01f832b <.text.lock.csfddi>:
c01f832b:       80 bb 94 00 00 00 00    cmpb   $0x0,0x94(%ebx)
c01f8332:       f3 90                   repz nop 
c01f8334:       7e f5                   jle    c01f832b <.text.lock.csfddi>
c01f8336:       e9 87 d1 ff ff          jmp    c01f54c2 <csfddi_transmit+0x22>
c01f833b:       80 be 94 00 00 00 00    cmpb   $0x0,0x94(%esi)               
c01f8342:       f3 90                   repz nop 
c01f8344:       7e f5                   jle    c01f833b <.text.lock.csfddi+0x10>
c01f8346:       e9 b2 d2 ff ff          jmp    c01f55fd <csfddi_transmit_timeout+0x1d>
c01f834b:       80 be 94 00 00 00 00    cmpb   $0x0,0x94(%esi)
c01f8352:       f3 90                   repz nop 
c01f8354:       7e f5                   jle    c01f834b <.text.lock.csfddi+0x20>
c01f8356:       e9 02 d7 ff ff          jmp    c01f5a5d <csfddi_interrupt+0xd>
c01f835b:       80 be 94 00 00 00 00    cmpb   $0x0,0x94(%esi) 
c01f8362:       f3 90                   repz nop 
c01f8364:       7e f5                   jle    c01f835b <.text.lock.csfddi+0x30>
c01f8366:       e9 e8 d9 ff ff          jmp    c01f5d53 <csfddi_timer_work+0x33>
c01f836b:       80 3d 40 be 2e c0 00    cmpb   $0x0,0xc02ebe40
c01f8372:       f3 90                   repz nop
c01f8374:       7e f5                   jle    c01f836b <.text.lock.csfddi+0x40>
c01f8376:       e9 db da ff ff          jmp    c01f5e56 <csfddi_timer+0x56>
c01f837b:       90                      nop
c01f837c:       90                      nop    
c01f837d:       90                      nop    
c01f837e:       90                      nop    
c01f837f:       90                      nop    

I've also put up the vmlinux image at the URL I've posted in my previous
post, if it's of any help.

MT> But you got two different decoding results, didn't you ?!

The first posting which was only sent to LKML and not to you had the
lockup output misdecoded, because I used a wrong System.map.
The second posting (the one I cc'd to you) and the decoded lockup output
(lockup.txt) on the website are the correct ones.

Regards,
-Udo.
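
The jump targets in the disassembly above can be recomputed from the raw bytes, which ties the trapped EIP back to the driver function it belongs to. This is a small sketch that handles only the two opcodes seen in the fragment (7e = jle rel8, e9 = jmp rel32); the function name rel_jump_target is made up, and the address of csfddi_timer_work is taken from the edx line of the decoded oops.

```python
def rel_jump_target(addr, code):
    """Target of a jle rel8 (7e) or jmp rel32 (e9) instruction located at addr."""
    opcode = code[0]
    if opcode == 0x7E:      # jle with signed 8-bit displacement
        length = 2
        rel = int.from_bytes(code[1:2], "little", signed=True)
    elif opcode == 0xE9:    # jmp with signed 32-bit displacement
        length = 5
        rel = int.from_bytes(code[1:5], "little", signed=True)
    else:
        raise ValueError(f"unsupported opcode {opcode:#x}")
    # Displacements are relative to the address of the next instruction.
    return (addr + length + rel) & 0xFFFFFFFF


CSFDDI_TIMER_WORK = 0xC01F5D20  # from ">>edx; c01f5d20 <csfddi_timer_work+0/e0>"

spin = rel_jump_target(0xC01F8364, bytes.fromhex("7ef5"))
retry = rel_jump_target(0xC01F8366, bytes.fromhex("e9e8d9ffff"))
print(hex(spin))                                  # start of the spin loop
print(hex(retry), hex(retry - CSFDDI_TIMER_WORK))  # return point and its offset
```

The jle goes back to c01f835b, the cmpb that starts this loop, and the jmp returns to c01f5d53, which is csfddi_timer_work+0x33 as objdump reports. Calling the same function with addr 0 and 2 reproduces the fffffff7 and ffffd9ef targets that ksymoops prints, because it disassembles the Code: bytes as if they started at address 0.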


* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
       [not found] <0001F474@gwia.compu-shack.com>
@ 2003-07-22 12:55 ` Michael Troß
  2003-07-22 14:24   ` Udo A. Steinberg
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Troß @ 2003-07-22 12:55 UTC
  To: Udo A. Steinberg; +Cc: Linux Kernel Mailing List

On Tue, 2003-07-22 at 13:51, Udo A. Steinberg wrote:

> Note that the fddi patch includes a patch you've previously sent me, which
> isn't present in the driver on your website.

As you might know, the Compu-Shack fddi products reached end-of-life
last year.

> If you need more information, let me know. Also if you have any tips or
> patches that would help in debugging the issue, I'm happy to try them.

As I can't locate the code sequence in my driver module, please check it
with your compiled kernel:
  objdump -d vmlinux | grep -A 20 "7e f5" | grep csfddi
or module:
  hexdump -e '32/1 "%02x " "\n"' csf.o | grep "7e f5 e9 e8"
Do you get a result like the code line from your oops, which eip is
referring to?

> MT> What makes you believe this? There is no matching code sequence like the
> MT> one from your dump in the driver, to be exact: in a driver compiled with
> MT> gcc 3.3 and kernel 2.4.21.
> 
> The fact that the backtrace in the decoded oops looks like the lockup
> happened in the fddi driver led me to the conclusion that this may be
> the culprit.

But you got two different decoding results, didn't you ?!

> Regards,
> -Udo.

Regards,
Michael
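
One caveat with the hexdump pipeline above: it prints 32 bytes per line, so grep cannot see a match that straddles a line boundary. A byte-level search avoids that. The sketch below is a hypothetical helper, not part of any tool mentioned in the thread, and it runs on a synthetic buffer rather than the real csf.o.

```python
def find_pattern(blob, pattern):
    """Return every offset at which pattern occurs in blob, overlaps included."""
    offsets = []
    pos = blob.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(pattern, pos + 1)
    return offsets


PATTERN = bytes.fromhex("7ef5e9e8")

# Synthetic stand-in for an object file: the pattern starts at offset 30,
# so it crosses the boundary between the first and second 32-byte rows.
sample = bytes(30) + PATTERN + bytes(30)
print(find_pattern(sample, PATTERN))
```

To run it against a real module, read the file with open("csf.o", "rb").read() and pass the result as blob. Note also that in an unlinked object file the four bytes after e9 may still be a relocation placeholder rather than the final displacement, so searching for "7e f5 e9" alone may be the more reliable test; that point is an assumption about how the module was built, not something the thread confirms.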



* Re: CPU Lockup with 2.4.21 and 2.4.22-pre
  2003-07-21 14:12 Udo A. Steinberg
@ 2003-07-21 14:17 ` Udo A. Steinberg
  0 siblings, 0 replies; 8+ messages in thread
From: Udo A. Steinberg @ 2003-07-21 14:17 UTC
  To: Michael Troß; +Cc: Linux Kernel Mailing List

On Mon, 21 Jul 2003 16:12:26 +0200 Udo A. Steinberg (UAS) wrote:

UAS> We have a Dual-Xeon machine with Hyperthreading which keeps locking up hard,
UAS> so that not even Sysrq works anymore. I have captured such a lockup using the
UAS> NMI oopser. Below you'll find the lockup fed through ksymoops. Note that
UAS> after CPU3 locked up, CPU2 did too. But that lockup couldn't be captured
UAS> anymore. Kernel is a monolithic 2.4.22-pre6. Problem also happened on
UAS> plain 2.4.21. I can provide more information wrt. hardware, config etc.
UAS> on request.

Sorry, I used the wrong System.map. Below is the fixed decode. Looks like
the lockup is caused by the 3rd party Compushack FDDI driver.

Regards,
-Udo.


ksymoops 2.4.9 on i686 2.4.22-pre6.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-pre6/ (default)
     -m /boot/System.map-2.4.22 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
NMI Watchdog detected LOCKUP on CPU3, eip c01f8364, registers:
CPU:    3
EIP:    0010:[<c01f8364>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000082
eax: 00000006   ebx: 00000202   ecx: f7ee3400   edx: c01f5d20
esi: f7ee3400   edi: 00000180   ebp: 00000003   esp: f7efff64
ds: 0018   es: 0018   ss: 0018
Process ksoftirqd_CPU3 (pid: 6, stackpage=f7eff000)
Stack: f7ee3428 c02d03e6 c0105cfa f6dae000 f7e4ac80 f7efff88 f7efff88 c011f8da 
       f7ee3400 f7ee34ac f7ee34ac c0393434 00000000 c0122cfd c02ebe1c c011f7f5 
       c011f6a3 00000009 00000001 c0367a00 fffffffe c011f456 c0367a00 00000246 
Call Trace:    [<c0105cfa>] [<c011f8da>] [<c0122cfd>] [<c011f7f5>] [<c011f6a3>]
  [<c011f456>] [<c011f9b5>] [<c0105000>] [<c01058ee>] [<c011f8f0>]
Code: 7e f5 e9 e8 d9 ff ff 80 3d 40 be 2e c0 00 f3 90 7e f5 e9 db 


>>EIP; c01f8364 <.text.lock.csfddi+39/55>   <=====

>>edx; c01f5d20 <csfddi_timer_work+0/e0>

Trace; c0105cfa <__switch_to+ca/d0>
Trace; c011f8da <__run_task_queue+6a/80>
Trace; c0122cfd <immediate_bh+1d/20>
Trace; c011f7f5 <bh_action+45/70>
Trace; c011f6a3 <tasklet_hi_action+63/b0>
Trace; c011f456 <do_softirq+d6/e0>
Trace; c011f9b5 <ksoftirqd+c5/f0>
Trace; c0105000 <_stext+0/0>
Trace; c01058ee <arch_kernel_thread+2e/40>
Trace; c011f8f0 <ksoftirqd+0/f0>

Code;  c01f8364 <.text.lock.csfddi+39/55>
00000000 <_EIP>:
Code;  c01f8364 <.text.lock.csfddi+39/55>   <=====
   0:   7e f5                     jle    fffffff7 <_EIP+0xfffffff7>   <=====
Code;  c01f8366 <.text.lock.csfddi+3b/55>
   2:   e9 e8 d9 ff ff            jmp    ffffd9ef <_EIP+0xffffd9ef>
Code;  c01f836b <.text.lock.csfddi+40/55>
   7:   80 3d 40 be 2e c0 00      cmpb   $0x0,0xc02ebe40
Code;  c01f8372 <.text.lock.csfddi+47/55>
   e:   f3 90                     repz nop 
Code;  c01f8374 <.text.lock.csfddi+49/55>
  10:   7e f5                     jle    7 <_EIP+0x7>
Code;  c01f8376 <.text.lock.csfddi+4b/55>
  12:   e9 db 00 00 00            jmp    f2 <_EIP+0xf2>

 NMI Watchdog detected LOCKUP on CPU2, eip c01062cd, registers:
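
Why the choice of System.map changes the decode so much: a symbolizer maps each address to the nearest symbol at or below it, so a map from a different build yields plausible-looking but unrelated names. The sketch below illustrates that lookup with bisect. The two tiny tables are not real System.map excerpts; their start addresses are reconstructed from the symbol+offset pairs in the two decodes posted in this thread, and the function name resolve is made up.

```python
import bisect


def resolve(addr, symbols):
    """Resolve addr against a sorted list of (start_address, name) pairs."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise LookupError(f"{addr:#x} is below the first symbol")
    start, name = symbols[i]
    return f"{name}+{addr - start:x}"


# Reconstructed from the decode made with System.map-2.4.22.
MAP_2_4_22 = [
    (0xC011F870, "__run_task_queue"),
    (0xC011F8F0, "ksoftirqd"),
    (0xC01F5D20, "csfddi_timer_work"),
    (0xC01F832B, ".text.lock.csfddi"),
]

# Reconstructed from the decode made with the mismatched System.map-2.4.21.
MAP_2_4_21 = [
    (0xC01F81D0, "pcibios_lookup_irq"),
]

eip = 0xC01F8364
print(resolve(eip, MAP_2_4_22))
print(resolve(eip, MAP_2_4_21))
```

The same EIP resolves to .text.lock.csfddi+39 with the matching map and to pcibios_lookup_irq+194 with the wrong one, which is the difference between the two postings.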


* CPU Lockup with 2.4.21 and 2.4.22-pre
@ 2003-07-21 14:12 Udo A. Steinberg
  2003-07-21 14:17 ` Udo A. Steinberg
  0 siblings, 1 reply; 8+ messages in thread
From: Udo A. Steinberg @ 2003-07-21 14:12 UTC
  To: Linux Kernel Mailing List; +Cc: Marcelo Tosatti


Hi all,

We have a Dual-Xeon machine with Hyperthreading which keeps locking up hard,
so that not even Sysrq works anymore. I have captured such a lockup using the
NMI oopser. Below you'll find the lockup fed through ksymoops. Note that
after CPU3 locked up, CPU2 did too. But that lockup couldn't be captured
anymore. Kernel is a monolithic 2.4.22-pre6. Problem also happened on
plain 2.4.21. I can provide more information wrt. hardware, config etc.
on request.

Regards,
-Udo.


ksymoops 2.4.9 on i686 2.4.22-pre6.  Options used
     -V (default)
     -K (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-pre6/ (default)
     -m /boot/System.map-2.4.21 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
NMI Watchdog detected LOCKUP on CPU3, eip c01f8364, registers:
CPU:    3
EIP:    0010:[<c01f8364>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000082
eax: 00000006   ebx: 00000202   ecx: f7ee3400   edx: c01f5d20
esi: f7ee3400   edi: 00000180   ebp: 00000003   esp: f7efff64
ds: 0018   es: 0018   ss: 0018
Process ksoftirqd_CPU3 (pid: 6, stackpage=f7eff000)
Stack: f7ee3428 c02d03e6 c0105cfa f6dae000 f7e4ac80 f7efff88 f7efff88 c011f8da 
       f7ee3400 f7ee34ac f7ee34ac c0393434 00000000 c0122cfd c02ebe1c c011f7f5 
       c011f6a3 00000009 00000001 c0367a00 fffffffe c011f456 c0367a00 00000246 
Call Trace:    [<c0105cfa>] [<c011f8da>] [<c0122cfd>] [<c011f7f5>] [<c011f6a3>]
  [<c011f456>] [<c011f9b5>] [<c0105000>] [<c01058ee>] [<c011f8f0>]
Code: 7e f5 e9 e8 d9 ff ff 80 3d 40 be 2e c0 00 f3 90 7e f5 e9 db 


>>EIP; c01f8364 <pcibios_lookup_irq+194/370>   <=====

>>edx; c01f5d20 <restore_i387+70/1a0>

Trace; c0105cfa <ext2_file_operations+3a/60>
Trace; c011f8da <unix_stream_ops+1a/60>
Trace; c0122cfd <init_tss+7fd/2000>
Trace; c011f7f5 <arpt_sockopts+15/40>
Trace; c011f6a3 <required_len.1+23/60>
Trace; c011f456 <info.0+76/140>
Trace; c011f9b5 <unix_table+15/60>
Trace; c0105000 <proc_mem_inode_operations+20/60>
Trace; c01058ee <nibblemap+e/40>
Trace; c011f8f0 <unix_stream_ops+30/60>

Code;  c01f8364 <pcibios_lookup_irq+194/370>
00000000 <_EIP>:
Code;  c01f8364 <pcibios_lookup_irq+194/370>   <=====
   0:   7e f5                     jle    fffffff7 <_EIP+0xfffffff7>   <=====
Code;  c01f8366 <pcibios_lookup_irq+196/370>
   2:   e9 e8 d9 ff ff            jmp    ffffd9ef <_EIP+0xffffd9ef>
Code;  c01f836b <pcibios_lookup_irq+19b/370>
   7:   80 3d 40 be 2e c0 00      cmpb   $0x0,0xc02ebe40
Code;  c01f8372 <pcibios_lookup_irq+1a2/370>
   e:   f3 90                     repz nop 
Code;  c01f8374 <pcibios_lookup_irq+1a4/370>
  10:   7e f5                     jle    7 <_EIP+0x7>
Code;  c01f8376 <pcibios_lookup_irq+1a6/370>
  12:   e9 db 00 00 00            jmp    f2 <_EIP+0xf2>

 NMI Watchdog detected LOCKUP on CPU2, eip c01062cd, registers:

