linux-kernel.vger.kernel.org archive mirror
* 2.4.18-24 SMP Machine stuck in zombie state after kernel Oops
@ 2003-07-16  7:54 yuval yeret
  0 siblings, 0 replies; 2+ messages in thread
From: yuval yeret @ 2003-07-16  7:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: yuval

Hi,

I have been trying to find information about a kernel oops that I have now
seen many times on 4 different machines, but I could not find anything about
it in the list archives or anywhere else.

We are running 2.4.18-24 on SMP machines with 2 CPUs and hyperthreading
(SuperMicro Xeon servers), doing heavy disk and network I/O. (QLogic HBAs
and Intel e1000 NICs are used.)

At some point the machine oopses. I know of no specific scenario that
triggers it, other than a heavy NFS-server-like load.


The oops doesn't appear in logs or on the console, but I've been able to use 
the diagnostic keys to get the following information:

right ALT+Scroll lock

CPU 0 : swapper

[<c0106f32>] (0xc0323fc4))  default_idle
[<c0105000>] (0xc0323fd4)) empty_zero_page

CPU 1 : swapper

(<c0106f32>)  (0xc6597fb0)     - default_idle
(<c011d29b>) (0xc6597fd0)    - out_of_line_bug
(<c011d449>) (8xc 659ffc)        - printk

CPU 3 : swapper

[<c0106f32>] (0xc8257fb0))  default_idle
[<c011d449>] (0xc8257fd0)) printk


CTRL +Scroll lock

XINETD
[<c0108dc0>]sys_sigaltstack
[<c01151b1>]flush_tlb_page
[<c0115140>]flush_tlb_page
[<c014086e>]shmem_file_setup
[<c0131109>]do_generic_file_read
[<c013121e>]generic_file_read
[<c01310b0>]do_generic_file_read
[<c0142aa6>]sys_read
[<c0108ccf>]sys_sigaltstack

CROND

[<c01198f1>]wait_for_completion
[<c011c47a>]do_fork
[<c0107718>]dump_thread
[<c0108ccf>]sys_sigaltstack
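
A note for anyone reading traces like the ones above: each label is the
nearest preceding symbol in whatever symbol table the decoder used, so a
name such as sys_sigaltstack may simply be the closest symbol that table
knew about, not necessarily the function that was executing. The sketch
below illustrates this lookup. The map contents, symbol names and addresses
are all hypothetical and are not taken from this kernel.

```python
import bisect

# Hypothetical System.map-style excerpt (address, type, name).
# These names and addresses are made up for illustration only.
FULL_MAP = """\
c0100000 T example_exported
c0100100 t example_static_helper
c0100200 T example_next_exported
"""


def load_symbols(text, exported_only=False):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, sym_type, name = parts
        # An uppercase type letter marks a global symbol in nm-style output.
        if exported_only and not sym_type.isupper():
            continue
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) of the nearest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, address - base


if __name__ == "__main__":
    address = 0xC0100150
    for label, table in (
        ("full map", load_symbols(FULL_MAP)),
        ("exported only", load_symbols(FULL_MAP, exported_only=True)),
    ):
        name, offset = resolve(table, address)
        print(f"{label}: {address:08x} -> {name}+0x{offset:x}")
```

With the full table the address lands in the static helper; with only
exported symbols the same address is attributed to the preceding exported
function at a large offset. Decoding against the System.map that matches
the running kernel avoids that kind of mislabeling.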



After the oops, the networking stack continues to function and some running
daemons continue to work (network traffic from the machine clearly shows
this), but logging in to the node is not possible via console, ssh, or rsh,
and the majority of the application processes are dead.

Any information / pointers will be appreciated.

If any information is missing, or if there is anything I should do to help
analyze this the next time it happens, please tell me as well.


Thanks,

--
Yuval Yeret
Exanet
yuval@exanet.com
http://www.exanet.com
Tel.  972-9-9717782
Fax. 972-9-9717778


* RE: 2.4.18-24 SMP Machine stuck in zombie state after kernel Oops
@ 2003-07-29  9:47 yuval yeret
  0 siblings, 0 replies; 2+ messages in thread
From: yuval yeret @ 2003-07-29  9:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: yuval

>Tried to find information about a kernel OOPS I've seen a lot of times
>already on 4 different machines - but nothing seems to be said about this
>in the list archives or anywhere else for that matter.

>We are running 2.4.18-24 on SMP machines with 2CPUs and hyperthreading
>(SuperMicro Xeon servers) and doing heavy IO to disk and networking.
>(Qlogic HBAs and Intel e1000 NICs are used)

Actually this time it happened on machines with Emulex HBAs.

>At some point the machine oopses (no scenario except heavy nfs-server like 
>load):
.....snipped...
>After the oops networking stack continues to function, some running daemons
>continue to work (I'm seeing network traffic from the machine which
>indicates that clearly), but login into the node is not possible via
>console, ssh, rsh, and the majority of the application processes are dead.

Since then I have tried to reproduce the problem with devfs disabled, and
after some time I found a pattern that reproduces this scenario quite
consistently.

This time I've been able to see the output of the magic keys in the log:



===============================
Magic keys show:
Jul 29 11:00:17 node1 kernel: Pid: 0, comm:              swapper
Jul 29 11:00:17 node1 kernel: EIP: 0010:[default_idle+41/64] CPU: 2
Jul 29 11:00:17 node1 kernel: EIP: 0010:[<c0106e89>] CPU: 2
Jul 29 11:00:17 node1 kernel: EIP is at  (2.4.18-24exa)
Jul 29 11:00:17 node1 kernel:  EFLAGS: 00000246    Not tainted
Jul 29 11:00:17 node1 kernel: EAX: 00000000 EBX: c0106e60 ECX: 00000032 EDX: c4af6000
Jul 29 11:00:17 node1 kernel: ESI: c4af6000 EDI: c4af6000 EBP: c0106e60 DS: 0018 ES: 0018
Jul 29 11:00:17 node1 kernel: CR0: 8005003b CR2: 4012faa0 CR3: 33b6a340 CR4: 000006f0
Jul 29 11:00:17 node1 kernel: Call Trace: [cpu_idle+50/80]  (0xc4af7fb0))
Jul 29 11:00:17 node1 kernel: Call Trace: [<c0106f02>]  (0xc4af7fb0))
Jul 29 11:00:17 node1 kernel: [printk+297/320]  (0xc4af7fd0))
Jul 29 11:00:17 node1 kernel: [<c011d409>]  (0xc4af7fd0))

Jul 29 11:00:20 node1 kernel: Pid: 0, comm:              swapper
Jul 29 11:00:20 node1 kernel: EIP: 0010:[default_idle+41/64] CPU: 0
Jul 29 11:00:20 node1 kernel: EIP: 0010:[<c0106e89>] CPU: 0
Jul 29 11:00:20 node1 kernel: EIP is at  (2.4.18-24exa)
Jul 29 11:00:20 node1 kernel:  EFLAGS: 00000246    Not tainted
Jul 29 11:00:20 node1 kernel: EAX: 00000000 EBX: c0106e60 ECX: 00000032 EDX: c031c000
Jul 29 11:00:20 node1 kernel: ESI: c031c000 EDI: c031c000 EBP: c0106e60 DS: 0018 ES: 0018
Jul 29 11:00:20 node1 kernel: CR0: 8005003b CR2: 40048794 CR3: 36963c80 CR4: 000006f0
Jul 29 11:00:20 node1 kernel: Call Trace: [cpu_idle+50/80]  (0xc031dfc4))
Jul 29 11:00:20 node1 kernel: [<c0106f02>]  (0xc031dfc4))
Jul 29 11:00:20 node1 kernel: [_stext+0/80]  (0xc031dfd0))
Jul 29 11:00:20 node1 kernel: [<c0105000>]  (0xc031dfd0))
Jul 29 11:00:20 node1 kernel:
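
In the log above each trace entry appears twice, once in symbolic form
(for example [default_idle+41/64]) and once as a raw address (for example
[<c0106e89>]). As a rough cross-check, the two forms can be paired to
recover each symbol's start address. The sketch below does this for the
CPU 2 entries, with the syslog prefixes stripped. It assumes the offset and
size in the symbolic form are decimal; that assumption is consistent with
the EBX/EBP value c0106e60 in the same dump, but it is still an assumption.
The function and variable names are made up for this example.

```python
import re

# CPU 2 entries copied from the log above, syslog prefixes removed.
LOG = """\
EIP: 0010:[default_idle+41/64] CPU: 2
EIP: 0010:[<c0106e89>] CPU: 2
Call Trace: [cpu_idle+50/80]  (0xc4af7fb0))
Call Trace: [<c0106f02>]  (0xc4af7fb0))
[printk+297/320]  (0xc4af7fd0))
[<c011d409>]  (0xc4af7fd0))
"""

SYMBOLIC = re.compile(r"\[([A-Za-z_]\w*)\+(\d+)/(\d+)\]")
RAW = re.compile(r"\[<([0-9a-f]{8})>\]")


def pair_entries(log):
    """Pair each symbolic line with the raw-address line that follows it.

    Returns tuples of (name, offset, size, address, base), where
    base = address - offset is the inferred start of the symbol.
    Offsets and sizes are ASSUMED to be decimal.
    """
    entries = []
    pending = None
    for line in log.splitlines():
        match = SYMBOLIC.search(line)
        if match:
            pending = (match.group(1), int(match.group(2)), int(match.group(3)))
            continue
        match = RAW.search(line)
        if match and pending is not None:
            name, offset, size = pending
            address = int(match.group(1), 16)
            entries.append((name, offset, size, address, address - offset))
            pending = None
    return entries


if __name__ == "__main__":
    for name, offset, size, address, base in pair_entries(LOG):
        print(f"{name}+{offset}/{size}: address {address:08x}, base {base:08x}")
```

Under the decimal assumption, the base computed for default_idle is
c0106e60, the same value the register dump shows in EBX and EBP.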

Any information / pointers will be appreciated.



If any information is missing, or if there is anything I should do to help
analyze this the next time it happens, please tell me as well.


Thanks,

--
Yuval Yeret
Exanet
yuval@exanet.com
http://www.exanet.com
Tel.  972-9-9717782
Fax. 972-9-9717778

