linux-kernel.vger.kernel.org archive mirror
* kernel freeze on 2.4.32, apparently in cached_lookup
@ 2006-01-24 17:49 Chris Lightfoot
From: Chris Lightfoot @ 2006-01-24 17:49 UTC
  To: linux-kernel

I have a Pentium 4 machine running stock kernel 2.4.32
with ext3 on LVM on software RAID-1. HIGHMEM is enabled and
the machine has 3GB of RAM. Various details of the machine
and kernel are here:

http://ex-parrot.com/~chris/tmp/20060124/caesious-.config
http://ex-parrot.com/~chris/tmp/20060124/caesious-cpuinfo
http://ex-parrot.com/~chris/tmp/20060124/caesious-lsmod
http://ex-parrot.com/~chris/tmp/20060124/caesious-lspci

Occasionally -- often when running updatedb or another
disk-heavy cron job, but sometimes during normal use of
the machine -- the machine freezes up almost entirely
(mouse pointer stops working, ditto VC switching, no
console output if on the text console, SSH sessions
freeze, but network packet forwarding and NAT still work).
There's no output on the VGA console and the machine
doesn't respond to Ctrl-Alt-Sysrq, but does respond to
break+... on the serial console. That gives sysrq-p output
like this, from the most recent freeze:

SysRq : Show Regs
Pid: 30641, comm:             updatedb
EIP: 0010:d_lookup+63/110 CPU: 0 EFLAGS: 00000287    Tainted: P
EAX: c8632710 EBX: c8632700 ECX: 00000012 EDX: 13fe1842
ESI: d373b000 EDI: 0003ffff EBP: ea93bedc DS: 0018 ES: 0018
CR0: 8005003b CR2: 080a4094 CR3: 2965b000 CR4: 000006d0
Call Trace: cached_lookup+11/50 link_path_walk+63b/900 vfs_permission+79/120 path_lookup+1e/30 __user_walk+2b/50 sys_lstat64+17/70 system_call+33/38

-- repeating sysrq+p suggests that the kernel is stuck in 
d_lookup:

http://ex-parrot.com/~chris/tmp/20060124/caesious-regs-symbols
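
For readers comparing repeated sysrq-p dumps: each entry such as
d_lookup+63/110 reads as symbol+offset/size, and the hex digit in
link_path_walk+63b/900 shows that both numbers are hexadecimal. The
sketch below is only an illustration of decoding one entry so that
offsets from successive dumps can be compared; struct trace_entry and
parse_trace_entry are made-up names, not part of any kernel tool.

```c
#include <assert.h>
#include <stdio.h>

/* One decoded "symbol+offset/size" entry (hypothetical helper type). */
struct trace_entry {
    char symbol[64];
    unsigned long offset; /* bytes into the function, parsed as hex */
    unsigned long size;   /* function size in bytes, parsed as hex */
};

/*
 * Returns 1 if text parses as symbol+offset/size and the offset lies
 * inside the function, 0 otherwise.
 */
int parse_trace_entry(const char *text, struct trace_entry *out)
{
    assert(text != NULL && out != NULL);

    /* %63[^+] copies at most 63 characters of the name, stopping at '+'. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return 0;

    return out->offset < out->size;
}
```

If the offsets seen across many dumps all stay within a small range
of one function, that supports the reading that the CPU is looping
there rather than merely passing through.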

There's no oops or other message logged.

(I'm running a uniprocessor kernel -- the SMP kernel also
freezes under similar circumstances, and I wanted to
eliminate the SMP code as a source of problems.)

Does this look like a known problem? If not, what should I
do next to track down the problem? In particular, what
other information should I try to collect next time it
freezes?

(Please cc replies to me if possible....)

-- 
Q. Can I make copies of the copyright form?
(US Copyright Office FAQ)

* Re: kernel freeze on 2.4.32, apparently in cached_lookup
@ 2006-01-24 21:13 ` Willy Tarreau
From: Willy Tarreau @ 2006-01-24 21:13 UTC
  To: Chris Lightfoot; +Cc: linux-kernel

Hi,

On Tue, Jan 24, 2006 at 05:49:28PM +0000, Chris Lightfoot wrote:
> I have a Pentium 4 machine running stock kernel 2.4.32
> with ext3 on LVM on software RAID-1. HIGHMEM is enabled and
> the machine has 3GB of RAM. Various details of the machine
> and kernel are here:
> 
> http://ex-parrot.com/~chris/tmp/20060124/caesious-.config
> http://ex-parrot.com/~chris/tmp/20060124/caesious-cpuinfo
> http://ex-parrot.com/~chris/tmp/20060124/caesious-lsmod
> http://ex-parrot.com/~chris/tmp/20060124/caesious-lspci
> 
> Occasionally -- often when running updatedb or another
> disk-heavy cron job, but sometimes during normal use of
> the machine -- the machine freezes up almost entirely
> (mouse pointer stops working, ditto VC switching, no
> console output if on the text console, SSH sessions
> freeze, but network packet forwarding and NAT still work).
> There's no output on the VGA console and the machine
> doesn't respond to Ctrl-Alt-Sysrq, but does respond to
> break+... on the serial console. That gives sysrq-p output
> like this, from the most recent freeze:
> 
> SysRq : Show Regs
> Pid: 30641, comm:             updatedb
> EIP: 0010:d_lookup+63/110 CPU: 0 EFLAGS: 00000287    Tainted: P
> EAX: c8632710 EBX: c8632700 ECX: 00000012 EDX: 13fe1842
> ESI: d373b000 EDI: 0003ffff EBP: ea93bedc DS: 0018 ES: 0018
> CR0: 8005003b CR2: 080a4094 CR3: 2965b000 CR4: 000006d0
> Call Trace: cached_lookup+11/50 link_path_walk+63b/900 vfs_permission+79/120 path_lookup+1e/30 __user_walk+2b/50 sys_lstat64+17/70 system_call+33/38
> 
> -- repeating sysrq+p suggests that the kernel is stuck in 
> d_lookup:
> 
> http://ex-parrot.com/~chris/tmp/20060124/caesious-regs-symbols
> 
> There's no oops or other message logged.
> 
> (I'm running a uniprocessor kernel -- the SMP kernel also
> freezes under similar circumstances, and I wanted to
> eliminate the SMP code as a source of problems.)
> 
> Does this look like a known problem? If not, what should I
> do next to track down the problem? In particular, what
> other information should I try to collect next time it
> freezes?

It seems a little weird. I've never seen such a case myself. I
did find a few reports that look like yours, but they have
nothing in common (various filesystems, with and without
highmem, ...), and all of them report oopses or panics rather
than freezes. None of them got an interesting response anyway.

What seems strange in your report is that the kernel freezes.
The only part in cached_lookup() which could freeze IMHO is
when it calls d_lookup(), but for this, you should have a
closed loop instead of a linked list. It could happen with
some memory corruption, but you would get far more oopses
and panics than freezes. For this reason, I believe you
might have some random problem on your filesystem. Could
you run a full fsck on it ?
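
To illustrate the "closed loop instead of a linked list" case: a
lookup that walks a circular chain until it gets back to the list
head never terminates if a corrupted pointer forms a loop that
excludes the head. The sketch below is a simplified userspace model,
not the 2.4 dcache code; struct node and chain_returns_to_head are
invented for the example. It uses Floyd's two-pointer method to
report such a chain instead of spinning on it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a hash-chain link; not the kernel's struct. */
struct node {
    struct node *next;
};

/*
 * Walk a circular chain that should lead back to its sentinel head.
 * Returns 1 if the walk reaches head again (healthy chain), 0 if it
 * hits NULL or is trapped in a loop that does not contain head.
 */
int chain_returns_to_head(const struct node *head)
{
    assert(head != NULL);

    const struct node *slow = head->next;
    const struct node *fast = head->next;

    for (;;) {
        /* fast checks every node it visits, so it sees head or NULL first. */
        if (fast == NULL)
            return 0;
        if (fast == head)
            return 1;
        fast = fast->next;
        if (fast == NULL)
            return 0;
        if (fast == head)
            return 1;
        fast = fast->next;

        /* slow moves one step; meeting fast means a loop without head. */
        slow = slow->next;
        if (fast == slow)
            return 0;
    }
}
```

A plain walk of the form "while (p != head) p = p->next;" over the
same corrupted chain would spin forever. On a non-preemptible
uniprocessor kernel that would plausibly look like the freeze
described, with interrupt-driven work still running.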

If it does not find anything, a night-long memtest will
probably give us some indications.

> (Please cc replies to me if possible....)

Regards,
Willy

* Re: kernel freeze on 2.4.32, apparently in cached_lookup
@ 2006-01-25  1:54   ` Chris Lightfoot
From: Chris Lightfoot @ 2006-01-25  1:54 UTC
  To: Willy Tarreau; +Cc: linux-kernel

On Tue, Jan 24, 2006 at 10:13:12PM +0100, Willy Tarreau wrote:
    [...]
> What seems strange in your report is that the kernel freezes.
> The only part in cached_lookup() which could freeze IMHO is
> when it calls d_lookup(), but for this, you should have a
> closed loop instead of a linked list. It could happen with
> some memory corruption, but you would get far more oopses
> and panics than freezes. For this reason, I believe you
> might have some random problem on your filesystem. Could
> you run a full fsck on it ?

fsck finds the filesystem is clean; I ran memtest
overnight when I built the machine and it didn't find
anything. Nick's suggestion that it could be a temperature
problem is also interesting; I've added another fan to the
machine and I'll see if that helps matters; if not I'll
try memtest again.

-- 
``It's not a bomb. It's a device that explodes.''
  (possibly-apocryphal statement by French spokesman,
  before the 1995 nuclear tests)
