linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* reiserfs oops [2.5.65]
@ 2003-03-19 14:10 Dave Jones
  2003-03-20  8:25 ` Oleg Drokin
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2003-03-19 14:10 UTC (permalink / raw)
  To: Linux Kernel

Happened during this mornings cron jobs.
find(1) had hung, taking the whole box with it.
Could switch tty's, but no keyboard input at all,
couldn't ssh into box..

12 hours previous, I had been putting it through a fairly
heavy set of stresstests with ltp/fsx/fsstress, though
nothing turned up then. I left the box idle, and went to
bed, and woke up to find this...

Mar 19 06:35:41 mesh kernel: kernel BUG at mm/slab.c:1552!
Mar 19 06:35:41 mesh kernel: invalid operand: 0000
Mar 19 06:35:41 mesh kernel: CPU:    0
Mar 19 06:35:41 mesh kernel: EIP:    0060:[<c015417b>]    Not tainted
Mar 19 06:35:41 mesh kernel: EFLAGS: 00010012
Mar 19 06:35:41 mesh kernel: EIP is at kmem_cache_free+0x24b/0x2c0
Mar 19 06:35:41 mesh kernel: eax: 002fdd94   ebx: 00010c00   ecx: 000001dc   edx: c11bd788
Mar 19 06:35:41 mesh kernel: esi: 68f30060   edi: c1f30060   ebp: c11c3df0   esp: c11c3dc8
Mar 19 06:35:41 mesh kernel: ds: 007b   es: 007b   ss: 0068
Mar 19 06:35:41 mesh kernel: Process kswapd0 (pid: 6, threadinfo=c11c2000 task=c11c1980)
Mar 19 06:35:41 mesh kernel: Stack: c11bd788 c0746458 0000006b c1f30020 c0280dfd c11d32e8 00000292 c1f30098
Mar 19 06:35:41 mesh kernel:        c11c3e74 00000031 c11c3e00 c0280dfd c11bd788 c1f30064 c11c3e14 c0194846
Mar 19 06:35:41 mesh kernel:        c1f30098 c6dcd320 c1f30098 c11c3e44 c0194bfc c1f30098 00000000 00000000
Mar 19 06:35:41 mesh kernel: Call Trace:
Mar 19 06:35:41 mesh kernel:  [<c0280dfd>] reiserfs_destroy_inode+0x1d/0x30
Mar 19 06:35:41 mesh kernel:  [<c0280dfd>] reiserfs_destroy_inode+0x1d/0x30
Mar 19 06:35:41 mesh kernel:  [<c0194846>] destroy_inode+0x36/0x60
Mar 19 06:35:41 mesh kernel:  [<c0194bfc>] dispose_list+0x4c/0x1f0
Mar 19 06:35:41 mesh kernel:  [<c01952e0>] prune_icache+0x190/0x480
Mar 19 06:35:41 mesh kernel:  [<c01955f8>] shrink_icache_memory+0x28/0x30
Mar 19 06:35:41 mesh kernel:  [<c015724f>] shrink_slab+0x11f/0x170
Mar 19 06:35:41 mesh kernel:  [<c0158d1e>] balance_pgdat+0x12e/0x180
Mar 19 06:35:41 mesh kernel:  [<c0158e5d>] kswapd+0xed/0xf0
Mar 19 06:35:41 mesh kernel:  [<c0124ad0>] autoremove_wake_function+0x0/0x50
Mar 19 06:35:41 mesh kernel:  [<c010a2e6>] ret_from_fork+0x6/0x20
Mar 19 06:35:41 mesh kernel:  [<c0124ad0>] autoremove_wake_function+0x0/0x50
Mar 19 06:35:41 mesh kernel:  [<c0158d70>] kswapd+0x0/0xf0
Mar 19 06:35:41 mesh kernel:  [<c01075fd>] kernel_thread_helper+0x5/0x18
Mar 19 06:35:41 mesh kernel:
Mar 19 06:35:41 mesh kernel: Code: 0f 0b 10 06 56 5a 54 c0 e9 d7 fe ff ff 8b 55 08 8b 5a 34 e9


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-19 14:10 reiserfs oops [2.5.65] Dave Jones
@ 2003-03-20  8:25 ` Oleg Drokin
  2003-03-20 11:48   ` Dave Jones
  2003-03-20 13:24   ` Dave Jones
  0 siblings, 2 replies; 16+ messages in thread
From: Oleg Drokin @ 2003-03-20  8:25 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

Hello!

On Wed, Mar 19, 2003 at 02:10:48PM +0000, Dave Jones wrote:
> Happened during this mornings cron jobs.
> find(1) had hung, taking the whole box with it.
> Could switch tty's, but no keyboard input at all,
> couldn't ssh into box..
> 12 hours previous, I had been putting it through a fairly
> heavy set of stresstests with ltp/fsx/fsstress, though
> nothing turned up then. I left the box idle, and went to
> bed, and woke up to find this...

Hm, very interesting. Thank you.
I've seen this once too, but on kernel patched with lots of unrelated and
possibly memory corrupting stuff.
I will look at it more closely.
BTW, it oopsed not in find. Is your box SMP?

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20  8:25 ` Oleg Drokin
@ 2003-03-20 11:48   ` Dave Jones
  2003-03-20 13:24   ` Dave Jones
  1 sibling, 0 replies; 16+ messages in thread
From: Dave Jones @ 2003-03-20 11:48 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Linux Kernel

On Thu, Mar 20, 2003 at 11:25:59AM +0300, Oleg Drokin wrote:

 > Hm, very interesting. Thank you.
 > I've seen this once too, but on kernel patched with lots of unrelated and
 > possibly memory corrupting stuff.
 > I will look at it more closely.
 > BTW, it oopsed not in find. Is your box SMP?

Nope. UP VIA Nehemiah.

		Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20  8:25 ` Oleg Drokin
  2003-03-20 11:48   ` Dave Jones
@ 2003-03-20 13:24   ` Dave Jones
  2003-03-20 14:03     ` Dave Jones
                       ` (2 more replies)
  1 sibling, 3 replies; 16+ messages in thread
From: Dave Jones @ 2003-03-20 13:24 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Linux Kernel

On Thu, Mar 20, 2003 at 11:25:59AM +0300, Oleg Drokin wrote:

 > Hm, very interesting. Thank you.
 > I've seen this once too, but on kernel patched with lots of unrelated and
 > possibly memory corrupting stuff.
 > I will look at it more closely.
 > BTW, it oopsed not in find. Is your box SMP?

Same box committed seppuku overnight, this time in a different way.

There's lots of "slab error in cache_alloc_debugcheck_after()"
warnings. cache reiser_inode_cache memory after object was overwritten

Some call traces.
 check_poison_obj <- kmem_cache_alloc <- reierfs_alloc_inode <-
 reiserfs_alloc_inode <- alloc_inode <- get_new_inode <-
 reiserfs_init_locked <- reiserfs_find_actor <- reiserfs_iget <-
 reiserfs_find_actor <- reiserfs_init_locked_inode <- reiserfs_lookup <-
 real_lookup <- do_lookup <- link_path_walk <- kmem_cache_alloc <-
 __user_walk <- vfs_lstat <- sys_lstat64 <- syscall_call

Slab corruption: start=c70c7044, expend=c70c7213 problemat=c70c7044
Last user: [<c0280dcb>](reiserfs_alloc_inode+0x1b/0x30)
Data: (lots of hex)

I'll give that box a run of memtest to rule out memory corruption
problems. I'll also hook up a serial terminal tonight to catch tomorrow
nights 'activity' in full 8-)

		Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20 13:24   ` Dave Jones
@ 2003-03-20 14:03     ` Dave Jones
  2003-03-21  0:59     ` Andrew Morton
  2003-03-21  9:14     ` reiserfs oops [2.5.65] Oleg Drokin
  2 siblings, 0 replies; 16+ messages in thread
From: Dave Jones @ 2003-03-20 14:03 UTC (permalink / raw)
  To: Oleg Drokin, Linux Kernel

On Thu, Mar 20, 2003 at 01:24:09PM +0000, Dave Jones wrote:

 >  > Hm, very interesting. Thank you.
 >  > I've seen this once too, but on kernel patched with lots of unrelated and
 >  > possibly memory corrupting stuff.
 >  > I will look at it more closely.
 >  > BTW, it oopsed not in find. Is your box SMP?
 > 
 > Same box committed seppuku overnight, this time in a different way.

Managed to get the full logs off the box..
http://www.codemonkey.org.uk/cruft/oops.txt

Including the very first oops that triggered the 'bad shit'.

Mar 20 06:41:07 mesh kernel: Unable to handle kernel paging request at virtual address 20b8c080
Mar 20 06:41:07 mesh kernel:  printing eip:
Mar 20 06:41:07 mesh kernel: c01538b4
Mar 20 06:41:07 mesh kernel: *pde = 00000000
Mar 20 06:41:07 mesh kernel: Oops: 0002
Mar 20 06:41:07 mesh kernel: CPU:    0
Mar 20 06:41:07 mesh kernel: EIP:    0060:[<c01538b4>]    Not tainted 
Mar 20 06:41:07 mesh kernel: EFLAGS: 00010012
Mar 20 06:41:07 mesh kernel: EIP is at cache_flusharray+0x104/0x450
Mar 20 06:41:07 mesh kernel: eax: c1eea020   ebx: 00000006   ecx: c49e621c   edx: 20b8c080
Mar 20 06:41:07 mesh kernel: esi: c49e6000   edi: c11bd788   ebp: c11c3dc0   esp: c11c3d88
Mar 20 06:41:07 mesh kernel: ds: 007b   es: 007b   ss: 0068
Mar 20 06:41:07 mesh kernel: Process kswapd0 (pid: 6, threadinfo=c11c2000 task=c11c1980)
Mar 20 06:41:07 mesh kernel: Stack: c11d656c c6bb2000 c11c3e18 c11c2000 c05ac7c0 00000800 c6129040 c11bd7a4 
Mar 20 06:41:07 mesh kernel:        c11bd794 c11d32f8 00000010 00000800 c1792040 c179221c c11c3df0 c0154121 
Mar 20 06:41:07 mesh kernel:        c11bd788 c11d32e8 0000006b c1792000 c0280dfd c11d32e8 00000292 c1792254 
Mar 20 06:41:07 mesh kernel: Call Trace:
Mar 20 06:41:07 mesh kernel:  [<c0154121>] kmem_cache_free+0x1f1/0x2c0
Mar 20 06:41:07 mesh kernel:  [<c0280dfd>] reiserfs_destroy_inode+0x1d/0x30
Mar 20 06:41:07 mesh kernel:  [<c0280dfd>] reiserfs_destroy_inode+0x1d/0x30
Mar 20 06:41:07 mesh kernel:  [<c0194846>] destroy_inode+0x36/0x60
Mar 20 06:41:07 mesh kernel:  [<c0194bfc>] dispose_list+0x4c/0x1f0
Mar 20 06:41:07 mesh kernel:  [<c01952e0>] prune_icache+0x190/0x480
Mar 20 06:41:07 mesh kernel:  [<c01955f8>] shrink_icache_memory+0x28/0x30
Mar 20 06:41:07 mesh kernel:  [<c015724f>] shrink_slab+0x11f/0x170
Mar 20 06:41:07 mesh kernel:  [<c0158d1e>] balance_pgdat+0x12e/0x180
Mar 20 06:41:07 mesh kernel:  [<c0158e5d>] kswapd+0xed/0xf0
Mar 20 06:41:07 mesh kernel:  [<c0124ad0>] autoremove_wake_function+0x0/0x50
Mar 20 06:41:07 mesh kernel:  [<c010a2e6>] ret_from_fork+0x6/0x20
Mar 20 06:41:07 mesh kernel:  [<c0124ad0>] autoremove_wake_function+0x0/0x50
Mar 20 06:41:07 mesh kernel:  [<c0158d70>] kswapd+0x0/0xf0
Mar 20 06:41:07 mesh kernel:  [<c01075fd>] kernel_thread_helper+0x5/0x18
Mar 20 06:41:07 mesh kernel: 
Mar 20 06:41:07 mesh kernel: Code: 89 02 31 d2 8b 46 0c 29 c1 89 c8 f7 77 30 89 c1 8b 46 14 89 
Mar 20 06:41:07 mesh kernel:  <6>note: kswapd0[6] exited with preempt_count 1
Mar 20 06:41:07 mesh kernel: mm/slab.c:1647: spin_lock(mm/slab.c:c11bd7c8) already locked by mm/slab.c/1775

		Dave




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-21  0:59     ` Andrew Morton
@ 2003-03-20 23:13       ` Dave Jones
  2003-03-20 23:34         ` Randy.Dunlap
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2003-03-20 23:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: green, linux-kernel

On Thu, Mar 20, 2003 at 04:59:41PM -0800, Andrew Morton wrote:
 > > 
 > > Slab corruption: start=c70c7044, expend=c70c7213 problemat=c70c7044
 > > Last user: [<c0280dcb>](reiserfs_alloc_inode+0x1b/0x30)
 > > Data: (lots of hex)
 > 
 > Alas, the "(lots of hex)" is important - it lets us determine which member of
 > struct reiserfs_inode was actually altered.

You're in luck, as noted in the follow up, I captured it all..
http://www.codemonkey.org.uk/cruft/oops.txt
 
 > It would be nice if we had a more robust way of capturing all this info,
 > especially the oops-while-running-X lossage.  Dump-to-floppy or something.

The tricky thing about capturing this one was that something in the VFS
was wedged hard, so I'm lucky the logs made it to disk.
Using sysrq, I synced and the disks made really nasty chugging noises.
umount read only made it write a bunch of stuff out, more chugging
noises for 15 minutes, kill all tasks then popped up a getty for me to
log in on. It took about 3 minutes to get a shell (disk IO was seriously
slow). I took a peek at vmstat 1 (no logs of this sorry), and nothing
out of the ordinary. Not much in swap, plenty of swap free.
short: the machine was horked. I synced, and rebooted, and thankfully
the logs were still there after fsck.ext2 recovered /var

		Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20 23:13       ` Dave Jones
@ 2003-03-20 23:34         ` Randy.Dunlap
  2003-03-22 19:36           ` Nicholas Wourms
  0 siblings, 1 reply; 16+ messages in thread
From: Randy.Dunlap @ 2003-03-20 23:34 UTC (permalink / raw)
  To: Dave Jones; +Cc: akpm, green, linux-kernel

On Thu, 20 Mar 2003 23:13:35 +0000 Dave Jones <davej@codemonkey.org.uk> wrote:

| On Thu, Mar 20, 2003 at 04:59:41PM -0800, Andrew Morton wrote:
|  > > 
|  > > Slab corruption: start=c70c7044, expend=c70c7213 problemat=c70c7044
|  > > Last user: [<c0280dcb>](reiserfs_alloc_inode+0x1b/0x30)
|  > > Data: (lots of hex)
|  > 
|  > Alas, the "(lots of hex)" is important - it lets us determine which member of
|  > struct reiserfs_inode was actually altered.
| 
| You're in luck, as noted in the follow up, I captured it all..
| http://www.codemonkey.org.uk/cruft/oops.txt
|  
|  > It would be nice if we had a more robust way of capturing all this info,
|  > especially the oops-while-running-X lossage.  Dump-to-floppy or something.
| 
| The tricky thing about capturing this one was that something in the VFS
| was wedged hard, so I'm lucky the logs made it to disk.
| Using sysrq, I synced and the disks made really nasty chugging noises.
| umount read only made it write a bunch of stuff out, more chugging
| noises for 15 minutes, kill all tasks then popped up a getty for me to
| log in on. It took about 3 minutes to get a shell (disk IO was seriously
| slow). I took a peek at vmstat 1 (no logs of this sorry), and nothing
| out of the ordinary. Not much in swap, plenty of swap free.
| short: the machine was horked. I synced, and rebooted, and thankfully
| the logs were still there after fsck.ext2 recovered /var

I've done some 2.5.xyz work on kmsgdump (dump kernel messages to
floppy).  I'll try to get back to it soon.

--
~Randy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20 13:24   ` Dave Jones
  2003-03-20 14:03     ` Dave Jones
@ 2003-03-21  0:59     ` Andrew Morton
  2003-03-20 23:13       ` Dave Jones
  2003-03-21  9:14     ` reiserfs oops [2.5.65] Oleg Drokin
  2 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2003-03-21  0:59 UTC (permalink / raw)
  To: Dave Jones; +Cc: green, linux-kernel

Dave Jones <davej@codemonkey.org.uk> wrote:
>
> There's lots of "slab error in cache_alloc_debugcheck_after()"
> warnings. cache reiser_inode_cache memory after object was overwritten
> 
> Some call traces.
>  check_poison_obj <- kmem_cache_alloc <- reierfs_alloc_inode <-
>  reiserfs_alloc_inode <- alloc_inode <- get_new_inode <-
>  reiserfs_init_locked <- reiserfs_find_actor <- reiserfs_iget <-
>  reiserfs_find_actor <- reiserfs_init_locked_inode <- reiserfs_lookup <-
>  real_lookup <- do_lookup <- link_path_walk <- kmem_cache_alloc <-
>  __user_walk <- vfs_lstat <- sys_lstat64 <- syscall_call
> 
> Slab corruption: start=c70c7044, expend=c70c7213 problemat=c70c7044
> Last user: [<c0280dcb>](reiserfs_alloc_inode+0x1b/0x30)
> Data: (lots of hex)

Alas, the "(lots of hex)" is important - it lets us determine which member of
struct reiserfs_inode was actually altered.

> I'll give that box a run of memtest to rule out memory corruption
> problems. I'll also hook up a serial terminal tonight to catch tomorrow
> nights 'activity' in full 8-)

Good, thanks.

It would be nice if we had a more robust way of capturing all this info,
especially the oops-while-running-X lossage.  Dump-to-floppy or something.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20 13:24   ` Dave Jones
  2003-03-20 14:03     ` Dave Jones
  2003-03-21  0:59     ` Andrew Morton
@ 2003-03-21  9:14     ` Oleg Drokin
  2003-03-21 11:54       ` Dave Jones
  2 siblings, 1 reply; 16+ messages in thread
From: Oleg Drokin @ 2003-03-21  9:14 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

Hello!

On Thu, Mar 20, 2003 at 01:24:09PM +0000, Dave Jones wrote:
>  > Hm, very interesting. Thank you.
>  > I've seen this once too, but on kernel patched with lots of unrelated and
>  > possibly memory corrupting stuff.
>  > I will look at it more closely.
>  > BTW, it oopsed not in find. Is your box SMP?
> Same box committed seppuku overnight, this time in a different way.

Hm, am I missing something?
So it died in the morning yesterday, but before that it died again? Or were those
two different nights? ;)

> There's lots of "slab error in cache_alloc_debugcheck_after()"
> warnings. cache reiser_inode_cache memory after object was overwritten

This second oops and first BUG you quoted indicate that internal slab structures
(I think second oops happened in the middle of list_del) were corrupted, not
the guarded data itself.
At least I think so.
Can I take a look at your .config?

Thank you.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-21  9:14     ` reiserfs oops [2.5.65] Oleg Drokin
@ 2003-03-21 11:54       ` Dave Jones
  2003-03-21 13:42         ` Oleg Drokin
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2003-03-21 11:54 UTC (permalink / raw)
  To: Oleg Drokin; +Cc: Linux Kernel

On Fri, Mar 21, 2003 at 12:14:55PM +0300, Oleg Drokin wrote:

 > >  > Hm, very interesting. Thank you.
 > >  > I've seen this once too, but on kernel patched with lots of unrelated and
 > >  > possibly memory corrupting stuff.
 > >  > I will look at it more closely.
 > >  > BTW, it oopsed not in find. Is your box SMP?
 > > Same box committed seppuku overnight, this time in a different way.
 > 
 > Hm, am I missing something?
 > So it died in the morning yesterday, but before that it died again? Or were those
 > two different nights? ;)

two different nights. (with a reboot between them), and again last
night. Seems to die each 6am when the cron jobs run.
if you want, I'll force them to run more often to chase this down / try
out debugging patches etc..
 
 > > There's lots of "slab error in cache_alloc_debugcheck_after()"
 > > warnings. cache reiser_inode_cache memory after object was overwritten
 > This second oops and first BUG you quoted indicate that internal slab structures
 > (I think second oops happened in the middle of list_del) were corrupted, not
 > the guarded data itself.
 > At least I think so.
 > Can I take a look at your .config?

http://www.codemonkey.org.uk/cruft/dotconfig

I'll rebuild that kernel with reiserfs extended checking on later.

		Dave


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-21 11:54       ` Dave Jones
@ 2003-03-21 13:42         ` Oleg Drokin
  0 siblings, 0 replies; 16+ messages in thread
From: Oleg Drokin @ 2003-03-21 13:42 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

Hello!

On Fri, Mar 21, 2003 at 11:54:47AM +0000, Dave Jones wrote:
>  > > There's lots of "slab error in cache_alloc_debugcheck_after()"
>  > > warnings. cache reiser_inode_cache memory after object was overwritten
>  > This second oops and first BUG you quoted indicate that internal slab structures
>  > (I think second oops happened in the middle of list_del) were corrupted, not
>  > the guarded data itself.
>  > At least I think so.
>  > Can I take a look at your .config?
> http://www.codemonkey.org.uk/cruft/dotconfig

I wonder if you can reproduce the problem with CONFIG_PREEMPT disabled.

Bye,
    Oleg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-20 23:34         ` Randy.Dunlap
@ 2003-03-22 19:36           ` Nicholas Wourms
  2003-03-25 16:15             ` Bill Davidsen
  2003-03-31 18:53             ` [announce] kmsgdump for 2.5.65/66 Randy.Dunlap
  0 siblings, 2 replies; 16+ messages in thread
From: Nicholas Wourms @ 2003-03-22 19:36 UTC (permalink / raw)
  To: linux-kernel

Randy.Dunlap wrote:
[SNIP]
> 
> I've done some 2.5.xyz work on kmsgdump (dump kernel messages to
> floppy).  I'll try to get back to it soon.
> 

Thank you!  That'd be a god-send for those of us w/o serial 
ports and who have very cramped hands from hand-copying 
panics :-D.  Frankly, I can't imagine why something a simple 
as this isn't in the kernel.  Technically, it isn't a 
debugger, so I don't think it violates Linus' "No Kernel 
Debuggers in the Kernel" rule.

Cheers,
Nicholas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: reiserfs oops [2.5.65]
  2003-03-22 19:36           ` Nicholas Wourms
@ 2003-03-25 16:15             ` Bill Davidsen
  2003-03-31 18:53             ` [announce] kmsgdump for 2.5.65/66 Randy.Dunlap
  1 sibling, 0 replies; 16+ messages in thread
From: Bill Davidsen @ 2003-03-25 16:15 UTC (permalink / raw)
  To: Nicholas Wourms; +Cc: linux-kernel

On Sat, 22 Mar 2003, Nicholas Wourms wrote:

> Randy.Dunlap wrote:
> [SNIP]
> > 
> > I've done some 2.5.xyz work on kmsgdump (dump kernel messages to
> > floppy).  I'll try to get back to it soon.
> > 
> 
> Thank you!  That'd be a god-send for those of us w/o serial 
> ports and who have very cramped hands from hand-copying 
> panics :-D.  Frankly, I can't imagine why something a simple 
> as this isn't in the kernel.  Technically, it isn't a 
> debugger, so I don't think it violates Linus' "No Kernel 
> Debuggers in the Kernel" rule.

I have the feeling from Linus' reaction to the "dump to hd partiton"
feature, that he is all developer with little professional (paid to de
what someone else wants) admin experience. I tried to explain that
machines many timezones away in a secured environment make it hard to read
a console (none), insert a floppy, or use a serial cable. I actually
looked at serial, it would take (a) a capital budget for the cable, (b) a
labor budget for a cleared consultant to plug it in, (c) approval of the
network group because it's a data cable, and (d) a security analysis of
the risks of connecting two secure machines.

And of course the identical machines I can touch don't have that failure
mode :-(

Anyway you are not likely to see that in the official kernel unless Linus
has a change of heart, but you can patch it in so that's not critical,
just one more issue if you have to justify Linux vs. AIX or Solaris in a
bid.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [announce] kmsgdump for 2.5.65/66
  2003-03-22 19:36           ` Nicholas Wourms
  2003-03-25 16:15             ` Bill Davidsen
@ 2003-03-31 18:53             ` Randy.Dunlap
  2003-03-31 19:17               ` Nicholas Wourms
  1 sibling, 1 reply; 16+ messages in thread
From: Randy.Dunlap @ 2003-03-31 18:53 UTC (permalink / raw)
  To: Nicholas Wourms; +Cc: linux-kernel

On Sat, 22 Mar 2003 14:36:07 -0500 Nicholas Wourms <nwourms@myrealbox.com> wrote:

| Randy.Dunlap wrote:
|
| > I've done some 2.5.xyz work on kmsgdump (dump kernel messages to
| > floppy).  I'll try to get back to it soon.
| > 
| 
| Thank you!  That'd be a god-send for those of us w/o serial 
| ports and who have very cramped hands from hand-copying 
| panics :-D.  Frankly, I can't imagine why something a simple 
| as this isn't in the kernel.  Technically, it isn't a 
| debugger, so I don't think it violates Linus' "No Kernel 
| Debuggers in the Kernel" rule.


so.....

kmsgdump for Linux 2.5.65/2.5.66
2003-03-31
version 0.4.5

kmsgdump home:  http://w.ods.org/tools/kmsgdump/ (Willy Tarreau)
my kmsgdump patches:  http://www.xenotime.net/linux/kmsgdump/

'kmsgdump-2565.diff' applies cleanly to Linux 2.5.65 and with
just a few patch offsets to Linux 2.5.66.

Tested on P4 UP and P4 SMP (Linux 2.5.65/66).

======================================================================

kmsgdump is a Linux kernel patch for x86 that can be used to dump the
contents of the kernel log buffer to a floppy disk or to a printer
(one that the system BIOS knows about) after a kernel panic happens.

======================================================================

caveats:

1.  This version of kmsgdump doesn't support kernel log buffers of more
than 60 KB (due to x86 real-mode segment addressing).  However, the
kernel log buffer is currently always a power of 2 (like 16 or 32 or
64 KB), so a 32 KB log buffer is the largest that is currently supported.

2.  The kmsgdump text-mode interface doesn't work with a USB-only keyboard
setup.  I had to add a PS/2 keyboard to my test system to use it.

======================================================================

TODO:

1.  Limit LOG_BUF_LEN to 60 KB then fix this limit by having a moving
    segment register value

###
--
~Randy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [announce] kmsgdump for 2.5.65/66
  2003-03-31 18:53             ` [announce] kmsgdump for 2.5.65/66 Randy.Dunlap
@ 2003-03-31 19:17               ` Nicholas Wourms
  2003-03-31 20:44                 ` Randy.Dunlap
  0 siblings, 1 reply; 16+ messages in thread
From: Nicholas Wourms @ 2003-03-31 19:17 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: linux-kernel

Randy.Dunlap wrote:
> 
> so.....
> 
> kmsgdump for Linux 2.5.65/2.5.66
> 2003-03-31
> version 0.4.5

Thanks for porting this to 2.5.66!

[SNIP]


> 2.  The kmsgdump text-mode interface doesn't work with a USB-only keyboard
> setup.  I had to add a PS/2 keyboard to my test system to use it.

Hey, it's better then the alternative ;-).  I, too, use a 
usb keyboard.  Perhaps some of the kdb usb code could be 
ripped off?

Cheers,
Nicholas


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [announce] kmsgdump for 2.5.65/66
  2003-03-31 19:17               ` Nicholas Wourms
@ 2003-03-31 20:44                 ` Randy.Dunlap
  0 siblings, 0 replies; 16+ messages in thread
From: Randy.Dunlap @ 2003-03-31 20:44 UTC (permalink / raw)
  To: Nicholas Wourms; +Cc: linux-kernel

On Mon, 31 Mar 2003 14:17:03 -0500 Nicholas Wourms <nwourms@myrealbox.com> wrote:

| Randy.Dunlap wrote:
| > 
| > so.....
| > 
| > kmsgdump for Linux 2.5.65/2.5.66
| > 2003-03-31
| > version 0.4.5
| 
| Thanks for porting this to 2.5.66!
| 
| [SNIP]
| 
| 
| > 2.  The kmsgdump text-mode interface doesn't work with a USB-only keyboard
| > setup.  I had to add a PS/2 keyboard to my test system to use it.
| 
| Hey, it's better then the alternative ;-).  I, too, use a 
| usb keyboard.  Perhaps some of the kdb usb code could be 
| ripped off?

Maybe.  I haven't looked at that code, although I am aware of kdb having it.
However, kmsgdump operates in real mode (currently), so this might need
to be real-mode USB drivers.  And then if a USB keyboard is supported,
someone will say, "how about supporting a USB floppy or USB printer?".  :(

--
~Randy

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2003-03-31 20:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-03-19 14:10 reiserfs oops [2.5.65] Dave Jones
2003-03-20  8:25 ` Oleg Drokin
2003-03-20 11:48   ` Dave Jones
2003-03-20 13:24   ` Dave Jones
2003-03-20 14:03     ` Dave Jones
2003-03-21  0:59     ` Andrew Morton
2003-03-20 23:13       ` Dave Jones
2003-03-20 23:34         ` Randy.Dunlap
2003-03-22 19:36           ` Nicholas Wourms
2003-03-25 16:15             ` Bill Davidsen
2003-03-31 18:53             ` [announce] kmsgdump for 2.5.65/66 Randy.Dunlap
2003-03-31 19:17               ` Nicholas Wourms
2003-03-31 20:44                 ` Randy.Dunlap
2003-03-21  9:14     ` reiserfs oops [2.5.65] Oleg Drokin
2003-03-21 11:54       ` Dave Jones
2003-03-21 13:42         ` Oleg Drokin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).