linux-kernel.vger.kernel.org archive mirror
* 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Christian Weiske @ 2006-09-23 15:56 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 7304 bytes --]

Hello,


I have a reproducible BUG on my server that occurs whenever disk usage
gets too high or too much swapping occurs (at least I think that is the
trigger). The box has one reiserfs filesystem of about 187GB; the disk is
attached to an Epia 5000 board through a Promise Ultra 100 PCI IDE
controller card.


Any hints about how to resolve this problem are very welcome.


The trace from the serial console:
-------------
Oops: 0002 [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c0112a54>]    Not tainted VLI
EFLAGS: 00010013   (2.6.18 #1)
EIP is at scheduler_tick+0x84/0x340
eax: 00000002   ebx: c7eec590   ecx: c5e960d5   edx: 4457222b
esi: c5e96100   edi: 0000002b   ebp: c7f43864   esp: c7f43850
ds: 007b   es: 007b   ss: 0068
Process  (pid: 6820, ti=c7f42000 task=c7eec590 task.ti=00000002)
Stack: 00000000 c7eec590 c7eec590 00000000 00000000 c7f438d0 c0120c83 c7f438d0
       00000000 c010597b 00000000 c04fbe00 c013d785 00000000 00000000 c7f438d0
       c056ea00 00000000 c04fbe00 c7f438d0 c013d833 00000000 c7f438d0 c04fbe00
Call Trace:
 [<c0120c83>] update_process_times+0x33/0x80
 [<c010597b>] timer_interrupt+0x3b/0x70
 [<c013d785>] handle_IRQ_event+0x35/0x70
 [<c013d833>] __do_IRQ+0x73/0x100
 [<c01047f5>] do_IRQ+0x25/0x50
 [<c0102e7a>] common_interrupt+0x1a/0x20
 [<c028300e>] _mmx_memcpy+0x6e/0x180
 [<c01b69f6>] leaf_copy_items+0x36/0x100
 [<c0282f1c>] memcpy+0x3c/0x50
 [<c0282f88>] memmove+0x38/0x50
 [<c01b72c5>] leaf_paste_in_buffer+0xa5/0x340
 [<c019fc4c>] balance_leaf+0x2cc/0x2e10
 [<c01af706>] get_parents+0x106/0x1a0
 [<c01a2ac1>] do_balance+0x61/0xf0
 [<c01b0d41>] wait_tb_buffers_until_unlocked+0x211/0x280
 [<c01b0f46>] fix_nodes+0x196/0x3d0
 [<c01bd3b6>] reiserfs_paste_into_item+0x196/0x1c0
 [<c01ab701>] reiserfs_allocate_blocks_for_region+0x971/0x13c0
 [<c01baea4>] search_for_position_by_key+0x134/0x330
 [<c013f6a6>] add_to_page_cache+0x46/0xc0
 [<c0162f92>] alloc_buffer_head+0x12/0x50
 [<c0160385>] alloc_page_buffers+0x65/0xc0
 [<c01a5606>] make_cpu_key+0x36/0x40
 [<c01b9b16>] pathrelse+0x26/0x40
 [<c01ad7a4>] reiserfs_file_write+0x694/0x720
 [<c01404f6>] __generic_file_aio_read+0x196/0x210
 [<c0140280>] file_read_actor+0x0/0xe0
 [<c012039c>] change_clocksource+0xc/0x140
 [<c0120b4d>] update_wall_time+0x18d/0x290
 [<c012b0c0>] autoremove_wake_function+0x0/0x40
 [<c0112c65>] scheduler_tick+0x295/0x340
 [<c015e254>] vfs_write+0x84/0x150
 [<c015e3cd>] sys_write+0x3d/0x70
 [<c0102c17>] syscall_call+0x7/0xb
Code: da 8b 5d f0 01 4b 50 11 53 54 39 1d 04 5d 5a c0 89 35 f8 5c 5a c0 89 3d fc 5c 5a c0 74 12 a1 0c 5d 5a c0 39 43 30 74 1f 8b 43 04 <0f> ba 68 08 03 8d 65 f4 5b 5e 5f 5d c3 eb 0d 90 90 90 90 90 90
EIP: [<c0112a54>] scheduler_tick+0x84/0x340 SS:ESP 0068:c7f43850
 <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000a
 printing eip:
c01123b2
*pde = 00000000
Oops: 0002 [#2]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c01123b2>]    Not tainted VLI
EFLAGS: 00010097   (2.6.18 #1)
EIP is at try_to_wake_up+0x52/0xb0
eax: 00000002   ebx: cf79fa90   ecx: cf79fab8   edx: c7eec590
esi: c05a5ce0   edi: 00000000   ebp: c7f436c8   esp: c7f436b8
ds: 007b   es: 007b   ss: 0068
Process  (pid: 6820, ti=c7f42000 task=c7eec590 task.ti=00000002)
Stack: 00000012 00000000 c04fbfcc 00000001 c7f436ec c0112d66 cf79fa90 00000001
       00000000 00000000 c7f42000 00000000 00000012 c7f43714 c0112dc2 c04fbfcc
       00000001 00000001 00000000 00000000 000031f8 00000046 000031f8 fffff5d8
Call Trace:
 [<c0112d66>] __wake_up_common+0x36/0x70
 [<c0112dc2>] __wake_up+0x22/0x50
 [<c011786a>] release_console_sem+0xda/0x100
 [<c01175af>] vprintk+0x18f/0x2b0
 [<c01176b9>] vprintk+0x299/0x2b0
 [<c010323d>] show_stack_log_lvl+0x8d/0xb0
 [<c0112a68>] scheduler_tick+0x98/0x340
 [<c011740f>] printk+0xf/0x20
 [<c010ded3>] bust_spinlocks+0x43/0x50
 [<c0103575>] die+0x85/0x210
 [<c010e1c0>] do_page_fault+0x0/0x570
 [<c010e490>] do_page_fault+0x2d0/0x570
 [<c0112d66>] __wake_up_common+0x36/0x70
 [<c010e1c0>] do_page_fault+0x0/0x570
 [<c0102ec9>] error_code+0x39/0x40
 [<c0112a54>] scheduler_tick+0x84/0x340
 [<c0120c83>] update_process_times+0x33/0x80
 [<c010597b>] timer_interrupt+0x3b/0x70
 [<c013d785>] handle_IRQ_event+0x35/0x70
 [<c013d833>] __do_IRQ+0x73/0x100
 [<c01047f5>] do_IRQ+0x25/0x50
 [<c0102e7a>] common_interrupt+0x1a/0x20
 [<c028300e>] _mmx_memcpy+0x6e/0x180
 [<c01b69f6>] leaf_copy_items+0x36/0x100
 [<c0282f1c>] memcpy+0x3c/0x50
 [<c0282f88>] memmove+0x38/0x50
 [<c01b72c5>] leaf_paste_in_buffer+0xa5/0x340
 [<c019fc4c>] balance_leaf+0x2cc/0x2e10
 [<c01af706>] get_parents+0x106/0x1a0
 [<c01a2ac1>] do_balance+0x61/0xf0
 [<c01b0d41>] wait_tb_buffers_until_unlocked+0x211/0x280
 [<c01b0f46>] fix_nodes+0x196/0x3d0
 [<c01bd3b6>] reiserfs_paste_into_item+0x196/0x1c0
 [<c01ab701>] reiserfs_allocate_blocks_for_region+0x971/0x13c0
 [<c01baea4>] search_for_position_by_key+0x134/0x330
 [<c013f6a6>] add_to_page_cache+0x46/0xc0
 [<c0162f92>] alloc_buffer_head+0x12/0x50
 [<c0160385>] alloc_page_buffers+0x65/0xc0
 [<c01a5606>] make_cpu_key+0x36/0x40
 [<c01b9b16>] pathrelse+0x26/0x40
 [<c01ad7a4>] reiserfs_file_write+0x694/0x720
 [<c01404f6>] __generic_file_aio_read+0x196/0x210
 [<c0140280>] file_read_actor+0x0/0xe0
 [<c012039c>] change_clocksource+0xc/0x140
 [<c0120b4d>] update_wall_time+0x18d/0x290
 [<c012b0c0>] autoremove_wake_function+0x0/0x40
 [<c0112c65>] scheduler_tick+0x295/0x340
 [<c015e254>] vfs_write+0x84/0x150
 [<c015e3cd>] sys_write+0x3d/0x70
 [<c0102c17>] syscall_call+0x7/0xb
Code: 3d 83 f8 02 74 63 a8 40 75 62 6a 01 56 53 e8 f6 fe ff ff 8b 45 10 83 c4 0c 85 c0 75 1c 8b 56 20 8b 42 1c 39 43 1c 7d 11 8b 42 04 <0f> ba 68 08 03 89 f6 8d bc 27 00 00 00 00 bf 01 00 00 00 c7 03
EIP: [<c01123b2>] try_to_wake_up+0x52/0xb0 SS:ESP 0068:c7f436b8
 <0>Kernel panic - not syncing: Fatal exception in interrupt
-------------


# cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 7
model name      : VIA Samuel 2
stepping        : 3
cpu MHz         : 533.373
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge mmx 3dnow
bogomips        : 1068.09


# ./scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux dojo 2.6.18 #1 PREEMPT Sat Sep 23 16:24:51 Local time zone must be
set--see  i686 VIA Samuel 2 GNU/Linux

Gnu C                  3.4.6
Gnu make               3.80
binutils               2.16.1
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.1
e2fsprogs              1.38
reiserfsprogs          3.6.19
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.6
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.94
udev                   087
Modules Loaded




Please CC me as I am not subscribed.

-- 
Regards/MfG,
Christian Weiske


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Ingo Molnar @ 2006-09-23 20:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christian Weiske, linux-kernel, reiserfs-dev, Nick Piggin


* Andrew Morton <akpm@osdl.org> wrote:

> > EIP is at scheduler_tick+0x84/0x340
> > eax: 00000002   ebx: c7eec590   ecx: c5e960d5   edx: 4457222b
> > esi: c5e96100   edi: 0000002b   ebp: c7f43864   esp: c7f43850

hm, edx looks quite ASCII-ish:

  +"WD

which could suggest some hw problem or memory scribble. (or not)
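Ingo's decoding can be reproduced mechanically. i386 is little-endian, so edx = 4457222b sits in memory as the bytes 2b 22 57 44, which are the ASCII characters '+', '"', 'W' and 'D'. A minimal C sketch of that conversion follows; the helper name reg_to_ascii is hypothetical and not part of any kernel tool.

```c
#include <ctype.h>
#include <stdint.h>

/* Render the four bytes of a 32-bit i386 register in memory order
 * (little-endian, lowest byte first). Non-printable bytes become '.'.
 * out must hold at least 5 chars: 4 bytes plus the terminating NUL. */
static void reg_to_ascii(uint32_t reg, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((reg >> (8 * i)) & 0xffu);
        out[i] = isprint(b) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

Calling reg_to_ascii(0x4457222b, buf) leaves buf holding +"WD. Printable text in a register that normally holds a pointer or a counter is a hint, not proof, that string data was written over kernel memory.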

	Ingo


* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Andrew Morton @ 2006-09-23 20:42 UTC (permalink / raw)
  To: Christian Weiske; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin


cc's added.  This looks quite serious.

On Sat, 23 Sep 2006 17:56:05 +0200
Christian Weiske <cweiske@cweiske.de> wrote:

> Hello,
> 
> 
> I have a reproducible BUG on my server that occurs whenever disk usage
> gets too high / too much swapping occurs (at least I think that is). The
> box has one reiserfs filesystem of about 187GB size, the disk is on an
> Epia 5000 board, between them is a Promise Ultra 100 PCI IDE controller
> card.
> 

Do you think this bug is due to the 2.6.18 upgrade?

Have you run fsck across the filesystem(s)?

Does the oops always look the same as this one?

Please turn on the various CONFIG_DEBUG_* options, see if that turns up
anything.

It would be interesting to find out if enabling CONFIG_4KSTACKS makes this
go away (although I'm not sure why).

This looks more like a bug in the CPU scheduler than in the filesystem.

p->thread_info is NULL in scheduler_tick()'s first call to
set_tsk_need_resched(), at line 3008.
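This diagnosis can be checked against the Code: line. The bytes at the faulting EIP, <0f> ba 68 08 03, decode as bts $3, 8(%eax): set bit 3 of the longword 8 bytes past the address in eax. The preceding 8b 43 04 (mov 4(%ebx),%eax, with ebx holding the task pointer c7eec590) loads that address, and eax is 00000002, the same value the oops prints as task.ti, so the write lands at 0x0000000a, the address in the BUG line. The second oops' Code: line shows the same 0f ba 68 08 03 bytes with eax again 00000002. The C sketch below reproduces the decoding. It assumes that i386 struct thread_info in 2.6.18 begins with task, exec_domain and flags (putting flags at offset 8) and that TIF_NEED_RESCHED is bit 3; ti32_model, bt_insn and the functions are hypothetical names, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of the first fields of i386 struct thread_info
 * in 2.6.18 (assumed order: task, exec_domain, flags). Pointers are modelled
 * as uint32_t so the offsets match a 32-bit build on any host. */
struct ti32_model {
    uint32_t task;        /* struct task_struct * */
    uint32_t exec_domain; /* struct exec_domain * */
    uint32_t flags;       /* TIF_* bits; TIF_NEED_RESCHED assumed to be bit 3 */
};

/* A decoded "0f ba /reg ib" instruction (x86 group-8 bit-test family). */
struct bt_insn {
    int op;       /* ModRM.reg: 4 = bt, 5 = bts, 6 = btr, 7 = btc */
    int base_reg; /* ModRM.rm: 0 = eax, 3 = ebx, ... */
    int8_t disp;  /* signed 8-bit displacement */
    uint8_t bit;  /* immediate bit number */
};

/* Decode exactly 0f ba modrm disp8 imm8, i.e. the [reg + disp8] form.
 * Returns 0 on success, -1 for anything this sketch does not handle. */
static int decode_bt_disp8(const uint8_t b[5], struct bt_insn *out)
{
    if (b[0] != 0x0f || b[1] != 0xba)
        return -1;
    uint8_t modrm = b[2];
    if ((modrm >> 6) != 1)   /* mod must be 01: base register + disp8 */
        return -1;
    if ((modrm & 7) == 4)    /* rm == 100 means a SIB byte follows */
        return -1;
    out->op = (modrm >> 3) & 7;
    out->base_reg = modrm & 7;
    out->disp = (int8_t)b[3];
    out->bit = b[4];
    return 0;
}

/* Address the instruction writes to, given the base register's value. */
static uint32_t effective_addr(const struct bt_insn *in, uint32_t base)
{
    return base + (uint32_t)(int32_t)in->disp;
}
```

Feeding it 0f ba 68 08 03 gives op 5 (bts), base register eax, displacement 8 and bit 3; with eax = 0x00000002 the effective address is 0x0000000a, and offsetof(struct ti32_model, flags) is 8. Under these assumptions the faulting write is a set of TIF_NEED_RESCHED through a thread_info pointer holding the small bogus value 2, which the fault handler reports as a NULL pointer dereference.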

Thanks.



* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Christian Weiske @ 2006-09-24  9:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: reiserfs-dev, Ingo Molnar, Nick Piggin


[-- Attachment #1.1: Type: text/plain, Size: 1652 bytes --]

Andrew,


>> I have a reproducible BUG on my server that occurs whenever disk usage
>> gets too high / too much swapping occurs (at least I think that is). The
>> box has one reiserfs filesystem of about 187GB size, the disk is on an
>> Epia 5000 board, between them is a Promise Ultra 100 PCI IDE controller
>> card.
> Do you think this bug is due to the 2.6.18 upgrade?

No. I already had it in 2.6.17.6.

> Have you run fsck across the filesystem(s)?
Running fsck at boot turns up nothing beyond this:
  ReiserFS: hde3: checking transaction log (hde3)
  ReiserFS: hde3: replayed 22 transactions in 0 seconds
  ReiserFS: hde3: Using r5 hash to sort names

> Does the oops always look the same as this one?
No, not exactly the same. I attach three log files; if you diff them,
about 30% of the lines differ.

One thing I have to note is that the second Oops appears about 10
seconds after the first one.

> Please turn on the various CONFIG_DEBUG_* options, see if that turns up
> anything.
That indeed turns up something. The debug messages indicate that a java
process tries to take a lock and gets stuck. Note that the messages up to
"slab corruption" are printed first, and the others a minute or two
later.

I can still ping the box and use it normally until the slab corruption
occurs (hence the later messages a minute or so afterwards).


> It would be interesting to find out if enabling CONFIG_4KSTACKS makes this
> go away (although I'm not sure why).
I haven't tried this yet, but will.

I put the logs in a tar.bz2 because I didn't want to flood the list with
a 200k message.

-- 
Regards/MfG,
Christian Weiske

[-- Attachment #1.2: dojo kernelpanic + debug.tar.bz2 --]
[-- Type: application/octet-stream, Size: 8522 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Christian Weiske @ 2006-09-24  9:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: reiserfs-dev

[-- Attachment #1: Type: text/plain, Size: 250 bytes --]

> I put the logs in a tar.bz2 because I didn't want to flood the list with
> a 200k message.

In case the bz2 didn't make it through the list:
http://xml.cweiske.de/dojo%20kernelpanic%20+%20debug.tar.bz2

-- 
Regards/MfG,
Christian Weiske


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Andrew Morton @ 2006-09-24 10:19 UTC (permalink / raw)
  To: Christian Weiske, netdev
  Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

On Sun, 24 Sep 2006 11:11:02 +0200
Christian Weiske <cweiske@cweiske.de> wrote:

> Andrew,
> 

You keep on losing Cc:s.  Please preserve them all with care when replying.


OK, you have crashes in the scheduler and one crash when accessing a
reiserfs structure.

You have tcp_v6 lockdep warnings.  They're in
http://xml.cweiske.de/dojo%20kernelpanic%20+%20debug.tar.bz2 if anyone is
keen.  (I've largely lost interest in lockdep warnings - many of them are
false positives and require make-lockdep-shut-up patches.)
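Roughly, the "possible recursive locking detected" report (see the java/6750 trace in the 4K-stack log later in this thread) fires because lockdep tracks locks by class rather than by instance: sk_clone() appears to lock the freshly cloned socket while tcp_v6_rcv() still holds the listening socket's lock, and both locks belong to the class slock-AF_INET6. The C sketch below is a toy model of that rule only, not lockdep itself; held_lock, task_locks and acquire() are hypothetical names. The kernel does offer a comparable annotation for legitimate nesting (spin_lock_nested() with a subclass).

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 8

/* Toy model of lockdep-style tracking: a held lock is identified by its
 * class (shared by every instance, e.g. every AF_INET6 socket's slock) and
 * a nesting subclass. The lock instance itself is deliberately ignored. */
struct held_lock {
    const char *lock_class; /* compared by pointer identity, like a lockdep key */
    int subclass;
};

struct task_locks {
    struct held_lock held[MAX_HELD];
    size_t depth;
};

/* Record an acquisition. Returns false (a "possible recursive locking"
 * report) if the same class and subclass are already held, even when the
 * caller is locking a different instance. Overflow is ignored in this toy. */
static bool acquire(struct task_locks *t, const char *lock_class, int subclass)
{
    for (size_t i = 0; i < t->depth; i++) {
        if (t->held[i].lock_class == lock_class &&
            t->held[i].subclass == subclass)
            return false;
    }
    if (t->depth < MAX_HELD) {
        t->held[t->depth].lock_class = lock_class;
        t->held[t->depth].subclass = subclass;
        t->depth++;
    }
    return true;
}
```

In this model a second slock-AF_INET6 acquisition with subclass 0 is flagged, while the same acquisition with subclass 1 (the analogue of a nesting annotation) passes. Whether the real warning is a false positive is exactly the judgement Andrew describes above.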

You have what claims to be a netfilter-related memory corruption:

Slab corruption: start=c608a42c, len=172
Redzone: 0x6b6b6b6b/0xc0411958.
Last user: [<170fc2a5>](0x170fc2a5)
0a0: 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c608a2c1, len=172
Redzone: 0xec0410f/0x1170fc2.
Last user: [<30000000>](0x30000000)
000: 00 00 00 10 a2 08 c6 a8 a5 08 c6 46 3a 00 00 10
010: 10 41 c0 bc a2 08 c6 20 d3 60 c0 00 00 00 00 00
slab error in cache_alloc_debugcheck_after(): cache `ip_conntrack': double freen
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160c11>] __slab_error+0x21/0x30
 [<c0162ca1>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c0162ffb>] kmem_cache_alloc+0x6b/0xc0
 [<c041184c>] ip_conntrack_alloc+0x3c/0x130
 [<c041198a>] init_conntrack+0x2a/0x110
 [<c0411c4e>] ip_conntrack_in+0x1de/0x230
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000008
 printing eip:


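The slab-debug dumps are easier to read once the magic values are known. Assuming the SLAB debugging constants of kernels from this era (POISON_FREE 0x6b fills a freed object, POISON_END 0xa5 marks its last byte, POISON_INUSE 0x5a fills a fresh one, and the 32-bit red-zone words are RED_ACTIVE 0x170fc2a5 and RED_INACTIVE 0x5a2cf071), the row "0a0: 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a" is free-poison followed by an end marker and then the bytes of RED_INACTIVE in little-endian order, while "Last user: [<170fc2a5>]" shows RED_ACTIVE where a caller address is expected. The small classifier below (all names hypothetical) only illustrates how to read such a row; it cannot tell which code path did the double free.

```c
#include <stddef.h>
#include <stdint.h>

/* Slab-debug magic values (assumed, per 2.6-era SLAB debugging). */
#define POISON_INUSE 0x5au
#define POISON_FREE  0x6bu
#define POISON_END   0xa5u
#define RED_INACTIVE 0x5a2cf071u
#define RED_ACTIVE   0x170fc2a5u

enum tag { TAG_OTHER, TAG_INUSE, TAG_FREE, TAG_END, TAG_RED_INACTIVE, TAG_RED_ACTIVE };

/* Read a little-endian 32-bit word (i386 byte order) from p. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Tag each byte of a dumped row. A 4-byte run that matches a red-zone
 * word is tagged as a whole; otherwise single poison bytes are tagged. */
static void classify_row(const uint8_t *row, size_t n, enum tag *tags)
{
    size_t i = 0;
    while (i < n) {
        if (i + 4 <= n) {
            uint32_t w = le32(row + i);
            if (w == RED_INACTIVE || w == RED_ACTIVE) {
                enum tag t = (w == RED_INACTIVE) ? TAG_RED_INACTIVE : TAG_RED_ACTIVE;
                for (size_t k = 0; k < 4; k++)
                    tags[i + k] = t;
                i += 4;
                continue;
            }
        }
        switch (row[i]) {
        case POISON_INUSE: tags[i] = TAG_INUSE; break;
        case POISON_FREE:  tags[i] = TAG_FREE;  break;
        case POISON_END:   tags[i] = TAG_END;   break;
        default:           tags[i] = TAG_OTHER; break;
        }
        i++;
    }
}
```

Applied to the 0a0: row above, bytes 0 to 6 come out as free poison, byte 7 as the end marker and bytes 8 to 11 as RED_INACTIVE. Red-zone magic inside the dumped range, together with 0x6b6b6b6b where the first red-zone word should be, is consistent with the "double free" complaint from cache_alloc_debugcheck_after().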

And another in what appears to be core ipv4:

Slab corruption: start=c3aff608, len=240
Redzone: 0x6b6b6b6b/0x0.
Last user: [<170fc2a5>](0x170fc2a5)
0e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c3aff48f, len=240
Redzone: 0x6b6b6b6b/0x6b6b6b6b.
Last user: [<6b6b6b6b>](0x6b6b6b6b)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
slab error in cache_alloc_debugcheck_after(): cache `ip_dst_cache': double freen
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160c11>] __slab_error+0x21/0x30
 [<c0162ca1>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c0162ffb>] kmem_cache_alloc+0x6b/0xc0
 [<c03ca3c4>] dst_alloc+0x24/0x90
 [<c03da865>] ip_route_input_slow+0x295/0x8c0
 [<c03daf92>] ip_route_input+0x102/0x1d0
 [<c03dd29a>] ip_rcv+0x27a/0x440
 [<c03c6d41>] netif_receive_skb+0x1b1/0x1f0
 [<c03c6e10>] process_backlog+0x90/0x120
 [<c03c6f0d>] net_rx_action+0x6d/0x100
 [<c011d4af>] __do_softirq+0x6f/0x100
 [<c011d59f>] do_softirq+0x5f/0x70
 [<c011d603>] irq_exit+0x53/0x60
 [<c0104c28>] do_IRQ+0x38/0x70
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e19b>] memcpy+0x3b/0x50
 [<c028e208>] memmove+0x38/0x50
 [<c01bf85d>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a862c>] balance_leaf+0x24c/0x27d0
 [<c01aaee0>] do_balance+0x60/0xf0
 [<c01c56e4>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3ab5>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5b2c>] reiserfs_file_write+0x72c/0x7c0
 [<c0166768>] vfs_write+0x88/0x170
 [<c01668fc>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb


And another networking-related scribble:


Slab corruption: start=c64159ec, len=156
Redzone: 0x6b6b6b6b/0xc03c048a.
Last user: [<170fc2a5>](0x170fc2a5)
090: 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c64158ec, len=156
Redzone: 0x6b6b6b6b/0x6b6b6b6b.
Last user: [<6b6b6b6b>](0x6b6b6b6b)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
slab error in cache_alloc_debugcheck_after(): cache `skbuff_head_cache': doublen
BUG: unable to handle kernel paging request at virtual address b2724e87
 printing eip:
c028bf49
*pde = 00000000
Oops: 0000 [#1]


A lot of your oopses seem to point at the hrtimer code:


BUG: unable to handle kernel paging request at virtual address b2724e87
 printing eip:
c028bf49
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c028bf49>]    Not tainted VLI
EFLAGS: 00010086   (2.6.18 #2)
EIP is at __rb_erase_color+0x59/0x180
eax: b2724e87   ebx: c69c1f64   ecx: cdbcdf64   edx: 00000000
esi: c69c1f64   edi: c05137d8   ebp: c6405390   esp: c6405384
ds: 007b   es: 007b   ss: 0068
Process Øc.{.J.. (pid: 0, ti=c6404000 task=c0511b40 task.ti=c01174d0)
Stack: c69c1f64 00000000 c05137d8 c64053b4 c028c17b 00000000 c69c1f64 c0513804
       00000001 cdbcdf64 c05137d8 c05137d8 c64053cc c012f4ba cdbcdf64 c0513804
       c012f7f0 cdbcdf64 c64053f0 c012f777 cdbcdf64 c05137d8 c05137dc 00000001
Call Trace:
 [<c010354e>] show_stack_log_lvl+0x8e/0xb0
 [<c010370a>] show_registers+0x14a/0x1d0
 [<c0103987>] die+0x167/0x210
 [<c010ed13>] do_page_fault+0x173/0x580
 [<c0103199>] error_code+0x39/0x40
 [<c028c17b>] rb_erase+0x10b/0x140
 [<c012f4ba>] __remove_hrtimer+0x1a/0x40
 [<c012f777>] hrtimer_run_queues+0x77/0xf0
 [<c0121f46>] run_timer_softirq+0x16/0x1a0
 [<c011d4af>] __do_softirq+0x6f/0x100
 [<c011d59f>] do_softirq+0x5f/0x70



What a mess.



* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
From: Christian Weiske @ 2006-09-24 12:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin


[-- Attachment #1.1: Type: text/plain, Size: 527 bytes --]

Andrew,

> It would be interesting to find out if enabling CONFIG_4KSTACKS makes this
> go away (although I'm not sure why).
So, here are the results from the 4K runs:

Besides one Oops message, I got a "kernel BUG at mm/slab.c:2747!" in log
#1. Call traces as usual.

Logs #2 and #3 show something odd: the machine simply rebooted. Log #2
also has some oversized Ethernet frames before the reboot.

Sorry about dropping the CC; I thought you were subscribed to lkml and
removed you.

-- 
Regards/MfG,
Christian Weiske

[-- Attachment #1.2: dojo kernelpanic debug 4k 1.log --]
[-- Type: text/plain, Size: 7572 bytes --]

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
java/6750 is trying to acquire lock:
 (slock-AF_INET6){-+..}, at: [<c03be6f4>] sk_clone+0xf4/0x310

but task is already holding lock:
 (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

other info that might help us debug this:
1 lock held by java/6750:
 #0:  (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

stack backtrace:
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0131454>] print_deadlock_bug+0xa4/0xb0
 [<c01314ca>] check_deadlock+0x6a/0x80
 [<c0132cf7>] __lock_acquire+0x4f7/0x950
 [<c01337cd>] lock_acquire+0x5d/0x80
 [<c0483415>] _spin_lock+0x25/0x30
 [<c03be6f4>] sk_clone+0xf4/0x310
 [<c03e6b31>] inet_csk_clone+0x11/0x70
 [<c03fb3c5>] tcp_create_openreq_child+0x15/0x3e0
 [<c04442c2>] tcp_v6_syn_recv_sock+0x142/0x610
 [<c03fb8a9>] tcp_check_req+0x119/0x420
 [<c0443d75>] tcp_v6_hnd_req+0x45/0x130
 [<c0444af7>] tcp_v6_do_rcv+0x247/0x2b0
 [<c0445136>] tcp_v6_rcv+0x5d6/0x6f0
 [<c04272df>] ip6_input+0x16f/0x340
 [<c0427004>] ipv6_rcv+0x114/0x280
 [<c03c6eb1>] netif_receive_skb+0x1b1/0x1f0
 [<c03c6f80>] process_backlog+0x90/0x120
 [<c03c707d>] net_rx_action+0x6d/0x100
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d5d9>] local_bh_enable_ip+0xb9/0x100
 [<c0483661>] _spin_unlock_bh+0x31/0x40
 [<c03bf750>] release_sock+0x50/0xb0
 [<c0405647>] inet_wait_for_connect+0x67/0xd0
 [<c0405748>] inet_stream_connect+0x98/0x1d0
 [<c03bc6d7>] sys_connect+0x67/0xa0
 [<c03bd1c6>] sys_socketcall+0xc6/0x1e0
 [<c0102e77>] syscall_call+0x7/0xb
Slab corruption: start=c62fae5c, len=172
Redzone: 0x6b6b6b6b/0xc0411ac8.
Last user: [<170fc2a5>](0x170fc2a5)
0a0: 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c62facf8, len=172
Redzone: 0xc0d36b48/0xc04110a0.
Last user: [<0000000e>](0xe)
000: 90 6a d3 c0 f3 81 01 00 80 11 41 c0 ec ac 2f c6
010: e0 1c 61 c0 00 00 00 00 00 00 00 00 33 02 00 00
slab error in cache_alloc_debugcheck_after(): cache `ip_conntrack': double freen
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160d81>] __slab_error+0x21/0x30
 [<c0162e11>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c016316b>] kmem_cache_alloc+0x6b/0xc0
 [<c04119bc>] ip_conntrack_alloc+0x3c/0x130
 [<c0411afa>] init_conntrack+0x2a/0x110
 [<c0411dbe>] ip_conntrack_in+0x1de/0x230
 [<c03d7707>] nf_iterate+0x57/0xa0
 [<c03d77a6>] nf_hook_slow+0x56/0xe0
 [<c03dd3c9>] ip_rcv+0x239/0x440
 [<c03c6eb1>] netif_receive_skb+0x1b1/0x1f0
 [<c03c6f80>] process_backlog+0x90/0x120
 [<c03c707d>] net_rx_action+0x6d/0x100
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
c62fae58: redzone 1:0x6b6b6b6b, redzone 2:0xc0411ac8
------------[ cut here ]------------
kernel BUG at mm/slab.c:2747!
invalid opcode: 0000 [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c01629d1>]    Not tainted VLI
EFLAGS: 00010087   (2.6.18 #3)
EIP is at cache_free_debugcheck+0x241/0x250
eax: 0113bcc5   ebx: 00010c00   ecx: 000000b8   edx: cf660500
esi: 00000014   edi: c62fae58   ebp: c05f7f70   esp: c05f7f5c
ds: 007b   es: 007b   ss: 0068
Process java (pid: 6848, ti=c05f7000 task=c6934b00 task.ti=c69e6000)
Stack: 0113bcc5 c62fa040 c13dc7d8 c62fae5c cf660500 c05f7f94 c0163581 cf660500
       c62fae5c c0411ac8 00000246 c62fae5c c69ad904 00000009 c05f7fa4 c0411ac8
       cf660500 c62fae5c c05f7fb4 c0411131 c62fae5c cd8acb30 c05f7fc8 c03c06b4
Call Trace:
 [<c010354e>] show_stack_log_lvl+0x8e/0xb0
 [<c010370a>] show_registers+0x14a/0x1d0
 [<c0103987>] die+0x167/0x210
 [<c0103aac>] do_trap+0x7c/0xc0
 [<c0103d40>] do_invalid_op+0x90/0xa0
 [<c0103199>] error_code+0x39/0x40
 [<c0163581>] kmem_cache_free+0x61/0xf0
 [<c0411ac8>] ip_conntrack_free+0x18/0x20
 [<c0411131>] destroy_conntrack+0x91/0xe0
 [<c03c06b4>] __kfree_skb+0x74/0xf0
 [<c03c6c36>] net_tx_action+0x56/0x120
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
Code: 47 ff ff ff e9 68 ff ff ff 0f 0b 60 02 cd e6 4a c0 e9 1b fe ff ff 8b 52 0
EIP: [<c01629d1>] cache_free_debugcheck+0x241/0x250 SS:ESP 0068:c05f7f5c
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 <3>Slab corruption: start=c62a8d58, len=2048
Redzone: 0x6b6b6b6b/0xc03c0543.
Last user: [<170fc2a5>](0x170fc2a5)
7f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c62a8487, len=2048
Redzone: 0x0/0x5a5a5a5a.
Last user: [<5a5a5a5a>](0x5a5a5a5a)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
slab error in cache_alloc_debugcheck_after(): cache `size-2048': double free, on
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160d81>] __slab_error+0x21/0x30
 [<c0162e11>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c01634c8>] __kmalloc_track_caller+0xa8/0x100
 [<c03c029d>] __alloc_skb+0x4d/0x110
 [<c030438b>] rhine_rx+0x29b/0x490
 [<c0303db3>] rhine_interrupt+0x193/0x240
 [<c0144807>] handle_IRQ_event+0x27/0x70
 [<c01448d3>] __do_IRQ+0x83/0x110
 [<c0104c53>] do_IRQ+0x63/0xb0
 =======================
 [<c0103145>] common_interrupt+0x25/0x30
 [<c0103a21>] die+0x201/0x210
 [<c0103aac>] do_trap+0x7c/0xc0
 [<c0103d40>] do_invalid_op+0x90/0xa0
 [<c0103199>] error_code+0x39/0x40
 [<c0163581>] kmem_cache_free+0x61/0xf0
 [<c0411ac8>] ip_conntrack_free+0x18/0x20
 [<c0411131>] destroy_conntrack+0x91/0xe0
 [<c03c06b4>] __kfree_skb+0x74/0xf0
 [<c03c6c36>] net_tx_action+0x56/0x120
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
c62a8d54: redzone 1:0x6b6b6b6b, redzone 2:0xc03c0543
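
The "Slab corruption" dumps above are easier to read with the slab debug magic values at hand. This is a minimal sketch, assuming the constants from include/linux/poison.h of roughly this era: POISON_FREE 0x6b, POISON_END 0xa5, POISON_INUSE 0x5a, RED_INACTIVE 0x5a2cf071 and RED_ACTIVE 0x170fc2a5. The last two are the same values that appear as "Last user" in these logs. The sketch labels each dump byte and looks for redzone words stored in little-endian (i386) order. The helper names are hypothetical, and the sketch is only a reading aid, not a diagnosis.

```python
# Slab debug magic values, assumed from include/linux/poison.h of the
# 2.6.18 era; verify them against your own kernel tree.
POISON_INUSE = 0x5A        # object allocated, not yet initialised
POISON_FREE = 0x6B         # object freed
POISON_END = 0xA5          # last byte of a poisoned object
RED_INACTIVE = 0x5A2CF071  # redzone word of a free object
RED_ACTIVE = 0x170FC2A5    # redzone word of an allocated object

BYTE_NAMES = {POISON_INUSE: "INUSE", POISON_FREE: "FREE", POISON_END: "END"}
WORD_NAMES = {RED_INACTIVE: "RED_INACTIVE", RED_ACTIVE: "RED_ACTIVE"}


def parse_dump_line(line):
    """Split a dump line such as '0a0: 6b 6b a5' into (offset, byte list)."""
    offset_text, _, bytes_text = line.partition(":")
    return int(offset_text, 16), [int(b, 16) for b in bytes_text.split()]


def find_redzone_words(data):
    """Return (index, name) for each 4-byte window that, read as a
    little-endian word (i386 byte order), equals a redzone magic value."""
    hits = []
    for i in range(len(data) - 3):
        word = int.from_bytes(bytes(data[i:i + 4]), "little")
        if word in WORD_NAMES:
            hits.append((i, WORD_NAMES[word]))
    return hits


def annotate(line):
    """Label every byte naively, then look for embedded redzone words."""
    offset, data = parse_dump_line(line)
    labels = [BYTE_NAMES.get(b, "?") for b in data]
    return offset, labels, find_redzone_words(data)


if __name__ == "__main__":
    offset, labels, words = annotate("0a0: 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a")
    print(hex(offset), labels)
    print(words)
```

Run on the 0a0 line from the first dump, the sketch reports [(8, 'RED_INACTIVE')]. The last four bytes of that line, which fall inside the range the dump reports as the object, hold the inactive redzone value. The per-byte labels are naive, so the trailing 0x5a, which is part of that word, is also labelled INUSE.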
 

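The "double free" reports above come from the redzone words that slab debugging keeps on each side of an object. This is a simplified model, assuming the 2.6.18-era behaviour as I understand it. Allocation expects both words to hold RED_INACTIVE and sets them to RED_ACTIVE. Freeing expects RED_ACTIVE and sets them back. A mismatch is reported, and execution continues. The class and method names are hypothetical. The real object layout, poisoning and the last-user word are left out, and the report text is not the kernel's.

```python
RED_INACTIVE = 0x5A2CF071  # assumed redzone value of a free object
RED_ACTIVE = 0x170FC2A5    # assumed redzone value of an allocated object


class ToyDebugObject:
    """Hypothetical model of one debug-slab object: a redzone word on each
    side of the payload. Poisoning and the last-user word are omitted."""

    def __init__(self, size):
        self.redzone1 = RED_INACTIVE  # objects start out free
        self.payload = bytearray(size)
        self.redzone2 = RED_INACTIVE
        self.reports = []

    def _check_then_set(self, expected, new_value, where):
        # Report a mismatch but carry on, then stamp the new state.
        if self.redzone1 != expected or self.redzone2 != expected:
            self.reports.append(
                f"{where}: redzone mismatch "
                f"(redzone 1:0x{self.redzone1:x}, redzone 2:0x{self.redzone2:x})")
        self.redzone1 = self.redzone2 = new_value

    def alloc(self):
        self._check_then_set(RED_INACTIVE, RED_ACTIVE, "alloc")

    def free(self):
        self._check_then_set(RED_ACTIVE, RED_INACTIVE, "free")


if __name__ == "__main__":
    obj = ToyDebugObject(172)
    obj.alloc()
    obj.free()
    obj.free()          # second free finds RED_INACTIVE: one report
    print(obj.reports)
```

The same check fires for a genuine double free and for a redzone overwritten by a stray write. From the redzone values alone, the two cases look the same.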
[-- Attachment #1.3: dojo kernelpanic debug 4k 2.log --]
[-- Type: text/plain, Size: 3813 bytes --]

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
java/6736 is trying to acquire lock:
 (slock-AF_INET6){-+..}, at: [<c03be6f4>] sk_clone+0xf4/0x310

but task is already holding lock:
 (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

other info that might help us debug this:
1 lock held by java/6736:
 #0:  (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

stack backtrace:
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0131454>] print_deadlock_bug+0xa4/0xb0
 [<c01314ca>] check_deadlock+0x6a/0x80
 [<c0132cf7>] __lock_acquire+0x4f7/0x950
 [<c01337cd>] lock_acquire+0x5d/0x80
 [<c0483415>] _spin_lock+0x25/0x30
 [<c03be6f4>] sk_clone+0xf4/0x310
 [<c03e6b31>] inet_csk_clone+0x11/0x70
 [<c03fb3c5>] tcp_create_openreq_child+0x15/0x3e0
 [<c04442c2>] tcp_v6_syn_recv_sock+0x142/0x610
 [<c03fb8a9>] tcp_check_req+0x119/0x420
 [<c0443d75>] tcp_v6_hnd_req+0x45/0x130
 [<c0444af7>] tcp_v6_do_rcv+0x247/0x2b0
 [<c0445136>] tcp_v6_rcv+0x5d6/0x6f0
 [<c04272df>] ip6_input+0x16f/0x340
 [<c0427004>] ipv6_rcv+0x114/0x280
 [<c03c6eb1>] netif_receive_skb+0x1b1/0x1f0
 [<c03c6f80>] process_backlog+0x90/0x120
 [<c03c707d>] net_rx_action+0x6d/0x100
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d5d9>] local_bh_enable_ip+0xb9/0x100
 [<c0483661>] _spin_unlock_bh+0x31/0x40
 [<c03bf750>] release_sock+0x50/0xb0
 [<c0405647>] inet_wait_for_connect+0x67/0xd0
 [<c0405748>] inet_stream_connect+0x98/0x1d0
 [<c03bc6d7>] sys_connect+0x67/0xa0
 [<c03bd1c6>] sys_socketcall+0xc6/0x1e0
 [<c0102e77>] syscall_call+0x7/0xb
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0xb length 0 sta!
eth0: Oversized Ethernet frame cd4810b0 vs cd4810b0.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0xc length 0 sta!
eth0: Oversized Ethernet frame cd4810c0 vs cd4810c0.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0xd length 0 sta!
eth0: Oversized Ethernet frame cd4810d0 vs cd4810d0.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0xe length 0 sta!
eth0: Oversized Ethernet frame cd4810e0 vs cd4810e0.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0xf length 0 sta!
eth0: Oversized Ethernet frame cd4810f0 vs cd4810f0.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x0 length 0 sta!
eth0: Oversized Ethernet frame cd481000 vs cd481000.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x1 length 0 sta!
eth0: Oversized Ethernet frame cd481010 vs cd481010.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x2 length 0 sta!
eth0: Oversized Ethernet frame cd481020 vs cd481020.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x3 length 0 sta!
eth0: Oversized Ethernet frame cd481030 vs cd481030.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x4 length 0 sta!
eth0: Oversized Ethernet frame cd481040 vs cd481040.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x5 length 0 sta!
eth0: Oversized Ethernet frame cd481050 vs cd481050.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x6 length 0 sta!
eth0: Oversized Ethernet frame cd481060 vs cd481060.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x7 length 0 sta!
eth0: Oversized Ethernet frame cd481070 vs cd481070.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x8 length 0 sta!
eth0: Oversized Ethernet frame cd481080 vs cd481080.
eth0: Oversized Ethernet frame spanned multiple buffers, entry 0x9 length 0 sta!
eth0: Oversized Ethernet frame cd481090 vs cd481090.

[followed by a restart!]

[-- Attachment #1.4: dojo kernelpanic debug 4k 3.log --]
[-- Type: text/plain, Size: 1757 bytes --]

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
java/6743 is trying to acquire lock:
 (slock-AF_INET6){-+..}, at: [<c03be6f4>] sk_clone+0xf4/0x310

but task is already holding lock:
 (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

other info that might help us debug this:
1 lock held by java/6743:
 #0:  (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

stack backtrace:
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0131454>] print_deadlock_bug+0xa4/0xb0
 [<c01314ca>] check_deadlock+0x6a/0x80
 [<c0132cf7>] __lock_acquire+0x4f7/0x950
 [<c01337cd>] lock_acquire+0x5d/0x80
 [<c0483415>] _spin_lock+0x25/0x30
 [<c03be6f4>] sk_clone+0xf4/0x310
 [<c03e6b31>] inet_csk_clone+0x11/0x70
 [<c03fb3c5>] tcp_create_openreq_child+0x15/0x3e0
 [<c04442c2>] tcp_v6_syn_recv_sock+0x142/0x610
 [<c03fb8a9>] tcp_check_req+0x119/0x420
 [<c0443d75>] tcp_v6_hnd_req+0x45/0x130
 [<c0444af7>] tcp_v6_do_rcv+0x247/0x2b0
 [<c0445136>] tcp_v6_rcv+0x5d6/0x6f0
 [<c04272df>] ip6_input+0x16f/0x340
 [<c0427004>] ipv6_rcv+0x114/0x280
 [<c03c6eb1>] netif_receive_skb+0x1b1/0x1f0
 [<c03c6f80>] process_backlog+0x90/0x120
 [<c03c707d>] net_rx_action+0x6d/0x100
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d5d9>] local_bh_enable_ip+0xb9/0x100
 [<c0483661>] _spin_unlock_bh+0x31/0x40
 [<c03bf750>] release_sock+0x50/0xb0
 [<c0405647>] inet_wait_for_connect+0x67/0xd0
 [<c0405748>] inet_stream_connect+0x98/0x1d0
 [<c03bc6d7>] sys_connect+0x67/0xa0
 [<c03bd1c6>] sys_socketcall+0xc6/0x1e0
 [<c0102e77>] syscall_call+0x7/0xb

[reboot]

[-- Attachment #1.5: dojo kernelpanic debug 4k 4.log --]
[-- Type: text/plain, Size: 10334 bytes --]

=============================================
[ INFO: possible recursive locking detected ]
---------------------------------------------
java/6746 is trying to acquire lock:
 (slock-AF_INET6){-+..}, at: [<c03be6f4>] sk_clone+0xf4/0x310

but task is already holding lock:
 (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

other info that might help us debug this:
1 lock held by java/6746:
 #0:  (slock-AF_INET6){-+..}, at: [<c0444eaf>] tcp_v6_rcv+0x34f/0x6f0

stack backtrace:
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0131454>] print_deadlock_bug+0xa4/0xb0
 [<c01314ca>] check_deadlock+0x6a/0x80
 [<c0132cf7>] __lock_acquire+0x4f7/0x950
 [<c01337cd>] lock_acquire+0x5d/0x80
 [<c0483415>] _spin_lock+0x25/0x30
 [<c03be6f4>] sk_clone+0xf4/0x310
 [<c03e6b31>] inet_csk_clone+0x11/0x70
 [<c03fb3c5>] tcp_create_openreq_child+0x15/0x3e0
 [<c04442c2>] tcp_v6_syn_recv_sock+0x142/0x610
 [<c03fb8a9>] tcp_check_req+0x119/0x420
 [<c0443d75>] tcp_v6_hnd_req+0x45/0x130
 [<c0444af7>] tcp_v6_do_rcv+0x247/0x2b0
 [<c0445136>] tcp_v6_rcv+0x5d6/0x6f0
 [<c04272df>] ip6_input+0x16f/0x340
 [<c0427004>] ipv6_rcv+0x114/0x280
 [<c03c6eb1>] netif_receive_skb+0x1b1/0x1f0
 [<c03c6f80>] process_backlog+0x90/0x120
 [<c03c707d>] net_rx_action+0x6d/0x100
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d5d9>] local_bh_enable_ip+0xb9/0x100
 [<c0483661>] _spin_unlock_bh+0x31/0x40
 [<c03bf750>] release_sock+0x50/0xb0
 [<c0405647>] inet_wait_for_connect+0x67/0xd0
 [<c0405748>] inet_stream_connect+0x98/0x1d0
 [<c03bc6d7>] sys_connect+0x67/0xa0
 [<c03bd1c6>] sys_socketcall+0xc6/0x1e0
 [<c0102e77>] syscall_call+0x7/0xb
BUG: unable to handle kernel paging request at virtual address 170fc2c3
 printing eip:
c03d958a
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in:
CPU:    0
EIP:    0060:[<c03d958a>]    Not tainted VLI
EFLAGS: 00010286   (2.6.18 #3)
EIP is at __ip_select_ident+0x4a/0xa0
eax: c6b78050   ebx: c613d8bc   ecx: ffffffff   edx: c05f7000
esi: 170fc2a5   edi: c13f1814   ebp: c05f7e70   esp: c05f7e64
ds: 007b   es: 007b   ss: 0068
Process java (pid: 6844, ti=c05f7000 task=c6b78050 task.ti=c0eb3000)
Stack: c3fd3254 c13f1814 c8a62034 c05f7f38 c03e0df8 c13f1814 c613d8bc 00000000
       00000000 c613d8bc c05f7ea0 c03bf290 cdfc85dc c05f7ea0 c05f7ebc c01334dd
       fffffff5 c8a62034 c05f7f70 c0406027 c8a62034 00000000 c05f7ed4 c613d8bc
Call Trace:
 [<c010354e>] show_stack_log_lvl+0x8e/0xb0
 [<c010370a>] show_registers+0x14a/0x1d0
 [<c0103987>] die+0x167/0x210
 [<c010eef3>] do_page_fault+0x173/0x580
 [<c0103199>] error_code+0x39/0x40
 [<c03e0df8>] ip_queue_xmit+0x468/0x520
 [<c03f26df>] tcp_transmit_skb+0x27f/0x4b0
 [<c03f4a93>] tcp_retransmit_skb+0x153/0x2d0
 [<c03f66af>] tcp_retransmit_timer+0xdf/0x3f0
 [<c03f6a91>] tcp_write_timer+0xd1/0x100
 [<c0122154>] run_timer_softirq+0xb4/0x1a0
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
Code: fe ff ff 8b b3 ec 00 00 00 58 85 f6 5a 75 12 57 e8 7c ff ff ff 8d 65 f4 5
EIP: [<c03d958a>] __ip_select_ident+0x4a/0xa0 SS:ESP 0068:c05f7e64
 <0>Kernel panic - not syncing: Fatal exception in interrupt
 <3>Slab corruption: start=c6403564, len=2048
Redzone: 0x6b6b6b6b/0xc03c0543.
Last user: [<00000000>](0x0)
7f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c6402cd3, len=2048
Redzone: 0x6b6b6b6b/0x6b6b6b6b.
Last user: [<6b6b6b6b>](0x6b6b6b6b)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
slab error in cache_alloc_debugcheck_after(): cache `size-2048': double free, on
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160d81>] __slab_error+0x21/0x30
 [<c0162e11>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c01634c8>] __kmalloc_track_caller+0xa8/0x100
 [<c03c029d>] __alloc_skb+0x4d/0x110
 [<c030438b>] rhine_rx+0x29b/0x490
 [<c0303db3>] rhine_interrupt+0x193/0x240
 [<c0144807>] handle_IRQ_event+0x27/0x70
 [<c01448d3>] __do_IRQ+0x83/0x110
 [<c0104c53>] do_IRQ+0x63/0xb0
 =======================
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e1dd>] __delay+0xd/0x10
 [<c028e205>] __const_udelay+0x25/0x30
 [<c0117ce8>] panic+0xf8/0x100
 [<c0103a21>] die+0x201/0x210
 [<c010eef3>] do_page_fault+0x173/0x580
 [<c0103199>] error_code+0x39/0x40
 [<c03e0df8>] ip_queue_xmit+0x468/0x520
 [<c03f26df>] tcp_transmit_skb+0x27f/0x4b0
 [<c03f4a93>] tcp_retransmit_skb+0x153/0x2d0
 [<c03f66af>] tcp_retransmit_timer+0xdf/0x3f0
 [<c03f6a91>] tcp_write_timer+0xd1/0x100
 [<c0122154>] run_timer_softirq+0xb4/0x1a0
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
c6403560: redzone 1:0x6b6b6b6b, redzone 2:0xc03c0543
Slab corruption: start=c6402d58, len=2048
Redzone: 0x6b6b6b6b/0x0.
Last user: [<5a2cf071>](0x5a2cf071)
7f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c64024c7, len=2048
Redzone: 0x656c7564/0x6b6b6b6b.
Last user: [<6b6b6b6b>](0x6b6b6b6b)
000: 3d 63 6f 6d 6d 75 6e 69 74 79 26 61 63 74 69 6f
010: 6e 3d 76 69 65 77 5f 74 6f 70 69 63 26 74 6f 70
slab error in cache_alloc_debugcheck_after(): cache `size-2048': double free, on
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160d81>] __slab_error+0x21/0x30
 [<c0162e11>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c01634c8>] __kmalloc_track_caller+0xa8/0x100
 [<c03c029d>] __alloc_skb+0x4d/0x110
 [<c030438b>] rhine_rx+0x29b/0x490
 [<c0303db3>] rhine_interrupt+0x193/0x240
 [<c0144807>] handle_IRQ_event+0x27/0x70
 [<c01448d3>] __do_IRQ+0x83/0x110
 [<c0104c53>] do_IRQ+0x63/0xb0
 =======================
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e1dd>] __delay+0xd/0x10
 [<c028e205>] __const_udelay+0x25/0x30
 [<c0117ce8>] panic+0xf8/0x100
 [<c0103a21>] die+0x201/0x210
 [<c010eef3>] do_page_fault+0x173/0x580
 [<c0103199>] error_code+0x39/0x40
 [<c03e0df8>] ip_queue_xmit+0x468/0x520
 [<c03f26df>] tcp_transmit_skb+0x27f/0x4b0
 [<c03f4a93>] tcp_retransmit_skb+0x153/0x2d0
 [<c03f66af>] tcp_retransmit_timer+0xdf/0x3f0
 [<c03f6a91>] tcp_write_timer+0xd1/0x100
 [<c0122154>] run_timer_softirq+0xb4/0x1a0
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
c6402d54: redzone 1:0x6b6b6b6b, redzone 2:0x0
Slab corruption: start=c640254c, len=2048
Redzone: 0x6b6b6b6b/0x0.
Last user: [<5a2cf071>](0x5a2cf071)
7f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 71 f0 2c 5a
Prev obj: start=c6401cbb, len=2048
Redzone: 0x19a60cb7/0x68702e78.
Last user: [<6f6d3f70>](0x6f6d3f70)
000: b7 f8 a5 19 b7 00 13 00 00 f4 90 48 08 7c a4 19
010: b7 70 a4 19 b7 01 00 00 00 00 13 00 00 01 00 00
slab error in cache_alloc_debugcheck_after(): cache `size-2048': double free, on
 [<c01034b9>] show_trace+0x19/0x20
 [<c01035ba>] dump_stack+0x1a/0x20
 [<c0160d81>] __slab_error+0x21/0x30
 [<c0162e11>] cache_alloc_debugcheck_after+0x121/0x1a0
 [<c01634c8>] __kmalloc_track_caller+0xa8/0x100
 [<c03c029d>] __alloc_skb+0x4d/0x110
 [<c030438b>] rhine_rx+0x29b/0x490
 [<c0303db3>] rhine_interrupt+0x193/0x240
 [<c0144807>] handle_IRQ_event+0x27/0x70
 [<c01448d3>] __do_IRQ+0x83/0x110
 [<c0104c53>] do_IRQ+0x63/0xb0
 =======================
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e1dd>] __delay+0xd/0x10
 [<c028e205>] __const_udelay+0x25/0x30
 [<c0117ce8>] panic+0xf8/0x100
 [<c0103a21>] die+0x201/0x210
 [<c010eef3>] do_page_fault+0x173/0x580
 [<c0103199>] error_code+0x39/0x40
 [<c03e0df8>] ip_queue_xmit+0x468/0x520
 [<c03f26df>] tcp_transmit_skb+0x27f/0x4b0
 [<c03f4a93>] tcp_retransmit_skb+0x153/0x2d0
 [<c03f66af>] tcp_retransmit_timer+0xdf/0x3f0
 [<c03f6a91>] tcp_write_timer+0xd1/0x100
 [<c0122154>] run_timer_softirq+0xb4/0x1a0
 [<c011d68f>] __do_softirq+0x6f/0x100
 [<c0104de7>] do_softirq+0x87/0xe0
 =======================
 [<c011d773>] irq_exit+0x53/0x60
 [<c0104c5a>] do_IRQ+0x6a/0xb0
 [<c0103145>] common_interrupt+0x25/0x30
 [<c028e30b>] memcpy+0x3b/0x50
 [<c028e378>] memmove+0x38/0x50
 [<c01bf9cd>] leaf_paste_in_buffer+0x7d/0x320
 [<c01a879c>] balance_leaf+0x24c/0x27d0
 [<c01ab050>] do_balance+0x60/0xf0
 [<c01c5854>] reiserfs_paste_into_item+0x164/0x190
 [<c01b3c25>] reiserfs_allocate_blocks_for_region+0x925/0x12e0
 [<c01b5c9c>] reiserfs_file_write+0x72c/0x7c0
 [<c01668d8>] vfs_write+0x88/0x170
 [<c0166a6c>] sys_write+0x3c/0x70
 [<c0102e77>] syscall_call+0x7/0xb
c6402548: redzone 1:0x6b6b6b6b, redzone 2:0x0
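
The Oops at the top of this last log reports "Oops: 0000" for a paging request at 170fc2c3, with esi = 170fc2a5. The following small sketch decodes the three low bits of the x86 page-fault error code, which is the value printed after "Oops:" when die() is called from do_page_fault(). It also computes the distance between the two addresses. The function name is hypothetical. The arithmetic only shows that the faulting address is 0x1e bytes past a value equal to RED_ACTIVE. It does not establish why.

```python
# Assumed: the standard x86 page-fault error-code bits (P, W/R, U/S).
RED_ACTIVE = 0x170FC2A5  # slab redzone magic for an allocated object (2.6.18 era)


def decode_pf_error(code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


if __name__ == "__main__":
    fault_addr = 0x170FC2C3  # "unable to handle kernel paging request at ..."
    esi = 0x170FC2A5         # esi from the register dump
    print(decode_pf_error(0x0000))
    print(hex(fault_addr - esi), esi == RED_ACTIVE)
```

For this log, the sketch describes a kernel-mode read of a not-present page, and an offset of 0x1e from the RED_ACTIVE value.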

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-24 12:20   ` Christian Weiske
@ 2006-09-24 16:50     ` Andrew Morton
  2006-09-24 17:47       ` Christian Weiske
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2006-09-24 16:50 UTC
  To: Christian Weiske; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

On Sun, 24 Sep 2006 14:20:14 +0200
Christian Weiske <cweiske@cweiske.de> wrote:

> > It would be interesting to find out if enabling CONFIG_4KSTACKS makes this
> > go away (although I'm not sure why).
> So, here are the results from the 4K runs:
> 
> Besides one Oops message, I got a "kernel BUG at mm/slab.c:2747!" in log
> #1. The call traces are as usual.
> 
> Furthermore, logs #2 and #3 show odd behaviour: the machine just rebooted.
> Log #2 shows some oversized Ethernet frames before the reboot.

I assume that you have confirmed that the machine doesn't have hardware
problems?  Does it run some earlier kernel OK?  

And how long does it take to crash?

Thanks.

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-24 16:50     ` Andrew Morton
@ 2006-09-24 17:47       ` Christian Weiske
  2006-09-25  4:14         ` Nick Piggin
  2006-09-25 18:36         ` Christian Weiske
  0 siblings, 2 replies; 15+ messages in thread
From: Christian Weiske @ 2006-09-24 17:47 UTC
  To: Andrew Morton; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 796 bytes --]

Andrew,


> I assume that you have confirmed that the machine doesn't have hardware
> problems?  Does it run some earlier kernel OK?  
The disks are both fine; they worked in other PCs without problems. The
IDE controller card also worked fine, and the motherboard is new, for
whatever that is worth. Maybe the combination is the problem.

I had some problems after running the machine for a few days, but I
thought that was a kernel timing problem rather than a hardware one:
http://bugzilla.kernel.org/show_bug.cgi?id=6969


> And how long does it take to crash?
After starting the yacy daemon, it takes about half a minute until the
"possible recursive locking detected" warning appears, and after one or
two minutes the whole machine crashes.


-- 
Regards/MfG,
Christian Weiske


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-24 10:19     ` Andrew Morton
@ 2006-09-24 17:59       ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2006-09-24 17:59 UTC
  To: Andrew Morton
  Cc: Christian Weiske, netdev, linux-kernel, reiserfs-dev, Nick Piggin


* Andrew Morton <akpm@osdl.org> wrote:

> You have tcp_v6 lockdep warnings.  They're in
> http://xml.cweiske.de/dojo%20kernelpanic%20+%20debug.tar.bz2 if anyone is
> keen.  (I've largely lost interest in lockdep warnings - many of them are
> false positives and require make-lockdep-shut-up patches).

FYI, this is from Herbert Xu's recent mail to netdev:

| Subject: Re: neigh_lookup lockdep warning
|
| [...]
| BTW, out of the last four validator reports I've read three have 
| turned out to be genuine bugs.  So you guys have done a fantastic job!
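
For the slock-AF_INET6 reports in the logs, note that lockdep keys its checks on lock classes, not on individual locks. As far as the traces show, tcp_v6_rcv() holds the listening socket's lock when sk_clone() takes the lock of the newly cloned socket. Both locks belong to the same class, so the validator reports possible recursion even though two different sockets are involved. It stays quiet only when the nesting is annotated with a subclass; the kernel's spin_lock_nested() takes one. Whether that makes this particular report harmless is not settled here. The following is a toy model of that bookkeeping, with hypothetical names. Real lockdep also tracks lock ordering, IRQ state and more.

```python
class ToyLockValidator:
    """Hypothetical, heavily simplified model of lockdep's recursion check:
    held locks are recorded by (lock class, subclass), not by instance."""

    def __init__(self):
        self.held = []  # stack of (class_name, subclass) currently held

    def acquire(self, lock_class, subclass=0):
        """Record the acquire; return a warning string if that
        (class, subclass) pair is already held, else None."""
        key = (lock_class, subclass)
        warning = None
        if key in self.held:
            warning = f"possible recursive locking detected: {lock_class}"
        self.held.append(key)
        return warning

    def release(self, lock_class, subclass=0):
        self.held.remove((lock_class, subclass))


if __name__ == "__main__":
    v = ToyLockValidator()
    v.acquire("slock-AF_INET6")         # listening socket, held in tcp_v6_rcv()
    print(v.acquire("slock-AF_INET6"))  # cloned socket in sk_clone(): warning

    w = ToyLockValidator()
    w.acquire("slock-AF_INET6")
    print(w.acquire("slock-AF_INET6", subclass=1))  # annotated nesting: None
```

Keying on (class, subclass) is what lets an annotated nested acquire pass. The same acquire without a subclass is flagged.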

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-24 17:47       ` Christian Weiske
@ 2006-09-25  4:14         ` Nick Piggin
  2006-09-25 18:36         ` Christian Weiske
  1 sibling, 0 replies; 15+ messages in thread
From: Nick Piggin @ 2006-09-25  4:14 UTC
  To: Christian Weiske; +Cc: Andrew Morton, linux-kernel, reiserfs-dev, Ingo Molnar

Christian Weiske wrote:

>Andrew,
>
>
>
>>I assume that you have confirmed that the machine doesn't have hardware
>>problems?  Does it run some earlier kernel OK?  
>>
>The disks are both fine; they worked in other PCs without problems. The
>IDE controller card also worked fine, and the motherboard is new, for
>whatever that is worth. Maybe the combination is the problem.
>

Memory, motherboard, and CPU would be possible candidates, in roughly
that order of likelihood. If you can run memtest86+ on it overnight,
that would give a bit more confidence in all of them.

Can you try using a different IDE controller to reproduce the panic
on the same system?

>>And how long does it take to crash?
>>
>After starting the yacy daemon, it takes about half a minute until the
>"possible recursive locking detected" warning appears, and after one or
>two minutes the whole machine crashes.
>

I wonder whether it does anything unusual apart from using the network.
Can you break it with anything else, such as a big FTP transfer?

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-24 17:47       ` Christian Weiske
  2006-09-25  4:14         ` Nick Piggin
@ 2006-09-25 18:36         ` Christian Weiske
  2006-09-25 21:26           ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Christian Weiske @ 2006-09-25 18:36 UTC
  To: Andrew Morton; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 487 bytes --]

>> I assume that you have confirmed that the machine doesn't have hardware
>> problems?  Does it run some earlier kernel OK?  
> The disks are both fine; they worked in other PCs without problems. The
> IDE controller card also worked fine, and the motherboard is new, for
> whatever that is worth. Maybe the combination is the problem.

So this is definitely a hardware problem? Which component is most likely
to be the bad one?

-- 
Regards/MfG,
Christian Weiske


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-25 18:36         ` Christian Weiske
@ 2006-09-25 21:26           ` Andrew Morton
  2006-10-02 17:01             ` Christian Weiske
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2006-09-25 21:26 UTC
  To: Christian Weiske; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

On Mon, 25 Sep 2006 20:36:57 +0200
Christian Weiske <cweiske@cweiske.de> wrote:

> >> I assume that you have confirmed that the machine doesn't have hardware
> >> problems?  Does it run some earlier kernel OK?  
> > The disks are both fine; they worked in other PCs without problems. The
> > IDE controller card also worked fine, and the motherboard is new, for
> > whatever that is worth. Maybe the combination is the problem.
> 
> So this is definitely a hardware problem?

Is it?  I don't recall us having established that.  Does the machine run
any earlier kernel without failing?

> Which component is most likely
> to be the bad one?

Motherboard, I guess.

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-09-25 21:26           ` Andrew Morton
@ 2006-10-02 17:01             ` Christian Weiske
  2006-10-03 14:20               ` Christian Weiske
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Weiske @ 2006-10-02 17:01 UTC
  To: Andrew Morton; +Cc: linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 384 bytes --]

Andrew,

> Is it?  I don't recall us having established that.  Does the machine run
> any earlier kernel without failing?
The oldest version I could go back to was 2.6.12, which also panicked, so I
guess it happens with every version.

I am now trying to get a small disk that is not accessed via the PCI IDE
card; perhaps that will provide more information.

-- 
Regards/MfG,
Christian Weiske


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a
  2006-10-02 17:01             ` Christian Weiske
@ 2006-10-03 14:20               ` Christian Weiske
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Weiske @ 2006-10-03 14:20 UTC
  To: Christian Weiske
  Cc: Andrew Morton, linux-kernel, reiserfs-dev, Ingo Molnar, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 712 bytes --]

Hello all,

>> Is it?  I don't recall us having established that.  Does the machine run
>> any earlier kernel without failing?
> I am now trying to get a small disk that is not accessed via the PCI IDE
> card; perhaps that will provide more information.

So, after two more days of testing, I can say the following:
I mirrored the partitions from the 300 GB drive to a 6 GB one.
The error does not occur on the small disk, either when connected
directly to the motherboard's IDE channel or when used through the
PCI card (which was my hope). Although SMART says everything is OK,
it seems to me that the hard drive is somehow broken. Great.

Thanks for all your help!

-- 
Regards/MfG,
Christian Weiske


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

end of thread

Thread overview: 15+ messages
2006-09-23 15:56 2.6.18 BUG: unable to handle kernel NULL pointer dereference at virtual address 000,0000a Christian Weiske
2006-09-23 20:42 ` Andrew Morton
2006-09-23 20:39   ` Ingo Molnar
2006-09-24  9:11   ` Christian Weiske
2006-09-24  9:30     ` Christian Weiske
2006-09-24 10:19     ` Andrew Morton
2006-09-24 17:59       ` Ingo Molnar
2006-09-24 12:20   ` Christian Weiske
2006-09-24 16:50     ` Andrew Morton
2006-09-24 17:47       ` Christian Weiske
2006-09-25  4:14         ` Nick Piggin
2006-09-25 18:36         ` Christian Weiske
2006-09-25 21:26           ` Andrew Morton
2006-10-02 17:01             ` Christian Weiske
2006-10-03 14:20               ` Christian Weiske
