linux-kernel.vger.kernel.org archive mirror
* Oops with current linus' git tree
@ 2006-01-16 18:15 Diego Calleja
  2006-01-17  4:20 ` Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Diego Calleja @ 2006-01-16 18:15 UTC (permalink / raw)
  To: linux-kernel

I'm having two noticeable problems with the current Linus' tree:

1) An oops while watching a DVD with kaffeine (a KDE-based video player);
   the oops is pasted below.

2) This is a dual P3 machine, but only one CPU is being used to run
   processes. CPU #1 is detected, but processes are scheduled only on
   CPU #0. /proc/interrupts shows that CPU #1 is still used to service
   interrupts. I can force processes to run on that CPU with taskset,
   but it doesn't happen automatically as it usually does.
   dmesg here: http://terra.es/personal/diegocg/dmesg
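taskset pins a process by setting its CPU affinity mask through the sched_setaffinity(2) system call. A minimal sketch of doing the same from C, assuming Linux with glibc's CPU_SET macros and sched_getcpu(); the helper names pin_self_to_cpu() and first_allowed_cpu() are hypothetical, not taken from the report:

```c
#define _GNU_SOURCE             /* for the CPU_SET macros and sched_getcpu() */
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU, much as
 * `taskset -c <cpu> <command>` does before exec'ing <command>.
 * Returns 0 on success, -1 on failure (errno set). */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set); /* 0 = calling thread */
}

/* Lowest-numbered CPU in the current affinity mask, or -1 on error. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

On the machine described here, pin_self_to_cpu(1) would be the programmatic equivalent of running a command under `taskset -c 1`; on a healthy SMP system the scheduler spreads unpinned tasks without any of this.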


Jan 16 18:04:07 estel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040
Jan 16 18:04:07 estel kernel:  printing eip:
Jan 16 18:04:07 estel kernel: c0147a2e
Jan 16 18:04:07 estel kernel: *pde = 00000000
Jan 16 18:04:07 estel kernel: Oops: 0000 [#1]
Jan 16 18:04:07 estel kernel: PREEMPT SMP
Jan 16 18:04:07 estel kernel: Modules linked in: radeon ipt_REJECT xt_tcpudp lp ipt_MASQUERADE iptable_nat ip_nat ip_conntrack iptable_filter ip_tables x_tables usbhid ohci_hcd usbcore parport_pc parport floppy pcspkr ide_cd cdrom unix
Jan 16 18:04:07 estel kernel: CPU:    0
Jan 16 18:04:07 estel kernel: EIP:    0060:[find_get_page+46/96]    Not tainted VLI
Jan 16 18:04:07 estel kernel: EFLAGS: 00010002   (2.6.15)
Jan 16 18:04:07 estel kernel: EIP is at find_get_page+0x2e/0x60
Jan 16 18:04:07 estel kernel: eax: 00000040   ebx: 00000040   ecx: 00000000   edx: 00000003
Jan 16 18:04:07 estel kernel: esi: 0003352c   edi: c1a2b178   ebp: c1a2b168   esp: c2b09e20
Jan 16 18:04:07 estel kernel: ds: 007b   es: 007b   ss: 0068
Jan 16 18:04:07 estel kernel: Process kaffeine (pid: 2164, threadinfo=c2b09000 task=e3ffc050)
Jan 16 18:04:07 estel kernel: Stack: <0>00001000 0003352c 0003352c c01491b9 00001000 0000002c f40a3cc0 f40a3d0c
Jan 16 18:04:07 estel kernel:        c1a2b0b4 0003352c 000be343 00000000 0003354a 0003353e 0003352b be344000
Jan 16 18:04:07 estel kernel:        00000000 00000000 00001000 00033521 00000020 00000000 00000000 0003353d
Jan 16 18:04:07 estel kernel: Call Trace:
Jan 16 18:04:07 estel kernel:  [do_generic_mapping_read+409/1200] do_generic_mapping_read+0x199/0x4b0
Jan 16 18:04:07 estel kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
Jan 16 18:04:07 estel kernel:  [__generic_file_aio_read+367/576] __generic_file_aio_read+0x16f/0x240
Jan 16 18:04:07 estel kernel:  [file_read_actor+0/240] file_read_actor+0x0/0xf0
Jan 16 18:04:07 estel kernel:  [unqueue_me+106/176] unqueue_me+0x6a/0xb0
Jan 16 18:04:07 estel kernel:  [generic_file_read+152/192] generic_file_read+0x98/0xc0
Jan 16 18:04:07 estel kernel:  [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Jan 16 18:04:07 estel kernel:  [default_wake_function+0/16] default_wake_function+0x0/0x10
Jan 16 18:04:07 estel kernel:  [vfs_read+161/352] vfs_read+0xa1/0x160
Jan 16 18:04:07 estel kernel:  [generic_file_read+0/192] generic_file_read+0x0/0xc0
Jan 16 18:04:07 estel kernel:  [sys_read+65/112] sys_read+0x41/0x70
Jan 16 18:04:07 estel kernel:  [sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
Jan 16 18:04:07 estel kernel: Code: 89 7c 24 08 8d 78 10 89 1c 24 89 c3 89 f8 89 74 24 04 89 d6 83 c3 04 e8 81 a1 1d 00 89 d8 89 f2 e8 68 83 08 00 85 c0 89 c3 74 0d <8b> 00 89 da f6 c4 40 75 1c f0 ff 42 04 89 f8 e8 be a5 1d 00 89
Jan 16 18:04:07 estel kernel:  <6>note: kaffeine[2164] exited with preempt_count 1

* Re: Oops with current linus' git tree
  2006-01-16 18:15 Oops with current linus' git tree Diego Calleja
@ 2006-01-17  4:20 ` Nick Piggin
  2006-01-17  4:24   ` Nick Piggin
  2006-01-17 13:17   ` Diego Calleja
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Piggin @ 2006-01-17  4:20 UTC (permalink / raw)
  To: Diego Calleja; +Cc: linux-kernel

Diego Calleja wrote:
> I'm having two noticeable problems with the current Linus' tree:
> 
> 1) An oops while watching a DVD with kaffeine (a KDE-based video player);
>    the oops is pasted below.
> 

From your oops, it looks as though radix_tree_lookup() in find_get_page()
returned 0x40. It could be a flipped bit; is your memory OK?
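One way to read the flipped-bit hypothesis: the i386 fault handler of this era reports any kernel fault below PAGE_SIZE as a "NULL pointer dereference", and 0x40 differs from NULL in exactly one bit (bit 6). A small sketch of both checks, assuming 4 KiB pages; fault_kind() and bit_distance() are illustrative helpers, not kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on i386. */
#define EXAMPLE_PAGE_SIZE 4096u

/* Mirrors the wording choice in the i386 page-fault path:
 * kernel faults below PAGE_SIZE are reported as NULL dereferences. */
static const char *fault_kind(uint32_t addr)
{
    return addr < EXAMPLE_PAGE_SIZE ? "NULL pointer dereference"
                                    : "paging request";
}

/* Number of bit positions in which a and b differ (Hamming distance). */
static int bit_distance(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;
    int n = 0;

    while (x) {
        x &= x - 1;             /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

A distance of 1 from NULL is consistent with a flipped bit, but it would not rule out a software bug that stored the bad value.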

Can you apply the attached patch and try to reproduce the oops?

> 2) This is a dual P3 machine, but only one CPU is being used to run
>    processes. CPU #1 is detected, but processes are scheduled only on
>    CPU #0. /proc/interrupts shows that CPU #1 is still used to service
>    interrupts. I can force processes to run on that CPU with taskset,
>    but it doesn't happen automatically as it usually does.
>    dmesg here: http://terra.es/personal/diegocg/dmesg
> 

What happens if you run several infinite loops to increase the load?
Does everything still stay on CPU0?
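A rough way to run that experiment from a single program: fork a few CPU-bound children, let each spin briefly, and have each report the CPU it ends up on via sched_getcpu(). This is a sketch assuming Linux with glibc; busy_loop_cpus() and distinct_cpus() are hypothetical helpers, and a short spin may not give the load balancer time to act, so treat the result as a hint rather than proof:

```c
#define _GNU_SOURCE             /* for sched_getcpu() */
#include <assert.h>
#include <sched.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Busy-wait for about `ms` milliseconds of wall-clock time. */
static void spin_ms(long ms)
{
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec) * 1000L +
             (now.tv_nsec - start.tv_nsec) / 1000000L < ms);
}

/* Fork `n` (at most 64) CPU-bound children. Each spins for `ms`
 * milliseconds and exits with the CPU it is running on at that moment.
 * Reported CPUs are stored in cpus[] (room for n entries); returns how
 * many children reported a usable CPU number. */
static int busy_loop_cpus(int n, long ms, int cpus[])
{
    pid_t pids[64];
    int started = 0, reported = 0;

    assert(n >= 0 && n <= 64);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* stop forking, but still reap below */
        if (pid == 0) {
            spin_ms(ms);
            int cpu = sched_getcpu();
            /* an exit status carries only 8 bits; 255 means "unknown" */
            _exit(cpu >= 0 && cpu < 255 ? cpu : 255);
        }
        pids[started++] = pid;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0)
            continue;
        if (WIFEXITED(status) && WEXITSTATUS(status) != 255)
            cpus[reported++] = WEXITSTATUS(status);
    }
    return reported;
}

/* Count how many different values appear in cpus[0..n-1]. */
static int distinct_cpus(const int cpus[], int n)
{
    int count = 0;

    for (int i = 0; i < n; i++) {
        int seen_before = 0;
        for (int j = 0; j < i; j++)
            if (cpus[j] == cpus[i])
                seen_before = 1;
        count += !seen_before;
    }
    return count;
}
```

On a working dual-CPU box, several long-running loops would be expected to report two distinct CPUs; the symptom described in this thread would show only one.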

> 
> Jan 16 18:04:07 estel kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000040

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

* Re: Oops with current linus' git tree
  2006-01-17  4:20 ` Nick Piggin
@ 2006-01-17  4:24   ` Nick Piggin
  2006-01-17 13:17   ` Diego Calleja
  1 sibling, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2006-01-17  4:24 UTC (permalink / raw)
  To: Diego Calleja; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 485 bytes --]

Nick Piggin wrote:
> Diego Calleja wrote:
> 
>> I'm having two noticeable problems with the current Linus' tree:
>>
>> 1) An oops while watching a DVD with kaffeine (a KDE-based video player);
>>    the oops is pasted below.
>>
> 
> From your oops, it looks as though radix_tree_lookup() in find_get_page()
> returned 0x40. It could be a flipped bit; is your memory OK?
> 
> Can you apply the attached patch and try to reproduce the oops?
> 

Really attached now.

-- 
SUSE Labs, Novell Inc.


[-- Attachment #2: radix-tree-debug.patch --]
[-- Type: text/plain, Size: 766 bytes --]

Index: linux-2.6/lib/radix-tree.c
===================================================================
--- linux-2.6.orig/lib/radix-tree.c	2006-01-03 19:05:57.000000000 +1100
+++ linux-2.6/lib/radix-tree.c	2006-01-17 15:17:36.000000000 +1100
@@ -233,6 +233,8 @@ int radix_tree_insert(struct radix_tree_
 	int offset;
 	int error;
 
+	BUG_ON((unsigned long)item < PAGE_OFFSET);
+
 	/* Make sure the tree is high enough.  */
 	if ((!index && !root->rnode) ||
 			index > radix_tree_maxindex(root->height)) {
@@ -334,6 +336,8 @@ void *radix_tree_lookup(struct radix_tre
 	void **slot;
 
 	slot = __lookup_slot(root, index);
+	if (slot && *slot)
+		BUG_ON((unsigned long)(*slot) < PAGE_OFFSET);
 	return slot != NULL ? *slot : NULL;
 }
 EXPORT_SYMBOL(radix_tree_lookup);
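In user-space terms, the lookup-side check in the patch accepts a slot value only if it is NULL or at least PAGE_OFFSET, so a value like 0x40 would trip BUG_ON() at lookup time instead of being dereferenced later in find_get_page(). A sketch assuming the usual i386 3G/1G split (PAGE_OFFSET = 0xC0000000); lookup_slot_ok() and insert_item_ok() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: i386 with the usual 3G/1G split, so kernel virtual
 * addresses (including struct page pointers) start at 0xC0000000. */
#define EXAMPLE_PAGE_OFFSET 0xC0000000u

/* What the lookup-side BUG_ON() accepts: an empty slot (NULL) or a
 * value at or above PAGE_OFFSET. */
static bool lookup_slot_ok(uint32_t slot_value)
{
    return slot_value == 0 || slot_value >= EXAMPLE_PAGE_OFFSET;
}

/* The insert-side check is stricter and also rejects NULL. */
static bool insert_item_ok(uint32_t item)
{
    return item >= EXAMPLE_PAGE_OFFSET;
}
```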

* Re: Oops with current linus' git tree
  2006-01-17  4:20 ` Nick Piggin
  2006-01-17  4:24   ` Nick Piggin
@ 2006-01-17 13:17   ` Diego Calleja
  2006-01-18  0:20     ` Diego Calleja
  2006-01-18  3:25     ` Nick Piggin
  1 sibling, 2 replies; 9+ messages in thread
From: Diego Calleja @ 2006-01-17 13:17 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

On Tue, 17 Jan 2006 15:20:36 +1100,
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> From your oops, it looks as though radix_tree_lookup() in find_get_page()
> returned 0x40. It could be a flipped bit; is your memory OK?

It's ECC memory, so I doubt it.


> Can you apply the attached patch and try to reproduce the oops?

You're saying I'll have to spend the whole afternoon watching
DVDs? Well, if the Linux kernel needs it!


> What happens if you run several infinite loops to increase the load?
> Does everything still stay on CPU0?

Yes, I ran several "cat /dev/zero > /dev/null &" and they all stayed on
CPU #0.

I did a bisection search and couldn't find the culprit; apparently
it is caused by a config option. It now works fine after switching off
CONFIG_HOTPLUG_CPU and some ACPI options. Also, when it didn't work,
the CPU that got all the processes could be CPU #0 or #1; it
changed randomly from boot to boot.
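The bisection idea, reduced to a linear history: given one revision known good and one known bad, test the midpoint and halve the range until only the first bad revision remains. A sketch assuming badness is monotonic along the history; git bisect itself walks a commit DAG, and first_bad_revision() and its oracle are hypothetical stand-ins for "build, boot, and check":

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for "build revision `rev`, boot it, and check for the bug".
 * Hypothetical: a real bisection does this by hand or via a script. */
typedef bool (*is_bad_fn)(int rev, void *ctx);

/* Find the first bad revision in (good, bad], assuming `good` is known
 * good, `bad` is known bad, and every revision after the first bad one
 * is also bad. Each step tests the midpoint and keeps the half that
 * still contains the good-to-bad transition. */
static int first_bad_revision(int good, int bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example oracle for testing: revisions >= *threshold are "bad". */
static bool bad_from_threshold(int rev, void *ctx)
{
    return rev >= *(const int *)ctx;
}
```

If the failure depends on a config option rather than on a commit, the monotonic assumption does not hold for the tested range, which is consistent with a bisection that fails to single out a commit.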

* Re: Oops with current linus' git tree
  2006-01-17 13:17   ` Diego Calleja
@ 2006-01-18  0:20     ` Diego Calleja
  2006-01-18  3:23       ` Nick Piggin
  2006-01-18  3:25     ` Nick Piggin
  1 sibling, 1 reply; 9+ messages in thread
From: Diego Calleja @ 2006-01-18  0:20 UTC (permalink / raw)
  To: nickpiggin; +Cc: linux-kernel

On Tue, 17 Jan 2006 14:17:25 +0100,
Diego Calleja <diegocg@gmail.com> wrote:

> > Can you apply the attached patch and try to reproduce the oops?
> 
> You're saying I'll have to spend the whole afternoon watching
> DVDs? Well, if the Linux kernel needs it!


I've been running kaffeine for hours and I haven't triggered it; it's
hard to reproduce :/

I'll keep trying to hit it; even if it was a hardware error,
it should happen again!

* Re: Oops with current linus' git tree
  2006-01-18  0:20     ` Diego Calleja
@ 2006-01-18  3:23       ` Nick Piggin
  2006-01-19 19:31         ` Diego Calleja
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2006-01-18  3:23 UTC (permalink / raw)
  To: Diego Calleja; +Cc: linux-kernel

Diego Calleja wrote:
> On Tue, 17 Jan 2006 14:17:25 +0100,
> Diego Calleja <diegocg@gmail.com> wrote:
> 
> 
>>>Can you apply the attached patch and try to reproduce the oops?
>>
>>You're saying I'll have to spend the whole afternoon watching
>>DVDs? Well, if the Linux kernel needs it!
> 
> 
> 
> I've been running kaffeine for hours and I haven't triggered it; it's
> hard to reproduce :/
> 

That's what I feared. Thanks for trying though.

> I'll keep trying to hit it; even if it was a hardware error,
> it should happen again!
> 

Yeah, if it is a hardware error it is unlikely to hit the same place
again, but if it is a rare bug then hopefully that check will catch it.

-- 
SUSE Labs, Novell Inc.

* Re: Oops with current linus' git tree
  2006-01-17 13:17   ` Diego Calleja
  2006-01-18  0:20     ` Diego Calleja
@ 2006-01-18  3:25     ` Nick Piggin
  2006-01-18 14:02       ` Diego Calleja
  1 sibling, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2006-01-18  3:25 UTC (permalink / raw)
  To: Diego Calleja; +Cc: linux-kernel

Diego Calleja wrote:
> On Tue, 17 Jan 2006 15:20:36 +1100,
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 

>>What happens if you run several infinite loops to increase the load?
>>Does everything still stay on CPU0?
> 
> 
> Yes, I ran several "cat /dev/zero > /dev/null &" and they all stayed on
> CPU #0.
> 
> I did a bisection search and couldn't find the culprit; apparently
> it is caused by a config option. It now works fine after switching off
> CONFIG_HOTPLUG_CPU and some ACPI options. Also, when it didn't work,
> the CPU that got all the processes could be CPU #0 or #1; it
> changed randomly from boot to boot.
> 

If you can report those configuration options and the symptoms in a
new thread to lkml that would be helpful. Also if you can work out
when it started happening, that helps too.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

* Re: Oops with current linus' git tree
  2006-01-18  3:25     ` Nick Piggin
@ 2006-01-18 14:02       ` Diego Calleja
  0 siblings, 0 replies; 9+ messages in thread
From: Diego Calleja @ 2006-01-18 14:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

On Wed, 18 Jan 2006 14:25:30 +1100,
Nick Piggin <nickpiggin@yahoo.com.au> wrote:


> If you can report those configuration options and the symptoms in a
> new thread to lkml that would be helpful. Also if you can work out
> when it started happening, that helps too.


It's CONFIG_ACPI_PROCESSOR that triggers it: when it is compiled as a
module everything works, but when it is built into the kernel one of
the two CPUs doesn't get any processes scheduled. I'll open a new bug
report.
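In a kernel .config, a tristate option such as CONFIG_ACPI_PROCESSOR appears as `=y` (built in), as `=m` (module), or as a `# ... is not set` comment. A small sketch for checking which case a given .config is in; classify_config_line() and config_state() are hypothetical helpers that understand only these line forms:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum cfg_state { CFG_ABSENT, CFG_NOT_SET, CFG_BUILTIN, CFG_MODULE, CFG_OTHER };

/* Classify one .config line for option `name`, e.g. "CONFIG_ACPI_PROCESSOR".
 * Recognised forms: "NAME=y", "NAME=m", "NAME=<other value>" and
 * "# NAME is not set". A trailing newline is ignored. */
static enum cfg_state classify_config_line(const char *line, const char *name)
{
    size_t len;

    assert(line != NULL && name != NULL);
    len = strlen(name);
    if (strncmp(line, name, len) == 0 && line[len] == '=') {
        const char *val = line + len + 1;
        size_t vlen = strcspn(val, "\r\n");

        if (vlen == 1 && val[0] == 'y')
            return CFG_BUILTIN;
        if (vlen == 1 && val[0] == 'm')
            return CFG_MODULE;
        return CFG_OTHER;
    }
    if (strncmp(line, "# ", 2) == 0 && strncmp(line + 2, name, len) == 0 &&
        strncmp(line + 2 + len, " is not set", 11) == 0)
        return CFG_NOT_SET;
    return CFG_ABSENT;
}

/* Scan an open .config stream and report the first match for `name`. */
static enum cfg_state config_state(FILE *f, const char *name)
{
    char buf[512];

    while (fgets(buf, sizeof(buf), f) != NULL) {
        enum cfg_state s = classify_config_line(buf, name);
        if (s != CFG_ABSENT)
            return s;
    }
    return CFG_ABSENT;
}
```

For the setups described above, CFG_MODULE would correspond to the working kernel and CFG_BUILTIN to the one where a CPU got no processes.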

* Re: Oops with current linus' git tree
  2006-01-18  3:23       ` Nick Piggin
@ 2006-01-19 19:31         ` Diego Calleja
  0 siblings, 0 replies; 9+ messages in thread
From: Diego Calleja @ 2006-01-19 19:31 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

On Wed, 18 Jan 2006 14:23:09 +1100,
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > I've been running kaffeine for hours and I haven't triggered it; it's
> > hard to reproduce :/
> > 
> 
> That's what I feared. Thanks for trying though.

OK, I got another oops when closing amarok. It doesn't seem to hit
your debug checks, but I thought it could be related. After the first
oops I enabled ECC event logging in the BIOS and it hasn't recorded
anything, so I doubt the problem is faulty RAM (this is plain 2.6.16-rc1).



Eeek! page_mapcount(page) went negative! (-1)
  page->flags = 400
  page->count = 1
  page->mapping = 00000000
------------[ cut here ]------------
kernel BUG at mm/rmap.c:524!
invalid opcode: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: ipt_REJECT xt_tcpudp radeon lp thermal fan button processor ac ipt_MASQUERADE iptable_nat ip_nat ip_conntrack iptable_filter ip_tables x_tables usbhid ohci_hcd parport_pc parport usbcore pcspkr floppy ide_cd e100 cdrom unix
CPU:    0
EIP:    0060:[<c014ac11>]    Not tainted VLI
EFLAGS: 00010286   (2.6.16-rc1)
EIP is at page_remove_rmap+0x67/0x81
eax: ffffffff   ebx: c1000000   ecx: c03214f8   edx: 00000001
esi: 00000000   edi: b6208000   ebp: e7172ee8   esp: e7172ee4
ds: 007b   es: 007b   ss: 0068
Process amarokapp (pid: 5475, threadinfo=e7172000 task=e7197ac0)
Stack: <0>c1000000 e7172f44 c0145bcd b60ff000 de9832b0 e7172f64 00005ef9 00000000
       00000000 b62cc000 ee834b60 ee834b60 ee834b60 ee818e54 ffffffff ffffffff
       debdf820 c170a680 ee818e04 b62cc000 00000000 c170a680 ee818e04 e44bf440
Call Trace:
 [<c0103e58>] show_stack_log_lvl+0xaa/0xb5
 [<c0103f95>] show_registers+0x132/0x19e
 [<c01042ca>] die+0x168/0x1ed
 [<c010454e>] do_trap+0x7c/0x96
 [<c01047ad>] do_invalid_op+0x89/0x93
 [<c01038e3>] error_code+0x4f/0x54
 [<c0145bcd>] unmap_vmas+0x22d/0x487
 [<c0148900>] unmap_region+0x92/0x116
 [<c0148e97>] do_munmap+0x144/0x19a
 [<c0148f3b>] sys_munmap+0x4e/0x67
 [<c0102d87>] sysenter_past_esp+0x54/0x75
Code: 40 74 03 8b 53 0c 8b 42 04 40 50 68 0f 02 2e c0 e8 52 34 fd ff ff 73 10 68 26 02 2e c0 e8 45 34 fd ff 83 c4 10 8b 43 08 40 79 08 <0f> 0b 0c 02 bb 01 2e c0 83 ca ff b8 10 00 00 00 e8 e7 45 ff ff
 <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0
 [<c010401d>] show_trace+0xd/0xf
 [<c0104034>] dump_stack+0x15/0x17
 [<c0116ce5>] __might_sleep+0x86/0x90
 [<c011e6f3>] profile_task_exit+0x1b/0x47
 [<c011f96a>] do_exit+0x1c/0x72e
 [<c010434f>] do_simd_coprocessor_error+0x0/0x183
 [<c010454e>] do_trap+0x7c/0x96
 [<c01047ad>] do_invalid_op+0x89/0x93
 [<c01038e3>] error_code+0x4f/0x54
 [<c0145bcd>] unmap_vmas+0x22d/0x487
 [<c0148900>] unmap_region+0x92/0x116
 [<c0148e97>] do_munmap+0x144/0x19a
 [<c0148f3b>] sys_munmap+0x4e/0x67
 [<c0102d87>] sysenter_past_esp+0x54/0x75
note: amarokapp[5475] exited with preempt_count 2
scheduling while atomic: amarokapp/0x00000002/5475
 [<c010401d>] show_trace+0xd/0xf
 [<c0104034>] dump_stack+0x15/0x17
 [<c02c1b31>] schedule+0x43/0x7d1
 [<c02c3708>] rwsem_down_read_failed+0x166/0x185
 [<c0132cad>] .text.lock.futex+0x73/0xe6
 [<c0132c2b>] sys_futex+0xa2/0xb1
 [<c011b277>] mm_release+0x5a/0x65
 [<c011eeca>] exit_mm+0x16/0x139
 [<c011face>] do_exit+0x180/0x72e
 [<c010434f>] do_simd_coprocessor_error+0x0/0x183
 [<c010454e>] do_trap+0x7c/0x96
 [<c01047ad>] do_invalid_op+0x89/0x93
 [<c01038e3>] error_code+0x4f/0x54
 [<c0145bcd>] unmap_vmas+0x22d/0x487
 [<c0148900>] unmap_region+0x92/0x116
 [<c0148e97>] do_munmap+0x144/0x19a
 [<c0148f3b>] sys_munmap+0x4e/0x67
 [<c0102d87>] sysenter_past_esp+0x54/0x75

Thread overview: 9+ messages
2006-01-16 18:15 Oops with current linus' git tree Diego Calleja
2006-01-17  4:20 ` Nick Piggin
2006-01-17  4:24   ` Nick Piggin
2006-01-17 13:17   ` Diego Calleja
2006-01-18  0:20     ` Diego Calleja
2006-01-18  3:23       ` Nick Piggin
2006-01-19 19:31         ` Diego Calleja
2006-01-18  3:25     ` Nick Piggin
2006-01-18 14:02       ` Diego Calleja