linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NULL pointer in kmem_cache_alloc with 2.6.25-rc1
@ 2008-02-15  6:47 Zhang, Yanmin
  2008-02-15 16:42 ` Christoph Lameter
  2008-02-18 12:59 ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Zhang, Yanmin @ 2008-02-15  6:47 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML

On my 16-core tigerton, kernel panic when I ran hackbench process testing. See
below log.

Commandline to start hackbench:
#./hackbench 150 process 2000

Kernel config options:
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set


Kernel panic at line 1637 in file mm/slub.c because object=c->freelist=NULL.

-yanmin

********************syslog***************

lkp-tt01-x8664 login: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff80279948>] kmem_cache_alloc+0x3c/0x6c
PGD 45c5a1067 PUD 45c5a2067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: video output
Pid: 8095, comm: hackbench Not tainted 2.6.25-rc1 #1
RIP: 0010:[<ffffffff80279948>]  [<ffffffff80279948>] kmem_cache_alloc+0x3c/0x6c
RSP: 0018:ffff81045c5b3bf8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810001007d40 RCX: ffffffff805e0024
RDX: 0000000000000000 RSI: 00000000000004d0 RDI: ffffffff80928890
RBP: 00000000000004d0 R08: ffff81045cfa2a80 R09: ffff81045cfa2a80
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000064
R13: ffffffff80928890 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fb63b0ab6f0(0000) GS:ffffffff808ff000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000045c5a0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hackbench (pid: 8095, threadinfo ffff81045c5b2000, task ffff81045c5ae7b0)
Stack:  ffff810437c399c0 0000000000000000 00000000000004d0 ffffffff805e0024
 ffff81045cfdbb80 00000000000004d0 7fffffffffffffff 0000000000000064
 ffff81045c5b3d04 ffffffff805dc389 0000000000000000 ffff81045c5ae7b0
Call Trace:
 [<ffffffff805e0024>] ? __alloc_skb+0x31/0x121
 [<ffffffff805dc389>] ? sock_alloc_send_skb+0x77/0x1d2
 [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff805e12d5>] ? memcpy_fromiovec+0x36/0x66
 [<ffffffff80636ec0>] ? unix_stream_sendmsg+0x165/0x333
 [<ffffffff805d9863>] ? sock_aio_write+0xd1/0xe0
 [<ffffffff80227834>] ? __wake_up_common+0x41/0x74
 [<ffffffff8027d267>] ? do_sync_write+0xc9/0x10c
 [<ffffffff80265fbb>] ? __do_fault+0x382/0x3cd
 [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff80268840>] ? handle_mm_fault+0x38a/0x70d
 [<ffffffff806afd39>] ? error_exit+0x0/0x51
 [<ffffffff80228008>] ? __dequeue_entity+0x1c/0x32
 [<ffffffff8027d9bb>] ? vfs_write+0xc0/0x136
 [<ffffffff8027dee5>] ? sys_write+0x45/0x6e
 [<ffffffff8020bdcb>] ? system_call_after_swapgs+0x7b/0x80



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-15  6:47 NULL pointer in kmem_cache_alloc with 2.6.25-rc1 Zhang, Yanmin
@ 2008-02-15 16:42 ` Christoph Lameter
  2008-02-19  7:03   ` Zhang, Yanmin
  2008-02-18 12:59 ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Lameter @ 2008-02-15 16:42 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: LKML

On Fri, 15 Feb 2008, Zhang, Yanmin wrote:

> On my 16-core tigerton, kernel panic when I ran hackbench process testing. See
> below log.
> 

> Kernel panic at line 1637 in file mm/slub.c because object=c->freelist=NULL.

Hmmmm. freelist should never be NULL. Could you rerun the test and boot with 
slub_debug to make sure that there is no memory corruption?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-15  6:47 NULL pointer in kmem_cache_alloc with 2.6.25-rc1 Zhang, Yanmin
  2008-02-15 16:42 ` Christoph Lameter
@ 2008-02-18 12:59 ` Andrew Morton
  2008-02-18 16:52   ` Arjan van de Ven
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2008-02-18 12:59 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: Christoph Lameter, LKML, Arjan van de Ven, Ingo Molnar

On Fri, 15 Feb 2008 14:47:01 +0800 "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> wrote:

> Call Trace:
>  [<ffffffff805e0024>] ? __alloc_skb+0x31/0x121
>  [<ffffffff805dc389>] ? sock_alloc_send_skb+0x77/0x1d2
>  [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
>  [<ffffffff805e12d5>] ? memcpy_fromiovec+0x36/0x66
>  [<ffffffff80636ec0>] ? unix_stream_sendmsg+0x165/0x333
>  [<ffffffff805d9863>] ? sock_aio_write+0xd1/0xe0
>  [<ffffffff80227834>] ? __wake_up_common+0x41/0x74
>  [<ffffffff8027d267>] ? do_sync_write+0xc9/0x10c
>  [<ffffffff80265fbb>] ? __do_fault+0x382/0x3cd
>  [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
>  [<ffffffff80268840>] ? handle_mm_fault+0x38a/0x70d
>  [<ffffffff806afd39>] ? error_exit+0x0/0x51
>  [<ffffffff80228008>] ? __dequeue_entity+0x1c/0x32
>  [<ffffffff8027d9bb>] ? vfs_write+0xc0/0x136
>  [<ffffffff8027dee5>] ? sys_write+0x45/0x6e
>  [<ffffffff8020bdcb>] ? system_call_after_swapgs+0x7b/0x80

off-topic, but...  Why are all the backtrace decodes here marked as
being unreliable?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-18 12:59 ` Andrew Morton
@ 2008-02-18 16:52   ` Arjan van de Ven
  2008-02-19  3:14     ` Zhang, Yanmin
  0 siblings, 1 reply; 8+ messages in thread
From: Arjan van de Ven @ 2008-02-18 16:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Zhang, Yanmin, Christoph Lameter, LKML, Ingo Molnar

On Mon, 18 Feb 2008 04:59:18 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 15 Feb 2008 14:47:01 +0800 "Zhang, Yanmin"
> <yanmin_zhang@linux.intel.com> wrote:
> 
> > Call Trace:
> >  [<ffffffff805e0024>] ? __alloc_skb+0x31/0x121
> >  [<ffffffff805dc389>] ? sock_alloc_send_skb+0x77/0x1d2
> >  [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
> >  [<ffffffff805e12d5>] ? memcpy_fromiovec+0x36/0x66
> >  [<ffffffff80636ec0>] ? unix_stream_sendmsg+0x165/0x333
> >  [<ffffffff805d9863>] ? sock_aio_write+0xd1/0xe0
> >  [<ffffffff80227834>] ? __wake_up_common+0x41/0x74
> >  [<ffffffff8027d267>] ? do_sync_write+0xc9/0x10c
> >  [<ffffffff80265fbb>] ? __do_fault+0x382/0x3cd
> >  [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
> >  [<ffffffff80268840>] ? handle_mm_fault+0x38a/0x70d
> >  [<ffffffff806afd39>] ? error_exit+0x0/0x51
> >  [<ffffffff80228008>] ? __dequeue_entity+0x1c/0x32
> >  [<ffffffff8027d9bb>] ? vfs_write+0xc0/0x136
> >  [<ffffffff8027dee5>] ? sys_write+0x45/0x6e
> >  [<ffffffff8020bdcb>] ? system_call_after_swapgs+0x7b/0x80
> 
> off-topic, but...  Why are all the backtrace decodes here marked as
> being unreliable?

probably because the stack is a tad confused, so the back tracer doesn't see
even a single good stack frame.

Is CONFIG_FRAME_POINTER on?

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-18 16:52   ` Arjan van de Ven
@ 2008-02-19  3:14     ` Zhang, Yanmin
  0 siblings, 0 replies; 8+ messages in thread
From: Zhang, Yanmin @ 2008-02-19  3:14 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andrew Morton, Christoph Lameter, LKML, Ingo Molnar

On Mon, 2008-02-18 at 08:52 -0800, Arjan van de Ven wrote:
> On Mon, 18 Feb 2008 04:59:18 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Fri, 15 Feb 2008 14:47:01 +0800 "Zhang, Yanmin"
> > <yanmin_zhang@linux.intel.com> wrote:
> > 
> > > Call Trace:
> > >  [<ffffffff805e0024>] ? __alloc_skb+0x31/0x121
> > >  [<ffffffff805dc389>] ? sock_alloc_send_skb+0x77/0x1d2
> > >  [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
> > >  [<ffffffff805e12d5>] ? memcpy_fromiovec+0x36/0x66
> > >  [<ffffffff80636ec0>] ? unix_stream_sendmsg+0x165/0x333
> > >  [<ffffffff805d9863>] ? sock_aio_write+0xd1/0xe0
> > >  [<ffffffff80227834>] ? __wake_up_common+0x41/0x74
> > >  [<ffffffff8027d267>] ? do_sync_write+0xc9/0x10c
> > >  [<ffffffff80265fbb>] ? __do_fault+0x382/0x3cd
> > >  [<ffffffff80243897>] ? autoremove_wake_function+0x0/0x2e
> > >  [<ffffffff80268840>] ? handle_mm_fault+0x38a/0x70d
> > >  [<ffffffff806afd39>] ? error_exit+0x0/0x51
> > >  [<ffffffff80228008>] ? __dequeue_entity+0x1c/0x32
> > >  [<ffffffff8027d9bb>] ? vfs_write+0xc0/0x136
> > >  [<ffffffff8027dee5>] ? sys_write+0x45/0x6e
> > >  [<ffffffff8020bdcb>] ? system_call_after_swapgs+0x7b/0x80
> > 
> > off-topic, but...  Why are all the backtrace decodes here marked as
> > being unreliable?
At least ffffffff80279948 is correct. The register values and ip address looks like
matching the disassembled codes.

> 
> probably because the stack is a tad confused, so the back tracer doesn't see
> even a single good stack frame.
> 
> Is CONFIG_FRAME_POINTER on?
No.

With 2.6.25-rc2, 3 x86-64 machines hit the same issue.

-yanmin



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-15 16:42 ` Christoph Lameter
@ 2008-02-19  7:03   ` Zhang, Yanmin
  2008-02-19  7:38     ` Pekka Enberg
  0 siblings, 1 reply; 8+ messages in thread
From: Zhang, Yanmin @ 2008-02-19  7:03 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: LKML

On Fri, 2008-02-15 at 08:42 -0800, Christoph Lameter wrote:
> On Fri, 15 Feb 2008, Zhang, Yanmin wrote:
> 
> > On my 16-core tigerton, kernel panic when I ran hackbench process testing. See
> > below log.
> > 
> 
> > Kernel panic at line 1637 in file mm/slub.c because object=c->freelist=NULL.
> 
> Hmmmm. freelist should never be NULL. Could you rerun the test and boot with 
> slub_debug to make sure that there is no memory corruption?
1) Without slub_debug option, sometime I could trigger it, sometimes not.
2) With slub_debug option, or just enable debug for slab skbuff_fclone_cache
and skbuff_head_cache, I couldn't trigger it.

I will do more testing and investigation, as the bug also exists in 2.6.25-rc2.

-yanmin



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-19  7:03   ` Zhang, Yanmin
@ 2008-02-19  7:38     ` Pekka Enberg
  2008-02-27 22:15       ` Christoph Lameter
  0 siblings, 1 reply; 8+ messages in thread
From: Pekka Enberg @ 2008-02-19  7:38 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, LKML, Ingo Molnar, Mathieu Desnoyers,
	Linus Torvalds, Torsten Kaiser

Hi Yanmin,

> On Fri, 2008-02-15 at 08:42 -0800, Christoph Lameter wrote:
> > > Kernel panic at line 1637 in file mm/slub.c because object=c->freelist=NULL.
> >
> > Hmmmm. freelist should never be NULL. Could you rerun the test and boot with
> > slub_debug to make sure that there is no memory corruption?

On Feb 19, 2008 9:03 AM, Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> 1) Without slub_debug option, sometime I could trigger it, sometimes not.
> 2) With slub_debug option, or just enable debug for slab skbuff_fclone_cache
> and skbuff_head_cache, I couldn't trigger it.
>
> I will do more testing and investigation, as the bug also exists in 2.6.25-rc2.

Could you please try Ingo's patch: http://lkml.org/lkml/2008/2/19/13

Looks like there are some problems with SLUB_FASTPATH.

                                    Pekka

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NULL pointer in kmem_cache_alloc with 2.6.25-rc1
  2008-02-19  7:38     ` Pekka Enberg
@ 2008-02-27 22:15       ` Christoph Lameter
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Lameter @ 2008-02-27 22:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Zhang, Yanmin, LKML, Ingo Molnar, Mathieu Desnoyers,
	Linus Torvalds, Torsten Kaiser

On Tue, 19 Feb 2008, Pekka Enberg wrote:

> > I will do more testing and investigation, as the bug also exists in 2.6.25-rc2.
> 
> Could you please try Ingo's patch: http://lkml.org/lkml/2008/2/19/13
> 
> Looks like there are some problems with SLUB_FASTPATH.

Yeah that would determine if its the fastpath code. Note that the new 
fastpath code may expose bugs that so far have been tolerated by the slab 
allocators. If a store of a NULL to the first word of an object occurs 
then this will be tolerated in 2.6.24 because slub will take it as the end 
of the object list and just forget about the remaining objects. The 
fastpath code will terminate with a NULL pointer deref.

Enabling full debugging may get rid of the race due to the slowdown. It 
could help to just enable minimal debugging for the slab cache in which we 
see the issue.

So do

slub_debug=F,<affected cache>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-02-27 22:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-15  6:47 NULL pointer in kmem_cache_alloc with 2.6.25-rc1 Zhang, Yanmin
2008-02-15 16:42 ` Christoph Lameter
2008-02-19  7:03   ` Zhang, Yanmin
2008-02-19  7:38     ` Pekka Enberg
2008-02-27 22:15       ` Christoph Lameter
2008-02-18 12:59 ` Andrew Morton
2008-02-18 16:52   ` Arjan van de Ven
2008-02-19  3:14     ` Zhang, Yanmin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).