Date: Sun, 19 Jun 2011 22:43:29 +0200
From: Marcin Slusarz
To: Thomas Gleixner
Cc: Catalin Marinas, Tejun Heo, LKML, Dipankar Sarma, "Paul E. McKenney", Andrew Morton
Subject: Re: [PATCH] debugobjects: fix boot crash when both kmemleak and debugobjects are enabled (was: Re: early kernel crash when kmemleak is enabled)
Message-ID: <20110619204329.GA4771@joi.lan>
References: <20110515105505.GA21631@joi.lan> <20110519134218.GH627@htj.dyndns.org>
 <1305812924.26710.41.camel@e102109-lin.cambridge.arm.com> <20110519135425.GI627@htj.dyndns.org>
 <1305814133.26710.69.camel@e102109-lin.cambridge.arm.com> <20110527202503.GA2769@joi.lan>
 <20110528112342.GA3068@joi.lan>
In-Reply-To: <20110528112342.GA3068@joi.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 28, 2011 at 01:23:42PM +0200, Marcin Slusarz wrote:
> On Fri, May 27, 2011 at 10:37:54PM +0200, Thomas Gleixner wrote:
> > On Fri, 27 May 2011, Marcin Slusarz wrote:
> > > On Thu, May 19, 2011 at 03:08:53PM +0100, Catalin Marinas wrote:
> > > > On Thu, 2011-05-19 at 14:54 +0100, Tejun Heo wrote:
> > > > > On Thu, May 19, 2011 at 02:48:44PM +0100, Catalin Marinas wrote:
> > > > > > Thanks for tracking this down. Untested (I can add a log afterwards):
> > > > > >
> > > > > > diff --git a/init/main.c b/init/main.c
> > > > > > index 4a9479e..48df882 100644
> > > > > > --- a/init/main.c
> > > > > > +++ b/init/main.c
> > > > > > @@ -580,8 +580,8 @@ asmlinkage void __init start_kernel(void)
> > > > > >  #endif
> > > > > >  	page_cgroup_init();
> > > > > >  	enable_debug_pagealloc();
> > > > > > -	kmemleak_init();
> > > > > >  	debug_objects_mem_init();
> > > > > > +	kmemleak_init();
> > > > > >  	setup_per_cpu_pageset();
> > > > > >  	numa_policy_init();
> > > > > >  	if (late_time_init)
> > > > >
> > > > > Heh, that was swift. Yeap, seems to work here. Please feel free to
> > > > > add my Tested-by.
> > > >
> > > > Thanks. I have two other minor kmemleak fixes, so I'll send Linus a pull
> > > > request in the next day or so.
> > > >
> > >
> > > With this patch applied kernel didn't panic, but kmemleak did not work either:
> > >
> > > kmemleak: Early log buffer exceeded, please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
> > > kmemleak: Kernel memory leak detector disabled
> > >
> > > I increased DEBUG_KMEMLEAK_EARLY_LOG_SIZE from 400 to 1000, and it crashed in
> > > exactly the same way:
> >
> > ...
> >
> > > The problem is: debugobjects want to use workqueues (system_wq actually), but they
> > > are initialized much later in a boot process.
> > >
> > > Attached patch fixes this issue for me.
> > >
> > > diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> > > index 9d86e45..a78b7c6 100644
> > > --- a/lib/debugobjects.c
> > > +++ b/lib/debugobjects.c
> > > @@ -198,7 +198,7 @@ static void free_object(struct debug_obj *obj)
> > >  	 * initialized:
> > >  	 */
> > >  	if (obj_pool_free > ODEBUG_POOL_SIZE && obj_cache)
> > > -		sched = !work_pending(&debug_obj_work);
> > > +		sched = keventd_up() && !work_pending(&debug_obj_work);
> > >  	hlist_add_head(&obj->node, &obj_pool);
> > >  	obj_pool_free++;
> > >  	obj_pool_used--;
> > >
> >
> > Sigh, yes. Care to resend with changelog and signed-off-by ?
>
> Sure.
>
> ---
> From: Marcin Slusarz
> Subject: [PATCH] debugobjects: fix boot crash when both kmemleak and debugobjects are enabled
>
> order of initialization look like this:
>
> ...
> debugobjects
> kmemleak
> ...(lots of other subsystems)...
> workqueues (through early initcall)
> ...
>
> debugobjects use schedule_work for batch freeing of its data and kmemleak
> heavily use debugobjects, so when it comes to freeing and workqueues were
> not initialized yet, kernel crashes:
>
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [] __queue_work+0x29/0x41a
> PGD 0
> Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> (...)
> Pid: 1, comm: swapper Not tainted 2.6.39-rc4-nv+ #721 Bochs Bochs
> RIP: 0010:[] [] __queue_work+0x29/0x41a
> (...)
> Call Trace:
>
> [] queue_work_on+0x16/0x1d
> [] queue_work+0x29/0x55
> [] schedule_work+0x13/0x15
> [] free_object+0x90/0x95
> [] debug_check_no_obj_freed+0x187/0x1d3
> [] ? _raw_spin_unlock_irqrestore+0x30/0x4d
> [] ? free_object_rcu+0x68/0x6d
> [] kmem_cache_free+0x64/0x12c
> [] free_object_rcu+0x68/0x6d
> [] __rcu_process_callbacks+0x1b6/0x2d9
> [] ? tick_handle_periodic+0x1f/0x6c
> [] rcu_process_callbacks+0x7b/0x83
> [] __do_softirq+0x117/0x207
> [] ? handle_irq_event+0x47/0x5c
> [] call_softirq+0x1c/0x30
> [] do_softirq+0x38/0x80
> [] irq_exit+0x4e/0xa0
> [] do_IRQ+0x97/0xae
> [] common_interrupt+0x13/0x13
>
> [] ? delay_tsc+0x48/0xcb
> [] __const_udelay+0x25/0x27
> [] timer_irq_works+0x3c/0x77
> [] setup_IO_APIC+0x337/0x755
> [] native_smp_prepare_cpus+0x3a0/0x451
> [] ? _raw_spin_unlock_irq+0x19/0x34
> [] kernel_init+0x4e/0x135
> [] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [] kernel_thread_helper+0x4/0x10
> [] ? finish_task_switch+0x5a/0xcb
> [] ? _raw_spin_unlock_irq+0x19/0x34
> [] ? retint_restore_args+0xe/0xe
> [] ? parse_early_options+0x20/0x20
> [] ? gs_change+0xb/0xb
> Code: c9 c3 55 48 89 e5 41 57 41 56 41 55 49 89 f5 41 54 48 c7 c6 a0 b7 a3 81 53 41 89 fc 48 83 ec 28 48 89 d3 48 89 d7 e8 63 d7 1b 00
> f6 45 00 40 0f 84 6b 01 00 00 b8 09 00 00 00 83 3d 28 10 a0
> RIP [] __queue_work+0x29/0x41a
> RSP
> CR2: 0000000000000000
> ---[ end trace 4eaa2a86a8e2da22 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
> ...because system_wq is NULL.
>
> Fix it by checking if workqueues susbystem was initialized before using.
>
> Signed-off-by: Marcin Slusarz
> Cc: Thomas Gleixner
> Cc: Tejun Heo
> Cc: Catalin Marinas
> Cc: stable@kernel.org
> ---
>  lib/debugobjects.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 9d86e45..a78b7c6 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -198,7 +198,7 @@ static void free_object(struct debug_obj *obj)
>  	 * initialized:
>  	 */
>  	if (obj_pool_free > ODEBUG_POOL_SIZE && obj_cache)
> -		sched = !work_pending(&debug_obj_work);
> +		sched = keventd_up() && !work_pending(&debug_obj_work);
>  	hlist_add_head(&obj->node, &obj_pool);
>  	obj_pool_free++;
>  	obj_pool_used--;
> --

What's up with this patch? I can't find it in -next...

Marcin
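
For anyone trying to follow the changelog without a kernel tree at hand, the
guard can be modelled in plain userspace C. This is only a rough sketch and
not the kernel code. It assumes that system_queue stands in for system_wq,
queue_is_up() for keventd_up(), POOL_LIMIT for ODEBUG_POOL_SIZE and
release_object() for free_object(); all of these names are hypothetical, and
locking and the real object pool are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_LIMIT 4                    /* hypothetical stand-in for ODEBUG_POOL_SIZE */

struct workqueue { int queued; };       /* counts how many work items were queued */
struct work { bool pending; };          /* models the "work pending" state */

static struct workqueue queue_storage;
static struct workqueue *system_queue;  /* NULL until late init, like system_wq at boot */
static struct work pool_work;           /* models debug_obj_work */
static int pool_free;                   /* models obj_pool_free */

/* Models keventd_up(): true once the workqueue subsystem is initialised. */
static bool queue_is_up(void)
{
	return system_queue != NULL;
}

/* Models the (much later) workqueue initialisation during boot. */
static void init_workqueue(void)
{
	system_queue = &queue_storage;
}

/* Models schedule_work(): it needs a valid queue, which is what crashed at boot. */
static void schedule_pool_work(struct work *w)
{
	assert(system_queue != NULL);
	w->pending = true;
	system_queue->queued++;
}

/* Models free_object(): returns true if batch freeing was scheduled. */
static bool release_object(void)
{
	bool sched = false;

	/* The fix: consult the queue state before deciding to schedule. */
	if (pool_free > POOL_LIMIT)
		sched = queue_is_up() && !pool_work.pending;
	pool_free++;

	if (sched)
		schedule_pool_work(&pool_work);
	return sched;
}
```

In this model, calling release_object() before init_workqueue() only grows the
pool. The first call after init_workqueue() queues the work once, and later
calls skip it while the work is still pending. Without the queue_is_up() check
the early calls would reach schedule_pool_work() with a NULL queue, which is
the userspace analogue of the oops quoted above.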