From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965672AbXCSGYv (ORCPT ); Mon, 19 Mar 2007 02:24:51 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S965691AbXCSGYv (ORCPT ); Mon, 19 Mar 2007 02:24:51 -0400
Received: from mail.suse.de ([195.135.220.2]:54246 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S965672AbXCSGYu (ORCPT ); Mon, 19 Mar 2007 02:24:50 -0400
From: Neil Brown
To: Andrew Morton
Date: Mon, 19 Mar 2007 17:24:28 +1100
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <17918.11420.155569.991473@notabene.brown>
Cc: Peter Zijlstra, Folkert van Heusden, linux-kernel@vger.kernel.org,
	Oleg Nesterov, "J. Bruce Fields"
Subject: Re: [2.6.20] BUG: workqueue leaked lock
In-Reply-To: message from Andrew Morton on Friday March 16
References: <20070313165014.GE31960@vanheusden.com>
	<20070315110628.8bd2c07b.akpm@linux-foundation.org>
	<1174034480.7124.16.camel@twins>
	<20070316033912.8780a9cd.akpm@linux-foundation.org>
X-Mailer: VM 7.19 under Emacs 21.4.1
X-face: [Gw_3E*Gng}4rRrKRYotwlE?.2|**#s9D

> OK. That's not necessarily a bug: one could envisage a (weird) piece of
> code which takes a lock then releases it on a later workqueue invokation.
> But I'm not sure that nfs4_laundromat() is actually supposed to be doing
> anything like that.
>
> Then again, maybe it is: it seems to be waddling through a directory under
> the control of a little state machine, with timeouts.
>
> Neil: help?

I'm quite certain that laundromat_main does *not* leave client_mutex
locked as the last thing it does is call nfs4_unlock_state which is
	mutex_unlock(&client_mutex);
To me, that raises some doubt about whether the lock leak check is
working properly...
It is somewhat harder to track locking of i_mutex, but it seems to me
that every time it is taken, it is released again shortly afterwards.
So I think this must be a problem with leak detection, not with NFSd.

NeilBrown


> On Fri, 16 Mar 2007 09:41:20 +0100 Peter Zijlstra wrote:
>
> > On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote:
> > > > On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden wrote:
> > > > ...
> > > > [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > [ 1756.728392] 2 locks held by nfsd4/3577:
> > > > [ 1756.728435] #0: (client_mutex){--..}, at: [] mutex_lock+0x8/0xa
> > > > [ 1756.728679] #1: (&inode->i_mutex){--..}, at: [] mutex_lock+0x8/0xa
> > > > [ 1756.728923] [] show_trace_log_lvl+0x1a/0x30
> > > > [ 1756.729015] [] show_trace+0x12/0x14
> > > > [ 1756.729103] [] dump_stack+0x16/0x18
> > > > [ 1756.729187] [] run_workqueue+0x167/0x170
> > > > [ 1756.729276] [] worker_thread+0x146/0x165
> > > > [ 1756.729368] [] kthread+0x97/0xc4
> > > > [ 1756.729456] [] kernel_thread_helper+0x7/0x10
> > > > [ 1756.729547] =======================
> > > > [ 1792.436492] svc: unknown version (0 for prog 100003, nfsd)
> > > > [ 1846.683648] BUG: workqueue leaked lock or atomic: nfsd4/0x00000000/3577
> > > > [ 1846.683701] last function: laundromat_main+0x0/0x69 [nfsd]
> > > > [ 1846.683832] 2 locks held by nfsd4/3577:
> > > > [ 1846.683885] #0: (client_mutex){--..}, at: [] mutex_lock+0x8/0xa
> > > > [ 1846.683980] #1: (&inode->i_mutex){--..}, at: [] mutex_lock+0x8/0xa
> > > > [ 1846.683988] [] show_trace_log_lvl+0x1a/0x30
> > > > [ 1846.683994] [] show_trace+0x12/0x14
> > > > [ 1846.683997] [] dump_stack+0x16/0x18
> > > > [ 1846.684001] [] run_workqueue+0x167/0x170
> > > > [ 1846.684006] [] worker_thread+0x146/0x165
> > > > [ 1846.684012] [] kthread+0x97/0xc4
> > > > [ 1846.684023] [] kernel_thread_helper+0x7/0x10
> > >
> > > Oleg, that's a fairly incomprehensible message we have in there. Can you
> > > please explain what it means?
> >
> > I think I'm responsible for this message (commit
> > d5abe669172f20a4129a711de0f250a4e07db298); what is says is that the
> > function executed by the workqueue (here laundromat_main) leaked an
> > atomic context or is still holding locks (2 locks in this case).
> >