linux-kernel.vger.kernel.org archive mirror
* panic - Attempting to free lock with active block list
@ 2005-01-05 19:57 Jan-Frode Myklebust
  2005-01-05 20:32 ` Chris Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Jan-Frode Myklebust @ 2005-01-05 19:57 UTC
  To: linux-kernel, linux-xfs; +Cc: Eirik Thorsnes

We have a couple of mail servers that first ran 2.6.9-1.681_FC3smp
and were later upgraded to the Fedora test kernel 2.6.10-1.727_FC3smp,
which I think is pretty plain 2.6.10 + ac2. But they both keep
crashing with the message:

       Kernel panic - not syncing: Attempting to free lock with active block list

Any ideas how to attack this?

We're running CentOS 3.3, ext3 for root disks, ext2 on /boot,
XFS for mail spools, and lots of NFS-mounted directories.


  -jf


* Re: panic - Attempting to free lock with active block list
  2005-01-05 19:57 panic - Attempting to free lock with active block list Jan-Frode Myklebust
@ 2005-01-05 20:32 ` Chris Wright
  2005-01-05 21:38   ` Jan-Frode Myklebust
  2005-01-05 21:54   ` Trond Myklebust
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Wright @ 2005-01-05 20:32 UTC
  To: linux-kernel, linux-xfs, Eirik Thorsnes, smfrench,
	trond.myklebust, matthew

* Jan-Frode Myklebust (Jan-Frode.Myklebust@bccs.uib.no) wrote:
> We have a couple of mail servers that first ran 2.6.9-1.681_FC3smp
> and were later upgraded to the Fedora test kernel 2.6.10-1.727_FC3smp,
> which I think is pretty plain 2.6.10 + ac2. But they both keep
> crashing with the message:
> 
>        Kernel panic - not syncing: Attempting to free lock with active block list
> 
> Any ideas how to attack this?
> 
> We're running CentOS 3.3, ext3 for root disks, ext2 on /boot,
> XFS for mail spools, and lots of NFS-mounted directories.

It seems likely that it's NFS related in this case, since NFS stresses
the fs/locks code differently than local filesystems.  I recall Steve
French reporting a similar issue with CIFS last month.

Message-Id: <1102097193.3540.4.camel@smfhome1.smfdom>

Are those three cases really panic-worthy?  Could we change to BUG_ON()
and try and get some useful debugging?  Trond, Willy, any ideas?

thanks,
-chris

===== fs/locks.c 1.76 vs edited =====
--- 1.76/fs/locks.c	2005-01-04 18:48:28 -08:00
+++ edited/fs/locks.c	2005-01-05 12:31:34 -08:00
@@ -159,14 +159,20 @@ static inline void locks_free_lock(struc
 		BUG();
 		return;
 	}
-	if (waitqueue_active(&fl->fl_wait))
-		panic("Attempting to free lock with active wait queue");
+	if (waitqueue_active(&fl->fl_wait)) {
+		printk(KERN_ERR "Attempting to free lock with active wait queue\n");
+		BUG();
+	}
 
-	if (!list_empty(&fl->fl_block))
-		panic("Attempting to free lock with active block list");
+	if (!list_empty(&fl->fl_block)) {
+		printk(KERN_ERR "Attempting to free lock with active block list\n");
+		BUG();
+	}
 
-	if (!list_empty(&fl->fl_link))
-		panic("Attempting to free lock on active lock list");
+	if (!list_empty(&fl->fl_link)) {
+		printk(KERN_ERR "Attempting to free lock on active lock list\n");
+		BUG();
+	}
 
 	if (fl->fl_ops) {
 		if (fl->fl_ops->fl_release_private)
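
The three checks touched by the patch above can be modelled in userspace.
The sketch below is an illustration only, not kernel code: struct
model_lock, check_free(), free_check_message() and the list_* helpers are
hypothetical stand-ins, and the integer "waiters" field replaces
waitqueue_active(). It reports which check would fire instead of calling
panic() or BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Link node n in right after head. */
static void list_add_node(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Hypothetical model of the fields that locks_free_lock() inspects. */
struct model_lock {
    int waiters;               /* assumption: models waitqueue_active() */
    struct list_head fl_block; /* blocked-waiter list */
    struct list_head fl_link;  /* membership in the global lock list */
};

static void model_lock_init(struct model_lock *fl)
{
    fl->waiters = 0;
    list_init(&fl->fl_block);
    list_init(&fl->fl_link);
}

enum free_check {
    FREE_OK,
    FREE_ACTIVE_WAITQ,
    FREE_ACTIVE_BLOCK,
    FREE_ACTIVE_LINK
};

/* Same order of checks as in the patched function. */
static enum free_check check_free(const struct model_lock *fl)
{
    if (fl->waiters > 0)
        return FREE_ACTIVE_WAITQ;
    if (!list_is_empty(&fl->fl_block))
        return FREE_ACTIVE_BLOCK;
    if (!list_is_empty(&fl->fl_link))
        return FREE_ACTIVE_LINK;
    return FREE_OK;
}

/* Message text taken from the patch; NULL means the lock may be freed. */
static const char *free_check_message(enum free_check c)
{
    switch (c) {
    case FREE_ACTIVE_WAITQ:
        return "Attempting to free lock with active wait queue";
    case FREE_ACTIVE_BLOCK:
        return "Attempting to free lock with active block list";
    case FREE_ACTIVE_LINK:
        return "Attempting to free lock on active lock list";
    default:
        return NULL;
    }
}
```

In this model a lock whose fl_block is still linked to another lock
reports FREE_ACTIVE_BLOCK, which corresponds to the message seen in the
original report.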


* Re: panic - Attempting to free lock with active block list
  2005-01-05 20:32 ` Chris Wright
@ 2005-01-05 21:38   ` Jan-Frode Myklebust
  2005-01-05 21:54   ` Trond Myklebust
  1 sibling, 0 replies; 6+ messages in thread
From: Jan-Frode Myklebust @ 2005-01-05 21:38 UTC
  To: Chris Wright
  Cc: linux-kernel, linux-xfs, Eirik Thorsnes, smfrench,
	trond.myklebust, matthew

On Wed, Jan 05, 2005 at 12:32:07PM -0800, Chris Wright wrote:
> 
> It seems likely that it's NFS related in this case, since NFS stresses
> the fs/locks code differently than local filesystems.  I recall Steve
> French reporting a similar issue with CIFS last month.

Also found this on the linux-cifs-client list:

	http://lists.samba.org/archive/linux-cifs-client/2004-December/000617.html

Is the suggested fix also relevant for fs/nfs/file.c ?

 
   -jf


* Re: panic - Attempting to free lock with active block list
  2005-01-05 20:32 ` Chris Wright
  2005-01-05 21:38   ` Jan-Frode Myklebust
@ 2005-01-05 21:54   ` Trond Myklebust
  2005-01-06 15:17     ` Jan-Frode Myklebust
  1 sibling, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2005-01-05 21:54 UTC
  To: Chris Wright; +Cc: linux-kernel, linux-xfs, Eirik Thorsnes, smfrench, matthew

On Wed, 05.01.2005 at 12:32 (-0800), Chris Wright wrote:
> * Jan-Frode Myklebust (Jan-Frode.Myklebust@bccs.uib.no) wrote:
> > We have a couple of mail servers that first ran 2.6.9-1.681_FC3smp
> > and were later upgraded to the Fedora test kernel 2.6.10-1.727_FC3smp,
> > which I think is pretty plain 2.6.10 + ac2. But they both keep
> > crashing with the message:
> > 
> >        Kernel panic - not syncing: Attempting to free lock with active block list
> > 
> > Any ideas how to attack this?

Well, the prevailing theory tends to start along the lines of "find out
how to reproduce the problem...". ;-)

Looking at the NFS code, I can attempt a wild guess about what may be
happening: there may be a race when pressing ^C in the middle of a
blocking NFS lock RPC call, and if so, the following patch will fix it.

Try it, and see whether or not it fixes your problem, but if it doesn't,
then I agree with Chris' suggestion of replacing those "panic()" calls
with BUG_ON()s.

Cheers,
  Trond

 file.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.10/fs/nfs/file.c
===================================================================
--- linux-2.6.10.orig/fs/nfs/file.c
+++ linux-2.6.10/fs/nfs/file.c
@@ -374,7 +374,7 @@ static int do_setlk(struct file *filp, i
 		 * the process exits.
 		 */
 		if (status == -EINTR || status == -ERESTARTSYS)
-			posix_lock_file(filp, fl);
+			posix_lock_file_wait(filp, fl);
 	} else
 		status = posix_lock_file_wait(filp, fl);
 	unlock_kernel();


-- 
Trond Myklebust <trond.myklebust@fys.uio.no>
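
One plausible reading of why the one-line change above matters can be
sketched in userspace. It rests on an assumption that is not stated in
the thread: that a conflicting non-waiting call queues the request on
the blocker's fl_block list and returns -EAGAIN, while the waiting
variant dequeues the request before returning on interruption. All
names below (model_lock, model_lock_file, model_lock_file_wait,
safe_to_free) are hypothetical and only mimic that assumed behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_node(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n and leave it self-linked so list_is_empty(n) holds again. */
static void list_del_node(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* fl_block is the waiter list head on a blocker and the link on a waiter. */
struct model_lock { struct list_head fl_block; };

static void model_lock_init(struct model_lock *fl) { list_init(&fl->fl_block); }

/* Assumed non-waiting behaviour: on conflict, queue and return -EAGAIN. */
static int model_lock_file(struct model_lock *blocker, struct model_lock *req)
{
    if (blocker == NULL)
        return 0; /* no conflict, lock granted */
    list_add_node(&req->fl_block, &blocker->fl_block);
    return -EAGAIN;
}

/* Assumed waiting behaviour: never returns with req still queued. */
static int model_lock_file_wait(struct model_lock *blocker,
                                struct model_lock *req, bool interrupted)
{
    int error = model_lock_file(blocker, req);

    if (error != -EAGAIN)
        return error;
    /* Either a signal arrived or the blocker went away; dequeue in both cases. */
    list_del_node(&req->fl_block);
    return interrupted ? -EINTR : 0;
}

/* Models the "active block list" check made before a lock is freed. */
static bool safe_to_free(const struct model_lock *req)
{
    return list_is_empty(&req->fl_block);
}
```

Under these assumptions, a caller that ignores the -EAGAIN from the
non-waiting call and then frees the request would trip the "active
block list" check, whereas the waiting variant always leaves the
request unlinked.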



* Re: panic - Attempting to free lock with active block list
  2005-01-05 21:54   ` Trond Myklebust
@ 2005-01-06 15:17     ` Jan-Frode Myklebust
  0 siblings, 0 replies; 6+ messages in thread
From: Jan-Frode Myklebust @ 2005-01-06 15:17 UTC
  To: Trond Myklebust
  Cc: Chris Wright, linux-kernel, linux-xfs, Eirik Thorsnes, smfrench, matthew

On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote:
> 
> Looking at the NFS code, I can attempt a wild guess about what may be
> happening: there may be a race when pressing ^C in the middle of a
> blocking NFS lock RPC call, and if so, the following patch will fix it.


A whopping 9 hours of uptime now :) So the one-liner patch seems to have 
fixed it.

Thanks!

> -			posix_lock_file(filp, fl);
> +			posix_lock_file_wait(filp, fl);


  -jf


* Re: panic - Attempting to free lock with active block list
@ 2005-01-11 16:09 Anders Saaby
  0 siblings, 0 replies; 6+ messages in thread
From: Anders Saaby @ 2005-01-11 16:09 UTC
  To: linux-kernel

Hi Myklebust(s) :)

I have seen the exact same error on one of my webservers which is serving
from an NFS export and under heavy load. ~2 hours uptime before panic'ing.
I then tried Trond's patch which seems to work. 14 hours of uptime now. :)

Anyways, I have a couple of issues you might be able to clear up for me:

First issue:
New strange message in the kernel log:

"nlmclnt_lock: VFS is out of sync with lock manager!"

What does this mean? Is it bad? What can I do?


Second issue:
My fs/nfs/file.c doesn't look like yours (vanilla 2.6.10):

<fs/nfs/file.c SNIP>
        status = NFS_PROTO(inode)->lock(filp, cmd, fl);
        /* If we were signalled we still need to ensure that
         * we clean up any state on the server. We therefore
         * record the lock call as having succeeded in order to
         * ensure that locks_remove_posix() cleans it out when
         * the process exits.
         */
        if (status == -EINTR || status == -ERESTARTSYS)
                posix_lock_file_wait(filp, fl);
        unlock_kernel();
        if (status < 0)
                return status;
        /*
         * Make sure we clear the cache whenever we try to get the lock.
         * This makes locking act as a cache coherency point.
         */
        filemap_fdatawrite(filp->f_mapping);
        down(&inode->i_sem);
        nfs_wb_all(inode);      /* we may have slept */
        up(&inode->i_sem);
        filemap_fdatawait(filp->f_mapping);
        nfs_zap_caches(inode);
        return 0;
</SNIP>

So... Am I missing another patch or something else?

Jan-Frode Myklebust wrote:

> On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote:
>> 
>> Looking at the NFS code, I can attempt a wild guess about what may be
>> happening: there may be a race when pressing ^C in the middle of a
>> blocking NFS lock RPC call, and if so, the following patch will fix it.
> 
> 
> A whopping 9 hours of uptime now :) So the one-liner patch seems to have
> fixed it.
> 
> Thanks!
> 
>> -   posix_lock_file(filp, fl);
>> +   posix_lock_file_wait(filp, fl);
> 
> 
>   -jf

-- 
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@cohaesio.com - http://www.cohaesio.com
------------------------------------------------

