* panic - Attempting to free lock with active block list
From: Jan-Frode Myklebust @ 2005-01-05 19:57 UTC (permalink / raw)
To: linux-kernel, linux-xfs; +Cc: Eirik Thorsnes
We have a couple of mail servers that first ran 2.6.9-1.681_FC3smp
and were later upgraded to the Fedora test kernel 2.6.10-1.727_FC3smp,
which I think is pretty plain 2.6.10 + ac2. Under both kernels they keep
crashing with the message:
Kernel panic - not syncing: Attempting to free lock with active block list
Any ideas how to attack this?
We're running CentOS 3.3, ext3 for root disks, ext2 on /boot,
XFS for mail spools, and lots of NFS-mounted directories.
-jf
* Re: panic - Attempting to free lock with active block list
From: Chris Wright @ 2005-01-05 20:32 UTC (permalink / raw)
To: linux-kernel, linux-xfs, Eirik Thorsnes, smfrench,
trond.myklebust, matthew
* Jan-Frode Myklebust (Jan-Frode.Myklebust@bccs.uib.no) wrote:
> We have a couple of mail-servers running first 2.6.9-1.681_FC3smp
> and was later upgraded to the Fedora test kernel 2.6.10-1.727_FC3smp
> which I think is pretty plain 2.6.10 + ac2. But they both keep
> crashing with the message:
>
> Kernel panic - not syncing: Attempting to free lock with active block list
>
> Any ideas how to attack this?
>
> We're running Centos 3.3, ext3 for root-disks, ext2 on /boot,
> XFS for mail-spools, lots of nfs-mounted directories..
It seems likely it's NFS-related in this case, since NFS stresses the
fs/locks code differently than local filesystems do. I recall Steve French
reporting a similar issue with CIFS last month.
Message-Id: <1102097193.3540.4.camel@smfhome1.smfdom>
Are those three cases really panic-worthy? Could we change to BUG_ON()
and try and get some useful debugging? Trond, Willy, any ideas?
thanks,
-chris
===== fs/locks.c 1.76 vs edited =====
--- 1.76/fs/locks.c 2005-01-04 18:48:28 -08:00
+++ edited/fs/locks.c 2005-01-05 12:31:34 -08:00
@@ -159,14 +159,20 @@ static inline void locks_free_lock(struc
 		BUG();
 		return;
 	}
-	if (waitqueue_active(&fl->fl_wait))
-		panic("Attempting to free lock with active wait queue");
+	if (waitqueue_active(&fl->fl_wait)) {
+		printk("Attempting to free lock with active wait queue");
+		BUG();
+	}
-	if (!list_empty(&fl->fl_block))
-		panic("Attempting to free lock with active block list");
+	if (!list_empty(&fl->fl_block)) {
+		printk("Attempting to free lock with active block list");
+		BUG();
+	}
-	if (!list_empty(&fl->fl_link))
-		panic("Attempting to free lock on active lock list");
+	if (!list_empty(&fl->fl_link)) {
+		printk("Attempting to free lock on active lock list");
+		BUG();
+	}
 	if (fl->fl_ops) {
 		if (fl->fl_ops->fl_release_private)
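
For readers without a kernel tree at hand, the three conditions tested in
the hunk above can be modelled in user space. This is only a sketch under
stated assumptions: struct model_lock, model_free_check() and the
hand-rolled list are hypothetical stand-ins, not the kernel's struct
file_lock or list_head, and the wait queue is reduced to a boolean. The
point it illustrates is reporting which invariant failed instead of
halting the whole machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a circular list head; "empty" means it points at itself. */
struct list { struct list *next, *prev; };

static void list_init(struct list *l) { l->next = l->prev = l; }
static bool list_is_empty(const struct list *l) { return l->next == l; }

/* Hypothetical reduced lock: only the state the three checks look at. */
struct model_lock {
    bool wait_active;      /* models waitqueue_active(&fl->fl_wait) */
    struct list fl_block;  /* block-list linkage */
    struct list fl_link;   /* linkage on the list of active locks */
};

enum free_check {
    FREE_OK = 0,
    FREE_ACTIVE_WAIT,
    FREE_ACTIVE_BLOCK,
    FREE_ON_LOCK_LIST
};

void model_lock_init(struct model_lock *fl)
{
    assert(fl != NULL);
    fl->wait_active = false;
    list_init(&fl->fl_block);
    list_init(&fl->fl_link);
}

/* Return the first violated invariant, in the same order as the patch. */
enum free_check model_free_check(const struct model_lock *fl)
{
    assert(fl != NULL);
    if (fl->wait_active)
        return FREE_ACTIVE_WAIT;
    if (!list_is_empty(&fl->fl_block))
        return FREE_ACTIVE_BLOCK;
    if (!list_is_empty(&fl->fl_link))
        return FREE_ON_LOCK_LIST;
    return FREE_OK;
}

/* Map a result to the message text used in the thread. */
const char *free_check_message(enum free_check c)
{
    switch (c) {
    case FREE_ACTIVE_WAIT:
        return "Attempting to free lock with active wait queue";
    case FREE_ACTIVE_BLOCK:
        return "Attempting to free lock with active block list";
    case FREE_ON_LOCK_LIST:
        return "Attempting to free lock on active lock list";
    default:
        return "ok";
    }
}
```

A caller would initialise a lock with model_lock_init(), run
model_free_check() just before releasing it, and log
free_check_message() on any non-zero result.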
* Re: panic - Attempting to free lock with active block list
From: Jan-Frode Myklebust @ 2005-01-05 21:38 UTC (permalink / raw)
To: Chris Wright
Cc: linux-kernel, linux-xfs, Eirik Thorsnes, smfrench,
trond.myklebust, matthew
On Wed, Jan 05, 2005 at 12:32:07PM -0800, Chris Wright wrote:
>
> It seems likely it's nfs related in this case since it stresses the
> fs/locks code differently than local filesystems. I recall Steve French
> reporting similar issue with cifs last month.
Also found this on the linux-cifs-client list:
http://lists.samba.org/archive/linux-cifs-client/2004-December/000617.html
Is the suggested fix also relevant for fs/nfs/file.c ?
-jf
* Re: panic - Attempting to free lock with active block list
From: Trond Myklebust @ 2005-01-05 21:54 UTC (permalink / raw)
To: Chris Wright; +Cc: linux-kernel, linux-xfs, Eirik Thorsnes, smfrench, matthew
On Wed, 05.01.2005 at 12:32 (-0800), Chris Wright wrote:
> * Jan-Frode Myklebust (Jan-Frode.Myklebust@bccs.uib.no) wrote:
> > We have a couple of mail-servers running first 2.6.9-1.681_FC3smp
> > and was later upgraded to the Fedora test kernel 2.6.10-1.727_FC3smp
> > which I think is pretty plain 2.6.10 + ac2. But they both keep
> > crashing with the message:
> >
> > Kernel panic - not syncing: Attempting to free lock with active block list
> >
> > Any ideas how to attack this?
Well, the prevailing theory tends to start along the lines of "find out
how to reproduce the problem...". ;-)
Looking at the NFS code, I can attempt a wild guess about what may be
happening: there may be a race when pressing ^C in the middle of a
blocking NFS lock RPC call, and if so, the following patch will fix it.
Try it, and see whether or not it fixes your problem, but if it doesn't,
then I agree with Chris' suggestion of replacing those "panic()" calls
with BUG_ON()s.
Cheers,
Trond
file.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6.10/fs/nfs/file.c
===================================================================
--- linux-2.6.10.orig/fs/nfs/file.c
+++ linux-2.6.10/fs/nfs/file.c
@@ -374,7 +374,7 @@ static int do_setlk(struct file *filp, i
 		 * the process exits.
 		 */
 		if (status == -EINTR || status == -ERESTARTSYS)
-			posix_lock_file(filp, fl);
+			posix_lock_file_wait(filp, fl);
 	} else
 		status = posix_lock_file_wait(filp, fl);
 	unlock_kernel();
--
Trond Myklebust <trond.myklebust@fys.uio.no>
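
The difference between the two calls in the one-line patch can be
illustrated with a small user-space model. It is a sketch only, and it
rests on an assumption about the 2.6.10-era fs/locks.c: that a one-shot
posix_lock_file() on a blocking request returns -EAGAIN with the request
still queued on the holder's block list, while posix_lock_file_wait()
sleeps and takes the request off that list again if the sleep is
interrupted. All names here (model_lock, model_lock_once, model_lock_wait,
model_safe_to_free) are hypothetical, the sleep is replaced by an
`interrupted` flag, and -EINTR stands in for the kernel-internal
-ERESTARTSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a circular doubly linked list. */
struct list { struct list *next, *prev; };

static void list_init(struct list *l) { l->next = l->prev = l; }
static bool list_is_empty(const struct list *l) { return l->next == l; }

static void list_append(struct list *item, struct list *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

static void list_unlink(struct list *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    list_init(item);
}

/* Hypothetical reduced lock request: only the block-list linkage. */
struct model_lock {
    struct list fl_block;
};

void model_lock_init(struct model_lock *fl)
{
    assert(fl != NULL);
    list_init(&fl->fl_block);
}

/* One-shot attempt: on conflict, queue the request behind the holder and
 * return -EAGAIN. Nothing takes it off the queue afterwards. */
int model_lock_once(struct model_lock *holder, struct model_lock *req)
{
    assert(req != NULL);
    if (holder == NULL)
        return 0;                       /* no conflict: granted */
    list_append(&req->fl_block, &holder->fl_block);
    return -EAGAIN;
}

/* Waiting variant: real code would sleep here. `interrupted` says how the
 * sleep ended; false models the holder releasing the lock. Either way the
 * request is off the block list by the time this returns. */
int model_lock_wait(struct model_lock *holder, struct model_lock *req,
                    bool interrupted)
{
    int err = model_lock_once(holder, req);

    if (err != -EAGAIN)
        return err;
    list_unlink(&req->fl_block);
    return interrupted ? -EINTR : 0;
}

/* The block-list condition checked before a lock is freed. */
bool model_safe_to_free(const struct model_lock *fl)
{
    assert(fl != NULL);
    return list_is_empty(&fl->fl_block);
}
```

In this model, a request that went through model_lock_once() against a
held lock is not safe to free, which corresponds to the "active block
list" panic; the same request sent through model_lock_wait() is safe to
free whether or not the wait was interrupted.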
* Re: panic - Attempting to free lock with active block list
From: Jan-Frode Myklebust @ 2005-01-06 15:17 UTC (permalink / raw)
To: Trond Myklebust
Cc: Chris Wright, linux-kernel, linux-xfs, Eirik Thorsnes, smfrench, matthew
On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote:
>
> Looking at the NFS code, I can attempt a wild guess about what may be
> happening: there may be a race when pressing ^C in the middle of a
> blocking NFS lock RPC call, and if so, the following patch will fix it.
A whopping 9 hours of uptime now :) So the one-liner patch seems to have
fixed it.
Thanks!
> - posix_lock_file(filp, fl);
> + posix_lock_file_wait(filp, fl);
-jf
* Re: panic - Attempting to free lock with active block list
From: Anders Saaby @ 2005-01-11 16:09 UTC (permalink / raw)
To: linux-kernel
Hi Myklebust(s) :)
I have seen the exact same error on one of my webservers, which serves
from an NFS export under heavy load: about 2 hours of uptime before it panics.
I then tried Trond's patch, which seems to work. 14 hours of uptime now. :)
Anyways, I have a couple of issues you might be able to clear up for me:
First issue:
New strange message in the kernel log:
"nlmclnt_lock: VFS is out of sync with lock manager!"
What does this mean? Is it bad? What can I do?
Second issue:
My fs/nfs/file.c doesn't look like yours (vanilla 2.6.10):
<fs/nfs/file.c SNIP>
	status = NFS_PROTO(inode)->lock(filp, cmd, fl);
	/* If we were signalled we still need to ensure that
	 * we clean up any state on the server. We therefore
	 * record the lock call as having succeeded in order to
	 * ensure that locks_remove_posix() cleans it out when
	 * the process exits.
	 */
	if (status == -EINTR || status == -ERESTARTSYS)
		posix_lock_file_wait(filp, fl);
	unlock_kernel();
	if (status < 0)
		return status;
	/*
	 * Make sure we clear the cache whenever we try to get the lock.
	 * This makes locking act as a cache coherency point.
	 */
	filemap_fdatawrite(filp->f_mapping);
	down(&inode->i_sem);
	nfs_wb_all(inode);	/* we may have slept */
	up(&inode->i_sem);
	filemap_fdatawait(filp->f_mapping);
	nfs_zap_caches(inode);
	return 0;
</SNIP>
So... Am I missing another patch or something else?
Jan-Frode Myklebust wrote:
> On Wed, Jan 05, 2005 at 10:54:03PM +0100, Trond Myklebust wrote:
>>
>> Looking at the NFS code, I can attempt a wild guess about what may be
>> happening: there may be a race when pressing ^C in the middle of a
>> blocking NFS lock RPC call, and if so, the following patch will fix it.
>
>
> A whopping 9 hours of uptime now :) So the one-liner patch seems to have
> fixed it.
>
> Thanks!
>
>> - posix_lock_file(filp, fl);
>> + posix_lock_file_wait(filp, fl);
>
>
> -jf
--
Med venlig hilsen - Best regards - Meilleures salutations
Anders Saaby
Systems Engineer
------------------------------------------------
Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: as@cohaesio.com - http://www.cohaesio.com
------------------------------------------------