* NFS client fcntl locks going missing on FC6
@ 2007-02-23 23:16 M A Young
  2007-02-24 17:19 ` Steve Dickson
  0 siblings, 1 reply; 5+ messages in thread
From: M A Young @ 2007-02-23 23:16 UTC
  To: nfs

We have been having problems with files getting mysteriously locked on a
Linux NFS client of a NetApp NFS server (most recently with the
kernel-2.6.19-1.2911.fc6 package and ONTAP 7.0.5). After some
investigation, it seems that the Linux box is getting a lock but not
recording it, with the result that the lock is never removed from the
NetApp server. I reported this in Red Hat's bugzilla
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229469
and haven't heard anything from them, but it is possible that it is more
general than just a Red Hat issue.
I have found a way to reproduce the locks being left on the server. It
seems to occur when an attempt to lock a file is cancelled and a second
attempt to lock the file immediately afterwards succeeds. The result is
that the server locks the file but the client does not, and thus the lock
is never removed from the server.
My test program to reproduce the bug is attached to the Red Hat bug report
at https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=148693 and the
bug can be reproduced by running the program twice concurrently from an
NFS mount.
Is this a known bug and is it easy to fix?

	Michael Young
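
The sequence described above (a blocking lock request that is interrupted, followed by a retry that succeeds, followed by an unlock) can be sketched as follows. This is not the test program attached to the bug report; it is a minimal illustration in Python, and the helper names (`try_blocking_lock`, `demo`) and the use of SIGALRM to cancel the first attempt are assumptions made for the example. On a local filesystem it only shows the call sequence and does not trigger the problem.

```python
import fcntl
import os
import signal


class LockInterrupted(Exception):
    """Raised by the SIGALRM handler to abort a blocked lock request."""


def _on_alarm(signum, frame):
    raise LockInterrupted()


def try_blocking_lock(fd, timeout):
    """Request an exclusive fcntl lock, giving up after `timeout` seconds.

    Returns True if the lock was granted, False if the request was interrupted.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)      # blocking request (F_SETLKW)
        return True
    except LockInterrupted:
        return False
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def demo(path):
    """Run the interrupted-then-retried sequence against a competing holder."""
    ready_r, ready_w = os.pipe()            # child -> parent: "I hold the lock"
    go_r, go_w = os.pipe()                  # parent -> child: "release it now"
    pid = os.fork()
    if pid == 0:                            # child: the competing lock holder
        try:
            os.close(ready_r)
            os.close(go_w)
            child_fd = os.open(path, os.O_RDWR)
            fcntl.lockf(child_fd, fcntl.LOCK_EX)
            os.write(ready_w, b"1")
            os.read(go_r, 1)                # hold the lock until told to release
            fcntl.lockf(child_fd, fcntl.LOCK_UN)
        finally:
            os._exit(0)
    os.close(ready_w)
    os.close(go_r)
    fd = os.open(path, os.O_RDWR)
    os.read(ready_r, 1)                     # wait until the child holds the lock
    first = try_blocking_lock(fd, 0.2)      # blocks, then is interrupted
    os.write(go_w, b"1")
    os.waitpid(pid, 0)                      # holder has unlocked and exited
    second = try_blocking_lock(fd, 5.0)     # retry on the same file succeeds
    fcntl.lockf(fd, fcntl.LOCK_UN)          # the unlock step of the sequence
    for descriptor in (fd, ready_r, go_w):
        os.close(descriptor)
    return first, second
```

To exercise it against NFS, `path` would have to name a file on an NFS mount; the competing holder here is a forked child on the same machine, which is a simplification.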

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: NFS client fcntl locks going missing on FC6
  2007-02-23 23:16 NFS client fcntl locks going missing on FC6 M A Young
@ 2007-02-24 17:19 ` Steve Dickson
  2007-02-24 18:06   ` M A Young
  0 siblings, 1 reply; 5+ messages in thread
From: Steve Dickson @ 2007-02-24 17:19 UTC
  To: M A Young; +Cc: nfs

M A Young wrote:
> We have been having problems with files getting mysteriously locked on a
> Linux NFS client of a NetApp NFS server (most recently with the
> kernel-2.6.19-1.2911.fc6 package and ONTAP 7.0.5). After some
> investigation, it seems that the Linux box is getting a lock but not
> recording it, with the result that the lock is never removed from the
> NetApp server. I reported this in Red Hat's bugzilla
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=229469
> and haven't heard anything from them, but it is possible that it is more
> general than just a Red Hat issue.
In the future, when opening bugs like this (bugs against
the kernel), please make sure you add steved@redhat.com
to the cc list. Otherwise the bug will end up in bug
purgatory for a while until it gets assigned correctly...

> I have found a way to reproduce the locks being left on the server. It
> seems to occur when an attempt to lock a file is cancelled and a second
> attempt to lock the file immediately afterwards succeeds. The result is
> that the server locks the file but the client does not, and thus the lock
> is never removed from the server.
> My test program to reproduce the bug is attached to the Red Hat bug report
> at https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=148693 and the
> bug can be reproduced by running the program twice concurrently from an
> NFS mount.
> Is this a known bug and is it easy to fix?
Way back when... we had a customer using our 2.4 kernel who was
able to "lose" locks by using some goofy java script...
By lose locks I mean the server would grant the lock,
but the lock was never recorded locally due to some obscure
failure (like a conflicting lock). The failure was never
noted, so an unlock was never sent... thus leaving a lost lock
on the server.

We thought we fixed this by making the client send an unlock when
this obscure failure happens, but his goofy java script was still
able to lose locks, though much less frequently (once every few months)...
Ultimately the fix was using the above kernel patch and having
the script do an unlock() (ignoring the return status) and then
do the lock... (Can you say hack! :-\ )
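
The script-side half of that workaround (unlock first, ignore the result, then lock) might look roughly like this. It is only a sketch in Python using the standard `fcntl` module; the function name `unlock_then_lock` is made up for the example, and it does not show the kernel-side patch.

```python
import fcntl


def unlock_then_lock(fd):
    """Unlock first (ignoring any error), then take the lock.

    Mirrors the workaround described above: the unlock is meant to clear a
    lock that may have been left behind earlier, so its result is ignored.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_UN)      # may be a no-op; errors are ignored
    except OSError:
        pass
    fcntl.lockf(fd, fcntl.LOCK_EX)          # blocking exclusive lock
    return True
```

Whether the preliminary unlock actually reaches the server when the client believes it holds no lock depends on the client implementation, so this should be read as an illustration of the idea rather than a guaranteed fix.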


Looking at the 2.6.20 kernel, it appears some precautions were
taken for this race, since the lock is checked locally for any
conflicts (i.e. do_vfs_lock() is called with FL_ACCESS set in fl_flags),
but there still might be a problem: if the second call to
do_vfs_lock() (the one that actually records the lock in the VFS)
fails, an unlock is *not* sent back to the server....
This error condition will, however, be noted with the
following syslog message:
     "nlmclnt_lock: VFS is out of sync with lock manager!"

So are you seeing this type of message in /var/log/messages?

steved.
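
The three steps described above (local conflict check, server grant, local recording) and the way a failure in the last step can leave a lock on the server can be sketched with a toy model. This is plain Python, not kernel code; the function name `client_lock`, the use of sets as lock tables, and the `record_fails` switch are all assumptions made purely for illustration.

```python
OUT_OF_SYNC = "VFS is out of sync with lock manager!"


def client_lock(server_locks, local_locks, key, record_fails=False):
    """Toy model of the lock path described above.

    server_locks, local_locks: sets of lock keys held on each side.
    record_fails: simulate a failure of the final local bookkeeping step.
    Returns (status, log_lines).
    """
    log = []
    # Step 1: check locally for conflicts only (the FL_ACCESS-style probe).
    if key in local_locks:
        return "conflict", log
    # Step 2: the server grants the lock.
    server_locks.add(key)
    # Step 3: record the lock locally.
    if record_fails:
        # As described above, no unlock is sent; the failure is only logged.
        log.append(OUT_OF_SYNC)
        return "out-of-sync", log
    local_locks.add(key)
    return "granted", log
```

With `record_fails=True` the key ends up in `server_locks` but not in `local_locks`, which is the "lost lock" situation: nothing on the client will ever ask the server to release it.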


* Re: NFS client fcntl locks going missing on FC6
  2007-02-24 17:19 ` Steve Dickson
@ 2007-02-24 18:06   ` M A Young
  2007-02-25  0:32     ` M A Young
  0 siblings, 1 reply; 5+ messages in thread
From: M A Young @ 2007-02-24 18:06 UTC
  To: Steve Dickson; +Cc: nfs

On Sat, 24 Feb 2007, Steve Dickson wrote:

> Looking at the 2.6.20 kernel, it appears some precautions were
> taken for this race, since the lock is checked locally for any
> conflicts (i.e. do_vfs_lock() is called with FL_ACCESS set in fl_flags),
> but there still might be a problem: if the second call to
> do_vfs_lock() (the one that actually records the lock in the VFS)
> fails, an unlock is *not* sent back to the server....
> This error condition will, however, be noted with the
> following syslog message:
>      "nlmclnt_lock: VFS is out of sync with lock manager!"
>
> So are you seeing this type of message in /var/log/messages?

I have not had a chance to try a 2.6.20 kernel, but with the 2.6.19 Fedora
kernels we do see a lot of errors like
do_vfs_lock: VFS is out of sync with lock manager!

I have been doing more checking, and I couldn't successfully repeat my
experiment by running my program twice on a single NFS client, but running
it on two different NFS clients at the same time reproduces the problem
every time. The type of NFS server doesn't seem to matter, as I have also
reproduced the problem with a Linux NFS server.

	Michael Young


* Re: NFS client fcntl locks going missing on FC6
  2007-02-24 18:06   ` M A Young
@ 2007-02-25  0:32     ` M A Young
  2007-03-01  0:51       ` M A Young
  0 siblings, 1 reply; 5+ messages in thread
From: M A Young @ 2007-02-25  0:32 UTC
  To: Steve Dickson; +Cc: nfs

It seems I have mistaken the problem somewhat. The problem isn't, as I
first thought, that the lock isn't being created; further checking shows
that the lock is created on both the client and the server. What goes
wrong in the case I have been seeing is that the unlock request sent to
the server carries a new svid, and so it doesn't remove the lock.

My new theory is that the problem occurs when a lock on a file fails the
first time (because the lock is blocked and the system call is
subsequently interrupted) but a subsequent locking attempt on the same
file succeeds. Somehow the system sees traces of the initial failure, and
as a result the svid isn't recorded correctly.

	Michael Young


* Re: NFS client fcntl locks going missing on FC6
  2007-02-25  0:32     ` M A Young
@ 2007-03-01  0:51       ` M A Young
  0 siblings, 0 replies; 5+ messages in thread
From: M A Young @ 2007-03-01  0:51 UTC
  To: nfs; +Cc: Steve Dickson

I have found out what was causing the bug I was seeing. The do_setlk
function of fs/nfs/file.c creates a local lock if the remote lock attempt
was blocked and then interrupted, supposedly so that the remote lock is
removed when the process exits. Unfortunately, by this stage the lock has
forgotten how to do this. If the process tries again for the same
remote lock and succeeds in getting it, the system reuses the existing
local-only lock rather than creating a new one, which means that there is
apparently nothing using the owner records storing the (new) svid, and
this information is discarded. As a result, when the file is unlocked,
there is no record of the svid that was used, so the unlock attempt uses a
new svid, which of course fails to match, and the file is never unlocked.
I have attached a patch to the Red Hat bugzilla bug (229469)
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=148974
which under limited testing fixes the problem I was seeing by not creating
the local lock if it has no hope of removing any possible remote one,
though I suspect more thought could yield a better patch.

	Michael Young
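
The bookkeeping failure described above can be illustrated with a small toy model. It is plain Python and deliberately simplified; it is neither the kernel code nor the patch attached to the bug. The class names, the `keep_local_lock_on_interrupt` switch (standing in for the behaviour the patch changes) and the representation of a lock as a `(path, svid)` pair are all assumptions made for the example.

```python
from itertools import count


class ToyServer:
    def __init__(self):
        self.locks = set()                  # (path, svid) pairs currently held

    def lock(self, path, svid):
        self.locks.add((path, svid))

    def unlock(self, path, svid):
        # An unlock whose svid matches no held lock removes nothing.
        self.locks.discard((path, svid))


class ToyClient:
    def __init__(self, server, keep_local_lock_on_interrupt):
        self.server = server
        self.keep_local_lock_on_interrupt = keep_local_lock_on_interrupt
        self._next_svid = count(1)
        self.local = {}                     # path -> svid, or None if unknown

    def lock(self, path, interrupted=False):
        if interrupted:
            # Blocked and then interrupted: optionally leave a local-only
            # lock behind that has no usable svid attached.
            if self.keep_local_lock_on_interrupt:
                self.local[path] = None
            return False
        svid = next(self._next_svid)
        self.server.lock(path, svid)
        if path not in self.local:
            self.local[path] = svid         # normal case: svid is remembered
        # Otherwise the stale local-only lock is reused and svid is dropped.
        return True

    def unlock(self, path):
        svid = self.local.pop(path, None)
        if svid is None:
            svid = next(self._next_svid)    # no record, so a fresh svid is used
        self.server.unlock(path, svid)


def leaked_locks(keep_local_lock_on_interrupt):
    """Interrupted attempt, successful retry, unlock; count locks left behind."""
    server = ToyServer()
    client = ToyClient(server, keep_local_lock_on_interrupt)
    client.lock("f", interrupted=True)
    client.lock("f")
    client.unlock("f")
    return len(server.locks)
```

In this model `leaked_locks(True)` leaves one lock on the server, while `leaked_locks(False)`, which skips creating the local-only lock, leaves none. That is only meant to mirror the reasoning above, not to validate the actual patch.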

