[NFS client] NFS locks not released on abnormal process termination

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [NFS client] NFS locks not released on abnormal process termination
@ 2003-12-06  4:48 Kenny Simpson
  2003-12-06 19:50 ` Philippe Troin
  0 siblings, 1 reply; 16+ messages in thread
From: Kenny Simpson @ 2003-12-06  4:48 UTC (permalink / raw)
  To: linux-kernel

I have a up-to-date RH9 (kernel 2.4.20-24.9, nfs utils 1.0.1-3.9) I'm using as
an NFS client (the server is a NetApp), and I'm trying to use advisory locking
of a file.
If the process that locks the file exits normally, the lock is released, and
everything is fine.
However, if the process aborts, the lock is left with no clear way to remove
it.  I must remove the file to get rid of the lock.

Details:
Here is a test case:
int main()
{
  int fd = open("file", O_RDWR);
  if (lockf( fd, F_TLOCK, 0 ) < 0)
....  print error message query owner
  pause();
  close( fd );
}

If I run this, when it gets to the pause(), I can clearly see in /proc/locks
the process owning the lock.
If I then kill -ABRT <pid>, the entry in /proc/locks goes away, but the lock is
not removed from the server.
When I run the program a second time, the lock acquire failes, and it says the
(now defunct) old process still owns the lock.  Since I cannot easily make
another process with the id of the original, I seem to have no way to
explicitly release the lock.

I even ran ethereal to watch which NLM requests were being made.  No unlock
request was ever sent, so I don't think this can be a server issue.

Any ideas?  Is it supposed to work this way?

thanks,
-Kenny

__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-06  4:48 [NFS client] NFS locks not released on abnormal process termination Kenny Simpson
@ 2003-12-06 19:50 ` Philippe Troin
  2003-12-08  3:39   ` Kenny Simpson
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Troin @ 2003-12-06 19:50 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: linux-kernel, nfs

Kenny Simpson <theonetruekenny@yahoo.com> writes:

> I have a up-to-date RH9 (kernel 2.4.20-24.9, nfs utils 1.0.1-3.9) I'm using as
> an NFS client (the server is a NetApp), and I'm trying to use advisory locking
> of a file.
> If the process that locks the file exits normally, the lock is released, and
> everything is fine.
> However, if the process aborts, the lock is left with no clear way to remove
> it.  I must remove the file to get rid of the lock.
> 
> Details:
> Here is a test case:
> int main()
> {
>   int fd = open("file", O_RDWR);
>   if (lockf( fd, F_TLOCK, 0 ) < 0)
> ....  print error message query owner
>   pause();
>   close( fd );
> }
> 
> If I run this, when it gets to the pause(), I can clearly see in /proc/locks
> the process owning the lock.
> If I then kill -ABRT <pid>, the entry in /proc/locks goes away, but the lock is
> not removed from the server.
> When I run the program a second time, the lock acquire failes, and it says the
> (now defunct) old process still owns the lock.  Since I cannot easily make
> another process with the id of the original, I seem to have no way to
> explicitly release the lock.

I've also seen this behavior witht the stock 2.4.22 and 2.4.23
kernels.

See the thread:

 http://sourceforge.net/mailarchive/forum.php?thread_id=3213117&forum_id=4930
 
 
> I even ran ethereal to watch which NLM requests were being made.  No unlock
> request was ever sent, so I don't think this can be a server issue.
> 
> Any ideas?  Is it supposed to work this way?

No and no.

Phil.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-06 19:50 ` Philippe Troin
@ 2003-12-08  3:39   ` Kenny Simpson
  2003-12-08  5:16     ` Trond Myklebust
  0 siblings, 1 reply; 16+ messages in thread
From: Kenny Simpson @ 2003-12-08  3:39 UTC (permalink / raw)
  To: Philippe Troin; +Cc: linux-kernel, nfs

--- Philippe Troin <phil@fifi.org> wrote:
> 
> I've also seen this behavior witht the stock 2.4.22 and 2.4.23
> kernels.
> 
> See the thread:
> 
>  http://sourceforge.net/mailarchive/forum.php?thread_id=3213117&forum_id=4930
>  

So, this patch has not found its way into any kernel yet?
Is there anyone actively persuing this bug?

-Kenny


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-08  3:39   ` Kenny Simpson
@ 2003-12-08  5:16     ` Trond Myklebust
  2003-12-08 17:32       ` Philippe Troin
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2003-12-08  5:16 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: Philippe Troin, linux-kernel, nfs

>>>>> " " == Kenny Simpson <theonetruekenny@yahoo.com> writes:

     > So, this patch has not found its way into any kernel yet?  Is
     > there anyone actively persuing this bug?

Feel free. There are only so many hours in a day, and right now
mine are pretty much overbooked with NFSv4 stuff...

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-08  5:16     ` Trond Myklebust
@ 2003-12-08 17:32       ` Philippe Troin
  2003-12-08 19:56         ` Trond Myklebust
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Troin @ 2003-12-08 17:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Kenny Simpson, linux-kernel, nfs

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> >>>>> " " == Kenny Simpson <theonetruekenny@yahoo.com> writes:
> 
>      > So, this patch has not found its way into any kernel yet?  Is
>      > there anyone actively persuing this bug?
> 
> Feel free. There are only so many hours in a day, and right now
> mine are pretty much overbooked with NFSv4 stuff...

Please note that this fix only mitigates the bug: it can still occur,
but much less frequently. Before this patch, nfsd would loose track of
the lock (see the enclosed program at the beginning of the thread)
after a few (<5) kills. With the patch, it takes sometimes as many as
300~500 kills before the bugs manifests itself.

Trond, do you think I should push the patch to Marcelo, or should I
wait for a better fix? I don't think Marcelo would accept a partial
fix. I would try to fix it myself, but I have no clue on the inner
workings of lockd/nfsd.

Phil.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-08 17:32       ` Philippe Troin
@ 2003-12-08 19:56         ` Trond Myklebust
  2003-12-09  8:15           ` Philippe Troin
  0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2003-12-08 19:56 UTC (permalink / raw)
  To: Philippe Troin; +Cc: Trond Myklebust, Kenny Simpson, linux-kernel, nfs

[-- Attachment #1: Type: text/plain, Size: 856 bytes --]

På må , 08/12/2003 klokka 12:32, skreiv Philippe Troin:
> Trond, do you think I should push the patch to Marcelo, or should I
> wait for a better fix?

No. If I wanted a partial fix, I could just as well have pushed it to
Marcelo myself. When I said "feel free" I was referring to pursuing the
remaining signalling bugs.

I have a feeling the second race case of your test is that you are
interrupting the fcntl(F_SETLK) call while it is on the wire. If you do
that, then the server may record the lock as taken, but the client will
never receive the reply, and so will not know that it must clean up
locks...
Hmm... For that case, we probably want to have the locking code record
the call as having succeeded in order to ensure that we do indeed clear
out the lock on process exit. See if the appended patch helps...

Cheers,
  Trond


[-- Attachment #2: Type: text/plain, Size: 4219 bytes --]

--- linux-2.4.23-rc1/fs/nfs/file.c.orig	2003-11-16 19:28:52.000000000 -0500
+++ linux-2.4.23-rc1/fs/nfs/file.c	2003-12-08 14:53:12.000000000 -0500
@@ -241,6 +241,92 @@
 	goto out;
 }
 
+static int
+do_getlk(struct inode *inode, int cmd, struct file_lock *fl)
+{
+	int status;
+
+	lock_kernel();
+	status = nlmclnt_proc(inode, cmd, fl);
+	unlock_kernel();
+	return status;
+}
+
+static int
+do_unlk(struct inode *inode, int cmd, struct file_lock *fl)
+{
+	sigset_t oldset;
+	int status;
+
+	rpc_clnt_sigmask(NFS_CLIENT(inode), &oldset);
+	/*
+	 * Flush all pending writes before doing anything
+	 * with locks..
+	 */
+	filemap_fdatasync(inode->i_mapping);
+	down(&inode->i_sem);
+	nfs_wb_all(inode);
+	up(&inode->i_sem);
+	filemap_fdatawait(inode->i_mapping);
+
+	/* NOTE: special case
+	 *	If we're signalled while cleaning up locks on process exit, we
+	 *	still need to complete the unlock.
+	 */
+	lock_kernel();
+	status = nlmclnt_proc(inode, cmd, fl);
+	unlock_kernel();
+	rpc_clnt_sigunmask(NFS_CLIENT(inode), &oldset);
+	return status;
+}
+
+static int
+do_setlk(struct inode *inode, int cmd, struct file_lock *fl)
+{
+	int status;
+
+	/*
+	 * Flush all pending writes before doing anything
+	 * with locks..
+	 */
+	status = filemap_fdatasync(inode->i_mapping);
+	if (status == 0) {
+		down(&inode->i_sem);
+		status = nfs_wb_all(inode);
+		up(&inode->i_sem);
+		if (status == 0)
+			status = filemap_fdatawait(inode->i_mapping);
+	}
+	if (status < 0)
+		return status;
+
+	lock_kernel();
+	status = nlmclnt_proc(inode, cmd, fl);
+	/* If we were signalled we still need to ensure that
+	 * we clean up any state on the server. We therefore
+	 * record the lock call as having succeeded in order to
+	 * ensure that locks_remove_posix() cleans it out when
+	 * the process exits.
+	 */
+	if (status == -EINTR || status == -ERESTARTSYS)
+		posix_lock_file(filp, fl, 0);
+	unlock_kernel();
+	if (status < 0)
+		return status;
+
+	/*
+	 * Make sure we clear the cache whenever we try to get the lock.
+	 * This makes locking act as a cache coherency point.
+	 */
+	filemap_fdatasync(inode->i_mapping);
+	down(&inode->i_sem);
+	nfs_wb_all(inode);      /* we may have slept */
+	up(&inode->i_sem);
+	filemap_fdatawait(inode->i_mapping);
+	nfs_zap_caches(inode);
+	return 0;
+}
+
 /*
  * Lock a (portion of) a file
  */
@@ -248,8 +334,6 @@
 nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
 {
 	struct inode * inode = filp->f_dentry->d_inode;
-	int	status = 0;
-	int	status2;
 
 	dprintk("NFS: nfs_lock(f=%4x/%ld, t=%x, fl=%x, r=%Ld:%Ld)\n",
 			inode->i_dev, inode->i_ino,
@@ -266,8 +350,8 @@
 	/* Fake OK code if mounted without NLM support */
 	if (NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) {
 		if (IS_GETLK(cmd))
-			status = LOCK_USE_CLNT;
-		goto out_ok;
+			return LOCK_USE_CLNT;
+		return 0;
 	}
 
 	/*
@@ -280,42 +364,9 @@
 	if (!fl->fl_owner || (fl->fl_flags & (FL_POSIX|FL_BROKEN)) != FL_POSIX)
 		return -ENOLCK;
 
-	/*
-	 * Flush all pending writes before doing anything
-	 * with locks..
-	 */
-	status = filemap_fdatasync(inode->i_mapping);
-	down(&inode->i_sem);
-	status2 = nfs_wb_all(inode);
-	if (status2 && !status)
-		status = status2;
-	up(&inode->i_sem);
-	status2 = filemap_fdatawait(inode->i_mapping);
-	if (status2 && !status)
-		status = status2;
-	if (status < 0)
-		return status;
-
-	lock_kernel();
-	status = nlmclnt_proc(inode, cmd, fl);
-	unlock_kernel();
-	if (status < 0)
-		return status;
-	
-	status = 0;
-
-	/*
-	 * Make sure we clear the cache whenever we try to get the lock.
-	 * This makes locking act as a cache coherency point.
-	 */
- out_ok:
-	if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
-		filemap_fdatasync(inode->i_mapping);
-		down(&inode->i_sem);
-		nfs_wb_all(inode);      /* we may have slept */
-		up(&inode->i_sem);
-		filemap_fdatawait(inode->i_mapping);
-		nfs_zap_caches(inode);
-	}
-	return status;
+	if (IS_GETLK(cmd))
+		return do_getlk(inode, cmd, fl);
+	if (fl->fl_type == F_UNLCK)
+		return do_unlk(inode, cmd, fl);
+	return do_setlk(inode, cmd, fl);
 }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-08 19:56         ` Trond Myklebust
@ 2003-12-09  8:15           ` Philippe Troin
  2003-12-09  8:42             ` Trond Myklebust
  2004-01-08 10:47             ` YAMAMOTO Takashi
  0 siblings, 2 replies; 16+ messages in thread
From: Philippe Troin @ 2003-12-09  8:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Kenny Simpson, linux-kernel, nfs

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> På må , 08/12/2003 klokka 12:32, skreiv Philippe Troin:
> > Trond, do you think I should push the patch to Marcelo, or should I
> > wait for a better fix?
> 
> No. If I wanted a partial fix, I could just as well have pushed it to
> Marcelo myself. When I said "feel free" I was referring to pursuing the
> remaining signalling bugs.

Ok... Although I willing to "feel free", I might not have enough time
and ability...

> I have a feeling the second race case of your test is that you are
> interrupting the fcntl(F_SETLK) call while it is on the wire. If you do
> that, then the server may record the lock as taken, but the client will
> never receive the reply, and so will not know that it must clean up
> locks...
>
> Hmm... For that case, we probably want to have the locking code record
> the call as having succeeded in order to ensure that we do indeed clear
> out the lock on process exit. See if the appended patch helps...

Thanks Trond.

>From my reading of the patch, it supersedes the old patch, and is only
necessary on the client. Is also does not compile :-)
Here's an updated patch which does compile.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: linux-2.4.23-nfs-lock-race-2.patch --]
[-- Type: text/x-patch, Size: 4142 bytes --]

diff -ruN linux-2.4.23.orig/fs/nfs/file.c linux-2.4.23/fs/nfs/file.c
--- linux-2.4.23.orig/fs/nfs/file.c	Mon Dec  8 12:11:39 2003
+++ linux-2.4.23/fs/nfs/file.c	Mon Dec  8 12:26:31 2003
@@ -241,6 +241,93 @@
 	goto out;
 }
 
+static int
+do_getlk(struct inode *inode, int cmd, struct file_lock *fl)
+{
+	int status;
+
+	lock_kernel();
+	status = nlmclnt_proc(inode, cmd, fl);
+	unlock_kernel();
+	return status;
+}
+
+static int
+do_unlk(struct inode *inode, int cmd, struct file_lock *fl)
+{
+	sigset_t oldset;
+	int status;
+
+	rpc_clnt_sigmask(NFS_CLIENT(inode), &oldset);
+	/*
+	 * Flush all pending writes before doing anything
+	 * with locks..
+	 */
+	filemap_fdatasync(inode->i_mapping);
+	down(&inode->i_sem);
+	nfs_wb_all(inode);
+	up(&inode->i_sem);
+	filemap_fdatawait(inode->i_mapping);
+
+	/* NOTE: special case
+	 *	If we're signalled while cleaning up locks on process exit, we
+	 *	still need to complete the unlock.
+	 */
+	lock_kernel();
+	status = nlmclnt_proc(inode, cmd, fl);
+	unlock_kernel();
+	rpc_clnt_sigunmask(NFS_CLIENT(inode), &oldset);
+	return status;
+}
+
+static int
+do_setlk(struct file *filp, int cmd, struct file_lock *fl)
+{
+	struct inode * inode = filp->f_dentry->d_inode;
+	int status;
+
+	/*
+	 * Flush all pending writes before doing anything
+	 * with locks..
+	 */
+	status = filemap_fdatasync(inode->i_mapping);
+	if (status == 0) {
+		down(&inode->i_sem);
+		status = nfs_wb_all(inode);
+		up(&inode->i_sem);
+		if (status == 0)
+			status = filemap_fdatawait(inode->i_mapping);
+	}
+	if (status < 0)
+		return status;
+
+	lock_kernel();
+	status = nlmclnt_proc(inode, cmd, fl);
+	/* If we were signalled we still need to ensure that
+	 * we clean up any state on the server. We therefore
+	 * record the lock call as having succeeded in order to
+	 * ensure that locks_remove_posix() cleans it out when
+	 * the process exits.
+	 */
+	if (status == -EINTR || status == -ERESTARTSYS)
+		posix_lock_file(filp, fl, 0);
+	unlock_kernel();
+	if (status < 0)
+		return status;
+
+	/*
+	 * Make sure we clear the cache whenever we try to get the lock.
+	 * This makes locking act as a cache coherency point.
+	 */
+	filemap_fdatasync(inode->i_mapping);
+	down(&inode->i_sem);
+	nfs_wb_all(inode);      /* we may have slept */
+	up(&inode->i_sem);
+	filemap_fdatawait(inode->i_mapping);
+	nfs_zap_caches(inode);
+	return 0;
+}
+
 /*
  * Lock a (portion of) a file
  */
@@ -248,8 +335,6 @@
 nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
 {
 	struct inode * inode = filp->f_dentry->d_inode;
-	int	status = 0;
-	int	status2;
 
 	dprintk("NFS: nfs_lock(f=%4x/%ld, t=%x, fl=%x, r=%Ld:%Ld)\n",
 			inode->i_dev, inode->i_ino,
@@ -266,8 +351,8 @@
 	/* Fake OK code if mounted without NLM support */
 	if (NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM) {
 		if (IS_GETLK(cmd))
-			status = LOCK_USE_CLNT;
-		goto out_ok;
+			return LOCK_USE_CLNT;
+		return 0;
 	}
 
 	/*
@@ -280,42 +365,9 @@
 	if (!fl->fl_owner || (fl->fl_flags & (FL_POSIX|FL_BROKEN)) != FL_POSIX)
 		return -ENOLCK;
 
-	/*
-	 * Flush all pending writes before doing anything
-	 * with locks..
-	 */
-	status = filemap_fdatasync(inode->i_mapping);
-	down(&inode->i_sem);
-	status2 = nfs_wb_all(inode);
-	if (status2 && !status)
-		status = status2;
-	up(&inode->i_sem);
-	status2 = filemap_fdatawait(inode->i_mapping);
-	if (status2 && !status)
-		status = status2;
-	if (status < 0)
-		return status;
-
-	lock_kernel();
-	status = nlmclnt_proc(inode, cmd, fl);
-	unlock_kernel();
-	if (status < 0)
-		return status;
-	
-	status = 0;
-
-	/*
-	 * Make sure we clear the cache whenever we try to get the lock.
-	 * This makes locking act as a cache coherency point.
-	 */
- out_ok:
-	if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
-		filemap_fdatasync(inode->i_mapping);
-		down(&inode->i_sem);
-		nfs_wb_all(inode);      /* we may have slept */
-		up(&inode->i_sem);
-		filemap_fdatawait(inode->i_mapping);
-		nfs_zap_caches(inode);
-	}
-	return status;
+	if (IS_GETLK(cmd))
+		return do_getlk(inode, cmd, fl);
+	if (fl->fl_type == F_UNLCK)
+		return do_unlk(inode, cmd, fl);
+	return do_setlk(filp, cmd, fl);
 }

[-- Attachment #3: Type: text/plain, Size: 212 bytes --]


I am still running tests, but so far it looks good (that is all locks
are freed when a process with locks running on a NFS client is
killed).

I am still running tests and will report their result later.

Phil.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-09  8:15           ` Philippe Troin
@ 2003-12-09  8:42             ` Trond Myklebust
  2003-12-09 18:46               ` Philippe Troin
  2004-01-08 10:47             ` YAMAMOTO Takashi
  1 sibling, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2003-12-09  8:42 UTC (permalink / raw)
  To: Philippe Troin; +Cc: Kenny Simpson, linux-kernel, nfs

>>>>> " " == Philippe Troin <phil@fifi.org> writes:

     > From my reading of the patch, it supersedes the old patch, and
     > is only
     > necessary on the client. Is also does not compile :-)

Yeah, I admit I didn't test it out...

     > Here's an updated patch which does compile.

Thanks.

     > I am still running tests, but so far it looks good (that is all
     > locks are freed when a process with locks running on a NFS
     > client is killed).

Good...

There are still 2 other issues with the generic POSIX locking code.
Both issues have to do with CLONE_VM and have been raised on
linux-kernel & linux-fsdevel. Unfortunately they met with no response,
so I'm unable to pursue...

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-09  8:42             ` Trond Myklebust
@ 2003-12-09 18:46               ` Philippe Troin
  2003-12-10  2:42                 ` Kenny Simpson
  2003-12-15  1:04                 ` Kenny Simpson
  0 siblings, 2 replies; 16+ messages in thread
From: Philippe Troin @ 2003-12-09 18:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Kenny Simpson, linux-kernel, nfs

[-- Attachment #1: Type: text/plain, Size: 982 bytes --]

Trond Myklebust <trond.myklebust@fys.uio.no> writes:

> >>>>> " " == Philippe Troin <phil@fifi.org> writes:
> 
>      > From my reading of the patch, it supersedes the old patch, and
>      > is only
>      > necessary on the client. Is also does not compile :-)
> 
> Yeah, I admit I didn't test it out...
> 
>      > Here's an updated patch which does compile.
> 
> Thanks.
> 
>      > I am still running tests, but so far it looks good (that is all
>      > locks are freed when a process with locks running on a NFS
>      > client is killed).
> 
> Good...

I've ran test overnight on four boxen, and no locks were lost.
I guess you can send this patch to Marcello now.

I've tested with the enclosed program.

 
> There are still 2 other issues with the generic POSIX locking code.
> Both issues have to do with CLONE_VM and have been raised on
> linux-kernel & linux-fsdevel. Unfortunately they met with no response,
> so I'm unable to pursue...

Can we help? Pointers?

Phil.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: kill-locks.c --]
[-- Type: text/x-csrc, Size: 2549 bytes --]

#define _GNU_SOURCE
#define _LARGEFILE_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>
#include <errno.h>

#define FNAME	"kill-locks.tmp"
#define BUFSIZE	16384
#define DEATHSIG SIGINT
#define LOCKTRIES       10
#define STRINGIFY_(x)	#x
#define STRINGIFY(x)    STRINGIFY_(x)
#define LOCKTRIESSTR    STRINGIFY(LOCKTRIES)
#define LOCKSLEEPSECS   1

void parent_try_lock(int mfd, int successcount, int status)
{
  int		i;
  struct flock	lck;

  for (i=0; 1 /* always true */; ++i)
    {
      lck.l_type   = F_WRLCK;
      lck.l_whence = SEEK_SET;
      lck.l_start  = (off_t)0;
      lck.l_len    = (off_t)0;
      if (fcntl(mfd, F_SETLK, &lck) == -1)
	{
	  if (errno == EAGAIN)
	    {
	      if ( i == LOCKTRIES )
		{
		  fprintf(stderr,
			  "unexpected status from child %08X\n"
			  "successful locking attempts: %d\n",
			  status, successcount);
		  exit(1);
		}
	      else
		{
		  sleep(LOCKSLEEPSECS);
		}
	    }
	  else
	    perror("[parent] fcntl(F_SETLK)"), exit(1);
	}
      else
	{
	  lck.l_type   = F_UNLCK;
	  lck.l_whence = SEEK_SET;
	  lck.l_start  = (off_t)0;
	  lck.l_len    = (off_t)0;
	  if (fcntl(mfd, F_SETLK, &lck) == -1)
	    perror("[parent] fcntl(F_SETLK) UNLCK"), exit(1);
	  if (status)
	    fprintf(stderr, "[parent] transient error, going on...\n");
	  break;
	}
    }
}

int
main()
{
  int	successcount = 0;
  int	mfd;
  /**/

  mfd = open(FNAME, O_RDWR|O_CREAT, 0666);
  if (mfd == -1)
    perror("open()"), exit(1);
  parent_try_lock(mfd, successcount, 0);

  while (1)
    {
      pid_t	childpid;
      int	status;
      /**/

      childpid = fork();
      if (childpid == (pid_t) -1)
	perror("fork()"), exit(1);
      if (childpid == 0)
	{
	  /* Child */
	  int		fd;
	  struct flock	lck;
	  char		buf[BUFSIZE];
	  /**/

	  fd = open(FNAME, O_RDWR|O_CREAT, 0666);
	  if (fd == -1)
	    perror("[child] open()"), exit(1);

	  lck.l_type   = F_WRLCK;
	  lck.l_whence = SEEK_SET;
	  lck.l_start  = (off_t)0;
	  lck.l_len    = (off_t)0;
	  if (fcntl(fd, F_SETLK, &lck) == -1)
	    perror("[child] fcntl(F_SETLK)"), exit(1);
	  memset(buf, 0, sizeof(buf));
	  while(1)
	    write(fd, buf, sizeof(buf));
	}

      usleep(rand()%1000);
      kill(childpid, DEATHSIG);
      if (waitpid(childpid, &status, 0) != childpid)
	perror("waitpid"), exit(1);
      if ( ! (WIFSIGNALED(status) && WTERMSIG(status) == DEATHSIG))
	parent_try_lock(mfd, successcount, status);
      ++successcount;
    }
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-09 18:46               ` Philippe Troin
@ 2003-12-10  2:42                 ` Kenny Simpson
  2003-12-15  1:04                 ` Kenny Simpson
  1 sibling, 0 replies; 16+ messages in thread
From: Kenny Simpson @ 2003-12-10  2:42 UTC (permalink / raw)
  To: Philippe Troin, trond.myklebust; +Cc: linux-kernel, nfs

> I've ran test overnight on four boxen, and no locks were lost.
> I guess you can send this patch to Marcello now.
>
Excellent work!

> > There are still 2 other issues with the generic POSIX locking code.
> > Both issues have to do with CLONE_VM and have been raised on
> > linux-kernel & linux-fsdevel. Unfortunately they met with no response,
> > so I'm unable to pursue...
> 
> Can we help? Pointers?
Let me know if/how I can help.

Again, great work.

thanks,
-Kenny

__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-09 18:46               ` Philippe Troin
  2003-12-10  2:42                 ` Kenny Simpson
@ 2003-12-15  1:04                 ` Kenny Simpson
  2003-12-15  1:14                   ` Trond Myklebust
  1 sibling, 1 reply; 16+ messages in thread
From: Kenny Simpson @ 2003-12-15  1:04 UTC (permalink / raw)
  To: Philippe Troin; +Cc: trond.myklebust, linux-kernel, nfs

Any hope of getting this patch into 2.6?

-Kenny


__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-15  1:04                 ` Kenny Simpson
@ 2003-12-15  1:14                   ` Trond Myklebust
  0 siblings, 0 replies; 16+ messages in thread
From: Trond Myklebust @ 2003-12-15  1:14 UTC (permalink / raw)
  To: Kenny Simpson; +Cc: Philippe Troin, linux-kernel, nfs

>>>>> " " == Kenny Simpson <theonetruekenny@yahoo.com> writes:

     > Any hope of getting this patch into 2.6?  -Kenny

2.6 is in full freeze at the moment. Later perhaps...

Cheers,
  Trond



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2003-12-09  8:15           ` Philippe Troin
  2003-12-09  8:42             ` Trond Myklebust
@ 2004-01-08 10:47             ` YAMAMOTO Takashi
  2004-01-08 16:50               ` trond.myklebust
  1 sibling, 1 reply; 16+ messages in thread
From: YAMAMOTO Takashi @ 2004-01-08 10:47 UTC (permalink / raw)
  To: phil; +Cc: trond.myklebust, theonetruekenny, linux-kernel, nfs

hi,

> +	status = nlmclnt_proc(inode, cmd, fl);
> +	/* If we were signalled we still need to ensure that
> +	 * we clean up any state on the server. We therefore
> +	 * record the lock call as having succeeded in order to
> +	 * ensure that locks_remove_posix() cleans it out when
> +	 * the process exits.
> +	 */
> +	if (status == -EINTR || status == -ERESTARTSYS)
> +		posix_lock_file(filp, fl, 0);
> +	unlock_kernel();
> +	if (status < 0)
> +		return status;

i think it's problematic because you can't assume the lock was
granted on the server and the signaled process might not exit immediately.

YAMAMOTO Takashi


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS client] NFS locks not released on abnormal process termination
  2004-01-08 10:47             ` YAMAMOTO Takashi
@ 2004-01-08 16:50               ` trond.myklebust
  2004-01-09  2:56                 ` [NFS] " YAMAMOTO Takashi
  0 siblings, 1 reply; 16+ messages in thread
From: trond.myklebust @ 2004-01-08 16:50 UTC (permalink / raw)
  To: yamamoto; +Cc: phil, trond.myklebust, theonetruekenny, linux-kernel, nfs

>
> i think it's problematic because you can't assume the lock was
> granted on the server and the signaled process might not exit
> immediately.

The point is that it is *worse* to assume the lock was not granted,
since then it will never get cleared on the server.

The RPC layer blocks all signals except SIGKILL, so the signalled
process has no choice but to exit immediately if something gets
through.

Cheers,
  Trond



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] Re: [NFS client] NFS locks not released on abnormal process termination
  2004-01-08 16:50               ` trond.myklebust
@ 2004-01-09  2:56                 ` YAMAMOTO Takashi
  2004-01-09  3:40                   ` trond.myklebust
  0 siblings, 1 reply; 16+ messages in thread
From: YAMAMOTO Takashi @ 2004-01-09  2:56 UTC (permalink / raw)
  To: trond.myklebust; +Cc: phil, theonetruekenny, linux-kernel, nfs

hi,

> > i think it's problematic because you can't assume the lock was
> > granted on the server and the signaled process might not exit
> > immediately.
> 
> The point is that it is *worse* to assume the lock was not granted,
> since then it will never get cleared on the server.

yes.

> The RPC layer blocks all signals except SIGKILL, so the signalled
> process has no choice but to exit immediately if something gets
> through.

we're talking about interruptible mounts, aren't we?

are you referring to rpc_clnt_sigmask() ?
i think it isn't safe to assume sa_handler isn't changed during
blocking for lock.  consider CLONE_SIGHAND, for example.

YAMAMOTO Takashi


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [NFS] Re: [NFS client] NFS locks not released on abnormal process termination
  2004-01-09  2:56                 ` [NFS] " YAMAMOTO Takashi
@ 2004-01-09  3:40                   ` trond.myklebust
  0 siblings, 0 replies; 16+ messages in thread
From: trond.myklebust @ 2004-01-09  3:40 UTC (permalink / raw)
  To: yamamoto; +Cc: phil, theonetruekenny, linux-kernel, nfs

>> The RPC layer blocks all signals except SIGKILL, so the signalled
>> process has no choice but to exit immediately if something gets
>> through.
>
> we're talking about interruptible mounts, aren't we?
>
> are you referring to rpc_clnt_sigmask() ?
> i think it isn't safe to assume sa_handler isn't changed during
> blocking for lock.  consider CLONE_SIGHAND, for example.

So what? If you decide handle a signal, then you are taking full
responsibility for the recovery process. It is up to _you_ to take action
to either recover the lock or to undo it, not the kernel. To determine
whether or not the lock was taken on the server you can just do a
fcntl(GETLK) call.

All the kernel cares about is that when the process exits, it needs to
clean up all the locks that are owned by that pid.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2004-01-09  3:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-12-06  4:48 [NFS client] NFS locks not released on abnormal process termination Kenny Simpson
2003-12-06 19:50 ` Philippe Troin
2003-12-08  3:39   ` Kenny Simpson
2003-12-08  5:16     ` Trond Myklebust
2003-12-08 17:32       ` Philippe Troin
2003-12-08 19:56         ` Trond Myklebust
2003-12-09  8:15           ` Philippe Troin
2003-12-09  8:42             ` Trond Myklebust
2003-12-09 18:46               ` Philippe Troin
2003-12-10  2:42                 ` Kenny Simpson
2003-12-15  1:04                 ` Kenny Simpson
2003-12-15  1:14                   ` Trond Myklebust
2004-01-08 10:47             ` YAMAMOTO Takashi
2004-01-08 16:50               ` trond.myklebust
2004-01-09  2:56                 ` [NFS] " YAMAMOTO Takashi
2004-01-09  3:40                   ` trond.myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).