linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* lease/delegation/oplock semantics
@ 2011-06-10  0:09 J. Bruce Fields
  2011-06-10  0:10 ` [PATCH 1/2] locks: introduce i_blockleases to close lease races J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-10  0:09 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, Christoph Hellwig,
	Mimi Zohar, Eric Paris

Resending with some extra cc's, hope that's OK:

NFSv4 (along with, from what I'm told, Samba) has a problem with read
leases, which is that they are currently only broken on write opens,
whereas they should really be broken on any operation that changes an
inode's metadata or any of the links pointing to it.

Also, to break leases in a non-racy way it's not sufficient to have a
single break_lease() call from the vfs code; we also need to prevent
anyone from acquiring a new lease while the operation is still in
progress.

So here's one attempt to deal with those problems, by adding to the
inode a counter which is incremented whenever we start such an operation
and decremented when we finish it.

I only handle unlink; we'd also need to do link, rename, chown, and
chmod, at least.

Comments? Better ideas?

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-10  0:09 lease/delegation/oplock semantics J. Bruce Fields
@ 2011-06-10  0:10 ` J. Bruce Fields
       [not found]   ` <20110610001011.GD22215-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  2011-06-10 20:24   ` [PATCH 1/2] locks: introduce i_blockleases to close lease races Mimi Zohar
  0 siblings, 2 replies; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-10  0:10 UTC (permalink / raw)
  To: linux-nfs
  Cc: linux-fsdevel, samba-technical, Christoph Hellwig, Mimi Zohar,
	Eric Paris

From: J. Bruce Fields <bfields@redhat.com>

Since break_lease is called before i_writecount is incremented, there's
a window between the two where a setlease call would have no way to know
that an open is about to happen.

We fix this by adding a new inode field, i_blockleases, that is
incremented while a lease-breaking operation is in progress.

We will later reuse i_blockleases to enforce lease-breaking for rename,
unlink, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/inode.c         |    1 +
 fs/locks.c         |   28 ++++++++++++++++++++++++++++
 fs/namei.c         |    6 +++++-
 include/linux/fs.h |    3 +++
 4 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 33c963d..4f253a2 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -198,6 +198,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_uid = 0;
 	inode->i_gid = 0;
 	atomic_set(&inode->i_writecount, 0);
+	atomic_set(&inode->i_blockleases, 0);
 	inode->i_size = 0;
 	inode->i_blocks = 0;
 	inode->i_bytes = 0;
diff --git a/fs/locks.c b/fs/locks.c
index 0a4f50d..7699b1b 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1164,6 +1164,32 @@ static void time_out_leases(struct inode *inode)
 	}
 }
 
+/* Disallow all leases (read or write): */
+void disallow_leases(struct inode *inode, int flags)
+{
+	if (!inode)
+		return;
+	if (!S_ISREG(inode->i_mode))
+		return;
+	if ((flags & O_ACCMODE) == O_RDONLY)
+		return;
+	atomic_inc(&inode->i_blockleases);
+}
+EXPORT_SYMBOL_GPL(disallow_leases);
+
+void reallow_leases(struct inode *inode, int flags)
+{
+	if (!inode)
+		return;
+	if (!S_ISREG(inode->i_mode))
+		return;
+	if ((flags & O_ACCMODE) == O_RDONLY)
+		return;
+	BUG_ON(atomic_read(&inode->i_blockleases) <= 0);
+	atomic_dec(&inode->i_blockleases);
+}
+EXPORT_SYMBOL_GPL(reallow_leases);
+
 /**
  *	__break_lease	-	revoke all outstanding leases on file
  *	@inode: the inode of the file to return
@@ -1369,6 +1395,8 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
 
 	if (arg != F_UNLCK) {
 		error = -EAGAIN;
+		if (atomic_read(&inode->i_blockleases))
+			goto out;
 		if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 			goto out;
 		if ((arg == F_WRLCK)
diff --git a/fs/namei.c b/fs/namei.c
index 3cb616d..079e68c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2277,10 +2277,14 @@ ok:
 		want_write = 1;
 	}
 common:
+	disallow_leases(nd->path.dentry->d_inode, open_flag);
 	error = may_open(&nd->path, acc_mode, open_flag);
-	if (error)
+	if (error) {
+		reallow_leases(nd->path.dentry->d_inode, open_flag);
 		goto exit;
+	}
 	filp = nameidata_to_filp(nd);
+	reallow_leases(nd->path.dentry->d_inode, open_flag);
 	if (!IS_ERR(filp)) {
 		error = ima_file_check(filp, op->acc_mode);
 		if (error) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1b95af3..daf8443 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -796,6 +796,7 @@ struct inode {
 #ifdef CONFIG_IMA
 	atomic_t		i_readcount; /* struct files open RO */
 #endif
+	atomic_t		i_blockleases; /* setlease fails when >0 */
 	atomic_t		i_writecount;
 #ifdef CONFIG_SECURITY
 	void			*i_security;
@@ -1161,6 +1162,8 @@ extern void lease_get_mtime(struct inode *, struct timespec *time);
 extern int generic_setlease(struct file *, long, struct file_lock **);
 extern int vfs_setlease(struct file *, long, struct file_lock **);
 extern int lease_modify(struct file_lock **, int);
+extern void disallow_leases(struct inode *, int flags);
+extern void reallow_leases(struct inode *, int flags);
 extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
 extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
 extern void lock_flocks(void);
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] locks: break lease on unlink
       [not found]   ` <20110610001011.GD22215-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2011-06-10  0:11     ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-10  0:11 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, Christoph Hellwig,
	Mimi Zohar, Eric Paris

NFSv4 uses leases (delegations) to allow clients to do opens locally.
An open takes a component name, so to do a local open a client needs to
know that at least that last component name points at the same time as
long as they hold the delegation.  For that reason, the NFSv4 spec
requires that delegations be broken on unlink.

Waiting for a lease to be broken may mean waiting for an nfs client or a
user process to give it up, so doing that while holding the i_mutex
would risk deadlocks.  So instead we do a non-blocking lease break, then
drop the i_mutex and do a blocking lease break if necessary.

Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/namei.c    |   32 ++++++++++++++++++++++++++++----
 fs/nfsd/vfs.c |    6 +++++-
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 079e68c..4661469 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2799,6 +2799,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 
 	nd.flags &= ~LOOKUP_PARENT;
 
+retry:
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_hash(&nd);
 	error = PTR_ERR(dentry);
@@ -2806,15 +2807,27 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 		/* Why not before? Because we want correct error value */
 		if (nd.last.name[nd.last.len])
 			goto slashes;
-		inode = dentry->d_inode;
-		if (inode)
-			ihold(inode);
+		if (inode && inode != dentry->d_inode) {
+			reallow_leases(inode, O_WRONLY);
+			iput(inode);
+			inode = NULL;
+		}
+		if (!inode) {
+			inode = dentry->d_inode;
+			if (inode)
+				ihold(inode);
+			disallow_leases(inode, O_WRONLY);
+		}
 		error = mnt_want_write(nd.path.mnt);
 		if (error)
 			goto exit2;
 		error = security_path_unlink(&nd.path, dentry);
 		if (error)
 			goto exit3;
+		if (inode)
+			error = break_lease(inode, O_WRONLY|O_NONBLOCK);
+		if (error)
+			goto exit3;
 		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
 exit3:
 		mnt_drop_write(nd.path.mnt);
@@ -2822,8 +2835,19 @@ exit3:
 		dput(dentry);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
-	if (inode)
+	if (inode) {
+		/* XXX: safe to use EWOULDBLOCK==EAGAIN to do this?: */
+		if (error == -EWOULDBLOCK) {
+			/*
+			 * Wait this time, then retry with leases left
+			 * disabled:
+			 */
+			break_lease(inode, O_WRONLY);
+			goto retry;
+		}
+		reallow_leases(inode, O_WRONLY);
 		iput(inode);	/* truncate the inode here */
+	}
 exit1:
 	path_put(&nd.path);
 	putname(name);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index f4e056a..ef01b98 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1826,13 +1826,17 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (host_err)
 		goto out_put;
 
+	disallow_leases(rdentry->d_inode, O_WRONLY);
 	host_err = nfsd_break_lease(rdentry->d_inode);
-	if (host_err)
+	if (host_err) {
+		reallow_leases(rdentry->d_inode, O_WRONLY);
 		goto out_drop_write;
+	}
 	if (type != S_IFDIR)
 		host_err = vfs_unlink(dirp, rdentry);
 	else
 		host_err = vfs_rmdir(dirp, rdentry);
+	reallow_leases(rdentry->d_inode, O_WRONLY);
 	if (!host_err)
 		host_err = commit_metadata(fhp);
 out_drop_write:
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-10  0:10 ` [PATCH 1/2] locks: introduce i_blockleases to close lease races J. Bruce Fields
       [not found]   ` <20110610001011.GD22215-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2011-06-10 20:24   ` Mimi Zohar
  2011-06-10 21:34     ` J. Bruce Fields
  1 sibling, 1 reply; 13+ messages in thread
From: Mimi Zohar @ 2011-06-10 20:24 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs, linux-fsdevel, samba-technical, Christoph Hellwig, Eric Paris

On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> From: J. Bruce Fields <bfields@redhat.com>
> 
> Since break_lease is called before i_writecount is incremented, there's
> a window between the two where a setlease call would have no way to know
> that an open is about to happen.

So unless the break_lease() call is moved from may_open() to after 
nameidata_to_filp(), I don't see any other options.

Mimi

> We fix this by adding a new inode field, i_blockleases, that is
> incremented while a lease-breaking operation is in progress.
> 
> We will later reuse i_blockleases to enforce lease-breaking for rename,
> unlink, etc.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/inode.c         |    1 +
>  fs/locks.c         |   28 ++++++++++++++++++++++++++++
>  fs/namei.c         |    6 +++++-
>  include/linux/fs.h |    3 +++
>  4 files changed, 37 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 33c963d..4f253a2 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -198,6 +198,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	inode->i_uid = 0;
>  	inode->i_gid = 0;
>  	atomic_set(&inode->i_writecount, 0);
> +	atomic_set(&inode->i_blockleases, 0);
>  	inode->i_size = 0;
>  	inode->i_blocks = 0;
>  	inode->i_bytes = 0;
> diff --git a/fs/locks.c b/fs/locks.c
> index 0a4f50d..7699b1b 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1164,6 +1164,32 @@ static void time_out_leases(struct inode *inode)
>  	}
>  }
> 
> +/* Disallow all leases (read or write): */
> +void disallow_leases(struct inode *inode, int flags)
> +{
> +	if (!inode)
> +		return;
> +	if (!S_ISREG(inode->i_mode))
> +		return;
> +	if ((flags & O_ACCMODE) == O_RDONLY)
> +		return;
> +	atomic_inc(&inode->i_blockleases);
> +}
> +EXPORT_SYMBOL_GPL(disallow_leases);
> +
> +void reallow_leases(struct inode *inode, int flags)
> +{
> +	if (!inode)
> +		return;
> +	if (!S_ISREG(inode->i_mode))
> +		return;
> +	if ((flags & O_ACCMODE) == O_RDONLY)
> +		return;
> +	BUG_ON(atomic_read(&inode->i_blockleases) <= 0);
> +	atomic_dec(&inode->i_blockleases);
> +}
> +EXPORT_SYMBOL_GPL(reallow_leases);
> +
>  /**
>   *	__break_lease	-	revoke all outstanding leases on file
>   *	@inode: the inode of the file to return
> @@ -1369,6 +1395,8 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> 
>  	if (arg != F_UNLCK) {
>  		error = -EAGAIN;
> +		if (atomic_read(&inode->i_blockleases))
> +			goto out;
>  		if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
>  			goto out;
>  		if ((arg == F_WRLCK)
> diff --git a/fs/namei.c b/fs/namei.c
> index 3cb616d..079e68c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2277,10 +2277,14 @@ ok:
>  		want_write = 1;
>  	}
>  common:
> +	disallow_leases(nd->path.dentry->d_inode, open_flag);
>  	error = may_open(&nd->path, acc_mode, open_flag);
> -	if (error)
> +	if (error) {
> +		reallow_leases(nd->path.dentry->d_inode, open_flag);
>  		goto exit;
> +	}
>  	filp = nameidata_to_filp(nd);
> +	reallow_leases(nd->path.dentry->d_inode, open_flag);
>  	if (!IS_ERR(filp)) {
>  		error = ima_file_check(filp, op->acc_mode);
>  		if (error) {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 1b95af3..daf8443 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -796,6 +796,7 @@ struct inode {
>  #ifdef CONFIG_IMA
>  	atomic_t		i_readcount; /* struct files open RO */
>  #endif
> +	atomic_t		i_blockleases; /* setlease fails when >0 */
>  	atomic_t		i_writecount;
>  #ifdef CONFIG_SECURITY
>  	void			*i_security;
> @@ -1161,6 +1162,8 @@ extern void lease_get_mtime(struct inode *, struct timespec *time);
>  extern int generic_setlease(struct file *, long, struct file_lock **);
>  extern int vfs_setlease(struct file *, long, struct file_lock **);
>  extern int lease_modify(struct file_lock **, int);
> +extern void disallow_leases(struct inode *, int flags);
> +extern void reallow_leases(struct inode *, int flags);
>  extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
>  extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
>  extern void lock_flocks(void);



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-10 20:24   ` [PATCH 1/2] locks: introduce i_blockleases to close lease races Mimi Zohar
@ 2011-06-10 21:34     ` J. Bruce Fields
  2011-06-12  4:08       ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-10 21:34 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-nfs, linux-fsdevel, samba-technical, Christoph Hellwig, Eric Paris

On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > From: J. Bruce Fields <bfields@redhat.com>
> > 
> > Since break_lease is called before i_writecount is incremented, there's
> > a window between the two where a setlease call would have no way to know
> > that an open is about to happen.
> 
> So unless the break_lease() call is moved from may_open() to after 
> nameidata_to_filp(), I don't see any other options.

Actually, offhand I can't see why that wouldn't be OK.

Though I think we still end up needing something like i_blockleases to
handle unlink, link, rename, chown, and chmod.

--b.

> 
> Mimi
> 
> > We fix this by adding a new inode field, i_blockleases, that is
> > incremented while a lease-breaking operation is in progress.
> > 
> > We will later reuse i_blockleases to enforce lease-breaking for rename,
> > unlink, etc.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/inode.c         |    1 +
> >  fs/locks.c         |   28 ++++++++++++++++++++++++++++
> >  fs/namei.c         |    6 +++++-
> >  include/linux/fs.h |    3 +++
> >  4 files changed, 37 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 33c963d..4f253a2 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -198,6 +198,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
> >  	inode->i_uid = 0;
> >  	inode->i_gid = 0;
> >  	atomic_set(&inode->i_writecount, 0);
> > +	atomic_set(&inode->i_blockleases, 0);
> >  	inode->i_size = 0;
> >  	inode->i_blocks = 0;
> >  	inode->i_bytes = 0;
> > diff --git a/fs/locks.c b/fs/locks.c
> > index 0a4f50d..7699b1b 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -1164,6 +1164,32 @@ static void time_out_leases(struct inode *inode)
> >  	}
> >  }
> > 
> > +/* Disallow all leases (read or write): */
> > +void disallow_leases(struct inode *inode, int flags)
> > +{
> > +	if (!inode)
> > +		return;
> > +	if (!S_ISREG(inode->i_mode))
> > +		return;
> > +	if ((flags & O_ACCMODE) == O_RDONLY)
> > +		return;
> > +	atomic_inc(&inode->i_blockleases);
> > +}
> > +EXPORT_SYMBOL_GPL(disallow_leases);
> > +
> > +void reallow_leases(struct inode *inode, int flags)
> > +{
> > +	if (!inode)
> > +		return;
> > +	if (!S_ISREG(inode->i_mode))
> > +		return;
> > +	if ((flags & O_ACCMODE) == O_RDONLY)
> > +		return;
> > +	BUG_ON(atomic_read(&inode->i_blockleases) <= 0);
> > +	atomic_dec(&inode->i_blockleases);
> > +}
> > +EXPORT_SYMBOL_GPL(reallow_leases);
> > +
> >  /**
> >   *	__break_lease	-	revoke all outstanding leases on file
> >   *	@inode: the inode of the file to return
> > @@ -1369,6 +1395,8 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> > 
> >  	if (arg != F_UNLCK) {
> >  		error = -EAGAIN;
> > +		if (atomic_read(&inode->i_blockleases))
> > +			goto out;
> >  		if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> >  			goto out;
> >  		if ((arg == F_WRLCK)
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 3cb616d..079e68c 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2277,10 +2277,14 @@ ok:
> >  		want_write = 1;
> >  	}
> >  common:
> > +	disallow_leases(nd->path.dentry->d_inode, open_flag);
> >  	error = may_open(&nd->path, acc_mode, open_flag);
> > -	if (error)
> > +	if (error) {
> > +		reallow_leases(nd->path.dentry->d_inode, open_flag);
> >  		goto exit;
> > +	}
> >  	filp = nameidata_to_filp(nd);
> > +	reallow_leases(nd->path.dentry->d_inode, open_flag);
> >  	if (!IS_ERR(filp)) {
> >  		error = ima_file_check(filp, op->acc_mode);
> >  		if (error) {
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 1b95af3..daf8443 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -796,6 +796,7 @@ struct inode {
> >  #ifdef CONFIG_IMA
> >  	atomic_t		i_readcount; /* struct files open RO */
> >  #endif
> > +	atomic_t		i_blockleases; /* setlease fails when >0 */
> >  	atomic_t		i_writecount;
> >  #ifdef CONFIG_SECURITY
> >  	void			*i_security;
> > @@ -1161,6 +1162,8 @@ extern void lease_get_mtime(struct inode *, struct timespec *time);
> >  extern int generic_setlease(struct file *, long, struct file_lock **);
> >  extern int vfs_setlease(struct file *, long, struct file_lock **);
> >  extern int lease_modify(struct file_lock **, int);
> > +extern void disallow_leases(struct inode *, int flags);
> > +extern void reallow_leases(struct inode *, int flags);
> >  extern int lock_may_read(struct inode *, loff_t start, unsigned long count);
> >  extern int lock_may_write(struct inode *, loff_t start, unsigned long count);
> >  extern void lock_flocks(void);
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-10 21:34     ` J. Bruce Fields
@ 2011-06-12  4:08       ` J. Bruce Fields
  2011-06-12 19:10         ` Mimi Zohar
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-12  4:08 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-nfs, linux-fsdevel, samba-technical, Christoph Hellwig, Eric Paris

On Fri, Jun 10, 2011 at 05:34:46PM -0400, J. Bruce Fields wrote:
> On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> > On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > > From: J. Bruce Fields <bfields@redhat.com>
> > > 
> > > Since break_lease is called before i_writecount is incremented, there's
> > > a window between the two where a setlease call would have no way to know
> > > that an open is about to happen.
> > 
> > So unless the break_lease() call is moved from may_open() to after 
> > nameidata_to_filp(), I don't see any other options.
> 
> Actually, offhand I can't see why that wouldn't be OK.
> 
> Though I think we still end up needing something like i_blockleases to
> handle unlink, link, rename, chown, and chmod.

Well, I guess there's a bizarre alternative that wouldn't require a new
inode field:

What we care about is conflicts between read leases and operations that
modify the metadata of the inode or the set of names pointing to it.

As far as I can tell those operations all take the i_mutex either on the
inode itself or on the parents of one of its aliases.

So, you could prevent break_lease/setlease races by calling setlease
under *all* of those i_mutexes:

	- take i_mutex on the inode
	- take i_lock to prevent the set of aliases from changing
	- take i_mutex for parent of each alias
	- set the lease
	- drop the parent i_mutexes, etc.

where the i_mutexes would all be taken with mutex_trylock, and we'd just
fail the whole setlease if any of them failed.

???

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-12  4:08       ` J. Bruce Fields
@ 2011-06-12 19:10         ` Mimi Zohar
  2011-06-12 19:12           ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Mimi Zohar @ 2011-06-12 19:10 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs, linux-fsdevel, samba-technical, Christoph Hellwig, Eric Paris

On Sun, 2011-06-12 at 00:08 -0400, J. Bruce Fields wrote:
> On Fri, Jun 10, 2011 at 05:34:46PM -0400, J. Bruce Fields wrote:
> > On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> > > On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > > > From: J. Bruce Fields <bfields@redhat.com>
> > > > 
> > > > Since break_lease is called before i_writecount is incremented, there's
> > > > a window between the two where a setlease call would have no way to know
> > > > that an open is about to happen.
> > > 
> > > So unless the break_lease() call is moved from may_open() to after 
> > > nameidata_to_filp(), I don't see any other options.
> > 
> > Actually, offhand I can't see why that wouldn't be OK.
> > 
> > Though I think we still end up needing something like i_blockleases to
> > handle unlink, link, rename, chown, and chmod.
> 
> Well, I guess there's a bizarre alternative that wouldn't require a new
> inode field:

In lieu of adding a new inode field, another possible option, a bit
kludgy, would be extending i_flock with an additional fl_flag
FL_BLOCKLEASE.

#define IS_BLOCKLEASE(fl)    (fl->fl_flags & FL_BLOCKLEASE)

Mimi

> What we care about is conflicts between read leases and operations that
> modify the metadata of the inode or the set of names pointing to it.
> 
> As far as I can tell those operations all take the i_mutex either on the
> inode itself or on the parents of one of its aliases.
> 
> So, you could prevent break_lease/setlease races by calling setlease
> under *all* of those i_mutexes:
> 
> 	- take i_mutex on the inode
> 	- take i_lock to prevent the set of aliases from changing
> 	- take i_mutex for parent of each alias
> 	- set the lease
> 	- drop the parent i_mutexes, etc.
> 
> where the i_mutexes would all be taken with mutex_trylock, and we'd just
> fail the whole setlease if any of them failed.
> 
> ???
> 
> --b.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-12 19:10         ` Mimi Zohar
@ 2011-06-12 19:12           ` J. Bruce Fields
       [not found]             ` <20110612191220.GK12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-12 19:12 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-nfs, linux-fsdevel, samba-technical, Christoph Hellwig, Eric Paris

On Sun, Jun 12, 2011 at 03:10:04PM -0400, Mimi Zohar wrote:
> On Sun, 2011-06-12 at 00:08 -0400, J. Bruce Fields wrote:
> > On Fri, Jun 10, 2011 at 05:34:46PM -0400, J. Bruce Fields wrote:
> > > On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> > > > On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > > > > From: J. Bruce Fields <bfields@redhat.com>
> > > > > 
> > > > > Since break_lease is called before i_writecount is incremented, there's
> > > > > a window between the two where a setlease call would have no way to know
> > > > > that an open is about to happen.
> > > > 
> > > > So unless the break_lease() call is moved from may_open() to after 
> > > > nameidata_to_filp(), I don't see any other options.
> > > 
> > > Actually, offhand I can't see why that wouldn't be OK.
> > > 
> > > Though I think we still end up needing something like i_blockleases to
> > > handle unlink, link, rename, chown, and chmod.
> > 
> > Well, I guess there's a bizarre alternative that wouldn't require a new
> > inode field:
> 
> In lieu of adding a new inode field, another possible option, a bit
> kludgy, would be extending i_flock with an additional fl_flag
> FL_BLOCKLEASE.
> 
> #define IS_BLOCKLEASE(fl)    (fl->fl_flags & FL_BLOCKLEASE)

Alas, that would mean adding and removing one of these file locks around
every single link, unlink, rename,....

--b.

> 
> Mimi
> 
> > What we care about is conflicts between read leases and operations that
> > modify the metadata of the inode or the set of names pointing to it.
> > 
> > As far as I can tell those operations all take the i_mutex either on the
> > inode itself or on the parents of one of its aliases.
> > 
> > So, you could prevent break_lease/setlease races by calling setlease
> > under *all* of those i_mutexes:
> > 
> > 	- take i_mutex on the inode
> > 	- take i_lock to prevent the set of aliases from changing
> > 	- take i_mutex for parent of each alias
> > 	- set the lease
> > 	- drop the parent i_mutexes, etc.
> > 
> > where the i_mutexes would all be taken with mutex_trylock, and we'd just
> > fail the whole setlease if any of them failed.
> > 
> > ???
> > 
> > --b.
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
       [not found]             ` <20110612191220.GK12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2011-06-12 20:54               ` Mimi Zohar
  2011-06-13 12:19                 ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Mimi Zohar @ 2011-06-12 20:54 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, Christoph Hellwig,
	Eric Paris

On Sun, 2011-06-12 at 15:12 -0400, J. Bruce Fields wrote: 
> On Sun, Jun 12, 2011 at 03:10:04PM -0400, Mimi Zohar wrote:
> > On Sun, 2011-06-12 at 00:08 -0400, J. Bruce Fields wrote:
> > > On Fri, Jun 10, 2011 at 05:34:46PM -0400, J. Bruce Fields wrote:
> > > > On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> > > > > On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > > > > > From: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > > 
> > > > > > Since break_lease is called before i_writecount is incremented, there's
> > > > > > a window between the two where a setlease call would have no way to know
> > > > > > that an open is about to happen.
> > > > > 
> > > > > So unless the break_lease() call is moved from may_open() to after 
> > > > > nameidata_to_filp(), I don't see any other options.
> > > > 
> > > > Actually, offhand I can't see why that wouldn't be OK.
> > > > 
> > > > Though I think we still end up needing something like i_blockleases to
> > > > handle unlink, link, rename, chown, and chmod.
> > > 
> > > Well, I guess there's a bizarre alternative that wouldn't require a new
> > > inode field:
> > 
> > In lieu of adding a new inode field, another possible option, a bit
> > kludgy, would be extending i_flock with an additional fl_flag
> > FL_BLOCKLEASE.
> > 
> > #define IS_BLOCKLEASE(fl)    (fl->fl_flags & FL_BLOCKLEASE)
> 
> Alas, that would mean adding and removing one of these file locks around
> every single link, unlink, rename,....
> 
> --b.

You're adding a call to break_lease() for each of them.  Currently
__break_lease() is only called if a lease exists. Assuming there aren't
any existing leases, couldn't break_lease() call something like
block_lease()?  The free would be after the link, unlink, ...,
completed/failed.

(You wouldn't actually need to alloc/free the 'struct file_lock' each
time, just set the pointer and reset to NULL.)

Mimi

> > 
> > Mimi
> > 
> > > What we care about is conflicts between read leases and operations that
> > > modify the metadata of the inode or the set of names pointing to it.
> > > 
> > > As far as I can tell those operations all take the i_mutex either on the
> > > inode itself or on the parents of one of its aliases.
> > > 
> > > So, you could prevent break_lease/setlease races by calling setlease
> > > under *all* of those i_mutexes:
> > > 
> > > 	- take i_mutex on the inode
> > > 	- take i_lock to prevent the set of aliases from changing
> > > 	- take i_mutex for parent of each alias
> > > 	- set the lease
> > > 	- drop the parent i_mutexes, etc.
> > > 
> > > where the i_mutexes would all be taken with mutex_trylock, and we'd just
> > > fail the whole setlease if any of them failed.
> > > 
> > > ???
> > > 
> > > --b.
> > 
> > 



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-12 20:54               ` Mimi Zohar
@ 2011-06-13 12:19                 ` J. Bruce Fields
  2011-06-13 20:37                   ` Mimi Zohar
       [not found]                   ` <20110613121939.GL12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  0 siblings, 2 replies; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-13 12:19 UTC (permalink / raw)
  To: Mimi Zohar; +Cc: linux-fsdevel, linux-nfs, samba-technical, Eric Paris

On Sun, Jun 12, 2011 at 04:54:33PM -0400, Mimi Zohar wrote:
> On Sun, 2011-06-12 at 15:12 -0400, J. Bruce Fields wrote: 
> > On Sun, Jun 12, 2011 at 03:10:04PM -0400, Mimi Zohar wrote:
> > > On Sun, 2011-06-12 at 00:08 -0400, J. Bruce Fields wrote:
> > > > On Fri, Jun 10, 2011 at 05:34:46PM -0400, J. Bruce Fields wrote:
> > > > > On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> > > > > > On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > > > > > > From: J. Bruce Fields <bfields@redhat.com>
> > > > > > > 
> > > > > > > Since break_lease is called before i_writecount is incremented, there's
> > > > > > > a window between the two where a setlease call would have no way to know
> > > > > > > that an open is about to happen.
> > > > > > 
> > > > > > So unless the break_lease() call is moved from may_open() to after 
> > > > > > nameidata_to_filp(), I don't see any other options.
> > > > > 
> > > > > Actually, offhand I can't see why that wouldn't be OK.
> > > > > 
> > > > > Though I think we still end up needing something like i_blockleases to
> > > > > handle unlink, link, rename, chown, and chmod.
> > > > 
> > > > Well, I guess there's a bizarre alternative that wouldn't require a new
> > > > inode field:
> > > 
> > > In lieu of adding a new inode field, another possible option, a bit
> > > kludgy, would be extending i_flock with an additional fl_flag
> > > FL_BLOCKLEASE.
> > > 
> > > #define IS_BLOCKLEASE(fl)    (fl->fl_flags & FL_BLOCKLEASE)
> > 
> > Alas, that would mean adding and removing one of these file locks around
> > every single link, unlink, rename,....
> > 
> > --b.
> 
> You're adding a call to break_lease() for each of them.  Currently
> __break_lease() is only called if a lease exists. Assuming there aren't
> any existing leases, couldn't break_lease() call something like
> block_lease()?  The free would be after the link, unlink, ...,
> completed/failed.
> 
> (You wouldn't actually need to alloc/free the 'struct file_lock' each
> time, just set the pointer and reset to NULL.)

Well, the pointer has to be set to something.  I suppose we could put a
struct file_lock on the stack.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-13 12:19                 ` J. Bruce Fields
@ 2011-06-13 20:37                   ` Mimi Zohar
  2011-06-14  0:35                     ` J. Bruce Fields
       [not found]                   ` <20110613121939.GL12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: Mimi Zohar @ 2011-06-13 20:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-fsdevel, linux-nfs, samba-technical, Eric Paris

On Mon, 2011-06-13 at 08:19 -0400, J. Bruce Fields wrote:
> On Sun, Jun 12, 2011 at 04:54:33PM -0400, Mimi Zohar wrote:
> > On Sun, 2011-06-12 at 15:12 -0400, J. Bruce Fields wrote: 
> > > On Sun, Jun 12, 2011 at 03:10:04PM -0400, Mimi Zohar wrote:
> > > > On Sun, 2011-06-12 at 00:08 -0400, J. Bruce Fields wrote:
> > > > > On Fri, Jun 10, 2011 at 05:34:46PM -0400, J. Bruce Fields wrote:
> > > > > > On Fri, Jun 10, 2011 at 04:24:00PM -0400, Mimi Zohar wrote:
> > > > > > > On Thu, 2011-06-09 at 20:10 -0400, J. Bruce Fields wrote:
> > > > > > > > From: J. Bruce Fields <bfields@redhat.com>
> > > > > > > > 
> > > > > > > > Since break_lease is called before i_writecount is incremented, there's
> > > > > > > > a window between the two where a setlease call would have no way to know
> > > > > > > > that an open is about to happen.
> > > > > > > 
> > > > > > > So unless the break_lease() call is moved from may_open() to after 
> > > > > > > nameidata_to_filp(), I don't see any other options.
> > > > > > 
> > > > > > Actually, offhand I can't see why that wouldn't be OK.
> > > > > > 
> > > > > > Though I think we still end up needing something like i_blockleases to
> > > > > > handle unlink, link, rename, chown, and chmod.
> > > > > 
> > > > > Well, I guess there's a bizarre alternative that wouldn't require a new
> > > > > inode field:
> > > > 
> > > > In lieu of adding a new inode field, another possible option, a bit
> > > > kludgy, would be extending i_flock with an additional fl_flag
> > > > FL_BLOCKLEASE.
> > > > 
> > > > #define IS_BLOCKLEASE(fl)    (fl->fl_flags & FL_BLOCKLEASE)
> > > 
> > > Alas, that would mean adding and removing one of these file locks around
> > > every single link, unlink, rename,....
> > > 
> > > --b.
> > 
> > You're adding a call to break_lease() for each of them.  Currently
> > __break_lease() is only called if a lease exists. Assuming there aren't
> > any existing leases, couldn't break_lease() call something like
> > block_lease()?  The free would be after the link, unlink, ...,
> > completed/failed.
> > 
> > (You wouldn't actually need to alloc/free the 'struct file_lock' each
> > time, just set the pointer and reset to NULL.)
> 
> Well, the pointer has to be set to something.  I suppose we could put a
> struct file_lock on the stack.
> 
> --b.

Instead of putting the struct file_lock on the stack, how about creating
a dummy list containing a single element with FL_BLOCKLEASE set?

Mimi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
  2011-06-13 20:37                   ` Mimi Zohar
@ 2011-06-14  0:35                     ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-14  0:35 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-nfs, linux-fsdevel, samba-technical, Christoph Hellwig, Eric Paris

On Mon, Jun 13, 2011 at 04:37:03PM -0400, Mimi Zohar wrote:
> On Mon, 2011-06-13 at 08:19 -0400, J. Bruce Fields wrote:
> > On Sun, Jun 12, 2011 at 04:54:33PM -0400, Mimi Zohar wrote:
> > > On Sun, 2011-06-12 at 15:12 -0400, J. Bruce Fields wrote: 
> > > > On Sun, Jun 12, 2011 at 03:10:04PM -0400, Mimi Zohar wrote:
> > > > > In lieu of adding a new inode field, another possible option, a bit
> > > > > kludgy, would be extending i_flock with an additional fl_flag
> > > > > FL_BLOCKLEASE.
> > > > > 
> > > > > #define IS_BLOCKLEASE(fl)    (fl->fl_flags & FL_BLOCKLEASE)
> > > > 
> > > > Alas, that would mean adding and removing one of these file locks around
> > > > every single link, unlink, rename,....
> > > > 
> > > > --b.
> > > 
> > > You're adding a call to break_lease() for each of them.  Currently
> > > __break_lease() is only called if a lease exists. Assuming there aren't
> > > any existing leases, couldn't break_lease() call something like
> > > block_lease()?  The free would be after the link, unlink, ...,
> > > completed/failed.
> > > 
> > > (You wouldn't actually need to alloc/free the 'struct file_lock' each
> > > time, just set the pointer and reset to NULL.)
> > 
> > Well, the pointer has to be set to something.  I suppose we could put a
> > struct file_lock on the stack.
> > 
> > --b.
> 
> Instead of putting the struct file_lock on the stack, how about creating
> a dummy list containing a single element with FL_BLOCKLEASE set?

I'm afraid I don't understand what you're proposing.  Could you explain
in more detail?

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] locks: introduce i_blockleases to close lease races
       [not found]                   ` <20110613121939.GL12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
@ 2011-06-15 15:47                     ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2011-06-15 15:47 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	samba-technical-w/Ol4Ecudpl8XjKLYN78aQ, Christoph Hellwig,
	Eric Paris

On Mon, Jun 13, 2011 at 08:19:39AM -0400, J. Bruce Fields wrote:
> On Sun, Jun 12, 2011 at 04:54:33PM -0400, Mimi Zohar wrote:
> > You're adding a call to break_lease() for each of them.  Currently
> > __break_lease() is only called if a lease exists. Assuming there aren't
> > any existing leases, couldn't break_lease() call something like
> > block_lease()?  The free would be after the link, unlink, ...,
> > completed/failed.
> > 
> > (You wouldn't actually need to alloc/free the 'struct file_lock' each
> > time, just set the pointer and reset to NULL.)
> 
> Well, the pointer has to be set to something.  I suppose we could put a
> struct file_lock on the stack.

The locking code is under a global spinlock--instead of an atomic inc
and dec of inode->i_blockleases we'd be doing a pair of lock/unlocks of
file_lock_lock.

We could probably fix that: off the top of my head the only reason I see
for a global lock is the stupid deadlock-detection code.  Which is only
needed for posix locks (and I'm not at all convinced it's needed even
there).

With that fixed, it would be a choice between an i_lock-protected
list_add/list_del vs. an atomic inc/dec of a new inode field.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-06-15 15:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-10  0:09 lease/delegation/oplock semantics J. Bruce Fields
2011-06-10  0:10 ` [PATCH 1/2] locks: introduce i_blockleases to close lease races J. Bruce Fields
     [not found]   ` <20110610001011.GD22215-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2011-06-10  0:11     ` [PATCH 2/2] locks: break lease on unlink J. Bruce Fields
2011-06-10 20:24   ` [PATCH 1/2] locks: introduce i_blockleases to close lease races Mimi Zohar
2011-06-10 21:34     ` J. Bruce Fields
2011-06-12  4:08       ` J. Bruce Fields
2011-06-12 19:10         ` Mimi Zohar
2011-06-12 19:12           ` J. Bruce Fields
     [not found]             ` <20110612191220.GK12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2011-06-12 20:54               ` Mimi Zohar
2011-06-13 12:19                 ` J. Bruce Fields
2011-06-13 20:37                   ` Mimi Zohar
2011-06-14  0:35                     ` J. Bruce Fields
     [not found]                   ` <20110613121939.GL12149-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>
2011-06-15 15:47                     ` J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).