All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/12] Implement NFSv4 delegations, take 8
@ 2013-07-03 20:12 J. Bruce Fields
  2013-07-03 20:12 ` [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code J. Bruce Fields
                   ` (9 more replies)
  0 siblings, 10 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

This is just a rebase to 3.10 of the previous posting.  As far as I can
tell, it's ready to be merged.  Introduction (repeated from before) follows:


This patch series implements read delegations, which allow NFSv4 clients
to perform read opens without contacting the server, by promising to
call back to clients before modifying the data, metadata, or set of
links pointing to a file.

The main recent change was in response to review from Linus, who didn't
want us to hang under directory i_mutex's on timeouts communicating with
unresponsive clients.

So, this version of the series drops the i_mutex before waiting.  The
logic ends up looking something like:

        acquire locks
        look up inode
        test for delegation; if found:
                take reference on inode
                release locks
                wait for delegation break
                drop reference on inode
                retry


The initial test for a delegation happens after the lock on the
delegated inode is acquired, but additional directory mutexes may have
been acquired further up the call stack.  I therefore add a "struct
inode **" argument to any intervening functions, which we use to pass
the inode back up to the caller in the case it needs to wait for the
delegation to be broken.

I also allow callers to pass in NULL for the "struct inode **" argument
to indicate they'd rather just fail than wait for a delegation.  For
example, as long as ecryptfs isn't exportable I assume they'd rather not
see retry logic there that they won't use.  But I may have misjudged in
some of these cases.


J. Bruce Fields (12):
  vfs: pull ext4's double-i_mutex-locking into common code
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: take i_mutex on renamed file
  locks: introduce new FL_DELEG lock flag
  locks: implement delegations
  namei: minor vfs_unlink cleanup
  locks: break delegations on unlink
  locks: helper functions for delegation breaking
  locks: break delegations on rename
  locks: break delegations on link
  locks: break delegations on any attribute modification

 Documentation/filesystems/directory-locking |   31 ++++++++---
 drivers/base/devtmpfs.c                     |    6 +-
 fs/attr.c                                   |    5 +-
 fs/cachefiles/interface.c                   |    4 +-
 fs/cachefiles/namei.c                       |    4 +-
 fs/ecryptfs/inode.c                         |    6 +-
 fs/ext4/ext4.h                              |    2 -
 fs/ext4/ioctl.c                             |    4 +-
 fs/ext4/move_extent.c                       |   40 +-------------
 fs/hpfs/namei.c                             |    2 +-
 fs/inode.c                                  |   35 +++++++++++-
 fs/locks.c                                  |   51 +++++++++++++----
 fs/namei.c                                  |   79 +++++++++++++++++++++------
 fs/nfsd/nfs4state.c                         |    2 +-
 fs/nfsd/vfs.c                               |   14 +++--
 fs/open.c                                   |   21 +++++--
 fs/utimes.c                                 |    9 ++-
 include/linux/fs.h                          |   72 ++++++++++++++++++++----
 ipc/mqueue.c                                |    2 +-
 19 files changed, 273 insertions(+), 116 deletions(-)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-2-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-03 20:12 ` [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories J. Bruce Fields
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs, linux-fsdevel, J. Bruce Fields, Theodore Ts'o,
	Andreas Dilger

From: "J. Bruce Fields" <bfields@redhat.com>

We want to do this elsewhere as well.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/ext4/ext4.h        |    2 --
 fs/ext4/ioctl.c       |    4 ++--
 fs/ext4/move_extent.c |   40 ++--------------------------------------
 fs/inode.c            |   29 +++++++++++++++++++++++++++++
 include/linux/fs.h    |    3 +++
 5 files changed, 36 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..3590abe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
 					    struct inode *second);
 extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
 					  struct inode *donor_inode);
-void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
-void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
 			     __u64 len, __u64 *moved_len);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 9491ac0..12048f7 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	/* Protect orig inodes against a truncate and make sure,
 	 * that only 1 swap_inode_boot_loader is running. */
-	ext4_inode_double_lock(inode, inode_bl);
+	lock_two_nondirectories(inode, inode_bl);
 
 	truncate_inode_pages(&inode->i_data, 0);
 	truncate_inode_pages(&inode_bl->i_data, 0);
@@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ext4_inode_resume_unlocked_dio(inode);
 	ext4_inode_resume_unlocked_dio(inode_bl);
 
-	ext4_inode_double_unlock(inode, inode_bl);
+	unlock_two_nondirectories(inode, inode_bl);
 
 	iput(inode_bl);
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3dcbf36..986a838 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
 }
 
 /**
- * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode structure
- * @inode2:	the inode structure
- *
- * Lock two inodes' i_mutex
- */
-void
-ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
-{
-	BUG_ON(inode1 == inode2);
-	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
-	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
-	}
-}
-
-/**
- * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
- *
- * @inode1:     the inode that is released first
- * @inode2:     the inode that is released second
- *
- */
-
-void
-ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&inode1->i_mutex);
-	mutex_unlock(&inode2->i_mutex);
-}
-
-/**
  * ext4_move_extents - Exchange the specified range of a file
  *
  * @o_filp:		file structure of the original file
@@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 		return -EINVAL;
 	}
 	/* Protect orig and donor inodes against a truncate */
-	ext4_inode_double_lock(orig_inode, donor_inode);
+	lock_two_nondirectories(orig_inode, donor_inode);
 
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(orig_inode);
@@ -1538,7 +1502,7 @@ out:
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
 	ext4_inode_resume_unlocked_dio(orig_inode);
 	ext4_inode_resume_unlocked_dio(donor_inode);
-	ext4_inode_double_unlock(orig_inode, donor_inode);
+	unlock_two_nondirectories(orig_inode, donor_inode);
 
 	return ret;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..b8afbc7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
 EXPORT_SYMBOL(unlock_new_inode);
 
 /**
+ * lock_two_nondirectories - take two i_mutexes on non-directory objects
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ */
+void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	if (inode1 < inode2) {
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+	}
+}
+EXPORT_SYMBOL(lock_two_nondirectories);
+
+/**
+ * unlock_two_nondirectories - release locks from lock_two_nondirectories()
+ * @inode1: first inode to unlock
+ * @inode2: second inode to unlock
+ */
+void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&inode1->i_mutex);
+	mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(unlock_two_nondirectories);
+
+/**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb:		super block of file system
  * @hashval:	hash value (usually inode number) to get
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65c2be2..3258761 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_QUOTA
 };
 
+void lock_two_nondirectories(struct inode *, struct inode*);
+void unlock_two_nondirectories(struct inode *, struct inode*);
+
 /*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
  2013-07-03 20:12 ` [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-3-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
       [not found] ` <1372882356-14168-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
directories.

(Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
class any more; fixed in a later patch.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/inode.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index b8afbc7..942451b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -987,11 +987,11 @@ EXPORT_SYMBOL(unlock_new_inode);
 void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
 	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+		mutex_lock(&inode1->i_mutex);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
 	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+		mutex_lock(&inode2->i_mutex);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
 	}
 }
 EXPORT_SYMBOL(lock_two_nondirectories);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
@ 2013-07-03 20:12     ` J. Bruce Fields
  2013-07-03 20:12 ` [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories J. Bruce Fields
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, J. Bruce Fields

From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.

Also fix some outdated documentation.

Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/inode.c         |    4 ++--
 include/linux/fs.h |    9 ++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 942451b..304db4c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -988,10 +988,10 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
 	if (inode1 < inode2) {
 		mutex_lock(&inode1->i_mutex);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
 	} else {
 		mutex_lock(&inode2->i_mutex);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
 	}
 }
 EXPORT_SYMBOL(lock_two_nondirectories);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3258761..ec88235 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -620,10 +620,13 @@ static inline int inode_unhashed(struct inode *inode)
  * 0: the object of the current VFS operation
  * 1: parent
  * 2: child/target
- * 3: quota file
+ * 3: xattr
+ * 4: second non-directory
+ * The last is for certain operations (such as rename) which lock two
+ * non-directories at once.
  *
  * The locking order between these classes is
- * parent -> child -> normal -> xattr -> quota
+ * parent -> child -> normal -> xattr -> second non-directory
  */
 enum inode_i_mutex_lock_class
 {
@@ -631,7 +634,7 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_PARENT,
 	I_MUTEX_CHILD,
 	I_MUTEX_XATTR,
-	I_MUTEX_QUOTA
+	I_MUTEX_NONDIR2
 };
 
 void lock_two_nondirectories(struct inode *, struct inode*);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
@ 2013-07-03 20:12     ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.

Also fix some outdated documentation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/inode.c         |    4 ++--
 include/linux/fs.h |    9 ++++++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 942451b..304db4c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -988,10 +988,10 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
 	if (inode1 < inode2) {
 		mutex_lock(&inode1->i_mutex);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
 	} else {
 		mutex_lock(&inode2->i_mutex);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
 	}
 }
 EXPORT_SYMBOL(lock_two_nondirectories);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3258761..ec88235 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -620,10 +620,13 @@ static inline int inode_unhashed(struct inode *inode)
  * 0: the object of the current VFS operation
  * 1: parent
  * 2: child/target
- * 3: quota file
+ * 3: xattr
+ * 4: second non-directory
+ * The last is for certain operations (such as rename) which lock two
+ * non-directories at once.
  *
  * The locking order between these classes is
- * parent -> child -> normal -> xattr -> quota
+ * parent -> child -> normal -> xattr -> second non-directory
  */
 enum inode_i_mutex_lock_class
 {
@@ -631,7 +634,7 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_PARENT,
 	I_MUTEX_CHILD,
 	I_MUTEX_XATTR,
-	I_MUTEX_QUOTA
+	I_MUTEX_NONDIR2
 };
 
 void lock_two_nondirectories(struct inode *, struct inode*);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 04/12] vfs: take i_mutex on renamed file
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
@ 2013-07-03 20:12     ` J. Bruce Fields
  2013-07-03 20:12 ` [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories J. Bruce Fields
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, J. Bruce Fields

From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.

Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 Documentation/filesystems/directory-locking |   31 +++++++++++++++++++--------
 fs/namei.c                                  |   12 ++++++++---
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
index ff7b611..09bbf9a 100644
--- a/Documentation/filesystems/directory-locking
+++ b/Documentation/filesystems/directory-locking
@@ -2,6 +2,10 @@
 kinds of locks - per-inode (->i_mutex) and per-filesystem
 (->s_vfs_rename_mutex).
 
+	When taking the i_mutex on multiple non-directory objects, we
+always acquire the locks in order by increasing address.  We'll call
+that "inode pointer" order in the following.
+
 	For our purposes all operations fall in 5 classes:
 
 1) read access.  Locking rules: caller locks directory we are accessing.
@@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
 locks victim and calls the method.
 
 4) rename() that is _not_ cross-directory.  Locking rules: caller locks
-the parent, finds source and target, if target already exists - locks it
-and then calls the method.
+the parent and finds source and target.  If target already exists, lock
+it.  If source is a non-directory, lock it.  If that means we need to
+lock both, lock them in inode pointer order.
 
 5) link creation.  Locking rules:
 	* lock parent
@@ -30,7 +35,9 @@ rules:
 		fail with -ENOTEMPTY
 	* if new parent is equal to or is a descendent of source
 		fail with -ELOOP
-	* if target exists - lock it.
+	* If target exists, lock it.  If source is a non-directory, lock
+	  it.  In case that means we need to lock both source and target,
+	  do so in inode pointer order.
 	* call the method.
 
 
@@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
     renames will be blocked on filesystem lock and we don't start changing
     the order until we had acquired all locks).
 
-(3) any operation holds at most one lock on non-directory object and
-    that lock is acquired after all other locks.  (Proof: see descriptions
-    of operations).
+(3) locks on non-directory objects are acquired only after locks on
+    directory objects, and are acquired in inode pointer order.
+    (Proof: all operations but renames take lock on at most one
+    non-directory object, except renames, which take locks on source and
+    target in inode pointer order in the case they are not directories.)
 
 	Now consider the minimal deadlock.  Each process is blocked on
 attempt to acquire some lock and already holds at least one lock.  Let's
@@ -66,9 +75,13 @@ consider the set of contended locks.  First of all, filesystem lock is
 not contended, since any process blocked on it is not holding any locks.
 Thus all processes are blocked on ->i_mutex.
 
-	Non-directory objects are not contended due to (3).  Thus link
-creation can't be a part of deadlock - it can't be blocked on source
-and it means that it doesn't hold any locks.
+	By (3), any process holding a non-directory lock can only be
+waiting on another non-directory lock with a larger address.  Therefore
+the process holding the "largest" such lock can always make progress, and
+non-directory objects are not included in the set of contended locks.
+
+	Thus link creation can't be a part of deadlock - it can't be
+blocked on source and it means that it doesn't hold any locks.
 
 	Any contended object is either held by cross-directory rename or
 has a child that is also contended.  Indeed, suppose that it is held by
diff --git a/fs/namei.c b/fs/namei.c
index 9ed9361..61f6076 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3677,7 +3677,8 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   That's where 4.4 screws up. Current fix: serialization on
  *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
  *	   story.
- *	c) we have to lock _three_ objects - parents and victim (if it exists).
+ *	c) we have to lock _four_ objects - parents and victim (if it exists),
+ *	   and source (if it is not a directory).
  *	   And that - after we got ->i_mutex on parents (until then we don't know
  *	   whether the target exists).  Solution: try to be smart with locking
  *	   order for inodes.  We rely on the fact that tree topology may change
@@ -3753,6 +3754,7 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 			    struct inode *new_dir, struct dentry *new_dentry)
 {
 	struct inode *target = new_dentry->d_inode;
+	struct inode *source = old_dentry->d_inode;
 	int error;
 
 	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
@@ -3761,7 +3763,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 
 	dget(new_dentry);
 	if (target)
-		mutex_lock(&target->i_mutex);
+		lock_two_nondirectories(source, target);
+	else
+		mutex_lock(&source->i_mutex);
 
 	error = -EBUSY;
 	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
@@ -3777,7 +3781,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 		d_move(old_dentry, new_dentry);
 out:
 	if (target)
-		mutex_unlock(&target->i_mutex);
+		unlock_two_nondirectories(source, target);
+	else
+		mutex_unlock(&source->i_mutex);
 	dput(new_dentry);
 	return error;
 }
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 04/12] vfs: take i_mutex on renamed file
@ 2013-07-03 20:12     ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 Documentation/filesystems/directory-locking |   31 +++++++++++++++++++--------
 fs/namei.c                                  |   12 ++++++++---
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
index ff7b611..09bbf9a 100644
--- a/Documentation/filesystems/directory-locking
+++ b/Documentation/filesystems/directory-locking
@@ -2,6 +2,10 @@
 kinds of locks - per-inode (->i_mutex) and per-filesystem
 (->s_vfs_rename_mutex).
 
+	When taking the i_mutex on multiple non-directory objects, we
+always acquire the locks in order by increasing address.  We'll call
+that "inode pointer" order in the following.
+
 	For our purposes all operations fall in 5 classes:
 
 1) read access.  Locking rules: caller locks directory we are accessing.
@@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
 locks victim and calls the method.
 
 4) rename() that is _not_ cross-directory.  Locking rules: caller locks
-the parent, finds source and target, if target already exists - locks it
-and then calls the method.
+the parent and finds source and target.  If target already exists, lock
+it.  If source is a non-directory, lock it.  If that means we need to
+lock both, lock them in inode pointer order.
 
 5) link creation.  Locking rules:
 	* lock parent
@@ -30,7 +35,9 @@ rules:
 		fail with -ENOTEMPTY
 	* if new parent is equal to or is a descendent of source
 		fail with -ELOOP
-	* if target exists - lock it.
+	* If target exists, lock it.  If source is a non-directory, lock
+	  it.  In case that means we need to lock both source and target,
+	  do so in inode pointer order.
 	* call the method.
 
 
@@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
     renames will be blocked on filesystem lock and we don't start changing
     the order until we had acquired all locks).
 
-(3) any operation holds at most one lock on non-directory object and
-    that lock is acquired after all other locks.  (Proof: see descriptions
-    of operations).
+(3) locks on non-directory objects are acquired only after locks on
+    directory objects, and are acquired in inode pointer order.
+    (Proof: all operations but renames take lock on at most one
+    non-directory object, except renames, which take locks on source and
+    target in inode pointer order in the case they are not directories.)
 
 	Now consider the minimal deadlock.  Each process is blocked on
 attempt to acquire some lock and already holds at least one lock.  Let's
@@ -66,9 +75,13 @@ consider the set of contended locks.  First of all, filesystem lock is
 not contended, since any process blocked on it is not holding any locks.
 Thus all processes are blocked on ->i_mutex.
 
-	Non-directory objects are not contended due to (3).  Thus link
-creation can't be a part of deadlock - it can't be blocked on source
-and it means that it doesn't hold any locks.
+	By (3), any process holding a non-directory lock can only be
+waiting on another non-directory lock with a larger address.  Therefore
+the process holding the "largest" such lock can always make progress, and
+non-directory objects are not included in the set of contended locks.
+
+	Thus link creation can't be a part of deadlock - it can't be
+blocked on source and it means that it doesn't hold any locks.
 
 	Any contended object is either held by cross-directory rename or
 has a child that is also contended.  Indeed, suppose that it is held by
diff --git a/fs/namei.c b/fs/namei.c
index 9ed9361..61f6076 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3677,7 +3677,8 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   That's where 4.4 screws up. Current fix: serialization on
  *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
  *	   story.
- *	c) we have to lock _three_ objects - parents and victim (if it exists).
+ *	c) we have to lock _four_ objects - parents and victim (if it exists),
+ *	   and source (if it is not a directory).
  *	   And that - after we got ->i_mutex on parents (until then we don't know
  *	   whether the target exists).  Solution: try to be smart with locking
  *	   order for inodes.  We rely on the fact that tree topology may change
@@ -3753,6 +3754,7 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 			    struct inode *new_dir, struct dentry *new_dentry)
 {
 	struct inode *target = new_dentry->d_inode;
+	struct inode *source = old_dentry->d_inode;
 	int error;
 
 	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
@@ -3761,7 +3763,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 
 	dget(new_dentry);
 	if (target)
-		mutex_lock(&target->i_mutex);
+		lock_two_nondirectories(source, target);
+	else
+		mutex_lock(&source->i_mutex);
 
 	error = -EBUSY;
 	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
@@ -3777,7 +3781,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 		d_move(old_dentry, new_dentry);
 out:
 	if (target)
-		mutex_unlock(&target->i_mutex);
+		unlock_two_nondirectories(source, target);
+	else
+		mutex_unlock(&source->i_mutex);
 	dput(new_dentry);
 	return error;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 05/12] locks: introduce new FL_DELEG lock flag
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
@ 2013-07-03 20:12     ` J. Bruce Fields
  2013-07-03 20:12 ` [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories J. Bruce Fields
                       ` (8 subsequent siblings)
  9 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, J. Bruce Fields

From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.

Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/locks.c          |    2 +-
 fs/nfsd/nfs4state.c |    2 +-
 include/linux/fs.h  |    1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index cb424a4..deec4de 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -131,7 +131,7 @@
 
 #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
 #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
+#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
 
 static bool lease_breaking(struct file_lock *fl)
 {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 316ec84..616ff83 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2932,7 +2932,7 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, int f
 		return NULL;
 	locks_init_lock(fl);
 	fl->fl_lmops = &nfsd_lease_mng_ops;
-	fl->fl_flags = FL_LEASE;
+	fl->fl_flags = FL_DELEG;
 	fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_owner = (fl_owner_t)(dp->dl_file);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec88235..116b3e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -884,6 +884,7 @@ static inline int file_check_writeable(struct file *filp)
 
 #define FL_POSIX	1
 #define FL_FLOCK	2
+#define FL_DELEG	4	/* NFSv4 delegation */
 #define FL_ACCESS	8	/* not trying to lock, just looking */
 #define FL_EXISTS	16	/* when unlocking, test for existence */
 #define FL_LEASE	32	/* lease held on this file */
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 05/12] locks: introduce new FL_DELEG lock flag
@ 2013-07-03 20:12     ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/locks.c          |    2 +-
 fs/nfsd/nfs4state.c |    2 +-
 include/linux/fs.h  |    1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index cb424a4..deec4de 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -131,7 +131,7 @@
 
 #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
 #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
+#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
 
 static bool lease_breaking(struct file_lock *fl)
 {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 316ec84..616ff83 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2932,7 +2932,7 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, int f
 		return NULL;
 	locks_init_lock(fl);
 	fl->fl_lmops = &nfsd_lease_mng_ops;
-	fl->fl_flags = FL_LEASE;
+	fl->fl_flags = FL_DELEG;
 	fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_owner = (fl_owner_t)(dp->dl_file);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ec88235..116b3e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -884,6 +884,7 @@ static inline int file_check_writeable(struct file *filp)
 
 #define FL_POSIX	1
 #define FL_FLOCK	2
+#define FL_DELEG	4	/* NFSv4 delegation */
 #define FL_ACCESS	8	/* not trying to lock, just looking */
 #define FL_EXISTS	16	/* when unlocking, test for existence */
 #define FL_LEASE	32	/* lease held on this file */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 06/12] locks: implement delegations
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (2 preceding siblings ...)
       [not found] ` <1372882356-14168-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-7-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-03 20:12 ` [PATCH 07/12] namei: minor vfs_unlink cleanup J. Bruce Fields
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/locks.c         |   49 +++++++++++++++++++++++++++++++++++++++----------
 include/linux/fs.h |   18 +++++++++++++++---
 2 files changed, 54 insertions(+), 13 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index deec4de..2b56954 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1176,28 +1176,40 @@ static void time_out_leases(struct inode *inode)
 	}
 }
 
+static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
+{
+	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
+		return false;
+	return locks_conflict(breaker, lease);
+}
+
 /**
  *	__break_lease	-	revoke all outstanding leases on file
  *	@inode: the inode of the file to return
- *	@mode: the open mode (read or write)
+ *	@mode: O_RDONLY: break only write leases; O_WRONLY or O_RDWR:
+ *	    break all leases
+ *	@type: FL_LEASE: break leases and delegations; FL_DELEG: break
+ *	    only delegations
  *
  *	break_lease (inlined for speed) has checked there already is at least
  *	some kind of lock (maybe a lease) on this file.  Leases are broken on
  *	a call to open() or truncate().  This function can sleep unless you
  *	specified %O_NONBLOCK to your open().
  */
-int __break_lease(struct inode *inode, unsigned int mode)
+int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 {
 	int error = 0;
 	struct file_lock *new_fl, *flock;
 	struct file_lock *fl;
 	unsigned long break_time;
 	int i_have_this_lease = 0;
+	bool lease_conflict = false;
 	int want_write = (mode & O_ACCMODE) != O_RDONLY;
 
 	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
 	if (IS_ERR(new_fl))
 		return PTR_ERR(new_fl);
+	new_fl->fl_flags = type;
 
 	lock_flocks();
 
@@ -1207,13 +1219,16 @@ int __break_lease(struct inode *inode, unsigned int mode)
 	if ((flock == NULL) || !IS_LEASE(flock))
 		goto out;
 
-	if (!locks_conflict(flock, new_fl))
+	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
+		if (leases_conflict(fl, new_fl)) {
+			lease_conflict = true;
+			if (fl->fl_owner == current->files)
+				i_have_this_lease = 1;
+		}
+	}
+	if (!lease_conflict)
 		goto out;
 
-	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next)
-		if (fl->fl_owner == current->files)
-			i_have_this_lease = 1;
-
 	break_time = 0;
 	if (lease_break_time > 0) {
 		break_time = jiffies + lease_break_time * HZ;
@@ -1222,6 +1237,8 @@ int __break_lease(struct inode *inode, unsigned int mode)
 	}
 
 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
+		if (!leases_conflict(fl, new_fl))
+			continue;
 		if (want_write) {
 			if (fl->fl_flags & FL_UNLOCK_PENDING)
 				continue;
@@ -1263,7 +1280,7 @@ restart:
 		 */
 		for (flock = inode->i_flock; flock && IS_LEASE(flock);
 				flock = flock->fl_next) {
-			if (locks_conflict(new_fl, flock))
+			if (leases_conflict(new_fl, flock))
 				goto restart;
 		}
 		error = 0;
@@ -1343,9 +1360,20 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
 	struct file_lock *fl, **before, **my_before = NULL, *lease;
 	struct dentry *dentry = filp->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
+	bool is_deleg = (*flp)->fl_flags & FL_DELEG;
 	int error;
 
 	lease = *flp;
+	/*
+	 * In the delegation case we need mutual exclusion with
+	 * a number of operations that take the i_mutex.  We trylock
+	 * because delegations are an optional optimization, and if
+	 * there's some chance of a conflict--we'd rather not
+	 * bother, maybe that's a sign this just isn't a good file to
+	 * hand out a delegation on.
+	 */
+	if (is_deleg && !mutex_trylock(&inode->i_mutex))
+		return -EAGAIN;
 
 	error = -EAGAIN;
 	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
@@ -1397,9 +1425,10 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
 		goto out;
 
 	locks_insert_lock(before, lease);
-	return 0;
-
+	error = 0;
 out:
+	if (is_deleg)
+		mutex_unlock(&inode->i_mutex);
 	return error;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 116b3e9..c6cc686 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1006,7 +1006,7 @@ extern int vfs_test_lock(struct file *, struct file_lock *);
 extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
 extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
 extern int flock_lock_file_wait(struct file *filp, struct file_lock *fl);
-extern int __break_lease(struct inode *inode, unsigned int flags);
+extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
 extern void lease_get_mtime(struct inode *, struct timespec *time);
 extern int generic_setlease(struct file *, long, struct file_lock **);
 extern int vfs_setlease(struct file *, long, struct file_lock **);
@@ -1119,7 +1119,7 @@ static inline int flock_lock_file_wait(struct file *filp,
 	return -ENOLCK;
 }
 
-static inline int __break_lease(struct inode *inode, unsigned int mode)
+static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 {
 	return 0;
 }
@@ -1951,9 +1951,17 @@ static inline int locks_verify_truncate(struct inode *inode,
 static inline int break_lease(struct inode *inode, unsigned int mode)
 {
 	if (inode->i_flock)
-		return __break_lease(inode, mode);
+		return __break_lease(inode, mode, FL_LEASE);
 	return 0;
 }
+
+static inline int break_deleg(struct inode *inode, unsigned int mode)
+{
+	if (inode->i_flock)
+		return __break_lease(inode, mode, FL_DELEG);
+	return 0;
+}
+
 #else /* !CONFIG_FILE_LOCKING */
 static inline int locks_mandatory_locked(struct inode *inode)
 {
@@ -1993,6 +2001,10 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
 	return 0;
 }
 
+static inline int break_deleg(struct inode *inode, unsigned int mode)
+{
+	return 0;
+}
 #endif /* CONFIG_FILE_LOCKING */
 
 /* fs/open.c */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 07/12] namei: minor vfs_unlink cleanup
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (3 preceding siblings ...)
  2013-07-03 20:12 ` [PATCH 06/12] locks: implement delegations J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-8-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-03 20:12 ` [PATCH 08/12] locks: break delegations on unlink J. Bruce Fields
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

We'll be using dentry->d_inode in one more place.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/namei.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 61f6076..7e76fe1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3386,6 +3386,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
 
 int vfs_unlink(struct inode *dir, struct dentry *dentry)
 {
+	struct inode *target = dentry->d_inode;
 	int error = may_delete(dir, dentry, 0);
 
 	if (error)
@@ -3394,7 +3395,7 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
 	if (!dir->i_op->unlink)
 		return -EPERM;
 
-	mutex_lock(&dentry->d_inode->i_mutex);
+	mutex_lock(&target->i_mutex);
 	if (d_mountpoint(dentry))
 		error = -EBUSY;
 	else {
@@ -3405,11 +3406,11 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
 				dont_mount(dentry);
 		}
 	}
-	mutex_unlock(&dentry->d_inode->i_mutex);
+	mutex_unlock(&target->i_mutex);
 
 	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
 	if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
-		fsnotify_link_count(dentry->d_inode);
+		fsnotify_link_count(target);
 		d_delete(dentry);
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 08/12] locks: break delegations on unlink
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (4 preceding siblings ...)
  2013-07-03 20:12 ` [PATCH 07/12] namei: minor vfs_unlink cleanup J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-9-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-03 20:12 ` [PATCH 09/12] locks: helper functions for delegation breaking J. Bruce Fields
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs, linux-fsdevel, J. Bruce Fields, David Howells,
	Tyler Hicks, Dustin Kirkland

From: "J. Bruce Fields" <bfields@redhat.com>

We need to break delegations on any operation that changes the set of
links pointing to an inode.  Start with unlink.

Such operations also hold the i_mutex on a parent directory.  Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client.  To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:

	acquire locks
	...
	test for delegation; if found:
		take reference on inode
		release locks
		wait for delegation break
		drop reference on inode
		retry

It is possible this could never terminate.  (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.)  But this seems very unlikely.

The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack.  We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.

Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 drivers/base/devtmpfs.c |    2 +-
 fs/cachefiles/namei.c   |    2 +-
 fs/ecryptfs/inode.c     |    2 +-
 fs/namei.c              |   24 +++++++++++++++++++++---
 fs/nfsd/vfs.c           |    2 +-
 include/linux/fs.h      |    2 +-
 ipc/mqueue.c            |    2 +-
 7 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 7413d06..1b8490e 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev)
 			mutex_lock(&dentry->d_inode->i_mutex);
 			notify_change(dentry, &newattrs);
 			mutex_unlock(&dentry->d_inode->i_mutex);
-			err = vfs_unlink(parent.dentry->d_inode, dentry);
+			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
 			if (!err || err == -ENOENT)
 				deleted = 1;
 		}
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index 8c01c5fc..d61d884 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
 		if (ret < 0) {
 			cachefiles_io_error(cache, "Unlink security error");
 		} else {
-			ret = vfs_unlink(dir->d_inode, rep);
+			ret = vfs_unlink(dir->d_inode, rep, NULL);
 
 			if (preemptive)
 				cachefiles_mark_object_buried(cache, rep);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 5eab400..af42d88 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
 
 	dget(lower_dentry);
 	lower_dir_dentry = lock_parent(lower_dentry);
-	rc = vfs_unlink(lower_dir_inode, lower_dentry);
+	rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL);
 	if (rc) {
 		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
 		goto out_unlock;
diff --git a/fs/namei.c b/fs/namei.c
index 7e76fe1..cba3db1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
-int vfs_unlink(struct inode *dir, struct dentry *dentry)
+int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
 {
 	struct inode *target = dentry->d_inode;
 	int error = may_delete(dir, dentry, 0);
@@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
 	else {
 		error = security_inode_unlink(dir, dentry);
 		if (!error) {
+			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
+			if (error) {
+				if (error == -EWOULDBLOCK && delegated_inode) {
+					*delegated_inode = target;
+					ihold(target);
+				}
+				goto out;
+			}
 			error = dir->i_op->unlink(dir, dentry);
 			if (!error)
 				dont_mount(dentry);
 		}
 	}
+out:
 	mutex_unlock(&target->i_mutex);
 
 	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
@@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
 	struct dentry *dentry;
 	struct nameidata nd;
 	struct inode *inode = NULL;
+	struct inode *delegated_inode = NULL;
 	unsigned int lookup_flags = 0;
 retry:
 	name = user_path_parent(dfd, pathname, &nd, lookup_flags);
@@ -3444,7 +3454,7 @@ retry:
 	error = mnt_want_write(nd.path.mnt);
 	if (error)
 		goto exit1;
-
+retry_deleg:
 	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
 	dentry = lookup_hash(&nd);
 	error = PTR_ERR(dentry);
@@ -3459,13 +3469,21 @@ retry:
 		error = security_path_unlink(&nd.path, dentry);
 		if (error)
 			goto exit2;
-		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
+		error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
 exit2:
 		dput(dentry);
 	}
 	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
 	if (inode)
 		iput(inode);	/* truncate the inode here */
+	inode = NULL;
+	if (delegated_inode) {
+		error = break_deleg(delegated_inode, O_WRONLY);
+		iput(delegated_inode);
+		delegated_inode = NULL;
+		if (!error)
+			goto retry_deleg;
+	}
 	mnt_drop_write(nd.path.mnt);
 exit1:
 	path_put(&nd.path);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 84ce601..6ccaca2 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 	if (host_err)
 		goto out_put;
 	if (type != S_IFDIR)
-		host_err = vfs_unlink(dirp, rdentry);
+		host_err = vfs_unlink(dirp, rdentry, NULL);
 	else
 		host_err = vfs_rmdir(dirp, rdentry);
 	if (!host_err)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c6cc686..f951588 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
-extern int vfs_unlink(struct inode *, struct dentry *);
+extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
 
 /*
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index e4e47f6..384eb35 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
 		err = -ENOENT;
 	} else {
 		ihold(inode);
-		err = vfs_unlink(dentry->d_parent->d_inode, dentry);
+		err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL);
 	}
 	dput(dentry);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (5 preceding siblings ...)
  2013-07-03 20:12 ` [PATCH 08/12] locks: break delegations on unlink J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-10-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-09 13:23   ` Jeff Layton
  2013-07-03 20:12 ` [PATCH 10/12] locks: break delegations on rename J. Bruce Fields
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields

From: "J. Bruce Fields" <bfields@redhat.com>

We'll need the same logic for rename and link.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/namei.c         |   13 +++----------
 include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index cba3db1..a9d4031 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
 	else {
 		error = security_inode_unlink(dir, dentry);
 		if (!error) {
-			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
-			if (error) {
-				if (error == -EWOULDBLOCK && delegated_inode) {
-					*delegated_inode = target;
-					ihold(target);
-				}
+			error = try_break_deleg(target, delegated_inode);
+			if (error)
 				goto out;
-			}
 			error = dir->i_op->unlink(dir, dentry);
 			if (!error)
 				dont_mount(dentry);
@@ -3478,9 +3473,7 @@ exit2:
 		iput(inode);	/* truncate the inode here */
 	inode = NULL;
 	if (delegated_inode) {
-		error = break_deleg(delegated_inode, O_WRONLY);
-		iput(delegated_inode);
-		delegated_inode = NULL;
+		error = break_deleg_wait(&delegated_inode);
 		if (!error)
 			goto retry_deleg;
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f951588..c37e463 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
 
 extern int current_umask(void);
 
+extern void ihold(struct inode * inode);
+extern void iput(struct inode *);
+
 /* /sys/fs */
 extern struct kobject *fs_kobj;
 
@@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
 	return 0;
 }
 
+static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
+{
+	int ret;
+
+	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
+	if (ret == -EWOULDBLOCK && delegated_inode) {
+		*delegated_inode = inode;
+		ihold(inode);
+	}
+	return ret;
+}
+
+static inline int break_deleg_wait(struct inode **delegated_inode)
+{
+	int ret;
+
+	ret = break_deleg(*delegated_inode, O_WRONLY);
+	iput(*delegated_inode);
+	*delegated_inode = NULL;
+	return ret;
+}
+
 #else /* !CONFIG_FILE_LOCKING */
 static inline int locks_mandatory_locked(struct inode *inode)
 {
@@ -2005,6 +2030,12 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
 {
 	return 0;
 }
+
+static inline int try_break_deleg(struct inode *inode, struct delegated_inode **inode)
+{
+	return 0;
+}
+
 #endif /* CONFIG_FILE_LOCKING */
 
 /* fs/open.c */
@@ -2335,8 +2366,6 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
 extern int inode_init_always(struct super_block *, struct inode *);
 extern void inode_init_once(struct inode *);
 extern void address_space_init_once(struct address_space *mapping);
-extern void ihold(struct inode * inode);
-extern void iput(struct inode *);
 extern struct inode * igrab(struct inode *);
 extern ino_t iunique(struct super_block *, ino_t);
 extern int inode_needs_sync(struct inode *inode);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 10/12] locks: break delegations on rename
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (6 preceding siblings ...)
  2013-07-03 20:12 ` [PATCH 09/12] locks: helper functions for delegation breaking J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-11-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-03 20:12 ` [PATCH 11/12] locks: break delegations on link J. Bruce Fields
  2013-07-03 20:12 ` [PATCH 12/12] locks: break delegations on any attribute modification J. Bruce Fields
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-nfs, linux-fsdevel, J. Bruce Fields, David Howells

From: "J. Bruce Fields" <bfields@redhat.com>

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/cachefiles/namei.c |    2 +-
 fs/namei.c            |   26 ++++++++++++++++++++++----
 fs/nfsd/vfs.c         |    2 +-
 include/linux/fs.h    |    2 +-
 4 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d61d884..678a8af 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -396,7 +396,7 @@ try_again:
 		cachefiles_io_error(cache, "Rename security error %d", ret);
 	} else {
 		ret = vfs_rename(dir->d_inode, rep,
-				 cache->graveyard->d_inode, grave);
+				 cache->graveyard->d_inode, grave, NULL);
 		if (ret != 0 && ret != -ENOMEM)
 			cachefiles_io_error(cache,
 					    "Rename failed with error %d", ret);
diff --git a/fs/namei.c b/fs/namei.c
index a9d4031..be00d37 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3763,7 +3763,8 @@ out:
 }
 
 static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
-			    struct inode *new_dir, struct dentry *new_dentry)
+			    struct inode *new_dir, struct dentry *new_dentry,
+			    struct inode **delegated_inode)
 {
 	struct inode *target = new_dentry->d_inode;
 	struct inode *source = old_dentry->d_inode;
@@ -3783,6 +3784,14 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
 	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
 		goto out;
 
+	error = try_break_deleg(source, delegated_inode);
+	if (error)
+		goto out;
+	if (target) {
+		error = try_break_deleg(target, delegated_inode);
+		if (error)
+			goto out;
+	}
 	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
@@ -3801,7 +3810,8 @@ out:
 }
 
 int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry)
+	       struct inode *new_dir, struct dentry *new_dentry,
+	       struct inode **delegated_inode)
 {
 	int error;
 	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
@@ -3829,7 +3839,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	if (is_dir)
 		error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
 	else
-		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
+		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry,delegated_inode);
 	if (!error)
 		fsnotify_move(old_dir, new_dir, old_name, is_dir,
 			      new_dentry->d_inode, old_dentry);
@@ -3845,6 +3855,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	struct dentry *old_dentry, *new_dentry;
 	struct dentry *trap;
 	struct nameidata oldnd, newnd;
+	struct inode *delegated_inode = NULL;
 	struct filename *from;
 	struct filename *to;
 	unsigned int lookup_flags = 0;
@@ -3884,6 +3895,7 @@ retry:
 	newnd.flags &= ~LOOKUP_PARENT;
 	newnd.flags |= LOOKUP_RENAME_TARGET;
 
+retry_deleg:
 	trap = lock_rename(new_dir, old_dir);
 
 	old_dentry = lookup_hash(&oldnd);
@@ -3920,13 +3932,19 @@ retry:
 	if (error)
 		goto exit5;
 	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+				   new_dir->d_inode, new_dentry,
+				   &delegated_inode);
 exit5:
 	dput(new_dentry);
 exit4:
 	dput(old_dentry);
 exit3:
 	unlock_rename(new_dir, old_dir);
+	if (delegated_inode) {
+		error = break_deleg_wait(&delegated_inode);
+		if (!error)
+			goto retry_deleg;
+	}
 	mnt_drop_write(oldnd.path.mnt);
 exit2:
 	if (retry_estale(error, lookup_flags))
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6ccaca2..54ac814 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1809,7 +1809,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 		if (host_err)
 			goto out_dput_new;
 	}
-	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
+	host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL);
 	if (!host_err) {
 		host_err = commit_metadata(tfhp);
 		if (!host_err)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c37e463..a35dadb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1464,7 +1464,7 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *);
 extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
-extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
+extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);
 
 /*
  * VFS dentry helper functions.
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 11/12] locks: break delegations on link
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (7 preceding siblings ...)
  2013-07-03 20:12 ` [PATCH 10/12] locks: break delegations on rename J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
       [not found]   ` <1372882356-14168-12-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2013-07-03 20:12 ` [PATCH 12/12] locks: break delegations on any attribute modification J. Bruce Fields
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs, linux-fsdevel, J. Bruce Fields, Tyler Hicks, Dustin Kirkland

From: "J. Bruce Fields" <bfields@redhat.com>

Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/ecryptfs/inode.c |    2 +-
 fs/namei.c          |   17 +++++++++++++----
 fs/nfsd/vfs.c       |    2 +-
 include/linux/fs.h  |    2 +-
 4 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index af42d88..19e4435 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -475,7 +475,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
 	dget(lower_new_dentry);
 	lower_dir_dentry = lock_parent(lower_new_dentry);
 	rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
-		      lower_new_dentry);
+		      lower_new_dentry, NULL);
 	if (rc || !lower_new_dentry->d_inode)
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
diff --git a/fs/namei.c b/fs/namei.c
index be00d37..18267e0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3566,7 +3566,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
 	return sys_symlinkat(oldname, AT_FDCWD, newname);
 }
 
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
+int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
 {
 	struct inode *inode = old_dentry->d_inode;
 	unsigned max_links = dir->i_sb->s_max_links;
@@ -3602,8 +3602,11 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 		error =  -ENOENT;
 	else if (max_links && inode->i_nlink >= max_links)
 		error = -EMLINK;
-	else
-		error = dir->i_op->link(old_dentry, dir, new_dentry);
+	else {
+		error = try_break_deleg(inode, delegated_inode);
+		if (!error)
+			error = dir->i_op->link(old_dentry, dir, new_dentry);
+	}
 	mutex_unlock(&inode->i_mutex);
 	if (!error)
 		fsnotify_link(dir, inode, new_dentry);
@@ -3624,6 +3627,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
 {
 	struct dentry *new_dentry;
 	struct path old_path, new_path;
+	struct inode *delegated_inode = NULL;
 	int how = 0;
 	int error;
 
@@ -3662,9 +3666,14 @@ retry:
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
 		goto out_dput;
-	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
+	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
 out_dput:
 	done_path_create(&new_path, new_dentry);
+	if (delegated_inode) {
+		error = break_deleg_wait(&delegated_inode);
+		if (!error)
+			goto retry;
+	}
 	if (retry_estale(error, how)) {
 		how |= LOOKUP_REVAL;
 		goto retry;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 54ac814..b9740cb 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1708,7 +1708,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 		err = nfserrno(host_err);
 		goto out_dput;
 	}
-	host_err = vfs_link(dold, dirp, dnew);
+	host_err = vfs_link(dold, dirp, dnew, NULL);
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a35dadb..936413c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1461,7 +1461,7 @@ extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
 extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
 extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
+extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
                   ` (8 preceding siblings ...)
  2013-07-03 20:12 ` [PATCH 11/12] locks: break delegations on link J. Bruce Fields
@ 2013-07-03 20:12 ` J. Bruce Fields
  2013-07-09 13:30   ` Jeff Layton
  9 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-03 20:12 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-nfs, linux-fsdevel, J. Bruce Fields, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

From: "J. Bruce Fields" <bfields@redhat.com>

NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 drivers/base/devtmpfs.c   |    4 ++--
 fs/attr.c                 |    5 ++++-
 fs/cachefiles/interface.c |    4 ++--
 fs/ecryptfs/inode.c       |    2 +-
 fs/hpfs/namei.c           |    2 +-
 fs/inode.c                |    6 +++++-
 fs/nfsd/vfs.c             |    8 ++++++--
 fs/open.c                 |   21 +++++++++++++++++----
 fs/utimes.c               |    9 ++++++++-
 include/linux/fs.h        |    2 +-
 10 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index 1b8490e..0f38201 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -216,7 +216,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
 		newattrs.ia_gid = gid;
 		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
 		mutex_lock(&dentry->d_inode->i_mutex);
-		notify_change(dentry, &newattrs);
+		notify_change(dentry, &newattrs, NULL);
 		mutex_unlock(&dentry->d_inode->i_mutex);
 
 		/* mark as kernel-created inode */
@@ -322,7 +322,7 @@ static int handle_remove(const char *nodename, struct device *dev)
 			newattrs.ia_valid =
 				ATTR_UID|ATTR_GID|ATTR_MODE;
 			mutex_lock(&dentry->d_inode->i_mutex);
-			notify_change(dentry, &newattrs);
+			notify_change(dentry, &newattrs, NULL);
 			mutex_unlock(&dentry->d_inode->i_mutex);
 			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
 			if (!err || err == -ENOENT)
diff --git a/fs/attr.c b/fs/attr.c
index 1449adb..261f5c9 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -167,7 +167,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
 }
 EXPORT_SYMBOL(setattr_copy);
 
-int notify_change(struct dentry * dentry, struct iattr * attr)
+int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
 {
 	struct inode *inode = dentry->d_inode;
 	umode_t mode = inode->i_mode;
@@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
 	error = security_inode_setattr(dentry, attr);
 	if (error)
 		return error;
+	error = try_break_deleg(inode, delegated_inode);
+	if (error)
+		return error;
 
 	if (inode->i_op->setattr)
 		error = inode->i_op->setattr(dentry, attr);
diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
index 746ce53..40f5917 100644
--- a/fs/cachefiles/interface.c
+++ b/fs/cachefiles/interface.c
@@ -417,14 +417,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
 		_debug("discard tail %llx", oi_size);
 		newattrs.ia_valid = ATTR_SIZE;
 		newattrs.ia_size = oi_size & PAGE_MASK;
-		ret = notify_change(object->backer, &newattrs);
+		ret = notify_change(object->backer, &newattrs, NULL);
 		if (ret < 0)
 			goto truncate_failed;
 	}
 
 	newattrs.ia_valid = ATTR_SIZE;
 	newattrs.ia_size = ni_size;
-	ret = notify_change(object->backer, &newattrs);
+	ret = notify_change(object->backer, &newattrs, NULL);
 
 truncate_failed:
 	mutex_unlock(&object->backer->d_inode->i_mutex);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 19e4435..bd54575 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -992,7 +992,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
 		lower_ia.ia_valid &= ~ATTR_MODE;
 
 	mutex_lock(&lower_dentry->d_inode->i_mutex);
-	rc = notify_change(lower_dentry, &lower_ia);
+	rc = notify_change(lower_dentry, &lower_ia, NULL);
 	mutex_unlock(&lower_dentry->d_inode->i_mutex);
 out:
 	fsstack_copy_attr_all(inode, lower_inode);
diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
index 345713d..1b39afd 100644
--- a/fs/hpfs/namei.c
+++ b/fs/hpfs/namei.c
@@ -407,7 +407,7 @@ again:
 			/*printk("HPFS: truncating file before delete.\n");*/
 			newattrs.ia_size = 0;
 			newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
-			err = notify_change(dentry, &newattrs);
+			err = notify_change(dentry, &newattrs, NULL);
 			put_write_access(inode);
 			if (!err)
 				goto again;
diff --git a/fs/inode.c b/fs/inode.c
index 304db4c..664d631 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1633,7 +1633,11 @@ static int __remove_suid(struct dentry *dentry, int kill)
 	struct iattr newattrs;
 
 	newattrs.ia_valid = ATTR_FORCE | kill;
-	return notify_change(dentry, &newattrs);
+	/*
+	 * Note we call this on write, so notify_change will not
+	 * encounter any conflicting delegations:
+	 */
+	return notify_change(dentry, &newattrs, NULL);
 }
 
 int file_remove_suid(struct file *file)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b9740cb..e781901 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -426,7 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 			goto out_nfserr;
 		fh_lock(fhp);
 
-		host_err = notify_change(dentry, iap);
+		host_err = notify_change(dentry, iap, NULL);
 		err = nfserrno(host_err);
 		fh_unlock(fhp);
 	}
@@ -959,7 +959,11 @@ static void kill_suid(struct dentry *dentry)
 	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
-	notify_change(dentry, &ia);
+	/*
+	 * Note we call this on write, so notify_change will not
+	 * encounter any conflicting delegations:
+	 */
+	notify_change(dentry, &ia, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 }
 
diff --git a/fs/open.c b/fs/open.c
index 8c74100..1a39d29 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -57,7 +57,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
 		newattrs.ia_valid |= ret | ATTR_FORCE;
 
 	mutex_lock(&dentry->d_inode->i_mutex);
-	ret = notify_change(dentry, &newattrs);
+	ret = notify_change(dentry, &newattrs, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 	return ret;
 }
@@ -464,21 +464,28 @@ out:
 static int chmod_common(struct path *path, umode_t mode)
 {
 	struct inode *inode = path->dentry->d_inode;
+	struct inode *delegated_inode = NULL;
 	struct iattr newattrs;
 	int error;
 
 	error = mnt_want_write(path->mnt);
 	if (error)
 		return error;
+retry_deleg:
 	mutex_lock(&inode->i_mutex);
 	error = security_path_chmod(path, mode);
 	if (error)
 		goto out_unlock;
 	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
 	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
-	error = notify_change(path->dentry, &newattrs);
+	error = notify_change(path->dentry, &newattrs, &delegated_inode);
 out_unlock:
 	mutex_unlock(&inode->i_mutex);
+	if (delegated_inode) {
+		error = break_deleg_wait(&delegated_inode);
+		if (!error)
+			goto retry_deleg;
+	}
 	mnt_drop_write(path->mnt);
 	return error;
 }
@@ -523,6 +530,7 @@ SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
 static int chown_common(struct path *path, uid_t user, gid_t group)
 {
 	struct inode *inode = path->dentry->d_inode;
+	struct inode *delegated_inode = NULL;
 	int error;
 	struct iattr newattrs;
 	kuid_t uid;
@@ -547,12 +555,17 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
 	if (!S_ISDIR(inode->i_mode))
 		newattrs.ia_valid |=
 			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
+retry_deleg:
 	mutex_lock(&inode->i_mutex);
 	error = security_path_chown(path, uid, gid);
 	if (!error)
-		error = notify_change(path->dentry, &newattrs);
+		error = notify_change(path->dentry, &newattrs, &delegated_inode);
 	mutex_unlock(&inode->i_mutex);
-
+	if (delegated_inode) {
+		error = break_deleg_wait(&delegated_inode);
+		if (!error)
+			goto retry_deleg;
+	}
 	return error;
 }
 
diff --git a/fs/utimes.c b/fs/utimes.c
index f4fb7ec..aa138d6 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -53,6 +53,7 @@ static int utimes_common(struct path *path, struct timespec *times)
 	int error;
 	struct iattr newattrs;
 	struct inode *inode = path->dentry->d_inode;
+	struct inode *delegated_inode = NULL;
 
 	error = mnt_want_write(path->mnt);
 	if (error)
@@ -101,9 +102,15 @@ static int utimes_common(struct path *path, struct timespec *times)
 				goto mnt_drop_write_and_out;
 		}
 	}
+retry_deleg:
 	mutex_lock(&inode->i_mutex);
-	error = notify_change(path->dentry, &newattrs);
+	error = notify_change(path->dentry, &newattrs, &delegated_inode);
 	mutex_unlock(&inode->i_mutex);
+	if (delegated_inode) {
+		error = break_deleg_wait(&delegated_inode);
+		if (!error)
+			goto retry_deleg;
+	}
 
 mnt_drop_write_and_out:
 	mnt_drop_write(path->mnt);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 936413c..0d6d919 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2256,7 +2256,7 @@ extern void emergency_remount(void);
 #ifdef CONFIG_BLOCK
 extern sector_t bmap(struct inode *, sector_t);
 #endif
-extern int notify_change(struct dentry *, struct iattr *);
+extern int notify_change(struct dentry *, struct iattr *, struct inode **);
 extern int inode_permission(struct inode *, int);
 extern int generic_permission(struct inode *, int);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-03 20:12 ` [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code J. Bruce Fields
@ 2013-07-09 10:49       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:49 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Wed,  3 Jul 2013 16:12:25 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> We want to do this elsewhere as well.
> 
> Cc: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/ext4/ext4.h        |    2 --
>  fs/ext4/ioctl.c       |    4 ++--
>  fs/ext4/move_extent.c |   40 ++--------------------------------------
>  fs/inode.c            |   29 +++++++++++++++++++++++++++++
>  include/linux/fs.h    |    3 +++
>  5 files changed, 36 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5aae3d1..3590abe 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
>  					    struct inode *second);
>  extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
>  					  struct inode *donor_inode);
> -void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
> -void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
>  extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
>  			     __u64 start_orig, __u64 start_donor,
>  			     __u64 len, __u64 *moved_len);
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 9491ac0..12048f7 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
>  
>  	/* Protect orig inodes against a truncate and make sure,
>  	 * that only 1 swap_inode_boot_loader is running. */
> -	ext4_inode_double_lock(inode, inode_bl);
> +	lock_two_nondirectories(inode, inode_bl);
>  
>  	truncate_inode_pages(&inode->i_data, 0);
>  	truncate_inode_pages(&inode_bl->i_data, 0);
> @@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
>  	ext4_inode_resume_unlocked_dio(inode);
>  	ext4_inode_resume_unlocked_dio(inode_bl);
>  
> -	ext4_inode_double_unlock(inode, inode_bl);
> +	unlock_two_nondirectories(inode, inode_bl);
>  
>  	iput(inode_bl);
>  
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 3dcbf36..986a838 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
>  }
>  
>  /**
> - * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
> - *
> - * @inode1:	the inode structure
> - * @inode2:	the inode structure
> - *
> - * Lock two inodes' i_mutex
> - */
> -void
> -ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
> -{
> -	BUG_ON(inode1 == inode2);
> -	if (inode1 < inode2) {
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> -	} else {
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> -	}
> -}
> -
> -/**
> - * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
> - *
> - * @inode1:     the inode that is released first
> - * @inode2:     the inode that is released second
> - *
> - */
> -
> -void
> -ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
> -{
> -	mutex_unlock(&inode1->i_mutex);
> -	mutex_unlock(&inode2->i_mutex);
> -}
> -
> -/**
>   * ext4_move_extents - Exchange the specified range of a file
>   *
>   * @o_filp:		file structure of the original file
> @@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
>  		return -EINVAL;
>  	}
>  	/* Protect orig and donor inodes against a truncate */
> -	ext4_inode_double_lock(orig_inode, donor_inode);
> +	lock_two_nondirectories(orig_inode, donor_inode);
>  
>  	/* Wait for all existing dio workers */
>  	ext4_inode_block_unlocked_dio(orig_inode);
> @@ -1538,7 +1502,7 @@ out:
>  	ext4_double_up_write_data_sem(orig_inode, donor_inode);
>  	ext4_inode_resume_unlocked_dio(orig_inode);
>  	ext4_inode_resume_unlocked_dio(donor_inode);
> -	ext4_inode_double_unlock(orig_inode, donor_inode);
> +	unlock_two_nondirectories(orig_inode, donor_inode);
>  
>  	return ret;
>  }
> diff --git a/fs/inode.c b/fs/inode.c
> index 00d5fc3..b8afbc7 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
>  EXPORT_SYMBOL(unlock_new_inode);
>  
>  /**
> + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + */
> +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	if (inode1 < inode2) {
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> +	} else {
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> +	}
> +}
> +EXPORT_SYMBOL(lock_two_nondirectories);
> +
> +/**
> + * unlock_two_nondirectories - release locks from lock_two_nondirectories()
> + * @inode1: first inode to unlock
> + * @inode2: second inode to unlock
> + */
> +void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	mutex_unlock(&inode1->i_mutex);
> +	mutex_unlock(&inode2->i_mutex);
> +}
> +EXPORT_SYMBOL(unlock_two_nondirectories);
> +
> +/**
>   * iget5_locked - obtain an inode from a mounted file system
>   * @sb:		super block of file system
>   * @hashval:	hash value (usually inode number) to get
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 65c2be2..3258761 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
>  	I_MUTEX_QUOTA
>  };
>  
> +void lock_two_nondirectories(struct inode *, struct inode*);
> +void unlock_two_nondirectories(struct inode *, struct inode*);
> +
>  /*
>   * NOTE: in a 32bit arch with a preemptable kernel and
>   * an UP compile the i_size_read/write must be atomic

Looks straightforward...

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-09 10:49       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:49 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Wed,  3 Jul 2013 16:12:25 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We want to do this elsewhere as well.
> 
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/ext4/ext4.h        |    2 --
>  fs/ext4/ioctl.c       |    4 ++--
>  fs/ext4/move_extent.c |   40 ++--------------------------------------
>  fs/inode.c            |   29 +++++++++++++++++++++++++++++
>  include/linux/fs.h    |    3 +++
>  5 files changed, 36 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5aae3d1..3590abe 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
>  					    struct inode *second);
>  extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
>  					  struct inode *donor_inode);
> -void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
> -void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
>  extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
>  			     __u64 start_orig, __u64 start_donor,
>  			     __u64 len, __u64 *moved_len);
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 9491ac0..12048f7 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
>  
>  	/* Protect orig inodes against a truncate and make sure,
>  	 * that only 1 swap_inode_boot_loader is running. */
> -	ext4_inode_double_lock(inode, inode_bl);
> +	lock_two_nondirectories(inode, inode_bl);
>  
>  	truncate_inode_pages(&inode->i_data, 0);
>  	truncate_inode_pages(&inode_bl->i_data, 0);
> @@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
>  	ext4_inode_resume_unlocked_dio(inode);
>  	ext4_inode_resume_unlocked_dio(inode_bl);
>  
> -	ext4_inode_double_unlock(inode, inode_bl);
> +	unlock_two_nondirectories(inode, inode_bl);
>  
>  	iput(inode_bl);
>  
> diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
> index 3dcbf36..986a838 100644
> --- a/fs/ext4/move_extent.c
> +++ b/fs/ext4/move_extent.c
> @@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
>  }
>  
>  /**
> - * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
> - *
> - * @inode1:	the inode structure
> - * @inode2:	the inode structure
> - *
> - * Lock two inodes' i_mutex
> - */
> -void
> -ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
> -{
> -	BUG_ON(inode1 == inode2);
> -	if (inode1 < inode2) {
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> -	} else {
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> -	}
> -}
> -
> -/**
> - * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
> - *
> - * @inode1:     the inode that is released first
> - * @inode2:     the inode that is released second
> - *
> - */
> -
> -void
> -ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
> -{
> -	mutex_unlock(&inode1->i_mutex);
> -	mutex_unlock(&inode2->i_mutex);
> -}
> -
> -/**
>   * ext4_move_extents - Exchange the specified range of a file
>   *
>   * @o_filp:		file structure of the original file
> @@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
>  		return -EINVAL;
>  	}
>  	/* Protect orig and donor inodes against a truncate */
> -	ext4_inode_double_lock(orig_inode, donor_inode);
> +	lock_two_nondirectories(orig_inode, donor_inode);
>  
>  	/* Wait for all existing dio workers */
>  	ext4_inode_block_unlocked_dio(orig_inode);
> @@ -1538,7 +1502,7 @@ out:
>  	ext4_double_up_write_data_sem(orig_inode, donor_inode);
>  	ext4_inode_resume_unlocked_dio(orig_inode);
>  	ext4_inode_resume_unlocked_dio(donor_inode);
> -	ext4_inode_double_unlock(orig_inode, donor_inode);
> +	unlock_two_nondirectories(orig_inode, donor_inode);
>  
>  	return ret;
>  }
> diff --git a/fs/inode.c b/fs/inode.c
> index 00d5fc3..b8afbc7 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
>  EXPORT_SYMBOL(unlock_new_inode);
>  
>  /**
> + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + */
> +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	if (inode1 < inode2) {
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> +	} else {
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> +	}
> +}
> +EXPORT_SYMBOL(lock_two_nondirectories);
> +
> +/**
> + * unlock_two_nondirectories - release locks from lock_two_nondirectories()
> + * @inode1: first inode to unlock
> + * @inode2: second inode to unlock
> + */
> +void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	mutex_unlock(&inode1->i_mutex);
> +	mutex_unlock(&inode2->i_mutex);
> +}
> +EXPORT_SYMBOL(unlock_two_nondirectories);
> +
> +/**
>   * iget5_locked - obtain an inode from a mounted file system
>   * @sb:		super block of file system
>   * @hashval:	hash value (usually inode number) to get
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 65c2be2..3258761 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
>  	I_MUTEX_QUOTA
>  };
>  
> +void lock_two_nondirectories(struct inode *, struct inode*);
> +void unlock_two_nondirectories(struct inode *, struct inode*);
> +
>  /*
>   * NOTE: in a 32bit arch with a preemptable kernel and
>   * an UP compile the i_size_read/write must be atomic

Looks straightforward...

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories
  2013-07-03 20:12 ` [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories J. Bruce Fields
@ 2013-07-09 10:50       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:50 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:26 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
> directories.
> 
> (Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
> class any more; fixed in a later patch.)
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/inode.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index b8afbc7..942451b 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -987,11 +987,11 @@ EXPORT_SYMBOL(unlock_new_inode);
>  void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
>  {
>  	if (inode1 < inode2) {
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> +		mutex_lock(&inode1->i_mutex);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
>  	} else {
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> +		mutex_lock(&inode2->i_mutex);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
>  	}
>  }
>  EXPORT_SYMBOL(lock_two_nondirectories);

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories
@ 2013-07-09 10:50       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:26 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
> directories.
> 
> (Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
> class any more; fixed in a later patch.)
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/inode.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index b8afbc7..942451b 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -987,11 +987,11 @@ EXPORT_SYMBOL(unlock_new_inode);
>  void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
>  {
>  	if (inode1 < inode2) {
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> +		mutex_lock(&inode1->i_mutex);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
>  	} else {
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> +		mutex_lock(&inode2->i_mutex);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
>  	}
>  }
>  EXPORT_SYMBOL(lock_two_nondirectories);

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  2013-07-03 20:12     ` J. Bruce Fields
@ 2013-07-09 10:54         ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:54 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:27 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> I_MUTEX_QUOTA is now just being used whenever we want to lock two
> non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
> especially elegant but it's the best I could think of.
> 
> Also fix some outdated documentation.
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/inode.c         |    4 ++--
>  include/linux/fs.h |    9 ++++++---
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 942451b..304db4c 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -988,10 +988,10 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
>  {
>  	if (inode1 < inode2) {
>  		mutex_lock(&inode1->i_mutex);
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
>  	} else {
>  		mutex_lock(&inode2->i_mutex);
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
>  	}
>  }
>  EXPORT_SYMBOL(lock_two_nondirectories);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3258761..ec88235 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -620,10 +620,13 @@ static inline int inode_unhashed(struct inode *inode)
>   * 0: the object of the current VFS operation
>   * 1: parent
>   * 2: child/target
> - * 3: quota file
> + * 3: xattr
> + * 4: second non-directory
> + * The last is for certain operations (such as rename) which lock two
> + * non-directories at once.
>   *
>   * The locking order between these classes is
> - * parent -> child -> normal -> xattr -> quota
> + * parent -> child -> normal -> xattr -> second non-directory
>   */
>  enum inode_i_mutex_lock_class
>  {
> @@ -631,7 +634,7 @@ enum inode_i_mutex_lock_class
>  	I_MUTEX_PARENT,
>  	I_MUTEX_CHILD,
>  	I_MUTEX_XATTR,
> -	I_MUTEX_QUOTA
> +	I_MUTEX_NONDIR2
>  };
>  
>  void lock_two_nondirectories(struct inode *, struct inode*);

Ugly name, but I'm not sure what to call it either. Wonder if it would
make sense to do some sort of SOURCE/TARGET lock class and rearrange
the code to take that into account? But, that's just bikeshedding, so...

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
@ 2013-07-09 10:54         ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:54 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:27 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> I_MUTEX_QUOTA is now just being used whenever we want to lock two
> non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
> especially elegant but it's the best I could think of.
> 
> Also fix some outdated documentation.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/inode.c         |    4 ++--
>  include/linux/fs.h |    9 ++++++---
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 942451b..304db4c 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -988,10 +988,10 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
>  {
>  	if (inode1 < inode2) {
>  		mutex_lock(&inode1->i_mutex);
> -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
>  	} else {
>  		mutex_lock(&inode2->i_mutex);
> -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
>  	}
>  }
>  EXPORT_SYMBOL(lock_two_nondirectories);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3258761..ec88235 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -620,10 +620,13 @@ static inline int inode_unhashed(struct inode *inode)
>   * 0: the object of the current VFS operation
>   * 1: parent
>   * 2: child/target
> - * 3: quota file
> + * 3: xattr
> + * 4: second non-directory
> + * The last is for certain operations (such as rename) which lock two
> + * non-directories at once.
>   *
>   * The locking order between these classes is
> - * parent -> child -> normal -> xattr -> quota
> + * parent -> child -> normal -> xattr -> second non-directory
>   */
>  enum inode_i_mutex_lock_class
>  {
> @@ -631,7 +634,7 @@ enum inode_i_mutex_lock_class
>  	I_MUTEX_PARENT,
>  	I_MUTEX_CHILD,
>  	I_MUTEX_XATTR,
> -	I_MUTEX_QUOTA
> +	I_MUTEX_NONDIR2
>  };
>  
>  void lock_two_nondirectories(struct inode *, struct inode*);

Ugly name, but I'm not sure what to call it either. Wonder if it would
make sense to do some sort of SOURCE/TARGET lock class and rearrange
the code to take that into account? But, that's just bikeshedding, so...

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 04/12] vfs: take i_mutex on renamed file
  2013-07-03 20:12     ` J. Bruce Fields
@ 2013-07-09 10:59         ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:59 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:28 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> A read delegation is used by NFSv4 as a guarantee that a client can
> perform local read opens without informing the server.
> 
> The open operation takes the last component of the pathname as an
> argument, thus is also a lookup operation, and giving the client the
> above guarantee means informing the client before we allow anything that
> would change the set of names pointing to the inode.
> 
> Therefore, we need to break delegations on rename, link, and unlink.
> 
> We also need to prevent new delegations from being acquired while one of
> these operations is in progress.
> 
> We could add some completely new locking for that purpose, but it's
> simpler to use the i_mutex, since that's already taken by all the
> operations we care about.
> 
> The single exception is rename.  So, modify rename to take the i_mutex
> on the file that is being renamed.
> 
> Also fix up lockdep and Documentation/filesystems/directory-locking to
> reflect the change.
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  Documentation/filesystems/directory-locking |   31 +++++++++++++++++++--------
>  fs/namei.c                                  |   12 ++++++++---
>  2 files changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
> index ff7b611..09bbf9a 100644
> --- a/Documentation/filesystems/directory-locking
> +++ b/Documentation/filesystems/directory-locking
> @@ -2,6 +2,10 @@
>  kinds of locks - per-inode (->i_mutex) and per-filesystem
>  (->s_vfs_rename_mutex).
>  
> +	When taking the i_mutex on multiple non-directory objects, we
> +always acquire the locks in order by increasing address.  We'll call
> +that "inode pointer" order in the following.
> +
>  	For our purposes all operations fall in 5 classes:
>  
>  1) read access.  Locking rules: caller locks directory we are accessing.
> @@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
>  locks victim and calls the method.
>  
>  4) rename() that is _not_ cross-directory.  Locking rules: caller locks
> -the parent, finds source and target, if target already exists - locks it
> -and then calls the method.
> +the parent and finds source and target.  If target already exists, lock
> +it.  If source is a non-directory, lock it.  If that means we need to
> +lock both, lock them in inode pointer order.
>  
>  5) link creation.  Locking rules:
>  	* lock parent
> @@ -30,7 +35,9 @@ rules:
>  		fail with -ENOTEMPTY
>  	* if new parent is equal to or is a descendent of source
>  		fail with -ELOOP
> -	* if target exists - lock it.
> +	* If target exists, lock it.  If source is a non-directory, lock
> +	  it.  In case that means we need to lock both source and target,
> +	  do so in inode pointer order.
>  	* call the method.
>  
>  
> @@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
>      renames will be blocked on filesystem lock and we don't start changing
>      the order until we had acquired all locks).
>  
> -(3) any operation holds at most one lock on non-directory object and
> -    that lock is acquired after all other locks.  (Proof: see descriptions
> -    of operations).
> +(3) locks on non-directory objects are acquired only after locks on
> +    directory objects, and are acquired in inode pointer order.
> +    (Proof: all operations but renames take lock on at most one
> +    non-directory object, except renames, which take locks on source and
> +    target in inode pointer order in the case they are not directories.)
>  
>  	Now consider the minimal deadlock.  Each process is blocked on
>  attempt to acquire some lock and already holds at least one lock.  Let's
> @@ -66,9 +75,13 @@ consider the set of contended locks.  First of all, filesystem lock is
>  not contended, since any process blocked on it is not holding any locks.
>  Thus all processes are blocked on ->i_mutex.
>  
> -	Non-directory objects are not contended due to (3).  Thus link
> -creation can't be a part of deadlock - it can't be blocked on source
> -and it means that it doesn't hold any locks.
> +	By (3), any process holding a non-directory lock can only be
> +waiting on another non-directory lock with a larger address.  Therefore
> +the process holding the "largest" such lock can always make progress, and
> +non-directory objects are not included in the set of contended locks.
> +
> +	Thus link creation can't be a part of deadlock - it can't be
> +blocked on source and it means that it doesn't hold any locks.
>  
>  	Any contended object is either held by cross-directory rename or
>  has a child that is also contended.  Indeed, suppose that it is held by
> diff --git a/fs/namei.c b/fs/namei.c
> index 9ed9361..61f6076 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3677,7 +3677,8 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
>   *	   That's where 4.4 screws up. Current fix: serialization on
>   *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
>   *	   story.
> - *	c) we have to lock _three_ objects - parents and victim (if it exists).
> + *	c) we have to lock _four_ objects - parents and victim (if it exists),
> + *	   and source (if it is not a directory).
>   *	   And that - after we got ->i_mutex on parents (until then we don't know
>   *	   whether the target exists).  Solution: try to be smart with locking
>   *	   order for inodes.  We rely on the fact that tree topology may change
> @@ -3753,6 +3754,7 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  			    struct inode *new_dir, struct dentry *new_dentry)
>  {
>  	struct inode *target = new_dentry->d_inode;
> +	struct inode *source = old_dentry->d_inode;
>  	int error;
>  
>  	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
> @@ -3761,7 +3763,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  
>  	dget(new_dentry);
>  	if (target)
> -		mutex_lock(&target->i_mutex);
> +		lock_two_nondirectories(source, target);
> +	else
> +		mutex_lock(&source->i_mutex);
>  
>  	error = -EBUSY;
>  	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
> @@ -3777,7 +3781,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  		d_move(old_dentry, new_dentry);
>  out:
>  	if (target)
> -		mutex_unlock(&target->i_mutex);
> +		unlock_two_nondirectories(source, target);
> +	else
> +		mutex_unlock(&source->i_mutex);
>  	dput(new_dentry);
>  	return error;
>  }

Seems sane...

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 04/12] vfs: take i_mutex on renamed file
@ 2013-07-09 10:59         ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 10:59 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:28 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> A read delegation is used by NFSv4 as a guarantee that a client can
> perform local read opens without informing the server.
> 
> The open operation takes the last component of the pathname as an
> argument, thus is also a lookup operation, and giving the client the
> above guarantee means informing the client before we allow anything that
> would change the set of names pointing to the inode.
> 
> Therefore, we need to break delegations on rename, link, and unlink.
> 
> We also need to prevent new delegations from being acquired while one of
> these operations is in progress.
> 
> We could add some completely new locking for that purpose, but it's
> simpler to use the i_mutex, since that's already taken by all the
> operations we care about.
> 
> The single exception is rename.  So, modify rename to take the i_mutex
> on the file that is being renamed.
> 
> Also fix up lockdep and Documentation/filesystems/directory-locking to
> reflect the change.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  Documentation/filesystems/directory-locking |   31 +++++++++++++++++++--------
>  fs/namei.c                                  |   12 ++++++++---
>  2 files changed, 31 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
> index ff7b611..09bbf9a 100644
> --- a/Documentation/filesystems/directory-locking
> +++ b/Documentation/filesystems/directory-locking
> @@ -2,6 +2,10 @@
>  kinds of locks - per-inode (->i_mutex) and per-filesystem
>  (->s_vfs_rename_mutex).
>  
> +	When taking the i_mutex on multiple non-directory objects, we
> +always acquire the locks in order by increasing address.  We'll call
> +that "inode pointer" order in the following.
> +
>  	For our purposes all operations fall in 5 classes:
>  
>  1) read access.  Locking rules: caller locks directory we are accessing.
> @@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
>  locks victim and calls the method.
>  
>  4) rename() that is _not_ cross-directory.  Locking rules: caller locks
> -the parent, finds source and target, if target already exists - locks it
> -and then calls the method.
> +the parent and finds source and target.  If target already exists, lock
> +it.  If source is a non-directory, lock it.  If that means we need to
> +lock both, lock them in inode pointer order.
>  
>  5) link creation.  Locking rules:
>  	* lock parent
> @@ -30,7 +35,9 @@ rules:
>  		fail with -ENOTEMPTY
>  	* if new parent is equal to or is a descendent of source
>  		fail with -ELOOP
> -	* if target exists - lock it.
> +	* If target exists, lock it.  If source is a non-directory, lock
> +	  it.  In case that means we need to lock both source and target,
> +	  do so in inode pointer order.
>  	* call the method.
>  
>  
> @@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
>      renames will be blocked on filesystem lock and we don't start changing
>      the order until we had acquired all locks).
>  
> -(3) any operation holds at most one lock on non-directory object and
> -    that lock is acquired after all other locks.  (Proof: see descriptions
> -    of operations).
> +(3) locks on non-directory objects are acquired only after locks on
> +    directory objects, and are acquired in inode pointer order.
> +    (Proof: all operations but renames take lock on at most one
> +    non-directory object, except renames, which take locks on source and
> +    target in inode pointer order in the case they are not directories.)
>  
>  	Now consider the minimal deadlock.  Each process is blocked on
>  attempt to acquire some lock and already holds at least one lock.  Let's
> @@ -66,9 +75,13 @@ consider the set of contended locks.  First of all, filesystem lock is
>  not contended, since any process blocked on it is not holding any locks.
>  Thus all processes are blocked on ->i_mutex.
>  
> -	Non-directory objects are not contended due to (3).  Thus link
> -creation can't be a part of deadlock - it can't be blocked on source
> -and it means that it doesn't hold any locks.
> +	By (3), any process holding a non-directory lock can only be
> +waiting on another non-directory lock with a larger address.  Therefore
> +the process holding the "largest" such lock can always make progress, and
> +non-directory objects are not included in the set of contended locks.
> +
> +	Thus link creation can't be a part of deadlock - it can't be
> +blocked on source and it means that it doesn't hold any locks.
>  
>  	Any contended object is either held by cross-directory rename or
>  has a child that is also contended.  Indeed, suppose that it is held by
> diff --git a/fs/namei.c b/fs/namei.c
> index 9ed9361..61f6076 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3677,7 +3677,8 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
>   *	   That's where 4.4 screws up. Current fix: serialization on
>   *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
>   *	   story.
> - *	c) we have to lock _three_ objects - parents and victim (if it exists).
> + *	c) we have to lock _four_ objects - parents and victim (if it exists),
> + *	   and source (if it is not a directory).
>   *	   And that - after we got ->i_mutex on parents (until then we don't know
>   *	   whether the target exists).  Solution: try to be smart with locking
>   *	   order for inodes.  We rely on the fact that tree topology may change
> @@ -3753,6 +3754,7 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  			    struct inode *new_dir, struct dentry *new_dentry)
>  {
>  	struct inode *target = new_dentry->d_inode;
> +	struct inode *source = old_dentry->d_inode;
>  	int error;
>  
>  	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
> @@ -3761,7 +3763,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  
>  	dget(new_dentry);
>  	if (target)
> -		mutex_lock(&target->i_mutex);
> +		lock_two_nondirectories(source, target);
> +	else
> +		mutex_lock(&source->i_mutex);
>  
>  	error = -EBUSY;
>  	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
> @@ -3777,7 +3781,9 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  		d_move(old_dentry, new_dentry);
>  out:
>  	if (target)
> -		mutex_unlock(&target->i_mutex);
> +		unlock_two_nondirectories(source, target);
> +	else
> +		mutex_unlock(&source->i_mutex);
>  	dput(new_dentry);
>  	return error;
>  }

Seems sane...

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 05/12] locks: introduce new FL_DELEG lock flag
  2013-07-03 20:12     ` J. Bruce Fields
@ 2013-07-09 11:00         ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 11:00 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:29 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
> change behavior.
> 
> Next we'll modify break_lease to treat FL_DELEG leases differently, to
> account for the fact that NFSv4 delegations should be broken in more
> situations than Windows oplocks.
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/locks.c          |    2 +-
>  fs/nfsd/nfs4state.c |    2 +-
>  include/linux/fs.h  |    1 +
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index cb424a4..deec4de 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -131,7 +131,7 @@
>  
>  #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
>  #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
> -#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
> +#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
>  
>  static bool lease_breaking(struct file_lock *fl)
>  {
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 316ec84..616ff83 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2932,7 +2932,7 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, int f
>  		return NULL;
>  	locks_init_lock(fl);
>  	fl->fl_lmops = &nfsd_lease_mng_ops;
> -	fl->fl_flags = FL_LEASE;
> +	fl->fl_flags = FL_DELEG;
>  	fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
>  	fl->fl_end = OFFSET_MAX;
>  	fl->fl_owner = (fl_owner_t)(dp->dl_file);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ec88235..116b3e9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -884,6 +884,7 @@ static inline int file_check_writeable(struct file *filp)
>  
>  #define FL_POSIX	1
>  #define FL_FLOCK	2
> +#define FL_DELEG	4	/* NFSv4 delegation */
>  #define FL_ACCESS	8	/* not trying to lock, just looking */
>  #define FL_EXISTS	16	/* when unlocking, test for existence */
>  #define FL_LEASE	32	/* lease held on this file */

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 05/12] locks: introduce new FL_DELEG lock flag
@ 2013-07-09 11:00         ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 11:00 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:29 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
> change behavior.
> 
> Next we'll modify break_lease to treat FL_DELEG leases differently, to
> account for the fact that NFSv4 delegations should be broken in more
> situations than Windows oplocks.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/locks.c          |    2 +-
>  fs/nfsd/nfs4state.c |    2 +-
>  include/linux/fs.h  |    1 +
>  3 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index cb424a4..deec4de 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -131,7 +131,7 @@
>  
>  #define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
>  #define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
> -#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
> +#define IS_LEASE(fl)	(fl->fl_flags & (FL_LEASE|FL_DELEG))
>  
>  static bool lease_breaking(struct file_lock *fl)
>  {
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 316ec84..616ff83 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2932,7 +2932,7 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp, int f
>  		return NULL;
>  	locks_init_lock(fl);
>  	fl->fl_lmops = &nfsd_lease_mng_ops;
> -	fl->fl_flags = FL_LEASE;
> +	fl->fl_flags = FL_DELEG;
>  	fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
>  	fl->fl_end = OFFSET_MAX;
>  	fl->fl_owner = (fl_owner_t)(dp->dl_file);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ec88235..116b3e9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -884,6 +884,7 @@ static inline int file_check_writeable(struct file *filp)
>  
>  #define FL_POSIX	1
>  #define FL_FLOCK	2
> +#define FL_DELEG	4	/* NFSv4 delegation */
>  #define FL_ACCESS	8	/* not trying to lock, just looking */
>  #define FL_EXISTS	16	/* when unlocking, test for existence */
>  #define FL_LEASE	32	/* lease held on this file */

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/12] locks: implement delegations
  2013-07-03 20:12 ` [PATCH 06/12] locks: implement delegations J. Bruce Fields
@ 2013-07-09 12:23       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 12:23 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:30 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
> type.
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/locks.c         |   49 +++++++++++++++++++++++++++++++++++++++----------
>  include/linux/fs.h |   18 +++++++++++++++---
>  2 files changed, 54 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index deec4de..2b56954 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1176,28 +1176,40 @@ static void time_out_leases(struct inode *inode)
>  	}
>  }
>  
> +static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
> +{
> +	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
> +		return false;
> +	return locks_conflict(breaker, lease);
> +}
> +
>  /**
>   *	__break_lease	-	revoke all outstanding leases on file
>   *	@inode: the inode of the file to return
> - *	@mode: the open mode (read or write)
> + *	@mode: O_RDONLY: break only write leases; O_WRONLY or O_RDWR:
> + *	    break all leases
> + *	@type: FL_LEASE: break leases and delegations; FL_DELEG: break
> + *	    only delegations
>   *
>   *	break_lease (inlined for speed) has checked there already is at least
>   *	some kind of lock (maybe a lease) on this file.  Leases are broken on
>   *	a call to open() or truncate().  This function can sleep unless you
>   *	specified %O_NONBLOCK to your open().
>   */
> -int __break_lease(struct inode *inode, unsigned int mode)
> +int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  {
>  	int error = 0;
>  	struct file_lock *new_fl, *flock;
>  	struct file_lock *fl;
>  	unsigned long break_time;
>  	int i_have_this_lease = 0;
> +	bool lease_conflict = false;
>  	int want_write = (mode & O_ACCMODE) != O_RDONLY;
>  
>  	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
>  	if (IS_ERR(new_fl))
>  		return PTR_ERR(new_fl);
> +	new_fl->fl_flags = type;
>  
>  	lock_flocks();
>  
> @@ -1207,13 +1219,16 @@ int __break_lease(struct inode *inode, unsigned int mode)
>  	if ((flock == NULL) || !IS_LEASE(flock))
>  		goto out;
>  
> -	if (!locks_conflict(flock, new_fl))
> +	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> +		if (leases_conflict(fl, new_fl)) {
> +			lease_conflict = true;
> +			if (fl->fl_owner == current->files)
> +				i_have_this_lease = 1;
> +		}
> +	}
> +	if (!lease_conflict)
>  		goto out;
>  
> -	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next)
> -		if (fl->fl_owner == current->files)
> -			i_have_this_lease = 1;
> -
>  	break_time = 0;
>  	if (lease_break_time > 0) {
>  		break_time = jiffies + lease_break_time * HZ;
> @@ -1222,6 +1237,8 @@ int __break_lease(struct inode *inode, unsigned int mode)
>  	}
>  
>  	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> +		if (!leases_conflict(fl, new_fl))
> +			continue;
>  		if (want_write) {
>  			if (fl->fl_flags & FL_UNLOCK_PENDING)
>  				continue;
> @@ -1263,7 +1280,7 @@ restart:
>  		 */
>  		for (flock = inode->i_flock; flock && IS_LEASE(flock);
>  				flock = flock->fl_next) {
> -			if (locks_conflict(new_fl, flock))
> +			if (leases_conflict(new_fl, flock))
>  				goto restart;
>  		}
>  		error = 0;
> @@ -1343,9 +1360,20 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
>  	struct file_lock *fl, **before, **my_before = NULL, *lease;
>  	struct dentry *dentry = filp->f_path.dentry;
>  	struct inode *inode = dentry->d_inode;
> +	bool is_deleg = (*flp)->fl_flags & FL_DELEG;
>  	int error;
>  
>  	lease = *flp;
> +	/*
> +	 * In the delegation case we need mutual exclusion with
> +	 * a number of operations that take the i_mutex.  We trylock
> +	 * because delegations are an optional optimization, and if
> +	 * there's some chance of a conflict--we'd rather not
> +	 * bother, maybe that's a sign this just isn't a good file to
> +	 * hand out a delegation on.
> +	 */
> +	if (is_deleg && !mutex_trylock(&inode->i_mutex))
> +		return -EAGAIN;
>  
>  	error = -EAGAIN;
>  	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> @@ -1397,9 +1425,10 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
>  		goto out;
>  
>  	locks_insert_lock(before, lease);
> -	return 0;
> -
> +	error = 0;
>  out:
> +	if (is_deleg)
> +		mutex_unlock(&inode->i_mutex);
>  	return error;
>  }
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 116b3e9..c6cc686 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1006,7 +1006,7 @@ extern int vfs_test_lock(struct file *, struct file_lock *);
>  extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>  extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
>  extern int flock_lock_file_wait(struct file *filp, struct file_lock *fl);
> -extern int __break_lease(struct inode *inode, unsigned int flags);
> +extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>  extern void lease_get_mtime(struct inode *, struct timespec *time);
>  extern int generic_setlease(struct file *, long, struct file_lock **);
>  extern int vfs_setlease(struct file *, long, struct file_lock **);
> @@ -1119,7 +1119,7 @@ static inline int flock_lock_file_wait(struct file *filp,
>  	return -ENOLCK;
>  }
>  
> -static inline int __break_lease(struct inode *inode, unsigned int mode)
> +static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  {
>  	return 0;
>  }
> @@ -1951,9 +1951,17 @@ static inline int locks_verify_truncate(struct inode *inode,
>  static inline int break_lease(struct inode *inode, unsigned int mode)
>  {
>  	if (inode->i_flock)
> -		return __break_lease(inode, mode);
> +		return __break_lease(inode, mode, FL_LEASE);
>  	return 0;
>  }
> +
> +static inline int break_deleg(struct inode *inode, unsigned int mode)
> +{
> +	if (inode->i_flock)
> +		return __break_lease(inode, mode, FL_DELEG);
> +	return 0;
> +}
> +
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int locks_mandatory_locked(struct inode *inode)
>  {
> @@ -1993,6 +2001,10 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
>  	return 0;
>  }
>  
> +static inline int break_deleg(struct inode *inode, unsigned int mode)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_FILE_LOCKING */
>  
>  /* fs/open.c */

Looks reasonable...

This (of course) has the same potential race that Al ID'ed a few days
ago. We'll probably need to reconcile this patch with whatever fix we
come up with there, but it shouldn't be too difficult.

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/12] locks: implement delegations
@ 2013-07-09 12:23       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 12:23 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:30 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
> type.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/locks.c         |   49 +++++++++++++++++++++++++++++++++++++++----------
>  include/linux/fs.h |   18 +++++++++++++++---
>  2 files changed, 54 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index deec4de..2b56954 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1176,28 +1176,40 @@ static void time_out_leases(struct inode *inode)
>  	}
>  }
>  
> +static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
> +{
> +	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
> +		return false;
> +	return locks_conflict(breaker, lease);
> +}
> +
>  /**
>   *	__break_lease	-	revoke all outstanding leases on file
>   *	@inode: the inode of the file to return
> - *	@mode: the open mode (read or write)
> + *	@mode: O_RDONLY: break only write leases; O_WRONLY or O_RDWR:
> + *	    break all leases
> + *	@type: FL_LEASE: break leases and delegations; FL_DELEG: break
> + *	    only delegations
>   *
>   *	break_lease (inlined for speed) has checked there already is at least
>   *	some kind of lock (maybe a lease) on this file.  Leases are broken on
>   *	a call to open() or truncate().  This function can sleep unless you
>   *	specified %O_NONBLOCK to your open().
>   */
> -int __break_lease(struct inode *inode, unsigned int mode)
> +int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  {
>  	int error = 0;
>  	struct file_lock *new_fl, *flock;
>  	struct file_lock *fl;
>  	unsigned long break_time;
>  	int i_have_this_lease = 0;
> +	bool lease_conflict = false;
>  	int want_write = (mode & O_ACCMODE) != O_RDONLY;
>  
>  	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
>  	if (IS_ERR(new_fl))
>  		return PTR_ERR(new_fl);
> +	new_fl->fl_flags = type;
>  
>  	lock_flocks();
>  
> @@ -1207,13 +1219,16 @@ int __break_lease(struct inode *inode, unsigned int mode)
>  	if ((flock == NULL) || !IS_LEASE(flock))
>  		goto out;
>  
> -	if (!locks_conflict(flock, new_fl))
> +	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> +		if (leases_conflict(fl, new_fl)) {
> +			lease_conflict = true;
> +			if (fl->fl_owner == current->files)
> +				i_have_this_lease = 1;
> +		}
> +	}
> +	if (!lease_conflict)
>  		goto out;
>  
> -	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next)
> -		if (fl->fl_owner == current->files)
> -			i_have_this_lease = 1;
> -
>  	break_time = 0;
>  	if (lease_break_time > 0) {
>  		break_time = jiffies + lease_break_time * HZ;
> @@ -1222,6 +1237,8 @@ int __break_lease(struct inode *inode, unsigned int mode)
>  	}
>  
>  	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> +		if (!leases_conflict(fl, new_fl))
> +			continue;
>  		if (want_write) {
>  			if (fl->fl_flags & FL_UNLOCK_PENDING)
>  				continue;
> @@ -1263,7 +1280,7 @@ restart:
>  		 */
>  		for (flock = inode->i_flock; flock && IS_LEASE(flock);
>  				flock = flock->fl_next) {
> -			if (locks_conflict(new_fl, flock))
> +			if (leases_conflict(new_fl, flock))
>  				goto restart;
>  		}
>  		error = 0;
> @@ -1343,9 +1360,20 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
>  	struct file_lock *fl, **before, **my_before = NULL, *lease;
>  	struct dentry *dentry = filp->f_path.dentry;
>  	struct inode *inode = dentry->d_inode;
> +	bool is_deleg = (*flp)->fl_flags & FL_DELEG;
>  	int error;
>  
>  	lease = *flp;
> +	/*
> +	 * In the delegation case we need mutual exclusion with
> +	 * a number of operations that take the i_mutex.  We trylock
> +	 * because delegations are an optional optimization, and if
> +	 * there's some chance of a conflict--we'd rather not
> +	 * bother, maybe that's a sign this just isn't a good file to
> +	 * hand out a delegation on.
> +	 */
> +	if (is_deleg && !mutex_trylock(&inode->i_mutex))
> +		return -EAGAIN;
>  
>  	error = -EAGAIN;
>  	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> @@ -1397,9 +1425,10 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
>  		goto out;
>  
>  	locks_insert_lock(before, lease);
> -	return 0;
> -
> +	error = 0;
>  out:
> +	if (is_deleg)
> +		mutex_unlock(&inode->i_mutex);
>  	return error;
>  }
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 116b3e9..c6cc686 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1006,7 +1006,7 @@ extern int vfs_test_lock(struct file *, struct file_lock *);
>  extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
>  extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
>  extern int flock_lock_file_wait(struct file *filp, struct file_lock *fl);
> -extern int __break_lease(struct inode *inode, unsigned int flags);
> +extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
>  extern void lease_get_mtime(struct inode *, struct timespec *time);
>  extern int generic_setlease(struct file *, long, struct file_lock **);
>  extern int vfs_setlease(struct file *, long, struct file_lock **);
> @@ -1119,7 +1119,7 @@ static inline int flock_lock_file_wait(struct file *filp,
>  	return -ENOLCK;
>  }
>  
> -static inline int __break_lease(struct inode *inode, unsigned int mode)
> +static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
>  {
>  	return 0;
>  }
> @@ -1951,9 +1951,17 @@ static inline int locks_verify_truncate(struct inode *inode,
>  static inline int break_lease(struct inode *inode, unsigned int mode)
>  {
>  	if (inode->i_flock)
> -		return __break_lease(inode, mode);
> +		return __break_lease(inode, mode, FL_LEASE);
>  	return 0;
>  }
> +
> +static inline int break_deleg(struct inode *inode, unsigned int mode)
> +{
> +	if (inode->i_flock)
> +		return __break_lease(inode, mode, FL_DELEG);
> +	return 0;
> +}
> +
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int locks_mandatory_locked(struct inode *inode)
>  {
> @@ -1993,6 +2001,10 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
>  	return 0;
>  }
>  
> +static inline int break_deleg(struct inode *inode, unsigned int mode)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_FILE_LOCKING */
>  
>  /* fs/open.c */

Looks reasonable...

This (of course) has the same potential race that Al ID'ed a few days
ago. We'll probably need to reconcile this patch with whatever fix we
come up with there, but it shouldn't be too difficult.

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/12] namei: minor vfs_unlink cleanup
  2013-07-03 20:12 ` [PATCH 07/12] namei: minor vfs_unlink cleanup J. Bruce Fields
@ 2013-07-09 12:50       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 12:50 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:31 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> We'll be using dentry->d_inode in one more place.
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/namei.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 61f6076..7e76fe1 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3386,6 +3386,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
>  
>  int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  {
> +	struct inode *target = dentry->d_inode;
>  	int error = may_delete(dir, dentry, 0);
>  
>  	if (error)
> @@ -3394,7 +3395,7 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  	if (!dir->i_op->unlink)
>  		return -EPERM;
>  
> -	mutex_lock(&dentry->d_inode->i_mutex);
> +	mutex_lock(&target->i_mutex);
>  	if (d_mountpoint(dentry))
>  		error = -EBUSY;
>  	else {
> @@ -3405,11 +3406,11 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  				dont_mount(dentry);
>  		}
>  	}
> -	mutex_unlock(&dentry->d_inode->i_mutex);
> +	mutex_unlock(&target->i_mutex);
>  
>  	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
>  	if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
> -		fsnotify_link_count(dentry->d_inode);
> +		fsnotify_link_count(target);
>  		d_delete(dentry);
>  	}
>  

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 07/12] namei: minor vfs_unlink cleanup
@ 2013-07-09 12:50       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 12:50 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:31 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We'll be using dentry->d_inode in one more place.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/namei.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 61f6076..7e76fe1 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3386,6 +3386,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
>  
>  int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  {
> +	struct inode *target = dentry->d_inode;
>  	int error = may_delete(dir, dentry, 0);
>  
>  	if (error)
> @@ -3394,7 +3395,7 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  	if (!dir->i_op->unlink)
>  		return -EPERM;
>  
> -	mutex_lock(&dentry->d_inode->i_mutex);
> +	mutex_lock(&target->i_mutex);
>  	if (d_mountpoint(dentry))
>  		error = -EBUSY;
>  	else {
> @@ -3405,11 +3406,11 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  				dont_mount(dentry);
>  		}
>  	}
> -	mutex_unlock(&dentry->d_inode->i_mutex);
> +	mutex_unlock(&target->i_mutex);
>  
>  	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
>  	if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
> -		fsnotify_link_count(dentry->d_inode);
> +		fsnotify_link_count(target);
>  		d_delete(dentry);
>  	}
>  

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
  2013-07-03 20:12 ` [PATCH 08/12] locks: break delegations on unlink J. Bruce Fields
@ 2013-07-09 13:05       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:05 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David Howells, Tyler Hicks,
	Dustin Kirkland

On Wed,  3 Jul 2013 16:12:32 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> We need to break delegations on any operation that changes the set of
> links pointing to an inode.  Start with unlink.
> 
> Such operations also hold the i_mutex on a parent directory.  Breaking a
> delegation may require waiting for a timeout (by default 90 seconds) in
> the case of a unresponsive NFS client.  To avoid blocking all directory
> operations, we therefore drop locks before waiting for the delegation.
> The logic then looks like:
> 
> 	acquire locks
> 	...
> 	test for delegation; if found:
> 		take reference on inode
> 		release locks
> 		wait for delegation break
> 		drop reference on inode
> 		retry
> 
> It is possible this could never terminate.  (Even if we take precautions
> to prevent another delegation being acquired on the same inode, we could
> get a different inode on each retry.)  But this seems very unlikely.
> 
> The initial test for a delegation happens after the lock on the target
> inode is acquired, but the directory inode may have been acquired
> further up the call stack.  We therefore add a "struct inode **"
> argument to any intervening functions, which we use to pass the inode
> back up to the caller in the case it needs a delegation synchronously
> broken.
> 
> Cc: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Tyler Hicks <tyhicks-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Cc: Dustin Kirkland <dustin.kirkland-Bv2LyzZ6GzxBDgjK7y7TUQ@public.gmane.org>
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/base/devtmpfs.c |    2 +-
>  fs/cachefiles/namei.c   |    2 +-
>  fs/ecryptfs/inode.c     |    2 +-
>  fs/namei.c              |   24 +++++++++++++++++++++---
>  fs/nfsd/vfs.c           |    2 +-
>  include/linux/fs.h      |    2 +-
>  ipc/mqueue.c            |    2 +-
>  7 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 7413d06..1b8490e 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev)
>  			mutex_lock(&dentry->d_inode->i_mutex);
>  			notify_change(dentry, &newattrs);
>  			mutex_unlock(&dentry->d_inode->i_mutex);
> -			err = vfs_unlink(parent.dentry->d_inode, dentry);
> +			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
>  			if (!err || err == -ENOENT)
>  				deleted = 1;
>  		}
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 8c01c5fc..d61d884 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
>  		if (ret < 0) {
>  			cachefiles_io_error(cache, "Unlink security error");
>  		} else {
> -			ret = vfs_unlink(dir->d_inode, rep);
> +			ret = vfs_unlink(dir->d_inode, rep, NULL);
>  
>  			if (preemptive)
>  				cachefiles_mark_object_buried(cache, rep);
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 5eab400..af42d88 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
>  
>  	dget(lower_dentry);
>  	lower_dir_dentry = lock_parent(lower_dentry);
> -	rc = vfs_unlink(lower_dir_inode, lower_dentry);
> +	rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL);
>  	if (rc) {
>  		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
>  		goto out_unlock;
> diff --git a/fs/namei.c b/fs/namei.c
> index 7e76fe1..cba3db1 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
>  	return do_rmdir(AT_FDCWD, pathname);
>  }
>  
> -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)

nit: this might be a good time to add a kerneldoc header on this
function. The delegated_inode thing might not be clear to the
uninitiated.

>  {
>  	struct inode *target = dentry->d_inode;
>  	int error = may_delete(dir, dentry, 0);
> @@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  	else {
>  		error = security_inode_unlink(dir, dentry);
>  		if (!error) {
> +			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> +			if (error) {
> +				if (error == -EWOULDBLOCK && delegated_inode) {
> +					*delegated_inode = target;
> +					ihold(target);
> +				}
> +				goto out;
> +			}
>  			error = dir->i_op->unlink(dir, dentry);
>  			if (!error)
>  				dont_mount(dentry);
>  		}
>  	}
> +out:
>  	mutex_unlock(&target->i_mutex);
>  
>  	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
> @@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
>  	struct dentry *dentry;
>  	struct nameidata nd;
>  	struct inode *inode = NULL;
> +	struct inode *delegated_inode = NULL;
>  	unsigned int lookup_flags = 0;
>  retry:
>  	name = user_path_parent(dfd, pathname, &nd, lookup_flags);
> @@ -3444,7 +3454,7 @@ retry:
>  	error = mnt_want_write(nd.path.mnt);
>  	if (error)
>  		goto exit1;
> -
> +retry_deleg:
>  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
>  	dentry = lookup_hash(&nd);
>  	error = PTR_ERR(dentry);
> @@ -3459,13 +3469,21 @@ retry:
>  		error = security_path_unlink(&nd.path, dentry);
>  		if (error)
>  			goto exit2;
> -		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
> +		error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
>  exit2:
>  		dput(dentry);
>  	}
>  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
>  	if (inode)
>  		iput(inode);	/* truncate the inode here */
> +	inode = NULL;
> +	if (delegated_inode) {
> +		error = break_deleg(delegated_inode, O_WRONLY);
> +		iput(delegated_inode);
> +		delegated_inode = NULL;
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  	mnt_drop_write(nd.path.mnt);
>  exit1:
>  	path_put(&nd.path);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 84ce601..6ccaca2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (host_err)
>  		goto out_put;
>  	if (type != S_IFDIR)
> -		host_err = vfs_unlink(dirp, rdentry);
> +		host_err = vfs_unlink(dirp, rdentry, NULL);
>  	else
>  		host_err = vfs_rmdir(dirp, rdentry);
>  	if (!host_err)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c6cc686..f951588 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
>  extern int vfs_symlink(struct inode *, struct dentry *, const char *);
>  extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
>  extern int vfs_rmdir(struct inode *, struct dentry *);
> -extern int vfs_unlink(struct inode *, struct dentry *);
> +extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
>  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
>  
>  /*
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index e4e47f6..384eb35 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
>  		err = -ENOENT;
>  	} else {
>  		ihold(inode);
> -		err = vfs_unlink(dentry->d_parent->d_inode, dentry);
> +		err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL);
>  	}
>  	dput(dentry);
>  

We probably also ought to eyeball some of these other cases where you
passing in NULL as the deleg_inode too. It's probably reasonable in
most cases -- exporting a filesystem that you also mount using ecryptfs
seems silly, but you never know...
Looks reasonable otherwise, if a little convoluted.


-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
@ 2013-07-09 13:05       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:05 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, David Howells, Tyler Hicks,
	Dustin Kirkland

On Wed,  3 Jul 2013 16:12:32 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We need to break delegations on any operation that changes the set of
> links pointing to an inode.  Start with unlink.
> 
> Such operations also hold the i_mutex on a parent directory.  Breaking a
> delegation may require waiting for a timeout (by default 90 seconds) in
> the case of a unresponsive NFS client.  To avoid blocking all directory
> operations, we therefore drop locks before waiting for the delegation.
> The logic then looks like:
> 
> 	acquire locks
> 	...
> 	test for delegation; if found:
> 		take reference on inode
> 		release locks
> 		wait for delegation break
> 		drop reference on inode
> 		retry
> 
> It is possible this could never terminate.  (Even if we take precautions
> to prevent another delegation being acquired on the same inode, we could
> get a different inode on each retry.)  But this seems very unlikely.
> 
> The initial test for a delegation happens after the lock on the target
> inode is acquired, but the directory inode may have been acquired
> further up the call stack.  We therefore add a "struct inode **"
> argument to any intervening functions, which we use to pass the inode
> back up to the caller in the case it needs a delegation synchronously
> broken.
> 
> Cc: David Howells <dhowells@redhat.com>
> Cc: Tyler Hicks <tyhicks@canonical.com>
> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  drivers/base/devtmpfs.c |    2 +-
>  fs/cachefiles/namei.c   |    2 +-
>  fs/ecryptfs/inode.c     |    2 +-
>  fs/namei.c              |   24 +++++++++++++++++++++---
>  fs/nfsd/vfs.c           |    2 +-
>  include/linux/fs.h      |    2 +-
>  ipc/mqueue.c            |    2 +-
>  7 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 7413d06..1b8490e 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev)
>  			mutex_lock(&dentry->d_inode->i_mutex);
>  			notify_change(dentry, &newattrs);
>  			mutex_unlock(&dentry->d_inode->i_mutex);
> -			err = vfs_unlink(parent.dentry->d_inode, dentry);
> +			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
>  			if (!err || err == -ENOENT)
>  				deleted = 1;
>  		}
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index 8c01c5fc..d61d884 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
>  		if (ret < 0) {
>  			cachefiles_io_error(cache, "Unlink security error");
>  		} else {
> -			ret = vfs_unlink(dir->d_inode, rep);
> +			ret = vfs_unlink(dir->d_inode, rep, NULL);
>  
>  			if (preemptive)
>  				cachefiles_mark_object_buried(cache, rep);
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 5eab400..af42d88 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
>  
>  	dget(lower_dentry);
>  	lower_dir_dentry = lock_parent(lower_dentry);
> -	rc = vfs_unlink(lower_dir_inode, lower_dentry);
> +	rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL);
>  	if (rc) {
>  		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
>  		goto out_unlock;
> diff --git a/fs/namei.c b/fs/namei.c
> index 7e76fe1..cba3db1 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
>  	return do_rmdir(AT_FDCWD, pathname);
>  }
>  
> -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)

nit: this might be a good time to add a kerneldoc header on this
function. The delegated_inode thing might not be clear to the
uninitiated.

>  {
>  	struct inode *target = dentry->d_inode;
>  	int error = may_delete(dir, dentry, 0);
> @@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
>  	else {
>  		error = security_inode_unlink(dir, dentry);
>  		if (!error) {
> +			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> +			if (error) {
> +				if (error == -EWOULDBLOCK && delegated_inode) {
> +					*delegated_inode = target;
> +					ihold(target);
> +				}
> +				goto out;
> +			}
>  			error = dir->i_op->unlink(dir, dentry);
>  			if (!error)
>  				dont_mount(dentry);
>  		}
>  	}
> +out:
>  	mutex_unlock(&target->i_mutex);
>  
>  	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
> @@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
>  	struct dentry *dentry;
>  	struct nameidata nd;
>  	struct inode *inode = NULL;
> +	struct inode *delegated_inode = NULL;
>  	unsigned int lookup_flags = 0;
>  retry:
>  	name = user_path_parent(dfd, pathname, &nd, lookup_flags);
> @@ -3444,7 +3454,7 @@ retry:
>  	error = mnt_want_write(nd.path.mnt);
>  	if (error)
>  		goto exit1;
> -
> +retry_deleg:
>  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
>  	dentry = lookup_hash(&nd);
>  	error = PTR_ERR(dentry);
> @@ -3459,13 +3469,21 @@ retry:
>  		error = security_path_unlink(&nd.path, dentry);
>  		if (error)
>  			goto exit2;
> -		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
> +		error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
>  exit2:
>  		dput(dentry);
>  	}
>  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
>  	if (inode)
>  		iput(inode);	/* truncate the inode here */
> +	inode = NULL;
> +	if (delegated_inode) {
> +		error = break_deleg(delegated_inode, O_WRONLY);
> +		iput(delegated_inode);
> +		delegated_inode = NULL;
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  	mnt_drop_write(nd.path.mnt);
>  exit1:
>  	path_put(&nd.path);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 84ce601..6ccaca2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
>  	if (host_err)
>  		goto out_put;
>  	if (type != S_IFDIR)
> -		host_err = vfs_unlink(dirp, rdentry);
> +		host_err = vfs_unlink(dirp, rdentry, NULL);
>  	else
>  		host_err = vfs_rmdir(dirp, rdentry);
>  	if (!host_err)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c6cc686..f951588 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
>  extern int vfs_symlink(struct inode *, struct dentry *, const char *);
>  extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
>  extern int vfs_rmdir(struct inode *, struct dentry *);
> -extern int vfs_unlink(struct inode *, struct dentry *);
> +extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
>  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
>  
>  /*
> diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> index e4e47f6..384eb35 100644
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
>  		err = -ENOENT;
>  	} else {
>  		ihold(inode);
> -		err = vfs_unlink(dentry->d_parent->d_inode, dentry);
> +		err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL);
>  	}
>  	dput(dentry);
>  

We probably also ought to eyeball some of these other cases where you
passing in NULL as the deleg_inode too. It's probably reasonable in
most cases -- exporting a filesystem that you also mount using ecryptfs
seems silly, but you never know...
Looks reasonable otherwise, if a little convoluted.


-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
  2013-07-09 13:05       ` Jeff Layton
@ 2013-07-09 13:07           ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David Howells, Tyler Hicks,
	Dustin Kirkland

On Tue, 9 Jul 2013 09:05:06 -0400
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Wed,  3 Jul 2013 16:12:32 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > We need to break delegations on any operation that changes the set of
> > links pointing to an inode.  Start with unlink.
> > 
> > Such operations also hold the i_mutex on a parent directory.  Breaking a
> > delegation may require waiting for a timeout (by default 90 seconds) in
> > the case of a unresponsive NFS client.  To avoid blocking all directory
> > operations, we therefore drop locks before waiting for the delegation.
> > The logic then looks like:
> > 
> > 	acquire locks
> > 	...
> > 	test for delegation; if found:
> > 		take reference on inode
> > 		release locks
> > 		wait for delegation break
> > 		drop reference on inode
> > 		retry
> > 
> > It is possible this could never terminate.  (Even if we take precautions
> > to prevent another delegation being acquired on the same inode, we could
> > get a different inode on each retry.)  But this seems very unlikely.
> > 
> > The initial test for a delegation happens after the lock on the target
> > inode is acquired, but the directory inode may have been acquired
> > further up the call stack.  We therefore add a "struct inode **"
> > argument to any intervening functions, which we use to pass the inode
> > back up to the caller in the case it needs a delegation synchronously
> > broken.
> > 
> > Cc: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Cc: Tyler Hicks <tyhicks-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > Cc: Dustin Kirkland <dustin.kirkland-Bv2LyzZ6GzxBDgjK7y7TUQ@public.gmane.org>
> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  drivers/base/devtmpfs.c |    2 +-
> >  fs/cachefiles/namei.c   |    2 +-
> >  fs/ecryptfs/inode.c     |    2 +-
> >  fs/namei.c              |   24 +++++++++++++++++++++---
> >  fs/nfsd/vfs.c           |    2 +-
> >  include/linux/fs.h      |    2 +-
> >  ipc/mqueue.c            |    2 +-
> >  7 files changed, 27 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 7413d06..1b8490e 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> >  			mutex_lock(&dentry->d_inode->i_mutex);
> >  			notify_change(dentry, &newattrs);
> >  			mutex_unlock(&dentry->d_inode->i_mutex);
> > -			err = vfs_unlink(parent.dentry->d_inode, dentry);
> > +			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> >  			if (!err || err == -ENOENT)
> >  				deleted = 1;
> >  		}
> > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > index 8c01c5fc..d61d884 100644
> > --- a/fs/cachefiles/namei.c
> > +++ b/fs/cachefiles/namei.c
> > @@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
> >  		if (ret < 0) {
> >  			cachefiles_io_error(cache, "Unlink security error");
> >  		} else {
> > -			ret = vfs_unlink(dir->d_inode, rep);
> > +			ret = vfs_unlink(dir->d_inode, rep, NULL);
> >  
> >  			if (preemptive)
> >  				cachefiles_mark_object_buried(cache, rep);
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index 5eab400..af42d88 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
> >  
> >  	dget(lower_dentry);
> >  	lower_dir_dentry = lock_parent(lower_dentry);
> > -	rc = vfs_unlink(lower_dir_inode, lower_dentry);
> > +	rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL);
> >  	if (rc) {
> >  		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
> >  		goto out_unlock;
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 7e76fe1..cba3db1 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
> >  	return do_rmdir(AT_FDCWD, pathname);
> >  }
> >  
> > -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
> 
> nit: this might be a good time to add a kerneldoc header on this
> function. The delegated_inode thing might not be clear to the
> uninitiated.
> 
> >  {
> >  	struct inode *target = dentry->d_inode;
> >  	int error = may_delete(dir, dentry, 0);
> > @@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
> >  	else {
> >  		error = security_inode_unlink(dir, dentry);
> >  		if (!error) {
> > +			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> > +			if (error) {
> > +				if (error == -EWOULDBLOCK && delegated_inode) {
> > +					*delegated_inode = target;
> > +					ihold(target);
> > +				}
> > +				goto out;
> > +			}
> >  			error = dir->i_op->unlink(dir, dentry);
> >  			if (!error)
> >  				dont_mount(dentry);
> >  		}
> >  	}
> > +out:
> >  	mutex_unlock(&target->i_mutex);
> >  
> >  	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
> > @@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
> >  	struct dentry *dentry;
> >  	struct nameidata nd;
> >  	struct inode *inode = NULL;
> > +	struct inode *delegated_inode = NULL;
> >  	unsigned int lookup_flags = 0;
> >  retry:
> >  	name = user_path_parent(dfd, pathname, &nd, lookup_flags);
> > @@ -3444,7 +3454,7 @@ retry:
> >  	error = mnt_want_write(nd.path.mnt);
> >  	if (error)
> >  		goto exit1;
> > -
> > +retry_deleg:
> >  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
> >  	dentry = lookup_hash(&nd);
> >  	error = PTR_ERR(dentry);
> > @@ -3459,13 +3469,21 @@ retry:
> >  		error = security_path_unlink(&nd.path, dentry);
> >  		if (error)
> >  			goto exit2;
> > -		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
> > +		error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
> >  exit2:
> >  		dput(dentry);
> >  	}
> >  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> >  	if (inode)
> >  		iput(inode);	/* truncate the inode here */
> > +	inode = NULL;
> > +	if (delegated_inode) {
> > +		error = break_deleg(delegated_inode, O_WRONLY);
> > +		iput(delegated_inode);
> > +		delegated_inode = NULL;
> > +		if (!error)
> > +			goto retry_deleg;
> > +	}
> >  	mnt_drop_write(nd.path.mnt);
> >  exit1:
> >  	path_put(&nd.path);
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 84ce601..6ccaca2 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	if (host_err)
> >  		goto out_put;
> >  	if (type != S_IFDIR)
> > -		host_err = vfs_unlink(dirp, rdentry);
> > +		host_err = vfs_unlink(dirp, rdentry, NULL);
> >  	else
> >  		host_err = vfs_rmdir(dirp, rdentry);
> >  	if (!host_err)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index c6cc686..f951588 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
> >  extern int vfs_symlink(struct inode *, struct dentry *, const char *);
> >  extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
> >  extern int vfs_rmdir(struct inode *, struct dentry *);
> > -extern int vfs_unlink(struct inode *, struct dentry *);
> > +extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
> >  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
> >  
> >  /*
> > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > index e4e47f6..384eb35 100644
> > --- a/ipc/mqueue.c
> > +++ b/ipc/mqueue.c
> > @@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> >  		err = -ENOENT;
> >  	} else {
> >  		ihold(inode);
> > -		err = vfs_unlink(dentry->d_parent->d_inode, dentry);
> > +		err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL);
> >  	}
> >  	dput(dentry);
> >  
> 
> We probably also ought to eyeball some of these other cases where you
> passing in NULL as the deleg_inode too. It's probably reasonable in
> most cases -- exporting a filesystem that you also mount using ecryptfs
> seems silly, but you never know...
> Looks reasonable otherwise, if a little convoluted.
> 

My apologies -- one of my gripes with claws-mail is that "ctrl+return"
is "Send", and I occasionally fat-finger it. Anyway...

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
@ 2013-07-09 13:07           ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, 9 Jul 2013 09:05:06 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> On Wed,  3 Jul 2013 16:12:32 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > We need to break delegations on any operation that changes the set of
> > links pointing to an inode.  Start with unlink.
> > 
> > Such operations also hold the i_mutex on a parent directory.  Breaking a
> > delegation may require waiting for a timeout (by default 90 seconds) in
> > the case of a unresponsive NFS client.  To avoid blocking all directory
> > operations, we therefore drop locks before waiting for the delegation.
> > The logic then looks like:
> > 
> > 	acquire locks
> > 	...
> > 	test for delegation; if found:
> > 		take reference on inode
> > 		release locks
> > 		wait for delegation break
> > 		drop reference on inode
> > 		retry
> > 
> > It is possible this could never terminate.  (Even if we take precautions
> > to prevent another delegation being acquired on the same inode, we could
> > get a different inode on each retry.)  But this seems very unlikely.
> > 
> > The initial test for a delegation happens after the lock on the target
> > inode is acquired, but the directory inode may have been acquired
> > further up the call stack.  We therefore add a "struct inode **"
> > argument to any intervening functions, which we use to pass the inode
> > back up to the caller in the case it needs a delegation synchronously
> > broken.
> > 
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Tyler Hicks <tyhicks@canonical.com>
> > Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  drivers/base/devtmpfs.c |    2 +-
> >  fs/cachefiles/namei.c   |    2 +-
> >  fs/ecryptfs/inode.c     |    2 +-
> >  fs/namei.c              |   24 +++++++++++++++++++++---
> >  fs/nfsd/vfs.c           |    2 +-
> >  include/linux/fs.h      |    2 +-
> >  ipc/mqueue.c            |    2 +-
> >  7 files changed, 27 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 7413d06..1b8490e 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -324,7 +324,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> >  			mutex_lock(&dentry->d_inode->i_mutex);
> >  			notify_change(dentry, &newattrs);
> >  			mutex_unlock(&dentry->d_inode->i_mutex);
> > -			err = vfs_unlink(parent.dentry->d_inode, dentry);
> > +			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> >  			if (!err || err == -ENOENT)
> >  				deleted = 1;
> >  		}
> > diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> > index 8c01c5fc..d61d884 100644
> > --- a/fs/cachefiles/namei.c
> > +++ b/fs/cachefiles/namei.c
> > @@ -294,7 +294,7 @@ static int cachefiles_bury_object(struct cachefiles_cache *cache,
> >  		if (ret < 0) {
> >  			cachefiles_io_error(cache, "Unlink security error");
> >  		} else {
> > -			ret = vfs_unlink(dir->d_inode, rep);
> > +			ret = vfs_unlink(dir->d_inode, rep, NULL);
> >  
> >  			if (preemptive)
> >  				cachefiles_mark_object_buried(cache, rep);
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index 5eab400..af42d88 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -153,7 +153,7 @@ static int ecryptfs_do_unlink(struct inode *dir, struct dentry *dentry,
> >  
> >  	dget(lower_dentry);
> >  	lower_dir_dentry = lock_parent(lower_dentry);
> > -	rc = vfs_unlink(lower_dir_inode, lower_dentry);
> > +	rc = vfs_unlink(lower_dir_inode, lower_dentry, NULL);
> >  	if (rc) {
> >  		printk(KERN_ERR "Error in vfs_unlink; rc = [%d]\n", rc);
> >  		goto out_unlock;
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 7e76fe1..cba3db1 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3384,7 +3384,7 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
> >  	return do_rmdir(AT_FDCWD, pathname);
> >  }
> >  
> > -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
> 
> nit: this might be a good time to add a kerneldoc header on this
> function. The delegated_inode thing might not be clear to the
> uninitiated.
> 
> >  {
> >  	struct inode *target = dentry->d_inode;
> >  	int error = may_delete(dir, dentry, 0);
> > @@ -3401,11 +3401,20 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry)
> >  	else {
> >  		error = security_inode_unlink(dir, dentry);
> >  		if (!error) {
> > +			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> > +			if (error) {
> > +				if (error == -EWOULDBLOCK && delegated_inode) {
> > +					*delegated_inode = target;
> > +					ihold(target);
> > +				}
> > +				goto out;
> > +			}
> >  			error = dir->i_op->unlink(dir, dentry);
> >  			if (!error)
> >  				dont_mount(dentry);
> >  		}
> >  	}
> > +out:
> >  	mutex_unlock(&target->i_mutex);
> >  
> >  	/* We don't d_delete() NFS sillyrenamed files--they still exist. */
> > @@ -3430,6 +3439,7 @@ static long do_unlinkat(int dfd, const char __user *pathname)
> >  	struct dentry *dentry;
> >  	struct nameidata nd;
> >  	struct inode *inode = NULL;
> > +	struct inode *delegated_inode = NULL;
> >  	unsigned int lookup_flags = 0;
> >  retry:
> >  	name = user_path_parent(dfd, pathname, &nd, lookup_flags);
> > @@ -3444,7 +3454,7 @@ retry:
> >  	error = mnt_want_write(nd.path.mnt);
> >  	if (error)
> >  		goto exit1;
> > -
> > +retry_deleg:
> >  	mutex_lock_nested(&nd.path.dentry->d_inode->i_mutex, I_MUTEX_PARENT);
> >  	dentry = lookup_hash(&nd);
> >  	error = PTR_ERR(dentry);
> > @@ -3459,13 +3469,21 @@ retry:
> >  		error = security_path_unlink(&nd.path, dentry);
> >  		if (error)
> >  			goto exit2;
> > -		error = vfs_unlink(nd.path.dentry->d_inode, dentry);
> > +		error = vfs_unlink(nd.path.dentry->d_inode, dentry, &delegated_inode);
> >  exit2:
> >  		dput(dentry);
> >  	}
> >  	mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> >  	if (inode)
> >  		iput(inode);	/* truncate the inode here */
> > +	inode = NULL;
> > +	if (delegated_inode) {
> > +		error = break_deleg(delegated_inode, O_WRONLY);
> > +		iput(delegated_inode);
> > +		delegated_inode = NULL;
> > +		if (!error)
> > +			goto retry_deleg;
> > +	}
> >  	mnt_drop_write(nd.path.mnt);
> >  exit1:
> >  	path_put(&nd.path);
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 84ce601..6ccaca2 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1882,7 +1882,7 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
> >  	if (host_err)
> >  		goto out_put;
> >  	if (type != S_IFDIR)
> > -		host_err = vfs_unlink(dirp, rdentry);
> > +		host_err = vfs_unlink(dirp, rdentry, NULL);
> >  	else
> >  		host_err = vfs_rmdir(dirp, rdentry);
> >  	if (!host_err)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index c6cc686..f951588 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1463,7 +1463,7 @@ extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
> >  extern int vfs_symlink(struct inode *, struct dentry *, const char *);
> >  extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
> >  extern int vfs_rmdir(struct inode *, struct dentry *);
> > -extern int vfs_unlink(struct inode *, struct dentry *);
> > +extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
> >  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
> >  
> >  /*
> > diff --git a/ipc/mqueue.c b/ipc/mqueue.c
> > index e4e47f6..384eb35 100644
> > --- a/ipc/mqueue.c
> > +++ b/ipc/mqueue.c
> > @@ -884,7 +884,7 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
> >  		err = -ENOENT;
> >  	} else {
> >  		ihold(inode);
> > -		err = vfs_unlink(dentry->d_parent->d_inode, dentry);
> > +		err = vfs_unlink(dentry->d_parent->d_inode, dentry, NULL);
> >  	}
> >  	dput(dentry);
> >  
> 
> We probably also ought to eyeball some of these other cases where you
> passing in NULL as the deleg_inode too. It's probably reasonable in
> most cases -- exporting a filesystem that you also mount using ecryptfs
> seems silly, but you never know...
> Looks reasonable otherwise, if a little convoluted.
> 

My apologies -- one of my gripes with claws-mail is that "ctrl+return"
is "Send", and I occasionally fat-finger it. Anyway...

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-03 20:12 ` [PATCH 09/12] locks: helper functions for delegation breaking J. Bruce Fields
@ 2013-07-09 13:09       ` Jeff Layton
  2013-07-09 13:23   ` Jeff Layton
  1 sibling, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Wed,  3 Jul 2013 16:12:33 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> We'll need the same logic for rename and link.
> 
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/namei.c         |   13 +++----------
>  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
>  2 files changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index cba3db1..a9d4031 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
>  	else {
>  		error = security_inode_unlink(dir, dentry);
>  		if (!error) {
> -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> -			if (error) {
> -				if (error == -EWOULDBLOCK && delegated_inode) {
> -					*delegated_inode = target;
> -					ihold(target);
> -				}
> +			error = try_break_deleg(target, delegated_inode);
> +			if (error)
>  				goto out;
> -			}
>  			error = dir->i_op->unlink(dir, dentry);
>  			if (!error)
>  				dont_mount(dentry);
> @@ -3478,9 +3473,7 @@ exit2:
>  		iput(inode);	/* truncate the inode here */
>  	inode = NULL;
>  	if (delegated_inode) {
> -		error = break_deleg(delegated_inode, O_WRONLY);
> -		iput(delegated_inode);
> -		delegated_inode = NULL;
> +		error = break_deleg_wait(&delegated_inode);
>  		if (!error)
>  			goto retry_deleg;
>  	}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f951588..c37e463 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
>  
>  extern int current_umask(void);
>  
> +extern void ihold(struct inode * inode);
> +extern void iput(struct inode *);
> +
>  /* /sys/fs */
>  extern struct kobject *fs_kobj;
>  
> @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
>  	return 0;
>  }
>  
> +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> +{
> +	int ret;
> +
> +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> +	if (ret == -EWOULDBLOCK && delegated_inode) {
> +		*delegated_inode = inode;
> +		ihold(inode);
> +	}
> +	return ret;
> +}
> +
> +static inline int break_deleg_wait(struct inode **delegated_inode)
> +{
> +	int ret;
> +
> +	ret = break_deleg(*delegated_inode, O_WRONLY);
> +	iput(*delegated_inode);
> +	*delegated_inode = NULL;
> +	return ret;
> +}
> +
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int locks_mandatory_locked(struct inode *inode)
>  {
> @@ -2005,6 +2030,12 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
>  {
>  	return 0;
>  }
> +
> +static inline int try_break_deleg(struct inode *inode, struct delegated_inode **inode)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_FILE_LOCKING */
>  
>  /* fs/open.c */
> @@ -2335,8 +2366,6 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
>  extern int inode_init_always(struct super_block *, struct inode *);
>  extern void inode_init_once(struct inode *);
>  extern void address_space_init_once(struct address_space *mapping);
> -extern void ihold(struct inode * inode);
> -extern void iput(struct inode *);
>  extern struct inode * igrab(struct inode *);
>  extern ino_t iunique(struct super_block *, ino_t);
>  extern int inode_needs_sync(struct inode *inode);

Nice cleanup. Might be reasonable to reorder or merge this patch with
the previous one to reduce "churn" in vfs_unlink.

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
@ 2013-07-09 13:09       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:09 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:33 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We'll need the same logic for rename and link.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/namei.c         |   13 +++----------
>  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
>  2 files changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index cba3db1..a9d4031 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
>  	else {
>  		error = security_inode_unlink(dir, dentry);
>  		if (!error) {
> -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> -			if (error) {
> -				if (error == -EWOULDBLOCK && delegated_inode) {
> -					*delegated_inode = target;
> -					ihold(target);
> -				}
> +			error = try_break_deleg(target, delegated_inode);
> +			if (error)
>  				goto out;
> -			}
>  			error = dir->i_op->unlink(dir, dentry);
>  			if (!error)
>  				dont_mount(dentry);
> @@ -3478,9 +3473,7 @@ exit2:
>  		iput(inode);	/* truncate the inode here */
>  	inode = NULL;
>  	if (delegated_inode) {
> -		error = break_deleg(delegated_inode, O_WRONLY);
> -		iput(delegated_inode);
> -		delegated_inode = NULL;
> +		error = break_deleg_wait(&delegated_inode);
>  		if (!error)
>  			goto retry_deleg;
>  	}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f951588..c37e463 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
>  
>  extern int current_umask(void);
>  
> +extern void ihold(struct inode * inode);
> +extern void iput(struct inode *);
> +
>  /* /sys/fs */
>  extern struct kobject *fs_kobj;
>  
> @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
>  	return 0;
>  }
>  
> +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> +{
> +	int ret;
> +
> +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> +	if (ret == -EWOULDBLOCK && delegated_inode) {
> +		*delegated_inode = inode;
> +		ihold(inode);
> +	}
> +	return ret;
> +}
> +
> +static inline int break_deleg_wait(struct inode **delegated_inode)
> +{
> +	int ret;
> +
> +	ret = break_deleg(*delegated_inode, O_WRONLY);
> +	iput(*delegated_inode);
> +	*delegated_inode = NULL;
> +	return ret;
> +}
> +
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int locks_mandatory_locked(struct inode *inode)
>  {
> @@ -2005,6 +2030,12 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
>  {
>  	return 0;
>  }
> +
> +static inline int try_break_deleg(struct inode *inode, struct delegated_inode **inode)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_FILE_LOCKING */
>  
>  /* fs/open.c */
> @@ -2335,8 +2366,6 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
>  extern int inode_init_always(struct super_block *, struct inode *);
>  extern void inode_init_once(struct inode *);
>  extern void address_space_init_once(struct address_space *mapping);
> -extern void ihold(struct inode * inode);
> -extern void iput(struct inode *);
>  extern struct inode * igrab(struct inode *);
>  extern ino_t iunique(struct super_block *, ino_t);
>  extern int inode_needs_sync(struct inode *inode);

Nice cleanup. Might be reasonable to reorder or merge this patch with
the previous one to reduce "churn" in vfs_unlink.

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 10/12] locks: break delegations on rename
  2013-07-03 20:12 ` [PATCH 10/12] locks: break delegations on rename J. Bruce Fields
@ 2013-07-09 13:14       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:14 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David Howells

On Wed,  3 Jul 2013 16:12:34 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Cc: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/cachefiles/namei.c |    2 +-
>  fs/namei.c            |   26 ++++++++++++++++++++++----
>  fs/nfsd/vfs.c         |    2 +-
>  include/linux/fs.h    |    2 +-
>  4 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d61d884..678a8af 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -396,7 +396,7 @@ try_again:
>  		cachefiles_io_error(cache, "Rename security error %d", ret);
>  	} else {
>  		ret = vfs_rename(dir->d_inode, rep,
> -				 cache->graveyard->d_inode, grave);
> +				 cache->graveyard->d_inode, grave, NULL);
>  		if (ret != 0 && ret != -ENOMEM)
>  			cachefiles_io_error(cache,
>  					    "Rename failed with error %d", ret);
> diff --git a/fs/namei.c b/fs/namei.c
> index a9d4031..be00d37 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3763,7 +3763,8 @@ out:
>  }
>  
>  static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
> -			    struct inode *new_dir, struct dentry *new_dentry)
> +			    struct inode *new_dir, struct dentry *new_dentry,
> +			    struct inode **delegated_inode)
>  {
>  	struct inode *target = new_dentry->d_inode;
>  	struct inode *source = old_dentry->d_inode;
> @@ -3783,6 +3784,14 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
>  		goto out;
>  
> +	error = try_break_deleg(source, delegated_inode);
> +	if (error)
> +		goto out;
> +	if (target) {
> +		error = try_break_deleg(target, delegated_inode);
> +		if (error)
> +			goto out;
> +	}
>  	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>  	if (error)
>  		goto out;
> @@ -3801,7 +3810,8 @@ out:
>  }
>  
>  int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> -	       struct inode *new_dir, struct dentry *new_dentry)
> +	       struct inode *new_dir, struct dentry *new_dentry,
> +	       struct inode **delegated_inode)
>  {
>  	int error;
>  	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
> @@ -3829,7 +3839,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	if (is_dir)
>  		error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
>  	else
> -		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
> +		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry,delegated_inode);
>  	if (!error)
>  		fsnotify_move(old_dir, new_dir, old_name, is_dir,
>  			      new_dentry->d_inode, old_dentry);
> @@ -3845,6 +3855,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>  	struct dentry *old_dentry, *new_dentry;
>  	struct dentry *trap;
>  	struct nameidata oldnd, newnd;
> +	struct inode *delegated_inode = NULL;
>  	struct filename *from;
>  	struct filename *to;
>  	unsigned int lookup_flags = 0;
> @@ -3884,6 +3895,7 @@ retry:
>  	newnd.flags &= ~LOOKUP_PARENT;
>  	newnd.flags |= LOOKUP_RENAME_TARGET;
>  
> +retry_deleg:
>  	trap = lock_rename(new_dir, old_dir);
>  
>  	old_dentry = lookup_hash(&oldnd);
> @@ -3920,13 +3932,19 @@ retry:
>  	if (error)
>  		goto exit5;
>  	error = vfs_rename(old_dir->d_inode, old_dentry,
> -				   new_dir->d_inode, new_dentry);
> +				   new_dir->d_inode, new_dentry,
> +				   &delegated_inode);
>  exit5:
>  	dput(new_dentry);
>  exit4:
>  	dput(old_dentry);
>  exit3:
>  	unlock_rename(new_dir, old_dir);
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  	mnt_drop_write(oldnd.path.mnt);
>  exit2:
>  	if (retry_estale(error, lookup_flags))
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6ccaca2..54ac814 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1809,7 +1809,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  		if (host_err)
>  			goto out_dput_new;
>  	}
> -	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
> +	host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL);
>  	if (!host_err) {
>  		host_err = commit_metadata(tfhp);
>  		if (!host_err)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c37e463..a35dadb 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1464,7 +1464,7 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *);
>  extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
>  extern int vfs_rmdir(struct inode *, struct dentry *);
>  extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
> -extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
> +extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);
>  
>  /*
>   * VFS dentry helper functions.

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 10/12] locks: break delegations on rename
@ 2013-07-09 13:14       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:14 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel, David Howells

On Wed,  3 Jul 2013 16:12:34 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> Cc: David Howells <dhowells@redhat.com>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/cachefiles/namei.c |    2 +-
>  fs/namei.c            |   26 ++++++++++++++++++++++----
>  fs/nfsd/vfs.c         |    2 +-
>  include/linux/fs.h    |    2 +-
>  4 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d61d884..678a8af 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -396,7 +396,7 @@ try_again:
>  		cachefiles_io_error(cache, "Rename security error %d", ret);
>  	} else {
>  		ret = vfs_rename(dir->d_inode, rep,
> -				 cache->graveyard->d_inode, grave);
> +				 cache->graveyard->d_inode, grave, NULL);
>  		if (ret != 0 && ret != -ENOMEM)
>  			cachefiles_io_error(cache,
>  					    "Rename failed with error %d", ret);
> diff --git a/fs/namei.c b/fs/namei.c
> index a9d4031..be00d37 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3763,7 +3763,8 @@ out:
>  }
>  
>  static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
> -			    struct inode *new_dir, struct dentry *new_dentry)
> +			    struct inode *new_dir, struct dentry *new_dentry,
> +			    struct inode **delegated_inode)
>  {
>  	struct inode *target = new_dentry->d_inode;
>  	struct inode *source = old_dentry->d_inode;
> @@ -3783,6 +3784,14 @@ static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
>  	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
>  		goto out;
>  
> +	error = try_break_deleg(source, delegated_inode);
> +	if (error)
> +		goto out;
> +	if (target) {
> +		error = try_break_deleg(target, delegated_inode);
> +		if (error)
> +			goto out;
> +	}
>  	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>  	if (error)
>  		goto out;
> @@ -3801,7 +3810,8 @@ out:
>  }
>  
>  int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> -	       struct inode *new_dir, struct dentry *new_dentry)
> +	       struct inode *new_dir, struct dentry *new_dentry,
> +	       struct inode **delegated_inode)
>  {
>  	int error;
>  	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
> @@ -3829,7 +3839,7 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	if (is_dir)
>  		error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
>  	else
> -		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
> +		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry,delegated_inode);
>  	if (!error)
>  		fsnotify_move(old_dir, new_dir, old_name, is_dir,
>  			      new_dentry->d_inode, old_dentry);
> @@ -3845,6 +3855,7 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>  	struct dentry *old_dentry, *new_dentry;
>  	struct dentry *trap;
>  	struct nameidata oldnd, newnd;
> +	struct inode *delegated_inode = NULL;
>  	struct filename *from;
>  	struct filename *to;
>  	unsigned int lookup_flags = 0;
> @@ -3884,6 +3895,7 @@ retry:
>  	newnd.flags &= ~LOOKUP_PARENT;
>  	newnd.flags |= LOOKUP_RENAME_TARGET;
>  
> +retry_deleg:
>  	trap = lock_rename(new_dir, old_dir);
>  
>  	old_dentry = lookup_hash(&oldnd);
> @@ -3920,13 +3932,19 @@ retry:
>  	if (error)
>  		goto exit5;
>  	error = vfs_rename(old_dir->d_inode, old_dentry,
> -				   new_dir->d_inode, new_dentry);
> +				   new_dir->d_inode, new_dentry,
> +				   &delegated_inode);
>  exit5:
>  	dput(new_dentry);
>  exit4:
>  	dput(old_dentry);
>  exit3:
>  	unlock_rename(new_dir, old_dir);
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  	mnt_drop_write(oldnd.path.mnt);
>  exit2:
>  	if (retry_estale(error, lookup_flags))
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6ccaca2..54ac814 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1809,7 +1809,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
>  		if (host_err)
>  			goto out_dput_new;
>  	}
> -	host_err = vfs_rename(fdir, odentry, tdir, ndentry);
> +	host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL);
>  	if (!host_err) {
>  		host_err = commit_metadata(tfhp);
>  		if (!host_err)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index c37e463..a35dadb 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1464,7 +1464,7 @@ extern int vfs_symlink(struct inode *, struct dentry *, const char *);
>  extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
>  extern int vfs_rmdir(struct inode *, struct dentry *);
>  extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
> -extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
> +extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);
>  
>  /*
>   * VFS dentry helper functions.

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 11/12] locks: break delegations on link
  2013-07-03 20:12 ` [PATCH 11/12] locks: break delegations on link J. Bruce Fields
@ 2013-07-09 13:16       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:16 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Tyler Hicks,
	Dustin Kirkland

On Wed,  3 Jul 2013 16:12:35 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> Cc: Tyler Hicks <tyhicks-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Cc: Dustin Kirkland <dustin.kirkland-Bv2LyzZ6GzxBDgjK7y7TUQ@public.gmane.org>
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/ecryptfs/inode.c |    2 +-
>  fs/namei.c          |   17 +++++++++++++----
>  fs/nfsd/vfs.c       |    2 +-
>  include/linux/fs.h  |    2 +-
>  4 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index af42d88..19e4435 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -475,7 +475,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
>  	dget(lower_new_dentry);
>  	lower_dir_dentry = lock_parent(lower_new_dentry);
>  	rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
> -		      lower_new_dentry);
> +		      lower_new_dentry, NULL);
>  	if (rc || !lower_new_dentry->d_inode)
>  		goto out_lock;
>  	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
> diff --git a/fs/namei.c b/fs/namei.c
> index be00d37..18267e0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3566,7 +3566,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
>  	return sys_symlinkat(oldname, AT_FDCWD, newname);
>  }
>  
> -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
> +int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)

A kerneldoc comment would be nice here. Ditto for vfs_rename* in the
previous patch...

>  {
>  	struct inode *inode = old_dentry->d_inode;
>  	unsigned max_links = dir->i_sb->s_max_links;
> @@ -3602,8 +3602,11 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
>  		error =  -ENOENT;
>  	else if (max_links && inode->i_nlink >= max_links)
>  		error = -EMLINK;
> -	else
> -		error = dir->i_op->link(old_dentry, dir, new_dentry);
> +	else {
> +		error = try_break_deleg(inode, delegated_inode);
> +		if (!error)
> +			error = dir->i_op->link(old_dentry, dir, new_dentry);
> +	}
>  	mutex_unlock(&inode->i_mutex);
>  	if (!error)
>  		fsnotify_link(dir, inode, new_dentry);
> @@ -3624,6 +3627,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
>  {
>  	struct dentry *new_dentry;
>  	struct path old_path, new_path;
> +	struct inode *delegated_inode = NULL;
>  	int how = 0;
>  	int error;
>  
> @@ -3662,9 +3666,14 @@ retry:
>  	error = security_path_link(old_path.dentry, &new_path, new_dentry);
>  	if (error)
>  		goto out_dput;
> -	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
> +	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
>  out_dput:
>  	done_path_create(&new_path, new_dentry);
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry;
> +	}
>  	if (retry_estale(error, how)) {
>  		how |= LOOKUP_REVAL;
>  		goto retry;
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 54ac814..b9740cb 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1708,7 +1708,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  		err = nfserrno(host_err);
>  		goto out_dput;
>  	}
> -	host_err = vfs_link(dold, dirp, dnew);
> +	host_err = vfs_link(dold, dirp, dnew, NULL);
>  	if (!host_err) {
>  		err = nfserrno(commit_metadata(ffhp));
>  		if (!err)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a35dadb..936413c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1461,7 +1461,7 @@ extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
>  extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
>  extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
>  extern int vfs_symlink(struct inode *, struct dentry *, const char *);
> -extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
> +extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
>  extern int vfs_rmdir(struct inode *, struct dentry *);
>  extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
>  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);

Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 11/12] locks: break delegations on link
@ 2013-07-09 13:16       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:16 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Tyler Hicks, Dustin Kirkland

On Wed,  3 Jul 2013 16:12:35 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> Cc: Tyler Hicks <tyhicks@canonical.com>
> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/ecryptfs/inode.c |    2 +-
>  fs/namei.c          |   17 +++++++++++++----
>  fs/nfsd/vfs.c       |    2 +-
>  include/linux/fs.h  |    2 +-
>  4 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index af42d88..19e4435 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -475,7 +475,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
>  	dget(lower_new_dentry);
>  	lower_dir_dentry = lock_parent(lower_new_dentry);
>  	rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
> -		      lower_new_dentry);
> +		      lower_new_dentry, NULL);
>  	if (rc || !lower_new_dentry->d_inode)
>  		goto out_lock;
>  	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
> diff --git a/fs/namei.c b/fs/namei.c
> index be00d37..18267e0 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3566,7 +3566,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
>  	return sys_symlinkat(oldname, AT_FDCWD, newname);
>  }
>  
> -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
> +int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)

A kerneldoc comment would be nice here. Ditto for vfs_rename* in the
previous patch...

>  {
>  	struct inode *inode = old_dentry->d_inode;
>  	unsigned max_links = dir->i_sb->s_max_links;
> @@ -3602,8 +3602,11 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
>  		error =  -ENOENT;
>  	else if (max_links && inode->i_nlink >= max_links)
>  		error = -EMLINK;
> -	else
> -		error = dir->i_op->link(old_dentry, dir, new_dentry);
> +	else {
> +		error = try_break_deleg(inode, delegated_inode);
> +		if (!error)
> +			error = dir->i_op->link(old_dentry, dir, new_dentry);
> +	}
>  	mutex_unlock(&inode->i_mutex);
>  	if (!error)
>  		fsnotify_link(dir, inode, new_dentry);
> @@ -3624,6 +3627,7 @@ SYSCALL_DEFINE5(linkat, int, olddfd, const char __user *, oldname,
>  {
>  	struct dentry *new_dentry;
>  	struct path old_path, new_path;
> +	struct inode *delegated_inode = NULL;
>  	int how = 0;
>  	int error;
>  
> @@ -3662,9 +3666,14 @@ retry:
>  	error = security_path_link(old_path.dentry, &new_path, new_dentry);
>  	if (error)
>  		goto out_dput;
> -	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry);
> +	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
>  out_dput:
>  	done_path_create(&new_path, new_dentry);
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry;
> +	}
>  	if (retry_estale(error, how)) {
>  		how |= LOOKUP_REVAL;
>  		goto retry;
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 54ac814..b9740cb 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1708,7 +1708,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
>  		err = nfserrno(host_err);
>  		goto out_dput;
>  	}
> -	host_err = vfs_link(dold, dirp, dnew);
> +	host_err = vfs_link(dold, dirp, dnew, NULL);
>  	if (!host_err) {
>  		err = nfserrno(commit_metadata(ffhp));
>  		if (!err)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index a35dadb..936413c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1461,7 +1461,7 @@ extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
>  extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
>  extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
>  extern int vfs_symlink(struct inode *, struct dentry *, const char *);
> -extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
> +extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
>  extern int vfs_rmdir(struct inode *, struct dentry *);
>  extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
>  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **);

Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-03 20:12 ` [PATCH 09/12] locks: helper functions for delegation breaking J. Bruce Fields
       [not found]   ` <1372882356-14168-10-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2013-07-09 13:23   ` Jeff Layton
  2013-07-09 19:38     ` J. Bruce Fields
  1 sibling, 1 reply; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:23 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Wed,  3 Jul 2013 16:12:33 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We'll need the same logic for rename and link.
> 
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/namei.c         |   13 +++----------
>  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
>  2 files changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index cba3db1..a9d4031 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
>  	else {
>  		error = security_inode_unlink(dir, dentry);
>  		if (!error) {
> -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> -			if (error) {
> -				if (error == -EWOULDBLOCK && delegated_inode) {
> -					*delegated_inode = target;
> -					ihold(target);
> -				}
> +			error = try_break_deleg(target, delegated_inode);
> +			if (error)
>  				goto out;
> -			}
>  			error = dir->i_op->unlink(dir, dentry);
>  			if (!error)
>  				dont_mount(dentry);
> @@ -3478,9 +3473,7 @@ exit2:
>  		iput(inode);	/* truncate the inode here */
>  	inode = NULL;
>  	if (delegated_inode) {
> -		error = break_deleg(delegated_inode, O_WRONLY);
> -		iput(delegated_inode);
> -		delegated_inode = NULL;
> +		error = break_deleg_wait(&delegated_inode);
>  		if (!error)
>  			goto retry_deleg;
>  	}
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index f951588..c37e463 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
>  
>  extern int current_umask(void);
>  
> +extern void ihold(struct inode * inode);
> +extern void iput(struct inode *);
> +
>  /* /sys/fs */
>  extern struct kobject *fs_kobj;
>  
> @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
>  	return 0;
>  }
>  
> +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> +{
> +	int ret;
> +
> +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> +	if (ret == -EWOULDBLOCK && delegated_inode) {
> +		*delegated_inode = inode;
> +		ihold(inode);
> +	}
> +	return ret;
> +}
> +

Actually, now that I look...

Suppose a vfs_unlink caller passes in a NULL delegated_inode pointer.
He'll get back a -EWOULDBLOCK here if there's a delegation on it.
Presumably he'll just treat that as a hard error and the delegation
would never get broken. Is that expected?

> +static inline int break_deleg_wait(struct inode **delegated_inode)
> +{
> +	int ret;
> +
> +	ret = break_deleg(*delegated_inode, O_WRONLY);
> +	iput(*delegated_inode);
> +	*delegated_inode = NULL;
> +	return ret;
> +}
> +
>  #else /* !CONFIG_FILE_LOCKING */
>  static inline int locks_mandatory_locked(struct inode *inode)
>  {
> @@ -2005,6 +2030,12 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
>  {
>  	return 0;
>  }
> +
> +static inline int try_break_deleg(struct inode *inode, struct delegated_inode **inode)
> +{
> +	return 0;
> +}
> +
>  #endif /* CONFIG_FILE_LOCKING */
>  
>  /* fs/open.c */
> @@ -2335,8 +2366,6 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
>  extern int inode_init_always(struct super_block *, struct inode *);
>  extern void inode_init_once(struct inode *);
>  extern void address_space_init_once(struct address_space *mapping);
> -extern void ihold(struct inode * inode);
> -extern void iput(struct inode *);
>  extern struct inode * igrab(struct inode *);
>  extern ino_t iunique(struct super_block *, ino_t);
>  extern int inode_needs_sync(struct inode *inode);


-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-03 20:12 ` [PATCH 12/12] locks: break delegations on any attribute modification J. Bruce Fields
@ 2013-07-09 13:30   ` Jeff Layton
       [not found]     ` <20130709093047.0096f061-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 13:30 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Wed,  3 Jul 2013 16:12:36 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> NFSv4 uses leases to guarantee that clients can cache metadata as well
> as data.
> 
> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Tyler Hicks <tyhicks@canonical.com>
> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  drivers/base/devtmpfs.c   |    4 ++--
>  fs/attr.c                 |    5 ++++-
>  fs/cachefiles/interface.c |    4 ++--
>  fs/ecryptfs/inode.c       |    2 +-
>  fs/hpfs/namei.c           |    2 +-
>  fs/inode.c                |    6 +++++-
>  fs/nfsd/vfs.c             |    8 ++++++--
>  fs/open.c                 |   21 +++++++++++++++++----
>  fs/utimes.c               |    9 ++++++++-
>  include/linux/fs.h        |    2 +-
>  10 files changed, 47 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> index 1b8490e..0f38201 100644
> --- a/drivers/base/devtmpfs.c
> +++ b/drivers/base/devtmpfs.c
> @@ -216,7 +216,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
>  		newattrs.ia_gid = gid;
>  		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
>  		mutex_lock(&dentry->d_inode->i_mutex);
> -		notify_change(dentry, &newattrs);
> +		notify_change(dentry, &newattrs, NULL);
>  		mutex_unlock(&dentry->d_inode->i_mutex);
>  
>  		/* mark as kernel-created inode */
> @@ -322,7 +322,7 @@ static int handle_remove(const char *nodename, struct device *dev)
>  			newattrs.ia_valid =
>  				ATTR_UID|ATTR_GID|ATTR_MODE;
>  			mutex_lock(&dentry->d_inode->i_mutex);
> -			notify_change(dentry, &newattrs);
> +			notify_change(dentry, &newattrs, NULL);
>  			mutex_unlock(&dentry->d_inode->i_mutex);
>  			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
>  			if (!err || err == -ENOENT)
> diff --git a/fs/attr.c b/fs/attr.c
> index 1449adb..261f5c9 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -167,7 +167,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
>  }
>  EXPORT_SYMBOL(setattr_copy);
>  
> -int notify_change(struct dentry * dentry, struct iattr * attr)
> +int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
>  {
>  	struct inode *inode = dentry->d_inode;
>  	umode_t mode = inode->i_mode;
> @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
>  	error = security_inode_setattr(dentry, attr);
>  	if (error)
>  		return error;
> +	error = try_break_deleg(inode, delegated_inode);
> +	if (error)
> +		return error;
>  
>  	if (inode->i_op->setattr)
>  		error = inode->i_op->setattr(dentry, attr);
> diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> index 746ce53..40f5917 100644
> --- a/fs/cachefiles/interface.c
> +++ b/fs/cachefiles/interface.c
> @@ -417,14 +417,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
>  		_debug("discard tail %llx", oi_size);
>  		newattrs.ia_valid = ATTR_SIZE;
>  		newattrs.ia_size = oi_size & PAGE_MASK;
> -		ret = notify_change(object->backer, &newattrs);
> +		ret = notify_change(object->backer, &newattrs, NULL);
>  		if (ret < 0)
>  			goto truncate_failed;
>  	}
>  
>  	newattrs.ia_valid = ATTR_SIZE;
>  	newattrs.ia_size = ni_size;
> -	ret = notify_change(object->backer, &newattrs);
> +	ret = notify_change(object->backer, &newattrs, NULL);
>  
>  truncate_failed:
>  	mutex_unlock(&object->backer->d_inode->i_mutex);
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 19e4435..bd54575 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -992,7 +992,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
>  		lower_ia.ia_valid &= ~ATTR_MODE;
>  
>  	mutex_lock(&lower_dentry->d_inode->i_mutex);
> -	rc = notify_change(lower_dentry, &lower_ia);
> +	rc = notify_change(lower_dentry, &lower_ia, NULL);
>  	mutex_unlock(&lower_dentry->d_inode->i_mutex);
>  out:
>  	fsstack_copy_attr_all(inode, lower_inode);
> diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
> index 345713d..1b39afd 100644
> --- a/fs/hpfs/namei.c
> +++ b/fs/hpfs/namei.c
> @@ -407,7 +407,7 @@ again:
>  			/*printk("HPFS: truncating file before delete.\n");*/
>  			newattrs.ia_size = 0;
>  			newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
> -			err = notify_change(dentry, &newattrs);
> +			err = notify_change(dentry, &newattrs, NULL);
>  			put_write_access(inode);
>  			if (!err)
>  				goto again;
> diff --git a/fs/inode.c b/fs/inode.c
> index 304db4c..664d631 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1633,7 +1633,11 @@ static int __remove_suid(struct dentry *dentry, int kill)
>  	struct iattr newattrs;
>  
>  	newattrs.ia_valid = ATTR_FORCE | kill;
> -	return notify_change(dentry, &newattrs);
> +	/*
> +	 * Note we call this on write, so notify_change will not
> +	 * encounter any conflicting delegations:
> +	 */
> +	return notify_change(dentry, &newattrs, NULL);
>  }
>  
>  int file_remove_suid(struct file *file)
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index b9740cb..e781901 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -426,7 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  			goto out_nfserr;
>  		fh_lock(fhp);
>  
> -		host_err = notify_change(dentry, iap);
> +		host_err = notify_change(dentry, iap, NULL);
>  		err = nfserrno(host_err);
>  		fh_unlock(fhp);
>  	}
> @@ -959,7 +959,11 @@ static void kill_suid(struct dentry *dentry)
>  	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
>  
>  	mutex_lock(&dentry->d_inode->i_mutex);
> -	notify_change(dentry, &ia);
> +	/*
> +	 * Note we call this on write, so notify_change will not
> +	 * encounter any conflicting delegations:
> +	 */
> +	notify_change(dentry, &ia, NULL);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  }
>  
> diff --git a/fs/open.c b/fs/open.c
> index 8c74100..1a39d29 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -57,7 +57,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
>  		newattrs.ia_valid |= ret | ATTR_FORCE;
>  
>  	mutex_lock(&dentry->d_inode->i_mutex);
> -	ret = notify_change(dentry, &newattrs);
> +	ret = notify_change(dentry, &newattrs, NULL);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  	return ret;
>  }

Isn't it possible we'll need to break a delegation on truncate()?

> @@ -464,21 +464,28 @@ out:
>  static int chmod_common(struct path *path, umode_t mode)
>  {
>  	struct inode *inode = path->dentry->d_inode;
> +	struct inode *delegated_inode = NULL;
>  	struct iattr newattrs;
>  	int error;
>  
>  	error = mnt_want_write(path->mnt);
>  	if (error)
>  		return error;
> +retry_deleg:
>  	mutex_lock(&inode->i_mutex);
>  	error = security_path_chmod(path, mode);
>  	if (error)
>  		goto out_unlock;
>  	newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
>  	newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
> -	error = notify_change(path->dentry, &newattrs);
> +	error = notify_change(path->dentry, &newattrs, &delegated_inode);
>  out_unlock:
>  	mutex_unlock(&inode->i_mutex);
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  	mnt_drop_write(path->mnt);
>  	return error;
>  }
> @@ -523,6 +530,7 @@ SYSCALL_DEFINE2(chmod, const char __user *, filename, umode_t, mode)
>  static int chown_common(struct path *path, uid_t user, gid_t group)
>  {
>  	struct inode *inode = path->dentry->d_inode;
> +	struct inode *delegated_inode = NULL;
>  	int error;
>  	struct iattr newattrs;
>  	kuid_t uid;
> @@ -547,12 +555,17 @@ static int chown_common(struct path *path, uid_t user, gid_t group)
>  	if (!S_ISDIR(inode->i_mode))
>  		newattrs.ia_valid |=
>  			ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> +retry_deleg:
>  	mutex_lock(&inode->i_mutex);
>  	error = security_path_chown(path, uid, gid);
>  	if (!error)
> -		error = notify_change(path->dentry, &newattrs);
> +		error = notify_change(path->dentry, &newattrs, &delegated_inode);
>  	mutex_unlock(&inode->i_mutex);
> -
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  	return error;
>  }
>  
> diff --git a/fs/utimes.c b/fs/utimes.c
> index f4fb7ec..aa138d6 100644
> --- a/fs/utimes.c
> +++ b/fs/utimes.c
> @@ -53,6 +53,7 @@ static int utimes_common(struct path *path, struct timespec *times)
>  	int error;
>  	struct iattr newattrs;
>  	struct inode *inode = path->dentry->d_inode;
> +	struct inode *delegated_inode = NULL;
>  
>  	error = mnt_want_write(path->mnt);
>  	if (error)
> @@ -101,9 +102,15 @@ static int utimes_common(struct path *path, struct timespec *times)
>  				goto mnt_drop_write_and_out;
>  		}
>  	}
> +retry_deleg:
>  	mutex_lock(&inode->i_mutex);
> -	error = notify_change(path->dentry, &newattrs);
> +	error = notify_change(path->dentry, &newattrs, &delegated_inode);
>  	mutex_unlock(&inode->i_mutex);
> +	if (delegated_inode) {
> +		error = break_deleg_wait(&delegated_inode);
> +		if (!error)
> +			goto retry_deleg;
> +	}
>  
>  mnt_drop_write_and_out:
>  	mnt_drop_write(path->mnt);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 936413c..0d6d919 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2256,7 +2256,7 @@ extern void emergency_remount(void);
>  #ifdef CONFIG_BLOCK
>  extern sector_t bmap(struct inode *, sector_t);
>  #endif
> -extern int notify_change(struct dentry *, struct iattr *);
> +extern int notify_change(struct dentry *, struct iattr *, struct inode **);
>  extern int inode_permission(struct inode *, int);
>  extern int generic_permission(struct inode *, int);
>  


-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  2013-07-09 10:54         ` Jeff Layton
  (?)
@ 2013-07-09 14:26         ` J. Bruce Fields
  2013-07-09 14:31           ` Jeff Layton
  -1 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 14:26 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, Jul 09, 2013 at 06:54:00AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:27 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > I_MUTEX_QUOTA is now just being used whenever we want to lock two
> > non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
> > especially elegant but it's the best I could think of.
> > 
> > Also fix some outdated documentation.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/inode.c         |    4 ++--
> >  include/linux/fs.h |    9 ++++++---
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 942451b..304db4c 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -988,10 +988,10 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> >  {
> >  	if (inode1 < inode2) {
> >  		mutex_lock(&inode1->i_mutex);
> > -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
> >  	} else {
> >  		mutex_lock(&inode2->i_mutex);
> > -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
> >  	}
> >  }
> >  EXPORT_SYMBOL(lock_two_nondirectories);
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 3258761..ec88235 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -620,10 +620,13 @@ static inline int inode_unhashed(struct inode *inode)
> >   * 0: the object of the current VFS operation
> >   * 1: parent
> >   * 2: child/target
> > - * 3: quota file
> > + * 3: xattr
> > + * 4: second non-directory
> > + * The last is for certain operations (such as rename) which lock two
> > + * non-directories at once.
> >   *
> >   * The locking order between these classes is
> > - * parent -> child -> normal -> xattr -> quota
> > + * parent -> child -> normal -> xattr -> second non-directory
> >   */
> >  enum inode_i_mutex_lock_class
> >  {
> > @@ -631,7 +634,7 @@ enum inode_i_mutex_lock_class
> >  	I_MUTEX_PARENT,
> >  	I_MUTEX_CHILD,
> >  	I_MUTEX_XATTR,
> > -	I_MUTEX_QUOTA
> > +	I_MUTEX_NONDIR2
> >  };
> >  
> >  void lock_two_nondirectories(struct inode *, struct inode*);
> 
> Ugly name, but I'm not sure what to call it either. Wonder if it would
> make sense to do some sort of SOURCE/TARGET lock class and rearrange
> the code to take that into account?

You need to order the locks globally somehow (e.g. by ancestor order in
the case of the parent directories)--you can't always take them in the
order source and target, for example, because a rename from A/ into B/
could then deadlock with a simultaneous rename from B/ into A/.  So I
don't think SOURCE and TARGET would work for names.  The current names
have a certain logic, but there's probably something more elegant.

Currently: after these patches a rename of regular a regular file onto
another regular file will take locks on the source and target parents,
and source and target (victim) files.  The first two will take PARENT
and CHILD, the second NORMAL and NONDIR2.

--b.

> But, that's just bikeshedding, so...
> 
> Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  2013-07-09 14:26         ` J. Bruce Fields
@ 2013-07-09 14:31           ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 14:31 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, 9 Jul 2013 10:26:52 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Tue, Jul 09, 2013 at 06:54:00AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:27 -0400
> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > I_MUTEX_QUOTA is now just being used whenever we want to lock two
> > > non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
> > > especially elegant but it's the best I could think of.
> > > 
> > > Also fix some outdated documentation.
> > > 
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  fs/inode.c         |    4 ++--
> > >  include/linux/fs.h |    9 ++++++---
> > >  2 files changed, 8 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 942451b..304db4c 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -988,10 +988,10 @@ void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > >  {
> > >  	if (inode1 < inode2) {
> > >  		mutex_lock(&inode1->i_mutex);
> > > -		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_QUOTA);
> > > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
> > >  	} else {
> > >  		mutex_lock(&inode2->i_mutex);
> > > -		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_QUOTA);
> > > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_NONDIR2);
> > >  	}
> > >  }
> > >  EXPORT_SYMBOL(lock_two_nondirectories);
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index 3258761..ec88235 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -620,10 +620,13 @@ static inline int inode_unhashed(struct inode *inode)
> > >   * 0: the object of the current VFS operation
> > >   * 1: parent
> > >   * 2: child/target
> > > - * 3: quota file
> > > + * 3: xattr
> > > + * 4: second non-directory
> > > + * The last is for certain operations (such as rename) which lock two
> > > + * non-directories at once.
> > >   *
> > >   * The locking order between these classes is
> > > - * parent -> child -> normal -> xattr -> quota
> > > + * parent -> child -> normal -> xattr -> second non-directory
> > >   */
> > >  enum inode_i_mutex_lock_class
> > >  {
> > > @@ -631,7 +634,7 @@ enum inode_i_mutex_lock_class
> > >  	I_MUTEX_PARENT,
> > >  	I_MUTEX_CHILD,
> > >  	I_MUTEX_XATTR,
> > > -	I_MUTEX_QUOTA
> > > +	I_MUTEX_NONDIR2
> > >  };
> > >  
> > >  void lock_two_nondirectories(struct inode *, struct inode*);
> > 
> > Ugly name, but I'm not sure what to call it either. Wonder if it would
> > make sense to do some sort of SOURCE/TARGET lock class and rearrange
> > the code to take that into account?
> 
> You need to order the locks globally somehow (e.g. by ancestor order in
> the case of the parent directories)--you can't always take them in the
> order source and target, for example, because a rename from A/ into B/
> could then deadlock with a simultaneous rename from B/ into A/.  So I
> don't think SOURCE and TARGET would work for names.  The current names
> have a certain logic, but there's probably something more elegant.
> 
> Currently: after these patches a rename of regular a regular file onto
> another regular file will take locks on the source and target parents,
> and source and target (victim) files.  The first two will take PARENT
> and CHILD, the second NORMAL and NONDIR2.
> 

Fair enough -- makes sense...

> > But, that's just bikeshedding, so...
> > 
> > Acked-by: Jeff Layton <jlayton@redhat.com>


-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/12] locks: implement delegations
  2013-07-09 12:23       ` Jeff Layton
@ 2013-07-09 14:41           ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 14:41 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

On Tue, Jul 09, 2013 at 08:23:00AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:30 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
> > type.
> > 
> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/locks.c         |   49 +++++++++++++++++++++++++++++++++++++++----------
> >  include/linux/fs.h |   18 +++++++++++++++---
> >  2 files changed, 54 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index deec4de..2b56954 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -1176,28 +1176,40 @@ static void time_out_leases(struct inode *inode)
> >  	}
> >  }
> >  
> > +static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
> > +{
> > +	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
> > +		return false;
> > +	return locks_conflict(breaker, lease);
> > +}
> > +
> >  /**
> >   *	__break_lease	-	revoke all outstanding leases on file
> >   *	@inode: the inode of the file to return
> > - *	@mode: the open mode (read or write)
> > + *	@mode: O_RDONLY: break only write leases; O_WRONLY or O_RDWR:
> > + *	    break all leases
> > + *	@type: FL_LEASE: break leases and delegations; FL_DELEG: break
> > + *	    only delegations
> >   *
> >   *	break_lease (inlined for speed) has checked there already is at least
> >   *	some kind of lock (maybe a lease) on this file.  Leases are broken on
> >   *	a call to open() or truncate().  This function can sleep unless you
> >   *	specified %O_NONBLOCK to your open().
> >   */
> > -int __break_lease(struct inode *inode, unsigned int mode)
> > +int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >  {
> >  	int error = 0;
> >  	struct file_lock *new_fl, *flock;
> >  	struct file_lock *fl;
> >  	unsigned long break_time;
> >  	int i_have_this_lease = 0;
> > +	bool lease_conflict = false;
> >  	int want_write = (mode & O_ACCMODE) != O_RDONLY;
> >  
> >  	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
> >  	if (IS_ERR(new_fl))
> >  		return PTR_ERR(new_fl);
> > +	new_fl->fl_flags = type;
> >  
> >  	lock_flocks();
> >  
> > @@ -1207,13 +1219,16 @@ int __break_lease(struct inode *inode, unsigned int mode)
> >  	if ((flock == NULL) || !IS_LEASE(flock))
> >  		goto out;
> >  
> > -	if (!locks_conflict(flock, new_fl))
> > +	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> > +		if (leases_conflict(fl, new_fl)) {
> > +			lease_conflict = true;
> > +			if (fl->fl_owner == current->files)
> > +				i_have_this_lease = 1;
> > +		}
> > +	}
> > +	if (!lease_conflict)
> >  		goto out;
> >  
> > -	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next)
> > -		if (fl->fl_owner == current->files)
> > -			i_have_this_lease = 1;
> > -
> >  	break_time = 0;
> >  	if (lease_break_time > 0) {
> >  		break_time = jiffies + lease_break_time * HZ;
> > @@ -1222,6 +1237,8 @@ int __break_lease(struct inode *inode, unsigned int mode)
> >  	}
> >  
> >  	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> > +		if (!leases_conflict(fl, new_fl))
> > +			continue;
> >  		if (want_write) {
> >  			if (fl->fl_flags & FL_UNLOCK_PENDING)
> >  				continue;
> > @@ -1263,7 +1280,7 @@ restart:
> >  		 */
> >  		for (flock = inode->i_flock; flock && IS_LEASE(flock);
> >  				flock = flock->fl_next) {
> > -			if (locks_conflict(new_fl, flock))
> > +			if (leases_conflict(new_fl, flock))
> >  				goto restart;
> >  		}
> >  		error = 0;
> > @@ -1343,9 +1360,20 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
> >  	struct file_lock *fl, **before, **my_before = NULL, *lease;
> >  	struct dentry *dentry = filp->f_path.dentry;
> >  	struct inode *inode = dentry->d_inode;
> > +	bool is_deleg = (*flp)->fl_flags & FL_DELEG;
> >  	int error;
> >  
> >  	lease = *flp;
> > +	/*
> > +	 * In the delegation case we need mutual exclusion with
> > +	 * a number of operations that take the i_mutex.  We trylock
> > +	 * because delegations are an optional optimization, and if
> > +	 * there's some chance of a conflict--we'd rather not
> > +	 * bother, maybe that's a sign this just isn't a good file to
> > +	 * hand out a delegation on.
> > +	 */
> > +	if (is_deleg && !mutex_trylock(&inode->i_mutex))
> > +		return -EAGAIN;
> >  
> >  	error = -EAGAIN;
> >  	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> > @@ -1397,9 +1425,10 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
> >  		goto out;
> >  
> >  	locks_insert_lock(before, lease);
> > -	return 0;
> > -
> > +	error = 0;
> >  out:
> > +	if (is_deleg)
> > +		mutex_unlock(&inode->i_mutex);
> >  	return error;
> >  }
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 116b3e9..c6cc686 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1006,7 +1006,7 @@ extern int vfs_test_lock(struct file *, struct file_lock *);
> >  extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
> >  extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> >  extern int flock_lock_file_wait(struct file *filp, struct file_lock *fl);
> > -extern int __break_lease(struct inode *inode, unsigned int flags);
> > +extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
> >  extern void lease_get_mtime(struct inode *, struct timespec *time);
> >  extern int generic_setlease(struct file *, long, struct file_lock **);
> >  extern int vfs_setlease(struct file *, long, struct file_lock **);
> > @@ -1119,7 +1119,7 @@ static inline int flock_lock_file_wait(struct file *filp,
> >  	return -ENOLCK;
> >  }
> >  
> > -static inline int __break_lease(struct inode *inode, unsigned int mode)
> > +static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >  {
> >  	return 0;
> >  }
> > @@ -1951,9 +1951,17 @@ static inline int locks_verify_truncate(struct inode *inode,
> >  static inline int break_lease(struct inode *inode, unsigned int mode)
> >  {
> >  	if (inode->i_flock)
> > -		return __break_lease(inode, mode);
> > +		return __break_lease(inode, mode, FL_LEASE);
> >  	return 0;
> >  }
> > +
> > +static inline int break_deleg(struct inode *inode, unsigned int mode)
> > +{
> > +	if (inode->i_flock)
> > +		return __break_lease(inode, mode, FL_DELEG);
> > +	return 0;
> > +}
> > +
> >  #else /* !CONFIG_FILE_LOCKING */
> >  static inline int locks_mandatory_locked(struct inode *inode)
> >  {
> > @@ -1993,6 +2001,10 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
> >  	return 0;
> >  }
> >  
> > +static inline int break_deleg(struct inode *inode, unsigned int mode)
> > +{
> > +	return 0;
> > +}
> >  #endif /* CONFIG_FILE_LOCKING */
> >  
> >  /* fs/open.c */
> 
> Looks reasonable...
> 
> This (of course) has the same potential race that Al ID'ed a few days
> ago. We'll probably need to reconcile this patch with whatever fix we
> come up with there, but it shouldn't be too difficult.

Note there are currently no users for write delegations.  I'd like to do
study the case more carefully first.

We should probably make that clear: I'll add a WARN and error return for
an attempt to set a write delegation.

--b.

diff --git a/fs/locks.c b/fs/locks.c
index 2b56954..38f6baf 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1375,6 +1375,12 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
 	if (is_deleg && !mutex_trylock(&inode->i_mutex))
 		return -EAGAIN;
 
+	if (is_deleg && arg == F_WRLCK) {
+		/* Write delegations not currently supported. */
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
 	error = -EAGAIN;
 	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 		goto out;
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 06/12] locks: implement delegations
@ 2013-07-09 14:41           ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 14:41 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, Jul 09, 2013 at 08:23:00AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:30 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
> > type.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/locks.c         |   49 +++++++++++++++++++++++++++++++++++++++----------
> >  include/linux/fs.h |   18 +++++++++++++++---
> >  2 files changed, 54 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/locks.c b/fs/locks.c
> > index deec4de..2b56954 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -1176,28 +1176,40 @@ static void time_out_leases(struct inode *inode)
> >  	}
> >  }
> >  
> > +static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
> > +{
> > +	if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE))
> > +		return false;
> > +	return locks_conflict(breaker, lease);
> > +}
> > +
> >  /**
> >   *	__break_lease	-	revoke all outstanding leases on file
> >   *	@inode: the inode of the file to return
> > - *	@mode: the open mode (read or write)
> > + *	@mode: O_RDONLY: break only write leases; O_WRONLY or O_RDWR:
> > + *	    break all leases
> > + *	@type: FL_LEASE: break leases and delegations; FL_DELEG: break
> > + *	    only delegations
> >   *
> >   *	break_lease (inlined for speed) has checked there already is at least
> >   *	some kind of lock (maybe a lease) on this file.  Leases are broken on
> >   *	a call to open() or truncate().  This function can sleep unless you
> >   *	specified %O_NONBLOCK to your open().
> >   */
> > -int __break_lease(struct inode *inode, unsigned int mode)
> > +int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >  {
> >  	int error = 0;
> >  	struct file_lock *new_fl, *flock;
> >  	struct file_lock *fl;
> >  	unsigned long break_time;
> >  	int i_have_this_lease = 0;
> > +	bool lease_conflict = false;
> >  	int want_write = (mode & O_ACCMODE) != O_RDONLY;
> >  
> >  	new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
> >  	if (IS_ERR(new_fl))
> >  		return PTR_ERR(new_fl);
> > +	new_fl->fl_flags = type;
> >  
> >  	lock_flocks();
> >  
> > @@ -1207,13 +1219,16 @@ int __break_lease(struct inode *inode, unsigned int mode)
> >  	if ((flock == NULL) || !IS_LEASE(flock))
> >  		goto out;
> >  
> > -	if (!locks_conflict(flock, new_fl))
> > +	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> > +		if (leases_conflict(fl, new_fl)) {
> > +			lease_conflict = true;
> > +			if (fl->fl_owner == current->files)
> > +				i_have_this_lease = 1;
> > +		}
> > +	}
> > +	if (!lease_conflict)
> >  		goto out;
> >  
> > -	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next)
> > -		if (fl->fl_owner == current->files)
> > -			i_have_this_lease = 1;
> > -
> >  	break_time = 0;
> >  	if (lease_break_time > 0) {
> >  		break_time = jiffies + lease_break_time * HZ;
> > @@ -1222,6 +1237,8 @@ int __break_lease(struct inode *inode, unsigned int mode)
> >  	}
> >  
> >  	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
> > +		if (!leases_conflict(fl, new_fl))
> > +			continue;
> >  		if (want_write) {
> >  			if (fl->fl_flags & FL_UNLOCK_PENDING)
> >  				continue;
> > @@ -1263,7 +1280,7 @@ restart:
> >  		 */
> >  		for (flock = inode->i_flock; flock && IS_LEASE(flock);
> >  				flock = flock->fl_next) {
> > -			if (locks_conflict(new_fl, flock))
> > +			if (leases_conflict(new_fl, flock))
> >  				goto restart;
> >  		}
> >  		error = 0;
> > @@ -1343,9 +1360,20 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
> >  	struct file_lock *fl, **before, **my_before = NULL, *lease;
> >  	struct dentry *dentry = filp->f_path.dentry;
> >  	struct inode *inode = dentry->d_inode;
> > +	bool is_deleg = (*flp)->fl_flags & FL_DELEG;
> >  	int error;
> >  
> >  	lease = *flp;
> > +	/*
> > +	 * In the delegation case we need mutual exclusion with
> > +	 * a number of operations that take the i_mutex.  We trylock
> > +	 * because delegations are an optional optimization, and if
> > +	 * there's some chance of a conflict--we'd rather not
> > +	 * bother, maybe that's a sign this just isn't a good file to
> > +	 * hand out a delegation on.
> > +	 */
> > +	if (is_deleg && !mutex_trylock(&inode->i_mutex))
> > +		return -EAGAIN;
> >  
> >  	error = -EAGAIN;
> >  	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> > @@ -1397,9 +1425,10 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
> >  		goto out;
> >  
> >  	locks_insert_lock(before, lease);
> > -	return 0;
> > -
> > +	error = 0;
> >  out:
> > +	if (is_deleg)
> > +		mutex_unlock(&inode->i_mutex);
> >  	return error;
> >  }
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 116b3e9..c6cc686 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1006,7 +1006,7 @@ extern int vfs_test_lock(struct file *, struct file_lock *);
> >  extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *);
> >  extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
> >  extern int flock_lock_file_wait(struct file *filp, struct file_lock *fl);
> > -extern int __break_lease(struct inode *inode, unsigned int flags);
> > +extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
> >  extern void lease_get_mtime(struct inode *, struct timespec *time);
> >  extern int generic_setlease(struct file *, long, struct file_lock **);
> >  extern int vfs_setlease(struct file *, long, struct file_lock **);
> > @@ -1119,7 +1119,7 @@ static inline int flock_lock_file_wait(struct file *filp,
> >  	return -ENOLCK;
> >  }
> >  
> > -static inline int __break_lease(struct inode *inode, unsigned int mode)
> > +static inline int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
> >  {
> >  	return 0;
> >  }
> > @@ -1951,9 +1951,17 @@ static inline int locks_verify_truncate(struct inode *inode,
> >  static inline int break_lease(struct inode *inode, unsigned int mode)
> >  {
> >  	if (inode->i_flock)
> > -		return __break_lease(inode, mode);
> > +		return __break_lease(inode, mode, FL_LEASE);
> >  	return 0;
> >  }
> > +
> > +static inline int break_deleg(struct inode *inode, unsigned int mode)
> > +{
> > +	if (inode->i_flock)
> > +		return __break_lease(inode, mode, FL_DELEG);
> > +	return 0;
> > +}
> > +
> >  #else /* !CONFIG_FILE_LOCKING */
> >  static inline int locks_mandatory_locked(struct inode *inode)
> >  {
> > @@ -1993,6 +2001,10 @@ static inline int break_lease(struct inode *inode, unsigned int mode)
> >  	return 0;
> >  }
> >  
> > +static inline int break_deleg(struct inode *inode, unsigned int mode)
> > +{
> > +	return 0;
> > +}
> >  #endif /* CONFIG_FILE_LOCKING */
> >  
> >  /* fs/open.c */
> 
> Looks reasonable...
> 
> This (of course) has the same potential race that Al ID'ed a few days
> ago. We'll probably need to reconcile this patch with whatever fix we
> come up with there, but it shouldn't be too difficult.

Note there are currently no users for write delegations.  I'd like to do
study the case more carefully first.

We should probably make that clear: I'll add a WARN and error return for
an attempt to set a write delegation.

--b.

diff --git a/fs/locks.c b/fs/locks.c
index 2b56954..38f6baf 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1375,6 +1375,12 @@ int generic_add_lease(struct file *filp, long arg, struct file_lock **flp)
 	if (is_deleg && !mutex_trylock(&inode->i_mutex))
 		return -EAGAIN;
 
+	if (is_deleg && arg == F_WRLCK) {
+		/* Write delegations not currently supported. */
+		WARN_ON_ONCE(1);
+		return -EINVAL;
+	}
+
 	error = -EAGAIN;
 	if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
 		goto out;

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-09 10:49       ` Jeff Layton
  (?)
@ 2013-07-09 15:48       ` Theodore Ts'o
  -1 siblings, 0 replies; 83+ messages in thread
From: Theodore Ts'o @ 2013-07-09 15:48 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel, Andreas Dilger

Acked-by: "Theodore Ts'o" <tytso@mit.edu>

						- Ted

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
  2013-07-09 13:05       ` Jeff Layton
@ 2013-07-09 15:58           ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 15:58 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David Howells, Tyler Hicks,
	Dustin Kirkland

On Tue, Jul 09, 2013 at 09:05:06AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:32 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > We need to break delegations on any operation that changes the set of
> > links pointing to an inode.  Start with unlink.
> > 
> > Such operations also hold the i_mutex on a parent directory.  Breaking a
> > delegation may require waiting for a timeout (by default 90 seconds) in
> > the case of a unresponsive NFS client.  To avoid blocking all directory
> > operations, we therefore drop locks before waiting for the delegation.
> > The logic then looks like:
> > 
> > 	acquire locks
> > 	...
> > 	test for delegation; if found:
> > 		take reference on inode
> > 		release locks
> > 		wait for delegation break
> > 		drop reference on inode
> > 		retry
> > 
> > It is possible this could never terminate.  (Even if we take precautions
> > to prevent another delegation being acquired on the same inode, we could
> > get a different inode on each retry.)  But this seems very unlikely.
> > 
> > The initial test for a delegation happens after the lock on the target
> > inode is acquired, but the directory inode may have been acquired
> > further up the call stack.  We therefore add a "struct inode **"
> > argument to any intervening functions, which we use to pass the inode
> > back up to the caller in the case it needs a delegation synchronously
> > broken.
...
> > -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
> 
> nit: this might be a good time to add a kerneldoc header on this
> function. The delegated_inode thing might not be clear to the
> uninitiated.

Something like this?

--b.

diff --git a/fs/namei.c b/fs/namei.c
index cba3db1..7c6e244 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3384,6 +3384,24 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
+/**
+ * vfs_unlink - unlink a filesystem object
+ * @dir:	parent directory
+ * @dentry:	victim
+ * @delegated_inode: returns victim inode, if the inode is delegated.
+ *
+ * The caller must hold dir->i_mutex.
+ *
+ * If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and
+ * return a reference to the inode in delegated_inode.  The caller
+ * should then break the delegation on that inode and retry.  Because
+ * breaking a delegation may take a long time, the caller should drop
+ * dir->i_mutex before doing so.
+ *
+ * Alternatively, a caller may pass NULL for delegated_inode.  This may
+ * be appropriate for callers that expect the underlying filesystem not
+ * to be NFS exported.
+ */
 int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
 {
 	struct inode *target = dentry->d_inode;
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
@ 2013-07-09 15:58           ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 15:58 UTC (permalink / raw)
  To: Jeff Layton
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 09:05:06AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:32 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > We need to break delegations on any operation that changes the set of
> > links pointing to an inode.  Start with unlink.
> > 
> > Such operations also hold the i_mutex on a parent directory.  Breaking a
> > delegation may require waiting for a timeout (by default 90 seconds) in
> > the case of a unresponsive NFS client.  To avoid blocking all directory
> > operations, we therefore drop locks before waiting for the delegation.
> > The logic then looks like:
> > 
> > 	acquire locks
> > 	...
> > 	test for delegation; if found:
> > 		take reference on inode
> > 		release locks
> > 		wait for delegation break
> > 		drop reference on inode
> > 		retry
> > 
> > It is possible this could never terminate.  (Even if we take precautions
> > to prevent another delegation being acquired on the same inode, we could
> > get a different inode on each retry.)  But this seems very unlikely.
> > 
> > The initial test for a delegation happens after the lock on the target
> > inode is acquired, but the directory inode may have been acquired
> > further up the call stack.  We therefore add a "struct inode **"
> > argument to any intervening functions, which we use to pass the inode
> > back up to the caller in the case it needs a delegation synchronously
> > broken.
...
> > -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
> 
> nit: this might be a good time to add a kerneldoc header on this
> function. The delegated_inode thing might not be clear to the
> uninitiated.

Something like this?

--b.

diff --git a/fs/namei.c b/fs/namei.c
index cba3db1..7c6e244 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3384,6 +3384,24 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
 	return do_rmdir(AT_FDCWD, pathname);
 }
 
+/**
+ * vfs_unlink - unlink a filesystem object
+ * @dir:	parent directory
+ * @dentry:	victim
+ * @delegated_inode: returns victim inode, if the inode is delegated.
+ *
+ * The caller must hold dir->i_mutex.
+ *
+ * If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and
+ * return a reference to the inode in delegated_inode.  The caller
+ * should then break the delegation on that inode and retry.  Because
+ * breaking a delegation may take a long time, the caller should drop
+ * dir->i_mutex before doing so.
+ *
+ * Alternatively, a caller may pass NULL for delegated_inode.  This may
+ * be appropriate for callers that expect the underlying filesystem not
+ * to be NFS exported.
+ */
 int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
 {
 	struct inode *target = dentry->d_inode;

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
  2013-07-09 15:58           ` J. Bruce Fields
  (?)
@ 2013-07-09 16:02           ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 16:02 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: J. Bruce Fields, Al Viro, linux-nfs, linux-fsdevel,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, 9 Jul 2013 11:58:43 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Tue, Jul 09, 2013 at 09:05:06AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:32 -0400
> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > We need to break delegations on any operation that changes the set of
> > > links pointing to an inode.  Start with unlink.
> > > 
> > > Such operations also hold the i_mutex on a parent directory.  Breaking a
> > > delegation may require waiting for a timeout (by default 90 seconds) in
> > > the case of a unresponsive NFS client.  To avoid blocking all directory
> > > operations, we therefore drop locks before waiting for the delegation.
> > > The logic then looks like:
> > > 
> > > 	acquire locks
> > > 	...
> > > 	test for delegation; if found:
> > > 		take reference on inode
> > > 		release locks
> > > 		wait for delegation break
> > > 		drop reference on inode
> > > 		retry
> > > 
> > > It is possible this could never terminate.  (Even if we take precautions
> > > to prevent another delegation being acquired on the same inode, we could
> > > get a different inode on each retry.)  But this seems very unlikely.
> > > 
> > > The initial test for a delegation happens after the lock on the target
> > > inode is acquired, but the directory inode may have been acquired
> > > further up the call stack.  We therefore add a "struct inode **"
> > > argument to any intervening functions, which we use to pass the inode
> > > back up to the caller in the case it needs a delegation synchronously
> > > broken.
> ...
> > > -int vfs_unlink(struct inode *dir, struct dentry *dentry)
> > > +int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
> > 
> > nit: this might be a good time to add a kerneldoc header on this
> > function. The delegated_inode thing might not be clear to the
> > uninitiated.
> 
> Something like this?
> 
> --b.
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index cba3db1..7c6e244 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3384,6 +3384,24 @@ SYSCALL_DEFINE1(rmdir, const char __user *, pathname)
>  	return do_rmdir(AT_FDCWD, pathname);
>  }
>  
> +/**
> + * vfs_unlink - unlink a filesystem object
> + * @dir:	parent directory
> + * @dentry:	victim
> + * @delegated_inode: returns victim inode, if the inode is delegated.
> + *
> + * The caller must hold dir->i_mutex.
> + *
> + * If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and
> + * return a reference to the inode in delegated_inode.  The caller
> + * should then break the delegation on that inode and retry.  Because
> + * breaking a delegation may take a long time, the caller should drop
> + * dir->i_mutex before doing so.
> + *
> + * Alternatively, a caller may pass NULL for delegated_inode.  This may
> + * be appropriate for callers that expect the underlying filesystem not
> + * to be NFS exported.
> + */
>  int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
>  {
>  	struct inode *target = dentry->d_inode;

ACK -- looks good to me.
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
  2013-07-09 13:05       ` Jeff Layton
@ 2013-07-09 19:29           ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 19:29 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, David Howells, Tyler Hicks,
	Dustin Kirkland

On Tue, Jul 09, 2013 at 09:05:06AM -0400, Jeff Layton wrote:
> We probably also ought to eyeball some of these other cases where you
> passing in NULL as the deleg_inode too. It's probably reasonable in
> most cases -- exporting a filesystem that you also mount using ecryptfs
> seems silly, but you never know...

The cases in this patch:

	- devtmpfs, mqueue: not exportable

	- cachefiles, ecryptfs: probably nothing prevents exporting the
	  underlying filesystem, but it sounds nuts.  If somebody
	  actually runs into this case, we'll error out--the kernel
	  won't crash, the delegation won't be violated--and they can
	  come to us and try to make the case why this is a problem.

	- nfsd: handles delegation breaks in its own way (by returning a
	  DELAY error to the client and leaving it to the client to
	  retry).

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 08/12] locks: break delegations on unlink
@ 2013-07-09 19:29           ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 19:29 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs, linux-fsdevel, David Howells, Tyler Hicks,
	Dustin Kirkland

On Tue, Jul 09, 2013 at 09:05:06AM -0400, Jeff Layton wrote:
> We probably also ought to eyeball some of these other cases where you
> passing in NULL as the deleg_inode too. It's probably reasonable in
> most cases -- exporting a filesystem that you also mount using ecryptfs
> seems silly, but you never know...

The cases in this patch:

	- devtmpfs, mqueue: not exportable

	- cachefiles, ecryptfs: probably nothing prevents exporting the
	  underlying filesystem, but it sounds nuts.  If somebody
	  actually runs into this case, we'll error out--the kernel
	  won't crash, the delegation won't be violated--and they can
	  come to us and try to make the case why this is a problem.

	- nfsd: handles delegation breaks in its own way (by returning a
	  DELAY error to the client and leaving it to the client to
	  retry).

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-09 13:09       ` Jeff Layton
  (?)
@ 2013-07-09 19:31       ` J. Bruce Fields
  2013-07-09 19:37         ` Jeff Layton
  -1 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 19:31 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, Jul 09, 2013 at 09:09:52AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:33 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > We'll need the same logic for rename and link.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/namei.c         |   13 +++----------
> >  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
> >  2 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index cba3db1..a9d4031 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
> >  	else {
> >  		error = security_inode_unlink(dir, dentry);
> >  		if (!error) {
> > -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> > -			if (error) {
> > -				if (error == -EWOULDBLOCK && delegated_inode) {
> > -					*delegated_inode = target;
> > -					ihold(target);
> > -				}
> > +			error = try_break_deleg(target, delegated_inode);
> > +			if (error)
> >  				goto out;
> > -			}
> >  			error = dir->i_op->unlink(dir, dentry);
> >  			if (!error)
> >  				dont_mount(dentry);
> > @@ -3478,9 +3473,7 @@ exit2:
> >  		iput(inode);	/* truncate the inode here */
> >  	inode = NULL;
> >  	if (delegated_inode) {
> > -		error = break_deleg(delegated_inode, O_WRONLY);
> > -		iput(delegated_inode);
> > -		delegated_inode = NULL;
> > +		error = break_deleg_wait(&delegated_inode);
> >  		if (!error)
> >  			goto retry_deleg;
> >  	}
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index f951588..c37e463 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
> >  
> >  extern int current_umask(void);
> >  
> > +extern void ihold(struct inode * inode);
> > +extern void iput(struct inode *);
> > +
> >  /* /sys/fs */
> >  extern struct kobject *fs_kobj;
> >  
> > @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
> >  	return 0;
> >  }
> >  
> > +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> > +{
> > +	int ret;
> > +
> > +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> > +	if (ret == -EWOULDBLOCK && delegated_inode) {
> > +		*delegated_inode = inode;
> > +		ihold(inode);
> > +	}
> > +	return ret;
> > +}
> > +
> > +static inline int break_deleg_wait(struct inode **delegated_inode)
> > +{
> > +	int ret;
> > +
> > +	ret = break_deleg(*delegated_inode, O_WRONLY);
> > +	iput(*delegated_inode);
> > +	*delegated_inode = NULL;
> > +	return ret;
> > +}
> > +
> >  #else /* !CONFIG_FILE_LOCKING */
> >  static inline int locks_mandatory_locked(struct inode *inode)
> >  {
> > @@ -2005,6 +2030,12 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
> >  {
> >  	return 0;
> >  }
> > +
> > +static inline int try_break_deleg(struct inode *inode, struct delegated_inode **inode)
> > +{
> > +	return 0;
> > +}
> > +
> >  #endif /* CONFIG_FILE_LOCKING */
> >  
> >  /* fs/open.c */
> > @@ -2335,8 +2366,6 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
> >  extern int inode_init_always(struct super_block *, struct inode *);
> >  extern void inode_init_once(struct inode *);
> >  extern void address_space_init_once(struct address_space *mapping);
> > -extern void ihold(struct inode * inode);
> > -extern void iput(struct inode *);
> >  extern struct inode * igrab(struct inode *);
> >  extern ino_t iunique(struct super_block *, ino_t);
> >  extern int inode_needs_sync(struct inode *inode);
> 
> Nice cleanup. Might be reasonable to reorder or merge this patch with
> the previous one to reduce "churn" in vfs_unlink.

That's how I first did it then I thought "eh, there's a way I could make
this patch a little smaller...".   Seemed like it might be a tad easier
to review this way.  Up to you.

--b.

> 
> Acked-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-09 19:31       ` J. Bruce Fields
@ 2013-07-09 19:37         ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 19:37 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, 9 Jul 2013 15:31:25 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Tue, Jul 09, 2013 at 09:09:52AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:33 -0400
> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > We'll need the same logic for rename and link.
> > > 
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  fs/namei.c         |   13 +++----------
> > >  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
> > >  2 files changed, 34 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index cba3db1..a9d4031 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
> > >  	else {
> > >  		error = security_inode_unlink(dir, dentry);
> > >  		if (!error) {
> > > -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> > > -			if (error) {
> > > -				if (error == -EWOULDBLOCK && delegated_inode) {
> > > -					*delegated_inode = target;
> > > -					ihold(target);
> > > -				}
> > > +			error = try_break_deleg(target, delegated_inode);
> > > +			if (error)
> > >  				goto out;
> > > -			}
> > >  			error = dir->i_op->unlink(dir, dentry);
> > >  			if (!error)
> > >  				dont_mount(dentry);
> > > @@ -3478,9 +3473,7 @@ exit2:
> > >  		iput(inode);	/* truncate the inode here */
> > >  	inode = NULL;
> > >  	if (delegated_inode) {
> > > -		error = break_deleg(delegated_inode, O_WRONLY);
> > > -		iput(delegated_inode);
> > > -		delegated_inode = NULL;
> > > +		error = break_deleg_wait(&delegated_inode);
> > >  		if (!error)
> > >  			goto retry_deleg;
> > >  	}
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index f951588..c37e463 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
> > >  
> > >  extern int current_umask(void);
> > >  
> > > +extern void ihold(struct inode * inode);
> > > +extern void iput(struct inode *);
> > > +
> > >  /* /sys/fs */
> > >  extern struct kobject *fs_kobj;
> > >  
> > > @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
> > >  	return 0;
> > >  }
> > >  
> > > +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> > > +	if (ret == -EWOULDBLOCK && delegated_inode) {
> > > +		*delegated_inode = inode;
> > > +		ihold(inode);
> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > > +static inline int break_deleg_wait(struct inode **delegated_inode)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = break_deleg(*delegated_inode, O_WRONLY);
> > > +	iput(*delegated_inode);
> > > +	*delegated_inode = NULL;
> > > +	return ret;
> > > +}
> > > +
> > >  #else /* !CONFIG_FILE_LOCKING */
> > >  static inline int locks_mandatory_locked(struct inode *inode)
> > >  {
> > > @@ -2005,6 +2030,12 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
> > >  {
> > >  	return 0;
> > >  }
> > > +
> > > +static inline int try_break_deleg(struct inode *inode, struct delegated_inode **inode)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > >  #endif /* CONFIG_FILE_LOCKING */
> > >  
> > >  /* fs/open.c */
> > > @@ -2335,8 +2366,6 @@ extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
> > >  extern int inode_init_always(struct super_block *, struct inode *);
> > >  extern void inode_init_once(struct inode *);
> > >  extern void address_space_init_once(struct address_space *mapping);
> > > -extern void ihold(struct inode * inode);
> > > -extern void iput(struct inode *);
> > >  extern struct inode * igrab(struct inode *);
> > >  extern ino_t iunique(struct super_block *, ino_t);
> > >  extern int inode_needs_sync(struct inode *inode);
> > 
> > Nice cleanup. Might be reasonable to reorder or merge this patch with
> > the previous one to reduce "churn" in vfs_unlink.
> 
> That's how I first did it then I thought "eh, there's a way I could make
> this patch a little smaller...".   Seemed like it might be a tad easier
> to review this way.  Up to you.
> 

Reordering it would still keep it granular. Maybe introduce the helpers
first and then add patches that make the callers use them?

But that's just me being picky -- the patches look reasonable either
way...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-09 13:23   ` Jeff Layton
@ 2013-07-09 19:38     ` J. Bruce Fields
  2013-07-09 20:28       ` Jeff Layton
  0 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 19:38 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, Jul 09, 2013 at 09:23:10AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:33 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > We'll need the same logic for rename and link.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/namei.c         |   13 +++----------
> >  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
> >  2 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index cba3db1..a9d4031 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
> >  	else {
> >  		error = security_inode_unlink(dir, dentry);
> >  		if (!error) {
> > -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> > -			if (error) {
> > -				if (error == -EWOULDBLOCK && delegated_inode) {
> > -					*delegated_inode = target;
> > -					ihold(target);
> > -				}
> > +			error = try_break_deleg(target, delegated_inode);
> > +			if (error)
> >  				goto out;
> > -			}
> >  			error = dir->i_op->unlink(dir, dentry);
> >  			if (!error)
> >  				dont_mount(dentry);
> > @@ -3478,9 +3473,7 @@ exit2:
> >  		iput(inode);	/* truncate the inode here */
> >  	inode = NULL;
> >  	if (delegated_inode) {
> > -		error = break_deleg(delegated_inode, O_WRONLY);
> > -		iput(delegated_inode);
> > -		delegated_inode = NULL;
> > +		error = break_deleg_wait(&delegated_inode);
> >  		if (!error)
> >  			goto retry_deleg;
> >  	}
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index f951588..c37e463 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
> >  
> >  extern int current_umask(void);
> >  
> > +extern void ihold(struct inode * inode);
> > +extern void iput(struct inode *);
> > +
> >  /* /sys/fs */
> >  extern struct kobject *fs_kobj;
> >  
> > @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
> >  	return 0;
> >  }
> >  
> > +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> > +{
> > +	int ret;
> > +
> > +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> > +	if (ret == -EWOULDBLOCK && delegated_inode) {
> > +		*delegated_inode = inode;
> > +		ihold(inode);
> > +	}
> > +	return ret;
> > +}
> > +
> 
> Actually, now that I look...
> 
> Suppose a vfs_unlink caller passes in a NULL delegated_inode pointer.
> He'll get back a -EWOULDBLOCK here if there's a delegation on it.
> Presumably he'll just treat that as a hard error and the delegation
> would never get broken. Is that expected?

Yes.

The callers that pass in NULL are callers who should really never
encounter a write delegation.  E.g. it might happen if you exported a
filesystem that you're also stacking ecryptfs on top of (assuming that's
possible).  Maybe I'm missing something, but that looks pointless.
Erroring out seems a safe way to handle it.

It wouldn't be a problem to convert more of those callers to do the same
retry loop, it just seems better to avoid the extra spaghetti to support
what looks like a nutty use case.

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 09/12] locks: helper functions for delegation breaking
  2013-07-09 19:38     ` J. Bruce Fields
@ 2013-07-09 20:28       ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 20:28 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Al Viro, linux-nfs, linux-fsdevel

On Tue, 9 Jul 2013 15:38:36 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Tue, Jul 09, 2013 at 09:23:10AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:33 -0400
> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > We'll need the same logic for rename and link.
> > > 
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  fs/namei.c         |   13 +++----------
> > >  include/linux/fs.h |   33 +++++++++++++++++++++++++++++++--
> > >  2 files changed, 34 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index cba3db1..a9d4031 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -3401,14 +3401,9 @@ int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegate
> > >  	else {
> > >  		error = security_inode_unlink(dir, dentry);
> > >  		if (!error) {
> > > -			error = break_deleg(target, O_WRONLY|O_NONBLOCK);
> > > -			if (error) {
> > > -				if (error == -EWOULDBLOCK && delegated_inode) {
> > > -					*delegated_inode = target;
> > > -					ihold(target);
> > > -				}
> > > +			error = try_break_deleg(target, delegated_inode);
> > > +			if (error)
> > >  				goto out;
> > > -			}
> > >  			error = dir->i_op->unlink(dir, dentry);
> > >  			if (!error)
> > >  				dont_mount(dentry);
> > > @@ -3478,9 +3473,7 @@ exit2:
> > >  		iput(inode);	/* truncate the inode here */
> > >  	inode = NULL;
> > >  	if (delegated_inode) {
> > > -		error = break_deleg(delegated_inode, O_WRONLY);
> > > -		iput(delegated_inode);
> > > -		delegated_inode = NULL;
> > > +		error = break_deleg_wait(&delegated_inode);
> > >  		if (!error)
> > >  			goto retry_deleg;
> > >  	}
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index f951588..c37e463 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -1894,6 +1894,9 @@ extern bool our_mnt(struct vfsmount *mnt);
> > >  
> > >  extern int current_umask(void);
> > >  
> > > +extern void ihold(struct inode * inode);
> > > +extern void iput(struct inode *);
> > > +
> > >  /* /sys/fs */
> > >  extern struct kobject *fs_kobj;
> > >  
> > > @@ -1962,6 +1965,28 @@ static inline int break_deleg(struct inode *inode, unsigned int mode)
> > >  	return 0;
> > >  }
> > >  
> > > +static inline int try_break_deleg(struct inode *inode, struct inode **delegated_inode)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = break_deleg(inode, O_WRONLY|O_NONBLOCK);
> > > +	if (ret == -EWOULDBLOCK && delegated_inode) {
> > > +		*delegated_inode = inode;
> > > +		ihold(inode);
> > > +	}
> > > +	return ret;
> > > +}
> > > +
> > 
> > Actually, now that I look...
> > 
> > Suppose a vfs_unlink caller passes in a NULL delegated_inode pointer.
> > He'll get back a -EWOULDBLOCK here if there's a delegation on it.
> > Presumably he'll just treat that as a hard error and the delegation
> > would never get broken. Is that expected?
> 
> Yes.
> 
> The callers that pass in NULL are callers who should really never
> encounter a write delegation.  E.g. it might happen if you exported a
> filesystem that you're also stacking ecryptfs on top of (assuming that's
> possible).  Maybe I'm missing something, but that looks pointless.
> Erroring out seems a safe way to handle it.
> 
> It wouldn't be a problem to convert more of those callers to do the same
> retry loop, it just seems better to avoid the extra spaghetti to support
> what looks like a nutty use case.
> 
> --b.

Ok, thanks for clarifying that. I agree that those others are nutty
use-cases, at least for now. If someone were to ever allow setting of
FL_DELEGs from userland though they'll probably need to contend with
them.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 11/12] locks: break delegations on link
  2013-07-09 13:16       ` Jeff Layton
@ 2013-07-09 20:41           ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 20:41 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Tyler Hicks,
	Dustin Kirkland

On Tue, Jul 09, 2013 at 09:16:17AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:35 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > Cc: Tyler Hicks <tyhicks-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > Cc: Dustin Kirkland <dustin.kirkland-Bv2LyzZ6GzxBDgjK7y7TUQ@public.gmane.org>
> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/ecryptfs/inode.c |    2 +-
> >  fs/namei.c          |   17 +++++++++++++----
> >  fs/nfsd/vfs.c       |    2 +-
> >  include/linux/fs.h  |    2 +-
> >  4 files changed, 16 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index af42d88..19e4435 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -475,7 +475,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
> >  	dget(lower_new_dentry);
> >  	lower_dir_dentry = lock_parent(lower_new_dentry);
> >  	rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
> > -		      lower_new_dentry);
> > +		      lower_new_dentry, NULL);
> >  	if (rc || !lower_new_dentry->d_inode)
> >  		goto out_lock;
> >  	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index be00d37..18267e0 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3566,7 +3566,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
> >  	return sys_symlinkat(oldname, AT_FDCWD, newname);
> >  }
> >  
> > -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
> > +int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
> 
> A kerneldoc comment would be nice here. Ditto for vfs_rename* in the
> previous patch...

OK, done locally, using modified versions of the comment for unlink.
I'll resend the series soon.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 11/12] locks: break delegations on link
@ 2013-07-09 20:41           ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 20:41 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs, linux-fsdevel, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 09:16:17AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:35 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > Cc: Tyler Hicks <tyhicks@canonical.com>
> > Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/ecryptfs/inode.c |    2 +-
> >  fs/namei.c          |   17 +++++++++++++----
> >  fs/nfsd/vfs.c       |    2 +-
> >  include/linux/fs.h  |    2 +-
> >  4 files changed, 16 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index af42d88..19e4435 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -475,7 +475,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
> >  	dget(lower_new_dentry);
> >  	lower_dir_dentry = lock_parent(lower_new_dentry);
> >  	rc = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
> > -		      lower_new_dentry);
> > +		      lower_new_dentry, NULL);
> >  	if (rc || !lower_new_dentry->d_inode)
> >  		goto out_lock;
> >  	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index be00d37..18267e0 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -3566,7 +3566,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
> >  	return sys_symlinkat(oldname, AT_FDCWD, newname);
> >  }
> >  
> > -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
> > +int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
> 
> A kerneldoc comment would be nice here. Ditto for vfs_rename* in the
> previous patch...

OK, done locally, using modified versions of the comment for unlink.
I'll resend the series soon.

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-09 13:30   ` Jeff Layton
@ 2013-07-09 20:51         ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 20:51 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:36 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > as data.
> > 
> > Cc: Mikulas Patocka <mikulas-TTVWCEgN8Z9G4ohzP4jBZS1Fcj925eT/@public.gmane.org>
> > Cc: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Cc: Tyler Hicks <tyhicks-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > Cc: Dustin Kirkland <dustin.kirkland-Bv2LyzZ6GzxBDgjK7y7TUQ@public.gmane.org>
> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  drivers/base/devtmpfs.c   |    4 ++--
> >  fs/attr.c                 |    5 ++++-
> >  fs/cachefiles/interface.c |    4 ++--
> >  fs/ecryptfs/inode.c       |    2 +-
> >  fs/hpfs/namei.c           |    2 +-
> >  fs/inode.c                |    6 +++++-
> >  fs/nfsd/vfs.c             |    8 ++++++--
> >  fs/open.c                 |   21 +++++++++++++++++----
> >  fs/utimes.c               |    9 ++++++++-
> >  include/linux/fs.h        |    2 +-
> >  10 files changed, 47 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 1b8490e..0f38201 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -216,7 +216,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
> >  		newattrs.ia_gid = gid;
> >  		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
> >  		mutex_lock(&dentry->d_inode->i_mutex);
> > -		notify_change(dentry, &newattrs);
> > +		notify_change(dentry, &newattrs, NULL);
> >  		mutex_unlock(&dentry->d_inode->i_mutex);
> >  
> >  		/* mark as kernel-created inode */
> > @@ -322,7 +322,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> >  			newattrs.ia_valid =
> >  				ATTR_UID|ATTR_GID|ATTR_MODE;
> >  			mutex_lock(&dentry->d_inode->i_mutex);
> > -			notify_change(dentry, &newattrs);
> > +			notify_change(dentry, &newattrs, NULL);
> >  			mutex_unlock(&dentry->d_inode->i_mutex);
> >  			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> >  			if (!err || err == -ENOENT)
> > diff --git a/fs/attr.c b/fs/attr.c
> > index 1449adb..261f5c9 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -167,7 +167,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
> >  }
> >  EXPORT_SYMBOL(setattr_copy);
> >  
> > -int notify_change(struct dentry * dentry, struct iattr * attr)
> > +int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
> >  {
> >  	struct inode *inode = dentry->d_inode;
> >  	umode_t mode = inode->i_mode;
> > @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
> >  	error = security_inode_setattr(dentry, attr);
> >  	if (error)
> >  		return error;
> > +	error = try_break_deleg(inode, delegated_inode);
> > +	if (error)
> > +		return error;
> >  
> >  	if (inode->i_op->setattr)
> >  		error = inode->i_op->setattr(dentry, attr);
> > diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> > index 746ce53..40f5917 100644
> > --- a/fs/cachefiles/interface.c
> > +++ b/fs/cachefiles/interface.c
> > @@ -417,14 +417,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
> >  		_debug("discard tail %llx", oi_size);
> >  		newattrs.ia_valid = ATTR_SIZE;
> >  		newattrs.ia_size = oi_size & PAGE_MASK;
> > -		ret = notify_change(object->backer, &newattrs);
> > +		ret = notify_change(object->backer, &newattrs, NULL);
> >  		if (ret < 0)
> >  			goto truncate_failed;
> >  	}
> >  
> >  	newattrs.ia_valid = ATTR_SIZE;
> >  	newattrs.ia_size = ni_size;
> > -	ret = notify_change(object->backer, &newattrs);
> > +	ret = notify_change(object->backer, &newattrs, NULL);
> >  
> >  truncate_failed:
> >  	mutex_unlock(&object->backer->d_inode->i_mutex);
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index 19e4435..bd54575 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -992,7 +992,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
> >  		lower_ia.ia_valid &= ~ATTR_MODE;
> >  
> >  	mutex_lock(&lower_dentry->d_inode->i_mutex);
> > -	rc = notify_change(lower_dentry, &lower_ia);
> > +	rc = notify_change(lower_dentry, &lower_ia, NULL);
> >  	mutex_unlock(&lower_dentry->d_inode->i_mutex);
> >  out:
> >  	fsstack_copy_attr_all(inode, lower_inode);
> > diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
> > index 345713d..1b39afd 100644
> > --- a/fs/hpfs/namei.c
> > +++ b/fs/hpfs/namei.c
> > @@ -407,7 +407,7 @@ again:
> >  			/*printk("HPFS: truncating file before delete.\n");*/
> >  			newattrs.ia_size = 0;
> >  			newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
> > -			err = notify_change(dentry, &newattrs);
> > +			err = notify_change(dentry, &newattrs, NULL);
> >  			put_write_access(inode);
> >  			if (!err)
> >  				goto again;
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 304db4c..664d631 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1633,7 +1633,11 @@ static int __remove_suid(struct dentry *dentry, int kill)
> >  	struct iattr newattrs;
> >  
> >  	newattrs.ia_valid = ATTR_FORCE | kill;
> > -	return notify_change(dentry, &newattrs);
> > +	/*
> > +	 * Note we call this on write, so notify_change will not
> > +	 * encounter any conflicting delegations:
> > +	 */
> > +	return notify_change(dentry, &newattrs, NULL);
> >  }
> >  
> >  int file_remove_suid(struct file *file)
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index b9740cb..e781901 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -426,7 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> >  			goto out_nfserr;
> >  		fh_lock(fhp);
> >  
> > -		host_err = notify_change(dentry, iap);
> > +		host_err = notify_change(dentry, iap, NULL);
> >  		err = nfserrno(host_err);
> >  		fh_unlock(fhp);
> >  	}
> > @@ -959,7 +959,11 @@ static void kill_suid(struct dentry *dentry)
> >  	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> >  
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > -	notify_change(dentry, &ia);
> > +	/*
> > +	 * Note we call this on write, so notify_change will not
> > +	 * encounter any conflicting delegations:
> > +	 */
> > +	notify_change(dentry, &ia, NULL);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  }
> >  
> > diff --git a/fs/open.c b/fs/open.c
> > index 8c74100..1a39d29 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -57,7 +57,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
> >  		newattrs.ia_valid |= ret | ATTR_FORCE;
> >  
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > -	ret = notify_change(dentry, &newattrs);
> > +	ret = notify_change(dentry, &newattrs, NULL);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  	return ret;
> >  }
> 
> Isn't it possible we'll need to break a delegation on truncate()?

In the truncate case, the caller called break_lease, and in the
ftruncate case it's called with a write open, and the open already broke
any leases or delegations.

Might need a comment--could I get away with just this?:

 	mutex_lock(&dentry->d_inode->i_mutex);
+	/* NULL is safe: any delegations have already been broken: */
 	ret = notify_change(dentry, &newattrs, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 	return ret;

I also added something to the notify_change kerneldoc: "passing NULL is
fine for callers holding the file open for write, as there can be no
conflicting delegation in that case."

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
@ 2013-07-09 20:51         ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 20:51 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs, linux-fsdevel, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> On Wed,  3 Jul 2013 16:12:36 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > as data.
> > 
> > Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
> > Cc: David Howells <dhowells@redhat.com>
> > Cc: Tyler Hicks <tyhicks@canonical.com>
> > Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  drivers/base/devtmpfs.c   |    4 ++--
> >  fs/attr.c                 |    5 ++++-
> >  fs/cachefiles/interface.c |    4 ++--
> >  fs/ecryptfs/inode.c       |    2 +-
> >  fs/hpfs/namei.c           |    2 +-
> >  fs/inode.c                |    6 +++++-
> >  fs/nfsd/vfs.c             |    8 ++++++--
> >  fs/open.c                 |   21 +++++++++++++++++----
> >  fs/utimes.c               |    9 ++++++++-
> >  include/linux/fs.h        |    2 +-
> >  10 files changed, 47 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > index 1b8490e..0f38201 100644
> > --- a/drivers/base/devtmpfs.c
> > +++ b/drivers/base/devtmpfs.c
> > @@ -216,7 +216,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
> >  		newattrs.ia_gid = gid;
> >  		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
> >  		mutex_lock(&dentry->d_inode->i_mutex);
> > -		notify_change(dentry, &newattrs);
> > +		notify_change(dentry, &newattrs, NULL);
> >  		mutex_unlock(&dentry->d_inode->i_mutex);
> >  
> >  		/* mark as kernel-created inode */
> > @@ -322,7 +322,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> >  			newattrs.ia_valid =
> >  				ATTR_UID|ATTR_GID|ATTR_MODE;
> >  			mutex_lock(&dentry->d_inode->i_mutex);
> > -			notify_change(dentry, &newattrs);
> > +			notify_change(dentry, &newattrs, NULL);
> >  			mutex_unlock(&dentry->d_inode->i_mutex);
> >  			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> >  			if (!err || err == -ENOENT)
> > diff --git a/fs/attr.c b/fs/attr.c
> > index 1449adb..261f5c9 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -167,7 +167,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
> >  }
> >  EXPORT_SYMBOL(setattr_copy);
> >  
> > -int notify_change(struct dentry * dentry, struct iattr * attr)
> > +int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
> >  {
> >  	struct inode *inode = dentry->d_inode;
> >  	umode_t mode = inode->i_mode;
> > @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
> >  	error = security_inode_setattr(dentry, attr);
> >  	if (error)
> >  		return error;
> > +	error = try_break_deleg(inode, delegated_inode);
> > +	if (error)
> > +		return error;
> >  
> >  	if (inode->i_op->setattr)
> >  		error = inode->i_op->setattr(dentry, attr);
> > diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> > index 746ce53..40f5917 100644
> > --- a/fs/cachefiles/interface.c
> > +++ b/fs/cachefiles/interface.c
> > @@ -417,14 +417,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
> >  		_debug("discard tail %llx", oi_size);
> >  		newattrs.ia_valid = ATTR_SIZE;
> >  		newattrs.ia_size = oi_size & PAGE_MASK;
> > -		ret = notify_change(object->backer, &newattrs);
> > +		ret = notify_change(object->backer, &newattrs, NULL);
> >  		if (ret < 0)
> >  			goto truncate_failed;
> >  	}
> >  
> >  	newattrs.ia_valid = ATTR_SIZE;
> >  	newattrs.ia_size = ni_size;
> > -	ret = notify_change(object->backer, &newattrs);
> > +	ret = notify_change(object->backer, &newattrs, NULL);
> >  
> >  truncate_failed:
> >  	mutex_unlock(&object->backer->d_inode->i_mutex);
> > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > index 19e4435..bd54575 100644
> > --- a/fs/ecryptfs/inode.c
> > +++ b/fs/ecryptfs/inode.c
> > @@ -992,7 +992,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
> >  		lower_ia.ia_valid &= ~ATTR_MODE;
> >  
> >  	mutex_lock(&lower_dentry->d_inode->i_mutex);
> > -	rc = notify_change(lower_dentry, &lower_ia);
> > +	rc = notify_change(lower_dentry, &lower_ia, NULL);
> >  	mutex_unlock(&lower_dentry->d_inode->i_mutex);
> >  out:
> >  	fsstack_copy_attr_all(inode, lower_inode);
> > diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
> > index 345713d..1b39afd 100644
> > --- a/fs/hpfs/namei.c
> > +++ b/fs/hpfs/namei.c
> > @@ -407,7 +407,7 @@ again:
> >  			/*printk("HPFS: truncating file before delete.\n");*/
> >  			newattrs.ia_size = 0;
> >  			newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
> > -			err = notify_change(dentry, &newattrs);
> > +			err = notify_change(dentry, &newattrs, NULL);
> >  			put_write_access(inode);
> >  			if (!err)
> >  				goto again;
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 304db4c..664d631 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1633,7 +1633,11 @@ static int __remove_suid(struct dentry *dentry, int kill)
> >  	struct iattr newattrs;
> >  
> >  	newattrs.ia_valid = ATTR_FORCE | kill;
> > -	return notify_change(dentry, &newattrs);
> > +	/*
> > +	 * Note we call this on write, so notify_change will not
> > +	 * encounter any conflicting delegations:
> > +	 */
> > +	return notify_change(dentry, &newattrs, NULL);
> >  }
> >  
> >  int file_remove_suid(struct file *file)
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index b9740cb..e781901 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -426,7 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> >  			goto out_nfserr;
> >  		fh_lock(fhp);
> >  
> > -		host_err = notify_change(dentry, iap);
> > +		host_err = notify_change(dentry, iap, NULL);
> >  		err = nfserrno(host_err);
> >  		fh_unlock(fhp);
> >  	}
> > @@ -959,7 +959,11 @@ static void kill_suid(struct dentry *dentry)
> >  	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> >  
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > -	notify_change(dentry, &ia);
> > +	/*
> > +	 * Note we call this on write, so notify_change will not
> > +	 * encounter any conflicting delegations:
> > +	 */
> > +	notify_change(dentry, &ia, NULL);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  }
> >  
> > diff --git a/fs/open.c b/fs/open.c
> > index 8c74100..1a39d29 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -57,7 +57,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
> >  		newattrs.ia_valid |= ret | ATTR_FORCE;
> >  
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > -	ret = notify_change(dentry, &newattrs);
> > +	ret = notify_change(dentry, &newattrs, NULL);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  	return ret;
> >  }
> 
> Isn't it possible we'll need to break a delegation on truncate()?

In the truncate case, the caller called break_lease, and in the
ftruncate case it's called with a write open, and the open already broke
any leases or delegations.

Might need a comment--could I get away with just this?:

 	mutex_lock(&dentry->d_inode->i_mutex);
+	/* NULL is safe: any delegations have already been broken: */
 	ret = notify_change(dentry, &newattrs, NULL);
 	mutex_unlock(&dentry->d_inode->i_mutex);
 	return ret;

I also added something to the notify_change kerneldoc: "passing NULL is
fine for callers holding the file open for write, as there can be no
conflicting delegation in that case."

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-09 20:51         ` J. Bruce Fields
@ 2013-07-09 21:19             ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 21:19 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 04:51:01PM -0400, J. Bruce Fields wrote:
> On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:36 -0400
> > "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > 
> > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > as data.
> > > 
...
> > Isn't it possible we'll need to break a delegation on truncate()?
> 
> In the truncate case, the caller called break_lease, and in the
> ftruncate case it's called with a write open, and the open already broke
> any leases or delegations.
> 
> Might need a comment--could I get away with just this?:
> 
>  	mutex_lock(&dentry->d_inode->i_mutex);
> +	/* NULL is safe: any delegations have already been broken: */
>  	ret = notify_change(dentry, &newattrs, NULL);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  	return ret;
> 
> I also added something to the notify_change kerneldoc: "passing NULL is
> fine for callers holding the file open for write, as there can be no
> conflicting delegation in that case."

Another question is whether it's really worth dropping locks and
retrying in this case.

We could instead do the following.

--b.

commit 40a4fd613034cd3f242ec11e5ecd44f9a83ab39d
Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date:   Tue Sep 20 17:19:26 2011 -0400

    locks: break delegations on any attribute modification
    
    NFSv4 uses leases to guarantee that clients can cache metadata as well
    as data.
    
    Note unlike link, unlink, and rename, we don't bother dropping locks and
    retrying.  In the other cases we're holding a directory mutex, hence
    blocking operations (even lookups) on the same directory.  In this case
    we're holding only the i_mutex on this file, so the impact of an
    unresponsive client is limited to this file.
    
    Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

diff --git a/fs/attr.c b/fs/attr.c
index 1449adb..a2c1d04 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
 	error = security_inode_setattr(dentry, attr);
 	if (error)
 		return error;
+	error = break_deleg_wait(inode);
+	if (error)
+		return error;
 
 	if (inode->i_op->setattr)
 		error = inode->i_op->setattr(dentry, attr);
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
@ 2013-07-09 21:19             ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-09 21:19 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs, linux-fsdevel, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 04:51:01PM -0400, J. Bruce Fields wrote:
> On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:36 -0400
> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > as data.
> > > 
...
> > Isn't it possible we'll need to break a delegation on truncate()?
> 
> In the truncate case, the caller called break_lease, and in the
> ftruncate case it's called with a write open, and the open already broke
> any leases or delegations.
> 
> Might need a comment--could I get away with just this?:
> 
>  	mutex_lock(&dentry->d_inode->i_mutex);
> +	/* NULL is safe: any delegations have already been broken: */
>  	ret = notify_change(dentry, &newattrs, NULL);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  	return ret;
> 
> I also added something to the notify_change kerneldoc: "passing NULL is
> fine for callers holding the file open for write, as there can be no
> conflicting delegation in that case."

Another question is whether it's really worth dropping locks and
retrying in this case.

We could instead do the following.

--b.

commit 40a4fd613034cd3f242ec11e5ecd44f9a83ab39d
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Tue Sep 20 17:19:26 2011 -0400

    locks: break delegations on any attribute modification
    
    NFSv4 uses leases to guarantee that clients can cache metadata as well
    as data.
    
    Note unlike link, unlink, and rename, we don't bother dropping locks and
    retrying.  In the other cases we're holding a directory mutex, hence
    blocking operations (even lookups) on the same directory.  In this case
    we're holding only the i_mutex on this file, so the impact of an
    unresponsive client is limited to this file.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/attr.c b/fs/attr.c
index 1449adb..a2c1d04 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
 	error = security_inode_setattr(dentry, attr);
 	if (error)
 		return error;
+	error = break_deleg_wait(inode);
+	if (error)
+		return error;
 
 	if (inode->i_op->setattr)
 		error = inode->i_op->setattr(dentry, attr);

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-03 20:12 ` [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code J. Bruce Fields
@ 2013-07-09 22:04       ` Dave Chinner
  0 siblings, 0 replies; 83+ messages in thread
From: Dave Chinner @ 2013-07-09 22:04 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> We want to do this elsewhere as well.
> 
> Cc: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
> Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/ext4/ext4.h        |    2 --
>  fs/ext4/ioctl.c       |    4 ++--
>  fs/ext4/move_extent.c |   40 ++--------------------------------------
>  fs/inode.c            |   29 +++++++++++++++++++++++++++++
>  include/linux/fs.h    |    3 +++
>  5 files changed, 36 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5aae3d1..3590abe 100644

Just to throw a spanner in the works - have you considered that
other filesystems might have different inode lock ordering rules?

For example, XFS locks multiple inodes in ascending inode number
order, not ordered by pointer address. Hence we end up different
inode lock ordering at different layers of the stack and I can't see
that ending well....

> diff --git a/fs/inode.c b/fs/inode.c
> index 00d5fc3..b8afbc7 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
>  EXPORT_SYMBOL(unlock_new_inode);
>  
>  /**
> + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + */
> +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	if (inode1 < inode2) {
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> +	} else {
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> +	}
> +}
> +EXPORT_SYMBOL(lock_two_nondirectories);

What makes this specific to non-directories? If it's not to be used
for directory inodes, then there should be WARN_ON_ONCE() guards in
the code...

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-09 22:04       ` Dave Chinner
  0 siblings, 0 replies; 83+ messages in thread
From: Dave Chinner @ 2013-07-09 22:04 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <bfields@redhat.com>
> 
> We want to do this elsewhere as well.
> 
> Cc: "Theodore Ts'o" <tytso@mit.edu>
> Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> ---
>  fs/ext4/ext4.h        |    2 --
>  fs/ext4/ioctl.c       |    4 ++--
>  fs/ext4/move_extent.c |   40 ++--------------------------------------
>  fs/inode.c            |   29 +++++++++++++++++++++++++++++
>  include/linux/fs.h    |    3 +++
>  5 files changed, 36 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 5aae3d1..3590abe 100644

Just to throw a spanner in the works - have you considered that
other filesystems might have different inode lock ordering rules?

For example, XFS locks multiple inodes in ascending inode number
order, not ordered by pointer address. Hence we end up different
inode lock ordering at different layers of the stack and I can't see
that ending well....

> diff --git a/fs/inode.c b/fs/inode.c
> index 00d5fc3..b8afbc7 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
>  EXPORT_SYMBOL(unlock_new_inode);
>  
>  /**
> + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + */
> +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	if (inode1 < inode2) {
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> +	} else {
> +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> +	}
> +}
> +EXPORT_SYMBOL(lock_two_nondirectories);

What makes this specific to non-directories? If it's not to be used
for directory inodes, then there should be WARN_ON_ONCE() guards in
the code...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-09 20:51         ` J. Bruce Fields
@ 2013-07-09 23:57             ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 23:57 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, 9 Jul 2013 16:51:01 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:36 -0400
> > "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > 
> > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > as data.
> > > 
> > > Cc: Mikulas Patocka <mikulas-TTVWCEgN8Z9G4ohzP4jBZS1Fcj925eT/@public.gmane.org>
> > > Cc: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > Cc: Tyler Hicks <tyhicks-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> > > Cc: Dustin Kirkland <dustin.kirkland-Bv2LyzZ6GzxBDgjK7y7TUQ@public.gmane.org>
> > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > ---
> > >  drivers/base/devtmpfs.c   |    4 ++--
> > >  fs/attr.c                 |    5 ++++-
> > >  fs/cachefiles/interface.c |    4 ++--
> > >  fs/ecryptfs/inode.c       |    2 +-
> > >  fs/hpfs/namei.c           |    2 +-
> > >  fs/inode.c                |    6 +++++-
> > >  fs/nfsd/vfs.c             |    8 ++++++--
> > >  fs/open.c                 |   21 +++++++++++++++++----
> > >  fs/utimes.c               |    9 ++++++++-
> > >  include/linux/fs.h        |    2 +-
> > >  10 files changed, 47 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > > index 1b8490e..0f38201 100644
> > > --- a/drivers/base/devtmpfs.c
> > > +++ b/drivers/base/devtmpfs.c
> > > @@ -216,7 +216,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
> > >  		newattrs.ia_gid = gid;
> > >  		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
> > >  		mutex_lock(&dentry->d_inode->i_mutex);
> > > -		notify_change(dentry, &newattrs);
> > > +		notify_change(dentry, &newattrs, NULL);
> > >  		mutex_unlock(&dentry->d_inode->i_mutex);
> > >  
> > >  		/* mark as kernel-created inode */
> > > @@ -322,7 +322,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> > >  			newattrs.ia_valid =
> > >  				ATTR_UID|ATTR_GID|ATTR_MODE;
> > >  			mutex_lock(&dentry->d_inode->i_mutex);
> > > -			notify_change(dentry, &newattrs);
> > > +			notify_change(dentry, &newattrs, NULL);
> > >  			mutex_unlock(&dentry->d_inode->i_mutex);
> > >  			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> > >  			if (!err || err == -ENOENT)
> > > diff --git a/fs/attr.c b/fs/attr.c
> > > index 1449adb..261f5c9 100644
> > > --- a/fs/attr.c
> > > +++ b/fs/attr.c
> > > @@ -167,7 +167,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
> > >  }
> > >  EXPORT_SYMBOL(setattr_copy);
> > >  
> > > -int notify_change(struct dentry * dentry, struct iattr * attr)
> > > +int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
> > >  {
> > >  	struct inode *inode = dentry->d_inode;
> > >  	umode_t mode = inode->i_mode;
> > > @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
> > >  	error = security_inode_setattr(dentry, attr);
> > >  	if (error)
> > >  		return error;
> > > +	error = try_break_deleg(inode, delegated_inode);
> > > +	if (error)
> > > +		return error;
> > >  
> > >  	if (inode->i_op->setattr)
> > >  		error = inode->i_op->setattr(dentry, attr);
> > > diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> > > index 746ce53..40f5917 100644
> > > --- a/fs/cachefiles/interface.c
> > > +++ b/fs/cachefiles/interface.c
> > > @@ -417,14 +417,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
> > >  		_debug("discard tail %llx", oi_size);
> > >  		newattrs.ia_valid = ATTR_SIZE;
> > >  		newattrs.ia_size = oi_size & PAGE_MASK;
> > > -		ret = notify_change(object->backer, &newattrs);
> > > +		ret = notify_change(object->backer, &newattrs, NULL);
> > >  		if (ret < 0)
> > >  			goto truncate_failed;
> > >  	}
> > >  
> > >  	newattrs.ia_valid = ATTR_SIZE;
> > >  	newattrs.ia_size = ni_size;
> > > -	ret = notify_change(object->backer, &newattrs);
> > > +	ret = notify_change(object->backer, &newattrs, NULL);
> > >  
> > >  truncate_failed:
> > >  	mutex_unlock(&object->backer->d_inode->i_mutex);
> > > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > > index 19e4435..bd54575 100644
> > > --- a/fs/ecryptfs/inode.c
> > > +++ b/fs/ecryptfs/inode.c
> > > @@ -992,7 +992,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
> > >  		lower_ia.ia_valid &= ~ATTR_MODE;
> > >  
> > >  	mutex_lock(&lower_dentry->d_inode->i_mutex);
> > > -	rc = notify_change(lower_dentry, &lower_ia);
> > > +	rc = notify_change(lower_dentry, &lower_ia, NULL);
> > >  	mutex_unlock(&lower_dentry->d_inode->i_mutex);
> > >  out:
> > >  	fsstack_copy_attr_all(inode, lower_inode);
> > > diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
> > > index 345713d..1b39afd 100644
> > > --- a/fs/hpfs/namei.c
> > > +++ b/fs/hpfs/namei.c
> > > @@ -407,7 +407,7 @@ again:
> > >  			/*printk("HPFS: truncating file before delete.\n");*/
> > >  			newattrs.ia_size = 0;
> > >  			newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
> > > -			err = notify_change(dentry, &newattrs);
> > > +			err = notify_change(dentry, &newattrs, NULL);
> > >  			put_write_access(inode);
> > >  			if (!err)
> > >  				goto again;
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 304db4c..664d631 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -1633,7 +1633,11 @@ static int __remove_suid(struct dentry *dentry, int kill)
> > >  	struct iattr newattrs;
> > >  
> > >  	newattrs.ia_valid = ATTR_FORCE | kill;
> > > -	return notify_change(dentry, &newattrs);
> > > +	/*
> > > +	 * Note we call this on write, so notify_change will not
> > > +	 * encounter any conflicting delegations:
> > > +	 */
> > > +	return notify_change(dentry, &newattrs, NULL);
> > >  }
> > >  
> > >  int file_remove_suid(struct file *file)
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index b9740cb..e781901 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -426,7 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> > >  			goto out_nfserr;
> > >  		fh_lock(fhp);
> > >  
> > > -		host_err = notify_change(dentry, iap);
> > > +		host_err = notify_change(dentry, iap, NULL);
> > >  		err = nfserrno(host_err);
> > >  		fh_unlock(fhp);
> > >  	}
> > > @@ -959,7 +959,11 @@ static void kill_suid(struct dentry *dentry)
> > >  	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > >  
> > >  	mutex_lock(&dentry->d_inode->i_mutex);
> > > -	notify_change(dentry, &ia);
> > > +	/*
> > > +	 * Note we call this on write, so notify_change will not
> > > +	 * encounter any conflicting delegations:
> > > +	 */
> > > +	notify_change(dentry, &ia, NULL);
> > >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > >  }
> > >  
> > > diff --git a/fs/open.c b/fs/open.c
> > > index 8c74100..1a39d29 100644
> > > --- a/fs/open.c
> > > +++ b/fs/open.c
> > > @@ -57,7 +57,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
> > >  		newattrs.ia_valid |= ret | ATTR_FORCE;
> > >  
> > >  	mutex_lock(&dentry->d_inode->i_mutex);
> > > -	ret = notify_change(dentry, &newattrs);
> > > +	ret = notify_change(dentry, &newattrs, NULL);
> > >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > >  	return ret;
> > >  }
> > 
> > Isn't it possible we'll need to break a delegation on truncate()?
> 
> In the truncate case, the caller called break_lease, and in the
> ftruncate case it's called with a write open, and the open already broke
> any leases or delegations.
> 
> Might need a comment--could I get away with just this?:
> 
>  	mutex_lock(&dentry->d_inode->i_mutex);
> +	/* NULL is safe: any delegations have already been broken: */
>  	ret = notify_change(dentry, &newattrs, NULL);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  	return ret;
> 
> I also added something to the notify_change kerneldoc: "passing NULL is
> fine for callers holding the file open for write, as there can be no
> conflicting delegation in that case."
> 


Ahh ok, makes sense. Both comments sound like good additions.

Thanks,
-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
@ 2013-07-09 23:57             ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-09 23:57 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, 9 Jul 2013 16:51:01 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > On Wed,  3 Jul 2013 16:12:36 -0400
> > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > 
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > as data.
> > > 
> > > Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
> > > Cc: David Howells <dhowells@redhat.com>
> > > Cc: Tyler Hicks <tyhicks@canonical.com>
> > > Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  drivers/base/devtmpfs.c   |    4 ++--
> > >  fs/attr.c                 |    5 ++++-
> > >  fs/cachefiles/interface.c |    4 ++--
> > >  fs/ecryptfs/inode.c       |    2 +-
> > >  fs/hpfs/namei.c           |    2 +-
> > >  fs/inode.c                |    6 +++++-
> > >  fs/nfsd/vfs.c             |    8 ++++++--
> > >  fs/open.c                 |   21 +++++++++++++++++----
> > >  fs/utimes.c               |    9 ++++++++-
> > >  include/linux/fs.h        |    2 +-
> > >  10 files changed, 47 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
> > > index 1b8490e..0f38201 100644
> > > --- a/drivers/base/devtmpfs.c
> > > +++ b/drivers/base/devtmpfs.c
> > > @@ -216,7 +216,7 @@ static int handle_create(const char *nodename, umode_t mode, kuid_t uid,
> > >  		newattrs.ia_gid = gid;
> > >  		newattrs.ia_valid = ATTR_MODE|ATTR_UID|ATTR_GID;
> > >  		mutex_lock(&dentry->d_inode->i_mutex);
> > > -		notify_change(dentry, &newattrs);
> > > +		notify_change(dentry, &newattrs, NULL);
> > >  		mutex_unlock(&dentry->d_inode->i_mutex);
> > >  
> > >  		/* mark as kernel-created inode */
> > > @@ -322,7 +322,7 @@ static int handle_remove(const char *nodename, struct device *dev)
> > >  			newattrs.ia_valid =
> > >  				ATTR_UID|ATTR_GID|ATTR_MODE;
> > >  			mutex_lock(&dentry->d_inode->i_mutex);
> > > -			notify_change(dentry, &newattrs);
> > > +			notify_change(dentry, &newattrs, NULL);
> > >  			mutex_unlock(&dentry->d_inode->i_mutex);
> > >  			err = vfs_unlink(parent.dentry->d_inode, dentry, NULL);
> > >  			if (!err || err == -ENOENT)
> > > diff --git a/fs/attr.c b/fs/attr.c
> > > index 1449adb..261f5c9 100644
> > > --- a/fs/attr.c
> > > +++ b/fs/attr.c
> > > @@ -167,7 +167,7 @@ void setattr_copy(struct inode *inode, const struct iattr *attr)
> > >  }
> > >  EXPORT_SYMBOL(setattr_copy);
> > >  
> > > -int notify_change(struct dentry * dentry, struct iattr * attr)
> > > +int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
> > >  {
> > >  	struct inode *inode = dentry->d_inode;
> > >  	umode_t mode = inode->i_mode;
> > > @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
> > >  	error = security_inode_setattr(dentry, attr);
> > >  	if (error)
> > >  		return error;
> > > +	error = try_break_deleg(inode, delegated_inode);
> > > +	if (error)
> > > +		return error;
> > >  
> > >  	if (inode->i_op->setattr)
> > >  		error = inode->i_op->setattr(dentry, attr);
> > > diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c
> > > index 746ce53..40f5917 100644
> > > --- a/fs/cachefiles/interface.c
> > > +++ b/fs/cachefiles/interface.c
> > > @@ -417,14 +417,14 @@ static int cachefiles_attr_changed(struct fscache_object *_object)
> > >  		_debug("discard tail %llx", oi_size);
> > >  		newattrs.ia_valid = ATTR_SIZE;
> > >  		newattrs.ia_size = oi_size & PAGE_MASK;
> > > -		ret = notify_change(object->backer, &newattrs);
> > > +		ret = notify_change(object->backer, &newattrs, NULL);
> > >  		if (ret < 0)
> > >  			goto truncate_failed;
> > >  	}
> > >  
> > >  	newattrs.ia_valid = ATTR_SIZE;
> > >  	newattrs.ia_size = ni_size;
> > > -	ret = notify_change(object->backer, &newattrs);
> > > +	ret = notify_change(object->backer, &newattrs, NULL);
> > >  
> > >  truncate_failed:
> > >  	mutex_unlock(&object->backer->d_inode->i_mutex);
> > > diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> > > index 19e4435..bd54575 100644
> > > --- a/fs/ecryptfs/inode.c
> > > +++ b/fs/ecryptfs/inode.c
> > > @@ -992,7 +992,7 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
> > >  		lower_ia.ia_valid &= ~ATTR_MODE;
> > >  
> > >  	mutex_lock(&lower_dentry->d_inode->i_mutex);
> > > -	rc = notify_change(lower_dentry, &lower_ia);
> > > +	rc = notify_change(lower_dentry, &lower_ia, NULL);
> > >  	mutex_unlock(&lower_dentry->d_inode->i_mutex);
> > >  out:
> > >  	fsstack_copy_attr_all(inode, lower_inode);
> > > diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
> > > index 345713d..1b39afd 100644
> > > --- a/fs/hpfs/namei.c
> > > +++ b/fs/hpfs/namei.c
> > > @@ -407,7 +407,7 @@ again:
> > >  			/*printk("HPFS: truncating file before delete.\n");*/
> > >  			newattrs.ia_size = 0;
> > >  			newattrs.ia_valid = ATTR_SIZE | ATTR_CTIME;
> > > -			err = notify_change(dentry, &newattrs);
> > > +			err = notify_change(dentry, &newattrs, NULL);
> > >  			put_write_access(inode);
> > >  			if (!err)
> > >  				goto again;
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 304db4c..664d631 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -1633,7 +1633,11 @@ static int __remove_suid(struct dentry *dentry, int kill)
> > >  	struct iattr newattrs;
> > >  
> > >  	newattrs.ia_valid = ATTR_FORCE | kill;
> > > -	return notify_change(dentry, &newattrs);
> > > +	/*
> > > +	 * Note we call this on write, so notify_change will not
> > > +	 * encounter any conflicting delegations:
> > > +	 */
> > > +	return notify_change(dentry, &newattrs, NULL);
> > >  }
> > >  
> > >  int file_remove_suid(struct file *file)
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index b9740cb..e781901 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -426,7 +426,7 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> > >  			goto out_nfserr;
> > >  		fh_lock(fhp);
> > >  
> > > -		host_err = notify_change(dentry, iap);
> > > +		host_err = notify_change(dentry, iap, NULL);
> > >  		err = nfserrno(host_err);
> > >  		fh_unlock(fhp);
> > >  	}
> > > @@ -959,7 +959,11 @@ static void kill_suid(struct dentry *dentry)
> > >  	ia.ia_valid = ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
> > >  
> > >  	mutex_lock(&dentry->d_inode->i_mutex);
> > > -	notify_change(dentry, &ia);
> > > +	/*
> > > +	 * Note we call this on write, so notify_change will not
> > > +	 * encounter any conflicting delegations:
> > > +	 */
> > > +	notify_change(dentry, &ia, NULL);
> > >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > >  }
> > >  
> > > diff --git a/fs/open.c b/fs/open.c
> > > index 8c74100..1a39d29 100644
> > > --- a/fs/open.c
> > > +++ b/fs/open.c
> > > @@ -57,7 +57,7 @@ int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
> > >  		newattrs.ia_valid |= ret | ATTR_FORCE;
> > >  
> > >  	mutex_lock(&dentry->d_inode->i_mutex);
> > > -	ret = notify_change(dentry, &newattrs);
> > > +	ret = notify_change(dentry, &newattrs, NULL);
> > >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > >  	return ret;
> > >  }
> > 
> > Isn't it possible we'll need to break a delegation on truncate()?
> 
> In the truncate case, the caller called break_lease, and in the
> ftruncate case it's called with a write open, and the open already broke
> any leases or delegations.
> 
> Might need a comment--could I get away with just this?:
> 
>  	mutex_lock(&dentry->d_inode->i_mutex);
> +	/* NULL is safe: any delegations have already been broken: */
>  	ret = notify_change(dentry, &newattrs, NULL);
>  	mutex_unlock(&dentry->d_inode->i_mutex);
>  	return ret;
> 
> I also added something to the notify_change kerneldoc: "passing NULL is
> fine for callers holding the file open for write, as there can be no
> conflicting delegation in that case."
> 


Ahh ok, makes sense. Both comments sound like good additions.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-09 22:04       ` Dave Chinner
@ 2013-07-10  0:21         ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10  0:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > We want to do this elsewhere as well.
> > 
> > Cc: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> > Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
> > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  fs/ext4/ext4.h        |    2 --
> >  fs/ext4/ioctl.c       |    4 ++--
> >  fs/ext4/move_extent.c |   40 ++--------------------------------------
> >  fs/inode.c            |   29 +++++++++++++++++++++++++++++
> >  include/linux/fs.h    |    3 +++
> >  5 files changed, 36 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 5aae3d1..3590abe 100644

Thanks for the comment:

> Just to throw a spanner in the works - have you considered that
> other filesystems might have different inode lock ordering rules?
> 
> For example, XFS locks multiple inodes in ascending inode number
> order, not ordered by pointer address. Hence we end up different
> inode lock ordering at different layers of the stack and I can't see
> that ending well....

What lock(s) is it taking exactly, where?  If there's a possible
deadlock, can we come up with a compatible ordering?

> > diff --git a/fs/inode.c b/fs/inode.c
> > index 00d5fc3..b8afbc7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
> >  EXPORT_SYMBOL(unlock_new_inode);
> >  
> >  /**
> > + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + */
> > +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > +{
> > +	if (inode1 < inode2) {
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> > +	} else {
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> > +	}
> > +}
> > +EXPORT_SYMBOL(lock_two_nondirectories);
> 
> What makes this specific to non-directories?

See 

	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

The only caller outside ext4 is vfs_rename_other.

I think we could make it work for directories two if necessary though
the ordering would be more complicated.  Currently there's no reason.

> If it's not to be used for directory inodes, then there should be
> WARN_ON_ONCE() guards in the code...

Sure.  So something like the following.

Hm.  I also overlooked that ext4 had a BUG() for the case they're equal.
Maybe we should keep that too if it's not overkill.

--b.

commit ad9a94b0e91d6057734e9835782e0c2cdc148bdc
Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Date:   Wed Apr 18 15:16:33 2012 -0400

    vfs: pull ext4's double-i_mutex-locking into common code
    
    We want to do this elsewhere as well.
    
    Also catch any attempts to use it for directories (where this ordering
    would conflict with ancestor-first directory ordering in lock_rename).
    
    Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
    Cc: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
    Acked-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    Acked-by: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
    Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..3590abe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
 					    struct inode *second);
 extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
 					  struct inode *donor_inode);
-void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
-void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
 			     __u64 len, __u64 *moved_len);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 9491ac0..12048f7 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	/* Protect orig inodes against a truncate and make sure,
 	 * that only 1 swap_inode_boot_loader is running. */
-	ext4_inode_double_lock(inode, inode_bl);
+	lock_two_nondirectories(inode, inode_bl);
 
 	truncate_inode_pages(&inode->i_data, 0);
 	truncate_inode_pages(&inode_bl->i_data, 0);
@@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ext4_inode_resume_unlocked_dio(inode);
 	ext4_inode_resume_unlocked_dio(inode_bl);
 
-	ext4_inode_double_unlock(inode, inode_bl);
+	unlock_two_nondirectories(inode, inode_bl);
 
 	iput(inode_bl);
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3dcbf36..986a838 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
 }
 
 /**
- * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode structure
- * @inode2:	the inode structure
- *
- * Lock two inodes' i_mutex
- */
-void
-ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
-{
-	BUG_ON(inode1 == inode2);
-	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
-	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
-	}
-}
-
-/**
- * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
- *
- * @inode1:     the inode that is released first
- * @inode2:     the inode that is released second
- *
- */
-
-void
-ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&inode1->i_mutex);
-	mutex_unlock(&inode2->i_mutex);
-}
-
-/**
  * ext4_move_extents - Exchange the specified range of a file
  *
  * @o_filp:		file structure of the original file
@@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 		return -EINVAL;
 	}
 	/* Protect orig and donor inodes against a truncate */
-	ext4_inode_double_lock(orig_inode, donor_inode);
+	lock_two_nondirectories(orig_inode, donor_inode);
 
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(orig_inode);
@@ -1538,7 +1502,7 @@ out:
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
 	ext4_inode_resume_unlocked_dio(orig_inode);
 	ext4_inode_resume_unlocked_dio(donor_inode);
-	ext4_inode_double_unlock(orig_inode, donor_inode);
+	unlock_two_nondirectories(orig_inode, donor_inode);
 
 	return ret;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..8f3c6fa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -980,6 +980,37 @@ void unlock_new_inode(struct inode *inode)
 EXPORT_SYMBOL(unlock_new_inode);
 
 /**
+ * lock_two_nondirectories - take two i_mutexes on non-directory objects
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ */
+void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
+	WARN_ON_ONCE(inode1 == inode2);
+	if (inode1 < inode2) {
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+	}
+}
+EXPORT_SYMBOL(lock_two_nondirectories);
+
+/**
+ * unlock_two_nondirectories - release locks from lock_two_nondirectories()
+ * @inode1: first inode to unlock
+ * @inode2: second inode to unlock
+ */
+void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&inode1->i_mutex);
+	mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(unlock_two_nondirectories);
+
+/**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb:		super block of file system
  * @hashval:	hash value (usually inode number) to get
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65c2be2..3258761 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_QUOTA
 };
 
+void lock_two_nondirectories(struct inode *, struct inode*);
+void unlock_two_nondirectories(struct inode *, struct inode*);
+
 /*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-10  0:21         ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10  0:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <bfields@redhat.com>
> > 
> > We want to do this elsewhere as well.
> > 
> > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > ---
> >  fs/ext4/ext4.h        |    2 --
> >  fs/ext4/ioctl.c       |    4 ++--
> >  fs/ext4/move_extent.c |   40 ++--------------------------------------
> >  fs/inode.c            |   29 +++++++++++++++++++++++++++++
> >  include/linux/fs.h    |    3 +++
> >  5 files changed, 36 insertions(+), 42 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 5aae3d1..3590abe 100644

Thanks for the comment:

> Just to throw a spanner in the works - have you considered that
> other filesystems might have different inode lock ordering rules?
> 
> For example, XFS locks multiple inodes in ascending inode number
> order, not ordered by pointer address. Hence we end up different
> inode lock ordering at different layers of the stack and I can't see
> that ending well....

What lock(s) is it taking exactly, where?  If there's a possible
deadlock, can we come up with a compatible ordering?

> > diff --git a/fs/inode.c b/fs/inode.c
> > index 00d5fc3..b8afbc7 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -980,6 +980,35 @@ void unlock_new_inode(struct inode *inode)
> >  EXPORT_SYMBOL(unlock_new_inode);
> >  
> >  /**
> > + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + */
> > +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > +{
> > +	if (inode1 < inode2) {
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
> > +	} else {
> > +		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
> > +		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
> > +	}
> > +}
> > +EXPORT_SYMBOL(lock_two_nondirectories);
> 
> What makes this specific to non-directories?

See 

	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@redhat.com>

The only caller outside ext4 is vfs_rename_other.

I think we could make it work for directories two if necessary though
the ordering would be more complicated.  Currently there's no reason.

> If it's not to be used for directory inodes, then there should be
> WARN_ON_ONCE() guards in the code...

Sure.  So something like the following.

Hm.  I also overlooked that ext4 had a BUG() for the case they're equal.
Maybe we should keep that too if it's not overkill.

--b.

commit ad9a94b0e91d6057734e9835782e0c2cdc148bdc
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Wed Apr 18 15:16:33 2012 -0400

    vfs: pull ext4's double-i_mutex-locking into common code
    
    We want to do this elsewhere as well.
    
    Also catch any attempts to use it for directories (where this ordering
    would conflict with ancestor-first directory ordering in lock_rename).
    
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Dave Chinner <david@fromorbit.com>
    Acked-by: Jeff Layton <jlayton@redhat.com>
    Acked-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 5aae3d1..3590abe 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2642,8 +2642,6 @@ extern void ext4_double_down_write_data_sem(struct inode *first,
 					    struct inode *second);
 extern void ext4_double_up_write_data_sem(struct inode *orig_inode,
 					  struct inode *donor_inode);
-void ext4_inode_double_lock(struct inode *inode1, struct inode *inode2);
-void ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2);
 extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
 			     __u64 start_orig, __u64 start_donor,
 			     __u64 len, __u64 *moved_len);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 9491ac0..12048f7 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -129,7 +129,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 
 	/* Protect orig inodes against a truncate and make sure,
 	 * that only 1 swap_inode_boot_loader is running. */
-	ext4_inode_double_lock(inode, inode_bl);
+	lock_two_nondirectories(inode, inode_bl);
 
 	truncate_inode_pages(&inode->i_data, 0);
 	truncate_inode_pages(&inode_bl->i_data, 0);
@@ -204,7 +204,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	ext4_inode_resume_unlocked_dio(inode);
 	ext4_inode_resume_unlocked_dio(inode_bl);
 
-	ext4_inode_double_unlock(inode, inode_bl);
+	unlock_two_nondirectories(inode, inode_bl);
 
 	iput(inode_bl);
 
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3dcbf36..986a838 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1206,42 +1206,6 @@ mext_check_arguments(struct inode *orig_inode,
 }
 
 /**
- * ext4_inode_double_lock - Lock i_mutex on both @inode1 and @inode2
- *
- * @inode1:	the inode structure
- * @inode2:	the inode structure
- *
- * Lock two inodes' i_mutex
- */
-void
-ext4_inode_double_lock(struct inode *inode1, struct inode *inode2)
-{
-	BUG_ON(inode1 == inode2);
-	if (inode1 < inode2) {
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
-	} else {
-		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
-		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
-	}
-}
-
-/**
- * ext4_inode_double_unlock - Release i_mutex on both @inode1 and @inode2
- *
- * @inode1:     the inode that is released first
- * @inode2:     the inode that is released second
- *
- */
-
-void
-ext4_inode_double_unlock(struct inode *inode1, struct inode *inode2)
-{
-	mutex_unlock(&inode1->i_mutex);
-	mutex_unlock(&inode2->i_mutex);
-}
-
-/**
  * ext4_move_extents - Exchange the specified range of a file
  *
  * @o_filp:		file structure of the original file
@@ -1330,7 +1294,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
 		return -EINVAL;
 	}
 	/* Protect orig and donor inodes against a truncate */
-	ext4_inode_double_lock(orig_inode, donor_inode);
+	lock_two_nondirectories(orig_inode, donor_inode);
 
 	/* Wait for all existing dio workers */
 	ext4_inode_block_unlocked_dio(orig_inode);
@@ -1538,7 +1502,7 @@ out:
 	ext4_double_up_write_data_sem(orig_inode, donor_inode);
 	ext4_inode_resume_unlocked_dio(orig_inode);
 	ext4_inode_resume_unlocked_dio(donor_inode);
-	ext4_inode_double_unlock(orig_inode, donor_inode);
+	unlock_two_nondirectories(orig_inode, donor_inode);
 
 	return ret;
 }
diff --git a/fs/inode.c b/fs/inode.c
index 00d5fc3..8f3c6fa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -980,6 +980,37 @@ void unlock_new_inode(struct inode *inode)
 EXPORT_SYMBOL(unlock_new_inode);
 
 /**
+ * lock_two_nondirectories - take two i_mutexes on non-directory objects
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ */
+void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
+	WARN_ON_ONCE(inode1 == inode2);
+	if (inode1 < inode2) {
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
+	} else {
+		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
+		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
+	}
+}
+EXPORT_SYMBOL(lock_two_nondirectories);
+
+/**
+ * unlock_two_nondirectories - release locks from lock_two_nondirectories()
+ * @inode1: first inode to unlock
+ * @inode2: second inode to unlock
+ */
+void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
+{
+	mutex_unlock(&inode1->i_mutex);
+	mutex_unlock(&inode2->i_mutex);
+}
+EXPORT_SYMBOL(unlock_two_nondirectories);
+
+/**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb:		super block of file system
  * @hashval:	hash value (usually inode number) to get
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 65c2be2..3258761 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -634,6 +634,9 @@ enum inode_i_mutex_lock_class
 	I_MUTEX_QUOTA
 };
 
+void lock_two_nondirectories(struct inode *, struct inode*);
+void unlock_two_nondirectories(struct inode *, struct inode*);
+
 /*
  * NOTE: in a 32bit arch with a preemptable kernel and
  * an UP compile the i_size_read/write must be atomic

^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-09 21:19             ` J. Bruce Fields
@ 2013-07-10  1:26                 ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-10  1:26 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, 9 Jul 2013 17:19:12 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Tue, Jul 09, 2013 at 04:51:01PM -0400, J. Bruce Fields wrote:
> > On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > > On Wed,  3 Jul 2013 16:12:36 -0400
> > > "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > 
> > > > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > 
> > > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > > as data.
> > > > 
> ...
> > > Isn't it possible we'll need to break a delegation on truncate()?
> > 
> > In the truncate case, the caller called break_lease, and in the
> > ftruncate case it's called with a write open, and the open already broke
> > any leases or delegations.
> > 
> > Might need a comment--could I get away with just this?:
> > 
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > +	/* NULL is safe: any delegations have already been broken: */
> >  	ret = notify_change(dentry, &newattrs, NULL);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  	return ret;
> > 
> > I also added something to the notify_change kerneldoc: "passing NULL is
> > fine for callers holding the file open for write, as there can be no
> > conflicting delegation in that case."
> 
> Another question is whether it's really worth dropping locks and
> retrying in this case.
> 
> We could instead do the following.
> 
> --b.
> 
> commit 40a4fd613034cd3f242ec11e5ecd44f9a83ab39d
> Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Date:   Tue Sep 20 17:19:26 2011 -0400
> 
>     locks: break delegations on any attribute modification
>     
>     NFSv4 uses leases to guarantee that clients can cache metadata as well
>     as data.
>     
>     Note unlike link, unlink, and rename, we don't bother dropping locks and
>     retrying.  In the other cases we're holding a directory mutex, hence
>     blocking operations (even lookups) on the same directory.  In this case
>     we're holding only the i_mutex on this file, so the impact of an
>     unresponsive client is limited to this file.
>     
>     Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 1449adb..a2c1d04 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
>  	error = security_inode_setattr(dentry, attr);
>  	if (error)
>  		return error;
> +	error = break_deleg_wait(inode);
> +	if (error)
> +		return error;
>  
>  	if (inode->i_op->setattr)
>  		error = inode->i_op->setattr(dentry, attr);


I guess the question is whether there are operations that require the
i_mutex but that don't require the delegation recall to have finished.

-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
@ 2013-07-10  1:26                 ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-10  1:26 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, 9 Jul 2013 17:19:12 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Tue, Jul 09, 2013 at 04:51:01PM -0400, J. Bruce Fields wrote:
> > On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > > On Wed,  3 Jul 2013 16:12:36 -0400
> > > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > > 
> > > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > > 
> > > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > > as data.
> > > > 
> ...
> > > Isn't it possible we'll need to break a delegation on truncate()?
> > 
> > In the truncate case, the caller called break_lease, and in the
> > ftruncate case it's called with a write open, and the open already broke
> > any leases or delegations.
> > 
> > Might need a comment--could I get away with just this?:
> > 
> >  	mutex_lock(&dentry->d_inode->i_mutex);
> > +	/* NULL is safe: any delegations have already been broken: */
> >  	ret = notify_change(dentry, &newattrs, NULL);
> >  	mutex_unlock(&dentry->d_inode->i_mutex);
> >  	return ret;
> > 
> > I also added something to the notify_change kerneldoc: "passing NULL is
> > fine for callers holding the file open for write, as there can be no
> > conflicting delegation in that case."
> 
> Another question is whether it's really worth dropping locks and
> retrying in this case.
> 
> We could instead do the following.
> 
> --b.
> 
> commit 40a4fd613034cd3f242ec11e5ecd44f9a83ab39d
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Tue Sep 20 17:19:26 2011 -0400
> 
>     locks: break delegations on any attribute modification
>     
>     NFSv4 uses leases to guarantee that clients can cache metadata as well
>     as data.
>     
>     Note unlike link, unlink, and rename, we don't bother dropping locks and
>     retrying.  In the other cases we're holding a directory mutex, hence
>     blocking operations (even lookups) on the same directory.  In this case
>     we're holding only the i_mutex on this file, so the impact of an
>     unresponsive client is limited to this file.
>     
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 1449adb..a2c1d04 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
>  	error = security_inode_setattr(dentry, attr);
>  	if (error)
>  		return error;
> +	error = break_deleg_wait(inode);
> +	if (error)
> +		return error;
>  
>  	if (inode->i_op->setattr)
>  		error = inode->i_op->setattr(dentry, attr);


I guess the question is whether there are operations that require the
i_mutex but that don't require the delegation recall to have finished.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-10  0:21         ` J. Bruce Fields
@ 2013-07-10  2:09             ` Dave Chinner
  -1 siblings, 0 replies; 83+ messages in thread
From: Dave Chinner @ 2013-07-10  2:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Tue, Jul 09, 2013 at 08:21:20PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> > On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > 
> > > We want to do this elsewhere as well.
> > > 
> > > Cc: "Theodore Ts'o" <tytso-3s7WtUTddSA@public.gmane.org>
> > > Cc: Andreas Dilger <adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
> > > Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > ---
> > >  fs/ext4/ext4.h        |    2 --
> > >  fs/ext4/ioctl.c       |    4 ++--
> > >  fs/ext4/move_extent.c |   40 ++--------------------------------------
> > >  fs/inode.c            |   29 +++++++++++++++++++++++++++++
> > >  include/linux/fs.h    |    3 +++
> > >  5 files changed, 36 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index 5aae3d1..3590abe 100644
> 
> Thanks for the comment:
> 
> > Just to throw a spanner in the works - have you considered that
> > other filesystems might have different inode lock ordering rules?
> > 
> > For example, XFS locks multiple inodes in ascending inode number
> > order, not ordered by pointer address. Hence we end up different
> > inode lock ordering at different layers of the stack and I can't see
> > that ending well....
> 
> What lock(s) is it taking exactly, where?

xfs_lock_two_inodes() locks two XFS inodes and doesn't require
i_mutex on the inodes to be held first.

Then there's xfs_lock_inodes() which can lock an arbitrary number of
inodes and has some special casing to avoid transaction subsystem
deadlocks. That's used by rename so typically is used for 4 inodes
maximum, and the ordering is set up via xfs_sort_for_rename(). The
VFS typically already holds the i_mutex on these inodes first, so
I'm not so concerned about this case.

I'm not sure that there is actually deadlock, but given that XFS can
lock multiple inodes independently of the VFS (e.g. through ioctl
interfaces) I'm extremely wary of differences in lock ordering on
the same structure....

> If there's a possible
> deadlock, can we come up with a compatible ordering?

Sure. I'd prefer ordering by inode number, because then ordering is
deterministic rather than being dependent on memory allocation
results. It makes forensic analysis of deadlocks and corruptions
easier because you can look at on-disk structures and accurately
predict locking behaviour and therefore determine the order of
operations that should occur. With lock ordering determined by
memory addresses, you can't easily predict the lock ordering two
particular inodes might take from one operation to another.

> > > +EXPORT_SYMBOL(lock_two_nondirectories);
> > 
> > What makes this specific to non-directories?
> 
> See 
> 
> 	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 
> The only caller outside ext4 is vfs_rename_other.

Ah, so we now mix two different lock ordering models for directories
vs non-directories.  i.e. lock_rename() enforces parent/child
relationships on the two directories being locked, but if there is
no ancestry, it doesn't order the inode locking at all.

So it seems that we can make up whatever ordering we want here,
as long as we use it everywhere for locking multiple inodes. What
other code locks multiple inodes?

> I think we could make it work for directories two if necessary though
> the ordering would be more complicated.  Currently there's no reason.

lock_rename() already does it ;)

> > If it's not to be used for directory inodes, then there should be
> > WARN_ON_ONCE() guards in the code...
> 
> Sure.  So something like the following.
> 
> Hm.  I also overlooked that ext4 had a BUG() for the case they're equal.
> Maybe we should keep that too if it's not overkill.

Just do like lock_rename() does - detect it an only lock the single
inode. That way you can also pass in a null inode2 and have it
behave appropriately - it will get rid of the if (target) ... else
code out of vfs_rename_other....

>  /**
> + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + */
> +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
> +	WARN_ON_ONCE(inode1 == inode2);

Sure, but I'd split the first warn on - that way we know which inode
is triggering the warning....

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-10  2:09             ` Dave Chinner
  0 siblings, 0 replies; 83+ messages in thread
From: Dave Chinner @ 2013-07-10  2:09 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Tue, Jul 09, 2013 at 08:21:20PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> > On Wed, Jul 03, 2013 at 04:12:25PM -0400, J. Bruce Fields wrote:
> > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > 
> > > We want to do this elsewhere as well.
> > > 
> > > Cc: "Theodore Ts'o" <tytso@mit.edu>
> > > Cc: Andreas Dilger <adilger.kernel@dilger.ca>
> > > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > ---
> > >  fs/ext4/ext4.h        |    2 --
> > >  fs/ext4/ioctl.c       |    4 ++--
> > >  fs/ext4/move_extent.c |   40 ++--------------------------------------
> > >  fs/inode.c            |   29 +++++++++++++++++++++++++++++
> > >  include/linux/fs.h    |    3 +++
> > >  5 files changed, 36 insertions(+), 42 deletions(-)
> > > 
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index 5aae3d1..3590abe 100644
> 
> Thanks for the comment:
> 
> > Just to throw a spanner in the works - have you considered that
> > other filesystems might have different inode lock ordering rules?
> > 
> > For example, XFS locks multiple inodes in ascending inode number
> > order, not ordered by pointer address. Hence we end up different
> > inode lock ordering at different layers of the stack and I can't see
> > that ending well....
> 
> What lock(s) is it taking exactly, where?

xfs_lock_two_inodes() locks two XFS inodes and doesn't require
i_mutex on the inodes to be held first.

Then there's xfs_lock_inodes() which can lock an arbitrary number of
inodes and has some special casing to avoid transaction subsystem
deadlocks. That's used by rename so typically is used for 4 inodes
maximum, and the ordering is set up via xfs_sort_for_rename(). The
VFS typically already holds the i_mutex on these inodes first, so
I'm not so concerned about this case.

I'm not sure that there is actually deadlock, but given that XFS can
lock multiple inodes independently of the VFS (e.g. through ioctl
interfaces) I'm extremely wary of differences in lock ordering on
the same structure....

> If there's a possible
> deadlock, can we come up with a compatible ordering?

Sure. I'd prefer ordering by inode number, because then ordering is
deterministic rather than being dependent on memory allocation
results. It makes forensic analysis of deadlocks and corruptions
easier because you can look at on-disk structures and accurately
predict locking behaviour and therefore determine the order of
operations that should occur. With lock ordering determined by
memory addresses, you can't easily predict the lock ordering two
particular inodes might take from one operation to another.

> > > +EXPORT_SYMBOL(lock_two_nondirectories);
> > 
> > What makes this specific to non-directories?
> 
> See 
> 
> 	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@redhat.com>
> 
> The only caller outside ext4 is vfs_rename_other.

Ah, so we now mix two different lock ordering models for directories
vs non-directories.  i.e. lock_rename() enforces parent/child
relationships on the two directories being locked, but if there is
no ancestry, it doesn't order the inode locking at all.

So it seems that we can make up whatever ordering we want here,
as long as we use it everywhere for locking multiple inodes. What
other code locks multiple inodes?

> I think we could make it work for directories two if necessary though
> the ordering would be more complicated.  Currently there's no reason.

lock_rename() already does it ;)

> > If it's not to be used for directory inodes, then there should be
> > WARN_ON_ONCE() guards in the code...
> 
> Sure.  So something like the following.
> 
> Hm.  I also overlooked that ext4 had a BUG() for the case they're equal.
> Maybe we should keep that too if it's not overkill.

Just do like lock_rename() does - detect it an only lock the single
inode. That way you can also pass in a null inode2 and have it
behave appropriately - it will get rid of the if (target) ... else
code out of vfs_rename_other....

>  /**
> + * lock_two_nondirectories - take two i_mutexes on non-directory objects
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + */
> +void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> +{
> +	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
> +	WARN_ON_ONCE(inode1 == inode2);

Sure, but I'd split the first warn on - that way we know which inode
is triggering the warning....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-10  2:09             ` Dave Chinner
  (?)
@ 2013-07-10  2:40             ` J. Bruce Fields
       [not found]               ` <20130710024059.GN32574-spRCxval1Z7TsXDwO4sDpg@public.gmane.org>
  -1 siblings, 1 reply; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10  2:40 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> On Tue, Jul 09, 2013 at 08:21:20PM -0400, J. Bruce Fields wrote:
> > On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
...
> > > Just to throw a spanner in the works - have you considered that
> > > other filesystems might have different inode lock ordering rules?
> > > 
> > > For example, XFS locks multiple inodes in ascending inode number
> > > order, not ordered by pointer address. Hence we end up different
> > > inode lock ordering at different layers of the stack and I can't see
> > > that ending well....
> > 
> > What lock(s) is it taking exactly, where?
> 
> xfs_lock_two_inodes() locks two XFS inodes and doesn't require
> i_mutex on the inodes to be held first.
> 
> Then there's xfs_lock_inodes() which can lock an arbitrary number of
> inodes and has some special casing to avoid transaction subsystem
> deadlocks. That's used by rename so typically is used for 4 inodes
> maximum, and the ordering is set up via xfs_sort_for_rename(). The
> VFS typically already holds the i_mutex on these inodes first, so
> I'm not so concerned about this case.
> 
> I'm not sure that there is actually deadlock, but given that XFS can
> lock multiple inodes independently of the VFS (e.g. through ioctl
> interfaces) I'm extremely wary of differences in lock ordering on
> the same structure....

OK.

> > If there's a possible
> > deadlock, can we come up with a compatible ordering?
> 
> Sure. I'd prefer ordering by inode number, because then ordering is
> deterministic rather than being dependent on memory allocation
> results.  It makes forensic analysis of deadlocks and corruptions
> easier because you can look at on-disk structures and accurately
> predict locking behaviour and therefore determine the order of
> operations that should occur. With lock ordering determined by
> memory addresses, you can't easily predict the lock ordering two
> particular inodes might take from one operation to another.

Hm, OK, not having done this I don't have a good feeling for how
important that is, but I can take your word for it.

But the ext4 code actually originally used i_ino order and was changed
by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
suggestion?:

	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw@mail.gmail.com>

	"And the only sane order is comparing inode pointers, not inode
	numbers like ext4 apparently does."

(Uh, I thought I also remembered some rationale but can't dig up the
email now.)

> > > > +EXPORT_SYMBOL(lock_two_nondirectories);
> > > 
> > > What makes this specific to non-directories?
> > 
> > See 
> > 
> > 	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@redhat.com>
> > 
> > The only caller outside ext4 is vfs_rename_other.
> 
> Ah, so we now mix two different lock ordering models for directories
> vs non-directories.  i.e. lock_rename() enforces parent/child
> relationships on the two directories being locked, but if there is
> no ancestry, it doesn't order the inode locking at all.
> 
> So it seems that we can make up whatever ordering we want here,
> as long as we use it everywhere for locking multiple inodes. What
> other code locks multiple inodes?

The ext4 code is the only code I know of--but only I think because Al
pointed out.  And obviously I overlooked the xfs case.  I'll try looking
harder....

...
> Just do like lock_rename() does - detect it an only lock the single
> inode. That way you can also pass in a null inode2 and have it
> behave appropriately - it will get rid of the if (target) ... else
> code out of vfs_rename_other....

OK, will do.

...
> > +	WARN_ON_ONCE(S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode));
> > +	WARN_ON_ONCE(inode1 == inode2);
> 
> Sure, but I'd split the first warn on - that way we know which inode
> is triggering the warning....

Also sounds reasonable, thanks.

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-10  2:40             ` J. Bruce Fields
@ 2013-07-10  3:38                   ` Dave Chinner
  0 siblings, 0 replies; 83+ messages in thread
From: Dave Chinner @ 2013-07-10  3:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > On Tue, Jul 09, 2013 at 08:21:20PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> ...
> > > > Just to throw a spanner in the works - have you considered that
> > > > other filesystems might have different inode lock ordering rules?
> > > > 
> > > > For example, XFS locks multiple inodes in ascending inode number
> > > > order, not ordered by pointer address. Hence we end up different
> > > > inode lock ordering at different layers of the stack and I can't see
> > > > that ending well....
> > > 
> > > What lock(s) is it taking exactly, where?
> > 
> > xfs_lock_two_inodes() locks two XFS inodes and doesn't require
> > i_mutex on the inodes to be held first.
> > 
> > Then there's xfs_lock_inodes() which can lock an arbitrary number of
> > inodes and has some special casing to avoid transaction subsystem
> > deadlocks. That's used by rename so typically is used for 4 inodes
> > maximum, and the ordering is set up via xfs_sort_for_rename(). The
> > VFS typically already holds the i_mutex on these inodes first, so
> > I'm not so concerned about this case.
> > 
> > I'm not sure that there is actually deadlock, but given that XFS can
> > lock multiple inodes independently of the VFS (e.g. through ioctl
> > interfaces) I'm extremely wary of differences in lock ordering on
> > the same structure....
> 
> OK.
> 
> > > If there's a possible
> > > deadlock, can we come up with a compatible ordering?
> > 
> > Sure. I'd prefer ordering by inode number, because then ordering is
> > deterministic rather than being dependent on memory allocation
> > results.  It makes forensic analysis of deadlocks and corruptions
> > easier because you can look at on-disk structures and accurately
> > predict locking behaviour and therefore determine the order of
> > operations that should occur. With lock ordering determined by
> > memory addresses, you can't easily predict the lock ordering two
> > particular inodes might take from one operation to another.
> 
> Hm, OK, not having done this I don't have a good feeling for how
> important that is, but I can take your word for it.
> 
> But the ext4 code actually originally used i_ino order and was changed
> by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> suggestion?:
> 
> 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> 
> 	"And the only sane order is comparing inode pointers, not inode
> 	numbers like ext4 apparently does."

Interesting. What has worked for the last 20 years must be wrong if
Linus says so ;)

> 
> (Uh, I thought I also remembered some rationale but can't dig up the
> email now.)

Probably duplicate inode numbers on inodes in different filesystems.
But rename doesn't allow that, and I don't we ever want to allow
arbitrary nested inode locking across superblocks. Hence I can't
think of a reason why it's a problem...

FWIW - gfs2 does multiple glock locking similar to XFS inode locking
- it sorts the locks in lock number order and the locks them all one
at a time...

> > > > > +EXPORT_SYMBOL(lock_two_nondirectories);
> > > > 
> > > > What makes this specific to non-directories?
> > > 
> > > See 
> > > 
> > > 	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > 
> > > The only caller outside ext4 is vfs_rename_other.
> > 
> > Ah, so we now mix two different lock ordering models for directories
> > vs non-directories.  i.e. lock_rename() enforces parent/child
> > relationships on the two directories being locked, but if there is
> > no ancestry, it doesn't order the inode locking at all.
> > 
> > So it seems that we can make up whatever ordering we want here,
> > as long as we use it everywhere for locking multiple inodes. What
> > other code locks multiple inodes?
> 
> The ext4 code is the only code I know of--but only I think because Al
> pointed out.  And obviously I overlooked the xfs case.  I'll try looking
> harder....

A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
any other obvious ones.

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-10  3:38                   ` Dave Chinner
  0 siblings, 0 replies; 83+ messages in thread
From: Dave Chinner @ 2013-07-10  3:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > On Tue, Jul 09, 2013 at 08:21:20PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 10, 2013 at 08:04:11AM +1000, Dave Chinner wrote:
> ...
> > > > Just to throw a spanner in the works - have you considered that
> > > > other filesystems might have different inode lock ordering rules?
> > > > 
> > > > For example, XFS locks multiple inodes in ascending inode number
> > > > order, not ordered by pointer address. Hence we end up different
> > > > inode lock ordering at different layers of the stack and I can't see
> > > > that ending well....
> > > 
> > > What lock(s) is it taking exactly, where?
> > 
> > xfs_lock_two_inodes() locks two XFS inodes and doesn't require
> > i_mutex on the inodes to be held first.
> > 
> > Then there's xfs_lock_inodes() which can lock an arbitrary number of
> > inodes and has some special casing to avoid transaction subsystem
> > deadlocks. That's used by rename so typically is used for 4 inodes
> > maximum, and the ordering is set up via xfs_sort_for_rename(). The
> > VFS typically already holds the i_mutex on these inodes first, so
> > I'm not so concerned about this case.
> > 
> > I'm not sure that there is actually deadlock, but given that XFS can
> > lock multiple inodes independently of the VFS (e.g. through ioctl
> > interfaces) I'm extremely wary of differences in lock ordering on
> > the same structure....
> 
> OK.
> 
> > > If there's a possible
> > > deadlock, can we come up with a compatible ordering?
> > 
> > Sure. I'd prefer ordering by inode number, because then ordering is
> > deterministic rather than being dependent on memory allocation
> > results.  It makes forensic analysis of deadlocks and corruptions
> > easier because you can look at on-disk structures and accurately
> > predict locking behaviour and therefore determine the order of
> > operations that should occur. With lock ordering determined by
> > memory addresses, you can't easily predict the lock ordering two
> > particular inodes might take from one operation to another.
> 
> Hm, OK, not having done this I don't have a good feeling for how
> important that is, but I can take your word for it.
> 
> But the ext4 code actually originally used i_ino order and was changed
> by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> suggestion?:
> 
> 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw@mail.gmail.com>
> 
> 	"And the only sane order is comparing inode pointers, not inode
> 	numbers like ext4 apparently does."

Interesting. What has worked for the last 20 years must be wrong if
Linus says so ;)

> 
> (Uh, I thought I also remembered some rationale but can't dig up the
> email now.)

Probably duplicate inode numbers on inodes in different filesystems.
But rename doesn't allow that, and I don't we ever want to allow
arbitrary nested inode locking across superblocks. Hence I can't
think of a reason why it's a problem...

FWIW - gfs2 does multiple glock locking similar to XFS inode locking
- it sorts the locks in lock number order and the locks them all one
at a time...

> > > > > +EXPORT_SYMBOL(lock_two_nondirectories);
> > > > 
> > > > What makes this specific to non-directories?
> > > 
> > > See 
> > > 
> > > 	http://mid.gmane.org/<1372882356-14168-5-git-send-email-bfields@redhat.com>
> > > 
> > > The only caller outside ext4 is vfs_rename_other.
> > 
> > Ah, so we now mix two different lock ordering models for directories
> > vs non-directories.  i.e. lock_rename() enforces parent/child
> > relationships on the two directories being locked, but if there is
> > no ancestry, it doesn't order the inode locking at all.
> > 
> > So it seems that we can make up whatever ordering we want here,
> > as long as we use it everywhere for locking multiple inodes. What
> > other code locks multiple inodes?
> 
> The ext4 code is the only code I know of--but only I think because Al
> pointed out.  And obviously I overlooked the xfs case.  I'll try looking
> harder....

A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
any other obvious ones.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
  2013-07-10  1:26                 ` Jeff Layton
@ 2013-07-10 19:33                     ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10 19:33 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 09:26:25PM -0400, Jeff Layton wrote:
> On Tue, 9 Jul 2013 17:19:12 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Tue, Jul 09, 2013 at 04:51:01PM -0400, J. Bruce Fields wrote:
> > > On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > > > On Wed,  3 Jul 2013 16:12:36 -0400
> > > > "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > 
> > > > > From: "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > 
> > > > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > > > as data.
> > > > > 
> > ...
> > > > Isn't it possible we'll need to break a delegation on truncate()?
> > > 
> > > In the truncate case, the caller called break_lease, and in the
> > > ftruncate case it's called with a write open, and the open already broke
> > > any leases or delegations.
> > > 
> > > Might need a comment--could I get away with just this?:
> > > 
> > >  	mutex_lock(&dentry->d_inode->i_mutex);
> > > +	/* NULL is safe: any delegations have already been broken: */
> > >  	ret = notify_change(dentry, &newattrs, NULL);
> > >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > >  	return ret;
> > > 
> > > I also added something to the notify_change kerneldoc: "passing NULL is
> > > fine for callers holding the file open for write, as there can be no
> > > conflicting delegation in that case."
> > 
> > Another question is whether it's really worth dropping locks and
> > retrying in this case.
> > 
> > We could instead do the following.
> > 
> > --b.
> > 
> > commit 40a4fd613034cd3f242ec11e5ecd44f9a83ab39d
> > Author: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > Date:   Tue Sep 20 17:19:26 2011 -0400
> > 
> >     locks: break delegations on any attribute modification
> >     
> >     NFSv4 uses leases to guarantee that clients can cache metadata as well
> >     as data.
> >     
> >     Note unlike link, unlink, and rename, we don't bother dropping locks and
> >     retrying.  In the other cases we're holding a directory mutex, hence
> >     blocking operations (even lookups) on the same directory.  In this case
> >     we're holding only the i_mutex on this file, so the impact of an
> >     unresponsive client is limited to this file.
> >     
> >     Signed-off-by: J. Bruce Fields <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > 
> > diff --git a/fs/attr.c b/fs/attr.c
> > index 1449adb..a2c1d04 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
> >  	error = security_inode_setattr(dentry, attr);
> >  	if (error)
> >  		return error;
> > +	error = break_deleg_wait(inode);
> > +	if (error)
> > +		return error;
> >  
> >  	if (inode->i_op->setattr)
> >  		error = inode->i_op->setattr(dentry, attr);
> 
> 
> I guess the question is whether there are operations that require the
> i_mutex but that don't require the delegation recall to have finished.

Also if there's any risk that something on the delegation-return path
might take the i_mutex then we'd risk blocking the client's attempt to
return until the delegation timed out and got revoked.

In fact a CLAIM_DELEG_CUR open needs a lookup so probably runs into
exactly that problem.  OK, back to the more complicated solution.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 12/12] locks: break delegations on any attribute modification
@ 2013-07-10 19:33                     ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10 19:33 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, linux-nfs, linux-fsdevel, Mikulas Patocka,
	David Howells, Tyler Hicks, Dustin Kirkland

On Tue, Jul 09, 2013 at 09:26:25PM -0400, Jeff Layton wrote:
> On Tue, 9 Jul 2013 17:19:12 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > On Tue, Jul 09, 2013 at 04:51:01PM -0400, J. Bruce Fields wrote:
> > > On Tue, Jul 09, 2013 at 09:30:47AM -0400, Jeff Layton wrote:
> > > > On Wed,  3 Jul 2013 16:12:36 -0400
> > > > "J. Bruce Fields" <bfields@redhat.com> wrote:
> > > > 
> > > > > From: "J. Bruce Fields" <bfields@redhat.com>
> > > > > 
> > > > > NFSv4 uses leases to guarantee that clients can cache metadata as well
> > > > > as data.
> > > > > 
> > ...
> > > > Isn't it possible we'll need to break a delegation on truncate()?
> > > 
> > > In the truncate case, the caller called break_lease, and in the
> > > ftruncate case it's called with a write open, and the open already broke
> > > any leases or delegations.
> > > 
> > > Might need a comment--could I get away with just this?:
> > > 
> > >  	mutex_lock(&dentry->d_inode->i_mutex);
> > > +	/* NULL is safe: any delegations have already been broken: */
> > >  	ret = notify_change(dentry, &newattrs, NULL);
> > >  	mutex_unlock(&dentry->d_inode->i_mutex);
> > >  	return ret;
> > > 
> > > I also added something to the notify_change kerneldoc: "passing NULL is
> > > fine for callers holding the file open for write, as there can be no
> > > conflicting delegation in that case."
> > 
> > Another question is whether it's really worth dropping locks and
> > retrying in this case.
> > 
> > We could instead do the following.
> > 
> > --b.
> > 
> > commit 40a4fd613034cd3f242ec11e5ecd44f9a83ab39d
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date:   Tue Sep 20 17:19:26 2011 -0400
> > 
> >     locks: break delegations on any attribute modification
> >     
> >     NFSv4 uses leases to guarantee that clients can cache metadata as well
> >     as data.
> >     
> >     Note unlike link, unlink, and rename, we don't bother dropping locks and
> >     retrying.  In the other cases we're holding a directory mutex, hence
> >     blocking operations (even lookups) on the same directory.  In this case
> >     we're holding only the i_mutex on this file, so the impact of an
> >     unresponsive client is limited to this file.
> >     
> >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > 
> > diff --git a/fs/attr.c b/fs/attr.c
> > index 1449adb..a2c1d04 100644
> > --- a/fs/attr.c
> > +++ b/fs/attr.c
> > @@ -243,6 +243,9 @@ int notify_change(struct dentry * dentry, struct iattr * attr)
> >  	error = security_inode_setattr(dentry, attr);
> >  	if (error)
> >  		return error;
> > +	error = break_deleg_wait(inode);
> > +	if (error)
> > +		return error;
> >  
> >  	if (inode->i_op->setattr)
> >  		error = inode->i_op->setattr(dentry, attr);
> 
> 
> I guess the question is whether there are operations that require the
> i_mutex but that don't require the delegation recall to have finished.

Also if there's any risk that something on the delegation-return path
might take the i_mutex then we'd risk blocking the client's attempt to
return until the delegation timed out and got revoked.

In fact a CLAIM_DELEG_CUR open needs a lookup so probably runs into
exactly that problem.  OK, back to the more complicated solution.

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-10  3:38                   ` Dave Chinner
@ 2013-07-10 21:26                     ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10 21:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Wed, Jul 10, 2013 at 01:38:53PM +1000, Dave Chinner wrote:
> On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> > On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > > Sure. I'd prefer ordering by inode number, because then ordering is
> > > deterministic rather than being dependent on memory allocation
> > > results.  It makes forensic analysis of deadlocks and corruptions
> > > easier because you can look at on-disk structures and accurately
> > > predict locking behaviour and therefore determine the order of
> > > operations that should occur. With lock ordering determined by
> > > memory addresses, you can't easily predict the lock ordering two
> > > particular inodes might take from one operation to another.
> > 
> > Hm, OK, not having done this I don't have a good feeling for how
> > important that is, but I can take your word for it.
> > 
> > But the ext4 code actually originally used i_ino order and was changed
> > by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> > suggestion?:
> > 
> > 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> > 
> > 	"And the only sane order is comparing inode pointers, not inode
> > 	numbers like ext4 apparently does."
> 
> Interesting. What has worked for the last 20 years must be wrong if
> Linus says so ;)
> 
> > 
> > (Uh, I thought I also remembered some rationale but can't dig up the
> > email now.)
> 
> Probably duplicate inode numbers on inodes in different filesystems.
> But rename doesn't allow that, and I don't we ever want to allow
> arbitrary nested inode locking across superblocks. Hence I can't
> think of a reason why it's a problem...

I have some vague memory the argument was rather that inode numbers
could fail to be unique within a fs due to bugs, but I may be making
that up.  I've got no strong opinion here.

> FWIW - gfs2 does multiple glock locking similar to XFS inode locking
> - it sorts the locks in lock number order and the locks them all one
> at a time...
... 
> A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
> any other obvious ones.

OK.  I'll put off reposting till I've had a chance to look at those
cases more carefully....  Thanks for the review!

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-10 21:26                     ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-10 21:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Al Viro, linux-nfs, linux-fsdevel, Theodore Ts'o, Andreas Dilger

On Wed, Jul 10, 2013 at 01:38:53PM +1000, Dave Chinner wrote:
> On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> > On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > > Sure. I'd prefer ordering by inode number, because then ordering is
> > > deterministic rather than being dependent on memory allocation
> > > results.  It makes forensic analysis of deadlocks and corruptions
> > > easier because you can look at on-disk structures and accurately
> > > predict locking behaviour and therefore determine the order of
> > > operations that should occur. With lock ordering determined by
> > > memory addresses, you can't easily predict the lock ordering two
> > > particular inodes might take from one operation to another.
> > 
> > Hm, OK, not having done this I don't have a good feeling for how
> > important that is, but I can take your word for it.
> > 
> > But the ext4 code actually originally used i_ino order and was changed
> > by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> > suggestion?:
> > 
> > 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw@mail.gmail.com>
> > 
> > 	"And the only sane order is comparing inode pointers, not inode
> > 	numbers like ext4 apparently does."
> 
> Interesting. What has worked for the last 20 years must be wrong if
> Linus says so ;)
> 
> > 
> > (Uh, I thought I also remembered some rationale but can't dig up the
> > email now.)
> 
> Probably duplicate inode numbers on inodes in different filesystems.
> But rename doesn't allow that, and I don't we ever want to allow
> arbitrary nested inode locking across superblocks. Hence I can't
> think of a reason why it's a problem...

I have some vague memory the argument was rather that inode numbers
could fail to be unique within a fs due to bugs, but I may be making
that up.  I've got no strong opinion here.

> FWIW - gfs2 does multiple glock locking similar to XFS inode locking
> - it sorts the locks in lock number order and the locks them all one
> at a time...
... 
> A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
> any other obvious ones.

OK.  I'll put off reposting till I've had a chance to look at those
cases more carefully....  Thanks for the review!

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-10 21:26                     ` J. Bruce Fields
@ 2013-07-11 14:04                         ` Jeff Layton
  -1 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-11 14:04 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Dave Chinner, Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger

On Wed, 10 Jul 2013 17:26:21 -0400
"J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> On Wed, Jul 10, 2013 at 01:38:53PM +1000, Dave Chinner wrote:
> > On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > > > Sure. I'd prefer ordering by inode number, because then ordering is
> > > > deterministic rather than being dependent on memory allocation
> > > > results.  It makes forensic analysis of deadlocks and corruptions
> > > > easier because you can look at on-disk structures and accurately
> > > > predict locking behaviour and therefore determine the order of
> > > > operations that should occur. With lock ordering determined by
> > > > memory addresses, you can't easily predict the lock ordering two
> > > > particular inodes might take from one operation to another.
> > > 
> > > Hm, OK, not having done this I don't have a good feeling for how
> > > important that is, but I can take your word for it.
> > > 
> > > But the ext4 code actually originally used i_ino order and was changed
> > > by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> > > suggestion?:
> > > 
> > > 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> > > 
> > > 	"And the only sane order is comparing inode pointers, not inode
> > > 	numbers like ext4 apparently does."
> > 
> > Interesting. What has worked for the last 20 years must be wrong if
> > Linus says so ;)
> > 
> > > 
> > > (Uh, I thought I also remembered some rationale but can't dig up the
> > > email now.)
> > 
> > Probably duplicate inode numbers on inodes in different filesystems.
> > But rename doesn't allow that, and I don't we ever want to allow
> > arbitrary nested inode locking across superblocks. Hence I can't
> > think of a reason why it's a problem...
> 
> I have some vague memory the argument was rather that inode numbers
> could fail to be unique within a fs due to bugs, but I may be making
> that up.  I've got no strong opinion here.
> 

There are also legitimate cases where inode numbers can collide,
particularly on network filesystems. That's one of the main reasons we
have iget5_locked().

One possibility might be to order by i_ino first, and then fall back to
using the inode pointer value if they are equal.

> > FWIW - gfs2 does multiple glock locking similar to XFS inode locking
> > - it sorts the locks in lock number order and the locks them all one
> > at a time...
> ... 
> > A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
> > any other obvious ones.
> 
> OK.  I'll put off reposting till I've had a chance to look at those
> cases more carefully....  Thanks for the review!
> 
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Layton <jlayton-vpEMnDpepFuMZCB2o+C8xQ@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-11 14:04                         ` Jeff Layton
  0 siblings, 0 replies; 83+ messages in thread
From: Jeff Layton @ 2013-07-11 14:04 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Dave Chinner, Al Viro, linux-nfs, linux-fsdevel,
	Theodore Ts'o, Andreas Dilger

On Wed, 10 Jul 2013 17:26:21 -0400
"J. Bruce Fields" <bfields@redhat.com> wrote:

> On Wed, Jul 10, 2013 at 01:38:53PM +1000, Dave Chinner wrote:
> > On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> > > On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > > > Sure. I'd prefer ordering by inode number, because then ordering is
> > > > deterministic rather than being dependent on memory allocation
> > > > results.  It makes forensic analysis of deadlocks and corruptions
> > > > easier because you can look at on-disk structures and accurately
> > > > predict locking behaviour and therefore determine the order of
> > > > operations that should occur. With lock ordering determined by
> > > > memory addresses, you can't easily predict the lock ordering two
> > > > particular inodes might take from one operation to another.
> > > 
> > > Hm, OK, not having done this I don't have a good feeling for how
> > > important that is, but I can take your word for it.
> > > 
> > > But the ext4 code actually originally used i_ino order and was changed
> > > by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> > > suggestion?:
> > > 
> > > 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw@mail.gmail.com>
> > > 
> > > 	"And the only sane order is comparing inode pointers, not inode
> > > 	numbers like ext4 apparently does."
> > 
> > Interesting. What has worked for the last 20 years must be wrong if
> > Linus says so ;)
> > 
> > > 
> > > (Uh, I thought I also remembered some rationale but can't dig up the
> > > email now.)
> > 
> > Probably duplicate inode numbers on inodes in different filesystems.
> > But rename doesn't allow that, and I don't we ever want to allow
> > arbitrary nested inode locking across superblocks. Hence I can't
> > think of a reason why it's a problem...
> 
> I have some vague memory the argument was rather that inode numbers
> could fail to be unique within a fs due to bugs, but I may be making
> that up.  I've got no strong opinion here.
> 

There are also legitimate cases where inode numbers can collide,
particularly on network filesystems. That's one of the main reasons we
have iget5_locked().

One possibility might be to order by i_ino first, and then fall back to
using the inode pointer value if they are equal.

> > FWIW - gfs2 does multiple glock locking similar to XFS inode locking
> > - it sorts the locks in lock number order and the locks them all one
> > at a time...
> ... 
> > A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
> > any other obvious ones.
> 
> OK.  I'll put off reposting till I've had a chance to look at those
> cases more carefully....  Thanks for the review!
> 
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
  2013-07-11 14:04                         ` Jeff Layton
@ 2013-07-12 22:07                             ` J. Bruce Fields
  -1 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-12 22:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Dave Chinner, Al Viro, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Andreas Dilger, swhiteho-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Jul 11, 2013 at 10:04:06AM -0400, Jeff Layton wrote:
> On Wed, 10 Jul 2013 17:26:21 -0400
> "J. Bruce Fields" <bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Wed, Jul 10, 2013 at 01:38:53PM +1000, Dave Chinner wrote:
> > > On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > > > > Sure. I'd prefer ordering by inode number, because then ordering is
> > > > > deterministic rather than being dependent on memory allocation
> > > > > results.  It makes forensic analysis of deadlocks and corruptions
> > > > > easier because you can look at on-disk structures and accurately
> > > > > predict locking behaviour and therefore determine the order of
> > > > > operations that should occur. With lock ordering determined by
> > > > > memory addresses, you can't easily predict the lock ordering two
> > > > > particular inodes might take from one operation to another.
> > > > 
> > > > Hm, OK, not having done this I don't have a good feeling for how
> > > > important that is, but I can take your word for it.
> > > > 
> > > > But the ext4 code actually originally used i_ino order and was changed
> > > > by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> > > > suggestion?:
> > > > 
> > > > 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
> > > > 
> > > > 	"And the only sane order is comparing inode pointers, not inode
> > > > 	numbers like ext4 apparently does."
> > > 
> > > Interesting. What has worked for the last 20 years must be wrong if
> > > Linus says so ;)
> > > 
> > > > 
> > > > (Uh, I thought I also remembered some rationale but can't dig up the
> > > > email now.)
> > > 
> > > Probably duplicate inode numbers on inodes in different filesystems.
> > > But rename doesn't allow that, and I don't we ever want to allow
> > > arbitrary nested inode locking across superblocks. Hence I can't
> > > think of a reason why it's a problem...
> > 
> > I have some vague memory the argument was rather that inode numbers
> > could fail to be unique within a fs due to bugs, but I may be making
> > that up.  I've got no strong opinion here.
> > 
> 
> There are also legitimate cases where inode numbers can collide,
> particularly on network filesystems. That's one of the main reasons we
> have iget5_locked().
> 
> One possibility might be to order by i_ino first, and then fall back to
> using the inode pointer value if they are equal.

As long as no one ever modifies i_ino.  Which I'd think would be a
shooting offense.  But it sure looks like fuse allows this--see
fuse_do_getattr->fuse_change_attributes->fuse_change_attributes_common.

Maybe I'm misunderstanding....

As long as there's a chance filesystems (even if only due to bugs) could
mess with this sort of guarantee I'm really inclined to stick with the
obviously-well-defined pointer ordering even if it means giving up the
determinism Dave wants.  Argh.

> > > FWIW - gfs2 does multiple glock locking similar to XFS inode locking
> > > - it sorts the locks in lock number order and the locks them all one
> > > at a time...

Taking a look--I don't think I'm going to begin to understand how that's
used in any reasonable amount of time.  Cc'ing Steve in case he can.

> > > A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
> > > any other obvious ones.

Which isn't bothering with consistent lock ordering because (says a
comment) its only called after taking the vfs locks.  Which looks
correct--the only callers are in link, unlink, and rmdir methods.  And a
similar lock_3_inodes is called from the rename method.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code
@ 2013-07-12 22:07                             ` J. Bruce Fields
  0 siblings, 0 replies; 83+ messages in thread
From: J. Bruce Fields @ 2013-07-12 22:07 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Dave Chinner, Al Viro, linux-nfs, linux-fsdevel,
	Theodore Ts'o, Andreas Dilger, swhiteho

On Thu, Jul 11, 2013 at 10:04:06AM -0400, Jeff Layton wrote:
> On Wed, 10 Jul 2013 17:26:21 -0400
> "J. Bruce Fields" <bfields@redhat.com> wrote:
> 
> > On Wed, Jul 10, 2013 at 01:38:53PM +1000, Dave Chinner wrote:
> > > On Tue, Jul 09, 2013 at 10:40:59PM -0400, J. Bruce Fields wrote:
> > > > On Wed, Jul 10, 2013 at 12:09:21PM +1000, Dave Chinner wrote:
> > > > > Sure. I'd prefer ordering by inode number, because then ordering is
> > > > > deterministic rather than being dependent on memory allocation
> > > > > results.  It makes forensic analysis of deadlocks and corruptions
> > > > > easier because you can look at on-disk structures and accurately
> > > > > predict locking behaviour and therefore determine the order of
> > > > > operations that should occur. With lock ordering determined by
> > > > > memory addresses, you can't easily predict the lock ordering two
> > > > > particular inodes might take from one operation to another.
> > > > 
> > > > Hm, OK, not having done this I don't have a good feeling for how
> > > > important that is, but I can take your word for it.
> > > > 
> > > > But the ext4 code actually originally used i_ino order and was changed
> > > > by 03bd8b9b896c8e "ext4: move_extent code cleanup", possibly on Linus's
> > > > suggestion?:
> > > > 
> > > > 	http://mid.gmane.org/<CA+55aFwdh_QWG-R2FQ71kDXiNYZ04qPANBsY_PssVUwEBH4uSw@mail.gmail.com>
> > > > 
> > > > 	"And the only sane order is comparing inode pointers, not inode
> > > > 	numbers like ext4 apparently does."
> > > 
> > > Interesting. What has worked for the last 20 years must be wrong if
> > > Linus says so ;)
> > > 
> > > > 
> > > > (Uh, I thought I also remembered some rationale but can't dig up the
> > > > email now.)
> > > 
> > > Probably duplicate inode numbers on inodes in different filesystems.
> > > But rename doesn't allow that, and I don't we ever want to allow
> > > arbitrary nested inode locking across superblocks. Hence I can't
> > > think of a reason why it's a problem...
> > 
> > I have some vague memory the argument was rather that inode numbers
> > could fail to be unique within a fs due to bugs, but I may be making
> > that up.  I've got no strong opinion here.
> > 
> 
> There are also legitimate cases where inode numbers can collide,
> particularly on network filesystems. That's one of the main reasons we
> have iget5_locked().
> 
> One possibility might be to order by i_ino first, and then fall back to
> using the inode pointer value if they are equal.

As long as no one ever modifies i_ino.  Which I'd think would be a
shooting offense.  But it sure looks like fuse allows this--see
fuse_do_getattr->fuse_change_attributes->fuse_change_attributes_common.

Maybe I'm misunderstanding....

As long as there's a chance filesystems (even if only due to bugs) could
mess with this sort of guarantee I'm really inclined to stick with the
obviously-well-defined pointer ordering even if it means giving up the
determinism Dave wants.  Argh.

> > > FWIW - gfs2 does multiple glock locking similar to XFS inode locking
> > > - it sorts the locks in lock number order and the locks them all one
> > > at a time...

Taking a look--I don't think I'm going to begin to understand how that's
used in any reasonable amount of time.  Cc'ing Steve in case he can.

> > > A quick grep shows lock_2_inodes() in fs/ubifs/dir.c. I don't see
> > > any other obvious ones.

Which isn't bothering with consistent lock ordering because (says a
comment) its only called after taking the vfs locks.  Which looks
correct--the only callers are in link, unlink, and rmdir methods.  And a
similar lock_3_inodes is called from the rename method.

--b.

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2013-07-12 22:07 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-03 20:12 [PATCH 00/12] Implement NFSv4 delegations, take 8 J. Bruce Fields
2013-07-03 20:12 ` [PATCH 01/12] vfs: pull ext4's double-i_mutex-locking into common code J. Bruce Fields
     [not found]   ` <1372882356-14168-2-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 10:49     ` Jeff Layton
2013-07-09 10:49       ` Jeff Layton
2013-07-09 15:48       ` Theodore Ts'o
2013-07-09 22:04     ` Dave Chinner
2013-07-09 22:04       ` Dave Chinner
2013-07-10  0:21       ` J. Bruce Fields
2013-07-10  0:21         ` J. Bruce Fields
     [not found]         ` <20130710002120.GM32574-spRCxval1Z7TsXDwO4sDpg@public.gmane.org>
2013-07-10  2:09           ` Dave Chinner
2013-07-10  2:09             ` Dave Chinner
2013-07-10  2:40             ` J. Bruce Fields
     [not found]               ` <20130710024059.GN32574-spRCxval1Z7TsXDwO4sDpg@public.gmane.org>
2013-07-10  3:38                 ` Dave Chinner
2013-07-10  3:38                   ` Dave Chinner
2013-07-10 21:26                   ` J. Bruce Fields
2013-07-10 21:26                     ` J. Bruce Fields
     [not found]                     ` <20130710212620.GB24548-spRCxval1Z7TsXDwO4sDpg@public.gmane.org>
2013-07-11 14:04                       ` Jeff Layton
2013-07-11 14:04                         ` Jeff Layton
     [not found]                         ` <20130711100406.21b08420-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2013-07-12 22:07                           ` J. Bruce Fields
2013-07-12 22:07                             ` J. Bruce Fields
2013-07-03 20:12 ` [PATCH 02/12] vfs: don't use PARENT/CHILD lock classes for non-directories J. Bruce Fields
     [not found]   ` <1372882356-14168-3-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 10:50     ` Jeff Layton
2013-07-09 10:50       ` Jeff Layton
     [not found] ` <1372882356-14168-1-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-03 20:12   ` [PATCH 03/12] vfs: rename I_MUTEX_QUOTA now that it's not used for quotas J. Bruce Fields
2013-07-03 20:12     ` J. Bruce Fields
     [not found]     ` <1372882356-14168-4-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 10:54       ` Jeff Layton
2013-07-09 10:54         ` Jeff Layton
2013-07-09 14:26         ` J. Bruce Fields
2013-07-09 14:31           ` Jeff Layton
2013-07-03 20:12   ` [PATCH 04/12] vfs: take i_mutex on renamed file J. Bruce Fields
2013-07-03 20:12     ` J. Bruce Fields
     [not found]     ` <1372882356-14168-5-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 10:59       ` Jeff Layton
2013-07-09 10:59         ` Jeff Layton
2013-07-03 20:12   ` [PATCH 05/12] locks: introduce new FL_DELEG lock flag J. Bruce Fields
2013-07-03 20:12     ` J. Bruce Fields
     [not found]     ` <1372882356-14168-6-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 11:00       ` Jeff Layton
2013-07-09 11:00         ` Jeff Layton
2013-07-03 20:12 ` [PATCH 06/12] locks: implement delegations J. Bruce Fields
     [not found]   ` <1372882356-14168-7-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 12:23     ` Jeff Layton
2013-07-09 12:23       ` Jeff Layton
     [not found]       ` <20130709082300.206bf176-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2013-07-09 14:41         ` J. Bruce Fields
2013-07-09 14:41           ` J. Bruce Fields
2013-07-03 20:12 ` [PATCH 07/12] namei: minor vfs_unlink cleanup J. Bruce Fields
     [not found]   ` <1372882356-14168-8-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 12:50     ` Jeff Layton
2013-07-09 12:50       ` Jeff Layton
2013-07-03 20:12 ` [PATCH 08/12] locks: break delegations on unlink J. Bruce Fields
     [not found]   ` <1372882356-14168-9-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 13:05     ` Jeff Layton
2013-07-09 13:05       ` Jeff Layton
     [not found]       ` <20130709090506.71c96841-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2013-07-09 13:07         ` Jeff Layton
2013-07-09 13:07           ` Jeff Layton
2013-07-09 15:58         ` J. Bruce Fields
2013-07-09 15:58           ` J. Bruce Fields
2013-07-09 16:02           ` Jeff Layton
2013-07-09 19:29         ` J. Bruce Fields
2013-07-09 19:29           ` J. Bruce Fields
2013-07-03 20:12 ` [PATCH 09/12] locks: helper functions for delegation breaking J. Bruce Fields
     [not found]   ` <1372882356-14168-10-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 13:09     ` Jeff Layton
2013-07-09 13:09       ` Jeff Layton
2013-07-09 19:31       ` J. Bruce Fields
2013-07-09 19:37         ` Jeff Layton
2013-07-09 13:23   ` Jeff Layton
2013-07-09 19:38     ` J. Bruce Fields
2013-07-09 20:28       ` Jeff Layton
2013-07-03 20:12 ` [PATCH 10/12] locks: break delegations on rename J. Bruce Fields
     [not found]   ` <1372882356-14168-11-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 13:14     ` Jeff Layton
2013-07-09 13:14       ` Jeff Layton
2013-07-03 20:12 ` [PATCH 11/12] locks: break delegations on link J. Bruce Fields
     [not found]   ` <1372882356-14168-12-git-send-email-bfields-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-07-09 13:16     ` Jeff Layton
2013-07-09 13:16       ` Jeff Layton
     [not found]       ` <20130709091617.1c175da4-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2013-07-09 20:41         ` J. Bruce Fields
2013-07-09 20:41           ` J. Bruce Fields
2013-07-03 20:12 ` [PATCH 12/12] locks: break delegations on any attribute modification J. Bruce Fields
2013-07-09 13:30   ` Jeff Layton
     [not found]     ` <20130709093047.0096f061-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2013-07-09 20:51       ` J. Bruce Fields
2013-07-09 20:51         ` J. Bruce Fields
     [not found]         ` <20130709205101.GK32574-spRCxval1Z7TsXDwO4sDpg@public.gmane.org>
2013-07-09 21:19           ` J. Bruce Fields
2013-07-09 21:19             ` J. Bruce Fields
     [not found]             ` <20130709211911.GL32574-spRCxval1Z7TsXDwO4sDpg@public.gmane.org>
2013-07-10  1:26               ` Jeff Layton
2013-07-10  1:26                 ` Jeff Layton
     [not found]                 ` <20130709212625.7fdfc6e1-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2013-07-10 19:33                   ` J. Bruce Fields
2013-07-10 19:33                     ` J. Bruce Fields
2013-07-09 23:57           ` Jeff Layton
2013-07-09 23:57             ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.