[PATCH 0/6] fs: Fix directory corruption when moving directories

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] fs: Fix directory corruption when moving directories
@ 2023-05-25 10:16 Jan Kara
  2023-05-25 10:16 ` [PATCH 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara

Hello,

this patch set fixes a problem with cross directory renames originally reported
in [1]. To quickly sum it up some filesystems (so far we know at least about
ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the
directory when it is being renamed into another directory. This is because we
need to update the parent pointer in the directory in that case and if that
races with other operation on the directory (in particular a conversion from
one directory format into another), bad things can happen.

So far we've done the locking in the filesystem code but recently Darrick
pointed out [2] that we've missed the RENAME_EXCHANGE case in our ext4 fix.
That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix
regular files and directories and proper lock ordering is not achievable in the
filesystems alone.

This patch set adds locking into vfs_rename() so that not only parent
directories but also moved inodes (regardless whether they are directories or
not) are locked when calling into the filesystem.

								Honza

[1] https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3
[2] https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/6] ext4: Remove ext4 locking of moved directory
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
@ 2023-05-25 10:16 ` Jan Kara
  2023-05-25 10:16 ` [PATCH 2/6] Revert "udf: Protect rename against modification of moved directory" Jan Kara
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara, stable

Remove locking of moved directory in ext4_rename2(). We will take care
of it in VFS instead. This effectively reverts commit 0813299c586b
("ext4: Fix possible corruption when moving a directory") and followup
fixes.

CC: Ted Tso <tytso@mit.edu>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/namei.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 45b579805c95..0caf6c730ce3 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3834,19 +3834,10 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			return retval;
 	}
 
-	/*
-	 * We need to protect against old.inode directory getting converted
-	 * from inline directory format into a normal one.
-	 */
-	if (S_ISDIR(old.inode->i_mode))
-		inode_lock_nested(old.inode, I_MUTEX_NONDIR2);
-
 	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de,
 				 &old.inlined);
-	if (IS_ERR(old.bh)) {
-		retval = PTR_ERR(old.bh);
-		goto unlock_moved_dir;
-	}
+	if (IS_ERR(old.bh))
+		return PTR_ERR(old.bh);
 
 	/*
 	 *  Check for inode number is _not_ due to possible IO errors.
@@ -4043,10 +4034,6 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	brelse(old.bh);
 	brelse(new.bh);
 
-unlock_moved_dir:
-	if (S_ISDIR(old.inode->i_mode))
-		inode_unlock(old.inode);
-
 	return retval;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/6] Revert "udf: Protect rename against modification of moved directory"
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
  2023-05-25 10:16 ` [PATCH 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
@ 2023-05-25 10:16 ` Jan Kara
  2023-05-25 10:16 ` [PATCH 3/6] Revert "f2fs: fix potential corruption when moving a directory" Jan Kara
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara, stable

This reverts commit f950fd0529130a617b3da526da9fb6a896ce87c2. The
locking is going to be provided by vfs_rename() in the following
patches.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/udf/namei.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index fd20423d3ed2..fd29a66e7241 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -793,11 +793,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			if (!empty_dir(new_inode))
 				goto out_oiter;
 		}
-		/*
-		 * We need to protect against old_inode getting converted from
-		 * ICB to normal directory.
-		 */
-		inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
 		retval = udf_fiiter_find_entry(old_inode, &dotdot_name,
 					       &diriter);
 		if (retval == -ENOENT) {
@@ -806,10 +801,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 				old_inode->i_ino);
 			retval = -EFSCORRUPTED;
 		}
-		if (retval) {
-			inode_unlock(old_inode);
+		if (retval)
 			goto out_oiter;
-		}
 		has_diriter = true;
 		tloc = lelb_to_cpu(diriter.fi.icb.extLocation);
 		if (udf_get_lb_pblock(old_inode->i_sb, &tloc, 0) !=
@@ -889,7 +882,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			       udf_dir_entry_len(&diriter.fi));
 		udf_fiiter_write_fi(&diriter, NULL);
 		udf_fiiter_release(&diriter);
-		inode_unlock(old_inode);
 
 		inode_dec_link_count(old_dir);
 		if (new_inode)
@@ -901,10 +893,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	}
 	return 0;
 out_oiter:
-	if (has_diriter) {
+	if (has_diriter)
 		udf_fiiter_release(&diriter);
-		inode_unlock(old_inode);
-	}
 	udf_fiiter_release(&oiter);
 
 	return retval;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/6] Revert "f2fs: fix potential corruption when moving a directory"
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
  2023-05-25 10:16 ` [PATCH 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
  2023-05-25 10:16 ` [PATCH 2/6] Revert "udf: Protect rename against modification of moved directory" Jan Kara
@ 2023-05-25 10:16 ` Jan Kara
  2023-05-25 10:16 ` [PATCH 4/6] fs: Establish locking order for unrelated directories Jan Kara
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara, stable

This reverts commit d94772154e524b329a168678836745d2773a6e02. The
locking is going to be provided by VFS.

CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/f2fs/namei.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 77a71276ecb1..ad597b417fea 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -995,20 +995,12 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			goto out;
 	}
 
-	/*
-	 * Copied from ext4_rename: we need to protect against old.inode
-	 * directory getting converted from inline directory format into
-	 * a normal one.
-	 */
-	if (S_ISDIR(old_inode->i_mode))
-		inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
-
 	err = -ENOENT;
 	old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
 	if (!old_entry) {
 		if (IS_ERR(old_page))
 			err = PTR_ERR(old_page);
-		goto out_unlock_old;
+		goto out;
 	}
 
 	if (S_ISDIR(old_inode->i_mode)) {
@@ -1116,9 +1108,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 
 	f2fs_unlock_op(sbi);
 
-	if (S_ISDIR(old_inode->i_mode))
-		inode_unlock(old_inode);
-
 	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
 		f2fs_sync_fs(sbi->sb, 1);
 
@@ -1133,9 +1122,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		f2fs_put_page(old_dir_page, 0);
 out_old:
 	f2fs_put_page(old_page, 0);
-out_unlock_old:
-	if (S_ISDIR(old_inode->i_mode))
-		inode_unlock(old_inode);
 out:
 	iput(whiteout);
 	return err;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/6] fs: Establish locking order for unrelated directories
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (2 preceding siblings ...)
  2023-05-25 10:16 ` [PATCH 3/6] Revert "f2fs: fix potential corruption when moving a directory" Jan Kara
@ 2023-05-25 10:16 ` Jan Kara
  2023-05-26  9:45   ` Christian Brauner
  2023-05-25 10:16 ` [PATCH 5/6] fs: Lock moved directories Jan Kara
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara, stable

Currently the locking order of inode locks for directories that are not
in ancestor relationship is not defined because all operations that
needed to lock two directories like this were serialized by
sb->s_vfs_rename_mutex. However some filesystems need to lock two
subdirectories for RENAME_EXCHANGE operations and for this we need the
locking order established even for two tree-unrelated directories.
Provide a helper function lock_two_inodes() that establishes lock
ordering for any two inodes and use it in lock_two_directories().

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c    | 34 ++++++++++++++++++++++++++++++++++
 fs/internal.h |  2 ++
 fs/namei.c    |  4 ++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 577799b7855f..2015fa50d34a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1103,6 +1103,40 @@ void discard_new_inode(struct inode *inode)
 }
 EXPORT_SYMBOL(discard_new_inode);
 
+/**
+ * lock_two_inodes - lock two inodes (may be regular files but also dirs)
+ *
+ * Lock any non-NULL argument. The caller must make sure that if he is passing
+ * in two directories, one is not ancestor of the other.  Zero, one or two
+ * objects may be locked by this function.
+ *
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ * @subclass1: inode lock subclass for the first lock obtained
+ * @subclass2: inode lock subclass for the second lock obtained
+ */
+void lock_two_inodes(struct inode *inode1, struct inode *inode2,
+		     unsigned subclass1, unsigned subclass2)
+{
+	if (!inode1 || !inode2)
+		goto lock;
+
+	/*
+	 * If one object is directory and the other is not, we must make sure
+	 * to lock directory first as the other object may be its child.
+	 */
+	if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
+		if (inode1 > inode2)
+			swap(inode1, inode2);
+	} else if (!S_ISDIR(inode1->i_mode))
+		swap(inode1, inode2);
+lock:
+	if (inode1)
+		inode_lock_nested(inode1, subclass1);
+	if (inode2 && inode2 != inode1)
+		inode_lock_nested(inode2, subclass2);
+}
+
 /**
  * lock_two_nondirectories - take two i_mutexes on non-directory objects
  *
diff --git a/fs/internal.h b/fs/internal.h
index bd3b2810a36b..377030a50aca 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -152,6 +152,8 @@ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
 int dentry_needs_remove_privs(struct mnt_idmap *, struct dentry *dentry);
 bool in_group_or_capable(struct mnt_idmap *idmap,
 			 const struct inode *inode, vfsgid_t vfsgid);
+void lock_two_inodes(struct inode *inode1, struct inode *inode2,
+		     unsigned subclass1, unsigned subclass2);
 
 /*
  * fs-writeback.c
diff --git a/fs/namei.c b/fs/namei.c
index e4fe0879ae55..148570aabe74 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3028,8 +3028,8 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
 		return p;
 	}
 
-	inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
-	inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
+	lock_two_inodes(p1->d_inode, p2->d_inode,
+			I_MUTEX_PARENT, I_MUTEX_PARENT2);
 	return NULL;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/6] fs: Lock moved directories
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (3 preceding siblings ...)
  2023-05-25 10:16 ` [PATCH 4/6] fs: Establish locking order for unrelated directories Jan Kara
@ 2023-05-25 10:16 ` Jan Kara
  2023-05-25 10:16 ` [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
  2023-05-26 15:58 ` [PATCH 0/6] fs: Fix directory corruption when moving directories Christian Brauner
  6 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara, stable

When a directory is moved to a different directory, some filesystems
(udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to
update their pointer to the parent and this must not race with other
operations on the directory. Lock the directories when they are moved.
Although not all filesystems need this locking, we perform it in
vfs_rename() because getting the lock ordering right is really difficult
and we don't want to expose these locking details to filesystems.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 .../filesystems/directory-locking.rst         | 26 ++++++++++---------
 fs/namei.c                                    | 22 ++++++++++------
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/Documentation/filesystems/directory-locking.rst b/Documentation/filesystems/directory-locking.rst
index 504ba940c36c..dccd61c7c5c3 100644
--- a/Documentation/filesystems/directory-locking.rst
+++ b/Documentation/filesystems/directory-locking.rst
@@ -22,12 +22,11 @@ exclusive.
 3) object removal.  Locking rules: caller locks parent, finds victim,
 locks victim and calls the method.  Locks are exclusive.
 
-4) rename() that is _not_ cross-directory.  Locking rules: caller locks
-the parent and finds source and target.  In case of exchange (with
-RENAME_EXCHANGE in flags argument) lock both.  In any case,
-if the target already exists, lock it.  If the source is a non-directory,
-lock it.  If we need to lock both, lock them in inode pointer order.
-Then call the method.  All locks are exclusive.
+4) rename() that is _not_ cross-directory.  Locking rules: caller locks the
+parent and finds source and target.  We lock both (provided they exist).  If we
+need to lock two inodes of different type (dir vs non-dir), we lock directory
+first.  If we need to lock two inodes of the same type, lock them in inode
+pointer order.  Then call the method.  All locks are exclusive.
 NB: we might get away with locking the source (and target in exchange
 case) shared.
 
@@ -44,15 +43,17 @@ All locks are exclusive.
 rules:
 
 	* lock the filesystem
-	* lock parents in "ancestors first" order.
+	* lock parents in "ancestors first" order. If one is not ancestor of
+	  the other, lock them in inode pointer order.
 	* find source and target.
 	* if old parent is equal to or is a descendent of target
 	  fail with -ENOTEMPTY
 	* if new parent is equal to or is a descendent of source
 	  fail with -ELOOP
-	* If it's an exchange, lock both the source and the target.
-	* If the target exists, lock it.  If the source is a non-directory,
-	  lock it.  If we need to lock both, do so in inode pointer order.
+	* Lock both the source and the target provided they exist. If we
+	  need to lock two inodes of different type (dir vs non-dir), we lock
+	  the directory first. If we need to lock two inodes of the same type,
+	  lock them in inode pointer order.
 	* call the method.
 
 All ->i_rwsem are taken exclusive.  Again, we might get away with locking
@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
 
 Proof:
 
-	First of all, at any moment we have a partial ordering of the
-	objects - A < B iff A is an ancestor of B.
+	First of all, at any moment we have a linear ordering of the
+	objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
+        of A and ptr(A) < ptr(B)).
 
 	That ordering can change.  However, the following is true:
 
diff --git a/fs/namei.c b/fs/namei.c
index 148570aabe74..6a5e26a529e1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4731,7 +4731,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
  *	   story.
  *	c) we have to lock _four_ objects - parents and victim (if it exists),
- *	   and source (if it is not a directory).
+ *	   and source.
  *	   And that - after we got ->i_mutex on parents (until then we don't know
  *	   whether the target exists).  Solution: try to be smart with locking
  *	   order for inodes.  We rely on the fact that tree topology may change
@@ -4815,10 +4815,16 @@ int vfs_rename(struct renamedata *rd)
 
 	take_dentry_name_snapshot(&old_name, old_dentry);
 	dget(new_dentry);
-	if (!is_dir || (flags & RENAME_EXCHANGE))
-		lock_two_nondirectories(source, target);
-	else if (target)
-		inode_lock(target);
+	/*
+	 * Lock all moved children. Moved directories may need to change parent
+	 * pointer so they need the lock to prevent against concurrent
+	 * directory changes moving parent pointer. For regular files we've
+	 * historically always done this. The lockdep locking subclasses are
+	 * somewhat arbitrary but RENAME_EXCHANGE in particular can swap
+	 * regular files and directories so it's difficult to tell which
+	 * subclasses to use.
+	 */
+	lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
 
 	error = -EPERM;
 	if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4866,9 +4872,9 @@ int vfs_rename(struct renamedata *rd)
 			d_exchange(old_dentry, new_dentry);
 	}
 out:
-	if (!is_dir || (flags & RENAME_EXCHANGE))
-		unlock_two_nondirectories(source, target);
-	else if (target)
+	if (source)
+		inode_unlock(source);
+	if (target)
 		inode_unlock(target);
 	dput(new_dentry);
 	if (!error) {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (4 preceding siblings ...)
  2023-05-25 10:16 ` [PATCH 5/6] fs: Lock moved directories Jan Kara
@ 2023-05-25 10:16 ` Jan Kara
  2023-05-26 12:13   ` Amir Goldstein
  2023-05-26 15:58 ` [PATCH 0/6] fs: Fix directory corruption when moving directories Christian Brauner
  6 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2023-05-25 10:16 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, Jan Kara

Currently lock_two_nondirectories() is skipping any passed directories.
After vfs_rename() uses lock_two_inodes(), all the remaining four users
of this function pass only regular files to it. So drop the somewhat
unusual "skip directory" logic and instead warn if anybody passes
directory to it. This also allows us to use lock_two_inodes() in
lock_two_nondirectories() to concentrate the lock ordering logic in less
places.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 2015fa50d34a..accf5125a049 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1140,7 +1140,7 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
 /**
  * lock_two_nondirectories - take two i_mutexes on non-directory objects
  *
- * Lock any non-NULL argument that is not a directory.
+ * Lock any non-NULL argument. Passed objects must not be directories.
  * Zero, one or two objects may be locked by this function.
  *
  * @inode1: first inode to lock
@@ -1148,13 +1148,9 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
  */
 void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
-	if (inode1 > inode2)
-		swap(inode1, inode2);
-
-	if (inode1 && !S_ISDIR(inode1->i_mode))
-		inode_lock(inode1);
-	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
-		inode_lock_nested(inode2, I_MUTEX_NONDIR2);
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
+	WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
+	lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
 }
 EXPORT_SYMBOL(lock_two_nondirectories);
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] fs: Establish locking order for unrelated directories
  2023-05-25 10:16 ` [PATCH 4/6] fs: Establish locking order for unrelated directories Jan Kara
@ 2023-05-26  9:45   ` Christian Brauner
  2023-05-29 12:41     ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2023-05-26  9:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-f2fs-devel, linux-xfs, stable

On Thu, May 25, 2023 at 12:16:10PM +0200, Jan Kara wrote:
> Currently the locking order of inode locks for directories that are not
> in ancestor relationship is not defined because all operations that
> needed to lock two directories like this were serialized by
> sb->s_vfs_rename_mutex. However some filesystems need to lock two
> subdirectories for RENAME_EXCHANGE operations and for this we need the
> locking order established even for two tree-unrelated directories.
> Provide a helper function lock_two_inodes() that establishes lock
> ordering for any two inodes and use it in lock_two_directories().
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/inode.c    | 34 ++++++++++++++++++++++++++++++++++
>  fs/internal.h |  2 ++
>  fs/namei.c    |  4 ++--
>  3 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 577799b7855f..2015fa50d34a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1103,6 +1103,40 @@ void discard_new_inode(struct inode *inode)
>  }
>  EXPORT_SYMBOL(discard_new_inode);
>  
> +/**
> + * lock_two_inodes - lock two inodes (may be regular files but also dirs)
> + *
> + * Lock any non-NULL argument. The caller must make sure that if he is passing
> + * in two directories, one is not ancestor of the other.  Zero, one or two
> + * objects may be locked by this function.
> + *
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + * @subclass1: inode lock subclass for the first lock obtained
> + * @subclass2: inode lock subclass for the second lock obtained
> + */
> +void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> +		     unsigned subclass1, unsigned subclass2)
> +{
> +	if (!inode1 || !inode2)
> +		goto lock;

Before this change in

lock_two_nondirectories(struct inode *inode1, struct inode *inode2)

the swap() would cause the non-NULL inode to always be locked with
I_MUTEX_NONDIR2. Now it can be either I_MUTEX_NORMAL or I_MUTEX_NONDIR2.
Is that change intentional?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes
  2023-05-25 10:16 ` [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
@ 2023-05-26 12:13   ` Amir Goldstein
  2023-05-29 12:42     ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: Amir Goldstein @ 2023-05-26 12:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs

On Thu, May 25, 2023 at 1:17 PM Jan Kara <jack@suse.cz> wrote:
>
> Currently lock_two_nondirectories() is skipping any passed directories.
> After vfs_rename() uses lock_two_inodes(), all the remaining four users
> of this function pass only regular files to it. So drop the somewhat
> unusual "skip directory" logic and instead warn if anybody passes
> directory to it. This also allows us to use lock_two_inodes() in
> lock_two_nondirectories() to concentrate the lock ordering logic in less
> places.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/inode.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 2015fa50d34a..accf5125a049 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1140,7 +1140,7 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
>  /**
>   * lock_two_nondirectories - take two i_mutexes on non-directory objects
>   *
> - * Lock any non-NULL argument that is not a directory.
> + * Lock any non-NULL argument. Passed objects must not be directories.
>   * Zero, one or two objects may be locked by this function.
>   *
>   * @inode1: first inode to lock
> @@ -1148,13 +1148,9 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
>   */
>  void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
>  {
> -       if (inode1 > inode2)
> -               swap(inode1, inode2);
> -
> -       if (inode1 && !S_ISDIR(inode1->i_mode))
> -               inode_lock(inode1);
> -       if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
> -               inode_lock_nested(inode2, I_MUTEX_NONDIR2);
> +       WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
> +       WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
> +       lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
>  }
>  EXPORT_SYMBOL(lock_two_nondirectories);
>

Need the same treatment to unlock_two_nondirectories() because now if
someone does pass a directory they will get a warning but also a leaked lock.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/6] fs: Fix directory corruption when moving directories
  2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (5 preceding siblings ...)
  2023-05-25 10:16 ` [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
@ 2023-05-26 15:58 ` Christian Brauner
  2023-05-31 14:09   ` Christian Brauner
  6 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2023-05-26 15:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-f2fs-devel, linux-xfs

On Thu, May 25, 2023 at 12:16:06PM +0200, Jan Kara wrote:
> Hello,
> 
> this patch set fixes a problem with cross directory renames originally reported
> in [1]. To quickly sum it up some filesystems (so far we know at least about
> ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the
> directory when it is being renamed into another directory. This is because we
> need to update the parent pointer in the directory in that case and if that
> races with other operation on the directory (in particular a conversion from
> one directory format into another), bad things can happen.
> 
> So far we've done the locking in the filesystem code but recently Darrick
> pointed out [2] that we've missed the RENAME_EXCHANGE case in our ext4 fix.
> That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix
> regular files and directories and proper lock ordering is not achievable in the
> filesystems alone.
> 
> This patch set adds locking into vfs_rename() so that not only parent
> directories but also moved inodes (regardless whether they are directories or
> not) are locked when calling into the filesystem.

This locking is trauma inducing.

So I was staring at this for a long time and the thing that bothered me
big time was that I couldn't easily figure out how we ended up with the
locking scheme that we have. So I went digging. Corrections and
additions very welcome...

Before 914a6e0ea12 ("Import 2.3.51pre1") locking for rename was based on
s_vfs_rename_sem and i_sem. For both within- and across-directory
renames s_vfs_rename_sem was acquired and the i_sem on the parent
directories was acquired but the target wasn't locked.

Then 914a6e0ea12 ("Import 2.3.51pre1") introduced an additional i_zombie
semaphore to protect against create, link, mknod, mkdir, unlink, and
rmdir on the target. So i_zombie had to be acquired during
vfs_rename_dir() on both parent and the victim but only if the source
was a directory. Back then RENAME_EXCHANGE didn't exist so if source was
a directory then target if it existed must've been a directory as well.

The original reasoning behind only locking the target if the source was
a directory was based on some filesystems not being able to deal with
opened but unlinked directories.

The i_zombie finally died in 1b3d7c93c6d ("[PATCH] (2.5.4) death of
->i_zombie") and a new locking scheme was introduced. The
s_vfs_rename_sem was now only used for across-directory renames. Instead
of having i_zombie and i_sem only i_sem was left. Now, both
vfs_rename_dir(/* if renaming directory */) and vfs_rename_other()
grabbed i_sem on both parents and on the target. So now the target was
always explicitly protected against a concurrent unlink or rmdir
as that would be done as part of the rename operation and that race
would just been awkward to allow afaict. Probably always was.

The locking of source however is a different story. This was introduced
in 6cedba8962f4 ("vfs: take i_mutex on renamed file") to prevent new
delegations from being acquired during rename, link, or unlink. So now
both source and target were locked. The target seemingly because it
would be unlinked in the filesystem's rename method and the source to
prevent new delegations from being acquired. So, since leases can only
be taken on regular files vfs_setlease()/generic_setlease() directories
were never considered for locking. So the lock never had to be acquired
for source directories.

So in any case, under the assumption that the broad strokes are correct
there seems to be no inherent reason why locking source and target if
they're directories couldn't be done if the ordering is well-defined.
Which is what originally made me hesitate. IOW, given my current
knowledge this seems ok.

Frankly, if we end up unconditionally locking source and target we're in
a better place from a pure maintainability perspective as far as I'm
concerned. Even if we end up taking the lock pointlessly for e.g., NFS
with RENAME_EXCHANGE. The last thing we need is more conditions on when
things are locked and why.

/me goes off into the weekend

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] fs: Establish locking order for unrelated directories
  2023-05-26  9:45   ` Christian Brauner
@ 2023-05-29 12:41     ` Jan Kara
  2023-05-30 12:42       ` Christian Brauner
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2023-05-29 12:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Al Viro, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4,
	linux-f2fs-devel, linux-xfs, stable

On Fri 26-05-23 11:45:15, Christian Brauner wrote:
> On Thu, May 25, 2023 at 12:16:10PM +0200, Jan Kara wrote:
> > Currently the locking order of inode locks for directories that are not
> > in ancestor relationship is not defined because all operations that
> > needed to lock two directories like this were serialized by
> > sb->s_vfs_rename_mutex. However some filesystems need to lock two
> > subdirectories for RENAME_EXCHANGE operations and for this we need the
> > locking order established even for two tree-unrelated directories.
> > Provide a helper function lock_two_inodes() that establishes lock
> > ordering for any two inodes and use it in lock_two_directories().
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/inode.c    | 34 ++++++++++++++++++++++++++++++++++
> >  fs/internal.h |  2 ++
> >  fs/namei.c    |  4 ++--
> >  3 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 577799b7855f..2015fa50d34a 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1103,6 +1103,40 @@ void discard_new_inode(struct inode *inode)
> >  }
> >  EXPORT_SYMBOL(discard_new_inode);
> >  
> > +/**
> > + * lock_two_inodes - lock two inodes (may be regular files but also dirs)
> > + *
> > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > + * in two directories, one is not ancestor of the other.  Zero, one or two
> > + * objects may be locked by this function.
> > + *
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + * @subclass1: inode lock subclass for the first lock obtained
> > + * @subclass2: inode lock subclass for the second lock obtained
> > + */
> > +void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> > +		     unsigned subclass1, unsigned subclass2)
> > +{
> > +	if (!inode1 || !inode2)
> > +		goto lock;
> 
> Before this change in
> 
> lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> 
> the swap() would cause the non-NULL inode to always be locked with
> I_MUTEX_NONDIR2. Now it can be either I_MUTEX_NORMAL or I_MUTEX_NONDIR2.
> Is that change intentional?

Kind of. I don't think we really care so I didn't bother to complicate the
code for this. If you think keeping the lockdep class consistent is worth
it, I can modify the patch...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes
  2023-05-26 12:13   ` Amir Goldstein
@ 2023-05-29 12:42     ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-05-29 12:42 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Al Viro, linux-fsdevel, Christian Brauner,
	Miklos Szeredi, Darrick J. Wong, Ted Tso, Jaegeuk Kim,
	linux-ext4, linux-f2fs-devel, linux-xfs

On Fri 26-05-23 15:13:06, Amir Goldstein wrote:
> On Thu, May 25, 2023 at 1:17 PM Jan Kara <jack@suse.cz> wrote:
> >
> > Currently lock_two_nondirectories() is skipping any passed directories.
> > After vfs_rename() uses lock_two_inodes(), all the remaining four users
> > of this function pass only regular files to it. So drop the somewhat
> > unusual "skip directory" logic and instead warn if anybody passes
> > directory to it. This also allows us to use lock_two_inodes() in
> > lock_two_nondirectories() to concentrate the lock ordering logic in less
> > places.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/inode.c | 12 ++++--------
> >  1 file changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 2015fa50d34a..accf5125a049 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1140,7 +1140,7 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> >  /**
> >   * lock_two_nondirectories - take two i_mutexes on non-directory objects
> >   *
> > - * Lock any non-NULL argument that is not a directory.
> > + * Lock any non-NULL argument. Passed objects must not be directories.
> >   * Zero, one or two objects may be locked by this function.
> >   *
> >   * @inode1: first inode to lock
> > @@ -1148,13 +1148,9 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> >   */
> >  void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> >  {
> > -       if (inode1 > inode2)
> > -               swap(inode1, inode2);
> > -
> > -       if (inode1 && !S_ISDIR(inode1->i_mode))
> > -               inode_lock(inode1);
> > -       if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
> > -               inode_lock_nested(inode2, I_MUTEX_NONDIR2);
> > +       WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
> > +       WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
> > +       lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
> >  }
> >  EXPORT_SYMBOL(lock_two_nondirectories);
> >
> 
> Need the same treatment to unlock_two_nondirectories() because now if
> someone does pass a directory they will get a warning but also a leaked lock.

Yes, probably that is good defensive programming. I'll update the patch.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] fs: Establish locking order for unrelated directories
  2023-05-29 12:41     ` Jan Kara
@ 2023-05-30 12:42       ` Christian Brauner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-05-30 12:42 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-f2fs-devel, linux-xfs, stable

On Mon, May 29, 2023 at 02:41:31PM +0200, Jan Kara wrote:
> On Fri 26-05-23 11:45:15, Christian Brauner wrote:
> > On Thu, May 25, 2023 at 12:16:10PM +0200, Jan Kara wrote:
> > > Currently the locking order of inode locks for directories that are not
> > > in ancestor relationship is not defined because all operations that
> > > needed to lock two directories like this were serialized by
> > > sb->s_vfs_rename_mutex. However some filesystems need to lock two
> > > subdirectories for RENAME_EXCHANGE operations and for this we need the
> > > locking order established even for two tree-unrelated directories.
> > > Provide a helper function lock_two_inodes() that establishes lock
> > > ordering for any two inodes and use it in lock_two_directories().
> > > 
> > > CC: stable@vger.kernel.org
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  fs/inode.c    | 34 ++++++++++++++++++++++++++++++++++
> > >  fs/internal.h |  2 ++
> > >  fs/namei.c    |  4 ++--
> > >  3 files changed, 38 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 577799b7855f..2015fa50d34a 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -1103,6 +1103,40 @@ void discard_new_inode(struct inode *inode)
> > >  }
> > >  EXPORT_SYMBOL(discard_new_inode);
> > >  
> > > +/**
> > > + * lock_two_inodes - lock two inodes (may be regular files but also dirs)
> > > + *
> > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > + * in two directories, one is not ancestor of the other.  Zero, one or two
> > > + * objects may be locked by this function.
> > > + *
> > > + * @inode1: first inode to lock
> > > + * @inode2: second inode to lock
> > > + * @subclass1: inode lock subclass for the first lock obtained
> > > + * @subclass2: inode lock subclass for the second lock obtained
> > > + */
> > > +void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> > > +		     unsigned subclass1, unsigned subclass2)
> > > +{
> > > +	if (!inode1 || !inode2)
> > > +		goto lock;
> > 
> > Before this change in
> > 
> > lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
> > 
> > the swap() would cause the non-NULL inode to always be locked with
> > I_MUTEX_NONDIR2. Now it can be either I_MUTEX_NORMAL or I_MUTEX_NONDIR2.
> > Is that change intentional?
> 
> Kind of. I don't think we really care so I didn't bother to complicate the
> code for this. If you think keeping the lockdep class consistent is worth
> it, I can modify the patch...

Either a short comment or consistent lockdep class would be nice. I know
it probably doesn't matter much but otherwise someone may end up
wondering whether that's ok or not.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/6] fs: Fix directory corruption when moving directories
  2023-05-26 15:58 ` [PATCH 0/6] fs: Fix directory corruption when moving directories Christian Brauner
@ 2023-05-31 14:09   ` Christian Brauner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-05-31 14:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-f2fs-devel, linux-xfs

On Fri, May 26, 2023 at 05:58:14PM +0200, Christian Brauner wrote:
> On Thu, May 25, 2023 at 12:16:06PM +0200, Jan Kara wrote:
> > Hello,
> > 
> > this patch set fixes a problem with cross directory renames originally reported
> > in [1]. To quickly sum it up some filesystems (so far we know at least about
> > ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the
> > directory when it is being renamed into another directory. This is because we
> > need to update the parent pointer in the directory in that case and if that
> > races with other operation on the directory (in particular a conversion from
> > one directory format into another), bad things can happen.
> > 
> > So far we've done the locking in the filesystem code but recently Darrick
> > pointed out [2] that we've missed the RENAME_EXCHANGE case in our ext4 fix.
> > That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix
> > regular files and directories and proper lock ordering is not achievable in the
> > filesystems alone.
> > 
> > This patch set adds locking into vfs_rename() so that not only parent
> > directories but also moved inodes (regardless whether they are directories or
> > not) are locked when calling into the filesystem.
> 
> This locking is trauma inducing.
> 
> So I was staring at this for a long time and the thing that bothered me
> big time was that I couldn't easily figure out how we ended up with the
> locking scheme that we have. So I went digging. Corrections and
> additions very welcome...

This kept being stuck in the back of my mind so a few
additions/corrections.

> 
> Before 914a6e0ea12 ("Import 2.3.51pre1") locking for rename was based on
> s_vfs_rename_sem and i_sem. For both within- and across-directory
> renames s_vfs_rename_sem was acquired and the i_sem on the parent
> directories was acquired but the target wasn't locked.
> 
> Then 914a6e0ea12 ("Import 2.3.51pre1") introduced an additional i_zombie
> semaphore to protect against create, link, mknod, mkdir, unlink, and
> rmdir on the target. So i_zombie had to be acquired during

If I reconstructed this correctly, then the motivation behind the
introduction of i_zombie was that locking order for i_sem was solely
based on pointer order before 1b3d7c93c6d ("[PATCH] (2.5.4"). So when it
turned out that an existing directory that was replaced during a rename
had to be locked it would've meant deadlocks:

mv /a/b /a/c/d                          rmdir /a/c/d
pointer order: a < c                    pointer order: d > c

inode_lock(a->i_sem)
inode_lock(c->i_sem)                    inode_lock(d->i_sem)

// acquired separately
// after lock on parents
inode_lock(d->i_sem)                    inode_lock(c->i_sem)

and the immediate fix was to introduce a separate i_zombie mutex that
was used to protect against concurrent removals of the target directory.

> vfs_rename_dir() on both parent and the victim but only if the source
> was a directory. Back then RENAME_EXCHANGE didn't exist so if source was
> a directory then target if it existed must've been a directory as well.
> 
> The original reasoning behind only locking the target if the source was
> a directory was based on some filesystems not being able to deal with
> opened but unlinked directories.
> 
> The i_zombie finally died in 1b3d7c93c6d ("[PATCH] (2.5.4) death of
> ->i_zombie") and a new locking scheme was introduced. The

And that locking scheme realized that the topological ordering of the
subtrees we're operating on makes it possible to lock in ancestor
order. IOW, lock parent directories before child directories and if no
ancestor relationship exists lock by pointer ordering. This allowed to
kill i_zombie...

> s_vfs_rename_sem was now only used for across-directory renames. Instead
> of having i_zombie and i_sem only i_sem was left. Now, both
> vfs_rename_dir(/* if renaming directory */) and vfs_rename_other()
> grabbed i_sem on both parents and on the target. So now the target was
> always explicitly protected against a concurrent unlink or rmdir
> as that would be done as part of the rename operation and that race
> would just been awkward to allow afaict. Probably always was.

I think the main worry making the target directory locking necessary
during rename were also scenarios like the following. Because an
existing target directory that was about to be removed during rename
wasn't locked - in contrast to an explicit rmdir - it would've meant
that the following race should've been possible:

mv /a/b /a/c/d                          touch /a/c/d/e

inode_lock(a->i_sem)
inode_lock(c->i_sem)
// now we're removing d                 // lock on d isn't held by rename
                                        inode_lock(d->i_sem) 
                                        // add new file into d

iow, this would be racing adding a new link to the directory that's
in the process of being removed.

> 
> The locking of source however is a different story. This was introduced
> in 6cedba8962f4 ("vfs: take i_mutex on renamed file") to prevent new
> delegations from being acquired during rename, link, or unlink. So now
> both source and target were locked. The target seemingly because it
> would be unlinked in the filesystem's rename method and the source to
> prevent new delegations from being acquired. So, since leases can only
> be taken on regular files vfs_setlease()/generic_setlease() directories
> were never considered for locking. So the lock never had to be acquired
> for source directories.
> 
> So in any case, under the assumption that the broad strokes are correct
> there seems to be no inherent reason why locking source and target if
> they're directories couldn't be done if the ordering is well-defined.
> Which is what originally made me hesitate. IOW, given my current
> knowledge this seems ok.
> 
> Frankly, if we end up unconditionally locking source and target we're in
> a better place from a pure maintainability perspective as far as I'm
> concerned. Even if we end up taking the lock pointlessly for e.g., NFS
> with RENAME_EXCHANGE. The last thing we need is more conditions on when
> things are locked and why.
> 
> /me goes off into the weekend

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-05-31 14:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-25 10:16 [PATCH 0/6] fs: Fix directory corruption when moving directories Jan Kara
2023-05-25 10:16 ` [PATCH 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
2023-05-25 10:16 ` [PATCH 2/6] Revert "udf: Protect rename against modification of moved directory" Jan Kara
2023-05-25 10:16 ` [PATCH 3/6] Revert "f2fs: fix potential corruption when moving a directory" Jan Kara
2023-05-25 10:16 ` [PATCH 4/6] fs: Establish locking order for unrelated directories Jan Kara
2023-05-26  9:45   ` Christian Brauner
2023-05-29 12:41     ` Jan Kara
2023-05-30 12:42       ` Christian Brauner
2023-05-25 10:16 ` [PATCH 5/6] fs: Lock moved directories Jan Kara
2023-05-25 10:16 ` [PATCH 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
2023-05-26 12:13   ` Amir Goldstein
2023-05-29 12:42     ` Jan Kara
2023-05-26 15:58 ` [PATCH 0/6] fs: Fix directory corruption when moving directories Christian Brauner
2023-05-31 14:09   ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).