[PATCH v2 0/6] fs: Fix directory corruption when moving directories

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/6] fs: Fix directory corruption when moving directories
@ 2023-06-01 10:58 Jan Kara
  2023-06-01 10:58 ` [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara

Hello,

this patch set fixes a problem with cross directory renames originally reported
in [1]. To quickly sum it up some filesystems (so far we know at least about
ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the
directory when it is being renamed into another directory. This is because we
need to update the parent pointer in the directory in that case and if that
races with other operation on the directory (in particular a conversion from
one directory format into another), bad things can happen.

So far we've done the locking in the filesystem code but recently Darrick
pointed out [2] that we've missed the RENAME_EXCHANGE case in our ext4 fix.
That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix
regular files and directories and proper lock ordering is not achievable in the
filesystems alone.

This patch set adds locking into vfs_rename() so that not only parent
directories but also moved inodes (regardless whether they are directories or
not) are locked when calling into the filesystem.

Changes since v1:
* Made sure lock_two_inodes() uses subclass1 for the obtained lock in case
  there is only one inode locked
* Fixes unlocked_two_nondirectories() to properly unlock inodes even if
  directories are accidentally passed in.

								Honza

[1] https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3
[2] https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs

Previous versions:
Link: http://lore.kernel.org/r/20230525100654.15069-1-jack@suse.cz # v1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
@ 2023-06-01 10:58 ` Jan Kara
  2023-06-01 14:52   ` Theodore Ts'o
  2023-06-01 10:58 ` [PATCH v2 2/6] Revert "udf: Protect rename against modification of moved directory" Jan Kara
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara, stable

Remove locking of moved directory in ext4_rename2(). We will take care
of it in VFS instead. This effectively reverts commit 0813299c586b
("ext4: Fix possible corruption when moving a directory") and followup
fixes.

CC: Ted Tso <tytso@mit.edu>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/namei.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 45b579805c95..0caf6c730ce3 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3834,19 +3834,10 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			return retval;
 	}
 
-	/*
-	 * We need to protect against old.inode directory getting converted
-	 * from inline directory format into a normal one.
-	 */
-	if (S_ISDIR(old.inode->i_mode))
-		inode_lock_nested(old.inode, I_MUTEX_NONDIR2);
-
 	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de,
 				 &old.inlined);
-	if (IS_ERR(old.bh)) {
-		retval = PTR_ERR(old.bh);
-		goto unlock_moved_dir;
-	}
+	if (IS_ERR(old.bh))
+		return PTR_ERR(old.bh);
 
 	/*
 	 *  Check for inode number is _not_ due to possible IO errors.
@@ -4043,10 +4034,6 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	brelse(old.bh);
 	brelse(new.bh);
 
-unlock_moved_dir:
-	if (S_ISDIR(old.inode->i_mode))
-		inode_unlock(old.inode);
-
 	return retval;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/6] Revert "udf: Protect rename against modification of moved directory"
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
  2023-06-01 10:58 ` [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
@ 2023-06-01 10:58 ` Jan Kara
  2023-06-01 10:58 ` [PATCH v2 3/6] Revert "f2fs: fix potential corruption when moving a directory" Jan Kara
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara, stable

This reverts commit f950fd0529130a617b3da526da9fb6a896ce87c2. The
locking is going to be provided by vfs_rename() in the following
patches.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/udf/namei.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index fd20423d3ed2..fd29a66e7241 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -793,11 +793,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			if (!empty_dir(new_inode))
 				goto out_oiter;
 		}
-		/*
-		 * We need to protect against old_inode getting converted from
-		 * ICB to normal directory.
-		 */
-		inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
 		retval = udf_fiiter_find_entry(old_inode, &dotdot_name,
 					       &diriter);
 		if (retval == -ENOENT) {
@@ -806,10 +801,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 				old_inode->i_ino);
 			retval = -EFSCORRUPTED;
 		}
-		if (retval) {
-			inode_unlock(old_inode);
+		if (retval)
 			goto out_oiter;
-		}
 		has_diriter = true;
 		tloc = lelb_to_cpu(diriter.fi.icb.extLocation);
 		if (udf_get_lb_pblock(old_inode->i_sb, &tloc, 0) !=
@@ -889,7 +882,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			       udf_dir_entry_len(&diriter.fi));
 		udf_fiiter_write_fi(&diriter, NULL);
 		udf_fiiter_release(&diriter);
-		inode_unlock(old_inode);
 
 		inode_dec_link_count(old_dir);
 		if (new_inode)
@@ -901,10 +893,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	}
 	return 0;
 out_oiter:
-	if (has_diriter) {
+	if (has_diriter)
 		udf_fiiter_release(&diriter);
-		inode_unlock(old_inode);
-	}
 	udf_fiiter_release(&oiter);
 
 	return retval;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/6] Revert "f2fs: fix potential corruption when moving a directory"
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
  2023-06-01 10:58 ` [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
  2023-06-01 10:58 ` [PATCH v2 2/6] Revert "udf: Protect rename against modification of moved directory" Jan Kara
@ 2023-06-01 10:58 ` Jan Kara
  2023-06-01 10:58 ` [PATCH v2 4/6] fs: Establish locking order for unrelated directories Jan Kara
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara, stable

This reverts commit d94772154e524b329a168678836745d2773a6e02. The
locking is going to be provided by VFS.

CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/f2fs/namei.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 77a71276ecb1..ad597b417fea 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -995,20 +995,12 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 			goto out;
 	}
 
-	/*
-	 * Copied from ext4_rename: we need to protect against old.inode
-	 * directory getting converted from inline directory format into
-	 * a normal one.
-	 */
-	if (S_ISDIR(old_inode->i_mode))
-		inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
-
 	err = -ENOENT;
 	old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
 	if (!old_entry) {
 		if (IS_ERR(old_page))
 			err = PTR_ERR(old_page);
-		goto out_unlock_old;
+		goto out;
 	}
 
 	if (S_ISDIR(old_inode->i_mode)) {
@@ -1116,9 +1108,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 
 	f2fs_unlock_op(sbi);
 
-	if (S_ISDIR(old_inode->i_mode))
-		inode_unlock(old_inode);
-
 	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
 		f2fs_sync_fs(sbi->sb, 1);
 
@@ -1133,9 +1122,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		f2fs_put_page(old_dir_page, 0);
 out_old:
 	f2fs_put_page(old_page, 0);
-out_unlock_old:
-	if (S_ISDIR(old_inode->i_mode))
-		inode_unlock(old_inode);
 out:
 	iput(whiteout);
 	return err;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (2 preceding siblings ...)
  2023-06-01 10:58 ` [PATCH v2 3/6] Revert "f2fs: fix potential corruption when moving a directory" Jan Kara
@ 2023-06-01 10:58 ` Jan Kara
  2023-06-01 13:58   ` Christian Brauner
  2023-06-02  1:36   ` kernel test robot
  2023-06-01 10:58 ` [PATCH v2 5/6] fs: Lock moved directories Jan Kara
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara, stable

Currently the locking order of inode locks for directories that are not
in ancestor relationship is not defined because all operations that
needed to lock two directories like this were serialized by
sb->s_vfs_rename_mutex. However some filesystems need to lock two
subdirectories for RENAME_EXCHANGE operations and for this we need the
locking order established even for two tree-unrelated directories.
Provide a helper function lock_two_inodes() that establishes lock
ordering for any two inodes and use it in lock_two_directories().

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c    | 42 ++++++++++++++++++++++++++++++++++++++++++
 fs/internal.h |  2 ++
 fs/namei.c    |  4 ++--
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 577799b7855f..4000ab08bbc0 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1103,6 +1103,48 @@ void discard_new_inode(struct inode *inode)
 }
 EXPORT_SYMBOL(discard_new_inode);
 
+/**
+ * lock_two_inodes - lock two inodes (may be regular files but also dirs)
+ *
+ * Lock any non-NULL argument. The caller must make sure that if he is passing
+ * in two directories, one is not ancestor of the other.  Zero, one or two
+ * objects may be locked by this function.
+ *
+ * @inode1: first inode to lock
+ * @inode2: second inode to lock
+ * @subclass1: inode lock subclass for the first lock obtained
+ * @subclass2: inode lock subclass for the second lock obtained
+ */
+void lock_two_inodes(struct inode *inode1, struct inode *inode2,
+		     unsigned subclass1, unsigned subclass2)
+{
+	if (!inode1 || !inode2)
+		/*
+		 * Make sure @subclass1 will be used for the acquired lock.
+		 * This is not strictly necessary (no current caller cares) but
+		 * let's keep things consistent.
+		 */
+		if (!inode1)
+			swap(inode1, inode2);
+		goto lock;
+	}
+
+	/*
+	 * If one object is directory and the other is not, we must make sure
+	 * to lock directory first as the other object may be its child.
+	 */
+	if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
+		if (inode1 > inode2)
+			swap(inode1, inode2);
+	} else if (!S_ISDIR(inode1->i_mode))
+		swap(inode1, inode2);
+lock:
+	if (inode1)
+		inode_lock_nested(inode1, subclass1);
+	if (inode2 && inode2 != inode1)
+		inode_lock_nested(inode2, subclass2);
+}
+
 /**
  * lock_two_nondirectories - take two i_mutexes on non-directory objects
  *
diff --git a/fs/internal.h b/fs/internal.h
index bd3b2810a36b..377030a50aca 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -152,6 +152,8 @@ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
 int dentry_needs_remove_privs(struct mnt_idmap *, struct dentry *dentry);
 bool in_group_or_capable(struct mnt_idmap *idmap,
 			 const struct inode *inode, vfsgid_t vfsgid);
+void lock_two_inodes(struct inode *inode1, struct inode *inode2,
+		     unsigned subclass1, unsigned subclass2);
 
 /*
  * fs-writeback.c
diff --git a/fs/namei.c b/fs/namei.c
index e4fe0879ae55..148570aabe74 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3028,8 +3028,8 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
 		return p;
 	}
 
-	inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
-	inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
+	lock_two_inodes(p1->d_inode, p2->d_inode,
+			I_MUTEX_PARENT, I_MUTEX_PARENT2);
 	return NULL;
 }
 
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/6] fs: Lock moved directories
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (3 preceding siblings ...)
  2023-06-01 10:58 ` [PATCH v2 4/6] fs: Establish locking order for unrelated directories Jan Kara
@ 2023-06-01 10:58 ` Jan Kara
  2023-06-01 10:58 ` [PATCH v2 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara, stable

When a directory is moved to a different directory, some filesystems
(udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to
update their pointer to the parent and this must not race with other
operations on the directory. Lock the directories when they are moved.
Although not all filesystems need this locking, we perform it in
vfs_rename() because getting the lock ordering right is really difficult
and we don't want to expose these locking details to filesystems.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 .../filesystems/directory-locking.rst         | 26 ++++++++++---------
 fs/namei.c                                    | 22 ++++++++++------
 2 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/Documentation/filesystems/directory-locking.rst b/Documentation/filesystems/directory-locking.rst
index 504ba940c36c..dccd61c7c5c3 100644
--- a/Documentation/filesystems/directory-locking.rst
+++ b/Documentation/filesystems/directory-locking.rst
@@ -22,12 +22,11 @@ exclusive.
 3) object removal.  Locking rules: caller locks parent, finds victim,
 locks victim and calls the method.  Locks are exclusive.
 
-4) rename() that is _not_ cross-directory.  Locking rules: caller locks
-the parent and finds source and target.  In case of exchange (with
-RENAME_EXCHANGE in flags argument) lock both.  In any case,
-if the target already exists, lock it.  If the source is a non-directory,
-lock it.  If we need to lock both, lock them in inode pointer order.
-Then call the method.  All locks are exclusive.
+4) rename() that is _not_ cross-directory.  Locking rules: caller locks the
+parent and finds source and target.  We lock both (provided they exist).  If we
+need to lock two inodes of different type (dir vs non-dir), we lock directory
+first.  If we need to lock two inodes of the same type, lock them in inode
+pointer order.  Then call the method.  All locks are exclusive.
 NB: we might get away with locking the source (and target in exchange
 case) shared.
 
@@ -44,15 +43,17 @@ All locks are exclusive.
 rules:
 
 	* lock the filesystem
-	* lock parents in "ancestors first" order.
+	* lock parents in "ancestors first" order. If one is not ancestor of
+	  the other, lock them in inode pointer order.
 	* find source and target.
 	* if old parent is equal to or is a descendent of target
 	  fail with -ENOTEMPTY
 	* if new parent is equal to or is a descendent of source
 	  fail with -ELOOP
-	* If it's an exchange, lock both the source and the target.
-	* If the target exists, lock it.  If the source is a non-directory,
-	  lock it.  If we need to lock both, do so in inode pointer order.
+	* Lock both the source and the target provided they exist. If we
+	  need to lock two inodes of different type (dir vs non-dir), we lock
+	  the directory first. If we need to lock two inodes of the same type,
+	  lock them in inode pointer order.
 	* call the method.
 
 All ->i_rwsem are taken exclusive.  Again, we might get away with locking
@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
 
 Proof:
 
-	First of all, at any moment we have a partial ordering of the
-	objects - A < B iff A is an ancestor of B.
+	First of all, at any moment we have a linear ordering of the
+	objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
+        of A and ptr(A) < ptr(B)).
 
 	That ordering can change.  However, the following is true:
 
diff --git a/fs/namei.c b/fs/namei.c
index 148570aabe74..6a5e26a529e1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4731,7 +4731,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   sb->s_vfs_rename_mutex. We might be more accurate, but that's another
  *	   story.
  *	c) we have to lock _four_ objects - parents and victim (if it exists),
- *	   and source (if it is not a directory).
+ *	   and source.
  *	   And that - after we got ->i_mutex on parents (until then we don't know
  *	   whether the target exists).  Solution: try to be smart with locking
  *	   order for inodes.  We rely on the fact that tree topology may change
@@ -4815,10 +4815,16 @@ int vfs_rename(struct renamedata *rd)
 
 	take_dentry_name_snapshot(&old_name, old_dentry);
 	dget(new_dentry);
-	if (!is_dir || (flags & RENAME_EXCHANGE))
-		lock_two_nondirectories(source, target);
-	else if (target)
-		inode_lock(target);
+	/*
+	 * Lock all moved children. Moved directories may need to change parent
+	 * pointer so they need the lock to prevent against concurrent
+	 * directory changes moving parent pointer. For regular files we've
+	 * historically always done this. The lockdep locking subclasses are
+	 * somewhat arbitrary but RENAME_EXCHANGE in particular can swap
+	 * regular files and directories so it's difficult to tell which
+	 * subclasses to use.
+	 */
+	lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
 
 	error = -EPERM;
 	if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4866,9 +4872,9 @@ int vfs_rename(struct renamedata *rd)
 			d_exchange(old_dentry, new_dentry);
 	}
 out:
-	if (!is_dir || (flags & RENAME_EXCHANGE))
-		unlock_two_nondirectories(source, target);
-	else if (target)
+	if (source)
+		inode_unlock(source);
+	if (target)
 		inode_unlock(target);
 	dput(new_dentry);
 	if (!error) {
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (4 preceding siblings ...)
  2023-06-01 10:58 ` [PATCH v2 5/6] fs: Lock moved directories Jan Kara
@ 2023-06-01 10:58 ` Jan Kara
  2023-06-02 13:05 ` [PATCH v2 0/6] fs: Fix directory corruption when moving directories Christian Brauner
  2023-07-06  0:18 ` [f2fs-dev] " patchwork-bot+f2fs
  7 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 10:58 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara

Currently lock_two_nondirectories() is skipping any passed directories.
After vfs_rename() uses lock_two_inodes(), all the remaining four users
of this function pass only regular files to it. So drop the somewhat
unusual "skip directory" logic and instead warn if anybody passes
directory to it. This also allows us to use lock_two_inodes() in
lock_two_nondirectories() to concentrate the lock ordering logic in less
places.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/inode.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 4000ab08bbc0..e8d10fd18378 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1148,7 +1148,7 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
 /**
  * lock_two_nondirectories - take two i_mutexes on non-directory objects
  *
- * Lock any non-NULL argument that is not a directory.
+ * Lock any non-NULL argument. Passed objects must not be directories.
  * Zero, one or two objects may be locked by this function.
  *
  * @inode1: first inode to lock
@@ -1156,13 +1156,9 @@ void lock_two_inodes(struct inode *inode1, struct inode *inode2,
  */
 void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
-	if (inode1 > inode2)
-		swap(inode1, inode2);
-
-	if (inode1 && !S_ISDIR(inode1->i_mode))
-		inode_lock(inode1);
-	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
-		inode_lock_nested(inode2, I_MUTEX_NONDIR2);
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
+	WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
+	lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
 }
 EXPORT_SYMBOL(lock_two_nondirectories);
 
@@ -1173,9 +1169,11 @@ EXPORT_SYMBOL(lock_two_nondirectories);
  */
 void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
 {
-	if (inode1 && !S_ISDIR(inode1->i_mode))
+	WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
+	WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
+	if (inode1)
 		inode_unlock(inode1);
-	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
+	if (inode2 && inode2 != inode1)
 		inode_unlock(inode2);
 }
 EXPORT_SYMBOL(unlock_two_nondirectories);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 10:58 ` [PATCH v2 4/6] fs: Establish locking order for unrelated directories Jan Kara
@ 2023-06-01 13:58   ` Christian Brauner
  2023-06-01 15:24     ` Jan Kara
  2023-06-02  1:36   ` kernel test robot
  1 sibling, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2023-06-01 13:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-xfs, linux-f2fs-devel, stable

On Thu, Jun 01, 2023 at 12:58:24PM +0200, Jan Kara wrote:
> Currently the locking order of inode locks for directories that are not
> in ancestor relationship is not defined because all operations that
> needed to lock two directories like this were serialized by
> sb->s_vfs_rename_mutex. However some filesystems need to lock two
> subdirectories for RENAME_EXCHANGE operations and for this we need the
> locking order established even for two tree-unrelated directories.
> Provide a helper function lock_two_inodes() that establishes lock
> ordering for any two inodes and use it in lock_two_directories().
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/inode.c    | 42 ++++++++++++++++++++++++++++++++++++++++++
>  fs/internal.h |  2 ++
>  fs/namei.c    |  4 ++--
>  3 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 577799b7855f..4000ab08bbc0 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1103,6 +1103,48 @@ void discard_new_inode(struct inode *inode)
>  }
>  EXPORT_SYMBOL(discard_new_inode);
>  
> +/**
> + * lock_two_inodes - lock two inodes (may be regular files but also dirs)
> + *
> + * Lock any non-NULL argument. The caller must make sure that if he is passing
> + * in two directories, one is not ancestor of the other.  Zero, one or two
> + * objects may be locked by this function.
> + *
> + * @inode1: first inode to lock
> + * @inode2: second inode to lock
> + * @subclass1: inode lock subclass for the first lock obtained
> + * @subclass2: inode lock subclass for the second lock obtained
> + */
> +void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> +		     unsigned subclass1, unsigned subclass2)
> +{
> +	if (!inode1 || !inode2)

I think you forgot the opening bracket...
I can just fix this up for you though.

> +		/*
> +		 * Make sure @subclass1 will be used for the acquired lock.
> +		 * This is not strictly necessary (no current caller cares) but
> +		 * let's keep things consistent.
> +		 */
> +		if (!inode1)
> +			swap(inode1, inode2);
> +		goto lock;
> +	}

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory
  2023-06-01 10:58 ` [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
@ 2023-06-01 14:52   ` Theodore Ts'o
  2023-06-01 15:27     ` Jan Kara
  0 siblings, 1 reply; 21+ messages in thread
From: Theodore Ts'o @ 2023-06-01 14:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, stable

On Thu, Jun 01, 2023 at 12:58:21PM +0200, Jan Kara wrote:
> Remove locking of moved directory in ext4_rename2(). We will take care
> of it in VFS instead. This effectively reverts commit 0813299c586b
> ("ext4: Fix possible corruption when moving a directory") and followup
> fixes.

Remind me --- commit 0813299c586b is not actually causing any
problems; it's just not fully effective at solving the problem.  Is
that correct?

In other words, is there a rush in trying to get this revert to Linus
during this cycle as a regression fix?

I think the answer is no, and we can just let this full patch series
go in via the vfs branch during the next merge window, but I just
wanted to make sure.

Thanks!

					- Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 13:58   ` Christian Brauner
@ 2023-06-01 15:24     ` Jan Kara
  2023-06-01 15:37       ` David Laight
  2023-06-01 15:59       ` Christian Brauner
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 15:24 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Al Viro, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, stable

On Thu 01-06-23 15:58:58, Christian Brauner wrote:
> On Thu, Jun 01, 2023 at 12:58:24PM +0200, Jan Kara wrote:
> > Currently the locking order of inode locks for directories that are not
> > in ancestor relationship is not defined because all operations that
> > needed to lock two directories like this were serialized by
> > sb->s_vfs_rename_mutex. However some filesystems need to lock two
> > subdirectories for RENAME_EXCHANGE operations and for this we need the
> > locking order established even for two tree-unrelated directories.
> > Provide a helper function lock_two_inodes() that establishes lock
> > ordering for any two inodes and use it in lock_two_directories().
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/inode.c    | 42 ++++++++++++++++++++++++++++++++++++++++++
> >  fs/internal.h |  2 ++
> >  fs/namei.c    |  4 ++--
> >  3 files changed, 46 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 577799b7855f..4000ab08bbc0 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1103,6 +1103,48 @@ void discard_new_inode(struct inode *inode)
> >  }
> >  EXPORT_SYMBOL(discard_new_inode);
> >  
> > +/**
> > + * lock_two_inodes - lock two inodes (may be regular files but also dirs)
> > + *
> > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > + * in two directories, one is not ancestor of the other.  Zero, one or two
> > + * objects may be locked by this function.
> > + *
> > + * @inode1: first inode to lock
> > + * @inode2: second inode to lock
> > + * @subclass1: inode lock subclass for the first lock obtained
> > + * @subclass2: inode lock subclass for the second lock obtained
> > + */
> > +void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> > +		     unsigned subclass1, unsigned subclass2)
> > +{
> > +	if (!inode1 || !inode2)
> 
> I think you forgot the opening bracket...
> I can just fix this up for you though.

Oh, yes. Apparently I forgot to rerun git-format-patch after fixing up this
bit. I'm sorry for that. The patch series has survived full ext4 fstests
run on my end.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory
  2023-06-01 14:52   ` Theodore Ts'o
@ 2023-06-01 15:27     ` Jan Kara
  2023-06-01 15:58       ` Christian Brauner
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Kara @ 2023-06-01 15:27 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Jan Kara, Al Viro, linux-fsdevel, Christian Brauner,
	Miklos Szeredi, Darrick J. Wong, Jaegeuk Kim, linux-ext4,
	linux-xfs, linux-f2fs-devel, stable

On Thu 01-06-23 10:52:22, Theodore Ts'o wrote:
> On Thu, Jun 01, 2023 at 12:58:21PM +0200, Jan Kara wrote:
> > Remove locking of moved directory in ext4_rename2(). We will take care
> > of it in VFS instead. This effectively reverts commit 0813299c586b
> > ("ext4: Fix possible corruption when moving a directory") and followup
> > fixes.
> 
> Remind me --- commit 0813299c586b is not actually causing any
> problems; it's just not fully effective at solving the problem.  Is
> that correct?

Yes, correct.

> In other words, is there a rush in trying to get this revert to Linus
> during this cycle as a regression fix?
> 
> I think the answer is no, and we can just let this full patch series
> go in via the vfs branch during the next merge window, but I just
> wanted to make sure.

Exactly, that's my plan as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 15:24     ` Jan Kara
@ 2023-06-01 15:37       ` David Laight
  2023-06-01 16:13         ` Jan Kara
  2023-06-01 15:59       ` Christian Brauner
  1 sibling, 1 reply; 21+ messages in thread
From: David Laight @ 2023-06-01 15:37 UTC (permalink / raw)
  To: 'Jan Kara', Christian Brauner
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-xfs, linux-f2fs-devel, stable

...
> > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > + * in two directories, one is not ancestor of the other

Not directly relevant to this change but is the 'not an ancestor'
check actually robust?

I found a condition in which the kernel 'pwd' code (which follows
the inode chain) failed to stop at the base of a chroot.

I suspect that the ancestor check would fail the same way.

IIRC the problematic code used unshare() to 'escape' from
a network natespace.
If it was inside a chroot (that wasn't on a mount point) there
ware two copies of the 'chroot /' inode and the match failed.

I might be able to find the test case.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory
  2023-06-01 15:27     ` Jan Kara
@ 2023-06-01 15:58       ` Christian Brauner
  0 siblings, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-06-01 15:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Theodore Ts'o, Al Viro, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, stable

On Thu, Jun 01, 2023 at 05:27:46PM +0200, Jan Kara wrote:
> On Thu 01-06-23 10:52:22, Theodore Ts'o wrote:
> > On Thu, Jun 01, 2023 at 12:58:21PM +0200, Jan Kara wrote:
> > > Remove locking of moved directory in ext4_rename2(). We will take care
> > > of it in VFS instead. This effectively reverts commit 0813299c586b
> > > ("ext4: Fix possible corruption when moving a directory") and followup
> > > fixes.
> > 
> > Remind me --- commit 0813299c586b is not actually causing any
> > problems; it's just not fully effective at solving the problem.  Is
> > that correct?
> 
> Yes, correct.
> 
> > In other words, is there a rush in trying to get this revert to Linus
> > during this cycle as a regression fix?
> > 
> > I think the answer is no, and we can just let this full patch series
> > go in via the vfs branch during the next merge window, but I just
> > wanted to make sure.
> 
> Exactly, that's my plan as well.

Yeah, we'll have time and ideally this should soak in -next for a good
while also gives others time to take a look.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 15:24     ` Jan Kara
  2023-06-01 15:37       ` David Laight
@ 2023-06-01 15:59       ` Christian Brauner
  1 sibling, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-06-01 15:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-xfs, linux-f2fs-devel, stable

On Thu, Jun 01, 2023 at 05:24:49PM +0200, Jan Kara wrote:
> On Thu 01-06-23 15:58:58, Christian Brauner wrote:
> > On Thu, Jun 01, 2023 at 12:58:24PM +0200, Jan Kara wrote:
> > > Currently the locking order of inode locks for directories that are not
> > > in ancestor relationship is not defined because all operations that
> > > needed to lock two directories like this were serialized by
> > > sb->s_vfs_rename_mutex. However some filesystems need to lock two
> > > subdirectories for RENAME_EXCHANGE operations and for this we need the
> > > locking order established even for two tree-unrelated directories.
> > > Provide a helper function lock_two_inodes() that establishes lock
> > > ordering for any two inodes and use it in lock_two_directories().
> > > 
> > > CC: stable@vger.kernel.org
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > >  fs/inode.c    | 42 ++++++++++++++++++++++++++++++++++++++++++
> > >  fs/internal.h |  2 ++
> > >  fs/namei.c    |  4 ++--
> > >  3 files changed, 46 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/inode.c b/fs/inode.c
> > > index 577799b7855f..4000ab08bbc0 100644
> > > --- a/fs/inode.c
> > > +++ b/fs/inode.c
> > > @@ -1103,6 +1103,48 @@ void discard_new_inode(struct inode *inode)
> > >  }
> > >  EXPORT_SYMBOL(discard_new_inode);
> > >  
> > > +/**
> > > + * lock_two_inodes - lock two inodes (may be regular files but also dirs)
> > > + *
> > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > + * in two directories, one is not ancestor of the other.  Zero, one or two
> > > + * objects may be locked by this function.
> > > + *
> > > + * @inode1: first inode to lock
> > > + * @inode2: second inode to lock
> > > + * @subclass1: inode lock subclass for the first lock obtained
> > > + * @subclass2: inode lock subclass for the second lock obtained
> > > + */
> > > +void lock_two_inodes(struct inode *inode1, struct inode *inode2,
> > > +		     unsigned subclass1, unsigned subclass2)
> > > +{
> > > +	if (!inode1 || !inode2)
> > 
> > I think you forgot the opening bracket...
> > I can just fix this up for you though.
> 
> Oh, yes. Apparently I forgot to rerun git-format-patch after fixing up this
> bit. I'm sorry for that. The patch series has survived full ext4 fstests

No problem at all!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 15:37       ` David Laight
@ 2023-06-01 16:13         ` Jan Kara
  2023-06-01 16:21           ` Christian Brauner
  2023-06-01 16:33           ` David Laight
  0 siblings, 2 replies; 21+ messages in thread
From: Jan Kara @ 2023-06-01 16:13 UTC (permalink / raw)
  To: David Laight
  Cc: 'Jan Kara',
	Christian Brauner, Al Viro, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, stable

On Thu 01-06-23 15:37:32, David Laight wrote:
> ...
> > > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > > + * in two directories, one is not ancestor of the other
> 
> Not directly relevant to this change but is the 'not an ancestor'
> check actually robust?
> 
> I found a condition in which the kernel 'pwd' code (which follows
> the inode chain) failed to stop at the base of a chroot.
> 
> I suspect that the ancestor check would fail the same way.

Honestly, I'm not sure how this could be the case but I'm not a dcache
expert. d_ancestor() works on dentries and the whole dcache code pretty
much relies on the fact that there always is at most one dentry for any
directory. Also in case we call d_ancestor() from this code, we have the
whole filesystem locked from any other directory moves so the ancestor
relationship of two dirs cannot change (which is different from pwd code
AFAIK). So IMHO no failure is possible in our case.

								Honza

> 
> IIRC the problematic code used unshare() to 'escape' from
> a network natespace.
> If it was inside a chroot (that wasn't on a mount point) there
> ware two copies of the 'chroot /' inode and the match failed.
> 
> I might be able to find the test case.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 16:13         ` Jan Kara
@ 2023-06-01 16:21           ` Christian Brauner
  2023-06-01 16:33           ` David Laight
  1 sibling, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-06-01 16:21 UTC (permalink / raw)
  To: Jan Kara
  Cc: David Laight, Al Viro, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, stable

On Thu, Jun 01, 2023 at 06:13:53PM +0200, Jan Kara wrote:
> On Thu 01-06-23 15:37:32, David Laight wrote:
> > ...
> > > > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > > > + * in two directories, one is not ancestor of the other
> > 
> > Not directly relevant to this change but is the 'not an ancestor'
> > check actually robust?
> > 
> > I found a condition in which the kernel 'pwd' code (which follows
> > the inode chain) failed to stop at the base of a chroot.
> > 
> > I suspect that the ancestor check would fail the same way.
> 
> Honestly, I'm not sure how this could be the case but I'm not a dcache
> expert. d_ancestor() works on dentries and the whole dcache code pretty
> much relies on the fact that there always is at most one dentry for any
> directory. Also in case we call d_ancestor() from this code, we have the
> whole filesystem locked from any other directory moves so the ancestor
> relationship of two dirs cannot change (which is different from pwd code
> AFAIK). So IMHO no failure is possible in our case.

Yes, this is a red herring. What matters is that the tree topology can't
change which is up to the caller to guarantee. And where it's called
we're under s_vfs_rename_mutex. It's also literally mentioned in the
directory locking documentation.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 16:13         ` Jan Kara
  2023-06-01 16:21           ` Christian Brauner
@ 2023-06-01 16:33           ` David Laight
  2023-06-02 12:34             ` Christian Brauner
  1 sibling, 1 reply; 21+ messages in thread
From: David Laight @ 2023-06-01 16:33 UTC (permalink / raw)
  To: 'Jan Kara'
  Cc: Christian Brauner, Al Viro, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, stable

From: Jan Kara <jack@suse.cz>
> Sent: 01 June 2023 17:14
> 
> On Thu 01-06-23 15:37:32, David Laight wrote:
> > ...
> > > > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > > > + * in two directories, one is not ancestor of the other
> >
> > Not directly relevant to this change but is the 'not an ancestor'
> > check actually robust?
> >
> > I found a condition in which the kernel 'pwd' code (which follows
> > the inode chain) failed to stop at the base of a chroot.
> >
> > I suspect that the ancestor check would fail the same way.
> 
> Honestly, I'm not sure how this could be the case but I'm not a dcache
> expert. d_ancestor() works on dentries and the whole dcache code pretty
> much relies on the fact that there always is at most one dentry for any
> directory. Also in case we call d_ancestor() from this code, we have the
> whole filesystem locked from any other directory moves so the ancestor
> relationship of two dirs cannot change (which is different from pwd code
> AFAIK). So IMHO no failure is possible in our case.

I've found the test program.
This uses readlinkat() to get the full path /proc/self/fd/0.
It should be inside the chroot, but the comparison done
to detect the 'root' fails.

Now maybe any rename that would hit this is invalid
for other reasons.
But something is awry somewhere.

	David

The program below reproduces this when run with stdin
redirected to a file in the current directory.

This sequence is used by 'ip netns exec' so isn't actually
that unusual.

	David

#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <sched.h>

static void print_link(const char *where, int fd)
{
        char buf[256];

        printf("%s: %.*s\n", where, (int)readlinkat(fd, "", buf, sizeof buf), buf);
}

int main(int argc, char **argv)
{
        int link_fd = open("/proc/self/fd/0", O_PATH | O_NOFOLLOW);

        print_link("initial", link_fd);
        if (chroot("."))
                return 1;
        print_link("after chroot", link_fd);
        if (unshare(CLONE_NEWNS))
                return 2;
        print_link("after unshare", link_fd);
        return 0;
}

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 10:58 ` [PATCH v2 4/6] fs: Establish locking order for unrelated directories Jan Kara
  2023-06-01 13:58   ` Christian Brauner
@ 2023-06-02  1:36   ` kernel test robot
  1 sibling, 0 replies; 21+ messages in thread
From: kernel test robot @ 2023-06-02  1:36 UTC (permalink / raw)
  To: Jan Kara, Al Viro
  Cc: oe-kbuild-all, linux-fsdevel, Christian Brauner, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Jan Kara, stable

Hi Jan,

kernel test robot noticed the following build errors:

[auto build test ERROR on tytso-ext4/dev]
[also build test ERROR on jaegeuk-f2fs/dev-test jaegeuk-f2fs/dev linus/master v6.4-rc4 next-20230601]
[cannot apply to vfs-idmapping/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jan-Kara/ext4-Remove-ext4-locking-of-moved-directory/20230601-225100
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
patch link:    https://lore.kernel.org/r/20230601105830.13168-4-jack%40suse.cz
patch subject: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20230602/202306020948.TBmCxtVw-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/234d970a1de0d79e372cc04d6a8112d2aec56c44
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Jan-Kara/ext4-Remove-ext4-locking-of-moved-directory/20230601-225100
        git checkout 234d970a1de0d79e372cc04d6a8112d2aec56c44
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=um SUBARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202306020948.TBmCxtVw-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   fs/inode.c: In function 'lock_two_inodes':
>> fs/inode.c:1121:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
    1121 |         if (!inode1 || !inode2)
         |         ^~
   fs/inode.c:1129:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
    1129 |                 goto lock;
         |                 ^~~~
>> fs/inode.c:1129:17: error: label 'lock' used but not defined
   fs/inode.c: At top level:
>> fs/inode.c:1136:9: error: expected identifier or '(' before 'if'
    1136 |         if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
         |         ^~
>> fs/inode.c:1139:11: error: expected identifier or '(' before 'else'
    1139 |         } else if (!S_ISDIR(inode1->i_mode))
         |           ^~~~
   In file included from include/linux/kernel.h:27,
                    from include/linux/cpumask.h:10,
                    from include/linux/smp.h:13,
                    from include/linux/lockdep.h:14,
                    from include/linux/spinlock.h:63,
                    from include/linux/wait.h:9,
                    from include/linux/wait_bit.h:8,
                    from include/linux/fs.h:6,
                    from fs/inode.c:7:
>> include/linux/minmax.h:167:63: error: expected identifier or '(' before 'while'
     167 |         do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
         |                                                               ^~~~~
   fs/inode.c:1140:17: note: in expansion of macro 'swap'
    1140 |                 swap(inode1, inode2);
         |                 ^~~~
>> fs/inode.c:1141:5: error: expected '=', ',', ';', 'asm' or '__attribute__' before ':' token
    1141 | lock:
         |     ^
   fs/inode.c:1144:9: error: expected identifier or '(' before 'if'
    1144 |         if (inode2 && inode2 != inode1)
         |         ^~
>> fs/inode.c:1146:1: error: expected identifier or '(' before '}' token
    1146 | }
         | ^


vim +/lock +1129 fs/inode.c

  1105	
  1106	/**
  1107	 * lock_two_inodes - lock two inodes (may be regular files but also dirs)
  1108	 *
  1109	 * Lock any non-NULL argument. The caller must make sure that if he is passing
  1110	 * in two directories, one is not ancestor of the other.  Zero, one or two
  1111	 * objects may be locked by this function.
  1112	 *
  1113	 * @inode1: first inode to lock
  1114	 * @inode2: second inode to lock
  1115	 * @subclass1: inode lock subclass for the first lock obtained
  1116	 * @subclass2: inode lock subclass for the second lock obtained
  1117	 */
  1118	void lock_two_inodes(struct inode *inode1, struct inode *inode2,
  1119			     unsigned subclass1, unsigned subclass2)
  1120	{
> 1121		if (!inode1 || !inode2)
  1122			/*
  1123			 * Make sure @subclass1 will be used for the acquired lock.
  1124			 * This is not strictly necessary (no current caller cares) but
  1125			 * let's keep things consistent.
  1126			 */
  1127			if (!inode1)
  1128				swap(inode1, inode2);
> 1129			goto lock;
  1130		}
  1131	
  1132		/*
  1133		 * If one object is directory and the other is not, we must make sure
  1134		 * to lock directory first as the other object may be its child.
  1135		 */
> 1136		if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
  1137			if (inode1 > inode2)
  1138				swap(inode1, inode2);
> 1139		} else if (!S_ISDIR(inode1->i_mode))
  1140			swap(inode1, inode2);
> 1141	lock:
  1142		if (inode1)
  1143			inode_lock_nested(inode1, subclass1);
  1144		if (inode2 && inode2 != inode1)
  1145			inode_lock_nested(inode2, subclass2);
> 1146	}
  1147	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/6] fs: Establish locking order for unrelated directories
  2023-06-01 16:33           ` David Laight
@ 2023-06-02 12:34             ` Christian Brauner
  0 siblings, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-06-02 12:34 UTC (permalink / raw)
  To: David Laight
  Cc: 'Jan Kara',
	Al Viro, linux-fsdevel, Miklos Szeredi, Darrick J. Wong, Ted Tso,
	Jaegeuk Kim, linux-ext4, linux-xfs, linux-f2fs-devel, stable

On Thu, Jun 01, 2023 at 04:33:58PM +0000, David Laight wrote:
> From: Jan Kara <jack@suse.cz>
> > Sent: 01 June 2023 17:14
> > 
> > On Thu 01-06-23 15:37:32, David Laight wrote:
> > > ...
> > > > > > + * Lock any non-NULL argument. The caller must make sure that if he is passing
> > > > > > + * in two directories, one is not ancestor of the other
> > >
> > > Not directly relevant to this change but is the 'not an ancestor'
> > > check actually robust?
> > >
> > > I found a condition in which the kernel 'pwd' code (which follows
> > > the inode chain) failed to stop at the base of a chroot.
> > >
> > > I suspect that the ancestor check would fail the same way.
> > 
> > Honestly, I'm not sure how this could be the case but I'm not a dcache
> > expert. d_ancestor() works on dentries and the whole dcache code pretty
> > much relies on the fact that there always is at most one dentry for any
> > directory. Also in case we call d_ancestor() from this code, we have the
> > whole filesystem locked from any other directory moves so the ancestor
> > relationship of two dirs cannot change (which is different from pwd code
> > AFAIK). So IMHO no failure is possible in our case.
> 
> I've found the test program.
> This uses readlinkat() to get the full path /proc/self/fd/0.
> It should be inside the chroot, but the comparison done
> to detect the 'root' fails.

That's intentional and relied-upon behavior. In glibc alone for tty
validation it wants the full link path returned. So any change in this
is an immediate widespread userspace regression.

> 
> Now maybe any rename that would hit this is invalid
> for other reasons.
> But something is awry somewhere.

It really isn't.

> 
> 	David
> 
> The program below reproduces this when run with stdin
> redirected to a file in the current directory.
> 
> This sequence is used by 'ip netns exec' so isn't actually

Fwiw, it doesn't use chroot() at all. 

> that unusual.
> 
> 	David
> 
> #define _GNU_SOURCE
> #include <unistd.h>
> #include <stdio.h>
> #include <fcntl.h>
> #include <sched.h>
> 
> static void print_link(const char *where, int fd)
> {
>         char buf[256];
> 
>         printf("%s: %.*s\n", where, (int)readlinkat(fd, "", buf, sizeof buf), buf);
> }
> 
> int main(int argc, char **argv)
> {
>         int link_fd = open("/proc/self/fd/0", O_PATH | O_NOFOLLOW);
> 
>         print_link("initial", link_fd);
>         if (chroot("."))

chroot(2):

"This call changes an ingredient in the pathname resolution process and
does nothing else. In particular, it is not intended [...] to restrict
filesystem system calls.

[...]

This call does not close open file descriptors, and such file
descriptors may allow access to files outside the chroot tree."

>                 return 1;
>         print_link("after chroot", link_fd);
>         if (unshare(CLONE_NEWNS))
>                 return 2;
>         print_link("after unshare", link_fd);
>         return 0;
> }

But anyway, the code sample you provided is using O_PATH | O_NOFOLLOW to
open magic link, i.e., /proc/<pid>/fd/<nr>. That means whatever the
magic link refers to isn't really reopened. You can create these magic
link references trivially for every path:

        int fd = open("/tmp", 0);

        // create fd referencing magic link
        sprintf(buf, "/proc/self/fd/%d", fd);
        int link_fd = open(buf, O_PATH | O_NOFOLLOW);

In fact, you don't even need magic links for that. You can get the same
behavior with any symlink:

        ln -sf /usr /BLUB
        linkt_fd = open("/BLUB", O_PATH | O_NOFOLLOW);

If you pass such a fd to readlinkat() then d_path() will give you the
full path whether it's accessible in your namespace/chroot/pivot_root()
or not.

Look at __prepend_path() current->fs->root is only used to terminate the
walk for fds that are scopable _beneath_ your chroot:

        mkdir -p /A/B/C
        touch /A/B/C/D

        chroot("/A/B/C");
        int fd = open("/A/B/C/D", 0);
        sprintf(buf, "/proc/self/fd/%d", fd);
        int link_fd = open(buf, O_PATH | O_NOFOLLOW);

In this case, you'll see that after chroot("/A/B/C") it'll print:

        /D

And this actually makes a lot of sense. The fd for /A/B/C/D is scoped
beneath your chroot(). But an fd pointing outside of your chroot is not
scoped by the chroot because you can also very well do:

        fchdir(fd-outside-chroot)

And btw, orderings such as:

        chroot()
        unshare(CLONE_NEWNS)

aren't intuitive. It seems that you're under the impression that the
unshare(CLONE_NEWNS) doesn't have any effect on the chroot() but it
does. Going back to the previous example:

        mkdir -p /A/B/C
        touch /A/B/C/D

        chroot("/A/B/C");
        int fd = open("/A/B/C/D", 0);
        sprintf(buf, "/proc/self/fd/%d", fd);
        int link_fd = open(buf, O_PATH | O_NOFOLLOW);

Compare what gets printed after the chroot() and after
unshare(CLONE_NEWNS). You'll see /D after the chroot() but again the
full path /A/B/C/D after the unshare(). Why?

The reason is that if the mount that you're currently chroot()ed into is
copied as part of the unshare(CLONE_NEWNS) then current->fs->root will
be updated to refer to the copy.

But since this is a copy it means that __prepend_path() doesn't
terminate the walk at /D. That's seemingly counterintuitive but makes
sense if you consider that you were moved into a new mount namespace.
The mount the fd refers to is now inaccessible from your mount namespace
and so the full path is returned again.

Yes, that's not straightforward but heavily relied upon so even if we
could change it to be less surprising it would break the hell out of
everyone.

And most of this doesn't have anything to do with ancestor relationships
per se since this code is able to detect concurrent tree modifications
through rename_lock seqlock iirc. That's a related but different
problem. The effects you're seeing are caused by mount semantics more
than anything else.

And btw about /proc/self/fd/0 specifically... Not verifying an fd
pointing to a pty device in any type of sandbox in the age of containers
is ripe for confusion. Quoting from work I did on glibc years ago:

"It's a common practice among container managers to allocate
 a PTY master/slave pair in the host's mount namespace (the slave having
 a path like "/dev/pty/$X"), bind mount the slave to "/dev/console" in
 the container's mount namespace, and send the slave FD to a process in
 the container. Inside of the container, the slave-end isn't available
 at its original path ("/dev/pts/$X"), since the container mount
 namespace has a separate devpts instance from the host (that path may
 or may not exist in the container; if it does exist, it's not the same
 PTY slave device)."

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/6] fs: Fix directory corruption when moving directories
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (5 preceding siblings ...)
  2023-06-01 10:58 ` [PATCH v2 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
@ 2023-06-02 13:05 ` Christian Brauner
  2023-07-06  0:18 ` [f2fs-dev] " patchwork-bot+f2fs
  7 siblings, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2023-06-02 13:05 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, linux-fsdevel, Miklos Szeredi,
	Darrick J. Wong, Ted Tso, Jaegeuk Kim, linux-ext4, linux-xfs,
	linux-f2fs-devel, Al Viro

On Thu, 01 Jun 2023 12:58:20 +0200, Jan Kara wrote:
> this patch set fixes a problem with cross directory renames originally reported
> in [1]. To quickly sum it up some filesystems (so far we know at least about
> ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the
> directory when it is being renamed into another directory. This is because we
> need to update the parent pointer in the directory in that case and if that
> races with other operation on the directory (in particular a conversion from
> one directory format into another), bad things can happen.
> 
> [...]

I've picked this up to get it into -next. I've folded the following fix
for the missing { into [4/6].

---

Applied to the vfs.rename.locking branch of the vfs/vfs.git tree.
Patches in the vfs.rename.locking branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.rename.locking

[1/6] ext4: Remove ext4 locking of moved directory
      https://git.kernel.org/vfs/vfs/c/3658840cd363
[2/6] Revert "udf: Protect rename against modification of moved directory"
      https://git.kernel.org/vfs/vfs/c/7517ce5dc4d6
[3/6] Revert "f2fs: fix potential corruption when moving a directory"
      https://git.kernel.org/vfs/vfs/c/cde3c9d7e2a3
[4/6] fs: Establish locking order for unrelated directories
      https://git.kernel.org/vfs/vfs/c/f23ce7571853
[5/6] fs: Lock moved directories
      https://git.kernel.org/vfs/vfs/c/28eceeda130f
[6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes
      https://git.kernel.org/vfs/vfs/c/afb4adc7c3ef

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [f2fs-dev] [PATCH v2 0/6] fs: Fix directory corruption when moving directories
  2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
                   ` (6 preceding siblings ...)
  2023-06-02 13:05 ` [PATCH v2 0/6] fs: Fix directory corruption when moving directories Christian Brauner
@ 2023-07-06  0:18 ` patchwork-bot+f2fs
  7 siblings, 0 replies; 21+ messages in thread
From: patchwork-bot+f2fs @ 2023-07-06  0:18 UTC (permalink / raw)
  To: Jan Kara
  Cc: viro, brauner, tytso, miklos, djwong, linux-f2fs-devel,
	linux-xfs, linux-fsdevel, jaegeuk, linux-ext4

Hello:

This series was applied to jaegeuk/f2fs.git (dev)
by Christian Brauner <brauner@kernel.org>:

On Thu,  1 Jun 2023 12:58:20 +0200 you wrote:
> Hello,
> 
> this patch set fixes a problem with cross directory renames originally reported
> in [1]. To quickly sum it up some filesystems (so far we know at least about
> ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the
> directory when it is being renamed into another directory. This is because we
> need to update the parent pointer in the directory in that case and if that
> races with other operation on the directory (in particular a conversion from
> one directory format into another), bad things can happen.
> 
> [...]

Here is the summary with links:
  - [f2fs-dev,v2,1/6] ext4: Remove ext4 locking of moved directory
    https://git.kernel.org/jaegeuk/f2fs/c/3658840cd363
  - [f2fs-dev,v2,2/6] Revert "udf: Protect rename against modification of moved directory"
    https://git.kernel.org/jaegeuk/f2fs/c/7517ce5dc4d6
  - [f2fs-dev,v2,3/6] Revert "f2fs: fix potential corruption when moving a directory"
    https://git.kernel.org/jaegeuk/f2fs/c/cde3c9d7e2a3
  - [f2fs-dev,v2,4/6] fs: Establish locking order for unrelated directories
    (no matching commit)
  - [f2fs-dev,v2,5/6] fs: Lock moved directories
    https://git.kernel.org/jaegeuk/f2fs/c/28eceeda130f
  - [f2fs-dev,v2,6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-06  0:18 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-01 10:58 [PATCH v2 0/6] fs: Fix directory corruption when moving directories Jan Kara
2023-06-01 10:58 ` [PATCH v2 1/6] ext4: Remove ext4 locking of moved directory Jan Kara
2023-06-01 14:52   ` Theodore Ts'o
2023-06-01 15:27     ` Jan Kara
2023-06-01 15:58       ` Christian Brauner
2023-06-01 10:58 ` [PATCH v2 2/6] Revert "udf: Protect rename against modification of moved directory" Jan Kara
2023-06-01 10:58 ` [PATCH v2 3/6] Revert "f2fs: fix potential corruption when moving a directory" Jan Kara
2023-06-01 10:58 ` [PATCH v2 4/6] fs: Establish locking order for unrelated directories Jan Kara
2023-06-01 13:58   ` Christian Brauner
2023-06-01 15:24     ` Jan Kara
2023-06-01 15:37       ` David Laight
2023-06-01 16:13         ` Jan Kara
2023-06-01 16:21           ` Christian Brauner
2023-06-01 16:33           ` David Laight
2023-06-02 12:34             ` Christian Brauner
2023-06-01 15:59       ` Christian Brauner
2023-06-02  1:36   ` kernel test robot
2023-06-01 10:58 ` [PATCH v2 5/6] fs: Lock moved directories Jan Kara
2023-06-01 10:58 ` [PATCH v2 6/6] fs: Restrict lock_two_nondirectories() to non-directory inodes Jan Kara
2023-06-02 13:05 ` [PATCH v2 0/6] fs: Fix directory corruption when moving directories Christian Brauner
2023-07-06  0:18 ` [f2fs-dev] " patchwork-bot+f2fs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).