linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/7] cross rename
@ 2013-10-01 16:00 Miklos Szeredi
  2013-10-01 16:00 ` [PATCH 1/7] vfs: rename: move d_move() up Miklos Szeredi
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

This series adds a new syscall, renameat2(), which is the same as renameat() but
with a flags argument.  Internally i_op->reaname2() is also added, which can
later be merged with ->rename() but is kept separately for now, since this would
just blow up this patch without helping review.

The purpose of extending rename is to add cross-rename, a symmetric variant of
rename, which exchanges the two files.  This allows interesting things, which
were not possible before, for example atomically replacing a directory tree with
a symlink, etc...

The other reason to introduce this is for whiteout handling in union/overlay
solutions in an atomic manner without having to add complex code to each
filesystem's rmdir, mkdir and rename just for handling whiteouts.  With
cross-rename it becomes possible to handle whiteouts in a generic manner in most
of the cases:

rmdir(P) where P needs to be whiteout:
  - create-whiteout(orphan/X)
  - exchange(orphan/X, P)
  - rmdir(orphan/X)

mkdir(P) where P is currently a whiteout:
  - mkdir(orphan/X)
  - exchange(orphan/X, P)
  - unlink(orphan/X)

rename(P, Q) where P needs to be whiteout and Q is negative:
  - create-whiteout(Q)
  - exchange(P, Q)

rename(P, Q) where Q is a directory containing only whiteouts:
  - mkdir(orphan/X) (marked as opaque)
  - exchange(orphan/X, Q)
  - recursive-remove(orphan/X)
  - rename(P, Q)

The case that cannot be handled with cross rename is:

rename(P, Q) where P needs to be whiteout and Q exists

For this case a new rename flag will be propsed that atomically creates the
whiteout.

Thanks,
Miklos

---
Miklos Szeredi (7):
      vfs: rename: move d_move() up
      vfs: rename: use common code for dir and non-dir
      vfs: add renameat2 syscall and cross-rename
      ext4: rename: create ext4_renament structure for local vars
      ext4: rename: move EMLINK check up
      ext4: rename: split out helper functions
      ext4: add cross rename support

---
 arch/x86/syscalls/syscall_64.tbl |   1 +
 fs/dcache.c                      |  46 ++++-
 fs/ext4/namei.c                  | 379 +++++++++++++++++++++++++--------------
 fs/namei.c                       | 218 ++++++++++++----------
 include/linux/dcache.h           |   1 +
 include/linux/fs.h               |   2 +
 include/uapi/linux/fs.h          |   2 +
 7 files changed, 415 insertions(+), 234 deletions(-)



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] vfs: rename: move d_move() up
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-01 16:00 ` [PATCH 2/7] vfs: rename: use common code for dir and non-dir Miklos Szeredi
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Move the d_move() in vfs_rename_dir() up, similarly to how it's done in
vfs_rename_other().  The next patch will consolidate these two functions
and this is the only structural difference between them.

I'm not sure if doing the d_move() after the dput is even valid.  But there
may be a logical explanation for that.  But moving the d_move() before the
dput() (and the mutex_unlock()) should definitely not hurt.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namei.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 645268f..911aadc 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4007,13 +4007,12 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 		target->i_flags |= S_DEAD;
 		dont_mount(new_dentry);
 	}
+	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+		d_move(old_dentry, new_dentry);
 out:
 	if (target)
 		mutex_unlock(&target->i_mutex);
 	dput(new_dentry);
-	if (!error)
-		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
-			d_move(old_dentry,new_dentry);
 	return error;
 }
 
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] vfs: rename: use common code for dir and non-dir
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
  2013-10-01 16:00 ` [PATCH 1/7] vfs: rename: move d_move() up Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-01 16:00 ` [PATCH 3/7] vfs: add renameat2 syscall and cross-rename Miklos Szeredi
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

There's actually very little difference between vfs_rename_dir() and
vfs_rename_other() so move both inline into vfs_rename() which still stays
reasonably readable.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/namei.c | 118 ++++++++++++++++++++-----------------------------------------
 1 file changed, 39 insertions(+), 79 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 911aadc..7ec6a12 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3963,19 +3963,38 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   ->i_mutex on parents, which works but leads to some truly excessive
  *	   locking].
  */
-static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
-			  struct inode *new_dir, struct dentry *new_dentry)
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+	       struct inode *new_dir, struct dentry *new_dentry)
 {
-	int error = 0;
+	int error;
+	const unsigned char *old_name;
+	struct inode *source = old_dentry->d_inode;
 	struct inode *target = new_dentry->d_inode;
-	unsigned max_links = new_dir->i_sb->s_max_links;
+	bool is_dir = S_ISDIR(source->i_mode);
+
+	if (source == target)
+		return 0;
+
+	error = may_delete(old_dir, old_dentry, is_dir);
+	if (error)
+		return error;
+
+	if (!target)
+		error = may_create(new_dir, new_dentry);
+	else
+		error = may_delete(new_dir, new_dentry, is_dir);
+	if (error)
+		return error;
+
+	if (!old_dir->i_op->rename)
+		return -EPERM;
 
 	/*
 	 * If we are going to change the parent - check write permissions,
 	 * we'll need to flip '..'.
 	 */
-	if (new_dir != old_dir) {
-		error = inode_permission(old_dentry->d_inode, MAY_WRITE);
+	if (is_dir && new_dir != old_dir) {
+		error = inode_permission(source, MAY_WRITE);
 		if (error)
 			return error;
 	}
@@ -3984,6 +4003,7 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (error)
 		return error;
 
+	old_name = fsnotify_oldname_init(old_dentry->d_name.name);
 	dget(new_dentry);
 	if (target)
 		mutex_lock(&target->i_mutex);
@@ -3992,96 +4012,36 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
 		goto out;
 
-	error = -EMLINK;
-	if (max_links && !target && new_dir != old_dir &&
-	    new_dir->i_nlink >= max_links)
-		goto out;
+	if (is_dir) {
+		unsigned max_links = new_dir->i_sb->s_max_links;
 
-	if (target)
-		shrink_dcache_parent(new_dentry);
-	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
-	if (error)
-		goto out;
+		error = -EMLINK;
+		if (max_links && !target && new_dir != old_dir &&
+		    new_dir->i_nlink >= max_links)
+			goto out;
 
-	if (target) {
-		target->i_flags |= S_DEAD;
-		dont_mount(new_dentry);
+		if (target)
+			shrink_dcache_parent(new_dentry);
 	}
-	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
-		d_move(old_dentry, new_dentry);
-out:
-	if (target)
-		mutex_unlock(&target->i_mutex);
-	dput(new_dentry);
-	return error;
-}
-
-static int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
-			    struct inode *new_dir, struct dentry *new_dentry)
-{
-	struct inode *target = new_dentry->d_inode;
-	int error;
-
-	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
-	if (error)
-		return error;
-
-	dget(new_dentry);
-	if (target)
-		mutex_lock(&target->i_mutex);
-
-	error = -EBUSY;
-	if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
-		goto out;
 
 	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
 
-	if (target)
+	if (target) {
+		if (is_dir)
+			target->i_flags |= S_DEAD;
 		dont_mount(new_dentry);
+	}
 	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
 		d_move(old_dentry, new_dentry);
 out:
 	if (target)
 		mutex_unlock(&target->i_mutex);
 	dput(new_dentry);
-	return error;
-}
-
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry)
-{
-	int error;
-	int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
-	const unsigned char *old_name;
-
-	if (old_dentry->d_inode == new_dentry->d_inode)
- 		return 0;
- 
-	error = may_delete(old_dir, old_dentry, is_dir);
-	if (error)
-		return error;
-
-	if (!new_dentry->d_inode)
-		error = may_create(new_dir, new_dentry);
-	else
-		error = may_delete(new_dir, new_dentry, is_dir);
-	if (error)
-		return error;
-
-	if (!old_dir->i_op->rename)
-		return -EPERM;
-
-	old_name = fsnotify_oldname_init(old_dentry->d_name.name);
-
-	if (is_dir)
-		error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
-	else
-		error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
 	if (!error)
 		fsnotify_move(old_dir, new_dir, old_name, is_dir,
-			      new_dentry->d_inode, old_dentry);
+			      target, old_dentry);
 	fsnotify_oldname_free(old_name);
 
 	return error;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] vfs: add renameat2 syscall and cross-rename
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
  2013-10-01 16:00 ` [PATCH 1/7] vfs: rename: move d_move() up Miklos Szeredi
  2013-10-01 16:00 ` [PATCH 2/7] vfs: rename: use common code for dir and non-dir Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-02 12:26   ` Jan Kara
  2013-10-01 16:00 ` [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars Miklos Szeredi
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Add new renameat2 syscall, which is the same as renameat with an added
flags argument.

If flags is zero then this is a plain overwriting rename.  If flags contain
RENAME_EXCHANGE then exchange source and destination files.  There's no
restriction on the type of the files; e.g. a directory can be exchanged
with a symlink.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 arch/x86/syscalls/syscall_64.tbl |   1 +
 fs/dcache.c                      |  46 ++++++++++---
 fs/namei.c                       | 139 ++++++++++++++++++++++++++++++---------
 include/linux/dcache.h           |   1 +
 include/linux/fs.h               |   2 +
 include/uapi/linux/fs.h          |   2 +
 6 files changed, 152 insertions(+), 39 deletions(-)

diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 38ae65d..fafd734 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -320,6 +320,7 @@
 311	64	process_vm_writev	sys_process_vm_writev
 312	common	kcmp			sys_kcmp
 313	common	finit_module		sys_finit_module
+314	common	renameat2		sys_renameat2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/dcache.c b/fs/dcache.c
index 4100030..1735bac 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2495,12 +2495,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
 			dentry->d_name.name = dentry->d_iname;
 		} else {
 			/*
-			 * Both are internal.  Just copy target to dentry
+			 * Both are internal.
 			 */
-			memcpy(dentry->d_iname, target->d_name.name,
-					target->d_name.len + 1);
-			dentry->d_name.len = target->d_name.len;
-			return;
+			unsigned int i;
+			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
+			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
+				swap(((long *) &dentry->d_iname)[i],
+				     ((long *) &target->d_iname)[i]);
+			}
 		}
 	}
 	swap(dentry->d_name.len, target->d_name.len);
@@ -2557,13 +2559,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
  * __d_move - move a dentry
  * @dentry: entry to move
  * @target: new dentry
+ * @exchange: exchange the two dentries
  *
  * Update the dcache to reflect the move of a file name. Negative
  * dcache entries should not be moved in this way. Caller must hold
  * rename_lock, the i_mutex of the source and target directories,
  * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
  */
-static void __d_move(struct dentry * dentry, struct dentry * target)
+static void __d_move(struct dentry *dentry, struct dentry *target,
+		     bool exchange)
 {
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
@@ -2587,6 +2591,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
 
 	/* Unhash the target: dput() will then get rid of it */
 	__d_drop(target);
+	if (exchange) {
+		__d_rehash(target,
+			   d_hash(dentry->d_parent, dentry->d_name.hash));
+	}
 
 	list_del(&dentry->d_u.d_child);
 	list_del(&target->d_u.d_child);
@@ -2613,6 +2621,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
 	write_seqcount_end(&dentry->d_seq);
 
 	dentry_unlock_parents_for_move(dentry, target);
+	if (exchange)
+		fsnotify_d_move(target);
 	spin_unlock(&target->d_lock);
 	fsnotify_d_move(dentry);
 	spin_unlock(&dentry->d_lock);
@@ -2630,11 +2640,31 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
 void d_move(struct dentry *dentry, struct dentry *target)
 {
 	write_seqlock(&rename_lock);
-	__d_move(dentry, target);
+	__d_move(dentry, target, false);
 	write_sequnlock(&rename_lock);
 }
 EXPORT_SYMBOL(d_move);
 
+/*
+ * d_exchange - exchange two dentries
+ * @dentry1: first dentry
+ * @dentry2: second dentry
+ */
+void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
+{
+	write_seqlock(&rename_lock);
+
+	WARN_ON(!dentry1->d_inode);
+	WARN_ON(!dentry2->d_inode);
+	WARN_ON(IS_ROOT(dentry1));
+	WARN_ON(IS_ROOT(dentry2));
+
+	__d_move(dentry1, dentry2, true);
+
+	write_sequnlock(&rename_lock);
+}
+
+
 /**
  * d_ancestor - search for an ancestor
  * @p1: ancestor dentry
@@ -2682,7 +2712,7 @@ static struct dentry *__d_unalias(struct inode *inode,
 	m2 = &alias->d_parent->d_inode->i_mutex;
 out_unalias:
 	if (likely(!d_mountpoint(alias))) {
-		__d_move(alias, dentry);
+		__d_move(alias, dentry, false);
 		ret = alias;
 	}
 out_err:
diff --git a/fs/namei.c b/fs/namei.c
index 7ec6a12..55700b3 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3963,14 +3963,18 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
  *	   ->i_mutex on parents, which works but leads to some truly excessive
  *	   locking].
  */
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-	       struct inode *new_dir, struct dentry *new_dentry)
+static int vfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
+		       struct inode *new_dir, struct dentry *new_dentry,
+		       unsigned int flags)
 {
 	int error;
 	const unsigned char *old_name;
 	struct inode *source = old_dentry->d_inode;
 	struct inode *target = new_dentry->d_inode;
 	bool is_dir = S_ISDIR(source->i_mode);
+	bool new_is_dir = target ? S_ISDIR(target->i_mode) : false;
+	bool overwrite = !(flags & RENAME_EXCHANGE);
+	unsigned max_links = new_dir->i_sb->s_max_links;
 
 	if (source == target)
 		return 0;
@@ -3981,74 +3985,116 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 	if (!target)
 		error = may_create(new_dir, new_dentry);
-	else
+	else if (overwrite)
 		error = may_delete(new_dir, new_dentry, is_dir);
+	else
+		error = may_delete(new_dir, new_dentry, new_is_dir);
 	if (error)
 		return error;
 
-	if (!old_dir->i_op->rename)
+	if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
 		return -EPERM;
 
+	if (flags && !old_dir->i_op->rename2)
+		return -EOPNOTSUPP;
+
 	/*
 	 * If we are going to change the parent - check write permissions,
 	 * we'll need to flip '..'.
 	 */
-	if (is_dir && new_dir != old_dir) {
-		error = inode_permission(source, MAY_WRITE);
-		if (error)
-			return error;
+	if (new_dir != old_dir) {
+		if (is_dir) {
+			error = inode_permission(source, MAY_WRITE);
+			if (error)
+				return error;
+		}
+		if (!overwrite && new_is_dir) {
+			error = inode_permission(target, MAY_WRITE);
+			if (error)
+				return error;
+		}
 	}
 
 	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		return error;
 
+	if (!overwrite) {
+		error = security_inode_rename(new_dir, new_dentry,
+					      old_dir, old_dentry);
+		if (error)
+			return error;
+	}
+
 	old_name = fsnotify_oldname_init(old_dentry->d_name.name);
 	dget(new_dentry);
-	if (target)
+	if (overwrite && target)
 		mutex_lock(&target->i_mutex);
 
 	error = -EBUSY;
 	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
 		goto out;
 
-	if (is_dir) {
-		unsigned max_links = new_dir->i_sb->s_max_links;
-
+	if (max_links && new_dir != old_dir) {
 		error = -EMLINK;
-		if (max_links && !target && new_dir != old_dir &&
-		    new_dir->i_nlink >= max_links)
+		if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
 			goto out;
+		if (!overwrite && !is_dir && new_is_dir &&
+		    old_dir->i_nlink > max_links)
+			goto out;
+	}
+
+	if (overwrite && is_dir && target)
+		shrink_dcache_parent(new_dentry);
 
-		if (target)
-			shrink_dcache_parent(new_dentry);
+	if (old_dir->i_op->rename2) {
+		error = old_dir->i_op->rename2(old_dir, old_dentry,
+					       new_dir, new_dentry, flags);
+	} else {
+		error = old_dir->i_op->rename(old_dir, old_dentry,
+					      new_dir, new_dentry);
 	}
 
-	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
 
-	if (target) {
+	if (overwrite && target) {
 		if (is_dir)
 			target->i_flags |= S_DEAD;
 		dont_mount(new_dentry);
 	}
-	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
-		d_move(old_dentry, new_dentry);
+	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
+		if (overwrite)
+			d_move(old_dentry, new_dentry);
+		else
+			d_exchange(old_dentry, new_dentry);
+	}
 out:
-	if (target)
+	if (overwrite && target)
 		mutex_unlock(&target->i_mutex);
 	dput(new_dentry);
-	if (!error)
+	if (!error) {
 		fsnotify_move(old_dir, new_dir, old_name, is_dir,
-			      target, old_dentry);
+			      overwrite ? target : NULL, old_dentry);
+		if (!overwrite) {
+			fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
+				      new_is_dir, NULL, new_dentry);
+		}
+	}
 	fsnotify_oldname_free(old_name);
 
 	return error;
 }
 
-SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
-		int, newdfd, const char __user *, newname)
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+	       struct inode *new_dir, struct dentry *new_dentry)
+{
+	return vfs_rename2(old_dir, old_dentry, new_dir, new_dentry, 0);
+}
+
+
+SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
+		int, newdfd, const char __user *, newname, unsigned int, flags)
 {
 	struct dentry *old_dir, *new_dir;
 	struct dentry *old_dentry, *new_dentry;
@@ -4058,7 +4104,13 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
 	struct filename *to;
 	unsigned int lookup_flags = 0;
 	bool should_retry = false;
+	bool overwrite = !(flags & RENAME_EXCHANGE);
 	int error;
+
+	error = -EOPNOTSUPP;
+	if (flags & ~RENAME_EXCHANGE)
+		goto exit;
+
 retry:
 	from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
 	if (IS_ERR(from)) {
@@ -4091,7 +4143,8 @@ retry:
 
 	oldnd.flags &= ~LOOKUP_PARENT;
 	newnd.flags &= ~LOOKUP_PARENT;
-	newnd.flags |= LOOKUP_RENAME_TARGET;
+	if (overwrite)
+		newnd.flags |= LOOKUP_RENAME_TARGET;
 
 	trap = lock_rename(new_dir, old_dir);
 
@@ -4108,7 +4161,7 @@ retry:
 		error = -ENOTDIR;
 		if (oldnd.last.name[oldnd.last.len])
 			goto exit4;
-		if (newnd.last.name[newnd.last.len])
+		if (overwrite && newnd.last.name[newnd.last.len])
 			goto exit4;
 	}
 	/* source should not be ancestor of target */
@@ -4119,8 +4172,19 @@ retry:
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto exit4;
+	if (!overwrite) {
+		error = -ENOENT;
+		if (!new_dentry->d_inode)
+			goto exit4;
+
+		if (!S_ISDIR(new_dentry->d_inode->i_mode)) {
+			error = -ENOTDIR;
+			if (newnd.last.name[newnd.last.len])
+				goto exit4;
+		}
+	}
 	/* target should not be an ancestor of source */
-	error = -ENOTEMPTY;
+	error = overwrite ? -ENOTEMPTY : -EINVAL;
 	if (new_dentry == trap)
 		goto exit5;
 
@@ -4128,8 +4192,15 @@ retry:
 				     &newnd.path, new_dentry);
 	if (error)
 		goto exit5;
-	error = vfs_rename(old_dir->d_inode, old_dentry,
-				   new_dir->d_inode, new_dentry);
+	if (!overwrite) {
+		error = security_path_rename(&newnd.path, new_dentry,
+					     &oldnd.path, old_dentry);
+		if (error)
+			goto exit5;
+	}
+
+	error = vfs_rename2(old_dir->d_inode, old_dentry,
+			    new_dir->d_inode, new_dentry, flags);
 exit5:
 	dput(new_dentry);
 exit4:
@@ -4154,9 +4225,15 @@ exit:
 	return error;
 }
 
+SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
+		int, newdfd, const char __user *, newname)
+{
+	return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
+}
+
 SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
 {
-	return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
+	return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
 }
 
 int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 59066e0..ce5ebed 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -297,6 +297,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);
 
 /* used for rename() and baskets */
 extern void d_move(struct dentry *, struct dentry *);
+extern void d_exchange(struct dentry *, struct dentry *);
 extern struct dentry *d_ancestor(struct dentry *, struct dentry *);
 
 /* appendix may either be NULL or be used for transname suffixes */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f40547..71c6cf9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1572,6 +1572,8 @@ struct inode_operations {
 	int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
 	int (*rename) (struct inode *, struct dentry *,
 			struct inode *, struct dentry *);
+	int (*rename2) (struct inode *, struct dentry *,
+			struct inode *, struct dentry *, unsigned int);
 	int (*setattr) (struct dentry *, struct iattr *);
 	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
 	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 6c28b61..ebdafb6 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -35,6 +35,8 @@
 #define SEEK_HOLE	4	/* seek to the next hole */
 #define SEEK_MAX	SEEK_HOLE
 
+#define RENAME_EXCHANGE	(1 << 0)	/* Exchange source and dest */
+
 struct fstrim_range {
 	__u64 start;
 	__u64 len;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
                   ` (2 preceding siblings ...)
  2013-10-01 16:00 ` [PATCH 3/7] vfs: add renameat2 syscall and cross-rename Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-02 12:01   ` Jan Kara
  2013-10-01 16:00 ` [PATCH 5/7] ext4: rename: move EMLINK check up Miklos Szeredi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Need to split up ext4_rename() into helpers but there are two many local
variables involved, so create a new structure.  This also, apparently,
makes the generated code size slightly smaller.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ext4/namei.c | 207 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 110 insertions(+), 97 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1bec5a5..01bd80e 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3002,6 +3002,18 @@ static struct buffer_head *ext4_get_first_dir_block(handle_t *handle,
 	return ext4_get_first_inline_block(inode, parent_de, retval);
 }
 
+struct ext4_renament {
+	struct inode *dir;
+	struct dentry *dentry;
+	struct inode *inode;
+	struct buffer_head *bh;
+	struct buffer_head *dir_bh;
+	struct ext4_dir_entry_2 *de;
+	struct ext4_dir_entry_2 *parent_de;
+	int inlined;
+	int parent_inlined;
+};
+
 /*
  * Anybody can rename anything with this: the permission checks are left to the
  * higher-level routines.
@@ -3014,193 +3026,194 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		       struct inode *new_dir, struct dentry *new_dentry)
 {
 	handle_t *handle = NULL;
-	struct inode *old_inode, *new_inode;
-	struct buffer_head *old_bh, *new_bh, *dir_bh;
-	struct ext4_dir_entry_2 *old_de, *new_de;
+	struct ext4_renament old = {
+		.dir = old_dir,
+		.dentry = old_dentry,
+	};
+	struct ext4_renament new = {
+		.dir = new_dir,
+		.dentry = new_dentry,
+	};
 	int retval;
-	int inlined = 0, new_inlined = 0;
-	struct ext4_dir_entry_2 *parent_de;
-
-	dquot_initialize(old_dir);
-	dquot_initialize(new_dir);
 
-	old_bh = new_bh = dir_bh = NULL;
+	dquot_initialize(old.dir);
+	dquot_initialize(new.dir);
 
 	/* Initialize quotas before so that eventual writes go
 	 * in separate transaction */
-	if (new_dentry->d_inode)
-		dquot_initialize(new_dentry->d_inode);
+	if (new.dentry->d_inode)
+		dquot_initialize(new.dentry->d_inode);
 
-	old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de, NULL);
+	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
 	/*
 	 *  Check for inode number is _not_ due to possible IO errors.
 	 *  We might rmdir the source, keep it as pwd of some process
 	 *  and merrily kill the link to whatever was created under the
 	 *  same name. Goodbye sticky bit ;-<
 	 */
-	old_inode = old_dentry->d_inode;
+	old.inode = old.dentry->d_inode;
 	retval = -ENOENT;
-	if (!old_bh || le32_to_cpu(old_de->inode) != old_inode->i_ino)
+	if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
 		goto end_rename;
 
-	new_inode = new_dentry->d_inode;
-	new_bh = ext4_find_entry(new_dir, &new_dentry->d_name,
-				 &new_de, &new_inlined);
-	if (new_bh) {
-		if (!new_inode) {
-			brelse(new_bh);
-			new_bh = NULL;
+	new.inode = new.dentry->d_inode;
+	new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
+				 &new.de, &new.inlined);
+	if (new.bh) {
+		if (!new.inode) {
+			brelse(new.bh);
+			new.bh = NULL;
 		}
 	}
-	if (new_inode && !test_opt(new_dir->i_sb, NO_AUTO_DA_ALLOC))
-		ext4_alloc_da_blocks(old_inode);
+	if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
+		ext4_alloc_da_blocks(old.inode);
 
-	handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
-		(2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
+	handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
+		(2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
 		 EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
-	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
+	if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
 		ext4_handle_sync(handle);
 
-	if (S_ISDIR(old_inode->i_mode)) {
-		if (new_inode) {
+	if (S_ISDIR(old.inode->i_mode)) {
+		if (new.inode) {
 			retval = -ENOTEMPTY;
-			if (!empty_dir(new_inode))
+			if (!empty_dir(new.inode))
 				goto end_rename;
 		}
 		retval = -EIO;
-		dir_bh = ext4_get_first_dir_block(handle, old_inode,
-						  &retval, &parent_de,
-						  &inlined);
-		if (!dir_bh)
+		old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
+						  &retval, &old.parent_de,
+						  &old.parent_inlined);
+		if (!old.dir_bh)
 			goto end_rename;
-		if (le32_to_cpu(parent_de->inode) != old_dir->i_ino)
+		if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
 			goto end_rename;
 		retval = -EMLINK;
-		if (!new_inode && new_dir != old_dir &&
-		    EXT4_DIR_LINK_MAX(new_dir))
+		if (!new.inode && new.dir != old.dir &&
+		    EXT4_DIR_LINK_MAX(new.dir))
 			goto end_rename;
-		BUFFER_TRACE(dir_bh, "get_write_access");
-		retval = ext4_journal_get_write_access(handle, dir_bh);
+		BUFFER_TRACE(old.dir_bh, "get_write_access");
+		retval = ext4_journal_get_write_access(handle, old.dir_bh);
 		if (retval)
 			goto end_rename;
 	}
-	if (!new_bh) {
-		retval = ext4_add_entry(handle, new_dentry, old_inode);
+	if (!new.bh) {
+		retval = ext4_add_entry(handle, new.dentry, old.inode);
 		if (retval)
 			goto end_rename;
 	} else {
-		BUFFER_TRACE(new_bh, "get write access");
-		retval = ext4_journal_get_write_access(handle, new_bh);
+		BUFFER_TRACE(new.bh, "get write access");
+		retval = ext4_journal_get_write_access(handle, new.bh);
 		if (retval)
 			goto end_rename;
-		new_de->inode = cpu_to_le32(old_inode->i_ino);
-		if (EXT4_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
+		new.de->inode = cpu_to_le32(old.inode->i_ino);
+		if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
 					      EXT4_FEATURE_INCOMPAT_FILETYPE))
-			new_de->file_type = old_de->file_type;
-		new_dir->i_version++;
-		new_dir->i_ctime = new_dir->i_mtime =
-					ext4_current_time(new_dir);
-		ext4_mark_inode_dirty(handle, new_dir);
-		BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
-		if (!new_inlined) {
+			new.de->file_type = old.de->file_type;
+		new.dir->i_version++;
+		new.dir->i_ctime = new.dir->i_mtime =
+					ext4_current_time(new.dir);
+		ext4_mark_inode_dirty(handle, new.dir);
+		BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
+		if (!new.inlined) {
 			retval = ext4_handle_dirty_dirent_node(handle,
-							       new_dir, new_bh);
+							       new.dir, new.bh);
 			if (unlikely(retval)) {
-				ext4_std_error(new_dir->i_sb, retval);
+				ext4_std_error(new.dir->i_sb, retval);
 				goto end_rename;
 			}
 		}
-		brelse(new_bh);
-		new_bh = NULL;
+		brelse(new.bh);
+		new.bh = NULL;
 	}
 
 	/*
 	 * Like most other Unix systems, set the ctime for inodes on a
 	 * rename.
 	 */
-	old_inode->i_ctime = ext4_current_time(old_inode);
-	ext4_mark_inode_dirty(handle, old_inode);
+	old.inode->i_ctime = ext4_current_time(old.inode);
+	ext4_mark_inode_dirty(handle, old.inode);
 
 	/*
 	 * ok, that's it
 	 */
-	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
-	    old_de->name_len != old_dentry->d_name.len ||
-	    strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
-	    (retval = ext4_delete_entry(handle, old_dir,
-					old_de, old_bh)) == -ENOENT) {
-		/* old_de could have moved from under us during htree split, so
+	if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
+	    old.de->name_len != old.dentry->d_name.len ||
+	    strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
+	    (retval = ext4_delete_entry(handle, old.dir,
+					old.de, old.bh)) == -ENOENT) {
+		/* old.de could have moved from under us during htree split, so
 		 * make sure that we are deleting the right entry.  We might
 		 * also be pointing to a stale entry in the unused part of
-		 * old_bh so just checking inum and the name isn't enough. */
+		 * old.bh so just checking inum and the name isn't enough. */
 		struct buffer_head *old_bh2;
 		struct ext4_dir_entry_2 *old_de2;
 
-		old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name,
+		old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
 					  &old_de2, NULL);
 		if (old_bh2) {
-			retval = ext4_delete_entry(handle, old_dir,
+			retval = ext4_delete_entry(handle, old.dir,
 						   old_de2, old_bh2);
 			brelse(old_bh2);
 		}
 	}
 	if (retval) {
-		ext4_warning(old_dir->i_sb,
+		ext4_warning(old.dir->i_sb,
 				"Deleting old file (%lu), %d, error=%d",
-				old_dir->i_ino, old_dir->i_nlink, retval);
+				old.dir->i_ino, old.dir->i_nlink, retval);
 	}
 
-	if (new_inode) {
-		ext4_dec_count(handle, new_inode);
-		new_inode->i_ctime = ext4_current_time(new_inode);
+	if (new.inode) {
+		ext4_dec_count(handle, new.inode);
+		new.inode->i_ctime = ext4_current_time(new.inode);
 	}
-	old_dir->i_ctime = old_dir->i_mtime = ext4_current_time(old_dir);
-	ext4_update_dx_flag(old_dir);
-	if (dir_bh) {
-		parent_de->inode = cpu_to_le32(new_dir->i_ino);
-		BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
-		if (!inlined) {
-			if (is_dx(old_inode)) {
+	old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
+	ext4_update_dx_flag(old.dir);
+	if (old.dir_bh) {
+		old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
+		BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
+		if (!old.parent_inlined) {
+			if (is_dx(old.inode)) {
 				retval = ext4_handle_dirty_dx_node(handle,
-								   old_inode,
-								   dir_bh);
+								   old.inode,
+								   old.dir_bh);
 			} else {
 				retval = ext4_handle_dirty_dirent_node(handle,
-							old_inode, dir_bh);
+							old.inode, old.dir_bh);
 			}
 		} else {
-			retval = ext4_mark_inode_dirty(handle, old_inode);
+			retval = ext4_mark_inode_dirty(handle, old.inode);
 		}
 		if (retval) {
-			ext4_std_error(old_dir->i_sb, retval);
+			ext4_std_error(old.dir->i_sb, retval);
 			goto end_rename;
 		}
-		ext4_dec_count(handle, old_dir);
-		if (new_inode) {
+		ext4_dec_count(handle, old.dir);
+		if (new.inode) {
 			/* checked empty_dir above, can't have another parent,
 			 * ext4_dec_count() won't work for many-linked dirs */
-			clear_nlink(new_inode);
+			clear_nlink(new.inode);
 		} else {
-			ext4_inc_count(handle, new_dir);
-			ext4_update_dx_flag(new_dir);
-			ext4_mark_inode_dirty(handle, new_dir);
+			ext4_inc_count(handle, new.dir);
+			ext4_update_dx_flag(new.dir);
+			ext4_mark_inode_dirty(handle, new.dir);
 		}
 	}
-	ext4_mark_inode_dirty(handle, old_dir);
-	if (new_inode) {
-		ext4_mark_inode_dirty(handle, new_inode);
-		if (!new_inode->i_nlink)
-			ext4_orphan_add(handle, new_inode);
+	ext4_mark_inode_dirty(handle, old.dir);
+	if (new.inode) {
+		ext4_mark_inode_dirty(handle, new.inode);
+		if (!new.inode->i_nlink)
+			ext4_orphan_add(handle, new.inode);
 	}
 	retval = 0;
 
 end_rename:
-	brelse(dir_bh);
-	brelse(old_bh);
-	brelse(new_bh);
+	brelse(old.dir_bh);
+	brelse(old.bh);
+	brelse(new.bh);
 	if (handle)
 		ext4_journal_stop(handle);
 	return retval;
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] ext4: rename: move EMLINK check up
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
                   ` (3 preceding siblings ...)
  2013-10-01 16:00 ` [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-02 12:05   ` Jan Kara
  2013-10-01 16:00 ` [PATCH 6/7] ext4: rename: split out helper functions Miklos Szeredi
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Move checking i_nlink from after ext4_get_first_dir_block() to before.  The
check doesn't rely on the result of that function and the function only
fails on fs corruption, so the order shouldn't matter.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ext4/namei.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 01bd80e..1348251 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3082,6 +3082,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 			retval = -ENOTEMPTY;
 			if (!empty_dir(new.inode))
 				goto end_rename;
+		} else {
+			retval = -EMLINK;
+			if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
+				goto end_rename;
 		}
 		retval = -EIO;
 		old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
@@ -3091,10 +3095,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 			goto end_rename;
 		if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
 			goto end_rename;
-		retval = -EMLINK;
-		if (!new.inode && new.dir != old.dir &&
-		    EXT4_DIR_LINK_MAX(new.dir))
-			goto end_rename;
 		BUFFER_TRACE(old.dir_bh, "get_write_access");
 		retval = ext4_journal_get_write_access(handle, old.dir_bh);
 		if (retval)
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] ext4: rename: split out helper functions
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
                   ` (4 preceding siblings ...)
  2013-10-01 16:00 ` [PATCH 5/7] ext4: rename: move EMLINK check up Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-02 12:19   ` Jan Kara
  2013-10-01 16:00 ` [PATCH 7/7] ext4: add cross rename support Miklos Szeredi
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Cross rename (exchange source and dest) will need to call some of these
helpers for both source and dest, while overwriting rename currently only
calls them for one or the other.  This also makes the code easier to
follow.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ext4/namei.c | 199 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 126 insertions(+), 73 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1348251..fb0f1db 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3014,6 +3014,125 @@ struct ext4_renament {
 	int parent_inlined;
 };
 
+static int ext4_rename_dir_prepare(handle_t *handle, struct ext4_renament *ent)
+{
+	int retval;
+
+	ent->dir_bh = ext4_get_first_dir_block(handle, ent->inode,
+					      &retval, &ent->parent_de,
+					      &ent->parent_inlined);
+	if (!ent->dir_bh)
+		return retval;
+	if (le32_to_cpu(ent->parent_de->inode) != ent->dir->i_ino)
+		return -EIO;
+	BUFFER_TRACE(ent->dir_bh, "get_write_access");
+	return ext4_journal_get_write_access(handle, ent->dir_bh);
+}
+
+static int ext4_rename_dir_finish(handle_t *handle, struct ext4_renament *ent,
+				  unsigned dir_ino)
+{
+	int retval;
+
+	ent->parent_de->inode = cpu_to_le32(dir_ino);
+	BUFFER_TRACE(ent->dir_bh, "call ext4_handle_dirty_metadata");
+	if (!ent->parent_inlined) {
+		if (is_dx(ent->inode)) {
+			retval = ext4_handle_dirty_dx_node(handle,
+							   ent->inode,
+							   ent->dir_bh);
+		} else {
+			retval = ext4_handle_dirty_dirent_node(handle,
+							       ent->inode,
+							       ent->dir_bh);
+		}
+	} else {
+		retval = ext4_mark_inode_dirty(handle, ent->inode);
+	}
+	if (retval) {
+		ext4_std_error(ent->dir->i_sb, retval);
+		return retval;
+	}
+	return 0;
+}
+
+static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
+		       unsigned ino, unsigned file_type)
+{
+	int retval;
+
+	BUFFER_TRACE(ent->bh, "get write access");
+	retval = ext4_journal_get_write_access(handle, ent->bh);
+	if (retval)
+		return retval;
+	ent->de->inode = cpu_to_le32(ino);
+	if (EXT4_HAS_INCOMPAT_FEATURE(ent->dir->i_sb,
+				      EXT4_FEATURE_INCOMPAT_FILETYPE))
+		ent->de->file_type = file_type;
+	ent->dir->i_version++;
+	ent->dir->i_ctime = ent->dir->i_mtime =
+		ext4_current_time(ent->dir);
+	ext4_mark_inode_dirty(handle, ent->dir);
+	BUFFER_TRACE(ent->bh, "call ext4_handle_dirty_metadata");
+	if (!ent->inlined) {
+		retval = ext4_handle_dirty_dirent_node(handle,
+						       ent->dir, ent->bh);
+		if (unlikely(retval)) {
+			ext4_std_error(ent->dir->i_sb, retval);
+			return retval;
+		}
+	}
+	brelse(ent->bh);
+	ent->bh = NULL;
+
+	return 0;
+}
+
+static int ext4_find_delete_entry(handle_t *handle, struct inode *dir,
+				  const struct qstr *d_name)
+{
+	int retval = -ENOENT;
+	struct buffer_head *bh;
+	struct ext4_dir_entry_2 *de;
+
+	bh = ext4_find_entry(dir, d_name, &de, NULL);
+	if (bh) {
+		retval = ext4_delete_entry(handle, dir, de, bh);
+		brelse(bh);
+	}
+	return retval;
+}
+
+static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
+{
+	int retval;
+	/*
+	 * ent->de could have moved from under us during htree split, so make
+	 * sure that we are deleting the right entry.  We might also be pointing
+	 * to a stale entry in the unused part of ent->bh so just checking inum
+	 * and the name isn't enough.
+	 */
+	if (le32_to_cpu(ent->de->inode) != ent->inode->i_ino ||
+	    ent->de->name_len != ent->dentry->d_name.len ||
+	    strncmp(ent->de->name, ent->dentry->d_name.name,
+		    ent->de->name_len)) {
+		retval = ext4_find_delete_entry(handle, ent->dir,
+						&ent->dentry->d_name);
+	} else {
+		retval = ext4_delete_entry(handle, ent->dir, ent->de, ent->bh);
+		if (retval == -ENOENT) {
+			retval = ext4_find_delete_entry(handle, ent->dir,
+							&ent->dentry->d_name);
+		}
+	}
+
+	if (retval) {
+		ext4_warning(ent->dir->i_sb,
+				"Deleting old file (%lu), %d, error=%d",
+				ent->dir->i_ino, ent->dir->i_nlink, retval);
+	}
+}
+
 /*
  * Anybody can rename anything with this: the permission checks are left to the
  * higher-level routines.
@@ -3087,16 +3206,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 			if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
 				goto end_rename;
 		}
-		retval = -EIO;
-		old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
-						  &retval, &old.parent_de,
-						  &old.parent_inlined);
-		if (!old.dir_bh)
-			goto end_rename;
-		if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
-			goto end_rename;
-		BUFFER_TRACE(old.dir_bh, "get_write_access");
-		retval = ext4_journal_get_write_access(handle, old.dir_bh);
+		retval = ext4_rename_dir_prepare(handle, &old);
 		if (retval)
 			goto end_rename;
 	}
@@ -3105,29 +3215,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		if (retval)
 			goto end_rename;
 	} else {
-		BUFFER_TRACE(new.bh, "get write access");
-		retval = ext4_journal_get_write_access(handle, new.bh);
+		retval = ext4_setent(handle, &new,
+				     old.inode->i_ino, old.de->file_type);
 		if (retval)
 			goto end_rename;
-		new.de->inode = cpu_to_le32(old.inode->i_ino);
-		if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
-					      EXT4_FEATURE_INCOMPAT_FILETYPE))
-			new.de->file_type = old.de->file_type;
-		new.dir->i_version++;
-		new.dir->i_ctime = new.dir->i_mtime =
-					ext4_current_time(new.dir);
-		ext4_mark_inode_dirty(handle, new.dir);
-		BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
-		if (!new.inlined) {
-			retval = ext4_handle_dirty_dirent_node(handle,
-							       new.dir, new.bh);
-			if (unlikely(retval)) {
-				ext4_std_error(new.dir->i_sb, retval);
-				goto end_rename;
-			}
-		}
-		brelse(new.bh);
-		new.bh = NULL;
 	}
 
 	/*
@@ -3140,31 +3231,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	/*
 	 * ok, that's it
 	 */
-	if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
-	    old.de->name_len != old.dentry->d_name.len ||
-	    strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
-	    (retval = ext4_delete_entry(handle, old.dir,
-					old.de, old.bh)) == -ENOENT) {
-		/* old.de could have moved from under us during htree split, so
-		 * make sure that we are deleting the right entry.  We might
-		 * also be pointing to a stale entry in the unused part of
-		 * old.bh so just checking inum and the name isn't enough. */
-		struct buffer_head *old_bh2;
-		struct ext4_dir_entry_2 *old_de2;
-
-		old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
-					  &old_de2, NULL);
-		if (old_bh2) {
-			retval = ext4_delete_entry(handle, old.dir,
-						   old_de2, old_bh2);
-			brelse(old_bh2);
-		}
-	}
-	if (retval) {
-		ext4_warning(old.dir->i_sb,
-				"Deleting old file (%lu), %d, error=%d",
-				old.dir->i_ino, old.dir->i_nlink, retval);
-	}
+	ext4_rename_delete(handle, &old);
 
 	if (new.inode) {
 		ext4_dec_count(handle, new.inode);
@@ -3173,24 +3240,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
 	ext4_update_dx_flag(old.dir);
 	if (old.dir_bh) {
-		old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
-		BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
-		if (!old.parent_inlined) {
-			if (is_dx(old.inode)) {
-				retval = ext4_handle_dirty_dx_node(handle,
-								   old.inode,
-								   old.dir_bh);
-			} else {
-				retval = ext4_handle_dirty_dirent_node(handle,
-							old.inode, old.dir_bh);
-			}
-		} else {
-			retval = ext4_mark_inode_dirty(handle, old.inode);
-		}
-		if (retval) {
-			ext4_std_error(old.dir->i_sb, retval);
+		retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
+		if (retval)
 			goto end_rename;
-		}
+
 		ext4_dec_count(handle, old.dir);
 		if (new.inode) {
 			/* checked empty_dir above, can't have another parent,
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] ext4: add cross rename support
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
                   ` (5 preceding siblings ...)
  2013-10-01 16:00 ` [PATCH 6/7] ext4: rename: split out helper functions Miklos Szeredi
@ 2013-10-01 16:00 ` Miklos Szeredi
  2013-10-03  1:58 ` [RFC PATCH 0/7] cross rename H. Peter Anvin
  2013-10-04 23:58 ` Andy Lutomirski
  8 siblings, 0 replies; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-01 16:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel
  Cc: viro, torvalds, hch, akpm, dhowells, zab, jack, tytso, mszeredi

From: Miklos Szeredi <mszeredi@suse.cz>

Implement RENAME_EXCHANGE flag in renameat2 syscall.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/ext4/namei.c | 103 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 73 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index fb0f1db..6d87a09 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3141,8 +3141,9 @@ static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
  * while new_{dentry,inode) refers to the destination dentry/inode
  * This comes from rename(const char *oldpath, const char *newpath)
  */
-static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
-		       struct inode *new_dir, struct dentry *new_dentry)
+static int ext4_rename2(struct inode *old_dir, struct dentry *old_dentry,
+			struct inode *new_dir, struct dentry *new_dentry,
+			unsigned int flags)
 {
 	handle_t *handle = NULL;
 	struct ext4_renament old = {
@@ -3154,16 +3155,18 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		.dentry = new_dentry,
 	};
 	int retval;
+	bool overwrite = !(flags & RENAME_EXCHANGE);
 
 	dquot_initialize(old.dir);
 	dquot_initialize(new.dir);
 
 	/* Initialize quotas before so that eventual writes go
 	 * in separate transaction */
-	if (new.dentry->d_inode)
+	if (overwrite && new.dentry->d_inode)
 		dquot_initialize(new.dentry->d_inode);
 
-	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
+	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name,
+				 &old.de, &old.inlined);
 	/*
 	 *  Check for inode number is _not_ due to possible IO errors.
 	 *  We might rmdir the source, keep it as pwd of some process
@@ -3178,18 +3181,22 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	new.inode = new.dentry->d_inode;
 	new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
 				 &new.de, &new.inlined);
-	if (new.bh) {
-		if (!new.inode) {
-			brelse(new.bh);
-			new.bh = NULL;
+	if (overwrite) {
+		if (new.bh) {
+			if (!new.inode) {
+				brelse(new.bh);
+				new.bh = NULL;
+			}
 		}
+		if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
+			ext4_alloc_da_blocks(old.inode);
+	} else if (!new.bh || le32_to_cpu(new.de->inode) != new.inode->i_ino) {
+		goto end_rename;
 	}
-	if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
-		ext4_alloc_da_blocks(old.inode);
 
 	handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
 		(2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
-		 EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
+		 2 * EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
@@ -3197,11 +3204,12 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		ext4_handle_sync(handle);
 
 	if (S_ISDIR(old.inode->i_mode)) {
-		if (new.inode) {
+		if (overwrite && new.inode) {
 			retval = -ENOTEMPTY;
 			if (!empty_dir(new.inode))
 				goto end_rename;
-		} else {
+		}
+		if (!new.inode || !S_ISDIR(new.inode->i_mode)) {
 			retval = -EMLINK;
 			if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
 				goto end_rename;
@@ -3210,15 +3218,34 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 		if (retval)
 			goto end_rename;
 	}
+	if (!overwrite && S_ISDIR(new.inode->i_mode)) {
+		if (!S_ISDIR(old.inode->i_mode)) {
+			retval = -EMLINK;
+			if (new.dir != old.dir && EXT4_DIR_LINK_MAX(old.dir))
+				goto end_rename;
+		}
+		retval = ext4_rename_dir_prepare(handle, &new);
+		if (retval)
+			goto end_rename;
+	}
+
 	if (!new.bh) {
 		retval = ext4_add_entry(handle, new.dentry, old.inode);
 		if (retval)
 			goto end_rename;
 	} else {
+		u8 new_file_type = new.de->file_type;
 		retval = ext4_setent(handle, &new,
 				     old.inode->i_ino, old.de->file_type);
 		if (retval)
 			goto end_rename;
+
+		if (!overwrite) {
+			retval = ext4_setent(handle, &old,
+					     new.inode->i_ino, new_file_type);
+			if (retval)
+				goto end_rename;
+		}
 	}
 
 	/*
@@ -3228,35 +3255,51 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
 	old.inode->i_ctime = ext4_current_time(old.inode);
 	ext4_mark_inode_dirty(handle, old.inode);
 
-	/*
-	 * ok, that's it
-	 */
-	ext4_rename_delete(handle, &old);
+	if (overwrite) {
+		/*
+		 * ok, that's it
+		 */
+		ext4_rename_delete(handle, &old);
 
-	if (new.inode) {
-		ext4_dec_count(handle, new.inode);
-		new.inode->i_ctime = ext4_current_time(new.inode);
+		old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
+		ext4_update_dx_flag(old.dir);
 	}
-	old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
-	ext4_update_dx_flag(old.dir);
+	/* S_ISDIR(old.inode->i_mode */
 	if (old.dir_bh) {
 		retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
 		if (retval)
 			goto end_rename;
 
-		ext4_dec_count(handle, old.dir);
-		if (new.inode) {
-			/* checked empty_dir above, can't have another parent,
-			 * ext4_dec_count() won't work for many-linked dirs */
-			clear_nlink(new.inode);
-		} else {
+		if (overwrite || !S_ISDIR(new.inode->i_mode))
+			ext4_dec_count(handle, old.dir);
+
+		if (!new.inode || !S_ISDIR(new.inode->i_mode)) {
 			ext4_inc_count(handle, new.dir);
 			ext4_update_dx_flag(new.dir);
 			ext4_mark_inode_dirty(handle, new.dir);
 		}
 	}
+	/* !overwrite && S_ISDIR(new.inode->i_mode */
+	if (new.dir_bh) {
+		retval = ext4_rename_dir_finish(handle, &new, old.dir->i_ino);
+		if (retval)
+			goto end_rename;
+
+		if (!S_ISDIR(old.inode->i_mode)) {
+			ext4_dec_count(handle, new.dir);
+			ext4_inc_count(handle, old.dir);
+			ext4_mark_inode_dirty(handle, new.dir);
+		}
+	}
 	ext4_mark_inode_dirty(handle, old.dir);
-	if (new.inode) {
+	if (overwrite && new.inode) {
+		ext4_dec_count(handle, new.inode);
+		new.inode->i_ctime = ext4_current_time(new.inode);
+		if (S_ISDIR(old.inode->i_mode)) {
+			/* checked empty_dir above, can't have another parent,
+			 * ext4_dec_count() won't work for many-linked dirs */
+			clear_nlink(new.inode);
+		}
 		ext4_mark_inode_dirty(handle, new.inode);
 		if (!new.inode->i_nlink)
 			ext4_orphan_add(handle, new.inode);
@@ -3285,7 +3328,7 @@ const struct inode_operations ext4_dir_inode_operations = {
 	.rmdir		= ext4_rmdir,
 	.mknod		= ext4_mknod,
 	.tmpfile	= ext4_tmpfile,
-	.rename		= ext4_rename,
+	.rename2	= ext4_rename2,
 	.setattr	= ext4_setattr,
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars
  2013-10-01 16:00 ` [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars Miklos Szeredi
@ 2013-10-02 12:01   ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2013-10-02 12:01 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, viro, torvalds, hch, akpm, dhowells,
	zab, jack, tytso, mszeredi

On Tue 01-10-13 18:00:36, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Need to split up ext4_rename() into helpers but there are two many local
> variables involved, so create a new structure.  This also, apparently,
> makes the generated code size slightly smaller.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  fs/ext4/namei.c | 207 ++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 110 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 1bec5a5..01bd80e 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3002,6 +3002,18 @@ static struct buffer_head *ext4_get_first_dir_block(handle_t *handle,
>  	return ext4_get_first_inline_block(inode, parent_de, retval);
>  }
>  
> +struct ext4_renament {
> +	struct inode *dir;
> +	struct dentry *dentry;
> +	struct inode *inode;
> +	struct buffer_head *bh;
> +	struct buffer_head *dir_bh;
> +	struct ext4_dir_entry_2 *de;
> +	struct ext4_dir_entry_2 *parent_de;
> +	int inlined;
> +	int parent_inlined;
> +};
  The patch looks good to me. Only the inline handling looks a bit strange.
The structure have inlined and parent_inlined however you only use
new.inlined and old.parent_inlined. Now new.inlined actually means whether
new.dir is inlined (i.e. the destination directory of rename) and
old.parent_inlined is set when the source for rename is an inlined
directory. So at least the naming looks the other way around to me.

Also as the code currently is including 'inlined' and 'parent_inlined' in
ext4_renament looks like overdoing it a bit. But I haven't checked the
other two patches yet.

								Honza
>  /*
>   * Anybody can rename anything with this: the permission checks are left to the
>   * higher-level routines.
> @@ -3014,193 +3026,194 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  		       struct inode *new_dir, struct dentry *new_dentry)
>  {
>  	handle_t *handle = NULL;
> -	struct inode *old_inode, *new_inode;
> -	struct buffer_head *old_bh, *new_bh, *dir_bh;
> -	struct ext4_dir_entry_2 *old_de, *new_de;
> +	struct ext4_renament old = {
> +		.dir = old_dir,
> +		.dentry = old_dentry,
> +	};
> +	struct ext4_renament new = {
> +		.dir = new_dir,
> +		.dentry = new_dentry,
> +	};
>  	int retval;
> -	int inlined = 0, new_inlined = 0;
> -	struct ext4_dir_entry_2 *parent_de;
> -
> -	dquot_initialize(old_dir);
> -	dquot_initialize(new_dir);
>  
> -	old_bh = new_bh = dir_bh = NULL;
> +	dquot_initialize(old.dir);
> +	dquot_initialize(new.dir);
>  
>  	/* Initialize quotas before so that eventual writes go
>  	 * in separate transaction */
> -	if (new_dentry->d_inode)
> -		dquot_initialize(new_dentry->d_inode);
> +	if (new.dentry->d_inode)
> +		dquot_initialize(new.dentry->d_inode);
>  
> -	old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de, NULL);
> +	old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de, NULL);
>  	/*
>  	 *  Check for inode number is _not_ due to possible IO errors.
>  	 *  We might rmdir the source, keep it as pwd of some process
>  	 *  and merrily kill the link to whatever was created under the
>  	 *  same name. Goodbye sticky bit ;-<
>  	 */
> -	old_inode = old_dentry->d_inode;
> +	old.inode = old.dentry->d_inode;
>  	retval = -ENOENT;
> -	if (!old_bh || le32_to_cpu(old_de->inode) != old_inode->i_ino)
> +	if (!old.bh || le32_to_cpu(old.de->inode) != old.inode->i_ino)
>  		goto end_rename;
>  
> -	new_inode = new_dentry->d_inode;
> -	new_bh = ext4_find_entry(new_dir, &new_dentry->d_name,
> -				 &new_de, &new_inlined);
> -	if (new_bh) {
> -		if (!new_inode) {
> -			brelse(new_bh);
> -			new_bh = NULL;
> +	new.inode = new.dentry->d_inode;
> +	new.bh = ext4_find_entry(new.dir, &new.dentry->d_name,
> +				 &new.de, &new.inlined);
> +	if (new.bh) {
> +		if (!new.inode) {
> +			brelse(new.bh);
> +			new.bh = NULL;
>  		}
>  	}
> -	if (new_inode && !test_opt(new_dir->i_sb, NO_AUTO_DA_ALLOC))
> -		ext4_alloc_da_blocks(old_inode);
> +	if (new.inode && !test_opt(new.dir->i_sb, NO_AUTO_DA_ALLOC))
> +		ext4_alloc_da_blocks(old.inode);
>  
> -	handle = ext4_journal_start(old_dir, EXT4_HT_DIR,
> -		(2 * EXT4_DATA_TRANS_BLOCKS(old_dir->i_sb) +
> +	handle = ext4_journal_start(old.dir, EXT4_HT_DIR,
> +		(2 * EXT4_DATA_TRANS_BLOCKS(old.dir->i_sb) +
>  		 EXT4_INDEX_EXTRA_TRANS_BLOCKS + 2));
>  	if (IS_ERR(handle))
>  		return PTR_ERR(handle);
>  
> -	if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
> +	if (IS_DIRSYNC(old.dir) || IS_DIRSYNC(new.dir))
>  		ext4_handle_sync(handle);
>  
> -	if (S_ISDIR(old_inode->i_mode)) {
> -		if (new_inode) {
> +	if (S_ISDIR(old.inode->i_mode)) {
> +		if (new.inode) {
>  			retval = -ENOTEMPTY;
> -			if (!empty_dir(new_inode))
> +			if (!empty_dir(new.inode))
>  				goto end_rename;
>  		}
>  		retval = -EIO;
> -		dir_bh = ext4_get_first_dir_block(handle, old_inode,
> -						  &retval, &parent_de,
> -						  &inlined);
> -		if (!dir_bh)
> +		old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
> +						  &retval, &old.parent_de,
> +						  &old.parent_inlined);
> +		if (!old.dir_bh)
>  			goto end_rename;
> -		if (le32_to_cpu(parent_de->inode) != old_dir->i_ino)
> +		if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
>  			goto end_rename;
>  		retval = -EMLINK;
> -		if (!new_inode && new_dir != old_dir &&
> -		    EXT4_DIR_LINK_MAX(new_dir))
> +		if (!new.inode && new.dir != old.dir &&
> +		    EXT4_DIR_LINK_MAX(new.dir))
>  			goto end_rename;
> -		BUFFER_TRACE(dir_bh, "get_write_access");
> -		retval = ext4_journal_get_write_access(handle, dir_bh);
> +		BUFFER_TRACE(old.dir_bh, "get_write_access");
> +		retval = ext4_journal_get_write_access(handle, old.dir_bh);
>  		if (retval)
>  			goto end_rename;
>  	}
> -	if (!new_bh) {
> -		retval = ext4_add_entry(handle, new_dentry, old_inode);
> +	if (!new.bh) {
> +		retval = ext4_add_entry(handle, new.dentry, old.inode);
>  		if (retval)
>  			goto end_rename;
>  	} else {
> -		BUFFER_TRACE(new_bh, "get write access");
> -		retval = ext4_journal_get_write_access(handle, new_bh);
> +		BUFFER_TRACE(new.bh, "get write access");
> +		retval = ext4_journal_get_write_access(handle, new.bh);
>  		if (retval)
>  			goto end_rename;
> -		new_de->inode = cpu_to_le32(old_inode->i_ino);
> -		if (EXT4_HAS_INCOMPAT_FEATURE(new_dir->i_sb,
> +		new.de->inode = cpu_to_le32(old.inode->i_ino);
> +		if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
>  					      EXT4_FEATURE_INCOMPAT_FILETYPE))
> -			new_de->file_type = old_de->file_type;
> -		new_dir->i_version++;
> -		new_dir->i_ctime = new_dir->i_mtime =
> -					ext4_current_time(new_dir);
> -		ext4_mark_inode_dirty(handle, new_dir);
> -		BUFFER_TRACE(new_bh, "call ext4_handle_dirty_metadata");
> -		if (!new_inlined) {
> +			new.de->file_type = old.de->file_type;
> +		new.dir->i_version++;
> +		new.dir->i_ctime = new.dir->i_mtime =
> +					ext4_current_time(new.dir);
> +		ext4_mark_inode_dirty(handle, new.dir);
> +		BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
> +		if (!new.inlined) {
>  			retval = ext4_handle_dirty_dirent_node(handle,
> -							       new_dir, new_bh);
> +							       new.dir, new.bh);
>  			if (unlikely(retval)) {
> -				ext4_std_error(new_dir->i_sb, retval);
> +				ext4_std_error(new.dir->i_sb, retval);
>  				goto end_rename;
>  			}
>  		}
> -		brelse(new_bh);
> -		new_bh = NULL;
> +		brelse(new.bh);
> +		new.bh = NULL;
>  	}
>  
>  	/*
>  	 * Like most other Unix systems, set the ctime for inodes on a
>  	 * rename.
>  	 */
> -	old_inode->i_ctime = ext4_current_time(old_inode);
> -	ext4_mark_inode_dirty(handle, old_inode);
> +	old.inode->i_ctime = ext4_current_time(old.inode);
> +	ext4_mark_inode_dirty(handle, old.inode);
>  
>  	/*
>  	 * ok, that's it
>  	 */
> -	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
> -	    old_de->name_len != old_dentry->d_name.len ||
> -	    strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
> -	    (retval = ext4_delete_entry(handle, old_dir,
> -					old_de, old_bh)) == -ENOENT) {
> -		/* old_de could have moved from under us during htree split, so
> +	if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
> +	    old.de->name_len != old.dentry->d_name.len ||
> +	    strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
> +	    (retval = ext4_delete_entry(handle, old.dir,
> +					old.de, old.bh)) == -ENOENT) {
> +		/* old.de could have moved from under us during htree split, so
>  		 * make sure that we are deleting the right entry.  We might
>  		 * also be pointing to a stale entry in the unused part of
> -		 * old_bh so just checking inum and the name isn't enough. */
> +		 * old.bh so just checking inum and the name isn't enough. */
>  		struct buffer_head *old_bh2;
>  		struct ext4_dir_entry_2 *old_de2;
>  
> -		old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name,
> +		old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
>  					  &old_de2, NULL);
>  		if (old_bh2) {
> -			retval = ext4_delete_entry(handle, old_dir,
> +			retval = ext4_delete_entry(handle, old.dir,
>  						   old_de2, old_bh2);
>  			brelse(old_bh2);
>  		}
>  	}
>  	if (retval) {
> -		ext4_warning(old_dir->i_sb,
> +		ext4_warning(old.dir->i_sb,
>  				"Deleting old file (%lu), %d, error=%d",
> -				old_dir->i_ino, old_dir->i_nlink, retval);
> +				old.dir->i_ino, old.dir->i_nlink, retval);
>  	}
>  
> -	if (new_inode) {
> -		ext4_dec_count(handle, new_inode);
> -		new_inode->i_ctime = ext4_current_time(new_inode);
> +	if (new.inode) {
> +		ext4_dec_count(handle, new.inode);
> +		new.inode->i_ctime = ext4_current_time(new.inode);
>  	}
> -	old_dir->i_ctime = old_dir->i_mtime = ext4_current_time(old_dir);
> -	ext4_update_dx_flag(old_dir);
> -	if (dir_bh) {
> -		parent_de->inode = cpu_to_le32(new_dir->i_ino);
> -		BUFFER_TRACE(dir_bh, "call ext4_handle_dirty_metadata");
> -		if (!inlined) {
> -			if (is_dx(old_inode)) {
> +	old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
> +	ext4_update_dx_flag(old.dir);
> +	if (old.dir_bh) {
> +		old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
> +		BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
> +		if (!old.parent_inlined) {
> +			if (is_dx(old.inode)) {
>  				retval = ext4_handle_dirty_dx_node(handle,
> -								   old_inode,
> -								   dir_bh);
> +								   old.inode,
> +								   old.dir_bh);
>  			} else {
>  				retval = ext4_handle_dirty_dirent_node(handle,
> -							old_inode, dir_bh);
> +							old.inode, old.dir_bh);
>  			}
>  		} else {
> -			retval = ext4_mark_inode_dirty(handle, old_inode);
> +			retval = ext4_mark_inode_dirty(handle, old.inode);
>  		}
>  		if (retval) {
> -			ext4_std_error(old_dir->i_sb, retval);
> +			ext4_std_error(old.dir->i_sb, retval);
>  			goto end_rename;
>  		}
> -		ext4_dec_count(handle, old_dir);
> -		if (new_inode) {
> +		ext4_dec_count(handle, old.dir);
> +		if (new.inode) {
>  			/* checked empty_dir above, can't have another parent,
>  			 * ext4_dec_count() won't work for many-linked dirs */
> -			clear_nlink(new_inode);
> +			clear_nlink(new.inode);
>  		} else {
> -			ext4_inc_count(handle, new_dir);
> -			ext4_update_dx_flag(new_dir);
> -			ext4_mark_inode_dirty(handle, new_dir);
> +			ext4_inc_count(handle, new.dir);
> +			ext4_update_dx_flag(new.dir);
> +			ext4_mark_inode_dirty(handle, new.dir);
>  		}
>  	}
> -	ext4_mark_inode_dirty(handle, old_dir);
> -	if (new_inode) {
> -		ext4_mark_inode_dirty(handle, new_inode);
> -		if (!new_inode->i_nlink)
> -			ext4_orphan_add(handle, new_inode);
> +	ext4_mark_inode_dirty(handle, old.dir);
> +	if (new.inode) {
> +		ext4_mark_inode_dirty(handle, new.inode);
> +		if (!new.inode->i_nlink)
> +			ext4_orphan_add(handle, new.inode);
>  	}
>  	retval = 0;
>  
>  end_rename:
> -	brelse(dir_bh);
> -	brelse(old_bh);
> -	brelse(new_bh);
> +	brelse(old.dir_bh);
> +	brelse(old.bh);
> +	brelse(new.bh);
>  	if (handle)
>  		ext4_journal_stop(handle);
>  	return retval;
> -- 
> 1.8.1.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/7] ext4: rename: move EMLINK check up
  2013-10-01 16:00 ` [PATCH 5/7] ext4: rename: move EMLINK check up Miklos Szeredi
@ 2013-10-02 12:05   ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2013-10-02 12:05 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, viro, torvalds, hch, akpm, dhowells,
	zab, jack, tytso, mszeredi

On Tue 01-10-13 18:00:37, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Move checking i_nlink from after ext4_get_first_dir_block() to before.  The
> check doesn't rely on the result of that function and the function only
> fails on fs corruption, so the order shouldn't matter.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  This looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/namei.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 01bd80e..1348251 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3082,6 +3082,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  			retval = -ENOTEMPTY;
>  			if (!empty_dir(new.inode))
>  				goto end_rename;
> +		} else {
> +			retval = -EMLINK;
> +			if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
> +				goto end_rename;
>  		}
>  		retval = -EIO;
>  		old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
> @@ -3091,10 +3095,6 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  			goto end_rename;
>  		if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
>  			goto end_rename;
> -		retval = -EMLINK;
> -		if (!new.inode && new.dir != old.dir &&
> -		    EXT4_DIR_LINK_MAX(new.dir))
> -			goto end_rename;
>  		BUFFER_TRACE(old.dir_bh, "get_write_access");
>  		retval = ext4_journal_get_write_access(handle, old.dir_bh);
>  		if (retval)
> -- 
> 1.8.1.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] ext4: rename: split out helper functions
  2013-10-01 16:00 ` [PATCH 6/7] ext4: rename: split out helper functions Miklos Szeredi
@ 2013-10-02 12:19   ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2013-10-02 12:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, viro, torvalds, hch, akpm, dhowells,
	zab, jack, tytso, mszeredi

On Tue 01-10-13 18:00:38, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Cross rename (exchange source and dest) will need to call some of these
> helpers for both source and dest, while overwriting rename currently only
> calls them for one or the other.  This also makes the code easier to
> follow.
  The patch looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  fs/ext4/namei.c | 199 +++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 126 insertions(+), 73 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 1348251..fb0f1db 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3014,6 +3014,125 @@ struct ext4_renament {
>  	int parent_inlined;
>  };
>  
> +static int ext4_rename_dir_prepare(handle_t *handle, struct ext4_renament *ent)
> +{
> +	int retval;
> +
> +	ent->dir_bh = ext4_get_first_dir_block(handle, ent->inode,
> +					      &retval, &ent->parent_de,
> +					      &ent->parent_inlined);
> +	if (!ent->dir_bh)
> +		return retval;
> +	if (le32_to_cpu(ent->parent_de->inode) != ent->dir->i_ino)
> +		return -EIO;
> +	BUFFER_TRACE(ent->dir_bh, "get_write_access");
> +	return ext4_journal_get_write_access(handle, ent->dir_bh);
> +}
> +
> +static int ext4_rename_dir_finish(handle_t *handle, struct ext4_renament *ent,
> +				  unsigned dir_ino)
> +{
> +	int retval;
> +
> +	ent->parent_de->inode = cpu_to_le32(dir_ino);
> +	BUFFER_TRACE(ent->dir_bh, "call ext4_handle_dirty_metadata");
> +	if (!ent->parent_inlined) {
> +		if (is_dx(ent->inode)) {
> +			retval = ext4_handle_dirty_dx_node(handle,
> +							   ent->inode,
> +							   ent->dir_bh);
> +		} else {
> +			retval = ext4_handle_dirty_dirent_node(handle,
> +							       ent->inode,
> +							       ent->dir_bh);
> +		}
> +	} else {
> +		retval = ext4_mark_inode_dirty(handle, ent->inode);
> +	}
> +	if (retval) {
> +		ext4_std_error(ent->dir->i_sb, retval);
> +		return retval;
> +	}
> +	return 0;
> +}
> +
> +static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
> +		       unsigned ino, unsigned file_type)
> +{
> +	int retval;
> +
> +	BUFFER_TRACE(ent->bh, "get write access");
> +	retval = ext4_journal_get_write_access(handle, ent->bh);
> +	if (retval)
> +		return retval;
> +	ent->de->inode = cpu_to_le32(ino);
> +	if (EXT4_HAS_INCOMPAT_FEATURE(ent->dir->i_sb,
> +				      EXT4_FEATURE_INCOMPAT_FILETYPE))
> +		ent->de->file_type = file_type;
> +	ent->dir->i_version++;
> +	ent->dir->i_ctime = ent->dir->i_mtime =
> +		ext4_current_time(ent->dir);
> +	ext4_mark_inode_dirty(handle, ent->dir);
> +	BUFFER_TRACE(ent->bh, "call ext4_handle_dirty_metadata");
> +	if (!ent->inlined) {
> +		retval = ext4_handle_dirty_dirent_node(handle,
> +						       ent->dir, ent->bh);
> +		if (unlikely(retval)) {
> +			ext4_std_error(ent->dir->i_sb, retval);
> +			return retval;
> +		}
> +	}
> +	brelse(ent->bh);
> +	ent->bh = NULL;
> +
> +	return 0;
> +}
> +
> +static int ext4_find_delete_entry(handle_t *handle, struct inode *dir,
> +				  const struct qstr *d_name)
> +{
> +	int retval = -ENOENT;
> +	struct buffer_head *bh;
> +	struct ext4_dir_entry_2 *de;
> +
> +	bh = ext4_find_entry(dir, d_name, &de, NULL);
> +	if (bh) {
> +		retval = ext4_delete_entry(handle, dir, de, bh);
> +		brelse(bh);
> +	}
> +	return retval;
> +}
> +
> +static void ext4_rename_delete(handle_t *handle, struct ext4_renament *ent)
> +{
> +	int retval;
> +	/*
> +	 * ent->de could have moved from under us during htree split, so make
> +	 * sure that we are deleting the right entry.  We might also be pointing
> +	 * to a stale entry in the unused part of ent->bh so just checking inum
> +	 * and the name isn't enough.
> +	 */
> +	if (le32_to_cpu(ent->de->inode) != ent->inode->i_ino ||
> +	    ent->de->name_len != ent->dentry->d_name.len ||
> +	    strncmp(ent->de->name, ent->dentry->d_name.name,
> +		    ent->de->name_len)) {
> +		retval = ext4_find_delete_entry(handle, ent->dir,
> +						&ent->dentry->d_name);
> +	} else {
> +		retval = ext4_delete_entry(handle, ent->dir, ent->de, ent->bh);
> +		if (retval == -ENOENT) {
> +			retval = ext4_find_delete_entry(handle, ent->dir,
> +							&ent->dentry->d_name);
> +		}
> +	}
> +
> +	if (retval) {
> +		ext4_warning(ent->dir->i_sb,
> +				"Deleting old file (%lu), %d, error=%d",
> +				ent->dir->i_ino, ent->dir->i_nlink, retval);
> +	}
> +}
> +
>  /*
>   * Anybody can rename anything with this: the permission checks are left to the
>   * higher-level routines.
> @@ -3087,16 +3206,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  			if (new.dir != old.dir && EXT4_DIR_LINK_MAX(new.dir))
>  				goto end_rename;
>  		}
> -		retval = -EIO;
> -		old.dir_bh = ext4_get_first_dir_block(handle, old.inode,
> -						  &retval, &old.parent_de,
> -						  &old.parent_inlined);
> -		if (!old.dir_bh)
> -			goto end_rename;
> -		if (le32_to_cpu(old.parent_de->inode) != old.dir->i_ino)
> -			goto end_rename;
> -		BUFFER_TRACE(old.dir_bh, "get_write_access");
> -		retval = ext4_journal_get_write_access(handle, old.dir_bh);
> +		retval = ext4_rename_dir_prepare(handle, &old);
>  		if (retval)
>  			goto end_rename;
>  	}
> @@ -3105,29 +3215,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  		if (retval)
>  			goto end_rename;
>  	} else {
> -		BUFFER_TRACE(new.bh, "get write access");
> -		retval = ext4_journal_get_write_access(handle, new.bh);
> +		retval = ext4_setent(handle, &new,
> +				     old.inode->i_ino, old.de->file_type);
>  		if (retval)
>  			goto end_rename;
> -		new.de->inode = cpu_to_le32(old.inode->i_ino);
> -		if (EXT4_HAS_INCOMPAT_FEATURE(new.dir->i_sb,
> -					      EXT4_FEATURE_INCOMPAT_FILETYPE))
> -			new.de->file_type = old.de->file_type;
> -		new.dir->i_version++;
> -		new.dir->i_ctime = new.dir->i_mtime =
> -					ext4_current_time(new.dir);
> -		ext4_mark_inode_dirty(handle, new.dir);
> -		BUFFER_TRACE(new.bh, "call ext4_handle_dirty_metadata");
> -		if (!new.inlined) {
> -			retval = ext4_handle_dirty_dirent_node(handle,
> -							       new.dir, new.bh);
> -			if (unlikely(retval)) {
> -				ext4_std_error(new.dir->i_sb, retval);
> -				goto end_rename;
> -			}
> -		}
> -		brelse(new.bh);
> -		new.bh = NULL;
>  	}
>  
>  	/*
> @@ -3140,31 +3231,7 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	/*
>  	 * ok, that's it
>  	 */
> -	if (le32_to_cpu(old.de->inode) != old.inode->i_ino ||
> -	    old.de->name_len != old.dentry->d_name.len ||
> -	    strncmp(old.de->name, old.dentry->d_name.name, old.de->name_len) ||
> -	    (retval = ext4_delete_entry(handle, old.dir,
> -					old.de, old.bh)) == -ENOENT) {
> -		/* old.de could have moved from under us during htree split, so
> -		 * make sure that we are deleting the right entry.  We might
> -		 * also be pointing to a stale entry in the unused part of
> -		 * old.bh so just checking inum and the name isn't enough. */
> -		struct buffer_head *old_bh2;
> -		struct ext4_dir_entry_2 *old_de2;
> -
> -		old_bh2 = ext4_find_entry(old.dir, &old.dentry->d_name,
> -					  &old_de2, NULL);
> -		if (old_bh2) {
> -			retval = ext4_delete_entry(handle, old.dir,
> -						   old_de2, old_bh2);
> -			brelse(old_bh2);
> -		}
> -	}
> -	if (retval) {
> -		ext4_warning(old.dir->i_sb,
> -				"Deleting old file (%lu), %d, error=%d",
> -				old.dir->i_ino, old.dir->i_nlink, retval);
> -	}
> +	ext4_rename_delete(handle, &old);
>  
>  	if (new.inode) {
>  		ext4_dec_count(handle, new.inode);
> @@ -3173,24 +3240,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	old.dir->i_ctime = old.dir->i_mtime = ext4_current_time(old.dir);
>  	ext4_update_dx_flag(old.dir);
>  	if (old.dir_bh) {
> -		old.parent_de->inode = cpu_to_le32(new.dir->i_ino);
> -		BUFFER_TRACE(old.dir_bh, "call ext4_handle_dirty_metadata");
> -		if (!old.parent_inlined) {
> -			if (is_dx(old.inode)) {
> -				retval = ext4_handle_dirty_dx_node(handle,
> -								   old.inode,
> -								   old.dir_bh);
> -			} else {
> -				retval = ext4_handle_dirty_dirent_node(handle,
> -							old.inode, old.dir_bh);
> -			}
> -		} else {
> -			retval = ext4_mark_inode_dirty(handle, old.inode);
> -		}
> -		if (retval) {
> -			ext4_std_error(old.dir->i_sb, retval);
> +		retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
> +		if (retval)
>  			goto end_rename;
> -		}
> +
>  		ext4_dec_count(handle, old.dir);
>  		if (new.inode) {
>  			/* checked empty_dir above, can't have another parent,
> -- 
> 1.8.1.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] vfs: add renameat2 syscall and cross-rename
  2013-10-01 16:00 ` [PATCH 3/7] vfs: add renameat2 syscall and cross-rename Miklos Szeredi
@ 2013-10-02 12:26   ` Jan Kara
  2013-10-02 14:59     ` Miklos Szeredi
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2013-10-02 12:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, viro, torvalds, hch, akpm, dhowells,
	zab, jack, tytso, mszeredi

On Tue 01-10-13 18:00:35, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> Add new renameat2 syscall, which is the same as renameat with an added
> flags argument.
> 
> If flags is zero then this is a plain overwriting rename.  If flags contain
> RENAME_EXCHANGE then exchange source and destination files.  There's no
> restriction on the type of the files; e.g. a directory can be exchanged
> with a symlink.
  It's not completely clear to me what should happen if RENAME_EXCHANGE is
set but destination doesn't exist. Return -ENOENT or just do standard
rename?

								Honza
> 
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  arch/x86/syscalls/syscall_64.tbl |   1 +
>  fs/dcache.c                      |  46 ++++++++++---
>  fs/namei.c                       | 139 ++++++++++++++++++++++++++++++---------
>  include/linux/dcache.h           |   1 +
>  include/linux/fs.h               |   2 +
>  include/uapi/linux/fs.h          |   2 +
>  6 files changed, 152 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 38ae65d..fafd734 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -320,6 +320,7 @@
>  311	64	process_vm_writev	sys_process_vm_writev
>  312	common	kcmp			sys_kcmp
>  313	common	finit_module		sys_finit_module
> +314	common	renameat2		sys_renameat2
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 4100030..1735bac 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2495,12 +2495,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
>  			dentry->d_name.name = dentry->d_iname;
>  		} else {
>  			/*
> -			 * Both are internal.  Just copy target to dentry
> +			 * Both are internal.
>  			 */
> -			memcpy(dentry->d_iname, target->d_name.name,
> -					target->d_name.len + 1);
> -			dentry->d_name.len = target->d_name.len;
> -			return;
> +			unsigned int i;
> +			BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
> +			for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
> +				swap(((long *) &dentry->d_iname)[i],
> +				     ((long *) &target->d_iname)[i]);
> +			}
>  		}
>  	}
>  	swap(dentry->d_name.len, target->d_name.len);
> @@ -2557,13 +2559,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
>   * __d_move - move a dentry
>   * @dentry: entry to move
>   * @target: new dentry
> + * @exchange: exchange the two dentries
>   *
>   * Update the dcache to reflect the move of a file name. Negative
>   * dcache entries should not be moved in this way. Caller must hold
>   * rename_lock, the i_mutex of the source and target directories,
>   * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
>   */
> -static void __d_move(struct dentry * dentry, struct dentry * target)
> +static void __d_move(struct dentry *dentry, struct dentry *target,
> +		     bool exchange)
>  {
>  	if (!dentry->d_inode)
>  		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
> @@ -2587,6 +2591,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>  
>  	/* Unhash the target: dput() will then get rid of it */
>  	__d_drop(target);
> +	if (exchange) {
> +		__d_rehash(target,
> +			   d_hash(dentry->d_parent, dentry->d_name.hash));
> +	}
>  
>  	list_del(&dentry->d_u.d_child);
>  	list_del(&target->d_u.d_child);
> @@ -2613,6 +2621,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>  	write_seqcount_end(&dentry->d_seq);
>  
>  	dentry_unlock_parents_for_move(dentry, target);
> +	if (exchange)
> +		fsnotify_d_move(target);
>  	spin_unlock(&target->d_lock);
>  	fsnotify_d_move(dentry);
>  	spin_unlock(&dentry->d_lock);
> @@ -2630,11 +2640,31 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>  void d_move(struct dentry *dentry, struct dentry *target)
>  {
>  	write_seqlock(&rename_lock);
> -	__d_move(dentry, target);
> +	__d_move(dentry, target, false);
>  	write_sequnlock(&rename_lock);
>  }
>  EXPORT_SYMBOL(d_move);
>  
> +/*
> + * d_exchange - exchange two dentries
> + * @dentry1: first dentry
> + * @dentry2: second dentry
> + */
> +void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
> +{
> +	write_seqlock(&rename_lock);
> +
> +	WARN_ON(!dentry1->d_inode);
> +	WARN_ON(!dentry2->d_inode);
> +	WARN_ON(IS_ROOT(dentry1));
> +	WARN_ON(IS_ROOT(dentry2));
> +
> +	__d_move(dentry1, dentry2, true);
> +
> +	write_sequnlock(&rename_lock);
> +}
> +
> +
>  /**
>   * d_ancestor - search for an ancestor
>   * @p1: ancestor dentry
> @@ -2682,7 +2712,7 @@ static struct dentry *__d_unalias(struct inode *inode,
>  	m2 = &alias->d_parent->d_inode->i_mutex;
>  out_unalias:
>  	if (likely(!d_mountpoint(alias))) {
> -		__d_move(alias, dentry);
> +		__d_move(alias, dentry, false);
>  		ret = alias;
>  	}
>  out_err:
> diff --git a/fs/namei.c b/fs/namei.c
> index 7ec6a12..55700b3 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3963,14 +3963,18 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
>   *	   ->i_mutex on parents, which works but leads to some truly excessive
>   *	   locking].
>   */
> -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> -	       struct inode *new_dir, struct dentry *new_dentry)
> +static int vfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
> +		       struct inode *new_dir, struct dentry *new_dentry,
> +		       unsigned int flags)
>  {
>  	int error;
>  	const unsigned char *old_name;
>  	struct inode *source = old_dentry->d_inode;
>  	struct inode *target = new_dentry->d_inode;
>  	bool is_dir = S_ISDIR(source->i_mode);
> +	bool new_is_dir = target ? S_ISDIR(target->i_mode) : false;
> +	bool overwrite = !(flags & RENAME_EXCHANGE);
> +	unsigned max_links = new_dir->i_sb->s_max_links;
>  
>  	if (source == target)
>  		return 0;
> @@ -3981,74 +3985,116 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>  
>  	if (!target)
>  		error = may_create(new_dir, new_dentry);
> -	else
> +	else if (overwrite)
>  		error = may_delete(new_dir, new_dentry, is_dir);
> +	else
> +		error = may_delete(new_dir, new_dentry, new_is_dir);
>  	if (error)
>  		return error;
>  
> -	if (!old_dir->i_op->rename)
> +	if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
>  		return -EPERM;
>  
> +	if (flags && !old_dir->i_op->rename2)
> +		return -EOPNOTSUPP;
> +
>  	/*
>  	 * If we are going to change the parent - check write permissions,
>  	 * we'll need to flip '..'.
>  	 */
> -	if (is_dir && new_dir != old_dir) {
> -		error = inode_permission(source, MAY_WRITE);
> -		if (error)
> -			return error;
> +	if (new_dir != old_dir) {
> +		if (is_dir) {
> +			error = inode_permission(source, MAY_WRITE);
> +			if (error)
> +				return error;
> +		}
> +		if (!overwrite && new_is_dir) {
> +			error = inode_permission(target, MAY_WRITE);
> +			if (error)
> +				return error;
> +		}
>  	}
>  
>  	error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
>  	if (error)
>  		return error;
>  
> +	if (!overwrite) {
> +		error = security_inode_rename(new_dir, new_dentry,
> +					      old_dir, old_dentry);
> +		if (error)
> +			return error;
> +	}
> +
>  	old_name = fsnotify_oldname_init(old_dentry->d_name.name);
>  	dget(new_dentry);
> -	if (target)
> +	if (overwrite && target)
>  		mutex_lock(&target->i_mutex);
>  
>  	error = -EBUSY;
>  	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
>  		goto out;
>  
> -	if (is_dir) {
> -		unsigned max_links = new_dir->i_sb->s_max_links;
> -
> +	if (max_links && new_dir != old_dir) {
>  		error = -EMLINK;
> -		if (max_links && !target && new_dir != old_dir &&
> -		    new_dir->i_nlink >= max_links)
> +		if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
>  			goto out;
> +		if (!overwrite && !is_dir && new_is_dir &&
> +		    old_dir->i_nlink > max_links)
> +			goto out;
> +	}
> +
> +	if (overwrite && is_dir && target)
> +		shrink_dcache_parent(new_dentry);
>  
> -		if (target)
> -			shrink_dcache_parent(new_dentry);
> +	if (old_dir->i_op->rename2) {
> +		error = old_dir->i_op->rename2(old_dir, old_dentry,
> +					       new_dir, new_dentry, flags);
> +	} else {
> +		error = old_dir->i_op->rename(old_dir, old_dentry,
> +					      new_dir, new_dentry);
>  	}
>  
> -	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>  	if (error)
>  		goto out;
>  
> -	if (target) {
> +	if (overwrite && target) {
>  		if (is_dir)
>  			target->i_flags |= S_DEAD;
>  		dont_mount(new_dentry);
>  	}
> -	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
> -		d_move(old_dentry, new_dentry);
> +	if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
> +		if (overwrite)
> +			d_move(old_dentry, new_dentry);
> +		else
> +			d_exchange(old_dentry, new_dentry);
> +	}
>  out:
> -	if (target)
> +	if (overwrite && target)
>  		mutex_unlock(&target->i_mutex);
>  	dput(new_dentry);
> -	if (!error)
> +	if (!error) {
>  		fsnotify_move(old_dir, new_dir, old_name, is_dir,
> -			      target, old_dentry);
> +			      overwrite ? target : NULL, old_dentry);
> +		if (!overwrite) {
> +			fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
> +				      new_is_dir, NULL, new_dentry);
> +		}
> +	}
>  	fsnotify_oldname_free(old_name);
>  
>  	return error;
>  }
>  
> -SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> -		int, newdfd, const char __user *, newname)
> +int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
> +	       struct inode *new_dir, struct dentry *new_dentry)
> +{
> +	return vfs_rename2(old_dir, old_dentry, new_dir, new_dentry, 0);
> +}
> +
> +
> +SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
> +		int, newdfd, const char __user *, newname, unsigned int, flags)
>  {
>  	struct dentry *old_dir, *new_dir;
>  	struct dentry *old_dentry, *new_dentry;
> @@ -4058,7 +4104,13 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>  	struct filename *to;
>  	unsigned int lookup_flags = 0;
>  	bool should_retry = false;
> +	bool overwrite = !(flags & RENAME_EXCHANGE);
>  	int error;
> +
> +	error = -EOPNOTSUPP;
> +	if (flags & ~RENAME_EXCHANGE)
> +		goto exit;
> +
>  retry:
>  	from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
>  	if (IS_ERR(from)) {
> @@ -4091,7 +4143,8 @@ retry:
>  
>  	oldnd.flags &= ~LOOKUP_PARENT;
>  	newnd.flags &= ~LOOKUP_PARENT;
> -	newnd.flags |= LOOKUP_RENAME_TARGET;
> +	if (overwrite)
> +		newnd.flags |= LOOKUP_RENAME_TARGET;
>  
>  	trap = lock_rename(new_dir, old_dir);
>  
> @@ -4108,7 +4161,7 @@ retry:
>  		error = -ENOTDIR;
>  		if (oldnd.last.name[oldnd.last.len])
>  			goto exit4;
> -		if (newnd.last.name[newnd.last.len])
> +		if (overwrite && newnd.last.name[newnd.last.len])
>  			goto exit4;
>  	}
>  	/* source should not be ancestor of target */
> @@ -4119,8 +4172,19 @@ retry:
>  	error = PTR_ERR(new_dentry);
>  	if (IS_ERR(new_dentry))
>  		goto exit4;
> +	if (!overwrite) {
> +		error = -ENOENT;
> +		if (!new_dentry->d_inode)
> +			goto exit4;
> +
> +		if (!S_ISDIR(new_dentry->d_inode->i_mode)) {
> +			error = -ENOTDIR;
> +			if (newnd.last.name[newnd.last.len])
> +				goto exit4;
> +		}
> +	}
>  	/* target should not be an ancestor of source */
> -	error = -ENOTEMPTY;
> +	error = overwrite ? -ENOTEMPTY : -EINVAL;
>  	if (new_dentry == trap)
>  		goto exit5;
>  
> @@ -4128,8 +4192,15 @@ retry:
>  				     &newnd.path, new_dentry);
>  	if (error)
>  		goto exit5;
> -	error = vfs_rename(old_dir->d_inode, old_dentry,
> -				   new_dir->d_inode, new_dentry);
> +	if (!overwrite) {
> +		error = security_path_rename(&newnd.path, new_dentry,
> +					     &oldnd.path, old_dentry);
> +		if (error)
> +			goto exit5;
> +	}
> +
> +	error = vfs_rename2(old_dir->d_inode, old_dentry,
> +			    new_dir->d_inode, new_dentry, flags);
>  exit5:
>  	dput(new_dentry);
>  exit4:
> @@ -4154,9 +4225,15 @@ exit:
>  	return error;
>  }
>  
> +SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
> +		int, newdfd, const char __user *, newname)
> +{
> +	return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
> +}
> +
>  SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
>  {
> -	return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
> +	return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
>  }
>  
>  int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 59066e0..ce5ebed 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -297,6 +297,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);
>  
>  /* used for rename() and baskets */
>  extern void d_move(struct dentry *, struct dentry *);
> +extern void d_exchange(struct dentry *, struct dentry *);
>  extern struct dentry *d_ancestor(struct dentry *, struct dentry *);
>  
>  /* appendix may either be NULL or be used for transname suffixes */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3f40547..71c6cf9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1572,6 +1572,8 @@ struct inode_operations {
>  	int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
>  	int (*rename) (struct inode *, struct dentry *,
>  			struct inode *, struct dentry *);
> +	int (*rename2) (struct inode *, struct dentry *,
> +			struct inode *, struct dentry *, unsigned int);
>  	int (*setattr) (struct dentry *, struct iattr *);
>  	int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
>  	int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 6c28b61..ebdafb6 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -35,6 +35,8 @@
>  #define SEEK_HOLE	4	/* seek to the next hole */
>  #define SEEK_MAX	SEEK_HOLE
>  
> +#define RENAME_EXCHANGE	(1 << 0)	/* Exchange source and dest */
> +
>  struct fstrim_range {
>  	__u64 start;
>  	__u64 len;
> -- 
> 1.8.1.4
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/7] vfs: add renameat2 syscall and cross-rename
  2013-10-02 12:26   ` Jan Kara
@ 2013-10-02 14:59     ` Miklos Szeredi
  0 siblings, 0 replies; 18+ messages in thread
From: Miklos Szeredi @ 2013-10-02 14:59 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux-Fsdevel, Kernel Mailing List, Al Viro, Linus Torvalds,
	Christoph Hellwig, Andrew Morton, David Howells, Zach Brown,
	Theodore Ts'o, mszeredi

On Wed, Oct 2, 2013 at 2:26 PM, Jan Kara <jack@suse.cz> wrote:
> On Tue 01-10-13 18:00:35, Miklos Szeredi wrote:
>> From: Miklos Szeredi <mszeredi@suse.cz>
>>
>> Add new renameat2 syscall, which is the same as renameat with an added
>> flags argument.
>>
>> If flags is zero then this is a plain overwriting rename.  If flags contain
>> RENAME_EXCHANGE then exchange source and destination files.  There's no
>> restriction on the type of the files; e.g. a directory can be exchanged
>> with a symlink.
>   It's not completely clear to me what should happen if RENAME_EXCHANGE is
> set but destination doesn't exist. Return -ENOENT or just do standard
> rename?

For RENAME_EXCHANGE the vfs part should ensure this doesn't happen
(it's -ENOENT if either of the dentries are negative).

Another interesting case, which is disallowed for now, is if one of
the dentries is unlinked, e.g. generated by ->tmpfile().  That one
might be useful and the semantics are well defined: the unlinked one
gets linked and the linked one gets unlinked.  But it would need
additional code in vfs and  filesystems, and for now I'd rather keep
this as simple as possible.

Thanks,
Miklos

>
>                                                                 Honza
>>
>> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
>> ---
>>  arch/x86/syscalls/syscall_64.tbl |   1 +
>>  fs/dcache.c                      |  46 ++++++++++---
>>  fs/namei.c                       | 139 ++++++++++++++++++++++++++++++---------
>>  include/linux/dcache.h           |   1 +
>>  include/linux/fs.h               |   2 +
>>  include/uapi/linux/fs.h          |   2 +
>>  6 files changed, 152 insertions(+), 39 deletions(-)
>>
>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>> index 38ae65d..fafd734 100644
>> --- a/arch/x86/syscalls/syscall_64.tbl
>> +++ b/arch/x86/syscalls/syscall_64.tbl
>> @@ -320,6 +320,7 @@
>>  311  64      process_vm_writev       sys_process_vm_writev
>>  312  common  kcmp                    sys_kcmp
>>  313  common  finit_module            sys_finit_module
>> +314  common  renameat2               sys_renameat2
>>
>>  #
>>  # x32-specific system call numbers start at 512 to avoid cache impact
>> diff --git a/fs/dcache.c b/fs/dcache.c
>> index 4100030..1735bac 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -2495,12 +2495,14 @@ static void switch_names(struct dentry *dentry, struct dentry *target)
>>                       dentry->d_name.name = dentry->d_iname;
>>               } else {
>>                       /*
>> -                      * Both are internal.  Just copy target to dentry
>> +                      * Both are internal.
>>                        */
>> -                     memcpy(dentry->d_iname, target->d_name.name,
>> -                                     target->d_name.len + 1);
>> -                     dentry->d_name.len = target->d_name.len;
>> -                     return;
>> +                     unsigned int i;
>> +                     BUILD_BUG_ON(!IS_ALIGNED(DNAME_INLINE_LEN, sizeof(long)));
>> +                     for (i = 0; i < DNAME_INLINE_LEN / sizeof(long); i++) {
>> +                             swap(((long *) &dentry->d_iname)[i],
>> +                                  ((long *) &target->d_iname)[i]);
>> +                     }
>>               }
>>       }
>>       swap(dentry->d_name.len, target->d_name.len);
>> @@ -2557,13 +2559,15 @@ static void dentry_unlock_parents_for_move(struct dentry *dentry,
>>   * __d_move - move a dentry
>>   * @dentry: entry to move
>>   * @target: new dentry
>> + * @exchange: exchange the two dentries
>>   *
>>   * Update the dcache to reflect the move of a file name. Negative
>>   * dcache entries should not be moved in this way. Caller must hold
>>   * rename_lock, the i_mutex of the source and target directories,
>>   * and the sb->s_vfs_rename_mutex if they differ. See lock_rename().
>>   */
>> -static void __d_move(struct dentry * dentry, struct dentry * target)
>> +static void __d_move(struct dentry *dentry, struct dentry *target,
>> +                  bool exchange)
>>  {
>>       if (!dentry->d_inode)
>>               printk(KERN_WARNING "VFS: moving negative dcache entry\n");
>> @@ -2587,6 +2591,10 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>>
>>       /* Unhash the target: dput() will then get rid of it */
>>       __d_drop(target);
>> +     if (exchange) {
>> +             __d_rehash(target,
>> +                        d_hash(dentry->d_parent, dentry->d_name.hash));
>> +     }
>>
>>       list_del(&dentry->d_u.d_child);
>>       list_del(&target->d_u.d_child);
>> @@ -2613,6 +2621,8 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>>       write_seqcount_end(&dentry->d_seq);
>>
>>       dentry_unlock_parents_for_move(dentry, target);
>> +     if (exchange)
>> +             fsnotify_d_move(target);
>>       spin_unlock(&target->d_lock);
>>       fsnotify_d_move(dentry);
>>       spin_unlock(&dentry->d_lock);
>> @@ -2630,11 +2640,31 @@ static void __d_move(struct dentry * dentry, struct dentry * target)
>>  void d_move(struct dentry *dentry, struct dentry *target)
>>  {
>>       write_seqlock(&rename_lock);
>> -     __d_move(dentry, target);
>> +     __d_move(dentry, target, false);
>>       write_sequnlock(&rename_lock);
>>  }
>>  EXPORT_SYMBOL(d_move);
>>
>> +/*
>> + * d_exchange - exchange two dentries
>> + * @dentry1: first dentry
>> + * @dentry2: second dentry
>> + */
>> +void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
>> +{
>> +     write_seqlock(&rename_lock);
>> +
>> +     WARN_ON(!dentry1->d_inode);
>> +     WARN_ON(!dentry2->d_inode);
>> +     WARN_ON(IS_ROOT(dentry1));
>> +     WARN_ON(IS_ROOT(dentry2));
>> +
>> +     __d_move(dentry1, dentry2, true);
>> +
>> +     write_sequnlock(&rename_lock);
>> +}
>> +
>> +
>>  /**
>>   * d_ancestor - search for an ancestor
>>   * @p1: ancestor dentry
>> @@ -2682,7 +2712,7 @@ static struct dentry *__d_unalias(struct inode *inode,
>>       m2 = &alias->d_parent->d_inode->i_mutex;
>>  out_unalias:
>>       if (likely(!d_mountpoint(alias))) {
>> -             __d_move(alias, dentry);
>> +             __d_move(alias, dentry, false);
>>               ret = alias;
>>       }
>>  out_err:
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 7ec6a12..55700b3 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -3963,14 +3963,18 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
>>   *      ->i_mutex on parents, which works but leads to some truly excessive
>>   *      locking].
>>   */
>> -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>> -            struct inode *new_dir, struct dentry *new_dentry)
>> +static int vfs_rename2(struct inode *old_dir, struct dentry *old_dentry,
>> +                    struct inode *new_dir, struct dentry *new_dentry,
>> +                    unsigned int flags)
>>  {
>>       int error;
>>       const unsigned char *old_name;
>>       struct inode *source = old_dentry->d_inode;
>>       struct inode *target = new_dentry->d_inode;
>>       bool is_dir = S_ISDIR(source->i_mode);
>> +     bool new_is_dir = target ? S_ISDIR(target->i_mode) : false;
>> +     bool overwrite = !(flags & RENAME_EXCHANGE);
>> +     unsigned max_links = new_dir->i_sb->s_max_links;
>>
>>       if (source == target)
>>               return 0;
>> @@ -3981,74 +3985,116 @@ int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>>
>>       if (!target)
>>               error = may_create(new_dir, new_dentry);
>> -     else
>> +     else if (overwrite)
>>               error = may_delete(new_dir, new_dentry, is_dir);
>> +     else
>> +             error = may_delete(new_dir, new_dentry, new_is_dir);
>>       if (error)
>>               return error;
>>
>> -     if (!old_dir->i_op->rename)
>> +     if (!old_dir->i_op->rename && !old_dir->i_op->rename2)
>>               return -EPERM;
>>
>> +     if (flags && !old_dir->i_op->rename2)
>> +             return -EOPNOTSUPP;
>> +
>>       /*
>>        * If we are going to change the parent - check write permissions,
>>        * we'll need to flip '..'.
>>        */
>> -     if (is_dir && new_dir != old_dir) {
>> -             error = inode_permission(source, MAY_WRITE);
>> -             if (error)
>> -                     return error;
>> +     if (new_dir != old_dir) {
>> +             if (is_dir) {
>> +                     error = inode_permission(source, MAY_WRITE);
>> +                     if (error)
>> +                             return error;
>> +             }
>> +             if (!overwrite && new_is_dir) {
>> +                     error = inode_permission(target, MAY_WRITE);
>> +                     if (error)
>> +                             return error;
>> +             }
>>       }
>>
>>       error = security_inode_rename(old_dir, old_dentry, new_dir, new_dentry);
>>       if (error)
>>               return error;
>>
>> +     if (!overwrite) {
>> +             error = security_inode_rename(new_dir, new_dentry,
>> +                                           old_dir, old_dentry);
>> +             if (error)
>> +                     return error;
>> +     }
>> +
>>       old_name = fsnotify_oldname_init(old_dentry->d_name.name);
>>       dget(new_dentry);
>> -     if (target)
>> +     if (overwrite && target)
>>               mutex_lock(&target->i_mutex);
>>
>>       error = -EBUSY;
>>       if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
>>               goto out;
>>
>> -     if (is_dir) {
>> -             unsigned max_links = new_dir->i_sb->s_max_links;
>> -
>> +     if (max_links && new_dir != old_dir) {
>>               error = -EMLINK;
>> -             if (max_links && !target && new_dir != old_dir &&
>> -                 new_dir->i_nlink >= max_links)
>> +             if (is_dir && !new_is_dir && new_dir->i_nlink >= max_links)
>>                       goto out;
>> +             if (!overwrite && !is_dir && new_is_dir &&
>> +                 old_dir->i_nlink > max_links)
>> +                     goto out;
>> +     }
>> +
>> +     if (overwrite && is_dir && target)
>> +             shrink_dcache_parent(new_dentry);
>>
>> -             if (target)
>> -                     shrink_dcache_parent(new_dentry);
>> +     if (old_dir->i_op->rename2) {
>> +             error = old_dir->i_op->rename2(old_dir, old_dentry,
>> +                                            new_dir, new_dentry, flags);
>> +     } else {
>> +             error = old_dir->i_op->rename(old_dir, old_dentry,
>> +                                           new_dir, new_dentry);
>>       }
>>
>> -     error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>>       if (error)
>>               goto out;
>>
>> -     if (target) {
>> +     if (overwrite && target) {
>>               if (is_dir)
>>                       target->i_flags |= S_DEAD;
>>               dont_mount(new_dentry);
>>       }
>> -     if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
>> -             d_move(old_dentry, new_dentry);
>> +     if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) {
>> +             if (overwrite)
>> +                     d_move(old_dentry, new_dentry);
>> +             else
>> +                     d_exchange(old_dentry, new_dentry);
>> +     }
>>  out:
>> -     if (target)
>> +     if (overwrite && target)
>>               mutex_unlock(&target->i_mutex);
>>       dput(new_dentry);
>> -     if (!error)
>> +     if (!error) {
>>               fsnotify_move(old_dir, new_dir, old_name, is_dir,
>> -                           target, old_dentry);
>> +                           overwrite ? target : NULL, old_dentry);
>> +             if (!overwrite) {
>> +                     fsnotify_move(new_dir, old_dir, old_dentry->d_name.name,
>> +                                   new_is_dir, NULL, new_dentry);
>> +             }
>> +     }
>>       fsnotify_oldname_free(old_name);
>>
>>       return error;
>>  }
>>
>> -SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>> -             int, newdfd, const char __user *, newname)
>> +int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>> +            struct inode *new_dir, struct dentry *new_dentry)
>> +{
>> +     return vfs_rename2(old_dir, old_dentry, new_dir, new_dentry, 0);
>> +}
>> +
>> +
>> +SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
>> +             int, newdfd, const char __user *, newname, unsigned int, flags)
>>  {
>>       struct dentry *old_dir, *new_dir;
>>       struct dentry *old_dentry, *new_dentry;
>> @@ -4058,7 +4104,13 @@ SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>>       struct filename *to;
>>       unsigned int lookup_flags = 0;
>>       bool should_retry = false;
>> +     bool overwrite = !(flags & RENAME_EXCHANGE);
>>       int error;
>> +
>> +     error = -EOPNOTSUPP;
>> +     if (flags & ~RENAME_EXCHANGE)
>> +             goto exit;
>> +
>>  retry:
>>       from = user_path_parent(olddfd, oldname, &oldnd, lookup_flags);
>>       if (IS_ERR(from)) {
>> @@ -4091,7 +4143,8 @@ retry:
>>
>>       oldnd.flags &= ~LOOKUP_PARENT;
>>       newnd.flags &= ~LOOKUP_PARENT;
>> -     newnd.flags |= LOOKUP_RENAME_TARGET;
>> +     if (overwrite)
>> +             newnd.flags |= LOOKUP_RENAME_TARGET;
>>
>>       trap = lock_rename(new_dir, old_dir);
>>
>> @@ -4108,7 +4161,7 @@ retry:
>>               error = -ENOTDIR;
>>               if (oldnd.last.name[oldnd.last.len])
>>                       goto exit4;
>> -             if (newnd.last.name[newnd.last.len])
>> +             if (overwrite && newnd.last.name[newnd.last.len])
>>                       goto exit4;
>>       }
>>       /* source should not be ancestor of target */
>> @@ -4119,8 +4172,19 @@ retry:
>>       error = PTR_ERR(new_dentry);
>>       if (IS_ERR(new_dentry))
>>               goto exit4;
>> +     if (!overwrite) {
>> +             error = -ENOENT;
>> +             if (!new_dentry->d_inode)
>> +                     goto exit4;
>> +
>> +             if (!S_ISDIR(new_dentry->d_inode->i_mode)) {
>> +                     error = -ENOTDIR;
>> +                     if (newnd.last.name[newnd.last.len])
>> +                             goto exit4;
>> +             }
>> +     }
>>       /* target should not be an ancestor of source */
>> -     error = -ENOTEMPTY;
>> +     error = overwrite ? -ENOTEMPTY : -EINVAL;
>>       if (new_dentry == trap)
>>               goto exit5;
>>
>> @@ -4128,8 +4192,15 @@ retry:
>>                                    &newnd.path, new_dentry);
>>       if (error)
>>               goto exit5;
>> -     error = vfs_rename(old_dir->d_inode, old_dentry,
>> -                                new_dir->d_inode, new_dentry);
>> +     if (!overwrite) {
>> +             error = security_path_rename(&newnd.path, new_dentry,
>> +                                          &oldnd.path, old_dentry);
>> +             if (error)
>> +                     goto exit5;
>> +     }
>> +
>> +     error = vfs_rename2(old_dir->d_inode, old_dentry,
>> +                         new_dir->d_inode, new_dentry, flags);
>>  exit5:
>>       dput(new_dentry);
>>  exit4:
>> @@ -4154,9 +4225,15 @@ exit:
>>       return error;
>>  }
>>
>> +SYSCALL_DEFINE4(renameat, int, olddfd, const char __user *, oldname,
>> +             int, newdfd, const char __user *, newname)
>> +{
>> +     return sys_renameat2(olddfd, oldname, newdfd, newname, 0);
>> +}
>> +
>>  SYSCALL_DEFINE2(rename, const char __user *, oldname, const char __user *, newname)
>>  {
>> -     return sys_renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
>> +     return sys_renameat2(AT_FDCWD, oldname, AT_FDCWD, newname, 0);
>>  }
>>
>>  int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen, const char *link)
>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>> index 59066e0..ce5ebed 100644
>> --- a/include/linux/dcache.h
>> +++ b/include/linux/dcache.h
>> @@ -297,6 +297,7 @@ extern void dentry_update_name_case(struct dentry *, struct qstr *);
>>
>>  /* used for rename() and baskets */
>>  extern void d_move(struct dentry *, struct dentry *);
>> +extern void d_exchange(struct dentry *, struct dentry *);
>>  extern struct dentry *d_ancestor(struct dentry *, struct dentry *);
>>
>>  /* appendix may either be NULL or be used for transname suffixes */
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 3f40547..71c6cf9 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -1572,6 +1572,8 @@ struct inode_operations {
>>       int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
>>       int (*rename) (struct inode *, struct dentry *,
>>                       struct inode *, struct dentry *);
>> +     int (*rename2) (struct inode *, struct dentry *,
>> +                     struct inode *, struct dentry *, unsigned int);
>>       int (*setattr) (struct dentry *, struct iattr *);
>>       int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
>>       int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
>> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
>> index 6c28b61..ebdafb6 100644
>> --- a/include/uapi/linux/fs.h
>> +++ b/include/uapi/linux/fs.h
>> @@ -35,6 +35,8 @@
>>  #define SEEK_HOLE    4       /* seek to the next hole */
>>  #define SEEK_MAX     SEEK_HOLE
>>
>> +#define RENAME_EXCHANGE      (1 << 0)        /* Exchange source and dest */
>> +
>>  struct fstrim_range {
>>       __u64 start;
>>       __u64 len;
>> --
>> 1.8.1.4
>>
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] cross rename
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
                   ` (6 preceding siblings ...)
  2013-10-01 16:00 ` [PATCH 7/7] ext4: add cross rename support Miklos Szeredi
@ 2013-10-03  1:58 ` H. Peter Anvin
  2013-10-03  5:34   ` Linus Torvalds
  2013-10-04 23:58 ` Andy Lutomirski
  8 siblings, 1 reply; 18+ messages in thread
From: H. Peter Anvin @ 2013-10-03  1:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, viro, torvalds, hch, akpm, dhowells,
	zab, jack, tytso, mszeredi

On 10/01/2013 09:00 AM, Miklos Szeredi wrote:
> This series adds a new syscall, renameat2(), which is the same as renameat() but
> with a flags argument.  Internally i_op->reaname2() is also added, which can
> later be merged with ->rename() but is kept separately for now, since this would
> just blow up this patch without helping review.
> 
> The purpose of extending rename is to add cross-rename, a symmetric variant of
> rename, which exchanges the two files.  This allows interesting things, which
> were not possible before, for example atomically replacing a directory tree with
> a symlink, etc...
> 

I would suggest it shouldn't be renameat2() but rather renameat3(), i.e.
rename file A -> B, if B exists rename B to C.  It may not be desirable
to expose the stale B in the same namespace as A, but still want it to
be possible to scavenge it.  Obviously, A=C is a valid subcase.

	-hpa



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] cross rename
  2013-10-03  1:58 ` [RFC PATCH 0/7] cross rename H. Peter Anvin
@ 2013-10-03  5:34   ` Linus Torvalds
  2013-10-03  5:36     ` H. Peter Anvin
  0 siblings, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2013-10-03  5:34 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Miklos Szeredi, linux-fsdevel, Linux Kernel Mailing List,
	Al Viro, Christoph Hellwig, Andrew Morton, David Howells, zab,
	Jan Kara, Theodore Ts'o, Miklos Szeredi

On Wed, Oct 2, 2013 at 6:58 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> I would suggest it shouldn't be renameat2() but rather renameat3(), i.e.
> rename file A -> B, if B exists rename B to C.  It may not be desirable
> to expose the stale B in the same namespace as A, but still want it to
> be possible to scavenge it.  Obviously, A=C is a valid subcase.

I really *really* prefer to stay with two names. Miklos had an earlier
three-name version, and it was hugely more complex, and it does not
fit nearly as well in the model.

Two directory entries is also what the current rename() effectively
always does (clearing one, changing another). So doing the
cross-rename model is actually fairly close to a normal rename. A
three-way one is not actually at all similar.

So I was actually very relieved to see this much simpler and cleaner
model, because the alternative really was nasty. This one looks fairly
simple and clean and straightforward. The previous was none of that.

              Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] cross rename
  2013-10-03  5:34   ` Linus Torvalds
@ 2013-10-03  5:36     ` H. Peter Anvin
  0 siblings, 0 replies; 18+ messages in thread
From: H. Peter Anvin @ 2013-10-03  5:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, linux-fsdevel, Linux Kernel Mailing List,
	Al Viro, Christoph Hellwig, Andrew Morton, David Howells, zab,
	Jan Kara, Theodore Ts'o, Miklos Szeredi

Yes... Al  and I had a brief conversation about the complexities over IRC this evening.

Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Wed, Oct 2, 2013 at 6:58 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> I would suggest it shouldn't be renameat2() but rather renameat3(),
>i.e.
>> rename file A -> B, if B exists rename B to C.  It may not be
>desirable
>> to expose the stale B in the same namespace as A, but still want it
>to
>> be possible to scavenge it.  Obviously, A=C is a valid subcase.
>
>I really *really* prefer to stay with two names. Miklos had an earlier
>three-name version, and it was hugely more complex, and it does not
>fit nearly as well in the model.
>
>Two directory entries is also what the current rename() effectively
>always does (clearing one, changing another). So doing the
>cross-rename model is actually fairly close to a normal rename. A
>three-way one is not actually at all similar.
>
>So I was actually very relieved to see this much simpler and cleaner
>model, because the alternative really was nasty. This one looks fairly
>simple and clean and straightforward. The previous was none of that.
>
>              Linus

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] cross rename
  2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
                   ` (7 preceding siblings ...)
  2013-10-03  1:58 ` [RFC PATCH 0/7] cross rename H. Peter Anvin
@ 2013-10-04 23:58 ` Andy Lutomirski
  2013-10-05  0:11   ` Linus Torvalds
  8 siblings, 1 reply; 18+ messages in thread
From: Andy Lutomirski @ 2013-10-04 23:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, viro, torvalds, hch, akpm, dhowells,
	zab, jack, tytso, mszeredi

On 10/01/2013 09:00 AM, Miklos Szeredi wrote:
> This series adds a new syscall, renameat2(), which is the same as renameat() but
> with a flags argument.  Internally i_op->reaname2() is also added, which can
> later be merged with ->rename() but is kept separately for now, since this would
> just blow up this patch without helping review.

How hard would it be to also add RENAME_NOREPLACE that fails if the
destination already exists?

IMO this would get rid of the last sane use of hard links (link + unlink
to simulate non-clobbering rename), and it would be nice on filesystems
that don't have hard links.

Windows has supported this since Windows 98, IIRC (using MoveFileEx).

--Andy


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/7] cross rename
  2013-10-04 23:58 ` Andy Lutomirski
@ 2013-10-05  0:11   ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2013-10-05  0:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, linux-fsdevel, Linux Kernel Mailing List,
	Al Viro, Christoph Hellwig, Andrew Morton, David Howells, zab,
	Jan Kara, Theodore Ts'o, Miklos Szeredi

On Fri, Oct 4, 2013 at 4:58 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> How hard would it be to also add RENAME_NOREPLACE that fails if the
> destination already exists?

Trivial. And I agree that that should be a good flag to have.

               Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-10-05  0:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-01 16:00 [RFC PATCH 0/7] cross rename Miklos Szeredi
2013-10-01 16:00 ` [PATCH 1/7] vfs: rename: move d_move() up Miklos Szeredi
2013-10-01 16:00 ` [PATCH 2/7] vfs: rename: use common code for dir and non-dir Miklos Szeredi
2013-10-01 16:00 ` [PATCH 3/7] vfs: add renameat2 syscall and cross-rename Miklos Szeredi
2013-10-02 12:26   ` Jan Kara
2013-10-02 14:59     ` Miklos Szeredi
2013-10-01 16:00 ` [PATCH 4/7] ext4: rename: create ext4_renament structure for local vars Miklos Szeredi
2013-10-02 12:01   ` Jan Kara
2013-10-01 16:00 ` [PATCH 5/7] ext4: rename: move EMLINK check up Miklos Szeredi
2013-10-02 12:05   ` Jan Kara
2013-10-01 16:00 ` [PATCH 6/7] ext4: rename: split out helper functions Miklos Szeredi
2013-10-02 12:19   ` Jan Kara
2013-10-01 16:00 ` [PATCH 7/7] ext4: add cross rename support Miklos Szeredi
2013-10-03  1:58 ` [RFC PATCH 0/7] cross rename H. Peter Anvin
2013-10-03  5:34   ` Linus Torvalds
2013-10-03  5:36     ` H. Peter Anvin
2013-10-04 23:58 ` Andy Lutomirski
2013-10-05  0:11   ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).