All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
@ 2018-04-24  6:19 Omar Sandoval
  2018-04-24  6:19 ` [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target Omar Sandoval
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-04-24  6:19 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Linus Torvalds, linux-api, kernel-team, Xi Wang

From: Omar Sandoval <osandov@fb.com>

Hi, Al,

This is a respin of my linkat() AT_REPLACE series, previously posted
here:

https://patchwork.kernel.org/patch/9636735/
https://patchwork.kernel.org/patch/9636733/

There are no changes since v2, only a rebase onto v4.17-rc2 and some
minor additions to the commit messages.

The goal is to allow for updating a file atomically in-place with an
O_TMPFILE like so:

- open temporary file with O_TMPFILE
- write temporary file contents
- fsync temporary file
- atomically replace permanent location with the temporary file
- fsync parent directory

Patch 1 implements the VFS support for this flag. The implementation
resembles sys_renameat2(), and I took care to preserve all of the
original error cases and make the new error cases consistent with
rename.

Previously, we discussed extending renameat2() instead of linkat() for
this, but this makes a mess of rename and also forces us to special-case
the parent directory for O_TMPFILEs in implementations of
i_op->rename(), so I still think linkat() is a better fit.

Patch 2 adds support for AT_REPLACE to Btrfs. That's the codebase I'm
most familiar with so that's where I started, but it should be
straightforward to implement for other filesystems. v1 of this series
had some incorrect dentry fiddling, so now I just unhash the replaced
dentry for simplicity.

Please take a look.

Thanks!

Cc: Xi Wang <xi@cs.washington.edu>

Omar Sandoval (2):
  fs: add AT_REPLACE flag for linkat() which replaces the target
  Btrfs: add support for linkat() AT_REPLACE

 fs/btrfs/inode.c           |  65 ++++++++++++-
 fs/ecryptfs/inode.c        |   2 +-
 fs/namei.c                 | 181 +++++++++++++++++++++++++++++--------
 fs/nfsd/vfs.c              |   2 +-
 fs/overlayfs/overlayfs.h   |   2 +-
 include/linux/fs.h         |   3 +-
 include/uapi/linux/fcntl.h |   1 +
 7 files changed, 211 insertions(+), 45 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target
  2018-04-24  6:19 [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Omar Sandoval
@ 2018-04-24  6:19 ` Omar Sandoval
  2018-04-24 22:14   ` Omar Sandoval
  2018-04-24  6:19 ` [RFC PATCH v3 2/2] Btrfs: add support for linkat() AT_REPLACE Omar Sandoval
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Omar Sandoval @ 2018-04-24  6:19 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Linus Torvalds, linux-api, kernel-team, Xi Wang

From: Omar Sandoval <osandov@fb.com>

One of the most common uses of temporary files is the classic atomic
replacement pattern, i.e.,

- write temporary file
- fsync temporary file
- rename temporary file over real file
- fsync parent directory

Now, we have O_TMPFILE, which gives us a much better way to create
temporary files, but it's not possible to use it for this pattern.

This patch introduces an AT_REPLACE flag which allows linkat() to
replace the target file. Now, the temporary file in the pattern above
can be a proper O_TMPFILE. Even without O_TMPFILE, this is a new
primitive which might be useful in other contexts.

The implementation on the VFS side mimics sys_renameat2().

Cc: Xi Wang <xi@cs.washington.edu>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/ecryptfs/inode.c        |   2 +-
 fs/namei.c                 | 181 +++++++++++++++++++++++++++++--------
 fs/nfsd/vfs.c              |   2 +-
 fs/overlayfs/overlayfs.h   |   2 +-
 include/linux/fs.h         |   3 +-
 include/uapi/linux/fcntl.h |   1 +
 6 files changed, 150 insertions(+), 41 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 97d17eaeba07..d3d29bf5d6b7 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -437,7 +437,7 @@ static int ecryptfs_link(struct dentry *old_dentry, struct inode *dir,
 	dget(lower_new_dentry);
 	lower_dir_dentry = lock_parent(lower_new_dentry);
 	rc = vfs_link(lower_old_dentry, d_inode(lower_dir_dentry),
-		      lower_new_dentry, NULL);
+		      lower_new_dentry, NULL, 0);
 	if (rc || d_really_is_negative(lower_new_dentry))
 		goto out_lock;
 	rc = ecryptfs_interpose(lower_new_dentry, new_dentry, dir->i_sb);
diff --git a/fs/namei.c b/fs/namei.c
index 186bd2464fd5..2cc2b1deaa12 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4149,6 +4149,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * @dir:	new parent
  * @new_dentry:	where to create the new link
  * @delegated_inode: returns inode needing a delegation break
+ * @flags:      link flags
  *
  * The caller must hold dir->i_mutex
  *
@@ -4162,16 +4163,26 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
  * be appropriate for callers that expect the underlying filesystem not
  * to be NFS exported.
  */
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
+int vfs_link(struct dentry *old_dentry, struct inode *dir,
+	     struct dentry *new_dentry, struct inode **delegated_inode,
+	     unsigned int flags)
 {
 	struct inode *inode = old_dentry->d_inode;
+	struct inode *target = new_dentry->d_inode;
 	unsigned max_links = dir->i_sb->s_max_links;
 	int error;
 
 	if (!inode)
 		return -ENOENT;
 
-	error = may_create(dir, new_dentry);
+	if (target) {
+		if (flags & AT_REPLACE)
+			error = may_delete(dir, new_dentry, d_is_dir(old_dentry));
+		else
+			error = -EEXIST;
+	} else {
+		error = may_create(dir, new_dentry);
+	}
 	if (error)
 		return error;
 
@@ -4190,8 +4201,10 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	 */
 	if (HAS_UNMAPPED_ID(inode))
 		return -EPERM;
-	if (!dir->i_op->link)
+	if (!dir->i_op->link && !dir->i_op->link2)
 		return -EPERM;
+	if (flags && !dir->i_op->link2)
+		return -EINVAL;
 	if (S_ISDIR(inode->i_mode))
 		return -EPERM;
 
@@ -4199,26 +4212,58 @@ int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_de
 	if (error)
 		return error;
 
-	inode_lock(inode);
+	dget(new_dentry);
+	lock_two_nondirectories(inode, target);
+
+	if (is_local_mountpoint(new_dentry)) {
+		error = -EBUSY;
+		goto out;
+	}
+
 	/* Make sure we don't allow creating hardlink to an unlinked file */
-	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
+	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE)) {
 		error =  -ENOENT;
-	else if (max_links && inode->i_nlink >= max_links)
+		goto out;
+	}
+	if (max_links && inode->i_nlink >= max_links) {
 		error = -EMLINK;
-	else {
-		error = try_break_deleg(inode, delegated_inode);
-		if (!error)
-			error = dir->i_op->link(old_dentry, dir, new_dentry);
+		goto out;
+	}
+
+	error = try_break_deleg(inode, delegated_inode);
+	if (error)
+		goto out;
+	if (target) {
+		error = try_break_deleg(target, delegated_inode);
+		if (error)
+			goto out;
+	}
+
+	if (dir->i_op->link)
+		error = dir->i_op->link(old_dentry, dir, new_dentry);
+	else
+		error = dir->i_op->link2(old_dentry, dir, new_dentry, flags);
+	if (error)
+		goto out;
+
+	if (target) {
+		dont_mount(new_dentry);
+		detach_mounts(new_dentry);
 	}
 
-	if (!error && (inode->i_state & I_LINKABLE)) {
+	if (inode->i_state & I_LINKABLE) {
 		spin_lock(&inode->i_lock);
 		inode->i_state &= ~I_LINKABLE;
 		spin_unlock(&inode->i_lock);
 	}
-	inode_unlock(inode);
-	if (!error)
+out:
+	unlock_two_nondirectories(inode, target);
+	dput(new_dentry);
+	if (!error) {
+		if (target)
+			fsnotify_link_count(target);
 		fsnotify_link(dir, inode, new_dentry);
+	}
 	return error;
 }
 EXPORT_SYMBOL(vfs_link);
@@ -4237,11 +4282,15 @@ int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 {
 	struct dentry *new_dentry;
 	struct path old_path, new_path;
+	struct qstr new_last;
+	int new_type;
 	struct inode *delegated_inode = NULL;
-	int how = 0;
+	struct filename *to;
+	unsigned int how = 0, target_flags;
+	bool should_retry = false;
 	int error;
 
-	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH)) != 0)
+	if ((flags & ~(AT_SYMLINK_FOLLOW | AT_EMPTY_PATH | AT_REPLACE)) != 0)
 		return -EINVAL;
 	/*
 	 * To use null names we require CAP_DAC_READ_SEARCH
@@ -4256,44 +4305,102 @@ int do_linkat(int olddfd, const char __user *oldname, int newdfd,
 
 	if (flags & AT_SYMLINK_FOLLOW)
 		how |= LOOKUP_FOLLOW;
+
+	if (flags & AT_REPLACE)
+		target_flags = LOOKUP_RENAME_TARGET;
+	else
+		target_flags = LOOKUP_CREATE | LOOKUP_EXCL;
 retry:
 	error = user_path_at(olddfd, oldname, how, &old_path);
 	if (error)
 		return error;
 
-	new_dentry = user_path_create(newdfd, newname, &new_path,
-					(how & LOOKUP_REVAL));
-	error = PTR_ERR(new_dentry);
-	if (IS_ERR(new_dentry))
-		goto out;
+	to = filename_parentat(newdfd, getname(newname), how & LOOKUP_REVAL,
+			       &new_path, &new_last, &new_type);
+	if (IS_ERR(to)) {
+		error = PTR_ERR(to);
+		goto exit1;
+	}
+
+	if (old_path.mnt != new_path.mnt) {
+		error = -EXDEV;
+		goto exit2;
+	}
+
+	if (new_type != LAST_NORM) {
+		if (flags & AT_REPLACE)
+			error = -EBUSY;
+		else
+			error = -EEXIST;
+		goto exit2;
+	}
+
+	error = mnt_want_write(old_path.mnt);
+	if (error)
+		goto exit2;
+
+retry_deleg:
+	inode_lock_nested(new_path.dentry->d_inode, I_MUTEX_PARENT);
+
+	new_dentry = __lookup_hash(&new_last, new_path.dentry,
+				   (how & LOOKUP_REVAL) | target_flags);
+	if (IS_ERR(new_dentry)) {
+		error = PTR_ERR(new_dentry);
+		goto exit3;
+	}
+	if (!(flags & AT_REPLACE) && d_is_positive(new_dentry)) {
+		error = -EEXIST;
+		goto exit4;
+	}
+	if (new_last.name[new_last.len]) {
+		/* trailing slash on negative dentry gives -ENOENT */
+		if (d_is_negative(new_dentry)) {
+			error = -ENOENT;
+			goto exit4;
+		}
+
+		/*
+		 * unless the source is a directory, trailing slash gives
+		 * -ENOTDIR (this can only happen in the AT_REPLACE case, so we
+		 * make this consistent with sys_renameat2() even though a
+		 * source directory will fail later with -EPERM)
+		 */
+		if (!d_is_dir(old_path.dentry)) {
+			error = -ENOTDIR;
+			goto exit4;
+		}
+	}
 
-	error = -EXDEV;
-	if (old_path.mnt != new_path.mnt)
-		goto out_dput;
 	error = may_linkat(&old_path);
 	if (unlikely(error))
-		goto out_dput;
+		goto exit4;
 	error = security_path_link(old_path.dentry, &new_path, new_dentry);
 	if (error)
-		goto out_dput;
-	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
-out_dput:
-	done_path_create(&new_path, new_dentry);
+		goto exit4;
+	error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry,
+			 &delegated_inode, flags & AT_REPLACE);
+exit4:
+	dput(new_dentry);
+exit3:
+	inode_unlock(new_path.dentry->d_inode);
 	if (delegated_inode) {
 		error = break_deleg_wait(&delegated_inode);
-		if (!error) {
-			path_put(&old_path);
-			goto retry;
-		}
+		if (!error)
+			goto retry_deleg;
 	}
-	if (retry_estale(error, how)) {
-		path_put(&old_path);
+	mnt_drop_write(old_path.mnt);
+exit2:
+	if (retry_estale(error, how))
+		should_retry = true;
+	path_put(&new_path);
+	putname(to);
+exit1:
+	path_put(&old_path);
+	if (should_retry) {
+		should_retry = false;
 		how |= LOOKUP_REVAL;
 		goto retry;
 	}
-out:
-	path_put(&old_path);
-
 	return error;
 }
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 2410b093a2e6..541a78a6d684 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1594,7 +1594,7 @@ nfsd_link(struct svc_rqst *rqstp, struct svc_fh *ffhp,
 	err = nfserr_noent;
 	if (d_really_is_negative(dold))
 		goto out_dput;
-	host_err = vfs_link(dold, dirp, dnew, NULL);
+	host_err = vfs_link(dold, dirp, dnew, NULL, 0);
 	if (!host_err) {
 		err = nfserrno(commit_metadata(ffhp));
 		if (!err)
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index e0b7de799f6b..906d0ee1de74 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -100,7 +100,7 @@ static inline int ovl_do_unlink(struct inode *dir, struct dentry *dentry)
 static inline int ovl_do_link(struct dentry *old_dentry, struct inode *dir,
 			      struct dentry *new_dentry, bool debug)
 {
-	int err = vfs_link(old_dentry, dir, new_dentry, NULL);
+	int err = vfs_link(old_dentry, dir, new_dentry, NULL, 0);
 	if (debug) {
 		pr_debug("link(%pd2, %pd2) = %i\n",
 			 old_dentry, new_dentry, err);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 760d8da1b6c7..8e44b12023c2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1607,7 +1607,7 @@ extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
 extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
 extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
 extern int vfs_symlink(struct inode *, struct dentry *, const char *);
-extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
+extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
 extern int vfs_rmdir(struct inode *, struct dentry *);
 extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
 extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
@@ -1752,6 +1752,7 @@ struct inode_operations {
 
 	int (*create) (struct inode *,struct dentry *, umode_t, bool);
 	int (*link) (struct dentry *,struct inode *,struct dentry *);
+	int (*link2) (struct dentry *,struct inode *,struct dentry *,unsigned int);
 	int (*unlink) (struct inode *,struct dentry *);
 	int (*symlink) (struct inode *,struct dentry *,const char *);
 	int (*mkdir) (struct inode *,struct dentry *,umode_t);
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..b601ad36e726 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -84,6 +84,7 @@
 #define AT_SYMLINK_FOLLOW	0x400   /* Follow symbolic links.  */
 #define AT_NO_AUTOMOUNT		0x800	/* Suppress terminal automount traversal */
 #define AT_EMPTY_PATH		0x1000	/* Allow empty relative pathname */
+#define AT_REPLACE		0x2000	/* Replace new path */
 
 #define AT_STATX_SYNC_TYPE	0x6000	/* Type of synchronisation required from statx() */
 #define AT_STATX_SYNC_AS_STAT	0x0000	/* - Do whatever stat() does */
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH v3 2/2] Btrfs: add support for linkat() AT_REPLACE
  2018-04-24  6:19 [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Omar Sandoval
  2018-04-24  6:19 ` [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target Omar Sandoval
@ 2018-04-24  6:19 ` Omar Sandoval
  2018-04-24 13:21 ` [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-04-24  6:19 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Linus Torvalds, linux-api, kernel-team, Xi Wang

From: Omar Sandoval <osandov@fb.com>

The implementation is fairly straightforward and looks a lot like
btrfs_rename(). The only tricky bit is that instead of playing games
with the dcache, we simply drop the dentry for it to be instantiated on
the next lookup. This can be improved in the future.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/inode.c | 65 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d241285a0d2a..a061285bf3ab 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6675,16 +6675,21 @@ static int btrfs_create(struct inode *dir, struct dentry *dentry,
 }
 
 static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
-		      struct dentry *dentry)
+		      struct dentry *dentry, unsigned int flags)
 {
 	struct btrfs_trans_handle *trans = NULL;
+	unsigned int trans_num_items;
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct inode *inode = d_inode(old_dentry);
+	struct inode *new_inode = d_inode(dentry);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	u64 index;
 	int err;
 	int drop_inode = 0;
 
+	if (flags & ~AT_REPLACE)
+		return -EINVAL;
+
 	/* do not allow sys_link's with other subvols of the same device */
 	if (root->objectid != BTRFS_I(inode)->root->objectid)
 		return -EXDEV;
@@ -6692,16 +6697,47 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	if (inode->i_nlink >= BTRFS_LINK_MAX)
 		return -EMLINK;
 
+	/* check for collisions, even if the  name isn't there */
+	err = btrfs_check_dir_item_collision(root, dir->i_ino,
+					     dentry->d_name.name,
+					     dentry->d_name.len);
+	if (err) {
+		if (err == -EEXIST) {
+			if (WARN_ON(!new_inode))
+				return err;
+		} else {
+			return err;
+		}
+	}
+
+	/*
+	 * we're using link to replace one file with another. Start IO on it now
+	 * so we don't add too much work to the end of the transaction
+	 */
+	if (new_inode && S_ISREG(inode->i_mode) && new_inode->i_size)
+		filemap_flush(inode->i_mapping);
+
 	err = btrfs_set_inode_index(BTRFS_I(dir), &index);
 	if (err)
 		goto fail;
 
 	/*
+	 * For the source:
 	 * 2 items for inode and inode ref
 	 * 2 items for dir items
 	 * 1 item for parent inode
+	 *
+	 * For the target:
+	 * 1 for the possible orphan item
+	 * 1 for the dir item
+	 * 1 for the dir index
+	 * 1 for the inode ref
+	 * 1 for the inode
 	 */
-	trans = btrfs_start_transaction(root, 5);
+	trans_num_items = 5;
+	if (new_inode)
+		trans_num_items += 5;
+	trans = btrfs_start_transaction(root, trans_num_items);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
 		trans = NULL;
@@ -6713,6 +6749,22 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	inc_nlink(inode);
 	inode_inc_iversion(inode);
 	inode->i_ctime = current_time(inode);
+
+	if (new_inode) {
+		inode_inc_iversion(new_inode);
+		new_inode->i_ctime = current_time(new_inode);
+		err = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
+					 BTRFS_I(new_inode),
+					 dentry->d_name.name,
+					 dentry->d_name.len);
+		if (!err && new_inode->i_nlink == 0)
+			err = btrfs_orphan_add(trans, BTRFS_I(new_inode));
+		if (err) {
+			btrfs_abort_transaction(trans, err);
+			goto fail;
+		}
+	}
+
 	ihold(inode);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
 
@@ -6735,7 +6787,12 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 			if (err)
 				goto fail;
 		}
-		d_instantiate(dentry, inode);
+		if (new_inode) {
+			d_drop(dentry);
+			iput(inode);
+		} else {
+			d_instantiate(dentry, inode);
+		}
 		btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent);
 	}
 
@@ -10555,7 +10612,7 @@ static const struct inode_operations btrfs_dir_inode_operations = {
 	.lookup		= btrfs_lookup,
 	.create		= btrfs_create,
 	.unlink		= btrfs_unlink,
-	.link		= btrfs_link,
+	.link2		= btrfs_link,
 	.mkdir		= btrfs_mkdir,
 	.rmdir		= btrfs_rmdir,
 	.rename		= btrfs_rename2,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
  2018-04-24  6:19 [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Omar Sandoval
  2018-04-24  6:19 ` [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target Omar Sandoval
  2018-04-24  6:19 ` [RFC PATCH v3 2/2] Btrfs: add support for linkat() AT_REPLACE Omar Sandoval
@ 2018-04-24 13:21 ` Christoph Hellwig
  2018-04-24 15:26   ` Omar Sandoval
  2018-04-27 12:00 ` Michael Kerrisk (man-pages)
  2018-05-04 18:30 ` Omar Sandoval
  4 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2018-04-24 13:21 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Al Viro, linux-fsdevel, Linus Torvalds, linux-api, kernel-team, Xi Wang

> Patch 1 implements the VFS support for this flag. The implementation
> resembles sys_renameat2(), and I took care to preserve all of the
> original error cases and make the new error cases consistent with
> rename.

Shouldn't we try to reuse the rename code and the ->rename method
instead of largely duplicating it?

In fact I wonder if a better interface would just use renameat(2)
and accept the AT_EMPTY_PATH (for the source name only).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
  2018-04-24 13:21 ` [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Christoph Hellwig
@ 2018-04-24 15:26   ` Omar Sandoval
  2018-04-26  7:49     ` Omar Sandoval
  0 siblings, 1 reply; 10+ messages in thread
From: Omar Sandoval @ 2018-04-24 15:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, linux-fsdevel, Linus Torvalds, linux-api, kernel-team, Xi Wang

On Tue, Apr 24, 2018 at 06:21:07AM -0700, Christoph Hellwig wrote:
> > Patch 1 implements the VFS support for this flag. The implementation
> > resembles sys_renameat2(), and I took care to preserve all of the
> > original error cases and make the new error cases consistent with
> > rename.
> 
> Shouldn't we try to reuse the rename code and the ->rename method
> instead of largely duplicating it?

As I mentioned in my cover letter, I did try, and it was uglier to reuse
rename. 

For O_TMPFILE, we can't pass a meaningful old_dir to ->rename(), so now
every ->rename() supporting AT_EMPTY_PATH has to grow a bunch of special
cases for ignoring old_dir (I looked at Btrfs, ext4, and XFS, and it'd
be messy for all three). Conversely, ->link() just needs the extra
unlink before link logic (I added ->link2() for this RFC, but for the
actual patch I'll just add the flag to ->link()).

On the VFS side, in my opinion, there isn't that much duplication -- we
have the may_delete()/is_local_mountpoint() a la unlink, delegation
handling, and some specific error cases we need to handle similarly to
rename, but that's not too suprising for a new primitive.

> In fact I wonder if a better interface would just use renameat(2)
> and accept the AT_EMPTY_PATH (for the source name only).

renameat() makes sense for the O_TMPFILE case, but linkat() + AT_REPLACE
also supports a normal, named source file, and renameat() +
AT_EMPTY_PATH for that doesn't make sense at all (rename this file by
file descriptor, but don't really rename it?).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target
  2018-04-24  6:19 ` [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target Omar Sandoval
@ 2018-04-24 22:14   ` Omar Sandoval
  0 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-04-24 22:14 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Linus Torvalds, linux-api, kernel-team, Xi Wang

On Mon, Apr 23, 2018 at 11:19:41PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> One of the most common uses of temporary files is the classic atomic
> replacement pattern, i.e.,
> 
> - write temporary file
> - fsync temporary file
> - rename temporary file over real file
> - fsync parent directory
> 
> Now, we have O_TMPFILE, which gives us a much better way to create
> temporary files, but it's not possible to use it for this pattern.
> 
> This patch introduces an AT_REPLACE flag which allows linkat() to
> replace the target file. Now, the temporary file in the pattern above
> can be a proper O_TMPFILE. Even without O_TMPFILE, this is a new
> primitive which might be useful in other contexts.
> 
> The implementation on the VFS side mimics sys_renameat2().
> 
> Cc: Xi Wang <xi@cs.washington.edu>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/ecryptfs/inode.c        |   2 +-
>  fs/namei.c                 | 181 +++++++++++++++++++++++++++++--------
>  fs/nfsd/vfs.c              |   2 +-
>  fs/overlayfs/overlayfs.h   |   2 +-
>  include/linux/fs.h         |   3 +-
>  include/uapi/linux/fcntl.h |   1 +
>  6 files changed, 150 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 186bd2464fd5..2cc2b1deaa12 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -4149,6 +4149,7 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
>   * @dir:	new parent
>   * @new_dentry:	where to create the new link
>   * @delegated_inode: returns inode needing a delegation break
> + * @flags:      link flags
>   *
>   * The caller must hold dir->i_mutex
>   *
> @@ -4162,16 +4163,26 @@ SYSCALL_DEFINE2(symlink, const char __user *, oldname, const char __user *, newn
>   * be appropriate for callers that expect the underlying filesystem not
>   * to be NFS exported.
>   */
> -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
> +int vfs_link(struct dentry *old_dentry, struct inode *dir,
> +	     struct dentry *new_dentry, struct inode **delegated_inode,
> +	     unsigned int flags)
>  {
>  	struct inode *inode = old_dentry->d_inode;
> +	struct inode *target = new_dentry->d_inode;
>  	unsigned max_links = dir->i_sb->s_max_links;
>  	int error;
>  
>  	if (!inode)
>  		return -ENOENT;
>  
> -	error = may_create(dir, new_dentry);
> +	if (target) {
> +		if (flags & AT_REPLACE)

This needs an

	if (inode == target)
		return 0;
		
I'll fix this in v4.

> +			error = may_delete(dir, new_dentry, d_is_dir(old_dentry));
> +		else
> +			error = -EEXIST;
> +	} else {
> +		error = may_create(dir, new_dentry);
> +	}
>  	if (error)
>  		return error;
>  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
  2018-04-24 15:26   ` Omar Sandoval
@ 2018-04-26  7:49     ` Omar Sandoval
  0 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-04-26  7:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, linux-fsdevel, Linus Torvalds, linux-api, kernel-team, Xi Wang

On Tue, Apr 24, 2018 at 08:26:10AM -0700, Omar Sandoval wrote:
> On Tue, Apr 24, 2018 at 06:21:07AM -0700, Christoph Hellwig wrote:
> > > Patch 1 implements the VFS support for this flag. The implementation
> > > resembles sys_renameat2(), and I took care to preserve all of the
> > > original error cases and make the new error cases consistent with
> > > rename.
> > 
> > Shouldn't we try to reuse the rename code and the ->rename method
> > instead of largely duplicating it?
> 
> As I mentioned in my cover letter, I did try, and it was uglier to reuse
> rename. 
> 
> For O_TMPFILE, we can't pass a meaningful old_dir to ->rename(), so now
> every ->rename() supporting AT_EMPTY_PATH has to grow a bunch of special
> cases for ignoring old_dir (I looked at Btrfs, ext4, and XFS, and it'd
> be messy for all three). Conversely, ->link() just needs the extra
> unlink before link logic (I added ->link2() for this RFC, but for the
> actual patch I'll just add the flag to ->link()).
> 
> On the VFS side, in my opinion, there isn't that much duplication -- we
> have the may_delete()/is_local_mountpoint() a la unlink, delegation
> handling, and some specific error cases we need to handle similarly to
> rename, but that's not too suprising for a new primitive.
> 
> > In fact I wonder if a better interface would just use renameat(2)
> > and accept the AT_EMPTY_PATH (for the source name only).
> 
> renameat() makes sense for the O_TMPFILE case, but linkat() + AT_REPLACE
> also supports a normal, named source file, and renameat() +
> AT_EMPTY_PATH for that doesn't make sense at all (rename this file by
> file descriptor, but don't really rename it?).

Oh, another argument in favor of linkat() is that existing programs
which are naming an O_TMPFILE will already be using linkat(), and it's
just a matter of adding the AT_REPLACE flag and skipping the temporary
name/rename dance.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
  2018-04-24  6:19 [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Omar Sandoval
                   ` (2 preceding siblings ...)
  2018-04-24 13:21 ` [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Christoph Hellwig
@ 2018-04-27 12:00 ` Michael Kerrisk (man-pages)
  2018-05-04 18:29   ` Omar Sandoval
  2018-05-04 18:30 ` Omar Sandoval
  4 siblings, 1 reply; 10+ messages in thread
From: Michael Kerrisk (man-pages) @ 2018-04-27 12:00 UTC (permalink / raw)
  To: Omar Sandoval, Al Viro, linux-fsdevel
  Cc: mtk.manpages, Linus Torvalds, linux-api, kernel-team, Xi Wang

Hello Omar,

On 04/24/2018 08:19 AM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Hi, Al,
> 
> This is a respin of my linkat() AT_REPLACE series, previously posted
> here:
> 
> https://patchwork.kernel.org/patch/9636735/
> https://patchwork.kernel.org/patch/9636733/
> 
> There are no changes since v2, only a rebase onto v4.17-rc2 and some
> minor additions to the commit messages.
> 
> The goal is to allow for updating a file atomically in-place with an
> O_TMPFILE like so:
> 
> - open temporary file with O_TMPFILE
> - write temporary file contents
> - fsync temporary file
> - atomically replace permanent location with the temporary file
> - fsync parent directory

Could we have an example program with the next revision please,
perhaps as part of the commit message?

And, please also some text suitable for inclusion in the linkat(2)
manual page.

Thanks,

Michael

> Patch 1 implements the VFS support for this flag. The implementation
> resembles sys_renameat2(), and I took care to preserve all of the
> original error cases and make the new error cases consistent with
> rename.
> 
> Previously, we discussed extending renameat2() instead of linkat() for
> this, but this makes a mess of rename and also forces us to special-case
> the parent directory for O_TMPFILEs in implementations of
> i_op->rename(), so I still think linkat() is a better fit.
> 
> Patch 2 adds support for AT_REPLACE to Btrfs. That's the codebase I'm
> most familiar with so that's where I started, but it should be
> straightforward to implement for other filesystems. v1 of this series
> had some incorrect dentry fiddling, so now I just unhash the replaced
> dentry for simplicity.
> 
> Please take a look.
> 
> Thanks!
> 
> Cc: Xi Wang <xi@cs.washington.edu>
> 
> Omar Sandoval (2):
>   fs: add AT_REPLACE flag for linkat() which replaces the target
>   Btrfs: add support for linkat() AT_REPLACE
> 
>  fs/btrfs/inode.c           |  65 ++++++++++++-
>  fs/ecryptfs/inode.c        |   2 +-
>  fs/namei.c                 | 181 +++++++++++++++++++++++++++++--------
>  fs/nfsd/vfs.c              |   2 +-
>  fs/overlayfs/overlayfs.h   |   2 +-
>  include/linux/fs.h         |   3 +-
>  include/uapi/linux/fcntl.h |   1 +
>  7 files changed, 211 insertions(+), 45 deletions(-)
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
  2018-04-27 12:00 ` Michael Kerrisk (man-pages)
@ 2018-05-04 18:29   ` Omar Sandoval
  0 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-05-04 18:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Al Viro, linux-fsdevel, Linus Torvalds, linux-api, kernel-team, Xi Wang

On Fri, Apr 27, 2018 at 02:00:39PM +0200, Michael Kerrisk (man-pages) wrote:
> Hello Omar,
> 
> On 04/24/2018 08:19 AM, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Hi, Al,
> > 
> > This is a respin of my linkat() AT_REPLACE series, previously posted
> > here:
> > 
> > https://patchwork.kernel.org/patch/9636735/
> > https://patchwork.kernel.org/patch/9636733/
> > 
> > There are no changes since v2, only a rebase onto v4.17-rc2 and some
> > minor additions to the commit messages.
> > 
> > The goal is to allow for updating a file atomically in-place with an
> > O_TMPFILE like so:
> > 
> > - open temporary file with O_TMPFILE
> > - write temporary file contents
> > - fsync temporary file
> > - atomically replace permanent location with the temporary file
> > - fsync parent directory
> 
> Could we have an example program with the next revision please,
> perhaps as part of the commit message?
> 
> And, please also some text suitable for inclusion in the linkat(2)
> manual page.
> 
> Thanks,
> 
> Michael

Yes, I'll do that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat()
  2018-04-24  6:19 [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Omar Sandoval
                   ` (3 preceding siblings ...)
  2018-04-27 12:00 ` Michael Kerrisk (man-pages)
@ 2018-05-04 18:30 ` Omar Sandoval
  4 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-05-04 18:30 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Linus Torvalds, linux-api, kernel-team, Xi Wang

On Mon, Apr 23, 2018 at 11:19:40PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Hi, Al,
> 
> This is a respin of my linkat() AT_REPLACE series, previously posted
> here:
> 
> https://patchwork.kernel.org/patch/9636735/
> https://patchwork.kernel.org/patch/9636733/
> 
> There are no changes since v2, only a rebase onto v4.17-rc2 and some
> minor additions to the commit messages.
> 
> The goal is to allow for updating a file atomically in-place with an
> O_TMPFILE like so:
> 
> - open temporary file with O_TMPFILE
> - write temporary file contents
> - fsync temporary file
> - atomically replace permanent location with the temporary file
> - fsync parent directory
> 
> Patch 1 implements the VFS support for this flag. The implementation
> resembles sys_renameat2(), and I took care to preserve all of the
> original error cases and make the new error cases consistent with
> rename.
> 
> Previously, we discussed extending renameat2() instead of linkat() for
> this, but this makes a mess of rename and also forces us to special-case
> the parent directory for O_TMPFILEs in implementations of
> i_op->rename(), so I still think linkat() is a better fit.
> 
> Patch 2 adds support for AT_REPLACE to Btrfs. That's the codebase I'm
> most familiar with so that's where I started, but it should be
> straightforward to implement for other filesystems. v1 of this series
> had some incorrect dentry fiddling, so now I just unhash the replaced
> dentry for simplicity.
> 
> Please take a look.
> 
> Thanks!

Ping, Al, can you take a look?

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-05-04 18:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-24  6:19 [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Omar Sandoval
2018-04-24  6:19 ` [RFC PATCH v3 1/2] fs: add AT_REPLACE flag for linkat() which replaces the target Omar Sandoval
2018-04-24 22:14   ` Omar Sandoval
2018-04-24  6:19 ` [RFC PATCH v3 2/2] Btrfs: add support for linkat() AT_REPLACE Omar Sandoval
2018-04-24 13:21 ` [RFC PATCH v3 0/2] fs: add AT_REPLACE flag for linkat() Christoph Hellwig
2018-04-24 15:26   ` Omar Sandoval
2018-04-26  7:49     ` Omar Sandoval
2018-04-27 12:00 ` Michael Kerrisk (man-pages)
2018-05-04 18:29   ` Omar Sandoval
2018-05-04 18:30 ` Omar Sandoval

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.